Collection Options (collection.cfg)

collection.cfg is the main configuration file for a Funnelback collection.

The collection.cfg file is created when a collection is created, and can be edited via the edit collection configuration link on the administration home page.

Location

The locations on disk from which these options are read:

  • $SEARCH_HOME/conf/[collection]/collection.cfg: Collection-specific configuration options in this file have the highest precedence, overriding values in all other files.
  • $SEARCH_HOME/conf/collection.cfg: Options in this file provide defaults for all profiles in all collections.
  • $SEARCH_HOME/conf/collection.cfg.default: Options in this file are read-only and provide the product default values.
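
For example, if the same option were set in all three files, the collection-specific value would win. Assuming a collection named example-web (the name and values below are illustrative only):

    $SEARCH_HOME/conf/collection.cfg.default          crawler.num_crawlers=8
    $SEARCH_HOME/conf/collection.cfg                  crawler.num_crawlers=12
    $SEARCH_HOME/conf/example-web/collection.cfg      crawler.num_crawlers=20

In this case example-web would be crawled with 20 crawler threads, because its own collection.cfg overrides both the server-wide and product default values.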

Format

The format of the file is a simple name=value pair per line. The values $SEARCH_HOME and $COLLECTION_NAME are automatically expanded to the Funnelback installation path and the name of the current collection respectively.
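
For example, a collection's configuration might contain entries such as the following (the option names are described in the table below; the values are illustrative only):

    collection=example-web
    service_name=Example website search
    collection_root=$SEARCH_HOME/data/$COLLECTION_NAME
    start_url=http://www.example.com/
    crawler.max_files_stored=50000

Here $SEARCH_HOME and $COLLECTION_NAME would be expanded to the installation path and the collection name (example-web) when the file is read.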

Configuration options

The following table describes the options that can be used in the configuration file.

Option Description
accessibility-auditor.check Turns modern accessibility checks on or off.
accessibility-auditor.min-time-between-recording-history-in-seconds Specifies how much time must have passed since the last time Accessibility Auditor data was recorded before new data will be recorded.
admin.undeletable This option controls whether a collection can be deleted from the administration interface.
admin_email Specifies an email address that will be notified after each collection update.
analytics.data_miner.range_in_days Length of time range (in days) the analytics data miner will go back from the current date when mining query and click log records.
analytics.max_heap_size Set Java heap size used for analytics.
analytics.outlier.day.minimum_average_count Control the minimum number of occurrences of a query required before a day pattern can be detected.
analytics.outlier.day.threshold Control the day pattern detection threshold.
analytics.outlier.exclude_collection Disable query spike detection (trend alerts) for a collection.
analytics.outlier.exclude_profiles Disable query spike detection for a profile.
analytics.outlier.hour.minimum_average_count Control the minimum number of occurrences of a query required before an hour pattern can be detected.
analytics.outlier.hour.threshold Control the hour pattern detection threshold.
analytics.reports.checkpoint_rate Controls the rate at which the query reports system checkpoints data to disk.
analytics.reports.disable_incremental_reporting Disable incremental reports database updates. If set all existing query and click logs will be processed for each reports update.
analytics.reports.max_facts_per_dimension_combination Specifies the amount of data that is stored by query reports.
analytics.scheduled_database_update Control whether reports for the collection are updated on a scheduled basis.
annie.index_opts Specify options for the "annie-a" annotation indexing program.
build_autoc_options Specifies additional configuration options that can be supplied to the auto completion builder.
changeover_percent Specifies the minimum ratio of documents that must be gathered for an update to succeed.
click_data.archive_dirs The directories that contain archives of click logs to be included in producing indexes.
click_data.num_archived_logs_to_use The number of archived click logs to use from each archive directory.
click_data.use_click_data_in_index A boolean value indicating whether or not click information should be included in the index.
click_data.week_limit Optional restriction of click data to a set number of weeks into the past.
collection The internal name of a collection.
collection-update.step.stepTechnicalName.run Determines if an update step should be run or not.
collection_root Specifies the location of a collection's data folder.
collection_type Specifies the type of the collection.
contextual-navigation.cannot_end_with Prevents contextual navigation from returning suggestions ending in any of the defined words or phrases.
contextual-navigation.case_sensitive List of words for which case sensitivity is preserved.
contextual-navigation.categorise_clusters Group contextual navigation suggestions into types and topics.
contextual-navigation.enabled Enable or disable contextual navigation.
contextual-navigation.kill_list Exclude contextual navigation suggestions which contain any words or phrases in this list.
contextual-navigation.max_phrase_length Limit the maximum length of suggestions to the specified number of words.
contextual-navigation.max_phrases Limit the number of contextual navigation phrases that should be processed.
contextual-navigation.max_results_to_examine Specify the maximum number of results to examine when generating suggestions.
contextual-navigation.site.granularity Type of granularity to be used for contextual navigation site suggestions.
contextual-navigation.site.max_clusters Limits the number of site suggestions.
contextual-navigation.summary_fields Metadata classes that are analysed for contextual navigation.
contextual-navigation.timeout_seconds Timeout for how long contextual navigation should run for.
contextual-navigation.topic.max_clusters Limits the number of topic suggestions.
contextual-navigation.type.max_clusters Limits the number of type suggestions.
crawler Specifies the name of the crawler binary.
crawler.accept_cookies This option enables, or disables, the crawler's use of cookies.
crawler.accept_files Restricts the file extensions the web crawler should crawl.
crawler.allow_concurrent_in_crawl_form_interaction Enable/disable concurrent processing of in-crawl form interaction.
crawler.allowed_redirect_pattern Specify a regex to allow crawler redirections that would otherwise be disallowed by the current include/exclude patterns.
crawler.cache.DNSCache_max_size Maximum size of internal DNS cache. Upon reaching this size the cache will drop old elements.
crawler.cache.LRUCache_max_size Maximum size of LRUCache. Upon reaching this size the cache will drop old elements.
crawler.cache.URLCache_max_size Specifies the maximum size of URLCache.
crawler.check_alias_exists Check if aliased URLs exist; if not, revert to the original URL.
crawler.checkpoint_to Specifies the location of crawler checkpoint files.
crawler.classes.Crawler This option defines the Java class to be used as the main crawling process.
crawler.classes.Frontier Specifies the java class used for the frontier (a list of URLs not yet visited).
crawler.classes.Policy Specifies the Java class used for enforcing the include/exclude policy for URLs.
crawler.classes.RevisitPolicy Specifies the Java class used for enforcing the revisit policy for URLs.
crawler.classes.URLStore Specifies the Java class used to store content on disk, e.g. to create a mirror of files crawled.
crawler.classes.statistics List of statistics classes to use during a crawl in order to generate figures for data reports.
crawler.cookie_jar_file Specifies a file containing cookies to be pre-loaded when a web crawl begins.
crawler.eliminate_duplicates Whether to eliminate duplicate documents while crawling.
crawler.extract_links_from_javascript Whether to extract links from Javascript while crawling.
crawler.follow_links_in_comments Whether to follow links in HTML comments while crawling.
crawler.form_interaction.in_crawl.groupId.cleartext.urlParameterKey Specifies a clear text form parameter for in-crawl authentication.
crawler.form_interaction.in_crawl.groupId.encrypted.urlParameterKey Specifies an encrypted form parameter for in-crawl authentication.
crawler.form_interaction.in_crawl.groupId.url_pattern Specifies the URL of an HTML web form action for an in-crawl form interaction rule.
crawler.form_interaction.pre_crawl.groupId.cleartext.urlParameterKey Specifies a clear text form parameter for pre-crawl authentication.
crawler.form_interaction.pre_crawl.groupId.encrypted.urlParameterKey Specifies an encrypted form parameter for pre-crawl authentication.
crawler.form_interaction.pre_crawl.groupId.form_number Specifies which form element at the specified URL should be processed.
crawler.form_interaction.pre_crawl.groupId.url Specifies a URL of the page containing the HTML web form for a pre-crawl form interaction rule.
crawler.frontier_hosts List of hosts running crawlers if performing a distributed web crawl.
crawler.frontier_num_top_level_dirs Specifies the number of top level directories to store disk based frontier files in.
crawler.frontier_port Port on which the DistributedFrontier will listen.
crawler.frontier_use_ip_mapping Whether to map hosts to frontiers based on IP address.
crawler.header_logging Option to control whether HTTP headers are written out to a separate log file.
crawler.incremental_logging Option to control whether a list of new and changed URLs should be written to a log file during incremental crawling.
crawler.inline_filtering_enabled Option to control whether text extraction from binary files is done "inline" during a web crawl.
crawler.link_extraction_group The group in the crawler.link_extraction_regular_expression option which should be extracted as the link/URL.
crawler.link_extraction_regular_expression Specifies the regular expression used to extract links from each document.
crawler.logfile Specifies the crawler's log path and filename.
crawler.lowercase_iis_urls Whether to lowercase all URLs from IIS web servers.
crawler.max_dir_depth Specifies the maximum number of sub directories a URL may have before it will be ignored.
crawler.max_download_size Specifies the maximum size of files the crawler will download (in MB).
crawler.max_files_per_area Specifies a limit on the number of files from a single directory or dynamically generated URLs that will be crawled.
crawler.max_files_per_server Specifies the maximum number of files that will be crawled per server.
crawler.max_files_stored Specifies the maximum number of files to download.
crawler.max_individual_frontier_size Specifies the maximum size of an individual frontier.
crawler.max_link_distance Specifies the maximum distance a URL can be from a start URL for it to be downloaded.
crawler.max_parse_size Crawler will not parse documents beyond this many megabytes in size.
crawler.max_timeout_retries Maximum number of times to retry after a network timeout.
crawler.max_url_length Specifies the maximum length a URL can be in order for it to be crawled.
crawler.max_url_repeating_elements A URL with more than this many repeating elements (directories) will be ignored.
crawler.monitor_authentication_cookie_renewal_interval Specifies the time interval at which to renew crawl authentication cookies.
crawler.monitor_checkpoint_interval Time interval at which to checkpoint (seconds).
crawler.monitor_delay_type Type of delay to use during crawl (dynamic or fixed).
crawler.monitor_halt Specifies if a crawl should stop running.
crawler.monitor_preferred_servers_list Specifies an optional list of servers to prefer during crawling.
crawler.monitor_time_interval Specifies a time interval at which to output monitoring information (seconds).
crawler.monitor_url_reject_list Optional parameter listing URLs to reject during a running crawl.
crawler.non_html Which non-html file formats to crawl (e.g. pdf, doc, xls etc.).
crawler.ntlm.domain NTLM domain to be used for web crawler authentication.
crawler.ntlm.password NTLM password to be used for web crawler authentication.
crawler.ntlm.username NTLM username to be used for web crawler authentication.
crawler.num_crawlers Number of crawler threads which simultaneously crawl different hosts.
crawler.overall_crawl_timeout Specifies the maximum time the crawler is allowed to run. When exceeded, the crawl will stop and the update will continue.
crawler.overall_crawl_units Specifies the units for the crawl timeout.
crawler.parser.mimeTypes Extract links from these content types, specified as a comma-separated list or a regexp: pattern.
crawler.predirects_enabled Enable crawler predirects.
crawler.protocols Crawl URLs via these protocols.
crawler.reject_files Do not crawl files with these extensions.
crawler.remove_parameters Optional list of parameters to remove from URLs.
crawler.request_delay Milliseconds between HTTP requests per crawler thread.
crawler.request_header Optional additional header to be inserted in HTTP(S) requests made by the webcrawler.
crawler.request_header_url_prefix Optional URL prefix to be applied when processing the crawler.request_header parameter.
crawler.request_timeout Timeout for HTTP page GETs (milliseconds).
crawler.revisit.edit_distance_threshold Threshold for edit distance between two versions of a page when deciding whether it has changed or not.
crawler.revisit.num_times_revisit_skipped_threshold Threshold for number of times a page revisit has been skipped when deciding whether to revisit it.
crawler.revisit.num_times_unchanged_threshold Threshold for the number of times a page has been unchanged when deciding whether to revisit it.
crawler.robotAgent Robot agent name to match in robots.txt files; matching is case-insensitive over the length of the name.
crawler.secondary_store_root Location of secondary (previous) store - used in incremental crawling.
crawler.send-http-basic-credentials-without-challenge Specifies whether HTTP basic credentials should be sent without the web server sending a challenge.
crawler.server_alias_file Path to an optional file containing server alias mappings, e.g. www.daff.gov.au=www.affa.gov.au.
crawler.sslClientStore Specifies a path to an SSL client certificate store.
crawler.sslClientStorePassword Password for the SSL Client certificate store.
crawler.sslTrustEveryone Trust ALL Root Certificates and ignore server hostname verification.
crawler.sslTrustStore Specifies the path to an SSL Trusted Root store.
crawler.start_urls_file Path to a file that contains a list of URLs (one per line) that will be used as the starting point for a crawl.
crawler.store_all_types If true, override accept/reject rules and crawl and store all file types encountered.
crawler.store_empty_content_urls Specifies if URLs that contain no content after filtering should be stored.
crawler.store_headers Whether HTTP header information should be written at the top of HTML files.
crawler.use_sitemap_xml Specifies whether to process sitemap.xml files during a web crawl.
crawler.user_agent The browser ID that the crawler uses when making HTTP requests.
crawler.verbosity Verbosity level (0-6) of crawler logs. Higher number results in more messages.
crawler_binaries Specifies the location of the crawler files.
custom.base_template The template used when a custom collection was created.
data_report A switch that can be used to enable or disable the data report stage during a collection update.
data_root The directory under which the documents to index reside.
db.bundle_storage_enabled Allows storage of data extracted from a database in a compressed form.
db.custom_action_java_class Allows a custom java class to modify data extracted from a database before indexing.
db.full_sql_query The SQL query to perform on a database to fetch all records for searching.
db.incremental_sql_query The SQL query to perform to fetch new or changed records from a database.
db.incremental_update_type Allows the selection of different modes for keeping database collections up to date.
db.jdbc_class The name of the Java JDBC driver to connect to a database.
db.jdbc_url The URL specifying database connection parameters such as the server and database name.
db.password The password for connecting to the database.
db.primary_id_column The primary id (unique identifier) column for each database record.
db.single_item_sql An SQL command for extracting an individual record from the database.
db.update_table_name The name of a table in the database which provides a record of all additions, updates and deletes.
db.use_column_labels Flag to control whether column labels are used in JDBC calls in the database gatherer.
db.username The username for connecting to the database.
db.xml_root_element The top level element for records extracted from the database.
directory.context_factory Sets the java class to use for creating directory connections.
directory.domain Sets the domain to use for authentication in a directory collection.
directory.exclude_rules Sets the rules for excluding content from a directory collection.
directory.page_size Sets the number of documents to fetch from the directory in each request.
directory.password Sets the password to use for authentication in a directory collection.
directory.provider_url Sets the URL for accessing the directory in a directory collection.
directory.search_base Sets the base from which content will be gathered in a directory collection.
directory.search_filter Sets the filter for selecting content to gather in a directory collection.
directory.username Sets the username to use for authentication in a directory collection.
exclude_patterns The crawler will ignore a URL if it matches any of these exclude patterns.
facebook.access-token Specifies an optional Facebook access token.
facebook.app-id Specifies the Facebook application ID.
facebook.app-secret Specifies the Facebook application secret.
facebook.debug Enable debug mode to preview Facebook fetched records.
facebook.event-fields Specify a list of Facebook event fields as specified in the Facebook event API documentation.
facebook.page-fields Specify a list of Facebook page fields as specified in the Facebook page API documentation.
facebook.page-ids Specifies a list of IDs of the Facebook pages/accounts to crawl.
facebook.post-fields Specify a list of Facebook post fields as specified in the Facebook post API documentation.
faceted_navigation.black_list Exclude specific values for facets.
faceted_navigation.black_list.facet Exclude specific values for a specific facet.
faceted_navigation.date.sort_mode (deprecated) Specify how to sort date based facets.
faceted_navigation.white_list Include only a list of specific values for facets.
faceted_navigation.white_list.facet Include only a list of specific values for a specific facet.
filecopy.cache Enable/disable using the live view as a cache directory where pre-filtered text content can be copied from.
filecopy.discard_filtering_errors Whether to index the file names of files that failed to be filtered.
filecopy.domain Filecopy sources that require a username to access files will use this setting as a domain for the user.
filecopy.exclude_pattern Filecopy collections will exclude files which match this regular expression.
filecopy.filetypes The list of filetypes (i.e. file extensions) that will be included by a filecopy collection.
filecopy.include_pattern If specified, filecopy collections will only include files which match this regular expression.
filecopy.max_files_stored If set, this limits the number of documents a filecopy collection will gather when updating.
filecopy.num_fetchers Number of fetcher threads for interacting with the fileshare in a filecopy collection.
filecopy.num_workers Number of worker threads for filtering and storing files in a filecopy collection.
filecopy.passwd Filecopy sources that require a password to access files will use this setting as a password.
filecopy.request_delay Specifies how long to delay between copy requests in milliseconds.
filecopy.security_model Sets the plugin to use to collect security information on files.
filecopy.source This is the file system path or URL that describes the source of data files.
filecopy.source_list If specified, this option is set to a file which contains a list of other files to copy, rather than using the filecopy.source.
filecopy.store_class Specifies which storage class to be used by a filecopy collection (e.g. WARC, Mirror).
filecopy.user Filecopy sources that require a username to access files will use this setting as a username.
filecopy.walker_class Main class used by the filecopier to walk a file tree.
filter.classes Specifies which java classes should be used for filtering documents.
filter.csv-to-xml.custom-header Defines a custom header to use for the CSV.
filter.csv-to-xml.format Sets the CSV format to use when filtering a CSV document.
filter.csv-to-xml.has-header Controls if the CSV file has a header or not.
filter.csv-to-xml.url-template The template to use for the URLs of the documents created in the CSVToXML Filter.
filter.document_fixer.timeout_ms Controls the maximum amount of time the document fixer may spend on a document.
filter.ignore.mimeTypes Specifies a list of MIME types for the filter to ignore.
filter.jsoup.classes Specify which java/groovy classes will be used for filtering, and operate on JSoup objects rather than byte streams.
filter.jsoup.undesirable_text-source.key_name Specify sources of undesirable text strings to detect and present within Content Auditor.
filter.text-cleanup.ranges-to-replace Specify Unicode blocks for replacement during filtering (to avoid 'corrupt' character display).
filter.tika.types Specifies which file types to filter using the TikaFilterProvider.
flickr.api-key Flickr API key.
flickr.api-secret Flickr API secret.
flickr.auth-secret Flickr authentication secret.
flickr.auth-token Flickr authentication token.
flickr.debug Enable debug mode to preview Flickr fetched records.
flickr.groups.private List of Flickr group IDs to crawl within a "private" view.
flickr.groups.public List of Flickr group IDs to crawl within a "public" view.
flickr.user-ids Comma delimited list of Flickr user accounts IDs to crawl.
ftp_passwd Password to use when gathering content from an FTP server.
ftp_user Username to use when gathering content from an FTP server.
gather Specifies if gathering is enabled or not.
gather.max_heap_size Set Java heap size used for gathering documents.
gather.slowdown.days Days on which gathering should be slowed down.
gather.slowdown.hours.from Start hour for slowdown period.
gather.slowdown.hours.to End hour for slowdown period.
gather.slowdown.request_delay Request delay to use during slowdown period.
gather.slowdown.threads Number of threads to use during slowdown period.
groovy.extra_class_path Specify extra class paths to be used by Groovy when using $GROOVY_COMMAND.
group.customer_id The customer group under which the collection will appear; useful for multi-tenant systems.
group.project_id The project group under which the collection will appear in the selection drop-down menu on the main administration page.
gscopes.options Specify options for the "padre-gs" gscopes program.
gscopes.other_gscope Specifies the gscope to set when no other gscopes are set.
http_passwd Password used for accessing password protected content during a crawl.
http_proxy The hostname (e.g. proxy.company.com) of the HTTP proxy to use during crawling.
http_proxy_passwd The proxy password to be used during crawling.
http_proxy_port Port of HTTP proxy used during crawling.
http_proxy_user The proxy user name to be used during crawling.
http_source_host IP address or hostname used by the crawler, on a machine with more than one available.
http_user Username used for accessing password-protected content during a crawl.
include_patterns Specifies the pattern that URLs must match in order to be crawled.
index A switch that can be used to enable or disable the indexing stage during a collection update.
index.target For datasources, indicate which index the data is sent to.
indexer The name of the indexer program to be used for this collection.
indexer_options Indexer command line options, separated by whitespace; individual options therefore cannot contain embedded whitespace characters.
indexing.additional-metamap-source.key_name Declares additional sources of metadata mappings to be used when indexing HTML documents.
indexing.collapse_fields Define which fields to consider for result collapsing.
indexing.use_manifest Specifies if a manifest file should be used for indexing.
java_libraries The path where the Java libraries are located when running most gatherers.
java_options Command line options to pass to the Java virtual machine.
knowledge-graph.max_heap_size Set Java heap size used for Knowledge Graph update process.
logging.hostname_in_filename Control whether hostnames are used in log filenames.
logging.ignored_x_forwarded_for_ranges Defines all IP ranges in the X-Forwarded-For header to be ignored by Funnelback when choosing the IP address to log.
mail.on_failure_only Specifies whether to always send collection update emails or only when an update fails.
matrix_password Password for logging into Matrix and the Squiz Suite Manager.
matrix_username Username for logging into Matrix and the Squiz Suite Manager.
mcf.authority-url URL for contacting a ManifoldCF authority.
mcf.domain Default domain for users in the ManifoldCF authority.
noindex_expression Optional regular expression to specify content that should not be indexed.
post_archive_command Command to run after archiving query and click logs.
post_collection_create_command Command to run after a collection has been created.
post_delete-list_command Command to run after deleting documents during an instant delete update.
post_delete-prefix_command Command to run after deleting documents during an instant delete update.
post_gather_command Command to run after the gathering phase during a collection update.
post_index_command Command to run after the index phase during a collection update.
post_instant-gather_command Command to run after the gather phase during an instant update.
post_instant-index_command Command to run after the index phase during an instant update.
post_meta_dependencies_command Command to run after a component collection updates its meta parents during a collection update.
post_recommender_command Command to run after the recommender phase during a collection update.
post_reporting_command Command to run after query analytics runs.
post_swap_command Command to run after live and offline views are swapped during a collection update.
post_update_command Command to run after an update has successfully completed.
pre_archive_command Command to run before archiving query and click logs.
pre_collection_delete_command Command to run before a collection is permanently deleted.
pre_delete-list_command Command to run before deleting documents during an instant delete update.
pre_delete-prefix_command Command to run before deleting documents during an instant delete update.
pre_gather_command Command to run before the gathering phase during a collection update.
pre_index_command Command to run before the index phase during a collection update.
pre_instant-gather_command Command to run before the gather phase during an instant update.
pre_instant-index_command Command to run before the index phase during an instant update.
pre_meta_dependencies_command Command to run before a component collection updates its meta parents during a collection update.
pre_recommender_command Command to run before the recommender phase during a collection update.
pre_report_command Command run before query or click logs are to be used during an update.
pre_reporting_command Command to run before query analytics runs.
pre_swap_command Command to run before live and offline views are swapped during a collection update.
progress_report_interval Interval (in seconds) at which the gatherer will update the progress message for the Admin UI.
push.auto-start Specifies whether the Push collection will start with the web server.
push.commit-type The type of commit that push should use by default.
push.commit.index.parallel.max-index-thread-count The maximum number of threads that can be used during a commit for indexing.
push.commit.index.parallel.min-documents-for-parallel-indexing The minimum number of documents required in a single commit for parallel indexing to be used during that commit.
push.commit.index.parallel.min-documents-per-thread The minimum number of documents each thread must have when using parallel indexing in a commit.
push.init-mode The initial mode in which push should start.
push.max-generations The maximum number of generations push can use.
push.merge.index.parallel.max-index-thread-count The maximum number of threads that can be used during a merge for indexing.
push.merge.index.parallel.min-documents-for-parallel-indexing The minimum number of documents required in a single merge for parallel indexing to be used during that merge.
push.merge.index.parallel.min-documents-per-thread The minimum number of documents each thread must have when using parallel indexing in a merge.
push.replication.compression-algorithm The compression algorithm to use when transferring compressible files to Push slaves.
push.replication.ignore.data When set, query processors will ignore the 'data' section in snapshots, which is used for serving cached copies.
push.replication.ignore.delete-lists When set, query processors will ignore the delete lists.
push.replication.ignore.index-redirects When set, query processors will ignore the index redirects file in snapshots.
push.replication.master.host-name The hostname of the master for a query processor push collection.
push.replication.master.push-api.port The master's push-api port for a query processor push collection.
push.run Controls if a Push collection is allowed to run or not.
push.scheduler.auto-click-logs-processing-timeout-seconds Number of seconds before a Push collection will automatically trigger processing of click logs.
push.scheduler.auto-commit-timeout-seconds Number of seconds a Push collection should wait before a commit is automatically triggered.
push.scheduler.changes-before-auto-commit Number of changes to a Push collection before a commit is automatically triggered.
push.scheduler.delay-between-content-auditor-runs Minimum time in milliseconds between executions of the Content Auditor summary generation task.
push.scheduler.delay-between-meta-dependencies-runs Minimum time in milliseconds between executions of updating the Push collection's meta parents.
push.scheduler.generation.re-index.killed-percent The percentage of killed documents in a single generation for it to be considered for re-indexing.
push.scheduler.generation.re-index.min-doc-count The minimum number of documents in a single generation for it to be considered for re-indexing.
push.scheduler.killed-percentage-for-reindex Percentage of killed documents before Push re-indexes.
push.store.always-flush Used to stop a Push collection from performing caching on PUT or DELETE calls.
query_processor The name of the query processor executable to use.
query_processor_options Query processor command line options.
quicklinks Turn quicklinks functionality on or off.
quicklinks.blacklist_terms List of words to ignore as the link title.
quicklinks.depth The number of sub pages to search for link titles.
quicklinks.domain_searchbox Turn on or off the inline domain restricted search box on the search result page.
quicklinks.max_len Maximum character length for the link title.
quicklinks.max_words Maximum number of link titles to display.
quicklinks.min_len Minimum character length for the link title.
quicklinks.min_links Minimum number of links to display.
quicklinks.rank The number of search results to enable quick links on.
quicklinks.total_links Total number of links to display.
recommender Specifies if the recommendations system is enabled.
retry_policy.max_tries Maximum number of times to retry an operation that has failed.
rss.copyright Sets the copyright element in the RSS feed.
rss.ttl Sets the ttl element in the RSS feed.
schedule.incremental_crawl_ratio The number of scheduled incremental crawls that are performed between each full crawl.
search_user The email address to use for administrative purposes.
security.earlybinding.locks-keys-matcher.ldlibrarypath Full path to the security plugin library.
security.earlybinding.locks-keys-matcher.name Name of the security plugin library that matches user keys with document locks at query time.
security.earlybinding.user-to-key-mapper Selected security plugin for translating usernames into lists of document keys.
security.earlybinding.user-to-key-mapper.cache-seconds Number of seconds for which a user's list of keys may be cached.
security.earlybinding.user-to-key-mapper.groovy-class Name of a custom Groovy class to use to translate usernames into lists of document keys.
service.thumbnail.max-age Specify how long thumbnails may be cached for.
service_name Human readable name of the collection.
slack.channel-names-to-exclude List of Slack channel names to exclude from search.
slack.hostname The hostname of the Slack instance.
slack.target-collection Specify the push collection which messages from a Slack collection should be stored into.
slack.target-push-api The push API endpoint to which slack messages should be added.
slack.user-names-to-exclude Slack user names to exclude from search.
spelling.suggestion_lexicon_weight Specify weighting to be given to suggestions from the lexicon relative to other sources.
spelling.suggestion_sources Specify sources of information for generating spelling suggestions.
spelling.suggestion_threshold Threshold which controls how suggestions are made.
spelling_enabled Whether to enable spell checking in the search interface.
squizapi.target_url URL of the Squiz Suite Manager for a Matrix collection.
start_url A list of URLs from which the crawler will start crawling.
store.push.collection Name of a push collection to push content into when using a PushStore or Push2Store.
store.push.host Hostname of the machine to push documents to if using a PushStore or Push2Store.
store.push.password The password to use when authenticating against push if using a PushStore or Push2Store.
store.push.port Port that Push is configured to listen on (if using a PushStore).
store.push.url The URL that the push api is located at (if using a Push2Store).
store.push.user The user name to use when authenticating against push if using a PushStore or Push2Store.
store.raw-bytes.class Fully qualified classname of a raw bytes Store class to use.
store.record.type This parameter defines the type of store that Funnelback uses to store its records.
store.temp.class Fully qualified classname of a class to use for temporary storage.
store.xml.class Fully qualified classname of an XML storage class to use.
trim.collect_containers Whether to collect the container of each TRIM record or not.
trim.database The 2-digit identifier of the TRIM database to index.
trim.default_live_links Whether search results links should point to a copy of the TRIM document, or launch the TRIM client.
trim.domain Windows domain for the TrimPush crawl user.
trim.extracted_file_types A list of file extensions that will be extracted from TRIM databases.
trim.filter_timeout Timeout to apply when filtering binary documents.
trim.free_space_check_exclude Volume letters to exclude from free space disk check.
trim.free_space_threshold Minimum amount of free space on disk below which a TRIMPush crawl will stop.
trim.gather_direction Whether to go forward or backward when gathering TRIM records.
trim.gather_end_date The date at which to stop the gather process.
trim.gather_mode Date field to use when selecting records (registered date or modified date).
trim.gather_start_date The date from which newly registered or modified documents will be gathered.
trim.license_number TRIM license number as found in the TRIM client system information panel.
trim.max_filter_errors The maximum number of filtering errors to tolerate before stopping the crawl.
trim.max_size The maximum size of record attachments to process.
trim.max_store_errors The maximum number of storage errors to tolerate before stopping the crawl.
trim.passwd Password for the TRIMPush crawl user.
trim.properties_blacklist List of properties to ignore when extracting TRIM records.
trim.push.collection Specifies the Push collection to store the extracted TRIM records in.
trim.request_delay Milliseconds between TRIM requests (for a particular thread).
trim.stats_dump_interval Interval (in seconds) at which statistics will be written to the monitor.log file.
trim.store_class Class to use to store TRIM records.
trim.threads Number of simultaneous TRIM database connections to use.
trim.timespan Interval to split the gather date range into.
trim.timespan.unit Number of time spans to split the gather date range into.
trim.user Username for the TRIMPush crawl user.
trim.userfields_blacklist List of user fields to ignore when extracting TRIM records.
trim.verbose Defines how verbose the TRIM crawl is.
trim.version Configure the version of TRIM to be crawled.
trim.web_server_work_path Location of the temporary folder used by TRIM to extract binary files.
trim.workgroup_port The port on the TRIM workgroup server to connect to when gathering content from TRIM.
trim.workgroup_server The name of the TRIM workgroup server to connect to when gathering content from TRIM.
twitter.debug Enable debug mode to preview Twitter fetched records.
twitter.oauth.access-token Twitter OAuth access token.
twitter.oauth.consumer-key Twitter OAuth consumer key.
twitter.oauth.consumer-secret Twitter OAuth consumer secret.
twitter.oauth.token-secret Twitter OAuth token secret.
twitter.users Comma delimited list of Twitter user names to crawl.
ui.modern.content-auditor.count_urls Define how deep into URLs Content Auditor users can navigate using facets.
ui.modern.content-auditor.date-modified.ok-age-years Define how many years old a document may be before it is considered problematic.
ui.modern.content-auditor.duplicate_num_ranks Define how many results should be considered in detecting duplicates for Content Auditor.
ui.modern.content-auditor.reading-grade.lower-ok-limit Define the reading grade below which documents are considered problematic.
ui.modern.content-auditor.reading-grade.upper-ok-limit Define the reading grade above which documents are considered problematic.
ui.modern.curator.custom_field Configure custom fields for Curator messages.
ui.modern.extra_searches Configure extra searches to be aggregated with the main result data, when using the Modern UI.
ui.modern.form.rss.content_type Sets the content type of the RSS template.
ui.modern.padre_response_size_limit_bytes Sets the maximum size of padre-sw responses to process.
ui_cache_disabled Disable the cache controller from accessing any cached documents.
ui_cache_link Base URL used by PADRE to link to the cached copy of a search result. Can be an absolute URL.
update-pipeline-groovy-pre-post-commands.max_heap_size Set Java heap size used for groovy scripts in pre/post update commands.
update-pipeline.max_heap_size Set Java heap size used for update pipelines.
update.restrict_to_host Specify that collection updates should be restricted to only run on a specific host.
userid_to_log Controls how logging of IP addresses is performed.
vital_servers Changeover only happens if vital servers exist in the new crawl.
warc.compression Control how content is compressed in a WARC file.
workflow.publish_hook Name of the publish hook Perl script.
workflow.publish_hook.meta Name of the publish hook Perl script that will be called each time a meta collection is modified.
youtube.api-key YouTube API key retrieved from the Google API console.
youtube.channel-ids YouTube channel IDs to crawl.
youtube.debug Enable debug mode to preview YouTube fetched records.
youtube.liked-videos Enables fetching of YouTube videos liked by a channel ID.
youtube.playlist-ids YouTube playlist IDs to crawl.
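
As a simple illustration of how these options combine, a basic web collection's collection.cfg might contain entries along the following lines (all values are examples only, not recommended settings):

    collection_type=web
    service_name=Example intranet search
    start_url=http://intranet.example.com/
    include_patterns=intranet.example.com
    exclude_patterns=/login,/print
    crawler.max_files_stored=100000
    crawler.request_delay=250
    changeover_percent=50
    admin_email=search-admin@example.com

Each of these names corresponds to an entry in the table above; any option not set in this file falls back to the server-wide or product default files described under Location.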
