Collection Options

collection.cfg is the main configuration file for a Funnelback collection.

The collection.cfg file is created when a collection is created, and can be edited via the edit collection configuration link on the administration home page.

Location

These options can be read from the following locations on disk:

  • $SEARCH_HOME/conf/[collection]/collection.cfg: Collection-level options in this file have the highest precedence, overriding values in all other files (see the example after this list).
  • $SEARCH_HOME/conf/collection.cfg: Options in this file provide defaults for all profiles in all collections.
  • $SEARCH_HOME/conf/collection.cfg.default: This file is read-only and provides the product default values.
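
For example (the collection name example-collection and the values shown are purely illustrative), if changeover_percent is set in both of the following files:

    $SEARCH_HOME/conf/collection.cfg:
        changeover_percent=50

    $SEARCH_HOME/conf/example-collection/collection.cfg:
        changeover_percent=80

then example-collection uses 80, because the collection-level file takes precedence over the global defaults.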

Format

The file contains one name=value pair per line. The values $SEARCH_HOME and $COLLECTION_NAME are automatically expanded to the Funnelback installation path and the name of the current collection respectively.
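
A minimal sketch of the format (the values shown are illustrative only, not defaults):

    collection=example-collection
    service_name=Example Search
    collection_root=$SEARCH_HOME/data/$COLLECTION_NAME

Here $SEARCH_HOME and $COLLECTION_NAME in the last line would be expanded at run time to the installation path and to example-collection respectively.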

Configuration options

The following table describes the options that can be used in the configuration file. Each entry lists the option name, followed by a short description.

accessibility-auditor.check: Turns modern accessibility checks on or off.
accessibility-auditor.min-time-between-recording-history-in-seconds: Specifies how much time must have passed since the last time Accessibility Auditor data was recorded before new data will be recorded.
admin.undeletable: This option controls whether a collection can be deleted from the administration interface.
admin_email: Specifies an email address that will be emailed after each collection update.
analytics.data_miner.range_in_days: Length of time range (in days) the analytics data miner will go back from the current date when mining query and click log records.
analytics.max_heap_size: Set Java heap size used for analytics.
analytics.outlier.day.minimum_average_count: Control the minimum number of occurrences of a query required before a day pattern can be detected.
analytics.outlier.day.threshold: Control the day pattern detection threshold.
analytics.outlier.exclude_collection: Disable query spike detection (trend alerts) for a collection.
analytics.outlier.exclude_profiles: Disable query spike detection for a profile.
analytics.outlier.hour.minimum_average_count: Control the minimum number of occurrences of a query required before an hour pattern can be detected.
analytics.outlier.hour.threshold: Control the hour pattern detection threshold.
analytics.reports.checkpoint_rate: Controls the rate at which the query reports system checkpoints data to disk.
analytics.reports.disable_incremental_reporting: Disable incremental reports database updates. If set, all existing query and click logs will be processed for each reports update.
analytics.reports.max_facts_per_dimension_combination: Specifies the amount of data that is stored by query reports.
analytics.scheduled_database_update: Control whether reports for the collection are updated on a scheduled basis.
annie.index_opts: Specify options for the "annie-a" annotation indexing program.
build_autoc_options: Specifies additional configuration options that can be supplied to the auto completion builder.
changeover_percent: Specifies the minimum ratio of documents that must be gathered for an update to succeed.
click_data.archive_dirs: The directories that contain archives of click logs to be included in producing indexes.
click_data.num_archived_logs_to_use: The number of archived click logs to use from each archive directory.
click_data.use_click_data_in_index: A boolean value indicating whether or not click information should be included in the index.
click_data.week_limit: Optional restriction of click data to a set number of weeks into the past.
collection: The internal name of a collection.
collection-update.step.stepTechnicalName.run: Determines if an update step should be run or not.
collection_root: Specifies the location of a collection's data folder.
collection_type: Specifies the type of the collection.
crawler: Specifies the name of the crawler binary.
crawler.accept_cookies: This option enables or disables the crawler's use of cookies.
crawler.accept_files: Restricts the file extensions the web crawler should crawl.
crawler.allow_concurrent_in_crawl_form_interaction: Enable/disable concurrent processing of in-crawl form interaction.
crawler.allowed_redirect_pattern: Specify a regex to allow crawler redirections that would otherwise be disallowed by the current include/exclude patterns.
crawler.cache.DNSCache_max_size: Maximum size of the internal DNS cache. Upon reaching this size the cache will drop old elements.
crawler.cache.LRUCache_max_size: Maximum size of the LRUCache. Upon reaching this size the cache will drop old elements.
crawler.cache.URLCache_max_size: Specifies the maximum size of the URLCache.
crawler.check_alias_exists: Check if aliased URLs exist; if not, revert to the original URL.
crawler.checkpoint_to: Specifies the location of crawler checkpoint files.
crawler.classes.Crawler: This option defines the Java class to be used as the main crawling process.
crawler.classes.Frontier: Specifies the Java class used for the frontier (a list of URLs not yet visited).
crawler.classes.Policy: Specifies the Java class used for enforcing the include/exclude policy for URLs.
crawler.classes.RevisitPolicy: Specifies the Java class used for enforcing the revisit policy for URLs.
crawler.classes.URLStore: Specifies the Java class used to store content on disk, e.g. to create a mirror of the files crawled.
crawler.classes.statistics: List of statistics classes to use during a crawl in order to generate figures for data reports.
crawler.cookie_jar_file: Specifies a file containing cookies to be pre-loaded when a web crawl begins.
crawler.eliminate_duplicates: Whether to eliminate duplicate documents while crawling.
crawler.extract_links_from_javascript: Whether to extract links from JavaScript while crawling.
crawler.follow_links_in_comments: Whether to follow links in HTML comments while crawling.
crawler.form_interaction_file: Specifies a path to a file which configures interaction with form-based authentication.
crawler.form_interaction_in_crawl: Specify whether the crawler should submit web form login details during the crawl rather than in a pre-crawl phase.
crawler.frontier_hosts: List of hosts running crawlers when performing a distributed web crawl.
crawler.frontier_num_top_level_dirs: Specifies the number of top level directories to store disk based frontier files in.
crawler.frontier_port: Port on which the DistributedFrontier will listen.
crawler.frontier_use_ip_mapping: Whether to map hosts to frontiers based on IP address.
crawler.header_logging: Option to control whether HTTP headers are written out to a separate log file.
crawler.incremental_logging: Option to control whether a list of new and changed URLs should be written to a log file during incremental crawling.
crawler.inline_filtering_enabled: Option to control whether text extraction from binary files is done "inline" during a web crawl.
crawler.link_extraction_group: The group in the crawler.link_extraction_regular_expression option which should be extracted as the link/URL.
crawler.link_extraction_regular_expression: Specifies the regular expression used to extract links from each document.
crawler.logfile: Specifies the crawler's log path and filename.
crawler.lowercase_iis_urls: Whether to lowercase all URLs from IIS web servers.
crawler.max_dir_depth: Specifies the maximum number of subdirectories a URL may have before it will be ignored.
crawler.max_download_size: Specifies the maximum size of files the crawler will download (in MB).
crawler.max_files_per_area: Specifies a limit on the number of files from a single directory or dynamically generated URLs that will be crawled.
crawler.max_files_per_server: Specifies the maximum number of files that will be crawled per server.
crawler.max_files_stored: Specifies the maximum number of files to download.
crawler.max_individual_frontier_size: Specifies the maximum size of an individual frontier.
crawler.max_link_distance: Specifies the maximum distance a URL can be from a start URL for it to be downloaded.
crawler.max_parse_size: The crawler will not parse documents beyond this many megabytes in size.
crawler.max_timeout_retries: Maximum number of times to retry after a network timeout.
crawler.max_url_length: Specifies the maximum length a URL can be in order for it to be crawled.
crawler.max_url_repeating_elements: A URL with more than this many repeating elements (directories) will be ignored.
crawler.monitor_authentication_cookie_renewal_interval: Specifies the time interval at which to renew crawl authentication cookies.
crawler.monitor_checkpoint_interval: Time interval at which to checkpoint (seconds).
crawler.monitor_delay_type: Type of delay to use during the crawl (dynamic or fixed).
crawler.monitor_halt: Specifies if a crawl should stop running.
crawler.monitor_preferred_servers_list: Specifies an optional list of servers to prefer during crawling.
crawler.monitor_time_interval: Specifies a time interval at which to output monitoring information (seconds).
crawler.monitor_url_reject_list: Optional parameter listing URLs to reject during a running crawl.
crawler.non_html: Which non-HTML file formats to crawl (e.g. pdf, doc, xls, etc.).
crawler.ntlm.domain: NTLM domain to be used for web crawler authentication.
crawler.ntlm.password: NTLM password to be used for web crawler authentication.
crawler.ntlm.username: NTLM username to be used for web crawler authentication.
crawler.num_crawlers: Number of crawler threads which simultaneously crawl different hosts.
crawler.overall_crawl_timeout: Specifies the maximum time the crawler is allowed to run. When exceeded, the crawl will stop and the update will continue.
crawler.overall_crawl_units: Specifies the units for the crawl timeout.
crawler.parser.mimeTypes: Extract links from these comma-separated or regexp: content-types.
crawler.predirects_enabled: Enable crawler predirects.
crawler.protocols: Crawl URLs via these protocols.
crawler.reject_files: Do not crawl files with these extensions.
crawler.remove_parameters: Optional list of parameters to remove from URLs.
crawler.request_delay: Milliseconds between HTTP requests per crawler thread.
crawler.request_header: Optional additional header to be inserted in HTTP(S) requests made by the web crawler.
crawler.request_header_url_prefix: Optional URL prefix to be applied when processing the crawler.request_header parameter.
crawler.request_timeout: Timeout for HTTP page GETs (milliseconds).
crawler.revisit.edit_distance_threshold: Threshold for edit distance between two versions of a page when deciding whether it has changed or not.
crawler.revisit.num_times_revisit_skipped_threshold: Threshold for the number of times a page revisit has been skipped when deciding whether to revisit it.
crawler.revisit.num_times_unchanged_threshold: Threshold for the number of times a page has been unchanged when deciding whether to revisit it.
crawler.robotAgent: Matching is case-insensitive over the length of the name in a robots.txt file.
crawler.secondary_store_root: Location of the secondary (previous) store; used in incremental crawling.
crawler.send-http-basic-credentials-without-challenge: Specifies whether HTTP basic credentials should be sent without the web server sending a challenge.
crawler.server_alias_file: Path to an optional file containing server alias mappings, e.g. www.daff.gov.au=www.affa.gov.au.
crawler.sslClientStore: Specifies a path to an SSL client certificate store.
crawler.sslClientStorePassword: Password for the SSL client certificate store.
crawler.sslTrustEveryone: Trust ALL root certificates and ignore server hostname verification.
crawler.sslTrustStore: Specifies the path to an SSL trusted root store.
crawler.start_urls_file: Path to a file that contains a list of URLs (one per line) that will be used as the starting point for a crawl.
crawler.store_all_types: If true, override accept/reject rules and crawl and store all file types encountered.
crawler.store_empty_content_urls: Specifies if URLs that contain no content after filtering should be stored.
crawler.store_headers: Whether HTTP header information should be written at the top of HTML files.
crawler.use_sitemap_xml: Specifies whether to process sitemap.xml files during a web crawl.
crawler.user_agent: The browser ID that the crawler uses when making HTTP requests.
crawler.verbosity: Verbosity level (0-6) of crawler logs. A higher number results in more messages.
crawler_binaries: Specifies the location of the crawler files.
custom.base_template: The template used when a custom collection was created.
data_report: A switch that can be used to enable or disable the data report stage during a collection update.
data_root: The directory under which the documents to index reside.
db.bundle_storage_enabled: Allows storage of data extracted from a database in a compressed form.
db.custom_action_java_class: Allows a custom Java class to modify data extracted from a database before indexing.
db.full_sql_query: The SQL query to perform on a database to fetch all records for searching.
db.incremental_sql_query: The SQL query to perform to fetch new or changed records from a database.
db.incremental_update_type: Allows the selection of different modes for keeping database collections up to date.
db.jdbc_class: The name of the Java JDBC driver to connect to a database.
db.jdbc_url: The URL specifying database connection parameters such as the server and database name.
db.password: The password for connecting to the database.
db.primary_id_column: The primary id (unique identifier) column for each database record.
db.single_item_sql: An SQL command for extracting an individual record from the database.
db.update_table_name: The name of a table in the database which provides a record of all additions, updates and deletes.
db.use_column_labels: Flag to control whether column labels are used in JDBC calls in the database gatherer.
db.username: The username for connecting to the database.
db.xml_root_element: The top level element for records extracted from the database.
directory.context_factory: Sets the Java class to use for creating directory connections.
directory.domain: Sets the domain to use for authentication in a directory collection.
directory.exclude_rules: Sets the rules for excluding content from a directory collection.
directory.page_size: Sets the number of documents to fetch from the directory in each request.
directory.password: Sets the password to use for authentication in a directory collection.
directory.provider_url: Sets the URL for accessing the directory in a directory collection.
directory.search_base: Sets the base from which content will be gathered in a directory collection.
directory.search_filter: Sets the filter for selecting content to gather in a directory collection.
directory.username: Sets the username to use for authentication in a directory collection.
exclude_patterns: The crawler will ignore a URL if it matches any of these exclude patterns.
facebook.access-token: Specify an optional access token.
facebook.app-id: Specifies the Facebook application ID.
facebook.app-secret: Specifies the Facebook application secret.
facebook.debug: Enable debug mode to preview fetched Facebook records.
facebook.event-fields: Specify a list of Facebook event fields as specified in the Facebook event API documentation.
facebook.page-fields: Specify a list of Facebook page fields as specified in the Facebook page API documentation.
facebook.page-ids: Specifies a list of IDs of the Facebook pages/accounts to crawl.
facebook.post-fields: Specify a list of Facebook post fields as specified in the Facebook post API documentation.
faceted_navigation.black_list: Exclude specific values for facets.
faceted_navigation.black_list.facet: Exclude specific values for a specific facet.
faceted_navigation.date.sort_mode: (deprecated) Specify how to sort date based facets.
faceted_navigation.white_list: Include only a list of specific values for facets.
faceted_navigation.white_list.facet: Include only a list of specific values for a specific facet.
filecopy.cache: Enable/disable using the live view as a cache directory where pre-filtered text content can be copied from.
filecopy.discard_filtering_errors: Whether to index the file names of files that failed to be filtered.
filecopy.domain: Filecopy sources that require a username to access files will use this setting as a domain for the user.
filecopy.exclude_pattern: Filecopy collections will exclude files which match this regular expression.
filecopy.filetypes: The list of filetypes (i.e. file extensions) that will be included by a filecopy collection.
filecopy.include_pattern: If specified, filecopy collections will only include files which match this regular expression.
filecopy.max_files_stored: If set, this limits the number of documents a filecopy collection will gather when updating.
filecopy.num_fetchers: Number of fetcher threads for interacting with the fileshare in a filecopy collection.
filecopy.num_workers: Number of worker threads for filtering and storing files in a filecopy collection.
filecopy.passwd: Filecopy sources that require a password to access files will use this setting as a password.
filecopy.request_delay: Specifies how long to delay between copy requests in milliseconds.
filecopy.security_model: Sets the plugin to use to collect security information on files.
filecopy.source: This is the file system path or URL that describes the source of data files.
filecopy.source_list: If specified, this option is set to a file which contains a list of other files to copy, rather than using filecopy.source.
filecopy.store_class: Specifies the storage class to be used by a filecopy collection (e.g. WARC, Mirror).
filecopy.user: Filecopy sources that require a username to access files will use this setting as a username.
filecopy.walker_class: Main class used by the filecopier to walk a file tree.
filter.classes: Specifies which Java classes should be used for filtering documents.
filter.csv-to-xml.custom-header: Defines a custom header to use for the CSV.
filter.csv-to-xml.format: Sets the CSV format to use when filtering a CSV document.
filter.csv-to-xml.has-header: Controls if the CSV file has a header or not.
filter.csv-to-xml.url-template: The template to use for the URLs of the documents created in the CSVToXML Filter.
filter.document_fixer.timeout_ms: Controls the maximum amount of time the document fixer may spend on a document.
filter.ignore.mimeTypes: Specifies a list of MIME types for the filter to ignore.
filter.jsoup.classes: Specify which Java/Groovy classes will be used for filtering, and operate on JSoup objects rather than byte streams.
filter.jsoup.undesirable_text-source.key_name: Specify sources of undesirable text strings to detect and present within Content Auditor.
filter.text-cleanup.ranges-to-replace: Specify Unicode blocks for replacement during filtering (to avoid 'corrupt' character display).
filter.tika.types: Specifies which file types to filter using the TikaFilterProvider.
flickr.api-key: Flickr API key.
flickr.api-secret: Flickr API secret.
flickr.auth-secret: Flickr authentication secret.
flickr.auth-token: Flickr authentication token.
flickr.debug: Enable debug mode to preview fetched Flickr records.
flickr.groups.private: List of Flickr group IDs to crawl within a "private" view.
flickr.groups.public: List of Flickr group IDs to crawl within a "public" view.
flickr.user-ids: Comma-delimited list of Flickr user account IDs to crawl.
ftp_passwd: Password to use when gathering content from an FTP server.
ftp_user: Username to use when gathering content from an FTP server.
gather: Specifies if gathering is enabled or not.
gather.max_heap_size: Set Java heap size used for gathering documents.
gather.slowdown.days: Days on which gathering should be slowed down.
gather.slowdown.hours.from: Start hour for the slowdown period.
gather.slowdown.hours.to: End hour for the slowdown period.
gather.slowdown.request_delay: Request delay to use during the slowdown period.
gather.slowdown.threads: Number of threads to use during the slowdown period.
groovy.extra_class_path: Specify extra class paths to be used by Groovy when using $GROOVY_COMMAND.
group.customer_id: The customer group under which the collection will appear; useful for multi-tenant systems.
group.project_id: The project group under which the collection will appear in the selection drop-down menu on the main administration page.
gscopes.options: Specify options for the "padre-gs" gscopes program.
gscopes.other_gscope: Specifies the gscope to set when no other gscopes are set.
http_passwd: Password used for accessing password-protected content during a crawl.
http_proxy: The hostname (e.g. proxy.company.com) of the HTTP proxy to use during crawling.
http_proxy_passwd: The proxy password to be used during crawling.
http_proxy_port: Port of the HTTP proxy used during crawling.
http_proxy_user: The proxy user name to be used during crawling.
http_source_host: IP address or hostname used by the crawler, on a machine with more than one available.
http_user: Username used for accessing password-protected content during a crawl.
include_patterns: Specifies the pattern that URLs must match in order to be crawled.
index: A switch that can be used to enable or disable the indexing stage during a collection update.
index.target: For datasources, indicate which index the data is sent to.
indexer: The name of the indexer program to be used for this collection.
indexer_options: Indexer command line options, each separated by whitespace; individual options therefore cannot contain embedded whitespace characters.
indexing.additional-metamap-source.key_name: Declares additional sources of metadata mappings to be used when indexing HTML documents.
indexing.collapse_fields: Define which fields to consider for result collapsing.
indexing.use_manifest: Specifies if a manifest file should be used for indexing.
java_libraries: The path where the Java libraries are located when running most gatherers.
java_options: Command line options to pass to the Java virtual machine.
knowledge-graph.max_heap_size: Set Java heap size used for the Knowledge Graph update process.
logging.hostname_in_filename: Control whether hostnames are used in log filenames.
logging.ignored_x_forwarded_for_ranges: Defines all IP ranges in the X-Forwarded-For header to be ignored by Funnelback when choosing the IP address to log.
mail.on_failure_only: Specifies whether to always send collection update emails or only when an update fails.
matrix_password: Password for logging into Matrix and the Squiz Suite Manager.
matrix_username: Username for logging into Matrix and the Squiz Suite Manager.
mcf.authority-url: URL for contacting a ManifoldCF authority.
mcf.domain: Default domain for users in the ManifoldCF authority.
noindex_expression: Optional regular expression to specify content that should not be indexed.
post_archive_command: Command to run after archiving query and click logs.
post_delete-list_command: Command to run after deleting documents during an instant delete update.
post_delete-prefix_command: Command to run after deleting documents during an instant delete update.
post_gather_command: Command to run after the gathering phase during a collection update.
post_index_command: Command to run after the index phase during a collection update.
post_instant-gather_command: Command to run after the gather phase during an instant update.
post_instant-index_command: Command to run after the index phase during an instant update.
post_meta_dependencies_command: Command to run after a component collection updates its meta parents during a collection update.
post_recommender_command: Command to run after the recommender phase during a collection update.
post_reporting_command: Command to run after query analytics runs.
post_swap_command: Command to run after live and offline views are swapped during a collection update.
post_update_command: Command to run after an update has successfully completed.
pre_archive_command: Command to run before archiving query and click logs.
pre_delete-list_command: Command to run before deleting documents during an instant delete update.
pre_delete-prefix_command: Command to run before deleting documents during an instant delete update.
pre_gather_command: Command to run before the gathering phase during a collection update.
pre_index_command: Command to run before the index phase during a collection update.
pre_instant-gather_command: Command to run before the gather phase during an instant update.
pre_instant-index_command: Command to run before the index phase during an instant update.
pre_meta_dependencies_command: Command to run before a component collection updates its meta parents during a collection update.
pre_recommender_command: Command to run before the recommender phase during a collection update.
pre_report_command: Command to run before query or click logs are to be used during an update.
pre_reporting_command: Command to run before query analytics runs.
pre_swap_command: Command to run before live and offline views are swapped during a collection update.
progress_report_interval: Interval (in seconds) at which the gatherer will update the progress message for the Admin UI.
push.auto-start: Specifies whether the Push collection will start with the web server.
push.commit-type: The type of commit that Push should use by default.
push.commit.index.parallel.max-index-thread-count: The maximum number of threads that can be used during a commit for indexing.
push.commit.index.parallel.min-documents-for-parallel-indexing: The minimum number of documents required in a single commit for parallel indexing to be used during that commit.
push.commit.index.parallel.min-documents-per-thread: The minimum number of documents each thread must have when using parallel indexing in a commit.
push.init-mode: The initial mode in which Push should start.
push.max-generations: The maximum number of generations Push can use.
push.merge.index.parallel.max-index-thread-count: The maximum number of threads that can be used during a merge for indexing.
push.merge.index.parallel.min-documents-for-parallel-indexing: The minimum number of documents required in a single merge for parallel indexing to be used during that merge.
push.merge.index.parallel.min-documents-per-thread: The minimum number of documents each thread must have when using parallel indexing in a merge.
push.replication.compression-algorithm: The compression algorithm to use when transferring compressible files to Push slaves.
push.replication.ignore.data: When set, query processors will ignore the 'data' section in snapshots, which is used for serving cached copies.
push.replication.ignore.delete-lists: When set, query processors will ignore the delete lists.
push.replication.ignore.index-redirects: When set, query processors will ignore the index redirects file in snapshots.
push.replication.master.host-name: The hostname of the master for a query processor Push collection.
push.replication.master.push-api.port: The master's push-api port for a query processor Push collection.
push.run: Controls if a Push collection is allowed to run or not.
push.scheduler.auto-click-logs-processing-timeout-seconds: Number of seconds before a Push collection will automatically trigger processing of click logs.
push.scheduler.auto-commit-timeout-seconds: Number of seconds a Push collection should wait before a commit is automatically triggered.
push.scheduler.changes-before-auto-commit: Number of changes to a Push collection before a commit is automatically triggered.
push.scheduler.delay-between-content-auditor-runs: Minimum time in milliseconds between each execution of the Content Auditor summary generation task.
push.scheduler.delay-between-meta-dependencies-runs: Minimum time in milliseconds between each execution of updating the Push collection's meta parents.
push.scheduler.generation.re-index.killed-percent: The percentage of killed documents in a single generation for it to be considered for re-indexing.
push.scheduler.generation.re-index.min-doc-count: The minimum number of documents in a single generation for it to be considered for re-indexing.
push.scheduler.killed-percentage-for-reindex: Percentage of killed documents before Push re-indexes.
push.store.always-flush: Used to stop a Push collection from performing caching on PUT or DELETE calls.
query_processor: The name of the query processor executable to use.
query_processor_options: Query processor command line options.
recommender: Specifies if the recommendations system is enabled.
retry_policy.max_tries: Maximum number of times to retry an operation that has failed.
rss.copyright: Sets the copyright element in the RSS feed.
rss.ttl: Sets the ttl element in the RSS feed.
schedule.incremental_crawl_ratio: The number of scheduled incremental crawls that are performed between each full crawl.
search_user: The email address to use for administrative purposes.
security.earlybinding.locks-keys-matcher.ldlibrarypath: Full path to the security plugin library.
security.earlybinding.locks-keys-matcher.name: Name of the security plugin library that matches user keys with document locks at query time.
security.earlybinding.user-to-key-mapper: Selected security plugin for translating usernames into lists of document keys.
security.earlybinding.user-to-key-mapper.cache-seconds: Number of seconds for which a user's list of keys may be cached.
security.earlybinding.user-to-key-mapper.groovy-class: Name of a custom Groovy class to use to translate usernames into lists of document keys.
service.thumbnail.max-age: Specify how long thumbnails may be cached for.
service_name: Name of the collection to display to users.
slack.channel-names-to-exclude: List of Slack channel names to exclude from search.
slack.hostname: The hostname of the Slack instance.
slack.target-collection: Specify the Push collection into which messages from a Slack collection should be stored.
slack.target-push-api: The Push API endpoint to which Slack messages should be added.
slack.user-names-to-exclude: Slack user names to exclude from search.
spelling.suggestion_lexicon_weight: Specify weighting to be given to suggestions from the lexicon relative to other sources.
spelling.suggestion_sources: Specify sources of information for generating spelling suggestions.
spelling.suggestion_threshold: Threshold which controls how suggestions are made.
spelling_enabled: Whether to enable spell checking in the search interface.
squizapi.target_url: URL of the Squiz Suite Manager for a Matrix collection.
start_url: A list of URLs from which the crawler will start crawling.
store.push.collection: Name of a Push collection to push content into when using a PushStore or Push2Store.
store.push.host: Hostname of the machine to push documents to if using a PushStore or Push2Store.
store.push.password: The password to use when authenticating against Push if using a PushStore or Push2Store.
store.push.port: Port that Push is configured to listen on (if using a PushStore).
store.push.url: The URL at which the Push API is located (if using a Push2Store).
store.push.user: The user name to use when authenticating against Push if using a PushStore or Push2Store.
store.raw-bytes.class: Fully qualified classname of a raw bytes Store class to use.
store.record.type: This parameter defines the type of store that Funnelback uses to store its records.
store.temp.class: Fully qualified classname of a class to use for temporary storage.
store.xml.class: Fully qualified classname of an XML storage class to use.
trim.collect_containers: Whether to collect the container of each TRIM record or not.
trim.database: The 2-digit identifier of the TRIM database to index.
trim.default_live_links: Whether search result links should point to a copy of the TRIM document or launch the TRIM client.
trim.domain: Windows domain for the TRIMPush crawl user.
trim.extracted_file_types: A list of file extensions that will be extracted from TRIM databases.
trim.filter_timeout: Timeout to apply when filtering binary documents.
trim.free_space_check_exclude: Volume letters to exclude from the free disk space check.
trim.free_space_threshold: Minimum amount of free disk space below which a TRIMPush crawl will stop.
trim.gather_direction: Whether to go forward or backward when gathering TRIM records.
trim.gather_end_date: The date at which to stop the gather process.
trim.gather_mode: Date field to use when selecting records (registered date or modified date).
trim.gather_start_date: The date from which newly registered or modified documents will be gathered.
trim.license_number: TRIM license number as found in the TRIM client system information panel.
trim.max_filter_errors: The maximum number of filtering errors to tolerate before stopping the crawl.
trim.max_size: The maximum size of record attachments to process.
trim.max_store_errors: The maximum number of storage errors to tolerate before stopping the crawl.
trim.passwd: Password for the TRIMPush crawl user.
trim.properties_blacklist: List of properties to ignore when extracting TRIM records.
trim.push.collection: Specifies the Push collection to store the extracted TRIM records in.
trim.request_delay: Milliseconds between TRIM requests (for a particular thread).
trim.stats_dump_interval: Interval (in seconds) at which statistics will be written to the monitor.log file.
trim.store_class: Class to use to store TRIM records.
trim.threads: Number of simultaneous TRIM database connections to use.
trim.timespan: Interval to split the gather date range into.
trim.timespan.unit: Number of time spans to split the gather date range into.
trim.user: Username for the TRIMPush crawl user.
trim.userfields_blacklist: List of user fields to ignore when extracting TRIM records.
trim.verbose: Defines how verbose the TRIM crawl is.
trim.version: Configure the version of TRIM to be crawled.
trim.web_server_work_path: Location of the temporary folder used by TRIM to extract binary files.
trim.workgroup_port: The port on the TRIM workgroup server to connect to when gathering content from TRIM.
trim.workgroup_server: The name of the TRIM workgroup server to connect to when gathering content from TRIM.
twitter.debug: Enable debug mode to preview fetched Twitter records.
twitter.oauth.access-token: Twitter OAuth access token.
twitter.oauth.consumer-key: Twitter OAuth consumer key.
twitter.oauth.consumer-secret: Twitter OAuth consumer secret.
twitter.oauth.token-secret: Twitter OAuth token secret.
twitter.users: Comma-delimited list of Twitter user names to crawl.
ui.integration_url: URL to use to reach the search service when wrapped inside another system (e.g. a CMS).
ui.modern.content-auditor.count_urls: Define how deep into URLs Content Auditor users can navigate using facets.
ui.modern.content-auditor.date-modified.ok-age-years: Define how many years old a document may be before it is considered problematic.
ui.modern.content-auditor.duplicate_num_ranks: Define how many results should be considered in detecting duplicates for Content Auditor.
ui.modern.content-auditor.reading-grade.lower-ok-limit: Define the reading grade below which documents are considered problematic.
ui.modern.content-auditor.reading-grade.upper-ok-limit: Define the reading grade above which documents are considered problematic.
ui.modern.curator.custom_field: Configure custom fields for Curator messages.
ui.modern.extra_searches: Configure extra searches to be aggregated with the main result data when using the Modern UI.
ui.modern.form.rss.content_type: Sets the content type of the RSS template.
ui.modern.padre_response_size_limit_bytes: Sets the maximum size of padre-sw responses to process.
ui_cache_disabled: Prevents the cache controller from accessing any cached documents.
ui_cache_link: Base URL used by PADRE to link to the cached copy of a search result. Can be an absolute URL.
update-pipeline-groovy-pre-post-commands.max_heap_size: Set Java heap size used for Groovy scripts in pre/post update commands.
update-pipeline.max_heap_size: Set Java heap size used for update pipelines.
update.restrict_to_host: Specify that collection updates should be restricted to only run on a specific host.
userid_to_log: Controls how logging of IP addresses is performed.
vital_servers: Changeover only happens if vital servers exist in the new crawl.
warc.compression: Control how content is compressed in a WARC file.
workflow.publish_hook: Name of the publish hook Perl script.
workflow.publish_hook.meta: Name of the publish hook Perl script that will be called each time a meta collection is modified.
youtube.api-key: YouTube API key retrieved from the Google API console.
youtube.channel-ids: YouTube channel IDs to crawl.
youtube.debug: Enable debug mode to preview fetched YouTube records.
youtube.liked-videos: Enables fetching of YouTube videos liked by a channel ID.
youtube.playlist-ids: YouTube playlist IDs to crawl.
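
As an illustrative sketch only, a simple web collection might combine several of the options above in its collection.cfg. The collection name, URLs and values below are examples rather than defaults or recommendations, and the exact value syntax for each option (for example, how list values are separated) is described on that option's own documentation page:

    collection=example-web
    collection_type=web
    service_name=Example Website Search
    start_url=https://www.example.com/
    include_patterns=example.com
    exclude_patterns=/calendar/,/login
    crawler.num_crawlers=10
    crawler.request_delay=250
    crawler.max_download_size=10
    changeover_percent=50
    admin_email=search-admin@example.com
    mail.on_failure_only=true
    vital_servers=www.example.com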

