crawler.accept_files

Restricts which file extensions the web crawler will download.

Key: crawler.accept_files
Type: List<String>
Can be set in: collection.cfg

Description

A comma-separated list of file extensions that the crawler will download. It is normally left empty, so that the crawler accepts all valid content regardless of suffix.

Default Value

(Empty) - This means there are no restrictions on which files will be downloaded.

Examples

crawler.accept_files=htm,html,asp,php,txt,stm,jsp,xml,cfm,pdf

This example lists specific file types by suffix. Only files with one of these extensions will be downloaded.
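
To make the suffix-based behaviour concrete, the following Python sketch shows one way such a filter could work. It is an illustration only, not the crawler's actual implementation: the function names parse_accept_files and is_accepted are hypothetical, and the case-insensitive matching, the handling of query strings, and the rejection of URLs without a suffix are assumptions made for the example.

```python
import posixpath
from urllib.parse import urlparse


def parse_accept_files(value: str) -> set[str]:
    """Split a comma-separated extension list into a set of lowercase suffixes."""
    return {
        ext.strip().lower().lstrip(".")
        for ext in value.split(",")
        if ext.strip()
    }


def is_accepted(url: str, accepted: set[str]) -> bool:
    """Return True if the URL's path suffix is in the accepted set.

    Assumptions for this sketch: an empty set means "no restriction",
    matching ignores case, the query string is ignored, and a URL with
    no suffix is rejected when a list is configured.
    """
    if not accepted:
        return True
    path = urlparse(url).path
    suffix = posixpath.splitext(path)[1].lstrip(".").lower()
    return suffix in accepted


accepted = parse_accept_files("htm,html,asp,php,txt,stm,jsp,xml,cfm,pdf")
print(is_accepted("https://example.com/docs/report.pdf", accepted))  # True
print(is_accepted("https://example.com/images/logo.png", accepted))  # False
print(is_accepted("https://example.com/images/logo.png", set()))     # True
```

The last call mirrors the default (empty) setting, where no suffix restriction applies.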
