crawler.accept_files
Restricts which file extensions the web crawler will download.
Key: crawler.accept_files
Type: List<String>
Can be set in: collection.cfg
Description
A comma-separated list of file extensions (suffixes) that the crawler will download. It is normally left empty, so that the crawler accepts all valid content regardless of suffix.
Default Value
(Empty) - This means there are no restrictions on what files will be downloaded.
Examples
crawler.accept_files=htm,html,asp,php,txt,stm,jsp,xml,cfm,pdf
In this example, only files whose suffix appears in the list will be downloaded.
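To make the effect of the setting concrete, the following is a minimal Python sketch of suffix-based filtering. It is not the crawler's actual implementation. The function names parse_accept_files and is_accepted are hypothetical, and the matching rules shown (case-insensitive comparison, query strings ignored, URLs without a suffix rejected when the list is non-empty) are assumptions that the real crawler may not share.

```python
import posixpath
from urllib.parse import urlparse


def parse_accept_files(value: str) -> set[str]:
    """Split a comma-separated extension list into a set of lowercase suffixes.

    Hypothetical helper: mirrors the format of the crawler.accept_files value.
    """
    return {
        ext.strip().lower().lstrip(".")
        for ext in value.split(",")
        if ext.strip()
    }


def is_accepted(url: str, accepted: set[str]) -> bool:
    """Return True if the URL's suffix is allowed by the accepted set."""
    # An empty list means no restriction, matching the documented default.
    if not accepted:
        return True
    # Assumption: only the URL path is inspected, so query strings are ignored.
    path = urlparse(url).path
    _, ext = posixpath.splitext(path)
    # Assumption: comparison is case-insensitive.
    return ext.lstrip(".").lower() in accepted


accepted = parse_accept_files("htm,html,asp,php,txt,stm,jsp,xml,cfm,pdf")
print(is_accepted("https://example.com/docs/report.pdf", accepted))
print(is_accepted("https://example.com/img/logo.png", accepted))
```

Under these assumptions, the first URL would pass the filter and the second would not, because png is not in the configured list.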