crawler.parser.mimeTypes
Extract links from documents with these content types, given as a comma-separated list or a regexp: expression.
Key: crawler.parser.mimeTypes
Type: List<String>
Can be set in: collection.cfg
Description
This is a comma-separated list of MIME types. The web crawler will attempt to parse every downloaded document whose MIME type appears in this list, in order to extract URLs for further crawling.
Note: You should not specify binary (application) MIME types in this parameter.
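As an illustration of overriding the list, a collection that should only follow links found in HTML and XHTML documents might use a line like the one below in collection.cfg. The chosen subset is an assumption made for this example, not a recommended value. Both types are taken from the default list.

```
# Illustrative override: parse only HTML and XHTML documents for links
crawler.parser.mimeTypes=text/html,application/xhtml+xml
```

With a setting like this, documents of other types (for example text/plain or application/json) would no longer be parsed for links.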
Default Value
crawler.parser.mimeTypes=text/html,text/plain,text/xml,application/xhtml+xml,application/rss+xml,application/atom+xml,application/json,application/rdf+xml,application/xml