crawler.parser.mimeTypes

Extract links from documents whose content types match this comma-separated list or regexp.

Key: crawler.parser.mimeTypes
Type: List<String>
Can be set in: collection.cfg

Description

This is a comma-separated list of MIME types. The web crawler will attempt to parse every downloaded document whose MIME type appears in this list in order to extract URLs for further crawling.

Note: You should not specify binary MIME types (for example, application/pdf) in this parameter. The application/* types in the default value are all text-based formats.
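The following is a minimal Python sketch of the kind of check described above: split the comma-separated value into a list and test whether a response's Content-Type is in it. The function names are hypothetical. The matching rules are assumptions rather than documented crawler behaviour: comparison is case-insensitive, and parameters such as "; charset=UTF-8" are ignored. The regexp form mentioned in the summary is not modelled.

```python
def parse_mime_types(value: str) -> list[str]:
    """Split a comma-separated crawler.parser.mimeTypes value into normalized entries.

    Hypothetical helper: entries are trimmed and lower-cased, empty entries dropped.
    """
    return [item.strip().lower() for item in value.split(",") if item.strip()]


def should_parse(content_type: str, parser_types: list[str]) -> bool:
    """Return True if the media type in a Content-Type header is in the list.

    Assumption: parameters such as '; charset=UTF-8' are dropped before comparing.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in parser_types


# The documented default value, split across two string literals for readability.
DEFAULT = ("text/html,text/plain,text/xml,application/xhtml+xml,application/rss+xml,"
           "application/atom+xml,application/json,application/rdf+xml,application/xml")

types = parse_mime_types(DEFAULT)
print(should_parse("text/html; charset=UTF-8", types))  # True
print(should_parse("application/pdf", types))           # False
```

With the default list, an HTML page is parsed for links while a PDF is not, which matches the note about leaving binary types out of this parameter.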

Default Value

crawler.parser.mimeTypes=text/html,text/plain,text/xml,application/xhtml+xml,application/rss+xml,application/atom+xml,application/json,application/rdf+xml,application/xml
