
crawler.parser.mimeTypes (collection.cfg setting)

Description

A comma-separated list of MIME types. The web crawler attempts to parse every downloaded document whose MIME type appears in this list, in order to extract URLs for further crawling.

Note: Do not specify MIME types for binary content in this parameter. The application/* types in the default value are all text-based (XML or JSON) formats.

Default value

crawler.parser.mimeTypes=text/html,text/plain,text/xml,application/xhtml+xml,application/rss+xml,application/atom+xml,application/json,application/rdf+xml,application/xml
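
To illustrate how a comma-separated list like this one can be checked against a document's Content-Type header, here is a minimal Python sketch. It is not Funnelback's implementation: the function names `parse_mime_types` and `should_parse` are hypothetical, and the case-insensitive comparison and the stripping of parameters such as `charset` are assumptions about reasonable matching behaviour.

```python
# Illustrative sketch only; not Funnelback's actual crawler code.
# The value below is copied from the documented default.
DEFAULT_MIME_TYPES = (
    "text/html,text/plain,text/xml,application/xhtml+xml,"
    "application/rss+xml,application/atom+xml,application/json,"
    "application/rdf+xml,application/xml"
)


def parse_mime_types(setting_value):
    """Split a comma-separated setting value into a set of lowercase MIME types."""
    # Hypothetical helper: ignores empty entries and surrounding whitespace.
    return {part.strip().lower() for part in setting_value.split(",") if part.strip()}


def should_parse(content_type_header, allowed_types):
    """Return True if the header's media type is in allowed_types."""
    # Assumption: parameters such as "; charset=utf-8" are dropped before comparing.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in allowed_types


allowed = parse_mime_types(DEFAULT_MIME_TYPES)
print(should_parse("text/html; charset=UTF-8", allowed))
print(should_parse("application/pdf", allowed))
```

Under these assumptions, an HTML response would be parsed for links, while a binary type such as application/pdf would not match the default list.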

v15.16.0