crawler.max_parse_size

The crawler will not parse documents beyond this many megabytes in size.

Key: crawler.max_parse_size
Type: Integer
Can be set in: collection.cfg

Description

The crawler will not parse documents beyond the specified size (in megabytes); the content of larger documents is truncated. This limit applies only to the MIME types listed in the crawler.parser.mimeTypes parameter (e.g. HTML, text, XML). Here, parsing refers to link extraction from these file types.
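
The effect of the limit on link extraction can be illustrated with a small conceptual sketch. This is not Funnelback's implementation: the function names, the regular expression used to find links, and the assumption that one megabyte equals 1024 * 1024 bytes are all hypothetical and chosen only for illustration.

```python
import re
from typing import List

# Assumption: one megabyte is 1024 * 1024 bytes. The crawler's own
# definition of a megabyte is not stated in this page.
BYTES_PER_MB = 1024 * 1024


def truncate_for_parsing(content: bytes, max_parse_size_mb: int) -> bytes:
    """Return only the part of the document that falls within the limit."""
    return content[: max_parse_size_mb * BYTES_PER_MB]


def extract_links(content: bytes, max_parse_size_mb: int = 10) -> List[str]:
    """Hypothetical link extractor that only sees the truncated content."""
    parsed = truncate_for_parsing(content, max_parse_size_mb)
    # Simplified href matcher; a real crawler would use an HTML parser.
    return [m.decode("utf-8", "replace") for m in re.findall(rb'href="([^"]*)"', parsed)]


# A document with one link near the start and one link past the 1 MB mark.
document = b'<a href="/early">' + b" " * BYTES_PER_MB + b'<a href="/late">'

links_small_limit = extract_links(document, max_parse_size_mb=1)
links_large_limit = extract_links(document, max_parse_size_mb=2)
```

Under these assumptions, the link that sits beyond the limit is found only when the limit is raised, which is the practical reason to increase crawler.max_parse_size for very large pages.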

Default Value

crawler.max_parse_size=10

Examples

Increase the limit to fifteen megabytes.

crawler.max_parse_size=15

v15.22.0