crawler.max_parse_size
The crawler will not parse documents larger than this size, in megabytes.
Key: crawler.max_parse_size
Type: Integer
Can be set in: collection.cfg
Description
The crawler stops parsing documents larger than the specified value (in megabytes) and truncates their content. The limit applies only to MIME types listed in the crawler.parser.mimeTypes parameter (e.g. HTML, text, XML). In this context, parsing means extracting links from these file types.
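The effect of the limit on link extraction can be pictured with a short sketch. This is an illustration only, not the crawler's actual implementation: the function name extract_links, the PARSER_MIME_TYPES tuple (a stand-in for crawler.parser.mimeTypes), the simple href pattern, and the assumption that one megabyte is 1024 * 1024 bytes are all hypothetical.

```python
import re

# Assumption: one megabyte is 1024 * 1024 bytes; the documentation does not say.
BYTES_PER_MB = 1024 * 1024

# Hypothetical stand-in for the crawler.parser.mimeTypes setting.
PARSER_MIME_TYPES = ("text/html", "text/plain", "text/xml")

# Deliberately simple pattern; a real parser would be more thorough.
HREF_PATTERN = re.compile(rb'href="([^"]+)"')


def extract_links(body: bytes, mime_type: str, max_parse_size_mb: int = 10) -> list[str]:
    """Return links found in the part of body that falls within the size limit."""
    if mime_type not in PARSER_MIME_TYPES:
        # The limit only concerns parsed types; other types are not parsed here.
        return []
    limit = max_parse_size_mb * BYTES_PER_MB
    parsed_part = body[:limit]  # content beyond the limit is ignored
    return [m.decode("utf-8", "replace") for m in HREF_PATTERN.findall(parsed_part)]


if __name__ == "__main__":
    # One link near the start, one link after more than 1 MB of padding.
    body = b'<a href="/early">' + b" " * BYTES_PER_MB + b'<a href="/late">'
    print(extract_links(body, "text/html", max_parse_size_mb=1))
    print(extract_links(body, "text/html", max_parse_size_mb=2))
```

In this sketch, links that appear after the cut-off are never seen, so raising the limit is what makes them discoverable in very large documents.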
Default Value
crawler.max_parse_size=10
Examples
Increase the limit to fifteen megabytes.
crawler.max_parse_size=15