crawler.revisit.edit_distance_threshold (collection.cfg setting)

Description

This parameter specifies the threshold used when deciding whether the content of a URL has changed compared to its previous version. The edit distance between the two versions is the number of operations (add, edit, delete) required to transform one string into the other.

If the edit distance is less than this threshold, the page is marked as "unchanged" and this information is fed into the crawler's revisit policy. Pages that rarely change may be revisited less often, and a copy of their content may be used instead.
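
The comparison described above can be illustrated with a minimal Python sketch. It assumes the edit distance is a character-level Levenshtein distance; this page does not say how the crawler actually computes it, so treat that as an assumption. The names edit_distance and is_unchanged are hypothetical and are not part of Funnelback.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    additions, edits (substitutions) and deletions turning a into b.
    Assumption: the crawler's own metric may differ from this one."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # add char_b
                previous[j - 1] + cost,  # edit char_a into char_b (or keep it)
            ))
        previous = current
    return previous[-1]


def is_unchanged(old_content: str, new_content: str, threshold: int = 20) -> bool:
    """Hypothetical helper: a page counts as unchanged when the edit
    distance is strictly less than the threshold, as described above."""
    return edit_distance(old_content, new_content) < threshold
```

With the default threshold of 20, two versions that differ by 19 operations would count as unchanged in this sketch, while versions that differ by exactly 20 would count as changed, because the comparison is "less than".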

Default value

crawler.revisit.edit_distance_threshold=20

v15.18.0