crawler.max_dir_depth
Specifies the maximum number of subdirectories a URL may contain before it is ignored.
Key: crawler.max_dir_depth
Type: Integer
Can be set in: collection.cfg
Description
This option limits the number of directories permitted in a valid URL; the crawler ignores any URL containing more than this number of directories. A URL with a very deep directory path is often the result of a crawler trap, so this limit should not be set too high.
Note: this limit is not checked for dynamic URLs, i.e. those containing a '?'.
Default Value
crawler.max_dir_depth=15
Examples
crawler.max_dir_depth=2
Will have the following effect:
http://host/one/two/ok            (2 directories: crawled)
http://host/one/two/three/fails   (3 directories: ignored)
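The depth check above can be sketched as follows. This is an illustrative Python sketch, not Funnelback's actual implementation; in particular, the assumption that the final path component (the resource name) does not count as a directory is inferred from the example above.

```python
from urllib.parse import urlparse

def exceeds_max_dir_depth(url: str, max_dir_depth: int = 15) -> bool:
    """Return True if a static URL has more directories than max_dir_depth.

    Illustrative sketch only (not the crawler's real code). Mirrors the
    documented behaviour: dynamic URLs (containing '?') are never checked,
    and the trailing resource name is not counted as a directory.
    """
    parsed = urlparse(url)
    # The limit is not applied to dynamic URLs.
    if parsed.query:
        return False
    # Split the path into non-empty segments; the last segment is the
    # resource name, so the directory depth is one less.
    segments = [s for s in parsed.path.split("/") if s]
    depth = max(len(segments) - 1, 0)
    return depth > max_dir_depth
```

With crawler.max_dir_depth=2, this check accepts http://host/one/two/ok (depth 2) and rejects http://host/one/two/three/fails (depth 3), matching the example above.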