crawler.allowed_redirect_pattern

Specify a regex to allow crawler redirections that would otherwise be disallowed by the current include/exclude patterns.

Key: crawler.allowed_redirect_pattern
Type: String
Can be set in: collection.cfg

Description

When the crawler is redirected to a URL, it checks that URL against the include/exclude patterns to determine whether it should continue processing it. If the URL doesn't match the include/exclude rules, this usually means the crawler has wandered offsite and shouldn't proceed any further.

However, some websites use external authentication portals. This setting allows the crawler to continue processing a URL that matches the given pattern even though it has been redirected offsite. The contents of the offsite pages won't be stored, but the crawler will still be allowed to proceed, e.g. for the purposes of authentication / form interaction.

Note: This check is case-sensitive.
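The decision described above can be sketched roughly as follows. This is an illustration only, not the crawler's actual implementation: the function name redirect_action, the three return values, and the use of plain substring matching for the include/exclude patterns are all assumptions made to keep the example short.

```python
import re


def redirect_action(url, include, exclude, allowed_redirect_pattern=""):
    """Classify a redirect target as 'store', 'follow_only' or 'reject'.

    Hypothetical helper: include/exclude are treated as plain substrings,
    which is a simplifying assumption for this sketch.
    """
    included = any(p in url for p in include)
    excluded = any(p in url for p in exclude)

    # The URL passes the normal include/exclude rules: crawl it as usual.
    if included and not excluded:
        return "store"

    # Offsite, but matches the allowed redirect regex: keep going
    # (e.g. to complete a login) without storing the page content.
    # re.search is case-sensitive unless a flag says otherwise.
    if allowed_redirect_pattern and re.search(allowed_redirect_pattern, url):
        return "follow_only"

    # Offsite and not explicitly allowed: stop here.
    return "reject"
```

With an empty pattern (the default), an offsite redirect is always rejected in this sketch. Because the match is case-sensitive, a pattern of gatekeeper.com would not match a URL containing GATEKEEPER.com.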

Default Value

crawler.allowed_redirect_pattern=

Examples

The following will allow the crawler to be redirected to any URL containing gatekeeper.com (without scraping additional links from the redirected site).

crawler.allowed_redirect_pattern=gatekeeper.com
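Because the value is a regex, an unescaped dot matches any single character, so the pattern above is slightly broader than the literal host name. The sketch below uses Python's re module purely to illustrate that general regex behaviour; it is assumed, not confirmed, that the crawler's own regex engine treats this simple pattern the same way. The variable names loose and strict are hypothetical.

```python
import re

# Pattern as written in the example: "." matches any single character.
loose = re.compile(r"gatekeeper.com")

# Escaped variant: "\." matches only a literal dot.
strict = re.compile(r"gatekeeper\.com")

candidates = [
    "https://gatekeeper.com/login",
    "https://gatekeeperxcom.example.net/",
    "https://GateKeeper.com/login",
]

for url in candidates:
    # search() looks for the pattern anywhere in the URL ("containing").
    print(url, bool(loose.search(url)), bool(strict.search(url)))
```

If only the literal host should be allowed, escaping the dot (gatekeeper\.com) is the more conservative choice.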

v15.24.0