
crawler.remove_parameters (collection.cfg setting)

Description

This setting specifies a regular expression used to remove portions of a URL, for example to strip a session ID or stylesheet parameter from every URL. If a URL matches the regular expression, the matching portion is removed before the URL is downloaded.
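
The following is a minimal Python sketch of this behavior, not Funnelback's implementation. The function name `strip_url` is hypothetical. It assumes that the value carries the `regexp:` prefix used in the examples, and that every non-overlapping match is removed rather than only the first.

```python
import re

def strip_url(url: str, setting: str) -> str:
    """Hypothetical illustration: remove every portion of `url` that matches
    the regular expression in a crawler.remove_parameters-style value."""
    # An empty value (the default) leaves the URL unchanged.
    if not setting:
        return url
    # Assumption: the expression is written with a "regexp:" prefix.
    prefix = "regexp:"
    pattern = setting[len(prefix):] if setting.startswith(prefix) else setting
    # Assumption: all non-overlapping matches are stripped, not just the first.
    return re.sub(pattern, "", url)

print(strip_url("http://example.com/page?id=7&sessionid=ABC123",
                "regexp:&sessionid=[^&]*"))
# prints http://example.com/page?id=7
```

The `sessionid` parameter name above is made up for illustration.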

Default value

crawler.remove_parameters=

Examples

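To remove style=mediaRelease, stylesheet=mediaRelease and any numeric x parameter (such as x=123) from URLs:

crawler.remove_parameters=regexp:&style(sheet)?=mediaRelease|&x=\d+

Because each alternative begins with &, these parameters are only removed when they are not the first parameter in the query string (that is, when they do not directly follow ?).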
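
To check what this pattern removes, the sketch below applies it with Python's `re` module, assuming the pattern (without its `regexp:` prefix) behaves the same way in Python's regex syntax and that all matches are removed. The sample URLs and the function name `strip_style_and_x` are made up for illustration.

```python
import re

# Example pattern from above, with the "regexp:" prefix dropped.
STYLE_AND_X = re.compile(r"&style(sheet)?=mediaRelease|&x=\d+")

def strip_style_and_x(url: str) -> str:
    # Remove every matching portion of the URL.
    return STYLE_AND_X.sub("", url)

for url in [
    "http://example.com/news?id=5&style=mediaRelease",
    "http://example.com/news?id=5&stylesheet=mediaRelease&x=123",
    "http://example.com/news?x=123&id=5",
]:
    print(strip_style_and_x(url))
```

The first two URLs both reduce to http://example.com/news?id=5. The third is printed unchanged, because its x=123 directly follows ? rather than &.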

To remove all parameters whose names start with utm_ (such as the utm_source and utm_campaign tracking parameters) from URLs:

crawler.remove_parameters=regexp:utm_[^=&]+=[^&]+&?
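
The sketch below runs this pattern through Python's `re` module to show its effect, assuming Python's regex syntax treats it the same way and that all matches are removed. The function name `strip_utm` and the sample URLs are hypothetical.

```python
import re

# Example pattern from above, with the "regexp:" prefix dropped.
UTM = re.compile(r"utm_[^=&]+=[^&]+&?")

def strip_utm(url: str) -> str:
    # Remove each utm_ name=value pair, plus a following & if there is one.
    return UTM.sub("", url)

print(strip_utm("http://example.com/?utm_source=news&utm_medium=email&id=5"))
print(strip_utm("http://example.com/?id=5&utm_campaign=spring"))
```

The first call yields http://example.com/?id=5. The second yields http://example.com/?id=5& because, when a utm_ parameter is last in the query string, the optional &? matches nothing and the & before the parameter is left behind.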



v15.16.0