crawler.user_agent
The user agent string that the crawler sends when making HTTP requests.
Key: crawler.user_agent
Type: String
Can be set in: collection.cfg
Description
This parameter specifies the user agent string used by the web crawler when making HTTP(S) requests.
Default Value
crawler.user_agent=Mozilla/5.0 (compatible; Funnelback)
This default browser-based user-agent is used to maximise the chances that we will get content from websites which return different content depending on browser type.
Some sites will return "Your browser doesn't support frames" as a response if their code doesn't see a specific user-agent like Mozilla/5.0, and the Funnelback web crawler would then get no content from the site.
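As a rough illustration of what this setting controls, the sketch below builds an HTTP request that carries the default string in its User-Agent header. It uses Python's standard library only; the `build_request` helper and the example.com URL are hypothetical and are not part of Funnelback, and nothing is actually sent over the network.

```python
import urllib.request

# The default value documented above; any other string could be substituted.
USER_AGENT = "Mozilla/5.0 (compatible; Funnelback)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request carrying the given User-Agent header (nothing is sent)."""
    # Hypothetical helper for illustration; this is not how Funnelback is implemented.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request("http://example.com/")
# urllib stores header names capitalised as "User-agent".
print(request.get_header("User-agent"))
```

A server that varies its response by browser type would inspect this header to decide what content to return.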
Examples
If you are crawling other people's websites, then it is proper "netiquette" to identify yourself:
crawler.user_agent=Mozilla/5.0 (compatible; Funnelback)
You may also wish to use this more specific string to identify the Funnelback web crawler in your web server access logs.
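To pick the crawler's requests out of an access log, one option is to filter on the user agent text. The sketch below assumes the log records the user agent on each line (as the common "combined" log layout does); the `crawler_requests` helper and the sample lines are made up for illustration.

```python
def crawler_requests(log_lines, marker="Funnelback"):
    """Return the log lines whose text contains the crawler's user agent marker."""
    return [line for line in log_lines if marker in line]


# Made-up sample lines, not real log data.
sample = [
    '192.0.2.10 - - [01/Jan/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Funnelback)"',
    '192.0.2.11 - - [01/Jan/2024:10:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

matches = crawler_requests(sample)
print(len(matches))
```

With a real log, the same function could be applied to the lines read from the log file; the marker should be whatever distinctive text was configured in crawler.user_agent.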