Crawling password-protected websites
Introduction
Some websites are protected by an authentication scheme that requires a username and password to access the site. For Funnelback to crawl a password-protected site, it must be given a valid username and password.
The authentication schemes that Funnelback currently supports are:
- HTTP Basic Authentication
- Windows Integrated Authentication (NTLM)
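For reference, HTTP Basic Authentication (RFC 7617) sends the username and password with each request in an Authorization header, as the Base64 encoding of username:password. The minimal Python sketch below shows that encoding. basic_auth_header is a hypothetical helper for illustration, not part of Funnelback, and UTF-8 is assumed for the credentials.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header (RFC 7617).

    Hypothetical helper for illustration; not a Funnelback API.
    Assumes the credentials are encoded as UTF-8.
    """
    # Basic auth joins the two values with a colon, so the username
    # itself must not contain one.
    if ":" in username:
        raise ValueError("HTTP Basic usernames cannot contain ':'")
    credentials = f"{username}:{password}".encode("utf-8")
    # Base64 is an encoding, not encryption: anyone who sees the header
    # can recover the password.
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    print(basic_auth_header("user", "pass"))  # → Basic dXNlcjpwYXNz
```

Because Base64 is trivially reversible, credentials sent this way are only protected in transit if the site is served over HTTPS.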
Giving Funnelback a username and password
Funnelback supports multiple HTTP Basic username/password pairs per collection. If you have only a single account to configure, you can set the values using parameters in the collection's collection.cfg file. To give Funnelback access to the protected website:
For basic HTTP authentication:
- Set the http_user parameter to a valid HTTP Basic username.
- Set the http_passwd parameter to the HTTP Basic username's password.
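Before copying credentials into collection.cfg, it can help to confirm that the web server actually accepts them. The sketch below does this with Python's standard urllib.request module. check_basic_credentials is a hypothetical helper, not a Funnelback tool, and it only speaks the HTTP Basic scheme, so a site that only offers NTLM cannot be checked this way.

```python
import urllib.error
import urllib.request


def check_basic_credentials(url: str, username: str, password: str,
                            timeout: float = 10.0) -> bool:
    """Return True if url accepts the given HTTP Basic credentials.

    Hypothetical pre-flight check to run before setting http_user and
    http_passwd; not part of Funnelback. Handles the Basic scheme only.
    """
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # Realm None: offer these credentials for whatever realm the server names.
    passwords.add_password(None, url, username, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(passwords))
    try:
        # The handler answers the server's 401 challenge and retries once
        # with an Authorization header.
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 401:
            # Still unauthorized after sending the credentials.
            return False
        raise
```

For example, check_basic_credentials("https://intranet.example.com/", "crawler", "s3cret") returns True only if the server answers with a 2xx status once the credentials are sent; the URL and credentials here are placeholders.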
For NTLM/Windows Integrated authentication:
- Set the crawler.ntlm.domain parameter to a valid NTLM domain.
- Set the crawler.ntlm.username parameter to a valid username in the NTLM domain.
- Set the crawler.ntlm.password parameter to the NTLM username's password.
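Windows accounts are often written in the down-level logon form DOMAIN\username, whereas the crawler takes the domain and the username as separate parameters. The Python sketch below splits such a login and renders the three settings. It assumes collection.cfg holds one key=value setting per line; split_ntlm_login, ntlm_cfg_lines, CORP and jsmith are hypothetical names used only for illustration.

```python
from __future__ import annotations


def split_ntlm_login(login: str) -> tuple[str, str]:
    """Split a down-level logon name such as CORP\\jsmith into (domain, username)."""
    domain, sep, username = login.partition("\\")
    if not sep or not domain or not username:
        raise ValueError(f"expected DOMAIN\\username, got {login!r}")
    return domain, username


def ntlm_cfg_lines(login: str, password: str) -> list[str]:
    """Render the NTLM settings, assuming one key=value line per setting."""
    domain, username = split_ntlm_login(login)
    return [
        f"crawler.ntlm.domain={domain}",
        f"crawler.ntlm.username={username}",
        f"crawler.ntlm.password={password}",
    ]


if __name__ == "__main__":
    # Placeholder account; substitute a real crawl account.
    for line in ntlm_cfg_lines("CORP\\jsmith", "placeholder-password"):
        print(line)
```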
For FTP sites:
- Set the ftp_user parameter to a valid FTP username.
- Set the ftp_passwd parameter to the FTP username's password.
Note: To crawl an FTP site, ftp must also be added to the crawler.protocols parameter.
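If collection settings are managed by a script, the sketch below adds ftp to an existing crawler.protocols value without duplicating it. It assumes, for illustration only, that the value is a comma-separated list such as http,https; confirm the expected format for your Funnelback version before relying on it. add_protocol is a hypothetical helper.

```python
def add_protocol(protocols_value: str, protocol: str = "ftp") -> str:
    """Return a crawler.protocols value that includes protocol.

    Hypothetical helper; assumes a comma-separated list such as "http,https".
    """
    # Normalise whitespace and case, and drop empty entries.
    protocols = [p.strip().lower() for p in protocols_value.split(",") if p.strip()]
    if protocol not in protocols:
        protocols.append(protocol)
    return ",".join(protocols)
```

For example, add_protocol("http,https") returns "http,https,ftp", and calling it on a value that already lists ftp leaves the list unchanged apart from whitespace.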
Specifying multiple HTTP Basic usernames and passwords
If you need to specify multiple HTTP Basic accounts for different web servers, you can configure them using site profiles.
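Conceptually, site profiles let the crawler use a different account depending on which server a URL belongs to. The Python sketch below illustrates only that idea, a lookup keyed on the URL's host name; it does not reproduce the site profiles configuration format, and CREDENTIALS, credentials_for and the example.com hosts are hypothetical placeholders.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hypothetical per-server accounts (placeholders). This mirrors the idea
# behind site profiles, not their actual configuration format.
CREDENTIALS: dict[str, tuple[str, str]] = {
    "intranet.example.com": ("crawler", "intranet-password"),
    "docs.example.com": ("docs-reader", "docs-password"),
}


def credentials_for(url: str,
                    table: dict[str, tuple[str, str]] = CREDENTIALS,
                    ) -> tuple[str, str] | None:
    """Return the (username, password) pair for url's host, or None."""
    # .hostname is lower-cased and has any port removed, so every port
    # on the same host shares one account in this sketch.
    host = urlsplit(url).hostname
    if host is None:
        return None
    return table.get(host)
```

Keying on the exact host name keeps the lookup simple; a URL on a host with no entry gets no credentials rather than borrowing another server's account.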