
Crawling password protected websites

Introduction

Some websites are protected by an authentication scheme that requires a username and password to access the site. To crawl a password protected site, Funnelback must be given a valid username and password.

The authentication schemes that Funnelback currently supports are:

  • HTTP Basic Authentication
  • Windows Integrated Authentication (NTLM)

Giving Funnelback a username and password

Funnelback supports multiple HTTP Basic username/password pairs per collection. If you have only a single account to configure, you can set its values using parameters in the collection's collection.cfg file. To allow Funnelback access to the protected website:

For HTTP Basic authentication:

  • Set the http_user parameter to a valid HTTP Basic username.
  • Set the http_passwd parameter to the HTTP Basic username's password.
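As a minimal sketch, the two settings above might look like this in collection.cfg. This assumes the usual key=value syntax of that file; the username and password shown are hypothetical placeholders, and the # lines are annotations for the reader that can be omitted.

```ini
# Hypothetical HTTP Basic account; replace both values with real credentials.
http_user=crawl-user
http_passwd=example-password
```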

For NTLM/Windows Integrated authentication:

  • Set the crawler.ntlm.domain parameter to a valid NTLM domain.
  • Set the crawler.ntlm.username parameter to a valid username in the NTLM domain.
  • Set the crawler.ntlm.password parameter to the NTLM username's password.
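The NTLM settings above could be sketched in collection.cfg as follows. The domain, username and password are hypothetical placeholders, and the key=value layout is assumed to match the rest of the file.

```ini
# Hypothetical NTLM domain and account; replace with values valid for your site.
crawler.ntlm.domain=EXAMPLEDOMAIN
crawler.ntlm.username=crawl-user
crawler.ntlm.password=example-password
```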

For FTP sites:

  • Set the ftp_user parameter to a valid FTP username.
  • Set the ftp_passwd parameter to the FTP username's password.

Note: ftp must be added to the crawler.protocols parameter before Funnelback can crawl an FTP site.
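A possible collection.cfg fragment for an FTP site is sketched below. The credentials are hypothetical placeholders, and the crawler.protocols line assumes the collection currently crawls http and https and that the value is a comma-separated list; check the existing value in your collection before changing it.

```ini
# Hypothetical FTP account; replace with real credentials.
ftp_user=crawl-user
ftp_passwd=example-password
# Assumed existing protocols (http,https) with ftp appended.
crawler.protocols=http,https,ftp
```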

Specifying multiple HTTP Basic usernames and passwords

If you need to specify multiple HTTP Basic accounts for different web servers, you can configure them using site profiles.


v15.16.0