start_url (collection.cfg setting)

Description

This option sets the URL from which the web crawler starts crawling. All links found on that page are followed, subject to the include/exclude patterns.

The administration interface allows an administrator to input a single URL or a list of URLs, one per line.

Normally you would enter the home page of the organisation's main web server as the start URL. If pages on this server link to other web servers in the organisation, the web crawler will find them (assuming the include/exclude patterns allow it).

Note: Use only the HTTP or HTTPS protocol in the URL.
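
The protocol restriction can be checked before a value is saved. The following is a minimal sketch in Python, not part of Funnelback itself: the function names is_valid_start_url and parse_start_urls are hypothetical, and the check assumes that "HTTP/HTTPS only" means the URL scheme must be http or https and that a host must be present.

```python
from typing import List
from urllib.parse import urlsplit

# Assumption: only these two schemes are acceptable in a start URL.
ALLOWED_SCHEMES = {"http", "https"}


def is_valid_start_url(url: str) -> bool:
    """Return True if the URL uses HTTP or HTTPS and names a host."""
    parts = urlsplit(url.strip())
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


def parse_start_urls(text: str) -> List[str]:
    """Split one-URL-per-line input, skip blank lines, reject unsupported URLs."""
    urls = [line.strip() for line in text.splitlines() if line.strip()]
    invalid = [u for u in urls if not is_valid_start_url(u)]
    if invalid:
        raise ValueError(f"Unsupported start URL(s): {invalid}")
    return urls
```

A value such as ftp://company.com/ or a bare host name like company.com would be rejected by this check, while the URLs shown in the examples below would pass.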

Default value

(set when a web collection is created)

Examples

Start crawling from a single site:

http://company.com/

Start with a list of sites:

http://www.company.com/
http://store.company.com/

Note: The parameter is stored in the collection.cfg file as

start_url=

and this parameter stores a single URL only. If a list of URLs is entered in the administration interface, the crawler.start_urls_file parameter is used to point at a text file that stores the list.
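
The single-URL and list cases described in this note can be illustrated with a short Python sketch. This is not the administration interface's actual code: the function name render_start_settings and the list file path passed to it are hypothetical, and the sketch assumes the list file holds one URL per line. It does not show whether start_url is also written when a list is used, since that is not stated here.

```python
from typing import List, Optional, Tuple


def render_start_settings(urls: List[str], list_path: str) -> Tuple[str, Optional[str]]:
    """Return (collection.cfg line, list file contents or None).

    One URL   -> stored directly in start_url.
    Many URLs -> crawler.start_urls_file points at a text file holding the list.
    list_path is a hypothetical location chosen by the caller.
    """
    if not urls:
        raise ValueError("at least one start URL is required")
    if len(urls) == 1:
        return f"start_url={urls[0]}", None
    # Assumption: the list file contains one URL per line.
    return f"crawler.start_urls_file={list_path}", "\n".join(urls) + "\n"
```

Called with the single URL http://company.com/, the sketch yields the line start_url=http://company.com/ and no list file. Called with the two URLs from the list example, it yields a crawler.start_urls_file line plus the file contents to write at the chosen path.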

v15.16.0