start_url (collection.cfg setting)
This option sets the URL at which the web crawler starts crawling. Links found on that page are followed if the include/exclude patterns allow them.
The administration interface allows an administrator to input a single URL or a list of URLs, one per line.
Normally you would enter the home page of the main web server in an organisation as the start URL. If pages on this server link to other web servers in the organisation then the web crawler will find them (assuming the include/exclude patterns allow it).
Note: Use only the HTTP or HTTPS protocol in the URL.
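The protocol restriction can be checked before a URL is submitted. The following is a minimal Python sketch, not part of the product: the function name is_valid_start_url and the www.example.com addresses are hypothetical, and the check only looks at the scheme and host of the URL.

```python
from urllib.parse import urlsplit

# Schemes the note above allows for a start URL.
ALLOWED_SCHEMES = {"http", "https"}


def is_valid_start_url(url: str) -> bool:
    """Return True if url uses HTTP or HTTPS and names a host.

    Hypothetical helper for illustration; it is not part of the crawler.
    """
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. a broken IPv6 host.
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


if __name__ == "__main__":
    for candidate in ("http://www.example.com/", "ftp://www.example.com/"):
        print(candidate, is_valid_start_url(candidate))
```

A URL entered without a protocol, such as www.example.com, is also rejected by this sketch because it has no scheme.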
(set when a web collection is created)
Start crawling from a single site:
Start with a list of sites:
Note: In the collection.cfg file, this parameter stores a single URL. If a list of URLs is entered in the administration interface, the crawler.start_urls_file parameter is used to point at a text file that stores the list.
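The distinction between a single URL and a list can be illustrated with a short Python sketch. It is an assumption about how such input might be classified, not the administration interface's actual code: the function names parse_start_urls and plan_storage, the "single" and "file" labels, and the example.com addresses are all hypothetical.

```python
from __future__ import annotations


def parse_start_urls(text: str) -> list[str]:
    """Split one-URL-per-line input into a list, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def plan_storage(text: str) -> tuple[str, list[str]]:
    """Classify the input as a single URL or a list of URLs.

    Returns ("single", [url]) when one URL was entered, which corresponds
    to storing the value directly in collection.cfg, or ("file", urls) when
    several were entered, which corresponds to writing them to the text
    file that crawler.start_urls_file points at.
    """
    urls = parse_start_urls(text)
    if not urls:
        raise ValueError("at least one start URL is required")
    kind = "single" if len(urls) == 1 else "file"
    return kind, urls


if __name__ == "__main__":
    print(plan_storage("http://www.example.com/"))
    print(plan_storage("http://www.example.com/\nhttp://intranet.example.com/"))
```

The sketch only decides which case applies; writing the configuration file or the URL list file is left out.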