start_url
A list of URLs from which the crawler will start crawling.
Key: start_url
Type: List<String>
Can be set in: collection.cfg
Description
This option lists the URLs from which the web crawler starts crawling. Links found on those pages are followed, subject to the include/exclude patterns.
The crawler starts from the URLs in this setting and from the URLs in the file specified by crawler.start_urls_file.
Each URL must use the HTTP or HTTPS protocol.
Within configuration files, the format is a space-separated list of URLs.
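The format rules above (space-separated values, HTTP/HTTPS only) can be checked before a value is written to collection.cfg. The following is a minimal Python sketch, not part of the product: the helper names parse_start_url and invalid_start_urls are hypothetical, and the check only covers the scheme and the presence of a host.

```python
from urllib.parse import urlsplit

# Schemes the description above allows for start URLs.
ALLOWED_SCHEMES = {"http", "https"}


def parse_start_url(value):
    """Split a space-separated start_url value into individual URLs."""
    return value.split()


def invalid_start_urls(urls):
    """Return the URLs that are not HTTP/HTTPS or that have no host part."""
    bad = []
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme.lower() not in ALLOWED_SCHEMES or not parts.netloc:
            bad.append(url)
    return bad


if __name__ == "__main__":
    urls = parse_start_url("http://www.company.com/ http://store.company.com/")
    print(urls)
    print(invalid_start_urls(urls))
```

An empty value parses to an empty list, which matches the default of an empty option.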
Default Value
By default the option is empty.
start_url=
Examples
To configure the crawler to start crawling from the following two URLs:
http://www.company.com/
http://store.company.com/
start_url=http://www.company.com/ http://store.company.com/
⚠ Caveats
The key must be set for all web and matrix collections, even if all start URLs come from crawler.start_urls_file. In that case, the value can be left empty.
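As an illustration of this caveat, the sketch below checks that a configuration text contains the start_url key, even when its value is empty. It is a simplified Python sketch under stated assumptions: it treats collection.cfg as plain key=value lines with # comments, the helper names read_cfg and has_start_url_key are hypothetical, and the file path in the sample is a placeholder.

```python
def read_cfg(text):
    """Parse simple key=value lines, skipping blank lines and # comments.

    Assumption: this is a simplified reading of collection.cfg, not the
    product's own parser.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def has_start_url_key(text):
    """True if start_url is present, whether or not it has a value."""
    return "start_url" in read_cfg(text)


if __name__ == "__main__":
    # Placeholder path; start URLs are supplied by the file, the key stays empty.
    sample = "start_url=\ncrawler.start_urls_file=/path/to/start_urls.txt\n"
    print(has_start_url_key(sample))
```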
Permission to read and edit this key is configured by read.key.start_url and edit.key.start_url. To fully restrict the URLs that will be crawled, you also need to consider sec.collection-start-urls, read.key.start_urls_file and edit.key.start_urls_file.