crawler.start_urls_file
Path to a file that contains a list of URLs (one per line) that will be used as the starting point for a crawl.
Key: crawler.start_urls_file
Type: String
Can be set in: collection.cfg
Description
The initial list of start URLs is the combination of all URLs listed in the file specified here and the URLs set in start_url.
Every URL in the file must use the HTTP or HTTPS protocol.
Default Value
crawler.start_urls_file=collection.cfg.start.urls
Examples
crawler.start_urls_file=/conf/myurllist.txt
This file might then contain something like:
http://www.funnelback.com/news/index
http://www.mycompany.com/
https://some.secure.site.com/
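Because the file must hold one URL per line and only HTTP/HTTPS URLs, it can be useful to check a list before pointing crawler.start_urls_file at it. The following is a minimal Python sketch of such a check, not part of the product: the function name check_start_urls and the sample lines are hypothetical, and skipping blank lines is a choice made by this sketch rather than documented crawler behaviour.

```python
from urllib.parse import urlsplit

# Only these schemes are accepted, per the description above.
ALLOWED_SCHEMES = {"http", "https"}


def check_start_urls(lines):
    """Split lines into (valid, rejected) URL lists.

    Hypothetical helper for illustration. Blank lines are skipped,
    which is an assumption of this sketch.
    """
    valid, rejected = [], []
    for raw in lines:
        url = raw.strip()
        if not url:
            continue
        parts = urlsplit(url)
        # Require an allowed scheme and a host part.
        if parts.scheme in ALLOWED_SCHEMES and parts.netloc:
            valid.append(url)
        else:
            rejected.append(url)
    return valid, rejected


# Sample content, as if read from the file with readlines().
sample = [
    "http://www.mycompany.com/\n",
    "https://some.secure.site.com/\n",
    "ftp://files.mycompany.com/\n",  # wrong protocol, should be rejected
    "\n",
]
valid, rejected = check_start_urls(sample)
print("valid:", valid)
print("rejected:", rejected)
```

To check a real file, pass the result of open(path).readlines() instead of the sample list, where path is the value set for crawler.start_urls_file.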
⚠ Caveats
Permission to read and edit this key is controlled by read.key.start_urls_file and edit.key.start_urls_file. To fully restrict which URLs will be crawled, you also need to consider sec.collection-start-urls, read.key.start_url and edit.key.start_url.