crawler.start_urls_file (collection.cfg setting)
Rather than starting a crawl from a single seed page, you can specify a list of URLs to use as the starting points for a web crawl. To do this, set the
crawler.start_urls_file option to the path of a file that lists the URLs to begin the crawl with, one per line.
Each URL must specify either the HTTP or the HTTPS protocol.
For example, the file might contain:
http://www.funnellback.com/news/index
http://www.mycompany.com/
https://some.secure.site.com/
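Because the file must hold one HTTP or HTTPS URL per line, it can be useful to check it before starting a crawl. The following is a minimal sketch of such a check, not part of the product: the function name check_start_urls is hypothetical, and skipping blank lines is an assumption made for illustration rather than documented crawler behaviour.

```python
from urllib.parse import urlparse

# The two protocols the setting accepts, per the description above.
ALLOWED_SCHEMES = {"http", "https"}


def check_start_urls(lines):
    """Split lines from a start URLs file into valid URLs and problems.

    Returns (valid, problems), where problems is a list of
    (line_number, text, reason) tuples. Hypothetical helper for illustration.
    """
    valid, problems = [], []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            # Assumption: blank lines are ignored by this checker.
            continue
        if any(ch.isspace() for ch in text):
            # Whitespace inside a line usually means several URLs share it.
            problems.append((number, text, "expected one URL per line"))
            continue
        parts = urlparse(text)
        if parts.scheme not in ALLOWED_SCHEMES:
            problems.append((number, text, "protocol must be HTTP or HTTPS"))
        elif not parts.netloc:
            problems.append((number, text, "missing host name"))
        else:
            valid.append(text)
    return valid, problems
```

To check a real file, pass an open file object, for example check_start_urls(open(path, encoding="utf-8")), where path is whatever value crawler.start_urls_file is set to. Any entry in the returned problems list points at a line that should be fixed before crawling.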