
crawler.start_urls_file (collection.cfg setting)

Description

Rather than starting a crawl from a single seed page, you can supply a list of URLs to use as the starting points for a web crawl. To do this, set the crawler.start_urls_file option to the path of a file containing the URLs to begin the crawl with, one per line.

Caveats

URLs must specify either the HTTP or the HTTPS protocol.
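Because only HTTP and HTTPS URLs are accepted, it can help to check a start URLs file before running a crawl. The following is a minimal Python sketch and is not part of Funnelback. The function name check_start_urls is hypothetical, and it assumes one URL per line with blank lines ignored.

```python
from urllib.parse import urlsplit

# Schemes permitted in a start URLs file, per the caveat above.
ALLOWED_SCHEMES = {"http", "https"}


def check_start_urls(lines):
    """Split the lines of a start URLs file into (valid, invalid) URL lists.

    Hypothetical pre-flight check, not Funnelback's own parser. Assumes one
    URL per line; blank lines are skipped (an assumption, not documented).
    """
    valid, invalid = [], []
    for raw in lines:
        url = raw.strip()
        if not url:
            continue  # assumption: blank lines carry no URL
        parts = urlsplit(url)
        # urlsplit lowercases the scheme; a host (netloc) must also be present.
        if parts.scheme in ALLOWED_SCHEMES and parts.netloc:
            valid.append(url)
        else:
            invalid.append(url)
    return valid, invalid
```

To check a real file, pass an open file handle, for example `with open("/conf/myurllist.txt", encoding="utf-8") as fh: valid, invalid = check_start_urls(fh)`, and review anything returned in `invalid`, such as `ftp://` URLs or bare host names without a scheme.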

Default value

crawler.start_urls_file=collection.cfg.start.urls

Examples

crawler.start_urls_file=/conf/myurllist.txt

This file might then contain something like:

http://www.funnellback.com/news/index
http://www.mycompany.com/
https://some.secure.site.com/

See also

v15.18.0