crawler.start_urls_file

Path to a file that contains a list of URLs (one per line) that will be used as the starting point for a crawl.

Key: crawler.start_urls_file
Type: String
Can be set in: collection.cfg

Description

The initial list of URLs to crawl combines every URL declared in the file specified here with those declared in start_url.

Every URL in the file must use the HTTP or HTTPS protocol.

Default Value

crawler.start_urls_file=collection.cfg.start.urls

Examples

crawler.start_urls_file=/conf/myurllist.txt

This file might then contain something like:

http://www.funnellback.com/news/index
http://www.mycompany.com/
https://some.secure.site.com/
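
Because the file must contain one HTTP or HTTPS URL per line, it can help to check it before starting a crawl. The following is a minimal sketch of such a check in Python and is not part of Funnelback. The function names find_invalid_start_urls and check_start_urls_file are hypothetical, and skipping blank lines is an assumption made by this checker, not documented crawler behaviour.

```python
from urllib.parse import urlsplit

# Only these protocols are permitted in a start URLs file.
ALLOWED_SCHEMES = {"http", "https"}


def find_invalid_start_urls(lines):
    """Return (line_number, text) pairs for entries that are not HTTP/HTTPS URLs."""
    invalid = []
    for number, raw in enumerate(lines, start=1):
        url = raw.strip()
        if not url:
            # Assumption: blank lines are skipped by this checker.
            continue
        try:
            parts = urlsplit(url)
        except ValueError:
            # urlsplit rejects some malformed URLs, such as an unclosed IPv6 bracket.
            invalid.append((number, url))
            continue
        # A usable start URL needs an allowed scheme and a host part.
        if parts.scheme not in ALLOWED_SCHEMES or not parts.netloc:
            invalid.append((number, url))
    return invalid


def check_start_urls_file(path):
    """Read a start URLs file (one URL per line) and report its invalid entries."""
    with open(path, encoding="utf-8") as handle:
        return find_invalid_start_urls(handle)
```

Calling check_start_urls_file with the path configured in crawler.start_urls_file returns an empty list when every non-blank line is an HTTP or HTTPS URL. Otherwise it returns the line number and text of each entry that needs attention.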

⚠ Caveats

Permission to read and edit this key is configured by read.key.start_urls_file and edit.key.start_urls_file. To fully restrict the URLs that will be crawled, you also need to consider sec.collection-start-urls, read.key.start_url and edit.key.start_url.

v15.22.0