start_url

A list of URLs from which the crawler will start crawling.

Key: start_url
Type: List<String>
Can be set in: collection.cfg

Description

This option is the list of URLs from which the web crawler should start crawling. All links found on the pages are followed according to the include/exclude patterns.

The crawler starts from the URLs in this setting and from the URLs in the file specified by crawler.start_urls_file.

Each URL must use the HTTP or HTTPS protocol.

Within configuration files, the value is a space-separated list of URLs.
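
The two format rules above (space-separated values, HTTP/HTTPS only) can be checked before a value is written to collection.cfg. The following is a minimal Python sketch, not part of Funnelback: the helper name parse_start_urls is hypothetical, and it only illustrates the rules stated on this page rather than how the crawler itself parses the key.

```python
from urllib.parse import urlsplit


def parse_start_urls(value: str) -> list[str]:
    """Split a space-separated start_url value and check each entry.

    Hypothetical helper: it mirrors the format rules described on this
    page and is not the crawler's own parser.
    """
    urls = value.split()  # any run of whitespace separates entries
    for url in urls:
        parts = urlsplit(url)
        # Accept only http:// and https:// URLs that include a host.
        if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
            raise ValueError(f"start_url entries must be HTTP/HTTPS URLs: {url!r}")
    return urls


print(parse_start_urls("http://www.company.com/ http://store.company.com/"))
```

An empty value yields an empty list, which matches the default; an entry such as ftp://files.company.com/ raises ValueError.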

Default Value

By default the option is empty.

start_url=

Examples

To configure the crawler to start crawling from the following two URLs:

http://www.company.com/
http://store.company.com/

start_url=http://www.company.com/ http://store.company.com/

⚠ Caveats

The key must be set for all web and matrix collections, even if all URLs come from crawler.start_urls_file. In that case the value can be left empty.

Permission to read and edit this key is configured by read.key.start_url and edit.key.start_url. To fully restrict the URLs that will be crawled, you also need to consider sec.collection-start-urls, read.key.start_urls_file and edit.key.start_urls_file.

v15.22.0