crawler.classes.Frontier

Specifies the Java class used for the frontier (the list of URLs not yet visited).

Key: crawler.classes.Frontier
Type: String
Can be set in: collection.cfg

Description

The web crawler's frontier is the queue of URLs that are waiting to be processed. The following are the main frontier types available:

  • com.funnelback.common.frontier.MultipleRequestsFrontier: Supports sending multiple parallel requests to servers as specified in a site_profiles.cfg file. This frontier is the default.
  • com.funnelback.common.frontier.HostFrontier: Uses separate queues for each host.
  • com.funnelback.common.frontier.FIFOFrontier: Manages URLs in first-in, first-out (FIFO) order. Using this frontier allows the crawler to create multiple simultaneous connections to a target host. Note: This can generate heavy load on a target web server.
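
The difference between a single FIFO queue and separate per-host queues can be illustrated with a small sketch. This is not Funnelback's implementation: the class name PerHostQueuesSketch and the round-robin order in which hosts are served are assumptions made purely for illustration.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of "separate queues for each host".
// Not the HostFrontier source; the round-robin policy is an assumption.
final class PerHostQueuesSketch {
    // One FIFO queue of pending URLs per host.
    private final Map<String, Deque<String>> queues = new HashMap<>();
    // Hosts that currently have pending URLs, in the order they will be served.
    private final Deque<String> rotation = new ArrayDeque<>();

    void add(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        Deque<String> queue = queues.get(host);
        if (queue == null) {
            queue = new ArrayDeque<>();
            queues.put(host, queue);
            rotation.addLast(host);
        }
        queue.addLast(url);
    }

    // Returns the next URL, taking one URL per host in turn, or null if empty.
    String next() {
        String host = rotation.pollFirst();
        if (host == null) {
            return null;
        }
        Deque<String> queue = queues.get(host);
        String url = queue.pollFirst();
        if (queue.isEmpty()) {
            queues.remove(host);      // host has nothing left to crawl
        } else {
            rotation.addLast(host);   // host goes to the back of the rotation
        }
        return url;
    }

    public static void main(String[] args) {
        PerHostQueuesSketch frontier = new PerHostQueuesSketch();
        frontier.add("http://a.example/1");
        frontier.add("http://a.example/2");
        frontier.add("http://b.example/1");
        String url;
        while ((url = frontier.next()) != null) {
            System.out.println(url);
        }
    }
}
```

With a single FIFO queue the two a.example URLs would be fetched back to back. In this sketch the hosts are interleaved, so consecutive requests are spread across hosts.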

Default Value

crawler.classes.Frontier=com.funnelback.common.frontier.MultipleRequestsFrontier:com.funnelback.common.frontier.DiskFIFOFrontier:1000

This specifies that a MultipleRequestsFrontier should be used, backed by disk-based FIFO frontiers (DiskFIFOFrontier) of size 1000. Here size is the maximum number of URLs per backing frontier, so in the example above each disk-based FIFO frontier stores up to 1000 URLs. When a backing frontier fills up, a new one is created, so that all the URLs in the overall frontier are stored in a chain of backing frontiers.
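
The chaining behaviour described above can be sketched in a few lines of Java. This is a simplified in-memory model, not the DiskFIFOFrontier implementation: the class name ChainedFrontierSketch and its methods are hypothetical, and the real backing frontiers are stored on disk.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical in-memory model of a chain of bounded FIFO segments.
// Illustrative only; the real backing frontiers are disk-based.
final class ChainedFrontierSketch {
    private final int segmentSize;
    // Chain of segments; URLs are read from the first and appended to the last.
    private final Deque<Deque<String>> chain = new ArrayDeque<>();

    ChainedFrontierSketch(int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        this.segmentSize = segmentSize;
    }

    void add(String url) {
        Deque<String> tail = chain.peekLast();
        if (tail == null || tail.size() >= segmentSize) {
            // The last segment is full (or none exists yet): start a new one.
            tail = new ArrayDeque<>();
            chain.addLast(tail);
        }
        tail.addLast(url);
    }

    // Returns the oldest URL, or null when the whole frontier is empty.
    String next() {
        Deque<String> head = chain.peekFirst();
        if (head == null) {
            return null;
        }
        String url = head.pollFirst();
        if (head.isEmpty()) {
            chain.pollFirst(); // drop the drained segment from the chain
        }
        return url;
    }

    int segmentCount() {
        return chain.size();
    }

    public static void main(String[] args) {
        ChainedFrontierSketch frontier = new ChainedFrontierSketch(1000);
        for (int i = 0; i < 2500; i++) {
            frontier.add("http://example.com/page" + i);
        }
        // 2500 URLs at 1000 per segment need three segments.
        System.out.println(frontier.segmentCount());
    }
}
```

In this model FIFO order is preserved across segments, and the number of segments grows and shrinks with the number of pending URLs.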

v15.24.0