server_alias.cfg

Introduction

Name

server_alias.cfg

Location

~/conf/collection/

Description

Used to manually specify the preferred name for a particular server or set of servers.

Format

A list of mappings, one per line, of the form:

canonical=alias1,alias2,...,aliasn

A protocol can be given explicitly (e.g. https://www.example.com/); otherwise the http protocol is assumed. A line starting with the # character is treated as a comment.
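To make the format concrete, here is a minimal Python sketch of how such a file could be read. It is an illustration only, not Funnelback's own parser: the function names `parse_server_alias` and `with_default_protocol` are hypothetical, and details such as trimming whitespace, skipping blank lines and skipping lines without an `=` are assumptions that the description above does not confirm.

```python
def with_default_protocol(name: str) -> str:
    """Prefix http:// when no protocol is given (the documented default)."""
    name = name.strip()
    return name if "://" in name else "http://" + name


def parse_server_alias(text: str) -> dict[str, list[str]]:
    """Map each canonical name to its list of aliases."""
    mappings: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines are ignored; lines starting with # are comments.
        if not line or line.startswith("#"):
            continue
        canonical, sep, aliases = line.partition("=")
        if not sep:
            continue  # Assumption: lines without '=' are skipped.
        mappings[with_default_protocol(canonical)] = [
            with_default_protocol(alias)
            for alias in aliases.split(",")
            if alias.strip()
        ]
    return mappings


# Hypothetical sample content, using example.com host names only.
sample = """# a comment line
www.example.com=old.example.com,https://secure.example.com/
"""
print(parse_server_alias(sample))
```

Names without a protocol come back with `http://` prepended, while the explicit `https://` alias is left unchanged.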

Examples

# Specify that www.daff.gov.au is always the preferred name
www.daff.gov.au=www.affa.gov.au,www.dpie.gov.au
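
The effect of this mapping can be pictured as a lookup from any listed name to the preferred one. The Python sketch below is only a model of that idea, not the crawler's actual logic: the function `preferred_name` is hypothetical, and returning unmapped hosts unchanged is an assumption.

```python
# Mapping taken from the example above: canonical name -> aliases.
ALIASES = {"www.daff.gov.au": ["www.affa.gov.au", "www.dpie.gov.au"]}


def preferred_name(host: str, mappings: dict[str, list[str]]) -> str:
    """Return the canonical name for host, or host itself if it is unmapped."""
    for canonical, aliases in mappings.items():
        if host == canonical or host in aliases:
            return canonical
    return host  # Assumption: hosts with no mapping keep their own name.


for host in ("www.affa.gov.au", "www.dpie.gov.au", "www.example.com"):
    print(host, "->", preferred_name(host, ALIASES))
```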

During a crawl, the web crawler may decide that one site is a duplicate of another by comparing the content of their root pages. For example, old-site.example.com may be marked as a duplicate of new-site.example.com because their home pages are the same. This avoids downloading a large amount of duplicate content.

However, some content may be present only on the old site and should still be gathered. In that case, you can map the old site to itself in server_alias.cfg to ensure that it is still fully crawled and not marked as a duplicate, e.g.

# Specify that content from old-site.example.com should be stored under that name
old-site.example.com=old-site.example.com

v15.16.0