Used to manually specify the preferred name for a particular server or set of servers.
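The file contains a list of mappings, one per line. Each mapping gives the preferred name, followed by = and a comma-separated list of the server names that should be treated as aliases of it.
Protocols can be given explicitly (e.g. https://www.example.com/); otherwise the http protocol is assumed. Lines starting with the # character are treated as comments.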
# Specify that www.daff.gov.au is always the preferred name
www.daff.gov.au=www.affa.gov.au,www.dpie.gov.au
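To make the format concrete, the following Python sketch parses text in this format and looks up the preferred name for a server. It is an illustration only, not the crawler's own implementation: the function names (normalise, parse_server_aliases, preferred_name) are hypothetical, and the normalisation rules (lower-casing the host, dropping any path) are assumptions made for the example.

```python
from urllib.parse import urlsplit


def normalise(name: str) -> str:
    """Return scheme://host for a server name (assumed normalisation).

    As described above, http is assumed when no protocol is given.
    """
    name = name.strip()
    if "://" not in name:
        name = "http://" + name
    parts = urlsplit(name)
    return f"{parts.scheme}://{parts.netloc.lower()}"


def parse_server_aliases(text: str) -> dict[str, str]:
    """Build a mapping from each alias to its preferred name."""
    aliases: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (those starting with #).
        if not line or line.startswith("#"):
            continue
        preferred, sep, rest = line.partition("=")
        if not sep:
            continue  # not a mapping; ignored in this sketch
        target = normalise(preferred)
        for alias in rest.split(","):
            if alias.strip():
                aliases[normalise(alias)] = target
    return aliases


def preferred_name(aliases: dict[str, str], server: str) -> str:
    """Return the preferred name for a server, or the server itself."""
    key = normalise(server)
    return aliases.get(key, key)


example = """\
# Specify that www.daff.gov.au is always the preferred name
www.daff.gov.au=www.affa.gov.au,www.dpie.gov.au
"""
aliases = parse_server_aliases(example)
print(preferred_name(aliases, "www.affa.gov.au"))
```

In this sketch a server that has no entry is returned unchanged, so unlisted sites keep their own names.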
During a crawl, the web crawler may decide that one site is a duplicate of another by comparing the content of their root pages. For example, old-site.example.com may be marked as a duplicate of new-site.example.com because their home pages are the same. This is done to avoid downloading a lot of duplicate content.
However, some content may be present only on the old site and should still be gathered. In that case, you can use the server_alias.cfg mechanism to ensure that the old site is still fully crawled and not marked as a duplicate, e.g.
# Specify that content from old-site.example.com should be stored under that name
old-site.example.com=old-site.example.com