server_alias.cfg
Introduction
Name
server_alias.cfg
Location
~/conf/collection/
Description
Used to manually specify the preferred name for a particular server or set of servers.
To access the Server Alias editor, go to the administration home page, then under the Administer tab select Browse Collection Configuration Files to open the file manager.
Then click the Edit Configuration Files button to open the configuration file manager.
Alternatively, you can use a WebDAV client to edit this file directly.
Format
A list of mappings, one per line, of the form:
canonical=alias1,alias2,...,aliasn
Protocols can be explicit (e.g. https://www.example.com/); otherwise the http protocol is assumed. To add a comment, start the line with the # character.
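To make the format concrete, here is a minimal Python sketch of how such a file could be read. It is an illustration only, not the crawler's own parser: the function names parse_server_aliases and normalise are hypothetical, and skipping blank lines and lines without an = sign is an assumption of this sketch.

```python
def normalise(name):
    """Prefix a bare server name with http://, the protocol assumed by default."""
    name = name.strip()
    return name if "://" in name else "http://" + name


def parse_server_aliases(text):
    """Return {canonical: [alias, ...]} for each mapping line in the text.

    Hypothetical helper: blank lines and lines without '=' are skipped,
    which is an assumption of this sketch.
    """
    mappings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        canonical, sep, aliases = line.partition("=")
        if not sep:
            continue  # not of the form canonical=alias1,alias2,...
        mappings[normalise(canonical)] = [
            normalise(alias) for alias in aliases.split(",") if alias.strip()
        ]
    return mappings
```

The sketch only applies the default-protocol rule described above; whether the real crawler also normalises case or trailing slashes is not covered here.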
Examples
# Specify that www.daff.gov.au is always the preferred name
www.daff.gov.au=www.affa.gov.au,www.dpie.gov.au
During a crawl, the webcrawler may decide that one site is a duplicate of another by comparing the content of their root pages. For example, old-site.example.com may be marked as a duplicate of new-site.example.com because their home pages are the same. This is done to avoid downloading a lot of duplicate content.
However, some content may be present only on the old site and should still be gathered. In that case, you can use the server_alias.cfg mechanism to ensure that the old site is still fully crawled and not marked as a duplicate, e.g.
# Specify that content from old-site.example.com should be stored under that name
old-site.example.com=old-site.example.com
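The effect of the two examples above can be pictured as a lookup from any listed name to its preferred name. The following Python sketch is only a model of that idea, not the crawler's implementation: build_lookup and preferred_name are hypothetical names, and returning an unlisted server unchanged is an assumption of the sketch.

```python
def build_lookup(mappings):
    """Map every canonical name and each of its aliases to the canonical name."""
    lookup = {}
    for canonical, aliases in mappings.items():
        lookup[canonical] = canonical
        for alias in aliases:
            lookup[alias] = canonical
    return lookup


def preferred_name(server, lookup):
    """Return the preferred name for a server (unlisted servers are returned as-is)."""
    return lookup.get(server, server)


# The two example mappings from this page, written as a dictionary.
mappings = {
    "www.daff.gov.au": ["www.affa.gov.au", "www.dpie.gov.au"],
    "old-site.example.com": ["old-site.example.com"],
}
lookup = build_lookup(mappings)
print(preferred_name("www.affa.gov.au", lookup))       # www.daff.gov.au
print(preferred_name("old-site.example.com", lookup))  # old-site.example.com
```

In this model, mapping old-site.example.com to itself means its content stays under its own name rather than being attributed to another server.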