Built-in filters: Metadata normaliser filter (MetadataNormaliser)

Introduction

The metadata normaliser filter can be used to clean and normalise metadata values. Normalisation is particularly useful for faceted navigation, allowing similar categories to be merged into a single category.

The filter processes HTML meta tags (<meta name="key" content="value">) and tests each value against a list of regular expressions. When a value matches an expression, it is replaced with the corresponding replacement.

Enabling

To enable the filter, add MetadataNormaliser to the end of the filter chain, where <default_filter_chain> stands for the default filter chain value.

filter.classes=<default_filter_chain>:MetadataNormaliser

Configuring the metadata normaliser filter

The metadata keys to normalise must be listed in collection.cfg, using the following key:

filter.md_normaliser.keys=...

For example, to perform metadata normalisation on <meta name="Author" ... > and <meta name="Publisher" ... >, this value would be set to:

filter.md_normaliser.keys=author,publisher

Keys are case-insensitive. Any key name can be used; recommended practice is to use the value of the meta tag's name attribute.

A corresponding mapping file must be defined for each key at:

$SEARCH_HOME/conf/<collection>/md_normaliser.<key>.mapping

Example filename:

$SEARCH_HOME/conf/<collection>/md_normaliser.author.mapping
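
The relationship between the configured keys and the expected mapping files can be sketched in a few lines of Python. This is an illustrative helper only, not part of Funnelback: the function name mapping_paths, the install path /opt/funnelback and the collection name example are hypothetical, and the sketch assumes each key is used in the filename exactly as written in collection.cfg.

```python
from pathlib import PurePosixPath


def mapping_paths(cfg_line, search_home, collection):
    """List the mapping file expected for each configured key.

    Assumption: the key is used in the filename exactly as written.
    """
    # Everything after the first "=" is the comma-separated key list.
    _, _, value = cfg_line.partition("=")
    keys = [k.strip() for k in value.split(",") if k.strip()]
    conf_dir = PurePosixPath(search_home) / "conf" / collection
    return [str(conf_dir / f"md_normaliser.{key}.mapping") for key in keys]


# Hypothetical install path and collection name, for illustration only.
paths = mapping_paths(
    "filter.md_normaliser.keys=author,publisher", "/opt/funnelback", "example"
)
for path in paths:
    print(path)
# /opt/funnelback/conf/example/md_normaliser.author.mapping
# /opt/funnelback/conf/example/md_normaliser.publisher.mapping
```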

The first line in the mapping file is the <key> expression, e.g. author. The key expression is case-insensitive and is treated as a regular expression (so expressions like DC.Creator|Author are valid).

  • Each following line must be <regex>=<replacement>
  • Capture groups can be used (e.g. (.*)@domain.com=$1)
  • Lines starting with # are considered comments

Regular expressions are tried in order. Processing of a value stops at the first regular expression that matches.
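
The first-match rule and the use of capture groups can be illustrated with a minimal Python sketch. It is a model of the behaviour described above, not the filter's actual implementation. It assumes that an expression must match the whole value and that value matching is case-sensitive; neither is stated on this page. The normalise function and the sample rules are hypothetical.

```python
import re


def normalise(value, rules):
    """Return the replacement for the first rule whose regex matches value.

    Assumptions (not confirmed by the documentation): the regex must match
    the whole value, and matching is case-sensitive.
    """
    for pattern, replacement in rules:
        match = re.fullmatch(pattern, value)
        if match:
            # Translate $1-style group references into Python's \g<1> form.
            template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
            return match.expand(template)
    return value  # no rule matched: the value is left unchanged


# Hypothetical rules, listed in the order they would appear in a mapping file.
rules = [
    (r"(.*)@domain.com", "$1"),   # keep only the part before the @
    (r".*smith.*", "Smith"),      # never reached when the first rule matches
]

print(normalise("jsmith@domain.com", rules))  # jsmith
print(normalise("blacksmith", rules))         # Smith
print(normalise("unrelated", rules))          # unrelated
```

Because the first matching rule wins, more specific expressions should be placed before more general ones.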

Example

To normalise non-preferred values of Shakespeare and John Smith that may exist in Author and Creator metadata fields:

collection.cfg:

filter.md_normaliser.keys=author

md_normaliser.author.mapping:

Author|Creator
.*shakespeare.*=Shakespeare
[wW]\.?[sS]\.?=Shakespeare
[Ss]\.?[wW]\.?=Shakespeare
jsmith=John Smith
jack smith=John Smith
j\. smith=John Smith
johnny smith=John Smith
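
To check a mapping file before deploying it, the example above can be simulated with a short Python sketch. This is not Funnelback code and may differ from the real filter. It assumes whole-value, case-sensitive matching for values, a case-insensitive whole-name match for the key expression, and that each rule line is split at the first = character. The parse_mapping and normalise_meta functions are hypothetical names.

```python
import re

# The example mapping file from above, embedded as a string.
MAPPING = r"""Author|Creator
.*shakespeare.*=Shakespeare
[wW]\.?[sS]\.?=Shakespeare
[Ss]\.?[wW]\.?=Shakespeare
jsmith=John Smith
jack smith=John Smith
j\. smith=John Smith
johnny smith=John Smith
"""


def parse_mapping(text):
    """Split a mapping file into its key expression and ordered rules."""
    lines = text.splitlines()
    # The first line is the key expression, matched case-insensitively.
    key_regex = re.compile(lines[0].strip(), re.IGNORECASE)
    rules = []
    for line in lines[1:]:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        # Assumption: the first "=" separates regex from replacement.
        pattern, _, replacement = line.partition("=")
        rules.append((re.compile(pattern), replacement))
    return key_regex, rules


def normalise_meta(name, value, key_regex, rules):
    """Normalise one meta tag value; other meta names pass through."""
    if not key_regex.fullmatch(name):
        return value
    for pattern, replacement in rules:
        if pattern.fullmatch(value):
            # These example rules contain no capture groups.
            return replacement
    return value


key_regex, rules = parse_mapping(MAPPING)
print(normalise_meta("Author", "william shakespeare", key_regex, rules))  # Shakespeare
print(normalise_meta("creator", "W.S.", key_regex, rules))                # Shakespeare
print(normalise_meta("Author", "jack smith", key_regex, rules))           # John Smith
print(normalise_meta("Title", "jsmith", key_regex, rules))                # jsmith
```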

v15.24.0