Metadata classes - multiple and alternative values
Funnelback's metadata classes support multiple and alternative metadata values.
- multiple values: a metadata class includes more than one fielded value, such as a delimited HTML keywords meta field that contains many keywords (e.g. keywords=red,blue,green) or an XML record containing multiple keyword fields (e.g. <kw>red</kw><kw>blue</kw><kw>green</kw>). Each of the values is matched when a search is run.
- alternative values: a metadata class includes several alternative values, of which only one will be matched when a query is run, e.g. a price field that contains the price in US dollars, Pounds Sterling, Euros, Australian dollars and New Zealand dollars. Depending on the store or location, only one of these alternative metadata values will be relevant.
Multiple metadata values
There are two ways in which a Funnelback metadata class will end up containing multiple values:
- there are multiple metadata sources mapped to a single metadata class, and the source document matches more than one of the sources.
- any of the matching metadata sources for a metadata class contains multiple values, either as delimited data within a single field, or as repeated fields. The delimiter is a vertical bar symbol (|) by default but this can be changed on a per-collection basis (the delimiters that are set apply to all metadata classes for the purpose of splitting them into multiple values).
Classes containing multiple values are useful because they allow filtering (using faceted navigation) on the unique values contained within a metadata class. In the colours example below (multiple values from a single field), a colour filter could be attached to the search so that results can be filtered by colour, and the record in the example would appear if blue, red or orange were selected in the filter.
Changing the delimiter
The delimiter (|) used to split an XML field can be changed by setting the -facet_item_sepchars indexer option.
This option allows multiple (single character) delimiters to be specified (e.g. -facet_item_sepchars=|,; will split all fields using a vertical bar, comma or semi-colon as the delimiter).
Note: use this option with caution to avoid unintentionally splitting fields. For example, setting the delimiter to a comma will often result in description fields being split in the middle of a sentence.
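As a rough illustration of how a set of single-character delimiters splits a field, here is a minimal Python sketch. The function name split_values is hypothetical and this is not Funnelback's indexer code; it only mimics the behaviour described above, and it assumes that empty values and surrounding whitespace are left untouched.

```python
def split_values(field, sepchars="|"):
    """Split a field on any of the single-character delimiters in sepchars."""
    values = []
    current = []
    for ch in field:
        if ch in sepchars:
            # A delimiter ends the current value and starts a new one.
            values.append("".join(current))
            current = []
        else:
            current.append(ch)
    values.append("".join(current))
    return values


# Default delimiter: a vertical bar.
print(split_values("blue|red|orange"))

# With a comma among the delimiters, a description is split mid-sentence.
print(split_values("A sturdy, waterproof jacket", sepchars="|,;"))
```

The second call shows why the comma is a risky delimiter: the description ends up as two separate values.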
An unofficial filter is also available from the Funnelback GitHub site allowing configuration for per-field delimiters. See: MetadataDelimiters.groovy.
Example: Multiple values from several fields.
An HTML document might include the following in the document header:
<title>Hamlet - the complete works of William Shakespeare</title> <meta name="dc.title" content="Hamlet"/>
If both of these (title and dc.title) are listed as metadata sources for a class called docTitle, the index will contain both values delimited with a vertical bar:
docTitle: Hamlet - the complete works of William Shakespeare|Hamlet
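The merging in this example can be sketched in a few lines of Python. The function build_class_value and the dictionary of extracted fields are hypothetical, and the sketch assumes that values appear in the order in which the sources are listed.

```python
def build_class_value(document_fields, sources, delimiter="|"):
    """Join the values of every mapped source that the document actually contains."""
    values = [document_fields[source] for source in sources if source in document_fields]
    return delimiter.join(values)


# Fields extracted from the HTML header shown above (illustrative structure).
fields = {
    "title": "Hamlet - the complete works of William Shakespeare",
    "dc.title": "Hamlet",
}

print("docTitle:", build_class_value(fields, ["title", "dc.title"]))
```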
Example: Multiple values from a single field.
Consider an HTML document containing:
<meta name="colours" content="blue|red|orange">
or an equivalent XML document:
... <colours> <colour>blue</colour> <colour>red</colour> <colour>orange</colour> </colours> ...
Assuming the default delimiter (|) is being used and there are mappings for an HTML source colours or an XML source //colour to a metadata class called productColours, then the index will contain:
productColours: blue|red|orange
Alternative metadata values
Funnelback supports storing multiple alternative values within a single metadata string. When comparing or presenting values from a metadata string, a particular value can be selected by specifying a key, with fallback to a default value if the key is not present.
Note: If a metadata field containing multiple alternative values is accessed without using the special options in the table below, the whole string will be used. To use the default value you must use the special options with a non-existent key, such as 'default'.
Documents containing alternative metadata values should publish this metadata in the following form:
<meta name="FIELD" content="DEFAULT_VALUE;NUM_EXCEPTIONS;(KEY;VALUE)...(KEY;VALUE)" />
- Keys may contain spaces and commas but not semicolons, double-quotes or parentheses.
- Values may include semicolons, double-quotes and parentheses but only within double-quotes. To include a double-quote within a quoted part of a value, use double double-quotes. If a value is just a single double-quote, represent it using four consecutive double-quotes.
- Semicolons are used to separate keys and values and also to terminate the default value and the number of exceptions (currently ignored).
- Values do not have to be numeric.
- A maximum of ten fields may be made selectable.
- Only one selector can be specified per field. In an e-commerce example, the price of Vegemite could be made to depend either on the size of the jar or on the store, but not both.
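To make the format and quoting rules above concrete, here is a minimal Python sketch of a parser for the content string. It is not Funnelback's implementation: the function names (read_value, parse_alternatives, select) are hypothetical, and the sketch assumes that the default value follows the same quoting rules as the other values.

```python
def read_value(text, pos, terminators):
    """Read a value from pos up to an unquoted terminator; return (value, new_pos)."""
    out = []
    in_quotes = False
    while pos < len(text):
        ch = text[pos]
        if in_quotes:
            if ch == '"':
                if text[pos + 1:pos + 2] == '"':
                    # Doubled double-quote inside quotes: a literal double-quote.
                    out.append('"')
                    pos += 2
                    continue
                in_quotes = False
            else:
                out.append(ch)
        elif ch == '"':
            in_quotes = True
        elif ch in terminators:
            break
        else:
            out.append(ch)
        pos += 1
    return "".join(out), pos


def parse_alternatives(content):
    """Return (default_value, {key: value}) for DEFAULT;NUM;(KEY;VALUE)... content."""
    default, pos = read_value(content, 0, ";")
    # The exception count is read and discarded, since it is currently ignored.
    _count, pos = read_value(content, pos + 1, ";")
    pos += 1
    alternatives = {}
    while pos < len(content):
        if content[pos] != "(":
            pos += 1  # skip anything between pairs
            continue
        key_end = content.index(";", pos)  # keys cannot contain semicolons
        key = content[pos + 1:key_end]
        value, pos = read_value(content, key_end + 1, ")")
        alternatives[key] = value
        pos += 1  # step over the closing parenthesis
    return default, alternatives


def select(content, key):
    """Pick the value for key, falling back to the default value."""
    default, alternatives = parse_alternatives(content)
    return alternatives.get(key, default)


print(select("4.10;2;(London;2.50)(Canberra;4.99)", "London"))
print(select("4.10;2;(London;2.50)(Canberra;4.99)", "default"))
```

Passing a key that does not exist, such as 'default', returns the default value, which mirrors the note above about accessing the default.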
Querying selectable metadata
The selectable metadata mechanism can be controlled via CGI parameters:
| Parameter | Type | Description |
|---|---|---|
| selector_class | string | Specifies the key to use when accessing the given metadata class |
| slt_class | float | Performs a "Less than" operation on metadata class, accessed by the key |
| sle_class | float | Performs a "Less than or equals" operation on metadata class, accessed by the key |
| sgt_class | float | Performs a "Greater than" operation on metadata class, accessed by the key |
| sge_class | float | Performs a "Greater than or equals" operation on metadata class, accessed by the key |
| seq_class | float | Performs an "Equals" operation on metadata class, accessed by the key |
Example search strings
Return items whose price in 'mystore' is no greater than 4.20:
Display search results with French versions of the 'category' and 'description' metadata:
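A sketch of how query strings for the two cases above might be assembled in Python. Several things here are assumptions rather than facts from this page: that the word class in each parameter name is replaced by the metadata class name (price, category, description), that a parameter called query carries the search terms, and that 'fr' is the key used for French values. The helper selector_query is hypothetical.

```python
from urllib.parse import urlencode


def selector_query(query, selectors, comparisons=()):
    """Build a query string from selector keys and numeric comparisons.

    selectors:   {metadata_class: key}
    comparisons: iterable of (operator, metadata_class, value),
                 where operator is one of slt, sle, sgt, sge, seq.
    """
    params = [("query", query)]  # assumed name of the search terms parameter
    for metadata_class, key in selectors.items():
        params.append((f"selector_{metadata_class}", key))
    for operator, metadata_class, value in comparisons:
        params.append((f"{operator}_{metadata_class}", value))
    return urlencode(params)


# Items whose price in 'mystore' is no greater than 4.20.
print(selector_query("vegemite", {"price": "mystore"}, [("sle", "price", "4.20")]))

# French versions of the 'category' and 'description' metadata.
print(selector_query("axe", {"category": "fr", "description": "fr"}))
```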
Example 1: e-commerce
A large on-line retailer sells the same item for different prices, depending upon the location of the customer's nearest store. Using Funnelback's Selectable Metadata, only one document is needed for each item available for sale. In that document the price metadata is stored in the form of a string such as:
<meta name="price" content="4.10;5;(London;2.50)(Canberra;4.99)(Sydney;4.50)(Brisbane;4.63)(Szczecin;12.80)"/>
where 4.10 is the default price, 5 specifies that there are 5 exceptions, and the pairs of entries in parentheses show the prices which apply for the five different cities. When a person searches from a city, the city name can be inserted into the query string as a selector and the price shown and used in numerical range searches will be the one applicable to that city. For Melbourne, where no exception price is shown, the default price of 4.10 will be used.
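The city lookup and the range comparison in this example can be worked through with a short Python sketch. It is deliberately simplified: it handles only unquoted values such as the prices above, and the function price_for is hypothetical rather than part of Funnelback.

```python
import re

# Matches one (KEY;VALUE) pair with no quoting involved.
PAIR = re.compile(r"\(([^;()]+);([^()]*)\)")

PRICE = "4.10;5;(London;2.50)(Canberra;4.99)(Sydney;4.50)(Brisbane;4.63)(Szczecin;12.80)"


def price_for(content, city):
    """Return the price for city, or the default price if the city has no exception."""
    default, _count, pairs = content.split(";", 2)
    prices = dict(PAIR.findall(pairs))
    return float(prices.get(city, default))


for city in ("London", "Sydney", "Melbourne"):
    price = price_for(PRICE, city)
    # Equivalent of a "less than or equals 4.20" comparison for this city.
    print(city, price, price <= 4.20)
```

Melbourne has no exception entry, so the lookup falls back to the default price of 4.10.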
Example 2: multi-lingual environment
An online collection for a Swiss museum contains images of artefacts along with applicable metadata. Some of the metadata is language independent (e.g. catalogue number) but other metadata such as the description of the artefact needs to exist in more than one language, for example:
<meta name="artefactDescription" content="Steinaxt;3;(FR;hache de pierre)(EN;stone axe)(IT;ascia di pietra)"/>
Here the default description is in German, but alternatives are available for French, English and Italian.