Controlling metadata field weighting
Introduction
Funnelback's ranking algorithm includes settings that control metadata weightings. These settings can be used to upweight or downweight results where the query terms appear within specific metadata fields. This is achieved by setting the sco and wmeta ranking options.
Scoring mode 2
Setting the -sco=2 ranking option allows specification of the metadata fields that will be considered as part of the ranking algorithm.
By default, link text, clicked queries and titles are included (-sco=2[k,K,t]). The list of metadata fields to use with -sco=2 is defined within square brackets when setting the value.
For example, to apply scoring to the default metadata fields as well as customField1 and customField2:
-sco=2[k,K,t,customField1,customField2]
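When the field list comes from elsewhere (for example, a list of metadata classes kept in a script), the option string above can be assembled rather than typed by hand. The following is a minimal Python sketch; the helper name build_sco_option is hypothetical and not part of Funnelback, and it assumes the default field list is k, K, t as described above.

```python
# Hypothetical helper (not part of Funnelback): builds a -sco=2 option string.
# Assumes the default fields are k, K and t, as stated in the text above.
DEFAULT_FIELDS = ["k", "K", "t"]


def build_sco_option(extra_fields, include_defaults=True):
    """Return a -sco=2[...] option listing the fields to score."""
    fields = list(DEFAULT_FIELDS) if include_defaults else []
    for field in extra_fields:
        # Skip duplicates so each field is listed only once.
        if field not in fields:
            fields.append(field)
    return "-sco=2[" + ",".join(fields) + "]"


print(build_sco_option(["customField1", "customField2"]))
# -sco=2[k,K,t,customField1,customField2]
```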
Metadata weighting
Once scoring mode 2 is enabled, separate weightings can be assigned to each defined field using a corresponding wmeta ranking option.
A default weighting of 1.0 is applied to all listed metadata fields except for link (anchor) text (k) and click information (K), which both receive a default weighting of 0.5.
A larger value provides a bigger upweight.
Weights can be set individually for each field. For example, to reduce the default upweighting of the t metadata field:
-wmeta.t=0.6
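The default rule described above (1.0 for every listed field, 0.5 for k and K, with any wmeta setting taking precedence) can be expressed as a small lookup. This Python sketch only models the documented defaults; the function name effective_weights is hypothetical, and it does not attempt to reproduce how Funnelback combines these weights into a ranking score.

```python
# Illustrative model of the documented default weights (not Funnelback code).
# k (link text) and K (click information) default to 0.5; all others to 1.0.
REDUCED_DEFAULTS = {"k": 0.5, "K": 0.5}


def effective_weights(fields, overrides=None):
    """Return the weight each listed field would receive."""
    overrides = overrides or {}
    weights = {}
    for field in fields:
        default = REDUCED_DEFAULTS.get(field, 1.0)
        # An explicit wmeta value replaces the default for that field.
        weights[field] = overrides.get(field, default)
    return weights


print(effective_weights(["k", "K", "t"], {"t": 0.6}))
# {'k': 0.5, 'K': 0.5, 't': 0.6}
```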
Example
Assume that the following metadata classes are configured for a collection:
- description
- author
- section
- datePublished
- dateModified
- articleText
- articleTitle
- articleKeywords
- articleAbstract
The following ranking options (set as part of the query_processor_options within collection.cfg) could be used to upweight the text within the articleTitle, articleAbstract and articleText metadata classes.
-sco=2[articleText,articleTitle,articleAbstract] -wmeta.articleText=0.3 -wmeta.articleAbstract=0.75 -wmeta.articleTitle=1.0
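This tells Funnelback to apply metadata weightings to the articleText, articleTitle and articleAbstract fields (the -sco=2 parameter) and then to use an explicit weighting for each field: 0.3 for articleText, 0.75 for articleAbstract and 1.0 for articleTitle. The articleTitle value is the same as the default of 1.0, so only articleText and articleAbstract actually differ from their defaults.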
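For collections with many metadata classes, the full option string can be generated from a field list and a table of weights. The sketch below is one possible way to do this in Python; build_ranking_options is a hypothetical helper, and the check that every weighted field also appears in the -sco=2 list is an assumption based on the description above, not a documented Funnelback validation rule.

```python
# Hypothetical helper (not part of Funnelback): builds the ranking options
# string that would be added to query_processor_options.
def build_ranking_options(fields, weights):
    """Combine a -sco=2 field list with per-field -wmeta options."""
    # Assumption: a wmeta weight is only meaningful for a field in the list.
    unknown = [name for name in weights if name not in fields]
    if unknown:
        raise ValueError("weights given for unlisted fields: " + ", ".join(unknown))

    options = ["-sco=2[" + ",".join(fields) + "]"]
    for name, value in weights.items():
        options.append("-wmeta.{}={}".format(name, value))
    return " ".join(options)


fields = ["articleText", "articleTitle", "articleAbstract"]
weights = {"articleText": 0.3, "articleAbstract": 0.75, "articleTitle": 1.0}
print(build_ranking_options(fields, weights))
```

With the inputs shown, the printed string is the same as the ranking options in the example above.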