
Customising Funnelback: configuring ranking

Ranking options

Funnelback's ranking algorithm determines which results are retrieved from the index and the order in which they are ranked by relevance.

The ranking of results is a complex problem, influenced by a multitude of document attributes. It's not just about how many times a word appears within a document's content.

  • Ranking options are a subset of the query processor options, which also control other aspects of query-time behaviour (such as display settings).
  • Ranking options are applied at query time - this means that different services and profiles can have different ranking settings applied, on an identical index. Ranking options can also be changed via CGI parameters at the time the query is submitted.

See: Funnelback ranking algorithm

Automated tuning

Tuning is a process that can be used to determine which attributes of a document are indicative of relevance and adjust the ranking algorithm to match these attributes.

Tuning requires the specification of a set of queries and best answers that are used as a training data set to optimise the ranking algorithm.
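The idea behind tuning can be shown with a toy optimiser. This is a minimal sketch, not Funnelback's tuning implementation. The training data, the three-indicator feature tuples and the `score`, `hits` and `grid_search` helpers are all hypothetical. A plain grid search stands in for whatever optimiser Funnelback actually uses.

```python
import itertools

# Hypothetical training data. Each query lists candidate URLs with invented
# indicator scores (content, on-site links, recency) and the URL judged to
# be the best answer for that query.
TRAINING = [
    {
        "best": "/about",
        "candidates": {
            "/news/old-about": (0.8, 0.1, 0.1),
            "/about": (0.6, 0.9, 0.2),
        },
    },
    {
        "best": "/news/latest",
        "candidates": {
            "/archive/news": (0.7, 0.3, 0.1),
            "/news/latest": (0.5, 0.2, 0.9),
        },
    },
]


def score(features, weights):
    """Weighted sum of indicator scores (a simplifying assumption)."""
    return sum(f * w for f, w in zip(features, weights))


def hits(weights):
    """Count training queries whose best answer is ranked first."""
    total = 0
    for item in TRAINING:
        candidates = item["candidates"]
        top = max(candidates, key=lambda url: score(candidates[url], weights))
        if top == item["best"]:
            total += 1
    return total


def grid_search(steps=(0.0, 0.5, 1.0)):
    """Try every weight combination on the grid and keep the best one."""
    best_weights, best_hits = None, -1
    for weights in itertools.product(steps, repeat=3):
        if sum(weights) == 0:
            continue  # all-zero weights cannot distinguish documents
        h = hits(weights)
        if h > best_hits:
            best_weights, best_hits = weights, h
    return best_weights, best_hits


print(hits((1.0, 0.0, 0.0)))  # content only: 0
print(grid_search())  # ((0.0, 0.0, 0.5), 2)
```

Ranking on content alone gets neither best answer to the top. The search finds weights that rank both best answers first. This mirrors, in miniature, why tuning against a set of queries beats adjusting for a single query.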

See: automated tuning

Setting ranking indicators

Funnelback has an extensive set of ranking parameters that influence how the ranking algorithm operates.

This allows for customisation of the influence provided by 73 different ranking indicators.

Note: Automated tuning should be used (where possible) to set ranking influences, because manually altering influences can fix a specific problem at the expense of the rest of the content.

The main ranking indicators are:

  • Content: This is controlled by the cool.0 parameter and is used to indicate the influence provided by the document's content score.
  • On-site links: This is controlled by the cool.1 parameter and is used to indicate the influence provided by the links within the site. This considers the number and text of incoming links to the document from other pages within the same site.
  • Off-site links: This is controlled by the cool.2 parameter and is used to indicate the influence provided by the links outside the site. This considers the number and text of incoming links to the document from external sites in the index.
  • Length of URL: This is controlled by the cool.3 parameter and is used to indicate the influence provided by the length of the document's URL. Shorter URLs generally indicate a more important page.
  • External evidence: This is controlled by the cool.4 parameter and is used to indicate the influence provided via external evidence (see query independent evidence below).
  • Recency: This is controlled by the cool.5 parameter and is used to indicate the influence provided by the age of the document. Newer documents are generally more important than older documents.
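Some of these indicators can be sketched as simple feature functions combined by weight. This is illustrative only; the formulas Funnelback uses are not described here. The `url_length_score` path-segment rule, the one-year half-life in `recency_score` and the plain weighted sum in `combined_score` are all hypothetical.

```python
from urllib.parse import urlparse


def url_length_score(url: str) -> float:
    """Hypothetical: fewer path segments gives a higher score in (0, 1]."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return 1.0 / (1 + len(segments))


def recency_score(age_days: float, half_life_days: float = 365.0) -> float:
    """Hypothetical: a score that halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def combined_score(indicators: dict, weights: dict) -> float:
    """Weighted sum of indicator scores, keyed by cool.N name."""
    return sum(weights.get(name, 0.0) * value for name, value in indicators.items())


indicators = {
    "cool.0": 0.5,  # content score (invented value)
    "cool.3": url_length_score("https://www.example.com/study"),
    "cool.5": recency_score(age_days=180),
}
print(combined_score(indicators, {"cool.0": 1.0, "cool.3": 0.5, "cool.5": 0.3}))
```

The weights passed to `combined_score` play the role of the cool.N values. Raising one of them increases how much that indicator contributes to the final score.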

See: full list of all the ranking options

Applying ranking options

Ranking options are applied in one of three ways:

  • Set as a default for the collection by adding the ranking option to the query_processor_options parameter in the collection.cfg. This can be done either by editing the collection.cfg file directly, or from the interface editor screen in the administration interface, which is accessible via the edit collection settings option.
  • Set as a default for the profile by adding the ranking option to the list of options defined in the profile's padre_opts.cfg. This can be done by editing the padre_opts.cfg file within the relevant profile folder directly, or by editing padre_opts.cfg for the relevant profile from the file manager screen in the administration interface for the collection.
  • Set at query time by adding the ranking option as a CGI parameter. This is a good method for testing but should be avoided in production unless the ranking factor needs to be dynamically set for each query, or set by a search form control such as a slider.
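Setting a ranking option at query time amounts to adding it to the search request's query string. The sketch below builds such a URL for testing. It assumes a search endpoint at `/s/search.html` with `collection` and `query` parameters. It also assumes that ranking options are passed by name without their leading hyphen. The host and collection name are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(base: str, collection: str, query: str, ranking: dict) -> str:
    """Append collection, query and ranking options as CGI parameters."""
    params = {"collection": collection, "query": query}
    params.update({name: str(value) for name, value in ranking.items()})
    return f"{base}?{urlencode(params)}"


url = build_search_url(
    "https://search.example.com/s/search.html",  # hypothetical host and path
    "example-web",  # hypothetical collection name
    "open day",
    {"cool.5": 0.3},  # favour recent documents for this query only
)
print(url)
# https://search.example.com/s/search.html?collection=example-web&query=open+day&cool.5=0.3
```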

Many ranking options can be set simultaneously, with the ranking algorithm automatically normalising all of the supplied ranking factors. E.g.

query_processor_options=-stem=2 -cool.1=0.7 -cool.5=0.3 -cool.21=0.24
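Normalisation can be pictured by pulling the cool.N values out of an options string like the one above and scaling them to sum to 1. This is a minimal sketch under that assumption. Funnelback's exact normalisation is not described here. The sketch also ignores any default values the query processor may apply to indicators that are left unset. The helper names are hypothetical.

```python
def parse_cool_options(options: str) -> dict:
    """Extract -cool.N=value pairs from a query_processor_options string."""
    weights = {}
    for token in options.split():
        if token.startswith("-cool.") and "=" in token:
            name, value = token[1:].split("=", 1)
            weights[name] = float(value)
    return weights


def normalise(weights: dict) -> dict:
    """Scale the supplied weights so that they sum to 1 (an assumption)."""
    total = sum(weights.values())
    if total == 0:
        return dict(weights)
    return {name: w / total for name, w in weights.items()}


opts = "-stem=2 -cool.1=0.7 -cool.5=0.3 -cool.21=0.24"
weights = parse_cool_options(opts)
print(weights)  # {'cool.1': 0.7, 'cool.5': 0.3, 'cool.21': 0.24}
print(normalise(weights))
```

Non-ranking options such as `-stem=2` are skipped. Only the relative sizes of the cool values matter after normalisation.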

Automated tuning is the recommended way of setting these ranking parameters as it uses an optimisation process to determine the optimal set of factors. Manual tuning can result in an overall poorer end result as improving one particular search might impact negatively on a lot of other searches.

Meta collection component weighting

When different collections are combined into a meta collection it is often beneficial to weight the collections differently. This can be for a number of reasons, the main ones being:

  • Some collections are simply more important than others. E.g. a university's main website is likely to be more important than a department's website.
  • Some collection types naturally rank better than others. E.g. web collections generally rank better than other collection types as there is a significant amount of additional ranking information that can be inferred from attributes such as the number of incoming links, the text used in these links and page titles. XML and database collections generally have few attributes beyond the record content that can be used to assist with ranking.

Meta collection component weighting is controlled using the cool.21 parameter.
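Component weighting can be illustrated with a toy re-ranker in which each collection carries a relative weight. This is not how cool.21 is implemented. The linear blend, the `influence` argument standing in for cool.21, the default weight of 0.5 and the collection names are all hypothetical.

```python
def weight_by_component(results, component_weights, influence, default=0.5):
    """Re-rank (url, collection, score) tuples using per-collection weights.

    influence plays the role of cool.21 in this toy model: 0 ignores the
    collection weights and 1 ranks purely by them (a hypothetical linear blend).
    """
    rescored = []
    for url, collection, score in results:
        prior = component_weights.get(collection, default)
        rescored.append((url, (1 - influence) * score + influence * prior))
    rescored.sort(key=lambda item: item[1], reverse=True)
    return [url for url, _ in rescored]


results = [
    ("https://dept.example.edu/page", "dept-site", 0.8),
    ("https://www.example.edu/page", "main-website", 0.7),
]
weights = {"main-website": 1.0, "dept-site": 0.2}
print(weight_by_component(results, weights, influence=0.0))  # department page first
print(weight_by_component(results, weights, influence=0.5))  # main website page first
```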

See: Meta collections: relative weighting

Click data

By default, Funnelback tracks which results a user clicks on for any query that is run.

This information can be utilised by Funnelback to improve ranking over time by learning from this recorded user behaviour.
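One simple way to learn from recorded clicks is to boost frequently clicked results. This is a minimal sketch of that general idea, not Funnelback's click-data model, which is not described here. The logarithmic boost, the `alpha` factor and the click counts are hypothetical.

```python
import math


def rerank_with_clicks(results, clicks, alpha=0.1):
    """Re-rank (url, score) pairs using recorded click counts.

    Each score is boosted by alpha * log(1 + clicks), so each extra click
    adds a little less than the one before it.
    """
    boosted = [(url, score + alpha * math.log1p(clicks.get(url, 0))) for url, score in results]
    boosted.sort(key=lambda item: item[1], reverse=True)
    return [url for url, _ in boosted]


results = [("/a", 0.50), ("/b", 0.45)]
clicks = {"/b": 20}  # invented click log: users keep choosing /b
print(rerank_with_clicks(results, clicks))  # ['/b', '/a']
```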

See: Using click data to improve rankings

Result diversification and collapsing

There are a number of ranking options that are designed to increase the diversity of the result set. These options can be used to reduce the likelihood of result sets being flooded by results from the same website, collection etc.

Result collapsing can also be used to group together consecutive similar results.
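The two ideas can be sketched separately: capping how many results any one site contributes, and grouping runs of consecutive similar results. This is an illustrative sketch, not Funnelback's implementation. `limit_per_site` simply drops excess results, whereas a real diversification option might demote them instead. The similarity key passed to `collapse_consecutive` is up to the caller.

```python
from itertools import groupby
from urllib.parse import urlparse


def limit_per_site(urls, max_per_site):
    """Keep at most max_per_site results from each host, preserving order."""
    counts = {}
    kept = []
    for url in urls:
        host = urlparse(url).netloc
        if counts.get(host, 0) < max_per_site:
            kept.append(url)
            counts[host] = counts.get(host, 0) + 1
    return kept


def collapse_consecutive(results, key):
    """Group runs of consecutive results that share key(result).

    Returns (first_result, collapsed_rest) pairs; results with the same key
    that are not adjacent are left in separate groups.
    """
    collapsed = []
    for _, group in groupby(results, key=key):
        items = list(group)
        collapsed.append((items[0], items[1:]))
    return collapsed


urls = ["https://a.example/1", "https://a.example/2", "https://a.example/3", "https://b.example/1"]
print(limit_per_site(urls, 2))
print(collapse_consecutive(["x1", "x2", "y1", "x3"], key=lambda s: s[0]))
```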

See: Result diversification

Metadata weighting

It is often desirable to upweight (or downweight) a search result when the search keywords appear in specified metadata fields. Funnelback provides ranking options to set the individual metadata fields to consider and the relative weightings to apply.
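Metadata field weighting can be pictured as counting keyword matches per field and multiplying each count by that field's weight. This is a toy model, not Funnelback's scoring. The field names, the weights and the whitespace tokenisation are hypothetical.

```python
def metadata_score(doc_fields: dict, query_terms: list, field_weights: dict) -> float:
    """Sum of (field weight x number of query-term matches in that field)."""
    score = 0.0
    for field, text in doc_fields.items():
        tokens = text.lower().split()
        matches = sum(tokens.count(term.lower()) for term in query_terms)
        score += field_weights.get(field, 0.0) * matches
    return score


doc = {"title": "Open Day 2024", "description": "Visit us on open day"}
field_weights = {"title": 1.0, "description": 0.25}  # title matches count four times as much
print(metadata_score(doc, ["open", "day"], field_weights))  # 2.5
```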

See: Controlling metadata field weighting

Query independent evidence

Query independent evidence (QIE) allows certain pages or groups of pages within a website (based on a regular expression match to the document's URL) to be upweighted or downweighted without any consideration of the query being run.
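QIE can be sketched as a lookup from URL regular expressions to a fixed weight that is the same whatever the query. This is a minimal sketch under assumptions. The rules, the first-match-wins behaviour and the neutral default of 0.5 are hypothetical and are not Funnelback's QIE configuration format. The influence of this evidence on the final ranking is what the cool.4 external evidence indicator controls.

```python
import re

# Hypothetical rules: (regular expression matched against the URL, weight).
# Here > 0.5 upweights, < 0.5 downweights and 0.5 is neutral (an assumption).
QIE_RULES = [
    (re.compile(r"^https://www\.example\.edu/study/"), 0.9),
    (re.compile(r"/archive/"), 0.1),
]


def qie_weight(url, rules=QIE_RULES, default=0.5):
    """Return the weight of the first matching rule; the query is never consulted."""
    for pattern, weight in rules:
        if pattern.search(url):
            return weight
    return default


print(qie_weight("https://www.example.edu/study/courses"))  # 0.9
print(qie_weight("https://www.example.edu/news/archive/2010"))  # 0.1
```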

See: Query independent evidence

Troubleshooting ranking

Funnelback's SEO auditor tool can be used to investigate ranking for specific queries and URLs, and provides advice on how to improve the ranking of the document.

See: SEO auditor


v15.16.0