Result collapsing
Overview
Result collapsing groups similar results into a single entry on the search results page. Results are considered similar when:
- Their content is identical or nearly identical (as decided by the query processor), or
- They share one or more identical metadata field values.
The list of fields to consider for similarity is controlled by the indexing.collapse_fields setting.
Workflow
Every time the collection is updated, a signature file is generated. This file contains one or more signatures for each document, depending on which fields have been configured. At query time, the query processor uses this signature file to decide which results to collapse together.
Because the signature file is generated at indexing time, any change to the indexing.collapse_fields setting only takes effect after the collection is re-indexed.
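The indexing step can be pictured with the following Python sketch. It is a conceptual model only, not Funnelback's implementation. It assumes each signature is a hash of a single field's value and treats $ as the document content. Because it uses exact hashing, it cannot reproduce the "nearly identical" content matching the query processor performs. The documents, their field values and the function names are hypothetical.

```python
import hashlib

# Hypothetical job offer documents. "content" stands in for the [$] field,
# "a" is the employer and "X" is the state, as in the example below.
DOCS = {
    "job1": {"content": "Senior developer", "a": "Acme", "X": "NSW"},
    "job2": {"content": "Junior developer", "a": "Acme", "X": "VIC"},
    "job3": {"content": "Data analyst", "a": "Globex", "X": "NSW"},
}


def field_value(doc: dict, field: str) -> str:
    """Return the text a signature is computed from; "$" means the content."""
    return doc.get("content" if field == "$" else field, "")


def build_signature_file(docs: dict, fields: list) -> dict:
    """Return {field: {signature: [doc ids]}}, a stand-in for the signature file."""
    sig_file = {}
    for field in fields:
        groups = {}
        for doc_id, doc in docs.items():
            value = field_value(doc, field)
            if not value:
                # Assumption: documents without a value for the field get no signature.
                continue
            # Exact hashing only: "nearly identical" content is NOT detected here.
            sig = hashlib.sha1(value.encode("utf-8")).hexdigest()
            groups.setdefault(sig, []).append(doc_id)
        sig_file[field] = groups
    return sig_file


# Rough equivalent of indexing.collapse_fields=[$],[a],[X]
sig_file = build_signature_file(DOCS, ["$", "a", "X"])
print(sorted(sig_file["a"].values()))  # [['job1', 'job2'], ['job3']]
```

Documents that end up in the same group for a field are the candidates for collapsing when that field is selected at query time.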
Presentation
At query time, the query processor detects which results should be collapsed together. The most relevant result for the current query constraints is chosen as the "main" result, and other similar results are collapsed into it. Collapsing is enabled with the -collapsing=on query processor option. This can be specified in the query_processor_options setting of the collection's collection.cfg file, or as a per-request CGI parameter, e.g. collapsing=on.
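The query-time behaviour described above can be modelled in a few lines of Python. This is a hypothetical sketch, not the query processor's code. It assumes the results arrive already ordered by relevance and that a caller-supplied function returns each result's signature for the field being collapsed on.

```python
def collapse(ranked_results, signature_of):
    """Group a relevance-ordered result list by signature.

    ranked_results: results ordered from most to least relevant.
    signature_of: hypothetical callable returning a result's signature for the
    field being collapsed on.
    Returns a list of (main_result, collapsed_results) pairs in rank order.
    """
    groups = {}  # dicts preserve insertion order, so rank order is kept
    for result in ranked_results:
        sig = signature_of(result)
        if sig not in groups:
            # The first, i.e. most relevant, result with this signature is the main one.
            groups[sig] = (result, [])
        else:
            # Less relevant results with the same signature are collapsed into it.
            groups[sig][1].append(result)
    return list(groups.values())


ranked = [
    {"title": "Senior developer", "a": "Acme"},
    {"title": "Data analyst", "a": "Globex"},
    {"title": "Junior developer", "a": "Acme"},
]
for main, collapsed in collapse(ranked, lambda r: r["a"]):
    print(main["title"], "-", len(collapsed), "collapsed")
```

Here "Junior developer" is collapsed under "Senior developer" because both share the employer signature and the latter ranks higher.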
The display of collapsed results can be controlled from the search form by using custom FreeMarker tags and inspecting the Data Model. Display options range from simply showing the number of collapsed results next to the "main" result, to displaying a simplified view of each collapsed result as a "sub-result" of the main one. These options are controlled with the -collapsing_sig, -collapsing_num_ranks and -collapsing_SF query processor options.
To set up result collapsing on your collection, follow the instructions below. As an example, we will consider a collection of job offers, in which:
- The X metadata field is mapped to the state where the job is advertised.
- The a metadata field is mapped to the employer offering the job.
This guide will explain how to configure the collection so that results can be collapsed on their content similarity, by state, or by employer.
Configure the collection
Navigate to Administration Home -> Administer Tab -> Edit Collection Configuration and make the following changes:
- Add -collapsing=on to the query_processor_options setting. This enables result collapsing at query time.
- Set indexing.collapse_fields to [$],[a],[X]. This generates a signature file based on the document content and on the a and X metadata classes.
- Set the relevant display query_processor_options (e.g. -collapsing_sig, -collapsing_num_ranks and -collapsing_SF) on this collection and on any meta collection that will contain it.
Update or re-index the collection so that the signature file gets generated.
Configure the form file
Collapsed results can be displayed with the <@fb.Collapsed /> tag. In its simplest form, this tag displays the number of collapsed results along with a link to access them.
Query the collection
Result collapsing was enabled in the previous steps and should now be active. However, by default, results are collapsed on the similarity of their content. To collapse results on a specific metadata field, use the collapsing_sig parameter, either as a CGI parameter (http://server/s/search?collection=...&collapsing_sig=[a]) or as a query processor option (-collapsing_sig=[a]).
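The CGI form of the request can be built programmatically, for example when testing collapsing from a script. In this Python sketch, the host "server", the collection name "jobs" and the query "developer" are placeholders. Passing collapsing=on on every request is optional if the option is already set in collection.cfg.

```python
from urllib.parse import urlencode


def search_url(base: str, collection: str, query: str, collapsing_sig: str) -> str:
    """Build a search URL that turns collapsing on and picks the signature to use."""
    params = {
        "collection": collection,
        "query": query,
        "collapsing": "on",                # per-request equivalent of -collapsing=on
        "collapsing_sig": collapsing_sig,  # e.g. "[a]" to collapse by employer
    }
    # urlencode percent-encodes the brackets: "[a]" becomes "%5Ba%5D".
    return base + "?" + urlencode(params)


url = search_url("http://server/s/search", "jobs", "developer", "[a]")
print(url)
```

The encoded brackets decode back to [a] on the server side, so this is equivalent to typing collapsing_sig=[a] in the browser.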
With collapsing_sig set to [a], 1 job offer from the same employer is collapsed with our example result.
With collapsing_sig set to [X], 6 job offers in the same state are collapsed with our example result.
Use different labels for different metadata fields
The <@fb.Collapsed /> tag can be configured to use a different label depending on which metadata field is used for collapsing:
<@fb.Collapsed labels={ "X": "{0} results in the same state", "a": "{0} results from the same employer"} />
Figure: the label displayed when collapsing on [a] and when collapsing on [X].
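The label selection can be modelled in Python to show how the labels map is presumably applied: the key matches the field being collapsed on, and {0} is replaced with the number of collapsed results. This is an illustration under those assumptions, not the tag's implementation. The fallback label is a placeholder invented for the sketch.

```python
# Hypothetical Python model of the labels map given to <@fb.Collapsed />.
# Keys are the metadata field being collapsed on; {0} is the count placeholder.
LABELS = {
    "X": "{0} results in the same state",
    "a": "{0} results from the same employer",
}

# Placeholder fallback for this sketch only, not the tag's real default label.
DEFAULT_LABEL = "{0} similar results"


def collapsed_label(collapsing_sig: str, count: int) -> str:
    """Pick the label for the active signature and fill in the collapsed count."""
    field = collapsing_sig.strip("[]")  # "[a]" -> "a"
    return LABELS.get(field, DEFAULT_LABEL).format(count)


print(collapsed_label("[a]", 1))  # 1 results from the same employer
print(collapsed_label("[X]", 6))  # 6 results in the same state
```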
Display each collapsed result
By default, a link is generated to access the collapsed results. This link uses a special query syntax to return all the documents sharing the same signature.
The form can also be configured to display each collapsed result directly. The number of results to show is controlled by the -collapsing_num_ranks query processor option, and the metadata fields to show are controlled by -collapsing_SF.
Edit the collection settings and, on the Interface tab, set the following query processor options: -collapsing_num_ranks=2 -collapsing_SF=[a,X]
Then, in your form file, add the following snippet after the <@fb.Collapsed /> tag:
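<#-- Only render sub-results when this result has collapsed results attached -->
<#if s.result.collapsed??>
  <#list s.result.collapsed.results as r>
    <#-- Title, employer (a) and state (X), HTML-escaped; "!" guards against missing values -->
    <p><a href="${r.indexUrl?html}">${(r.title!)?html}</a> by ${(r.metaData.a!)?html} in ${(r.metaData.X!)?html}</p>
  </#list>
</#if>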
This causes the first 2 collapsed results to be displayed. For each result, its title, employer (a) and state (X) are shown.
Figure: the collapsed results displayed when collapsing on [a] and when collapsing on [X].
Note that even though there are 6 collapsed results, only the first 2 are shown because of -collapsing_num_ranks=2.
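The effect of -collapsing_num_ranks=2 and -collapsing_SF=[a,X] on what the form receives can be approximated in Python. This is a simplified, hypothetical model. The real selection happens inside the query processor, and its data model is not a list of dicts. Dropping every non-summary field (such as the made-up "c" field below) is a simplification for illustration.

```python
def parse_field_list(option_value: str) -> list:
    """Turn a value such as "[a,X]" into ["a", "X"]."""
    return [f.strip() for f in option_value.strip("[]").split(",") if f.strip()]


def sub_results(collapsed: list, num_ranks: int, summary_fields: list) -> list:
    """Keep the first num_ranks collapsed results, with their title and summary fields."""
    return [
        {"title": r["title"], **{f: r.get(f, "") for f in summary_fields}}
        for r in collapsed[:num_ranks]
    ]


# Six hypothetical offers in the same state, as in the example above.
collapsed = [
    {"title": f"Job {i}", "a": f"Employer {i}", "X": "NSW", "c": "ignored"}
    for i in range(1, 7)
]
shown = sub_results(collapsed, num_ranks=2, summary_fields=parse_field_list("[a,X]"))
for r in shown:
    print(f'{r["title"]} by {r["a"]} in {r["X"]}')
```

Only "Job 1" and "Job 2" are printed, mirroring how the FreeMarker snippet lists just two sub-results out of six.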
Advanced usage
The signature file can be configured to combine multiple fields together. For example, setting indexing.collapse_fields=[a],[a,X],[X,Y,Z] will generate 3 different signatures:
- A signature on the a field value alone,
- A signature on the concatenation of the a and X field values,
- A signature on the concatenation of the X, Y and Z field values.
The -collapsing_sig parameter is then used in the same way to collapse results on those combinations: -collapsing_sig=[a], -collapsing_sig=[a,X] or -collapsing_sig=[X,Y,Z].
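How combined signatures separate results can be sketched in Python. This is a conceptual sketch, not Funnelback's algorithm. It parses a collapse_fields value into field groups and hashes each group's values concatenated in the configured order. The guide only says the values are concatenated, so the separator character is an assumption. The Y and Z values are made up.

```python
import hashlib
import re


def parse_collapse_fields(setting: str) -> list:
    """Split "[a],[a,X],[X,Y,Z]" into [["a"], ["a", "X"], ["X", "Y", "Z"]]."""
    return [[f.strip() for f in group.split(",")]
            for group in re.findall(r"\[([^\]]*)\]", setting)]


def combined_signature(doc: dict, fields: list) -> str:
    """Hash the field values concatenated in the configured order."""
    # Assumed separator so that ("ab", "c") and ("a", "bc") do not collide.
    joined = "\x1f".join(doc.get(f, "") for f in fields)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


groups = parse_collapse_fields("[a],[a,X],[X,Y,Z]")
job1 = {"a": "Acme", "X": "NSW", "Y": "IT", "Z": "full-time"}
job2 = {"a": "Acme", "X": "VIC", "Y": "IT", "Z": "full-time"}
for fields in groups:
    same = combined_signature(job1, fields) == combined_signature(job2, fields)
    print("[" + ",".join(fields) + "]", "collapsed together" if same else "kept apart")
```

The two offers share an employer, so they collapse on [a]. They differ in state, so they stay separate on [a,X] and [X,Y,Z].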