Stemming

Introduction

Stemming is the process of reducing words to a common stem so that a search can match different variants of the same word.

The default Funnelback configuration supports automatic stemming so that queries match closely related words, e.g. "parties" may also match "party". However, stemming can sometimes harm retrieval effectiveness, e.g. by returning documents containing "Hawk" or "Hawkins" for the query "Hawking".

Stemming is controlled with the query processor option -stem.

Light stemming

This is the default for Funnelback. Light stemming treats the singular and plural forms of a word as equivalent, e.g. dog/dogs, worry/worries. Support is provided for English and French words.
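To make the idea of singular/plural matching concrete, here is a minimal Python sketch. It is a toy illustration only and does not reproduce Funnelback's actual stemming rules; the function names light_stem and same_stem are hypothetical, and the two suffix rules are assumptions chosen to cover the examples on this page.

```python
def light_stem(word: str) -> str:
    """Toy plural-to-singular reduction (assumed rules, not Funnelback's)."""
    w = word.lower()
    # "worries" -> "worry", "parties" -> "party"
    if len(w) > 4 and w.endswith("ies"):
        return w[:-3] + "y"
    # "dogs" -> "dog"; leave words such as "class" untouched
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def same_stem(a: str, b: str) -> bool:
    """Two words match if they reduce to the same toy stem."""
    return light_stem(a) == light_stem(b)


print(same_stem("dogs", "dog"))       # True
print(same_stem("worries", "worry"))  # True
print(same_stem("Hawking", "Hawk"))   # False
```

Because only plural endings are touched, "Hawking" keeps its "-ing" and does not collapse to "Hawk", which mirrors the behaviour described for light stemming.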

Light stemming is applied by setting the query processor option:

query_processor_options= -stem=2

Heavy stemming

Heavy stemming is designed as a limited extension to cover subject/profession matching, e.g. science/scientist, biology/biologist. It does not stem participles, so "bullying" is not considered equivalent to "bully", in the same way that "Hawking" is not equivalent to "Hawk" or "Hawks".

Heavy stemming is applied by setting the query processor option:

query_processor_options= -stem=3

Disable stemming

Stemming is disabled by setting the query processor option:

query_processor_options= -stem=0

Note: -stem=1 is a discontinued option and has the same effect as setting -stem=0.
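
The three settings above can be summarised in a small helper that rewrites the -stem value inside an existing options string. This is a sketch under stated assumptions: the mode names ("none", "light", "heavy") and the function set_stem are hypothetical conveniences, not part of Funnelback; only the -stem values 0, 2 and 3 come from this page.

```python
import re

# Mode names are hypothetical labels; the numeric values are those listed on this page.
STEM_MODES = {"none": 0, "light": 2, "heavy": 3}


def set_stem(options: str, mode: str) -> str:
    """Return `options` with any existing -stem=N replaced by the chosen mode."""
    value = STEM_MODES[mode]
    # Drop every existing -stem=<digits> token, then append the new one.
    kept = [tok for tok in options.split() if not re.fullmatch(r"-stem=\d+", tok)]
    kept.append(f"-stem={value}")
    return " ".join(kept)


# Replace the discontinued -stem=1 with light stemming.
print("query_processor_options=" + set_stem("-stem=1", "light"))
# prints: query_processor_options=-stem=2
```

Removing any earlier -stem token before appending the new one avoids leaving two conflicting values in the same options string.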

v15.16.0