Stop words

Background

Stop words are commonly used words that are removed from a user's query before running a search.

Query processing behaviour

In normal query processing Funnelback will ignore words from the stop words list when it runs a query.

Stop words can always be removed by setting the -ras=2 query processor option, or never removed by setting -ras=0. The default, -ras=1, removes stop words when the query contains fewer than two words that are not stop words.

Stop words are not removed from within a phrase operator.
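
The three -ras settings can be sketched in Python as follows. This is only an illustration of the rule as worded above, not Funnelback's implementation: the function name apply_ras and the tiny stop words set are hypothetical, and phrase operators are not modelled.

```python
# Illustrative sketch only: apply_ras and STOP_WORDS are hypothetical names,
# and the logic simply follows the wording of the -ras description above.
STOP_WORDS = {"a", "of", "the", "in"}  # tiny sample, not the full default list


def apply_ras(terms, ras=1, stop_words=STOP_WORDS):
    """Return the query terms left after stop word handling."""
    if ras not in (0, 1, 2):
        raise ValueError("ras must be 0, 1 or 2")
    non_stop = [t for t in terms if t.lower() not in stop_words]
    if ras == 0:
        # Never remove stop words.
        return list(terms)
    if ras == 2:
        # Always remove stop words.
        return non_stop
    # ras == 1 (default): remove stop words only when the query has
    # fewer than two words that are not stop words.
    if len(non_stop) < 2:
        return non_stop
    return list(terms)


print(apply_ras(["the", "history", "of", "rome"], ras=2))
print(apply_ras(["the", "weather"], ras=1))
```

With -ras=2 the first query is reduced to its two non stop words, while the second query, under the default setting, has only one non stop word and so loses "the".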

Example

Remove all stop words

query_processor_options= -ras=2

Configuring the stop words list

The default stop words list is localised based on the value of the lang CGI parameter, and defaults to English if the parameter is not set.

Stop word lists are supplied for the following languages, and are used when the lang CGI parameter is set to the language code. The lang parameter can include a sub-variant appended after an underscore; for example, lang=en_US applies the English stop words list.

Language code Language
ar Arabic
bg Bulgarian
bn Bengali
cs Czech
de German
en English
es Spanish
fa Persian
fi Finnish
fr French
hi Hindi
hu Hungarian
it Italian
mr Marathi
pl Polish
pt Portuguese
ro Romanian
ru Russian
sv Swedish
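
The mapping from a lang value to one of the language codes above can be sketched in Python. This is a minimal illustration under stated assumptions: the function name stop_words_language is hypothetical, lowercasing the code is an assumption, and returning None for a language without a supplied list is a placeholder, since the behaviour in that case is not described here.

```python
# Illustrative sketch only: stop_words_language is a hypothetical helper.
SUPPORTED_LANGS = {
    "ar", "bg", "bn", "cs", "de", "en", "es", "fa", "fi", "fr",
    "hi", "hu", "it", "mr", "pl", "pt", "ro", "ru", "sv",
}


def stop_words_language(lang=None):
    """Pick the language code whose stop words list would apply."""
    if not lang:
        # Parameter not set: the default is English.
        return "en"
    # Drop any sub-variant after the underscore, e.g. en_US -> en.
    code = lang.split("_", 1)[0].lower()
    # Assumption: None marks a language with no supplied list.
    return code if code in SUPPORTED_LANGS else None


print(stop_words_language("en_US"))
print(stop_words_language("pt_BR"))
```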

A custom stop words list can be used instead of the default list by defining the -STOP query processor option. The value should be set to the absolute path of the text file containing the stop words, or to a path relative to the $SEARCH_HOME/share/lang folder.

Note: only a single stop words list is applied. A custom stop words list is not combined with the locale-specific default list, so it must include every word that should be treated as a stop word.
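
Because the lists are not combined, one way to extend a default list is to build a single merged list before pointing -STOP at it. The Python sketch below assumes the stop words files hold one word per line (as the default list is displayed in this page); the function name merge_stop_words is hypothetical.

```python
# Illustrative sketch only: merge_stop_words is a hypothetical helper and the
# one-word-per-line file format is an assumption.
def merge_stop_words(default_lines, custom_lines):
    """Combine two stop word lists, dropping blanks and duplicates."""
    seen = set()
    merged = []
    for line in list(default_lines) + list(custom_lines):
        word = line.strip()
        if word and word not in seen:
            seen.add(word)
            merged.append(word)
    return merged


defaults = ["a\n", "the\n", "\n"]
extras = ["funnelback\n", "the\n"]
print("\n".join(merge_stop_words(defaults, extras)))
```

In practice the two inputs could be the lines of the default file and of a file of site-specific additions, with the merged result written to the file that -STOP references.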

Default value

-STOP=$SEARCH_HOME/share/lang/en_stopwords

Example

Set the stop words list to custom_stopwords.txt, stored in the collection's configuration folder:

query_processor_options= -STOP=$SEARCH_HOME/conf/$COLLECTION_NAME/custom_stopwords.txt

or

query_processor_options= -STOP=../../conf/$COLLECTION_NAME/custom_stopwords.txt
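
The two forms above refer to the same file, because a relative value is taken relative to $SEARCH_HOME/share/lang. A minimal Python sketch of that resolution follows; the function name resolve_stop_path, the install location /opt/funnelback and the collection name demo are hypothetical values used only for illustration.

```python
# Illustrative sketch only: resolve_stop_path is a hypothetical helper that
# mirrors the absolute-or-relative rule described for -STOP.
import posixpath


def resolve_stop_path(search_home, value):
    """Resolve a -STOP value to a normalised absolute path."""
    if posixpath.isabs(value):
        return posixpath.normpath(value)
    base = posixpath.join(search_home, "share", "lang")
    return posixpath.normpath(posixpath.join(base, value))


home = "/opt/funnelback"  # hypothetical $SEARCH_HOME
print(resolve_stop_path(home, "../../conf/demo/custom_stopwords.txt"))
print(resolve_stop_path(home, "/opt/funnelback/conf/demo/custom_stopwords.txt"))
```

Both calls print /opt/funnelback/conf/demo/custom_stopwords.txt.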

Default stop words list (English)

The following words are stripped from a user's query (subject to the stop word removal rules defined by the ras query processor option).

The English stop words list is located within the Funnelback installation at: INSTALL_DIRECTORY/share/lang/en_stopwords. Stop words lists for other languages can also be viewed by inspecting the appropriate file within the same folder.

a
a's
able
about
above
according
accordingly
across
actually
after
afterwards
again
against
ain't
all
allow
allows
almost
alone
along
already
also
although
always
am
among
amongst
an
and
another
any
anybody
anyhow
anyone
anything
anyway
anyways
anywhere
apart
appear
appreciate
appropriate
are
aren't
around
as
aside
ask
asking
associated
at
available
away
awfully
b
be
became
because
become
becomes
becoming
been
before
beforehand
behind
being
believe
below
beside
besides
best
better
between
beyond
both
brief
but
by
c
c'mon
c's
came
can
can't
cannot
cant
cause
causes
certain
certainly
changes
clearly
co
com
come
comes
concerning
consequently
consider
considering
contain
containing
contains
corresponding
could
couldn't
course
currently
d
definitely
described
despite
did
didn't
different
do
does
doesn't
doing
don't
done
down
downwards
during
e
each
edu
eg
eight
either
else
elsewhere
enough
entirely
especially
et
etc
even
ever
every
everybody
everyone
everything
everywhere
ex
exactly
example
except
f
far
few
fifth
first
five
followed
following
follows
for
former
formerly
forth
four
from
further
furthermore
g
get
gets
getting
given
gives
go
goes
going
gone
got
gotten
greetings
h
had
hadn't
happens
hardly
has
hasn't
have
haven't
having
he
he's
hello
help
hence
her
here
here's
hereafter
hereby
herein
hereupon
hers
herself
hi
him
himself
his
hither
hopefully
how
howbeit
however
i
i'd
i'll
i'm
i've
ie
if
ignored
immediate
in
inasmuch
inc
indeed
indicate
indicated
indicates
inner
insofar
instead
into
inward
is
isn't
it
it'd
it'll
it's
its
itself
j
just
k
keep
keeps
kept
know
knows
known
l
last
lately
later
latter
latterly
least
less
lest
let
let's
like
liked
likely
little
look
looking
looks
ltd
m
mainly
many
may
maybe
me
mean
meanwhile
merely
might
more
moreover
most
mostly
much
must
my
myself
n
name
namely
nd
near
nearly
necessary
need
needs
neither
never
nevertheless
new
next
nine
no
nobody
non
none
noone
nor
normally
not
nothing
novel
now
nowhere
o
obviously
of
off
often
oh
ok
okay
old
on
once
one
ones
only
onto
or
other
others
otherwise
ought
our
ours
ourselves
out
outside
over
overall
own
p
particular
particularly
per
perhaps
placed
please
plus
possible
presumably
probably
provides
q
que
quite
qv
r
rather
rd
re
really
reasonably
regarding
regardless
regards
relatively
respectively
right
s
said
same
saw
say
saying
says
second
secondly
see
seeing
seem
seemed
seeming
seems
seen
self
selves
sensible
sent
serious
seriously
seven
several
shall
she
should
shouldn't
since
six
so
some
somebody
somehow
someone
something
sometime
sometimes
somewhat
somewhere
soon
sorry
specified
specify
specifying
still
sub
such
sup
sure
t
t's
take
taken
tell
tends
th
than
thank
thanks
thanx
that
that's
thats
the
their
theirs
them
themselves
then
thence
there
there's
thereafter
thereby
therefore
therein
theres
thereupon
these
they
they'd
they'll
they're
they've
think
third
this
thorough
thoroughly
those
though
three
through
throughout
thru
thus
to
together
too
took
toward
towards
tried
tries
truly
try
trying
twice
two
u
un
under
unfortunately
unless
unlikely
until
unto
up
upon
us
use
used
useful
uses
using
usually
uucp
v
value
various
very
via
viz
vs
w
want
wants
was
wasn't
way
we
we'd
we'll
we're
we've
welcome
well
went
were
weren't
what
what's
whatever
when
whence
whenever
where
where's
whereafter
whereas
whereby
wherein
whereupon
wherever
whether
which
while
whither
who
who's
whoever
whole
whom
whose
why
will
willing
wish
with
within
without
won't
wonder
would
wouldn't
x
y
yes
yet
you
you'd
you'll
you're
you've
your
yours
yourself
yourselves
z
zero

v15.24.0