
filter.jsoup.classes

Specifies which Java/Groovy classes are used for filtering. These filters operate on Jsoup objects rather than on byte streams.

Key: filter.jsoup.classes
Type: List<String>
Can be set in: collection.cfg

Description

This setting specifies a list of Java/Groovy classes that are run by the Jsoup filter.

Values

The value of this setting is a comma-separated list of filter class names. The filters run in the order specified (left to right).
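As a rough illustration of how such a value is read, the sketch below splits a filter.jsoup.classes value into an ordered list of class names. It assumes that whitespace around each name is ignored and that empty entries are skipped. Both of these are assumptions for illustration, not documented Funnelback behavior, and the class name FilterListParser is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Funnelback: splits a filter.jsoup.classes
// value into class names, preserving the left-to-right order in which the
// filters would run.
final class FilterListParser {

    private FilterListParser() {}

    static List<String> parse(String value) {
        List<String> names = new ArrayList<>();
        if (value == null) {
            return names;
        }
        for (String part : value.split(",")) {
            // Assumption: surrounding whitespace is ignored and empty entries are skipped.
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

With the default value, this yields ContentGeneratorUrlDetection, FleschKincaidGradeLevel, UndesirableText and TitleDuplicates, in that order. Because the result is a list rather than a set, the running order is kept.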

Each name should be either a fully qualified Java/Groovy class name or a simple class name. A simple class name is assumed to refer to a class within the com.funnelback.common.filter.jsoup package. Groovy classes are loaded from $SEARCH_HOME/lib/java/groovy or from the collection's @groovy directory. If a Groovy class is declared within a package, its location in the directory structure beneath these directories must match that package.
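The naming rule above can be sketched as follows. The sketch treats a name containing a dot as fully qualified and prefixes any other name with com.funnelback.common.filter.jsoup. It also derives the relative path where a packaged Groovy class would be expected beneath the Groovy directory. The dot test, the .groovy extension and the class FilterNameResolver are all assumptions for illustration; this is not Funnelback's actual class loader.

```java
// Hypothetical sketch of the naming rule described above; not Funnelback code.
final class FilterNameResolver {

    static final String DEFAULT_PACKAGE = "com.funnelback.common.filter.jsoup";

    private FilterNameResolver() {}

    // Assumption: any name containing a '.' is already fully qualified;
    // a simple name is placed in the default Jsoup filter package.
    static String resolve(String name) {
        return name.contains(".") ? name : DEFAULT_PACKAGE + "." + name;
    }

    // Relative path at which a Groovy class is expected below the Groovy
    // directory: each package segment becomes a subdirectory.
    // Assumption: the source file uses the .groovy extension.
    static String groovyPath(String fullyQualifiedName) {
        return fullyQualifiedName.replace('.', '/') + ".groovy";
    }
}
```

For example, UndesirableText resolves to com.funnelback.common.filter.jsoup.UndesirableText. A Groovy class declared as com.example.CustomFilter would be expected at com/example/CustomFilter.groovy beneath $SEARCH_HOME/lib/java/groovy or the collection's @groovy directory.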

Default Value

filter.jsoup.classes=ContentGeneratorUrlDetection,FleschKincaidGradeLevel,UndesirableText,TitleDuplicates

Examples

To add a custom Jsoup filter (com.example.CustomFilter) that processes the HTML after all the default Jsoup filters have run, append it to the end of the list:

filter.jsoup.classes=ContentGeneratorUrlDetection,FleschKincaidGradeLevel,UndesirableText,TitleDuplicates,com.example.CustomFilter


v15.24.0