filter.jsoup.classes
Specifies which Java/Groovy classes are used for filtering. These classes operate on Jsoup objects rather than byte streams.
Key: filter.jsoup.classes
Type: List<String>
Can be set in: collection.cfg
Description
This setting specifies a list of Java/Groovy classes that are run by the Jsoup filter.
Values
The value of this setting is expected to be a comma-separated list of filter class names, which are run in the order specified (left to right).
The names given in this configuration option should be fully qualified Java/Groovy class names, or simple class names which are then assumed to exist within the com.funnelback.common.filter.jsoup package.
Groovy classes will be loaded from $SEARCH_HOME/lib/java/groovy or the collection's @groovy directory, and where they are declared within a package, their location in the directory structure below must reflect that.
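The naming rules above can be illustrated with a small, self-contained Java sketch. It is not Funnelback's actual loader: the class JsoupFilterNames and its methods are hypothetical, trimming whitespace around names and skipping empty entries are assumptions, and the ".groovy" file extension used for the directory mapping is also an assumption. It only shows how a setting value could be split in left-to-right order, how simple names map to the com.funnelback.common.filter.jsoup package, and how a packaged Groovy class maps to a path below the Groovy directory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Funnelback.
public class JsoupFilterNames {
    // Package assumed for simple class names, as described in the text above.
    static final String DEFAULT_PACKAGE = "com.funnelback.common.filter.jsoup";

    /** Splits the setting value on commas and qualifies simple names, keeping left-to-right order. */
    static List<String> resolve(String settingValue) {
        List<String> names = new ArrayList<>();
        for (String raw : settingValue.split(",")) {
            String name = raw.trim(); // assumption: surrounding whitespace is ignored
            if (name.isEmpty()) {
                continue; // assumption: empty entries are skipped
            }
            // A name containing a dot is treated as already fully qualified.
            names.add(name.contains(".") ? name : DEFAULT_PACKAGE + "." + name);
        }
        return names;
    }

    /** Path of a Groovy class relative to the Groovy directory; the ".groovy" extension is an assumption. */
    static String groovyRelativePath(String qualifiedName) {
        return qualifiedName.replace('.', '/') + ".groovy";
    }

    public static void main(String[] args) {
        for (String name : resolve("UndesirableText, com.example.CustomFilter")) {
            System.out.println(name);
        }
        System.out.println(groovyRelativePath("com.example.CustomFilter"));
    }
}
```

Under these assumptions, a class declared as com.example.CustomFilter would be expected at com/example/CustomFilter.groovy below $SEARCH_HOME/lib/java/groovy or the collection's @groovy directory.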
Default Value
filter.jsoup.classes=ContentGeneratorUrlDetection,FleschKincaidGradeLevel,UndesirableText,TitleDuplicates
Examples
Add an additional custom Jsoup filter (com.example.CustomFilter) that will process the HTML after all the default Jsoup filters have run:
filter.jsoup.classes=ContentGeneratorUrlDetection,FleschKincaidGradeLevel,UndesirableText,TitleDuplicates,com.example.CustomFilter