Filter example: read a configuration file

Description

This example shows how to read configuration options contained in a custom configuration file from a filter.

Jsoup filter example

For Jsoup filters, use the filter's setup() method to read the configuration into an object that can be accessed from the processDocument() method.

The example below initialises a set of filter rules from JSON configuration contained in the collection's filter-rules.cfg file.

import groovy.json.JsonSlurper

@groovy.util.logging.Log4j2
public class MyFilter implements IJSoupFilter {

    // Object holding all the filtering rules
    def filterRules

    @Override
    public void setup(SetupContext context) {

        // Read the filtering rules from filter-rules.cfg into an object.  The filter rules file for this example is JSON format.

        def rulesFile = new File(context.getSearchHome().getAbsolutePath()+"/conf/"+context.getCollectionName()+"/filter-rules.cfg")

        // parse(File, String) is the public JsonSlurper method for reading a file with a given charset
        filterRules = new JsonSlurper().parse(rulesFile, 'UTF-8')

    }


    @Override
    void processDocument(FilterContext context) {
        ...

Assuming filter-rules.cfg contains the following:

[ 
  {
    "name":"Canonical URL defined",
    "check":"ELEMENT_EXISTENCE",
    "metaField":"X-FUNNELBACK-CANONICAL",
    "selector":"link[rel=canonical]",
    "extractValue":false,
    "description":"Detects the presence of a canonical URL"
  },
  {
    "name":"Plain English",
    "check":"WORD_LIST_COMPARE",
    "metaField":"X-FUNNELBACK-PLAIN-ENGLISH",
    "wordList":"plain-english",
    "selector":"body",
    "description":"Check for non plain English expressions"
  }
]

The code in the setup() method reads this file and populates the filterRules object, which can then be accessed from the rest of the filter code.
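
As a rough illustration of what filterRules holds once the file has been parsed, the standalone sketch below runs JsonSlurper over an inline string instead of a file. The inline JSON is a trimmed copy of the rules above (the description fields are omitted), and the script is only meant to be run on its own, outside a filter.

```groovy
import groovy.json.JsonSlurper

// Inline, trimmed copy of the rules in filter-rules.cfg (illustration only)
def json = '''
[
  {"name":"Canonical URL defined","check":"ELEMENT_EXISTENCE",
   "metaField":"X-FUNNELBACK-CANONICAL","selector":"link[rel=canonical]","extractValue":false},
  {"name":"Plain English","check":"WORD_LIST_COMPARE",
   "metaField":"X-FUNNELBACK-PLAIN-ENGLISH","wordList":"plain-english","selector":"body"}
]
'''

// A JSON array of objects is parsed into a List of Maps
def filterRules = new JsonSlurper().parseText(json)

// Each rule's fields can be read as properties
filterRules.each { rule ->
    println "${rule.name}: ${rule.check} on '${rule.selector}'"
}

// A key that is absent from a rule is returned as null
assert filterRules[1].extractValue == null
```

Because not every rule defines every key (the second rule has no extractValue, the first has no wordList), code that uses these optional fields should be prepared for null values.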

For example, the following code in the filter's processDocument() method accesses the configuration values:

...

void processDocument(FilterContext context) {
    def doc = context.getDocument()

    def url = doc.baseUri()

    //Apply each rule to the document that is being filtered
    filterRules.each {
        println "Applying rule '"+it.name+"' to document '"+url+"'"

        if (it.check == "ELEMENT_EXISTENCE") {
            checkExistence(it.selector,it.metaField,it.extractValue)
        }
        else if (it.check == "WORD_LIST_COMPARE") {
            wordListCompare(it.selector,it.wordList,it.metaField,doc)
        }

    }
}

...

General filter example

For general document filters, use the filter's constructor to read the configuration into an object that can be accessed from the rest of the filter.

Assuming mappings.cfg contains the following postcode to latlong mappings, one per line in the form postcode,latitude;longitude:

2601,-35.2809942;149.1192087
2001,-33.847927;150.65179
3001,-37.9712371;144.4927073

Then the following code can be used to read from the mappings to generate a latlong metadata value based on a postcode.

...

@groovy.util.logging.Log4j2
public class CustomFilter implements StringDocumentFilter {


    // Custom mappings (populated in constructor)
    def mappings = [:]

    public CustomFilter(File searchHome, String collectionName) {

        // Read the mappings and load these into the filter.  In this example comma delimited data is read into a map which can be accessed from the main filter.
        
        def mFile = new File(searchHome.getAbsolutePath()+"/conf/"+collectionName+"/mappings.cfg")

        // read each line of the config file and split on a comma
        mFile.readLines().each() {
            def m = it.split(",")
            mappings[m[0]]=m[1]
        }
    }


    public PreFilterCheck canFilter(NoContentDocument document, FilterContext context) {
        return PreFilterCheck.ATTEMPT_FILTER;
    }

    @Override
    public FilterResult filterAsStringDocument(StringDocument document, FilterContext context) throws RuntimeException,
        FilterException {

            // Get a copy of the existing metadata
            // so that it is preserved in the filtered document
            ListMultimap<String, String> metadata = document.getCopyOfMetadata();

            // Get the document's postcode
            def postcode = getPostcode(document)

            // Look up the latlong based on the document's postcode
            def latLong = mappings[postcode]

            // Add the latlong to the metadata only when a mapping exists
            if (latLong != null) {
                metadata.put("X-FUNNELBACK-LATLONG", latLong);
            }

...

        return FilterResult.of(document.cloneWithMetadata(metadata));
    }
}
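
The mapping-file parsing done in the constructor can be tried in isolation. The sketch below is a standalone script, not part of the filter: it uses an in-memory list of lines in place of mappings.cfg, and it adds two defensive steps that are assumptions rather than part of the example above (skipping blank or malformed lines, and limiting the split to two parts).

```groovy
// Sample lines in the same format as mappings.cfg: postcode,latitude;longitude
def lines = [
    '2601,-35.2809942;149.1192087',
    '',                               // blank line, skipped below
    '2001,-33.847927;150.65179'
]

def mappings = [:]
lines.each { line ->
    // Limit the split to two parts so the latlong value is kept whole
    def parts = line.trim().split(',', 2)
    // Lines without a comma produce a single part and are ignored
    if (parts.length == 2) {
        mappings[parts[0]] = parts[1]
    }
}

// Known postcodes resolve to their latlong string
assert mappings['2601'] == '-35.2809942;149.1192087'

// Unknown postcodes resolve to null, so check before adding metadata
assert mappings['9999'] == null
```

Keys are stored as strings, so a postcode obtained from a document needs to be a string as well for the lookup to match.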

v15.24.0