Filter example: read a configuration file
Description
This example shows how to read configuration options contained in a custom configuration file from a filter.
Jsoup filter example
For Jsoup filters, use the filter's setup() method to read the configuration into an object that can be accessed from the processDocument() method.
The example below initialises a set of filter rules from JSON configuration contained in filter-rules.cfg.
import groovy.json.JsonSlurper
@groovy.util.logging.Log4j2
public class MyFilter implements IJSoupFilter {
    // Object holding all the filtering rules
    def filterRules
    @Override
    public void setup(SetupContext context) {
        // Read the filtering rules from filter-rules.cfg into an object. The filter rules file for this example is in JSON format.
        def rulesFile = new File(context.getSearchHome().getAbsolutePath()+"/conf/"+context.getCollectionName()+"/filter-rules.cfg")
        // JsonSlurper.parse(File, String) reads the file using the given character set
        filterRules = new JsonSlurper().parse(rulesFile, 'UTF-8')
    }
    @Override
    void processDocument(FilterContext context) {
        ...
Assuming filter-rules.cfg contains the following:
[
    {
        "name":"Canonical URL defined",
        "check":"ELEMENT_EXISTENCE",
        "metaField":"X-FUNNELBACK-CANONICAL",
        "selector":"link[rel=canonical]",
        "extractValue":false,
        "description":"Detects the presence of a canonical URL"
    },
    {
        "name":"Plain English",
        "check":"WORD_LIST_COMPARE",
        "metaField":"X-FUNNELBACK-PLAIN-ENGLISH",
        "wordList":"plain-english",
        "selector":"body",
        "description":"Check for non plain English expressions"
    }
]
The code in setup() reads this file and populates the filterRules object, which can then be accessed from within the filter code.
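To make the shape of the parsed rules concrete, the following minimal sketch parses an abbreviated copy of the JSON above with JsonSlurper.parseText() instead of reading a file. It assumes a standalone Groovy script with the JSON inlined purely for illustration; in the filter itself the data comes from filter-rules.cfg.

```groovy
import groovy.json.JsonSlurper

// Abbreviated copy of the example filter-rules.cfg content (inlined for illustration only)
def json = '''
[
  {"name":"Canonical URL defined","check":"ELEMENT_EXISTENCE","metaField":"X-FUNNELBACK-CANONICAL","selector":"link[rel=canonical]","extractValue":false},
  {"name":"Plain English","check":"WORD_LIST_COMPARE","metaField":"X-FUNNELBACK-PLAIN-ENGLISH","wordList":"plain-english","selector":"body"}
]
'''

// A JSON array of objects parses to a List of Maps
def filterRules = new JsonSlurper().parseText(json)

// Each rule's fields can be read as properties; keys absent from a rule return null
filterRules.each { rule ->
    println "${rule.name}: check=${rule.check}, selector=${rule.selector}, wordList=${rule.wordList}"
}
```

Because a key that is missing from a rule simply reads as null, a rule only needs to carry the fields that its check type uses.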
For example, the following code within the processDocument() method of the filter can be used to access the configuration values:
...
void processDocument(FilterContext context) {
    def doc = context.getDocument()
    def url = doc.baseUri()
    // Apply each rule to the document that is being filtered
    filterRules.each {
        println "Applying rule '"+it.name+"' to document '"+url+"'"
        if (it.check == "ELEMENT_EXISTENCE") {
            checkExistence(it.selector,it.metaField,it.extractValue)
        }
        else if (it.check == "WORD_LIST_COMPARE") {
            wordListCompare(it.selector,it.wordList,it.metaField,doc)
        }
    }
}
...
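The rule dispatch can also be tried outside a filter. The sketch below is a standalone Groovy script that uses a map of handlers keyed on the check value in place of the if/else chain. The handlers map, the applyRules closure and the strings they return are hypothetical stand-ins for checkExistence() and wordListCompare(), whose implementations are not shown in this example.

```groovy
// Hypothetical handlers keyed by the rule's "check" value.
// They only describe what a real check would do.
def handlers = [
    ELEMENT_EXISTENCE: { rule -> "exists(${rule.selector}) -> ${rule.metaField}".toString() },
    WORD_LIST_COMPARE: { rule -> "compare(${rule.selector}, ${rule.wordList}) -> ${rule.metaField}".toString() }
]

// Apply every rule in order; a rule with an unrecognised check is reported rather than silently ignored
def applyRules = { List rules ->
    rules.collect { rule ->
        def handler = handlers[rule.check]
        handler ? handler(rule) : "skipped '${rule.name}': unknown check '${rule.check}'".toString()
    }
}

def results = applyRules([
    [name: 'Canonical URL defined', check: 'ELEMENT_EXISTENCE', metaField: 'X-FUNNELBACK-CANONICAL', selector: 'link[rel=canonical]'],
    [name: 'Typo rule', check: 'ELEMENT_EXISTANCE', metaField: 'X-TEST', selector: 'title']
])
results.each { println it }
```

Reporting unknown check values is a design choice of this sketch: a misspelt check in the configuration file then shows up in the output instead of the rule being quietly skipped.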
General filter example
For general document filters, use the filter's constructor to read the configuration into an object that can be accessed from the filter.
Assuming mappings.cfg in the example below contains postcode to latlong mappings:
2601,-35.2809942;149.1192087
2001,-33.847927;150.65179
3001,-37.9712371;144.4927073
Then the following code can be used to read the mappings and generate a latlong metadata value based on a postcode.
...
@groovy.util.logging.Log4j2
public class CustomFilter implements StringDocumentFilter {
    // Custom mappings (populated in constructor)
    def mappings = [:]
    public CustomFilter(File searchHome, String collectionName) {
        // Read the mappings and load them into the filter. In this example comma-delimited data is read into a map which can be accessed from the main filter.
        def mFile = new File(searchHome.getAbsolutePath()+"/conf/"+collectionName+"/mappings.cfg")
        // Read each line of the config file and split on a comma
        mFile.readLines().each() {
            def m = it.split(",")
            mappings[m[0]]=m[1]
        }
    }
    public PreFilterCheck canFilter(NoContentDocument document, FilterContext context) {
        return PreFilterCheck.ATTEMPT_FILTER;
    }
    @Override
    public FilterResult filterAsStringDocument(StringDocument document, FilterContext context) throws RuntimeException, FilterException {
        // Get a copy of the existing metadata,
        // so that we preserve the existing metadata
        ListMultimap<String, String> metadata = document.getCopyOfMetadata();
        // Get the document's postcode
        def postCode = getPostcode(document)
        // Look up the latlong based on the document's postcode
        def latLong = mappings[postCode]
        // Add the latlong to the metadata (skipped when the postcode has no mapping)
        if (latLong != null) {
            metadata.put("X-FUNNELBACK-LATLONG", latLong);
        }
        ...
        return FilterResult.of(document.cloneWithMetadata(metadata));
    }
}
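The mapping lookup can be checked without a running filter. The following sketch is a standalone Groovy script; parseMappings is a hypothetical helper, not part of the filter framework. It mirrors the constructor's parsing but skips blank or malformed lines, and it shows that a postcode with no entry in mappings.cfg yields null.

```groovy
// Hypothetical helper: turn "postcode,lat;long" lines into a Map
def parseMappings = { List<String> lines ->
    def result = [:]
    lines.each { line ->
        // Split on the first comma only; the limit of 2 keeps the value intact
        def parts = line.trim().split(',', 2)
        // Ignore lines without both a key and a value
        if (parts.size() == 2 && parts[0] && parts[1]) {
            result[parts[0].trim()] = parts[1].trim()
        }
    }
    result
}

// Sample lines from the example mappings.cfg, plus a blank line and a malformed line
def mappings = parseMappings([
    '2601,-35.2809942;149.1192087',
    '',
    'not-a-mapping',
    '2001,-33.847927;150.65179'
])

println mappings['2601']   // latlong for a known postcode
println mappings['9999']   // null, because this postcode has no mapping
```

A null result is the case a filter has to handle before writing the X-FUNNELBACK-LATLONG metadata value.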