
Filter example: add metadata

Description

Filters can be used to add metadata to documents. Although in this example StringDocumentFilter is implemented, both ByteDocumentFilter and Filter can be used to add metadata to a document.

Example

In this example we count the number of occurrences of the word 'Elvis' and store this count in the document metadata. The example implements StringDocumentFilter, which requires two methods: canFilter(), used to check whether the filter should be run, and filterAsStringDocument(), which contains the logic for the filter.

This example also includes a simple test method, which can be executed by running the main method; see testing Groovy filters.

package com.myfilters;

import java.util.*;
import org.junit.*;
import org.junit.Test;
import com.funnelback.filter.api.*;
import com.funnelback.filter.api.documents.*;
import com.funnelback.filter.api.filters.*;
import com.funnelback.filter.api.mock.*;
import com.google.common.collect.ListMultimap;

@groovy.util.logging.Log4j2
public class CountElvisOccurrencesFilter implements StringDocumentFilter {

    @Override
    public PreFilterCheck canFilter(NoContentDocument document, FilterContext context) {
        //Always run filter
        return PreFilterCheck.ATTEMPT_FILTER;
    }

    @Override
    public FilterResult filterAsStringDocument(StringDocument document, FilterContext context) {
        //Work out how many times Elvis appears.
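        //A limit of -1 keeps trailing empty strings, so content that ends
        //with "elvis" is still counted correctly.
        int elvisCount = document.getContentAsString()
                                 .toLowerCase()
                                 .split("elvis", -1)
                                 .length - 1;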

        log.debug("Found: " + elvisCount + " counts of Elvis in " + document.getURI());

        //Ensure we get the existing metadata from the document, to preserve existing
        //metadata
        ListMultimap<String, String> metadata = document.getCopyOfMetadata();

        //Each metadata key maps to a list of values. Remove any existing
        //entries for the key before adding the single count.
        metadata.removeAll("elvis-count");
        metadata.put("elvis-count", Integer.toString(elvisCount));

        //Create a new document with the updated metadata.
        StringDocument documentWithElvisCount = document.cloneWithMetadata(metadata);

        return FilterResult.of(documentWithElvisCount);
    }

    /*
     * Below are filter test methods
     */
    public static class FilterTest {

        @Test
        public void countElvisManyOccurrencesTest() throws Exception {
            //Create the input document to be filtered.
            StringDocument inputDocument = MockDocuments.mockEmptyStringDoc()
                                            .cloneWithStringContent(DocumentType.MIME_UNKNOWN, 
                                                "Perhaps Elvis should be referred to as King elvis.");

            //Create the filter and filter the document.
            FilterResult filterResult = new CountElvisOccurrencesFilter()
                                                .filter(inputDocument, MockFilterContext.getEmptyContext());

            //Get the filtered document.
            FilterableDocument filteredDocument = filterResult.getFilteredDocuments().get(0);

            //Get the 'elvis-count' metadata. In filters the metadata values are always arrays.
            List<String> elvisCounts = filteredDocument.getCopyOfMetadata().get("elvis-count"); 

            Assert.assertEquals("Should have exactly one value for metadata 'elvis-count'", 
                                1, elvisCounts.size());

            Assert.assertEquals("Should have counted the term 'Elvis' twice", 2, Integer.parseInt(elvisCounts.get(0)));
        }
    }

    //Running the main method will execute the test methods.
    public static void main(String[] args) throws Exception {
        FilterTestRunner.runTests(FilterTest.class);
    }

}
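
The counting step in the filter above can be checked on its own, without any Funnelback classes. The following is a minimal sketch in plain Java; the class name ElvisCountSketch and the helper countOccurrences are hypothetical names introduced for illustration and are not part of the filter API. It counts case-insensitive, non-overlapping matches with indexOf, and also shows why String.split without a limit can undercount when the text ends with the search term.

```java
import java.util.Locale;

public class ElvisCountSketch {

    // Hypothetical helper: counts case-insensitive, non-overlapping
    // occurrences of term in content.
    static int countOccurrences(String content, String term) {
        if (term.isEmpty()) {
            return 0; // an empty term would otherwise loop forever
        }
        String haystack = content.toLowerCase(Locale.ROOT);
        String needle = term.toLowerCase(Locale.ROOT);
        int count = 0;
        int from = 0;
        // Jump from match to match until indexOf reports no further match.
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length();
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "Perhaps Elvis should be referred to as King elvis.";
        System.out.println(countOccurrences(sample, "elvis")); // prints 2

        // split() with no limit discards trailing empty strings, so a
        // match at the very end of the text is lost.
        String endsWithTerm = "long live elvis";
        System.out.println(endsWithTerm.split("elvis").length - 1);     // prints 0
        System.out.println(endsWithTerm.split("elvis", -1).length - 1); // prints 1
        System.out.println(countOccurrences(endsWithTerm, "elvis"));    // prints 1
    }
}
```

Either approach gives the same count for the test document used above. The indexOf loop avoids treating the search term as a regular expression, which matters only if the term contains regex metacharacters.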

Metadata mapping

The above example added metadata with the name 'elvis-count' to the document. For the metadata to be available, it needs to be added to the collection's metadata mappings.

v15.24.0