Document flags

Introduction

Document flags provide the ability to remove documents from an index without reindexing, and can be an efficient way of complying with requirements to quickly remove search results from a public website. This document describes the command line interface for manipulating document flags, which may be useful from automated scripts. To remove documents manually, a simple interface is also available as part of the collection update system.

padre-fl

padre-fl is the program responsible for setting and unsetting document flags, and is located in the bin directory within your Funnelback installation. padre-fl supports the following usage:

Usage 1: padre-fl index_stem -clearkill|-killall|-show|-sumry
    
Usage 2: padre-fl index_stem file_of_url_patterns [-exactmatch] -unkill
    
Usage 3: padre-fl index_stem file_of_url_patterns [-exactmatch] -kill

In each case, the index_stem should be set to the path of a Funnelback index, generally of the form install_path/data/collection_name/live/idx/index.
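For scripts that call padre-fl, the index stem can be assembled from the installation path and the collection name. The following is a minimal sketch, assuming the general path form given above and a POSIX-style path; the helper names `index_stem` and `padre_fl_path`, and the example values `/opt/funnelback` and `example`, are hypothetical and not part of Funnelback.

```python
from pathlib import PurePosixPath


def index_stem(install_path: str, collection_name: str) -> str:
    """Return install_path/data/collection_name/live/idx/index.

    This follows the general form described in the text; an actual
    installation may use a different layout.
    """
    stem = PurePosixPath(install_path) / "data" / collection_name / "live" / "idx" / "index"
    return str(stem)


def padre_fl_path(install_path: str) -> str:
    """Return the location of padre-fl in the installation's bin directory."""
    return str(PurePosixPath(install_path) / "bin" / "padre-fl")


# Hypothetical installation path and collection name, for illustration only.
print(index_stem("/opt/funnelback", "example"))
print(padre_fl_path("/opt/funnelback"))
```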

The -show and -sumry options provide an overview of the flags currently set on the index. With -show, eight flags are shown for each document. For the purposes of this document, only the second column, representing the 'killed' bit, is relevant.

The -clearkill and -killall options set the kill bit for all documents in the index off or on respectively. -clearkill may be useful if kill bits have been set accidentally, and -killall provides a quick mechanism for removing all search results.
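An automated script might build the Usage 1 command line as an argument list before running it. The sketch below only constructs the arguments and does not execute anything; the function name `whole_index_command` and the example paths are hypothetical, and the option names are those listed under Usage 1.

```python
# Options accepted by Usage 1, as listed in the usage summary.
WHOLE_INDEX_OPTIONS = ("-clearkill", "-killall", "-show", "-sumry")


def whole_index_command(padre_fl: str, index_stem: str, option: str) -> list:
    """Build the argument list for: padre-fl index_stem <option>."""
    if option not in WHOLE_INDEX_OPTIONS:
        raise ValueError("unsupported option: " + option)
    return [padre_fl, index_stem, option]


# Hypothetical paths, for illustration only.
cmd = whole_index_command(
    "/opt/funnelback/bin/padre-fl",
    "/opt/funnelback/data/example/live/idx/index",
    "-sumry",
)
print(cmd)
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`. Rejecting unknown options before execution guards against accidentally passing -killall in place of a read-only option such as -show.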

Usage 2 and 3 allow a set of specific documents to have their kill bits set or unset, based on a list of URL patterns provided in an input file. By default, the URL patterns in the file are left-anchored (i.e. any URL which starts with the given pattern will be affected), but if the -exactmatch option is used, URLs will only be affected if they exactly match the given pattern.

Note that patterns are simple strings only, and do not support wildcard characters or regular expression type patterns.

Matches will only be performed against canonicalised forms of URLs, as stored in the index.
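The difference between left-anchored and exact matching can be illustrated with plain string comparisons. This is a sketch of the semantics described above, not padre-fl's actual implementation; the function names and the example.com URLs are hypothetical, and the URLs are assumed to already be in the canonicalised form stored in the index. The `pattern_command` helper builds the Usage 2 and 3 argument list without executing it.

```python
def pattern_matches(url: str, pattern: str, exact_match: bool = False) -> bool:
    """Plain string matching: no wildcards and no regular expressions."""
    if exact_match:
        return url == pattern        # behaviour described for -exactmatch
    return url.startswith(pattern)   # default: left-anchored prefix match


def affected_urls(indexed_urls, patterns, exact_match=False):
    """Return the URLs that at least one pattern would affect."""
    return [
        url for url in indexed_urls
        if any(pattern_matches(url, p, exact_match) for p in patterns)
    ]


def pattern_command(padre_fl, index_stem, pattern_file, kill=True, exact_match=False):
    """Build: padre-fl index_stem file_of_url_patterns [-exactmatch] -kill|-unkill."""
    cmd = [padre_fl, index_stem, pattern_file]
    if exact_match:
        cmd.append("-exactmatch")
    cmd.append("-kill" if kill else "-unkill")
    return cmd


# Hypothetical canonicalised URLs, for illustration only.
urls = [
    "http://example.com/a/",
    "http://example.com/a/page.html",
    "http://example.com/b/",
]
print(affected_urls(urls, ["http://example.com/a/"]))
print(affected_urls(urls, ["http://example.com/a/"], exact_match=True))
```

With the default left-anchored behaviour, the pattern `http://example.com/a/` affects both URLs beneath that path; with exact matching it affects only the URL identical to the pattern.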

Non-web collections

In some cases, Funnelback must create a URL for documents which do not otherwise have one (e.g. records from a database collection). The simplest way to identify the URL of a specific document in this case is to view the search.xml output of search results, since the HTML result pages are commonly customised to present a different URL than the Funnelback-generated one.

Pattern files

Kill patterns can be applied automatically during a collection update when defined on a collection-level basis by:

v15.24.0