textify.cfg

Introduction

textify.cfg defines which external (non-Java) filter programs are used to extract plain text from binary documents.

  • Name: textify.cfg
  • Collection location: ~/conf/collection/
  • Global location: ~/conf/

Caveat

  • External filtering is generally not recommended, because a new process is started for every document that passes through an external filter. Where possible, use Tika to convert binary documents to text.

Configuring external filter programs

External filters are programs that convert binary documents into plain text. textify.cfg contains one line per filter, in the following form:

extension=command

where

  • extension is the file extension of the target file, for example .doc or .pdf.
  • command is the external command to run.
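
The line format above can be read with a few lines of Python. This is a minimal sketch, not Funnelback's own parser: the function names are hypothetical, and skipping blank lines and lines without an = sign is an assumption, since the source does not say how such lines are treated. Splitting on the first = only matters because a command may itself contain = characters.

```python
from typing import Dict, Optional, Tuple


def parse_textify_line(line: str) -> Optional[Tuple[str, str]]:
    """Split one 'extension=command' line into (extension, command)."""
    stripped = line.strip()
    # Assumption: blank lines and lines without '=' carry no filter.
    if not stripped or "=" not in stripped:
        return None
    # Split on the first '=' only, so '=' inside the command survives.
    extension, command = stripped.split("=", 1)
    return extension.strip(), command.strip()


def parse_textify(text: str) -> Dict[str, str]:
    """Collect every filter line of a textify.cfg body into a dict."""
    filters: Dict[str, str] = {}
    for line in text.splitlines():
        parsed = parse_textify_line(line)
        if parsed is not None:
            extension, command = parsed
            filters[extension] = command
    return filters
```

Calling parse_textify on the text of a textify.cfg file gives a dictionary keyed by extension, such as ".pdf", with the unexpanded command string as the value.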

Command parameters

In a filter's command, the following tokens are replaced before the command is run:

Token                          Replaced by...
TEXTIFY_INPUT                  The path to the binary file (input).
TEXTIFY_OUTPUT                 The path to the plain text file (output).
executable{PROGNAME}           The executable program taken from $SEARCH_HOME/conf/executables.cfg (e.g. executable{perl} might map to c:\perl\bin\perl.exe).
$SEARCH_HOME / %SEARCH_HOME%   The value of the environment variable $SEARCH_HOME (Linux) or %SEARCH_HOME% (Windows).
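
To make the substitution concrete, here is a rough Python sketch of how the four tokens could be expanded. It is illustrative only and is not Funnelback's implementation: the function name expand_tokens is hypothetical, the executables dictionary stands in for whatever executables.cfg contains (its file format is not described here), and the order of the replacements is an assumption.

```python
import re
from typing import Dict


def expand_tokens(
    command: str,
    input_path: str,
    output_path: str,
    executables: Dict[str, str],
    search_home: str,
) -> str:
    """Return the command with all textify tokens replaced."""

    def lookup(match: "re.Match[str]") -> str:
        # executable{perl} -> the program registered under the name 'perl'.
        return executables[match.group(1)]

    expanded = re.sub(r"executable\{([^}]+)\}", lookup, command)
    # Handle both the Windows and the Linux spelling of SEARCH_HOME.
    expanded = expanded.replace("%SEARCH_HOME%", search_home)
    expanded = expanded.replace("$SEARCH_HOME", search_home)
    expanded = expanded.replace("TEXTIFY_INPUT", input_path)
    expanded = expanded.replace("TEXTIFY_OUTPUT", output_path)
    return expanded
```

With a hypothetical mapping of perl to /usr/bin/perl and a hypothetical install directory /opt/funnelback, the command "executable{perl} $SEARCH_HOME/bin/filter/pdf2html.pl TEXTIFY_INPUT" expands to "/usr/bin/perl /opt/funnelback/bin/filter/pdf2html.pl" followed by the input path.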

If TEXTIFY_INPUT is not present in the command, the binary file is supplied on the command's standard input. Likewise, if TEXTIFY_OUTPUT is not present, the plain text is read from the command's standard output.
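
The standard input and standard output fallback can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the product's code: run_filter is a hypothetical name, only the two TEXTIFY tokens are handled, and the command is split with POSIX shell rules via shlex, which would not suit Windows paths containing backslashes.

```python
import shlex
import subprocess
from contextlib import ExitStack


def run_filter(command: str, input_path: str, output_path: str) -> None:
    """Run one external filter command for one document."""
    uses_input = "TEXTIFY_INPUT" in command
    uses_output = "TEXTIFY_OUTPUT" in command

    # Split first, then substitute, so paths with spaces stay one argument.
    argv = [
        arg.replace("TEXTIFY_INPUT", input_path).replace("TEXTIFY_OUTPUT", output_path)
        for arg in shlex.split(command)
    ]

    with ExitStack() as stack:
        # A missing token means the file is attached to the matching stream.
        stdin = None if uses_input else stack.enter_context(open(input_path, "rb"))
        stdout = None if uses_output else stack.enter_context(open(output_path, "wb"))
        # One new process per document, as the caveat above points out.
        subprocess.run(argv, stdin=stdin, stdout=stdout, check=True)
```

A command that names only TEXTIFY_INPUT therefore receives the input path as an argument, while everything it prints to standard output is written to the output file.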

Textify files

There are usually three textify.cfg files that will be consulted to determine which filters should be used on particular files. These are (in order of precedence):

  • collection specific textify.cfg ( $SEARCH_HOME/conf/COLLECTION_NAME/textify.cfg )
  • system wide textify.cfg ( $SEARCH_HOME/conf/textify.cfg )
  • default textify.cfg ( $SEARCH_HOME/conf/textify.cfg.default )

Do not edit textify.cfg.default: it is overwritten during an upgrade. Make changes in the collection-specific or system-wide textify.cfg instead.
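
The lookup order can be pictured with a small Python sketch. The three paths come straight from the list above; everything else is an assumption made for illustration. In particular, the source does not say whether the files are combined entry by entry or whether the most specific file replaces the others outright, so this sketch assumes a per-extension fall-through. The function names and the example install directory are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional


def textify_paths(search_home: str, collection: str) -> List[str]:
    """List the three textify files, highest precedence first."""
    conf = PurePosixPath(search_home) / "conf"
    return [
        str(conf / collection / "textify.cfg"),  # collection specific
        str(conf / "textify.cfg"),               # system wide
        str(conf / "textify.cfg.default"),       # shipped default
    ]


def resolve_filter(extension: str, layers: List[Dict[str, str]]) -> Optional[str]:
    """Return the command from the first layer that defines the extension.

    `layers` holds one extension-to-command dict per file, in the same
    order as textify_paths(). Assumption: lookup falls through per entry.
    """
    for filters in layers:
        if extension in filters:
            return filters[extension]
    return None
```

For example, textify_paths("/opt/funnelback", "intranet") starts with /opt/funnelback/conf/intranet/textify.cfg, and a .pdf entry in the system-wide file would be chosen over one in textify.cfg.default when the collection file does not define .pdf.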

Example

Here is the PDF filter from the standard textify.cfg:

.pdf=executable{perl} $SEARCH_HOME/bin/filter/pdf2html.pl TEXTIFY_INPUT

v15.24.0