Built-in filters: DocumentFixerFilterProvider

Introduction

This filter analyses a document's title and attempts to replace it when the existing title is not considered a good one.

The filter only processes titles of HTML documents.

Enabling and disabling the document title fixer

The document title fixer (DocumentFixerFilterProvider) is enabled by default and included in the default filter chain.

To enable the document title fixer on a custom filter chain, add DocumentFixerFilterProvider to the collection's filter.classes setting, after the custom filter.

Example:

filter.classes=TikaFilterProvider:JSoupProcessingFilterProvider:myCustomFilter:DocumentFixerFilterProvider

To disable the document title fixer, remove DocumentFixerFilterProvider from the filter chain.
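
As a sketch, the custom chain from the example above with the title fixer removed would look like the line below. myCustomFilter is the placeholder name from that example, not a real filter, and the rest of the chain is assumed to be unchanged.

```
filter.classes=TikaFilterProvider:JSoupProcessingFilterProvider:myCustomFilter
```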

Filter options

collection.cfg option | Description
filter.document_fixer.timeout_ms | Configures the maximum amount of time to spend fixing a single title.
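
A minimal sketch of setting this option in collection.cfg follows. The value 5000 is purely illustrative and is not the documented default; the unit is assumed to be milliseconds, based on the _ms suffix of the option name.

```
# Illustrative value only (assumed to be milliseconds); not the documented default
filter.document_fixer.timeout_ms=5000
```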

v15.24.0