filter.jsoup.undesirable_text-source (collection.cfg setting)
This setting controls where 'undesirable text' is listed for detection in content auditor.
The setting allows several sources to be defined, each with its own key name, so that collections can override the defaults.
The format of the file at the given path is expected to be a list of undesirable word sequences, with newlines separating each sequence. Where multi-word sequences are used, each word should be separated by a single space character. Text versions of HTML entities (e.g. — instead of &mdash;) should be used where applicable.
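The file format described above can be illustrated with a minimal Python sketch. This is not the product's own parser. The function name parse_undesirable_text and the sample content are hypothetical, and decoding entities and collapsing repeated spaces is an assumption about how a file might be tidied into the expected form.

```python
import html
from typing import List


def parse_undesirable_text(content: str) -> List[str]:
    """Split file content into undesirable word sequences, one per line.

    Hypothetical helper for illustration only; the real content auditor
    may treat whitespace and entities differently.
    """
    sequences = []
    for line in content.splitlines():
        # Convert HTML entities (e.g. &mdash;) to their text versions.
        text = html.unescape(line)
        # Separate the words of a multi-word sequence by a single space.
        normalized = " ".join(text.split())
        # Skip blank lines; they do not define a sequence.
        if normalized:
            sequences.append(normalized)
    return sequences


# Hypothetical file content: two sequences plus an entity to be decoded.
sample = "aluminum\npurple  monkey\n&mdash;\n\n"
print(parse_undesirable_text(sample))
```

Running a file through a check like this before uploading it is one way to confirm that each line holds exactly one sequence and that no HTML entities remain.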
Undesirable text files can be created from the administration interface file manager by selecting undesirable-text.*.cfg from the create menu. To make use of this file, the file_path must be set to
key_name can be any string as long as it is unique per collection.
This default setting provides a list of commonly misspelled words in English based on Wikipedia's list of common misspellings for machines.
The following overrides the misspellings with a custom file, and also includes an additional set from 'undesirable-text.additional.cfg'.
— etc. e.g. aluminum purple monkey