Reject Documents with Symbolic Content
The SymbolicContentFilter task calculates the proportion of symbolic characters in the content of a document. If this proportion exceeds the limit specified by the MaxSymbolicCharactersPercent parameter, the document is rejected.
A symbolic character is any character in the Unicode range U+2000 to U+2FFF.
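The check described above can be illustrated with a short Python sketch. This is not the task's actual implementation. It assumes that both ends of the range are inclusive, that the proportion is counted per Unicode code point over all characters in the content (including whitespace), and that an empty document scores 0. The function names symbolic_percent and should_reject are hypothetical.

```python
def symbolic_percent(text: str) -> float:
    """Return the percentage of code points in text within U+2000..U+2FFF.

    Assumption: both endpoints are inclusive and every character,
    including whitespace, counts toward the total.
    """
    if not text:
        # Assumption: an empty document contains no symbolic content.
        return 0.0
    symbolic = sum(1 for ch in text if 0x2000 <= ord(ch) <= 0x2FFF)
    return 100.0 * symbolic / len(text)


def should_reject(text: str, max_symbolic_percent: float) -> bool:
    """Reject when the symbolic percentage exceeds the configured limit."""
    return symbolic_percent(text) > max_symbolic_percent
```

For example, a ten-character document that contains one bullet character (U+2022) has a symbolic proportion of 10 percent, so under these assumptions it would be rejected with a limit of 8.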
The SymbolicContentFilter task can be configured as a Post task. The parameters that are passed to the task are specified in a named section of the configuration file. For example:
[ImportTasks]
Post0=SymbolicContentFilter:SymbolicContentFilterSettings

[SymbolicContentFilterSettings]
MaxSymbolicCharactersPercent=8
OnErrorIndexerSections=IdolErrorServer
IndexDatabase=Review