Common Problems and Solutions
This section describes common problems that you might encounter while using File Content Extraction, and suggests how to solve them.
Output does not contain content that you expect
If you do not see content that you expect to see in the output:
- Check that you are using the right operation: filter, metadata, subfile metadata, detection, or extract.
- Check your hidden text options, including revision marks. You can use the test programs with different hidden text options to attempt to replicate your problem.
- Understand the difference between metadata and subfile metadata. See What is Metadata?
- Check the original file in its native viewer to see where the data is, and what you expect File Content Extraction to output. If you cannot resolve the problem, use this information in a ticket to OpenText support to help resolve the problem as quickly as possible. See Create a Support Ticket.
- Check whether there are multiple readers for the file type. You might be able to achieve a better result with an alternative reader.
Output text is in the wrong encoding
If the output text is in the wrong encoding, check that your expectations are reasonable for the file. For information about character set detection, see Character Set Detection. For example, check whether the input is real text or a test file where the encoding cannot reasonably be detected.
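As a quick sanity check outside File Content Extraction, you can test whether the input bytes plausibly decode in the encoding you expect. This is a minimal sketch using only the Python standard library; it checks decodability, not whether the detected character set is correct:

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if the byte string decodes cleanly in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except (UnicodeDecodeError, LookupError):
        return False

# Real UTF-8 text decodes as UTF-8; arbitrary binary data usually does not.
print(decodes_as("café".encode("utf-8"), "utf-8"))  # True
print(decodes_as(b"\xff\xfe\x00", "utf-8"))         # False
```

If the bytes do not decode cleanly in any plausible encoding, the input may not be real text, and no character set detection can give a meaningful result.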
File Content Extraction hangs or stops responding
If you think File Content Extraction is hanging, first check whether it is just processing very slowly. For example:
- Check your timeout settings for the filter timeout and extract timeout. Increase the timeout to see if the file is processed, which implies that processing is taking a long time rather than hanging.
- Check the impact of processing steps such as OCR, which can increase the time taken.
- Check the impact of ExtractImages, which can increase the time taken.
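One way to apply an external timeout while you investigate is to wrap the conversion command in a watchdog. The following sketch uses Python's `subprocess` timeout; the command shown in the comment is only a placeholder for whatever test-program invocation you actually use:

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_seconds):
    """Run cmd; return (finished, stdout). finished is False on timeout."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_seconds)
        return True, result.stdout
    except subprocess.TimeoutExpired:
        return False, ""

# Demonstration with a trivially fast command. In practice, substitute
# your own invocation, for example a filtertest command line.
finished, out = run_with_timeout([sys.executable, "-c", "print('done')"], 30)
print(finished)  # True
```

If the command completes when you raise the timeout, processing is slow rather than hung, and you can focus on the steps (such as OCR or ExtractImages) that add time.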
If filtering the text is the part that takes a long time, you might want to try streaming mode, for example by using the filtertest test program:
filtertest.exe -ch inputfile -
This option sends the filter output to standard output as soon as it is available, to give you a better idea of whether File Content Extraction is doing any processing, rather than waiting for it to filter or fail to filter the whole input file.
If none of these options solves the problem, and you have a reproducible case, send this to OpenText support to help resolve the problem. See Create a Support Ticket.
File Content Extraction crashes or exits unexpectedly
If File Content Extraction crashes, it might indicate a resource problem. In the C API, you can look at the lifetime and memory management sections in the documentation for the API calls that you use, to ensure that you are managing resources appropriately.
Where a crash occurs more than once, contact OpenText support for advice and to attempt to resolve the problem. To make it easier to diagnose and resolve your issue, you can:
- Provide a reproducible test case, for example with a document or documents that consistently lead to the crash.
- Provide a crash dump. In this case, a crash dump of kvoop can be easier to analyze than a crash dump of your application, so if you run in process, attempt to reproduce the issue while running out of process.
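On Linux, a crash dump (core file) is only written if the process's core size limit allows it; `ulimit -c unlimited` in the shell before you reproduce the crash is the usual way to enable it. As a sketch, you can also raise the limit programmatically so that it is inherited by child processes you launch:

```python
import resource

# Raise the soft core-dump limit to the hard limit for this process.
# Child processes launched afterwards inherit the setting.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
print(resource.getrlimit(resource.RLIMIT_CORE)[0] == hard)  # True
```

This is Unix-specific (the `resource` module is not available on Windows, where crash dumps are configured differently).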
For more information, see Create a Support Ticket.
File Content Extraction leaks memory
A memory leak occurs when average memory use rises steadily over time without dropping back to its earlier level.
If you observe a memory leak, you can first look at your memory management. When you are using the C API, you can look at the lifetime and memory management sections in the documentation for the API calls that you use, to ensure that you are managing resources appropriately.
For persistent problems, contact OpenText support for advice and to attempt to resolve the problem. To make it easier to diagnose and resolve your issue, attempt to provide a reproducible test case. See Create a Support Ticket.
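To distinguish a genuine leak from normal steady-state usage, sample memory after repeated iterations of the same work and look for a steady rise. This sketch uses Python's `tracemalloc` for illustration only; it measures Python-level allocations, not File Content Extraction itself, but the sampling pattern is the same whatever tool you use:

```python
import tracemalloc

def measure_growth(work, iterations=5):
    """Run work() repeatedly; return traced memory sampled after each run."""
    tracemalloc.start()
    samples = []
    for _ in range(iterations):
        work()
        current, _peak = tracemalloc.get_traced_memory()
        samples.append(current)
    tracemalloc.stop()
    return samples

leaked = []  # deliberately leaky: the list grows on every call
samples = measure_growth(lambda: leaked.append(bytearray(100_000)))
print(samples[-1] > samples[0])  # True: usage rises steadily, a leak
```

A healthy workload shows samples that level off after warm-up; a leak shows samples that keep climbing across iterations.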
General errors in containers when scaling up threads
When you are using File Content Extraction in a container, if you see lots of KVError_General errors when you scale up the number of threads, check that you have set the system up correctly. For more information, see System Requirements, particularly the information about shared memory limits.
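In many Linux containers, shared memory is mounted at /dev/shm with a small default size (commonly 64 MB), which can be exhausted as threads scale up. As a sketch, you can report the size of that mount from inside the container; the path and any threshold you compare against are assumptions, so consult System Requirements for the actual limits:

```python
import os

def shm_size_bytes(path="/dev/shm"):
    """Return the total size of the shared-memory mount, or None if absent."""
    if not os.path.isdir(path):
        return None
    stats = os.statvfs(path)
    return stats.f_frsize * stats.f_blocks

size = shm_size_bytes()
if size is not None:
    print(f"/dev/shm size: {size / (1024 ** 2):.0f} MB")
```

If the reported size is small, increase it when starting the container (for example, Docker's `--shm-size` option).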
Problems that occur under heavy load
If problems occur when File Content Extraction is under heavy load, check for the following issues:
- Is the available memory running low?
- Are the available file handles running low?
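On Linux, you can sketch the file-handle check from inside your own process by comparing the number of open file descriptors against the per-process limit (the 80% threshold below is an illustrative choice, not a documented limit):

```python
import os
import resource

def fd_usage():
    """Return (open_fds, soft_limit) for the current process (Linux)."""
    open_fds = len(os.listdir("/proc/self/fd"))
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_fds, soft

open_fds, limit = fd_usage()
print(f"{open_fds} of {limit} file descriptors in use")
if open_fds > 0.8 * limit:
    print("warning: file handles running low")
```

Available memory can be checked in the same spirit with operating-system tools such as `free` or /proc/meminfo; sampling both figures while load increases shows which resource is approaching its limit.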