When is the text extracted from the PDF file in Project Builder and in KTM Server?
Issue
When is the text extracted from the PDF file in Project Builder and in KTM Server?
Why do I get different OCR results at runtime?
Cause
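PDF files may contain a text layer. Depending on the settings, the text is either taken from this layer or produced by OCR, so the results in Project Builder and at runtime can differ.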
Solution
Because PDF files may contain a text layer, decide which approach suits your process best: either use the text layer or always perform OCR with an OCR engine.
Project Builder
If you select a PDF file in Project Builder, it tries to detect the text layer and extracts the text directly from the file. If the text layer was used, the left button "A" is enabled in the document viewer.
If there is no text layer, the "A" button is disabled.
Afterwards you can run OCR on the document, which overwrites the previously extracted text.
This may be required if the PDF file contains a layout with address information and a logo as background images, so that this text is not included in the text layer.
- Note: If the x-document is saved and you open it in the xdoc-browser (context menu of the document), you can verify what was done.
If the text was extracted from the PDF file and not by OCR, the xdoc contains the representation PDFTEXT, e.g. pXDoc.Representations.ItemByName("PDFTEXT").
If you have performed OCR and want to check for PDF text, you need to "open the document set" again and select "Source files" with "PDF files". By doing so, the x-document files are created again and then checked for available PDF text.
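The PDFTEXT check from the note above could also be done in project script. The following is only a minimal sketch: the helper name HasPdfText is hypothetical, and it assumes that pXDoc.Representations exposes a Count property and index access and that each representation has a Name property. It loops over the representations instead of calling ItemByName directly, because this article does not say how ItemByName behaves when the name is missing.

```vb
' Hypothetical helper (not part of the product): returns True if the
' xdoc contains a representation named "PDFTEXT", i.e. the text was
' taken from the PDF text layer rather than produced by OCR.
' Assumption: Representations offers Count and index access, and each
' representation has a Name property.
Private Function HasPdfText(ByVal pXDoc As CASCADELib.CscXDocument) As Boolean
   Dim i As Long
   HasPdfText = False
   For i = 0 To pXDoc.Representations.Count - 1
      If pXDoc.Representations(i).Name = "PDFTEXT" Then
         HasPdfText = True
         Exit Function
      End If
   Next
End Function
```

Such a helper could be called from a document-level script event to log or branch on the text source, for example to find out why results differ between Project Builder and runtime.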
KTM Server / Runtime
In the Kofax Capture administration, this is configured in the batch class.
Select "Extended Synchronization Settings" from the batch class menu. On the "Server" sheet you can select the option "Import text from PDF files".
If you modify this setting, you need to publish the batch class afterwards.
Level of Complexity
Easy
Applies to
Product | Version | Build | Environment | Hardware |
---|---|---|---|---|
Kofax Transformation Module | All | | | |