
When is the text extracted from the PDF file in Project Builder and in KTM Server?

Article # 3045843



Why do I get different OCR results at runtime?



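PDF files may contain a text layer, so you need to decide what works best for your process: either use the text layer or always perform OCR with an OCR engine. If Project Builder and the server do not use the same source for the text, the results at runtime can differ from what you see during testing.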

Project Builder

If you select a PDF file in Project Builder, it tries to detect the text layer and extracts the text from the file directly. If the text layer was used, the left button "A" is enabled in the document viewer.

If there is no text layer then the "A"-button is disabled.

Afterwards you can run OCR on the document, which overwrites the previously extracted text.

This may be required if, for example, the PDF file contains a layout with address information and a logo as background images, so that their text is not included in the text layer.

  • Note: If the x-document is saved and you open it in the xdoc-browser (context menu of the document), you can verify how the text was obtained.
    If the text was extracted from the PDF file rather than by OCR, the xdoc contains the representation PDFTEXT, e.g. pXDoc.Representations.ItemByName("PDFTEXT").
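The check described in the note can be sketched as follows. This is a minimal Python illustration of the logic only, not Kofax script code: the function name `text_source` and its return values are hypothetical, and the list of representation names is assumed to have been read from the xdoc by some other means (for example, copied from the xdoc-browser). Only the representation name PDFTEXT comes from this article.

```python
def text_source(representation_names):
    """Classify where an xdoc's text came from, given its representation names.

    Assumption: a representation named exactly "PDFTEXT" means the text
    was taken from the PDF text layer, as stated in this article.
    """
    if "PDFTEXT" in representation_names:
        return "pdf_text_layer"
    return "not_from_pdf_text_layer"


if __name__ == "__main__":
    # Hypothetical input: names as they would be listed in the xdoc-browser.
    print(text_source(["PDFTEXT"]))
```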


If you have performed OCR and then want to check for PDF text, you need to open the document set again and select "Source files" with "PDF files". The x-document files are then created again and checked for available PDF text.

KTM Server / Runtime

In Kofax Capture Administration, you control whether the server uses the PDF text layer through a setting in the batch class.

Select "Extended Synchronization Settings" from the batch class menu. On the "Server" sheet, you can select the option "Import text from PDF files".

If you modify this setting then you need to publish the batch class afterwards.
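To see why results can differ between design time and runtime, the behavior described above can be modeled as a small decision function for each environment. This is a hedged Python sketch, not Kofax code: the class `PdfDocument`, both functions, and the returned labels are hypothetical. It also assumes that the server falls back to OCR whenever "Import text from PDF files" is not selected or the file has no text layer, which this article implies but does not state outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfDocument:
    """Hypothetical stand-in for a PDF source file."""
    name: str
    has_text_layer: bool


def project_builder_source(doc, ocr_run_afterwards=False):
    """Text source in Project Builder, as described in this article."""
    if ocr_run_afterwards:
        # Running OCR overwrites the text taken from the text layer.
        return "OCR"
    if doc.has_text_layer:
        return "PDF text layer"
    return "no text until OCR is run"


def server_source(doc, import_text_from_pdf):
    """Text source on the server (assumed fallback to OCR, see lead-in)."""
    if import_text_from_pdf and doc.has_text_layer:
        return "PDF text layer"
    return "OCR"


if __name__ == "__main__":
    doc = PdfDocument(name="invoice.pdf", has_text_layer=True)
    for setting in (True, False):
        print(
            f"Import text from PDF files = {setting}: "
            f"Project Builder uses {project_builder_source(doc)}, "
            f"server uses {server_source(doc, setting)}"
        )
```

In this model, a PDF with a text layer that is tested in Project Builder without running OCR matches the server only when "Import text from PDF files" is selected. Otherwise the two environments work from different text, which is one plausible cause of differing results at runtime.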





Applies to  

Product: Kofax Transformation Module
Version: All



