QAID # 22090 NOT Published
Please do NOT provide the article number (QAID) and/or URL of this Knowledgebase article or its contents to external customers, as it is NOT Published and/or * INTERNAL ONLY *.
Question / Problem:
When is the text extracted from the PDF file in Project Builder and in KTM Server?
Answer / Solution:
If you select a PDF file in Project Builder, the text is extracted directly from the PDF file. Afterwards you can run OCR on the page images, which overwrites the previously extracted text.
This may be required if the PDF file uses a layout in which address information and a logo are stored as background images, because text inside images is not part of the PDF text and can only be read by OCR.
For KTM Server, you configure this in the batch class in Kofax Capture Administration.
Select "Extended Synchronization Settings" from the batch class menu. On the "Server" tab, select the option "Import text from PDF files".
If the text was extracted from the PDF file and not by OCR, the xdoc contains a representation named PDFTEXT, e.g. pXDoc.Representations.ItemByName("PDFTEXT"). This representation is also visible in the XDoc Browser.
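The check for the PDFTEXT representation could be sketched in a KTM project script roughly as follows. This is only an illustration under assumptions: HasPdfText is a hypothetical helper name, and the sketch assumes that the Representations collection exposes Count and index access and that each representation has a Name property. It loops over the collection instead of calling ItemByName directly, because the behavior of ItemByName for a missing name is not described here.

```vb
' Hypothetical helper (not part of the KTM API): returns True if the xdoc
' contains a representation named "PDFTEXT", i.e. the text was taken from
' the PDF file and not produced by OCR.
Private Function HasPdfText(ByVal pXDoc As CASCADELib.CscXDocument) As Boolean
   Dim i As Long
   HasPdfText = False
   ' Assumption: Representations supports Count and zero-based index access,
   ' and each representation exposes a Name property.
   For i = 0 To pXDoc.Representations.Count - 1
      If pXDoc.Representations(i).Name = "PDFTEXT" Then
         HasPdfText = True
         Exit Function
      End If
   Next
End Function
```

Such a function could be called from any script event that receives pXDoc, for example to decide whether an additional OCR pass is worth running on the document.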