Question / Problem:
Is it possible to use the existing OCR layer of a PDF in DTS?
Answer / Solution:
Yes, this is possible.
RPA does not extract the PDF text layer automatically at runtime, but the extraction can be forced by adding the script below to the DTS project.
' Project Script
' Add the PDF text representation to incoming documents (as long as they are PDFs).
Private Sub Document_BeforeProcessXDoc(pXDoc As CASCADELib.CscXDocument)
   Dim IsPDF As Boolean
   Dim i As Long
   Dim Count As Long
   IsPDF = False
   Count = pXDoc.CDoc.SourceFiles.Count
   ' Check whether any source file is a PDF; stop at the first match.
   For i = 0 To Count - 1
      If pXDoc.CDoc.SourceFiles.ItemByIndex(i).FileType = "PDF" Then
         IsPDF = True
         Exit For
      End If
   Next i
   ' If so, add the PDFTEXT representation using default separators.
   If IsPDF Then
      ' ...
   End If
End Sub
To view or add the script, right-click 'Project Class' in Project Builder and select 'Show Script'.