Kofax

Is it possible to use the existing OCR layer of a PDF in DTS?

3025130

Question / Problem: 

Is it possible to use the existing OCR layer of a PDF in DTS?

Answer / Solution: 

Yes, this is possible.

RPA does not extract the PDF text automatically at runtime, but the extraction can be forced by adding the script below to the DTS project.

 

  Option Explicit

  ' Project Script

  ' Add the PDF text representation on incoming documents (as long as they are PDFs).
  Private Sub Document_BeforeProcessXDoc(pXDoc As CASCADELib.CscXDocument)
     Dim IsPDF As Boolean
     Dim i As Long
     Dim Count As Long

     IsPDF = False
     Count = pXDoc.CDoc.SourceFiles.Count

     ' Check whether any of the source files is a PDF.
     For i = 0 To Count - 1
        If pXDoc.CDoc.SourceFiles.ItemByIndex(i).FileType = "PDF" Then
           IsPDF = True
           Exit For
        End If
     Next i

     ' If so, add the PDFTEXT representation using default separators.
     If IsPDF Then
        pXDoc.Representations.CreateFromPDF("")
     End If

  End Sub

To view or add the script, right-click 'Project Class' in Project Builder and select 'Show Script'.

Applies to:  

Product: RPA

Version: 10, 11

 

 
