Kofax

Scripting - OCR - How to use results of an external OCR Engine

Article # 3036864

Issue

How can the results of an external OCR engine be used in a project?

 

Case 

By default, KTM uses OmniPage (6.3: ABBYY FineReader) to perform full-text OCR. In KTM 6.3, RecoStar and Kadmos were also available. If these OCR engines do not deliver the required results and the project would benefit from an external OCR engine, this example project shows how to use one.



 

  • The following notes are important:
    • The example script can be used as it is, and it creates the necessary items in the XDoc.
      All you have to do is replace the example CSV file, which stands in for the external OCR engine, with a real OCR engine.
    • In the example, the external OCR engine is called in the Document_BeforeProcessXDoc event. Depending on what you want to do with the OCR results, it may make sense to execute the external OCR engine at a different point in time. Events that make sense are:
      • Document_BeforeProcessXDoc
      • Document_BeforeClassifyText
      • Document_BeforeClassifyXDoc
      • Document_BeforeExtract
      • Document_AfterClassifyImage
      • Document_AfterClassifyXDoc
    • See the Script Documentation for more information.
  • After adding the words from the results of the external OCR engine, it is important to analyze the lines of the representation and commit the results to the XDoc.
    Representation.AnalyzeLines automatically organizes the words on a page and creates lines in the XDoc. Lines are important for locator methods.
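
The order of operations described in the notes above can be sketched as follows. This is a minimal sketch, not the attached example script: the representation name "ExternalOCR" and the hard-coded word are illustrative assumptions, and the sketch assumes the XDoc does not yet contain a representation. It relies on the WordStructure type and the AddWordFromExternalOCREngine subroutine from this article.

```vb
' Sketch only: the representation name and the hard-coded word are illustrative.
Private Sub Document_BeforeProcessXDoc(ByVal pXDoc As CASCADELib.CscXDocument)
  Dim oRep As CscXDocRepresentation
  Dim tWord As WordStructure

  ' Assumption: the XDoc has no representation yet, so the new one becomes the first
  Set oRep = pXDoc.Representations.Create("ExternalOCR")

  ' Stand-in for one word returned by the external OCR engine
  tWord.sText = "Invoice"
  tWord.lLeft = 100
  tWord.lTop = 200
  tWord.lWidth = 180
  tWord.lHeight = 40
  tWord.lPageIndex = 0
  AddWordFromExternalOCREngine oRep, tWord

  ' Build the lines only after all words have been added
  oRep.AnalyzeLines
End Sub
```

Calling AnalyzeLines once, after the last word has been added, avoids re-analyzing the page for every single word.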

Components 

The project does not make use of any locator methods, apart from a Format Locator for demonstration purposes.

  • Project Script
  • Type
  • Enum
  • Event Document_BeforeProcessXDoc
  • Sub AddWordFromExternalOCREngine

 

Some helpful objects

' Project classification script
Option Explicit

Type WordStructure
  sText As String
  lLeft As Long
  lTop As Long
  lWidth As Long
  lHeight As Long
  lPageIndex As Long
End Type

Enum enumWordStructure
  eTop = 0
  eLeft = 1
  eWidth = 2
  eHeight = 3
  ePageIndex = 4
  eText = 5
End Enum

Edit the path to the test CSV file, which simulates the external OCR engine:

Document_BeforeProcessXDoc.txt
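
The attached script is not reproduced here. As a rough sketch of how a CSV file could be turned into words, the hypothetical helper below reads one word per line and uses the enumWordStructure values as column positions. The helper name LoadWordsFromCsv, the comma separator, the absence of a header row, and the assumption that the word text contains no commas are all assumptions, not details taken from the example project.

```vb
' Hypothetical helper: reads one word per CSV line and adds it to the representation.
' Assumed line layout: top,left,width,height,pageindex,text (no header row).
Private Sub LoadWordsFromCsv(ByVal sCsvPath As String, ByRef oRep As CscXDocRepresentation)
  Dim iFile As Integer
  Dim sLine As String
  Dim sFields() As String
  Dim tWord As WordStructure

  iFile = FreeFile
  Open sCsvPath For Input As #iFile
  Do While Not EOF(iFile)
    Line Input #iFile, sLine
    If Len(Trim(sLine)) > 0 Then
      sFields = Split(sLine, ",")
      ' Skip lines that do not contain all six columns
      If UBound(sFields) >= eText Then
        tWord.lTop = CLng(sFields(eTop))
        tWord.lLeft = CLng(sFields(eLeft))
        tWord.lWidth = CLng(sFields(eWidth))
        tWord.lHeight = CLng(sFields(eHeight))
        tWord.lPageIndex = CLng(sFields(ePageIndex))
        tWord.sText = sFields(eText)
        AddWordFromExternalOCREngine oRep, tWord
      End If
    End If
  Loop
  Close #iFile
End Sub
```

With a real OCR engine, only the loop body would change: the fields of tWord would be filled from the engine's result objects instead of from CSV columns.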

Subroutine for adding words:

Private Sub AddWordFromExternalOCREngine(ByRef XDocRep As CscXDocRepresentation, ByRef tWordStructure As WordStructure)
  Dim oWord As New CscXDocWord
  '# Copy the text and position of the external OCR result into the new word
  oWord.Text = tWordStructure.sText
  oWord.Top = tWordStructure.lTop
  oWord.Left = tWordStructure.lLeft
  oWord.Width = tWordStructure.lWidth
  oWord.Height = tWordStructure.lHeight
  oWord.PageIndex = tWordStructure.lPageIndex
  '# Add the word to the page of the representation it belongs to
  XDocRep.Pages(tWordStructure.lPageIndex).AddWord (oWord)
  Set oWord = Nothing
End Sub

Level of Complexity 

High

 

Applies to  

Product: Kofax Transformation Modules
Version: 6.3, 6.4
