Kofax

External OCR Engine

Article # 3036864

Issue

External OCR Engine

 

Solution

Extraction - Script - External OCR Engine

 

Case 

KTM uses ABBYY FineReader by default to perform full-text OCR; RecoStar and Kadmos are also available. If these OCR engines do not provide the results a project requires, an external OCR engine can be used in KTM instead. This example project shows how to do this.

 

General 
  • The following notes are important:
    • The example script can be used as is; it creates the necessary items in the XDoc.
      All you have to do is replace the example CSV file, which stands in for the external OCR engine, with a call to a real OCR engine.
    • In the example, the external OCR engine is called in the Document_BeforeProcessXDoc event. Depending on what you want to do with the OCR results, it may make sense to run the external OCR engine at a different point in time. Events that make sense are:
      • Document_BeforeProcessXDoc
      • Document_BeforeClassifyText
      • Document_BeforeClassifyXDoc
      • Document_BeforeExtract
      • Document_AfterClassifyImage
      • Document_AfterClassifyXDoc
    • See the Script Documentation for more information.
  • After adding the words from the results of the external OCR engine, it is important to analyse the lines of the representation and commit the results to the XDoc.
    Representation.AnalyzeLines automatically organises the words on a page and creates lines in the XDoc. Lines are important for locator methods.
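The order of operations described in these notes can be sketched as follows. This is a minimal illustration and not the content of the attached script: the representation name "ExternalOCR" and the hard-coded sample word are assumptions made for the sketch, and it relies on the WordStructure type and the AddWordFromExternalOCREngine sub defined in the project script. The commit step mentioned above is not shown, because this article does not give its call. If the project script already contains this event handler, merge the steps into it rather than adding a second handler.

```vb
Private Sub Document_BeforeProcessXDoc(ByVal pXDoc As CASCADELib.CscXDocument)
   Dim oRep As CscXDocRepresentation
   Dim tWord As WordStructure

   ' Reuse the first representation, or create one if the XDoc has none yet.
   ' "ExternalOCR" is an arbitrary name chosen for this sketch.
   If pXDoc.Representations.Count = 0 Then
      pXDoc.Representations.Create "ExternalOCR"
   End If
   Set oRep = pXDoc.Representations(0)

   ' One hard-coded word stands in for the output of the external engine
   tWord.sText = "Invoice"
   tWord.lLeft = 100
   tWord.lTop = 200
   tWord.lWidth = 180
   tWord.lHeight = 40
   tWord.lPageIndex = 0
   AddWordFromExternalOCREngine oRep, tWord

   ' Group the added words into lines so that locator methods can use them
   oRep.AnalyzeLines
End Sub
```

In a real project, the hard-coded word would be replaced by a loop over all words returned by the external engine, with AnalyzeLines called once after the last word has been added.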

 

Components 

The project does not make use of any locator methods, apart from a Format Locator for demonstration purposes.

  • Project Script
  • Type
  • Enum
  • Event Document_BeforeProcessXDoc
  • Sub AddWordFromExternalOCREngine

 

Some helpful objects 
' Project classification script
Option Explicit

' One word as delivered by the external OCR engine
Type WordStructure
   sText As String
   lLeft As Long
   lTop As Long
   lWidth As Long
   lHeight As Long
   lPageIndex As Long
End Type

' Column positions of the word properties in one line of the CSV file
Enum enumWordStructure
   eTop = 0
   eLeft = 1
   eWidth = 2
   eHeight = 3
   ePageIndex = 4
   eText = 5
End Enum
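To show how the type and the enum work together, the following sketch converts one line of the CSV file into a WordStructure. It is an illustration under assumptions, not part of the example project: the function name TryParseCsvLine is hypothetical, the delimiter is passed in because this article does not state which one the test file uses, and the word text is assumed not to contain the delimiter.

```vb
' Hypothetical helper: fills tWord from one delimited line.
' Assumes the column order defined by enumWordStructure.
' Returns False for blank, short, or non-numeric (for example header) lines.
Private Function TryParseCsvLine(ByVal sLine As String, ByVal sDelimiter As String, _
                                 ByRef tWord As WordStructure) As Boolean
   Dim sFields() As String
   Dim i As Long

   TryParseCsvLine = False
   If Len(Trim(sLine)) = 0 Then Exit Function

   sFields = Split(sLine, sDelimiter)
   ' Six columns (eTop to eText) are required
   If UBound(sFields) < eText Then Exit Function

   ' The first five columns must be numeric
   For i = eTop To ePageIndex
      If Not IsNumeric(Trim(sFields(i))) Then Exit Function
   Next i

   tWord.lTop = CLng(sFields(eTop))
   tWord.lLeft = CLng(sFields(eLeft))
   tWord.lWidth = CLng(sFields(eWidth))
   tWord.lHeight = CLng(sFields(eHeight))
   tWord.lPageIndex = CLng(sFields(ePageIndex))
   tWord.sText = sFields(eText)
   TryParseCsvLine = True
End Function
```

A caller would pass each line of the file together with the delimiter, for example TryParseCsvLine(sLine, ";", tWord), and add the word to the representation only when the function returns True.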

Edit the path to the test CSV file, which stands in for the external OCR engine, in the following script:

Document_BeforeProcessXDoc.txt
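As a rough sketch of the file handling involved, the following function reads the non-empty lines of a CSV file into an array. The constant EXTERNAL_OCR_CSV, its value, and the function name ReadCsvLines are hypothetical and are not taken from the attached script; replace the path with the real location of the test file.

```vb
' Hypothetical path; replace with the real location of the test CSV file
Private Const EXTERNAL_OCR_CSV As String = "C:\Temp\ExternalOCR.csv"

' Reads all non-empty lines of sPath into sLines and returns their count.
' Returns 0 if the file does not exist.
Private Function ReadCsvLines(ByVal sPath As String, ByRef sLines() As String) As Long
   Dim iFile As Integer
   Dim sLine As String
   Dim lCount As Long

   lCount = 0
   ReadCsvLines = 0
   ' Dir returns an empty string when the file does not exist
   If Len(Dir(sPath)) = 0 Then Exit Function

   iFile = FreeFile
   Open sPath For Input As #iFile
   Do While Not EOF(iFile)
      Line Input #iFile, sLine
      If Len(Trim(sLine)) > 0 Then
         ReDim Preserve sLines(lCount)
         sLines(lCount) = sLine
         lCount = lCount + 1
      End If
   Loop
   Close #iFile

   ReadCsvLines = lCount
End Function
```

Checking for the file first avoids a runtime error in the event handler when the path has not yet been adjusted to the local environment.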

Subroutine for adding words

Private Sub AddWordFromExternalOCREngine(ByRef XDocRep As CscXDocRepresentation, _
                                         ByRef tWordStructure As WordStructure)
   Dim oWord As New CscXDocWord

   ' Copy the text and the position from the external OCR result
   oWord.Text = tWordStructure.sText
   oWord.Top = tWordStructure.lTop
   oWord.Left = tWordStructure.lLeft
   oWord.Width = tWordStructure.lWidth
   oWord.Height = tWordStructure.lHeight
   oWord.PageIndex = tWordStructure.lPageIndex

   ' Add the word to the page of the representation it belongs to
   XDocRep.Pages(tWordStructure.lPageIndex).AddWord oWord
   Set oWord = Nothing
End Sub

 

Level of Complexity 

High

 

Applies to  

Product                        Version    Build   Environment   Hardware
Kofax Transformation Modules   6.3, 6.4
