Kofax

External OCR Engine

QAID # 17338 Published * INTERNAL ONLY * 

Please do NOT provide the article number (QAID), the URL, or the contents of this Knowledgebase article to external customers, as it is NOT Published and/or * INTERNAL ONLY *.

Question / Problem:

External OCR Engine

Answer / Solution:

Extraction - Script - External OCR Engine

Download

Click this link to download the zipped example project: (Created with KTM V3.5).

Case

By default, KTM uses ABBYY FineReader to perform full-text OCR; RecoStar and Kadmos are also available. If these engines do not deliver the required results and the project would benefit from an external OCR engine, this example project shows how to use one in KTM.

General

  • The following notes are important:
    • The example script can be used as is and creates the necessary items in the XDoc.
      All you have to do is replace the example CSV file, which stands in for the external OCR engine, with a call to a real OCR engine.
    • In the example, the external OCR engine is called in the Document_BeforeProcessXDoc event. Depending on what you want to do with the OCR results, it may make sense to call the external OCR engine at a different point in processing. Events that make sense are:
      • Document_BeforeProcessXDoc
      • Document_BeforeClassifyText
      • Document_BeforeClassifyXDoc
      • Document_BeforeExtract
      • Document_AfterClassifyImage
      • Document_AfterClassifyXDoc
    • See the Script Documentation for more information.
  • After adding the words from the results of the external OCR engine, it is important to analyse the lines of the representation and commit the results to the XDoc.
    Representation.AnalyzeLines automatically organises the words on each page and creates lines in the XDoc. Lines are important for locator methods.
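
The notes above can be sketched as follows. This is a minimal illustration, not the event handler shipped with the example project. It assumes that Representations.Create appends a new, empty representation to the XDoc, and it uses a made-up representation name ("ExternalOCR") and a single hard-coded word in place of real engine output. It relies on the WordStructure type and the AddWordFromExternalOCREngine subroutine from this article.

```vb
' Hypothetical sketch of the overall flow: create a representation,
' add words from the external engine, then analyse the lines.
Private Sub Document_BeforeProcessXDoc(ByVal pXDoc As CASCADELib.CscXDocument)
   Dim oRep As CscXDocRepresentation
   Dim tWord As WordStructure

   ' Assumption: Create appends a new, empty representation,
   ' which is then the last item in the collection.
   pXDoc.Representations.Create "ExternalOCR"
   Set oRep = pXDoc.Representations(pXDoc.Representations.Count - 1)

   ' One hard-coded word stands in for a result of the external engine.
   ' All values are invented for illustration.
   tWord.sText = "Invoice"
   tWord.lLeft = 100
   tWord.lTop = 200
   tWord.lWidth = 180
   tWord.lHeight = 40
   tWord.lPageIndex = 0
   AddWordFromExternalOCREngine oRep, tWord

   ' Build lines from the added words so that locators can use them.
   oRep.AnalyzeLines
End Sub
```

If a different event is used, only the event signature changes; the order (add all words first, then call AnalyzeLines once) stays the same.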

Components

The project does not make use of any locator methods, apart from a Format Locator for demonstration purposes.

  • Project Script
  • Type
  • Enum
  • Event Document_BeforeProcessXDoc
  • Sub AddWordFromExternalOCREngine

Some helpful objects

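' Project classification script
Option Explicit

' One word as delivered by the external OCR engine
Type WordStructure
   sText As String
   lLeft As Long
   lTop As Long
   lWidth As Long
   lHeight As Long
   lPageIndex As Long
End Type

' Field positions of an external OCR result
' (presumably the column order in the test CSV file)
Enum enumWordStructure
   eTop = 0
   eLeft = 1
   eWidth = 2
   eHeight = 3
   ePageIndex = 4
   eText = 5
End Enum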

Edit the path to the test CSV file, which acts as the virtual external OCR engine

Document_BeforeProcessXDoc.txt
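
As a rough sketch of how the test CSV file might be read, the following helpers convert each line into a WordStructure and pass it to AddWordFromExternalOCREngine. The helper names ParseWordLine and AddWordsFromCsv are hypothetical and not part of the example project. The sketch assumes comma-separated columns in the order given by enumWordStructure, no header row, and no commas inside the word text; adjust these to the actual file layout.

```vb
' Hypothetical helper: fills tWord from one CSV line.
' Assumption: columns are comma-separated and ordered as in enumWordStructure.
Private Sub ParseWordLine(ByVal sLine As String, ByRef tWord As WordStructure)
   Dim sCols() As String
   sCols = Split(sLine, ",")
   tWord.lTop = CLng(sCols(eTop))
   tWord.lLeft = CLng(sCols(eLeft))
   tWord.lWidth = CLng(sCols(eWidth))
   tWord.lHeight = CLng(sCols(eHeight))
   tWord.lPageIndex = CLng(sCols(ePageIndex))
   tWord.sText = sCols(eText)
End Sub

' Hypothetical helper: reads the CSV file line by line and adds every word
' to the given representation.
Private Sub AddWordsFromCsv(ByRef oRep As CscXDocRepresentation, ByVal sCsvPath As String)
   Dim iFile As Integer
   Dim sLine As String
   Dim tWord As WordStructure

   iFile = FreeFile
   Open sCsvPath For Input As #iFile
   Do While Not EOF(iFile)
      Line Input #iFile, sLine
      ' Skip empty lines, for example a trailing line break at the end of the file.
      If Trim(sLine) <> "" Then
         ParseWordLine sLine, tWord
         AddWordFromExternalOCREngine oRep, tWord
      End If
   Loop
   Close #iFile
End Sub
```

Keeping the parsing in its own subroutine means that only ParseWordLine has to change when the CSV file is replaced by the output of a real OCR engine.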

Subroutine for adding words

' Creates a CscXDocWord from one external OCR result and adds it to the
' page of the representation given by lPageIndex.
Private Sub AddWordFromExternalOCREngine(ByRef XDocRep As CscXDocRepresentation, _
                                         ByRef tWordStructure As WordStructure)
   Dim oWord As New CscXDocWord
   ' Copy text and position from the external result
   oWord.Text = tWordStructure.sText
   oWord.Top = tWordStructure.lTop
   oWord.Left = tWordStructure.lLeft
   oWord.Width = tWordStructure.lWidth
   oWord.Height = tWordStructure.lHeight
   oWord.PageIndex = tWordStructure.lPageIndex
   '# Add a word to the representation
   XDocRep.Pages(tWordStructure.lPageIndex).AddWord oWord
   Set oWord = Nothing
End Sub

Applies to:

Product   Version   Category
AXPRO     3.5       Configuration
AXPRO     4.0       Configuration
AXPRO     4.5       Configuration
AXPRO     5.0       Configuration
AXPRO     5.5       Configuration
