External OCR Engine

By default, KTM uses ABBYY FineReader for full-text OCR; RecoStar and Kadmos are also available. If these engines do not deliver the required results and the project would benefit from an external OCR engine, this example project shows how to use one in KTM.


  • The following notes are important:
    • The example script can be used as is and will create the necessary items in the XDoc.
      All you have to do is replace the example CSV file, which stands in for the external OCR engine, with a real OCR engine.
    • In the example, the external OCR engine is called in the Document_BeforeProcessXDoc event. Depending on what you want to do with the OCR results, it may make sense to execute the external OCR engine at a different point in time. Suitable events are:
      • Document_BeforeProcessXDoc
      • Document_BeforeClassifyText
      • Document_BeforeClassifyXDoc
      • Document_BeforeExtract
      • Document_AfterClassifyImage
      • Document_AfterClassifyXDoc
    • See the Script Documentation for more information.
  • After adding the words from the results of the external OCR engine, it is important to analyse the lines of the representation and commit the results to the XDoc.
    Representation.AnalyzeLines automatically organises the words on a page and creates lines in the XDoc. Lines are important for locator methods.



The project does not make use of any locator methods, apart from a Format Locator for demonstration purposes.

  • Project Script
  • Type
  • Enum
  • Event Document_BeforeProcessXDoc
  • Sub AddWordFromExternalOCREngine


Some helpful objects 
' Project classification script
Option Explicit

' One word as delivered by the external OCR engine
Type WordStructure
   sText As String
   lLeft As Long
   lTop As Long
   lWidth As Long
   lHeight As Long
   lPageIndex As Long
End Type

' Field indices for a word record
Enum enumWordStructure
   eTop = 0
   eLeft = 1
   eWidth = 2
   eHeight = 3
   ePageIndex = 4
   eText = 5
End Enum

Edit the path to the test CSV file, which acts as the virtual external OCR engine.
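
As a minimal sketch of this step, the event below reads the CSV file line by line and passes each word to AddWordFromExternalOCREngine. The path, the semicolon delimiter, the column order (taken from enumWordStructure), and the representation name ExternalOCR are assumptions for illustration. Representations.Create is assumed to return the new, empty representation, so check the Script Documentation before relying on it.

```vb
' Sketch only: path, delimiter and representation name are assumptions,
' not values taken from the example project.
Private Const CSV_PATH As String = "C:\Temp\ExternalOCR.csv" ' hypothetical path

Private Sub Document_BeforeProcessXDoc(ByVal pXDoc As CASCADELib.CscXDocument)
   Dim oRep As CscXDocRepresentation
   Dim tWord As WordStructure
   Dim sLine As String
   Dim sFields() As String
   Dim iFile As Integer

   ' Assumed call: create an empty representation for the external words
   Set oRep = pXDoc.Representations.Create("ExternalOCR")

   iFile = FreeFile
   Open CSV_PATH For Input As #iFile
   Do While Not EOF(iFile)
      Line Input #iFile, sLine
      If Trim(sLine) <> "" Then
         ' Assumed layout: top;left;width;height;pageIndex;text
         sFields = Split(sLine, ";")
         tWord.lTop = CLng(sFields(eTop))
         tWord.lLeft = CLng(sFields(eLeft))
         tWord.lWidth = CLng(sFields(eWidth))
         tWord.lHeight = CLng(sFields(eHeight))
         tWord.lPageIndex = CLng(sFields(ePageIndex))
         tWord.sText = sFields(eText)
         AddWordFromExternalOCREngine oRep, tWord
      End If
   Loop
   Close #iFile

   ' Group the added words into lines so that locators can use them
   oRep.AnalyzeLines
End Sub
```

With this layout, a word whose text itself contains a semicolon would be truncated at that semicolon, so a real integration would need a delimiter that cannot occur in the recognised text, or a proper CSV parser.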


Subroutine for adding words

Private Sub AddWordFromExternalOCREngine(ByRef XDocRep As CscXDocRepresentation, _
                                         ByRef tWordStructure As WordStructure)
   Dim oWord As New CscXDocWord

   '# Copy text and position from the structure to the new word
   oWord.Text = tWordStructure.sText
   oWord.Top = tWordStructure.lTop
   oWord.Left = tWordStructure.lLeft
   oWord.Width = tWordStructure.lWidth
   oWord.Height = tWordStructure.lHeight
   oWord.PageIndex = tWordStructure.lPageIndex

   '# Add the word to the matching page of the representation
   XDocRep.Pages(tWordStructure.lPageIndex).AddWord oWord
   Set oWord = Nothing
End Sub


