Scripting - OCR - How to use results of an external OCR Engine
Article # 3036864
Issue
How to use an external OCR Engine in a project?
Case
By default, KTM uses OmniPage (6.3: ABBYY FineReader) to perform full-text OCR. In KTM 6.3, RecoStar and Kadmos were also available. If these OCR engines do not provide the required results and the project would benefit from an external OCR engine, this example project shows how to use one.
The following notes are important:
- The example script can be used as it is and it will create the necessary items in the XDoc. All you have to do is replace the CSV file example for the external OCR engine with a real OCR engine.
- In the example, the external OCR engine is called in the Document_BeforeProcessXDoc event. Depending on what you want to do with the OCR results, it may make sense to execute the external OCR engine at a different point in time. Other events that make sense are:
Document_BeforeProcessXDoc
Document_BeforeClassifyText
Document_BeforeClassifyXDoc
Document_BeforeExtract
Document_AfterClassifyImage
Document_AfterClassifyXDoc
- See the Script Documentation for more information.
- After adding the words from the results of the external OCR engine, it is important to analyze the lines of the representation and commit the results to the XDoc. Representation.AnalyzeLines automatically organizes the words on a page and creates lines in the XDoc. Lines are important for locator methods.
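As a rough illustration of the line-analysis step, the sketch below calls AnalyzeLines on the first representation once all external words have been added. The helper name RebuildTextLines is hypothetical and not part of the example project, and the sketch assumes the words were added to representation 0. The commit step is not shown because the article does not name the call for it.

```vb
' Sketch only: RebuildTextLines is a hypothetical helper, not part of the example project.
' Call it once, after every word from the external OCR engine has been added.
Private Sub RebuildTextLines(ByVal pXDoc As CASCADELib.CscXDocument)
   ' Nothing to analyze if no representation exists yet
   If pXDoc.Representations.Count = 0 Then Exit Sub
   ' Assumption: the external words were added to the first representation
   pXDoc.Representations(0).AnalyzeLines
End Sub
```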
Components
The project does not make use of any locator methods, apart from a Format Locator for demonstration purposes.
Project Script
Type
Enum
Event Document_BeforeProcessXDoc
Sub AddWordFromExternalOCREngine
Some helpful objects
' Project classification script
Option Explicit

Type WordStructure
   sText As String
   lLeft As Long
   lTop As Long
   lWidth As Long
   lHeight As Long
   lPageIndex As Long
End Type

Enum enumWordStructure
   eTop = 0
   eLeft = 1
   eWidth = 2
   eHeight = 3
   ePageIndex = 4
   eText = 5
End Enum
Edit the path to the test CSV file, which acts as the virtual external OCR engine:
Document_BeforeProcessXDoc.txt
Subroutine for adding words:
Private Sub AddWordFromExternalOCREngine(ByRef XDocRep As CscXDocRepresentation, ByRef tWordStructure As WordStructure)
   Dim oWord As New CscXDocWord

   oWord.Text = tWordStructure.sText
   oWord.Top = tWordStructure.lTop
   oWord.Left = tWordStructure.lLeft
   oWord.Width = tWordStructure.lWidth
   oWord.Height = tWordStructure.lHeight
   oWord.PageIndex = tWordStructure.lPageIndex

   '# Add a word to the representation
   XDocRep.Pages(tWordStructure.lPageIndex).AddWord (oWord)

   Set oWord = Nothing
End Sub
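To show how one result of the external engine might reach this subroutine, here is a minimal sketch that turns a single CSV line into a WordStructure. It is not the content of Document_BeforeProcessXDoc.txt. The helper names ParseCsvLine and AddSampleWord, the comma delimiter, the sample values, and the assumption that the columns follow the order of enumWordStructure are all illustrative. WordStructure, enumWordStructure, and AddWordFromExternalOCREngine are the ones defined in the project script.

```vb
' Hypothetical helper, not part of the example project.
' Assumes one word per line, comma-separated, with columns in the order of
' enumWordStructure (top, left, width, height, page index, text) and no
' commas inside the text column.
Private Sub ParseCsvLine(ByVal sLine As String, ByRef tWord As WordStructure)
   Dim sFields() As String

   sFields = Split(sLine, ",")

   ' The enum members serve as column indexes into the split line
   tWord.lTop = CLng(Trim(sFields(eTop)))
   tWord.lLeft = CLng(Trim(sFields(eLeft)))
   tWord.lWidth = CLng(Trim(sFields(eWidth)))
   tWord.lHeight = CLng(Trim(sFields(eHeight)))
   tWord.lPageIndex = CLng(Trim(sFields(ePageIndex)))
   tWord.sText = sFields(eText)
End Sub

' Hypothetical usage: add one made-up word to the given representation
Private Sub AddSampleWord(ByRef XDocRep As CscXDocRepresentation)
   Dim tWord As WordStructure

   ParseCsvLine "120,85,240,32,0,Invoice", tWord
   AddWordFromExternalOCREngine XDocRep, tWord
End Sub
```

A real integration would replace the hard-coded sample line with the output of the external OCR engine and would need to handle text that itself contains the delimiter.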
Level of Complexity
High
Applies to
Product | Version | Build | Environment | Hardware |
---|---|---|---|---|
Kofax Transformation Modules | 6.3, 6.4 | | | |