Kofax

Remove Vertical Text

QAID # 17779 Published

Question / Problem:

Remove Vertical Text

Answer / Solution:

Case

Vertical text printed on an invoice or other image can degrade the OCR result. The usual cause is that the vertical text interferes with text line segmentation. As a result, the Table Locator, especially when set up in manual mode, sometimes has difficulty finding the line items.

Solution

We can implement a Script Locator that is called in the locator sequence before any locator that depends on the text lines, such as the Table Locator. This Script Locator removes all words that lie within a fixed distance of the left page border and are narrower than a given width, and then calls an XDoc function that re-analyzes the text lines.

Here is the code we put in the class script:

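' extraction script for class Invoices
Private Sub DeleteVerticalText(ByVal pXDoc As CASCADELib.CscXDocument)

   Dim i As Integer
   Dim AnchorLeft As Integer
   Dim MaxWidth As Integer
   Dim oWord As CscXDocWord

   ' Tune both thresholds to your documents
   AnchorLeft = 145  ' right edge of the left margin strip
   MaxWidth = 29     ' words at least this wide are kept

   ' Iterate backwards so that removing a word does not shift the indexes still to be visited
   For i = pXDoc.Words.Count - 1 To 0 Step -1
      Set oWord = pXDoc.Words(i)

      ' Remove narrow words that lie entirely inside the left margin strip
      If oWord.Left + oWord.Width <= AnchorLeft And oWord.Width < MaxWidth Then
         pXDoc.Representations(0).Words.Remove(i)
      End If
   Next

   ' Re-analyze the text lines once, after all vertical words have been removed
   pXDoc.Representations(0).AnalyzeLines()

End Sub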
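
The routine above still has to be invoked from the Script Locator. A minimal sketch of that call follows, assuming the Script Locator is named SL_RemoveVerticalText (a hypothetical name; use the name of your own locator) and is placed ahead of the Table Locator in the locator sequence. The locator produces no alternatives of its own; it only cleans up the XDoc for the locators that follow.

```vb
' Hypothetical locator name: SL_RemoveVerticalText
Private Sub SL_RemoveVerticalText_LocateAlternatives(ByVal pXDoc As CASCADELib.CscXDocument, ByVal pLocator As CASCADELib.CscXDocField)

   ' Strip the vertical margin text before later locators read the text lines
   DeleteVerticalText(pXDoc)

End Sub
```

Because the cleanup modifies the words of the XDoc itself, every locator that runs afterwards sees the document without the vertical text.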

Download

Click this link to download the zipped example project: VerticalText.zip (Created with V4.0). 

Product: AXPRO
Version: 5.5
Category: Project Builder