Remove Vertical Text

Vertical text printed on an invoice or other image can degrade the OCR result. The usual cause is that the vertical text prevents good text line segmentation. As a result, the Table Locator, especially when set up in manual mode, sometimes has difficulty finding the line items.


We can implement a Script Locator that is called somewhere in the locator sequence. This Script Locator removes all words at the left border that are narrower than a certain width and then calls an XDoc function that re-analyzes the text lines.

Here is the code we put in the class script:

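' extraction script for class Invoices
Private Sub DeleteVerticalText(ByVal pXDoc As CASCADELib.CscXDocument)

   Dim i As Integer
   Dim AnchorLeft As Integer

   ' Words whose right edge lies at or left of this x position count as left-border words
   AnchorLeft = 145

   ' Iterate backwards so that removing a word does not shift the indexes still to be visited
   For i = pXDoc.Words.Count - 1 To 0 Step -1
      Dim oWord As CscXDocWord

      Set oWord = pXDoc.Words(i)

      ' Narrow word (width below 29) entirely inside the left border strip: treat it as vertical text
      If oWord.Left + oWord.Width <= AnchorLeft And oWord.Width < 29 Then
         ' Assumption: the removal call is missing from this listing; Words.Remove(index) is a guess, check the scripting help
         pXDoc.Words.Remove(i)
      End If
   Next i

   ' Assumption: AnalyzeLines on the first representation is the line re-analysis mentioned above; verify before use
   pXDoc.Representations(0).AnalyzeLines

End Sub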
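
To run the routine, call it from the Script Locator's event handler. The following is a minimal sketch, assuming the Script Locator is named SL_RemoveVerticalText (a hypothetical name; use the locator name from your own project) and that it sits before the Table Locator in the locator sequence:

' Hypothetical locator name; the handler is named <YourLocatorName>_LocateAlternatives
Private Sub SL_RemoveVerticalText_LocateAlternatives(ByVal pXDoc As CASCADELib.CscXDocument, ByVal pLocator As CASCADELib.CscXDocField)

   ' Clean up the word list so that later locators work on the re-analyzed text lines
   DeleteVerticalText pXDoc

End Sub

In this sketch the locator returns no alternatives of its own; it only modifies the XDoc as a side effect for the locators that follow.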


A zipped example project is available for download (created with V4.0).

Product: AXPRO
Version: 5.5
Category: Project Builder
