QAID # 17779 Published
Question / Problem:
Remove Vertical Text
Answer / Solution:
Vertical text printed on an invoice or other image can degrade the OCR result. Usually the problem is that the vertical text prevents good text line segmentation. As a result, the Table Locator sometimes has difficulty finding the line items, especially when it is set up in manual mode.
One workaround is a Script Locator that is called at a suitable point in the locator sequence. This Script Locator removes all words near the left border that are narrower than a certain width and then calls an XDoc function that re-analyzes the text lines.
Here is the code we put in the class script:
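' extraction script for class Invoices
Private Sub DeleteVerticalText(ByVal pXDoc As CASCADELib.CscXDocument)
   Dim i As Integer
   Dim AnchorLeft As Integer
   ' Right edge of the left-border strip; document-specific value
   AnchorLeft = 145
   ' Iterate backwards so that removing a word does not shift the indexes still to be visited
   For i = pXDoc.Words.Count - 1 To 0 Step -1
      Dim oWord As CscXDocWord
      Set oWord = pXDoc.Words(i)
      ' Remove words that lie entirely inside the strip and are narrower than 29
      If oWord.Left + oWord.Width <= AnchorLeft And oWord.Width < 29 Then
         pXDoc.Representations(0).Words.Remove(i)
      End If
   Next
   ' Rebuild the text lines from the remaining words
   pXDoc.Representations(0).AnalyzeLines()
End Sub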
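The routine above still has to be invoked from the Script Locator itself. The following is a minimal sketch of that call, assuming the Script Locator is named SL_RemoveVerticalText. That name is hypothetical; the event sub is named after whatever the locator is called in your project.

```vb
' Sketch only: "SL_RemoveVerticalText" is a hypothetical locator name (an assumption).
' Rename the sub to match the Script Locator defined in your own project.
Private Sub SL_RemoveVerticalText_LocateAlternatives(ByVal pXDoc As CASCADELib.CscXDocument, ByVal pLocator As CASCADELib.CscXDocField)
   ' Remove the narrow words on the left border and re-analyze the text lines
   DeleteVerticalText pXDoc
End Sub
```

For the cleanup to help the Table Locator, this Script Locator presumably needs to sit before the Table Locator in the locator sequence.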
A zipped example project is available for download: VerticalText.zip (created with V4.0).