Classification by Graphical Lines


QAID # 17332 Published

Question / Problem:

How can a document be classified by the graphical lines it contains?

Answer / Solution:

Some documents cannot be classified by layout or by content because they have no characteristic layout or text. An example is ECG paper, which consists mainly of millimeter lines.

This script sample shows how to classify a document by graphical lines. This can be useful for scanned charts or printed diagrams that have a grid pattern as a background. The script counts the vertical and horizontal lines on the image and compares both counts against a threshold to make the classification decision. In this sample, a page qualifies when it contains more than 8 horizontal and more than 8 vertical lines, each at least 40 mm long.

Note: A reference to Kofax Cascade Forms Processing 2.0 must be set. With KTM 3.5 and newer, you may need to remove all CASCADELib objects.

The DetectGraphicLines function works on the first 3 pages of a document. Internally, it calls DetectGraphicLinesOnPage, which works on a single page.

Starting with version 5.5, the confidence can be specified as the second parameter of the Reclassify method call.

Private Function DetectGraphicLines(pXDoc As CASCADELib.CscXDocument) As Boolean
    Dim i As Long
    Dim count As Long
    Dim bResult As Boolean
    ' search for hor. and vertical lines on the first 3 pages only
    count = pXDoc.CDoc.Pages.Count
    If count > 3 Then
        count = 3
    End If
    For i = 0 To count - 1
        ' if we detect enough graphic lines on any of the first 3 pages, return TRUE
        bResult = DetectGraphicLinesOnPage(pXDoc.CDoc.Pages(i).GetImage())
        If bResult = True Then
            DetectGraphicLines = True
            Exit Function
        End If
    Next i
    DetectGraphicLines = False
End Function

Private Function DetectGraphicLinesOnPage(pImage As CscImage) As Boolean
    ' counts horizontal and vertical lines on a page
    ' this was originally used to detect the class "Zeichnungen" (German for "drawings")
    Dim pLinesDetection As CscLinesDetection
    Dim xLeft As Long
    Dim xWidth As Long
    Dim yTop As Long
    Dim yHeight As Long
    ' check color format
    If pImage.BitsPerSample <> 1 Or pImage.SamplesPerPixel <> 1 Then
        DetectGraphicLinesOnPage = False
        Exit Function
    End If
    Set pLinesDetection = New CscLinesDetection
    ' set up parameters for lines detection
    pLinesDetection.DetectHorCombs = False
    pLinesDetection.DetectHorDotLines = False
    pLinesDetection.DetectHorLines = True
    pLinesDetection.DetectVerLines = True
    pLinesDetection.MinHorLineLenMM = 40
    pLinesDetection.MinVerLineLenMM = 40
    ' start lines detection, skip a border of 5%
    xLeft = pImage.Width * 0.05
    xWidth = pImage.Width * 0.9
    yTop = pImage.Height * 0.05
    yHeight = pImage.Height * 0.9
    pLinesDetection.DetectLines pImage, xLeft, yTop, xWidth, yHeight
    ' we require more than 8 hor. and more than 8 vertical lines to return TRUE
    If (pLinesDetection.HorLineCount > 8 And pLinesDetection.VerLineCount > 8) Then
        DetectGraphicLinesOnPage = True
    Else
        DetectGraphicLinesOnPage = False
    End If
End Function
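
The line-count threshold is hard-coded in DetectGraphicLinesOnPage. The following is a minimal sketch of moving the decision into a small helper so that the threshold can be tuned in one place. The names MIN_LINE_COUNT and IsGridPattern are hypothetical and not part of the original script.

```vb
' Hypothetical constant: a page needs MORE than this many lines in each direction
Private Const MIN_LINE_COUNT As Long = 8

' Hypothetical helper: returns True only when both line counts exceed the threshold
Private Function IsGridPattern(horCount As Long, verCount As Long) As Boolean
    IsGridPattern = (horCount > MIN_LINE_COUNT) And (verCount > MIN_LINE_COUNT)
End Function
```

With such a helper, the final decision in DetectGraphicLinesOnPage could be written as a single assignment: DetectGraphicLinesOnPage = IsGridPattern(pLinesDetection.HorLineCount, pLinesDetection.VerLineCount).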

Below is a sample of how the function can be called. It assumes that "Charts" is a valid class name in the current project.

Private Sub Document_AfterClassifyXDoc(pXDoc As CASCADELib.CscXDocument)
    If DetectGraphicLines(pXDoc) = True Then
        ' enough graphic lines found: assign the document to the class "Charts"
        pXDoc.Reclassify "Charts"
        Exit Sub
    End If
End Sub
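
As noted above, from version 5.5 on the confidence can be passed as the second parameter of Reclassify. The following is a minimal sketch of such a call, wrapped in a helper that can be called from the event handler. The helper name ReclassifyIfGridLines and the value 0.9 are illustrative assumptions, and so is the 0 to 1 confidence scale; check the scripting reference of your version before relying on it.

```vb
' Hypothetical helper for version 5.5 and newer: reclassify with an explicit confidence
Private Sub ReclassifyIfGridLines(pXDoc As CASCADELib.CscXDocument)
    ' Illustrative value only; the 0..1 scale is an assumption
    Const CHART_CONFIDENCE As Double = 0.9

    If DetectGraphicLines(pXDoc) Then
        ' The second parameter is the confidence of the new classification result
        pXDoc.Reclassify "Charts", CHART_CONFIDENCE
    End If
End Sub
```

An explicit confidence below the maximum can help distinguish a reclassification that rests only on the line count from a regular classification result.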

Applies to:

Product   Version   Category
AXPRO     3.5       Classification
AXPRO     4.0       Classification
AXPRO     4.5       Classification
AXPRO     5.0       Classification
AXPRO     5.5       Classification