*** Archive *** Classification by Graphical Lines
QAID # 17332 Published
Question / Problem:
Classification by Graphical Lines
Answer / Solution:
Some documents cannot be classified by layout or by content because they have no typical layout or content. An example is ECG paper (millimeter grid lines).
This script sample shows how to classify a document by graphical lines. This can be useful for scanned charts or printed diagrams that have a grid pattern as a background. The script counts the vertical and horizontal lines on the image and compares these counts against thresholds to make the classification decision. In the sample, a page qualifies if it contains more than 8 horizontal and more than 8 vertical lines, each at least 40 mm long.
Note: The following reference has to be set: Kofax Cascade Forms Processing 2.0. If you use KTM 3.5 or newer, you may need to remove all CASCADELib objects (just remove them).
The DetectGraphicLines function works on the first 3 pages of a document. Internally, it calls DetectGraphicLinesOnPage, which works on a single page.
Starting with version 5.5, the confidence can be specified as the second parameter of the Reclassify method call.
    Private Function DetectGraphicLines(pXDoc As CASCADELib.CscXDocument) As Boolean
       Dim i As Long
       Dim count As Long
       Dim bResult As Boolean

       ' search for hor. and vertical lines on the first 3 pages only
       count = pXDoc.CDoc.Pages.Count
       If count > 3 Then
          count = 3
       End If

       For i = 0 To count - 1
          ' if we detect enough graphic lines on any of the first 3 pages, return TRUE
          bResult = DetectGraphicLinesOnPage(pXDoc.CDoc.Pages(i).GetImage())
          If bResult = True Then
             DetectGraphicLines = True
             Exit Function
          End If
       Next i

       DetectGraphicLines = False
    End Function

    Private Function DetectGraphicLinesOnPage(pImage As CscImage) As Boolean
       ' counts horizontal and vertical lines on a page
       ' this is used to detect class "Zeichnungen"
       Dim pLinesDetection As CscLinesDetection
       Dim xLeft As Long
       Dim xWidth As Long
       Dim yTop As Long
       Dim yHeight As Long

       ' check color format
       If pImage.BitsPerSample <> 1 Or pImage.SamplesPerPixel <> 1 Then
          DetectGraphicLinesOnPage = False
          Exit Function
       End If

       Set pLinesDetection = New CscLinesDetection

       ' set up parameters for lines detection
       pLinesDetection.DetectHorCombs = False
       pLinesDetection.DetectHorDotLines = False
       pLinesDetection.DetectHorLines = True
       pLinesDetection.DetectVerLines = True
       pLinesDetection.MinHorLineLenMM = 40
       pLinesDetection.MinVerLineLenMM = 40

       ' start lines detection, skip a border of 5%
       xLeft = pImage.Width * 0.05
       xWidth = pImage.Width * 0.9
       yTop = pImage.Height * 0.05
       yHeight = pImage.Height * 0.9
       pLinesDetection.DetectLines pImage, xLeft, yTop, xWidth, yHeight

       ' we require more than 8 hor. and vertical lines to return TRUE
       If (pLinesDetection.HorLineCount > 8 And pLinesDetection.VerLineCount > 8) Then
          DetectGraphicLinesOnPage = True
       Else
          DetectGraphicLinesOnPage = False
       End If
    End Function
The following sample shows how the function can be called. It assumes that "Charts" is a valid class name in the current project.
    Private Sub Document_AfterClassifyXDoc(pXDoc As CASCADELib.CscXDocument)
       If DetectGraphicLines(pXDoc) = True Then
          pXDoc.Reclassify("Charts")
          Exit Sub
       End If
       ...
    End Sub
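For version 5.5 and newer, the call might be extended with a confidence as the second parameter of Reclassify, as mentioned above. The following is only a minimal sketch: the value 0.9 is illustrative, and the assumption that the confidence is given on a 0.0 to 1.0 scale should be checked against the scripting help of the installed version.

```vb
Private Sub Document_AfterClassifyXDoc(pXDoc As CASCADELib.CscXDocument)
   If DetectGraphicLines(pXDoc) = True Then
      ' Second argument is the confidence (available starting with 5.5).
      ' 0.9 is an illustrative value; the 0.0 to 1.0 scale is an assumption.
      pXDoc.Reclassify "Charts", 0.9
      Exit Sub
   End If
End Sub
```

A confidence below the maximum may be a reasonable choice here, because the decision rests only on line counts and not on layout or content.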
Applies to:
Product | Version | Category |
---|---|---|
AXPRO | 3.5 | Classification |
AXPRO | 4.0 | Classification |
AXPRO | 4.5 | Classification |
AXPRO | 5.0 | Classification |
AXPRO | 5.5 | Classification |