Classification by Blackness


QAID # 17325 Published

Question / Problem:

Answer / Solution:

Some documents cannot be classified by layout or by content because they have no typical layout or content. This script sample shows how to classify such documents by blackness instead. This can be useful for scanned photos, which normally appear very dark in the scanned image. The script measures the average blackness of tiles across the page and requires several of them to be dark, so the dark regions must be distributed over the document. This avoids conflicts with logos or other graphical elements, which also produce dark regions, but only in a single area of the image.

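The DetectBlackImage function checks the first three pages of a document (or fewer, if the document is shorter) and returns True as soon as one of them is detected as dark. Internally, it calls DetectBlackImageOnPage, which works on a single page image.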

Starting with version 5.5, the confidence can be passed as the second parameter of the Reclassify method.

Private Function DetectBlackImage(pXDoc As CASCADELib.CscXDocument) As Boolean
    Dim i As Long
    Dim count As Long
    Dim bResult As Boolean
    ' search for photos on the first 3 pages only
    count = pXDoc.CDoc.Pages.Count
    If count > 3 Then
        count = 3
    End If
    For i = 0 To count - 1
        bResult = DetectBlackImageOnPage(pXDoc.CDoc.Pages(i).GetImage())
        If bResult = True Then
            DetectBlackImage = True
            Exit Function
        End If
    Next i
    DetectBlackImage = False
End Function

Private Function DetectBlackImageOnPage(pImage As CscImage) As Boolean
    ' detects dark regions on a page
    ' this is used to detect class "Foto"
    Dim TileWidth As Long
    Dim TileHeight As Long
    Dim XStart As Long
    Dim YStart As Long
    Dim x As Long
    Dim y As Long
    Dim dBlackness As Double
    Dim BlackTileCount As Long
    ' divide the image into 5 * 7 tiles; ignoring half a tile as border on each side,
    ' we check the inner 4 * 6 = 24 tiles
    TileWidth = pImage.Width / 5

    TileHeight = pImage.Height / 7

    YStart = TileHeight / 2
    BlackTileCount = 0
    For y = 0 To 5
        XStart = TileWidth / 2
        For x = 0 To 3
            dBlackness = pImage.GetBlackness(XStart, YStart, TileWidth, TileHeight)
            If dBlackness > 0.4 Then
                BlackTileCount = BlackTileCount + 1
            End If
            XStart = XStart + TileWidth
        Next x
        YStart = YStart + TileHeight
    Next y
    ' the page counts as a photo if at least 4 of the 24 tiles are dark
    If BlackTileCount > 3 Then
        DetectBlackImageOnPage = True
    Else
        DetectBlackImageOnPage = False
    End If
End Function
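The tile geometry used by DetectBlackImageOnPage can be summarized as follows. This is a worked description of the script above, not an additional product feature. It assumes that GetBlackness returns the dark fraction of the given rectangle as a value between 0 and 1; the 0.4 threshold suggests this, but the article does not state it.

```latex
\begin{aligned}
w &= W/5, \qquad h = H/7 \\
(x_i,\, y_j) &= \bigl(\tfrac{w}{2} + i\,w,\; \tfrac{h}{2} + j\,h\bigr), \quad i = 0,\dots,3,\; j = 0,\dots,5 \\
\text{sampled area} &= [\,W/10,\; 9W/10\,] \times [\,H/14,\; 13H/14\,] \\
\text{page is dark} &\iff \#\{(i,j) : b_{ij} > 0.4\} > 3
\end{aligned}
```

Here W and H are the page width and height in pixels, (x_i, y_j) is the top-left corner of a tile, and b_ij is the blackness of that tile. For an A4 page scanned at 300 dpi (2480 x 3508 pixels), w = 496 and h = 501 after assignment to a Long, so the 24 tiles cover x = 248 to 2232 and roughly y = 250 to 3256. At least 4 of the 24 tiles, one sixth of the sampled area, must be dark before the page is flagged.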

Below is a sample of how DetectBlackImage can be called from the Document_AfterClassifyXDoc event. This sample assumes that "Pictures" is a valid class name in the actual project.

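Private Sub Document_AfterClassifyXDoc(pXDoc As CASCADELib.CscXDocument)
    ' reclassify dark, photo-like documents into the class "Pictures"
    If DetectBlackImage(pXDoc) = True Then
        ' with version 5.5 and later, a confidence can be passed as second parameter
        pXDoc.Reclassify "Pictures"
    End If
End Sub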

Applies to:

Product    Version    Category
AXPRO      5.5        Classification
