Classification by Blackness
QAID # 17325 Published
Question / Problem:
Classification by Blackness
Answer / Solution:
Some documents cannot be classified by layout or by content because they have neither a typical layout nor typical content. This script sample shows how to classify such a document by blackness instead. This can be useful for scanned photos, which normally appear very dark in the scanned image. The script measures the blackness of a grid of tiles on each page and requires that several tiles be dark, so the dark regions must be spread over the page. This avoids conflicts with logos or graphical elements, which also produce dark regions but only in a single place on the image.
The DetectBlackImage function works on the first three pages of a document. Internally, it calls DetectBlackImageOnPage, which works on a single page.
Starting with version 5.5, the confidence can be specified as the second parameter of the Reclassify method call.
    Private Function DetectBlackImage(pXDoc As CASCADELib.CscXDocument) As Boolean
       Dim i As Long
       Dim count As Long
       Dim bResult As Boolean

       ' search for photos on the first 3 pages only
       count = pXDoc.CDoc.Pages.Count
       If count > 3 Then
          count = 3
       End If

       For i = 0 To count - 1
          bResult = DetectBlackImageOnPage(pXDoc.CDoc.Pages(i).GetImage())
          If bResult = True Then
             DetectBlackImage = True
             Exit Function
          End If
       Next i

       DetectBlackImage = False
    End Function

    Private Function DetectBlackImageOnPage(pImage As CscImage) As Boolean
       ' detects dark regions on a page
       ' this is used to detect class "Foto"
       Dim TileWidth As Long
       Dim TileHeight As Long
       Dim XStart As Long
       Dim YStart As Long
       Dim x As Long
       Dim y As Long
       Dim dBlackness As Double
       Dim BlackTileCount As Long

       ' divide the image in 5 * 7 tiles (ignoring 1/2 tile as border)
       ' we have to check 4 * 6 tiles
       TileWidth = pImage.Width / 5
       TileHeight = pImage.Height / 7
       YStart = TileHeight / 2
       BlackTileCount = 0

       For y = 0 To 5
          XStart = TileWidth / 2
          For x = 0 To 3
             dBlackness = pImage.GetBlackness(XStart, YStart, TileWidth, TileHeight)
             If dBlackness > 0.4 Then
                BlackTileCount = BlackTileCount + 1
             End If
             XStart = XStart + TileWidth
          Next x
          YStart = YStart + TileHeight
       Next y

       If BlackTileCount > 3 Then
          DetectBlackImageOnPage = True
       Else
          DetectBlackImageOnPage = False
       End If
    End Function
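The two thresholds in DetectBlackImageOnPage (a tile blackness above 0.4 and more than 3 dark tiles) may need adjusting for a particular scanner or document set. The following is a minimal sketch of a diagnostic helper for that purpose. LogTileBlackness is a hypothetical name and is not part of the original sample, and the sketch assumes that Debug.Print output is visible in the script editor. It reuses only the GetBlackness call and the grid arithmetic from the function above.

```vb
' Hypothetical tuning helper, not part of the original sample.
' Prints the blackness of every inspected tile so that the 0.4 tile
' threshold and the "more than 3 tiles" rule can be compared with
' real scans. Uses the same 4 * 6 grid as DetectBlackImageOnPage.
Private Sub LogTileBlackness(pImage As CscImage)
   Dim TileWidth As Long
   Dim TileHeight As Long
   Dim XStart As Long
   Dim YStart As Long
   Dim x As Long
   Dim y As Long
   Dim dBlackness As Double
   Dim BlackTileCount As Long

   TileWidth = pImage.Width / 5
   TileHeight = pImage.Height / 7
   YStart = TileHeight / 2
   BlackTileCount = 0

   For y = 0 To 5
      XStart = TileWidth / 2
      For x = 0 To 3
         dBlackness = pImage.GetBlackness(XStart, YStart, TileWidth, TileHeight)
         ' Assumption: Debug.Print writes to the script editor's output window
         Debug.Print "Tile (" & x & "," & y & "): " & Format(dBlackness, "0.00")
         If dBlackness > 0.4 Then
            BlackTileCount = BlackTileCount + 1
         End If
         XStart = XStart + TileWidth
      Next x
      YStart = YStart + TileHeight
   Next y

   Debug.Print "Dark tiles: " & BlackTileCount & " of 24"
End Sub
```

Calling it with pXDoc.CDoc.Pages(0).GetImage() on a few known photos and a few known text pages should show whether the two groups separate cleanly at the current thresholds.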
Below is a sample of how the function can be called. It assumes that "Pictures" is a valid class name in the current project.
    Private Sub Document_AfterClassifyXDoc(pXDoc As CASCADELib.CscXDocument)
       If DetectBlackImage(pXDoc) = True Then
          pXDoc.Reclassify("Pictures")
          Exit Sub
       End If
       ...
    End Sub
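For version 5.5 and later, the same call can pass a confidence as the second parameter of Reclassify. The sketch below is an illustration only: the value 0.9 is an arbitrary example, and the assumption that the confidence is expressed on a scale from 0 to 1 should be checked against the product documentation.

```vb
Private Sub Document_AfterClassifyXDoc(pXDoc As CASCADELib.CscXDocument)
   If DetectBlackImage(pXDoc) = True Then
      ' Assumption: confidence is a value between 0 and 1.
      ' 0.9 is an arbitrary example, not a recommended setting.
      pXDoc.Reclassify "Pictures", 0.9
      Exit Sub
   End If
End Sub
```

The call is written without parentheses because a Basic method invoked as a statement with more than one argument does not take a parenthesized argument list unless it is prefixed with Call.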
Applies to:
Product | Version | Category |
---|---|---|
AXPRO | 5.5 | Classification |