Question / Problem:
Documents are being automatically rotated in the wrong direction by OCR. There is some sideways text, but there is much more text in the correct direction — why is it choosing this orientation?
Answer / Solution:
When the "Automatic Rotation" feature is enabled, the OCR engines in KTM will perform a "quick" OCR on a few lines in each direction to see which is the first acceptable orientation.
On some documents, sideways text sits closer to the edge of the image than the main text (fax sheets are a common example), and the OCR engine rotates the page to the orientation of that sideways text.
This is an unfortunate side effect of a reasonable design: the engine is optimized for speed, so it cannot afford to recognize enough text lines to learn the relative proportions of each orientation, and it only analyzes the lines close to the edge.
Documents with such layouts fall into the small proportion of images that are mis-rotated as a result, and they should be considered an edge case.
The workaround is to use a script to remove a small border from the image (in memory), forcing the automatic rotation to look deeper in the image, where there is more good text than sideways text.
The example code below can be added to the Project script.
NOTE: This will only work at runtime, not in Project Builder testing.
Private Sub Document_BeforeClassifyXDoc(ByVal pXDoc As CASCADELib.CscXDocument, ByRef bSkip As Boolean)
   Dim oImage As CscImage
   Dim lMargin As Long

   lMargin = 100

   'Get current image for page 1
   Set oImage = pXDoc.CDoc.Pages(0).GetImage()

   'Erase a margin around the edge of the image
   oImage.EraseRect 0, 0, lMargin, oImage.Height
   oImage.EraseRect oImage.Width-lMargin, 0, lMargin, oImage.Height
   oImage.EraseRect 0, 0, oImage.Width, lMargin
   oImage.EraseRect 0, oImage.Height-lMargin, oImage.Width, lMargin

   'Clean up memory
   Set oImage = Nothing
End Sub