Kofax

PDF File with Text Layer Causes StructuredException Error with Format Locator in KTA

3027106

Question / Problem: 

A StructuredException error occurs when a format locator is tested against a PDF file that has a text layer. The full error message is:

Execution of activity Extraction in job 'XXXXX' on Kofax Transformation Server on 'XXXXX' 
was aborted due to the following error(s): Error from ExtractionProcess.exe:
BeforeExtraction: BeforeExtract: The execution of a locator method failed. Class = "XXX", Locator = "XXX",
Original error message:
StructuredException. The thread tried to read from or write to a virtual address for which it does not have
the appropriate access.
See Kofax Transformation Server log for details.

Click here to see the format locator configuration that will produce the error when tested. Change the search text to text that actually exists on the document.

 

Answer / Solution: 

This error is caused by an issue in the third-party engine (Pdfium) used to interpret the PDF text layer of documents. The engine leaves some unreadable words in the text layer, which can cause certain locators to fail.

This is not considered a Kofax product issue, because it originates in the third-party engine used to read the PDF text layer. A bug report for this behavior has been submitted to the developers for review under the following reference:

Bug 1282089: DB Locator fails with error "The thread tried to read from or write to a virtual address for which it does not have the appropriate access"

Enabling the "Ignore PDF text layer" option in the TotalAgility process prevents the error. If the PDF text layer must be used, the following script removes the unreadable words (those with empty text or a zero width or height) from the layer before extraction, so that the remaining text can be extracted properly.

Private Sub Document_BeforeExtract(ByVal pXDoc As CASCADELib.CscXDocument)
   Dim representation As CscXDocRepresentation
   Dim words As CscXDocWords
   Dim word As CscXDocWord

   Dim idx As Integer
   Dim wordCount As Integer
   Dim reorganize As Boolean

   ' Work on the first representation of the document.
   Set representation = pXDoc.Representations(0)
   Set words = representation.Words

   wordCount = words.Count
   reorganize = False

   ' Iterate backwards so that removing a word does not shift
   ' the indexes of the words that are still to be visited.
   For idx = (wordCount - 1) To 0 Step -1
      Set word = words(idx)

      ' Remove words that have no text or no visible extent.
      If word.Text = "" Or word.Width = 0 Or word.Height = 0 Then
         words.Remove(idx)
         reorganize = True
      End If
   Next

   ' Re-run the line analysis only if at least one word was removed.
   If reorganize Then
      representation.AnalyzeLines()
   End If
End Sub
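
To check whether a given document is affected before applying the workaround, a read-only count of the same words may help. The following is a minimal sketch, not part of the original solution: CountDegenerateWords is a hypothetical helper name, and it assumes the same conditions as the script above (empty text, zero width, or zero height) and uses only the object members that script already uses.

```vb
' Hypothetical helper (name is an assumption, not a Kofax API):
' counts the words in the first representation that the workaround
' script above would remove. It does not modify the document.
Private Function CountDegenerateWords(ByVal pXDoc As CASCADELib.CscXDocument) As Long
   Dim words As CscXDocWords
   Dim word As CscXDocWord
   Dim idx As Long
   Dim found As Long

   Set words = pXDoc.Representations(0).Words
   found = 0

   ' Forward iteration is safe here because nothing is removed.
   For idx = 0 To words.Count - 1
      Set word = words(idx)

      If word.Text = "" Or word.Width = 0 Or word.Height = 0 Then
         found = found + 1
      End If
   Next

   CountDegenerateWords = found
End Function
```

A result greater than zero suggests the text layer contains words of the kind the workaround removes. Calling the function again after the cleanup should then return zero.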

Running recognition on the document (for example, with FineReader) before testing has also been observed to prevent the error.

 

Applies to:  

Product Version
KTA All

 

 
