Question / Problem:
A StructuredException error occurs when testing a format locator against a PDF file that has a text layer. The full error message is:
Execution of activity Extraction in job 'XXXXX' on Kofax Transformation Server on 'XXXXX'
was aborted due to the following error(s): Error from ExtractionProcess.exe:
BeforeExtraction: BeforeExtract: The execution of a locator method failed. Class = "XXX", Locator = "XXX",
Original error message:
StructuredException. The thread tried to read from or write to a virtual address for which it does not have
the appropriate access.
See Kofax Transformation Server log for details.
Click here to see a format locator configuration that produces the error when tested. Change the search text to text that exists on the document.
Answer / Solution:
This error is caused by the third-party engine (Pdfium) that is used to interpret the PDF text layer of documents. The engine can leave unreadable OCR data in the text layer, which affects certain locators.
Because the fault lies in the third-party engine used to read the PDF text layer, it is not regarded as a Kofax product issue. A bug report describing this behavior has nevertheless been submitted to the developers for review under the following reference.
Bug 1282089: DB Locator fails with error "The thread tried to read from or write to a virtual address for which it does not have the appropriate access"
Enabling the "Ignore PDF text layer" option in the TotalAgility process prevents the error. If access to the PDF text layer is required, the following script corrects the OCR data in the layer so that it can be extracted properly.
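Private Sub Document_BeforeExtract(ByVal pXDoc As CASCADELib.CscXDocument)
   Dim representation As CscXDocRepresentation
   Dim words As CscXDocWords
   Dim word As CscXDocWord
   Dim idx As Integer
   Dim wordCount As Integer
   Dim reorganize As Boolean

   ' Work on the first representation of the document.
   Set representation = pXDoc.Representations(0)
   Set words = representation.Words
   wordCount = words.Count
   reorganize = False

   ' Walk backwards so that removing a word does not shift the indexes still to be visited.
   For idx = (wordCount - 1) To 0 Step -1
      Set word = words(idx)
      ' A word with no text or no area is treated as unreadable.
      If word.Text = "" Or word.Width = 0 Or word.Height = 0 Then
         ' NOTE: the published listing is cut off at this point; removing the word is an assumed reconstruction.
         words.Remove(idx)
         reorganize = True
      End If
   Next idx

   If reorganize Then
      ' NOTE: also cut off in the published listing; rebuilding the text lines is an assumed reconstruction.
      representation.AnalyzeLines
   End If
End Sub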
Running recognition on the document (for example, with FineReader) before testing has also been observed to prevent the error.