Question / Problem:
Training for Kofax TotalAgility (KTA) Transformation Data Extraction is being configured in Transformation Designer, but the documents are not getting assigned a Layout ID.
Why are the documents not assigned a Layout ID?
Answer / Solution:
The Layout ID is only used for the Specific (or Both) training types, so if only the Generic type is selected, no Layout ID will show and Trained will show as "No." This applies to both Extraction only and Shared projects.
Here is some documentation that explains the difference between the two training types and the use of Templates.
The Generic approach is not widely used in current solutions and has not changed for many years. It is mainly based on keywords and allows a locator to extract data even from unknown layouts/vendors. It is essentially a pre-trained database, so a document has to resemble one of the documents used for that training.
The crucial point is that it currently only matches keywords that are above and/or to the left of the data. There are open TFS items on this that may be useful: Enhancement Request 1204320: (MUFG) - Generic training algorithm to support BTMU samples. Bug 1204482: Generic learning does not work for alpha-only values (works for numeric/alpha-numeric).
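To make the "above and/or to the left" restriction concrete, here is a minimal Python sketch. It is not the Kofax algorithm; the Word type, the coordinates and the tolerance value are all hypothetical and only illustrate why a keyword placed to the right of or below a value would never be picked up.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    x: int  # left edge of the word (hypothetical units)
    y: int  # top edge of the word, growing downward

def keyword_candidates(value, words, tolerance=10):
    """Return (direction, text) for words to the left of or above the value."""
    found = []
    for w in words:
        if w is value:
            continue
        same_row = abs(w.y - value.y) <= tolerance
        same_col = abs(w.x - value.x) <= tolerance
        if same_row and w.x < value.x:
            found.append(("left", w.text))
        elif same_col and w.y < value.y:
            found.append(("above", w.text))
    return found

# Hypothetical page: one value surrounded by four words.
value = Word("1234", 200, 100)
words = [
    Word("Invoice", 100, 100),  # left of the value
    Word("Number", 200, 60),    # above the value
    Word("Total", 300, 100),    # right of the value, ignored
    Word("Due", 200, 140),      # below the value, ignored
    value,
]
candidates = keyword_candidates(value, words)
```

In this sketch only "Invoice" and "Number" are returned as keyword candidates, which mirrors the limitation described above.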
The Specific type learns the layout of a sample document and only applies the extraction to other documents with the same layout. This is typically the case for invoices from the same vendor. Because the layout is trained, specific training can use a combination of restricted keywords, fixed positions and a known field format. In combination, these provide significantly better recognition and accuracy than generic extraction. However, this type of training only works for trained layouts.
Please find below the concepts behind Specific training documents.
A Virtual Class defines a cluster of documents with similar layouts. Usually a Virtual Class corresponds to a specific invoice vendor. Invoices from different vendors belong to the same class in Transformation Designer even though their layouts differ, so specific training has to distinguish those layouts by Virtual Class. It is called ‘virtual’ because the user should not need to be concerned with virtual classes; they are only needed internally by the specific training algorithm.
When a document is trained, it is first classified in order to assign it to a virtual class. If the layout of the document does not match an existing virtual class, a new virtual class is created for this document.
The virtual class identifier is converted to an integer value in Transformation Designer (TD). It is displayed in the ‘Layout ID’ column of the extraction set view.
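As a rough illustration of how a document ends up with a Layout ID, the Python sketch below assigns each trained document to a virtual class and exposes that class as an integer. Everything in it is an assumption made for illustration: the LayoutRegistry class, the use of a plain string as a layout signature, exact matching instead of real layout classification, and numbering that starts at 1.

```python
class LayoutRegistry:
    """Toy stand-in for virtual class assignment during specific training."""

    def __init__(self):
        # layout signature -> integer Layout ID (hypothetical representation)
        self._ids = {}

    def train(self, layout_signature):
        """Return the Layout ID for a document, creating a new one if needed."""
        if layout_signature not in self._ids:
            # No existing virtual class matches: create a new one.
            self._ids[layout_signature] = len(self._ids) + 1
        return self._ids[layout_signature]

registry = LayoutRegistry()
first = registry.train("vendor-a-invoice")
second = registry.train("vendor-b-invoice")
repeat = registry.train("vendor-a-invoice")
```

Here the third document reuses the Layout ID of the first because its layout signature matches, while the second document gets a new one.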
A Template defines a set of fields for a specific locator. This means that for every trainable locator at least one Template will be created during training. A template is assigned to a Virtual Class.
When a document is trained and assigned to a Virtual Class, the algorithm tries to find a matching template for the document by extracting fields for the current locator. If a template is found, the document is assigned to this template. If not, a new template is created based on the document.
In one example, all trained documents are in the same Virtual Class but three templates are created. Template 2 was created because documents 4, 5 and 6 had an additional field ‘Customer ID’. Template 3 was created because the Customer ID field on documents 7, 8 and 9 was in a different position than on documents 4, 5 and 6.
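The example above can be reproduced with a small Python sketch. It is only an approximation under stated assumptions: a template is modelled as a mapping from field name to position, matching is exact equality rather than whatever tolerance the real algorithm applies, and the field names and coordinates for documents 1 to 3 are invented, since the example only describes how documents 4 to 9 differ.

```python
def assign_templates(documents):
    """Map each document ID to a template number within one virtual class."""
    templates = []   # each template: dict of field name -> (x, y) position
    assignment = {}
    for doc_id, fields in documents.items():
        for number, template in enumerate(templates, start=1):
            if template == fields:
                # An existing template matches this document's fields.
                assignment[doc_id] = number
                break
        else:
            # No match: create a new template based on this document.
            templates.append(dict(fields))
            assignment[doc_id] = len(templates)
    return assignment

# Hypothetical fields shared by all nine documents.
base = {"Invoice Number": (400, 50), "Total": (400, 700)}
documents = {}
for doc_id in (1, 2, 3):
    documents[doc_id] = dict(base)
for doc_id in (4, 5, 6):
    documents[doc_id] = {**base, "Customer ID": (50, 120)}
for doc_id in (7, 8, 9):
    documents[doc_id] = {**base, "Customer ID": (300, 120)}

assignment = assign_templates(documents)
```

Documents 1 to 3 map to template 1, documents 4 to 6 to template 2 (extra field), and documents 7 to 9 to template 3 (same field, different position), even though all nine would share one Layout ID.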
A template consists of a set of Template Fields. A Template Field basically stores the position, keyword, page index and format information for a field. This information is used by the position-based and keyword-based extraction methods.
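A Template Field can be pictured as a small record. The Python sketch below is hypothetical: the TemplateField class, its attribute names, the use of a regular expression for the format and the tolerance value are assumptions for illustration, not the product's internal data model. Only the position and format checks are shown; the keyword is stored but not evaluated here.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class TemplateField:
    name: str
    page_index: int     # index of the page the field was trained on
    position: tuple     # (x, y) of the value, hypothetical units
    keyword: str        # keyword text found near the value
    value_format: str   # assumed to be a regular expression

    def accepts(self, page_index, position, text, tolerance=15):
        """Check a candidate against the stored page, position and format."""
        near = (
            abs(position[0] - self.position[0]) <= tolerance
            and abs(position[1] - self.position[1]) <= tolerance
        )
        well_formed = re.fullmatch(self.value_format, text) is not None
        return page_index == self.page_index and near and well_formed

customer_id = TemplateField(
    name="Customer ID",
    page_index=0,
    position=(50, 120),
    keyword="Customer ID",
    value_format=r"\d{6}",
)
good = customer_id.accepts(0, (55, 118), "004217")
wrong_format = customer_id.accepts(0, (55, 118), "ABC")
wrong_page = customer_id.accepts(1, (55, 118), "004217")
```

A candidate is accepted only when page, position and format all agree with what was trained, which is why small differences between documents can prevent a template from matching.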
Normally, if the Specific type is failing to extract, compare your test documents with the trained document. You may see the same Virtual Class / Layout, but there might be differences in the detail of the documents, for example different table columns or different text (different keywords around the amounts). Training the two documents may therefore result in the same Virtual Class/Layout but two different templates.
Please also see the attached files that show no Layout ID generated when using the Generic type, the Locator Properties where this option is configured, and a Layout ID generated when using the Specific type.