Kofax

Table Line Items with Variable Number of Lines

17775

Question / Problem:

Table Line Items with Variable Number of Lines

Answer / Solution:

Case

A table can contain line items with a varying number of text lines in the middle, which causes difficulties for the Table Locator in Manual Mode. Figure 1 shows a table whose line items have either three or two text lines. To extract the data from these different line items, two Table Locators have to be defined, one for the 2-line items and one for the 3-line items. Validation, however, works on the content of a single table, so the results of both Table Locators have to be returned by only one Table Locator.

This example project shows how to combine the contents of two Table Locators and return the combined result to a field of type table.

Figure 1: Table with varying number of middle text lines


Steps

  1. Create a new project.
  2. Define a class:
    1. Set "Default classification result" to the defined class in the project settings.
  3. Define a table model with three fields: Quantity, Unit Price and Total Price.
  4. Create a table field named "Table" in the defined class:
    1. Select the table model defined before.
  5. Download and open the example documents as "Test Documents":
    1. Expand the column Filename under Test Documents so you can see the complete document names.
    2. Select all test documents.
    3. Perform OCR.
    4. Open a test document.
  6. Create a Table Locator and name it "TL_3_Lines", to find the 3-line Line Items:
    1. Open the Table Locator properties.
    2. Select the table model created above.
    3. Select Manual detection method.
    4. Select Use current sample image.
    5. Define the Master Item for a 3-line Line Item.
    6. Assign the Cells Quantity, Unit Price and Total Price.
    7. Add an Anchor if necessary, for example, on the "Size" and "Colour" fields.
    8. Test the results. (In Document Viewer, select Test Documents; all 3-line Line Items should be highlighted.)
    9. Close the "TL_3_Lines" locator.
  7. Create a Table Locator and name it "TL_2_Lines", to find the 2-line Line Items:
    1. Select the table model.
    2. Select Manual detection method.
    3. Define the Master Item for a 2-line Line Item.
    4. Assign the Cells Quantity, Unit Price and Total Price.
    5. Add an Anchor if necessary, for example, on the "Size" and "Colour" fields.
    6. Test the results. (In Document Viewer, select Test Documents; all 2-line Line Items should be highlighted.)
      • If the result is not satisfactory (only 2 of 3 Line Items are extracted), go to "Master Item" and select "Many comments per item". Select this option if your tables might have large gaps caused by comments or other items between the line items.
      • If you test the example document Example_Document_Tables_Items_Varying_Lines_with_border_other_sequence and the result is not satisfactory because the first Line Item also contains the table header, add an Anchor to the "Quantity" field in the "TL_3_Lines" Table Locator.
  8. Create a Script Locator and name it "SL_Merging_TL", to combine the contents of both Table Locators into one Table Locator:
    1. Open properties of the Script Locator and select Show Script.
    2. Open object "SL_Merging_TL" and select proc "LocateAlternatives".
    3. Copy and paste the script provided below.
    4. Close the Script Locator properties.

Logic of the Script

If the "TL_2_Lines" Table Locator finds rows (Line Items), the script compares the top position of each source Line Item with the top positions of the Line Items in the target Table Locator. A source Line Item is inserted into the target Table Locator in front of a target Line Item when the source Line Item lies higher on the page than that target Line Item.
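The merge rule can be illustrated with a minimal sketch. This is not the script from Script.txt and it does not use the product's scripting API; it is plain Python, and the names Row and merge_rows are hypothetical. It assumes that "higher on the page" means a smaller top coordinate, and that the rows of the target are already ordered from top to bottom.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    """Hypothetical stand-in for one extracted Line Item."""
    top: int          # assumed: smaller value = higher on the page
    quantity: str
    unit_price: str
    total_price: str


def merge_rows(target: List[Row], source: List[Row]) -> List[Row]:
    """Insert each source row in front of the first target row that lies below it."""
    merged = list(target)  # work on a copy; the caller's list stays unchanged
    for src in source:
        for i, row in enumerate(merged):
            if src.top < row.top:
                merged.insert(i, src)
                break
        else:
            # No row lies below the source row, so it goes to the end.
            merged.append(src)
    return merged


# Made-up sample rows: two 3-line items (target) and two 2-line items (source).
rows_3_lines = [Row(100, "1", "10.00", "10.00"), Row(300, "2", "5.00", "10.00")]
rows_2_lines = [Row(200, "3", "2.00", "6.00"), Row(400, "1", "7.50", "7.50")]

merged = merge_rows(rows_3_lines, rows_2_lines)
print([row.top for row in merged])
```

With these sample values, the 2-line rows are interleaved with the 3-line rows in page order, which corresponds to the sequence checked later in validation.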
  9. Assign the "TL_3_Lines" Table Locator to the field "Table".
  10. Classify and Extract.
  11. Validate:
    1. Check the results as shown in Figure 2. (All line items are found and listed in the correct sequence in the validation form table.)

Figure 2: Validation results


Download

Click this link to download the zipped example project: Table Line Items with Variable Number of Lines (Created with KTM V4.5).

Click this link to download the zipped example documents: Example_Document_Tables_Items_Varying_Lines.zip.

Script

Script.txt

Applies to:

Product: AXPRO
Version: 4.0
Category: Project Builder