Kofax

Troubleshooting classification and extraction services

Article # 3021243

 

Below are some common topics related to extraction and classification performance and common reasons why performance might be lower than expected.

 

Buyer classification

Issue: Incorrectly identified buyer or no buyer identified.

  • When buyer classification is used, an incorrectly identified buyer is usually caused by missing buyer information in ReadSoft Online. The solution is to add buyer details for each buyer. See Buyer classification in Help for more information.

Issue: Incorrectly identified supplier or no supplier identified. 

  • If the buyer is incorrectly identified and master data is on buyer level (a setting in the master data service), the supplier identification rate will also be lower. The correct buyer needs to be found first, because each buyer has its own set of suppliers and historical learning. In this case there is no point in focusing on the supplier identification issue without first working on buyer classification improvements.

Issue: Incorrectly identified buyer when buyer classification service is not used. 

  • If buyer classification is not used, incorrect buyer recognition is caused either by an incorrect buyer specified in the email settings or by an incorrect buyer selected during manual web upload.

 

Supplier identification

Issue:

Incorrectly identified supplier, supplier not identified at all, or the correct supplier record frequently generates a validation warning.
 

Possible causes:

  • Buyer has been incorrectly identified, see above.
  • Similar layouts and supplier names on incoming documents make it hard to identify a unique supplier record.
  • An Office user has selected/matched an incorrect supplier record, which affects identification on subsequent documents until the correct supplier record is selected/matched.
  • The supplier record (supplier master data) is frequently erased, so all identification information is reset.
  • "Exclude supplier from automatic identification" has been selected for the supplier record. Consequently, the system will not try to identify the supplier record, and it has to be selected manually.
  • The matching supplier name from master data cannot be found on the document and no additional fields are entered in the supplier master data record.
  • The received document is of poor quality, has low resolution, is inconsistent in size (compared to the previous document) or is not a supported document type.
     

Solution:

Go to Analytics in ReadSoft Online to check whether the supplier identification rates are lower than expected. Analytics can be opened on Partner, Customer Group and Customer level. Go to Performance → Extraction → Supplier identification and field extraction.

If the supplier identification rate is lower than 80% and this cannot be explained by any of the causes above, contact Kofax support.
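The escalation rule above can be written as a small decision function. This is only an illustrative sketch: the function name `next_step`, the representation of the rate as a fraction between 0 and 1, and the list of confirmed causes are assumptions made for this example and are not part of ReadSoft Online. Only the 80% threshold comes from this article.

```python
from typing import Sequence

# The 80% figure comes from this article; everything else is illustrative.
SUPPORT_THRESHOLD = 0.80


def next_step(identification_rate: float, confirmed_causes: Sequence[str]) -> str:
    """Suggest the next troubleshooting step for supplier identification."""
    if identification_rate >= SUPPORT_THRESHOLD:
        return "No action needed"
    if confirmed_causes:
        # A known cause explains the low rate, so address it first.
        return "Address known causes: " + "; ".join(confirmed_causes)
    # Below the threshold with no explanation: escalate.
    return "Contact Kofax support"


print(next_step(0.72, ["buyer incorrectly identified"]))
print(next_step(0.72, []))
```

The rate itself would be read manually from Analytics. The sketch only makes the order of the checks explicit.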

More details on how supplier identification works can be found on the Supplier identification page.

 

Field extraction

Issue:

A field is not captured correctly, or a correct field frequently generates a validation warning.
 

Possible causes:

  • An incorrect buyer and/or supplier record has been identified.
  • The same information is captured differently for a supplier, meaning it is formatted differently and/or appears in different locations on the document.
  • The value cannot be found on the document.
  • The net amount, VAT/tax amount and gross amount are found on different pages of the document.
  • A compact format or regular expression is set on the field in the extraction configuration and the value does not match it.
  • The document language is not supported for the document type used.
  • One or more validation rules are triggered, causing warnings.
  • The received document is of poor quality, has low resolution, is inconsistent in size (compared to the previous document) or is not a supported document type.
     

Solution: 

Go to Analytics in ReadSoft Online to check whether the field extraction rates are lower than expected. Analytics can be opened on Partner, Customer Group and Customer level. Go to Performance → Extraction → Supplier identification and field extraction.

  • The average extraction rate is the sum of True Positive and False Negative. This should be above 80%.
    • If there is a high rate of false negatives, expand the field to check whether the issue is related to empty fields.

If extraction rates are low, check field configuration settings in Extraction service in ReadSoft Online: Customer → Services → Extraction. 

  • Accept empty value.
    • Recommended for optional fields and fields rarely used.
    • Reduces the need for manual verification of empty fields.
  • Regular expression / compact format.
    • Used in extraction to help find the desired value.
    • Field value is validated against the specified format. 
      • Regular expression can be specified for all fields with field type string/text.
      • Compact format can be specified for:
        • Custom fields, if the field type is string/text.
        • BuyerContactPersonName and/or BuyerContactReference, when the customer wishes to capture values containing characters that the default format does not allow, e.g. numeric digits in the name field.
  • Use master data to suggest/validate field values.
    • Field value is validated against imported master data values.
  • Enforce validation.
    • When this is used, verification of the document can only be completed by providing a value that complies with the validation rule.
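To illustrate how validating a field value against a regular expression behaves, here is a minimal sketch in Python. The pattern `INV-\d{6}` and the sample values are hypothetical. The regular expression dialect used by ReadSoft Online is not described in this article, so Python's `re` module is used purely as a stand-in and details may differ in the product.

```python
import re

# Hypothetical pattern for an invoice number such as "INV-004217".
# Python's `re` module is a stand-in; the dialect used by the
# extraction service is an assumption here.
INVOICE_NUMBER_PATTERN = re.compile(r"INV-\d{6}")


def matches_format(value: str, pattern: re.Pattern = INVOICE_NUMBER_PATTERN) -> bool:
    """Return True only if the whole value conforms to the pattern."""
    return pattern.fullmatch(value) is not None


candidates = ["INV-004217", "INV-4217", "inv-004217", "INV-004217 "]
for value in candidates:
    status = "accepted" if matches_format(value) else "rejected"
    print(f"{value!r}: {status}")
```

In this sketch only the first candidate is accepted. The others fail because of a missing digit group, lower-case letters, or a trailing space. This shows why a captured value that looks right at a glance may still fail validation when a strict format is configured on the field.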
       

Please note: Fields are pre-configured with default field format(s) intended to find and extract the desired values. The system will learn formats that deviate from the defaults, but only on a supplier-by-supplier basis. Using a regular expression or compact format helps the system find the desired values in every document, regardless of the supplier.

More details about field extraction can be found on the Field extraction page.

 

Line item extraction

Issue:

Line item fields, rows and/or columns are not extracted.
 

Possible causes:

  • An incorrect buyer and/or supplier record has been identified.
  • The line item table layout differs too much between documents from the supplier.
  • Document header fields are captured as part of the line item table.
  • Columns or cells vary in position and/or size within the line item table.
  • The received document is of poor quality, has low resolution, is inconsistent in size (compared to the previous document) or is not a supported document type.
     

Solution:

Line item tables are extracted by using Draw (wizard) in Office. If not all lines are extracted the first time Draw is used, use Draw (wizard) again on the first missing line, and repeat until all lines are included.

Based on the user input, the system will attempt to extract line items from the next document from the same supplier.

 

Level of Complexity

High

 

Applies to  

Product: ReadSoft Online
Version: Current

 
