Kofax

Dictionary Types for Field Type PDF Generator and OCR Full Text


QAID # 6583 Published  

Question / Problem:

What are the different types of Dictionaries I can use in the Field Type, PDF Generator and OCR Full Text setup?

Answer / Solution:

From the Ascent Capture Help File:

There are two types of dictionaries, Full Text Dictionaries and Zonal Dictionaries.

Full Text Dictionaries

This type of dictionary is used by the Kofax PDF Generator module and by OCR Full Text Recognition.

You can create an ASCII text file to be used as a dictionary with any ASCII text editor, but it must be specified at the document class level and it must adhere to these specifications:

  • For Kofax PDF Image + Text, the Full Text dictionary should contain 1,000 words or fewer. Dictionaries with more than 1,000 words may be usable, but staying within this limit is recommended: very large dictionaries may impose performance penalties without significantly increasing accuracy. This recommendation does not apply to Advanced OCR Full Text.
  • Each word must contain from 2 to 32 characters.
  • Each word must be on a separate line in the ASCII text file.
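
The specifications above can be checked mechanically before a dictionary is put into use. The following Python sketch is not part of Ascent Capture. The function name, the problem messages, and the decision to ignore blank lines are assumptions made for illustration only.

```python
from typing import List

MIN_LEN, MAX_LEN = 2, 32          # word length range from the specifications
RECOMMENDED_MAX_WORDS = 1000      # recommended (not enforced) size limit


def check_full_text_dictionary(lines: List[str]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    word_count = 0
    for number, raw in enumerate(lines, start=1):
        word = raw.strip()
        if not word:
            continue  # assumption: blank lines are ignored
        word_count += 1
        if not word.isascii():
            problems.append(f"line {number}: contains non-ASCII characters")
        if len(word.split()) > 1:
            problems.append(f"line {number}: more than one word on the line")
        elif not MIN_LEN <= len(word) <= MAX_LEN:
            problems.append(
                f"line {number}: length {len(word)} is outside {MIN_LEN}-{MAX_LEN}"
            )
    if word_count > RECOMMENDED_MAX_WORDS:
        problems.append(
            f"{word_count} words exceeds the recommended {RECOMMENDED_MAX_WORDS}"
        )
    return problems
```

To check a file, pass its lines to the function, for example the result of calling readlines() on the opened dictionary file. The size check reports a recommendation, not a hard failure, so you may choose to treat it as a warning.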

The Full Text dictionary should contain terminology specific to the document class, although output from the recognition engine may contain words not found in the dictionary.

The dictionary file must be in a folder accessible to the recognition engines or modules that use it. The folder may be on the local machine, on a mapped drive, or at a UNC path.
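
A quick way to confirm accessibility is to try opening the file from the workstation that runs the module. The Python sketch below is a hypothetical helper, not a Kofax tool. It only shows that the account and machine running the script can read the file, so it should be run under the same account and on the same machine as the recognition module.

```python
def dictionary_is_readable(path: str) -> bool:
    """Return True if the file at path can be opened and read by this process."""
    try:
        with open(path, "rb") as handle:
            handle.read(1)  # force an actual read, not just an open
        return True
    except OSError:
        # Covers a missing file, a missing folder, denied permissions,
        # and an unreachable mapped drive or UNC share.
        return False
```

The path argument can be a local path, a mapped-drive path, or a UNC path. Any path you pass in is your own; none is implied by the Help file.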

Zonal Dictionaries

In addition to the Full Text dictionary, you can create additional dictionaries to be used with ICR and OCR zone processing. If specified, the dictionary will be used by Ascent Capture’s OCR and ICR recognition engines at data capture time, and could be useful for checking unrecognized words.

The dictionary file must be in a folder accessible to the zonal recognition engines. The folder may be on the local machine, on a mapped drive, or at a UNC path.

Each field type can have only one dictionary, but a dictionary can be used by one or more field types. For example, you could have seven field types and five dictionaries, with one dictionary shared by three field types.
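
The relationship in this example can be pictured as a simple mapping from field type to dictionary file. The Python sketch below is illustrative only: the field type names and file names are hypothetical, and it does not reflect how Ascent Capture stores this setting.

```python
from collections import Counter

# Each key (field type) maps to exactly one dictionary file, which mirrors
# the rule that a field type can have only one dictionary.
# All names below are hypothetical.
field_type_dictionary = {
    "FirstName": "names.txt",
    "LastName": "names.txt",
    "ContactName": "names.txt",
    "City": "cities.txt",
    "State": "states.txt",
    "Product": "products.txt",
    "Carrier": "carriers.txt",
}

# Count how many field types use each dictionary file.
usage = Counter(field_type_dictionary.values())

print(len(field_type_dictionary))  # number of field types
print(len(usage))                  # number of distinct dictionaries
print(usage["names.txt"])          # field types sharing names.txt
```

Here seven field types use five dictionaries, and names.txt is shared by three of them.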

You can create an ASCII text file to be used as a dictionary with any ASCII text editor, but it must be specified at the field type level.

For ICR

If your field type will only be used in conjunction with ICR recognition engines, your dictionary must adhere to these specifications:

  • The Zonal dictionary can have no more than 32,000 words.
  • Each word must be on a separate line in the ASCII text file.

For High Performance or Advanced OCR

If your field type will only be used in conjunction with the High Performance or Advanced OCR recognition engine, your dictionary must adhere to these specifications:

  • The Zonal dictionary can have no more than 32,000 words.
  • Each word must be on a separate line in the ASCII text file.

For Standard OCR

If your field type will only be used in conjunction with the standard OCR recognition engine, your dictionary must adhere to these specifications:

  • The Zonal dictionary should contain 1,000 words or fewer. Dictionaries with more than 1,000 words may be usable, but staying within this limit is recommended: very large dictionaries may impose performance penalties without significantly increasing accuracy.
  • Each word must contain from 2 to 32 characters.
  • Each word must be on a separate line in the ASCII text file.

Zonal dictionaries that conform to the Standard OCR specifications above can also be used by the ICR and High Performance OCR recognition engines.
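
The per-engine limits above can be summarized in a small lookup table. The Python sketch below is a hypothetical checker, not a Kofax API. The engine keys, class name, and function name are assumptions, and the 1,000-word figure for Standard OCR is treated as a warning because the Help file states it as a recommendation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ZonalLimits:
    max_words: Optional[int]          # hard limit on word count
    recommended_words: Optional[int]  # recommended limit on word count
    min_len: Optional[int]            # minimum characters per word
    max_len: Optional[int]            # maximum characters per word


# Engine keys are hypothetical labels for the three sections above.
LIMITS = {
    "icr": ZonalLimits(32000, None, None, None),
    "high_performance_ocr": ZonalLimits(32000, None, None, None),
    "standard_ocr": ZonalLimits(None, 1000, 2, 32),
}


def check_zonal(words: List[str], engine: str) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a word list checked against one engine."""
    limits = LIMITS[engine]
    errors: List[str] = []
    warnings: List[str] = []
    if limits.max_words is not None and len(words) > limits.max_words:
        errors.append(f"{len(words)} words exceeds the limit of {limits.max_words}")
    if limits.recommended_words is not None and len(words) > limits.recommended_words:
        warnings.append(
            f"{len(words)} words exceeds the recommended {limits.recommended_words}"
        )
    if limits.min_len is not None and limits.max_len is not None:
        for word in words:
            if not limits.min_len <= len(word) <= limits.max_len:
                errors.append(
                    f"{word!r}: length is outside {limits.min_len}-{limits.max_len}"
                )
    return errors, warnings
```

A word list that produces no errors and no warnings for "standard_ocr" also produces none for the other two keys, which matches the statement that dictionaries meeting the Standard OCR specifications can be shared.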

Applies to:

Product: CAPTURE
Versions: 10.0, 10.1, 10.2, 11.0