QAID # 122 Published
Question / Problem:
How does the page recognition method of form identification work?
Answer / Solution:
Page recognition takes several things into account when identifying a page.
First, you must have a sample page for at least the first page of each form type in the batch class. When separating scanned pages into documents, the features on the sample page are compared to the features on a scanned page.
Next, in the Custom Separation and Form Identification Profiles Dialog Box, select page recognition in the Identification section. Here, you can also set the confidence and difference values used to differentiate the forms. The confidence value is the minimum confidence required to conclude that a scanned page matches a sample page. The difference value specifies how much higher the confidence of the best match must be than the confidence of the next-best match. For example, suppose there are two form types and a scanned page matches the Form A sample page with 86% confidence and the Form B sample page with 75% confidence. Both form types are considered, because each confidence is above the default 70% threshold. With the default settings, however, the document is rejected because it cannot be confidently separated: the difference between the two confidences is only 11%, and by default the difference must be at least 20%.
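The confidence and difference check described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this article, not the product's actual implementation: the function name identify_form, its parameters, and the use of "greater than or equal to" at the confidence threshold are assumptions made for the example. The defaults of 70 and 20 are the default values mentioned above.

```python
from typing import Dict, Optional


def identify_form(scores: Dict[str, float],
                  confidence: float = 70.0,
                  difference: float = 20.0) -> Optional[str]:
    """Return the matching form type, or None if the page is rejected.

    scores maps a form type name to the page recognition confidence (0-100)
    of the scanned page against that form type's sample page.
    Hypothetical helper: names and boundary handling are assumptions.
    """
    # Only form types at or above the confidence threshold are considered.
    candidates = sorted(
        ((score, name) for name, score in scores.items() if score >= confidence),
        reverse=True,
    )
    if not candidates:
        return None  # nothing matched with enough confidence

    best_score, best_name = candidates[0]
    # If another form type is also a candidate, the best match must lead it
    # by at least the configured difference.
    if len(candidates) > 1 and best_score - candidates[1][0] < difference:
        return None  # cannot be confidently separated
    return best_name


# The example from the text: 86% vs. 75% differ by only 11 points.
result = identify_form({"Form A": 86, "Form B": 75})
```

With the values from the example, result is None (the page is rejected). Lowering the difference setting to 10 would let the same scores resolve to Form A.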
Finally, you can optionally add a form identification zone to the first sample page of each form type. A form identification zone would typically be used if two forms are similar, but have some identifying text. A form identification zone may use any zonal recognition profile. The confidence returned by the recognition engine must be at least equal to the confidence specified for the form identification zone. If a match string is specified, then the returned value must match that string exactly. (Note that you can use a recognition script to modify the returned value and its confidence.) If you specify form identification zones, then the only form types that are considered are those where the page recognition confidence is above the value specified in the Identification section of the Custom Separation and Form Identification Profiles Dialog Box. If the scanned page does not match a form identification zone for a particular form type, then that form type will not be selected, even if the page recognition confidence was high.
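The way a form identification zone narrows the candidate form types can be sketched as follows. This is a hedged illustration of the rules in the paragraph above, not the product's code: the classes IdZone and ZoneResult, the functions zone_matches and eligible_form_types, and the 70 default are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ZoneResult:
    """Value and confidence returned by the recognition engine (hypothetical)."""
    text: str
    confidence: float


@dataclass
class IdZone:
    """Form identification zone settings for one form type (hypothetical)."""
    min_confidence: float
    match_string: Optional[str] = None  # None means no match string is set


def zone_matches(zone: IdZone, result: ZoneResult) -> bool:
    # The returned confidence must be at least the zone's confidence.
    if result.confidence < zone.min_confidence:
        return False
    # If a match string is specified, the returned value must match exactly.
    if zone.match_string is not None and result.text != zone.match_string:
        return False
    return True


def eligible_form_types(page_scores: Dict[str, float],
                        zones: Dict[str, IdZone],
                        zone_results: Dict[str, ZoneResult],
                        page_confidence: float = 70.0) -> List[str]:
    """Return form types that pass both page recognition and their zone."""
    eligible = []
    for name, score in page_scores.items():
        if score < page_confidence:
            continue  # page recognition confidence too low to be considered
        zone = zones.get(name)
        if zone is not None and not zone_matches(zone, zone_results[name]):
            continue  # zone failed, so this form type is not selected
        eligible.append(name)
    return eligible


# Two similar forms distinguished by a piece of identifying text.
zones = {"Form A": IdZone(80, "INVOICE"), "Form B": IdZone(80, "CLAIM")}
results = {"Form A": ZoneResult("CLAIM", 95), "Form B": ZoneResult("CLAIM", 95)}
selected = eligible_form_types({"Form A": 92, "Form B": 88}, zones, results)
```

Here Form A has the higher page recognition confidence, but it is excluded because its zone did not return the expected string, so only Form B remains eligible.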