Kofax

ShareScan 6.1: Searchable text recognition (OCR) with more than one language, including Asian and Arabic languages

Question:

I want to OCR a document that contains an Asian and/or Arabic language together with another non-English language. ShareScan 6.1 does not recognize the non-English language. What can we do?

Answer:

Recognizing an Asian or Arabic language together with another (non-English) language is not supported in CSDK version 20, which is the OCR engine used with ShareScan 6.1. The details of the limitation are given below.

Language, Character Set and Code Page Handling Module

CCJK and Arabic languages can be recognized only one language at a time (English characters are enabled automatically), so only the second mode, Single Language Detection, is supported when more than one CCJK language and/or Arabic is enabled.

Asian recognition module / Application areas

This module provides recognition services for four Asian languages with horizontal or vertical text direction: Japanese, Korean, Traditional Chinese and Simplified Chinese (generally referred to as CCJK). In addition, this module recognizes Arabic text. It can handle short embedded English text within either CCJK or Arabic text.

Asian recognition module / Language handling

Asian language handling differs somewhat from that for Western languages. Spell checking, editor display and verification are not available for Asian languages. In addition, only one Asian language should be set for recognition, and Western languages should not be set alongside an Asian language (except English in one case, see the notes below).

- The English language must be explicitly enabled for mixed Arabic / English recognition.
- Be aware that if English is enabled together with Arabic, the accuracy of Arabic recognition is drastically reduced.
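
The restrictions above can be summarized as a small pre-flight check on a requested language selection. The following is only an illustrative sketch in Python: the function name `check_ocr_languages` and the lowercase language identifiers are hypothetical and are not part of ShareScan or the CSDK API. Classifying each rule as an error or a warning is also an assumption made for the example.

```python
# Hypothetical identifiers for illustration only; these are NOT CSDK constants.
CCJK = {"japanese", "korean", "chinese_traditional", "chinese_simplified"}
ARABIC = "arabic"
ENGLISH = "english"


def check_ocr_languages(languages):
    """Return (errors, warnings) for a requested set of OCR languages.

    Encodes the restrictions described in the answer above:
    - only one CCJK/Arabic language at a time,
    - no Western language alongside a CCJK/Arabic language,
      except English explicitly enabled together with Arabic,
    - English together with Arabic reduces Arabic accuracy.
    """
    selected = {lang.lower() for lang in languages}
    asian = selected & (CCJK | {ARABIC})   # CCJK and Arabic share the same module
    western = selected - asian
    errors, warnings = [], []

    if not asian:
        # Western-only selection: the limitation does not apply.
        return errors, warnings

    if len(asian) > 1:
        errors.append("Only one CCJK or Arabic language can be set at a time.")

    other_western = western - {ENGLISH}
    if other_western:
        errors.append(
            "Western languages cannot be combined with a CCJK or Arabic "
            "language: " + ", ".join(sorted(other_western))
        )

    if ENGLISH in western:
        if ARABIC in asian:
            warnings.append(
                "English enabled with Arabic: Arabic accuracy is drastically reduced."
            )
        else:
            errors.append(
                "English should not be set with a CCJK language; "
                "English characters are enabled automatically."
            )

    return errors, warnings
```

For example, `check_ocr_languages(["Japanese", "German"])` reports one error and no warnings, while `check_ocr_languages(["Arabic", "English"])` reports no errors and one accuracy warning.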