Supported file formats
Documents sent to AP Essentials (formerly ReadSoft Online) can be in various formats: electronically generated PDFs, scanned images or PDFs, or XML files. A complete list of supported file formats is available in the AP Essentials (formerly ReadSoft Online) help.
Electronically created PDFs
Electronically created PDFs contain high-quality images and an embedded text layer, so the risk of character misinterpretation is minimal.
Instead of performing OCR on the images, AP Essentials (formerly ReadSoft Online) reads the embedded text layer.
Multiple overlapping text layers, or fonts that are not embedded in the PDF, can cause problems when the text layer is read.
PDF documents that require a password to open or decrypt are not supported. Digitally signed PDF documents, however, should not cause problems with extraction and processing unless the PDF has other underlying issues.
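Before sending a PDF, it can help to check whether it is encrypted and whether it actually carries a text layer. The following sketch uses only the Python standard library and simple byte-level heuristics. The function names are hypothetical, this is not how AP Essentials inspects files, and a full PDF parser would be more reliable. `is_encrypted` only detects that an `/Encrypt` dictionary exists, so it cannot tell a file that needs a password to open from one that is merely permission-restricted. `has_text_layer` looks for text objects (`BT` followed by `Tj`/`TJ`) in raw and Flate-compressed content, and `has_embedded_fonts` looks for `/FontFile` entries, which point to embedded font programs.

```python
import re
import zlib

# Heuristic sketch (hypothetical helpers), not a real PDF parser.
# Body of each "stream ... endstream" section (binary-safe, non-greedy).
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A text object start (BT) followed later by a text-showing operator (Tj or TJ).
_TEXT_OBJECT_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b", re.DOTALL)


def _searchable_chunks(pdf: bytes):
    """Yield the raw file, then every stream body that zlib can inflate."""
    yield pdf
    for match in _STREAM_RE.finditer(pdf):
        try:
            # decompressobj ignores the end-of-line bytes left before "endstream".
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (e.g. JPEG image bytes): skip it
        yield inflated


def is_encrypted(pdf: bytes) -> bool:
    """An encrypted PDF names an /Encrypt dictionary in its trailer."""
    return b"/Encrypt" in pdf


def has_text_layer(pdf: bytes) -> bool:
    """True if any raw or inflated content contains a BT ... Tj/TJ text object."""
    return any(_TEXT_OBJECT_RE.search(chunk) for chunk in _searchable_chunks(pdf))


def has_embedded_fonts(pdf: bytes) -> bool:
    """Embedded font programs are referenced via /FontFile, /FontFile2 or /FontFile3."""
    return any(b"/FontFile" in chunk for chunk in _searchable_chunks(pdf))
```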
Scanned documents
A document may be scanned as an image (JPEG, TIF, PNG) or as a PDF document. The accuracy of OCR depends on the quality of the scanned document and on the scanner settings.
If auto-rotation is enabled, AP Essentials (formerly ReadSoft Online) automatically rotates invoices that are in landscape orientation or upside down to the correct orientation.
The recommended resolution is at least 300 DPI. For Asian languages such as Chinese, at least 400 DPI is required.
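The resolution guideline can be checked automatically for PNG scans, which may record their physical resolution in an optional `pHYs` chunk as pixels per metre (1 inch = 0.0254 m). This is a minimal standard-library sketch: the function names and the set of language codes treated as Asian are assumptions for illustration, chunk CRCs are not verified, JPEG and TIFF files (which store resolution differently) are not handled, and a PNG without a `pHYs` chunk fails the check because its resolution cannot be determined.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Assumption for this sketch: language codes that need the higher 400 DPI minimum.
ASIAN_LANGUAGES = {"zh", "ja", "ko"}


def png_dpi(data: bytes):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if it is absent or unitless."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"pHYs" and len(body) == 9:
            x_ppu, y_ppu, unit = struct.unpack(">IIB", body)
            if unit != 1:  # unit 0 = aspect ratio only, no physical size
                return None
            # Convert pixels per metre to dots per inch.
            return round(x_ppu * 0.0254), round(y_ppu * 0.0254)
        if ctype == b"IDAT":  # pHYs must appear before the image data
            break
        pos += 12 + length
    return None


def minimum_dpi(language: str) -> int:
    """300 DPI in general, 400 DPI for the (assumed) Asian language codes."""
    return 400 if language in ASIAN_LANGUAGES else 300


def meets_resolution(data: bytes, language: str) -> bool:
    """True if both axes reach the minimum DPI for the document language."""
    dpi = png_dpi(data)
    if dpi is None:
        return False  # resolution unknown: cannot confirm the guideline is met
    return min(dpi) >= minimum_dpi(language)
```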
File compression reduces the size of a file so that it is easier to share over a network or internet connection. It generally has a greater impact on image-based PDFs and image files such as JPEG and PNG than on text-only documents.
Image compression can be lossy or lossless. Lossless compression is preferred for OCR because it does not affect image quality; it is mostly used for TIFF files. Lossy compression is mainly used for JPEG files and, depending on the method, may affect OCR results. High compression rates increase the risk of character misinterpretation. Examples of text with lossy compression are shown below.
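The difference between lossless and lossy compression can be illustrated with Python's standard `zlib` module (DEFLATE, a lossless method) and a deliberately crude lossy step. The `quantize` function is a toy stand-in assumed for illustration: it snaps pixel values to a few grey levels, whereas JPEG quantizes frequency coefficients rather than pixels. It still shows how discarding detail can make a faint character stroke disappear into the background.

```python
import zlib


def lossless_roundtrip(pixels: bytes) -> bytes:
    """Compress with DEFLATE (lossless) and decompress again."""
    return zlib.decompress(zlib.compress(pixels, 9))


def quantize(pixels: bytes, levels: int) -> bytes:
    """Toy lossy step: snap each 0-255 grey value to one of `levels` values."""
    if not 1 <= levels <= 256:
        raise ValueError("levels must be between 1 and 256")
    step = 256 // levels
    # Replace each value by the midpoint of its bucket (clipped to 255);
    # differences between values in the same bucket are lost.
    return bytes(min(255, (p // step) * step + step // 2) for p in pixels)


# One scan line: white paper (255) with a faint grey character stroke (200).
row = bytes([255] * 20 + [200] * 4 + [255] * 20)

print(lossless_roundtrip(row) == row)  # True: every pixel survives
print(sorted(set(quantize(row, 2))))   # [192]: the stroke merged with the paper
```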
Text layer in scanned documents
Some scanners have built-in OCR functionality that embeds a text layer in the scanned document. If this functionality is used, AP Essentials (formerly ReadSoft Online) treats the document as an electronically generated PDF and reads the text layer instead of performing OCR. The quality of the text layer (the scanner's OCR output) varies between scanners, so even if the image looks correct, the text layer may contain characters that the scanner recognized incorrectly.
If characters are misinterpreted and the scanner's OCR function is in use, it is recommended to turn the scanner's OCR off and let AP Essentials (formerly ReadSoft Online) perform OCR.
XML documents
XML documents contain structured information. Tags in the XML file are mapped to specific fields in AP Essentials (formerly ReadSoft Online), and no OCR is performed. Only mapped tags are available in AP Essentials (formerly ReadSoft Online); custom fields cannot be used to map additional content from the XML document. Different XML formats have different mappings, and a full list of supported formats and mapped fields is available in the AP Essentials (formerly ReadSoft Online) help.
It is recommended to use the XML document type when receiving XML documents.
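Tag-to-field mapping can be sketched with Python's `xml.etree.ElementTree`. The element names, paths and field names below are hypothetical; real XML formats have their own mappings (listed in the product help) and often use XML namespaces, which this sketch ignores. Elements without a mapping, such as `Note` in the sample, are simply not carried over, which mirrors the rule that only mapped tags are available.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: element path (relative to the root) -> field name.
FIELD_MAPPING = {
    "InvoiceNumber": "InvoiceNumber",
    "IssueDate": "InvoiceDate",
    "Supplier/Name": "SupplierName",
    "Total": "TotalAmount",
}


def map_xml_to_fields(xml_text: str, mapping: dict) -> dict:
    """Copy the text of each mapped element into its field; ignore everything else."""
    root = ET.fromstring(xml_text)
    fields = {}
    for path, field in mapping.items():
        element = root.find(path)
        if element is not None and element.text:
            fields[field] = element.text.strip()
    return fields


SAMPLE_XML = """<Invoice>
  <InvoiceNumber>INV-1001</InvoiceNumber>
  <IssueDate>2024-03-01</IssueDate>
  <Supplier><Name>Acme Ltd</Name></Supplier>
  <Total>125.50</Total>
  <Note>Deliver to dock 4</Note>
</Invoice>"""

# Note has no mapping, so it does not appear in the result.
print(map_xml_to_fields(SAMPLE_XML, FIELD_MAPPING))
```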
Level of Complexity
|AP Essentials (formerly ReadSoft Online)||Current|
Supported input file formats