Question / Problem:
The KTA API function CaptureDocumentService.GetDocumentFile takes a fileType parameter and returns the requested file type if it is available. When a PDF is requested, it can return either the original source file (if the document was created from a PDF) or a PDF previously created by a PDF Generation activity. This API does not create PDFs on demand, so one of these must exist for data to be returned. It also means the PDF returned by this API may not represent the current state of the document. This is best explained by an example:
- Imagine you import two single-page PDFs as two documents, so the source file of each document is a single-page PDF.
- The job goes through Image Processing, so each document has a single page object (tiff), and the source file of each document remains a single-page PDF.
- Now imagine the documents are merged in Validation or by API: There is now a single document with two page objects (tiffs).
- What happened to the two source documents (each a single page PDF)? They are unchanged, which is to say the document being merged into (merge destination) STILL has the same single page PDF. The document that was the merge source is deleted after the merge, so the second single page PDF no longer exists.
- You can still retrieve this document’s source PDF, but it is no longer a logical representation of the merged document (single page PDF, two page tiff document).
- Now if you run this merged document through PDF generation, it will create a two page PDF. However, instead of replacing the source document, this is stored as a Document Extension (Kofax.CEBPM.PdfRepresentation).
- Now that there are two PDFs that represent the document in different ways, which should be returned when you call GetDocumentFile(sessionId, null, docID, "PDF")?
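The state at the end of this example can be sketched with a small in-memory model. This is a toy illustration in Python, not KTA code: the Document class, its fields, and the merge and generate_pdf helpers are hypothetical names, and each PDF is represented only by its page count. Only the extension name is taken from the article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Extension name as given in the article.
PDF_EXTENSION = "Kofax.CEBPM.PdfRepresentation"


@dataclass
class Document:
    """Toy stand-in for a KTA document (hypothetical, not the real object model)."""
    pages: List[str]                  # page objects (tiffs)
    source_pdf_pages: Optional[int]   # page count of the source PDF, if any
    extensions: Dict[str, int] = field(default_factory=dict)  # name -> PDF page count


def merge(destination: Document, source: Document) -> None:
    """Move the page objects into the destination; its source file is left untouched."""
    destination.pages.extend(source.pages)
    source.pages.clear()  # models the merge source being deleted after the merge


def generate_pdf(doc: Document) -> None:
    """Store a PDF built from the current pages as a document extension."""
    doc.extensions[PDF_EXTENSION] = len(doc.pages)


doc_a = Document(pages=["a.tif"], source_pdf_pages=1)
doc_b = Document(pages=["b.tif"], source_pdf_pages=1)

merge(doc_a, doc_b)
generate_pdf(doc_a)

print(len(doc_a.pages))                  # 2 page objects after the merge
print(doc_a.source_pdf_pages)            # 1: the source PDF is unchanged
print(doc_a.extensions[PDF_EXTENSION])   # 2: the generated PDF matches the pages
```

In this model the merged document ends up with two PDFs that disagree on page count, which is the ambiguity the question describes.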
Answer / Solution:
Ideally, more specific API functions can be used to get the specific data needed. Frequently GetDocumentFile is used to try to get the PDF that was generated in a PDF Generation activity. There are two ways to target this data more specifically.
- CaptureDocumentService.GetBinaryExtension can be called for a given document ID with a name value of "Kofax.CEBPM.PdfRepresentation". This will directly retrieve the PDF that was generated for this document.
- CaptureDocumentService.DeleteSourceFile can be called if the source file is no longer needed. A subsequent call to GetDocumentFile then has no source PDF to return, so a generated PDF is the only one it can consider.
However, if the goal is instead to get the original source file, the CaptureDocumentService.GetSourceFile function can be used.
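The effect of these three options can be sketched with a toy model. This is a hedged illustration only: the Python functions below borrow the names of the API functions but are hypothetical stand-ins with made-up signatures, they do not call KTA, and pdf_candidates is an invented helper that lists which PDFs a GetDocumentFile call would have available. It makes no claim about which one the real API prefers.

```python
from typing import Dict, List, Optional

# Extension name as given in the article.
PDF_EXTENSION = "Kofax.CEBPM.PdfRepresentation"


class ToyDocument:
    """Hypothetical in-memory document; not the KTA object model."""

    def __init__(self, source_pdf: Optional[bytes], extensions: Dict[str, bytes]):
        self.source_pdf = source_pdf
        self.extensions = extensions


def pdf_candidates(doc: ToyDocument) -> List[str]:
    """List the PDFs that exist for this document (invented helper)."""
    candidates = []
    if doc.source_pdf is not None:
        candidates.append("source")
    if PDF_EXTENSION in doc.extensions:
        candidates.append("generated")
    return candidates


def get_binary_extension(doc: ToyDocument, name: str) -> bytes:
    """Stand-in for GetBinaryExtension: fetch one named extension directly."""
    return doc.extensions[name]


def get_source_file(doc: ToyDocument) -> Optional[bytes]:
    """Stand-in for GetSourceFile: fetch the original source file."""
    return doc.source_pdf


def delete_source_file(doc: ToyDocument) -> None:
    """Stand-in for DeleteSourceFile: remove the source file."""
    doc.source_pdf = None


doc = ToyDocument(b"one-page source", {PDF_EXTENSION: b"two-page generated"})

before = pdf_candidates(doc)                          # both PDFs exist
generated = get_binary_extension(doc, PDF_EXTENSION)  # targets the generated PDF
original = get_source_file(doc)                       # targets the source PDF
delete_source_file(doc)
after = pdf_candidates(doc)                           # only the generated PDF remains
```

The targeted calls never depend on which PDF a generic request would pick, which is why they are the safer choice when a specific file is needed.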