Kofax TotalAgility Reporting Document Data Retention
Issue
What is important to understand about the reporting document data retention setting in KTA?
Solution
While KTA has always had a setting controlling retention of reporting field data, the setting for retention of reporting document data was introduced in KTA 7.5. To avoid a problem when upgrading from earlier versions where the data always persisted, it has a high default value of 3650 days (ten years). It is preferable to set a lower value if the data does not need to be kept that long, as this will limit growth of the TotalAgility_Reporting database.
These settings can be adjusted in the KTA Designer under System > System Settings > Database, retention, and reporting > Reporting Server > Data retention period.
For KTA 7.5: System Settings > Settings > General > Reporting
Document Data in KTA Reporting
Why is document reporting data not being removed when it meets all of the other removal criteria?
Before KTA 7.8.0.10/7.9.0.4/7.10, a bug applied incorrect criteria when removing reporting document data, so much less data was deleted than expected. To ensure that all expected data is deleted, update to one of these versions or higher.
When is data removed if it is older than the Document Data Retention Period?
Data is removed at the end of a successful run of the reporting system task, once per day. Specifically, removal runs once the current time in UTC is later than the value in retention_stamp.dt_last_retention, after which that value is incremented by one day.
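The once-per-day gate described above can be sketched as follows. This is only an illustration of the comparison and the one-day increment, not the product's actual code; the function names are hypothetical, and the stamp is modeled as a timezone-aware UTC datetime.

```python
from datetime import datetime, timedelta, timezone


def retention_is_due(now_utc: datetime, dt_last_retention: datetime) -> bool:
    """True once the current UTC time has passed the stored stamp."""
    return now_utc > dt_last_retention


def next_retention_stamp(dt_last_retention: datetime) -> datetime:
    """Stamp value after a retention pass: the old value plus one day."""
    return dt_last_retention + timedelta(days=1)


# Hypothetical values standing in for retention_stamp.dt_last_retention
# and the time at which the reporting system task finishes.
stamp = datetime(2024, 3, 1, 0, 0, tzinfo=timezone.utc)
now = datetime(2024, 3, 1, 6, 30, tzinfo=timezone.utc)

if retention_is_due(now, stamp):
    stamp = next_retention_stamp(stamp)  # becomes 2024-03-02 00:00 UTC
```

With the stamp moved forward, a second task run later on the same day would no longer satisfy the comparison, which is what limits removal to one pass per day.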
What date is used for retention of document data?
In KTA 7.9 and earlier, doc_dim.dt_create_datetime is used to determine whether data about a document should be retained.
In KTA 7.10 and higher, doc_dim.dt_last_proc_datetime is used, which is more consistent with KTA's document retention policy, which uses the document's last access date.
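As a quick reference, the version rule above can be written as a small lookup. The helper name and the version-string parsing are assumptions made for illustration; only the two doc_dim column names come from the article.

```python
def retention_date_column(kta_version: str) -> str:
    """Return the doc_dim column used for document data retention.

    Only the major and minor parts of the version string are compared,
    e.g. "7.9.0.4" is treated as (7, 9).
    """
    major_minor = tuple(int(part) for part in kta_version.split(".")[:2])
    if major_minor >= (7, 10):
        return "dt_last_proc_datetime"
    return "dt_create_datetime"


print(retention_date_column("7.9.0.4"))  # dt_create_datetime
print(retention_date_column("7.10"))     # dt_last_proc_datetime
```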
How many documents are deleted per PurgeDocumentData iteration?
The batch size is taken from the environment variable "PurgeFieldDataBatchSize", which is used by both document and field reporting data retention. In versions before KTA 7.8, the "ETLBatchSize" environment variable was used instead.
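To estimate how a given batch size plays out, a rough sketch follows. It assumes that each PurgeDocumentData iteration removes at most one batch of documents, which is an interpretation of the setting rather than documented behavior; the function names are hypothetical.

```python
def batch_size_variable(kta_version: str) -> str:
    """Name of the environment variable that controls the purge batch size."""
    major_minor = tuple(int(part) for part in kta_version.split(".")[:2])
    if major_minor < (7, 8):
        return "ETLBatchSize"
    return "PurgeFieldDataBatchSize"


def purge_iterations(eligible_documents: int, batch_size: int) -> int:
    """Iterations needed if each one removes up to batch_size documents (assumption)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Ceiling division without floating point.
    return -(-eligible_documents // batch_size)


print(batch_size_variable("7.7"))      # ETLBatchSize
print(batch_size_variable("7.9.0.4"))  # PurgeFieldDataBatchSize
print(purge_iterations(2500, 1000))    # 3
```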
Is any other factor considered besides the date?
Only documents that are marked completed (doc_dim.is_processing_completed=1) are removed. Currently, data from documents not marked completed is never removed under any condition.
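Combining the date and the completion flag, the removal criteria can be sketched as a single predicate. This is a simplified model, not the actual query used by the product: the document is represented as a plain dictionary keyed by doc_dim column names, and the strict "older than" comparison at the cutoff is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_removable(doc: dict, now_utc: datetime, retention_days: int,
                 date_column: str) -> bool:
    """True if the document is completed and older than the retention period."""
    cutoff = now_utc - timedelta(days=retention_days)
    completed = doc["is_processing_completed"] == 1
    return completed and doc[date_column] < cutoff


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
old_completed = {
    "is_processing_completed": 1,
    "dt_last_proc_datetime": datetime(2023, 1, 1, tzinfo=timezone.utc),
}
old_incomplete = {
    "is_processing_completed": 0,
    "dt_last_proc_datetime": datetime(2023, 1, 1, tzinfo=timezone.utc),
}

print(is_removable(old_completed, now, 90, "dt_last_proc_datetime"))   # True
print(is_removable(old_incomplete, now, 90, "dt_last_proc_datetime"))  # False
```

The second document shows the case called out above: it is well past the retention period but is never removed because it is not marked completed.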
What Data is Removed
Data is removed from the following tables when it relates back to a document being removed. For dimension tables like tsf_class_dim, values are removed once they are no longer used by any documents in the reporting data.
- batch_edit_fact
- page_dim
- doc_dim
- doc_accum_fact
- doc_export_fact
- field_accum_fact
- doc_sess_snapshot_fact
- object_audit_fact
- event_data_dim
- batch_sess_snapshot_fact
- tsf_class_dim
- ta_classif_group_dim
- ta_categories_dim
- path_dim
- doc_class_dim
- field_dim
- field_aggregate_fact
- field_column_dim
- batch_dim
- batch_accum_fact
- batch_class_dim
- user_dim
- reject_note_dim
- machine_dim
- station_dim
- group_value_dim
As of KTA 7.9:
- field_changes_fact
Interaction with Field Retention
Field retention removes data from field_accum_fact once it is older than the field retention period; technically, it does not remove data from the other field tables (field_dim, field_column_dim, field_aggregate_fact). However, once the underlying field_accum_fact record is removed, document retention removes the associated records from field_aggregate_fact, along with any records in field_dim and field_column_dim that are no longer used by current data.
Therefore, even though field_dim, field_column_dim, and field_aggregate_fact are technically acted on by the document retention process, they are still affected by the shorter field retention period.
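One way to reason about this interaction is to treat the field-related tables as governed by whichever period is shorter. The sketch below is a simplification for planning purposes only: the function name is hypothetical, and for dimension tables the result is the earliest age at which a row can become removable, since rows still used by current data are kept.

```python
# Tables whose contents depend on field_accum_fact, per the description above.
FIELD_RELATED_TABLES = {
    "field_accum_fact",
    "field_aggregate_fact",
    "field_dim",
    "field_column_dim",
}


def effective_retention_days(table: str, document_days: int, field_days: int) -> int:
    """Approximate retention period that governs a reporting table (simplified)."""
    if table in FIELD_RELATED_TABLES:
        return min(document_days, field_days)
    return document_days


print(effective_retention_days("field_aggregate_fact", 3650, 30))  # 30
print(effective_retention_days("doc_dim", 3650, 30))               # 3650
```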
Considerations for Setting a Short Document Data Retention Period
If a shorter period is used for document retention, some of the same considerations that apply to field data can also apply to document data. Essentially, if there is an outage longer than the retention period, action needs to be taken to avoid a gap in reporting data:
- Potential Actions to Take During a Reporting Processing Outage
- Potential Actions to Take During an Insight/KAFTA Processing Outage
Level of Complexity
Moderate
Applies to
Product | Version | Build | Environment | Hardware |
---|---|---|---|---|
Kofax TotalAgility | All | | | |