Kofax

Kofax TotalAgility Reporting Document Data Retention

Article # 3036696

Issue

What is important to understand about the reporting document data retention setting in KTA?

 

Solution

While KTA has always had a setting controlling retention of reporting field data, the setting for retention of reporting document data was introduced in KTA 7.5.  To avoid a problem when upgrading from earlier versions where the data always persisted, it has a high default value of 3650 days (ten years).  It is preferable to set a lower value if the data does not need to be kept that long, as this will limit growth of the TotalAgility_Reporting database.

These settings can be adjusted in the KTA Designer under System > System Settings > Database, retention, and reporting > Reporting Server > Data retention period.

For KTA 7.5: System Settings > Settings > General > Reporting

Document Data in KTA Reporting

Why is document reporting data not being removed when it meets all of the other removal criteria?

Before KTA 7.8.0.10, 7.9.0.4, and 7.10, a bug applied incorrect criteria when removing reporting document data, so much less data was deleted than expected.  To ensure that all expected data is deleted, update to one of these versions or higher.

 

When is data removed if it is older than the Document Data Retention Period?

Data is removed once per day, at the end of a successful run of the reporting system task.  Specifically, removal occurs once the current time in UTC is greater than the value in retention_stamp.dt_last_retention, after which that value is incremented by one day.
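
The once-per-day gate described above can be pictured with a short Python sketch. This is only an illustration of the comparison against retention_stamp.dt_last_retention, not KTA's actual code; the function names retention_due and advance_stamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def retention_due(now_utc: datetime, dt_last_retention: datetime) -> bool:
    """Return True when the current UTC time has passed the stored stamp."""
    return now_utc > dt_last_retention


def advance_stamp(dt_last_retention: datetime) -> datetime:
    """Move the stamp forward by one day after a retention pass."""
    return dt_last_retention + timedelta(days=1)


# Hypothetical values standing in for retention_stamp.dt_last_retention
# and the time at which the reporting system task finishes.
stamp = datetime(2024, 1, 10, tzinfo=timezone.utc)
now = datetime(2024, 1, 10, 0, 5, tzinfo=timezone.utc)

if retention_due(now, stamp):
    # The purge itself would happen here.
    stamp = advance_stamp(stamp)

# A task run ten minutes later finds the stamp already moved to the next day.
later = now + timedelta(minutes=10)
print(retention_due(later, stamp))  # prints False
```

Because the stamp moves forward by a whole day, additional runs of the reporting system task on the same day skip the purge.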

 

What date is used for retention of document data?

In KTA 7.9 and earlier, doc_dim.dt_create_datetime is used to determine whether data about a document should be retained. 

In KTA 7.10 and higher, doc_dim.dt_last_proc_datetime is used instead.  This is more consistent with KTA's document retention policy, which uses the document's last access date.
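
As a quick reference, the version rule above can be written as a small Python helper. The function name retention_date_column is hypothetical and the helper exists only for illustration; the two column names come from this article.

```python
from typing import Tuple


def retention_date_column(version: Tuple[int, int]) -> str:
    """Return the doc_dim column used for document data retention.

    `version` is a (major, minor) tuple such as (7, 9) or (7, 10).
    """
    if version >= (7, 10):
        return "dt_last_proc_datetime"
    return "dt_create_datetime"


print(retention_date_column((7, 9)))   # prints dt_create_datetime
print(retention_date_column((7, 10)))  # prints dt_last_proc_datetime
```

The version is compared as a tuple of integers because a plain string or decimal comparison would sort 7.10 before 7.9.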

 

How many documents are deleted per PurgeDocumentData iteration?

The batch size is read from the environment variable "PurgeFieldDataBatchSize", which is shared by both document and field reporting data retention.  In versions before KTA 7.8, the "ETLBatchSize" environment variable was used instead.
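
To show what a batch size means in practice, the following Python sketch splits a list of eligible document IDs into batches. How KTA iterates internally is not documented here, so the loop and the names purge_in_batches and delete_batch are assumptions made for illustration only.

```python
from typing import Callable, List


def purge_in_batches(
    eligible_ids: List[int],
    batch_size: int,
    delete_batch: Callable[[List[int]], None],
) -> int:
    """Delete eligible documents in chunks and return the number of iterations."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterations = 0
    for start in range(0, len(eligible_ids), batch_size):
        # Each iteration handles at most batch_size documents.
        delete_batch(eligible_ids[start:start + batch_size])
        iterations += 1
    return iterations


# Seven hypothetical document IDs with a batch size of 3.
batches: List[List[int]] = []
count = purge_in_batches(list(range(1, 8)), 3, batches.append)
print(count, batches)  # prints 3 [[1, 2, 3], [4, 5, 6], [7]]
```

A smaller batch size means more iterations, each touching fewer rows.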

 

Is any other factor considered besides the date?

Yes.  Only documents that are marked completed are removed, meaning doc_dim.is_processing_completed=1.  Currently, data from documents not marked completed is never removed under any condition.
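
Taken together, the date check and the completed flag form a two-part condition. The Python sketch below models it under stated assumptions: the DocRow class and is_removable function are hypothetical, and the strict "older than" comparison against a UTC cutoff is a guess at the boundary behavior, not a documented detail.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DocRow:
    """Simplified stand-in for a doc_dim row (hypothetical structure)."""
    retention_date: datetime  # dt_create_datetime or dt_last_proc_datetime, by version
    is_processing_completed: int


def is_removable(doc: DocRow, now_utc: datetime, retention_days: int) -> bool:
    """A document qualifies only if it is completed AND older than the cutoff."""
    cutoff = now_utc - timedelta(days=retention_days)
    return doc.is_processing_completed == 1 and doc.retention_date < cutoff


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
old_done = DocRow(now - timedelta(days=400), 1)
old_open = DocRow(now - timedelta(days=400), 0)

print(is_removable(old_done, now, 365))  # prints True
print(is_removable(old_open, now, 365))  # prints False
```

The second document is past the retention period but is never removed, because it is not marked completed.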

What Data is Removed

Data is removed from the following tables based on being related back to a document.  For dimension tables like tsf_class_dim, values are removed after they are no longer used by any documents in the reporting data.

  • batch_edit_fact
  • page_dim
  • doc_dim
  • doc_accum_fact
  • doc_export_fact
  • field_accum_fact
  • doc_sess_snapshot_fact
  • object_audit_fact
  • event_data_dim
  • batch_sess_snapshot_fact
  • tsf_class_dim
  • ta_classif_group_dim
  • ta_categories_dim
  • path_dim
  • doc_class_dim
  • field_dim
  • field_aggregate_fact
  • field_column_dim
  • batch_dim
  • batch_accum_fact
  • batch_class_dim
  • user_dim
  • reject_note_dim
  • machine_dim
  • station_dim
  • group_value_dim

As of KTA 7.9:

  • field_changes_fact

Interaction with Field Retention

Field retention removes data from field_accum_fact once it is older than the field retention period.  It does not itself remove data from the other field tables (field_dim, field_column_dim, field_aggregate_fact).  However, once the underlying field_accum_fact record is removed, document retention removes the associated records from field_aggregate_fact, along with any records in field_dim and field_column_dim that are no longer used by current data.

Therefore, although field_dim, field_column_dim, and field_aggregate_fact are technically acted on by the document retention process, they are still affected by the shorter field retention period.
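
The cascade can be demonstrated on a toy schema. The Python sketch below uses an in-memory SQLite database purely for illustration: the table names match this article, but the columns (field_id, name, fact_id) and the DELETE statements are hypothetical and are not the statements KTA runs against the TotalAgility_Reporting database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE field_dim (field_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE field_accum_fact (fact_id INTEGER PRIMARY KEY, field_id INTEGER);
    INSERT INTO field_dim VALUES (1, 'InvoiceNumber'), (2, 'Total');
    INSERT INTO field_accum_fact VALUES (10, 1), (11, 2);
""")

# Field retention: an old fact row is removed (the only row that uses field 2).
conn.execute("DELETE FROM field_accum_fact WHERE fact_id = 11")

# Document retention: dimension rows no longer referenced by any fact are removed.
conn.execute("""
    DELETE FROM field_dim
    WHERE field_id NOT IN (SELECT field_id FROM field_accum_fact)
""")

remaining = [row[0] for row in conn.execute("SELECT name FROM field_dim ORDER BY field_id")]
print(remaining)  # prints ['InvoiceNumber']
```

The dimension row for 'Total' disappears even though document retention performed the delete, because field retention had already removed the only fact that referenced it.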

Considerations for Setting a Short Document Data Retention Period

If a shorter period is used for document retention, then some of the same considerations for field data can apply to document data.  Essentially, if there is an outage longer than the retention period, then actions need to be taken to avoid a gap in reporting data:

  • Potential Actions to Take During a Reporting Processing Outage
  • Potential Actions to Take During an Insight/KAFTA Processing Outage

Level of Complexity 

Moderate

 

Applies to  

Product              Version   Build   Environment   Hardware
Kofax TotalAgility   All

 

 
