Question / Problem:
What is important to understand about the reporting field retention period setting in KTA?
Answer / Solution:
Reporting on fields can add up to a significant amount of data at scale, which is why it is important to have a retention period that limits how long the data is kept. By default, KTA retains field-level reporting data for only 5 days. By contrast, document-level reporting data is retained for 3650 days (ten years) by default.
These settings can be adjusted in the KTA Designer under System > System Settings > Database, retention, and reporting > Reporting Server > Data retention period.
For KTA 7.5 and earlier: System Settings > Settings > General > Reporting > Field Retention Time.
Field Data in KTA Reporting
Which fields are included in reporting?
By default, all fields are included in reporting data. The amount of field data accumulated by reporting can be reduced by unchecking “Include in Analytics” on individual fields that are not needed for reporting. This option is visible when opening the Extraction Group in the KTA Designer (not Transformation Designer). Preventing unneeded field data from being collected can be beneficial for reducing the amount of data processed by the reporting services, as well as the total amount of data retained in the field_accum_fact table. This can be especially important if there is a need to increase the retention period.
When is data removed if it is older than the Field Retention Period?
Data is removed at the end of each successful run of the Reporting system task, in other words, only while the Reporting service is processing successfully. By default, the Reporting system task runs every minute.
What date is used for retention of field data?
The date of the last reporting session that the field was involved in (field_accum_fact.dt_last_sess). For example, opening a validation activity creates a session that includes all of the fields of all of the documents in the activity.
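The retention check described above can be illustrated with a minimal sketch. This is not KTA's actual implementation: the function name, the row layout, and the strict "older than" comparison are assumptions made for illustration. Only the dt_last_sess column name and the 5-day default come from this article.

```python
from datetime import datetime, timedelta

def expired_rows(rows, now, retention_days=5):
    """Return rows whose last reporting session is older than the cutoff.

    Assumption: a row expires when dt_last_sess is strictly earlier than
    (now - retention_days). The exact boundary KTA uses is not documented here.
    """
    cutoff = now - timedelta(days=retention_days)
    return [r for r in rows if r["dt_last_sess"] < cutoff]

# Hypothetical rows standing in for field_accum_fact records.
rows = [
    {"field": "InvoiceNumber", "dt_last_sess": datetime(2024, 1, 1, 9, 0)},
    {"field": "Total", "dt_last_sess": datetime(2024, 1, 8, 9, 0)},
]
now = datetime(2024, 1, 10, 12, 0)
expired = expired_rows(rows, now)
expired_names = [r["field"] for r in expired]
```

With the default of 5 days, only the field whose last session was on January 1 falls before the cutoff. A field touched again by a later session (for example, by reopening a validation activity) would have a newer dt_last_sess and would be kept.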
What if reporting data is not able to be processed for a duration longer than the Field Retention Period?
When processing eventually succeeds, the retention process runs as normal, so data older than the Field Retention Period is deleted immediately. As a result, that data is deleted before a process such as the KAFTA data load can access it.
What Data is Removed
Technically, the field data retention process only affects the field_accum_fact table. However, because of its interaction with the document data retention process, the shorter field data retention period also affects the field_dim, field_column_dim, and field_aggregate_fact tables. So, effectively, the field retention period affects:
- field_accum_fact
- field_dim
- field_column_dim
- field_aggregate_fact
Potential Actions to Take During a Reporting Processing Outage
If reporting data cannot be processed for longer than the Field Retention Period, a gap in field data results, because the retention process removes the older data once processing resumes. If the field data is needed, avoid the gap by lengthening the Field Retention Period beyond the duration of the outage before allowing reporting data to process successfully. For example:
- An issue occurs that prevents processing of reporting data (either the service is not running, or it is not processing successfully)
- By the time a solution is available, 7 days will have passed. This is longer than the default Field Retention Period, but the field data has not been deleted yet, because deletion only happens during successful processing.
- Before the solution to the problem is implemented, increase the Field Retention Period to 8 days or more.
- Once processing begins, the field data will not be lost because of the longer Field Retention Period.
However, it is simpler to just use a very high value such as 1000 than to try to determine an exact number of days for a particular issue.
Then once the backlog of data is processed, the Field Retention Period can be returned to the intended value, such as the default of 5 days.
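If an exact value is preferred over a very high one, the arithmetic in the example above can be sketched as follows. The function name and the one-day safety margin are assumptions for illustration and are not part of KTA.

```python
import math
from datetime import datetime

def min_retention_days(outage_start, expected_fix, margin_days=1):
    """Smallest whole-day retention period that outlasts the outage.

    Assumption: the oldest unprocessed field data dates from outage_start,
    and one extra day of margin is enough for the backlog to be processed.
    """
    outage_days = (expected_fix - outage_start).total_seconds() / 86400
    return math.ceil(outage_days) + margin_days

# A 7-day outage, as in the example above.
needed = min_retention_days(datetime(2024, 1, 1), datetime(2024, 1, 8))
```

For the 7-day outage this yields 8, which matches the "8 days or more" in the steps above. A generous value such as 1000 avoids having to recalculate if the outage runs longer than expected.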
Field Data in KAFTA
If field data only exists for the past five days, how are reporting solutions such as KAFTA supposed to report on historical data?
The Field Fact record in the KAFTA project stores the records it loads from the field_accum_fact table, for the timespan of the data load that is being performed. When KAFTA’s Hourly Execution Plan runs on a schedule, it executes from “Last successful load date” to “Current time, rounded to the beginning of hours.” This means that when it is running on schedule, it is always only storing the most recent hour.
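The load window described above can be sketched as follows. This is only an illustration of the "last successful load date" to "current time, rounded to the beginning of hours" rule; the function name is hypothetical and this is not KAFTA's actual code.

```python
from datetime import datetime

def hourly_load_window(last_successful_load, now):
    """Return the (start, end) of the next load.

    The end is the current time truncated to the start of the hour.
    """
    end = now.replace(minute=0, second=0, microsecond=0)
    return last_successful_load, end

# Running on schedule: the previous load ended at 13:00 and it is now 14:37.
window = hourly_load_window(
    datetime(2024, 1, 10, 13, 0),
    datetime(2024, 1, 10, 14, 37, 12),
)
```

Here the window runs from 13:00 to 14:00, a single hour. If the last successful load were several days old, the same rule would produce a window spanning the whole outage, which is where the Field Retention Period becomes relevant.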
What happens if KAFTA’s Hourly Execution Plan is not able to be processed for a duration longer than the Field Retention Period?
When it is eventually able to process successfully, it will only have field data for the duration of the Field Retention Period, leaving a gap for any time longer than that when the execution plan was not able to run.
Potential Actions to Take During an Insight/KAFTA Processing Outage
If KAFTA’s Hourly Execution Plan cannot be processed for longer than the Field Retention Period, a gap in field data results, because data older than that period will no longer be present. If the field data is needed, avoid the gap by lengthening the Field Retention Period beyond the duration of the outage, and do so before the Field Retention Period has elapsed since the start of the outage. For example:
- An issue occurs that prevents processing of KAFTA’s Execution Plans and the outage has not been solved after 4 days.
- To keep field data and avoid a gap, the default 5-day Field Retention Period needs to be increased before the Field Retention Period has passed.
- The Field Retention Period needs to stay larger than the duration of the outage, so continue to increase it for as long as the outage lasts.
- Once the execution plans are able to run successfully, the field data will not be lost because of the longer Field Retention Period.
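As a rough way to estimate the gap that an outage would leave, the reasoning above can be sketched as follows. The function name is hypothetical, and the model is simplified: it assumes field data is retained purely by age relative to the recovery time and that reporting itself kept processing throughout.

```python
from datetime import datetime, timedelta

def field_data_gap(last_successful_load, recovery_time, retention_days):
    """Return the (start, end) of the period with no field data, or None.

    Assumption: at recovery, only field data newer than
    (recovery_time - retention_days) still exists in field_accum_fact.
    """
    oldest_available = recovery_time - timedelta(days=retention_days)
    if oldest_available <= last_successful_load:
        return None  # retention period covers the whole outage
    return last_successful_load, oldest_available

last_load = datetime(2024, 1, 1)
recovered = datetime(2024, 1, 11)  # a 10-day outage

gap_default = field_data_gap(last_load, recovered, retention_days=5)
gap_raised = field_data_gap(last_load, recovered, retention_days=1000)
```

With the default of 5 days, a 10-day outage leaves a gap from January 1 to January 6. With the period raised beyond the outage duration in time, no gap results.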