
US-20260127155-A1 - DATA RETENTION MANAGEMENT BASED ON CONSISTENCY OF TABLE GROWTH AND DATA PURGING ACTIVITY


Abstract

Techniques are provided for data retention management based on consistency of table growth and data purging activity. One method comprises obtaining changes in size over time of a database table; predicting, using a regression model, an estimated table size of the database table; evaluating data features that characterize an error associated with the prediction of the estimated table size of the database table by the regression model; characterizing a consistency of growth of the database table based on the error of the prediction; characterizing a consistency of data purging activities in a database system comprising the database table; generating data retention recommendations for the database table based on (i) the characterization of the consistency of growth of the database table and (ii) the characterization of the consistency of the data purging activities; and initiating an automated action based on the data retention recommendations for the database table.
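
The prediction and error-evaluation steps of the abstract can be illustrated with a minimal sketch. It assumes a single linear regression over daily size samples and treats the error feature as a mean absolute error normalized by the mean table size. The normalization, the function names, and the sample sizes are assumptions made for illustration and are not taken from the disclosure.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_absolute_error(actual, estimated):
    """Average absolute gap between actual and estimated sizes."""
    return mean(abs(a - e) for a, e in zip(actual, estimated))


def growth_error_feature(sizes):
    """Fit a line to (sample index, size) and return the relative error.

    Dividing by the mean size is an assumption of this sketch, made so
    that large and small tables can be compared on the same scale.
    """
    days = list(range(len(sizes)))
    slope, intercept = fit_line(days, sizes)
    estimated = [slope * d + intercept for d in days]
    return mean_absolute_error(sizes, estimated) / mean(sizes)


# Hypothetical table sizes in MB, one sample per day.
steady = [100, 110, 120, 130, 140, 150]
erratic = [100, 180, 90, 200, 95, 210]

steady_error = growth_error_feature(steady)    # close to zero
erratic_error = growth_error_feature(erratic)  # noticeably larger
```

A table whose size follows the fitted line closely yields a small error feature, while a table with irregular growth yields a large one. That difference is the signal a later consistency characterization could act on.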

Inventors

  • Beng Kiong Cheah
  • Shu Hsien Lee
  • Lead Ta Choo

Assignees

  • DELL PRODUCTS L.P.

Dates

Publication Date: 2026-05-07
Application Date: 2024-11-07

Claims (20)

  1. A computer-implemented method, comprising: accessing, using at least one processing device, one or more data structures comprising information characterizing changes in size over time of one or more database tables; predicting, using at least one processor-based regression model, an estimated table size of at least a given one of the one or more database tables; evaluating one or more data features that characterize an error associated with the prediction of the estimated table size of at least the given database table by the at least one processor-based regression model; characterizing, using at least one processor-based clustering model, a consistency of a growth of at least the given database table based at least in part on the error associated with the prediction; characterizing a consistency of one or more data purging activities in at least one database system comprising at least the given database table; automatically generating, using the at least one processing device, responsive to the consistency of growth characterization, one or more data retention recommendations for at least the given database table based at least in part on (i) the characterization of the consistency of growth of at least the given database table and (ii) the characterization of the consistency of the one or more data purging activities; and initiating one or more automated actions based at least in part on the one or more data retention recommendations for at least the given database table, wherein the one or more automated actions comprise at least one of: purging unused data associated with the at least one database table, purging obsolete data associated with the at least one database table, adjusting a suitable retention period for the at least one table, adjusting an automated purging frequency for the at least one table, adjusting an automated purging interval for the at least one table and adjusting an automated purging program for the at least one table; wherein the method is performed by at least one processing device comprising a processor coupled to a memory.
  2. The computer-implemented method of claim 1, wherein the at least one regression model comprises one or more linear regression models.
  3. The computer-implemented method of claim 1, wherein the one or more data features that characterize the error associated with the prediction comprise at least (i) a first mean average error data feature between an actual size over time of the at least one database table and an estimated size over time of the at least one database table and (ii) a second mean average error data feature between a merged list of the actual size of the at least one database table for at least one designated time interval and a merged list of the estimated size of the at least one database table for the at least one designated time interval.
  4. The computer-implemented method of claim 1, wherein the characterization of the consistency of growth of the at least one database table further comprises processing the one or more data features for the at least one database table to cluster the at least one database table into a first cluster associated with consistent table growth or a second cluster associated with inconsistent table growth.
  5. The computer-implemented method of claim 1, wherein the characterization of the consistency of the one or more data purging activities evaluates whether a frequency of purging activities for the at least one database table is performed within a designed purging interval.
  6. The computer-implemented method of claim 1, further comprising assigning the at least one database table to at least one of a plurality of categories based at least in part on (i) the characterization of the consistency of growth of the at least one database table and (ii) the characterization of the consistency of the one or more data purging activities, and wherein the generating the one or more data retention recommendations for the at least one database table comprises obtaining one or more designated data retention recommendations associated with the assigned at least one category.
  7. The computer-implemented method of claim 1, further comprising evaluating a data access report to identify one or more of unused data and orphaned data and recommending a purging of the identified one or more of the unused data and the orphaned data.
  8. The computer-implemented method of claim 1, wherein the one or more automated actions comprise at least one of: generating one or more notifications related to the one or more data retention recommendations; generating one or more signals related to the one or more data retention recommendations; providing information characterizing the one or more data retention recommendations to a display system; and controlling a performance of at least one action in another system using the one or more data retention recommendations.
  9. An apparatus comprising: at least one processing device comprising a processor coupled to a memory; the at least one processing device being configured to implement the following steps: accessing, using at least one processing device, one or more data structures comprising information characterizing changes in size over time of one or more database tables; predicting, using at least one processor-based regression model, an estimated table size of at least a given one of the one or more database tables; evaluating one or more data features that characterize an error associated with the prediction of the estimated table size of at least the given database table by the at least one processor-based regression model; characterizing, using at least one processor-based clustering model, a consistency of a growth of at least the given database table based at least in part on the error associated with the prediction; characterizing a consistency of one or more data purging activities in at least one database system comprising at least the given database table; automatically generating, using the at least one processing device, responsive to the consistency of growth characterization, one or more data retention recommendations for at least the given database table based at least in part on (i) the characterization of the consistency of growth of at least the given database table and (ii) the characterization of the consistency of the one or more data purging activities; and initiating one or more automated actions based at least in part on the one or more data retention recommendations for at least the given database table, wherein the one or more automated actions comprise at least one of: purging unused data associated with the at least one database table, purging obsolete data associated with the at least one database table, adjusting a suitable retention period for the at least one table, adjusting an automated purging frequency for the at least one table, adjusting an automated purging interval for the at least one table and adjusting an automated purging program for the at least one table.
  10. The apparatus of claim 9, wherein the characterization of the consistency of growth of the at least one database table further comprises processing the one or more data features for the at least one database table to cluster the at least one database table into a first cluster associated with consistent table growth or a second cluster associated with inconsistent table growth.
  11. The apparatus of claim 9, wherein the characterization of the consistency of the one or more data purging activities evaluates whether a frequency of purging activities for the at least one database table is performed within a designed purging interval.
  12. The apparatus of claim 9, further comprising assigning the at least one database table to at least one of a plurality of categories based at least in part on (i) the characterization of the consistency of growth of the at least one database table and (ii) the characterization of the consistency of the one or more data purging activities, and wherein the generating the one or more data retention recommendations for the at least one database table comprises obtaining one or more designated data retention recommendations associated with the assigned at least one category.
  13. The apparatus of claim 9, further comprising evaluating a data access report to identify one or more of unused data and orphaned data and recommending a purging of the identified one or more of the unused data and the orphaned data.
  14. The apparatus of claim 9, wherein the one or more automated actions comprise at least one of: generating one or more notifications related to the one or more data retention recommendations; generating one or more signals related to the one or more data retention recommendations; providing information characterizing the one or more data retention recommendations to a display system; and controlling a performance of at least one action in another system using the one or more data retention recommendations.
  15. A non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes the at least one processing device to perform the following steps: accessing, using at least one processing device, one or more data structures comprising information characterizing changes in size over time of one or more database tables; predicting, using at least one processor-based regression model, an estimated table size of at least a given one of the one or more database tables; evaluating one or more data features that characterize an error associated with the prediction of the estimated table size of at least the given database table by the at least one processor-based regression model; characterizing, using at least one processor-based clustering model, a consistency of a growth of at least the given database table based at least in part on the error associated with the prediction; characterizing a consistency of one or more data purging activities in at least one database system comprising at least the given database table; automatically generating, using the at least one processing device, responsive to the consistency of growth characterization, one or more data retention recommendations for at least the given database table based at least in part on (i) the characterization of the consistency of growth of at least the given database table and (ii) the characterization of the consistency of the one or more data purging activities; and initiating one or more automated actions based at least in part on the one or more data retention recommendations for at least the given database table, wherein the one or more automated actions comprise at least one of: purging unused data associated with the at least one database table, purging obsolete data associated with the at least one database table, adjusting a suitable retention period for the at least one table, adjusting an automated purging frequency for the at least one table, adjusting an automated purging interval for the at least one table and adjusting an automated purging program for the at least one table.
  16. The non-transitory processor-readable storage medium of claim 15, wherein the one or more data features that characterize the error associated with the prediction comprise at least (i) a first mean average error data feature between an actual size over time of the at least one database table and an estimated size over time of the at least one database table and (ii) a second mean average error data feature between a merged list of the actual size of the at least one database table for at least one designated time interval and a merged list of the estimated size of the at least one database table for the at least one designated time interval.
  17. The non-transitory processor-readable storage medium of claim 15, wherein the characterization of the consistency of growth of the at least one database table further comprises processing the one or more data features for the at least one database table to cluster the at least one database table into a first cluster associated with consistent table growth or a second cluster associated with inconsistent table growth.
  18. The non-transitory processor-readable storage medium of claim 15, wherein the characterization of the consistency of the one or more data purging activities evaluates whether a frequency of purging activities for the at least one database table is performed within a designed purging interval.
  19. The non-transitory processor-readable storage medium of claim 15, further comprising assigning the at least one database table to at least one of a plurality of categories based at least in part on (i) the characterization of the consistency of growth of the at least one database table and (ii) the characterization of the consistency of the one or more data purging activities, and wherein the generating the one or more data retention recommendations for the at least one database table comprises obtaining one or more designated data retention recommendations associated with the assigned at least one category.
  20. The non-transitory processor-readable storage medium of claim 15, further comprising evaluating a data access report to identify one or more of unused data and orphaned data and recommending a purging of the identified one or more of the unused data and the orphaned data.
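
The two-cluster split recited in claims 4, 10 and 17 can be illustrated with a minimal sketch. The claims name a clustering model without specifying one, so the use of one-dimensional k-means with k = 2 here is an assumption, as are the table names and the error values.

```python
def two_means(values, iterations=20):
    """Split 1-D values into a low (0) and a high (1) cluster.

    This is plain k-means with k = 2, seeded with the minimum and
    maximum value. It is an illustrative stand-in for whichever
    clustering model an implementation actually uses.
    """
    low, high = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearer of the two centroids.
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        lows = [v for v, lab in zip(values, labels) if lab == 0]
        highs = [v for v, lab in zip(values, labels) if lab == 1]
        # Move each centroid to the mean of its members.
        if lows:
            low = sum(lows) / len(lows)
        if highs:
            high = sum(highs) / len(highs)
    return labels


# Hypothetical per-table prediction-error features.
errors = {"orders": 0.02, "audit_log": 0.31, "sessions": 0.04, "staging": 0.27}

labels = two_means(list(errors.values()))
growth = {
    name: "consistent" if lab == 0 else "inconsistent"
    for name, lab in zip(errors, labels)
}
```

Tables with a low prediction error land in the cluster associated with consistent growth, and the rest land in the cluster associated with inconsistent growth. One limitation of this sketch is that a two-cluster split separates the tables even when all of them grow consistently, so a practical system would likely need an additional check on how far apart the two centroids are.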

Description

BACKGROUND

Data purging is a technique to remove data from a storage system or database in order to reclaim storage space, reduce security risks and/or improve data processing. Data purging techniques often employ data retention policies to identify data that may be purged.

SUMMARY

Illustrative embodiments of the disclosure provide techniques for data retention management based on consistency of table growth and data purging activity. One method includes obtaining information characterizing changes in size over time of one or more database tables; predicting, using at least one regression model, an estimated table size of at least one of the one or more database tables; evaluating one or more data features that characterize an error associated with the prediction of the estimated table size of the at least one database table by the at least one regression model; characterizing a consistency of a growth of the at least one database table based at least in part on the error associated with the prediction; characterizing a consistency of one or more data purging activities in at least one database system comprising the at least one database table; generating one or more data retention recommendations for the at least one database table based at least in part on (i) the characterization of the consistency of growth of the at least one database table and (ii) the characterization of the consistency of the one or more data purging activities; and initiating one or more automated actions based at least in part on the one or more data retention recommendations for the at least one database table.

Illustrative embodiments can provide significant advantages relative to conventional techniques. For example, technical problems related to such conventional techniques are mitigated in one or more embodiments by evaluating a consistency of database table growth and a consistency of data purging activities to generate data retention recommendations.
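
The purging-consistency check and the recommendation step described in the summary can be illustrated with a minimal sketch. It assumes that purging counts as consistent when every gap between successive purge dates is within a tolerance of an expected interval, and that each combination of the two consistency flags maps to one recommendation. The tolerance, the function names, and the recommendation strings are hypothetical and are not taken from the disclosure or its figures.

```python
from datetime import date


def purging_is_consistent(purge_dates, expected_days, tolerance_days=1):
    """True if every gap between successive purges is near the expected interval."""
    if len(purge_dates) < 2:
        return False  # too little history to call the purging consistent
    ordered = sorted(purge_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return all(abs(gap - expected_days) <= tolerance_days for gap in gaps)


# Hypothetical mapping, keyed by (growth consistent?, purging consistent?).
# The recommendation strings are placeholders invented for this sketch.
RECOMMENDATIONS = {
    (True, True): "keep the current retention period",
    (True, False): "adjust the automated purging frequency",
    (False, True): "adjust the retention period",
    (False, False): "review the purging program and purge unused data",
}


def recommend(growth_consistent, purging_consistent):
    """Look up the recommendation for a table's consistency category."""
    return RECOMMENDATIONS[(growth_consistent, purging_consistent)]


# Hypothetical purge histories.
weekly = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15)]
irregular = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 2, 20)]

weekly_advice = recommend(True, purging_is_consistent(weekly, expected_days=7))
irregular_advice = recommend(True, purging_is_consistent(irregular, expected_days=7))
```

Keeping the mapping in a lookup table separates the category assignment from the recommendation text, so the recommendations can be changed without touching the consistency logic.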
These and other illustrative embodiments described herein include, without limitation, methods, apparatus, systems, and computer program products comprising processor-readable storage media.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an information processing system configured for data retention management based on consistency of table growth and data purging activity in accordance with an illustrative embodiment;

FIG. 2 illustrates a number of representative table growth and data purging consistency categories in accordance with an illustrative embodiment;

FIG. 3 is a flow diagram illustrating an exemplary implementation of a multi-phase process for data retention management based on consistency of table growth and data purging activity in accordance with an illustrative embodiment;

FIG. 4 is a sample table illustrating a representative set of data features for evaluating a consistency of database table growth in accordance with an illustrative environment;

FIG. 5 is a flow diagram illustrating an exemplary implementation of a process for evaluating a consistency of table growth in accordance with an illustrative embodiment;

FIG. 6 is a flow diagram illustrating an exemplary implementation of a process for evaluating a consistency of purging activity in accordance with an illustrative embodiment;

FIG. 7 is a flow diagram illustrating an exemplary implementation of a process used by the process of FIG. 6 to determine whether purging activities are performed with a consistent time interval in accordance with an illustrative embodiment;

FIG. 8 is a sample table illustrating exemplary category-based data retention recommendations in accordance with an illustrative environment;

FIG. 9 is a flow diagram illustrating an exemplary implementation of a process for data retention management based on consistency of table growth and data purging activity in accordance with an illustrative embodiment;

FIG. 10 illustrates an exemplary processing platform that may be used to implement at least a portion of one or more embodiments of the disclosure comprising a cloud infrastructure; and

FIG. 11 illustrates another exemplary processing platform that may be used to implement at least a portion of one or more embodiments of the disclosure.

DETAILED DESCRIPTION

Illustrative embodiments of the present disclosure will be described herein with reference to exemplary communication, storage and processing devices. It is to be appreciated, however, that the disclosure is not restricted to use with the particular illustrative configurations shown.

One or more embodiments of the disclosure provide methods, apparatus and computer program products for data retention management based on consistency of table growth and data purging activity.

One or more aspects of the disclosure recognize that inadequate or improper data purging can result in a proliferation of unused data in database tables, for example. An accumulation of such unused data may result in wasted storage space, increased storage costs