US-12619753-B2 - Dynamic time-based data access policy definition and enforcement
Abstract
Data access requests are grouped into a plurality of clusters according to an activity count within each of a plurality of time periods. Based on a time separating two clusters in the plurality of clusters, a time window in which a data access policy applies is defined. Using the plurality of data access requests, a forecasting model is trained to predict a volume of future data access requests, the training resulting in a trained forecasting model. A data access policy effective during the time window and conditioned on a threshold related to the volume of future data access requests is generated. Responsive to determining that a new data request meets a criterion of the data access policy, processing of the new data request is allowed.
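The final enforcement step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `TimeWindowPolicy` record, the `allow_request` function, and the rule that a request outside the window is not limited by this policy are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class TimeWindowPolicy:
    """Hypothetical policy record: a daily time window plus a request threshold."""
    window_start: time
    window_end: time
    max_requests: int  # assumed to be derived from the forecast request volume

    def applies_at(self, moment: datetime) -> bool:
        """Return True when `moment` falls inside the policy's time window."""
        t = moment.time()
        if self.window_start <= self.window_end:
            return self.window_start <= t < self.window_end
        # The window wraps past midnight, e.g. 22:00 to 06:00.
        return t >= self.window_start or t < self.window_end


def allow_request(policy: TimeWindowPolicy, moment: datetime,
                  requests_in_window: int) -> bool:
    """Allow the request unless the window applies and the threshold is reached."""
    if not policy.applies_at(moment):
        # Assumption: outside its window, this policy imposes no limit.
        return True
    return requests_in_window < policy.max_requests


# Illustrative values only: at most 5 requests between midnight and 06:00.
night_policy = TimeWindowPolicy(time(0, 0), time(6, 0), max_requests=5)
print(allow_request(night_policy, datetime(2023, 5, 10, 3, 0), 4))
print(allow_request(night_policy, datetime(2023, 5, 10, 3, 0), 5))
```

In a fuller system, a request falling outside one policy's window would presumably be checked against the policy for whichever window does contain it. The sketch leaves that lookup out.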
Inventors
- Sai Sree Laya Chukkapalli
- Shriti Priya
- Julian James Stephen
- Arjun Natarajan
Assignees
- INTERNATIONAL BUSINESS MACHINES CORPORATION
Dates
- Publication Date
- 20260505
- Application Date
- 20230510
Claims (17)
- 1 . A computer-implemented method comprising: grouping, into a plurality of clusters according to an activity count within each of a plurality of time periods, a plurality of data access requests; defining, based on a time midway between boundaries separating two clusters in the plurality of clusters, a boundary for a time window in which a data access policy applies; training, using the plurality of data access requests, a forecasting model to predict a volume of future data access requests, the training resulting in a trained forecasting model; defining, based on a predicted volume of future data access requests, a threshold at which the data access policy applies; generating the data access policy effective during the time window and conditioned on the threshold; allowing, responsive to determining that a new data request meets a criterion of the data access policy, processing of the new data request; and disallowing, responsive to determining that the new data request does not meet the criterion of the data access policy, processing of the new data request.
- 2 . The computer-implemented method of claim 1 , wherein the forecasting model comprises a long short-term memory.
- 3 . The computer-implemented method of claim 1 , further comprising: determining, subsequent to the training, that a prediction error of the trained forecasting model is above a first error threshold; retraining, using a second plurality of data access requests, the trained forecasting model to predict a second volume of future data access requests; and adjusting, to be conditioned on a second threshold related to the second volume of future data access requests, the data access policy.
- 4 . The computer-implemented method of claim 1 , further comprising: determining that a volume of data access requests disallowed for not meeting a criterion of the data access policy exceeds a second error threshold; and adjusting, based on a second time separating two clusters in a second plurality of clusters, the time window in which the data access policy applies, the second plurality of clusters formed using a third plurality of data access requests.
- 5 . The computer-implemented method of claim 1 , further comprising: determining that a volume of data access requests disallowed for not meeting a criterion of the data access policy exceeds a third error threshold; retraining, using a third plurality of data access requests, the trained forecasting model to predict a third volume of future data access requests; and adjusting, to be conditioned on a second threshold related to the third volume of future data access requests, the data access policy.
- 6 . A computer program product comprising one or more computer readable storage medium, and program instructions collectively stored on the one or more computer readable storage medium, the program instructions executable by a processor to cause the processor to perform operations comprising: grouping, into a plurality of clusters according to an activity count within each of a plurality of time periods, a plurality of data access requests; defining, based on a time midway between boundaries separating two clusters in the plurality of clusters, a boundary for a time window in which a data access policy applies; training, using the plurality of data access requests, a forecasting model to predict a volume of future data access requests, the training resulting in a trained forecasting model; defining, based on a predicted volume of future data access requests, a threshold at which the data access policy applies; generating the data access policy effective during the time window and conditioned on the threshold; allowing, responsive to determining that a new data request meets a criterion of the data access policy, processing of the new data request; and disallowing, responsive to determining that the new data request does not meet the criterion of the data access policy, processing of the new data request.
- 7 . The computer program product of claim 6 , wherein the stored program instructions are stored in a computer readable storage device in a data processing system, and wherein the stored program instructions are transferred over a network from a remote data processing system.
- 8 . The computer program product of claim 6 , wherein the stored program instructions are stored in a computer readable storage device in a server data processing system, and wherein the stored program instructions are downloaded in response to a request over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system, further comprising: program instructions to meter use of the program instructions associated with the request; and program instructions to generate an invoice based on the metered use.
- 9 . The computer program product of claim 6 , wherein the forecasting model comprises a long short-term memory.
- 10 . The computer program product of claim 6 , further comprising: determining, subsequent to the training, that a prediction error of the trained forecasting model is above a first error threshold; retraining, using a second plurality of data access requests, the trained forecasting model to predict a second volume of future data access requests; and adjusting, to be conditioned on a second threshold related to the second volume of future data access requests, the data access policy.
- 11 . The computer program product of claim 6 , further comprising: determining that a volume of data access requests disallowed for not meeting a criterion of the data access policy exceeds a second error threshold; and adjusting, based on a second time separating two clusters in a second plurality of clusters, the time window in which the data access policy applies, the second plurality of clusters formed using a third plurality of data access requests.
- 12 . The computer program product of claim 6 , further comprising: determining that a volume of data access requests disallowed for not meeting a criterion of the data access policy exceeds a third error threshold; retraining, using a third plurality of data access requests, the trained forecasting model to predict a third volume of future data access requests; and adjusting, to be conditioned on a second threshold related to the third volume of future data access requests, the data access policy.
- 13 . A computer system comprising a processor and one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable by the processor to cause the processor to perform operations comprising: grouping, into a plurality of clusters according to an activity count within each of a plurality of time periods, a plurality of data access requests; defining, based on a time midway between boundaries separating two clusters in the plurality of clusters, a boundary for a time window in which a data access policy applies; training, using the plurality of data access requests, a forecasting model to predict a volume of future data access requests, the training resulting in a trained forecasting model; defining, based on a predicted volume of future data access requests, a threshold at which the data access policy applies; generating the data access policy effective during the time window and conditioned on the threshold; allowing, responsive to determining that a new data request meets a criterion of the data access policy, processing of the new data request; and disallowing, responsive to determining that the new data request does not meet the criterion of the data access policy, processing of the new data request.
- 14 . The computer system of claim 13 , wherein the forecasting model comprises a long short-term memory.
- 15 . The computer system of claim 13 , further comprising: determining, subsequent to the training, that a prediction error of the trained forecasting model is above a first error threshold; retraining, using a second plurality of data access requests, the trained forecasting model to predict a second volume of future data access requests; and adjusting, to be conditioned on a second threshold related to the second volume of future data access requests, the data access policy.
- 16 . The computer system of claim 13 , further comprising: determining that a volume of data access requests disallowed for not meeting a criterion of the data access policy exceeds a second error threshold; and adjusting, based on a second time separating two clusters in a second plurality of clusters, the time window in which the data access policy applies, the second plurality of clusters formed using a third plurality of data access requests.
- 17 . The computer system of claim 13 , further comprising: determining that a volume of data access requests disallowed for not meeting a criterion of the data access policy exceeds a third error threshold; retraining, using a third plurality of data access requests, the trained forecasting model to predict a third volume of future data access requests; and adjusting, to be conditioned on a second threshold related to the third volume of future data access requests, the data access policy.
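The threshold definition and the error-triggered retraining recited in the claims can be illustrated with a small sketch. It rests on several assumptions: a moving average stands in for the claimed forecasting model (the claims mention a long short-term memory, which is not implemented here), and the `headroom` margin, the relative-error measure, and all names and numbers are hypothetical.

```python
from statistics import mean


class MovingAverageForecaster:
    """Stand-in forecaster; NOT the long short-term memory named in the claims."""

    def __init__(self, window: int = 3):
        self.window = window
        self.history: list[float] = []

    def fit(self, volumes):
        """'Train' by remembering the observed request volumes per period."""
        self.history = list(volumes)
        return self

    def predict(self) -> float:
        """Predict the next period's volume as the mean of the latest periods."""
        return mean(self.history[-self.window:])


def threshold_from_forecast(predicted: float, headroom: float = 0.2) -> float:
    """Assumed rule: permit the predicted volume plus a safety margin."""
    return predicted * (1 + headroom)


def needs_retraining(predicted: float, observed: float,
                     error_threshold: float = 0.25) -> bool:
    """Assumed rule: retrain when the relative prediction error is too large."""
    if observed == 0:
        return predicted > 0
    return abs(predicted - observed) / observed > error_threshold


model = MovingAverageForecaster().fit([100, 110, 90])
predicted = model.predict()
threshold = threshold_from_forecast(predicted)

observed = 200  # illustrative: actual volume seen in the next period
if needs_retraining(predicted, observed):
    # Retrain on newer requests and recondition the policy on a new threshold.
    model.fit([110, 90, observed])
    threshold = threshold_from_forecast(model.predict())
```

The same trigger pattern could be driven by the volume of disallowed requests instead of the prediction error, which is the condition that the later dependent claims describe.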
Description
BACKGROUND
The present invention relates generally to a method, system, and computer program product for data access policy management. More particularly, the present invention relates to a method, system, and computer program product for dynamic time-based data access policy definition and enforcement.
A data access policy enforces controls on access to data. Some data access policies are put in place for security reasons (for example, limiting or disallowing access to data from a system outside a company's firewall). Other data access policies are put in place for privacy reasons (for example, limiting or disallowing access to personally identifiable information (PII), except by a small subset of specially trained individuals who require such access to perform their jobs).
A time-based data access policy, also called a temporal data access policy, enforces a data access control that is related to time. A data request subject to a data access policy includes data with one or more attributes, and a data request subject to a time-based data access policy includes data with one or more temporal attributes. Examples of temporal attributes include a time at which a user logs into a system, a time at which an application invokes an application programming interface (API) of another application, a time at which a database is read or written, and a time at which an application is deployed for use by other applications or a user. For example, a time-based data access policy might specify a maximum access rate for an asset (e.g., maximum 100 downloads per day), a number of accesses (e.g., only 50 simultaneous accesses for PII data), or access time windows (e.g., only 9 am-5 pm on weekdays).
The illustrative embodiments recognize that determining the parameters of a time-based data access control policy, including time windows when a policy applies and the thresholds that trigger a particular policy, relies on distinguishing a normal from an abnormal condition.
For example, a hundred user logins to an office computer system between 8 and 8:15 am on weekdays, when most users begin work, might be considered normal and thus allowed by a data access policy, but a hundred user logins between 3 and 3:15 am, when almost no one typically works, might be an indicator of an attack on the system and thus should be disallowed by the data access policy.
However, distinguishing a normal from an abnormal condition is difficult. Timestamps can have many different values. For example, if daily user login times are recorded with millisecond granularity, there can be as many distinct login-time values as there are milliseconds in a day (86,400,000). Further, a data access policy typically controls hundreds or thousands of systems, each generating timestamped data. Thus, data policy officers often rely on ad hoc rules of thumb (e.g., no one should be working at 3 am, so only a very few user logins per minute are permitted then), which can be overprotective (hindering legitimate activity) or insufficiently protective (allowing illegitimate activity). In addition, usage patterns can change, necessitating a policy update, but such pattern changes are often not apparent until users are wrongly denied access and complain.
Thus, the illustrative embodiments recognize that there is a need to dynamically determine the parameters of a time-based data access control policy, including time windows when a policy applies and the thresholds that trigger a particular policy, based on analysis of actual data, and to update the policy automatically as conditions change.
SUMMARY
The illustrative embodiments provide a method, system, and computer program product. An embodiment includes a method that groups, into a plurality of clusters according to an activity count within each of a plurality of time periods, a plurality of data access requests.
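The grouping step, together with the midpoint rule recited in claim 1, can be sketched as below. This is one possible reading rather than the method the patent prescribes: the source does not name a clustering algorithm, so a tiny one-dimensional two-means split into "quiet" and "busy" hours is assumed, each hourly period is represented by its start hour, and the sample counts are invented for illustration.

```python
def two_means_1d(values, iterations=20):
    """Label each value 0 (low cluster) or 1 (high cluster) via 1-D k-means, k=2."""
    low_center, high_center = min(values), max(values)
    for _ in range(iterations):
        cut = (low_center + high_center) / 2
        low = [v for v in values if v <= cut]
        high = [v for v in values if v > cut]
        if not low or not high:
            break  # all values fall in one cluster; nothing to separate
        low_center, high_center = sum(low) / len(low), sum(high) / len(high)
    cut = (low_center + high_center) / 2
    return [1 if v > cut else 0 for v in values]


def window_boundaries(hourly_counts):
    """Return cluster labels and the midpoint times where the cluster changes."""
    labels = two_means_1d(hourly_counts)
    boundaries = []
    for hour in range(1, len(labels)):
        if labels[hour] != labels[hour - 1]:
            # Hour `hour - 1` closes one cluster and `hour` opens the next;
            # the window boundary is placed midway between those two times.
            boundaries.append(((hour - 1) + hour) / 2)
    return labels, boundaries


# Invented request counts for hours 0..23 of one day.
counts = [2, 1, 0, 1, 3, 2, 4,
          80, 95, 100, 90, 85, 88, 92, 97, 90, 85, 80,
          5, 3, 2, 1, 2, 1]
labels, boundaries = window_boundaries(counts)
print(boundaries)
```

With these sample counts the busy cluster spans hours 7 through 17, so the sketch places window boundaries at 6.5 and 17.5, that is, 6:30 am and 5:30 pm.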
An embodiment defines, based on a time separating two clusters in the plurality of clusters, a time window in which a data access policy applies. An embodiment trains, using the plurality of data access requests, a forecasting model to predict a volume of future data access requests, the training resulting in a trained forecasting model. An embodiment generates a data access policy effective during the time window and conditioned on a threshold related to the volume of future data access requests. An embodiment allows, responsive to determining that a new data request meets a criterion of the data access policy, processing of the new data request. Thus, an embodiment provides a method for dynamic time-based data access policy definition and enforcement. Another embodiment further comprises disallowing, responsive to determining that a new data request does not meet a criterion of the data access policy, processing of the new data request. Thus, the embodiment provides additional enforcement of a data access policy. In another embodiment, the forecasting model comprises a long short-term memory. Thus, the