
US-12619714-B2 - Coping with feature error suppression: a mechanism to handle the concept drift

US 12619714 B2

Abstract

Approaches herein relate to reconstructive models such as an autoencoder for anomaly detection. Herein are machine learning techniques that detect and suppress any feature that causes model decay by concept drift. In an embodiment in a production environment, a computer initializes an unsuppressed subset of features with a plurality of features that an already-trained reconstructive model can process. A respective reconstruction error of each feature of the unsuppressed subset of features is calculated. The computer detects that a respective moving average based on the reconstruction error of a particular feature of the unsuppressed subset of features exceeds a respective feature suppression threshold of the particular feature, which causes removal of the particular feature from the unsuppressed subset of features. After removing the particular feature from the unsuppressed subset of features, a loss of the reconstructive model is calculated based on respective reconstruction errors of the unsuppressed subset of features.
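The suppression mechanism summarized above can be sketched in code. The following is a minimal illustration, not the patented implementation: the function and variable names (`score`, `model_reconstruct`, `alpha`), the squared-error metric, and the simple moving-average update are assumptions made for the sketch.

```python
import numpy as np

def score(model_reconstruct, x, unsuppressed, thresholds, ema, alpha=0.1):
    """One inference step of the feature-suppression scheme (illustrative only).

    model_reconstruct: callable mapping a feature vector to its reconstruction.
    x: 1-D feature vector. unsuppressed: set of feature indices still in use.
    thresholds, ema: per-feature suppression thresholds and moving averages.
    """
    x_hat = model_reconstruct(x)
    errors = (x - x_hat) ** 2                # per-feature reconstruction error
    for i in list(unsuppressed):
        # moving average of this feature's reconstruction errors
        ema[i] = alpha * errors[i] + (1 - alpha) * ema[i]
        if ema[i] > thresholds[i]:           # per-feature suppression threshold
            unsuppressed.discard(i)          # suppress the drifting feature
    # the model loss is based only on unsuppressed features -- no retraining
    kept = sorted(unsuppressed)
    return errors[kept].mean() if kept else 0.0
```

Because the loss ignores suppressed features, anomaly scoring continues uninterrupted while a drifting feature is excluded, which is the key contrast with retraining-based handling of concept drift.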

Inventors

  • SAEID ALLAHDADIAN
  • ANDREW BROWNSWORD
  • MILOS VASIC
  • MATTEO CASSERINI
  • Amin SUZANI
  • Hamed Ahmadi
  • Felix Schmidt
  • Nipun Agarwal

Assignees

  • ORACLE INTERNATIONAL CORPORATION

Dates

Publication Date
2026-05-05
Application Date
2020-12-15

Claims (18)

  1. A method comprising: initializing an unsuppressed plurality of features with a plurality of features that includes a particular feature; calculating, for a first feature vector that consists of a respective value for each feature in the plurality of features, a respective separate reconstruction error of a trained reconstructive model for each feature of the unsuppressed plurality of features; detecting that a moving average of reconstruction errors of the particular feature exceeds a feature suppression threshold that is exclusively for the particular feature; removing, in response to said detecting, the particular feature from the unsuppressed plurality of features; and calculating, for a second feature vector that consists of a respective value for each feature in the plurality of features after said removing the particular feature from the unsuppressed plurality of features, a loss of the trained reconstructive model based on a respective separate reconstruction error of each feature of the unsuppressed plurality of features excluding the particular feature; wherein the method is performed by one or more computers without retraining the trained reconstructive model.
  2. The method of claim 1 further comprising detecting an anomaly based on said calculating said loss of the trained reconstructive model after said removing the particular feature from the unsuppressed plurality of features.
  3. The method of claim 1 wherein said detecting comprises detecting an exponential moving average of reconstruction errors of the particular feature exceeds said feature suppression threshold of the particular feature.
  4. The method of claim 1 wherein said loss of the trained reconstructive model is not based on a feature that was removed from the unsuppressed plurality of features.
  5. The method of claim 1 wherein said calculating the separate reconstruction error of each feature of the unsuppressed plurality of features comprises calculating a respective amount of standard deviations of reconstruction error of said each feature.
  6. The method of claim 1 further comprising: generating the trained reconstructive model by training; applying the trained reconstructive model to a new dataset; detecting that a count of features removed from the unsuppressed plurality of features exceeds a retrain threshold.
  7. The method of claim 6 further comprising retraining the trained reconstructive model responsive to said detecting said count of features removed exceeds said retrain threshold.
  8. The method of claim 7 wherein: the method further comprises after said detecting said count of features removed exceeds said retrain threshold, applying the trained reconstructive model to an amount of new data; said retraining the trained reconstructive model is further responsive to detecting that said amount of new data exceeds a sufficiency threshold.
  9. The method of claim 1 further comprising: detecting that said moving average of reconstruction errors of the particular feature no longer exceeds said feature suppression threshold of the particular feature; adding, based on said detecting said moving average no longer exceeds said feature suppression threshold, the particular feature to the unsuppressed plurality of features.
  10. One or more non-transitory computer-readable media storing instructions that, when executed by one or more processors, cause without retraining a trained reconstructive model: initializing an unsuppressed plurality of features with a plurality of features that includes a particular feature; calculating, for a first feature vector that consists of a respective value for each feature in the plurality of features, a respective separate reconstruction error of the trained reconstructive model for each feature of the unsuppressed plurality of features; detecting that a moving average of reconstruction errors of the particular feature exceeds a feature suppression threshold that is exclusively for the particular feature; removing, in response to said detecting, the particular feature from the unsuppressed plurality of features; and calculating, for a second feature vector that consists of a respective value for each feature in the plurality of features after said removing the particular feature from the unsuppressed plurality of features, a loss of the trained reconstructive model based on a respective separate reconstruction error of each feature of the unsuppressed plurality of features excluding the particular feature.
  11. The one or more non-transitory computer-readable media of claim 10 wherein the instructions further cause detecting an anomaly based on said calculating said loss of the trained reconstructive model after said removing the particular feature from the unsuppressed plurality of features.
  12. The one or more non-transitory computer-readable media of claim 10 wherein said detecting comprises detecting an exponential moving average of reconstruction errors of the particular feature exceeds said feature suppression threshold of the particular feature.
  13. The one or more non-transitory computer-readable media of claim 10 wherein said loss of the trained reconstructive model is not based on a feature that was removed from the unsuppressed plurality of features.
  14. The one or more non-transitory computer-readable media of claim 10 wherein said calculating the separate reconstruction error of each feature of the unsuppressed plurality of features comprises calculating a respective amount of standard deviations of reconstruction error of said each feature.
  15. The one or more non-transitory computer-readable media of claim 10 wherein the instructions further cause: generating the trained reconstructive model by training; applying the trained reconstructive model to a new dataset; detecting that a count of features removed from the unsuppressed plurality of features exceeds a retrain threshold.
  16. The one or more non-transitory computer-readable media of claim 10 wherein the instructions further cause: detecting that said moving average of reconstruction errors of the particular feature no longer exceeds said feature suppression threshold of the particular feature; adding, based on said detecting said moving average no longer exceeds said feature suppression threshold, the particular feature to the unsuppressed plurality of features.
  17. The one or more non-transitory computer-readable media of claim 15 wherein the instructions further cause retraining the trained reconstructive model responsive to said detecting said count of features removed exceeds said retrain threshold.
  18. The one or more non-transitory computer-readable media of claim 17 wherein: the instructions further cause after said detecting said count of features removed exceeds said retrain threshold, applying the trained reconstructive model to an amount of new data; said retraining the trained reconstructive model is further responsive to detecting that said amount of new data exceeds a sufficiency threshold.
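Claims 3, 5, and 9 refine the basic mechanism: reconstruction error measured in standard deviations (claim 5), smoothed with an exponential moving average (claim 3), and a suppressed feature re-added once its average falls back below its threshold (claim 9). The following is a hedged sketch under stated assumptions: the class name, the smoothing constant, and the use of training-time mean and standard deviation for the z-score are illustrative choices, not taken from the claims.

```python
import numpy as np

class FeatureSuppressor:
    """Illustrative sketch of claims 3, 5, and 9 (names are hypothetical)."""

    def __init__(self, train_mean, train_std, thresholds, alpha=0.2):
        self.mean, self.std = train_mean, train_std
        self.thresholds = thresholds          # one threshold per feature (claim 1)
        self.alpha = alpha                    # EMA smoothing constant (assumed)
        self.ema = np.zeros_like(train_mean)
        self.suppressed = set()

    def update(self, raw_errors):
        # claim 5: express each feature's error as standard deviations
        # from statistics gathered during training
        z = np.abs(raw_errors - self.mean) / self.std
        # claim 3: exponential moving average of each feature's error
        self.ema = self.alpha * z + (1 - self.alpha) * self.ema
        for i in range(len(self.ema)):
            if self.ema[i] > self.thresholds[i]:
                self.suppressed.add(i)        # suppress the drifting feature
            elif i in self.suppressed:
                # claim 9: EMA no longer exceeds the threshold, so the
                # feature is added back to the unsuppressed features
                self.suppressed.discard(i)
        return self.suppressed
```

Because the EMA decays, a feature suppressed during a transient drift recovers automatically, which matches the re-admission behavior of claims 9 and 16 without any retraining.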

Description

FIELD OF THE INVENTION

The present invention relates to reconstructive models such as an autoencoder for anomaly detection. Herein are machine learning techniques that detect and suppress any feature that causes model decay by concept drift.

BACKGROUND

Anomaly detection is an important tool with various use cases in security, such as fraud detection and intrusion detection. A powerful approach recently employed in anomaly detection involves machine learning models such as reconstructive neural networks, including autoencoders. Anomaly detection using such models can be categorized into two groups of existing techniques. In the first group, the entire dataset is used for model training, and the reconstructive model is used to detect anomalies inside that dataset. In the second group, the data is divided into three parts: training, validation, and test. The reconstructive model is trained on the training dataset and then fine-tuned using the validation dataset. Subsequently, the trained and validated model is applied to the test dataset, which can be a nearly infinite stream of data incoming to the reconstructive model.

Although such models can be made somewhat robust to input noise, they cannot handle changes in the input data distribution, a phenomenon called concept drift or data drift. In one example, concept drift occurs when a series of inputs evolves over time such that later normal inputs have feature values that are no longer within the value range of earlier normal inputs, such as training inputs. For example, a reconstructive model may be a classifier that learned to distinguish normal tree seedlings from normal weeds because seedlings are smaller than weeds. Over time, however, seedlings may grow to the same size as weeds, or grow into trees that are bigger than weeds, which may confuse the classifier. For example, the classifier may mistakenly classify an older seedling as a weed.

Concept drift may have various causes. For example, seasonality may cause a normal temperature range to shift such that a normal temperature may be mistakenly detected as anomalous, or an abnormal temperature may be mistakenly detected as non-anomalous. Seasonality may influence consumer preferences, which may interfere with a predictive machine learning model for a supply chain or for behavioral advertisement targeting. Concept drift naturally happens in many or most real-world systems, where the system is inherently dynamic or depends on dynamic parameters such as the evolving state of a server machine over time. Such shifts in data are usually inevitable because the system is dynamic and cannot be controlled or predicted in every aspect.

Moreover, machine learning model performance is highly dependent on the distribution of the data involved during the training phase. After fitting to training data, a machine learning model may be unable to handle concept drift. In industry, an existing solution for concept drift is usually to retrain the machine learning model with new data, in which case the retrained model learns the new distribution of the data and can do well in detecting anomalies that happen after the concept drift that necessitated retraining. However, a sufficient amount of data is needed after the concept drift in order to retrain the model, which means that, in some applications, several hours or days must pass to accumulate enough real data after a concept drift before retraining. In addition, retraining a machine learning model and redeploying it after a concept drift usually takes a significant amount of time. In those ways, retraining may impose a somewhat prolonged service outage of a machine learning model.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings:

FIG. 1 is a block diagram that depicts an example computer that detects and suppresses, for a reconstructive model such as an autoencoder for anomaly detection, any feature that causes machine learning (ML) model decay by concept drift;

FIG. 2 is a flow diagram that depicts an example computer process that can detect and suppress, for a reconstructive model, any feature that causes model decay by concept drift;

FIG. 3 is a flow diagram that depicts example computer activities for monitoring and suppression of features;

FIG. 4 is a flow diagram that depicts feature suppression affecting an example lifecycle of a reconstructive model;

FIG. 5 is a block diagram that illustrates a computer system upon which an embodiment of the invention may be implemented;

FIG. 6 is a block diagram that illustrates a basic software system that may be employed for controlling the operation of a computing system.

DETAILED DESCRIPTION

In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known struc