
US-12619225-B2 - Memory and compute-efficient unsupervised anomaly detection for intelligent edge processing

US 12619225 B2

Abstract

Systems, apparatuses, and methods include technology that identifies a first dataset that comprises a plurality of data values, and partitions the first dataset into a plurality of bins to generate a second dataset, where the second dataset is a compressed version of the first dataset. The technology randomly subsamples data associated with the first dataset to obtain groups of randomly subsampled data, and generates a plurality of decision tree models during an unsupervised learning process based on the groups of randomly subsampled data and the second dataset.

Inventors

  • Fei Su
  • Rita Chattopadhyay

Assignees

  • INTEL CORPORATION

Dates

Publication Date
2026-05-05
Application Date
2022-05-18

Claims (20)

  1. A computing system comprising: a host processor; a hardware accelerator coupled to the host processor; and a memory coupled to the hardware accelerator and the host processor, the memory including a set of executable program instructions, which when executed by one or more of the host processor or the hardware accelerator, cause the computing system to: identify a first dataset that comprises a plurality of data values, partition the first dataset into a plurality of bins to generate a second dataset, wherein the second dataset is a compressed version of the first dataset, randomly subsample data associated with the first dataset to obtain groups of randomly subsampled data, identify a first group of data from the groups of randomly subsampled data, wherein the first group of data is associated with a plurality of bin count values from the second dataset, identify that first data of the first group of data is associated with a first bin count value of the plurality of bin count values, identify that second data of the first group of data is associated with a second bin count value of the plurality of bin count values, wherein the second bin count value is greater than the first bin count value, and bypass the second data and select the first data for an establishment of a split point in a first decision tree model of the plurality of decision tree models based on the second bin count value being greater than the first bin count value, and generate a plurality of decision tree models during an unsupervised learning process based on the groups of randomly subsampled data and the second dataset.
  2. The computing system of claim 1, wherein the executable program instructions, when executed, cause the computing system to: identify a first group of randomly subsampled data of the groups of randomly subsampled data; identify a rank of data intensities from the second dataset based on the first group of randomly subsampled data; select first data of the first group of randomly subsampled data based on the rank of data intensities; and establish a split point in a first decision tree model of the plurality of decision tree models based on the first data.
  3. The computing system of claim 2, wherein: the data intensities are bin count values identified from the second dataset, wherein the bin count values are associated with the first group of randomly subsampled data; and the first dataset includes process values associated with physical machinery.
  4. The computing system of claim 1, wherein the executable program instructions, when executed, cause the computing system to: execute, with the plurality of decision tree models, an inference process on inference data to generate anomaly estimations; and determine whether the inference data is an anomaly based on the anomaly estimations.
  5. The computing system of claim 4, wherein the anomaly estimations are path lengths associated with a classification of the inference data with the plurality of decision tree models during the inference process, further wherein to determine whether the inference data is the anomaly, the executable program instructions, when executed, further cause the computing system to: average the path lengths to generate an average path length; determine that the inference data is the anomaly in response to the average path length being below a threshold; and determine that the inference data is not the anomaly in response to the average path length meeting the threshold.
  6. A semiconductor apparatus comprising: one or more substrates; and logic coupled to the one or more substrates, wherein the logic is implemented in one or more of configurable logic or fixed-functionality logic hardware, the logic coupled to the one or more substrates to: identify a first dataset that comprises a plurality of data values, partition the first dataset into a plurality of bins to generate a second dataset, wherein the second dataset is a compressed version of the first dataset, randomly subsample data associated with the first dataset to obtain groups of randomly subsampled data, identify a first group of data from the groups of randomly subsampled data, wherein the first group of data is associated with a plurality of bin count values from the second dataset, identify that first data of the first group of data is associated with a first bin count value of the plurality of bin count values, identify that second data of the first group of data is associated with a second bin count value of the plurality of bin count values, wherein the second bin count value is greater than the first bin count value, bypass the second data and select the first data for an establishment of a split point in a first decision tree model of the plurality of decision tree models based on the second bin count value being greater than the first bin count value, and generate a plurality of decision tree models during an unsupervised learning process based on the groups of randomly subsampled data and the second dataset.
  7. The apparatus of claim 6, wherein the logic coupled to the one or more substrates is to: identify a first group of randomly subsampled data of the groups of randomly subsampled data; identify a rank of data intensities from the second dataset based on the first group of randomly subsampled data; select first data of the first group of randomly subsampled data based on the rank of data intensities; and establish a split point in a first decision tree model of the plurality of decision tree models based on the first data.
  8. The apparatus of claim 7, wherein: the data intensities are bin count values identified from the second dataset, wherein the bin count values are associated with the first group of randomly subsampled data; and the first dataset includes process values associated with physical machinery.
  9. The apparatus of claim 6, wherein the logic coupled to the one or more substrates is to: execute, with the plurality of decision tree models, an inference process on inference data to generate anomaly estimations; and determine whether the inference data is an anomaly based on the anomaly estimations.
  10. The apparatus of claim 9, wherein the anomaly estimations are path lengths associated with a classification of the inference data with the plurality of decision tree models during the inference process, further wherein to determine whether the inference data is the anomaly, the logic coupled to the one or more substrates is to: average the path lengths to generate an average path length; determine that the inference data is the anomaly in response to the average path length being below a threshold; and determine that the inference data is not the anomaly in response to the average path length meeting the threshold.
  11. The apparatus of claim 6, wherein the logic coupled to the one or more substrates includes transistor channel regions that are positioned within the one or more substrates.
  12. At least one non-transitory computer readable storage medium comprising a set of executable program instructions, which when executed by a computing system, cause the computing system to: identify a first dataset that comprises a plurality of data values; partition the first dataset into a plurality of bins to generate a second dataset, wherein the second dataset is a compressed version of the first dataset; randomly subsample data associated with the first dataset to obtain groups of randomly subsampled data; identify a first group of data from the groups of randomly subsampled data, wherein the first group of data is associated with a plurality of bin count values from the second dataset; identify that first data of the first group of data is associated with a first bin count value of the plurality of bin count values; identify that second data of the first group of data is associated with a second bin count value of the plurality of bin count values, wherein the second bin count value is greater than the first bin count value; bypass the second data and select the first data for an establishment of a split point in a first decision tree model of the plurality of decision tree models based on the second bin count value being greater than the first bin count value; and generate a plurality of decision tree models during an unsupervised learning process based on the groups of randomly subsampled data and the second dataset.
  13. The at least one non-transitory computer readable storage medium of claim 12, wherein the instructions, when executed, further cause the computing system to: identify a first group of randomly subsampled data of the groups of randomly subsampled data; identify a rank of data intensities from the second dataset based on the first group of randomly subsampled data; select first data of the first group of randomly subsampled data based on the rank of data intensities; and establish a split point in a first decision tree model of the plurality of decision tree models based on the first data.
  14. The at least one non-transitory computer readable storage medium of claim 13, wherein: the data intensities are bin count values identified from the second dataset, wherein the bin count values are associated with the first group of randomly subsampled data; and the first dataset includes process values associated with physical machinery.
  15. The at least one non-transitory computer readable storage medium of claim 12, wherein the instructions, when executed, further cause the computing system to: execute, with the plurality of decision tree models, an inference process on inference data to generate anomaly estimations; and determine whether the inference data is an anomaly based on the anomaly estimations.
  16. The at least one non-transitory computer readable storage medium of claim 15, wherein the anomaly estimations are path lengths associated with a classification of the inference data with the plurality of decision tree models during the inference process, wherein the instructions, when executed, further cause the computing system to: average the path lengths to generate an average path length; determine that the inference data is the anomaly in response to the average path length being below a threshold; and determine that the inference data is not the anomaly in response to the average path length meeting the threshold.
  17. A method comprising: identifying a first dataset that comprises a plurality of data values; partitioning the first dataset into a plurality of bins to generate a second dataset, wherein the second dataset is a compressed version of the first dataset; randomly subsampling data associated with the first dataset to obtain groups of randomly subsampled data; identifying a first group of data from the groups of randomly subsampled data, wherein the first group of data is associated with a plurality of bin count values from the second dataset; identifying that first data of the first group of data is associated with a first bin count value of the plurality of bin count values; identifying that second data of the first group of data is associated with a second bin count value of the plurality of bin count values, wherein the second bin count value is greater than the first bin count value; bypassing the second data and selecting the first data for an establishment of a split point in a first decision tree model of the plurality of decision tree models based on the second bin count value being greater than the first bin count value; and generating a plurality of decision tree models during an unsupervised learning process based on the groups of randomly subsampled data and the second dataset.
  18. The method of claim 17, further comprising: identifying a first group of randomly subsampled data of the groups of randomly subsampled data; identifying a rank of data intensities from the second dataset based on the first group of randomly subsampled data; selecting first data of the first group of randomly subsampled data based on the rank of data intensities; and establishing a split point in a first decision tree model of the plurality of decision tree models based on the first data.
  19. The method of claim 18, wherein: the data intensities are bin count values identified from the second dataset, wherein the bin count values are associated with the first group of randomly subsampled data; and the first dataset includes process values associated with physical machinery.
  20. The method of claim 17, further comprising: executing, with the plurality of decision tree models, an inference process on inference data to generate anomaly estimations; and determining whether the inference data is an anomaly based on the anomaly estimations.
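Illustratively, the inference steps recited in claims 4-5, 15-16, and 20 — averaging the per-tree path lengths and comparing the average against a threshold — can be sketched as follows. This is an informal sketch, not part of the claims; the `path_lengths` input and the `threshold` value are hypothetical stand-ins for the per-tree classification results and the configured decision boundary.

```python
def is_anomaly(path_lengths, threshold):
    """Return True when the average path length falls below the threshold.

    Each entry in path_lengths is the path length one decision tree model
    produced when classifying the inference data; short paths mean the
    data was isolated quickly, which indicates an anomaly.
    """
    average_path_length = sum(path_lengths) / len(path_lengths)
    # Anomaly when the average is below the threshold; otherwise the
    # average "meets" the threshold and the data is treated as normal.
    return average_path_length < threshold

# Three trees isolate the sample quickly (short paths) -> anomaly.
print(is_anomaly([2.0, 3.0, 2.5], threshold=5.0))  # True
# Long paths -> the average meets the threshold -> not an anomaly.
print(is_anomaly([8.0, 9.0, 7.5], threshold=5.0))  # False
```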

Description

TECHNICAL FIELD

Embodiments generally relate to anomaly detection. More particularly, embodiments relate to optimal data quantization to generate an ensemble of random decision trees for anomaly detection, and anomaly inference with the random decision trees.

BACKGROUND

Anomaly detection is used in a variety of fields to detect atypical behavior. Atypical behavior of a system may indicate that the system is potentially failing and/or executing in a sub-optimal state. Anomaly detection may consume significant compute and memory resources. Thus, certain devices may be unable to implement anomaly detection, resulting in reduced efficiency and higher failure rates. Furthermore, anomaly detection may be challenging to implement with neural networks due to the varied types of anomalies and the lack of labeled datasets available for training.

BRIEF DESCRIPTION OF THE DRAWINGS

The various advantages of the embodiments will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which: FIGS. 1A and 1B are a process of an example of an unsupervised anomaly detection training process according to an embodiment; FIG. 2 is a process of an example of an unsupervised inference process according to an embodiment; FIG. 3 is a flowchart of an example of a method of training a plurality of decision tree models according to an embodiment; FIG. 4 is a flowchart of an example of a method of generating decision trees according to an embodiment; FIG. 5 is a flowchart of an example of a method of executing inference with a plurality of decision trees according to an embodiment; FIG. 6 is a block diagram of an example of an efficiency-enhanced and performance-enhanced training and inference computing system according to an embodiment; FIG. 7 is an illustration of an example of a semiconductor apparatus according to an embodiment; FIG. 8 is a block diagram of an example of a processor according to an embodiment; and FIG. 9 is a block diagram of an example of a multi-processor based computing system according to an embodiment.

DESCRIPTION OF EMBODIMENTS

Turning now to FIGS. 1A and 1B, embodiments herein relate to an unsupervised anomaly detection training process 100 (e.g., machine learning without labeled data) that is optimized for memory and compute efficiency. As such, the unsupervised anomaly detection training process 100 may execute in resource-constrained devices (e.g., edge devices) that previously may have been unable to execute unsupervised training in a practical manner. Embodiments apply an optimal data quantization to an ensemble of random decision trees for anomaly detection, thus greatly reducing the number of training samples (i.e., the search space) and achieving anomaly detection performance with a more efficient (e.g., leaner and smaller) model than other designs. Thus, embodiments combine decision-tree-based ensemble learning and the optimal data quantization to generate a memory- and compute-efficient anomaly detector.

Initially in FIG. 1A, a first dataset 102 is provided. The first dataset 102 may comprise features associated with one or more objects. The objects may be varied (e.g., a system, a computer program, a vehicle, etc.). The features may be measurable properties of the object. The features may have different values and are associated with anomaly detection (e.g., certain values of the features indicate an anomaly). In some examples, the first dataset 102 may comprise different values of the features over a period of time. Examples of the first dataset 102 include process values associated with physical machinery, such as temperature, pressure, humidity, and air or water flow rate (e.g., the process values), measured over a period of time to monitor the health of a fan, pump, or compressor (e.g., the physical machinery). These features may have a certain value under normal operation, and one or all may change under an anomaly or failure.

The unsupervised anomaly detection training process 100 includes an optimal data quantization technique that is applied to preprocess the data (on all feature dimensions). That is, the process 100 partitions the first dataset 102 into a plurality of bins to generate a second dataset 104. For example, the process 100 executes data discretization. In the illustrated example, a second dataset 106 (e.g., a histogram) is created for the first dataset 102 by performing data discretization using an identified bin size of 10. The bin size is an adjustable parameter which may be tuned for different types of data and based on various criteria. The first dataset 102 is an array of numerical data, which contains 168 total data elements with values ranging between 0 and 99. The entire range of values of the first dataset 102 (from 0 to 99) is broken down into intervals of 10, and each interval is represented by a separate bin, resulting in a total of 10 bins.
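The data discretization described above can be sketched in a few lines of Python. This is a minimal illustration assuming integer values in a contiguous 0-99 range with a bin size of 10; the `discretize` helper and the sample values are hypothetical, not drawn from the patent's figures.

```python
def discretize(values, bin_size=10, value_range=100):
    """Build a histogram (the 'second dataset') from raw values.

    Each value is mapped to the bin covering its 10-wide interval
    (0-9 -> bin 0, 10-19 -> bin 1, ..., 90-99 -> bin 9), and the bin
    counts form a compressed representation of the original data.
    """
    num_bins = value_range // bin_size  # 10 bins for values 0-99
    counts = [0] * num_bins
    for v in values:
        counts[v // bin_size] += 1      # map each value to its bin
    return counts

first_dataset = [3, 7, 12, 15, 18, 44, 47, 91, 95, 99]
second_dataset = discretize(first_dataset)
print(second_dataset)  # [2, 3, 0, 0, 2, 0, 0, 0, 0, 3]
```

The bin counts can then serve as the "data intensities" the claims describe: during tree construction, data falling in low-count bins (rare values) would be favored for split points over data in high-count bins.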