CN-122021817-A - Abnormality detection method, electronic device, medium, and computer program product
Abstract
This embodiment discloses an anomaly detection method, an electronic device, a medium, and a computer program product. The anomaly detection method comprises: determining at least two isolation trees corresponding to a data set to be detected; determining a first weight of each isolation tree based on a first path length in each of the at least two isolation trees; determining a first anomaly score of first data to be detected from a second path length of the first data to be detected in each isolation tree and the first weight of each isolation tree, wherein the first data to be detected is any item of data in the data set to be detected; and determining a first anomaly detection result of the first data to be detected based on the first anomaly score.
Inventors
- HUANG XINAN
- QIU MING
Assignees
- 中移(苏州)软件技术有限公司
- 中国移动通信集团有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260410
Claims (10)
- 1. An anomaly detection method, the method comprising: determining at least two isolation trees corresponding to a data set to be detected; determining a first weight of each of the at least two isolation trees based on a first path length in each isolation tree; determining a first anomaly score of first data to be detected from a second path length of the first data to be detected in each isolation tree and the first weight of each isolation tree, wherein the first data to be detected is any item of data in the data set to be detected; and determining a first anomaly detection result of the first data to be detected based on the first anomaly score.
- 2. The method of claim 1, wherein the determining at least two isolation trees corresponding to the data set to be detected comprises: determining the information entropy of each of at least two features corresponding to the data set to be detected; determining a second weight of each feature based on the information entropy of that feature, wherein the second weight of a first feature is inversely related to the information entropy of the first feature, and the first feature is any one of the at least two features; and determining the at least two isolation trees corresponding to the data set to be detected from the at least two features and the second weight of each feature.
- 3. The method of claim 2, wherein the determining the at least two isolation trees corresponding to the data set to be detected from the at least two features and the second weight of each feature comprises: randomly selecting a target feature from the at least two features based on the second weight of each feature, wherein the probability of the target feature being selected is positively correlated with the second weight of the target feature; and determining the at least two isolation trees corresponding to the data set to be detected based on the target feature.
- 4. The method of claim 1, wherein before the determining a first weight of each of the at least two isolation trees based on a first path length in each isolation tree, the method further comprises: determining the information entropy of each of at least two features corresponding to the data set to be detected; determining a second weight of each feature based on the information entropy of that feature, wherein the second weight of a first feature is inversely related to the information entropy of the first feature, and the first feature is any one of the at least two features; and determining the first path length in each isolation tree based on the second weight of each feature, wherein the first path length is inversely related to the second weight of the feature corresponding to the first path length.
- 5. The method of claim 1, wherein the determining a first weight of each of the at least two isolation trees based on a first path length in each isolation tree comprises: determining a first degree of dispersion corresponding to a first isolation tree based on the first path length of each leaf node in the first isolation tree, wherein the first isolation tree is any one of the at least two isolation trees; determining an isolation tree weight of the first isolation tree based on the first degree of dispersion; and determining the first weight of each isolation tree from the isolation tree weight of the first isolation tree.
- 6. The method of claim 1, wherein the method further comprises: determining a first sub-data set in the data set to be detected; determining a second anomaly score of the first data to be detected based on an average of the first sub-data set; determining a second anomaly detection result of the first data to be detected based on the second anomaly score; and determining a third anomaly detection result of the first data to be detected based on either one of the first anomaly detection result and the second anomaly detection result, or based on both the first anomaly detection result and the second anomaly detection result.
- 7. The method of claim 6, wherein the determining a first sub-data set in the data set to be detected comprises: determining the first sub-data set in the data set to be detected based on a sliding window.
- 8. An electronic device comprising a processor and a memory for storing a computer program capable of running on the processor, wherein the processor is configured to run the computer program to perform the method of any one of claims 1 to 7.
- 9. A computer storage medium having stored thereon a computer program, which when executed by a processor implements the method of any of claims 1 to 7.
- 10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.
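The entropy-based feature weighting of claims 2 and 3 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The equal-width binning used to estimate entropy, the mapping 1 / (1 + H) from entropy to the second weight, and the names feature_entropy, entropy_weights, and pick_feature are all assumptions chosen for illustration; the claims only require that the second weight be inversely related to the entropy and that the selection probability be positively correlated with the weight.

```python
import math
import random
from collections import Counter


def feature_entropy(values, bins=10):
    """Shannon entropy (bits) of one feature after equal-width binning (assumed)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return 0.0  # a constant feature carries no information
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_weights(columns):
    """Second weights: inversely related to entropy, normalized to sum to 1.

    The mapping 1 / (1 + H) is an assumption; any decreasing function would do.
    """
    raw = [1.0 / (1.0 + feature_entropy(col)) for col in columns]
    total = sum(raw)
    return [w / total for w in raw]


def pick_feature(weights, rng):
    """Pick a feature index with probability proportional to its second weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Feature 0 is concentrated (low entropy); feature 1 is spread out (high entropy).
columns = [[0.0] * 90 + [10.0] * 10, [float(i) for i in range(100)]]
weights = entropy_weights(columns)
target = pick_feature(weights, random.Random(0))
```

Under these assumptions the concentrated feature receives the larger weight and is therefore chosen more often as the split feature when a tree is grown.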
Description
Abnormality detection method, electronic device, medium, and computer program product
Technical Field
The present application relates to the field of big data processing technologies, and in particular to an anomaly detection method, an electronic device, a medium, and a computer program product.
Background
In the era of big data, data volumes are growing rapidly, and anomaly detection is widely used in many fields as an important means of ensuring data quality. The core goal of anomaly detection is to identify data points that deviate significantly from the normal pattern, so as to prevent erroneous decisions or system failures. At present, anomaly detection methods based on the isolation forest are widely used. In such methods, the average path length of a data sample in the isolation forest is typically determined from the sample's path length in each isolation tree of the forest, and whether the sample is anomalous is determined from that average path length. The accuracy of anomaly detection with this approach leaves room for improvement.
Disclosure of Invention
The embodiments of the present application provide an anomaly detection method, an electronic device, a medium, and a computer program product that improve the accuracy of anomaly detection by taking into account, during detection, the differences in anomaly detection capability among the isolation trees.
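For reference, the conventional scoring described in the Background can be sketched in Python as follows. The formula is the standard isolation forest score, 2 ** (-E[h(x)] / c(n)), where c(n) normalizes by the expected path length for n samples; the function names c_factor and baseline_score are illustrative and do not come from the application.

```python
import math

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def baseline_score(path_lengths, n):
    """Conventional score: every tree contributes equally to the mean path length."""
    mean_depth = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_depth / c_factor(n))


# A sample isolated after few splits scores closer to 1 than one buried deep.
shallow = baseline_score([2.0, 3.0, 2.0], n=256)
deep = baseline_score([9.0, 10.0, 11.0], n=256)
```

Because the mean is unweighted, a tree that separates anomalies poorly influences the score exactly as much as one that separates them well, which is the limitation the application addresses.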
An embodiment of the present application provides an anomaly detection method, comprising: determining at least two isolation trees corresponding to a data set to be detected; determining a first weight of each of the at least two isolation trees based on a first path length in each isolation tree; determining a first anomaly score of first data to be detected from a second path length of the first data to be detected in each isolation tree and the first weight of each isolation tree, wherein the first data to be detected is any item of data in the data set to be detected; and determining a first anomaly detection result of the first data to be detected based on the first anomaly score.
An embodiment of the present application provides an electronic device comprising a processor and a memory for storing a computer program capable of running on the processor, wherein the processor is configured to run the computer program to perform any of the anomaly detection methods described above.
An embodiment of the present application provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements any of the anomaly detection methods described above.
An embodiment of the present application provides a computer program product comprising a computer program which, when executed by a processor, implements any of the anomaly detection methods described above.
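The four steps of the method summarized above can be sketched in Python under stated assumptions. This excerpt does not say how the first weight is computed from the first path lengths or how the weights enter the score, so the sketch assumes that (a) a tree's weight is the population standard deviation of its leaf depths, normalized over the forest, and (b) the score reuses the standard form 2 ** (-depth / c(n)) with a weighted mean depth. All function names are hypothetical, and each tree is grown on the full data set for brevity.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Expected path length of an unsuccessful BST lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree with uniformly random features and split values."""
    if len(points) <= 1 or depth >= max_depth:
        return {"depth": depth, "size": len(points)}
    feat = rng.randrange(len(points[0]))
    lo = min(p[feat] for p in points)
    hi = max(p[feat] for p in points)
    if lo == hi:
        return {"depth": depth, "size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feat] < split]
    right = [p for p in points if p[feat] >= split]
    return {"feat": feat, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def leaf_depths(node):
    """First path lengths: the depth of every non-empty leaf of one tree."""
    if "feat" not in node:
        return [node["depth"]] if node["size"] > 0 else []
    return leaf_depths(node["left"]) + leaf_depths(node["right"])


def path_length(node, x):
    """Second path length: depth reached by x plus the usual leaf-size adjustment."""
    while "feat" in node:
        node = node["left"] if x[node["feat"]] < node["split"] else node["right"]
    return node["depth"] + c_factor(node["size"])


def tree_weights(trees):
    """First weights (assumed): spread of leaf depths, normalized to sum to 1."""
    spreads = [statistics.pstdev(leaf_depths(t)) for t in trees]
    total = sum(spreads)
    if total == 0:
        return [1.0 / len(trees)] * len(trees)
    return [s / total for s in spreads]


def weighted_score(x, trees, weights, n):
    """First anomaly score (assumed form): weighted mean depth in the standard formula."""
    depth = sum(w * path_length(t, x) for t, w in zip(trees, weights))
    return 2.0 ** (-depth / c_factor(n))


rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)] + [(8.0, 8.0)]
max_depth = math.ceil(math.log2(len(data)))
trees = [build_tree(data, 0, max_depth, rng) for _ in range(50)]
weights = tree_weights(trees)
outlier_score = weighted_score((8.0, 8.0), trees, weights, len(data))
inlier_score = weighted_score((0.0, 0.0), trees, weights, len(data))
```

A detection result would then follow from comparing the score with a threshold; the threshold value is not given in this excerpt and would have to be chosen for the data at hand.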
The embodiments of the present application provide an anomaly detection method, an electronic device, a medium, and a computer program product. At least two isolation trees are constructed from the data set to be detected to form an isolation forest, which ensures good generalization in anomaly detection. The first weight of each isolation tree is determined by analyzing the first path length in that tree; the weight reflects the tree's anomaly detection capability, so that different isolation trees are treated differently. Finally, the first anomaly score is determined by combining the second path length of the first data to be detected with the first weight of each isolation tree, and whether the first data to be detected is an anomaly is judged from the first anomaly score. This ensures the robustness of anomaly detection and improves its accuracy.
Drawings
FIG. 1 is a flowchart of an anomaly detection method according to an embodiment of the present application;
FIG. 2 is a flowchart of an isolation tree construction method according to an embodiment of the present application;
FIG. 3 is a flowchart of a first anomaly score calculation method according to an embodiment of the present application;
FIG. 4 is a flowchart of another anomaly detection method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an anomaly detection apparatus according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limitin