CN-121980477-A - Digital asset anomaly detection method based on large language model and related products

Abstract

The invention discloses a digital asset anomaly detection method based on a large language model and related products, and belongs to the technical field of anomaly detection for critical information infrastructure (CII) devices. The method comprises: collecting multi-source data of a CII device over a plurality of time windows in a normal operating state and constructing a digital asset raw data set; obtaining behavior semantic feature vectors of the CII device in the normal operating state from the digital asset raw data set using a pre-trained large language model; obtaining a current behavior semantic feature vector of the CII device from its multi-source data in the current time window using the pre-trained large language model; and determining, using a K-means clustering algorithm and the silhouette coefficient, whether the CII device is abnormal in the current time window. By combining the semantic understanding of the large language model with the quantitative analysis of the clustering algorithm, the invention improves the reliability of anomaly detection for CII devices.
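As a rough illustration of the flow summarized above, the sketch below turns each time window of multi-source data into a text behavior description unit and passes it to an embedding function. The field names, the text format produced by `describe_window`, and the `embed` callable (standing in for the pre-trained large language model) are all assumptions made for illustration; the source does not specify them.

```python
from typing import Callable, Dict, List

Vector = List[float]


def describe_window(device_id: str, window: Dict[str, object]) -> str:
    """Render one time window of multi-source data as a text behavior description unit.

    The "key=value" text format is a hypothetical choice, not taken from the source.
    """
    parts = [f"{key}={window[key]}" for key in sorted(window)]
    return f"device {device_id}: " + "; ".join(parts)


def embed_windows(
    device_id: str,
    windows: List[Dict[str, object]],
    embed: Callable[[str], Vector],
) -> List[Vector]:
    """Map every window to a behavior semantic feature vector.

    `embed` is a placeholder for whatever model call returns a vector for a text.
    """
    return [embed(describe_window(device_id, w)) for w in windows]


# Toy usage with a stand-in embedder (text length and separator count only).
def toy_embed(text: str) -> Vector:
    return [float(len(text)), float(text.count(";"))]


baseline_vectors = embed_windows(
    "plc-01",
    [{"open_ports": 2, "cpu_load": 0.3}, {"open_ports": 2, "cpu_load": 0.4}],
    toy_embed,
)
```

The same `describe_window` and `embed` pair would be applied to the current time window, so that baseline and current vectors live in the same feature space.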

Inventors

QU HUA
ZHOU FENG

Assignees

Xi'an Jiaotong University (西安交通大学)

Dates

Publication Date: 20260505
Application Date: 20260407

Claims (10)

1. A digital asset anomaly detection method based on a large language model, characterized by comprising the following steps: step one, acquiring an identifier of a critical information infrastructure (CII) device, collecting multi-source data of the CII device over a plurality of time windows in a normal operating state, associating the multi-source data with the identifier of the CII device, and constructing a digital asset raw data set; step two, preprocessing the digital asset raw data set to obtain a preprocessed digital asset raw data set, performing semantic abstraction conversion on the preprocessed digital asset raw data set to map it into a set of behavior description units, and inputting the set of behavior description units into a pre-trained large language model to obtain behavior semantic feature vectors of the CII device in the normal operating state; step three, acquiring multi-source data of the CII device in a current time window, preprocessing it and then performing semantic abstraction conversion to generate a behavior description unit of the CII device for the current time window, and inputting that behavior description unit into the pre-trained large language model to obtain a current behavior semantic feature vector of the CII device; step four, clustering the behavior semantic feature vectors of the CII device in the normal operating state with a K-means clustering algorithm, calculating the silhouette coefficient of the current behavior semantic feature vector of the CII device, taking the centroid distance as a preset threshold, and determining whether the silhouette coefficient is greater than the preset threshold; if so, determining that the CII device is abnormal in the current time window, and if not, determining that the CII device is not abnormal in the current time window.
2. The large language model-based digital asset anomaly detection method of claim 1, wherein the multi-source data comprises at least static asset information, network communication data, and operating state data; the static asset information is collected by interacting with or actively scanning the critical information infrastructure (CII) device, the network communication data is collected by passively monitoring the CII device, and the operating state data is collected by reading the state of the CII device or subscribing to an interface of the CII device.
3. The large language model-based digital asset anomaly detection method of claim 2, wherein the multi-source data further comprises behavior log data, collected by subscribing to, pulling, forwarding, or proxy-collecting logs of the CII device.
4. The large language model-based digital asset anomaly detection method of claim 1, further comprising: when the CII device is not abnormal in the current time window, iteratively updating the multi-source data of the plurality of time windows of the CII device in the normal operating state based on the multi-source data of the current time window acquired in step three.
5. The large language model-based digital asset anomaly detection method of claim 1, wherein preprocessing the digital asset raw data set specifically comprises: performing at least one of structure normalization, time alignment, and semantic enhancement on the multi-source data of the CII device in the digital asset raw data set.
6. The large language model-based digital asset anomaly detection method of claim 1, wherein the silhouette coefficient is specifically: $s(i) = \dfrac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$, where $s(i)$ is the silhouette coefficient of point $i$, $a(i)$ is the average dissimilarity of $i$ to the other points in the same cluster, $b(i)$ is the minimum of the average dissimilarities of $i$ to the other clusters, and max{ } is the function that takes the maximum.
7. A digital asset anomaly detection system based on a large language model, comprising: a first module for acquiring an identifier of a critical information infrastructure (CII) device, collecting multi-source data of the CII device over a plurality of time windows in a normal operating state, associating the multi-source data with the identifier of the CII device, and constructing a digital asset raw data set; a second module for preprocessing the digital asset raw data set to obtain a preprocessed digital asset raw data set, performing semantic abstraction conversion on the preprocessed digital asset raw data set to map it into a set of behavior description units, and inputting the set of behavior description units into a pre-trained large language model to obtain behavior semantic feature vectors of the CII device in the normal operating state; a third module for acquiring multi-source data of the CII device in a current time window, preprocessing it and then performing semantic abstraction conversion to generate a behavior description unit of the CII device for the current time window, and inputting that behavior description unit into the pre-trained large language model to obtain a current behavior semantic feature vector of the CII device; and a fourth module for clustering the behavior semantic feature vectors of the CII device in the normal operating state with a K-means clustering algorithm, calculating the silhouette coefficient of the current behavior semantic feature vector of the CII device, taking the centroid distance as a preset threshold, and determining whether the silhouette coefficient is greater than the preset threshold; if so, determining that the CII device is abnormal in the current time window, and if not, determining that the CII device is not abnormal in the current time window.
8. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the large language model-based digital asset anomaly detection method of any one of claims 1-6.
9. A computer-readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the large language model based digital asset anomaly detection method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program when executed by a processor implements the steps of the large language model based digital asset anomaly detection method of any one of claims 1 to 6.
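
The clustering step of claim 1 and the coefficient of claim 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes Euclidean distance as the dissimilarity measure, uses a deterministic farthest-point initialization for K-means so the result is reproducible, and treats the current vector as a new point assigned to its nearest centroid. The function names and toy vectors are hypothetical. The comparison against the preset threshold is left to the caller, because the claims do not say which centroid distance serves as that threshold.

```python
import math


def distance(u, v):
    """Euclidean distance, used here as the dissimilarity measure (an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(distance(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: distance(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def silhouette_of(x, points, labels, centroids):
    """Silhouette coefficient s = (b - a) / max(a, b) of a new vector x.

    Requires at least two non-empty clusters.
    """
    mean_dist = {}
    for c in range(len(centroids)):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            mean_dist[c] = sum(distance(x, p) for p in members) / len(members)
    # x is assigned to the non-empty cluster with the nearest centroid.
    own = min(mean_dist, key=lambda c: distance(x, centroids[c]))
    a = mean_dist[own]  # mean dissimilarity to the assigned cluster
    b = min(d for c, d in mean_dist.items() if c != own)  # nearest other cluster
    return 0.0 if max(a, b) == 0 else (b - a) / max(a, b)


# Toy baseline: two well-separated groups of "normal" feature vectors.
baseline = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
centroids, labels = kmeans(baseline, k=2)
s_typical = silhouette_of([0.5, 0.5], baseline, labels, centroids)
s_between = silhouette_of([5.5, 5.5], baseline, labels, centroids)
```

On this toy data, a vector sitting inside one baseline cluster gets a coefficient close to 1, while a vector midway between the two clusters gets a coefficient close to 0.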

Description

Digital asset anomaly detection method based on large language model and related products

Technical Field

The invention relates to the technical field of anomaly detection for critical information infrastructure (CII) devices, and in particular to a digital asset anomaly detection method based on a large language model and related products.

Background

CII devices are widely used in important fields such as energy, electric power, transportation, and industrial control. Their operating states are complex, their protocol types are varied, and the devices are highly heterogeneous; once an anomaly or security incident occurs, production safety and the functioning of society can be seriously affected. Accurate anomaly detection for CII devices is therefore an important requirement in the fields of information security and asset management.

Existing anomaly detection methods for CII devices rely mainly on manual rule configuration, static fingerprint matching, or statistical analysis of a single data feature. These methods generally require device types, communication protocols, and anomaly rules to be defined in advance, and adapt poorly to frequent changes in device types, dynamic configuration adjustments, and complex cross-system interaction scenarios. They also suffer from coarse modeling granularity, weak semantic understanding, anomaly identification that depends on experience, and high false alarm and missed detection rates. In particular, where multi-source data coexist, it is difficult to effectively characterize the relationships between CII device behaviors and their contextual semantic features, so the reliability of anomaly detection is insufficient.
Therefore, how to use multi-source data to improve the reliability of anomaly detection for critical information infrastructure (CII) devices has become a technical problem to be solved by those skilled in the art.

Disclosure of Invention

The invention aims to provide a digital asset anomaly detection method based on a large language model and related products, so as to solve the problem that existing anomaly detection methods for CII devices are insufficiently reliable.

The invention solves the technical problem through the following technical solution:

The invention provides a digital asset anomaly detection method based on a large language model, which comprises the following steps:

Step one, acquiring an identifier of the CII device, collecting multi-source data of the CII device over a plurality of time windows in a normal operating state, associating the multi-source data with the identifier of the CII device, and constructing a digital asset raw data set;

Step two, preprocessing the digital asset raw data set to obtain a preprocessed digital asset raw data set, performing semantic abstraction conversion on the preprocessed digital asset raw data set to map it into a set of behavior description units, and inputting the set of behavior description units into a pre-trained large language model to obtain behavior semantic feature vectors of the CII device in the normal operating state;

Step three, acquiring multi-source data of the CII device in a current time window, preprocessing it and then performing semantic abstraction conversion to generate a behavior description unit of the CII device for the current time window, and inputting that behavior description unit into the pre-trained large language model to obtain a current behavior semantic feature vector of the CII device;

Step four, clustering the behavior semantic feature vectors of the CII device in the normal operating state with a K-means clustering algorithm, calculating the silhouette coefficient of the current behavior semantic feature vector of the CII device, taking the centroid distance as a preset threshold, and determining whether the silhouette coefficient is greater than the preset threshold; if so, determining that the CII device is abnormal in the current time window, and if not, determining that the CII device is not abnormal in the current time window.

As a further improvement of the invention, the multi-source data comprises at least static asset information, network communication data, and operating state data; the static asset information is collected by interacting with or actively scanning the CII device, the network communication data is collected by passively monitoring the CII device, and the operating state data is collected by reading the state of the CII device or subscribing to an interface of the CII device.

The invention is further improved in that the multi-source data also comprises behavior log data, and the behavior log da