CN-122027307-A - Intelligent analysis and decision-making system and method for safe operation based on large language model and knowledge enhancement


Abstract

The invention relates to the technical field of network security and discloses a large-model intelligent analysis system and method for security operations based on semantic embedding and knowledge enhancement. The system comprises a data acquisition module, a data standardization module, a semantic feature extraction module, a vectorization embedding module, a vector retrieval module, a knowledge enhancement retrieval module, a large model reasoning module and a risk decision output module. The method comprises performing semantic analysis on security alarm data, generating semantic vectors, performing similarity retrieval in a vector database, performing enhanced retrieval against a security knowledge base, and performing comprehensive reasoning with a large language model to generate an alarm authenticity judgment, a risk score and treatment suggestions. The invention further adopts a dual-channel semantic analysis mechanism, a dynamic-weight risk scoring model and a segmented semantic compression method for ultra-long request bodies to realize automatic classification, automatic noise reduction and intelligent decision output for security alarms. Compared with the prior art, the invention improves the accuracy of alarm analysis and judgment, reduces the cost of manual analysis, and has good practicability and value for wider adoption.

Inventors

GU Yu
SHI CHENGYU
GE YAO
PAN WEICHEN

Assignees

PAN WEICHEN (潘伟晨)

Dates

Publication Date: 20260512
Application Date: 20260228

Claims (10)

1. A large-model intelligent analysis system for security operations based on semantic embedding and knowledge enhancement, comprising: a data acquisition module for acquiring security alarm data from security devices, a log system or network traffic; a data standardization module for structuring the security alarm data and extracting a request body, a response body, asset information and behavior features; a semantic feature extraction module for performing semantic analysis on the request body, the response body and the behavior features to generate semantic feature data; a vectorization embedding module for converting the semantic feature data into semantic vectors based on a pre-trained embedding model; a vector retrieval module for retrieving, from a vector database, historical alarm data similar to the semantic vectors; a knowledge enhancement retrieval module for retrieving attack patterns and treatment rules related to the semantic features from a security knowledge base; a large model reasoning module for generating an alarm authenticity judgment and a risk level based on the historical alarm data and the security knowledge base information; and a risk decision output module for outputting alarm classification results, risk scores and treatment suggestions.
2. An intelligent analysis method for security alarms based on semantic vectors and knowledge enhancement, characterized by comprising the following steps: S1, acquiring security alarm data; S2, performing structured parsing of the security alarm data and extracting semantic features; S3, generating a corresponding semantic vector based on an embedding model; S4, performing similarity retrieval in a vector database to obtain a set of historically similar alarms; S5, retrieving corresponding attack feature information from a security knowledge base; S6, inputting the set of historically similar alarms and the attack feature information into a large language model for reasoning and analysis; and S7, generating an alarm authenticity judgment, a risk score and a treatment suggestion.
3. The system of claim 1, wherein the semantic feature extraction module employs a two-channel semantic analysis mechanism to respectively construct an attack semantic vector and a business semantic vector, and determines the authenticity of the alarm based on a similarity difference between the two.
4. The system of claim 1, wherein the risk score employs a dynamic weight calculation model, the calculation formula being: RiskScore = α × current semantic risk value + β × historical similar attack proportion + γ × asset sensitivity score, wherein α, β and γ are dynamically adjustable parameters.
5. The system of claim 1, wherein the vector database is a distributed storage system supporting high-dimensional vector similarity retrieval.
6. The system of claim 1, wherein the large model reasoning module employs a retrieval-augmented generation mechanism, taking historically similar alarm data and security knowledge base content as input context.
7. The system of claim 1, wherein the system employs a segmented semantic compression method when a request body or a response body of the security alarm data exceeds a preset length, comprising: dividing the data into blocks; generating a local semantic vector for each block; generating a semantic summary for each block; and aggregating the semantic summaries to perform overall risk reasoning.
8. The method of claim 2, wherein the similarity retrieval uses cosine similarity or Euclidean distance calculation.
9. The method of claim 2, wherein the system supports adaptive updating of risk weights based on historical treatment results.
10. The system of claim 1, wherein the system supports local deployment or private cloud deployment to meet data security compliance requirements.
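
As an illustration of the similarity retrieval recited in claims 5 and 8, the following minimal Python sketch ranks stored historical alarm vectors against a query vector using either cosine similarity or Euclidean distance. The function names, the in-memory list standing in for the vector database, and the choice of k are assumptions made for illustration only; the claims do not specify an implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_similar(query, history, k=3, metric="cosine"):
    """Return the k historical alarms closest to the query vector.

    history is a list of (alarm_id, vector) pairs, a hypothetical
    in-memory stand-in for the vector database.
    """
    if metric == "cosine":
        # Higher cosine similarity means more similar.
        scored = [(alarm_id, cosine_similarity(query, vec)) for alarm_id, vec in history]
        scored.sort(key=lambda pair: pair[1], reverse=True)
    elif metric == "euclidean":
        # Lower Euclidean distance means more similar.
        scored = [(alarm_id, euclidean_distance(query, vec)) for alarm_id, vec in history]
        scored.sort(key=lambda pair: pair[1])
    else:
        raise ValueError(f"unsupported metric: {metric}")
    return scored[:k]
```

A production deployment would replace the linear scan with the index of a distributed vector database; the sketch only shows how the two metrics order candidates in opposite directions.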

Description

Intelligent analysis and decision-making system and method for safe operation based on large language model and knowledge enhancement

Technical Field

The invention belongs to the technical field of network security, in particular to intelligent analysis technology for the Security Operations Center (SOC), and specifically relates to an automatic security alarm analysis system and method that integrates semantic vector retrieval, knowledge-enhanced reasoning and large language model technology.

Background

As informatization continues to deepen, the network environments of government bodies and enterprises are increasingly complex, and the types of security devices keep growing, including firewalls, Intrusion Detection Systems (IDS), Intrusion Prevention Systems (IPS), Web Application Firewalls (WAF), endpoint security systems, situation awareness platforms and the like, which generate a large amount of security alarm data every day. Existing Security Operations Centers (SOC) generally classify and judge security alarms by rule matching or keyword retrieval, which leads to the following main technical problems:
(1) The false alarm rate is high, and rules struggle to cover complex business scenarios;
(2) Security analysis depends heavily on manual experience, is inefficient and is difficult to scale;
(3) Historically similar events cannot be compared at the semantic level;
(4) There is no efficient way to process very long request bodies or complex payloads;
(5) The security knowledge base and historical handling experience cannot be unified and fused;
(6) Existing systems are mostly based on structured field matching and lack deep understanding at the semantic level.
With the development of large language models and vector retrieval technologies, analysis based on semantic understanding has become possible, but the prior art has not formed a complete closed-loop architecture for security operations scenarios, and in particular lacks a systematic scheme combining semantic embedding, historical similarity retrieval, knowledge-enhanced reasoning and a dynamic risk scoring mechanism. Therefore, there is a need for an intelligent security operations analysis system and method that can automatically classify alarms, reduce noise, score risk and generate treatment recommendations.

Disclosure of Invention

In order to solve the above technical problems, the invention provides a large-model intelligent analysis system and method for security operations based on semantic embedding and knowledge enhancement. By constructing a cooperative multi-module framework, it realizes automatic semantic analysis, historical similarity retrieval, knowledge-enhanced reasoning and dynamic risk scoring of security alarms, thereby improving the accuracy of alarm analysis and judgment and reducing the cost of manual analysis.
The system provided by the invention comprises:
- a data acquisition module for receiving security alarm data from a security device or a logging system;
- a data standardization module for structuring the security alarm data;
- a semantic feature extraction module for performing semantic analysis on the structured data to generate semantic feature data;
- a vectorization embedding module for converting the semantic feature data into semantic vectors;
- a vector retrieval module for performing similarity retrieval in a vector database;
- a knowledge enhancement retrieval module for retrieving attack patterns and security rules associated with alarms;
- a large model reasoning module for performing semantic reasoning and analysis by combining the historically similar data with knowledge base information; and
- a risk decision output module for outputting alarm authenticity judgments, risk levels and treatment suggestions.

Furthermore, the invention adopts a dual-channel semantic analysis mechanism that constructs an attack semantic vector and a business semantic vector at the same time, and distinguishes real attacks from false alarms triggered by normal business activity by comparing the difference in semantic similarity between the two types of vectors.

Further, the invention constructs a dynamic-weight risk scoring model and calculates the risk score by the following formula: RiskScore = α × current semantic risk value + β × historical similar attack proportion + γ × asset sensitivity score, wherein α, β and γ are dynamically adjustable parameters.

Further, when the request body or the response body is detected to exceed a preset length, the invention adopts a segmented semantic compression method comprising block processing, local vector generation, semantic summary compression and aggregated reasoning, which addresses the limits that oversized data places on model reasoning.
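
The dual-channel comparison and the dynamic-weight risk score described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `RiskWeights` class, the function names, the `margin` threshold, the three verdict labels, and the assumption that all inputs are scaled to the range 0 to 1 are hypothetical choices that the description does not specify.

```python
from dataclasses import dataclass


@dataclass
class RiskWeights:
    """The adjustable parameters alpha, beta and gamma from the formula."""
    alpha: float
    beta: float
    gamma: float


def risk_score(weights, semantic_risk, historical_attack_ratio, asset_sensitivity):
    """RiskScore = alpha * current semantic risk value
                 + beta  * historical similar attack proportion
                 + gamma * asset sensitivity score.

    Inputs are assumed to be pre-scaled to [0, 1] (an assumption of this sketch).
    """
    return (
        weights.alpha * semantic_risk
        + weights.beta * historical_attack_ratio
        + weights.gamma * asset_sensitivity
    )


def dual_channel_verdict(attack_similarity, business_similarity, margin=0.1):
    """Compare similarity to the attack channel with similarity to the business channel.

    The margin and the verdict labels are hypothetical; the description only
    states that the similarity difference separates real attacks from
    business-triggered false alarms.
    """
    difference = attack_similarity - business_similarity
    if difference > margin:
        return "likely_attack"
    if difference < -margin:
        return "likely_false_positive"
    return "needs_review"
```

For example, with weights α = 0.5, β = 0.3, γ = 0.2 and inputs 0.8, 0.5 and 1.0, the score is 0.5 × 0.8 + 0.3 × 0.5 + 0.2 × 1.0 = 0.75. Keeping the weights in a separate object makes it straightforward to swap in updated values when they are adjusted.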
Furthermore, the invention supports adaptive updating of the risk weights based on historical treatment results, enabling continuous optimization of the model. Through the