CN-122027220-A - Method and system for generating network security alarm information treatment scheme based on large language model


Abstract

The invention relates to a method and a system for generating a network security alarm information treatment scheme based on a large language model. First, multimodal network security alarm information is acquired from a plurality of security data sources and preprocessed. A pre-trained large language model is then trained on pre-acquired, manually labeled training data from the network security field to obtain a trained large language model. The trained large language model is then called and, in combination with retrieval-augmented generation (RAG), analyzes the preprocessed multimodal network security alarm information and automatically generates a network security alarm information treatment scheme. Finally, a template engine automatically converts the treatment scheme into a standardized report document in a readable markup language format, which is displayed to the user so that the user can provide feedback comments. The method thereby achieves fusion analysis of static knowledge and dynamic information, improves the ability to recognize and analyze novel and variant attacks, copes with novel threats without frequent rule adjustments, and effectively improves flexibility and adaptability.

Inventors

HE FENG
CHEN LUXIAO
ZHANG HUA
SHAO YANGYANG
ZHU XIANGHU
LOU LI
XU JUN
LIN YIMING
ZHAO HONGTAO
ZHANG JING
HU RONG
CHEN ZHENDONG
LIU LIGUO
CHENG JINXING
PENG YONGEN
SHANG RUI
DAI ZHENGRONG
NIE HENGKAI
GUO LEI
LIU YONGQI
LIU SHAOLEI
WU HUI

Assignees

COSCO SHIPPING Technology Co., Ltd. (中远海运科技股份有限公司)

Dates

Publication Date: 20260512
Application Date: 20260107

Claims (10)

1. A method for generating a network security alarm information treatment scheme based on a large language model, characterized by comprising the following steps:
a multimodal information acquisition step: acquiring, from a plurality of security data sources, multimodal network security alarm information comprising text form data, log form data and structured form data;
a multimodal information processing step: preprocessing the multimodal network security alarm information, wherein the preprocessing comprises text cleaning, word segmentation, entity recognition and semantic analysis, the word segmentation decomposes text form data into independent vocabulary units, the entity recognition identifies and extracts key threat indicators from the multimodal network security alarm information, and the semantic analysis converts unstructured text form data and log form data into structured form data;
a model training and RAG knowledge base construction step: training a pre-trained large language model on pre-acquired, manually labeled network security field training data using a supervised learning algorithm to obtain a trained large language model;
a treatment scheme online generation step: calling the trained large language model and, in combination with retrieval-augmented generation (RAG), dynamically retrieving corresponding real-time threat information from the RAG knowledge base according to the key threat indicators; analyzing the preprocessed multimodal network security alarm information according to the retrieved real-time threat information and the threat identification knowledge and treatment knowledge that the trained large language model learned during training; describing the nature, scope of impact and attack path of the threat; and automatically generating a network security alarm information treatment scheme comprising treatment measures, security improvement measures and effect evaluation; and
a format conversion and report display step: automatically converting the network security alarm information treatment scheme into a standardized report document in a readable markup language format through a template engine, and displaying the standardized report document to a user so that the user can provide feedback comments.
2. The method for generating a network security alarm information treatment scheme based on a large language model according to claim 1, further comprising a feedback optimization step after the format conversion and report display step: obtaining the user's feedback comments on the standardized report document, iteratively optimizing the trained large language model according to the feedback comments, and updating the content of the RAG knowledge base at the same time, wherein the iterative optimization comprises adjusting model parameters and updating the network security field training data so as to improve the accuracy of subsequent alarm treatment schemes.
3. The method according to claim 1, wherein, in the multimodal information acquisition step, the plurality of security data sources comprise a security information and event management system, an intrusion detection system, an intrusion prevention system and a firewall; and, in the multimodal network security alarm information, the log form data comprises IDS alarm data from the intrusion detection system and raw traffic records in network traffic statistics from the security information and event management system, and the structured form data comprises structured reports in the network traffic statistics.
4. The method for generating a network security alarm information treatment scheme based on a large language model according to claim 3, wherein, in the multimodal information processing step, the text cleaning removes noise data and irrelevant information from the multimodal network security alarm information, the noise data and irrelevant information comprising HTML tags, special characters and repeated redundant fields, and the key threat indicators extracted by the entity recognition comprise suspicious IP addresses, vulnerability numbers, malware names, attack domain names and abnormal port numbers.
5. The method for generating a network security alarm information treatment scheme based on a large language model according to claim 1, wherein, in the model training and RAG knowledge base construction step, the network security field training data comprises vulnerability reports, an attack case library and security treatment manuals, and the pre-trained large language model is a DeepSeek model or a GPT-series model.
6. The method according to any one of claims 1 to 5, wherein, in the treatment scheme online generation step, the threat identification knowledge comprises identifying network attack patterns, the treatment knowledge comprises judging threat types and recommending treatment measures, the treatment measures comprise blocking network access of suspicious IP addresses and isolating infected devices, the security improvement measures comprise repairing system vulnerabilities and optimizing security policies, and the effect evaluation comprises evaluating the attack success rate, the data leakage risk and the expected effect after the treatment measures are applied.
7. A system for generating a network security alarm information treatment scheme based on a large language model, characterized by comprising a multimodal information acquisition module, a multimodal information processing module, a model training and RAG knowledge base construction module, a treatment scheme online generation module and a format conversion and report display module which are connected in sequence, wherein:
the multimodal information acquisition module acquires, from a plurality of security data sources, multimodal network security alarm information comprising text form data, log form data and structured form data;
the multimodal information processing module preprocesses the multimodal network security alarm information, wherein the preprocessing comprises text cleaning, word segmentation, entity recognition and semantic analysis, the word segmentation decomposes text form data into independent vocabulary units, the entity recognition identifies and extracts key threat indicators from the multimodal network security alarm information, and the semantic analysis converts unstructured text form data and log form data into structured form data;
the model training and RAG knowledge base construction module trains a pre-trained large language model on pre-acquired, manually labeled network security field training data using a supervised learning algorithm to obtain a trained large language model;
the treatment scheme online generation module calls the trained large language model and, in combination with retrieval-augmented generation (RAG), dynamically retrieves corresponding real-time threat information from the RAG knowledge base according to the key threat indicators; analyzes the preprocessed multimodal network security alarm information according to the retrieved real-time threat information and the threat identification knowledge and treatment knowledge that the trained large language model learned during training; describes the nature, scope of impact and attack path of the threat; and automatically generates a network security alarm information treatment scheme comprising treatment measures, security improvement measures and effect evaluation; and
the format conversion and report display module automatically converts the network security alarm information treatment scheme into a standardized report document in a readable markup language format through a template engine and displays the standardized report document to a user so that the user can provide feedback comments.
8. The system for generating a network security alarm information treatment scheme based on a large language model according to claim 7, further comprising a feedback optimization module after the format conversion and report display module, wherein the feedback optimization module obtains the user's feedback comments on the standardized report document, iteratively optimizes the trained large language model according to the feedback comments, and updates the content of the RAG knowledge base at the same time, the iterative optimization comprising adjusting model parameters and updating the network security field training data so as to improve the accuracy of subsequent alarm treatment schemes.
9. The system for generating a network security alarm information treatment scheme based on a large language model according to claim 7, wherein, in the multimodal information acquisition module, the plurality of security data sources comprise a security information and event management system, an intrusion detection system, an intrusion prevention system and a firewall, the log form data comprises IDS alarm data from the intrusion detection system and raw traffic records in network traffic statistics from the security information and event management system, and the structured form data comprises structured reports in the network traffic statistics.
10. The system for generating a network security alarm information treatment scheme based on a large language model according to any one of claims 7 to 9, wherein, in the multimodal information processing module, the text cleaning removes noise data and irrelevant information from the multimodal network security alarm information, the noise data and irrelevant information comprising HTML tags, special characters and repeated redundant fields; and/or, in the model training and RAG knowledge base construction module, the network security field training data comprises vulnerability reports, an attack case library and security treatment manuals, and the pre-trained large language model is a DeepSeek model or a GPT-series model; and/or, in the treatment scheme online generation module, the threat identification knowledge comprises identifying network attack patterns, the treatment knowledge comprises judging threat types and recommending treatment measures, and, in the treatment scheme, the treatment measures comprise blocking network access of suspicious IP addresses and isolating infected devices, the security improvement measures comprise repairing system vulnerabilities and optimizing security policies, and the effect evaluation comprises evaluating the attack success rate, the data leakage risk and the expected effect after the treatment measures are applied.
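As an illustration only, the text cleaning, entity recognition and template-based report conversion recited in the claims could be sketched as below. This is a minimal sketch under stated assumptions, not the patented implementation: the regular expressions, the function names (`clean_text`, `extract_indicators`, `render_report`) and the Markdown layout are all hypothetical, the claims do not specify how entity recognition is performed, and the large language model and RAG retrieval are left out entirely.

```python
import re
from string import Template

# Hypothetical patterns chosen for this sketch; the claims do not define them.
TAG_RE = re.compile(r"<[^>]+>")                           # HTML tags
CTRL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")  # control characters
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
PORT_RE = re.compile(r"\bport\s+(\d{1,5})\b", re.IGNORECASE)


def clean_text(raw: str) -> str:
    """Remove HTML tags and control characters, then drop repeated lines."""
    text = CTRL_RE.sub("", TAG_RE.sub(" ", raw))
    seen, kept = set(), []
    for line in text.splitlines():
        line = " ".join(line.split())  # collapse runs of whitespace
        if line and line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)


def extract_indicators(text: str) -> dict[str, list[str]]:
    """Pull IP addresses, vulnerability numbers and port numbers out of text."""
    ips = [
        ip for ip in IPV4_RE.findall(text)
        if all(int(part) <= 255 for part in ip.split("."))
    ]
    return {
        "ip": sorted(set(ips)),
        "cve": sorted({cve.upper() for cve in CVE_RE.findall(text)}),
        "port": sorted(set(PORT_RE.findall(text)), key=int),
    }


# Assumed Markdown layout; a real report would also carry the generated scheme.
REPORT_TEMPLATE = Template(
    "# Alert treatment report\n\n## Key threat indicators\n\n$indicators\n"
)


def render_report(indicators: dict[str, list[str]]) -> str:
    """Fill the Markdown template with one bullet per non-empty indicator type."""
    lines = [
        f"- {kind}: {', '.join(values)}"
        for kind, values in indicators.items()
        if values
    ]
    return REPORT_TEMPLATE.substitute(indicators="\n".join(lines))
```

Calling `render_report(extract_indicators(clean_text(raw_alert)))` on a raw alert string yields a Markdown document listing the extracted indicators. A production system would likely replace the regular expressions with a trained entity recognizer and a fuller template engine.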

Description

Method and system for generating network security alarm information treatment scheme based on large language model

Technical Field

The invention relates to the technical field of artificial intelligence and natural language processing, in particular to a method and a system for generating a network security alarm information treatment scheme based on a large language model.

Background

With the rapid development of network technology, network attack methods have become increasingly complex and diverse. New attack methods such as advanced persistent threats (APT), zero-day vulnerability exploitation and lateral movement of ransomware emerge one after another, bringing unprecedented challenges to network security. Traditional network security alarm information treatment schemes rely mainly on static rules, signature matching or manual, experience-based analysis, and they fall short in the face of novel and variant attacks. In particular, existing schemes suffer from the following problems:

1) Limited ability to detect unknown threats. Rule-based systems have difficulty identifying novel or variant attacks that are not defined in the rule base, which leaves security gaps.
2) Low response efficiency. Manually analyzing alarm information takes a long time and depends heavily on expert experience, so real-time requirements cannot be met; the problem is particularly pronounced under large-scale attacks.
3) Insufficient flexibility. The static rule base must be updated frequently to cope with newly emerging threats, which increases operation and maintenance costs and still makes it difficult to adapt to a rapidly changing threat environment.
4) Lack of intelligent treatment advice. Existing solutions typically provide only alarm information and cannot automatically generate targeted treatment schemes, which increases the burden on the security team.
Although improved schemes have been proposed, such as rule-engine-based systems, manual analysis platforms and traditional machine learning models, they still have significant drawbacks:

1) Rule-engine-based schemes. In the early days, network security protection relied mainly on simple rule settings: an alarm is triggered when network data matches a preset rule. For example, a rule is set for frequent access from a particular IP address, and an alarm is generated once the access frequency of that IP address exceeds a threshold. This approach has significant limitations. As network attack technology advances, attackers can easily bypass such static rules, and rule-based systems have difficulty identifying novel or variant attacks because these attacks often lack the characteristics described by known rules. As a result, a large number of unknown threats cannot be detected effectively.
2) Manual analysis platforms. These can cope with complex network attacks to a certain extent, but they are inefficient and costly. Security experts must spend a large amount of time analyzing massive volumes of alarm information, so the response is slow and real-time requirements cannot be met. The analysis result also depends heavily on expert experience, and different experts may reach different conclusions.
3) Traditional machine learning models. These require a large amount of labeled data for training, have limited generalization capability and have difficulty adapting to a rapidly changing threat environment.

Therefore, developing a method that can generate network security alarm information treatment schemes automatically, efficiently and accurately is of great practical significance.
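The threshold rule described for rule-engine-based schemes can be illustrated with a minimal sketch. The function name `rule_based_alerts`, the log format (one source IP address per request) and the threshold value are assumptions made for this illustration; the source names no specific rule engine.

```python
from collections import Counter


def rule_based_alerts(access_log: list[str], threshold: int) -> list[str]:
    """Return the source IPs whose request count exceeds a fixed threshold.

    `access_log` is assumed to hold one source IP address per request.
    """
    counts = Counter(access_log)
    return sorted(ip for ip, hits in counts.items() if hits > threshold)


# One noisy source trips the rule; the quieter source does not.
noisy_log = ["198.51.100.5"] * 6 + ["198.51.100.9"] * 2
flagged = rule_based_alerts(noisy_log, threshold=5)

# The same number of requests spread over six addresses stays under the
# threshold, so the static rule raises no alarm.
spread_log = [f"198.51.100.{host}" for host in range(1, 7)]
missed = rule_based_alerts(spread_log, threshold=5)
```

The second call shows the bypass the background describes: an attacker who distributes requests across several addresses never matches the per-IP rule, although the total traffic is the same.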
Disclosure of Invention

To solve the problems of traditional network security alarm information treatment, such as limited ability to detect unknown threats, low response efficiency and insufficient flexibility, the invention provides a method for generating a network security alarm information treatment scheme based on a large language model. The method can rapidly analyze multimodal network security alarm information and generate a treatment scheme, which significantly shortens response time and meets real-time requirements. It achieves fusion analysis of static knowledge and dynamic information, improves the ability to identify and analyze novel and variant attacks, copes with novel threats without frequent rule adjustments, and effectively improves flexibility and adaptability. The invention also relates to a system for generating a network security alarm information treatment scheme based on a large language model.

The technical scheme of the invention is as follows: the method for generating the network security alarm information treatment scheme based on the large language model is characterized by comprising the following steps of: the multi-mode information acquis