CN-121389128-B - Intelligent detection method for vulnerability hazards based on a large model

Abstract

The application discloses a large-model-based intelligent detection method for vulnerability hazards. The method first performs global static analysis on a set of source code files and constructs a complete call graph and data flow graph of the program, forming a structured code knowledge graph. Based on these graphs, it then performs precise context retrieval and enhancement for each identified candidate vulnerability slice: key information strongly related to the slice, such as call chains and data-tracing paths, is converted into natural-language descriptions that a large language model can understand and is injected into the prompt. This supplies the global context information the model would otherwise lack and compensates for its weakness in analyzing complex code. The method thereby addresses the failure of the attention mechanism on long-distance code associations and significantly improves the accuracy and reliability of vulnerability detection.
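
As a rough illustration of the last step described above, the following sketch renders a call chain and a data-tracing path as natural-language sentences and injects them into a prompt. It is not taken from the application: the `CandidateSlice` type, the helper names, and the wording of `PROMPT_TEMPLATE` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CandidateSlice:
    """A candidate vulnerability slice located by file path and line number."""
    file_path: str
    line: int
    code: str


def describe_call_chain(chain: List[str]) -> str:
    """Render a caller-to-callee chain as one English sentence."""
    return "Call chain: " + " calls ".join(chain) + "."


def describe_data_path(path: List[Tuple[str, str]]) -> str:
    """Render a data-tracing path (source first, sink last) as one sentence."""
    hops = " -> ".join(f"{name} at {loc}" for name, loc in path)
    return f"Data path from source to sink: {hops}."


# Hypothetical prompt template; the source does not give its wording.
PROMPT_TEMPLATE = (
    "You are reviewing code for security vulnerabilities.\n"
    "Location: {file_path}:{line}\n"
    "Code:\n{code}\n"
    "Global context:\n{context}\n"
    "Is this slice exploitable? Explain briefly."
)


def build_prompt(s: CandidateSlice,
                 chain: List[str],
                 path: List[Tuple[str, str]]) -> str:
    """Inject the natural-language context into the prompt template."""
    context = "\n".join([describe_call_chain(chain), describe_data_path(path)])
    return PROMPT_TEMPLATE.format(
        file_path=s.file_path, line=s.line, code=s.code, context=context
    )


if __name__ == "__main__":
    slice_ = CandidateSlice("app/db.py", 42, "cursor.execute(query)")
    print(build_prompt(
        slice_,
        ["handle_request", "load_user", "run_query"],
        [("user_id", "app/views.py:10"), ("query", "app/db.py:41")],
    ))
```

Keeping the context as short sentences rather than raw graph dumps is one plausible way to stay within the model's input length limit while still conveying cross-file relationships.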

Inventors

YAN LIJING
ZHANG MENGYAN
LI SHUAI
DANG FANGFANG
SONG YIFAN
LIU HAN
JIAO QIDI
LIU BOYU
ZHANG JING
DUAN MENGFEI

Assignees

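State Grid Henan Electric Power Company, Information and Communication Branch (国网河南省电力公司信息通信分公司)
State Grid Henan Electric Power Company (国网河南省电力公司)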

Dates

Publication Date: 2026-05-12
Application Date: 2025-10-16

Claims (7)

1. An intelligent detection method for vulnerability hazards based on a large model, characterized by comprising the following steps: acquiring a set of source code files; performing global static analysis on the set of source code files to obtain a call graph and a data flow graph; performing candidate vulnerability slice identification on a first source code file in the set of source code files to obtain a candidate vulnerability slice list; performing, based on the call graph and the data flow graph, graph-based context retrieval and enhancement on a first candidate vulnerability slice in the candidate vulnerability slice list to obtain a first enhanced vulnerability slice; and performing deep context-aware vulnerability arbitration on the first enhanced vulnerability slice to obtain an arbitration result; wherein performing candidate vulnerability slice identification on the first source code file in the set of source code files to obtain the candidate vulnerability slice list comprises: matching the first source code file against each pattern rule in a dangerous sink pattern library to obtain an original candidate slice list; filling the code segment, file path, and line number of each original candidate slice in the original candidate slice list into a predefined prompt template to obtain an analysis prompt list; and inputting each pair of corresponding analysis prompt and original candidate slice from the analysis prompt list and the original candidate slice list into an API of the large language model to obtain the candidate vulnerability slice list; and wherein performing graph-based context retrieval and enhancement on the first candidate vulnerability slice in the candidate vulnerability slice list based on the call graph and the data flow graph to obtain the first enhanced vulnerability slice comprises: performing slice positioning and graph anchor mapping in the data flow graph based on the file path and line number of the first candidate vulnerability slice to obtain an anchor node; parsing the required context type of the first candidate vulnerability slice to obtain a traversal instruction; performing reverse graph traversal and path collection on the call graph and the data flow graph based on the anchor node and the traversal instruction to obtain an original path set; performing path pruning and key evidence chain screening on the original path set to obtain a key path; and performing slice enhancement on the first candidate vulnerability slice based on the key path to obtain the first enhanced vulnerability slice.
2. The large-model-based intelligent detection method for vulnerability hazards of claim 1, wherein performing global static analysis on the set of source code files to obtain the call graph and the data flow graph comprises: performing lexical and syntactic analysis on each source code file in the set of source code files to obtain a set of abstract syntax trees; performing intra-function analysis and local graph construction on each abstract syntax tree in the set of abstract syntax trees to obtain a function node set, a variable node set, and a function local graph set; performing inter-function analysis and global relation linking on the function node set, the variable node set, and the function local graph set of each abstract syntax tree to obtain a call edge set and a data flow edge set; and performing global graph aggregation on the function node set, the variable node set, the call edge set, and the data flow edge set to obtain the call graph and the data flow graph.
3. The large-model-based intelligent detection method for vulnerability hazards of claim 2, wherein performing lexical and syntactic analysis on each source code file in the set of source code files to obtain the set of abstract syntax trees comprises: inputting the source code file into a lexical analyzer to obtain a source code token stream; and inputting the source code token stream into a parser to obtain an abstract syntax tree.
4. The large-model-based intelligent detection method for vulnerability hazards of claim 3, wherein inputting each pair of corresponding analysis prompt and original candidate slice from the analysis prompt list and the original candidate slice list into the API of the large language model to obtain the candidate vulnerability slice list comprises: inputting the analysis prompt and the original candidate slice into the API of the large language model to obtain an initial score and a required context type; adjusting the initial score to obtain a preliminary score; and combining the preliminary score, the required context type, and the metadata of the original candidate slice to obtain a candidate vulnerability slice.
5. The large-model-based intelligent detection method for vulnerability hazards of claim 4, wherein adjusting the initial score to obtain the preliminary score comprises adjusting the initial score with the following formula: (formula and symbols not preserved in this text), wherein the terms of the formula denote, respectively, the risk of the dangerous sink extracted from the dangerous sink pattern library, the local definition, and the initial score.
6. The large-model-based intelligent detection method for vulnerability hazards of claim 5, wherein performing path pruning and key evidence chain screening on the original path set to obtain the key path comprises: constructing a risk potential field based on the data flow graph and an operation risk knowledge base; performing, based on the risk potential field, a path-integral-based risk accumulation calculation on each original path in the original path set to obtain a path score set; and performing risk density normalization and key path decision on the path score of each original path in the path score set to obtain the key path.
7. The large-model-based intelligent detection method for vulnerability hazards of claim 4, wherein performing deep context-aware vulnerability arbitration on the first enhanced vulnerability slice to obtain the arbitration result comprises: embedding the code segment and the context description of the first enhanced vulnerability slice into a prompt template for deep arbitration to obtain a final prompt; inputting the final prompt into the large language model to obtain an original LLM response; and performing structured arbitration parsing and credibility fusion on the original LLM response and the preliminary score to obtain the arbitration result.
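
The pattern matching, reverse traversal, and key path selection recited in claims 1 and 6 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the regular expressions and risk values in `SINK_PATTERNS` are invented, the graph is a plain edge list, and the claims do not define the risk potential field or the normalization formula, so a sum of per-node risk stands in for the path-integral accumulation and division by path length stands in for risk density normalization.

```python
import re
from collections import defaultdict

# Hypothetical dangerous-sink pattern library: name -> (regex, base risk).
SINK_PATTERNS = {
    "sql_execute": (re.compile(r"\.execute\("), 0.9),
    "os_command": (re.compile(r"os\.system\("), 0.8),
}


def match_sinks(file_path, source):
    """Return original candidate slices: one per line matching a sink pattern."""
    hits = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        for name, (pattern, risk) in SINK_PATTERNS.items():
            if pattern.search(text):
                hits.append({"file": file_path, "line": lineno,
                             "code": text.strip(), "sink": name, "risk": risk})
    return hits


def reverse_paths(edges, anchor, max_depth=5):
    """Collect simple paths ending at `anchor` by walking edges backwards."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
    paths = []

    def walk(node, path):
        # Skip nodes already on the path so cycles cannot loop forever.
        parents = [p for p in preds[node] if p not in path]
        if not parents or len(path) > max_depth:
            paths.append(list(reversed(path)))  # store source-to-sink order
            return
        for parent in parents:
            walk(parent, path + [parent])

    walk(anchor, [anchor])
    return paths


def pick_key_path(paths, node_risk):
    """Keep the path with the highest summed node risk per node (assumed metric)."""
    def density(path):
        return sum(node_risk.get(n, 0.0) for n in path) / len(path)
    return max(paths, key=density)


if __name__ == "__main__":
    source = 'query = "SELECT * FROM t WHERE id=" + user_id\ncursor.execute(query)\n'
    print(match_sinks("app/db.py", source))

    # Toy data flow edges (source -> destination); node names are illustrative.
    edges = [("request.args", "user_id"), ("user_id", "query"),
             ("config.table", "query"), ("query", "cursor.execute")]
    paths = reverse_paths(edges, "cursor.execute")
    risk = {"request.args": 1.0, "config.table": 0.1, "cursor.execute": 0.9}
    print(pick_key_path(paths, risk))
```

Dividing by path length keeps a long path from winning merely because it visits more nodes, which is one reading of why a density normalization step follows the accumulation step in claim 6.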

Description

Intelligent detection method for vulnerability hazards based on a large model

Technical Field

The application relates to the field of intelligent detection, and in particular to a large-model-based intelligent detection method for vulnerability hazards.

Background

In software development today, as systems grow more complex and network attack techniques continue to evolve, software vulnerabilities have become a serious challenge for network security. These vulnerabilities may lead to data leakage, system damage, or even serious economic and reputational harm. Traditional vulnerability detection methods, such as manual code audit and rule-based static analysis, often struggle with modern large-scale, high-complexity code bases, and commonly suffer from low detection efficiency, high false-positive rates, or, more critically, missed detections. Building an efficient and intelligent vulnerability detection scheme that accurately identifies potential security defects early in the software life cycle, so that they can be repaired, is therefore vital to improving software security and resilience.

However, existing intelligent vulnerability detection schemes, particularly those that incorporate large language models (LLMs), still face significant technical bottlenecks. One core problem is the input length (token) limit inherent to large language models, which forces the code of a large project to be physically split before it can be processed. This splitting severely damages global context information such as cross-file data flow paths and function call chains, which is precisely the key basis for accurately judging complex vulnerabilities. In addition, the "distance decay" effect of the self-attention mechanism in the Transformer model is a weakness that cannot be ignored.
Although the mechanism can in theory associate any two tokens, in practice the association strength between code segments that are far apart is significantly weakened, so the model may fail to capture security hazards that arise from long-distance code associations, resulting in missed detections. Furthermore, large language models typically treat code as a sequence of text, which flattens their understanding of structural relationships such as complex inheritance and polymorphism in object-oriented programming, and of logic such as higher-order function calls. Unlike conventional static analysis tools, this approach has difficulty accurately building and parsing control flow and data flow, which limits the model's ability to track how data and control flow propagate through complex program structures.

These technical problems limit the accuracy and reliability of current intelligent vulnerability detection schemes, and a new method that overcomes these limitations is needed. An optimized large-model-based intelligent detection method for vulnerability hazards is therefore desired.

Disclosure of Invention

The present application has been made to solve the above-mentioned technical problems.
The embodiment of the application provides a large-model-based intelligent detection method for vulnerability hazards. The method first performs global static analysis on a set of source code files and constructs a complete call graph and data flow graph of the program, forming a structured code knowledge graph. Based on these graphs, it then performs precise context retrieval and enhancement for each identified candidate vulnerability slice: key information strongly related to the slice, such as call chains and data-tracing paths, is converted into natural-language descriptions that a large language model can understand and is injected into the prompt. This supplies the global context information the model would otherwise lack and compensates for its weakness in analyzing complex code. The method thereby addresses the failure of the attention mechanism on long-distance code associations and significantly improves the accuracy and reliability of vulnerability detection.

According to one aspect of the application, a large-model-based intelligent detection method for vulnerability hazards is provided, which comprises the following steps: acquiring a set of source code files; performing global static analysis on the set of source code files to obtain a call graph and a data flow graph; performing candidate vulnerability slice identification on a first source code file in the set of source code files to obtain a candidate vulnerability slice list; performing, based on the call graph and the data flow graph, graph-based context retrieval and enhancement on a first candidate vulnerability slice in the candidate vulnerability slice list to obtain a first enhanced vulnerability slice; and performing deep context-awar