CN-121997343-A - Code bug automatic detection tool for basic software development


Abstract

The invention discloses an intelligent automatic code vulnerability detection system for basic software development, belonging to the technical field of software security and static analysis. The system comprises a source code preprocessing and standardization module, a multi-level composite abstract syntax tree construction module, a domain-knowledge-enhanced vulnerability rule knowledge base, a graph-neural-network-based context-sensitive analysis module, a symbolic execution and constraint-solving guidance module, a vulnerability association and root-cause localization module, and a feedback-driven adaptive optimization engine. The method builds an enhanced code representation that integrates syntax, semantics and domain-specific features, and combines it with a deep learning model to capture the deep semantics and the complex data flow and control flow of the code. This achieves high-precision, context-sensitive automatic detection of multiple classes of vulnerabilities, including memory safety, concurrency safety and logic defects.

Inventors

Que Chenbing

Assignees

江苏妍与桐电子科技有限公司

Dates

Publication Date: 20260508
Application Date: 20260129

Claims (10)

1. An intelligent automatic code vulnerability detection system for basic software development, characterized by comprising: a source code preprocessing and standardization module, configured to receive target source code, perform encoding normalization, macro expansion, comment removal and preprocessing associated with the configuration of the target compilation environment, and generate standardized intermediate code; a multi-level composite abstract syntax tree construction module, connected to the source code preprocessing and standardization module and configured to parse the standardized intermediate code into a basic abstract syntax tree and further fuse a control flow graph, a data flow graph and semantic annotations specific to basic software, so as to construct a composite abstract syntax tree containing syntactic, structural and partial semantic information; a domain-knowledge-enhanced vulnerability rule knowledge base, storing a rule set for typical vulnerability patterns of basic software, wherein the rule set comprises traditional static analysis rules described in formal logic, feature vectors of historical vulnerable code fragments, and vulnerability pattern features extracted by a machine learning model; a graph-neural-network-based context-sensitive analysis module, connected to the multi-level composite abstract syntax tree construction module and the domain-knowledge-enhanced vulnerability rule knowledge base, and configured to convert the composite abstract syntax tree into a graph structure that a heterogeneous graph neural network can process, extract a deep representation of the code graph using a pre-trained graph neural network model, and perform pattern matching and anomaly detection in combination with the knowledge base to generate a preliminary set of suspicious points; a symbolic execution and constraint-solving guidance module, connected to the graph-neural-network-based context-sensitive analysis module and configured to prioritize the preliminary set of suspicious points, launch selective symbolic execution on high-priority paths, generate path constraints, and invoke a constraint solver to verify the reachability of each vulnerability so as to filter out false positives; a vulnerability association and root-cause localization module, connected to the symbolic execution and constraint-solving guidance module and configured to construct a vulnerability propagation chain for each confirmed vulnerability point by backtracking its data-dependence and control-dependence relations, identify the root-cause location and triggering conditions of the vulnerability, and generate a detailed report containing the vulnerability type, location, severity level, root-cause chain and remediation suggestions; and a feedback-driven adaptive optimization engine, connected to the domain-knowledge-enhanced vulnerability rule knowledge base and the graph-neural-network-based context-sensitive analysis module, and configured to collect false-positive and false-negative samples and newly confirmed vulnerability patterns from the analysis results, dynamically update the vulnerability rule knowledge base and fine-tune the graph neural network model, thereby continuously optimizing the system.
2. The intelligent automatic code vulnerability detection system for basic software development according to claim 1, wherein the source code preprocessing and standardization module specifically comprises: an encoding unification sub-module, configured to convert source code in different encoding formats uniformly into UTF-8; a compilation environment simulation sub-module, configured to load the compiler configuration, header file paths and macro definition set matching the target basic software, and to expand the correct branches of conditional compilation directives; and a structure normalization sub-module, configured to expand syntactic sugar and inline functions in the source code into a standardized form and to remove all comments and non-functional code elements.
3. The intelligent automatic code vulnerability detection system for basic software development according to claim 1, wherein the multi-level composite abstract syntax tree construction module specifically performs the following operations: invoking a base parser to generate a standard AST for the programming language; constructing function-level and module-level control flow graphs (CFG) and data flow graphs (DFG) based on the standard AST; attaching the node and edge information of the CFG and DFG as attributes to the corresponding AST nodes to form a first layer of enhancement; and integrating a knowledge base of the basic software domain to add semantic annotation nodes for specific API calls, resource management operations and synchronization primitives to form a second layer of enhancement, finally generating the composite AST.
4. The intelligent automatic code vulnerability detection system for basic software development according to claim 1, wherein the rule set in the domain-knowledge-enhanced vulnerability rule knowledge base comprises at least: a memory safety rule subset covering null pointer dereference, buffer overflow, use-after-free, double free and memory leak patterns; a concurrency safety rule subset covering race conditions, deadlocks, atomicity violations and order violations; and a logic and API misuse rule subset covering integer overflow, division by zero, unchecked error return values and unsafe function call patterns; wherein the rules are stored in an extensible DSL or intermediate representation with attached confidence weights and trigger conditions.
5. The intelligent automatic code vulnerability detection system for basic software development according to claim 1, wherein the graph-neural-network-based context-sensitive analysis module specifically comprises: a graph conversion sub-module, configured to map nodes of the composite AST to graph nodes, map syntactic parent-child relationships, control flow edges, data flow edges and semantic annotation edges to graph edges of different types, and thereby construct a heterogeneous graph; a graph neural network model sub-module, which uses a multi-layer graph attention network (GAT) or graph isomorphism network (GIN) to perform message passing and node embedding updates on the heterogeneous graph and to learn global and local features of the code; and a vulnerability detection head sub-module, which receives the final embedding vectors of the graph nodes, computes their similarity to the feature vectors in the knowledge base through a classifier, and outputs the probability that each code location contains a vulnerability of a specific type.
6. The intelligent automatic code vulnerability detection system for basic software development according to claim 1, wherein the symbolic execution and constraint-solving guidance module adopts a selective symbolic execution strategy whose selection criteria comprise the vulnerability probability output by the GNN model, the potential severity of the vulnerability and the estimated complexity of the execution path; the module uses the suspicious program points screened by the GNN as guidance targets for symbolic execution instead of exploring all paths, thereby balancing detection depth against time overhead.
7. The intelligent automatic code vulnerability detection system for basic software development according to claim 1, wherein the vulnerability association and root-cause localization module uses static slicing to perform forward and backward analysis along the data-dependence and control-dependence edges marked in the composite AST, generates a complete evidence chain from the vulnerability trigger point to the source defect, and presents the evidence chain visually in the final analysis report.
8. The intelligent automatic code vulnerability detection system for basic software development according to claim 1, wherein the workflow of the feedback-driven adaptive optimization engine involves: a collection module, configured to collect false-positive and false-negative samples from manual audit results or from cross-validation with dynamic analysis tools; a characterization module, configured to convert the collected samples into the composite AST graph representation; an update module, configured to perform incremental learning or fine-tuning of the graph neural network model with the new samples; and an expansion module, configured to abstract confirmed new vulnerability patterns into rules or feature vectors and store them in the vulnerability rule knowledge base.
9. An intelligent automatic code vulnerability detection method for basic software development, applied to the system of any one of claims 1-8, comprising the following steps: S1, acquiring the source code of the target basic software and preprocessing and standardizing it to generate standardized intermediate code; S2, parsing the standardized intermediate code to construct a multi-level composite abstract syntax tree that fuses the CFG, the DFG and domain semantic annotations; S3, converting the composite AST into a heterogeneous graph, feeding the heterogeneous graph into a pre-trained graph neural network model for context-sensitive analysis, and outputting a preliminary set of suspicious vulnerability points in combination with the vulnerability rule knowledge base; S4, prioritizing the preliminary set of suspicious points, launching selective symbolic execution and constraint solving for high-priority points, verifying the reachability of each vulnerability, filtering out false positives and confirming real vulnerabilities; S5, constructing a vulnerability propagation chain for each confirmed real vulnerability by static slicing, locating its root cause and generating a structured vulnerability report; and S6, adaptively updating the vulnerability rule knowledge base and optimizing the graph neural network model parameters according to new samples in the detection results.
10. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the intelligent automatic code vulnerability detection method for basic software development according to claim 9.
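Claim 2 names encoding unification and comment removal among the preprocessing steps. The following is a minimal Python sketch of those two steps only, assuming C-like source text; the function names `to_utf8_text` and `strip_c_comments` and the candidate encoding list are illustrative assumptions, not part of the claimed system. Macro expansion and conditional compilation are left to the target compiler's own preprocessor and are not shown, and backslash line splicing inside comments is ignored.

```python
# Candidate encodings tried in order (an assumption; adjust per code base).
# "utf-8-sig" also strips a UTF-8 byte-order mark; "latin-1" accepts any
# byte sequence, so it acts as a last resort.
CANDIDATE_ENCODINGS = ("utf-8-sig", "gb18030", "latin-1")


def to_utf8_text(raw: bytes, candidates=CANDIDATE_ENCODINGS) -> str:
    """Decode raw source bytes into text that can be re-encoded as UTF-8."""
    for enc in candidates:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")


def strip_c_comments(src: str) -> str:
    """Remove // and /* */ comments while leaving string/char literals intact."""
    out, i, n = [], 0, len(src)
    while i < n:
        c = src[i]
        nxt = src[i + 1] if i + 1 < n else ""
        if c == "/" and nxt == "/":
            # Line comment: skip up to (not including) the newline.
            j = src.find("\n", i)
            i = n if j == -1 else j
        elif c == "/" and nxt == "*":
            # Block comment: replace with one space plus its newlines,
            # so later stages can still report original line numbers.
            j = src.find("*/", i + 2)
            end = n if j == -1 else j + 2
            out.append(" " + "\n" * src.count("\n", i, end))
            i = end
        elif c in ('"', "'"):
            # Copy a string or char literal verbatim, honouring backslash escapes.
            quote = c
            out.append(c)
            i += 1
            while i < n:
                ch = src[i]
                out.append(ch)
                i += 1
                if ch == "\\" and i < n:
                    out.append(src[i])
                    i += 1
                elif ch == quote or ch == "\n":
                    break
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

The returned Python string can be written back with `text.encode("utf-8")`. Keeping newlines from block comments is a design choice so that a report generated later can still point at the original line.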
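Claims 3 and 5 describe fusing syntax, control flow, data flow and semantic annotations into one heterogeneous graph with typed edges. The sketch below illustrates that idea using Python's standard `ast` module as a stand-in for a C/C++ parser. Its simplifications are assumptions for illustration only: control flow is approximated as "next statement in the same block", data flow as "last definition of a name to its later uses" in straight-line top-level code, and the hypothetical `SEMANTIC_TAGS` table stands in for the basic-software domain knowledge base.

```python
import ast
from collections import defaultdict

# Hypothetical stand-in for the domain knowledge base: API name -> annotation.
SEMANTIC_TAGS = {
    "malloc": "resource_alloc",
    "free": "resource_release",
    "lock": "sync_acquire",
    "unlock": "sync_release",
}


def build_hetero_graph(source: str):
    """Return (node_types, typed_edges, annotations) for a source snippet."""
    tree = ast.parse(source)
    ids = {}
    node_types = []

    def nid(node):
        # Assign a stable integer id to each AST node the first time it is seen.
        if node not in ids:
            ids[node] = len(node_types)
            node_types.append(type(node).__name__)
        return ids[node]

    edges = defaultdict(list)

    # Edge type 1: syntactic parent -> child (Load/Store markers are skipped).
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            if not isinstance(child, ast.expr_context):
                edges["ast_child"].append((nid(parent), nid(child)))

    # Edge type 2 (simplified CFG): each statement -> the next one in its block.
    for node in ast.walk(tree):
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if isinstance(block, list):
                for a, b in zip(block, block[1:]):
                    edges["cfg_next"].append((nid(a), nid(b)))

    # Edge type 3 (simplified DFG): last definition -> later uses, top level only.
    last_def = {}
    for stmt in tree.body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        for n in names:  # uses see definitions from earlier statements
            if isinstance(n.ctx, ast.Load) and n.id in last_def:
                edges["data_flow"].append((nid(last_def[n.id]), nid(n)))
        for n in names:
            if isinstance(n.ctx, ast.Store):
                last_def[n.id] = n

    # Semantic annotations: tag calls to APIs listed in SEMANTIC_TAGS.
    annotations = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            tag = SEMANTIC_TAGS.get(node.func.id)
            if tag is not None:
                annotations[nid(node)] = tag

    return node_types, dict(edges), annotations


if __name__ == "__main__":
    types, edges, tags = build_hetero_graph("p = malloc(16)\nq = p\nfree(p)\n")
    print({k: len(v) for k, v in edges.items()}, sorted(tags.values()))
```

In a real pipeline each edge type would become a separate relation of the heterogeneous graph fed to the GAT or GIN model; the integer ids here are only a convenient node index.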
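Claim 6 selects targets for symbolic execution by GNN probability, potential severity and estimated path complexity, rather than exploring all paths. The claim does not give a formula, so the sketch below assumes one: score = probability × severity / (1 + α × path cost), followed by a greedy pick under a total path-cost budget that models the time limit. The `Suspect` fields, the 1 to 4 severity scale, `alpha` and `budget` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suspect:
    location: str       # code position, e.g. "file.c:42" (illustrative)
    probability: float  # GNN vulnerability probability in [0, 1]
    severity: int       # hypothetical scale: 1 (low) .. 4 (critical)
    path_cost: float    # estimated path complexity, e.g. branch count


def select_targets(suspects, budget: float, alpha: float = 1.0):
    """Greedily pick high-score suspects whose summed path cost fits the budget."""

    def score(s: Suspect) -> float:
        # Likely and severe points rank higher; expensive paths are penalised.
        return s.probability * s.severity / (1.0 + alpha * s.path_cost)

    chosen, spent = [], 0.0
    for s in sorted(suspects, key=score, reverse=True):
        if spent + s.path_cost <= budget:
            chosen.append(s)
            spent += s.path_cost
    return chosen
```

For example, with a budget of 10, a point with probability 0.95 and severity 4 but path cost 20 is skipped in favour of cheaper points, which is the depth-versus-time trade-off the claim describes. Only the selected points would then be handed to the symbolic executor and constraint solver.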
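Claim 7 builds an evidence chain by slicing along data-dependence and control-dependence edges. The sketch below shows only the backward direction as a breadth-first search over a dependence edge list, assuming edges of the form `(src, dst, kind)` meaning "dst depends on src". Treating slice nodes with no further dependences as root-cause candidates is a simplifying assumption, and the function names and the C statements in the example are hypothetical.

```python
from collections import deque


def backward_slice(deps, trigger):
    """BFS backward slice from a trigger point.

    Returns (parent, preds): parent maps each slice node to
    (next node toward the trigger, edge kind), or None for the trigger.
    """
    preds = {}
    for src, dst, kind in deps:
        preds.setdefault(dst, []).append((src, kind))
    parent = {trigger: None}
    queue = deque([trigger])
    while queue:
        node = queue.popleft()
        for src, kind in preds.get(node, []):
            if src not in parent:
                parent[src] = (node, kind)
                queue.append(src)
    return parent, preds


def root_candidates(parent, preds):
    """Slice nodes with no further dependences: candidate root causes."""
    return [n for n in parent if n not in preds]


def evidence_chain(parent, root):
    """Follow recorded edges from a root candidate forward to the trigger."""
    chain, node = [root], root
    while parent[node] is not None:
        node, kind = parent[node]
        chain.append(f"--{kind}--> {node}")
    return chain


if __name__ == "__main__":
    deps = [
        ("n = read_len()", "buf = malloc(n)", "data"),
        ("buf = malloc(n)", "memcpy(buf, src, len)", "data"),
        ("len = strlen(src)", "memcpy(buf, src, len)", "data"),
        ("if (len < 64)", "memcpy(buf, src, len)", "control"),
    ]
    parent, preds = backward_slice(deps, "memcpy(buf, src, len)")
    for root in root_candidates(parent, preds):
        print(" ".join(evidence_chain(parent, root)))
```

Because the search is breadth-first, each chain is a shortest dependence path from a candidate root to the trigger; a forward slice would be the same search over the reversed edge list.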

Description

Technical Field

The invention relates to the technical fields of software security testing and static program analysis, and in particular to an automatic code bug detection tool for basic software development.

Background

Basic software is the cornerstone of the computing ecosystem, so its security is critical. Such software is usually large in scale, has a long life cycle and complex concurrency, and is written largely in low-level languages such as C/C++. As a result, vulnerabilities such as memory corruption and race conditions are hard to detect and can lead to serious security incidents.

Automatic code bug detection relies mainly on dynamic and static analysis. Dynamic analysis (such as fuzz testing) has a low false-positive rate but limited coverage, and struggles to reach deep states and complex branches. Static analysis offers higher theoretical coverage, but traditional tools have clear limitations: pattern-matching approaches produce many false positives because they lack deep semantic understanding; introducing context- and path-sensitive analysis (such as symbolic execution) runs into path explosion and scales poorly to large code bases; the modeling of complex patterns such as concurrency vulnerabilities is weak; reliance on expert rules makes unknown variants hard to cover; general-purpose tools perform poorly because they are not adapted to paradigms specific to basic software (such as pointer manipulation and inline assembly); and overall automation and intelligence remain low, so substantial manual intervention is required.

In recent years, deep-learning-based detection methods have captured vulnerability patterns by learning from code samples.
However, most existing methods treat code as sequences or simple syntax trees and cannot fully model key structural and semantic information such as data flow and control flow. In basic software scenarios, which require an accurate understanding of long-range dependencies, their detection capability therefore still leaves significant room for improvement.

Disclosure of Invention

To solve the above technical problems, the invention provides an automatic code bug detection tool for basic software development, namely an intelligent automatic code vulnerability detection system for basic software development, characterized by comprising the following components: a source code preprocessing and standardization module, configured to receive target source code, perform encoding normalization, macro expansion, comment removal and preprocessing associated with the configuration of the target compilation environment, and generate standardized intermediate code; a multi-level composite abstract syntax tree construction module, connected to the source code preprocessing and standardization module and configured to parse the standardized intermediate code into a basic abstract syntax tree and further fuse a control flow graph, a data flow graph and semantic annotations specific to basic software, so as to construct a composite abstract syntax tree containing syntactic, structural and partial semantic information; a domain-knowledge-enhanced vulnerability rule knowledge base, storing a rule set for typical vulnerability patterns of basic software, wherein the rule set comprises traditional static analysis rules described in formal logic, feature vectors of historical vulnerable code fragments, and vulnerability pattern features extracted by a machine learning model; a graph-neural-network-based context-sensitive analysis module, connected to the multi-level composite abstract syntax tree construction module and the domain-knowledge-enhanced vulnerability rule knowledge base, and configured to convert the composite abstract syntax tree into a graph structure that a heterogeneous graph neural network can process, extract a deep representation of the code graph using a pre-trained graph neural network model, and perform pattern matching and anomaly detection in combination with the knowledge base to generate a preliminary set of suspicious points; a symbolic execution and constraint-solving guidance module, connected to the graph-neural-network-based context-sensitive analysis module and configured to prioritize the preliminary set of suspicious points, launch selective symbolic execution on high-priority paths, generate path constraints, and invoke a constraint solver to verify the reachability of each vulnerability so as to filter out false positives; a vulnerability association and root-cause localization module, connected to the symbolic execution and constraint-solving guidance module and used for constructing a vulnerability propagation chain by backtracking data-dependence and control-dependence relations from confirmed vulnerability points, identify