CN-122019360-A - Code vulnerability detection method and system integrating node position enhanced local graph neural network and global self-attention mechanism


Abstract

The invention belongs to the technical field of software security analysis and discloses a code vulnerability detection method and system that integrate a node-position-enhanced local graph neural network with a global self-attention mechanism. The method comprises: constructing a function-level code representation based on a code property graph and enhancing the node feature representation through a node position-aware mechanism; training a code vulnerability detection model on the enhanced node feature representation, wherein the model comprises a local-global hybrid graph representation learning module consisting of a local message passing sub-module, which captures local structural information between each node and its neighborhood, and a global attention sub-module, which captures global semantic relations among all nodes in the graph; and performing vulnerability detection on target code with the trained model. The method and system achieve efficient and accurate detection of function-level vulnerabilities in complex software systems.
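The local-global hybrid module described in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the gated (GRU-style) update, the per-layer concatenation, and the multi-layer perceptron follow the wording of the claims, while the exact gate equations, the ReLU activation, the square weight shapes, and every function and parameter name (`local_message_passing`, `global_self_attention`, `hybrid_layer`, `init_params`) are assumptions made for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_message_passing(h, edges, w_edge, gru):
    """Local branch: sum edge-typed neighbor messages, then apply a GRU-style update."""
    m = np.zeros_like(h)
    for src, dst, etype in edges:               # a message flows from src to dst
        m[dst] += h[src] @ w_edge[etype]        # one weight matrix per edge type
    z = sigmoid(m @ gru["Wz"] + h @ gru["Uz"])  # update gate
    r = sigmoid(m @ gru["Wr"] + h @ gru["Ur"])  # reset gate
    cand = np.tanh(m @ gru["Wh"] + (r * h) @ gru["Uh"])
    return (1.0 - z) * h + z * cand

def global_self_attention(h, wq, wk, wv):
    """Global branch: every node attends to every node (scaled dot-product attention)."""
    q, k, v = h @ wq, h @ wk, h @ wv
    alpha = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=1)  # each row sums to 1
    return alpha @ v

def hybrid_layer(h, edges, p):
    """Compute both branches on the same input, concatenate, and fuse with a small MLP."""
    local = local_message_passing(h, edges, p["w_edge"], p["gru"])
    glob = global_self_attention(h, p["wq"], p["wk"], p["wv"])
    fused = np.concatenate([local, glob], axis=1)       # shape (n, 2d)
    return np.maximum(fused @ p["w1"], 0.0) @ p["w2"]   # ReLU MLP (assumed), back to (n, d)

def init_params(d, n_edge_types, seed=0):
    """Random small weights; all shapes here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    mat = lambda rows, cols: rng.normal(scale=0.1, size=(rows, cols))
    return {
        "w_edge": [mat(d, d) for _ in range(n_edge_types)],
        "gru": {name: mat(d, d) for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")},
        "wq": mat(d, d), "wk": mat(d, d), "wv": mat(d, d),
        "w1": mat(2 * d, d), "w2": mat(d, d),
    }

if __name__ == "__main__":
    h = np.random.default_rng(1).normal(size=(5, 4))     # 5 nodes, 4 features each
    edges = [(0, 1, 0), (1, 2, 0), (0, 3, 1), (3, 4, 1)]  # (src, dst, edge type)
    out = hybrid_layer(h, edges, init_params(d=4, n_edge_types=2))
    print(out.shape)  # (5, 4)
```

Because both branches read the same input and neither depends on the other's output, they can be evaluated in parallel; the concatenation is the only point where local and global information meet in this sketch.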

Inventors

Zong Guoxiao
Yin Zhongxu
Sang Haiya
Li Junru
Kong Liya
Wang Zhenguo
Cui Jiayi

Assignees

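Information Engineering University of the Cyberspace Force of the Chinese People's Liberation Army (中国人民解放军网络空间部队信息工程大学)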

Dates

Publication Date: 2026-05-12
Application Date: 2025-12-18

Claims (10)

1. A code vulnerability detection method integrating a node-position-enhanced local graph neural network and a global self-attention mechanism, characterized by comprising the following steps: Step 1, constructing a function-level code representation based on a code property graph, and enhancing the node feature representation through a node position-aware mechanism; Step 2, training a code vulnerability detection model on the enhanced node feature representation, wherein the code vulnerability detection model comprises a local-global hybrid graph representation learning module, the local-global hybrid graph representation learning module comprises a local message passing sub-module and a global attention sub-module, the local message passing sub-module is used for capturing local structural information between each node and its neighborhood, and the global attention sub-module is used for capturing global semantic relations among all nodes in the graph; and Step 3, performing vulnerability detection on target code based on the trained code vulnerability detection model.
2. The code vulnerability detection method integrating a node-position-enhanced local graph neural network and a global self-attention mechanism according to claim 1, wherein Step 1 comprises: normalizing the source code; adding NTE edges to the code property graph to construct an extended code property graph; and then using Laplacian positional encoding to compute the global position information of each node in the graph and fusing it with the original node features to enhance the node representation capability.
3. The code vulnerability detection method integrating a node-position-enhanced local graph neural network and a global self-attention mechanism according to claim 1, wherein the code vulnerability detection model further comprises a multi-layer perceptron and a classifier.
4. The code vulnerability detection method integrating a node-position-enhanced local graph neural network and a global self-attention mechanism according to claim 3, wherein the local-global hybrid graph representation learning module further concatenates, in each layer, the outputs of the local message passing sub-module and the global attention sub-module, and feeds the concatenated information into the multi-layer perceptron.
5. The code vulnerability detection method integrating a node-position-enhanced local graph neural network and a global self-attention mechanism according to claim 4, wherein the multi-layer perceptron applies a nonlinear transformation to the concatenated information to generate a fused node representation, and feeds the fused node representation into the classifier for vulnerability classification.
6. The code vulnerability detection method integrating a node-position-enhanced local graph neural network and a global self-attention mechanism according to claim 1, wherein the local message passing sub-module and the global attention sub-module run in parallel.
7. The code vulnerability detection method integrating a node-position-enhanced local graph neural network and a global self-attention mechanism according to claim 1, wherein the local message passing sub-module adopts a gated graph neural network structure and controls information flow by introducing a gated recurrent unit, specifically performing the following steps: message aggregation is first performed in each layer: $m_v^{(l)} = \sum_{u \in \mathcal{N}(v)} W_{e_{uv}} h_u^{(l-1)}$, where $m_v^{(l)}$ denotes the message aggregated at layer $l$ for node $v$ from all of its neighboring nodes, $\mathcal{N}(v)$ denotes the set of neighboring nodes of $v$, $W_{e_{uv}}$ is a learnable weight matrix associated with the edge type, and $h_u^{(l-1)}$ denotes the representation of node $u$ at layer $l-1$; a node update operation is performed after message aggregation: $h_v^{(l)} = \mathrm{GRU}\left(h_v^{(l-1)}, m_v^{(l)}\right)$, where $h_v^{(l)}$ denotes the representation of node $v$ at layer $l$, and $\mathrm{GRU}$ denotes a gated recurrent unit.
8. The code vulnerability detection method integrating a node-position-enhanced local graph neural network and a global self-attention mechanism, characterized in that the global attention sub-module adopts a self-attention mechanism and establishes long-distance dependencies across nodes by computing the semantic similarity among all nodes in the graph, specifically performing the following steps: attention weight calculation: $\alpha_{ij} = \frac{\exp\left(q_i^{\top} k_j / \sqrt{d}\right)}{\sum_{t=1}^{n} \exp\left(q_i^{\top} k_t / \sqrt{d}\right)}$, with $q_i = W_Q h_i^{(l-1)}$ and $k_j = W_K h_j^{(l-1)}$, where $\alpha_{ij}$ is the attention weight between nodes $i$ and $j$, $q_i$ is the query vector of node $i$, $k_j$ is the key vector of node $j$, $W_Q$ and $W_K$ are learnable weight matrices, $h_i^{(l-1)}$ and $h_j^{(l-1)}$ denote the representations of nodes $i$ and $j$ at layer $l-1$, $d$ is the feature dimension of the nodes in the layer, and $n$ is the number of nodes; all nodes are then weighted and summed according to the attention weights to generate a new node representation: $\hat{h}_i^{(l)} = \sum_{j=1}^{n} \alpha_{ij} v_j$, with $v_j = W_V h_j^{(l-1)}$, where $v_j$ is the value vector and $W_V$ is a learnable matrix.
9. A code vulnerability detection system integrating a node-position-enhanced local graph neural network and a global self-attention mechanism, comprising: a node representation enhancement unit, used for constructing a function-level code representation based on a code property graph and enhancing the node feature representation through a node position-aware mechanism; a model training unit, used for training a code vulnerability detection model on the enhanced node feature representation, wherein the code vulnerability detection model comprises a local-global hybrid graph representation learning module, the local-global hybrid graph representation learning module comprises a local message passing sub-module and a global attention sub-module, the local message passing sub-module is used for capturing local structural information between each node and its neighborhood, and the global attention sub-module is used for capturing global semantic relations among all nodes in the graph; and a vulnerability detection unit, used for performing vulnerability detection on target code based on the trained code vulnerability detection model.
10. An electronic device comprising a processor and a memory, the memory storing a computer program, wherein the computer program when executed by the processor implements the method of any one of claims 1 to 8.
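The Laplacian positional encoding named in claim 2 can be sketched as follows. This is one common formulation, not necessarily the one used in the patent: it takes the eigenvectors of the symmetric normalized graph Laplacian with the smallest eigenvalues as per-node position features and concatenates them with the original node features. The choice of the normalized Laplacian, treating edges as undirected, fusion by concatenation, and the names `laplacian_positional_encoding` and `enhance_node_features` are all assumptions for illustration.

```python
import numpy as np

def laplacian_positional_encoding(adj: np.ndarray, k: int) -> np.ndarray:
    """Return k eigenvectors of the normalized Laplacian as node position features."""
    adj = np.maximum(adj, adj.T).astype(float)   # assumption: ignore edge direction
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5     # isolated nodes keep a zero scale
    # L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)             # eigenvalues come back in ascending order
    # Skip the first eigenvector (trivial for a connected graph) and keep the next k.
    return eigvecs[:, 1:k + 1]

def enhance_node_features(x: np.ndarray, adj: np.ndarray, k: int) -> np.ndarray:
    """Fuse position features with the original node features by concatenation (assumed)."""
    return np.concatenate([x, laplacian_positional_encoding(adj, k)], axis=1)

if __name__ == "__main__":
    # A 4-node path graph 0-1-2-3 with 3 original features per node.
    adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]])
    x = np.ones((4, 3))
    print(enhance_node_features(x, adj, k=2).shape)  # (4, 5)
```

Two caveats apply to any encoding of this kind. Eigenvectors are defined only up to sign, so implementations often flip signs randomly during training, and a graph with fewer than k + 1 nodes yields fewer than k position columns in this sketch.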

Description

Code vulnerability detection method and system integrating node position enhanced local graph neural network and global self-attention mechanism

Technical Field

The invention relates to the technical field of software security analysis, and in particular to a code vulnerability detection method and system integrating a node-position-enhanced local graph neural network with a global self-attention mechanism.

Background

Source code vulnerability detection is a key link in software security analysis. In recent years it has gradually shifted from traditional static analysis and rule matching toward automatic detection based on deep learning. The main approaches in the prior art are graph neural network (GNN) methods and methods that combine graph simplification with reinforcement learning. GNN methods abstract function-level code into a unified graph representation containing structures such as the AST, CFG, and PDG, and learn node features through a message passing mechanism. The combined graph simplification and reinforcement learning methods introduce a graph simplification strategy to reduce redundant information and improve global dependency modeling.
Although these methods perform well on multiple evaluation datasets, the following key issues severely limit their application in complex code environments:
(1) Insufficient long-distance semantic dependency modeling: traditional GNN methods rely mainly on local neighborhood aggregation and, during multi-layer message passing, struggle to capture deep semantic dependencies across basic blocks or control flow paths in the code, which weakens their ability to identify vulnerability types involving complex conditional branches and strong path dependence;
(2) Node feature convergence: as the number of GNN layers increases, node features tend to converge and even become highly similar, which reduces the discriminative ability of the model; in code with many nonlinear control flow structures, this noticeably lowers the model's accuracy in recognizing vulnerability patterns;
(3) Graph structure complexity affects model performance: function-level graphs in real projects often contain a very large number of nodes and edges, so model training is inefficient, the model is easily disturbed by noise, and its ability to model global information effectively is limited;
(4) Lack of a position-sensitive modeling mechanism: existing methods generally ignore the topological position of nodes in the graph and rely only on local adjacency, failing to incorporate the relative position of each node within the overall program structure, which impairs recognition of key vulnerability triggering paths.
Disclosure of Invention

Existing source code vulnerability detection methods based on graph neural networks suffer from insufficient long-distance semantic dependency modeling, node feature convergence after message passing, high graph structure complexity, and a lack of topology position-sensitive modeling. As a result, the prior art struggles to capture key vulnerability triggering conditions when processing complex control flow and data flow paths, leading to a high miss rate, weak model generalization, and low training efficiency. The invention therefore provides a code vulnerability detection method and system integrating a node-position-enhanced local graph neural network with a global self-attention mechanism. Laplacian positional encoding is introduced to enhance the topological position information of nodes in the graph and to improve the model's ability to recognize dependencies across basic blocks in the code. A hybrid architecture that runs local message passing and a global self-attention mechanism in parallel is designed to mine deep semantic associations among all nodes in the graph while preserving local syntactic features, alleviating the node feature convergence problem in multi-layer graph neural networks. Ultimately, efficient and accurate detection of function-level vulnerabilities in complex software systems is achieved; in particular, for vulnerability types with long-path dependence, detection accuracy and recall are significantly improved, providing better technical support for automated security testing and code audit.

In order to achieve the above purpose, the invention adopts the following technical scheme: the invention provides a code vulnerability detection method integrating a node position enh