CN-121997336-A - Vulnerability detection method based on multi-view hypergraph network learning


Abstract

The invention discloses a vulnerability detection method based on multi-view hypergraph network learning, belonging to the technical field of vulnerability detection and code analysis. The method comprises the following steps: S1, extracting multi-view features from a bytecode file of the target software and generating network embeddings with the node2vec algorithm as initial features; S2, for each view, constructing a hypergraph using a k-nearest-neighbor rule based on the node embeddings and optional adjacency information; S3, concatenating the initial features with node attribute features, extracting high-dimensional node features with a hypergraph neural network architecture, adaptively weighting and fusing the high-dimensional features obtained from the three hypergraphs through a gating network, and performing probability calibration and evaluation on the fused output; and S4, using the highly aggregated multi-view code association information obtained through S1-S3 as guidance information for vulnerability detection. The method addresses the low detection precision and weak generalization capability of prior-art vulnerability detection.

Inventors

HE PENG
ZHAO CAIXIA
ZHENG QIAOXIAN
HUANG TAI
YUAN YIN

Assignees

Hubei University (湖北大学)

Dates

Publication Date: 20260508
Application Date: 20260122

Claims (9)

1. A vulnerability detection method based on multi-view hypergraph network learning, characterized by comprising the following steps: S1, parsing a bytecode file of target software, constructing an inter-class dependency graph, a control flow graph and a data flow graph from the three views of class dependency, control flow and data flow, respectively, performing embedding learning on the nodes of each of the three graphs with the node2vec algorithm, and generating 128-dimensional network embeddings as initial features; S2, for each view, constructing a hypergraph using a k-nearest-neighbor rule based on the node embeddings and optional adjacency information, wherein a hyperedge is formed with each node as its center and the members of the hyperedge are determined by similarity ranking and a capacity constraint; S3, on each of the three views, concatenating the 128-dimensional initial features with 3-dimensional node attribute features, extracting high-dimensional node features with identically configured hypergraph neural network architectures, adaptively weighting and fusing the high-dimensional features obtained from the three hypergraphs through a gating network, and performing probability calibration and evaluation on the fused output; S4, using the highly aggregated multi-view code association information obtained through S1-S3 as guidance information for vulnerability detection.
2. The vulnerability detection method based on multi-view hypergraph network learning of claim 1, wherein S1 comprises the following steps: S11, taking the relevant jar file downloaded from the official website as the target, and extracting the three class-level view relations with open-source static analysis tools; S12, for each view, generating a 128-dimensional vector representation for every node with a random walk + skip-gram graph embedding method, wherein the input of the embedding training is the edge list of the view and the output is a vector table mapping node → R^128.
3. The vulnerability detection method based on multi-view hypergraph network learning of claim 2, wherein S11 comprises the following steps: S111, extracting class dependencies directly from the obtained jar file with the open-source tool jdeps to obtain the edge data of the class dependency graph; S112, decompressing the jar file and then extracting control flow and data flow with the open-source tool WALA to obtain the edge data of the control flow graph and the data flow graph; S113, performing name adaptation and cleaning on the edge data obtained in S111 and S112, and making the node naming consistent across the different views; S114, computing three node attributes from the cleaned edge data, namely the in-degree deg_in, the out-degree deg_out and the total degree deg.
4. The vulnerability detection method based on multi-view hypergraph network learning of claim 3, wherein constructing the hypergraph in S2 comprises the following steps: S21, reading the node order and the vectors from the embedding csv file obtained in S12, applying L2 normalization, and computing cosine similarity as a dot product; S22, for each center node, traversing the other nodes in descending order of similarity, skipping self-loops and nodes below the threshold, and collecting the first k neighbors; S23, taking the one-hop neighbors as seeds, taking their neighbors in descending order of similarity, removing self-loops and the center node, filtering by the threshold, adding the remaining nodes to the candidate set, and limiting the size of the set to an upper bound of cap 2; S24, scoring each candidate member by its similarity to the center node, wherein if the member is connected to the center node in the adjacency set, the score is multiplied by the boosting weight edge_type_weight; S25, to prevent instability caused by identical scores, randomly shuffling the candidate members and then sorting them by descending score; S26, truncating to the top cap members that meet the threshold, and generating a hyperedge for the center node if at least 3 members are selected, otherwise skipping the center node; S27, performing a grid search over the parameters k, h and cap used in hypergraph construction according to S21-S26 to find the most suitable parameter values.
5. The vulnerability detection method based on multi-view hypergraph network learning of claim 4, wherein the search ranges of the parameters k, h and cap are k ∈ {5, 10, 15, 20}, h ∈ {1, 2} and cap ∈ {12, 20, 30}.
6. The vulnerability detection method based on multi-view hypergraph network learning of claim 5, wherein S3 comprises the following steps: S31, single-view feature extraction; S32, gated fusion with probability calibration and evaluation.
7. The vulnerability detection method based on multi-view hypergraph network learning of claim 6, wherein S31 comprises the following steps: S311, concatenating column-wise the 3-dimensional attribute features generated in S11 and the initial features generated in S12 to form a 131-dimensional initial feature vector for each node, which serves as the input of the hypergraph neural network, and constructing from these vectors a matrix X and a neighbor feature list H over the valid node set of the view, wherein the hypergraph adjacency comes from the hyperedge member sets of S2 and provides each center node with a list of neighbor features to aggregate; S312, constructing a model architecture to process the input data, comprising:
input projection: mapping the 131-dimensional features to the hidden layer through Linear+ReLU to obtain a center hidden-layer vector $h$ and a set of neighbor hidden-layer vectors $\{n_i\}$, thereby converting the 131 dimensions into 64-dimensional representations;
multi-head attention aggregation: applying a query linear mapping to the center vector and a key linear mapping to the neighbor vectors, computing attention weights, and taking the weighted sum, according to:
$\alpha_i = \operatorname{softmax}_i\left(Q(h)^{\top} K(n_i)\right)$;
$z = \frac{1}{M} \sum_{m=1}^{M} \sum_{i} \alpha_i^{(m)} n_i$;
where $h$ is the center hidden-layer vector; $i$ is the index of a neighbor node and runs over all neighbors of the center node; $n_i$ is the hidden-layer feature vector of the $i$-th neighbor node; $Q(h)$ is the result of applying the query linear mapping to the center hidden-layer vector; $K(n_i)$ is the result of applying the key linear mapping to the neighbor hidden-layer vector; $\operatorname{softmax}$ is the normalized exponential function, which normalizes the raw association scores so that the weights sum to 1; $\alpha_i$ is the attention weight, which represents the importance of the association between the center hidden-layer vector $h$ and the hidden-layer vector $n_i$ of the $i$-th neighbor node, with $\alpha_i^{(m)}$ denoting its value in the $m$-th head; $M$ is the number of attention heads, a hyperparameter giving the number of times the attention mechanism is executed in parallel; and $z$ is the aggregated feature vector, obtained by weighting and summing the neighbor nodes according to the attention weights and averaging the 64-dimensional features over the heads;
residual connection: after the multi-head result is averaged, adding it to the center vector, according to:
$h' = h + z$;
where $h'$ is the updated hidden-layer feature vector of the center node, namely the residual sum of the original center feature and the aggregated neighbor feature;
classification head and regularization: after Dropout, a Linear layer outputs the logits, and the hidden-layer representation is returned as the high-dimensional feature of the view;
S313, passing each of the three views through the model architecture constructed in S312, and outputting the high-dimensional features h_cdg, h_cfg and h_dfg of the respective views.
8. The vulnerability detection method based on multi-view hypergraph network learning of claim 7, wherein S32 comprises the following steps: S321, reading the test features of the three views and aligning them on the intersection of common nodes with the label file, so that f_cdg, f_cfg and f_dfg are fused in the same order; S322, performing gating-network fusion: concatenating column-wise the 64-dimensional features of the three views extracted in S312 into 192 dimensions, passing them through Linear+ReLU+Dropout followed by a Linear layer that outputs three weights, applying softmax to obtain w_cdg, w_cfg and w_dfg, and performing weighted fusion as fused = w_cdg · f_cdg + w_cfg · f_cfg + w_dfg · f_dfg, wherein fused is the multi-view gated weighted fusion feature vector, namely the final aggregated feature obtained by the adaptively weighted sum of the CDG, CFG and DFG view features, and finally outputting the fusion logits and probability through a Linear layer; S323, splitting the data into a training set and a test set at a ratio of 7:3 by stratified sampling, and setting a fixed random seed for iterative training of the model; S324, probability calibration: training a LogisticRegression calibrator on the fusion scores of the training set, and performing post-hoc calibration on the fusion scores of the test set.
9. The vulnerability detection method based on multi-view hypergraph network learning of claim 8, wherein S4 comprises the following steps: S41, obtaining a csv file with vulnerability labels from the official website; S42, performing three-view feature extraction and network embedding according to S1; S43, performing hypergraph construction according to S2; S44, performing feature concatenation, high-dimensional feature aggregation and fusion according to S3; S45, using the trained model to predict whether a new instance contains a vulnerability.
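
As an illustration of steps S21, S22 and S24-S26 of claim 4, the following Python sketch builds at most one hyperedge per center node from a small embedding table. It is a minimal sketch under stated assumptions, not the patented implementation: the function names, the default values of `threshold` and `edge_type_weight`, and the toy embeddings are hypothetical, and the seed-based candidate expansion of S23 is omitted.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_hyperedges(embeddings, adjacency=None, k=5, cap=12, threshold=0.0,
                     edge_type_weight=1.2, min_members=3, seed=0):
    """Return {center: [members]}; centers with too few members are skipped.

    threshold, edge_type_weight and seed are illustrative defaults (assumed).
    """
    rng = random.Random(seed)
    adjacency = adjacency or {}
    unit = {node: l2_normalize(vec) for node, vec in embeddings.items()}  # S21
    nodes = list(unit)
    hyperedges = {}
    for center in nodes:
        # S21: cosine similarity is the dot product of unit vectors.
        sims = {other: sum(a * b for a, b in zip(unit[center], unit[other]))
                for other in nodes if other != center}  # self-loop skipped
        # S22: descending similarity, drop below-threshold nodes, keep first k.
        ranked = sorted(sims, key=sims.get, reverse=True)
        candidates = [o for o in ranked if sims[o] >= threshold][:k]
        # S24: boost candidates that are adjacent to the center in the view.
        linked = adjacency.get(center, ())
        scored = [(sims[o] * (edge_type_weight if o in linked else 1.0), o)
                  for o in candidates]
        # S25: shuffle first so ties are broken randomly, then stable sort.
        rng.shuffle(scored)
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # S26: keep the top `cap` members that still meet the threshold.
        members = [o for score, o in scored[:cap] if score >= threshold]
        if len(members) >= min_members:
            hyperedges[center] = members
    return hyperedges


# Toy 2-dimensional embeddings (the method itself uses 128 dimensions).
toy = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.8, 0.2],
       "D": [0.7, 0.3], "E": [-1.0, 0.0]}
edges = build_hyperedges(toy, adjacency={"A": {"D"}}, k=3, cap=3, threshold=0.5)
print(edges.get("A"), "E" in edges)
```

In this toy run, node D is adjacent to A, so its boosted score moves it ahead of the more similar nodes B and C, while node E has no neighbor above the threshold and therefore produces no hyperedge. A grid search as in S27 would simply call `build_hyperedges` for each parameter combination and compare downstream results.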

Description

Vulnerability detection method based on multi-view hypergraph network learning

Technical Field

The invention relates to the technical field of vulnerability detection and code analysis, in particular to a vulnerability detection method based on multi-view hypergraph network learning.

Background

As the functional complexity and code size of software systems grow exponentially, vulnerabilities hidden in the code significantly increase the risk of security incidents such as system failures, data leakage and service interruption. Such incidents not only seriously damage the reliability, stability and user experience of software products, but also cause direct economic losses and security hazards for enterprises and end users. Vulnerability detection, as a core means of identifying high-risk code modules early in the software development life cycle and averting security risks in advance, has therefore become a key link in guaranteeing software security. However, existing vulnerability detection methods still have unresolved technical defects in practical engineering applications, so their detection precision and generalization capability can hardly meet the security requirements of complex software systems. The defects lie in three respects: single-view feature representations have inherent limitations; the capability to model high-order associations is lacking; and multi-view feature fusion strategies are inefficient and lack adaptability.

Disclosure of Invention

The invention aims to provide a vulnerability detection method based on multi-view hypergraph network learning, which solves the problems of low vulnerability detection precision and weak generalization capability in the prior art.
To achieve the above purpose, the present invention provides a vulnerability detection method based on multi-view hypergraph network learning, comprising the following steps: S1, parsing a bytecode file of target software, constructing an inter-class dependency graph, a control flow graph and a data flow graph from the three views of class dependency, control flow and data flow, respectively, performing embedding learning on the nodes of each of the three graphs with the node2vec algorithm, and generating 128-dimensional network embeddings as initial features; S2, for each view, constructing a hypergraph using a k-nearest-neighbor rule based on the node embeddings and optional adjacency information, wherein a hyperedge is formed with each node as its center and the members of the hyperedge are determined by similarity ranking and a capacity constraint; S3, on each of the three views, concatenating the 128-dimensional initial features with 3-dimensional node attribute features, extracting high-dimensional node features with identically configured hypergraph neural network architectures, adaptively weighting and fusing the high-dimensional features obtained from the three hypergraphs through a gating network, and performing probability calibration and evaluation on the fused output; S4, using the highly aggregated multi-view code association information obtained through S1-S3 as guidance information for vulnerability detection.
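
The adaptive weighted fusion named in S3 can be illustrated with a small gating network that turns the concatenated view features into three softmax weights and returns their weighted sum. This is a hedged sketch rather than the method's actual model: the helper names, the hidden width of 32 and the random, untrained parameters are assumptions made for illustration, and Dropout is left out because it is inactive at inference time.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weight, bias):
    """Dense layer: weight is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def init_linear(n_in, n_out, rng):
    """Random, untrained parameters (for illustration only)."""
    bound = 1.0 / math.sqrt(n_in)
    weight = [[rng.uniform(-bound, bound) for _ in range(n_in)]
              for _ in range(n_out)]
    return weight, [0.0] * n_out


def gated_fusion(f_cdg, f_cfg, f_dfg, params):
    """Return (fused vector, [w_cdg, w_cfg, w_dfg]) for one node."""
    views = [f_cdg, f_cfg, f_dfg]
    concat = f_cdg + f_cfg + f_dfg                      # 3 x 64 = 192 dims
    hidden = [max(0.0, v) for v in linear(concat, *params["hidden"])]  # ReLU
    weights = softmax(linear(hidden, *params["gate"]))  # three view weights
    fused = [sum(w * view[i] for w, view in zip(weights, views))
             for i in range(len(f_cdg))]
    return fused, weights


rng = random.Random(0)
dim = 64  # per-view hidden size stated in the method
params = {"hidden": init_linear(3 * dim, 32, rng),  # width 32 is assumed
          "gate": init_linear(32, 3, rng)}
f_cdg = [rng.gauss(0.0, 1.0) for _ in range(dim)]
f_cfg = [rng.gauss(0.0, 1.0) for _ in range(dim)]
f_dfg = [rng.gauss(0.0, 1.0) for _ in range(dim)]
fused, weights = gated_fusion(f_cdg, f_cfg, f_dfg, params)
print(len(fused), round(sum(weights), 6))
```

Because the weights come from a softmax, they are positive and sum to 1, so the fused vector is a convex combination of the three view features and keeps their 64-dimensional shape. In a trained model the gate parameters would be learned jointly with the final classification layer.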
Preferably, S1 comprises the following steps: S11, taking the relevant jar file downloaded from the official website as the target, and extracting the three class-level view relations with open-source static analysis tools; S12, for each view, generating a 128-dimensional vector representation for every node with a random walk + skip-gram graph embedding method, wherein the input of the embedding training is the edge list of the view and the output is a vector table mapping node → R^128.

Preferably, S11 comprises the following steps: S111, extracting class dependencies directly from the obtained jar file with the open-source tool jdeps to obtain the edge data of the class dependency graph; S112, decompressing the jar file and then extracting control flow and data flow with the open-source tool WALA to obtain the edge data of the control flow graph and the data flow graph; S113, performing name adaptation and cleaning on the edge data obtained in S111 and S112, and making the node naming consistent across the different views; S114, computing three node attributes from the cleaned edge data, namely the in-degree deg_in, the out-degree deg_out and the total degree deg.

Preferably, constructing the hypergraph in S2 comprises the following steps: S21, reading the node order and the vectors from the embedding csv file obtained in S12, applying L2 normalization, and computing cosine similarity as a dot product; S22, traversing each central node according to the descending order of the similarity, skipping the se