CN-122020676-A - Contract vulnerability detection method and system based on cross-granularity feature fusion and meta-learning


Abstract

The invention discloses a contract vulnerability detection method and system based on cross-granularity feature fusion and meta-learning, and relates to the technical field of blockchain security. The method comprises: obtaining the source code and compiled bytecode of a contract to be detected; extracting source code semantic features from the source code, extracting opcode execution features from the bytecode, and extracting graph structure features after constructing a program dependency graph from the source code; performing cross-granularity attention interaction to generate a cross-granularity fusion feature; performing type-aware feature modulation on the cross-granularity fusion feature and the vulnerability type identifier of the current detection task, training a classifier with a meta-learning strategy, and outputting a vulnerability detection result; and, in response to the vulnerability detection result indicating that a vulnerability exists, locating the vulnerable code position based on the graph-source code attention weight matrix generated during the cross-granularity attention interaction.

Inventors

Dai Yi
Liu Hangyu
Lv Xianmen
Wu Zongjin
Fan Xiutang
Wang Lingfeng

Assignees

Chengdu University of Information Technology (成都信息工程大学)

Dates

Publication Date: 20260512
Application Date: 20260413

Claims (10)

1. A contract vulnerability detection method based on cross-granularity feature fusion and meta-learning, characterized by comprising the following steps: acquiring the source code and compiled bytecode of a contract to be detected; extracting source code semantic features from the source code, extracting opcode execution features from the bytecode, and extracting graph structure features after constructing a program dependency graph from the source code; taking the graph structure features as the query subject, performing cross-granularity attention interaction with the source code semantic features and with the opcode execution features respectively, and generating a cross-granularity fusion feature; performing type-aware feature modulation on the cross-granularity fusion feature and the vulnerability type identifier of the current detection task, training a classifier with a meta-learning strategy, and outputting a vulnerability detection result; and, in response to the vulnerability detection result indicating that a vulnerability exists, locating the vulnerable code position based on the graph-source code attention weight matrix generated during the cross-granularity attention interaction.
2. The contract vulnerability detection method based on cross-granularity feature fusion and meta-learning according to claim 1, characterized in that the source code semantic features are extracted by: tokenizing the source code; extracting the output hidden states of a pre-trained code language model as a semantic feature matrix; and mapping the semantic feature matrix to a unified dimension through a fully connected layer.
3. The contract vulnerability detection method based on cross-granularity feature fusion and meta-learning according to claim 1, characterized in that the opcode execution features are extracted by: disassembling the bytecode to obtain an opcode sequence; converting the opcode sequence into a vector sequence through an embedding layer, and inputting the vector sequence into a bidirectional long short-term memory (BiLSTM) network to obtain a sequence of contextual hidden states; and processing the sequence of contextual hidden states with a convolutional neural network and a max pooling layer to obtain an opcode execution feature matrix.
4. The contract vulnerability detection method based on cross-granularity feature fusion and meta-learning according to claim 1, characterized in that the graph structure feature extraction step specifically comprises: constructing a program dependency graph from the source code, wherein the nodes of the program dependency graph correspond to code statements or basic blocks, the edges of the program dependency graph correspond to control dependencies or data dependencies, and the initial feature vectors of the program dependency graph are assigned from the source code semantic features; and inputting the program dependency graph into a graph neural network, updating node features through a message passing mechanism, and outputting a graph structure feature matrix.
5. The contract vulnerability detection method based on cross-granularity feature fusion and meta-learning according to claim 1, characterized in that the cross-granularity attention interaction step comprises: taking the graph structure feature matrix as the query, taking the source code semantic feature matrix and the opcode execution feature matrix respectively as the keys and values, and computing scaled dot-product attention to obtain context features, wherein the context features comprise graph-source code context features and graph-opcode context features; using an adaptive gating mechanism to compute, for each context feature respectively, the dynamic fusion ratio between the graph structure features and that context feature, and generating enhanced graph features; and applying global max pooling and global average pooling to the enhanced graph features, concatenating the pooling results, and projecting them through a multi-layer perceptron to obtain a cross-granularity fusion feature vector.
6. The contract vulnerability detection method based on cross-granularity feature fusion and meta-learning according to claim 5, characterized in that the adaptive gating mechanism determines gating coefficients with a Sigmoid function and computes a weighted sum of the graph structure features and the context features according to the gating coefficients.
7. The contract vulnerability detection method based on cross-granularity feature fusion and meta-learning according to claim 1, characterized in that the type-aware feature modulation comprises: mapping the vulnerability type identifier to a type vector through a learnable embedding layer; generating channel attention weights from the type vector through a nonlinear transformation; and multiplying the cross-granularity fusion feature element-wise by the channel attention weights to obtain a modulated, adapted feature, which is input into the classifier to obtain the vulnerability prediction probability.
8. The contract vulnerability detection method based on cross-granularity feature fusion and meta-learning according to claim 1, characterized in that the meta-learning strategy is model-agnostic meta-learning (MAML), and the training process comprises: an inner loop, in which the loss is computed on the support set of each task and gradient descent is applied to the model parameters to obtain temporary parameters; and an outer loop, in which the loss is computed on the query set of each task and the global initialization parameters are updated by backpropagation.
9. The contract vulnerability detection method based on cross-granularity feature fusion and meta-learning according to claim 1, characterized in that locating the vulnerable code position comprises: acquiring the graph-source code attention weight matrix generated in the cross-granularity attention interaction stage, wherein each element of the graph-source code attention weight matrix represents the degree of attention a graph node pays to a token in the source code; acquiring the activation intensity of each graph node in the graph structure feature matrix; computing a weighted sum of the attention weight matrix along each column, the weights being the activation intensities of the corresponding graph nodes, to obtain the risk contribution of each token; and mapping the sequence of risk contributions back to source code line numbers, selecting the key vulnerability path with a dynamic threshold, and outputting the vulnerable code position.
10. A contract vulnerability detection system based on cross-granularity feature fusion and meta-learning, characterized by being configured to perform the method of any one of claims 1-9 and comprising: a data acquisition module, configured to acquire the source code and compiled bytecode of a contract to be detected; a feature extraction module, configured to extract source code semantic features from the source code, extract opcode execution features from the bytecode, and extract graph structure features after constructing a program dependency graph from the source code; a cross-granularity fusion module, configured to take the graph structure features as the query subject, perform cross-granularity attention interaction with the source code semantic features and with the opcode execution features respectively, and generate a cross-granularity fusion feature; a vulnerability type awareness module, configured to perform type-aware feature modulation on the cross-granularity fusion feature and the vulnerability type identifier of the current detection task, train a classifier with a meta-learning strategy, and output a vulnerability detection result; and a vulnerability locating module, configured to locate the vulnerable code position based on the graph-source code attention weight matrix generated during the cross-granularity attention interaction.
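
The attention, gating, pooling and localization steps recited in claims 5, 6 and 9 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names, the gate weight matrices `w_src` and `w_op`, the summation of the two gated outputs, the use of the L2 norm as "activation intensity", and the mean-plus-one-standard-deviation threshold are all assumptions made for illustration. The multi-layer perceptron projection of claim 5 is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_attention(graph, other):
    """Scaled dot-product attention with graph nodes as queries.

    graph: (n_nodes, d); other: (n_items, d), used as both keys and values.
    Returns the context features (n_nodes, d) and the weights (n_nodes, n_items).
    """
    d = graph.shape[1]
    weights = softmax(graph @ other.T / np.sqrt(d), axis=-1)
    return weights @ other, weights

def gated_fuse(graph, context, w_gate):
    """Sigmoid gate g in (0, 1); returns g * graph + (1 - g) * context.

    w_gate has shape (2 * d, d) and is a hypothetical learned parameter.
    """
    g = sigmoid(np.concatenate([graph, context], axis=1) @ w_gate)
    return g * graph + (1.0 - g) * context

def fuse(graph, src, op, w_src, w_op):
    """Return the pooled fusion vector (2 * d,) and the graph-source attention."""
    ctx_src, attn_src = cross_attention(graph, src)
    ctx_op, _ = cross_attention(graph, op)
    # Assumption: the two gated outputs are combined by summation.
    enhanced = gated_fuse(graph, ctx_src, w_src) + gated_fuse(graph, ctx_op, w_op)
    # Global max pooling and global average pooling over nodes, concatenated.
    pooled = np.concatenate([enhanced.max(axis=0), enhanced.mean(axis=0)])
    return pooled, attn_src

def token_risk(attn_src, graph):
    """Column-wise sum of the attention matrix, weighted by node activation."""
    activation = np.linalg.norm(graph, axis=1)  # assumption: L2 norm per node
    return activation @ attn_src                # shape (n_tokens,)

def risky_lines(risk, token_lines):
    """Map token risks to source line numbers above a data-dependent threshold."""
    threshold = risk.mean() + risk.std()        # assumption: one possible dynamic threshold
    return sorted({token_lines[i] for i in np.flatnonzero(risk > threshold)})
```

Here `token_lines[i]` is assumed to hold the source line number of token `i`, as would be recorded during tokenization. Because each row of the attention matrix sums to 1, the risk scores are non-negative and scale with the activation of the nodes that attend to each token.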

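The inner loop and outer loop of claim 8 can be sketched in the same way. The example below is a simplification under stated assumptions: it uses a plain logistic classifier in place of the full model, and it applies the first-order approximation of model-agnostic meta-learning, which ignores the second-order term that full backpropagation through the inner update would include. The function names and learning rates are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(w, x, y):
    """Binary cross-entropy of a logistic classifier and its gradient w.r.t. w."""
    p = sigmoid(x @ w)
    eps = 1e-12  # avoids log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = x.T @ (p - y) / len(y)
    return loss, grad

def maml_step(w, tasks, inner_lr=0.1, outer_lr=0.05):
    """One meta-update of the shared initialization w.

    Each task is a tuple (x_support, y_support, x_query, y_query).
    """
    meta_grad = np.zeros_like(w)
    for xs, ys, xq, yq in tasks:
        # Inner loop: one gradient step on the support set gives temporary parameters.
        _, g_support = loss_and_grad(w, xs, ys)
        w_task = w - inner_lr * g_support
        # Outer loop: the query-set loss is evaluated at the temporary parameters.
        _, g_query = loss_and_grad(w_task, xq, yq)
        # First-order approximation: treat d(w_task)/d(w) as the identity.
        meta_grad += g_query
    return w - outer_lr * meta_grad / len(tasks)
```

Calling `maml_step` repeatedly over batches of tasks, where each task would correspond to one vulnerability type with a small support set and query set, moves `w` toward an initialization that adapts well after a single inner step.
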
Description

Contract vulnerability detection method and system based on cross-granularity feature fusion and meta-learning

Technical Field

The application relates to the technical field of blockchain security, in particular to a contract vulnerability detection method and system based on cross-granularity feature fusion and meta-learning.

Background

Early smart contract vulnerability detection relied mainly on program analysis techniques from traditional software engineering, chiefly formal verification, symbolic execution and static analysis. Formal verification attempts to guarantee the correctness of contract logic through mathematical proof, but struggles with modern contracts that are large in scale and complex in logic. Symbolic execution traverses program paths and is therefore prone to path explosion, which makes detection inefficient. Static analysis tools such as Slither and Mythril are widely used in industry, but they depend heavily on manually predefined rule bases of vulnerability features. Because hand-written rules can hardly cover the endless stream of novel attack patterns and cannot capture the complex semantics of code in depth, traditional methods often suffer from very high false positive and false negative rates in practice, and they fall particularly short when handling logic vulnerabilities that involve complex state interactions.

In recent years, breakthroughs in deep learning have opened a new path for smart contract vulnerability detection. Unlike traditional methods, deep learning models can automatically extract high-dimensional features from large volumes of source code or bytecode in a data-driven manner and thereby identify potential vulnerability patterns. Early studies were based primarily on sequence modeling ideas from natural language processing and treated smart contract code as a text sequence.
Researchers used Word2Vec or FastText to map opcodes or source code identifiers to word embedding vectors, and combined them with long short-term memory networks (LSTM) or gated recurrent units (GRUs) to capture contextual dependencies in code sequences. To address the vanishing gradients and poor parallelism of recurrent neural networks on long sequences, subsequent research introduced convolutional neural networks (CNN) and the Transformer architecture. In particular, methods based on pre-trained language models (such as CodeBERT) capture rich syntactic and semantic knowledge of code through self-supervised learning on large-scale code corpora, and significantly improve the precision of vulnerability detection.

However, deep learning approaches based solely on sequence models have significant limitations. Smart contracts are not simple linear text; they contain complex control flow jumps and data dependencies. Flattening code into a one-dimensional sequence destroys the structural information inherent in the program, making it difficult for the model to detect complex vulnerabilities that span function calls or depend on a particular execution path.

To overcome this drawback, approaches based on graph neural networks (GNN) have emerged. Such methods first convert smart contracts into control flow graphs (CFGs) or program dependency graphs (PDGs), explicitly modeling the execution logic and data transfer relationships between code statements, and then use graph convolutional networks for message passing and feature aggregation over the graph structure.
Although graph neural networks can effectively capture structural features, they tend to ignore the fine-grained semantic information inside nodes and the execution details of the underlying opcodes, and the construction of high-quality graph structures is highly susceptible to code syntax errors and to the limitations of parsing tools.

A further challenge is that existing deep learning methods mostly rely on large-scale, well-distributed annotated datasets. In real blockchain security scenarios, however, vulnerability samples follow a pronounced long-tail distribution: samples of common, simple vulnerabilities are plentiful, while samples of high-risk, complex and novel zero-day vulnerabilities are extremely rare. In such few-shot scenarios, traditional supervised learning models overfit very easily, which severely limits their generalization ability. In addition, existing multimodal research mostly adopts simple feature concatenation, which does not resolve the semantic gap between source code semantics, low-level execution logic and graph structure features, and it lacks an effective interpretability mechanism, so security auditors find the detection results hard to trust.

Disclosure of Invention

Aiming at the technical problems of single feature representation, difficult c