CN-120257284-B - Smart contract vulnerability detection method and system based on mask consistency and dynamic margin adjustment


Abstract

The invention belongs to the field of blockchain and discloses a smart contract vulnerability detection method and system based on mask consistency and dynamic margin adjustment. The method comprises: selecting the most suitable source contract for a target contract by using maximum mean discrepancy (MMD) to obtain a matched source-target contract pair for semi-supervised domain adaptation (SSDA); extracting contract features from the source contract and the target contract and preprocessing the contract code; performing mask learning with a mask consistency (MC) framework and performing domain adaptation in combination with SSDA; predicting with a student network whose multiple neural network layers progressively learn features of the input data; screening target contract samples with higher confidence by a dynamic margin adjustment (DMA) strategy; optimizing the objective and training the model by minimizing the MC loss and the SSDA loss; and evaluating performance with standard evaluation metrics and adjusting and optimizing the training process. By combining mask consistency with dynamic margin adjustment, the method and system improve the accuracy, generalization capability and adaptability of smart contract vulnerability detection.
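The MMD-based source selection named in the abstract can be illustrated with a minimal sketch. It assumes that each contract is already represented as a small set of numeric feature vectors and that a Gaussian kernel is used; the kernel choice, the bandwidth `sigma`, and the helper names `mmd_squared` and `select_source` are illustrative assumptions, not details taken from the patent.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel; the kernel and bandwidth are assumptions."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(source: Sequence[Vector], target: Sequence[Vector],
                sigma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two sample sets."""
    def mean_kernel(a: Sequence[Vector], b: Sequence[Vector]) -> float:
        total = sum(gaussian_kernel(x, y, sigma) for x in a for y in b)
        return total / (len(a) * len(b))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


def select_source(candidates: Dict[str, Sequence[Vector]],
                  target: Sequence[Vector], sigma: float = 1.0) -> str:
    """Return the name of the candidate source with the smallest MMD to the target."""
    return min(candidates,
               key=lambda name: mmd_squared(candidates[name], target, sigma))
```

Calling `select_source` with a dictionary of candidate source feature sets and one target feature set returns the key of the candidate whose feature distribution lies closest to the target under this estimate.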

Inventors

JIANG SIYU
LI SHILONG
SU SHEN
XUAN SHICHANG

Assignees

Guangdong University of Foreign Studies (广东外语外贸大学)

Dates

Publication Date: 20260508
Application Date: 20250307

Claims (7)

1. A smart contract vulnerability detection method based on mask consistency and dynamic margin adjustment, characterized by comprising the following steps:
selecting the most suitable source contract for a target contract by using a maximum mean discrepancy (MMD) algorithm to obtain a cross-domain matched pair of source contract and target contract;
preprocessing the code of the source contract and the target contract, generating an abstract syntax tree (AST) and a smart contract graph (SG), and extracting node and edge features;
performing mask learning based on a mask consistency (MC) framework, and realizing cross-domain adaptation in combination with semi-supervised domain adaptation (SSDA);
predicting the target contract through a student network, and progressively extracting input features through a plurality of neural network layers;
screening target-domain samples by using a dynamic margin adjustment (DMA) strategy, and generating high-confidence pseudo-labels;
minimizing the mask consistency (MC) loss and the SSDA loss to optimize the parameters of the target network;
evaluating the model with standard evaluation metrics and adjusting the training process.

In the step of selecting a source contract for a target contract by using the MMD algorithm, the mean distribution discrepancy between the source domain and the target domain is calculated from the sample distributions, and cross-domain matching is realized by mapping the samples with a function in a reproducing kernel Hilbert space (RKHS).

The mask consistency framework masks features of target-domain samples through a random masking mechanism, generates pseudo-labels using the unmasked local features, and guides the optimization of the student network with a teacher network obtained by exponential moving average (EMA).

The MC framework performs mask learning and performs domain adaptation in combination with SSDA, specifically comprising:

(1) SSDA is used for domain adaptation between the source domain and the target domain. The SSDA task is divided into two parts: semi-supervised learning on the target domain, and cross-domain unsupervised learning between labeled source-domain data and unlabeled target-domain data. The objective contains a classification loss on the source-domain data, which measures, for each source-domain pair of input sample and label, the difference between the model's prediction and the true label, and which is minimized by optimizing G. It also contains an empirical risk shared by the source domain and the target domain, which involves the loss of one discriminator D and one classifier G, is expressed with the binary cross-entropy loss, and is defined on the feature representation and the classifier prediction; it is optimized through the adversarial game between classifier G and discriminator D. Part of the labeled data of the target domain is added for semi-supervised learning: 10% of the target-domain data is used for semi-supervised learning, and a labeled-sample loss is defined on the target domain. The conditional adversarial domain adaptation loss enters the objective with a weight coefficient.

(2) MC preserves local information through a random masking mechanism. A patch mask $M$ is sampled randomly from a uniform distribution:
$M_{mb+1:(m+1)b,\; nb+1:(n+1)b} = [v > r], \quad v \sim U(0, 1)$
where $[\cdot]$ is the Iverson bracket for the Boolean test, b is the block size, r is the mask ratio, v is a random variable sampled from U(0, 1), and m, n are the block indices. The masked target feature is obtained by element-wise multiplication of the mask with the feature:
$x^{M} = M \odot x^{T}$
where $x^{T}$ is the feature generated for an unlabeled target-domain sample after contract code preprocessing. The student network predicts the whole sample from the unmasked code features. A teacher network is generated by introducing an exponential moving average (EMA) to guide the learning of the student network; the teacher weights $\phi$ are the EMA of the student weights $\theta$, adjusted by a smoothing factor $\alpha$:
$\phi_{t+1} = \alpha \phi_{t} + (1 - \alpha)\, \theta_{t}$
where t is the training step. Through mask learning, the student network learns to infer missing label information without complete labels; together with domain adaptation and pseudo-label screening, this improves vulnerability detection accuracy in the target domain.

The DMA strategy specifically comprises:

(1) DMA is used for pseudo-label screening, helping the model select target contract samples with higher confidence. First, the pseudo-margin (PM) is introduced. It measures the confidence of a pseudo-label as the margin between the pseudo-label and the other labels in the predicted distribution:
$PM(w) = z_{w} - \max_{c \neq w} z_{c}$
where $z_{w}$ is the logit of pseudo-label w and $\max_{c \neq w} z_{c}$ is the largest logit among the other labels.

(2) The teacher network is tracked by computing the average pseudo-margin (APM), which averages all margins corresponding to w from the initial iteration to iteration t and therefore captures how the prediction with respect to pseudo-label w has evolved from the start of training to iteration t:
$APM_{t}(w) = \frac{1}{t} \sum_{j=1}^{t} PM_{j}(w)$
If confidence in the current pseudo-label is high, PM is positive; if the pseudo-label prediction is inconsistent, PM becomes negative. Low PM values over several consecutive iterations indicate that the sample is mislabeled and may harm training. A low APM indicates that the pseudo-label prediction for the sample is unstable and that the sample is anomalous, so samples with unreliable pseudo-labels are screened out dynamically during training. First, a subset is extracted from the unlabeled sample set; next, the samples in the subset are assigned a new class label C+1 and removed from the unlabeled sample set; finally, the subset is used to model the behavior of erroneous samples, and the APM is computed for these samples.

(3) Strong augmentation is used to compute and aggregate the loss associated with the erroneous samples. The loss is calculated over a batch consisting of B erroneous samples and involves the strong augmentation of each erroneous sample and the class distribution produced by the teacher network on the input feature. At iteration t, the APM threshold is chosen from the APMs of the erroneous samples as the APM of the erroneous sample at the 95th percentile. The mask learning loss on unlabeled samples is then defined in terms of v, the ratio of unlabeled to labeled samples within each batch, the pseudo-label and the quality weight, where the pseudo-label is the prediction made by the teacher network for the complete target contract feature.
2. The detection method of claim 1, wherein the contract code preprocessing comprises: converting the smart contract code into an abstract syntax tree (AST) and generating a smart contract graph (SG); learning node features using a graph convolutional network (GCN); and modeling and learning edge features using a multi-head attention mechanism.
3. The method of claim 1, wherein the dynamic margin adjustment (DMA) strategy identifies target-domain samples whose pseudo-labels have low confidence by calculating the pseudo-margin (PM) and the average pseudo-margin (APM), removes them from the unlabeled sample set, and then uses them for modeling and further loss calculation.
4. The method of claim 1, wherein the neural network layers of the student network comprise a convolution layer, a pooling layer, a Dropout layer, and a fully connected layer for progressively extracting multi-level features of the contract code.
5. The detection method of claim 1, wherein in the step of minimizing the MC loss and the SSDA loss, the MC loss is used to optimize the mask learning process, the SSDA loss adapts the model to the target domain through cross-domain knowledge transfer between the source domain and the target domain, and the two losses jointly optimize the model parameters through back propagation.
6. The method of claim 1, wherein the evaluation metrics include accuracy, precision, recall, and F1 score, and wherein true positives, true negatives, false positives, and false negatives are calculated from classification results of the smart contracts, and model performance is evaluated based on these results.
7. A smart contract vulnerability detection system based on mask consistency and dynamic margin adjustment, for carrying out the detection method as claimed in any one of claims 1-6, characterized in that the system comprises: a data preparation module for selecting the most suitable source contract to obtain a matched source-target contract pair for SSDA; a preprocessing module for preprocessing the contract code; a context learning and domain adaptation module for transferring knowledge through context learning and reducing the distribution discrepancy between the source domain and the target domain; a mask learning and pseudo-label screening module for further optimizing the learning process of the student network through mask learning and pseudo-label screening; a classifier training and optimization module that classifies based on the output of the student network and optimizes model performance; an MCDMA optimization and training module for performing multi-stage optimization to further improve the performance of the MCDMA method; and a performance evaluation module for evaluating model performance and verifying the vulnerability detection effect of the model on the target domain.
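The pseudo-margin screening recited in claims 1 and 3 can be sketched as follows. This is an illustrative sketch only: the class `APMTracker`, the functions `pseudo_margin`, `percentile` and `select_reliable`, the nearest-rank percentile rule, and the strict "greater than threshold" comparison are assumptions introduced here, since the claims state only that the threshold is the APM of the erroneous sample at the 95th percentile.

```python
import math
from typing import Dict, Hashable, List, Sequence


def pseudo_margin(logits: Sequence[float], w: int) -> float:
    """PM: logit of pseudo-label w minus the largest logit among the other labels."""
    others = [z for c, z in enumerate(logits) if c != w]
    return logits[w] - max(others)


class APMTracker:
    """Keeps, per sample and per class, the running mean of pseudo-margins."""

    def __init__(self) -> None:
        self._sums: Dict[Hashable, List[float]] = {}
        self._counts: Dict[Hashable, int] = {}

    def update(self, sample_id: Hashable, logits: Sequence[float]) -> None:
        """Record one iteration of teacher logits for a sample."""
        sums = self._sums.setdefault(sample_id, [0.0] * len(logits))
        for w in range(len(logits)):
            sums[w] += pseudo_margin(logits, w)
        self._counts[sample_id] = self._counts.get(sample_id, 0) + 1

    def apm(self, sample_id: Hashable, w: int) -> float:
        """APM: mean of all margins for class w from the first iteration until now."""
        return self._sums[sample_id][w] / self._counts[sample_id]


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for 0 < q <= 100 (an assumed convention)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def select_reliable(sample_apm: Dict[Hashable, float],
                    error_apms: Sequence[float],
                    q: float = 95.0) -> List[Hashable]:
    """Keep samples whose APM exceeds the q-th percentile APM of the erroneous samples."""
    threshold = percentile(error_apms, q)
    return [sid for sid, value in sample_apm.items() if value > threshold]
```

In use, `APMTracker.update` would be called once per training iteration with the teacher network's logits for each unlabeled sample, and `select_reliable` would then keep only the samples whose average margin clears the threshold derived from the samples assigned to the extra class C+1.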

Description

Smart contract vulnerability detection method and system based on mask consistency and dynamic margin adjustment

Technical Field

The invention belongs to, but is not limited to, the technical field of blockchain, and in particular relates to a smart contract vulnerability detection method and system based on mask consistency and dynamic margin adjustment.

Background

With the rapid development of blockchain technology, smart contracts are widely used as important components of blockchain applications. However, as smart contracts spread, contract vulnerabilities have become an important threat to network security. By exploiting these vulnerabilities, an attacker can perform malicious operations in a contract, causing serious losses to users and developers. Smart contract vulnerability detection is therefore an important research topic in blockchain security.

Traditional smart contract vulnerability detection methods mainly comprise rule-based static analysis, dynamic analysis, fuzz testing and the like. These methods typically rely on manually written rules or specific vulnerability patterns; they can find some known vulnerabilities but have significant limitations when facing new or unknown vulnerabilities. For example, static analysis may miss some complex vulnerabilities, or produce false positives and false negatives because its rules are incomplete. Testing based on dynamic analysis can simulate contract execution, but it depends on a large number of test cases, is inefficient, and is easily constrained by the contract execution environment.

In recent years, deep learning has emerged as a vulnerability detection technology that shows superior performance, especially when dealing with complex vulnerability patterns and large-scale data sets.
Deep learning methods can capture potential vulnerabilities in code by automatically learning feature representations of the contract code. However, existing deep-learning-based vulnerability detection methods generally rely on a large amount of annotated data to train a model, and still face a significant challenge with new vulnerabilities. In particular, lacking sufficient annotated data, existing models have difficulty learning subtle code differences, so the boundary between vulnerable contracts and secure contracts is blurred and false negatives result. This is particularly evident in practice when contracts share similar code segments, where existing methods cannot effectively identify the potential vulnerabilities.

In view of the above analysis, the technical problems to be solved in the prior art are:

(1) Many deep learning models for smart contract vulnerability detection rely on large amounts of annotated data for training, which is impractical in many applications because acquiring annotated data is time-consuming and costly. In addition, smart contracts are highly varied, and insufficient annotated data degrades model performance.

(2) Similar contracts are difficult to distinguish: the differences between many vulnerable contracts and secure contracts are very small, especially when similar code segments are present, and existing deep learning methods have difficulty telling these contracts apart, so false negatives occur frequently.
(3) Pseudo-label quality is hard to guarantee: when annotated data is lacking, current semi-supervised learning methods strengthen learning with pseudo-labels, but if the pseudo-labels are wrong or too noisy, the learning effect and the final detection performance of the model suffer.

Disclosure of Invention

To address the problems existing in the prior art, the invention provides a smart contract vulnerability detection method and system based on Mask Consistency and Dynamic Margin Adjustment (MCDMA).

The invention is realized as a smart contract vulnerability detection method based on mask consistency and dynamic margin adjustment, characterized in that the method comprises the following steps:
S1, selecting the most suitable source contract for a target contract by using MMD (maximum mean discrepancy) to obtain a matched source-target contract pair for SSDA (semi-supervised domain adaptation);
S2, extracting contract features from the source contract and the target contract, and preprocessing the contract code;
S3, performing mask learning with the mask consistency framework, and performing domain adaptation in combination with SSDA;
S4, predicting through a student network, and gradually learning the c