CN-122020669-A - Smart contract bytecode vulnerability detection method, device and storage medium based on large-model simulated stack execution

Abstract

The invention belongs to the technical field of security detection, and in particular relates to a smart contract bytecode vulnerability detection method, device and storage medium based on large-model simulated stack execution. The invention aims to solve the low detection accuracy of existing smart contract bytecode vulnerability detection methods. The method comprises: obtaining the smart contract bytecode to be detected; preprocessing it to obtain a control flow graph (CFG); configuring expert identities for the large model to obtain large models with expert identities; inputting the CFG into the large models with expert identities for variable analysis to obtain analysis results; constructing a variable state transition diagram from the analysis results; and inputting the variable state transition diagram into a predictor for vulnerability detection to obtain a vulnerability detection result. The low detection accuracy of existing smart contract bytecode vulnerability detection methods is thereby addressed.

Inventors

ZHANG XINPENG
SUN GUOKAI
ZHUANG YUAN
SUN JIAKE
LIU BING
HUANG LINHUI

Assignees

Harbin Engineering University (哈尔滨工程大学)

Dates

Publication Date: 20260512
Application Date: 20260210

Claims (10)

1. A smart contract bytecode vulnerability detection method based on large-model simulated stack execution, characterized by comprising the following steps: step one, acquiring the smart contract bytecode to be detected; step two, preprocessing the smart contract bytecode to be detected to obtain a control flow graph (CFG); step three, configuring a variable analysis expert identity for the large model to obtain a large model with the variable analysis expert identity, and inputting the CFG to that model for variable analysis to obtain a variable analysis result; step four, configuring a data flow analysis expert identity for the large model to obtain a large model with the data flow analysis expert identity, and inputting the CFG and the variable analysis result to that model for data flow analysis to obtain a data flow analysis result; step five, configuring a control flow analysis expert identity for the large model to obtain a large model with the control flow analysis expert identity, and inputting the CFG and the variable analysis result to that model for control flow analysis to obtain a control flow analysis result; step six, constructing a variable state transition diagram from the data flow analysis result and the control flow analysis result; and step seven, inputting the variable state transition diagram into a predictor for vulnerability detection to obtain a vulnerability detection result.
2. The smart contract bytecode vulnerability detection method based on large-model simulated stack execution of claim 1, wherein in step two the smart contract bytecode to be detected is preprocessed to obtain preprocessed smart contract bytecode information, as follows: step 2.1, converting the smart contract bytecode to be detected into assembly code; step 2.2, dividing the assembly code into A opcode blocks, where A is a positive integer; step 2.3, defining the A opcode blocks as nodes of the CFG, and defining control relations between opcode blocks, such as branch jumps and function calls, as edges of the CFG; and step 2.4, constructing the CFG from the nodes of the CFG and the edges of the CFG.
3. The smart contract bytecode vulnerability detection method based on large-model simulated stack execution of claim 2, wherein in step three a variable analysis expert identity is configured for the large model to obtain a large model with the variable analysis expert identity, and the CFG is input to that model for variable analysis to obtain a variable analysis result, as follows: step 3.1, injecting smart contract bytecode instruction set knowledge into the large model by prompt engineering, and configuring the variable analysis expert identity for the large model to obtain the large model with the variable analysis expert identity; step 3.2, inputting the CFG into the large model with the variable analysis expert identity for variable analysis to obtain the variable analysis result, as follows: step 3.2.1, the large model with the variable analysis expert identity simulates stack operations according to the CFG and, based on the instruction sequence, identifies the set of all variables that appear during stack operation when the smart contract executes, each element of the set being the i-th identified variable (i = 1, 2, 3, ...); step 3.2.2, the large model with the variable analysis expert identity records the variable type, the storage location and the high-level semantics of each variable in the variable set; step 3.2.3, the large model with the variable analysis expert identity generates the variable analysis result from the variable set and from the variable type, storage location and high-level semantics of each variable.
4. The smart contract bytecode vulnerability detection method based on large-model simulated stack execution of claim 3, wherein in step four a data flow analysis expert identity is configured for the large model to obtain a large model with the data flow analysis expert identity, and the CFG and the variable analysis result are input to that model for data flow analysis to obtain a data flow analysis result, as follows: step 4.1, configuring the data flow analysis expert identity for the large model by prompt engineering to obtain the large model with the data flow analysis expert identity; step 4.2, the large model with the data flow analysis expert identity identifies the data flow relations among the variables according to the CFG and the variable analysis result, the set of all data flow relations forming the data flow analysis result, as follows: step 4.2.1, the large model with the data flow analysis expert identity traverses every opcode sequence of every opcode block according to the CFG and the variable analysis result, and identifies the instructions related to data changes, each such instruction comprising a data source, an assignment type and a data flow type; step 4.2.2, the large model with the data flow analysis expert identity analyzes the instructions related to data changes to obtain the data flow relation between each pair of variables; when the data flow type is an assignment instruction, the data flow relation is expressed in terms of the data flow type, the source variable, the target variable, the assignment type and the data source; when the data flow type is a type conversion instruction, the data flow relation is expressed in terms of the data flow type, the variable undergoing the type conversion, the type of the variable before conversion, the type of the variable after conversion and the conversion scenario.
5. The smart contract bytecode vulnerability detection method based on large-model simulated stack execution of claim 4, wherein in step five a control flow analysis expert identity is configured for the large model to obtain a large model with the control flow analysis expert identity, and the CFG and the variable analysis result are input to that model for control flow analysis to obtain a control flow analysis result, as follows: step 5.1, configuring the control flow analysis expert identity for the large model by prompt engineering to obtain the large model with the control flow analysis expert identity; step 5.2, the large model with the control flow analysis expert identity identifies the control flow analysis result according to the CFG and the variable analysis result, as follows: step 5.2.1, the large model with the control flow analysis expert identity traverses all nodes and edges of the CFG according to the CFG and the variable analysis result, and identifies the instruction sequences corresponding to control structures; step 5.2.2, the large model with the control flow analysis expert identity records the control state of each related variable according to the instruction sequences corresponding to control structures; and step 5.2.3, the large model with the control flow analysis expert identity generates the control flow analysis result from the control states of the related variables, the control flow analysis result being expressed in terms of: the control condition; the associated instruction sequence that produces the new control state; the feature of that instruction sequence; the identifier of the associated variable; the control state transition mode of the variable; a source-level explanation of the operation corresponding to the control state change; the type of control operation; and the actual control flow behavior and state change that occur when the control condition is satisfied.
6. The smart contract bytecode vulnerability detection method based on large-model simulated stack execution of claim 5, wherein in step six the variable state transition diagram is constructed from the data flow analysis result and the control flow analysis result, as follows: step 6.1, taking the name of the target smart contract as the root node of the variable state transition diagram; step 6.2, according to the data flow analysis result and the control flow analysis result, taking all variables identified by the variable analysis as primary independent variable nodes below the root node, each variable being one primary independent variable node; step 6.3, constructing the main branches of the variable state transition diagram from the data flow analysis result; step 6.4, constructing the fine branches of the variable state transition diagram from the control flow analysis result; step 6.5, constructing the variable state transition diagram from the root node, the primary independent variable nodes, the main branches and the fine branches.
7. The smart contract bytecode vulnerability detection method based on large-model simulated stack execution of claim 6, wherein the predictor in step seven comprises an input layer, a rule layer, a core processing layer and an output layer; the core processing layer comprises a feature extraction module and a rule matching module; and the variable state transition diagram is input into the predictor for vulnerability detection to obtain the vulnerability detection result, as follows: step 7.1, inputting the variable state transition diagram to the input layer for standardization to obtain a standardized variable state transition diagram; step 7.2, inputting the standardized variable state transition diagram into the feature extraction module, where the predictor extracts the state transition features of the variables by traversing the nodes and edges of the diagram; step 7.3, inputting the extracted state transition features into the rule matching module for matching to obtain a matching result; and step 7.4, the output layer outputs the matching result as the vulnerability detection result.
8. The smart contract bytecode vulnerability detection method based on large-model simulated stack execution of claim 7, wherein in step seven the extracted state transition features of the variables are input into the rule matching module for matching to obtain the matching result, as follows: the extracted state transition features are matched against the preset vulnerability rule set in the rule layer; if any of the extracted state transition features satisfies the decision condition of a vulnerability type, the contract is marked as containing a vulnerability of that type and a first matching result is output, the first matching result comprising the vulnerability type, the trigger variable, the state transition path and the vulnerability cause; if none of the extracted state transition features satisfies the decision condition of any vulnerability type, the output indicates that no smart contract vulnerability was detected; the preset vulnerability rule set in the rule layer comprises an access control defect vulnerability detection rule, an integer overflow vulnerability detection rule and a denial of service vulnerability detection rule.
9. A computer storage medium storing at least one instruction that is loaded and executed by a processor to implement the smart contract bytecode vulnerability detection method based on large-model simulated stack execution of any one of claims 1 to 8.
10. A smart contract bytecode vulnerability detection device based on large-model simulated stack execution, characterized in that the device comprises a processor and a memory, wherein at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the smart contract bytecode vulnerability detection method based on large-model simulated stack execution of any one of claims 1 to 8.
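
As an illustration of the preprocessing described in claim 2 (bytecode to assembly, assembly to opcode blocks, blocks and jumps to a CFG), the following is a minimal sketch. It assumes the bytecode is Ethereum Virtual Machine (EVM) bytecode, which the claims do not state explicitly. The opcode values are standard EVM opcodes; the names `Block`, `disassemble` and `build_cfg`, and the policy of resolving only jump targets pushed immediately before the jump, are assumptions made for this example and are not taken from the patent.

```python
from dataclasses import dataclass, field

# Standard EVM opcode values; the block-splitting policy is illustrative.
STOP, JUMP, JUMPI, JUMPDEST = 0x00, 0x56, 0x57, 0x5B
PUSH1, PUSH32 = 0x60, 0x7F
RETURN, REVERT, INVALID, SELFDESTRUCT = 0xF3, 0xFD, 0xFE, 0xFF
TERMINATORS = {STOP, JUMP, JUMPI, RETURN, REVERT, INVALID, SELFDESTRUCT}


@dataclass
class Block:
    start: int                                  # byte offset of the first instruction
    instrs: list = field(default_factory=list)  # (offset, opcode, immediate or None)
    succs: set = field(default_factory=set)     # start offsets of successor blocks


def disassemble(code: bytes):
    """Decode bytecode into (offset, opcode, immediate) triples."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if PUSH1 <= op <= PUSH32:
            n = op - PUSH1 + 1  # PUSHn carries n bytes of immediate data
            out.append((pc, op, int.from_bytes(code[pc + 1:pc + 1 + n], "big")))
            pc += 1 + n
        else:
            out.append((pc, op, None))
            pc += 1
    return out


def build_cfg(code: bytes):
    """Split instructions into opcode blocks (nodes) and link them (edges)."""
    blocks, cur = {}, None
    for ins in disassemble(code):
        off, op, _ = ins
        # A new block starts at a JUMPDEST or right after a terminator.
        if cur is None or op == JUMPDEST:
            cur = Block(start=off)
            blocks[off] = cur
        cur.instrs.append(ins)
        if op in TERMINATORS:
            cur = None
    starts = sorted(blocks)
    for i, s in enumerate(starts):
        blk = blocks[s]
        last_op = blk.instrs[-1][1]
        nxt = starts[i + 1] if i + 1 < len(starts) else None
        # Fall-through edge: conditional jump, or a block cut short by a JUMPDEST.
        if (last_op == JUMPI or last_op not in TERMINATORS) and nxt is not None:
            blk.succs.add(nxt)
        # Jump edge: resolved only when the target is pushed right before the jump.
        if last_op in (JUMP, JUMPI) and len(blk.instrs) >= 2:
            _, prev_op, imm = blk.instrs[-2]
            if (PUSH1 <= prev_op <= PUSH32 and imm in blocks
                    and blocks[imm].instrs[0][1] == JUMPDEST):
                blk.succs.add(imm)
    return blocks
```

For the 8-byte program `6001600657005b00` (PUSH1 1, PUSH1 6, JUMPI, STOP, JUMPDEST, STOP), `build_cfg` yields three blocks at offsets 0, 5 and 6, with block 0 linked to both 5 (fall-through) and 6 (jump target). Jumps whose targets are computed at run time are left unresolved in this sketch.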
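
Claims 6 to 8 describe a variable state transition diagram (contract name as root, one node per variable, main branches from data flow, fine branches from control flow) and a rule-based predictor that traverses it. The sketch below shows one way such a structure could be represented and matched. All field names (`source`, `target`, `assign_type`, `variable`, `condition`) are hypothetical, and the single rule shown is an invented stand-in: the patent names an integer overflow rule but does not state its decision condition.

```python
from dataclasses import dataclass, field


@dataclass
class VariableNode:
    name: str
    main_branches: list = field(default_factory=list)  # data flow relations
    fine_branches: list = field(default_factory=list)  # control flow records


@dataclass
class TransitionDiagram:
    contract: str                                  # root node: contract name
    variables: dict = field(default_factory=dict)  # name -> VariableNode


def build_diagram(contract, variables, data_flows, control_flows):
    """Attach data flow and control flow results to per-variable nodes."""
    diagram = TransitionDiagram(contract)
    for name in variables:
        diagram.variables[name] = VariableNode(name)
    for rel in data_flows:        # main branches hang off the target variable
        diagram.variables[rel["target"]].main_branches.append(rel)
    for rec in control_flows:     # fine branches hang off the controlled variable
        diagram.variables[rec["variable"]].fine_branches.append(rec)
    return diagram


def integer_overflow_rule(node):
    """Hypothetical rule: an arithmetic write with no guarding control record."""
    arithmetic = [r for r in node.main_branches
                  if r["assign_type"] == "arithmetic"]
    if arithmetic and not node.fine_branches:
        return {
            "type": "integer overflow",
            "trigger_variable": node.name,
            "path": [f'{r["source"]} -> {r["target"]}' for r in arithmetic],
            "cause": "arithmetic assignment without a guarding control condition",
        }
    return None


RULES = [integer_overflow_rule]  # a real rule layer would hold more rules


def predict(diagram):
    """Traverse every variable node and collect the rules that match."""
    findings = []
    for node in diagram.variables.values():
        for rule in RULES:
            hit = rule(node)
            if hit is not None:
                findings.append(hit)
    return findings  # an empty list means no vulnerability was detected


diagram = build_diagram(
    "Token",
    ["balance", "owner"],
    [{"source": "amount", "target": "balance", "assign_type": "arithmetic"}],
    [{"variable": "owner", "condition": "caller == owner"}],
)
results = predict(diagram)
```

Each finding carries the four fields that claim 8 lists for a first matching result: vulnerability type, trigger variable, state transition path and cause. In this example only `balance` is reported, because `owner` has no arithmetic data flow.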

Description

Smart contract bytecode vulnerability detection method, device and storage medium based on large-model simulated stack execution

Technical Field

The invention belongs to the technical field of security detection, and in particular relates to a smart contract bytecode vulnerability detection method, device and storage medium based on large-model simulated stack execution.

Background

Blockchain is an important emerging technology in computing. As the field matures, it is gradually affecting many aspects of social development. In the blockchain ecosystem, smart contracts are the core carrier of business logic and of automatically executed transactions, so their security bears directly on the stable operation of the whole system. To obtain illegal gains, malicious actors constantly devise new attacks that target design defects and runtime vulnerabilities in smart contracts. On Ethereum alone, verified contracts number in the millions, and the attacks they have given rise to are too numerous to count. Blockchain researchers at home and abroad have therefore invested sustained effort in smart contract vulnerability detection, with the aim of improving the security of the blockchain ecosystem.

At present, smart contracts exist mainly in the form of bytecode, which lacks sufficient logical structure, so it is difficult to mine deep semantic information from it for detection. Deep-learning-based smart contract vulnerability detection lets a model learn vulnerability features and act as a vulnerability identification expert. This approach achieves good results and has produced many convenient and practical vulnerability detection tools.
However, the current mainstream methods (such as classifiers trained on long short-term memory (LSTM) networks, graph neural networks (GNN) and the like) have three problems. First, they rely on high-quality datasets such as labeled vulnerability samples. Second, they have poor interpretability: they do not explain the underlying cause of a vulnerability, which hinders repair. Third, they generalize weakly, and their performance drops on new vulnerability types. The existing smart contract bytecode vulnerability detection methods therefore suffer from low detection accuracy.

Disclosure of Invention

The invention aims to solve the low detection accuracy of existing smart contract bytecode vulnerability detection methods.

The smart contract bytecode vulnerability detection method based on large-model simulated stack execution comprises the following steps:
step one, acquiring the smart contract bytecode to be detected;
step two, preprocessing the smart contract bytecode to be detected to obtain a control flow graph (CFG);
step three, configuring a variable analysis expert identity for the large model to obtain a large model with the variable analysis expert identity, and inputting the CFG to that model for variable analysis to obtain a variable analysis result;
step four, configuring a data flow analysis expert identity for the large model to obtain a large model with the data flow analysis expert identity, and inputting the CFG and the variable analysis result to that model for data flow analysis to obtain a data flow analysis result;
step five, configuring a control flow analysis expert identity for the large model to obtain a large model with the control flow analysis expert identity, and inputting the CFG and the variable analysis result to that model for control flow analysis to obtain a control flow analysis result;
step six, constructing a variable state transition diagram from the data flow analysis result and the control flow analysis result;
step seven, inputting the variable state transition diagram into a predictor for vulnerability detection to obtain a vulnerability detection result.

Preferably, in step two, the smart contract bytecode to be detected is preprocessed to obtain preprocessed smart contract bytecode information, as follows:
step 2.1, converting the smart contract bytecode to be detected into assembly code;
step 2.2, dividing the assembly code into A opcode blocks, where A is a positive integer;
step 2.3, defining the A opcode blocks as nodes of the CFG, and defining control relations between opcode blocks, such as branch jumps and function calls, as edges of the CFG;

Preferably, in the third step, the i