
CN-116108450-B - Intelligent contract vulnerability detection method and system based on operation code instruction clustering

CN116108450B

Abstract

The invention discloses a smart contract vulnerability detection method and system based on opcode instruction clustering, relating to the technical field of smart contracts. The EVM bytecode of smart contracts is converted into opcode instructions, the features of the opcode instructions are vectorized with a word embedding model, and the vectorized opcode instructions are clustered. Contracts with known vulnerabilities are selected to form a vulnerability library, and opcode instructions belonging to the same cluster are uniformly replaced in the library's contracts according to the clustering result. A contract to be detected is sliced, its opcode instructions belonging to the same cluster are uniformly replaced using the same clustering result, and vulnerabilities are detected by comparing its similarity with the contracts in the vulnerability library. The method and system address two problems: similar vulnerabilities are hard to detect because compilers of different versions generate different opcode instructions, and vulnerability detection has a high time cost.
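
The first step of the abstract, turning EVM bytecode into an operand-free opcode sequence, can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `disassemble` and the deliberately partial `OPCODES` table are assumptions introduced here, and a real tool would cover the full EVM instruction set.

```python
# Partial opcode table, for illustration only (assumption: a real
# disassembler would include every EVM opcode).
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x03: "SUB",
    0x10: "LT", 0x14: "EQ", 0x15: "ISZERO",
    0x34: "CALLVALUE", 0x35: "CALLDATALOAD",
    0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE",
    0x54: "SLOAD", 0x55: "SSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0xF1: "CALL", 0xF3: "RETURN", 0xFD: "REVERT",
}


def disassemble(bytecode_hex: str) -> list[str]:
    """Return the opcode mnemonics of EVM bytecode, with operands dropped."""
    code = bytes.fromhex(bytecode_hex.removeprefix("0x"))
    ops = []
    i = 0
    while i < len(code):
        b = code[i]
        if 0x60 <= b <= 0x7F:
            # PUSH1..PUSH32 carry 1..32 immediate bytes; skip them so the
            # operand never appears in the output sequence.
            n = b - 0x5F
            ops.append(f"PUSH{n}")
            i += 1 + n
        elif 0x80 <= b <= 0x8F:
            ops.append(f"DUP{b - 0x7F}")
            i += 1
        elif 0x90 <= b <= 0x9F:
            ops.append(f"SWAP{b - 0x8F}")
            i += 1
        else:
            ops.append(OPCODES.get(b, f"UNKNOWN_0x{b:02X}"))
            i += 1
    return ops
```

For example, `disassemble("0x6080604052")` returns `["PUSH1", "PUSH1", "MSTORE"]`: the two pushed values are discarded and only the instruction names remain, which is the form a word embedding model would be trained on.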

Inventors

  • Gu Xiguo
  • Cui Zhanqi
  • Li Li
  • Zheng Liwei

Assignees

  • Beijing Information Science and Technology University (北京信息科技大学)

Dates

Publication Date
20260505
Application Date
20230228

Claims (5)

  1. A smart contract vulnerability detection method based on opcode instruction clustering, characterized by comprising the following steps:
     S1: converting the EVM bytecode of smart contracts into opcode instructions, and vectorizing the features of the opcode instructions through a word embedding model;
     S2: clustering the vectorized opcode instructions;
     S3: selecting contracts with vulnerabilities and slicing them, constructing a vulnerability library, and uniformly replacing the opcode instructions belonging to the same cluster in the contracts of the vulnerability library according to the clustering result;
     wherein S3 specifically comprises the following steps:
     S3.1: inputting a set of smart contracts containing vulnerabilities;
     S3.2: obtaining the vulnerable contracts from the set one at a time;
     S3.3: compiling the vulnerable contract and generating the corresponding opcode instruction sequence;
     S3.4: analyzing the dependency relationships between the opcode instructions in the sequence and the external data introduced, slicing according to those dependencies, and outputting the slicing result;
     S3.5: using the clustering result, uniformly replacing the opcode instructions in the slicing result that belong to the same cluster with the cluster number, and adding the uniformly replaced vulnerable-contract opcode instruction sequence to the vulnerability library set;
     S3.6: determining whether all smart contracts in the vulnerable contract set have been compiled; if so, executing S3.7, otherwise executing S3.2;
     S3.7: outputting the vulnerability library set;
     S4: slicing the contract to be detected, uniformly replacing the opcode instructions belonging to the same cluster in the contract to be detected by using the clustering result, and detecting vulnerabilities by comparing similarity with the contracts in the vulnerability library;
     wherein S4 specifically comprises the following steps:
     S4.1: inputting the contract to be detected;
     S4.2: compiling the contract to be detected and generating the corresponding opcode instruction sequence;
     S4.3: analyzing the dependency relationships between the opcode instructions in the sequence and the external data introduced, slicing according to those dependencies, and outputting the slicing result;
     S4.4: using the clustering result, uniformly replacing the opcode instructions in the slicing result that belong to the same cluster with the cluster number, and outputting the replaced opcode instruction sequence;
     S4.5: inputting the vulnerability library set;
     S4.6: computing, in turn, the similarity between the replaced opcode instruction sequence and each uniformly replaced vulnerable-contract opcode instruction sequence in the vulnerability library set;
     S4.7: determining whether the similarity is greater than a specified threshold; if it is greater than the threshold, executing S4.8; if it is less than the threshold, executing S4.6;
     S4.8: outputting that the contract to be detected contains the vulnerability type of the matched uniformly replaced vulnerable-contract opcode instruction sequence, and executing S4.9;
     S4.9: determining whether the similarity of the replaced opcode instruction sequence has been computed against all contracts in the vulnerability library; if not, executing S4.6; if all computations are finished, ending.
  2. The smart contract vulnerability detection method based on opcode instruction clustering according to claim 1, wherein step S1 specifically comprises the following steps:
     S1.1: inputting a training set, the training set including a plurality of smart contracts known to contain vulnerabilities and known not to contain vulnerabilities;
     S1.2: obtaining smart contracts from the training set one at a time;
     S1.3: determining whether the smart contract can be compiled; if so, executing step S1.4; if not, executing step S1.5;
     S1.4: compiling the smart contract, then executing step S1.6;
     S1.5: deleting the smart contract, then executing step S1.2;
     S1.6: outputting the opcode instruction sequence of the smart contract;
     S1.7: preprocessing the opcode instruction sequence by deleting its operands, splitting it by delimiters, and normalizing the opcode instructions in the sequence, then adding the normalized opcode instruction sequence to the opcode instruction sequence set;
     S1.8: determining whether the whole training set has been processed; if so, executing S1.9, otherwise executing S1.2;
     S1.9: training a word embedding model using the smart contract opcode instruction sequence set;
     S1.10: obtaining the opcode instructions in the smart contract opcode instruction set one by one, using the word embedding model trained in S1.9 to obtain the corresponding opcode instruction vector, and storing it in the opcode instruction vector set.
  3. The smart contract vulnerability detection method based on opcode instruction clustering according to claim 1, wherein step S2 specifically comprises the following steps:
     S2.1: inputting the opcode instruction vector set generated in S1;
     S2.2: clustering the opcode instructions using the opcode instruction vectors in the set, all opcode instruction vectors being grouped into a specified number of clusters according to their distance from the cluster centers;
     S2.3: outputting the clustering result.
  4. The smart contract vulnerability detection method based on opcode instruction clustering according to claim 3, wherein step S2.2 specifically comprises the following steps: taking the opcode instruction vector set as input, randomly selecting a specified number of opcode instruction vectors as cluster centers; selecting the cluster center vectors in turn from the randomly selected opcode instruction vectors; fetching the opcode instruction vectors from the set in order and computing, according to the claimed distance formula, the distance from each vector to each of the cluster center vectors; adding the opcode instruction to the cluster whose center is closest to it; each time an opcode instruction is added to a cluster, recalculating the center of that cluster; and repeating the above operations until all opcode instructions have been added to their corresponding clusters.
  5. A smart contract vulnerability detection system based on opcode instruction clustering, applied to the smart contract vulnerability detection method based on opcode instruction clustering as set forth in any one of claims 1-4, characterized in that the system comprises:
     an instruction vectorization module, used for converting the EVM bytecode of smart contracts into opcode instructions and vectorizing the features of the opcode instructions through a word embedding model;
     an instruction clustering module, used for clustering the vectorized opcode instructions;
     an instruction unification module, used for selecting contracts with vulnerabilities and slicing them, constructing a vulnerability library, and uniformly replacing the opcode instructions belonging to the same cluster in the contracts of the vulnerability library according to the clustering result;
     wherein the instruction unification module specifically performs:
     S3.1: inputting a set of smart contracts containing vulnerabilities;
     S3.2: obtaining the vulnerable contracts from the set one at a time;
     S3.3: compiling the vulnerable contract and generating the corresponding opcode instruction sequence;
     S3.4: analyzing the dependency relationships between the opcode instructions in the sequence and the external data introduced, slicing according to those dependencies, and outputting the slicing result;
     S3.5: using the clustering result, uniformly replacing the opcode instructions in the slicing result that belong to the same cluster with the cluster number, and adding the uniformly replaced vulnerable-contract opcode instruction sequence to the vulnerability library set;
     S3.6: determining whether all smart contracts in the vulnerable contract set have been compiled; if so, executing S3.7, otherwise executing S3.2;
     S3.7: outputting the vulnerability library set;
     an instruction comparison module, used for slicing the contract to be detected, uniformly representing the opcode instructions belonging to the same cluster in the contract to be detected according to the clustering result, and detecting vulnerabilities by comparing similarity with the contracts in the vulnerability library;
     wherein the instruction comparison module specifically performs:
     S4.1: inputting the contract to be detected;
     S4.2: compiling the contract to be detected and generating the corresponding opcode instruction sequence;
     S4.3: analyzing the dependency relationships between the opcode instructions in the sequence and the external data introduced, slicing according to those dependencies, and outputting the slicing result;
     S4.4: using the clustering result, uniformly replacing the opcode instructions in the slicing result that belong to the same cluster with the cluster number, and outputting the replaced opcode instruction sequence;
     S4.5: inputting the vulnerability library set;
     S4.6: computing, in turn, the similarity between the replaced opcode instruction sequence and each uniformly replaced vulnerable-contract opcode instruction sequence in the vulnerability library set;
     S4.7: determining whether the similarity is greater than a specified threshold; if it is greater than the threshold, executing S4.8; if it is less than the threshold, executing S4.6;
     S4.8: outputting that the contract to be detected contains the vulnerability type of the matched uniformly replaced vulnerable-contract opcode instruction sequence, and executing S4.9;
     S4.9: determining whether the similarity of the replaced opcode instruction sequence has been computed against all contracts in the vulnerability library; if not, executing S4.6; if all computations are finished, ending.
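
The clustering procedure of claims 3 and 4 can be sketched as below. This is an illustrative sketch under stated assumptions, not the claimed implementation: the distance formula is not preserved in this text, so Euclidean distance is assumed; the names `cluster_opcodes`, `euclidean`, `k`, and `seed` are introduced here for illustration only.

```python
import math
import random


def euclidean(a, b):
    # Assumption: the claimed distance formula is not reproduced in the
    # text, so plain Euclidean distance stands in for it.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_opcodes(vectors: dict[str, list[float]], k: int,
                    seed: int = 0) -> dict[str, int]:
    """Assign each opcode to one of k clusters in a single sequential pass."""
    rng = random.Random(seed)
    names = sorted(vectors)
    # Randomly pick k opcode vectors as the initial cluster centers.
    centers = [list(vectors[n]) for n in rng.sample(names, k)]
    members = [[] for _ in range(k)]
    assignment = {}
    for name in names:
        v = vectors[name]
        # Add the opcode to the cluster whose center is nearest.
        j = min(range(k), key=lambda c: euclidean(v, centers[c]))
        members[j].append(v)
        assignment[name] = j
        # Recompute that cluster's center as the mean of its members.
        centers[j] = [
            sum(m[d] for m in members[j]) / len(members[j])
            for d in range(len(v))
        ]
    return assignment
```

The returned mapping from opcode name to cluster number is the "clustering result" that later steps use for uniform replacement. Because the pass is sequential and the initial centers are random, the grouping depends on `seed` and on the order in which vectors are visited.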

Description

Intelligent contract vulnerability detection method and system based on operation code instruction clustering

Technical Field

The invention relates to the technical field of smart contracts, and in particular to a smart contract vulnerability detection method and system based on opcode instruction clustering.

Background

Blockchains are distributed transaction ledgers characterized by decentralization, tamper resistance, and joint maintenance by multiple parties. In recent years, blockchain technology has been widely used in finance, healthcare, education, and other fields because it can solve practical problems such as open data sharing and secure data transmission. Smart contracts can implement more complex and diverse business logic, further extending the functionality of the blockchain. Because smart contracts are openly and transparently deployed, traceable, and trustless, they are widely used in financial fields such as securities management. For example, in securities management, contracts simplify capitalization table management by bypassing intermediaries in the securities custody chain, which facilitates automated equity payments, stock splits, and debt management. However, while they extend blockchain functionality, hidden defects in smart contracts pose a potential security risk and can cause significant economic loss when financial assets are managed. To avoid significant losses caused by potential vulnerabilities in smart contracts, efficient vulnerability detection techniques are needed.

Therefore, how to provide a smart contract vulnerability detection method with strong performance and high detection speed is a problem to be solved by those skilled in the art.
Disclosure of Invention

In view of the above, the invention provides a smart contract vulnerability detection method based on opcode instruction clustering. It addresses two problems: similar vulnerabilities are hard to detect because compilers of different versions generate different opcode instructions, and vulnerability detection has a high time cost.

To achieve the above purpose, the invention adopts the following technical scheme. A smart contract vulnerability detection method based on opcode instruction clustering comprises the following steps:

S1: converting the EVM (Ethereum Virtual Machine) bytecode of smart contracts into opcode instructions, and vectorizing the features of the opcode instructions through a word embedding model;
S2: clustering the vectorized opcode instructions;
S3: selecting contracts with vulnerabilities and slicing them, constructing a vulnerability library, and uniformly replacing the opcode instructions belonging to the same cluster in the contracts of the vulnerability library according to the clustering result;
S4: slicing the contract to be detected, uniformly representing the opcode instructions belonging to the same cluster in the contract to be detected by using the clustering result, and detecting vulnerabilities by comparing similarity with the contracts in the vulnerability library.
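
Steps S3 and S4 above, replacing opcodes by cluster number and then comparing against the vulnerability library, can be sketched as follows. The text does not name a similarity measure, so `difflib.SequenceMatcher` from the Python standard library is used here purely as a stand-in; `unify`, `detect`, the `C<n>` token format, the default threshold of 0.8, and the fallback of leaving unclustered opcodes unchanged are all assumptions made for illustration. Slicing is assumed to have been done already.

```python
from difflib import SequenceMatcher


def unify(opcodes, cluster_of):
    """Replace each opcode with its cluster token, e.g. PUSH1 -> C0."""
    # Assumption: opcodes missing from the clustering result are kept as-is.
    return [f"C{cluster_of[op]}" if op in cluster_of else op for op in opcodes]


def detect(target_slice, vuln_library, cluster_of, threshold=0.8):
    """Return (vulnerability type, similarity) for every library match.

    vuln_library is a list of (vulnerability_type, unified_sequence) pairs.
    """
    unified = unify(target_slice, cluster_of)
    findings = []
    for vuln_type, vuln_seq in vuln_library:
        # Stand-in similarity: ratio of matching tokens in the two sequences.
        sim = SequenceMatcher(None, unified, vuln_seq, autojunk=False).ratio()
        if sim > threshold:
            findings.append((vuln_type, sim))
    return findings
```

With `PUSH1` and `PUSH2` in one cluster and `DUP1` and `DUP2` in another, a library entry built from `PUSH1 DUP1 CALL SSTORE` and a target slice `PUSH2 DUP2 CALL SSTORE` unify to the same token sequence, so the target matches even though the raw opcodes differ. This is the intended effect of the replacement: opcode differences between compiler versions no longer lower the similarity score.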
The beneficial effects of the method are as follows: natural language processing techniques are used to learn opcode instructions and their contextual features; slicing reduces interference during vulnerability detection; computing contract similarity further reduces the time cost of vulnerability detection; false negatives and false positives in vulnerability detection are reduced; and, to a certain extent, the problem that similar vulnerabilities are hard to detect because compilers of different versions generate different opcode instructions is solved.

Preferably, S1 specifically includes:

S1.1: inputting a training set, the training set including a plurality of smart contracts known to contain vulnerabilities and known not to contain vulnerabilities;
S1.2: obtaining smart contracts from the training set one at a time;
S1.3: determining whether the smart contract can be compiled; if so, executing step S1.4; if not, executing step S1.5;
S1.4: compiling the smart contract, then executing step S1.6;
S1.5: deleting the smart contract, then executing step S1.2;
S1.6: outputting the opcode instruction sequence of the smart contract;
S1.7: preprocessing the opcode instruction sequence by deleting its operands, splitting it by delimiters, and normalizing the opcode instructions in the sequence, and the normalized operation code instruction sequence is used for processing the instruction seque