CN-121980567-A - Smart contract vulnerability detection method and device based on the LLaMA large language model

Abstract

The application discloses a smart contract vulnerability detection method and device based on the LLaMA large language model. The method comprises: collecting an original source code dataset of smart contracts, and cleaning the smart contract source code in the original source code dataset to obtain a target source code dataset, wherein the smart contract source code is written in the Solidity programming language; labeling the vulnerability types of the smart contract source code in the target source code dataset according to a preset vulnerability type order; calling a source code processor for the Solidity programming language to process the labeled smart contract source code and generate the fine-tuning dataset required by the model; pre-training an original LLaMA model on the target source code dataset to obtain an intermediate LLaMA model; fine-tuning the intermediate LLaMA model multiple times on the fine-tuning dataset to obtain a target LLaMA model; and identifying vulnerabilities in Solidity smart contract source code using the target LLaMA model. The method and device address the technical problem of low smart contract vulnerability detection accuracy in the related art.
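The cleaning step is described only in outline (claim 2 mentions removing strings, whitespace, and repeated code that are irrelevant to vulnerability judgment). The following is a minimal Python sketch of one way such a step might look, assuming that the irrelevant text consists of comments and redundant whitespace and that duplicates are detected on the normalized text. The function names `strip_comments`, `normalize`, and `clean_dataset` are hypothetical and do not come from the application.

```python
import hashlib
import re

# Matches either a string literal (kept as-is) or a comment (dropped).
# Matching string literals first keeps "//" inside strings from being
# mistaken for the start of a comment.
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'      # double-quoted string literal
    r"|'(?:\\.|[^'\\])*'"     # single-quoted string literal
    r"|//[^\n]*"              # line comment
    r"|/\*.*?\*/",            # block comment
    re.DOTALL,
)


def strip_comments(source: str) -> str:
    """Remove // and /* */ comments while leaving string literals intact."""
    def keep_strings(match):
        text = match.group(0)
        return text if text[0] in "\"'" else " "
    return _TOKEN.sub(keep_strings, source)


def normalize(source: str) -> str:
    """Strip comments and collapse every run of whitespace to one space.

    Simplification: whitespace inside string literals is collapsed too.
    """
    return re.sub(r"\s+", " ", strip_comments(source)).strip()


def clean_dataset(sources):
    """Return normalized sources with empty entries and duplicates removed."""
    seen = set()
    cleaned = []
    for src in sources:
        norm = normalize(src)
        if not norm:
            continue  # nothing left after removing comments and whitespace
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            cleaned.append(norm)
    return cleaned
```

Under these assumptions, two contracts that differ only in comments or layout normalize to the same text and are kept once. A production pipeline would more likely rely on a real Solidity parser than on regular expressions.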

Inventors

  • Ban Xiaoliang
  • SUN SHENG
  • LIU MIN
  • CHEN YALI
  • YAN YU

Assignees

  • Zhengzhou University (郑州大学)
  • Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所)

Dates

Publication Date
20260505
Application Date
20241030

Claims (10)

  1. A smart contract vulnerability detection method based on the LLaMA large language model, characterized by comprising the following steps: collecting an original source code dataset of smart contracts, and cleaning the smart contract source code in the original source code dataset to obtain a target source code dataset, wherein the smart contract source code is written in the Solidity programming language; labeling the vulnerability types of the smart contract source code in the target source code dataset according to a preset vulnerability type order; calling a source code processor for the Solidity programming language to process the labeled smart contract source code and generate the fine-tuning dataset required by the model; pre-training an original LLaMA model on the target source code dataset to obtain an intermediate LLaMA model; fine-tuning the intermediate LLaMA model multiple times on the fine-tuning dataset using LoRA to obtain a target LLaMA model; and performing vulnerability identification on Solidity smart contract source code using the target LLaMA model.
  2. The method of claim 1, wherein collecting an original source code dataset of smart contracts and cleaning the smart contract source code in the original source code dataset to obtain a target source code dataset comprises: crawling published smart contract source code from a public Ethereum platform to obtain the original source code dataset; and cleaning all of the crawled smart contract source code by removing strings, whitespace, and duplicated code that are irrelevant to judging smart contract vulnerabilities, to obtain the target source code dataset.
  3. The method of claim 1, wherein labeling the vulnerability types of the smart contract source code in the target source code dataset according to a preset vulnerability type order comprises: performing vulnerability detection on the smart contract source code in the target source code dataset with a plurality of smart contract vulnerability detection tools; and labeling the vulnerability type of each smart contract source code in the target source code dataset according to the vulnerability type order of reentrancy vulnerability, timestamp dependency vulnerability, infinite loop vulnerability, and no vulnerability.
  4. The method of claim 1, wherein calling a source code processor for the Solidity programming language to process the labeled smart contract source code and generate the fine-tuning dataset required by the model comprises: creating the source code processor according to the syntactic and semantic relationships of the Solidity programming language, to serve as the processor for the model's data input format; inputting the labeled smart contract source code into the source code processor, and automatically processing it according to the input format required for primary fine-tuning of the model, to obtain the fine-tuning dataset required for primary fine-tuning of the model; and performing program slicing on the smart contract source code according to the control dependencies, data dependencies, call dependencies, and inheritance relationships of the key nodes in the smart contract source code, adding a prompt to the instruction part of the dataset, and integrating the result into the fine-tuning dataset required for secondary fine-tuning of the model.
  5. The method of claim 4, wherein the input format required for primary fine-tuning of the model includes string format, fine-tuning mode, and vulnerability type.
  6. The method of claim 4, wherein pre-training the original LLaMA model on the target source code dataset to obtain an intermediate LLaMA model comprises: processing the unlabeled smart contract source code in the target source code dataset with the source code processor, inputting it into the original LLaMA model for pre-training, and taking the resulting model as the intermediate LLaMA model.
  7. The method of claim 1, wherein fine-tuning the intermediate LLaMA model multiple times on the fine-tuning dataset using LoRA to obtain a target LLaMA model comprises performing primary fine-tuning and secondary fine-tuning in sequence as follows: inputting the smart contract source code in the fine-tuning dataset into the LLaMA model, fine-tuning the intermediate LLaMA model using LoRA, and selecting, from the pre-trained intermediate LLaMA model, the target layers to which LoRA is subsequently applied; creating two matrices A and B, wherein, according to the formula W' = W + A·B, the parameter matrix W is transformed into a parameter matrix W', which serves as the new parameter matrix for model fine-tuning; the sizes of the two matrices are determined by the LoRA rank; matrix A and matrix B are a mapping matrix and an inverse mapping matrix respectively; matrix A and matrix B have opposite dimensions, matrix A reducing the dimension and matrix B increasing it; and computing the gradients of matrix A and matrix B during fine-tuning from the result of the loss function, and updating matrix A and matrix B according to the optimization algorithm of the optimizer.
  8. A smart contract vulnerability detection device based on the LLaMA large language model, characterized by comprising: a collection unit for collecting an original source code dataset of smart contracts and cleaning the smart contract source code in the original source code dataset to obtain a target source code dataset, wherein the smart contract source code is written in the Solidity programming language; a labeling unit for labeling the vulnerability types of the smart contract source code in the target source code dataset according to a preset vulnerability type order; a processing unit for calling a source code processor for the Solidity programming language to process the labeled smart contract source code and generate the fine-tuning dataset required by the model; a pre-training unit for pre-training an original LLaMA model on the target source code dataset to obtain an intermediate LLaMA model; a fine-tuning unit for fine-tuning the intermediate LLaMA model multiple times on the fine-tuning dataset using LoRA to obtain a target LLaMA model; and an identification unit for performing vulnerability identification on Solidity smart contract source code using the target LLaMA model.
  9. A computer-readable storage medium, characterized in that the storage medium comprises a stored program, wherein the program, when run, performs the method of any one of claims 1 to 7.
  10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor performs the method of any one of claims 1 to 7 by means of the computer program.
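The low-rank update recited in claim 7, W' = W + A·B, can be illustrated with a small pure-Python sketch. It assumes W has shape d_in × d_out, so that A (d_in × r) reduces the dimension to the LoRA rank r and B (r × d_out) increases it again. Initializing B to zeros so that W' equals W before training is a common LoRA convention and is an assumption here, not something stated in the claims. All function names are hypothetical.

```python
import random


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def lora_merge(w, a, b):
    """Return W' = W + A·B without modifying W."""
    delta = matmul(a, b)
    return [[w_ij + d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def init_lora(d_in, d_out, rank, seed=0):
    """Create A (d_in x rank) and B (rank x d_out).

    Assumption: A gets small random values and B starts at zero, so the
    merged matrix initially equals W.
    """
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(rank)] for _ in range(d_in)]
    b = [[0.0] * d_out for _ in range(rank)]
    return a, b


def trainable_params(d_in, d_out, rank):
    """Return (entries in W, entries in A plus entries in B)."""
    return d_in * d_out, rank * (d_in + d_out)
```

For a 4096 × 4096 layer and rank 8, `trainable_params(4096, 4096, 8)` gives 16,777,216 entries in W against 65,536 entries in A and B combined, which is why only A and B are updated by the optimizer during fine-tuning.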

Description

Smart contract vulnerability detection method and device based on the LLaMA large language model

Technical Field

The application relates to the technical field of blockchains, and in particular to a smart contract vulnerability detection method and device based on the LLaMA large language model.

Background

This section is intended to provide a background or context for the matter recited in the claims or specification; it is not admitted to be prior art by inclusion in this section.

Smart contracts are one of the core technologies of blockchain 2.0. Their operation has greatly broadened the usage scenarios of blockchains, expanding the blockchain platform from a simple distributed ledger system into a rich decentralized operating system. A smart contract is an automatically executed contract written as computer code, with functions such as automatic execution of conditions and transactions, decentralized transaction and coordination, data storage and query, event triggering and response, and multi-party approval and voting. These functions have led to wide application of smart contracts in many fields, from financial services to supply chain management and from digital asset transactions to voting and governance. They provide a programmable, automated way to transact, coordinate, and perform operations, improving efficiency and security.

However, smart contracts are more susceptible to attack than ordinary programs for two reasons: 1) they typically involve digital financial assets, so an attack can yield tremendous economic benefit; and 2) the purpose of smart contracts is to exploit the characteristics of the blockchain to achieve trusted, tamper-proof contracts.
However, vulnerabilities in smart contracts may lead to unexpected contract behavior, which may defeat the purpose of guaranteeing fairness and reliability. Ensuring the security and stability of smart contracts is therefore critical to the development of blockchain platforms. In recent years, smart contract vulnerabilities have triggered large-scale security incidents. These vulnerabilities cause substantial economic losses and undermine trust in blockchains and smart contracts, making security protection an important problem and a major challenge.

Existing smart contract vulnerability detection methods are symbolic execution methods, represented by Oyente, Maian, and Mythril, which offer advantages such as high accuracy and a wide range of detectable vulnerability types. However, these methods rely heavily on expert patterns defined in advance to detect smart contract vulnerabilities. As smart contracts become more complex, the expert patterns require more and more content, and the expert rules cannot keep pace with the rapid growth in the number of smart contracts, so detection results become increasingly error-prone.

In view of the above problems, no effective solution has been proposed at present.

Disclosure of Invention

The embodiments of the application provide a smart contract vulnerability detection method and device based on the LLaMA large language model, which at least solve the technical problem of low smart contract vulnerability detection accuracy in the related art.
According to one aspect of the embodiments of the application, a smart contract vulnerability detection method based on the LLaMA large language model is provided. The method comprises: collecting an original source code dataset of smart contracts, and cleaning the smart contract source code in the original source code dataset to obtain a target source code dataset, wherein the smart contract source code is written in the Solidity programming language; labeling the vulnerability types of the smart contract source code in the target source code dataset according to a preset vulnerability type order; calling a source code processor for the Solidity programming language to process the labeled smart contract source code and generate the fine-tuning dataset required by the model; pre-training an original LLaMA model on the target source code dataset to obtain an intermediate LLaMA model; fine-tuning the intermediate LLaMA model multiple times on the fine-tuning dataset to obtain a target LLaMA model; and identifying vulnerabilities in Solidity smart contract source code using the target LLaMA model.

According to another aspect of the embodiments of the application, a smart contract vulnerability detection device based on the LLaMA large language model is provided. The device comprises a collection unit, a pre-training unit, a fine-tuning unit and an identification unit, wherein the collection u