CN-121681323-B - Multi-language-oriented automatic source code verification method and system
Abstract
The invention provides a method and a system for automatically verifying multilingual source codes, which relate to the technical field of computers, and the method comprises the steps of obtaining multilingual source codes; compiling the multilingual source code through an LLVM compiler to generate an intermediate file, optimizing the intermediate file, analyzing the optimized intermediate file through an IR analyzer to generate a symbol execution model, performing simulation execution operation on the symbol execution model, and performing security verification on the symbol execution model after the simulation execution operation to obtain a verification report. The method can realize unified verification of the multi-language source codes, greatly reduce the construction and development cost of a verification environment, improve the overall verification efficiency, strengthen the analysis capability of intermediate files, accurately capture the logic constraint relation in the codes, practically ensure the comprehensiveness and reliability of verification work and meet the requirement of diversified software security verification.
Inventors
- Wang Buyang
- ZHANG LEPING
- TAN XING
Assignees
- 浙江望安科技有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20260212
Claims (9)
- 1. A multi-language oriented source code automatic verification method, comprising: S1, acquiring a multilingual source code; s2, compiling the multi-language source code through an LLVM compiler to generate an intermediate file; s3, optimizing the intermediate file; S4, analyzing the optimized intermediate file through an IR analyzer to generate a symbol execution model; Wherein, the S4 specifically includes: S401, analyzing the optimized intermediate file through an IR analyzer to obtain a model file; s402, inputting a dependent code into the model file to obtain a dependent model file; S403, generating a model transmitter according to the structural characteristics of the model file; The model transmitter is a code generation component which is adapted to the structural characteristics of LLVM IR intermediate files and is specially designed for a multilingual source code verification scene, and the structural information extracted by the IR analyzer is respectively converted into executable type model codes, global variable model codes, function model codes and metadata model codes according to a preset model specification; S404, respectively generating a type model code, a global variable model code, a function model code and a metadata model code through the model transmitter based on the dependent model file; s405, writing the type model code, the global variable model code, the function model code and the metadata model code into the dependent model file respectively to generate the symbol execution model; S5, performing simulation execution operation on the symbol execution model; S6, carrying out security verification on the symbol execution model after the simulation execution operation to obtain a verification report.
- 2. The automatic multi-language-oriented source code verification method according to claim 1, wherein S3 is specifically: Judging whether the name of the function to be extracted input by the user is empty, if so, determining the intermediate file as an optimized intermediate file, and if not, extracting the designated function in the intermediate file to generate the optimized intermediate file.
- 3. The automatic multi-language-oriented source code verification method according to claim 1, wherein the generation process of the type model code specifically comprises: Generating a function head of a type initialization function, and inputting the function head into the dependent model file to obtain an updated dependent model file; Traversing all structure body types in the updated dependency model file, defining a first type model character stream, and taking the first type model character stream as a structure body type declaration function call statement; Acquiring the name of the structure type; inputting the name of the structure type into the first type character stream to obtain a second type character stream; Acquiring all members in the structure type; performing escape treatment on each member; inputting each member after the escape processing into the second type of the character stream to obtain a third type of the character stream; Judging whether the structure type is compact, if so, inputting the correct character stream into the third type model character stream to obtain the type model code, otherwise, inputting the error character stream into the third type model character stream to obtain the type model code.
- 4. The automatic multi-language-oriented source code verification method according to claim 1, wherein the generation process of the global variable model code specifically comprises: Generating a function header of a global variable initialization function, and inputting the function header into the dependent model file to update the dependent model file; Traversing all global variables in the updated dependency model file, defining a first global variable model character stream, and taking the first global variable model character stream as a global variable declaration function call statement; acquiring the name of the global variable; Inputting the name of the global variable into the first global variable model character stream to obtain a second global variable model character stream; acquiring the type of the global variable; processing the Type of the global variable through a name Type function; inputting the processed type into the second global variable model character stream to obtain a third global variable model character stream; Judging whether the global variable has an initial value, if so, inputting the initial value into the third global variable model character stream to generate the global variable model code, otherwise, determining the third global variable model character stream as the global variable model code.
- 5. The automatic multi-language-oriented source code verification method according to claim 1, wherein the generating process of the function model code specifically comprises: Traversing all functions in the dependency model file; when the function is not declared, not generating the function model code; When the function has a statement, generating a function header, parameter type checking assertion, storage parameter codes, basic block subfunctions and function return statements; And respectively inputting the function header, the parameter type checking assertion, the stored parameter code, the basic block sub-function and the function return statement into the dependency model file and executing the dependency model file to obtain the function model code.
- 6. The automatic multi-language-oriented source code verification method according to claim 1, further comprising, after said S4, before said S5: A simulation execution context is defined and validation requirements are stored into the simulation execution context.
- 7. The automatic multi-language-oriented source code verification method according to claim 1, wherein S5 specifically comprises: S501, generating signed function parameters to be executed; s502, according to the function parameters to be executed, performing simulation execution operation on the symbol execution model through LLVM instructions.
- 8. The automatic multi-language-oriented source code verification method according to claim 1, wherein S6 specifically comprises: S601, based on the type, arithmetic operation and logic operation of the z3 solver, converting the safety property of the natural language description into a Boolean expression of the z3 solver; S602, generating a verification target of the path condition implication safety property according to the Boolean expression by a Implies method of the z3 solver; S603, acquiring a solver through the z3 solver, and adding global assertion into the solver; S604, verifying the verification target through the solver; and S605, outputting the verification report based on the verification result.
- 9. A multi-language-oriented automatic verification system for source codes is characterized by comprising a processor and a memory; the memory stores a program or instructions executable on the processor, which when executed by the processor, implement the steps of the multi-language oriented source code automatic verification method of any one of claims 1 to 8.
Description
Multi-language-oriented automatic source code verification method and system Technical Field The invention relates to the technical field of computers, in particular to a multilingual-oriented automatic source code verification method and system. Background With the rapid development of information technology, the complexity of a software system continuously rises, multi-language collaborative development has become a mainstream trend in industry, and programming languages such as C/C++, rust, go and the like are widely applied to different module development of the same item so as to consider the performance, safety and development efficiency of programs. Meanwhile, the software security problem is increasingly remarkable, and the bottom loopholes such as arithmetic overflow, array border crossing, null pointer dereferencing and the like are extremely easy to cause system breakdown, data leakage and even malicious attack, so that comprehensive and efficient security verification is carried out on the multi-language source code, and the method becomes a key link for guaranteeing the reliability of the software. In the prior art, the safety verification method aiming at the source code is mainly divided into two types, namely dynamic test and static analysis, wherein the dynamic test is used for verifying safety property by running a program and inputting a test case, but is limited by the coverage range of a test sample, so that potential loopholes of an edge path are difficult to find, the traditional static analysis method is mostly designed aiming at a single programming language, rule verification is carried out by analyzing a grammar tree of the source code, partial schemes try to realize multi-language analysis by means of intermediate representation of a compiler, but only simple grammar scanning is usually carried out on an intermediate file, a standardized executable model is not built, and complex symbol verification logic is difficult to support. However, in the prior art, the multi-language source code is difficult to uniformly verify due to the isomerism of grammar and type systems, an independent environment is required to be built, so that the verification efficiency is low, the hidden vulnerability detection rate is low because the test of specific numerical values cannot be traversed, meanwhile, the analysis of intermediate files is only stopped in simple grammar scanning, and the standard model support is lacking, so that the logic constraint of the code is difficult to accurately capture, and the verification comprehensiveness and reliability are insufficient. Disclosure of Invention In view of the shortcomings of the prior art, the embodiment of the invention aims to provide an automatic verification method for multi-language source codes, which can solve the technical problems that the multi-language source codes in the prior art are difficult to uniformly verify due to the isomerism of grammar and type systems, the verification efficiency is low because independent environments are required to be built, the hidden vulnerability detection rate is low because all execution paths cannot be traversed due to the fact that the test of specific numerical values are relied on, meanwhile, the analysis of intermediate files only stays in simple grammar scanning, and the standard model support is lacking, so that the logic constraint of the codes is difficult to accurately capture, and the verification comprehensiveness and reliability are insufficient. In a first aspect of the embodiment of the present invention, a method for automatically verifying a multilingual source code is provided, including: S1, acquiring a multilingual source code; S2, compiling the multi-language source codes through an LLVM compiler to generate an intermediate file; S3, optimizing the intermediate file; S4, analyzing the optimized intermediate file through an IR analyzer to generate a symbol execution model; S5, performing simulation execution operation on the symbol execution model; S6, carrying out security verification on the symbol execution model after the simulation execution operation to obtain a verification report. In a second aspect of the embodiment of the invention, a multi-language-oriented source code automatic verification system is provided, which comprises a processor and a memory; The memory stores a program or instructions executable on the processor which when executed by the processor implement the steps of the multilingual-oriented source code automatic verification method as described in the first aspect. The technical scheme provided by the embodiment of the invention has the beneficial effects that at least: In the embodiment of the invention, the LLVM compiler is used for compiling the multi-language source codes to generate a unified intermediate file, so that heterogeneous barriers of grammar and type systems among different programming languages are effectively broken, the verification eff