CN-120974460-B - Obfuscation and anti-reverse-engineering protection method and device based on ROP chains and opaque predicates


Abstract

Embodiments of the application provide an obfuscation and anti-reverse-engineering protection method and device based on ROP chains and opaque predicates. They provide a three-tier obfuscation protection framework that combines dynamic reconstruction of ROP instructions, opaque predicate protection, and instruction semantic hiding. After ordinary C/C++ source code is compiled into LLVM intermediate representation (IR), the framework applies opaque predicate combinations based on dynamically extracted gadgets and the user input state to deeply perturb the semantics of the program's key logic blocks, jump structures, and constant fields. The chain by which control flow could be restored is thoroughly broken, so the obfuscated result is difficult for a reverse engineer to analyze. This yields controllable, verifiable, and highly stable obfuscation protection while maximizing resistance to advanced analysis techniques such as symbolic execution and static analysis.

Inventors

LIU CEYUE
BAO ZIYANG
REN CHUANLUN
ZHANG XIANGUO
JIA JIA
YANG TIANCHANG
ZHANG PENG
XIAO FENG
TANG RAN
XU MINGYE

Assignees

The 15th Research Institute of China Electronics Technology Group Corporation (中国电子科技集团公司第十五研究所)

Dates

Publication Date: 2026-05-12
Application Date: 2025-07-24

Claims (10)

1. A method for obfuscation and anti-reverse-engineering protection based on ROP chains and opaque predicates, the method comprising: acquiring source code of a target program and converting the source code into LLVM intermediate representation to obtain LLVM IR code; extracting instruction fragments from a shared library file and the LLVM IR code, and splicing the instruction sequences together according to the logic of the target program to form an ROP chain; replacing part of the original instructions of the LLVM IR code with the ROP chain to obtain an assembly instruction stream, which comprises analyzing the stack structure of the target program, loading the ROP chain onto the stack in reverse order, and modifying the pointer and return address on the stack so that the target program jumps to the correct instruction fragment each time a ret instruction is executed; constructing an opaque predicate and inserting the opaque predicate into a program execution path of the assembly instruction stream to obtain obfuscated code, wherein the opaque predicate is a Boolean expression comprising a nested structure of context judgment logic and constant control instructions and is used to lead a dynamic symbolic execution engine to symbolize path variables incorrectly; and passing the obfuscated code to an assembler to convert it into assembly language code, and inserting scheduling instructions into the assembly language code according to the control flow of the target program to obtain final target program code, wherein the scheduling instructions are used to trigger a scheduler to reconstruct the stack structure of the ROP chain on demand when the program is executed.
2. The obfuscation and anti-reverse-engineering protection method based on ROP chains and opaque predicates according to claim 1, wherein the step of acquiring source code of a target program and converting the source code into LLVM intermediate representation to obtain LLVM IR code comprises: receiving the source code with a compiler and preprocessing it, wherein the preprocessing comprises macro expansion and file inclusion processing; performing lexical analysis and syntax analysis on the source code with the compiler, and converting the source code into an abstract syntax tree; performing semantic checking on the abstract syntax tree with the compiler, wherein the semantic checking comprises resolving the variables, functions and types of the abstract syntax tree; and generating LLVM IR from the abstract syntax tree with the compiler and emitting a .ll file, wherein the .ll file is the LLVM IR code.
3. The obfuscation and anti-reverse-engineering protection method based on ROP chains and opaque predicates according to claim 1, wherein the step of extracting instruction fragments from the shared library file and the LLVM IR code and splicing the instruction sequences together according to the logic of the target program to form the ROP chain comprises: scanning all ret instructions in the shared library file and the LLVM IR code with a reverse analysis tool and, starting from each ret instruction, extracting the code fragment that contains it, to obtain the instruction fragments; analyzing the control flow of the target program and identifying target regions in the target program that can be attacked by ROP; and, according to the logic of the target program, selecting instruction fragments capable of replacing the target regions and splicing the selected instruction fragments together in a specific order to form the ROP chain.
4. The obfuscation and anti-reverse-engineering protection method based on ROP chains and opaque predicates according to claim 1, wherein the step of replacing part of the original instructions of the LLVM IR code with the ROP chain to obtain an assembly instruction stream comprises: analyzing the stack structure of the target program and loading the ROP chain onto the stack in reverse order; and modifying the pointer and return address on the stack so that the target program jumps to the correct instruction fragment each time a ret instruction is executed.
5. The method according to claim 1, wherein the step of constructing an opaque predicate and inserting the opaque predicate into a program execution path of the assembly instruction stream to obtain the obfuscated code comprises: selecting a logic expression that is difficult to analyze and constructing a Boolean expression based on the logic expression, wherein the Boolean expression is always true or always false under different input conditions; and inserting the opaque predicate into the assembly instruction stream by modifying a jump instruction of the assembly instruction stream; and wherein, after the step of constructing the opaque predicate and inserting it into the program execution path of the assembly instruction stream to obtain the obfuscated code, the method further comprises inserting neutral code into the obfuscated code, wherein the neutral code comprises NOP instructions, data move instructions, idle operations and dead code segments.
6. The method according to claim 1, wherein the step of passing the obfuscated code to an assembler to convert it into assembly language code, and inserting scheduling instructions into the assembly language code according to the control flow of the target program to obtain final target program code, comprises: converting the obfuscated code into assembly language code with an assembly tool, and mapping the operators in the obfuscated code onto the instruction set of the target machine so that the program semantics remain consistent; and inserting scheduling instructions into the assembly language code according to the control flow of the target program, wherein the scheduling instructions indicate how the assembly language code jumps or modifies register values during execution.
7. The obfuscation and anti-reverse-engineering protection method based on ROP chains and opaque predicates according to claim 1, wherein, after the step of passing the obfuscated code to an assembler to convert it into assembly language code and inserting scheduling instructions into the assembly language code according to the control flow of the target program to obtain final target program code, the method further comprises deploying and executing the final target program code, which comprises: when the program is loaded, adjusting the actual address of each instruction fragment in the ROP chain according to the base address and symbol offsets of the loaded shared library, so that the addresses of the ROP chain differ on each run of the program; when the program is executed, reconstructing the stack structure of the ROP chain with a scheduler and adjusting the data on the stack with the scheduler so that each ret instruction jumps to the correct instruction fragment; and, when the program runs, randomizing the memory addresses of the program through address space layout randomization, so that the location of the ROP chain and the stack structure change.
8. An obfuscation and anti-reverse-engineering protection device based on ROP chains and opaque predicates, the device comprising: a code conversion module, used for acquiring source code of a target program and converting the source code into LLVM intermediate representation to obtain LLVM IR code; a code assembly module, used for extracting instruction fragments from a shared library file and the LLVM IR code and splicing the instruction sequences together according to the logic of the target program to form an ROP chain, and for replacing part of the original instructions of the LLVM IR code with the ROP chain to obtain an assembly instruction stream, which comprises analyzing the stack structure of the target program, loading the ROP chain onto the stack in reverse order, and modifying the pointer and return address on the stack so that the target program jumps to the correct instruction fragment each time a ret instruction is executed; a code obfuscation module, used for constructing an opaque predicate and inserting the opaque predicate into a program execution path of the assembly instruction stream to obtain obfuscated code, wherein the opaque predicate is a Boolean expression comprising a nested structure of context judgment logic and constant control instructions and is used to lead a dynamic symbolic execution engine to symbolize path variables incorrectly; and a code generation module, used for passing the obfuscated code to an assembler to convert it into assembly language code, and inserting scheduling instructions into the assembly language code according to the control flow of the target program to obtain final target program code, wherein the scheduling instructions are used to trigger a scheduler to reconstruct the stack structure of the ROP chain on demand when the program is executed.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the obfuscation and anti-reverse-engineering protection method based on ROP chains and opaque predicates according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the obfuscation and anti-reverse-engineering protection method based on ROP chains and opaque predicates according to any one of claims 1 to 7.
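
The gadget extraction, reverse-order stack loading, and load-time address adjustment recited in claims 3, 4 and 7 can be illustrated with a toy model. The following Python sketch is not the patented implementation. It assumes a hypothetical eight-byte "library" (`LIBRARY`), a naive byte scan for the x86 `ret` opcode in place of a real disassembler, and Python functions (`SEMANTICS`) standing in for native gadget code. All names in it are illustrative.

```python
RET = 0xC3  # x86 near-return opcode

# Hypothetical "shared library" bytes: pop rax; ret / pop rbx; ret / add rax, rbx; ret
LIBRARY = bytes.fromhex("58c3 5bc3 4801d8c3")


def find_gadgets(code: bytes, max_len: int = 4) -> dict:
    """Map each offset to the shortest byte run starting there that ends in RET.

    This is a raw byte scan (an assumption); a real tool would decode instructions.
    """
    gadgets = {}
    for start in range(len(code)):
        window = code[start:start + max_len]
        end = window.find(bytes([RET]))
        if end != -1:
            gadgets[start] = window[:end + 1]
    return gadgets


# Toy semantics for the three gadgets used below (a real chain runs native code).
def pop_rax(state, stack):
    state["rax"] = stack.pop()


def pop_rbx(state, stack):
    state["rbx"] = stack.pop()


def add_rax_rbx(state, stack):
    state["rax"] += state["rbx"]


SEMANTICS = {
    bytes.fromhex("58c3"): pop_rax,
    bytes.fromhex("5bc3"): pop_rbx,
    bytes.fromhex("4801d8c3"): add_rax_rbx,
}

# Chain in execution order: ("gadget", library offset) or ("data", literal value).
CHAIN = [("gadget", 0), ("data", 5), ("gadget", 2), ("data", 7), ("gadget", 4)]


def build_stack(chain, base):
    """Rebase gadget offsets, then lay the words out in reverse so the first is on top."""
    words = [base + value if kind == "gadget" else value for kind, value in chain]
    return list(reversed(words))  # the end of the list is the top of the stack


def run(stack, base, gadgets):
    """Emulate ret: pop the next address, jump to that gadget, repeat until empty."""
    state = {"rax": 0, "rbx": 0}
    while stack:
        target = stack.pop()
        SEMANTICS[gadgets[target - base]](state, stack)
    return state


if __name__ == "__main__":
    found = find_gadgets(LIBRARY)
    for base in (0x400000, 0x7F0000001000):  # two hypothetical load addresses
        print(hex(base), run(build_stack(CHAIN, base), base, found)["rax"])
```

In this model the stack contents differ for each load address, because every gadget word is rebased, while the computed value stays the same. That is the property the load-time adjustment in claim 7 relies on.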
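
Claim 5 describes an opaque predicate as a Boolean expression that is always true or always false under different input conditions. As a minimal sketch, the Python example below uses two textbook number-theoretic identities rather than the predicates of the application, and wraps a hypothetical check (`check_plain`) in them. The function names, the decoy branch, and the constants are assumptions made for illustration only.

```python
def always_true(x: int) -> bool:
    """x * (x + 1) is a product of consecutive integers, hence always even."""
    return (x * (x + 1)) % 2 == 0


def always_false(x: int) -> bool:
    """A perfect square is congruent to 0 or 1 modulo 4, never 2."""
    return (x * x) % 4 == 2


def check_plain(key: int) -> bool:
    """Hypothetical protected logic before obfuscation."""
    return key % 97 == 13


def check_obfuscated(key: int, ctx: int) -> bool:
    """Same result as check_plain, with opaque predicates guarding decoy paths.

    ctx stands for some input-dependent context value; it never changes the result.
    """
    if always_true(ctx):
        result = key % 97 == 13      # real path, always taken
    else:
        result = key % 89 == 7       # decoy path, never taken
    if always_false(ctx ^ key):
        result = not result          # dead code, never executed
    return result


if __name__ == "__main__":
    same = all(
        check_obfuscated(k, c) == check_plain(k)
        for k in range(1000)
        for c in range(-10, 10)
    )
    print("semantics preserved:", same)
```

Because the branch conditions depend on `ctx` and `key`, an analysis that does not prove the identities has to treat both branches as feasible, while the observable behavior is unchanged.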

Description

Obfuscation and anti-reverse-engineering protection method and device based on ROP chains and opaque predicates

Technical Field

The application relates to the field of data processing, and in particular to an obfuscation and anti-reverse-engineering protection method and device based on ROP chains and opaque predicates.

Background

With the intelligent evolution of static reverse engineering tools, symbolic execution engines and fuzzing frameworks in security analysis and vulnerability mining, traditional protection mechanisms (such as string encryption, variable name obfuscation and basic-block (BB) jump perturbation) have long struggled to resist reverse engineers who have semantic restoration or path discovery capability. Advanced obfuscation frameworks have become necessary to resist automated analysis techniques such as dynamic symbolic execution (DSE), control-flow recovery (Control-Flow Recovery) and symbolic constant folding (Symbolic Constant Folding). In particular, an attacker may use symbolic execution to construct constraint paths, recover jump logic, use stack tracing to locate sensitive information, or bypass control protection mechanisms through substitute algorithms. Protection at the source code or binary level is therefore required in several respects:

1. Path irreducibility: the program control flow path is irreversible to the analysis engine;
2. Constants and data cannot be materialized: the addresses, registers and constants involved in instruction operations cannot be simply decrypted or replaced;
3. Context sensitivity: control paths are hidden according to the input context state, defeating exhaustive single-path enumeration;
4. Semantic consistency under obfuscation: no obfuscating transformation may change the actual semantics between input and output.

However, with the development of reverse analysis tools, the binary programs output by standard compilers have little resistance to automated reverse engineering tools.
Tools such as IDA Pro and Binary Ninja, combined with symbolic execution engines (e.g., KLEE, angr) and path exploration tools (e.g., Driller, Mayhem), can easily restore the execution flow and core variables of a program, and their capabilities far exceed the analysis tasks that used to be done purely by hand. For this reason, program obfuscation techniques have been proposed as a countermeasure against reverse engineering and have been widely studied. Conventional obfuscation methods can be broadly divided into three categories.

The first category is methods based on syntax-level structure conversion. Such obfuscation reduces the readability of the code at the source code level using methods such as variable name substitution, dead code insertion, and conditional transformations (e.g., rewriting if (a) as if ((a && true) || false)). Representative tools include the source code transformation module in Obfuscator-LLVM and the Java class obfuscator ProGuard. Although these methods make the code harder to read to some extent, they are easily undone by AST abstraction tools and do not help against dynamic analysis.

The second category is methods based on control flow graph (CFG) scrambling, such as reordering basic blocks, inserting dummy conditional jump logic, and distorting return address calculation. Such methods try to make the control flow graph look incoherent in the target binary file, destroying the flow restoration capability of static analysis. Common techniques include Bogus Control Flow and Flattening in OLLVM. Although they can counter simple directed search, their effect is limited against control-flow inference engines with path restoration capability (e.g., CFG Rebuilder).

The third category is dynamic obfuscation mechanisms based on path conditions.
Such methods deliberately design the program logic so that different execution paths are taken under different input conditions. The logic includes opaque judgment computations that are difficult to infer, such as opaque predicates, hard-coded data transfers, and function template embedding mechanisms. Although such methods achieve high logical complexity, their computational overhead is large.

At present, the methods closest to the present invention include:

1. OLLVM (Obfuscator-LLVM). OLLVM is an obfuscation hardening module based on the LLVM compiler that supports multiple obfuscation strategies such as Bogus Control Flow, Control Flow Flattening, and Instruction Substitution. The tool can statically obfuscate the IR, but it lacks dynamic support for ROP structures and does not consider the systematic integration of opaque predicates with strategies for countering symbolic execution attacks.

2. Tigress. Tigress is an advanced obfuscation and virtualization protection system that supports multiple control flow transformations and function virtual machine substitution. The method mainly relies on function extraction and run-time offsets to execute code logic, has high cost, is easy to focus f