Search

CN-122020650-A - Demand-driven reusable binary pointer analysis method

CN122020650ACN 122020650 ACN122020650 ACN 122020650ACN-122020650-A

Abstract

The invention discloses a demand-driven reusable binary pointer analysis method which is used for determining a pointing set and aliases of given target pointer variables, and comprises the steps of S1, restoring pointer types in a process through a variable using mode, extracting instructions related to pointer calculation, S2, carrying out abstract interpretation in the process based on pointer representation and an execution result of modeling instructions, obtaining pointer representation sets of all pointer variables, storing the results into an intermediate result library, S3, determining functions to be analyzed according to the demands based on the results of in-process analysis, obtaining different pointer representations of the target pointer variables in different functions, and triggering in-process analysis of step S1 and step S2 on a function if the function is not analyzed. The method realizes context sensitive strategy and stream sensitive strategy, realizes balance of analysis efficiency and accuracy, and can efficiently perform on-demand pointer analysis among processes.

Inventors

  • WANG YU
  • Zhu Wentai
  • WANG LINZHANG
  • LI XUANDONG

Assignees

  • 南京大学

Dates

Publication Date
20260512
Application Date
20260119

Claims (7)

  1. 1. A demand driven reusable binary pointer analysis method, the method comprising the steps of: S1, recovering the pointer type in the process by using a variable, and extracting instructions related to pointer calculation; S2, based on pointer representation, modeling an execution result of the instruction, performing abstract interpretation in a process to obtain pointer representation sets of all pointer variables, and storing the result into an intermediate result library; and S3, determining a function to be analyzed according to the requirement based on the result of in-process analysis, obtaining different pointer representations of the target pointer variable in different functions, and triggering in-process analysis of the step S1 and the step S2 for a certain function if the function is not analyzed.
  2. 2. The demand driven reusable binary pointer analysis method according to claim 1, wherein the step S1 specifically comprises the steps of: S11, initial classification, namely traversing all instructions of a current function, marking variables representing addresses of CALLIND instructions, BRANCHIND instructions, STORE instructions and LOAD instructions as pointer types, adding a pointer variable set, adding constants in a legal address space of a program into the pointer variable set, performing conservative judgment on real parameters and actual return values, marking the real parameters as pointer types, adding the pointer variable set if the real parameters pass judgment, marking the real parameters as non-pointer types, and adding the non-pointer variable set if the real parameters do not pass judgment; S12, backward propagation, namely iterating through all INT_ ADD, LOAD, COPY, MULTIEQUAL instructions of the current function, judging an input operand if the output operand of one instruction is of a pointer type, marking the pointer type if the judgment is passed, adding a pointer variable set, otherwise marking the pointer type as a non-pointer type, adding a non-pointer variable set, and terminating iteration when the size of the pointer variable set is not increased compared with that of the previous iteration after iteration; S13, forward disambiguation, namely iterating through all COPY and INT_ADD instructions of the current function, marking an output operand as a non-pointer type if the input operand is of a non-pointer type, and removing the output operand from a pointer variable set; s14, collecting pointer calculation related instructions, namely traversing all INT_ ADD, LOAD, COPY, MULTIEQUAL instructions of the current function, and adding the instructions into a pointer calculation related instruction set if the output operand of the instructions is of a pointer type.
  3. 3. The demand driven reusable binary pointer analysis method according to claim 2, wherein the method derives pointer types based on the manner of use of variables, and high precision pointer type recovery is achieved by combining backward propagation and forward disambiguation.
  4. 4. The demand driven reusable binary pointer analysis method according to claim 1, wherein the step S2 specifically comprises the steps of: S21, initializing a pointer expression set, namely traversing the pointer variable set obtained in the step S1, if the pointer variable is a stack pointer SP, a shape parameter of a function and a return value of the function form, setting the pointer expression set to contain unique SymbolPR, and if the pointer variable is a constant, setting the pointer expression set to contain unique ConstantPR; S22, abstracting in the process, namely, iterating all COPY, MULTIEQUAL, LOAD, INT _ADD instructions of the traversing function, setting a pointer representation set of an output operand as a pointer representation set of an input operand for a COPY instruction, setting the pointer representation set of the output operand as a union of the pointer representation sets of the input operand for a MULTIEQUAL instruction, setting the pointer representation set of the output operand as DerefPR obtained by dereferencing operation of all pointer representations of the input operand for a LOAD instruction, modeling offset based on def-use chain first for the INT_ADD instruction, setting the pointer representation set of the output operand as PlusPR obtained by +offset operation of all pointer representations of the input operand, and terminating iteration when the pointer representation set of all pointer variables is unchanged from the previous iteration after iteration; s23, analyzing the scope of the STORE instruction, namely, for each STORE instruction, traversing all instructions according to the program control flow from the program position of the STORE instruction, if another STORE instruction is encountered and the pointer representation set of the pointer variable corresponding to the STORE instruction is equal to the pointer representation set of the pointer variable of the starting STORE instruction, the initial STORE instruction is kill, and if a LOAD instruction is encountered, collecting the initial STORE instruction; S24, STORE semantics are propagated, namely, all STORE instructions in a program are iterated and traversed, the instruction semantics are assumed to be p=q, wherein a pointer corresponding to p is denoted as pr1, a pointer corresponding to q is denoted as pr2, pointer representations of pointer variables in the action domain of the STORE instructions are traversed, if a base such as DerefPR (pr 1) is contained, the base is replaced by pr2, new pointer variables obtained after replacement are added into a pointer representation set, and iteration is terminated when the pointer representation set of all pointer variables is unchanged from that of the previous iteration after iteration.
  5. 5. The demand driven reusable binary pointer analysis method of claim 4, wherein the method improves flow sensitivity with respect to memory variables by performing in-process pointer analysis based on pointer representation abstract domain and dead point iterations, performing scope analysis on STORE instructions, putting STORE instructions to final unified processing, and defining STORE instruction processing rules based on substitution.
  6. 6. The demand driven reusable binary pointer analysis method according to claim 1, wherein the step S3 specifically comprises the steps of: S31, initializing a search queue, namely, assuming that a target pointer variable is positioned in a function foo, adding a (foo, pr) tuple into the search queue for all pointer representations pr of the target pointer variable; S32, traversing a search queue, namely traversing each tuple in the queue, and sequentially processing each tuple in the queue until the queue is empty; S33, determining a function to be analyzed according to the need, wherein for a tuple (foo, pr), the function to be analyzed is determined according to the bottommost base of pr, if the bottommost base of pr is a base of the foo function or a certain actual return value pointer of the foo function represents the base of pr, the function calling the foo function needs to be analyzed, if the bottommost base of pr is a return value in the form of the foo function or a pointer of a certain real parameter of the foo function represents the base of pr, the called function corresponding to a CALL instruction needs to be analyzed, and if the bottommost base of pr is ConstantPR, other functions referring to the constant need to be analyzed; S34, acquiring in-process analysis results of the function to be analyzed determined in the step S33 from an intermediate result library, and triggering in-process analysis if the in-process analysis results do not exist; S35, obtaining a new pointer representation of the target pointer variable by replacing, for a tuple (foo, pr), the new pointer representation including replacing the bottommost base of pr with a pointer representation of a corresponding real parameter if the bottommost base of pr is a form parameter of the foo function, replacing the bottommost base of pr with a pointer representation of a corresponding real return value if the bottommost base of pr is a form return value of the foo function, replacing the portion of base of pr with a pointer representation of a corresponding form parameter in the called function if the pointer representation of a certain real parameter of the foo function is a base of pr, replacing the portion of base of pr with a pointer representation of a corresponding form parameter in the called function if the pointer representation of a certain actual return value of the foo function is a base of pr, and not replacing the portion of base of pr with a pointer representation of a corresponding form return value in the called function if the bottommost base of pr is ConstantPR; and S36, expanding a search queue, namely binding the function to be analyzed determined in the step S33 with the new pointer representation obtained in the step S35 to form a new tuple and adding the new tuple into the search queue.
  7. 7. The demand driven reusable binary pointer analysis method according to claim 6, wherein the method determines the function to be analyzed on demand based on the base relation of pointer representation and supports multi-layer function call through traversal algorithm based on breadth-first search.

Description

Demand-driven reusable binary pointer analysis method Technical Field The invention relates to the field of static program analysis and binary analysis, in particular to a demand-driven reusable binary pointer analysis method. Background Pointer analysis is a key basic technology in binary program analysis, and is mainly used for determining a set of memory objects possibly pointed by register variables and memory variables in a program in different program positions, and is used for recovering more complete control flow and data flow, and is an important precondition for tasks such as control flow recovery, stain analysis, vulnerability mining and the like. Many downstream tasks rely on-demand pointer analysis. The efficiency of pointer analysis is important because it is frequently invoked by downstream analysis tasks. Therefore, the pointer analysis between the processes driven by one requirement is realized, and the obtained analysis results are effectively multiplexed, so that the overall efficiency of the pointer analysis is obviously improved on the premise of ensuring higher analysis precision, and the technical problem to be solved in the current binary program analysis field is urgent. The key technology used by the invention is reverse analysis and abstract interpretation. And obtaining assembly instructions, function information and reference information of the binary file through reverse analysis, and obtaining intermediate representation irrelevant to the architecture. The execution result of the modeling instruction is interpreted through abstraction to obtain a pointer representation of the pointer variable. The invention discloses a demand-driven reusable binary pointer analysis method. The method comprises the steps of firstly restoring pointer types in a process through a variable using mode, extracting instructions related to pointer calculation, then carrying out abstract interpretation in the process based on pointer representation and modeling an execution result of the instructions, obtaining pointer representation sets of all pointer variables, storing the results in an intermediate result library, finally determining a function to be analyzed according to requirements based on the result of in-process analysis, obtaining different pointer representations of target pointer variables in different functions, and triggering in-process analysis of a function if the function is not analyzed. Disclosure of Invention In order to solve the problems in the prior art, the invention provides a demand-driven reusable binary pointer analysis method, aims to effectively multiplex the obtained analysis results through demand-driven analysis and ensure higher analysis precision, and remarkably improves the overall efficiency of pointer analysis. A demand driven reusable binary pointer analysis method, the method comprising the steps of: S1, recovering the pointer type in the process by using a variable, and extracting instructions related to pointer calculation; S2, based on pointer representation, modeling an execution result of the instruction, performing abstract interpretation in a process to obtain pointer representation sets of all pointer variables, and storing the result into an intermediate result library; and S3, determining a function to be analyzed according to the requirement based on the result of in-process analysis, obtaining different pointer representations of the target pointer variable in different functions, and triggering in-process analysis of the step S1 and the step S2 for a certain function if the function is not analyzed. Further, the step S1 specifically includes the following steps: S11, initial classification, namely traversing all instructions of a current function, marking variables representing addresses of CALLIND instructions, BRANCHIND instructions, STORE instructions and LOAD instructions as pointer types, adding a pointer variable set, adding constants in a legal address space of a program into the pointer variable set, performing conservative judgment on real parameters and actual return values, marking the real parameters as pointer types, adding the pointer variable set if the real parameters pass judgment, marking the real parameters as non-pointer types, and adding the non-pointer variable set if the real parameters do not pass judgment; S12, backward propagation, namely iterating through all INT_ ADD, LOAD, COPY, MULTIEQUAL instructions of the current function, judging an input operand if the output operand of one instruction is of a pointer type, marking the pointer type if the judgment is passed, adding a pointer variable set, otherwise marking the pointer type as a non-pointer type, adding a non-pointer variable set, and terminating iteration when the size of the pointer variable set is not increased compared with that of the previous iteration after iteration; S13, forward disambiguation, namely iterating through all COPY and INT_ADD instructions of t