CN-121984731-A - Low-interference taint path restoration system and method based on static instrumentation


Abstract

The invention discloses a low-interference taint path restoration system and method based on static instrumentation, which mainly address three problems in existing vulnerability mining and security assessment: rigid analysis strategies, high runtime overhead, and results that lack source-level semantics. The system comprises a target data acquisition end and an analysis engine, which run independently and are connected through a shared memory channel. Lightweight static instrumentation is applied to the source code at the compilation stage of the target program, and a shared memory channel to the analysis engine is established. A producer-consumer asynchronous decoupling architecture strips the time-consuming taint rule management and path restoration logic out of the service main thread. Runtime data flow is captured efficiently through a lock-free ring buffer, and the independent analysis engine combines debugging information with a call stack hash algorithm to restore the source-level taint propagation path. The method obtains high-precision source-level analysis results while keeping performance loss low, and greatly improves the efficiency and flexibility of vulnerability mining.
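
As an illustration of the producer-consumer decoupling described above, the following is a minimal sketch of a single-producer, single-consumer ring buffer of call-stack records. The class name `SpscRing`, the record layout (a frame count plus up to eight 64-bit instruction pointers) and the slot count are assumptions made for this example and do not come from the patent. Python offers no memory-ordering primitives, so the sketch only models the index discipline: the producer alone advances `head`, the consumer alone advances `tail`, and neither side takes a lock. A native implementation would place the buffer in shared memory and use atomic loads and stores with release/acquire ordering for the two indices.

```python
import struct


class SpscRing:
    """Fixed-size ring of call-stack records: one producer, one consumer."""

    MAX_FRAMES = 8                      # assumed per-record frame limit
    SLOT = struct.Struct("<I8Q")        # frame count + 8 instruction pointers

    def __init__(self, slots=4):
        if slots <= 0 or (slots & (slots - 1)) != 0:
            raise ValueError("slots must be a power of two")
        self.mask = slots - 1
        self.buf = bytearray(slots * self.SLOT.size)
        self.head = 0   # next slot to write; advanced only by the producer
        self.tail = 0   # next slot to read; advanced only by the consumer

    def push(self, ips):
        """Producer side: store one record, or return False if the ring is full."""
        if self.head - self.tail > self.mask:
            return False
        frames = list(ips[: self.MAX_FRAMES])
        padded = frames + [0] * (self.MAX_FRAMES - len(frames))
        offset = (self.head & self.mask) * self.SLOT.size
        self.SLOT.pack_into(self.buf, offset, len(frames), *padded)
        self.head += 1                  # publish only after the slot is written
        return True

    def pop(self):
        """Consumer side: return the oldest record, or None if the ring is empty."""
        if self.tail == self.head:
            return None
        offset = (self.tail & self.mask) * self.SLOT.size
        count, *ips = self.SLOT.unpack_from(self.buf, offset)
        self.tail += 1                  # release the slot back to the producer
        return ips[:count]


ring = SpscRing(slots=4)
# Five pushes into a four-slot ring: the last one is rejected instead of blocking.
results = [ring.push([0x401000 + i, 0x402000]) for i in range(5)]
first = ring.pop()
print(results, [hex(ip) for ip in first])
```

Rejecting a record when the ring is full, rather than waiting, is one way to keep the instrumented program from stalling on the analysis engine; whether the patented system drops, overwrites or back-pressures in that case is not stated in the text.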

Inventors

YANG CHAO
ZHENG HAOCHEN
FENG PENGBIN

Assignees

Xidian University (西安电子科技大学)

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-23

Claims (10)

1. A low-interference taint path restoration system based on static instrumentation, characterized by comprising a target data acquisition end and an analysis engine, wherein the target data acquisition end and the analysis engine run independently and are connected through a shared memory channel; the target data acquisition end consists of a compile-time instrumentation unit, a taint rule marking unit and a taint trace recording unit, wherein after the compile-time instrumentation unit instruments the test object with the taint propagation policy, the taint rule marking unit marks the test input as tainted data, and the taint trace recording unit then extracts the taint propagation trace, forms a call stack and sends it to the analysis engine; the analysis engine consists of a taint rule management unit and a path restoration and compression unit, wherein the taint rule management unit is used for generating and issuing a taint rule set and maintaining the taint rule set of the target data acquisition end, and the path restoration and compression unit is used for reading data from the shared memory channel, resolving it in combination with debugging information, and finally generating a data report.
2. The system of claim 1, wherein the compile-time instrumentation unit is the core building module of the target data acquisition end and is configured to inject, at the compilation stage of the test object and by means of compiler instrumentation, the taint propagation policy execution instructions and the taint trace recording instructions into the binary file, and to link a runtime support component responsible for executing the taint rule marking logic into the binary file so as to build the taint rule marking unit.
3. The system of claim 2, wherein the taint rule marking unit is integrated at the data entry of the test object and is configured to receive taint rules from the shared memory channel and, according to the rules, to perform real-time matching and taint label allocation on test input entering the system, the test input comprising network messages and file streams.
4. The system of claim 1, wherein the taint trace recording unit is distributed across key instruction nodes of the test object, the key instruction nodes comprising memory read/write operations, data comparison operations and memory block transfer operations, and is used for monitoring the flow of taint-labelled data while the test object runs and, when a propagation event is triggered, capturing the current execution context, namely the call stack information.
5. The system of claim 1, wherein the path restoration and compression unit in the analysis engine comprises a cache query module, an ELF debugging information loading module, an address resolution module and a path compression module, wherein the cache query module is used for reading the raw call stack sequence from the ring buffer and querying whether a resolution result for the call stack address to be resolved already exists in the cache; the ELF debugging information loading module is used for loading the .debug_info section and the .debug_line section of the ELF binary file corresponding to the test object; the address resolution module is used for traversing the .debug_info section to obtain the function base address and calculate the absolute PC address, and then locating the corresponding source file name and line number through the .debug_line section; the path compression module is used for calculating a hash signature for the restored call stack and generating or folding path nodes according to changes in the signature; and the generated data report contains the source-level mapping of the taint propagation call stacks and the macroscopic path of taint propagation.
6. A method of implementing taint path restoration based on the system of claim 1, comprising the steps of: (1) at the target data acquisition end, performing static analysis on the source code of the test object with the compile-time instrumentation unit, automatically injecting the taint propagation policy execution instructions and the taint trace recording instructions into the binary file, and linking a runtime support component responsible for executing the taint rule marking logic into the binary file to build the taint rule marking unit, thereby generating a binary executable file that retains the .eh_frame section and DWARF debugging information and for which frame pointer omission optimization can be enabled; (2) starting the analysis engine, initializing the taint rule management unit and the path restoration and compression unit and creating the shared memory area, then loading and running the instrumented test object and establishing a communication connection with the analysis engine; (3) an analyst configuring a taint rule set through the taint rule management unit according to the test requirements, which comprises generating the taint rule set by the taint rule management unit and dynamically issuing it to the taint rule marking unit through the shared memory; (4) the target data acquisition end executing the service logic of the test object, and, when the shadow memory changes along with the instruction flow, the taint trace recording unit triggering the stack unwinding logic to obtain the current instruction pointer sequence and write it into the shared memory; (5) the path restoration and compression unit continuously taking data out of the shared memory, completing source-level location in combination with the standardized DWARF debugging information, executing the path compression algorithm in real time, filtering redundant information, and finally outputting a macroscopic attack path or privacy leak chain; (6) the system generating an analysis report containing the source-level location information and the macroscopic attack path or privacy leak chain, whereupon the taint path restoration process is complete.
7. The method of claim 6, wherein configuring the taint rule set through the taint rule management unit in step (3) is implemented as follows: (3.1) the taint rule management unit responds to the analyst's interaction request, generates a taint rule set supporting full marking, offset marking and signature matching, and writes the rules, after serialization, into the control channel of the shared memory; (3.2) the taint rule marking unit continuously monitors the shared memory channel and, when it detects that a new rule has been written, adopts a read-copy-update mechanism: without blocking the service threads, it allocates new memory, constructs a copy of the rule linked list, and switches the global rule pointer through an atomic operation, completing a lock-free policy update; (3.3) when external test input enters the input entry of the test object, the taint rule marking unit reads the currently effective global rule linked list in a lock-free manner and scans and matches the input data in real time, and, when a match succeeds, sets the shadow bytes of the corresponding region to the tainted state by operating on the shadow memory, thereby completing the dynamic injection of taint.
8. The method of claim 6, wherein the shadow memory in step (4) employs a bitmask-based memory mapping mechanism, comprising: (4a1) in the system initialization stage, dividing the virtual address space into an application memory area and a corresponding shadow memory area, the shadow memory area being used for storing taint label data; (4a2) the taint trace recording unit mapping any application memory address to the corresponding shadow memory address by a bitmask operation; (4a3) during execution of the test object, when a data movement instruction or a computation instruction is detected, the taint trace recording unit automatically checking the shadow memory state of the source operand and, if the source operand is tainted, propagating the taint label to the shadow memory of the destination operand, so that the taint state is updated automatically along with the data flow.
9. The method of claim 8, wherein in step (4) the taint trace recording unit triggers the stack unwinding logic to obtain the current instruction pointer sequence and write it into the shared memory, comprising the steps of: (4b1) when the shadow memory changes while the test object is running, the taint trace recording unit immediately reads the .eh_frame section mapped in memory and parses the call frame information, iteratively looking up the frame description entry (FDE) indexed by the current program counter and calculating the canonical frame address (CFA) and the return address of the previous frame until the bottom of the stack is reached; (4b2) the taint trace recording unit encapsulates the captured raw call stack sequence into a structure and writes the structure into the ring buffer of the shared memory in a lock-free manner.
10. The method of claim 9, wherein the path restoration and compression unit in step (5) performs source-level location and path compression according to the following steps: (5.1) the path restoration and compression unit, acting as the consumer, reads raw call stack sequences in batches from the ring buffer and queries the resolution result cache for a record of the call stack address currently to be resolved; if a record exists, it is read directly and step (5.5) is executed; otherwise step (5.2) is executed to resolve the address; (5.2) for resolution, the path restoration and compression unit opens the ELF binary file corresponding to the test object, loads its internal debugging information sections, and reads the .debug_info section containing the debugging information entries and the .debug_line section containing the line number matrix; (5.3) for each address to be resolved, the .debug_info section is traversed to find the subprogram tag and match the function name, the function base address is extracted, and the function base address and the relative offset are added to calculate the absolute PC address; (5.4) the line number matrix of the .debug_line section is queried with the absolute PC address, the corresponding source file name and line number are located and output, source-level restoration is completed, and the resolution result is written into the cache; (5.5) the restored call stack is processed with the call stack hash algorithm to calculate the hash signature of the current call stack; if the signature is the same as that of the previous state, the call stack is folded automatically, and if the signature has changed, a new path node is generated.
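
For illustration only, the folding rule of step (5.5) in claim 10 can be sketched as follows. The claims do not name a hash function or a node layout, so the use of SHA-256, the `PathNode` fields, the helper names and the sample frames below are all assumptions made for this example. Each resolved call stack is hashed into a signature; an event whose signature equals that of the previous node is folded into it, and a changed signature opens a new path node.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PathNode:
    signature: str      # hash signature of the resolved call stack
    frames: tuple       # resolved frames, written here as "file:line"
    count: int = 1      # number of consecutive events folded into this node


def stack_signature(frames):
    """Hash a resolved call stack into a short hex signature (SHA-256 assumed)."""
    digest = hashlib.sha256()
    for frame in frames:
        digest.update(frame.encode("utf-8"))
        digest.update(b"\x00")   # separator keeps ("ab", "c") distinct from ("a", "bc")
    return digest.hexdigest()[:16]


def compress_path(stacks):
    """Fold consecutive identical call stacks; emit a node whenever the signature changes."""
    nodes = []
    for frames in stacks:
        signature = stack_signature(frames)
        if nodes and nodes[-1].signature == signature:
            nodes[-1].count += 1             # same as previous state: fold
        else:
            nodes.append(PathNode(signature, tuple(frames)))
    return nodes


# Hypothetical taint propagation events, already resolved to source locations.
events = [
    ("main.c:10", "net.c:88"),
    ("main.c:10", "net.c:88"),
    ("main.c:10", "parse.c:31"),
    ("main.c:10", "net.c:88"),
]
path = compress_path(events)
for node in path:
    print(node.count, " <- ".join(node.frames))
```

Because only the immediately preceding node is compared, a call stack that recurs after a different one produces a new node rather than being merged with its earlier occurrence, which preserves the order of the propagation path.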

Description

Low-interference taint path restoration system and method based on static instrumentation

Technical Field

The invention belongs to the technical field of computer network security and further relates to network data flow analysis and security assessment technology, in particular to a low-interference taint path restoration system and method based on static instrumentation. It can be used to assess the impact of taint data propagation during network security analysis, and applies to scenarios such as vulnerability mining, interactive analysis and automated fuzz testing enhancement for high-performance software in production environments.

Background

Dynamic taint analysis is a technique that marks untrusted data (i.e., taint) and tracks its propagation path through a program. It is widely applied to vulnerability detection, privacy leak analysis and malware analysis.

The libdft technology published at the VEE '12 conference uses tools such as Intel Pin to modify the binary instruction stream on the fly at run time, inserting propagation logic to perform the analysis. This approach is general in that no source code is required, but because instructions must be translated in real time and state maintained synchronously, its performance loss is extremely high; and because it is detached from the compilation environment, it lacks high-level language type information, making the root cause of a vulnerability hard to locate.

The PolyTracker technology published at the ISSTA 2024 conference dynamically tracks all input data through compile-time instrumentation and reduces runtime overhead. However, its policy configuration relies mainly on environment variables or static rule files and its taint definition usually adopts full marking, so it cannot meet the need to focus flexibly on specific data fragments in interactive analysis.

The ShadowReplica technology published at the CCS '13 conference aims to strip heavy analysis logic from the main thread by means of a multi-core architecture. This improves throughput to some extent but introduces large inter-process data synchronization overhead, readily produces massive logs on high-frequency execution paths, and lacks the ability to abstract the microscopic instruction stream into a macroscopic path.

Traditional binary-instrumentation-based taint analysis performs complex logic judgment, taint propagation and logging synchronously in-process, which severely slows down the target program. In time-sensitive scenarios (such as high-concurrency services or real-time control systems), this delay can lead to program timeouts, disconnections or abnormal behavior, rendering the analysis unusable.

Existing high-performance source-level instrumentation schemes usually fix the taint sources and sinks at compile time; once the analysis rules need to be modified, the analyst must recompile or restart the target program, which is extremely inefficient for long-running services or interactive analysis scenarios that require frequent policy adjustment. In a production environment, when a taint propagation event triggers recording, traditional debugging APIs cannot obtain a correct call stack. Resolving debugging symbols in-process consumes a large amount of CPU and memory and destabilizes the target process, whereas recording only binary addresses makes it hard for developers to map analysis results back to source code for repair.

It follows that, as software size and complexity increase, how to obtain high-precision source-level analysis results while maintaining low performance loss has become a major challenge in this field.
Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a low-interference taint path restoration system and method based on static instrumentation, to solve the problems of rigid analysis strategies, high runtime overhead and loss of result semantics that production-grade software faces during vulnerability mining and security assessment. First, lightweight static instrumentation is applied to the source code at the compilation stage of the target program, pre-embedding policy execution hooks and shadow memory operation logic and establishing a shared memory channel to the analysis engine; a producer-consumer asynchronous decoupling architecture strips the time-consuming taint rule management and path restoration logic out of the service main thread; a lock-free buffer is then used to capture runtime data flow efficiently, and debugging information and a call stack hash algorithm are combined in the independent analysis engine to asynchronously restore an