CN-121997337-A - Deserialization gadget chain detection method based on static analysis and a large model
Abstract
The invention discloses a deserialization gadget chain detection method based on static analysis and a large model. The method comprises: obtaining a deserialization entry in a Java application to be detected; identifying all reachable paths from the deserialization entry to dangerous function call points in the Java application and recording these paths as gadget chains; analyzing the data dependencies of each gadget chain to construct AMethod data structures; inferring an object structure graph based on the AMethod data structures; and using an LLM as a "mini JVM" to perform partial symbolic execution and semantic verification on the gadget chain in combination with the inferred object structure graph. The invention can accurately identify structurally complete and semantically feasible Java deserialization gadget chains in large code bases, and effectively addresses the problems of path explosion, false positives, and false negatives.
Inventors
- ZOU SHIHONG
- XUE AN
- LU YUEMING
- SHI JINQIAO
Assignees
- Beijing University of Posts and Telecommunications (北京邮电大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260123
Claims (10)
- 1. A deserialization gadget chain detection method based on static analysis and a large model, characterized by comprising the following steps: acquiring a deserialization entry in a Java application to be detected; identifying all reachable paths from the deserialization entry to dangerous function call points in the Java application, and recording the paths as gadget chains; analyzing the data dependencies of the gadget chain to construct AMethod data structures; inferring an object structure graph based on the AMethod data structures; and using an LLM as a "mini JVM" to perform partial symbolic execution and semantic verification on the gadget chain in combination with the inferred object structure graph.
- 2. The method of claim 1, wherein identifying all reachable paths from the deserialization entry to the dangerous function call points in the Java application specifically comprises: converting the bytecode of the Java application to be detected into an intermediate representation using a static analysis framework, and constructing a call graph based on the intermediate representation; marking the deserialization entry in the call graph as Source, and marking all possible dangerous function call points in the call graph as Sink; and, based on the marked call graph, performing hierarchical field-aware P/Taint analysis to obtain the propagation paths of tainted data from Source to Sink through the program statements.
- 3. The method of claim 2, wherein the hierarchical field-aware P/Taint analysis specifically comprises: taint splitting: identifying polymorphic call sites in the Java application to be detected, and acquiring the concrete points-to set of the receiver, wherein the concrete points-to set is obtained through static analysis or dynamic analysis; for each concrete class in the concrete points-to set, checking whether the concrete class satisfies the serialization constraints, creating an independent taint abstract state for each concrete class that satisfies the serialization constraints, and performing taint propagation within that state; and taint propagation analysis: monitoring method calls on a tainted object, and identifying the accesses those methods make to the object's fields; for each write to a field within a method, checking whether the object field satisfies a preset specific propagation rule; and propagating the taint to the fields of the object that satisfy the preset specific propagation rule, thereby establishing a taint propagation chain.
- 4. The method of claim 3, wherein satisfying the preset specific propagation rule comprises satisfying either of the following: condition one, in which the field type is not a primitive type and the field is not marked transient; or condition two, in which the field is marked transient but is assigned in readObject.
- 5. The method of claim 3, wherein the hierarchical field-aware P/Taint analysis is optimized by applying pruning strategies, the pruning strategies comprising: class-level pruning: filtering out, during taint splitting, concrete classes that do not meet the serialization requirements, wherein the filtering criteria comprise excluding concrete classes that cannot be instantiated, excluding concrete classes that do not implement the java.io.Serializable interface, and excluding primitive data types and arrays of primitive types; and absolute-error pruning: removing, during the search for propagation paths, paths that exhibit preset invalid behavior patterns, including removing paths that contain meaningless generic methods, removing paths with redundant recursion or repeated calls, and removing paths that violate predefined strong constraints.
- 6. The method of claim 2, wherein analyzing the data dependencies of the gadget chain, constructing the AMethod data structures, and inferring the object structure graph based on the AMethod data structures specifically comprises: for each method call site in the gadget chain, analyzing the definition positions and sources of the receiver object and the argument objects within the containing method to obtain an AMethod data structure; linking the plurality of AMethod data structures in the gadget chain in call order to form an associated method path, AMethod Path; and, based on the associated method path AMethod Path, using a backward traversal algorithm to build the object structure graph step by step in the direction from the dangerous function call point Sink to the deserialization entry point Source.
- 7. The method of claim 6, wherein building the object structure graph step by step in the direction from the dangerous function call point Sink to the deserialization entry point Source based on the associated method path AMethod Path specifically comprises: initializing the object structure graph, with the Source object as the root node; traversing each method in the associated method path AMethod Path, and determining the relationship between the declaring class of the current method and that of the previous method; if the declaring class of the current method is the same as that of the previous method, performing intra-object traversal, namely adding a new field node under the current object node when the association type is this; if the declaring class of the current method differs from that of the previous method, performing inter-object traversal, namely identifying, according to the parameter association relationship, the hidden objects that must exist for the method call condition to be satisfied, and instantiating new object nodes or field nodes in the object structure graph; and, once the entire associated method path AMethod Path has been traversed, generating the complete object structure graph of the deserialization gadget chain.
- 8. The method of claim 7, wherein inference-stage pruning is performed during construction of the object structure graph: if the source of the receiver at a call site is unknown and the receiver is of a non-critical class, the gadget chain corresponding to that call site is determined to be invalid and is pruned.
- 9. The method of claim 1, wherein using the LLM as a "mini JVM" to perform partial symbolic execution and semantic verification on the gadget chain in combination with the inferred object structure graph comprises: precise code context extraction: using an abstract syntax tree parser, extracting, for each concrete class involved in the gadget chain, only the code fragments relevant to execution of the gadget chain to generate a target code context, wherein the target code context comprises the source code of the methods participating in chain execution, the relevant field descriptions, and the class constructors; and simulated execution of the gadget chain: inputting the target code context, the object structure graph, and the gadget chain into the LLM, so that the LLM performs simulated execution in accordance with a chain-of-thought protocol, wherein the gadget chain is marked as "available" and a corresponding verification trace is output only when the LLM's simulated execution successfully reaches the Sink point with no logical interruption.
- 10. The method of claim 9, wherein, during simulated execution by the LLM, the analysis performed for each method call in the chain comprises: context identification: determining the object instance and the method currently being executed; path analysis: analyzing, based on the source code, the path-sensitive constraints required by the next method call; state verification: checking against the object structure graph whether the object types, field values, and reference relationships defined therein satisfy the path-sensitive constraints, the state verification passing if they do; and state transition decision: outputting "Success" and proceeding to the next method if the state verification passes, and outputting "Failure" and terminating the verification if it does not.
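The field propagation rule of claim 4 and the class-level pruning of claim 5 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `FieldInfo` and `ClassInfo` records, the function names, and the `[]` suffix convention for array types are hypothetical stand-ins for facts that a static analysis framework would extract from bytecode, and `interfaces` is assumed to already include inherited interfaces.

```python
from dataclasses import dataclass

# Java primitive type names; arrays are written with a "[]" suffix in this sketch.
PRIMITIVES = frozenset(
    {"boolean", "byte", "char", "short", "int", "long", "float", "double"}
)
SERIALIZABLE = "java.io.Serializable"


@dataclass(frozen=True)
class FieldInfo:  # hypothetical record, not named in the claims
    name: str
    type_name: str
    is_transient: bool = False
    assigned_in_read_object: bool = False


@dataclass(frozen=True)
class ClassInfo:  # hypothetical record, not named in the claims
    name: str
    is_instantiable: bool = True  # False for interfaces and abstract classes
    interfaces: frozenset = frozenset()  # assumed to include inherited interfaces


def field_propagates_taint(f: FieldInfo) -> bool:
    """Claim 4: taint reaches the field if condition one or condition two holds."""
    condition_one = f.type_name not in PRIMITIVES and not f.is_transient
    condition_two = f.is_transient and f.assigned_in_read_object
    return condition_one or condition_two


def is_primitive_or_primitive_array(type_name: str) -> bool:
    """Strip array suffixes and test whether the element type is primitive."""
    base = type_name
    while base.endswith("[]"):
        base = base[:-2]
    return base in PRIMITIVES


def survives_class_level_pruning(c: ClassInfo) -> bool:
    """Claim 5, class-level pruning: apply the three exclusion criteria."""
    if is_primitive_or_primitive_array(c.name):
        return False
    if not c.is_instantiable:
        return False
    return SERIALIZABLE in c.interfaces


def split_taint(points_to_set):
    """Keep the concrete receiver classes that would get their own taint state."""
    return [c for c in points_to_set if survives_class_level_pruning(c)]
```

In this sketch, `split_taint` would be applied to the concrete points-to set of a polymorphic receiver, and `field_propagates_taint` to each field written by a method invoked on a tainted object.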
Description
Deserialization gadget chain detection method based on static analysis and a large model
Technical Field
The invention relates to the technical field of network security and software vulnerability detection, and in particular to a deserialization gadget chain detection method based on static analysis and a large model.
Background
Java serialization and deserialization mechanisms play a key role in inter-process communication (IPC), network data transfer, and persistent storage. However, unsafe deserialization can lead to serious security problems: by constructing a malicious serialized object (payload), an attacker triggers a series of method calls (i.e., a gadget chain) during deserialization and ultimately performs dangerous operations (e.g., remote code execution or denial of service). To detect such vulnerabilities, the industry currently relies mainly on the following types of automated detection techniques:
Pure static taint analysis, represented by Gadget Inspector. This approach tracks the tainted serialized data flow through static taint analysis and uses breadth-first search (BFS) to find call paths from a deserialization entry (Source) to a sensitive function (Sink).
Static analysis combined with dynamic fuzzing, represented by ODDFuzz, serHybrid, and JDD. These approaches try to combine the coverage of static analysis with the verification capability of dynamic testing. For example, serHybrid builds a call graph using pointer analysis; ODDFuzz uses class hierarchy analysis (CHA) to generate potential chains and verifies them by fuzzing; and JDD uses bottom-up path search, models the payload with an IOCD data structure, and finally verifies availability through fuzzing.
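The breadth-first Source-to-Sink search used by purely static tools can be sketched as follows. This is a minimal illustration of the general technique, not the actual implementation of Gadget Inspector or of any other tool named above; the function name `find_chains`, the `max_depth` bound, and the dictionary-based call graph are assumptions made for the example.

```python
from collections import deque


def find_chains(call_graph, sources, sinks, max_depth=8):
    """Enumerate call paths from any source to any sink by breadth-first search.

    call_graph maps a method name to the method names it may call.
    max_depth bounds the number of methods in a path (an assumed cutoff).
    """
    chains = []
    queue = deque([src] for src in sources)  # each queue entry is a partial path
    while queue:
        path = queue.popleft()
        last = path[-1]
        if last in sinks:
            chains.append(path)  # reached a dangerous call point
            continue
        if len(path) >= max_depth:
            continue  # give up on overly long candidates
        for callee in call_graph.get(last, ()):
            if callee not in path:  # skip recursive cycles
                queue.append(path + [callee])
    return chains


# Toy call graph with placeholder method names (hypothetical, for illustration).
toy_graph = {
    "A.readObject": ["B.hashCode", "C.toString"],
    "B.hashCode": ["D.invoke"],
    "C.toString": ["B.hashCode", "A.readObject"],
    "D.invoke": ["Sink.exec"],
}
toy_chains = find_chains(toy_graph, ["A.readObject"], {"Sink.exec"})
```

On the toy graph, `toy_chains` holds two candidate chains, one through `B.hashCode` directly and one that first passes through `C.toString`. Every extra callee at a call site multiplies the number of partial paths in the queue, which is why an unpruned search of this kind grows so quickly on real call graphs.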
Although the above prior art can discover vulnerabilities to some extent, significant drawbacks remain when dealing with complex Java features and deep logic constraints:
(1) Path explosion and false positives caused by polymorphic dispatch (Path Candidate Explosion): Existing static analysis tools (e.g., CHA-based tools) often adopt coarse-grained analysis strategies when handling Java polymorphism. When a polymorphic call site is encountered, the tool blindly brings every possible subclass implementation into the analysis scope. Because these tools lack domain-specific pruning strategies for serialization scenarios (e.g., they do not account for the fact that fields declared with the transient keyword are not serialized, and they do not filter out non-Serializable classes), such full enumeration causes an exponential explosion of analysis paths. This not only produces a huge number of invalid candidate chains and wastes considerable computing resources, but also yields a very high false positive rate, so that the analysis results contain many paths that are unusable in practice.
(2) Linear call graph analysis cannot identify "hidden objects" (Incomplete Payload Modeling): Existing analysis methods (e.g., ODDFuzz) rely primarily on tracking explicit method call paths to identify objects. However, constructing a complex payload often involves nonlinear object dependencies. For example, some objects do not appear directly as the receiver or the arguments of a method call but exist as nested fields solely to satisfy certain state conditions (e.g., in a HashMap attack chain, an auxiliary object must be constructed to trigger a hash collision). Existing linear call graph analysis cannot infer the necessity of these "hidden objects", so the generated payload is structurally incomplete and cannot pass verification.
(3) High false negatives in dynamic verification caused by runtime "binary rigidity" (Binary Rigidity): When fuzzing is used for dynamic verification (e.g., JDD), tools typically apply random mutations or simple rule-based mutations to object fields. However, the Java runtime environment (JVM) is extremely strict (i.e., it exhibits "binary rigidity"): any minor serialization format error or type mismatch in an irrelevant field can cause the JVM to throw an exception (e.g., StreamCorruptedException) and terminate the deserialization process. This means that even if the core logic of the exploit is correct, verification fails as long as the payload is flawed in a non-critical part. In addition, existing fuzzers have difficulty inferring complex path-sensitive constraints (e.g., hash collisions on specific field values, or conditional branch logic in if statements); blind mutation can rarely satisfy these deep semantic requirements, so a large number of genuinely usable gadget chains are wrongly discarded (false negatives).
(4) Insufficient pointer analysis support for reflection and dynamic features: Some tools (e.g., serHybrid) often have difficulty accurately handling the reflection mechanism and dynamic proxy properties of Jav