CN-120872810-B - Fuzzy test optimization method and device based on corresponding relation between input and context
Abstract
The embodiment of the disclosure provides a fuzzy test optimization method and device based on a corresponding relation between input and context, wherein the method comprises the steps of carrying out fuzzy test on a target program based on kAFL, hooking all comparison instructions and recording parameters in the comparison instructions, carrying out mutation on the parameters in the comparison instructions to generate different mutation values, identifying a part related to magic numbers every time the comparison operation is encountered in the execution of the target program, replacing an original input value with the mutation value to generate a new test case, identifying all check detection instructions and recording check information, modifying the check detection instructions to be true all the time to generate a preparation queue, and verifying and repairing the input in the preparation queue to determine whether the input is put into a real queue or not according to whether the repaired input successfully triggers new coverage or exposes new loopholes.
Inventors
- ZHANG PENG
- XU YAOLING
- LI RUNZE
- Zhai Yunjiao
- WANG ZHAOYANG
- YANG YINGJIE
- SHI WEIMING
- XU MINGYE
Assignees
- 中国电子科技集团公司第十五研究所
Dates
- Publication Date
- 20260512
- Application Date
- 20250623
Claims (8)
- 1. A fuzzy test optimization method based on input and context correspondence, the method comprising: Based on kAFL, performing fuzzy test on the target program, hooking all comparison instructions and recording parameters in the comparison instructions; the parameters in the comparison instruction are mutated to generate different mutation values; when the comparison operation is encountered in the execution of the target program, identifying the part related to the magic number, replacing the original input value with a variation value, and generating a new test case, wherein the method comprises the steps of adding a preset number of random bytes into input data, changing the mode of the input data, carrying out initial marking on the input data, identifying similar modes in the input data, carrying out variation on the input only when the similar modes are found on the same offset when comparing two or more versions of input, replacing the original input value with the variation value, and generating the new test case; identifying all check detection instructions and recording check information, modifying said check detection instructions to be always true, generating a ready queue, and Verifying and repairing the input in the preparation queue, and determining whether to add the input into the real queue according to whether the repaired input successfully triggers new coverage or exposes new loopholes, wherein the method comprises the steps of modifying specific fields in the input according to a mode in a modification instruction, enabling the specific fields to conform to the expected format of a target program, restoring information in a program context, enabling the input to be logically matched with the target program, executing the repaired input on the unmodified real target program, adding the input into the real queue if the repaired input can trigger new coverage, and otherwise discarding the input from the preparation queue.
- 2. The fuzzy test optimization method based on the input and context correspondence of claim 1, the fuzzy test method is characterized in that the fuzzy test of the target program based on kAFL comprises the following steps: generating a plurality of inputs and delivering the inputs to the target program; collecting coverage information fed back after the target program is executed, wherein the coverage information is from tracking data generated by Intel PT; mutating the input based on the collected coverage information, and With virtual machines provided by KVM-PT and QEMU-PT, breakpoints are inserted in the target program and the memory and register contents are checked during execution.
- 3. The fuzzy test optimization method based on the input and context correspondence of claim 2, the method is characterized in that the hooking all comparison instructions and recording parameters in the comparison instructions comprises the following steps: Hooking common comparison operations in programs, compiler automatically generated optimization instructions and function call instructions by utilizing Intel PT hardware tracking, and Parameters involved in the comparison operation, parameters transferred at the time of function call, and parameters obtained by calculating the offset of the jump table are recorded.
- 4. The fuzzy test optimization method based on the input and context correspondence of claim 1, the method is characterized in that the step of mutating the parameters in the comparison instruction to generate different mutation values comprises the following steps: The parameters extracted from the seed pool and participating in the comparison operation are added or subtracted, and various coding strategies are applied to generate different variation values, wherein the coding strategies comprise inversion, zero extension or byte order conversion, C character string conversion, ASCII conversion and Memory (n), and n represents the number of bytes.
- 5. The method of claim 4, wherein the performing a mutation on the parameters in the comparison instruction to generate different mutation values further comprises: When the called function is detected to at least contain parameters of two pointer types, extracting the first 128 bytes in the data pointed by each pointer for the input of subsequent mutation processing; Comparing the first n bytes with the designated length, randomly mutating the first n bytes, adjusting the sequence of the bytes, simulating different input conditions, generating special conditions through mutation, and checking whether to trigger abnormal behaviors; Simulating different inputs by changing the first few bytes of the string for comparison from the start position to the null character, inserting the null character if there is no null character in the input data enabling the test to cover the case of the null character as a terminator, and When testing involves string and byte array comparisons, test input is provided by dynamically generating, screening, and utilizing a specific dictionary containing consecutive non-zero bytes and non-0 xff bytes.
- 6. The fuzzy test optimization method of claim 1, wherein the identifying all check-detect instructions and recording check information, modifying the check-detect instructions to always true, generating a preliminary queue comprises: extracting all comparison operations from the program by utilizing Intel PT hardware tracking, generating a comparison operation list, and extracting variation modes in input by comparing input values of different versions; if similar variant patterns occur in multiple input versions, these patterns are identified as instructions related to verification; if both parameters of the comparison operation are variables or pointers rather than hard-coded constants, then determining that the comparison operation is a check-related instruction; Checking whether the pattern depends on the input plurality of bytes and changes in different versions, if so, identified as a check-related instruction, and The identified check detect instruction is modified to always true, causing the input to bypass these check logic into the deeper path of the program, creating a ready queue.
- 7. A fuzzy test optimization apparatus based on input and context correspondence, the apparatus comprising: at least one processor, and At least one memory storing a computer program; Wherein the computer program, when executed by the at least one processor, causes the apparatus to perform the steps of the input and context correspondence based fuzzy test optimization method of any one of claims 1 to 6.
- 8. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the fuzzy test optimization method based on input and context correspondence of any one of claims 1 to 6.
Description
Fuzzy test optimization method and device based on corresponding relation between input and context Technical Field Embodiments of the present disclosure relate to the field of fuzzy test technology, and in particular, to a fuzzy test optimization method and apparatus based on a correspondence between inputs and contexts, and a computer readable storage medium storing a computer program. Background Fuzzy test technology has become a key component in testing the quality of software systems. Over the past several years, more intelligent feedback-driven fuzzy test tools, represented by AFL, have made significant progress in both academic research and industry. Two common problems in ambiguity testing are magic number and check detection. Example codes are as follows: if(u64(input)==u64("MAGICHDR")) bug(1); if(u64(input)==sum(input+2,len-2)) if(u64(input+4)==sum(input+2,1en-2) if(input[11]=='R’&&input[12]=='Q’) bug(1); The above code indicates that the first error can be found only when the first 8 bytes entered are a specific magic header. To trigger the second error, the input must contain the string "RQ" and two correct checksums. The probability of randomly creating an input that satisfies these conditions is almost negligible. Thus, feedback driven fuzzy testing is difficult to produce new coverage and the fuzzy testing process may be stopped. The prior fuzzy test technology has the defects of 1. The lack of diversity of generating invalid effective combination is that a magic digital section is usually used for constructing legal input data, but if a fuzzy tester excessively depends on the magic digital byte, the input data can be limited to legal format, and the boundary behavior of a program when processing invalid input can not be effectively explored, so that potential errors and loopholes can be missed. 2. The randomness of the fuzzy test is reduced, the core of the fuzzy test is to generate data with high randomness to explore unknown program behaviors, and the use of magic bytes can increase the structural characteristics of the generated data, and limit randomness and flexibility. 3. The validity of using magic bytes depends on some a priori assumptions about the target program, e.g. knowing which file format the target program parses, for applying excessive assumptions about protocols and logic. If the input format of the program changes, the magic bytes may become invalid, thereby reducing the applicability of the fuzzy test. 4. The complex path coverage is insufficient, namely, the magic byte can enable the fuzzy tester to cover only paths for processing normal input, but cannot trigger complex boundary conditions or abnormal processing paths, so that hidden logic loopholes or memory security problems can be omitted. Another common challenge faced by fuzzy test tools is how to efficiently handle the case of verification. if(u64(input)==sum(input+8,len-8)) if(u64(input+8)==sum(input+8,1en-16) if(input[16]=='R’&&input[17]=='Q’) bug(2); The main problem with the above example code is the presence of the check instruction, which partially prevents the validity of the fuzzy test tool because the fuzzy test tool needs to bypass or modify these check conditions in order to trigger more paths or expose potential vulnerabilities. Existing methods, such as FLAYER, TAINTSCOPE and T-FUZZ, rely on the same idea to delete hard-coded checks before repairing them. TAINTSCOPE and T-FUZZ automatically detect critical checks, and once abnormal behavior is found, symbolic execution is used to repair the checks. The shortcomings in the inspection and detection problems mainly comprise 1. Input is difficult to reach deep logic, fuzzy test generates data in a random or light variation mode by default, but most randomly generated test inputs cannot pass verification if a tested program performs strict verification on input data at the front end. This results in limited code coverage for the tested program, which can only trigger shallow logic code, and cannot test deep paths. 2. The fuzzing efficiency is reduced-the output of the fuzzing is feedback dependent and verification failure causes a large number of generated inputs to be discarded. This can result in a large number of invalid test iterations, wasting computing resources. 3. And the loopholes before and after the check codes cannot be found, namely if the fuzzy tester is blocked in the check link, only logic related to input check can be detected, and more complex functional logic after passing the check can not be further tested, so that the subsequent loopholes are missed. 4. The complex verification mechanism increases the parsing burden that fuzzy testing is difficult to generate valid inputs when multiple nested verification logics are involved, especially in relation to dynamic keys or encryption algorithms. To address these obstacles, advanced procedural analysis techniques, such as smudge tracking and symbolic execution, are currently commonly utilize