CN-122020657-A - Multi-arm game algorithm model-based adaptive seed variation method and system

CN122020657ACN 122020657 ACN122020657 ACN 122020657ACN-122020657-A

Abstract

The invention discloses a self-adaptive seed variation method and a system based on a multi-arm game algorithm model, wherein the method comprises the steps of initializing a double-layer multi-arm game algorithm model aiming at the current seed to construct an upper byte arm and a lower operator arm; the method comprises the steps of obtaining the selection probability of bytes corresponding to each byte arm to select a mutation operation byte, obtaining the selection probability of basic mutation operators corresponding to each operator arm to select a mutation operator, randomly distributing stacking times for the mutation operator, applying the mutation operator to the mutation operation byte according to the stacking times to obtain mutation seeds, and generating test cases for the mutation seeds and executing the test cases. The invention aims to solve the problems that the current directional gray box fuzzy test has rough distribution of test resources among seeds, lacks dynamic and fine-grained scheduling, has insufficient perception on semantic features of input data, is easy to damage key path constraint conditions, has insufficient adaptive capability of a variation strategy and is difficult to intelligently adjust according to a test process and a seed context.

Inventors

REN YI
Jian Songlei
WANG XIAOCHUAN
WANG YIQI
DING YAN
TAN SHUANG
ZHANG HAONAN
ZHANG JIANFENG
ZHAO XIN
WANG JING
MA JUN
YU JIE
LI BAO
GUO YONG

Assignees

中国人民解放军国防科技大学

Dates

Publication Date: 20260512
Application Date: 20251229

Claims (10)

1. The adaptive seed variation method based on the multi-arm game algorithm model is characterized by comprising the following steps of: s101, initializing a double-layer multi-arm game algorithm model aiming at a current seed for a target software to perform a directional gray box fuzzy test, wherein the method comprises the steps of constructing an upper byte arm for each byte of the current seed, and constructing a lower operator arm for a given basic mutation operator; s102, acquiring the selection probability of the byte corresponding to each byte arm, and taking the byte corresponding to the byte arm with the highest selection probability as a variation operation byte; s103, obtaining the selection probability of the basic mutation operator corresponding to each operator arm, and taking the basic mutation operator corresponding to the operator arm with the highest selection probability as a mutation operation operator; s104, randomly distributing stacking times for the mutation operator, and applying the mutation operator to the mutation operation byte according to the stacking times to obtain a mutation seed; S105, generating a test case for the mutated seed and executing the test case to finish one mutation of the current seed.
2. The adaptive seed variation method according to claim 1, further comprising, for each operator arm, in step S101 Maintaining a Beta distribution Wherein And Separately recording operator arms Number of mutation success and failure, for each byte arm Maintaining a Beta distribution Wherein And Separately recording byte arms The number of variation success and failure is initially 0, and the step S102 of obtaining the selection probability of the byte corresponding to each byte arm comprises the steps of Beta distribution of (2) Sampling one byte arm sample value Sampling value by adopting temperature-controlled Softmax selection algorithm Conversion to byte arms The step S103 of obtaining the selection probability of the basic mutation operator corresponding to each operator arm comprises the steps of Beta distribution of (2) Sampling an operator arm sample value As operator arms After the variant seeds are obtained in step S104, the method further comprises the step of changing the distance from the seeds to the preset target code area according to the distance between the seeds before and after the variant Determining rewards after seed variation Updating selected byte arms Beta distribution of (2) Selected operator arm Beta distribution of (2) : ; Wherein, the Representation of Or (b) , Representation of Or (b) 。
3. The adaptive seed variation method based on the multi-arm gaming algorithm model of claim 2, wherein the determining the seed-varied prize Comprises calculating the distance change between the seeds and the preset target code region before and after mutation If the distance is changed Less than 0, awarding the seeds after mutation The value is a preset positive number, otherwise, the rewards after the seeds are mutated Take a value of 0, then according to the selected byte arm Byte type of corresponding byte Determining corresponding rewarding weight coefficient, and mutating seeds Multiplying the obtained result by the reward weight coefficient to obtain final seed mutated reward 。
4. The adaptive seed variation method according to claim 2, wherein the step S102 of obtaining the selection probability of the byte corresponding to each byte arm includes determining whether the seed variation is the first variation, and if so, obtaining each byte arm Byte type of corresponding byte Byte type of the byte The method comprises critical constraint bytes and non-critical constraint bytes, wherein the bytes are critical constraint bytes if the code coverage rate of target program execution after disturbance is applied to a certain byte in the seed is reduced by more than a preset threshold value, otherwise, the bytes are non-critical constraint bytes, and the byte types of the bytes are used for processing the target program execution Acquiring corresponding initial selection probability, wherein the initial selection probability of the critical constraint byte is larger than the initial selection probability of the non-critical constraint byte, and if the non-critical constraint byte is not mutated for the first time, each byte arm is used for receiving the non-critical constraint byte Beta distribution of (2) Sampling one byte arm sample value Sampling value by adopting temperature-controlled Softmax selection algorithm Conversion to byte arms Is selected from the group consisting of: ; Wherein, the Is a byte arm Is used to determine the selection probability of (1), Is a temperature coefficient of the silicon carbide material, Is a byte arm Is used for the sampling value of (a), Is the total number of bytes in the current seed, and the step S102 is preceded by automatically adjusting the temperature coefficient And automatically adjust the temperature coefficient The functional expression of (2) is: ; Wherein, the At the maximum value of the temperature coefficient, For the test energy that the current seed has consumed, For the total test energy budget for the current seed, For the temperature decay rate, the test energy that the current seed has consumed is the sum of the test energy allocated to the variant seed that the current seed has performed.
5. The adaptive seed variation method based on the multi-arm gaming algorithm model of claim 2, wherein in step S103, from each operator arm Beta distribution of (2) Sampling an operator arm sample value As operator arms And adjusting the operator arm sampling value according to the type characteristic of the variant operation byte : ; Wherein, the For the adjusted operator arm sample value, For matching weight functions, for obtaining preset operator arms Basic mutation operator of (2) Byte type of operation byte and variation Is the fitness weight of a byte, byte type of a byte The method comprises the steps of including critical constraint bytes and non-critical constraint bytes, wherein the bytes are critical constraint bytes if the code coverage rate of target program execution after disturbance is applied to the bytes in the seeds is reduced to exceed a preset threshold value, and the bytes are non-critical constraint bytes otherwise.
6. The adaptive seed variation method based on the multi-arm game algorithm model according to claim 1, wherein after generating test cases for the variation seeds and executing in step S105, further comprises: s201, calculating a historical contribution degree for representing the value in the aspects of exploring a new code path and approaching a preset target position for the current seed according to the execution result of the test case, wherein a calculation function expression of the historical contribution degree is as follows: ; Wherein, the For the degree of contribution of the history, And As the weight coefficient of the light-emitting diode, To characterize the exploration contribution of value in exploring new code paths, To characterize the directed contribution of value in approaching the preset target position, explore the contribution After the target program is executed for the test cases of the variant seeds of the current seeds, the preset exploration rewarding coefficient corresponding to the variants of the non-branch or basic block in the global scope can be covered Accumulated value of (2); directed contribution After the target program is executed for the test cases of the variant seeds of the seeds, the test process can be guided to advance to the preset target code area, and the preset directional rewarding coefficient corresponding to the variant of the seed can be guided Is a cumulative value of (a); S202, judging whether the execution times of the variant seed test cases under the current seed are completed for one round, and if the execution times are completed for one round, attenuating the historical contribution degree of the current seed by using an attenuation mechanism: ; Wherein, the For the degree of contribution of the history, In order to attenuate the degree of contribution of the history before the decay, The global scheduling period T is a preset seed execution frequency as an attenuation factor; S203, performing energy distribution for the current seed based on the historical contribution of the current seed; ; Wherein, the For the current seed The energy to be distributed is such that, As a function of the total energy of the energy, For the current seed Is used to determine the degree of contribution of the history, And the current seeds in the step S101 are selected from the seed queue as given current seeds, wherein the high-energy seeds are seeds with the distributed energy exceeding a preset threshold value, the seeds with the highest distributed energy or the seeds with the front distributed energy.
7. The adaptive seed variation method according to claim 6, wherein in step S201, the predetermined directional prize coefficients are The expression of the calculation function of (c) is: ; Wherein, the In order to orient the bonus coefficients, In order to operate at the maximum value, And The distances from the target program execution time to the preset target code area are respectively the distances from the target program execution time to the preset target code area for the seed variants.
8. An adaptive seed variation system based on a multi-arm gaming algorithm model, comprising a microprocessor and a memory, which are connected to each other, wherein the microprocessor is programmed or configured to perform the adaptive seed variation method based on a multi-arm gaming algorithm model according to any one of claims 1 to 7.
9. A computer readable storage medium having a computer program or instructions stored therein, wherein the computer program or instructions is programmed or configured to perform the multi-arm gaming algorithm model-based adaptive seed variation method of any one of claims 1-7 by a processor.
10. A computer program product comprising a computer program or instructions programmed or configured to, when executed by a processor, perform the multi-arm gaming algorithm model-based adaptive seed variation method of any one of claims 1-7.

Description

Multi-arm game algorithm model-based adaptive seed variation method and system Technical Field The invention relates to the technical field of software security, in particular to a multi-arm game algorithm model-based self-adaptive seed variation method and system. Background In modern information technology ecology, software systems play a supportive role, and their safety conditions directly affect the stable operation of key information infrastructure and social public safety. In the face of continuous increase of software volume and continuous improvement of internal logic complexity, the traditional mode of relying on manual audit codes is difficult to adapt to high-efficiency and comprehensive safety guarantee requirements. Under the situation, an automatic vulnerability discovery technology, particularly a fuzzy test (Fuzzing) method, has high-efficiency test execution capacity and excellent expansibility, and becomes an irreplaceable key technical means in the construction of a software security system. As an important branch of the fuzzy test technology, the directed gray box fuzzy test (DIRECTED GREY-box fuzzy test, DGF) guides the test activity to be purposefully advanced to a specific code area by presetting the target code position and combining the program structural characteristics and the coverage feedback information in the execution process. Unlike traditional gray box testing methods that aim to improve global code coverage, such testing focuses on reaching critical code segments (including known bug locations, patch code segments, or high risk function entries) in the program quickly and accurately, thereby improving the detection efficiency of specific security threats. The technology generally establishes a program call relation diagram through static analysis, constructs a distance-based measurement system, dynamically optimizes seed selection priority and variation strategies in the test process, enables test resources to be intensively used for target area detection, and achieves remarkable effects in the scenes of vulnerability reproduction verification, patch effectiveness evaluation and the like. Most of the currently widely used directed gray box fuzzy test tools still follow a preset fixed pattern in mutation policy implementation, i.e. mutation operators are randomly selected and applied to arbitrary byte positions of input data during the test phase. The random variation mode lacking intelligent guidance has the obvious defects that firstly, the seed management strategy is simpler, and the test resource allocation lacks systematic optimization. The existing scheme generally adopts a unified processing mode for various seeds, so that a large amount of calculation resources are consumed on the seed test with highly coincident execution paths, and the full exploration of the program behavior space is difficult to realize. Secondly, the semantic features of the input data are not enough to be perceived, and key path constraint conditions are easy to damage. The control effectiveness of each byte in program input on the execution path has significant difference, such as a numerical field for determining the Number of loops, a special mark (Magic Number) for identifying the file type, or important bytes such as a judgment variable for influencing the trend of branches, which has decisive effects on maintaining the effectiveness of the current execution path and meeting complex condition constraints. If the mutation operation is carried out on the key bytes at will, the established path constraint relation is likely to be destroyed, so that the program execution flow deviates from the expected target and returns to an irrelevant initial state. This situation not only results in the generation of a large number of invalid test samples, but even worse, invalid loss of test resources and even potentially disruption of unstable execution environments that have approached the target state. Thirdly, the mutation operator efficiency lacks quantitative analysis, and an adaptive optimization mechanism is difficult to establish. The difference of the effects of different mutation operations in different application scenes is obvious, for example, arithmetic operation adjustment is generally more effective than bit flipping operation on ASCII character data, and the expected effect cannot be basically generated by random insertion processing on checksum fields. The existing random selection strategy can not adaptively select mutation operators most likely to promote the test process according to the context elements such as the test progress status, the seed characteristic parameters, the target position information and the like, so that massive invalid examples are generated in the test process, and the discovery time of target loopholes is greatly prolonged. Disclosure of Invention Aiming at the problems in the prior art, the invention provides a self-adaptive seed variation method and a sel