CN-121996219-A - Program simplification method and system based on multi-granularity random optimization

CN121996219ACN 121996219 ACN121996219 ACN 121996219ACN-121996219-A

Abstract

The invention provides a program simplification method and a system based on multi-granularity random optimization, wherein the method comprises the steps of obtaining a user portrait input set and a target function input subset of a program to be simplified, respectively inputting the user portrait input set and the target function input subset into the program to be simplified, obtaining a corresponding candidate initial simplified program based on code coverage information, selecting the candidate initial simplified program with a higher target function value as a starting sample, wherein the target function is a linear weighted result of code deletion amount and program generality, executing mutation and search operation on the starting sample based on a Markov chain Monte Carlo random search method to obtain the simplified program, wherein the mutation and search operation comprises three stages, namely a code coverage granularity mutation stage, a coverage Duan Lidu mutation stage and a statement granularity mutation stage.

Inventors

Tang Jinran
LIU ZHIHUI
XIN QI
XIE XIAOYUAN
Xuan Jifeng

Assignees

武汉大学

Dates

Publication Date: 20260508
Application Date: 20260104

Claims (10)

1. A program simplification method based on multi-granularity random optimization, comprising: Acquiring a user portrait input set and a target function input subset of a program to be simplified, respectively inputting the user portrait input set and the target function input subset into the program to be simplified, generating corresponding candidate initial simplified programs according to code information covered by the user portrait input set and the target function input subset, and selecting the candidate initial simplified programs with higher target function values as initial samples, wherein the target functions are linear weighted results of code deletion quantity and program generality; And performing mutation and search operation on the initial sample based on the Markov chain Monte Carlo random search method to obtain a simplified program, wherein the mutation and search operation comprises three stages, namely a code coverage granularity mutation stage, a coverage Duan Lidu mutation stage and a statement granularity mutation stage.
2. A program simplification method based on multi-granularity random optimization as claimed in claim 1, wherein the mathematical representation of the objective function is: o= (1- β) × (α×attack surface reduction + (1- α) ×volume reduction) +β×program commonality Wherein O represents an objective function value, alpha and beta are weights, the alpha control code is reduced in volume and attack surface, and the beta control code is reduced and the program is universal.
3. The method for simplifying the program based on the multi-granularity random optimization of claim 2, wherein the code coverage information comprises a code coverage, a coverage input set and a coverage segment set, and the extraction process is as follows: Taking a user portrait input set and a target function input subset as input sets of a program to be simplified respectively, executing each input in the input sets by utilizing the program to be simplified, collecting all sentences actually covered by each input during execution, generating code coverage of each input, recording the coverage relation between the input and the sentences, and removing unused sentences by applying dead codes after executing all the input to obtain a sentence set; Constructing an overlay input set based on the code overlay of each input, wherein the overlay input set stores an overlay input cluster of each statement in the statement set, and the overlay input cluster stores the inputs of all the programs to be simplified of the overlay statements when the programs to be simplified are executed; Based on the coverage input set, clustering and segmenting sentences in the sentence set to obtain a coverage section set, wherein the coverage section set stores coverage sections, and each sentence in the coverage sections has the same coverage input cluster.
4. The method for simplifying a program based on multi-granularity random optimization of claim 1, wherein the process of mutation and search of the initial sample based on a markov chain monte carlo random search method comprises: A random search framework is constructed through a Markov chain Monte Carlo random search method, the framework is used for iterative mutation and search of a sample program, a mutated sample program is obtained after each round of mutation operation, and the mutated sample program is accepted through a probability acceptance mode based on the Markov chain Monte Carlo method, so that the sample program after the iterative mutation of the round is obtained.
5. A program simplification method based on multi-granularity random optimization as claimed in claim 3, wherein the code coverage granularity mutation stage comprises mutation operation of preset rounds, in each mutation operation, the current input is randomly selected from the user image input set through a markov chain monte carlo random search method, when the current sample program completely reserves the code coverage of the current input, and under the premise of reserving the code coverage generated based on the target function input subset, codes which are only covered by the current input in the current sample program are deleted to obtain a mutated sample program, otherwise, the code coverage corresponding to the current input is added back to the current sample program to obtain a mutated sample program.
6. The method for simplifying a program based on multi-granularity random optimization as claimed in claim 3, wherein the overlay Duan Lidu mutation stage uses a sample program after the code overlay granularity mutation stage is finished as an iteration start point, the method comprises mutation operation of a preset round, in each mutation operation, a current overlay section is randomly selected from an overlay section set generated based on a user portrait input set through a markov chain monte carlo random search method, when all sentences in the current overlay section are in a reserved state in the current sample program, on the premise of reserving code coverage generated based on a target function input subset, the whole current overlay section is deleted in a batch in the current sample program to obtain a mutated sample program, and otherwise, the whole current overlay section is reserved in a batch to obtain the mutated sample program.
7. A method for simplifying a program based on multi-granularity random optimization as claimed in claim 3, wherein the mutation stage of the granularity of the sentences includes a mutation operation of a preset round, a sample program after the end of the mutation stage is covered Duan Lidu as an iteration start point, in each mutation operation, a current sentence is randomly selected from a sentence set corresponding to a user portrait input set by a markov chain monte carlo random search method, when the current sentence in the current sample program is in a reserved state, the current sentence is deleted in the current sample program on the premise of keeping code coverage generated based on a target function input subset, so as to obtain a mutated sample program, and otherwise, the current sentence is reserved, so as to obtain a mutated sample program.
8. A program simplification system based on multi-granularity random optimization, comprising: the input module is used for acquiring a user portrait input set and a target function input subset of the program to be simplified, respectively inputting the user portrait input set and the target function input subset into the program to be simplified, acquiring a corresponding candidate initial simplified program based on code coverage information, and selecting the candidate initial simplified program with a higher target function value as a starting sample, wherein the target function is a linear weighting result of both code deletion quantity and program generality; The program simplifying module is used for executing mutation and searching operation on the initial sample based on the Markov chain Monte Carlo random searching method to obtain a simplified program, wherein the mutation and searching operation comprises three stages, namely a code coverage granularity mutation stage, a coverage Duan Lidu mutation stage and a statement granularity mutation stage.
9. An electronic device comprising a memory and a processor, the memory storing program instructions for execution by the processor, the processor invoking the program instructions to perform the method of any of claims 1-7.
10. A non-transitory computer readable storage medium, wherein the non-transitory computer readable storage medium stores computer instructions, the computer instructions cause the computer to perform the method of any one of claims 1 to 7.

Description

Program simplification method and system based on multi-granularity random optimization Technical Field The invention belongs to the field of program simplification, and particularly relates to a program simplification method and system based on multi-granularity random optimization. Background Code swelling can have various adverse effects. Not only does it take up a significant amount of additional equipment space, but more importantly, it significantly increases the attack surface of the software, currently, functionality-based software reduction techniques tend to focus mainly on code pruning, with the goal of deleting as much code as possible, while generating a reduced program that can meet a given functional specification. The function specifications are typically given by a set of inputs that are used to trigger the target function to be preserved. However, code pruning is not the only goal of program simplification, and program versatility is also critical. The versatility reflects the ability of the simplified program to maintain correct behavior in a complete application scenario. Thus, if commonality is ignored, the prior art techniques tend to generate simplified post-procedures that fit to a given input set, with less commonality, thereby compromising its actual usability and reliability. In fact, code reduction and program commonality are two key factors that affect function-based program simplification. The two factors conflict with each other, the more codes are pruned, the less functions are usually reserved, so that the universality is reduced, and conversely, the more codes are required to be reserved for improving the universality. Therefore, an effective procedure simplification method should be reasonably balanced between the two. Based on this, some studies convert the software simplification technical problem into an optimization problem, and attempt to generate a program that achieves the best balance between code deletion amount and versatility by means of random search or the like. Typical methods take the program to be simplified and its usage scenario as input, construct a user's usage representation using a set of inputs representing the actual usage, and generate candidate simplified programs based on the manner of deleting or restoring sentences. And quantizing the code deletion quantity, the universality and the weighted sum between the code deletion quantity and the universality of each candidate program by defining the objective function, and searching the program with the highest objective function in the candidate space as a final result. However, because the candidate program space is extremely large, it is difficult to find the optimal solution in a traversal manner, the existing method generally adopts a random search strategy based on Markov Chain Monte Carlo (MCMC) to perform approximate solution. The method comprises the steps of starting from a program to be simplified, generating a new candidate program by gradually deleting or recovering a single statement, calculating an objective function value for the new candidate program and determining whether the new candidate program is accepted or not, repeating the steps until the searching process is finished, and finally outputting a simplified program with the highest objective function value. Although such methods have a certain potential, there are significant limitations in practical optimization. The problem arises mainly from the simplistic variant model that it employs, i.e. the variant is performed for only one basic sentence (usually a single line of code) at a time. This results in a very weak code search capability, and even for medium-scale programs, the search space consisting of a statement-by-statement retention/deletion combination is as high as 2K (K is the number of statements) and is difficult to search efficiently. In addition, such methods are also difficult to delete large useless codes, and for a large code block C, the increase of the code deletion amount caused by each step of deletion in the sentence-by-sentence deletion mode is very small, but may result in significant reduction in generality, so that the search algorithm tends to reject the candidate program, and finally the code block C cannot be deleted entirely. Disclosure of Invention In order to overcome the defect that in the prior art, when a random search strategy of Markov chain Monte Carlo is adopted to approximately solve an optimization program, the searching capability of codes is very weak, large blocks of unnecessary codes are difficult to delete, the remarkable universality is reduced, and the practical optimization is obviously limited, the invention provides a program simplification method and a system based on multi-granularity random optimization. According to an aspect of the present invention, there is provided a program simplification method based on multi-granularity random optimization, including: Acquiring a user