CN-115185818-B - Binary set-based program dependency cluster detection method
Abstract
The invention discloses a program dependency cluster detection method based on a binary system set, which comprises the steps of defining a dependency set by a data structure in the form of the binary system set, inputting a general set operation interface based on bit operation, obtaining new dependency set data, storing the new dependency set data to each instruction node of a general set operation interface program, traversing each instruction node of the program, calculating dependency relations among instructions by using the general set operation interface to obtain dependency analysis data of each instruction node, reorganizing and classifying the obtained dependency analysis result according to the set content in the new set data, obtaining corrected dependency analysis result, and realizing correct detection of program dependency clusters in a program source code. The invention defines the set data structure by binary set, defines the operation interface according to the data structure, calls the interface to calculate the dependence, corrects the dependence result, and saves the correction result so as to remarkably reduce the space expenditure problem in analysis.
Inventors
- ZHANG YINGZHOU
- YANG JIAYI
- LU YUE
- MI JIE
- GE LILI
- Shuai Dongxin
- XU BIHUAN
Assignees
- 南京邮电大学
Dates
- Publication Date
- 20260505
- Application Date
- 20220620
Claims (7)
- 1. A program-dependent cluster detection method based on a binary set, comprising: Defining a dependency set by a data structure in a binary set form, and inputting the dependency set data in the binary set form into a general set operation interface based on bit operation to obtain new dependency set data; Storing new dependency set data to each instruction node of the general set operation interface program, and traversing each instruction node of the program, wherein the dependency relationship among instructions is calculated by using the general set operation interface during traversing to obtain dependency analysis data of each instruction node; after each instruction node traverses, the obtained dependency analysis results are recombined and classified according to the collection content in the new collection data, and corrected dependency analysis results are obtained; according to the corrected dependency analysis result, the program dependency cluster in the program source code is accurately detected; traversing each instruction node of the program, wherein during traversing, the dependency relationship among the instructions is calculated by utilizing a general set operation interface, and the method for obtaining the dependency analysis data of each instruction node comprises the following steps: The dependency set data utilizes a general set operation interface to calculate program dependency data of related dependency set data under each instruction node to obtain dependency analysis data of each instruction node, wherein the dependency comprises data dependency existing among instructions because of the definition use relation of variables and control dependency existing among instructions because of jump relation; the dependency analysis data of each instruction node is stored in a key value mode, wherein the unique label of each instruction is taken as an index key from the stored dependency analysis data, and the binary stored dependency set data is taken as an index value; The stored dependency analysis results are recombined and classified, and the method for obtaining corrected dependency analysis results comprises the following steps: extracting the data flow value of each instruction node, and indexing by key value pairs when extracting the dependency analysis result, so as to inquire the dependency set corresponding to each instruction node; Remapping the key value pair relation associated with the query dependency set to obtain a remapped key value pair relation; Re-indexing the relation according to the re-mapped key value pairs, extracting key value pairs of the same index key and merging the key value pairs of the same index key to obtain a re-combined key value pair relation, and completing the correction of the dependency analysis result; the corrected dependency analysis result is a standard for detecting whether the program dependency cluster in the source code of the analysis program is correct or not.
- 2. The method for detecting program dependency clusters based on binary set according to claim 1, wherein the data structure is in binary form, the minimum unit of the data structure is a bit, and integer representation is performed by adopting an expanded integer method, so that the traditional integer representation of 0-64 bits is replaced, and the expanded integer method of the data structure is as follows: And performing bit operation and serialization processing on the data structure by adopting a native basic data type Integer in the function programming language Haskell to obtain infinite extension of the storage length of the data structure.
- 3. The method for detecting program dependency clusters based on binary set according to claim 1, wherein the basic representation method of the dependency set data is characterized in that the null set is represented by 0, the non-null set is represented by an integer binary character string, and in the case of the non-null set, the element storage condition of the dependency set data is judged according to 0 or 1 of the binary character string.
- 4. The method for detecting program dependency clusters based on binary set according to claim 1, wherein inputting dependency set data in binary set form into a general set operation interface based on bit operation, obtaining new dependency set data further comprises defining the general set operation interface in advance; The method for defining the general set operation interface comprises the following steps: Inputting the associated element values of the operation interface and judging whether the element values belong to a general set of the operation interface; and according to the judging result, selecting to add element values which are not related to the operation interface into the general set, so as to combine the general sets before and after the judgment, wherein the combined general set does not contain repeated element values, and finally, the operation interface associated with the new element value is defined as a general set operation interface.
- 5. The method of claim 1, wherein the general purpose set operation interfaces each process the input dependency set data based on bit operations having a plurality of operation categories including AND, OR, shift right, shift left.
- 6. The method for detecting program dependent clusters based on binary set as claimed in claim 1, wherein the method for performing data persistent caching on the dependent clusters is characterized by comprising the following steps: And rapidly storing the integer data in a local hard disk and a distributed database by using a serialization method of the original support in the Haskell language, wherein a Python algorithm is used for caching in the storage process, and only the integer serialization method of the original support is required to be called in the caching process.
- 7. The method for detecting program dependency clusters based on binary set according to claim 1, wherein the method for implementing correct detection of program dependency clusters in program source code according to the corrected dependency analysis result is as follows: The data persistent cache is carried out on the detection result of the program dependency cluster, and the data persistent cache is used for realizing the subsequent loading of the dependency analysis result when the same program is analyzed subsequently; when program dependency cluster analysis starts, loading historical dependency analysis results which are cached in a local hard disk and a distributed database; and after the program dependent cluster analysis is finished, repeatedly storing the complete dependent cluster analysis result for subsequent on-demand loading access.
Description
Binary set-based program dependency cluster detection method Technical Field The invention belongs to the field of detection program analysis, and particularly relates to a program dependency cluster detection method based on a binary set. Background With the increasing size of software, the difficulty of software operation and maintenance is increasing. One of the important reasons is that unavoidable inter-dependent relationships exist among program components of software, the inter-dependent relationships form program dependent clusters, and the accurate detection of the widely existing dependent clusters in the program has great significance for the work of software understanding, testing, maintenance and the like. At present, research at home and abroad is focused on analyzing the cause of the program-dependent cluster formation, and how to improve the program-dependent cluster detection method is not further studied. The traditional dependency cluster detection algorithm is based on a System Dependency Graph (SDG), the SDG is a graph representation of control dependency relationship and data dependency relationship among software programs, edges represent the dependency relationship, a unique identifier (Id) of a node represents one element of a dependency set, when detection is carried out, each node is collected through a reachability traversal algorithm of the graph, dependency analysis results of all nodes are finally integrated to obtain slice criteria with the same dependency set, and the nodes corresponding to the criteria are mutually dependent, so that one dependency cluster in the program is detected. However, the SDG construction process is complex and difficult to understand, and particularly when analysis of the inter-process dependency relationship is performed, different parameter transfer conditions need to be considered in the construction process, so that huge time overhead is often generated in the process, meanwhile, the space occupation problem is more obvious by utilizing one element of the dependency set corresponding to the unique identifier of the SDG node, particularly for a large-scale program containing more SDG nodes, and therefore, the time and space efficiency of the SDG-based dependency cluster detection method cannot meet the requirement of detection and analysis of the dependency clusters of the large-scale program. Disclosure of Invention The invention aims to overcome the defects in the prior art, and provides a program dependency cluster detection method based on a binary set, wherein the binary set defines a set data structure, an operation interface is defined according to the data structure, an interface is called to calculate dependency, a dependency result is corrected, and the correction result is saved so as to remarkably reduce space expenditure in analysis, and solve the problem of overlarge space expenditure in the traditional detection method. The invention provides a program dependent cluster detection method based on a binary set, which comprises the following steps: Defining a dependency set by a data structure in a binary set form, and inputting the dependency set data in the binary set form into a general set operation interface based on bit operation to obtain new dependency set data; Storing new dependency set data to each instruction node of the general set operation interface program, and traversing each instruction node of the program, wherein the dependency relationship among instructions is calculated by using the general set operation interface during traversing to obtain dependency analysis data of each instruction node; after each instruction node traverses, the obtained dependency analysis results are recombined and classified according to the collection content in the new collection data, and corrected dependency analysis results are obtained; and according to the corrected dependency analysis result, the program dependency cluster in the program source code is accurately detected. In a further embodiment, the data structure is in a binary form, the minimum unit of the data structure is a bit, and the integer representation is performed by adopting an expanded integer method, so that the traditional integer representation of 0-64 bits is replaced, and the expanded integer method of the data structure is as follows: And performing bit operation and serialization processing on the data structure by adopting a native basic data type Integer in the function programming language Haskell to obtain infinite extension of the storage length of the data structure. In a further embodiment, the basic representation method of the dependency set data comprises the steps of representing an empty set by 0 and representing a non-empty set by an integer binary character string, wherein in the case of the non-empty set, the element storage condition of the dependency set data is judged according to 0 or 1 of the binary character string. In a further embodi