CN-121979771-A - Method and system for generating test cases
Abstract
The invention relates to a method and a system for generating test cases. The method uses a time exponential decay model to weight all change line numbers, change timestamps, change depths and change operators extracted from the Git change record of the current version, and introduces, during the change weighting, a penalty factor for the author diversity entropy generated by the change operators, so as to obtain a line-level change heat. A code call graph, a code inheritance graph and a data dependency graph are extracted from the complete code base of the current version using a static analysis tool to construct a heterogeneous code graph; the line-level change heat is introduced into this graph and weights are aggregated to generate an influence graph. Hot spot methods are extracted from the influence graph and executed to generate the paths to be covered, which are input into a large language model to output recall test cases and new test cases. The invention can therefore generate new test cases while also ensuring the recall rate and accuracy of the test cases.
Inventors
- YAN JIAN
- YE HUAQIN
- HUANG ZHIJIE
Assignees
- 福建福诺移动通信技术有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20251201
Claims (10)
- 1. A method for generating test cases, characterized by comprising the following steps: collecting a Git change record of a current version, extracting all change line numbers, change timestamps, change depths and change operators from the Git change record, weighting all change line numbers, change timestamps and change depths using a time exponential decay model, and introducing, during the change weighting, a penalty factor for the author diversity entropy generated by the change operators, so as to obtain a line-level change heat; acquiring a complete code base of the current version, extracting a code call graph, a code inheritance graph and a data dependency graph from the complete code base using a static analysis tool to construct a heterogeneous code graph, introducing the line-level change heat into the heterogeneous code graph, and performing weight aggregation to generate an influence graph; and extracting and executing hot spot methods from the influence graph to generate a set of paths to be covered, and inputting the set of paths to be covered into a large language model to output recall test cases and new test cases.
- 2. The method of claim 1, wherein weighting all change line numbers, change timestamps and change depths using the time exponential decay model, and introducing, during the change weighting, the penalty factor for the author diversity entropy generated by the change operators to obtain the line-level change heat, comprises: obtaining the number of newly added characters, the number of deleted characters and the maximum value of AST node variation from the Git change record to calculate the change depth; inputting all change line numbers, change timestamps and change depths into a time exponential weighting formula to perform the change weighting, introducing, during the change weighting, the penalty factor for the author diversity entropy generated by the change operators, and calculating the line-level change heat with a line-level heat formula; wherein the time exponential weighting formula is: [formula missing from the source text]; wherein the symbols, which are likewise missing, denote in order: the single-change weight of Git change record r; the change depth of the change line number; the change timestamp of the change line number; the current time; T, the decay half-life; and the current decay coefficient; the line-level heat formula is: [formula missing from the source text]; wherein the symbols denote in order: the line-level change heat of the change line number; the relative path; the change line number; the change weight of a single Git change record; the penalty factor for the author diversity entropy generated by the change operator; and the set of modifications (the symbol of the modified element is missing).
- 3. The method of claim 2, wherein introducing, during the change weighting, the penalty factor for the author diversity entropy generated by the change operators comprises: calculating the change proportion of the changed code submitted by each change operator in the Git change record, and inputting the change proportion into an author entropy formula to obtain the author diversity entropy of the corresponding change operator, wherein the author entropy formula is: [formula missing from the source text]; wherein the symbols, which are likewise missing, denote in order: the author diversity entropy of the change operator; a quantity associated with the change operator whose definition is garbled in the source text ("is provided with a plurality of pairs of paths"); the set of authors of Git change record r; and the modification proportion of the change operator; inputting the author diversity entropy of each change operator into a penalty formula to obtain the penalty factor for the corresponding author diversity entropy, wherein the penalty formula is: [formula missing from the source text]; wherein the symbols denote in order: the penalty factor for the author diversity entropy generated by the change operator; the author diversity entropy of the change operator; and the set of authors of Git change record r.
- 4. The method of claim 1, wherein extracting the code call graph, the code inheritance graph and the data dependency graph from the complete code base using the static analysis tool to construct the heterogeneous code graph, introducing the line-level change heat into the heterogeneous code graph, and performing weight aggregation to generate the influence graph comprises: extracting the code call graph, the code inheritance graph and the data dependency graph from the complete code base using the static analysis tool to construct a heterogeneous code graph containing multi-granularity entities, wherein the multi-granularity entities comprise code lines, methods, classes and packages; counting the number of times each class and each method is referenced in the heterogeneous code graph to generate the corresponding class static coupling degree and method static coupling degree, and collecting, through a production environment call chain, the production call frequency of each class and of each method to generate the corresponding class runtime heat and method runtime heat; introducing the line-level change heat into the heterogeneous code graph, combining it with the class static coupling degree, the method static coupling degree, the class runtime heat and the method runtime heat, performing weight aggregation along the hierarchy of code lines, methods, classes and packages, performing cross-entity weight propagation through a lightweight GraphSAGE network, and outputting graph node weights, with each granularity entity as a node, so as to generate the influence graph.
- 5. The method of claim 1, wherein extracting and executing the hot spot methods from the influence graph to generate the set of paths to be covered comprises: acquiring the graph node weight of each node in the influence graph, and sorting all nodes in descending order of graph node weight to obtain a sorted node sequence; taking the methods corresponding to the top N nodes in the sorted node sequence as hot spot methods, executing the hot spot methods, and generating a current path condition; acquiring new path conditions from the changed statements corresponding to each change line number, checking the joint feasibility of each new path condition with the current path condition, and taking the new path condition as a path to be covered if the current path condition and the new path condition cannot hold at the same time; or, if no historical test case containing both the new path condition and the current path condition exists in the historical test case library, taking the new path condition as a path to be covered; and collecting all paths to be covered to generate the set of paths to be covered.
- 6. The method of claim 1, wherein inputting the set of paths to be covered into the large language model to output recall test cases and new test cases comprises: matching the set of paths to be covered against a pre-constructed triplet mapping table to obtain the function IDs to be covered, wherein each entry of the triplet mapping table has the form (class, method, function ID); calculating the cosine similarity between the function description text corresponding to each function ID to be covered and the historical test cases in the historical test case library, and outputting the historical test cases whose cosine similarity exceeds a first similarity threshold as recall test cases; and combining the function description text corresponding to the function ID to be covered with the set of paths to be covered and the changed code corresponding to the set of paths to be covered to construct a structured prompt template, so that the large language model generates and outputs new test cases according to the structured prompt template.
- 7. The method of claim 6, wherein outputting the historical test cases whose cosine similarity exceeds the first similarity threshold as recall test cases comprises: taking the historical test cases whose cosine similarity exceeds the first similarity threshold as candidate recall test cases; counting the number of service keywords in the function ID to be covered, and inputting this number and the line-level change heat into a first formula to obtain a service priority, wherein the first formula is: [two formulas missing from the source text]; wherein the symbols, which are likewise missing, denote in order: the service priority; the class runtime heat in the set of paths to be covered; the method runtime heat in the set of paths to be covered; a first weight; the file-level change heat of Git change record r, generated from the line-level change heat; a second weight; the number of service keywords; a third weight; the line-level change heat of the change line number; the relative path; the change line number; and the number of syntax nodes of the change line number; calculating the defect density of the candidate recall test cases, and inputting the defect density, the service priority and the cosine similarity into a second formula to obtain a final similarity, wherein the second formula is: [formula missing from the source text]; wherein the symbols denote in order: the service priority; the defect density; the cosine similarity; and the graph node weight in the set of paths to be covered; and selecting, from the candidate recall test cases, the recall test case with the greatest final similarity as the final recall test case output.
- 8. The method of claim 1, wherein inputting the set of paths to be covered into the large language model to output recall test cases and new test cases comprises: calculating the AST structure fingerprint and the sentence embedding vector of each recall test case and each new test case, and fusing each AST structure fingerprint with the corresponding sentence embedding vector to generate a composite fingerprint for each recall test case and for each new test case; calculating a first similarity between the composite fingerprints of all recall test cases, a second similarity between the composite fingerprints of all new test cases, and a third similarity between the composite fingerprints of each recall test case and each new test case; and de-duplicating all recall test cases and all new test cases according to the first similarity, the second similarity and the third similarity to obtain de-duplicated recall test cases and de-duplicated new test cases.
- 9. The method of claim 1, further comprising: calculating the execution priorities of the recall test cases and the new test cases using a third formula, and arranging a test plan according to the execution priorities, wherein the third formula is: [formula missing from the source text]; wherein the symbols, which are likewise missing, denote in order: the execution priority; the service priority; the graph node weight of the node j in the influence graph that corresponds to the recall test case or new test case; the code complexity of the recall test case or new test case; and the recall defect density of the recall test case or the new defect density of the new test case; and using an XGBoost regression model, with a feature vector composed of the number of function IDs, the line-level change heat, the number and recall defect density of the recall test cases, and the number and new defect density of the new test cases in the test plan as the input vector, to predict the test man-hours and test resources of the test plan and obtain the corresponding prediction results, and quantifying the defect risk in the test plan through Monte Carlo simulation to generate a risk heat map.
- 10. A system for generating test cases, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 9 when executing the computer program.
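The recall step of claim 6 (keep the historical test cases whose cosine similarity to a function description exceeds a first threshold) can be illustrated with a minimal sketch. The source does not say how the texts are turned into vectors, so the bag-of-words term counts used here are an assumption chosen only to keep the example self-contained; the names `bow`, `cosine` and `recall_cases`, and the sample data, are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Assumed vectorization: lowercase whitespace tokens counted as terms."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_cases(description, history, threshold):
    """Return IDs of historical cases scoring above the threshold, best first.

    `history` maps a test case ID to its text (hypothetical structure).
    """
    query = bow(description)
    scored = [(cosine(query, bow(text)), case_id)
              for case_id, text in history.items()]
    return [case_id for score, case_id in sorted(scored, reverse=True)
            if score > threshold]


history = {
    "tc1": "user login with password",
    "tc2": "export report to pdf",
}
print(recall_cases("login with password", history, threshold=0.5))
```

In practice the term counts would likely be replaced by sentence embeddings; the thresholding logic stays the same.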
Description
Method and system for generating test cases
Technical Field
The present invention relates to the field of computer technologies, and in particular to a method and a system for generating test cases.
Background
In a development mode of rapid software iteration, frequent code changes are the norm, which poses great challenges for testing. The prior art mainly relies on the following three approaches, each with obvious limitations:
(1) Manual, experience-driven selection: a test engineer manually selects regression cases based on the code diff, the requirements documents and experience with historical defects. This approach depends heavily on personal experience and easily leads to over-testing or insufficient testing.
(2) Static difference comparison: existing test cases are recommended at file or function granularity by comparing the syntactic differences between two versions. However, this approach stays at the surface level of files and functions, so the recall rate of test cases is low and the false alarm rate is high.
(3) Simple use of version metadata: test case recommendation or generation is triggered by the Git diff. Although metadata from the version control system is introduced, only historical test cases can be reused, and there is no ability to generate new test cases for newly added or modified logic.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method and a system for generating test cases that do not rely on manual experience, can generate new test cases, and ensure the recall rate and accuracy of the test cases.
To solve the above technical problem, the invention adopts the following technical solution. In a first aspect, the present invention provides a method for generating test cases, comprising: collecting a Git change record of a current version, extracting all change line numbers, change timestamps, change depths and change operators from the Git change record, weighting all change line numbers, change timestamps and change depths using a time exponential decay model, and introducing, during the change weighting, a penalty factor for the author diversity entropy generated by the change operators, so as to obtain a line-level change heat; acquiring a complete code base of the current version, extracting a code call graph, a code inheritance graph and a data dependency graph from the complete code base using a static analysis tool to construct a heterogeneous code graph, introducing the line-level change heat into the heterogeneous code graph, and performing weight aggregation to generate an influence graph; and extracting and executing hot spot methods from the influence graph to generate a set of paths to be covered, and inputting the set of paths to be covered into a large language model to output recall test cases and new test cases.
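The first step above (time-decayed change weighting with an author-entropy penalty) can be sketched as follows. The patent's own formulas are not reproduced in this text, so every formula in the sketch is an assumption: the weight is taken as the change depth multiplied by 2^(-age / T) for a half-life T, the author diversity entropy is taken as the Shannon entropy of each author's share of the change depth, and the mapping from entropy to a penalty in [0.5, 1] is purely hypothetical. The record fields (`path`, `line`, `depth`, `time`, `author`) are illustrative names as well.

```python
import math
from collections import defaultdict


def decay_weight(depth, changed_at, now, half_life):
    """Assumed form: change depth scaled by 2^(-age / half_life)."""
    age = max(0.0, now - changed_at)
    return depth * 0.5 ** (age / half_life)


def author_entropy(shares):
    """Shannon entropy (in nats) of per-author change shares summing to 1."""
    return -sum(p * math.log(p) for p in shares if p > 0)


def penalty_factor(shares):
    """Hypothetical penalty in [0.5, 1]; lowest when one author made every change."""
    n = len(shares)
    if n <= 1:
        return 0.5
    return 0.5 + 0.5 * author_entropy(shares) / math.log(n)


def line_heat(changes, now, half_life):
    """Sum decayed weights per (path, line), then scale by the penalty factor."""
    raw = defaultdict(float)
    by_author = defaultdict(float)
    for c in changes:
        raw[(c["path"], c["line"])] += decay_weight(
            c["depth"], c["time"], now, half_life)
        by_author[c["author"]] += c["depth"]
    total = sum(by_author.values())
    shares = [v / total for v in by_author.values()] if total else []
    penalty = penalty_factor(shares)
    return {key: weight * penalty for key, weight in raw.items()}


changes = [
    {"path": "a.py", "line": 3, "depth": 4.0, "time": 0, "author": "x"},
    {"path": "a.py", "line": 3, "depth": 2.0, "time": 10, "author": "y"},
]
print(line_heat(changes, now=10, half_life=10))
```

With a half-life of 10 time units, the older change contributes half of its depth and the newer one its full depth before the penalty is applied; a single-author history would halve the result under the assumed penalty.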
The invention has the following advantages. Multi-dimensional change features, such as all change line numbers, change timestamps, change depths and change operators, are extracted directly from the Git change record and weighted using a time exponential decay model, so that the resulting line-level change heat rests on more comprehensive change features and enables accurate, fine-grained change evaluation, overcoming the limitation of the prior art, which evaluates changes based only on a single surface-level difference. The time exponential decay model effectively avoids distortion of the calculated line-level change heat, and the penalty factor for the author diversity entropy generated by the change operators, introduced during the change weighting, effectively prevents modifications by a single change operator from producing a false line-level change heat, which further improves the authenticity and accuracy of the line-level change heat. Compared with constructing association relationships from the Git change record, constructing the heterogeneous code graph from the code call graph, code inheritance graph and data dependency graph obtained from the complete code base fully covers the implicit associations among code, ensures the completeness of the influence graph, and thereby ensures the recall rate and accuracy of the test cases. Because the set of paths to be covered is generated from the hot spot methods, the output recall test cases and new test cases focus on the high-heat paths to be covered, which avoids the resource waste caused by full coverage while providing the ability to generate new test cases. Optionally, the adopting a time exponential decay model to change and weight all chang