CN-122019398-A - Test case generation method and system based on multi-modal machine learning and dynamic optimization

Abstract

The invention provides a test case generation method and system based on multi-modal machine learning and dynamic optimization. The method performs cross-modal feature association on a requirement semantic text, a source code instruction stream and a historical defect event record in a software test scenario to generate a multi-modal associated feature space; inputs the feature space into a dynamic attention alignment network, which allocates weights to business rule semantic dependency relationships, code execution path dependency relationships and defect propagation link association relationships to generate a joint characterization vector; performs dual-task collaborative training of test case generation and defect localization prediction on the joint characterization vector, adjusting the feature extraction weights of the dual tasks to generate a test case generation parameter set; inputs the parameter set into the generator of a generative adversarial network, strengthening the feature expression of boundary test scenarios to generate a boundary test case set; and performs priority ranking on the boundary test case set, calculating case coverage weights, to generate a target test case set. The invention improves the effectiveness and comprehensiveness of test cases and the utilization efficiency of test resources.

Inventors

LI WEI
WANG QIAN
ZHANG LIN
LAN YUANSHUAI
LI GANG
ZHENG TAO

Assignees

四川吉利学院

Dates

Publication Date: 20260512
Application Date: 20260416

Claims (10)

1. A test case generation method based on multi-modal machine learning and dynamic optimization, the method comprising:
performing cross-modal feature association processing on a requirement semantic text, a source code instruction stream and a historical defect event record in a software test scenario to generate a multi-modal associated feature space containing business rule semantic dependency relationships, code execution path dependency relationships and defect propagation link association relationships;
inputting the multi-modal associated feature space into a dynamic attention alignment network, and dynamically allocating weights to the business rule semantic dependency relationships, the code execution path dependency relationships and the defect propagation link association relationships through inter-modal feature interaction strength calculation, to generate a unified joint characterization vector with cross-modal constraints;
performing dual-task collaborative training of test case generation and defect localization prediction on the unified joint characterization vector, and adjusting the feature extraction weights of the dual tasks through inter-task sharing of back-propagated gradients, to generate a test case generation parameter set;
inputting the test case generation parameter set into a generator of a generative adversarial network, and strengthening the feature expression of boundary test scenarios through feature adversarial learning with a discriminator, to generate a boundary test case set; and
performing priority ranking on the boundary test case set, calculating case coverage weights based on the business rule semantic dependency relationships and the defect propagation link association relationships, and generating a target test case set containing a priority ranking result.
2. The method of claim 1, wherein performing cross-modal feature association processing on the requirement semantic text, the source code instruction stream and the historical defect event record in the software test scenario to generate the multi-modal associated feature space containing business rule semantic dependency relationships, code execution path dependency relationships and defect propagation link association relationships comprises:
performing two-layer analysis, comprising semantic role labeling and dependency parsing, on the requirement semantic text, identifying argument roles of core predicates, and constructing a semantic dependency network containing role-level relations, wherein semantic governing relations between argument roles are represented by directed edges;
performing fused analysis of a function call graph and a control flow graph on the source code instruction stream, extracting dependency relationships between function entry parameters and return values, tracking the jump logic of conditional branch statements, and generating a code execution path network comprising basic block execution sequences and data dependency relationships, wherein path nodes are annotated with execution frequencies and branch conditions;
performing temporal association analysis on the historical defect event record, extracting association relationships between defect reports and code commits, and constructing a propagation link map of defect introduction positions, triggering conditions and influence ranges, wherein link edges are annotated with propagation probabilities and numbers of affected modules;
inputting the semantic dependency network, the code execution path network and the defect propagation link map into a cross-modal association engine, calculating semantic similarity between semantic nodes and code nodes, path overlap between code nodes and defect nodes, and description matching degree between semantic nodes and defect nodes, and generating a three-dimensional association strength matrix;
performing association modeling on the features of the three modalities based on the three-dimensional association strength matrix, and constructing a heterogeneous feature network containing semantic-code-defect triple associations through intra-modal feature clustering and inter-modal feature alignment, wherein network nodes carry modality type labels and association strength attributes; and
performing feature learning on the heterogeneous feature network using a graph neural network, and fusing multi-modal node information through neighborhood aggregation and message passing mechanisms to generate the multi-modal associated feature space, which retains topological structure and association relationships.
3. The method of claim 2, wherein performing feature learning on the heterogeneous feature network using the graph neural network, and fusing multi-modal node information through neighborhood aggregation and message passing mechanisms to generate the multi-modal associated feature space which retains topological structure and association relationships, comprises:
constructing a three-layer graph neural network architecture comprising a modality-distinguishing embedding layer, a neighborhood aggregation layer and an association strengthening layer, wherein the modality-distinguishing embedding layer assigns initial embedding vectors to nodes of different modalities, the neighborhood aggregation layer fuses neighbor node information, and the association strengthening layer enhances cross-modal association features;
in the modality-distinguishing embedding layer, generating initial semantic embeddings for semantic dependency network nodes using a pre-trained language model, generating initial code embeddings for code execution path network nodes using a pre-trained code model, and generating initial defect embeddings for defect propagation link map nodes using a defect text encoder, wherein the three kinds of embedding vectors have the same dimension;
in the neighborhood aggregation layer, performing multi-hop neighborhood sampling for each node, wherein same-modality nodes and cross-modal interaction nodes are preferentially selected during sampling, and calculating aggregation weights of neighbor nodes through an attention mechanism, the weights being positively correlated with the association strength between nodes, to generate preliminary aggregation features;
in the association strengthening layer, performing inter-modal cross-attention computation on the preliminary aggregation features, wherein semantic nodes attend to the execution path features of code nodes and the propagation features of defect nodes, code nodes attend to the argument role features of semantic nodes and the influence range features of defect nodes, and defect nodes attend to the predicate features of semantic nodes and the branch condition features of code nodes, to generate association-strengthened features;
adding the initial embedding vectors and the association-strengthened features through a residual connection, applying layer normalization, and feeding the result into the next iteration, wherein the neighborhood sampling radius is dynamically adjusted during iteration so that the sampling radius used for a node pair corresponds to the association strength of that pair; and
after the iterations are finished, extracting the final embedding vectors of all nodes and reducing their dimension through principal component analysis to generate the multi-modal associated feature space.
4. The method of claim 1, wherein inputting the multi-modal associated feature space into the dynamic attention alignment network, and dynamically allocating weights to the business rule semantic dependency relationships, the code execution path dependency relationships and the defect propagation link association relationships through inter-modal feature interaction strength calculation to generate the unified joint characterization vector with cross-modal constraints, comprises:
constructing a dynamic attention alignment network comprising a modal interaction analysis module, a dynamic weight calculation module and a feature fusion module, wherein the modal interaction analysis module calculates feature interaction strengths, the dynamic weight calculation module generates a weight allocation scheme, and the feature fusion module generates the joint characterization vector;
inputting the multi-modal associated feature space into the modal interaction analysis module, and separately calculating the interaction strength between the business rule semantic dependency relationships and the code execution path dependency relationships, the interaction strength between the code execution path dependency relationships and the defect propagation link association relationships, and the interaction strength between the business rule semantic dependency relationships and the defect propagation link association relationships, wherein each interaction strength is calculated jointly from feature co-occurrence frequency and the association strength matrix;
constructing an interaction strength tensor based on the interaction strengths among the three modalities, inputting the interaction strength tensor into the dynamic weight calculation module, extracting principal interaction components through tensor decomposition, and allocating initial weights to the modality pairs corresponding to the principal interaction components;
weighting the business rule semantic dependency relationship features, the code execution path dependency relationship features and the defect propagation link association relationship features according to the initial weights to generate weighted modal features, wherein the weights are dynamically adjusted during weighting, a weight being adjusted accordingly when the variance of the features of a modality deviates from a preset range;
inputting the weighted modal features into the feature fusion module, performing element-wise multiplication of the weighted semantic features and the weighted code features to obtain semantic-code fusion features, and concatenating the semantic-code fusion features with the weighted defect features along the channel dimension to generate a preliminary joint characterization vector; and
performing a cross-modal consistency check on the preliminary joint characterization vector by calculating the distribution similarity of the semantic feature part, the code feature part and the defect feature part, and, when the distribution of any part deviates from the overall distribution by more than a preset range, readjusting the corresponding modal weight through the dynamic weight calculation module and repeating the fusion process until the consistency constraint is met, to generate the unified joint characterization vector.
5. The method of claim 4, wherein constructing the interaction strength tensor based on the interaction strengths among the three modalities, inputting the interaction strength tensor into the dynamic weight calculation module, and extracting the principal interaction components through tensor decomposition, comprises:
constructing an interaction strength evaluation index system comprising a semantic consistency index, a structural similarity index and a temporal association index, wherein the semantic consistency index is calculated through cosine similarity, the structural similarity index is calculated through graph edit distance, and the temporal association index is calculated through event co-occurrence window analysis;
calculating the scores of the business rule semantic dependency relationships and the code execution path dependency relationships on the three indexes, and obtaining a first interaction strength by weighted summation;
calculating the scores of the code execution path dependency relationships and the defect propagation link association relationships on the three indexes, and obtaining a second interaction strength by weighted summation;
calculating the scores of the business rule semantic dependency relationships and the defect propagation link association relationships on the three indexes, and obtaining a third interaction strength by weighted summation;
constructing the interaction strength tensor from the three interaction strengths, inputting the tensor into the dynamic weight calculation module, extracting the principal interaction components through tensor decomposition, and allocating initial weights to the modality pairs corresponding to the principal interaction components;
minimizing the deviation between the weight allocation and the interaction strengths through iterative optimization, wherein the error is measured by mean square error and the weights are adjusted by gradient descent until the error falls within a preset range; and
multiplying the optimized weight allocation result by the modal features in the multi-modal associated feature space to generate the weighted modal features, wherein each dimension value of a weighted modal feature is the product of the original feature value and the corresponding modal weight.
6. The method of claim 5, wherein constructing the interaction strength evaluation index system comprising the semantic consistency index calculated through cosine similarity, the structural similarity index calculated through graph edit distance, and the temporal association index calculated through event co-occurrence window analysis, comprises:
converting the business rule semantic dependency relationships and the code execution path dependency relationships into semantic vector representations, and calculating the cosine of the angle between them in the vector space through cosine similarity, wherein a larger value indicates a tighter semantic association;
representing the multi-modal feature structures as directed graphs, and calculating, through a graph edit distance algorithm, the number of insertion, deletion and substitution operations required to convert one graph into another, wherein a smaller number of operations indicates more similar structures;
dividing co-occurrence windows based on event timestamp information, counting the co-occurrence frequencies of features of different modalities within the same time window, and taking the ratio of the normalized co-occurrence frequency to the time window size as the temporal association index;
determining fusion weights of the semantic consistency index, the structural similarity index and the temporal association index through training on historical association data, so that the correlation between the combined evaluation result and the actual association strength is maximized;
automatically adjusting the calculation parameters of each index according to the multi-modal feature types, wherein text modal features increase the semantic consistency weight, graph modal features increase the structural similarity weight, and temporal modal features increase the temporal association weight; and
storing the optimal evaluation index combinations for different types of multi-modal feature pairs, and selecting the evaluation indexes through feature type matching.
7. The method of claim 1, wherein performing dual-task collaborative training of test case generation and defect localization prediction on the unified joint characterization vector, and adjusting the feature extraction weights of the dual tasks through inter-task sharing of back-propagated gradients to generate the test case generation parameter set, comprises:
constructing a dual-task network architecture comprising a shared feature extraction layer, a test case generation task layer and a defect localization prediction task layer, wherein the shared feature extraction layer provides general features for the dual tasks, the test case generation task layer outputs test case parameters, and the defect localization prediction task layer outputs defect position probabilities;
inputting the unified joint characterization vector into the shared feature extraction layer, and extracting deep features through a hybrid structure of convolutional layers and recurrent layers, wherein the convolutional layers capture local feature interactions and the recurrent layers model temporal dependency relationships, to generate a dual-task shared feature vector;
inputting the dual-task shared feature vector into the test case generation task layer, generating an input parameter sequence, an execution step sequence and an expected result sequence of a test case through a sequence generation model, and calculating the edit distance loss between the generated sequences and real test case sequences as the test case generation task loss;
inputting the dual-task shared feature vector into the defect localization prediction task layer, predicting the defect probability distribution over code modules through a classification model, and calculating the cross-entropy loss between the predicted distribution and the true defect positions as the defect localization prediction task loss;
computing a weighted sum of the test case generation task loss and the defect localization prediction task loss to generate a joint loss function; and
performing back propagation based on the joint loss function, and calculating the gradient of the loss with respect to the shared feature extraction layer parameters, wherein the gradient is a weighted sum of the test case generation task gradient and the defect localization prediction task gradient, with weights consistent with the dynamic weights of the dual tasks; updating the shared feature extraction layer parameters, synchronously adjusting the feature extraction weights of the dual tasks, and training iteratively until the joint loss converges; and extracting the network parameters of the test case generation task layer as the test case generation parameter set.
8. The method of claim 7, wherein performing back propagation based on the joint loss function, calculating the gradient of the loss with respect to the shared feature extraction layer parameters, the gradient being a weighted sum of the test case generation task gradient and the defect localization prediction task gradient with weights consistent with the dynamic weights of the dual tasks, and updating the shared feature extraction layer parameters by gradient descent, comprises:
in each back propagation pass, calculating the gradient of the test case generation task loss with respect to each parameter of the shared feature extraction layer, and recording it as a first gradient;
calculating the gradient of the defect localization prediction task loss with respect to each parameter of the shared feature extraction layer, and recording it as a second gradient;
performing gradient consistency analysis on the first gradient and the second gradient by calculating the cosine similarity of the gradient directions, wherein a value closer to 1 indicates a higher degree of consistency between the dual-task gradient directions, and a value closer to -1 indicates a higher degree of conflict between the gradient directions;
when the gradient direction similarity is above a preset range, directly computing a weighted sum of the first gradient and the second gradient according to the dynamic weights of the dual tasks to generate a shared gradient;
when the similarity is below the preset range, projecting the conflicting gradients onto a consistent direction using a gradient projection method, and then computing the weighted sum after projection to generate the shared gradient;
performing gradient clipping on the shared gradient to keep the gradient norm within a preset range and avoid gradient explosion, wherein clipping preserves the gradient direction and adjusts only the gradient magnitude; and
updating the parameters with the clipped shared gradient using a momentum optimizer, wherein the momentum term integrates historical gradient directions to accelerate convergence and suppress oscillation, and the learning rate decays dynamically with the training rounds.
9. The method of claim 1, wherein inputting the test case generation parameter set into the generator of the generative adversarial network, and strengthening the feature expression of boundary test scenarios through feature adversarial learning with the discriminator to generate the boundary test case set, comprises:
constructing a generative adversarial network comprising a generator and a discriminator, wherein the generator adopts a hybrid architecture of transposed convolution and recurrent networks to generate test case features, the discriminator adopts a convolutional network architecture to distinguish real test cases from generated test cases, and the inputs of both are test case feature vectors;
concatenating the test case generation parameter set with a random noise vector as the input of the generator, wherein the generator progressively increases the feature dimension through transposed convolution layers, models the dependency relationships among test case steps through recurrent layers, and outputs candidate boundary test case feature vectors;
collecting boundary test cases from a historical test case library, extracting their feature vectors as real samples, and inputting them into the discriminator together with the candidate boundary test case feature vectors output by the generator, wherein the discriminator extracts multi-scale features of the input feature vectors and outputs real/fake probabilities through a fully connected layer, the expected probability for real samples being a reference value of one and the expected probability for generated samples being a reference value of zero;
calculating the cross-entropy loss of the discriminator on the real samples and the generated samples as the discriminator loss, and updating the discriminator parameters through back propagation to improve its ability to distinguish real samples from generated samples;
fixing the discriminator parameters, and calculating the probability that a generated sample of the generator is judged by the discriminator to be a real sample, the probability being positively correlated with the generator loss; introducing a feature matching loss by calculating the distance between generated samples and real samples in the intermediate-layer features of the discriminator; and calculating the total generator loss as the sum of the adversarial loss and the feature matching loss; and
performing back propagation based on the total generator loss and updating the generator parameters; iteratively alternating the discriminator update and the generator update; calculating the boundary feature coverage rate of the generated samples after each iteration and stopping training when the coverage rate reaches a preset range; and converting the candidate boundary test case feature vectors output by the generator into structured test case descriptions to generate the boundary test case set.
10. A computer system comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 9 when the program is executed.
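
The gradient handling recited in claim 8 (direction cosine similarity, projection of conflicting gradients, weighted summation, and norm clipping) can be illustrated with a short sketch. It is not the patented implementation. The claim does not name a specific projection rule, so the sketch assumes one common choice: each conflicting gradient has its component along the other gradient removed. The function names, the threshold of 0.0, the equal task weights and the norm bound of 1.0 are all hypothetical values chosen for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    """Direction cosine similarity; 0.0 if either vector is all zeros."""
    na, nb = norm(a), norm(b)
    return 0.0 if na == 0 or nb == 0 else dot(a, b) / (na * nb)

def project_out(g, ref):
    """Remove from g its component along ref (assumed projection rule)."""
    denom = dot(ref, ref)
    if denom == 0:
        return list(g)
    scale = dot(g, ref) / denom
    return [x - scale * r for x, r in zip(g, ref)]

def shared_gradient(g1, g2, w1=0.5, w2=0.5, sim_threshold=0.0, max_norm=1.0):
    """Combine two task gradients over the shared layer parameters.

    g1: test case generation task gradient (first gradient)
    g2: defect localization prediction task gradient (second gradient)
    w1, w2, sim_threshold, max_norm: hypothetical illustration values.
    """
    if cosine(g1, g2) < sim_threshold:
        # Conflict: project each gradient using the other's ORIGINAL value.
        g1, g2 = project_out(g1, g2), project_out(g2, g1)
    shared = [w1 * a + w2 * b for a, b in zip(g1, g2)]
    n = norm(shared)
    if n > max_norm:
        # Clip the magnitude only; the direction is preserved.
        shared = [x * max_norm / n for x in shared]
    return shared

if __name__ == "__main__":
    print(shared_gradient([1.0, 0.0], [-1.0, 1.0]))  # conflicting directions
    print(shared_gradient([3.0, 4.0], [3.0, 4.0]))   # aligned, then clipped
```

In the first call the two gradients conflict, so both are projected before summation and the result has a non-negative inner product with each original gradient. In the second call the gradients agree, so they are summed directly and only the norm is rescaled.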

Description

Test case generation method and system based on multi-modal machine learning and dynamic optimization

Technical Field

The invention relates to the technical field of data processing, and in particular to a test case generation method and system based on multi-modal machine learning and dynamic optimization.

Background

As software systems grow in complexity, test case generation is a key link in assuring software quality, and its effectiveness directly affects test efficiency and defect discovery capability. In the prior art, test case generation methods are usually built on a single data source. Some methods supplement the cases with historical defect records, but these methods generally suffer from fragmented modal information and cannot fully mine the intrinsic associations among semantic rules, code execution logic and defect propagation patterns, so the effectiveness of the resulting test cases struggles to meet the testing requirements of complex software systems.

Disclosure of Invention

In view of the above, the present invention provides a test case generation method and system based on multi-modal machine learning and dynamic optimization.
The technical solution of the invention is realized as follows: In one aspect, an embodiment of the invention provides a test case generation method based on multi-modal machine learning and dynamic optimization, comprising: performing cross-modal feature association processing on a requirement semantic text, a source code instruction stream and a historical defect event record in a software test scenario to generate a multi-modal associated feature space containing business rule semantic dependency relationships, code execution path dependency relationships and defect propagation link association relationships; inputting the multi-modal associated feature space into a dynamic attention alignment network, and dynamically allocating weights to the business rule semantic dependency relationships, the code execution path dependency relationships and the defect propagation link association relationships through inter-modal feature interaction strength calculation, to generate a unified joint characterization vector with cross-modal constraints; performing dual-task collaborative training of test case generation and defect localization prediction on the unified joint characterization vector, and adjusting the feature extraction weights of the dual tasks through inter-task sharing of back-propagated gradients, to generate a test case generation parameter set; inputting the test case generation parameter set into a generator of a generative adversarial network, and strengthening the feature expression of boundary test scenarios through feature adversarial learning with a discriminator, to generate a boundary test case set; and performing priority ranking on the boundary test case set, calculating case coverage weights based on the business rule semantic dependency relationships and the defect propagation link association relationships, and generating a target test case set containing a priority ranking result.
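
The fusion that produces the joint characterization vector is detailed in claim 4: the weighted semantic and code features are multiplied element by element, and the result is concatenated with the weighted defect features. A minimal sketch of that single step follows. It assumes scalar per-modality weights and plain Python lists, and the function name `fuse_modalities` is hypothetical. It does not cover the dynamic weight adjustment or the consistency check.

```python
def fuse_modalities(semantic, code, defect, weights):
    """Build a preliminary joint vector from three modal feature vectors.

    semantic, code: equal-length feature lists (needed for element-wise product)
    defect: feature list of any length
    weights: (w_semantic, w_code, w_defect), hypothetical scalar modal weights
    """
    if len(semantic) != len(code):
        raise ValueError("semantic and code features must have the same length")
    w_sem, w_code, w_def = weights
    weighted_sem = [w_sem * x for x in semantic]
    weighted_code = [w_code * x for x in code]
    weighted_def = [w_def * x for x in defect]
    # Element-wise multiplication gives the semantic-code fusion features.
    sem_code = [s * c for s, c in zip(weighted_sem, weighted_code)]
    # Concatenation appends the weighted defect features.
    return sem_code + weighted_def

if __name__ == "__main__":
    print(fuse_modalities([1.0, 2.0], [3.0, 4.0], [5.0], (0.5, 0.5, 2.0)))
```

The element-wise product requires the semantic and code vectors to share a dimension, which matches the requirement in claim 3 that the embedding dimensions be kept consistent.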
In another aspect, an embodiment of the invention provides a computer system comprising a memory and a processor, the memory storing a computer program executable on the processor, the processor implementing the steps of the method described above when executing the program.

According to the invention, cross-modal feature association processing is performed on the requirement semantic text, the source code instruction stream and the historical defect event record in a software test scenario, so that originally isolated multi-source data is built into a unified feature space containing the business rule semantic dependency relationships, the code execution path dependency relationships and the defect propagation link association relationships, allowing the test case generation process to draw simultaneously on semantic rules, code structure and defect history. The dynamic attention alignment network calculates the feature interaction strength among the modalities and allocates weights dynamically, achieving adaptive fusion of the features of different modalities and avoiding the information loss or redundancy caused by fixed weights. The test case generation and defect localization prediction tasks are trained collaboratively, with the feature extraction weights adjusted through inter-task sharing of back-propagated gradients, so that the two related tasks promote each other and are optimized jointly, improving the quality of the test case generation parameters. Feature adversarial learning between the generator and the discriminator of the generative adversarial network strengthens the feature expression of boundary test scenarios, so that the generated boundary test case set can more comprehensivel