CN-122019397-A - Database testing method, system, equipment and medium based on knowledge distillation
Abstract
The invention relates to the technical field of database testing, and in particular to a database testing method, system, device, and medium based on knowledge distillation. The method comprises: constructing a multi-modal feature extractor and extracting multi-dimensional features from an input structured query statement and its execution context; training a first network model on multi-modal feature samples generated from the execution history test cases of a reference database and their corresponding execution behavior data, to obtain a teacher model; training a second network model on the output probability distribution and intermediate-layer features of the teacher model, to obtain a student model; processing a structured query statement of the database under test with the multi-modal feature extractor, inputting the processed query into the trained teacher model and the trained student model respectively, and judging whether the current query has a functional abnormality according to the difference between the output prediction results. The method aims to realize database function testing with high coverage, low manual intervention, and interpretability.
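The abstract's core detection idea, running the same query's features through both the teacher and the student model and flagging large disagreement between their predicted behavior distributions, can be sketched as follows. This is a minimal illustration under assumed inputs (pre-computed probability distributions), not the patented implementation; the function names, the use of KL divergence as the difference measure, and the threshold value are all hypothetical:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

def is_anomalous(teacher_probs, student_probs, threshold=0.1):
    """Flag a query as functionally anomalous when the teacher's and
    student's predicted behavior distributions diverge beyond a threshold."""
    return kl_divergence(teacher_probs, student_probs) > threshold

# Agreeing predictions -> no anomaly flagged.
print(is_anomalous([0.7, 0.2, 0.1], [0.7, 0.2, 0.1]))   # False
# Strongly disagreeing predictions -> anomaly flagged.
print(is_anomalous([0.9, 0.05, 0.05], [0.1, 0.5, 0.4]))  # True
```

In the patented scheme the comparison is richer (per-channel sub-scores fused with preset weights, per claim 6); the sketch above only shows the single-channel case.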
Inventors
- LI XIAOJIANG
- LI YAPENG
- YU MAOXUAN
- HE ZHI
- HU ZHIRUI
- CHEN RAN
- XIONG FAN
- CUI ZIKUN
- YE CHANGJIN
- WANG CHUANGYI
- ZHANG QING
Assignees
- 民航机场成都电子工程设计有限责任公司
Dates
- Publication Date: 20260512
- Application Date: 20260416
Claims (10)
- 1. A method for database testing based on knowledge distillation, the method comprising: constructing a multi-modal feature extractor, performing multi-dimensional feature extraction on an input structured query statement and its execution context, and converting the extraction result into a feature vector; training a first network model on multi-modal feature samples generated from the execution history test cases of a reference database and their corresponding execution behavior data, to obtain a teacher model characterizing the execution behavior of the reference database; training a second network model on the output probability distribution and intermediate-layer features of the teacher model, to obtain a student model for predicting the execution behavior of the database under test; processing a structured query statement of the database under test with the multi-modal feature extractor, inputting the processed query into the trained teacher model and the trained student model respectively, and judging whether the current query has a functional abnormality according to the difference between the output prediction results; and feeding back query samples judged abnormal, together with their execution information, to the training sample set, incrementally updating the teacher model, and synchronously updating the student model through knowledge distillation.
- 2. The knowledge distillation based database testing method according to claim 1, wherein constructing a multi-modal feature extractor, performing multi-dimensional feature extraction on an input structured query statement and its execution context, and converting the extraction result into a feature vector comprises: performing lexical analysis and syntax analysis on the input structured query statement to generate an abstract syntax tree, and extracting syntax structure features comprising operation types, table dependencies, and predicate distribution information; analyzing the execution plan of the structured query statement in the database, and extracting execution plan features comprising operator sequences, operator cost estimates, actual row counts, and index usage; collecting system state data of the database at runtime, and extracting runtime state features comprising central processing unit occupancy, memory usage, lock-wait event sequences, and transaction log information; hashing the result set returned after execution of the structured query statement to generate a digital signature of the result set, and extracting the cardinality and column value-domain distribution of the result set as result signature features; and performing vectorization alignment and fused encoding on the syntax structure features, execution plan features, runtime state features, and result signature features to obtain the feature vector.
- 3. The knowledge distillation based database testing method of claim 1, wherein training the first network model on the multi-modal feature samples generated from the reference database's execution history test cases and their corresponding execution behavior data, to obtain a teacher model characterizing the execution behavior of the reference database, comprises: constructing a first network model comprising a first branch for processing syntax structure features, a second branch for processing execution plan features, and a third branch for processing runtime state features; setting a hierarchical attention mechanism at the output of each branch, and performing adaptive weighted fusion on the multi-branch outputs to generate a comprehensive behavior representation vector; training the first network model with a multi-task learning framework, wherein the training tasks comprise at least a result signature prediction task based on contrastive learning and an execution plan regression task based on mean square error; and optimizing the parameters of the first network model using verified correct execution behavior data from the historical test cases as the supervision signal, combined with a weight decay regularization strategy, so that the teacher model learns the execution behavior mapping of the reference database in the multi-dimensional feature space.
- 4. The knowledge distillation based database testing method according to claim 1, wherein training the second network model on the output probability distribution and intermediate-layer features of the teacher model, to obtain the student model for predicting the execution behavior of the database under test, comprises: constructing a second network model, and computing the KL divergence between the prediction of the second network model and the soft-label probability distribution output by the teacher model as a first distillation loss; mapping the intermediate-layer features of the second network model into the teacher model's latent feature space through a projection matrix, and computing the mean square error between the mapped features and the corresponding layer features of the teacher model as a feature alignment loss; and constructing a composite loss function from the first distillation loss and the feature alignment loss, and optimizing the parameters of the second network model.
- 5. The knowledge distillation based database testing method according to claim 4, wherein training the second network model on the output probability distribution and intermediate-layer features of the teacher model to obtain the student model further comprises: constructing an adversarial distillation loss function that adjusts the hidden-layer feature distribution of the student model by minimizing the probability that a discriminator network correctly identifies the features produced by the student (generator) model; computing sample weight coefficients from the prediction entropy output by the teacher model, and constructing an uncertainty-aware sample weighting loss function that adjusts the contribution of different samples during training according to those coefficients; and adding the adversarial distillation loss function and the uncertainty-aware sample weighting loss function to the composite loss function, forming a multi-objective optimization framework together with the first distillation loss and the feature alignment loss, and jointly optimizing the student model through a dynamic weight allocation mechanism.
- 6. The knowledge distillation based database testing method according to claim 1, wherein processing the structured query statement of the database under test with the multi-modal feature extractor, inputting it into the trained teacher model and the trained student model respectively, and judging whether the current query has a functional abnormality according to the difference between the output prediction results, comprises: computing, between the first prediction result output by the teacher model and the second prediction result output by the student model, a first sub-score on the result signature channel, a second sub-score on the execution plan channel, a third sub-score on the transaction log channel, and a fourth sub-score on the system state channel; performing weighted fusion of the first, second, third, and fourth sub-scores with preset weight coefficients to obtain a composite anomaly score; and comparing the composite anomaly score with a preset decision threshold, and judging that the current query has a functional abnormality if the score exceeds the threshold.
- 7. The knowledge distillation based database testing method according to claim 1, wherein feeding back the query samples judged abnormal and their execution information to the training sample set, incrementally updating the teacher model, and synchronously updating the student model through knowledge distillation, comprises: retraining the teacher model on the updated training sample set with an incremental learning strategy, so that the teacher model fits the boundary data distribution contained in the newly added abnormal samples; based on a joint analysis of the student model's prediction confidence and the output difference between the teacher and student models, screening from the newly added abnormal samples those whose confidence is below a first threshold and whose difference exceeds a second threshold as priority labeling samples, and including them in the next training sample set; and using the updated teacher model as the knowledge source, adjusting the parameters of, or retraining, the student model through knowledge distillation.
- 8. A knowledge distillation based database testing system for performing the knowledge distillation based database testing method of any one of claims 1 to 7, the system comprising: a multi-modal feature extraction module configured to construct a multi-modal feature extractor, perform multi-dimensional feature extraction on an input structured query statement and its execution context, and convert the extraction result into a feature vector; a teacher model training module configured to train a first network model on multi-modal feature samples generated from the execution history test cases of a reference database and their corresponding execution behavior data, to obtain a teacher model characterizing the execution behavior of the reference database; a student model training module configured to train a second network model on the output probability distribution and intermediate-layer features of the teacher model, to obtain a student model for predicting the execution behavior of the database under test; an anomaly detection module configured to process the structured query statement of the database under test with the multi-modal feature extractor, input it into the trained teacher model and the trained student model respectively, and judge whether the current query has a functional abnormality according to the difference between the output prediction results; and a self-learning update module configured to feed back query samples judged abnormal and their execution information to the training sample set, incrementally update the teacher model, and synchronously update the student model through knowledge distillation.
- 9. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the knowledge distillation based database testing method of any one of claims 1 to 7.
- 10. A computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the knowledge distillation based database testing method of any one of claims 1 to 7.
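The composite training objective described in claim 4, a KL-divergence term against the teacher's soft labels plus a mean-square feature-alignment term computed after projecting the student's intermediate features into the teacher's latent space, can be sketched in NumPy as follows. This is a hedged illustration only: the patent does not specify a softmax temperature, loss weighting, or framework, so `T`, `alpha`, and all function names here are assumptions, and a real system would implement this in a deep-learning framework with learned parameters.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits,
                      student_feat, teacher_feat, projection,
                      T=2.0, alpha=0.5):
    """Composite loss per claim 4: KL(teacher soft labels || student
    prediction) plus MSE between the projected student intermediate
    features and the teacher's corresponding-layer features."""
    p_t = softmax(teacher_logits, T)  # teacher soft labels
    p_s = softmax(student_logits, T)
    kl = float(np.sum(p_t * np.log((p_t + 1e-12) / (p_s + 1e-12))))
    # Map student features into the teacher's latent space.
    mapped = projection @ np.asarray(student_feat, dtype=float)
    align = float(np.mean((mapped - np.asarray(teacher_feat, dtype=float)) ** 2))
    return alpha * kl + (1.0 - alpha) * align
```

When the student matches the teacher exactly (identical logits, and a projection that maps its features onto the teacher's), both terms vanish and the loss is zero; any disagreement in either channel raises it, which is what drives the student toward the teacher's behavior during training.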
Description
Database testing method, system, equipment and medium based on knowledge distillation
Technical Field
The invention relates to the technical field of database testing, and in particular to a database testing method, system, device, and medium based on knowledge distillation.
Background
Currently, functional testing of database systems relies primarily on the manual labor of test engineers and on automated test frameworks based on script recording. In manual testing, engineers design test cases from requirement specifications and past experience; this approach is inefficient, its coverage is highly subjective and haphazard, complex scenarios such as boundary conditions, interleaved concurrent transactions, and extreme parameter configurations are hard to construct systematically, and a large number of latent defects go undiscovered. Automated test frameworks improve execution efficiency through script recording and playback, but they do not fundamentally solve the test case generation problem: the richness and effectiveness of the test case library still depend on manual maintenance, an inherent limitation. In the field of database compatibility testing, the prior art often compares the results of the database under test with those of a mature database serving as a benchmark; that is, the execution results of the database under test are simply compared with the benchmark database's output to judge functional correctness. However, because different database products differ objectively in design philosophy, feature implementation paths, and degree of SQL standard support, simple result comparison produces many misjudgments caused by legitimate differences, introducing substantial noise into defect analysis.
Second, existing comparison methods stop at a shallow comparison of the final result set or return code, and lack correlated analysis of deep execution behavior characteristics such as execution plans, locking mechanisms, transaction logs, and resource consumption, making it difficult for developers to quickly locate the root cause of a problem and raising debugging and repair costs. Meanwhile, the existing test flow is an open loop: the test case library is static, the feature patterns that cause defects cannot be learned and abstracted from discovered anomalies, the test system lacks self-evolution capability, adaptive recognition of novel defects is hard to achieve, and improvement of test coverage and continuous optimization of test efficiency are limited.
Disclosure of Invention
In order to realize database function testing with high coverage, low manual intervention, and interpretability through intelligent feature modeling and difference detection, the invention provides a database testing method, system, device, and medium based on knowledge distillation, adopting the following technical scheme. A first aspect of the invention provides a database testing method based on knowledge distillation, comprising the following steps: constructing a multi-modal feature extractor, performing multi-dimensional feature extraction on an input structured query statement and its execution context, and converting the extraction result into a feature vector; training a first network model on multi-modal feature samples generated from the execution history test cases of a reference database and their corresponding execution behavior data, to obtain a teacher model characterizing the execution behavior of the reference database; training a second network model on the output probability distribution and intermediate-layer features of the teacher model, to obtain a student model for predicting the execution behavior of the database under test; processing a structured query statement of the database under test with the multi-modal feature extractor, inputting it into the trained teacher model and the trained student model respectively, and judging whether the current query has a functional abnormality according to the difference between the output prediction results; and feeding back query samples judged abnormal and their execution information to the training sample set, incrementally updating the teacher model, and synchronously updating the student model through knowledge distillation. Further, constructing a multi-modal feature extractor, performing multi-dimensional feature extraction on an input structured query statement and its execution context, and converting the extraction result into a feature vector includes: performing lexical analysis and syntax analysis on the input structured query statement to generate an abstract syntax