CN-121979785-A - High-reliability test data generation method and device, electronic equipment and storage medium
Abstract
The invention provides a method and device for generating high-reliability test data, an electronic device, and a storage medium, and relates to the technical field of software testing. The method comprises: analyzing a requirements document of a target test application through artificial-intelligence natural language processing, extracting business scenarios, data types, constraint conditions and compliance requirements, and generating a data generation task book in combination with a business meta-model library; calling a multi-technology module to generate first test data based on the data generation task book, wherein the first test data comprises structured data, unstructured data and dynamic scenario data; performing privacy compliance processing and data cleaning on the first test data based on the data generation task book to obtain second test data; performing iterative optimization on the second test data based on a genetic algorithm to obtain third test data; and verifying the third test data through a three-level verification mechanism of format verification, function verification and semantic verification to obtain valid test data. The method thereby provides high-quality, highly adaptable test data support for software testing.
Inventors
- SHU WEI
- GUO MANLI
- Wu Shekang
Assignees
- 广州心娱网络科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20251225
Claims (10)
- 1. A method for generating high-reliability test data, comprising: analyzing a requirements document of a target test application through artificial-intelligence natural language processing, extracting business scenarios, data types, constraint conditions and compliance requirements, and generating a structured data generation task book of the target test application in combination with a business meta-model library, wherein the data generation task book comprises data fields, formats, boundary values and association relations; invoking a multi-technology module to generate first test data based on the data generation task book, wherein the first test data comprises structured data, unstructured data and dynamic scenario data, and the multi-technology module comprises at least an APIGen pipeline module, a large language model module, a federated learning module and a traffic recording and playback module; performing privacy compliance processing and data cleaning on the first test data based on the data generation task book to obtain second test data; performing iterative optimization on the second test data based on a genetic algorithm to obtain third test data; and verifying the third test data through a three-level verification mechanism of format verification, function verification and semantic verification to obtain valid test data.
- 2. The method for generating high-reliability test data according to claim 1, wherein invoking the multi-technology module to generate the first test data based on the data generation task book comprises: invoking the APIGen pipeline module to parse an interface document according to the data generation task book and generate interface-parameter structured data in JSON or XML format, wherein the interface-parameter structured data comprises normal values, boundary values and special symbols; invoking the large language model module to extract API document semantics according to the data generation task book and generate text-type unstructured data and attack-type input test cases, wherein the text-type unstructured data comprises technical documents, web page content and log text; invoking the federated learning module to generate cross-institution tabular structured data through a vertical federated learning architecture according to the data generation task book; and invoking the traffic recording and playback module to record real business traffic according to the data generation task book, perform desensitization processing, and generate concurrent traffic data and abnormal sequence data, wherein the concurrent traffic data and the abnormal sequence data constitute the dynamic scenario data.
- 3. The method for generating high-reliability test data according to claim 1, wherein performing privacy compliance processing and data cleaning on the first test data based on the data generation task book to obtain second test data comprises: according to the data generation task book, processing sensitive fields of the structured data by differential privacy and/or dynamic masking, and removing structured data that has format errors, logical conflicts or redundant duplicates; according to the data generation task book, performing blurring processing on the unstructured data, and removing unstructured data whose format cannot be recognized or whose content is invalid; and according to the data generation task book, performing desensitization processing on sensitive information in the dynamic scenario data, and removing dynamic scenario data that exceeds a business threshold or contains logical contradictions.
- 4. The method for generating high-reliability test data according to claim 1, wherein performing iterative optimization on the second test data based on the genetic algorithm to obtain third test data comprises: treating the second test data as chromosomes in the genetic algorithm; constructing a fitness function according to test coverage, wherein the coverage weight of high-risk scenario data is higher than that of normal scenario data; performing at least one round of selection, crossover and mutation on the chromosomes based on the fitness function to obtain iteratively optimized data; and, in combination with historical defect data, adjusting the portion of the iteratively optimized data that belongs to high-risk scenarios according to the weights.
- 5. The method for generating high-reliability test data according to claim 1, wherein verifying the third test data through the three-level verification mechanism of format verification, function verification and semantic verification to obtain valid test data comprises: invoking format constraint rules preset in the data generation task book and performing full format verification on the third test data to obtain format-compliant data, wherein the full format verification covers data field integrity, data type matching, format normalization and content integrity; feeding the format-compliant data into a verification entry point corresponding to the target test application, triggering the target test application to execute a target operation, and verifying whether the data can effectively drive the application function, to obtain functionally valid data; and loading preset business logic rules and compliance requirements, and performing a logical consistency check on the functionally valid data through a business rule engine to verify whether the data conforms to business scenario requirements, industry compliance criteria and core business logic, thereby obtaining the valid test data.
- 6. The method for generating high-reliability test data according to claim 1, further comprising: when a commit to the business code or interface code of the target test application is detected and triggers a content update, generating a new data generation task book in synchronization with the updated business meta-model library; and invoking the multi-technology module, based on the new data generation task book, to generate test data adapted to the new code logic through privacy compliance processing, data cleaning, iterative optimization and verification, thereby completing continuous updating.
- 7. The method for generating high-reliability test data according to claim 1, further comprising: deploying a data quality monitoring dashboard; collecting statistics on the data pass rate, the defect detection rate and the coverage increase during generation of the valid test data; dynamically adjusting the data generation rules and the data generation proportions based on the statistics; and updating the data generation task book to realize feedback iteration.
- 8. A high-reliability test data generation apparatus, comprising: a requirements analysis module configured to analyze a requirements document of a target test application through artificial-intelligence natural language processing, extract business scenarios, data types, constraint conditions and compliance requirements, and generate a structured data generation task book of the target test application in combination with a business meta-model library, wherein the data generation task book comprises data fields, formats, boundary values and association relations; a multi-modal data generation module configured to invoke a multi-technology module and generate first test data based on the data generation task book, wherein the first test data comprises structured data, unstructured data and dynamic scenario data, and the multi-technology module comprises at least an APIGen pipeline module, a large language model module, a federated learning module and a traffic recording and playback module; a privacy compliance and data cleaning module configured to perform privacy compliance processing and data cleaning on the first test data based on the data generation task book to obtain second test data; an iterative optimization module configured to perform iterative optimization on the second test data based on a genetic algorithm to obtain third test data; and a layered verification module configured to verify the third test data through a three-level verification mechanism of format verification, function verification and semantic verification to obtain valid test data.
- 9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of generating highly reliable test data according to any of claims 1 to 7 when executing the program.
- 10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the method of generating high reliability test data according to any of claims 1 to 7.
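The genetic-algorithm step recited in claim 4 can be illustrated with a minimal Python sketch. This is not the patented implementation: the bitmask chromosome encoding, the case budget, the concrete weights (3.0 and 1.0), and the names `WEIGHTS`, `fitness`, `evolve`, `pool` and `risk_of` are all assumptions made for illustration. The adjustment based on historical defect data is not modeled here.

```python
import random

# Assumed weights: high-risk scenarios count more toward fitness than normal ones.
WEIGHTS = {"high_risk": 3.0, "normal": 1.0}

def fitness(mask, pool, risk_of, budget):
    """Weighted scenario coverage of the selected cases; 0 if over the budget."""
    if sum(mask) > budget:
        return 0.0
    covered = set()
    for selected, case in zip(mask, pool):
        if selected:
            covered |= case["covers"]
    return sum(WEIGHTS[risk_of[name]] for name in covered)

def evolve(pool, risk_of, budget, generations=40, pop_size=20, seed=0):
    """Evolve a 0/1 mask over the candidate pool via selection, crossover, mutation."""
    rng = random.Random(seed)
    n = len(pool)
    score = lambda m: fitness(m, pool, risk_of, budget)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)      # single-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n)        # mutation: flip one gene
            child[flip] = 1 - child[flip]
            children.append(child)
        population = parents + children
    return max(population, key=score)

# Hypothetical cleaned test cases ("second test data") and scenario risk labels.
pool = [
    {"id": "t1", "covers": {"login"}},
    {"id": "t2", "covers": {"payment", "refund"}},
    {"id": "t3", "covers": {"login", "browse"}},
    {"id": "t4", "covers": {"payment"}},
]
risk_of = {"login": "normal", "browse": "normal",
           "payment": "high_risk", "refund": "high_risk"}

best = evolve(pool, risk_of, budget=2)
print([case["id"] for bit, case in zip(best, pool) if bit])
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases between generations. A fixed seed keeps runs repeatable, which is useful when the selected cases feed a later verification stage.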
Description
High-reliability test data generation method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of software testing technologies, and in particular to a method and apparatus for generating high-reliability test data, an electronic device, and a storage medium.
Background
Test data is the core foundation of software testing; its quality and generation efficiency directly determine test effectiveness and project iteration speed. Current data set generation techniques typically rely on manual writing or semi-automated tools, which are time-consuming and labor-intensive and make it difficult to ensure that the data covers all possible usage scenarios. Insufficient statistics on and verification of these data sets also often introduce bias, so the data cannot accommodate new or complex test scenarios.
As software systems grow more complex, conventional manual test data generation can no longer meet the need for efficient and compliant testing. On the one hand, manually constructing data for complex business scenarios is time-consuming and laborious; it lowers tester efficiency and can leave complex scenarios tested inadequately or inaccurately, putting products at risk. On the other hand, conventional randomly generated data rarely covers boundary conditions and business logic, which tends to result in a high failure rate in production. In addition, fields such as finance and healthcare impose strict data privacy requirements, data is difficult to desensitize, and using production data directly carries legal risk.
Disclosure of Invention
The invention provides a method, a device, an electronic device and a storage medium for generating high-reliability test data, which address the shortcomings of the prior art (low test data generation efficiency, business semantic mismatch, insufficient compliance, high maintenance cost and incomplete scenario coverage), realize rule-based modeling, rapid generation, accurate verification and dynamic iteration of test data, and provide high-quality, highly adaptable test data support for software testing.
The invention provides a method for generating high-reliability test data, comprising: analyzing a requirements document of a target test application through artificial-intelligence natural language processing, extracting business scenarios, data types, constraint conditions and compliance requirements, and generating a structured data generation task book of the target test application in combination with a business meta-model library, wherein the data generation task book comprises data fields, formats, boundary values and association relations; invoking a multi-technology module to generate first test data based on the data generation task book, wherein the first test data comprises structured data, unstructured data and dynamic scenario data, and the multi-technology module comprises at least an APIGen pipeline module, a large language model module, a federated learning module and a traffic recording and playback module; performing privacy compliance processing and data cleaning on the first test data based on the data generation task book to obtain second test data; performing iterative optimization on the second test data based on a genetic algorithm to obtain third test data; and verifying the third test data through a three-level verification mechanism of format verification, function verification and semantic verification to obtain valid test data.
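The final three-level verification step can be sketched in Python as three successive filters. This is a minimal illustration under stated assumptions, not the disclosed implementation: the field schema `FORMAT_RULES`, the stand-in application entry point `place_order`, and the rules in `BUSINESS_RULES` are hypothetical and would in practice come from the data generation task book, the target test application, and a business rule engine respectively.

```python
from typing import Any, Callable

Record = dict[str, Any]

# Hypothetical format constraints from a task book: field name -> expected type.
FORMAT_RULES = {"user_id": int, "amount": float, "currency": str}

# Hypothetical business rules standing in for a business rule engine.
BUSINESS_RULES = [
    lambda r: r["amount"] > 0,
    lambda r: r["currency"] in {"CNY", "USD"},
]

def format_ok(record: Record) -> bool:
    """Level 1: every required field is present and has the expected type."""
    return all(
        name in record and isinstance(record[name], expected)
        for name, expected in FORMAT_RULES.items()
    )

def function_ok(record: Record, entry_point: Callable[[Record], Any]) -> bool:
    """Level 2: the record drives the application entry point without an error."""
    try:
        entry_point(record)
        return True
    except Exception:  # any failure means the data cannot drive the function
        return False

def semantic_ok(record: Record) -> bool:
    """Level 3: the record satisfies every business rule."""
    return all(rule(record) for rule in BUSINESS_RULES)

def verify(records: list[Record], entry_point: Callable[[Record], Any]) -> list[Record]:
    """Apply format, function and semantic checks in order; keep the survivors."""
    format_compliant = [r for r in records if format_ok(r)]
    functionally_valid = [r for r in format_compliant if function_ok(r, entry_point)]
    return [r for r in functionally_valid if semantic_ok(r)]

def place_order(record: Record) -> None:
    """Stand-in for the application under test; rejects unknown users."""
    if record["user_id"] >= 1000:
        raise ValueError("unknown user")

candidates = [
    {"user_id": 1, "amount": 9.5, "currency": "CNY"},      # passes all levels
    {"user_id": "x", "amount": 1.0, "currency": "USD"},    # wrong type
    {"user_id": 2000, "amount": 5.0, "currency": "USD"},   # application rejects it
    {"user_id": 2, "amount": -3.0, "currency": "USD"},     # violates a business rule
    {"user_id": 3, "amount": 1.0},                         # missing field
]
valid = verify(candidates, place_order)
print(valid)
```

Ordering the checks from cheapest to most expensive means malformed records never reach the application, and only records that actually drive the application are examined by the business rules.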
According to the method for generating high-reliability test data provided by the invention, invoking the multi-technology module to generate the first test data based on the data generation task book comprises: invoking the APIGen pipeline module to parse an interface document according to the data generation task book and generate interface-parameter structured data in JSON or XML format, wherein the interface-parameter structured data comprises normal values, boundary values and special symbols; invoking the large language model module to extract API document semantics according to the data generation task book and generate text-type unstructured data and attack-type input test cases, wherein the text-type unstructured data comprises technical documents, web page content and log text; invoking the federated learning module to generate cross-institution tabular structured data through a vertical federated learning architecture according to the data generation task book; and invoking the traffic recording and playback module to record real business traffic according to the data generation task book, perform desensitization processing, and generate concurrent traffic data and abnormal s