CN-122019389-A - Large model driven time sequence database user-defined function test method and system


Abstract

The embodiments of the application provide a large-model-driven method and system for testing user-defined functions of a time series database. The method comprises: extracting domain knowledge related to a user-defined function from a pre-built time series database knowledge base using a retrieval-augmented generation (RAG) strategy; constructing a static analysis instruction based on the domain knowledge; inputting the source code of the user-defined function and the static analysis instruction into a first large language model to obtain a static analysis result; determining the time series characteristics of the test data according to the static analysis result; generating test data conforming to those characteristics using a fine-tuned second large language model; constructing a test case from the test data; and testing the user-defined function with the test case according to a preset execution strategy to obtain a dynamic test result. The method improves the efficiency and accuracy of user-defined function testing.

Inventors

WAN BANGRUI
LI HONGYANG
ZENG JIKAI
QIAN YING
LIU XIN

Assignees

Chongqing University of Posts and Telecommunications (重庆邮电大学)

Dates

Publication Date: 2026-05-12
Application Date: 2026-02-05

Claims (10)

1. A large-model-driven method for testing a user-defined function of a time series database, characterized by comprising the following steps: receiving and decoding a code package of the user-defined function to obtain the source code of the user-defined function; pre-checking the source code of the user-defined function and determining whether the pre-check passes; if the pre-check passes, extracting domain knowledge related to the user-defined function from a pre-built time series database knowledge base using a retrieval-augmented generation (RAG) strategy, and constructing a static analysis instruction based on the domain knowledge; inputting the source code of the user-defined function and the static analysis instruction into a first large language model for analysis to obtain a static analysis result; determining time series characteristics of test data according to the static analysis result, and generating test data conforming to the time series characteristics using a fine-tuned second large language model; and constructing a test case from the test data, and testing the user-defined function with the test case according to a preset execution strategy to obtain a dynamic test result.
2. The method of claim 1, wherein the pre-check comprises: performing an integrity check on the source code of the user-defined function to verify whether the user-defined function contains the components necessary for operation; performing a general syntax check on the source code of the user-defined function to verify whether the user-defined function conforms to preset syntax rules; and when both the integrity check and the syntax check pass, determining that the user-defined function is in an executable state.
3. The method of claim 1, wherein constructing the time series database knowledge base comprises: collecting texts and code related to user-defined functions from the official repository, technical manuals, API documents and open source community of a target time series database to form an original knowledge set; segmenting the text and the code in the original knowledge set separately to obtain a set of original knowledge segments; generating a corresponding index abstract for each original knowledge segment in the set of original knowledge segments; forming an index knowledge ordered pair from each index abstract and the corresponding original knowledge segment identifier, and storing the index knowledge ordered pairs in an index knowledge base; forming a content ordered pair from each original knowledge segment and the corresponding original knowledge segment identifier, and storing the content ordered pairs in a content knowledge base; and establishing a mapping relation between the index knowledge base and the content knowledge base, and forming the complete time series database knowledge base from the index knowledge base, the content knowledge base and the mapping relation.
4. The method of claim 3, wherein generating a corresponding index abstract for each original knowledge segment in the set of original knowledge segments comprises: if the original knowledge segment is a text segment, extracting the core semantic points of the text segment using natural language processing to generate the index abstract of the text segment; and if the original knowledge segment is a code segment, analyzing the logic structure of the code segment to generate the index abstract of the code segment.
5. The method of claim 3, wherein extracting domain knowledge related to the user-defined function from the pre-built time series database knowledge base using a retrieval-augmented generation (RAG) strategy comprises: analyzing the source code of the user-defined function to determine the query requirement of the user-defined function; converting the query requirement into a query vector, and performing similarity matching between the query vector and the index abstracts in the index knowledge base; determining the target index abstract most relevant to the query vector according to the similarity matching result; and determining, according to the mapping relation between the index knowledge base and the content knowledge base, the target original knowledge segment in the content knowledge base that corresponds to the target index abstract, and taking the target original knowledge segment as the domain knowledge query result.
6. The method of claim 1, wherein determining the time series characteristics of the test data according to the static analysis result and generating test data conforming to the time series characteristics using the fine-tuned second large language model comprises: extracting the function category of the user-defined function from the static analysis result; mapping the function category of the user-defined function to the time series characteristics of the test data through a mapping function, wherein the mapping function is instantiated through a predefined rule base; and constructing a prompt instruction for the second large language model according to the time series characteristics, and generating, with the second large language model and according to the prompt instruction, test data conforming to the time series characteristics.
7. The method of claim 1, wherein constructing a test case from the test data and testing the user-defined function with the test case according to a preset execution strategy to obtain a dynamic test result comprises: generating an execution script according to the operating requirements of the user-defined function; constructing a test case based on the test data, the execution script, the test scenario requirements, the expected output and the case metadata; and registering the user-defined function in a test environment of the time series database, and loading and executing the test case in the test environment according to the preset execution strategy to obtain the dynamic test result.
8. The method of claim 1, wherein the method further comprises: performing association analysis on the static analysis result and the dynamic test result to obtain an association analysis result; and generating a structured report for the user-defined function according to the association analysis result and preset quality evaluation dimensions.
9. A large-model-driven system for testing a user-defined function of a time series database, the system comprising: a preliminary analysis unit, configured to receive and decode a code package of the user-defined function to obtain the source code of the user-defined function, and further configured to pre-check the source code of the user-defined function and determine whether the pre-check passes; a knowledge retrieval unit, configured to, if the pre-check passes, extract domain knowledge related to the user-defined function from a pre-built time series database knowledge base using a retrieval-augmented generation (RAG) strategy, and construct a static analysis instruction based on the domain knowledge; a static analysis unit, configured to input the source code of the user-defined function and the static analysis instruction into a first large language model for analysis to obtain a static analysis result; a test management unit, configured to determine the time series characteristics of the test data according to the static analysis result; a test data generating unit, configured to generate test data conforming to the time series characteristics using a fine-tuned second large language model; and a test execution unit, configured to construct a test case from the test data and test the user-defined function with the test case according to a preset execution strategy to obtain a dynamic test result.
10. A large-model-driven device for testing a user-defined function of a time series database, the device comprising: a memory; and a processor; wherein the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory to implement the large-model-driven time series database user-defined function test method of any one of claims 1-8.
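
The two-store knowledge base of claims 3 to 5 (index abstracts in one store, original segments in another, linked by a shared segment identifier) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the class `KnowledgeBase`, its methods, and the sample entries are illustrations, and a bag-of-words cosine similarity stands in for whatever query-vector encoding the application actually uses, which the claims do not specify.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KnowledgeBase:
    """Hypothetical index store + content store keyed by segment id."""

    def __init__(self):
        self.index_store = {}    # segment_id -> index abstract
        self.content_store = {}  # segment_id -> original knowledge segment

    def add(self, segment_id, index_abstract, segment):
        # The shared identifier is the mapping relation between the stores.
        self.index_store[segment_id] = index_abstract
        self.content_store[segment_id] = segment

    def retrieve(self, query, top_k=1):
        """Match the query against abstracts, then return the mapped segments."""
        query_vec = Counter(tokenize(query))
        scored = sorted(
            (
                (cosine(query_vec, Counter(tokenize(abstract))), segment_id)
                for segment_id, abstract in self.index_store.items()
            ),
            reverse=True,
        )
        return [
            self.content_store[segment_id]
            for score, segment_id in scored[:top_k]
            if score > 0
        ]


kb = KnowledgeBase()
kb.add("seg-1", "sliding window aggregation over time series", "WINDOW_DOC")
kb.add("seg-2", "return value type constraints for UDF output", "TYPE_DOC")
print(kb.retrieve("how to do sliding window aggregation"))
```

Matching against short abstracts rather than full segments keeps the similarity search small, while the identifier lookup returns the complete original text or code for use in the static analysis instruction.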

Description

Large model driven time sequence database user-defined function test method and system

Technical Field

The application relates to the technical field of software testing, and in particular to a large-model-driven method and system for testing user-defined functions of a time series database.

Background

With the rapid development of the Internet of Things and edge computing, the time series database has become a core infrastructure for processing time series data and is widely used in industries such as intelligent manufacturing, environmental monitoring and intelligent transportation. Time series databases (e.g., Apache IoTDB) allow users to extend data processing capabilities through user-defined functions (UDFs) to meet specific business needs, such as complex aggregate computations, anomaly detection, and feature analysis. However, existing testing of user-defined functions still suffers from the following drawbacks and limitations:

Low test case generation efficiency. Traditional UDF testing relies mainly on manually written test cases, which is time-consuming and error-prone. Moreover, some time series data has distinctive characteristics such as timestamp continuity and periodic sampling, so constructing the data manually requires a deep understanding of the database kernel and the business scenario, which makes the test preparation period long and costly.

Strong limitations of automation techniques. Existing automated testing tools based on symbolic execution, fuzz testing or genetic algorithms cannot effectively handle the characteristics of time series data. These tools are usually designed for general software testing and lack domain knowledge of time series databases, so the test cases they generate do not conform to the user-defined function specifications of the time series database (e.g., return value type constraints, time window processing rules).
Lack of domain adaptation in large models. In recent years, large language models (LLMs) have been used for automated test case generation, but applying an LLM directly to generate UDF test cases for a time series database suffers from knowledge gaps. The LLM may not have covered the API documents, UDF best practices, and programming patterns of a particular time series database during pre-training, resulting in inaccurate or invalid test cases.

Disclosure of Invention

In order to solve the above problems, the application provides a large-model-driven method and system for testing user-defined functions of a time series database.

In a first aspect, the present application provides a large-model-driven method for testing a user-defined function of a time series database, the method comprising: receiving and decoding a code package of the user-defined function to obtain the source code of the user-defined function; pre-checking the source code of the user-defined function and determining whether the pre-check passes; if the pre-check passes, extracting domain knowledge related to the user-defined function from a pre-built time series database knowledge base using a retrieval-augmented generation (RAG) strategy, and constructing a static analysis instruction based on the domain knowledge; inputting the source code of the user-defined function and the static analysis instruction into a first large language model for analysis to obtain a static analysis result; determining time series characteristics of test data according to the static analysis result, and generating test data conforming to the time series characteristics using a fine-tuned second large language model; and constructing a test case from the test data, and testing the user-defined function with the test case according to a preset execution strategy to obtain a dynamic test result.
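
The control flow of the first-aspect method can be illustrated with a short Python sketch in which every stage is a pluggable callable. This is only an assumption-laden outline: the names `run_udf_test`, `UdfTestReport` and `demo_report` are hypothetical, the two large language models and the database test environment are replaced by trivial stubs, and the instruction text is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UdfTestReport:
    """Hypothetical container for the outcome of one test run."""
    precheck_passed: bool
    static_result: Optional[dict] = None
    dynamic_result: Optional[dict] = None


def run_udf_test(source, precheck, retrieve, analyze,
                 characteristics_for, generate_data, execute):
    """Chain the stages in the order given in the first aspect."""
    # Pre-check gate: later stages run only if the check passes.
    if not precheck(source):
        return UdfTestReport(precheck_passed=False)

    # Retrieve domain knowledge and build the static analysis instruction.
    knowledge = retrieve(source)
    instruction = "Analyze this UDF using the domain knowledge:\n" + "\n".join(knowledge)

    # First model: static analysis of the source code.
    static_result = analyze(source, instruction)

    # Map the static result to time series characteristics, then let the
    # second (fine-tuned) model generate conforming test data.
    traits = characteristics_for(static_result)
    data = generate_data(traits)

    # Build and execute the test case to obtain the dynamic result.
    dynamic_result = execute(source, data)
    return UdfTestReport(True, static_result, dynamic_result)


def demo_report(source):
    """Run the pipeline with stub stages (all stubs are placeholders)."""
    return run_udf_test(
        source,
        precheck=lambda s: bool(s.strip()),
        retrieve=lambda s: ["(retrieved knowledge segment)"],
        analyze=lambda s, instr: {"category": "aggregation"},
        characteristics_for=lambda res: {"interval_ms": 1000, "points": 3},
        generate_data=lambda t: [
            (i * t["interval_ms"], float(i)) for i in range(t["points"])
        ],
        execute=lambda s, data: {"rows_in": len(data), "passed": True},
    )


print(demo_report("class MyUdf {}"))
```

Passing the stages in as callables keeps the ordering and the pre-check gate visible while leaving each stage free to be backed by a real model or database environment.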
Optionally, the pre-check includes: performing an integrity check on the source code of the user-defined function to verify whether the user-defined function contains the components necessary for operation; performing a general syntax check on the source code of the user-defined function to verify whether the user-defined function conforms to preset syntax rules; and when both the integrity check and the syntax check pass, determining that the user-defined function is in an executable state.

Optionally, constructing the time series database knowledge base includes: collecting texts and code related to user-defined functions from the official repository, technical manuals, API documents and open source community of a target time series database to form an original knowledge set; segmenting the text and the code in the original knowledge set separately to obtain a set of original knowledge segments; generating a corresponding index abstract for each original knowledge segment in the set of original knowled