CN-121979536-A - Domain knowledge enhanced retrieval-driven workflow generation method and system
Abstract
Disclosed are a domain knowledge enhanced retrieval-driven workflow generation method and system. A user's natural language demand is received, and highly relevant candidate workflow templates are retrieved from a pre-built domain structured knowledge base through a hybrid strategy that fuses sparse retrieval and dense retrieval. Task step definitions, dependency graphs, API call specifications and domain constraint conditions are systematically extracted from the candidate templates to form a structured knowledge element set. The knowledge element set is then fused with the user's demand to construct a structured prompt comprising a task description, a knowledge reference and generation constraints, and a large language model is used to generate high-quality executable workflow code conforming to the target platform specification. The method addresses problems of existing methods, such as poorly standardized code caused by a lack of domain knowledge, low retrieval relevance caused by a single retrieval strategy, and insufficient flexibility caused by hard-coded templates, and significantly improves the accuracy, standardization and practicality of workflow code generation.
Inventors
- ZHANG JIAPENG
- LUO LUYING
- TANG ZHUO
- SUN SHAN
- YIN DAN
- ZHANG KE
- XIAO XIONG
- LI RUIHUI
Assignees
- 杉湖智算科技(湖南)有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260407
Claims (10)
- 1. A domain knowledge enhanced retrieval-driven workflow generation method, characterized by comprising the following steps: S1, receiving a natural language demand text input by a user; S2, based on the demand text, retrieving a plurality of candidate workflow templates from a pre-constructed domain structured knowledge base through a hybrid retrieval strategy combining sparse retrieval and dense retrieval; S3, systematically extracting structured knowledge elements, including task step definitions, dependency graphs, API call specifications and domain constraint conditions, from the candidate workflow templates to form a knowledge element set; S4, fusing the knowledge element set and the demand text into a structured prompt through context construction, wherein the structured prompt comprises a task description section, a knowledge reference section and a generation constraint section; S5, inputting the structured prompt into a large language model to generate executable workflow code conforming to the target platform specification.
- 2. The domain knowledge enhanced retrieval-driven workflow generation method of claim 1, wherein the hybrid retrieval strategy comprises: calculating a lexical matching score between the demand text and each template in the knowledge base using a sparse retrieval algorithm to obtain a sparse retrieval score set; encoding the demand text and each template into semantic vectors using a pre-trained language model, and calculating semantic similarity scores to obtain a dense retrieval score set; normalizing the sparse retrieval score set and the dense retrieval score set separately; and performing weighted fusion of the normalized sparse retrieval scores and the normalized dense retrieval scores to obtain a hybrid retrieval fusion score for each template, and selecting a plurality of candidate workflow templates according to the hybrid retrieval fusion score.
- 3. The domain knowledge enhanced retrieval-driven workflow generation method of claim 2, wherein the sparse retrieval algorithm computes the lexical matching score based on the frequency with which tokens of the demand text occur in each template, the template length, and the inverse document frequency of the tokens.
- 4. The domain knowledge enhanced retrieval-driven workflow generation method of claim 2, wherein obtaining the dense retrieval score set comprises: extracting global semantic vectors of the demand text and of all templates through a pre-trained Transformer encoder; and calculating the cosine similarity between the semantic vector of the demand text and the semantic vector of each template as the semantic matching score.
- 5. The domain knowledge enhanced retrieval-driven workflow generation method of claim 2, wherein the normalization is Min-Max normalization, which maps the sparse retrieval scores and the dense retrieval scores, respectively, to a preset numerical interval.
- 6. The domain knowledge enhanced retrieval-driven workflow generation method of claim 2, wherein the weighted fusion comprises assigning weights to the normalized sparse retrieval score and the normalized dense retrieval score and computing the hybrid retrieval fusion score as their linear combination.
- 7. The domain knowledge enhanced retrieval-driven workflow generation method of claim 2, wherein, after the hybrid retrieval fusion score is calculated, a score threshold is set, only templates whose fusion scores are higher than the threshold are retained, and from these the plurality of templates with the highest fusion scores are selected as the candidate workflow templates.
- 8. The domain knowledge enhanced retrieval-driven workflow generation method of claim 1, wherein step S3 comprises: extracting task step definitions from the candidate workflow templates, wherein the task step definitions comprise at least step identifiers, names and function descriptions; extracting the dependency relationships between task steps, wherein each dependency relationship is represented by a directed edge, and verifying that the dependency relationships form a directed acyclic graph; extracting the API call specification corresponding to each task step, wherein the specification comprises at least the operator type, required parameters, input and output data specifications and code examples; and extracting domain constraints, including execution constraints, resource constraints and data constraints.
- 9. The domain knowledge enhanced retrieval-driven workflow generation method of claim 1, wherein step S4 comprises: constructing the task description section based on the user's demand text, specifying the demand content, the target execution platform and the output format; constructing the knowledge reference section based on the knowledge element set, organizing the information hierarchically by task flow, API calls and constraint conditions; and constructing the generation constraint section based on the general code specification of the target platform and the domain constraint conditions extracted from the knowledge elements, wherein the generation constraint section comprises static specification requirements and dynamic generation instructions converted from the domain constraints.
- 10. A domain knowledge enhanced retrieval-driven workflow generation system, comprising at least one processing unit configured to: receive a natural language demand text input by a user; based on the demand text, retrieve a plurality of candidate workflow templates from a pre-constructed domain structured knowledge base through a hybrid retrieval strategy combining sparse retrieval and dense retrieval; systematically extract structured knowledge elements, including task step definitions, dependency graphs, API call specifications and domain constraint conditions, from the candidate workflow templates to form a knowledge element set; fuse the knowledge element set and the demand text into a structured prompt through context construction, wherein the structured prompt comprises a task description section, a knowledge reference section and a generation constraint section; and input the structured prompt into a pre-deployed large language model to generate executable workflow code conforming to the target platform specification.
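The more mechanical steps in the claims above can be illustrated with a minimal Python sketch: the score fusion of claims 2 and 5 to 7 (Min-Max normalization, weighted linear combination, threshold filtering and top-k selection) and the directed acyclic graph check of claim 8. This is not the patented implementation. The function names, the `[0, 1]` target interval, the values of `alpha`, `threshold` and `top_k`, the template names and the scores are all assumptions chosen for illustration, and the sparse and dense scores are taken as already computed.

```python
from collections import deque


def min_max(scores):
    """Min-Max normalize a {template_id: score} map into [0, 1] (assumed interval)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: no ranking signal, so map everything to 0.0 (assumed convention).
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_select(sparse, dense, alpha=0.5, threshold=0.3, top_k=3):
    """Fuse normalized sparse and dense scores, then filter by threshold and keep the top_k.

    alpha, threshold and top_k are illustrative values, not taken from the patent.
    """
    s, d = min_max(sparse), min_max(dense)
    # Weighted linear combination of the two normalized scores.
    fused = {t: alpha * s[t] + (1 - alpha) * d[t] for t in sparse}
    kept = [(t, score) for t, score in fused.items() if score > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


def is_dag(steps, edges):
    """Return True if the directed edges over `steps` contain no cycle (Kahn's algorithm)."""
    indegree = {s: 0 for s in steps}
    successors = {s: [] for s in steps}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    queue = deque(s for s in steps if indegree[s] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Every step is visited only when no cycle blocks the traversal.
    return visited == len(steps)


# Hypothetical precomputed scores for three templates.
sparse_scores = {"etl_daily": 12.0, "ml_training": 4.0, "report_export": 8.0}
dense_scores = {"etl_daily": 0.82, "ml_training": 0.40, "report_export": 0.61}
candidates = hybrid_select(sparse_scores, dense_scores, alpha=0.5, threshold=0.3, top_k=2)
print(candidates)

# Hypothetical dependency edges extracted from a candidate template.
steps = ["extract", "transform", "load"]
edges = [("extract", "transform"), ("transform", "load")]
print(is_dag(steps, edges))
```

Normalizing each score set separately before fusion matters because lexical scores are unbounded while cosine similarities lie in a narrow range. Without it, the weight `alpha` would not express the intended balance between the two retrievers.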
Description
Domain knowledge enhanced retrieval-driven workflow generation method and system
Technical Field
The invention relates to large language model and automatic workflow generation technology, and in particular to a domain knowledge enhanced retrieval-driven workflow generation method and system.
Background
In recent years, with the rapid development of large language model technology, large models such as GPT, Claude and DeepSeek have shown strong capability in code generation, providing a new technical path for automatic workflow generation. However, when a large language model is applied to workflow code generation, it lacks an efficient mechanism for acquiring and using domain-specific knowledge. The generated code therefore often lacks structural constraints, follows API specifications inconsistently and misses domain best practices, so it struggles to meet the code quality and executability requirements of a production environment. How to effectively inject domain knowledge into the generation process of a large language model has thus become a key technical challenge for improving the quality of generated workflow code.
At present, there are three main knowledge enhancement approaches for workflow code generation. The first is direct generation by a pure large language model, which relies entirely on the model's internal knowledge and guides the model to generate workflow code through designed prompts. The second is knowledge enhancement based on a single retrieval strategy, which uses either keyword matching or semantic similarity calculation to retrieve related content from an external knowledge base as a reference for generation. The third is knowledge injection based on hard-coded templates, which embeds domain knowledge into the generation flow through predefined code templates and fixed filling rules.
However, all three of the above methods have significant disadvantages. First, in direct generation by a pure large language model, the lack of external domain knowledge means that the generated workflow code often does not conform to the API call specification of the target platform (such as Apache Airflow) and does not reflect domain best practices for task dependencies, execution constraints, resource configuration and the like, so code quality is difficult to guarantee. Second, knowledge enhancement based on a single retrieval strategy cannot balance retrieval precision against semantic understanding: sparse retrieval based only on keyword matching cannot handle synonym substitution and variations in expression, while dense retrieval based only on vector similarity is weak at exactly matching domain-specific terms and API names, so the retrieved knowledge has low relevance to the user's demand. Third, knowledge injection based on hard-coded templates is seriously inflexible and adapts poorly to users' varied natural language expressions and to complex, changing workflow requirements. Its knowledge organization process from template to final prompt is also coarse, so the structured knowledge elements contained in the templates cannot be fully extracted and used.
It should be noted that the information disclosed in the above background section is only for understanding the background of the application and may therefore include information that does not constitute prior art already known to those of ordinary skill in the art.
Disclosure of Invention
The invention aims to overcome the defects described in the background and to provide a domain knowledge enhanced retrieval-driven workflow generation method and system.
To achieve the above purpose, the present invention adopts the following technical solution: In a first aspect of the present invention, a domain knowledge enhanced retrieval-driven workflow generation method includes the following steps: S1, receiving a natural language demand text input by a user; S2, based on the demand text, retrieving a plurality of candidate workflow templates from a pre-constructed domain structured knowledge base through a hybrid retrieval strategy combining sparse retrieval and dense retrieval; S3, systematically extracting structured knowledge elements, including task step definitions, dependency graphs, API call specifications and domain constraint conditions, from the candidate workflow templates to form a knowledge element set; S4, fusing the knowledge element set and the demand text into a structured prompt through context construction, wherein the structured prompt comprises a task description section, a knowledge reference section and a generation constraint section; S5, inputting the structured prompt into a larg