CN-121979991-A - Industrial process multi-mode voice question-answering system and method
Abstract
The invention provides an industrial process multi-mode voice question-answering system and method, relating to the technical field of man-machine interaction. The method comprises: receiving a natural language query about an industrial process input by a user; performing entity recognition and linking on the natural language query to determine at least one process entity to which the query refers; performing relation traversal and reasoning on the at least one process entity based on a preset industrial process knowledge graph, and expanding to obtain at least one associated element implicitly related to the query, wherein the associated element comprises one or more of associated equipment, associated parameters and associated drawing numbers; fusing the natural language query with the associated elements obtained by expansion to generate an enhanced query; and performing retrieval in parallel using the enhanced query. The invention offers higher retrieval precision, delivers text together with drawings as soon as they are retrieved, and requires no manual operation thanks to streaming voice interaction, so that a worker can obtain guidance without removing gloves or typing while operating a machine tool or assembling.
Inventors
- ZHANG HAI
- HU YAOZONG
- WANG DONG
- LIU HAIXIN
Assignees
- 广州黄埔星数智科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260123
Claims (10)
- 1. An industrial process multi-modal question-answering method based on knowledge graph enhancement, characterized by comprising the following steps: S1, receiving a natural language query about an industrial process input by a user; S2, performing entity recognition and linking on the natural language query to determine at least one process entity to which the natural language query refers; S3, performing relation traversal and reasoning on the at least one process entity based on a preset industrial process knowledge graph, and expanding to obtain at least one associated element implicitly related to the query, wherein the associated element comprises one or more of associated equipment, associated parameters and associated drawing numbers; S4, fusing the natural language query with the associated elements obtained by expansion to generate an enhanced query; S5, using the enhanced query to execute the following retrievals in parallel: retrieving relevant text knowledge fragments from a structured/unstructured document library; and determining and acquiring, from the knowledge graph according to a predefined association relationship, at least one visual process resource corresponding to the current query context; S6, generating a text reply with a large language model based on the retrieved text knowledge fragments, and aligning and packaging the at least one visual process resource with the text reply to form a multi-modal reply result; and S7, outputting the multi-modal reply result.
- 2. The industrial process multi-modal question-answering method according to claim 1, wherein in step S1, receiving the user input means that a front-end device collects a continuous voice stream from a user on an industrial site, performs real-time voice activity detection and segmentation on the voice stream, and performs real-time transcription with a streaming automatic speech recognition engine to obtain the text content of the natural language query.
- 3. The industrial process multi-modal question-answering method according to claim 1, wherein in step S3, the industrial process knowledge graph takes process step nodes as its core, with equipment nodes, material nodes, parameter nodes and resource nodes connected to them through relation edges; and the relation traversal and reasoning specifically comprises locating the corresponding process step node in the knowledge graph according to the identified process entity, and traversing to obtain closely related node information as the associated elements.
- 4. The industrial process multi-modal question-answering method according to claim 3, wherein in step S5, acquiring the visual process resource means directly obtaining the storage identifier or access address of a corresponding process drawing, parameter table or three-dimensional model according to a predefined relationship between the process step node and the resource node.
- 5. The industrial process multi-modal question-answering method according to claim 1, wherein in step S6, aligning and packaging the visual process resource with the text reply means that the large language model, when generating the text reply, inserts a structured placeholder at each location where a visual resource needs to be referenced; and the back-end system packages the generated text stream, together with the visual resource access address obtained from the resource node corresponding to each placeholder, into a unified structured data object carrying time-sequence or paragraph marks.
- 6. The industrial process multi-modal question-answering method according to claim 5, wherein the structured data object is in JSON format.
- 7. The industrial process multi-modal question-answering method according to claim 1, further comprising an incremental synchronization step S0, specifically comprising: monitoring upstream process document sources for updates; and, when a document update is detected, automatically parsing the changed content, synchronously updating the vector index of the document library, and triggering an update of the relevant node attributes in the industrial process knowledge graph so as to keep the knowledge consistent.
- 8. A voice question-answering system for implementing the industrial process multi-modal question-answering method according to any one of claims 1 to 7, comprising: an interactive access module for receiving a natural language query input by a user, wherein the query comes from streaming voice transcription or text input; a knowledge graph module for storing a networked knowledge structure formed by industrial process entities and relations; a query understanding and enhancement module for performing entity recognition on the query, querying the knowledge graph module to perform relation reasoning and query expansion, and generating an enhanced query; a hybrid retrieval module for, in parallel and according to the enhanced query, retrieving text fragments from an external document library and retrieving bound visual process resources from the knowledge graph module; a multi-modal generation and synthesis module for calling the large language model to generate a text reply based on the retrieved text fragments, and synthesizing the text reply with the retrieved visual process resources to form a multi-modal reply; and an output module for rendering and presenting the multi-modal reply result to the user.
- 9. The system of claim 8, wherein the interactive access module further comprises: a streaming voice processing unit for interfacing with a front-end microphone to perform voice activity detection, noise reduction and streaming speech recognition, thereby achieving real-time, continuous transcription of the user's speech.
- 10. The system as recited in claim 8, further comprising: an incremental synchronization module for monitoring process document updates and automatically synchronizing the updated content to the external document library and the knowledge graph module, thereby ensuring the timeliness of the system's knowledge.
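As an illustrative sketch only, and not the patented implementation, the query-expansion steps S2 to S4 of claim 1 (with the graph layout of claim 3) and the placeholder-based JSON packaging of claims 5 and 6 might look as follows in Python. The graph contents, the alias table, the `[[res:...]]` placeholder syntax, the JSON field names and the example URL are all assumptions introduced for illustration. Entity linking is reduced to an alias lookup, and the large language model reply is a fixed string.

```python
import json
import re

# Hypothetical toy knowledge graph: a process-step node at the core, connected
# by relation edges to equipment, parameter and resource nodes (claim 3).
GRAPH = {
    "step:bore_housing": {
        "uses_equipment": ["CNC boring mill"],
        "has_parameter": ["bore tolerance H7"],
        "has_resource": ["res:drawing_A102"],
    },
}
# Hypothetical resource nodes: a drawing number plus an access address (claim 4).
RESOURCES = {
    "res:drawing_A102": {
        "drawing_no": "A-102",
        "url": "https://example.com/drawings/A-102.png",
    },
}
# Hypothetical alias table standing in for entity recognition and linking (S2).
ALIASES = {"boring": "step:bore_housing", "bore": "step:bore_housing"}


def link_entities(query):
    """S2: map words of a spoken query to process-step nodes by alias lookup."""
    found = []
    for word in re.findall(r"[a-z0-9]+", query.lower()):
        node = ALIASES.get(word)
        if node and node not in found:
            found.append(node)
    return found


def expand(nodes):
    """S3: one-hop traversal that collects the associated elements."""
    elements, resource_ids = [], []
    for node in nodes:
        edges = GRAPH[node]
        elements += edges["uses_equipment"] + edges["has_parameter"]
        for rid in edges["has_resource"]:
            resource_ids.append(rid)
            elements.append("drawing " + RESOURCES[rid]["drawing_no"])
    return elements, resource_ids


def enhance(query, elements):
    """S4: fuse the original query with the associated elements."""
    return query + " | " + "; ".join(elements)


def package(reply_text):
    """S6: split the reply at placeholders and emit ordered segments as JSON."""
    segments = []
    # With a capturing group, re.split alternates plain text and resource ids.
    parts = re.split(r"\[\[(res:\w+)\]\]", reply_text)
    for i, part in enumerate(parts):
        if i % 2 == 1:
            segment = {"index": len(segments), "type": "resource"}
            segment.update(RESOURCES[part])
            segments.append(segment)
        elif part.strip():
            segments.append(
                {"index": len(segments), "type": "text", "content": part.strip()}
            )
    return json.dumps({"segments": segments})


if __name__ == "__main__":
    query = "What tolerance for boring the housing?"
    elements, _ = expand(link_entities(query))
    print(enhance(query, elements))
    # Fixed string standing in for the large language model's reply.
    print(package("Bore to H7. [[res:drawing_A102]] Check with a plug gauge."))
```

The `index` field plays the role of the paragraph mark in claim 5: a front end could render the segments in order, showing each drawing at the point where the text refers to it.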
Description
Industrial process multi-mode voice question-answering system and method
Technical Field
The invention relates to the technical field of man-machine interaction, in particular to an industrial process multi-mode voice question-answering system and method.
Background
With the rapid development of artificial intelligence technology, intelligent question-answering systems based on large language models have been applied in many vertical fields. Existing technical schemes, such as the ChatGPT-based electric power knowledge question-answering assistant construction method of publication number CN116932911A, the railway industry intelligent question-answering assistant system of publication number CN117633179A, and the large language model-based urban intelligent question-answering assistant method of publication number CN119886327A, show the typical flow of general retrieval-augmented generation: the natural language question input by the user is vectorized, a similarity search is run against a pre-built document vector library, the retrieved text paragraphs are fed to a large language model as context, and a plain-text answer is generated.
However, when these general schemes are applied to complex, dynamic, high-noise industrial field environments such as assembly, welding and machining lines, their inherent drawbacks are amplified and they struggle to meet practical requirements. The following technical defects exist:
First, retrieval accuracy is insufficient because of a semantic gap. The prior art relies mainly on semantic similarity or keyword matching over text content for retrieval. In an industrial setting, however, front-line operators tend to ask short, colloquial questions, while technical documents use strict, specialized terminology. This semantic gap between colloquial questions and specialized documents means that traditional vector search easily misses technical documents containing key parameters such as tolerances and material specifications, or returns a large number of irrelevant results, so retrieval precision is low.
Second, information is presented in a single dimension and lacks key visual content. Industrial process knowledge depends heavily on non-textual information such as structural drawings, assembly drawings, parameter tables and three-dimensional models. Schemes such as the ChatGPT-based electric power knowledge question-answering assistant construction method of CN116932911A can generally return answers only in plain text, and cannot accurately and reliably associate and present, within the answer, the process drawings or technical charts directly related to the current process step. For field workers who must work from drawings, this greatly reduces the value of the information.
Finally, the interaction mode is not adapted to the constraints of field operation. Existing question-answering systems mostly rely on text input or non-streaming, single-shot voice commands. On an industrial site, operators' hands are often occupied by tools and workpieces, and the environment is noisy. The existing interaction modes cannot support hands-free, continuous, real-time voice conversation, which increases operators' cognitive load and the cost of interrupting work, and limits practicability.
The prior art therefore leaves room for improvement, and there is a need for an intelligent question-answering technique and system that can understand colloquial industrial speech, accurately retrieve mixed graphic and text knowledge, and support natural streaming voice interaction, so as to overcome the defects of the prior art.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides an industrial process multi-mode voice question-answering system and method, which offer higher retrieval precision and deliver text and drawings together as soon as retrieval completes.
To solve the above technical problems, the invention adopts the following technical scheme: an industrial process multi-modal question-answering method based on knowledge graph enhancement, comprising the following steps: S1, receiving a natural language query about an industrial process input by a user; S2, performing entity recognition and linking on the natural language query to determine at least one process entity to which the natural language query refers; S3, performing relation traversal and reasoning on the at least one process entity based on a preset industrial process knowledge graph, and expanding to obtain at least one associated element implicitly related to the query, wherein the associated element comprises one or more of associated equipment, associated parameters and associated drawing numbers; S4, fusing the natural language query with the association el