CN-122021912-A - Intelligent query processing method based on multi-modal interaction


Abstract

The invention discloses an intelligent query processing method based on multi-modal interaction. The method comprises: collecting and preprocessing multi-modal input data from a user; classifying query intents in the standardized query data set; retrieving information from a locally constructed power policy knowledge graph according to the key information entities and the query intent data set; applying deep learning to the entities and their relations in the power policy knowledge graph through a graph convolutional network; performing improved Pareto optimization in combination with the background and requirements of the user's query; adjusting the personalized answer strategy through the MAML meta-learning algorithm; and checking the consistency of answer strategies across multiple rounds of questions and answers using the Hungarian algorithm. The method integrates multi-modal data processing with deep reasoning, optimizes personalized question-answering strategies, and improves the accuracy of policy queries.

Inventors

YAO PENG
WEI LIU
TANG LEI
XI MENGTING
BI QI
ZHAO YU
CHENG HUANYU
XIA FEI
LIU MEIZHAO
GU YINGCHENG
LIU KAI
SONG YU

Assignees

国网江苏省电力有限公司镇江供电分公司 (State Grid Jiangsu Electric Power Co., Ltd., Zhenjiang Power Supply Branch)

Dates

Publication Date: 20260512
Application Date: 20260203

Claims (10)

1. An intelligent query processing method based on multi-modal interaction, characterized by comprising the following steps: step one, collecting multi-modal input data from a user, and preprocessing the multi-modal input data to obtain a standardized query data set; step two, classifying query intents in the standardized query data set, and extracting key information entities and a user query intent data set; step three, performing information retrieval in a locally constructed power policy knowledge graph according to the key information entities and the query intent data set; step four, performing deep learning on the entities and their relations in the power policy knowledge graph through a graph convolutional network, and inferring the relations and internal logic among policy provisions to obtain a reasoning-enhanced policy knowledge graph; step five, based on the reasoning-enhanced policy knowledge graph and in combination with the background and requirements of the user's query, generating a personalized answer strategy using improved Pareto optimization; step six, adjusting the personalized answer strategy through the MAML meta-learning algorithm to obtain a final answer strategy; and step seven, checking the consistency of answer strategies across multiple rounds of questions and answers using the Hungarian algorithm to obtain a consistency-verified policy answer.
2. The intelligent query processing method based on multi-modal interaction according to claim 1, wherein the multi-modal input data includes text data, voice data and image data, and the preprocessing comprises: performing word segmentation on the received text data to divide the text into independent words or phrases, removing punctuation marks and redundant words, and optimizing the text data using word tagging and stop-word removal; for voice data input by the user, recognizing audio features in the voice data, performing joint acoustic and language decoding, and converting the voice content into standardized text; for image data uploaded by the user, extracting the characters in the image using OCR and converting the extracted content into a standardized text format; and organizing the processed text data, the text converted from voice, and the text extracted by OCR into a uniform structured format to obtain the standardized query data set.
3. The intelligent query processing method based on multi-modal interaction according to claim 1, wherein step two specifically comprises: performing named entity recognition on the standardized query data set to identify and label key information entities in the text, the key information entities comprising policy names, regulation clauses and technical requirements; and classifying query intents in the text using a pre-trained BERT intent classification model, wherein the BERT intent classification model is trained on labeled training data, and the user query intent data set is obtained by computing the probability of each intent through contextual analysis of the text.
4. The intelligent query processing method based on multi-modal interaction according to claim 3, wherein computing the probability of each intent specifically comprises: splitting the query text into subword units using the BERT tokenizer, and converting the subwords into corresponding numeric IDs to produce the input format required by BERT; feeding the encoded text into the BERT intent classification model, and computing a classification result for each text through forward propagation to obtain a set of logits, the logits being the raw score of each category; applying Softmax to the logits output by the BERT intent classification model to convert the raw scores into a probability distribution; and selecting the category with the highest probability as the final query intent.
5. The intelligent query processing method based on multi-modal interaction according to claim 1, wherein step three specifically comprises: searching the locally constructed power policy knowledge graph using the key information entities and the query intent data set, and returning the corresponding information to the user as a retrieval result, wherein the power policy knowledge graph comprises power policy provisions, regulation clauses and technical requirement information; if the required policy content cannot be found in the power policy knowledge graph, automatically invoking an external tool set to perform information retrieval, the external tool set comprising an external policy document library and a regulation database; optimizing the retrieval path using an ant colony algorithm, and obtaining document fragments related to the user query from the external tool set; wherein the ant colony algorithm searches for an optimal path among a plurality of retrieval paths and, during information retrieval, selects the most relevant content from the administrative policy documents by simulating the search paths of a plurality of ants, obtaining the latest information fragments most relevant to the query in the shortest time; and fusing the latest relevant document fragments obtained from the external tool set with the information in the power policy knowledge graph to obtain a final retrieval result.
6. The intelligent query processing method based on multi-modal interaction according to claim 1, wherein step four specifically comprises: performing a convolution operation on each node in the power policy knowledge graph through the graph convolutional network, fusing each node's information with that of its adjacent nodes, and inferring implicit relations and internal logic between nodes; inferring the hierarchical structure and causal relations among policy provisions through multi-layer convolution operations; and, based on the inference results of the graph convolutional network, adding the implicit relations and hierarchical structure to the original power policy knowledge graph to obtain the reasoning-enhanced policy knowledge graph.
7. The intelligent query processing method based on multi-modal interaction according to claim 1, wherein step five specifically comprises: extracting core information related to the user query from the reasoning-enhanced policy knowledge graph, in combination with the historical query records, user identity and query urgency information in the user's query background; and, through analysis of the query background, generating the personalized answer strategy by optimization with an improved Pareto algorithm; wherein the optimization objectives of the improved Pareto algorithm comprise whether the content of the answer complies with policy requirements and whether the speed of answer generation meets the user's need for a quick response.
8. The intelligent query processing method based on multi-modal interaction according to claim 7, wherein generating the personalized answer strategy by optimization with the improved Pareto algorithm specifically comprises: computing the degree matrix and the adjacency matrix corresponding to the topology of the reasoning-enhanced policy knowledge graph, the degree matrix representing the degree of each node in the reasoning-enhanced policy knowledge graph, where the degree is the number of connections of the node; subtracting the adjacency matrix from the degree matrix of the reasoning-enhanced policy knowledge graph to obtain the Laplacian matrix; inputting the Laplacian matrix into a convolution layer, and approximating the Laplacian matrix using Chebyshev polynomials to obtain policy knowledge feature representations; taking the different policy knowledge feature representations as nodes to obtain a node set; computing the Manhattan distance between the policy knowledge feature representations corresponding to different nodes in the node set, and, if the Manhattan distance is smaller than a preset distance threshold, establishing a connecting edge between the corresponding nodes to obtain a connecting edge set; obtaining a policy knowledge association graph from the node set and the connecting edge set; and generating the personalized answer strategy by solving for all Pareto optimal solutions based on the policy knowledge association graph in combination with the user's query background.
9. The intelligent query processing method based on multi-modal interaction according to claim 1, wherein step six specifically comprises: extracting a variety of query tasks from the different backgrounds, requirements and preferences of user queries as training samples; for each task, determining gradient information of the personalized answer strategy, and globally optimizing the personalized answer strategy using the gradient information; and optimizing the personalized answer strategy through multiple rounds of meta-learning training to obtain the final answer strategy.
10. The intelligent query processing method based on multi-modal interaction according to claim 1, wherein step seven specifically comprises: matching, by the Hungarian algorithm, the final answer strategy with related content in historical policy documents by computing the cosine similarity between them; determining the similarity between the currently generated answer strategy and the historical policy provisions according to the matching result computed by the Hungarian algorithm; if the similarity between the answer strategy and the historical policy documents is greater than a preset threshold, determining that the generated answer strategy is consistent with historical policy and complies with the policy rules; and if the similarity is lower than the preset threshold, triggering error checking and regenerating the answer.
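
The logits-to-intent step of claim 4 can be illustrated with a minimal sketch. It assumes the BERT intent classification model has already produced one raw score per category; the label names and the example logits are hypothetical and are not taken from the patent.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_intent(logits, labels):
    """Return the label with the highest probability, plus all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical intent labels and logits, for illustration only.
labels = ["policy_lookup", "clause_explanation", "technical_requirement"]
intent, probs = pick_intent([2.0, 0.5, -1.0], labels)
```

Because Softmax is monotonic, the selected category is the one with the largest logit; the probabilities matter when the full distribution is kept as the user query intent data set.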
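
The Laplacian construction in claim 8 (degree matrix minus adjacency matrix) and the Chebyshev step can be sketched as below. The claims do not give the exact form of the approximation, so this sketch assumes the usual Chebyshev recurrence on a rescaled Laplacian, as used in spectral graph convolution, and omits the learned filter coefficients. The eigenvalue bound, the function names and the three-node example graph are assumptions.

```python
def laplacian(adj):
    """Return L = D - A for a symmetric 0/1 adjacency matrix (nested lists)."""
    n = len(adj)
    degrees = [sum(row) for row in adj]  # degree = number of connections
    return [[(degrees[i] if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def matvec(matrix, vector):
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def chebyshev_terms(adj, signal, order=2):
    """Return [T_0(L~)x, ..., T_order(L~)x] for a node signal x."""
    n = len(adj)
    lap = laplacian(adj)
    # Assumption: bound the largest eigenvalue of L by 2 * (max degree).
    lam_max = 2 * max(lap[i][i] for i in range(n)) or 1
    # Rescale so the spectrum of L~ lies in [-1, 1].
    scaled = [[2 * lap[i][j] / lam_max - (1 if i == j else 0)
               for j in range(n)] for i in range(n)]
    terms = [list(signal)]
    if order >= 1:
        terms.append(matvec(scaled, signal))
    for _ in range(2, order + 1):
        step = matvec(scaled, terms[-1])
        # Recurrence: T_k = 2 * L~ * T_{k-1} - T_{k-2}
        terms.append([2 * s - p for s, p in zip(step, terms[-2])])
    return terms


# Hypothetical path graph with three policy nodes: 0 - 1 - 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
lap = laplacian(adj)
terms = chebyshev_terms(adj, [1.0, 0.0, 0.0], order=2)
```

Each additional Chebyshev order mixes in information from nodes one hop further away, which matches the claim 6 description of fusing a node's information with that of its neighbours.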
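
The last steps of claim 8 build an association graph from Manhattan distances and then solve for Pareto optimal solutions. The patent does not specify what makes its Pareto algorithm "improved", so the sketch below uses plain non-dominated filtering over the two objectives named in claim 7, scored here as hypothetical (policy compliance, response speed) pairs where higher is better. The feature vectors, threshold and scores are illustrative assumptions.

```python
from itertools import combinations


def manhattan(u, v):
    """Sum of absolute coordinate differences between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def association_edges(features, threshold):
    """Connect node pairs whose Manhattan distance is below the threshold."""
    return [(i, j) for i, j in combinations(range(len(features)), 2)
            if manhattan(features[i], features[j]) < threshold]


def pareto_front(candidates):
    """Keep candidates that no other candidate dominates (maximize all)."""
    def dominates(a, b):
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical policy knowledge feature representations.
features = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0]]
edges = association_edges(features, threshold=0.5)

# Hypothetical (compliance, speed) scores for candidate answer strategies.
candidates = [(0.9, 0.2), (0.7, 0.7), (0.4, 0.9), (0.5, 0.5)]
front = pareto_front(candidates)
```

The front contains every trade-off that cannot be improved on one objective without losing on the other; choosing a single strategy from it would depend on the user's query background, for example favouring speed for urgent queries.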
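
The meta-learning loop of claim 9 can be illustrated on a deliberately tiny problem. The sketch assumes a single scalar parameter standing in for the answer strategy and toy tasks with loss (theta - c)^2, where each target c is a hypothetical per-task optimum; none of this setup comes from the patent. It shows the MAML structure only: an inner adaptation step per task, then an outer update that differentiates through that step.

```python
def maml_scalar(task_targets, inner_lr=0.1, outer_lr=0.1, rounds=200, theta=0.0):
    """Run MAML on toy tasks with loss_i(theta) = (theta - c_i) ** 2."""
    for _ in range(rounds):
        meta_grad = 0.0
        for c in task_targets:
            # Inner step: adapt theta to this task with one gradient step.
            adapted = theta - inner_lr * 2 * (theta - c)
            # Outer gradient: d loss_i(adapted) / d theta via the chain rule,
            # where d adapted / d theta = 1 - 2 * inner_lr.
            meta_grad += 2 * (adapted - c) * (1 - 2 * inner_lr)
        # Meta-update using the average gradient over all tasks.
        theta -= outer_lr * meta_grad / len(task_targets)
    return theta


# Hypothetical task optima, e.g. preferences of three kinds of users.
theta = maml_scalar([1.0, 2.0, 3.0])
```

For these symmetric quadratic tasks the meta-learned initialization settles at the mean of the task optima, the point from which one adaptation step reaches any task most cheaply on average.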
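
The consistency check of claim 10 can be sketched as follows. The sketch assumes that an answer strategy and the historical policy provisions are each available as lists of embedding vectors (how those vectors are produced is not stated in the claims), matches them one-to-one with a standard Hungarian implementation on cost = 1 - cosine similarity, and compares each matched similarity with a threshold. The vectors, the threshold value and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hungarian(cost):
    """Minimum-cost assignment of rows to columns (needs rows <= columns)."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # row and column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # walk the augmenting path back to the start
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return {p[j] - 1: j - 1 for j in range(1, m + 1) if p[j]}


def check_consistency(answer_vecs, history_vecs, threshold=0.8):
    """Match answer parts to provisions; consistent if all exceed threshold."""
    sims = [[cosine(a, h) for h in history_vecs] for a in answer_vecs]
    match = hungarian([[1.0 - s for s in row] for row in sims])
    return match, all(sims[i][j] > threshold for i, j in match.items())


# Hypothetical embeddings: two answer parts, three historical provisions.
match, consistent = check_consistency([[1, 0], [0, 1]], [[0, 1], [1, 0], [1, 1]])
```

When the check returns `False`, the method as claimed would trigger error checking and regenerate the answer.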

Description

Intelligent query processing method based on multi-modal interaction

Technical Field

The invention relates to the technical field of geographic information, in particular to an intelligent query processing method based on multi-modal interaction.

Background

With the rapid development of artificial intelligence technology, intelligent query systems are becoming common across industries, and they play an increasingly important role in fields such as policy interpretation and legal consultation. Existing intelligent query systems rely primarily on traditional natural language processing (NLP) techniques to process query information that users enter via text, speech and similar channels. However, existing intelligent query systems have limitations in handling multi-modal input (e.g., mixed input of speech, images and text). Most systems still process a single text input and do not fully exploit the complementarity of multi-modal information sources, which limits their comprehensiveness and processing capability when facing complex problems.

Existing intelligent query systems also fall short in query intent recognition. Conventional systems often rely on simple keyword matching or rule-based classification to resolve a user's query intent, but such approaches often fail to identify the user's actual needs when queries are diverse and complex; in particular, when the user's query background and intent are ambiguous, the system's answers may deviate from the user's expectations. The prior art therefore struggles to identify the complex intent behind user queries, so that the answers output by the system are insufficiently accurate and personalized.

In addition, existing intelligent query systems often rely on simple knowledge retrieval mechanisms for answer generation.
In many cases, these systems retrieve from rule-based knowledge bases or predefined question-answer pairs. However, as policies, regulations and technical requirements are continually updated, the knowledge bases in conventional systems often have difficulty adapting to new information or reasoning deeply about complex policy regulations. Especially in specialized fields such as the power industry, complex relations exist among policy regulations, and prior-art knowledge base retrieval methods often cannot accurately infer the implicit relations and internal logic among them, so the system lacks consistency across multiple rounds of questions and answers and cannot respond accurately to user queries.

Existing personalized question-answer generation methods also have shortcomings. The personalization strategies adopted by most intelligent query systems rely on simple rules or on machine learning methods based on historical data, but these methods usually ignore key information such as the user's specific query background, urgency and identity. Users' query requirements and backgrounds differ widely, and existing systems cannot provide accurate answers tailored to a user's specific needs, resulting in low user satisfaction.

Therefore, how to provide an intelligent query processing method based on multi-modal interaction is a problem to be solved by those skilled in the art.

Disclosure of Invention

The invention aims to provide an intelligent query processing method based on multi-modal interaction, which improves the accuracy and response speed of intelligent query systems in the field of power policy by combining multi-modal data processing, deep reasoning with a graph convolutional network, and improved Pareto optimization.
BERT intent classification, named entity recognition and the MAML meta-learning algorithm are adopted to achieve rapid adaptation to new query tasks. The Hungarian algorithm ensures consistency across multiple rounds of questions and answers. The method generates personalized answer strategies, ensures the accuracy, timeliness and user satisfaction of policy answers, and addresses the shortcomings of the prior art in complex policy reasoning and personalized service.

According to an embodiment of the invention, the intelligent query processing method based on multi-modal interaction comprises the following steps: step one, collecting multi-modal input data from a user, and preprocessing the multi-modal input data to obtain a standardized query data set; step two, classifying query intents in the standardized query data set, and extracting key information entities and a user query intent data set; step three, performing information retrieval in a locally constructed power policy knowledge graph according to the key information entities and the query intent data set; step four, performing deep learning on the entities and their relations in the power policy knowledge graph through a graph convolutional network, and reasoning out relations an