CN-116910220-B - Multi-round dialogue interaction processing method, device, equipment and storage medium


Abstract

The application relates to the technical field of artificial intelligence and provides a multi-round dialogue interaction processing method, device, equipment and storage medium. The method comprises: collecting historical dialogue data and current dialogue data; inputting the historical dialogue data and the current dialogue data into a pre-trained joint model to obtain a historical dialogue vector and a current dialogue vector output by the joint model; splicing (concatenating) the historical dialogue data and the current dialogue data to obtain spliced data, and inputting the spliced data into the joint model to obtain vectorized semantic information output by the joint model; determining, based on the historical dialogue vector and the current dialogue vector, that text retrieval is required; obtaining retrieval text based on the vectorized semantic information; and generating response information based on the retrieval text. Using the user's semantic information, the application can judge whether knowledge needs to be referenced and, if so, generate suitable search keywords for querying a knowledge graph; answering from the knowledge graph improves the efficiency and accuracy of intelligent dialogue.
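The processing flow summarized above can be illustrated with a minimal, runnable Python sketch. It assumes a hypothetical `JointModel` interface with four callables (`encode`, `semantics`, `first_decoder`, `second_decoder`) and hypothetical integer encodings for the "first set value" and "second set value"; the toy stand-ins at the bottom are simple heuristics used only so the sketch executes, not the trained joint model described in the claims.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical encodings of the "first set value" and "second set value";
# the patent does not state concrete values.
FIRST_SET_VALUE = 1   # text retrieval is required
SECOND_SET_VALUE = 0  # no retrieval: answer with general chit-chat


@dataclass
class JointModel:
    """Assumed interface to the trained joint model (all names are hypothetical)."""
    encode: Callable[[str], List[float]]         # text -> dialogue vector
    semantics: Callable[[str], Counter]          # spliced text -> sparse semantic vector
    first_decoder: Callable[[List[float]], int]  # spliced vector -> set value
    second_decoder: Callable[[Counter], str]     # semantic vector -> retrieval text


def respond(model: JointModel, history: List[str], current: str,
            knowledge_base: Dict[str, str]) -> str:
    history_text = " ".join(history)
    # 1. Encode the historical and current dialogue data separately.
    history_vec = model.encode(history_text)
    current_vec = model.encode(current)
    # 2. Splice the raw texts and obtain vectorized semantic information.
    semantic_vec = model.semantics(history_text + " [SEP] " + current)
    # 3. Splice (concatenate) the two vectors; the first decoder decides on retrieval.
    #    Any value other than FIRST_SET_VALUE is treated as "no retrieval".
    if model.first_decoder(history_vec + current_vec) != FIRST_SET_VALUE:
        return "[chit-chat] " + current
    # 4. The second decoder turns the semantic information into retrieval text.
    retrieval_text = model.second_decoder(semantic_vec)
    # 5. Extract target text from the knowledge base and build the response.
    target = knowledge_base.get(retrieval_text)
    return target if target is not None else "[no match] " + retrieval_text


# --- Toy stand-ins so the sketch runs; they are heuristics, not learned models. ---
KB = {"west lake": "<knowledge entry about West Lake>"}


def toy_encode(text: str) -> List[float]:
    return [1.0 if "?" in text else 0.0]  # 1.0 if the text looks like a question


def toy_first_decoder(spliced: List[float]) -> int:
    # Retrieve only when the current turn (last element) is a question.
    return FIRST_SET_VALUE if spliced[-1] > 0.5 else SECOND_SET_VALUE


def toy_semantics(text: str) -> Counter:
    return Counter(text.lower().replace("?", "").replace(".", "").split())


def toy_second_decoder(sem: Counter) -> str:
    # Pick the knowledge-base key whose words occur most often in the dialogue.
    return max(KB, key=lambda key: sum(sem[w] for w in key.split()))


model = JointModel(toy_encode, toy_semantics, toy_first_decoder, toy_second_decoder)
print(respond(model, ["We visited West Lake yesterday."], "How big is it?", KB))
print(respond(model, ["Hi."], "I'm a bit tired today.", KB))
```

In the first call the current turn ("How big is it?") contains no keyword of its own; the retrieval text is recovered only because the history is spliced into the semantic input, which is the motivation for feeding both historical and current dialogue data to the model.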

Inventors

GU SUNYAN
ZHANG XIANG
LU TAOYU

Assignees

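中移（杭州）信息技术有限公司 (China Mobile (Hangzhou) Information Technology Co., Ltd.)
中国移动通信集团有限公司 (China Mobile Communications Group Co., Ltd.)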

Dates

Publication Date: 20260508
Application Date: 20230731

Claims (10)

1. A multi-round dialogue interaction processing method, comprising: collecting historical dialogue data and current dialogue data, inputting the historical dialogue data and the current dialogue data into a pre-trained joint model, and obtaining a historical dialogue vector and a current dialogue vector output by the joint model; splicing the historical dialogue data and the current dialogue data to obtain spliced data, inputting the spliced data into the joint model, and obtaining vectorized semantic information output by the joint model; determining, based on the historical dialogue vector and the current dialogue vector, that text retrieval is required, and acquiring retrieval text based on the vectorized semantic information; and generating response information based on the retrieval text; wherein the joint model is obtained by training a preset model with sample data, the sample data comprising sample dialogue data together with knowledge search content labels and knowledge classification labels of the sample dialogue data; the joint model comprises a shared encoder, an understanding decoder and a generating decoder; the shared encoder is used for encoding an input sequence and capturing context information of an input sentence; the understanding decoder comprises self-attention and a pre-trained masked model, wherein the self-attention generates attention weights over the input sequence according to the context information of the input sequence, and the pre-trained masked model is pre-trained without supervision by masking part of the input sequence; the generating decoder comprises masked self-attention and a pre-trained denoising autoencoder (DAE) and is used for generating the next word or phrase according to the previously generated text segment and the context information, wherein the masked self-attention provides contextual understanding for the word currently being generated, and the pre-trained DAE learns, in a self-encoding manner, to generate the next potential word or phrase.
2. The multi-round dialogue interaction processing method according to claim 1, wherein determining, based on the historical dialogue vector and the current dialogue vector, that text retrieval is required comprises: splicing the historical dialogue vector and the current dialogue vector to obtain a spliced vector; inputting the spliced vector into a first decoder module, and obtaining a first decoding result output by the first decoder module; and determining that text retrieval is required if the first decoding result is a first set value.
3. The multi-round dialogue interaction processing method according to claim 2, wherein, after inputting the spliced vector into the first decoder module and obtaining the first decoding result output by the first decoder module, the method further comprises: if the first decoding result is a second set value, determining that text retrieval is not required and generating general chit-chat response information.
4. The multi-round dialogue interaction processing method according to claim 1, wherein acquiring the retrieval text based on the vectorized semantic information comprises: inputting the vectorized semantic information into a second decoder module, and obtaining a second decoding result output by the second decoder module; and acquiring the retrieval text based on the second decoding result.
5. The multi-round dialogue interaction processing method according to claim 1, wherein generating the response information based on the retrieval text comprises: extracting target text from a knowledge base based on the retrieval text; and generating the response information based on the target text.
6. The multi-round dialogue interaction processing method according to claim 1, wherein the joint model is trained by the following steps: collecting sample dialogue data; performing knowledge search content labeling and knowledge classification labeling on the sample dialogue data to generate the sample data; and training a preset model with the sample data to obtain the joint model.
7. The multi-round dialogue interaction processing method according to claim 6, wherein performing knowledge search content labeling and knowledge classification labeling on the sample dialogue data comprises: labeling the sample dialogue data of the current turn with content retrievable from a knowledge base; and labeling the sample dialogue data of each turn as either requiring reference knowledge or not requiring reference knowledge.
8. A multi-round dialogue interaction processing device, comprising: a collection module, used for collecting historical dialogue data and current dialogue data, inputting the historical dialogue data and the current dialogue data into a pre-trained joint model, and obtaining a historical dialogue vector and a current dialogue vector output by the joint model; a semantic information determining module, used for splicing the historical dialogue data and the current dialogue data to obtain spliced data, inputting the spliced data into the joint model, and obtaining vectorized semantic information output by the joint model; a retrieval text determining module, used for determining, based on the historical dialogue vector and the current dialogue vector, that text retrieval is required, and acquiring retrieval text based on the vectorized semantic information; and a response information generation module, used for generating response information based on the retrieval text; wherein the joint model is obtained by training a preset model with sample data, the sample data comprising sample dialogue data together with knowledge search content labels and knowledge classification labels of the sample dialogue data; the joint model comprises a shared encoder, an understanding decoder and a generating decoder; the shared encoder is used for encoding an input sequence and capturing context information of an input sentence; the understanding decoder comprises self-attention and a pre-trained masked model, wherein the self-attention generates attention weights over the input sequence according to the context information of the input sequence, and the pre-trained masked model is pre-trained without supervision by masking part of the input sequence; the generating decoder comprises masked self-attention and a pre-trained denoising autoencoder (DAE) and is used for generating the next word or phrase according to the previously generated text segment and the context information, wherein the masked self-attention provides contextual understanding for the word currently being generated, and the pre-trained DAE learns, in a self-encoding manner, to generate the next potential word or phrase.
9. An electronic device comprising a processor and a memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the multi-round dialogue interaction processing method of any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the multi-round dialogue interaction processing method of any one of claims 1 to 7.
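Claims 1 and 8 distinguish the understanding decoder (self-attention plus a masked pre-training objective) from the generating decoder (masked self-attention). The sketch below illustrates that distinction under one assumption not stated in the claims: that the understanding decoder's self-attention is unrestricted (bidirectional) while the generating decoder's is causal. The function names and the `[MASK]` symbol are hypothetical; the DAE objective itself is not sketched.

```python
import random
from typing import List, Tuple

MASK_TOKEN = "[MASK]"  # hypothetical mask symbol


def bidirectional_mask(n: int) -> List[List[bool]]:
    """Assumed understanding-decoder pattern: position i may attend to every j."""
    return [[True] * n for _ in range(n)]


def causal_mask(n: int) -> List[List[bool]]:
    """Masked self-attention for generation: position i attends only to j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def mask_for_pretraining(tokens: List[str], ratio: float,
                         seed: int = 0) -> Tuple[List[str], List[int]]:
    """Masked-model pre-training input: hide a fraction of the tokens; the model
    is trained to predict the original tokens at the returned positions."""
    rng = random.Random(seed)
    k = max(1, round(len(tokens) * ratio))
    positions = sorted(rng.sample(range(len(tokens)), k))
    hidden = set(positions)
    masked = [MASK_TOKEN if i in hidden else tok for i, tok in enumerate(tokens)]
    return masked, positions


for row in causal_mask(3):
    print(row)
# [True, False, False]
# [True, True, False]
# [True, True, True]
print(mask_for_pretraining("is it going to rain tomorrow".split(), 0.3))
```

A causal mask is what lets the generating decoder produce "the next word or phrase according to a previously generated text segment": no position can see tokens that have not been generated yet.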

Description

Multi-round dialogue interaction processing method, device, equipment and storage medium

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a multi-round dialogue interaction processing method, device, equipment and storage medium.

Background

Intelligent dialogue is a sub-field of artificial intelligence that allows a person to interact with a computer in natural human language. Intelligent dialogue can be divided into single-round dialogue and multi-round dialogue, where multi-round dialogue typically takes historical information into account. Multi-round dialogue is further divided into task-oriented dialogue in vertical domains and open-domain dialogue; dialogue in a home scenario is open-domain, and communication between users and devices there is closer to interaction between people. At present, intelligent dialogue systems generally build an entity dictionary holding common information of the domain, identify key entities in the user's input with an entity recognition method, and finally retrieve a response from a knowledge base. However, this scheme has two problems. First, searching for knowledge through an entity dictionary requires every keyword in the user's dialogue to be present in the dictionary, but users in a home scenario speak colloquially, so in many cases the keywords cannot be guaranteed to be in the dictionary. Second, in a multi-round dialogue the method searches for knowledge as soon as a key entity is identified, whereas in an actual home scenario the user may only want to chat and not necessarily want to learn something. As a result, the accuracy of intelligent dialogue is low.
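The entity-dictionary scheme described in the Background, and its two failure modes, can be reproduced with a few lines of Python. This is a minimal sketch: the dictionary, knowledge-base entries and function names are hypothetical, and entity recognition is reduced to exact substring matching.

```python
from typing import List, Optional

# Toy entity dictionary and knowledge base; contents are hypothetical.
ENTITY_DICT = ("west lake", "leifeng pagoda")
KNOWLEDGE_BASE = {
    "west lake": "<knowledge entry about West Lake>",
    "leifeng pagoda": "<knowledge entry about Leifeng Pagoda>",
}


def recognize_entities(utterance: str) -> List[str]:
    """Exact substring matching against the entity dictionary."""
    text = utterance.lower()
    return [entity for entity in ENTITY_DICT if entity in text]


def dictionary_pipeline(utterance: str) -> Optional[str]:
    entities = recognize_entities(utterance)
    if not entities:
        return None  # colloquial phrasing outside the dictionary finds nothing
    return KNOWLEDGE_BASE[entities[0]]  # retrieves whenever any entity matches


print(dictionary_pipeline("Tell me about West Lake"))
print(dictionary_pipeline("How big is that lake by the pagoda?"))
print(dictionary_pipeline("West Lake was so crowded, I just want to chat"))
```

The second call shows the coverage problem (a colloquial reference matches no dictionary entry), and the third shows the over-retrieval problem (knowledge is fetched although the user only wants to chat).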
Disclosure of Invention

The embodiments of the application provide a multi-round dialogue interaction processing method, device, equipment and storage medium to solve the problem of low accuracy in intelligent dialogue. In a first aspect, an embodiment of the application provides a multi-round dialogue interaction processing method, comprising: collecting historical dialogue data and current dialogue data, inputting the historical dialogue data and the current dialogue data into a pre-trained joint model, and obtaining a historical dialogue vector and a current dialogue vector output by the joint model; splicing the historical dialogue data and the current dialogue data to obtain spliced data, inputting the spliced data into the joint model, and obtaining vectorized semantic information output by the joint model; determining, based on the historical dialogue vector and the current dialogue vector, that text retrieval is required, and acquiring retrieval text based on the vectorized semantic information; and generating response information based on the retrieval text; wherein the joint model is obtained by training a preset model with sample data, the sample data comprising sample dialogue data, knowledge search content labels and knowledge classification labels. In one embodiment, determining, based on the historical dialogue vector and the current dialogue vector, that text retrieval is required comprises: splicing the historical dialogue vector and the current dialogue vector to obtain a spliced vector; inputting the spliced vector into a first decoder module, and obtaining a first decoding result output by the first decoder module; and determining that text retrieval is required if the first decoding result is a first set value.
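The sample data described above pairs each dialogue turn with two labels: a knowledge classification label (whether reference knowledge is required) and a knowledge search content label (what to retrieve). A minimal sketch of turning one annotated dialogue into per-turn training samples follows; the `Sample` fields, `build_samples` function and the example dialogue are hypothetical, and the record layout is an assumption rather than the format used in the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    history: List[str]             # earlier turns of the same dialogue
    current: str                   # the current turn
    needs_knowledge: bool          # knowledge classification label
    search_content: Optional[str]  # knowledge search content label


def build_samples(turns: List[str],
                  labels: List[Tuple[bool, Optional[str]]]) -> List[Sample]:
    """Turn one annotated multi-round dialogue into one sample per turn."""
    if len(turns) != len(labels):
        raise ValueError("every turn needs exactly one (needs_knowledge, content) label")
    samples = []
    for i, (turn, (needs, content)) in enumerate(zip(turns, labels)):
        if needs and not content:
            raise ValueError(f"turn {i} requires knowledge but has no search content")
        samples.append(Sample(history=turns[:i], current=turn,
                              needs_knowledge=needs,
                              search_content=content if needs else None))
    return samples


dialogue = ["Hi there.", "How tall is Leifeng Pagoda?", "Nice, thanks."]
labels = [(False, None), (True, "leifeng pagoda height"), (False, None)]
for s in build_samples(dialogue, labels):
    print(s)
```

Keeping the history inside each sample lets one record feed both heads of the joint model: the classification label supervises the retrieval decision, and the search content label supervises generation of the retrieval text.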
In one embodiment, after the spliced vector is input into the first decoder module and the first decoding result output by the first decoder module is obtained, the method further comprises: if the first decoding result is a second set value, determining that text retrieval is not required and generating general chit-chat response information. In one embodiment, acquiring the retrieval text based on the vectorized semantic information comprises: inputting the vectorized semantic information into a second decoder module, and obtaining a second decoding result output by the second decoder module; and acquiring the retrieval text based on the second decoding result. In one embodiment, generating the response information based on the retrieval text comprises: extracting target text from a knowledge base based on the retrieval text; and generating the response information based on the target text. In one embodiment, the joint model is trained by the following steps: collecting sample dialogue data; performing knowledge search content labeling and knowledge classification labeling on the sample dialogue data to generate the sample data; and training a preset model with the sample data to obtain the joint model. In one embodiment, performing knowledge search content annotation and knowledge classification annotation on the sample