CN-115662431-B - Voice robot call method and device adopting high-generalization multitasking intention recognition


Abstract

The application relates to a voice robot call method and device using highly generalized multi-task intention recognition. The method comprises: establishing a voice call between a voice robot and a user, and acquiring real-time voice text data of the user during the call; inputting the real-time voice text data into a multi-task intention recognition model to generate a replacement text, an intention category, and a predicted text; determining the accuracy of the multi-task intention recognition model according to the replacement text and the predicted text; updating the communication script based on the intention category when the accuracy is greater than a threshold; and the voice robot continuing the voice call with the user according to the updated communication script. The method strengthens the generalization capability of the intention model, so that user intent can be analyzed more accurately and the voice call with the user proceeds more smoothly.
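
The call-time flow summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `[MASK]` token, the `Model` callable interface, the exact-match similarity measure, and the default ratio and threshold values are all hypothetical choices that the source does not specify.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

MASK = "[MASK]"  # assumption: BERT-style placeholder; the source names no token

# Hypothetical model interface: takes the masked characters and returns
# (intention category, {position: predicted character}).
Model = Callable[[List[str]], Tuple[str, Dict[int, str]]]


def mask_characters(
    text: str, ratio: float, rng: random.Random
) -> Tuple[List[str], Dict[int, str]]:
    """Replace a preset proportion of characters and remember the originals."""
    chars = list(text)
    if not chars:
        return chars, {}
    count = min(len(chars), max(1, round(len(chars) * ratio)))
    positions = rng.sample(range(len(chars)), count)
    replaced = {i: chars[i] for i in positions}
    for i in positions:
        chars[i] = MASK
    return chars, replaced


def recovery_accuracy(replaced: Dict[int, str], predicted: Dict[int, str]) -> float:
    """Assumed similarity measure: share of replaced characters recovered exactly."""
    if not replaced:
        return 0.0
    hits = sum(1 for i, ch in replaced.items() if predicted.get(i) == ch)
    return hits / len(replaced)


def choose_script(
    text: str,
    model: Model,
    current_script: str,
    scripts_by_intent: Dict[str, str],
    ratio: float = 0.15,      # assumed preset proportion
    threshold: float = 0.8,   # assumed accuracy threshold
    rng: Optional[random.Random] = None,
) -> str:
    """Switch scripts only when the model recovers the replaced characters well."""
    rng = rng or random.Random()
    masked, replaced = mask_characters(text, ratio, rng)
    intent, predicted = model(masked)
    accuracy = recovery_accuracy(replaced, predicted)
    if accuracy > threshold:
        return scripts_by_intent.get(intent, current_script)
    return current_script  # otherwise keep the original script
```

The design point the sketch tries to capture is that the masked-character recovery acts as a confidence check on the intention prediction: the script changes only when the same model that predicted the intention also reconstructs the replaced characters well enough.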

Inventors

Ma Dabiao
LI MENG

Assignees

北海淇诚信息科技有限公司

Dates

Publication Date: 2026-05-08
Application Date: 2022-09-07

Claims (12)

1. A voice robot call method using highly generalized multi-task intention recognition, characterized by comprising the following steps: the voice robot establishing a voice call with a user and acquiring real-time voice text data of the user during the call; inputting the real-time voice text data into a multi-task intention recognition model, extracting a preset proportion of the characters in the real-time voice text data for replacement to generate input text data containing replaced characters, performing intention prediction on the input text data to generate an intention category, and predicting the replaced characters in the input text data to generate a predicted text corresponding to the input text data; comparing the similarity between the replaced characters and the predicted text; determining the accuracy of the intention category in the multi-task intention recognition model according to the similarity comparison result; updating the communication script based on the intention category when the accuracy is greater than a threshold; and the voice robot continuing the voice call with the user according to the updated communication script.
2. The method of claim 1, further comprising: acquiring historical voice text data; setting intention category labels for the historical voice text data; and training a BERT language model with a multi-task improvement using the historical voice text data with intention category labels to generate the multi-task intention recognition model.
3. The method of claim 2, wherein training the BERT language model with a multi-task improvement using the historical voice text data with intention category labels to generate the multi-task intention recognition model comprises: training the BERT language model with the historical voice text data with intention category labels to generate a pre-training model; performing a replacement operation on some of the characters in the historical voice text data to generate input text data; inputting the input text data into the pre-training model with the multi-task improvement to obtain a predicted text and a predicted intention classification; and generating the multi-task intention recognition model when the loss function corresponding to the predicted text and the predicted intention classification satisfies a preset policy.
4. The method of claim 3, wherein training the BERT language model with the historical voice text data with intention category labels to generate the pre-training model comprises: performing word vector conversion on all characters in the historical voice text data to generate a word embedding tensor, a sentence segment tensor, and a position encoding tensor for each character; generating a character vector for each character based on the word embedding tensor, the sentence segment tensor, and the position encoding tensor; and training the BERT language model with the character vectors with intention category labels to generate the pre-training model.
5. The method of claim 3, wherein performing the replacement operation on some of the characters in the historical voice text data to generate the input text data comprises: extracting a preset proportion of the characters for the replacement operation; storing those characters as they were before the replacement operation; and generating the input text data from the character vectors of the non-replaced characters and the replaced characters.
6. The method of claim 3, wherein inputting the input text data into the pre-training model with the multi-task improvement to obtain the predicted text and the predicted intention classification comprises: inputting the input text data into the pre-training model; the pre-training model predicting the intention of the input text based on a bidirectional encoding mechanism; the pre-training model identifying the replaced characters in the input text based on a multi-task mechanism; and the pre-training model generating the predicted text and the predicted intention classification according to the calculation results.
7. The method of claim 3, wherein generating the multi-task intention recognition model when the loss function corresponding to the predicted text and the predicted intention classification satisfies the preset policy comprises: comparing the similarity between the predicted text and the replaced characters to generate a first comparison result; comparing the predicted intention classification with the intention labels to generate a second comparison result; and generating the multi-task intention recognition model from the current parameters of the pre-training model when the first comparison result is greater than a text threshold and the second comparison result is greater than an intention threshold.
8. The method of claim 1, wherein the voice robot establishing a voice call with the user and acquiring real-time voice text data of the user during the call comprises: determining a communication script according to user information; the voice robot conducting the voice call with the user based on the communication script; and converting the real-time voice data of the user into voice text data through speech recognition.
9. The method of claim 1, further comprising: when the accuracy is less than or equal to the threshold, the voice robot continuing the voice call with the user according to the original communication script.
10. A voice robot call apparatus using highly generalized multi-task intention recognition, comprising: a text module for establishing a voice call between the voice robot and a user and acquiring real-time voice text data of the user during the call; a recognition module for inputting the real-time voice text data into a multi-task intention recognition model, extracting a preset proportion of the characters in the real-time voice text data for replacement to generate input text data containing replaced characters, performing intention prediction on the input text data to generate an intention category, and predicting the replaced characters in the input text data to generate a predicted text corresponding to the input text data; a judging module for comparing the similarity between the replaced characters and the predicted text and determining the accuracy of the intention category in the multi-task intention recognition model according to the similarity comparison result; an updating module for updating the communication script based on the intention category when the accuracy is greater than a threshold; and a communication module for the voice robot to continue the voice call with the user according to the updated communication script.
11. An electronic device, comprising: one or more processors; and a storage means for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 9.
12. A computer readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 9.
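
Two of the training-side steps in claims 4 and 7 lend themselves to a short illustration. The sketch below assumes the standard BERT convention that a character's input vector is the element-wise sum of its word embedding, sentence segment embedding, and position encoding (the claims only say the vector is generated "based on" the three tensors), and it reads the acceptance condition of claim 7 as two strict comparisons. The function and parameter names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def character_vector(
    word_emb: Sequence[float],
    segment_emb: Sequence[float],
    position_emb: Sequence[float],
) -> Vector:
    """Combine the three per-character tensors by element-wise sum.

    Assumption: summation as in standard BERT; the claims do not fix
    the combination rule.
    """
    if not (len(word_emb) == len(segment_emb) == len(position_emb)):
        raise ValueError("all three embeddings must share one dimension")
    return [w + s + p for w, s, p in zip(word_emb, segment_emb, position_emb)]


def accept_model(
    text_similarity: float,    # first comparison result (claim 7)
    intent_agreement: float,   # second comparison result (claim 7)
    text_threshold: float,
    intent_threshold: float,
) -> bool:
    """Keep the current parameters only if both results exceed their thresholds."""
    return text_similarity > text_threshold and intent_agreement > intent_threshold
```

For example, `accept_model(0.9, 0.95, 0.8, 0.9)` accepts the current parameters, while a result that merely equals its threshold does not, since the claim requires "greater than".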

Description

Voice robot call method and device adopting high-generalization multitasking intention recognition

Technical Field

The present application relates to the field of computer information processing, and in particular to a voice robot call method, apparatus, electronic device, and computer readable medium using highly generalized multi-task intention recognition.

Background

An intelligent voice robot, built on technologies such as speech recognition and synthesis, machine learning, and natural language understanding, automatically initiates outbound robot calls according to the service scenario, collects service results through voice dialogue between the person and the robot, performs statistical processing on the data, and obtains user feedback.

The intelligent voice robot is a developer-facing conversational robot that can deliver intelligent sessions based on Natural Language Processing (NLP) on different message endpoints, such as websites, apps, and physical robots. Users can configure their own knowledge base to provide intelligent question answering, and can also provide self-service through multi-round dialogue and third-party API integration, such as order inquiry, logistics tracking, and self-service returns.

The intelligent voice robot can also analyze dialogue content, mining dialogue recordings or dialogue texts for possible problems and opportunities based on intelligent rules. This can help enterprises improve service quality, monitor public opinion risks, and optimize service strategies; typical application scenarios include intelligent customer service quality inspection and sales opportunity analysis.

Intention recognition is an important branch of natural language understanding and is central to the robot scenario. The most common intention recognition method in the industry is to use a text pre-training model.
That is, starting from an already trained pre-training model (which may be downloaded or trained from scratch), the model is fine-tuned on (text, intent) data to yield an intention recognition model. However, this method has a major disadvantage: because the model parameters change during fine-tuning, the original information in the pre-training model is destroyed. For this reason, intention recognition models trained in the prior art do not recognize user intention with high accuracy.

Accordingly, there is a need for a new voice robot call method, apparatus, electronic device, and computer readable medium using highly generalized multi-task intention recognition.

The above information disclosed in the background section is only for enhancing understanding of the background of the application, and it may therefore contain information that does not constitute prior art already known to a person of ordinary skill in the art.

Disclosure of Invention

In view of the above, the present application provides a voice robot call method, apparatus, electronic device, and computer readable medium using highly generalized multi-task intention recognition. It embeds a multi-task training method into the training process of the intention model, which prevents the model obtained in the pre-training stage from losing too much pre-training information while the intention model is trained and strengthens the generalization ability of the intention model, so that user intention can be analyzed more accurately and the voice call with the user proceeds more smoothly.

Other features and advantages of the application will be apparent from the following detailed description, or may be learned by practice of the application.
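
One common way to embed a second task into fine-tuning is to optimize a single combined loss. The sketch below is an assumption about how that could look, not the formulation used in the application: it adds the cross-entropy of the intention classification to the mean cross-entropy of recovering the replaced characters, with a hypothetical weighting factor `mlm_weight`. Keeping the character-recovery term in the objective is what would discourage the parameters from drifting away from what was learned in pre-training.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target])


def joint_loss(
    intent_probs: Sequence[float],
    intent_label: int,
    char_probs: Sequence[Sequence[float]],  # one distribution per replaced character
    char_targets: Sequence[int],            # index of each original character
    mlm_weight: float = 1.0,                # hypothetical weighting factor
) -> float:
    """Intention loss plus weighted mean loss over the replaced characters."""
    intent_term = cross_entropy(intent_probs, intent_label)
    if char_targets:
        char_term = sum(
            cross_entropy(p, t) for p, t in zip(char_probs, char_targets)
        ) / len(char_targets)
    else:
        char_term = 0.0
    return intent_term + mlm_weight * char_term
```

With `mlm_weight=0.0` the function reduces to ordinary single-task fine-tuning, which makes the contrast with the prior-art approach described in the background easy to see.
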
According to one aspect of the application, a voice robot call method using highly generalized multi-task intention recognition is provided. The method comprises: establishing a voice call between a voice robot and a user, and acquiring real-time voice text data of the user during the call; inputting the real-time voice text data into a multi-task intention recognition model to generate a replacement text, an intention category, and a predicted text; determining the accuracy of the multi-task intention recognition model according to the replacement text and the predicted text; updating the communication script based on the intention category when the accuracy is greater than a threshold; and the voice robot continuing the voice call with the user according to the updated communication script. Optionally, the method further comprises: acquiring historical voice text data; setting intention category labels for the historical voice text data; and training a BERT language model with a multi-task improvement using the historical voice text data with intention category labels to generate the multi-task intention recognition model. The method comprises the steps of training