US-12626066-B2 - Extracting conversational relationships based on speaker prediction and trigger word prediction

Abstract

A method for processing a dialog relationship includes performing semantic feature extraction on a sample dialog text and sample statement or speaker pairs by an initial relationship prediction model, and performing relationship prediction based on the sample text semantic information and an actual statement or speaker relationship to determine a first loss based on a relationship prediction result. The method further includes performing masked speaker prediction based on the sample text semantic information to determine a second loss based on a masked speaker prediction result. The masked speaker prediction result represents a prediction of speakers masked in the sample dialog text. The method further includes performing trigger word prediction based on the sample text semantic information to determine a third loss, and training the initial relationship prediction model based on the first loss, the second loss and the third loss to obtain a dialog relationship prediction model.
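The three-loss training objective described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the use of cross-entropy for every head, the weighted sum, and the names `cross_entropy`, `sequence_cross_entropy`, `joint_loss` and `weights` are all assumptions introduced here. The abstract states only that the model is trained based on the first, second and third losses.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target_index])


def sequence_cross_entropy(prob_rows, target_indices):
    """Mean per-position cross-entropy for a sequence-labeling head."""
    losses = [cross_entropy(p, t) for p, t in zip(prob_rows, target_indices)]
    return sum(losses) / len(losses)


def joint_loss(rel_probs, rel_label,
               spk_probs, spk_label,
               trig_probs, trig_labels,
               weights=(1.0, 1.0, 1.0)):
    """Combine the three task losses into one training objective.

    Assumption: a weighted sum. The source does not say how the
    losses are combined, only that all three drive the training.
    """
    first = cross_entropy(rel_probs, rel_label)               # relationship prediction
    second = cross_entropy(spk_probs, spk_label)              # masked speaker prediction
    third = sequence_cross_entropy(trig_probs, trig_labels)   # trigger word labels
    w1, w2, w3 = weights
    return w1 * first + w2 * second + w3 * third


# Toy usage with hand-written probabilities (hypothetical values).
total = joint_loss(
    rel_probs=[0.5, 0.5], rel_label=0,
    spk_probs=[0.25, 0.75], spk_label=1,
    trig_probs=[[0.5, 0.5], [0.5, 0.5]], trig_labels=[0, 1],
)
```

In a real training loop the probabilities would come from three prediction heads that share one encoder, so that minimizing the combined value adjusts the shared parameters jointly.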

Inventors

Tianyang ZHAO
Zhao Yan

Assignees

Tencent cloud computing (Beijing) Co., Ltd

Dates

Publication Date: 2026-05-12
Application Date: 2023-04-10
Priority Date: 2021-06-17

Claims (20)

1. A method for processing a dialog relationship, the method comprising: performing semantic feature extraction on a sample dialog text and sample statement or speaker pairs by an initial relationship prediction model to obtain sample text semantic information, each sample statement or speaker in the sample statement or speaker pairs being included in the sample dialog text; performing relationship prediction based on the sample text semantic information and an actual statement or speaker relationship, and determining a first loss based on a relationship prediction result, the relationship prediction result representing a relationship between the sample statements or speakers, and the actual statement or speaker relationship being a labeled relationship corresponding to statements or speakers in the sample dialog text; performing masked speaker prediction based on the sample text semantic information, and determining a second loss based on a masked speaker prediction result, the masked speaker prediction result representing a prediction of speakers masked in the sample dialog text; performing trigger word prediction based on the sample text semantic information, and determining a third loss based on a trigger word prediction result, and the trigger word prediction result representing positions of trigger words in the sample dialog text; training the initial relationship prediction model by jointly adjusting model parameters based on the first loss, the second loss and the third loss to obtain a dialog relationship prediction model; and when a dialog text is input into the dialog relationship prediction model, outputting for display a predicted relationship between statements or speakers that is determined by the dialog relationship prediction model.
2. The method according to claim 1, wherein the performing the semantic feature extraction includes: inputting the sample dialog text and the sample statement or speaker pairs into the initial relationship prediction model, and performing semantic feature extraction on the sample dialog text and the sample statement or speaker pairs in the initial relationship prediction model to obtain the sample text semantic information; the performing the relationship prediction includes: predicting a relationship between the sample statements or speakers based on the sample text semantic information; and generating the first loss according to the actual statement or speaker relationship and the predicted relationship between the sample statements or speakers; the performing the masked speaker prediction includes: obtaining one or more masked speakers in the sample dialog text; predicting one or more speakers corresponding to the one or more masked speakers based on the sample text semantic information; and generating the second loss according to the one or more masked speakers and the one or more predicted speakers; the performing the trigger word prediction includes: generating trigger word detection text data according to the actual statement or speaker relationship and the sample text semantic information; predicting one or more sequence labels corresponding to the trigger word detection text data; and generating the third loss according to one or more actual sequence labels and the one or more predicted sequence labels corresponding to the trigger word detection text data; and the training the initial relationship prediction model includes: performing model parameter adjustment on the initial relationship prediction model according to the first loss, the second loss, and the third loss to generate the dialog relationship prediction model, the dialog relationship prediction model being configured to predict a statement or speaker relationship corresponding to statement or speaker pairs in a target dialog text.
3. The method according to claim 2, wherein the predicting the one or more speakers corresponding to the one or more masked speakers based on the sample text semantic information includes: determining a masked state corresponding to each of the one or more masked speakers from the sample text semantic information, the masked state representing semantic information corresponding to the one or more masked speakers in the sample dialog text; and predicting the one or more speakers corresponding to the one or more masked speakers based on the masked state.
4. The method according to claim 3, wherein the method further includes: acquiring an original dialog text and one of the sample statement or speaker pairs; determining the one or more masked speakers based on the one of the sample statement or speaker pairs in response to the one of the sample statement or speaker pairs including at least one speaker; and performing hiding processing on the original dialog text based on the one or more masked speakers to obtain the sample dialog text.
5. The method according to claim 2, wherein the generating the trigger word detection text data includes: determining a relationship vector corresponding to the actual statement or speaker relationship; and splicing the relationship vector and the sample text semantic information to generate the trigger word detection text data.
6. The method according to claim 5, wherein the splicing includes: determining at least one sample dialog phrase from the sample text semantic information; splicing the relationship vector and the at least one sample dialog phrase to obtain at least one trigger word detection text corresponding to the at least one sample dialog phrase; and determining the trigger word detection text corresponding to each of the sample dialog phrases as the trigger word detection text data; and the predicting one or more sequence labels corresponding to the trigger word detection text data includes: performing trigger word prediction based on the trigger word detection text data to obtain a predictive phrase label corresponding to each of the sample dialog phrases, the predictive phrase label representing a trigger word type of the sample dialog phrase; and determining the predictive phrase label corresponding to each of the sample dialog phrases as the one or more predicted sequence labels.
7. The method according to claim 1, wherein the performing the semantic feature extraction includes: splicing the sample dialog text and the sample statement or speaker pairs based on sample text splicing symbols in the initial relationship prediction model to generate sample splicing text data; replacing a first sample statement or speaker in the sample splicing text data with a first symbol, and replacing a second sample statement or speaker in the sample splicing text data with a second symbol, to generate sample text sequence data, the first sample statement or speaker and the second sample statement or speaker comprising one of the sample statement or speaker pairs; and performing semantic feature extraction on the sample text sequence data to obtain the sample text semantic information.
8. The method according to claim 7, wherein the sample text splicing symbols comprise sample global semantic symbols; the sample text sequence data includes sample dialog sequence data corresponding to the sample dialog text, and the sample dialog sequence data includes N sample dialog phrases, N being a positive integer; the performing the semantic feature extraction on the sample text sequence data to obtain the sample text semantic information includes: performing hidden layer feature extraction on the sample global semantic symbols in the sample text sequence data, the N sample dialog phrases, the first symbol and the second symbol respectively to obtain a sample global hidden state corresponding to the sample global semantic symbols, sample phrase hidden states respectively corresponding to the N sample dialog phrases, a first initial statement or speaker hidden state corresponding to the first symbol and a second initial statement or speaker hidden state corresponding to the second symbol; and determining the sample global hidden state, N sample phrase hidden states, the first initial statement or speaker hidden state and the second initial statement or speaker hidden state as the sample text semantic information corresponding to the sample dialog text.
9. The method according to claim 8, wherein the performing the hidden layer feature extraction includes: obtaining sample global relationships between the N sample dialog phrases and one of the sample global semantic symbols, between the first symbol and the one of the sample global semantic symbols, and between the second symbol and the one of the sample global semantic symbols, and performing feature fusion on the sample global relationships to generate the sample global hidden state corresponding to the one of the sample global semantic symbols; and performing hidden layer feature extraction on the N sample dialog phrases, the first symbol and the second symbol respectively to obtain the sample phrase hidden states respectively corresponding to the N sample dialog phrases, the first initial statement or speaker hidden state corresponding to the first symbol and the second initial statement or speaker hidden state corresponding to the second symbol.
10. The method according to claim 8, wherein the N sample dialog phrases comprise the first symbol and the second symbol; and the performing the relationship prediction based on the sample text semantic information includes: obtaining at least one first sample phrase hidden state corresponding to the first symbol from the N sample phrase hidden states contained in the sample text semantic information; performing maximum pooling processing on each of the first sample phrase hidden states and the first initial statement or speaker hidden state to obtain a first hidden state corresponding to the first symbol; obtaining at least one second sample phrase hidden state corresponding to the second symbol from the N sample phrase hidden states; performing maximum pooling processing on each of the second sample phrase hidden states and the second initial statement or speaker hidden state to obtain a second hidden state corresponding to the second symbol; splicing the sample global hidden state, the first hidden state and the second hidden state to obtain sample hidden state information; and predicting a relationship between the first statement or speaker and the second statement or speaker based on the sample hidden state information.
11. The method according to claim 10, wherein the predicting the relationship between the first statement or speaker and the second statement or speaker based on the sample hidden state information includes: performing semantic enhancement on the sample hidden state information to obtain sample enhancement semantic information; determining a sample relationship prediction probability of M kinds of candidate relationships corresponding to the first statement or speaker and the second statement or speaker based on the sample enhancement semantic information, M being a positive integer; and determining a candidate relationship corresponding to a maximum sample relationship prediction probability as the predicted relationship between the first statement or speaker and the second statement or speaker.
12. A method for processing a dialog relationship, the method comprising: inputting a target dialog text and statement or speaker pairs into a dialog relationship prediction model, and performing semantic feature extraction on the target dialog text and the target statement or speaker pairs in the dialog relationship prediction model to obtain target text semantic information corresponding to the target dialog text, each target statement or speaker in the target statement or speaker pairs being included in the target dialog text; and performing relationship prediction on the target text semantic information based on the dialog relationship prediction model to obtain a target relationship between the target statements or speakers in the target statement or speaker pairs, the dialog relationship prediction model being obtained by the method for processing the dialog relationship according to claim 1.
13. The method according to claim 12, wherein the performing the semantic feature extraction on the target dialog text and the target statement or speaker pairs in the dialog relationship prediction model includes: splicing the target dialog text and the target statement or speaker pairs based on target text splicing symbols to generate target splicing text data; replacing a first target statement or speaker in the target splicing text data with a first target symbol, and replacing a second target statement or speaker in the target splicing text data with a second target symbol, to generate target text sequence data, one of the target statement or speaker pairs comprising the first target statement or speaker and the second target statement or speaker; and performing semantic feature extraction on the target text sequence data to obtain the sample text semantic information corresponding to the target dialog text.
14. The method according to claim 12, wherein the inputting target dialog text and target statement or speaker pairs into a dialog relationship prediction model includes: obtaining the target dialog text and dialog consultation information associated with the target dialog text; parsing the dialog consultation information and extracting the target statement or speaker pairs indicated by the dialog consultation information; and inputting the target dialog text and the target statement or speaker pairs into the dialog relationship prediction model.
15. An apparatus for processing a dialog relationship, the apparatus comprising: processing circuitry configured to perform semantic feature extraction on a sample dialog text and sample statement or speaker pairs by an initial relationship prediction model to obtain sample text semantic information, each sample statement or speaker in the sample statement or speaker pairs being included in the sample dialog text; perform relationship prediction based on the sample text semantic information and an actual statement or speaker relationship, and determine a first loss based on a relationship prediction result, the relationship prediction result representing a relationship between the sample statements or speakers, and the actual statement or speaker relationship being a labeled relationship corresponding to statements or speakers in the sample dialog text; perform masked speaker prediction based on the sample text semantic information, and determine a second loss based on a masked speaker prediction result, the masked speaker prediction result representing a prediction of speakers masked in the sample dialog text; perform trigger word prediction based on the sample text semantic information, and determine a third loss based on a trigger word prediction result, and the trigger word prediction result representing positions of trigger words in the sample dialog text; train the initial relationship prediction model by jointly adjusting model parameters based on the first loss, the second loss and the third loss to obtain a dialog relationship prediction model; and when a dialog text is input into the dialog relationship prediction model, output for display a predicted relationship between statements or speakers that is determined by the dialog relationship prediction model.
16. The apparatus according to claim 15, wherein the processing circuitry is further configured to: input the sample dialog text and the sample statement or speaker pairs into the initial relationship prediction model, and perform semantic feature extraction on the sample dialog text and the sample statement or speaker pairs in the initial relationship prediction model to obtain the sample text semantic information; predict a relationship between the sample statements or speakers based on the sample text semantic information; generate the first loss according to the actual statement or speaker relationship and the predicted relationship between the sample statements or speakers; obtain one or more masked speakers in the sample dialog text; predict one or more speakers corresponding to the one or more masked speakers based on the sample text semantic information; generate the second loss according to the one or more masked speakers and the one or more predicted speakers; generate trigger word detection text data according to the actual statement or speaker relationship and the sample text semantic information; predict one or more sequence labels corresponding to the trigger word detection text data; generate the third loss according to one or more actual sequence labels and the one or more predicted sequence labels corresponding to the trigger word detection text data; and perform model parameter adjustment on the initial relationship prediction model according to the first loss, the second loss, and the third loss to generate the dialog relationship prediction model, the dialog relationship prediction model being configured to predict a statement or speaker relationship corresponding to statement or speaker pairs in a target dialog text.
17. The apparatus according to claim 16, wherein the processing circuitry is further configured to: determine a masked state corresponding to each of the one or more masked speakers from the sample text semantic information, the masked state representing semantic information corresponding to the one or more masked speakers in the sample dialog text; and predict the one or more speakers corresponding to the one or more masked speakers based on the masked state.
18. The apparatus according to claim 17, wherein the processing circuitry is further configured to: acquire an original dialog text and one of the sample statement or speaker pairs; determine the one or more masked speakers based on the one of the sample statement or speaker pairs in response to the one of the sample statement or speaker pairs including at least one speaker; and perform hiding processing on the original dialog text based on the one or more masked speakers to obtain the sample dialog text.
19. The apparatus according to claim 16, wherein the processing circuitry is further configured to: determine a relationship vector corresponding to the actual statement or speaker relationship; and splice the relationship vector and the sample text semantic information to generate the trigger word detection text data.
20. The apparatus according to claim 19, wherein the processing circuitry is further configured to: determine at least one sample dialog phrase from the sample text semantic information; splice the relationship vector and the at least one sample dialog phrase to obtain at least one trigger word detection text corresponding to the at least one sample dialog phrase; determine the trigger word detection text corresponding to each of the sample dialog phrases as the trigger word detection text data; perform trigger word prediction based on the trigger word detection text data to obtain a predictive phrase label corresponding to each of the sample dialog phrases, the predictive phrase label representing a trigger word type of the sample dialog phrase; and determine the predictive phrase label corresponding to each of the sample dialog phrases as the one or more predicted sequence labels.
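
The pooling, splicing and selection steps recited in claims 10 and 11 can be illustrated with a small, dependency-free sketch. Everything named here is hypothetical: the marker strings `[S1]` and `[S2]`, the toy two-dimensional hidden states, the relationship names, and the single linear layer followed by softmax. The linear layer is only a stand-in for the claimed semantic enhancement and probability steps, whose form the claims do not specify.

```python
import math


def max_pool(vectors):
    """Element-wise maximum over a list of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_relationship(global_state, phrase_states, phrase_symbols,
                         first_state, second_state, weights, relations,
                         first_symbol="[S1]", second_symbol="[S2]"):
    """Pool the argument states, splice them with the global state, classify."""
    # Phrase hidden states at positions occupied by each argument symbol.
    first_hits = [h for h, s in zip(phrase_states, phrase_symbols) if s == first_symbol]
    second_hits = [h for h, s in zip(phrase_states, phrase_symbols) if s == second_symbol]
    # Max pooling together with the initial hidden state of each symbol.
    first_hidden = max_pool(first_hits + [first_state])
    second_hidden = max_pool(second_hits + [second_state])
    # Splicing is modeled as plain concatenation.
    spliced = global_state + first_hidden + second_hidden
    # Assumed stand-in for semantic enhancement: one linear layer per relationship.
    scores = [sum(w * x for w, x in zip(row, spliced)) for row in weights]
    probs = softmax(scores)
    best = max(range(len(relations)), key=probs.__getitem__)
    return relations[best], probs


# Toy usage with hypothetical values and M = 2 candidate relationships.
label, probs = predict_relationship(
    global_state=[0.1, 0.2],
    phrase_states=[[1.0, 0.0], [0.0, 2.0], [3.0, -1.0]],
    phrase_symbols=["[S1]", "hi", "[S2]"],
    first_state=[0.5, 0.5],
    second_state=[0.0, 0.0],
    weights=[[0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0]],
    relations=["friend", "sibling"],
)
```

With these toy inputs the second candidate receives the higher score, so `label` is `"sibling"`. In practice the hidden states would come from the encoder, and the classifier weights would be learned during the joint training.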

Description

RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/CN2021/108503, entitled “DIALOGUE RELATIONSHIP PROCESSING METHOD, COMPUTER AND READABLE STORAGE MEDIUM” and filed on Jul. 26, 2021, which claims priority to Chinese Patent Application No. 202110674476.3, entitled “METHOD FOR PROCESSING DIALOG RELATIONSHIP, COMPUTER AND READABLE STORAGE MEDIUM” and filed on Jun. 17, 2021. The entire disclosures of the prior applications are hereby incorporated by reference.

FIELD OF THE TECHNOLOGY

This application relates to the technical field of computers, including a method for processing a dialog relationship, a computer and a readable storage medium.

BACKGROUND OF THE DISCLOSURE

Many application scenarios currently require processing conversational data, and that processing involves analyzing the statements or speakers in the data. For example, a relationship extraction task determines the relationship between any statements or speakers in conversational data. In such a task, semantic features of the dialog text and of the statement or speaker pairs are generally extracted to obtain a semantic feature representation, and the relationship corresponding to each statement or speaker pair is then predicted from that representation. Because extracting a semantic feature representation of a dialog relationship is relatively complicated, it is difficult to accurately locate the context information related to the statements or speakers, and the accuracy of the relationship prediction suffers.

SUMMARY

Embodiments of this disclosure provide a method for processing a dialog relationship, a computer and a readable storage medium, which can improve the accuracy of relationship prediction in a dialog scene.
In an embodiment, a method for processing a dialog relationship includes performing semantic feature extraction on a sample dialog text and sample statement or speaker pairs by an initial relationship prediction model to obtain sample text semantic information. Each sample statement or speaker in the sample statement or speaker pairs is included in the sample dialog text. The method further includes performing relationship prediction based on the sample text semantic information and an actual statement or speaker relationship, and determining a first loss based on a relationship prediction result. The relationship prediction result represents a relationship between the sample statements or speakers, and the actual statement or speaker relationship is a labeled relationship corresponding to statements or speakers in the sample dialog text. The method further includes performing masked speaker prediction based on the sample text semantic information, and determining a second loss based on a masked speaker prediction result. The masked speaker prediction result represents a prediction of speakers masked in the sample dialog text. The method further includes performing trigger word prediction based on the sample text semantic information, and determining a third loss based on a trigger word prediction result. The trigger word prediction result represents positions of trigger words in the sample dialog text. The method further includes training the initial relationship prediction model based on the first loss, the second loss and the third loss to obtain a dialog relationship prediction model.
In an embodiment, an apparatus for processing a dialog relationship includes processing circuitry configured to perform semantic feature extraction on a sample dialog text and sample statement or speaker pairs by an initial relationship prediction model to obtain sample text semantic information. Each sample statement or speaker in the sample statement or speaker pairs is included in the sample dialog text. The processing circuitry is further configured to perform relationship prediction based on the sample text semantic information and an actual statement or speaker relationship, and determine a first loss based on a relationship prediction result. The relationship prediction result represents a relationship between the sample statements or speakers, and the actual statement or speaker relationship is a labeled relationship corresponding to statements or speakers in the sample dialog text. The processing circuitry is further configured to perform masked speaker prediction based on the sample text semantic information, and determine a second loss based on a masked speaker prediction result. The masked speaker prediction result represents a prediction of speakers masked in the sample dialog text. The processing circuitry is further configured to perform trigger word prediction based on the sample text semantic information, and determine a third loss based on a trigger word prediction result. The