CN-115796187-B - Open domain dialogue method based on dialogue structure diagram constraint

CN 115796187 B

Abstract

The invention discloses an open-domain dialogue method based on dialogue structure diagram constraints. The method comprises the steps of: obtaining initial dialogue sentence vector representations from an encoder; exploiting the sequential and correlational characteristics of dialogue to design new contrastive learning loss functions and further train the encoder, so as to obtain dialogue sentence vectors containing sufficient semantics; clustering the newly obtained sentence vectors into topic-level sentence clusters; and finally using imitation learning to model the transfer of topics in a dialogue data set, thereby constructing a topic-level dialogue structure diagram, namely the transitions among clusters, which is used to constrain the text generation of an autoregressive decoder. According to the invention, sentence information is fully extracted by contrastive learning, a dialogue structure diagram is obtained by imitation learning, and the next round's dialogue topic is predicted from the diagram, so that the relevance of generated dialogue to the topic is well constrained and the fluency of the whole dialogue is improved.

Inventors

  • Yin Congchi
  • Li Piji

Assignees

  • Nanjing University of Aeronautics and Astronautics (南京航空航天大学)

Dates

Publication Date
2026-05-08
Application Date
2022-11-26

Claims (2)

  1. An open-domain dialogue method based on dialogue structure diagram constraints, comprising the steps of: (1) inputting dialogue sentences, taking the average-pooled output of a bidirectional-attention Transformer encoder as the initial vector representation of each dialogue sentence, designing a loss function, training the bidirectional-attention Transformer encoder in a self-supervised manner, and having the trained encoder output dialogue sentence vector representations that fully contain semantics; (2) clustering the obtained dialogue sentence vector representations to form a plurality of clusters, each cluster representing a dialogue topic; using a behavior cloning method to model the transfer of dialogue topics and calculate the transition probabilities among clusters; and constructing a dialogue structure diagram with the clusters as vertices and the transition probabilities as edges; the behavior cloning of topic transfer is realized as follows: the dialogue sentence vector h of each sentence is defined as the state, and the center vector c of each cluster as the action; after a continuous action in Euclidean space is obtained, the cluster center vector c closest in cosine distance to that action is selected as the action finally taken, and the next state is entered; (3) constraining the dialogue sentences generated by a left-to-right-attention Transformer decoder with the obtained dialogue structure diagram, narrowing the distance between each generated sentence and its cluster, namely reducing the KL divergence between the sentence vector and the topic cluster center vector: L_KL = KL(h' || c'), where h' is the dialogue sentence vector obtained by average-pooling the output of the left-to-right-attention Transformer decoder, and c' is the cluster center vector to which the dialogue structure diagram predicts the dialogue sentence vector belongs.
  2. The method of claim 1, wherein designing the loss function in step (1) comprises defining an absolute correlation loss and a relative correlation loss based on the order and correlation satisfied by the input dialogue sentences. The absolute correlation loss is: l_abs(S_i^A) = -log[ exp(sim(h_i^{A+}, h_i^{A++})/τ) / Σ_{S_j ∈ X_j} exp(sim(h_i^{A+}, h_j)/τ) ], where S_i represents the ith dialogue sentence, S_i^A the ith dialogue sentence of character A, and S_i^{A+} and S_i^{A++} two data-enhancement samples of S_i^A; h_i^A represents the initial vector representation of the ith dialogue sentence of character A, and h_i^{A+} and h_i^{A++} the vector representations of the two data-enhancement samples; sim is the cosine distance between dialogue sentence vectors, τ is a hyper-parameter representing the temperature coefficient, X_j represents the set of the jth group of dialogue sentences, and D represents the enhanced dialogue data set. The relative correlation loss comprises a strong correlation loss and a weak correlation loss. The strong correlation loss is defined as: l_strong(S_i^A) = -log[ exp(sim(h_i^A, h_i^B)/τ) / Σ_{S_j ∈ X_j} exp(sim(h_i^A, h_j)/τ) ], where S_i^B represents the ith dialogue sentence of character B, defined as the next dialogue sentence after S_i^A, and h_i^B represents the initial vector representation of the ith dialogue sentence of character B, defined as the initial vector representation of the dialogue sentence following S_i^A. The weak correlation loss is defined as: l_weak(S_i^A) = -λ log[ exp(sim(h_i^A, h_{i-1})/τ) / Σ_{S_j ∈ X_j} exp(sim(h_i^A, h_j)/τ) ], where S_{i-1} is defined as the dialogue sentence preceding S_i^A, h_{i-1} is the initial vector representation of that preceding dialogue sentence, and λ is a hyper-parameter controlling the intensity of the weak correlation loss. The absolute and relative correlation loss functions over the data set are then, respectively: L_abs = (1/N) Σ_{S_i ∈ D} l_abs(S_i); L_rel = (1/N) Σ_{S_i ∈ D} [ l_strong(S_i) + l_weak(S_i) ], where N is the dialogue data set sample size and S_{i-1} represents the (i-1)th dialogue sentence. The bidirectional-attention Transformer encoder is trained on the dialogue data set by mini-batch gradient descent, and after training it outputs dialogue sentence vector representations that fully contain semantics.
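The KL-divergence constraint of step (3) in claim 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the patented implementation: the claim does not specify how the sentence vector and cluster center vector are turned into probability distributions, so the softmax normalization below is an assumption, and all names are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_topic_constraint(h, c):
    """KL divergence between the distribution induced by the decoder's
    average-pooled sentence vector h and the one induced by the
    predicted topic cluster center vector c.

    Minimizing this quantity pulls the generated sentence toward the
    topic cluster predicted by the dialogue structure diagram."""
    p = softmax(h)
    q = softmax(c)
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

The value is zero when the two vectors induce identical distributions and strictly positive otherwise, which is what makes it usable as an auxiliary training loss.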

Description

Open domain dialogue method based on dialogue structure diagram constraint

Technical Field

The invention belongs to the field of natural language processing within computer science, and particularly relates to an open-domain dialogue method based on dialogue structure diagram constraints.

Background

In recent years, large-scale pre-trained language models have achieved success on many tasks in natural language processing. On the dialogue generation task, an autoregressive pre-trained language model can generate fluent and rich dialogues. However, in multi-turn open-domain dialogue, the model often ignores the transition of topics between contexts and generates responses unrelated to the current topic, making the dialogue feel abrupt. A dialogue structure diagram is therefore needed to constrain the model to generate topic-relevant dialogues. Some past work focused on constructing dialogue structures in task-oriented dialogue systems. However, compared with task-oriented dialogues, open-domain dialogues have a huge number of dialogue states and a large number of uncertain state transitions, so methods for constructing dialogue structure diagrams in task-oriented dialogue are not applicable in the open-domain setting. The difficulty of open-domain dialogue structure construction lies in two aspects: how to extract dialogue states or topics in an unsupervised manner, and how to capture the transitions between topics.
At present, there is little research on dialogue structure diagrams for open-domain dialogue. The mainstream method constructs sentence-level and topic-level graphs with a graph neural network, completes the dialogue structure diagram by computing transition probabilities among sentences or topics from statistics of word and sentence co-occurrence frequencies, and finally integrates topic information into a reward function so as to train the dialogue model by reinforcement learning.

Disclosure of Invention

The technical problem solved by the invention: the invention aims to design an open-domain dialogue method based on dialogue structure diagram constraints, so as to solve the problem that existing dialogue generation is irrelevant to the current or expected topic. Humans can easily recognize topics and topic shifts in different situations of a conversation, and thereby organize language for relevant answers. The present invention attempts to mimic this human grasp of topics and provides a method for constructing a structure diagram of an open-domain multi-turn dialogue and for generating dialogue under structure diagram constraints. The method shows excellent performance on a high-quality human dialogue data set, and the model is insensitive to parameters and highly robust.
The invention adopts the following technical scheme to solve the technical problem: an open-domain dialogue method based on dialogue structure diagram constraints, comprising the steps of: (1) inputting dialogue sentences, taking the average-pooled output of a bidirectional-attention Transformer encoder as the initial vector representation of each dialogue sentence, designing a loss function, training the bidirectional-attention Transformer encoder in a self-supervised manner, and having the trained encoder output dialogue sentence vector representations that fully contain semantics; (2) clustering the obtained dialogue sentence vector representations to form a plurality of clusters, each cluster representing a dialogue topic; using a behavior cloning method to model the transfer of dialogue topics and calculate the transition probabilities among clusters; and constructing a dialogue structure diagram with the clusters as vertices and the transition probabilities as edges; (3) constraining the dialogue sentences generated by a left-to-right-attention Transformer decoder with the obtained dialogue structure diagram, narrowing the distance between each generated sentence and its cluster.
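The graph construction of step (2) can be illustrated with a toy sketch. Note the simplification: the patent estimates topic transitions via behavior cloning over states and actions, whereas this version simply counts observed cluster-to-cluster transitions between consecutive sentences and normalizes them into edge probabilities; the function name and arguments are hypothetical.

```python
import numpy as np

def build_structure_graph(cluster_ids, n_clusters):
    """Estimate topic-transition probabilities from a sequence of
    per-sentence cluster assignments.

    Clusters are the vertices of the dialogue structure diagram; the
    row-normalized transition counts are its edge weights."""
    counts = np.zeros((n_clusters, n_clusters))
    for a, b in zip(cluster_ids[:-1], cluster_ids[1:]):
        counts[a, b] += 1.0
    row = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for clusters that are never left.
    return np.divide(counts, row, out=np.zeros_like(counts), where=row > 0)

# Example: topic sequence 0 -> 1 -> 1 -> 2 -> 0 over five sentences.
graph = build_structure_graph([0, 1, 1, 2, 0], n_clusters=3)
```

Each row of `graph` is a probability distribution over next topics, so predicting the next round's topic amounts to reading off (or sampling from) the row of the current cluster.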
Preferably, the design process of the loss function in step (1) is as follows: based on the input dialogue sentences satisfying order and correlation, an absolute correlation loss and a relative correlation loss are defined. The absolute correlation loss is: l_abs(S_i^A) = -log[ exp(sim(h_i^{A+}, h_i^{A++})/τ) / Σ_{S_j ∈ X_j} exp(sim(h_i^{A+}, h_j)/τ) ], where S_i represents the ith dialogue sentence, S_i^A the ith dialogue sentence of character A, and S_i^{A+} and S_i^{A++} two data-enhancement samples of S_i^A; h_i^A represents the initial vector representation of the ith dialogue sentence of character A, and h_i^{A+} and h_i^{A++} the vector representations of the two data-enhancement samples; sim is the cosine distance between dialogue sentence vectors, τ is a hyper-parameter representing the temperature coefficient, X_j represents the set of the jth group of dialogue sentences, and D represents the enhanced dialogue data set.
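The absolute correlation loss described above is an InfoNCE-style contrastive objective, and a minimal sketch of one term of it is shown below. This assumes the standard contrastive form (positive pair in the numerator, all candidates in the denominator); the exact formula in the patent is an image that did not survive extraction, and all names here are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two sentence vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def absolute_correlation_loss(h_anchor, h_positive, negatives, tau=0.1):
    """One term of an InfoNCE-style contrastive loss: pull the anchor
    sentence vector toward its data-enhancement positive and away from
    the other sentence vectors in the group. tau is the temperature
    hyper-parameter."""
    pos = np.exp(cosine(h_anchor, h_positive) / tau)
    neg = sum(np.exp(cosine(h_anchor, h_n) / tau) for h_n in negatives)
    return float(-np.log(pos / (pos + neg)))
```

As expected of a contrastive loss, the value is small when the anchor and its positive are aligned and grows as the positive drifts toward the negatives.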