
EP-4071673-B1 - SYSTEM AND METHOD WITH NEURAL REPRESENTATION OF EVENT-CENTRIC COMMONSENSE KNOWLEDGE FOR RESPONSE SELECTION


Inventors

  • Araki, Jun
  • Kim, Hyeongsik
  • Otani, Naoki

Dates

Publication Date
2026-05-06
Application Date
2022-03-28

Claims (10)

  1. A computer-implemented method for training a dialogue framework, the method comprising: receiving input data (300; 410), wherein the input data (300; 410) comprises a request to perform a task; obtaining situational data (310; 420) that provides context for the input data (300; 410); creating a first dataset that includes the input data (300; 410) and the situational data (310; 420); generating, via an encoder, an encoded representation of the first dataset, the encoder including an encoding network of a first pre-trained generative machine learning model that relates to a generative knowledge graph, wherein generating the encoded representation comprises: generating first encoded data by encoding the input data (300; 410) and the situational data (310; 420) via a first encoding scheme performed by a token embedder (210), wherein the first encoding scheme includes token embedding in an embedding space, and generating second encoded data by encoding the input data (300; 410) and the situational data (310; 420) via a second encoding scheme performed by the encoding network, wherein the second encoding scheme includes knowledge embedding with respect to the generative knowledge graph that includes event-based data, wherein the encoded representation (330, 440) includes the first encoded data and the second encoded data; generating, via a decoder, response data based on the first dataset by decoding the encoded representation (330, 440) of the first dataset, the decoder including a decoding network of a second pre-trained generative machine learning model; generating, via the decoder, goal data based on the first dataset by decoding the encoded representation (330, 440), wherein the goal data provides an indication of commonsense reasoning data; generating actuator control data based on the response data; and providing the actuator control data to an actuator system for controlling the actuator system (630), wherein the actuator system (630) comprises a braking 
system of a vehicle (800), and wherein the actuator system (630) is configured to actuate at least the braking system to stop the vehicle (800) upon receiving the actuator control data; wherein the input data (300; 410) and the response data are connected to the goal data via the generative knowledge graph, and wherein the goal data is used in multi-hop reasoning to guide the input data (300; 410) to the response data via the generative knowledge graph, wherein a first hop of the multi-hop reasoning is defined from the input data (300; 410) and the situational data (310; 420) to the goal data and a second hop of the multi-hop reasoning is defined from the goal data to the response data.
  2. The computer-implemented method of claim 1, wherein: the second pre-trained generative machine learning model includes a first language model head (240) and a second language model head (250); the first language model head (240) is configured to generate the response data; and the second language model head (250) is configured to generate the goal data.
  3. The computer-implemented method of claim 1, further comprising: creating a second dataset that includes the response data and the situational data (310; 420); generating, via the encoder, an encoded representation (520) of the second dataset; generating additional goal data (540) via a decoding network of a third pre-trained generative machine learning model (530); and fine-tuning the encoder based on loss data associated with the additional goal data (540).
  4. The computer-implemented method of claim 1, wherein the encoding network of the first pre-trained generative machine learning model is domain agnostic and language agnostic.
  5. The computer-implemented method of claim 1, wherein the decoding network of the second pre-trained generative machine learning model is domain agnostic and language agnostic.
  6. A system comprising: at least one non-transitory computer readable medium including computer readable data; and a processor operably connected to the at least one non-transitory computer readable medium, the processor being configured to execute the computer readable data to perform a method that includes: receiving input data (300; 410), wherein the input data (300; 410) comprises a request to perform a task; obtaining situational data (310; 420) that provides context for the input data (300; 410); creating a first dataset that includes the input data (300; 410) and the situational data (310; 420); generating, via an encoder, an encoded representation of the first dataset, the encoder including an encoding network of a first pre-trained generative machine learning model that relates to a generative knowledge graph, wherein generating the encoded representation comprises: generating first encoded data by encoding the input data (300; 410) and the situational data (310; 420) via a first encoding scheme, wherein the first encoding scheme includes token embedding in an embedding space, and generating second encoded data by encoding the input data (300; 410) and the situational data (310; 420) via a second encoding scheme, wherein the second encoding scheme includes knowledge embedding with respect to the generative knowledge graph that includes event-based data, wherein the encoded representation (330, 440) includes the first encoded data and the second encoded data; generating, via a decoder, response data based on the first dataset by decoding the encoded representation (330, 440) of the first dataset, the decoder including a decoding network of a second pre-trained generative machine learning model; generating, via the decoder, goal data based on the first dataset by decoding the encoded representation (330, 440), wherein the goal data provides an indication of commonsense reasoning data; generating actuator control data based on the response data; and providing
the actuator control data to an actuator system for controlling the actuator system (630), wherein the actuator system (630) comprises a braking system of a vehicle (800); and wherein the actuator system (630) is configured to actuate at least the braking system to stop the vehicle (800) upon receiving the actuator control data; wherein the input data (300; 410) and the response data are connected to the goal data via the generative knowledge graph, and wherein the goal data is used in multi-hop reasoning to guide the input data (300; 410) to the response data via the generative knowledge graph, wherein a first hop of the multi-hop reasoning is defined from the input data (300; 410) and the situational data (310; 420) to the goal data and a second hop of the multi-hop reasoning is defined from the goal data to the response data.
  7. The system of claim 6, wherein: the second pre-trained generative machine learning model includes a first language model head (240) and a second language model head (250); the first language model head (240) is configured to generate the response data; and the second language model head (250) is configured to generate the goal data.
  8. The system of claim 6, further comprising: creating a second dataset that includes the response data (450) and the situational data (420); generating, via the encoder, an encoded representation (520) of the second dataset; generating additional goal data (540) via a decoding network of a third pre-trained generative machine learning model (530); and fine-tuning the encoder based on loss data relating to the goal data and the additional goal data (540).
  9. The system of claim 6, wherein the encoding network of the first pre-trained generative machine learning model is domain agnostic and language agnostic.
  10. The system of claim 6, wherein the decoding network of the second pre-trained generative machine learning model is domain agnostic and language agnostic.
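The encoder/decoder arrangement recited in claims 1 and 6 — a first encoding scheme (token embedding), a second encoding scheme (knowledge embedding with respect to the generative knowledge graph), and a shared decoder with two language-model heads producing the response data and the goal data — can be sketched in miniature. The following is a toy NumPy illustration only, not the claimed implementation: random matrices stand in for the pre-trained token embedder (210), the knowledge embedder, and the two language-model heads (240, 250), and all names, dimensions, and token ids are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D_TOK, D_KG = 100, 16, 8

# Hypothetical stand-ins for pre-trained parameters:
# token embedding table (first encoding scheme) and knowledge
# embedding table tied to the generative knowledge graph (second scheme).
token_emb = rng.normal(size=(VOCAB, D_TOK))
kg_emb = rng.normal(size=(VOCAB, D_KG))

# Two language-model heads over the shared decoder state:
# one for the response data, one for the goal (commonsense) data.
W_response = rng.normal(size=(D_TOK + D_KG, VOCAB))
W_goal = rng.normal(size=(D_TOK + D_KG, VOCAB))

def encode(token_ids):
    """Encoded representation = first encoded data (token embeddings)
    concatenated with second encoded data (knowledge embeddings)."""
    return np.concatenate([token_emb[token_ids], kg_emb[token_ids]], axis=-1)

def decode(encoded, head):
    """Toy 'decoder': mean-pool the encoded sequence and project it
    through the given language-model head; return the argmax token id."""
    logits = encoded.mean(axis=0) @ head
    return int(np.argmax(logits))

# First dataset = input data + situational data (toy token ids).
first_dataset = np.array([3, 14, 15, 92])
enc = encode(first_dataset)            # shape (4, D_TOK + D_KG)

goal_id = decode(enc, W_goal)          # hop 1: input + situation -> goal
response_id = decode(enc, W_response)  # hop 2: goal guides the response
```

In the claimed system, both networks would be pre-trained generative transformer encoder/decoder stacks rather than single matrices, and the goal head's output would steer multi-hop reasoning from the input data to the response data over the generative knowledge graph; this sketch only shows the data flow of the two encoding schemes into one encoded representation consumed by two heads.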

Description

FIELD

This disclosure relates generally to computer-implemented systems and methods involving natural language processing and knowledge representation and reasoning. In particular, the disclosure relates to a method and a system for training a dialogue framework.

BACKGROUND

In general, task-oriented dialogue systems are configured to engage with human users to accomplish tasks. An adversarial learning framework for persona-based dialogue modeling is known from US 2020/098353 A1. Ji et al., "Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph", arxiv.org, 2020, relates to multi-hop reasoning. Most task-oriented dialogue systems merely obtain and provide the information needed to complete given tasks. Although some studies relate to incorporating external knowledge into task-oriented dialogue systems, those studies mostly involve representing external knowledge as a symbolic knowledge graph G = {(s, r, o)}, where s denotes a subject, o denotes an object, and r denotes a relation between the subject and the object. As a non-limiting example, the symbolic knowledge graph G = {(JoeBiden, spouse-of, JillBiden)} represents the knowledge that Joe Biden is the spouse of Jill Biden. However, directly incorporating symbolic knowledge graphs into task-oriented dialogue systems has many drawbacks. For example, symbolic knowledge graphs tend to be costly to construct, update, and maintain. Also, most symbolic knowledge graphs do not scale to other domains and/or other languages. This scalability problem is particularly severe for commonsense knowledge, which is prohibitively broad and diverse. Furthermore, most symbolic knowledge graphs are entity-centric, focusing on knowledge regarding entities and their relations.
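The symbolic triple representation G = {(s, r, o)} discussed above can be made concrete with a minimal, hypothetical triple store; the entries below are taken from the examples in this description, and the helper function is illustrative only. It also hints at the matching drawback: lookup succeeds only when an utterance's expression exactly matches a node's string.

```python
# Minimal symbolic knowledge graph as a set of (subject, relation, object)
# triples, using examples from the description above.
G = {
    ("JoeBiden", "spouse-of", "JillBiden"),
    ("earthquake", "cause", "tsunami"),
    ("Berlin", "capital-of", "Germany"),
}

def objects(subject, relation):
    """Return every object o such that (subject, relation, o) is in G."""
    return {o for (s, r, o) in G if s == subject and r == relation}

print(objects("JoeBiden", "spouse-of"))   # {'JillBiden'}
print(objects("the President", "spouse-of"))  # set(): no node matches the paraphrase
```

The empty result for the paraphrased subject illustrates why string-keyed symbolic graphs struggle with free-form language expressions, motivating the learned knowledge embeddings used in the claimed method.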
In view of this tendency, most prior work on incorporating external knowledge into task-oriented dialogue systems deals only with encyclopedic knowledge centered on entities (e.g., things and concepts), such as "Ovens are in the kitchen" and "Berlin is the capital of Germany." These kinds of entity-centric knowledge significantly restrict reasoning capabilities. While some knowledge bases, such as ConceptNet 5.5, cover both entities and events, they do not provide sufficient knowledge to a downstream application. For example, a downstream application may require a more complex and realistic piece of knowledge beyond G = {(earthquake, cause, tsunami)}, such as the additional knowledge that an earthquake causes a tsunami "if the earthquake is strong and happens under an ocean near a land." These kinds of fine-grained or conditioned knowledge may not be available in such knowledge bases. In addition, there are technical issues with incorporating knowledge from symbolic knowledge graphs into task-oriented dialogue. For example, many language expressions do not necessarily match the symbols (or strings) labeling nodes in the symbolic knowledge graphs. Also, in some cases a language expression in an utterance does not correspond to any node (or concept) in a knowledge graph, thereby hindering the incorporation of external knowledge from symbolic knowledge graphs into task-oriented dialogue.

SUMMARY

The invention provides a computer-implemented method for training a dialogue framework, and a system, as recited in the independent claims. Advantageous embodiments are set out in the dependent claims. The following is a summary of certain embodiments described in detail below.
The described aspects are presented merely to provide the reader with a brief summary of these certain embodiments and the description of these aspects is not intended to limit the scope of this disclosure. Indeed, this disclosure may encompass a variety of aspects that may not be explicitly set forth below. According to at least one aspect, a computer-implemented method includes creating a first dataset that includes input data and situational data. The situational data provides context for the input data. The method includes generating, via an encoder, an encoded representation of the first dataset. The encoder includes an encoding network of a first generative machine learning model that relates to a generative knowledge graph. A decoder includes a decoding network of a second generative machine learning model. The method includes generating, via the decoder, response data based on the first dataset by decoding the encoded representation. The method also includes generating, via the decoder, goal data based on the first dataset by decoding the encoded representation. The goal data is used in multi-hop reasoning to guide the input data to the response data via the generative knowledge graph. According to at least one aspect, a system includes at least one non-transitory computer readable medium and a processor. The non-transitory compu