US-12619824-B2 - Method for generating summary and system therefor

Abstract

Provided are a method for generating a summary and a system therefor. The method according to some embodiments may include calculating a likelihood loss for a summary model using a first text sample and a first summary sentence corresponding to the first text sample, calculating an unlikelihood loss for the summary model using a second text sample and the first summary sentence, the second text sample being a negative sample generated from the first text sample, and updating the summary model based on the likelihood loss and the unlikelihood loss.
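
As a rough sketch of the training objective described in the abstract, the code below combines a standard likelihood (cross-entropy) term computed on the original text with an unlikelihood term computed on a negative sample. This is a minimal PyTorch-style illustration, not the patented implementation; the `model(src, tgt_in)` interface, the loss weights `alpha` and `beta`, and the padding handling are all assumptions.

```python
import torch
import torch.nn.functional as F

def summary_losses(model, src_pos, src_neg, tgt, pad_id=0, alpha=1.0, beta=0.5):
    """Combined objective: raise p(summary | original text) and lower
    p(summary | negative sample). All interface details are assumed."""
    # Likelihood loss on the positive pair (first text sample, first summary).
    logits_pos = model(src_pos, tgt[:, :-1])                 # (B, T, V) logits
    log_p = F.log_softmax(logits_pos, dim=-1)
    nll = F.nll_loss(log_p.transpose(1, 2), tgt[:, 1:], ignore_index=pad_id)

    # Unlikelihood loss on the negative pair: penalize log(1 - p) of the
    # reference tokens, so the loss shrinks as the model moves away from
    # generating the first summary sentence from the negative sample.
    logits_neg = model(src_neg, tgt[:, :-1])
    p_neg = (F.softmax(logits_neg, dim=-1)
             .gather(-1, tgt[:, 1:].unsqueeze(-1)).squeeze(-1))
    mask = (tgt[:, 1:] != pad_id).float()
    ul = -(torch.log1p(-p_neg.clamp(max=1 - 1e-6)) * mask).sum() / mask.sum()

    # Weighted sum; per claim 9, the likelihood term gets the higher weight.
    return alpha * nll + beta * ul
```

In the claimed method two unlikelihood terms are computed, one per negative sample; the sketch shows a single term for brevity.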

Inventors

  • Sung Roh Yoon
  • Young June GWON
  • Bong Kyu HWANG
  • Ju Dong KIM
  • Jae Woong Yun
  • Hyun Jae Lee
  • Hyun Jin Choi
  • Jong Yoon SONG
  • Noh Il Park
  • Seong Ho JOE

Assignees

  • SAMSUNG SDS CO., LTD.
  • SEOUL NATIONAL UNIVERSITY R&DB FOUNDATION

Dates

Publication Date
May 5, 2026
Application Date
Oct. 26, 2023
Priority Date
Oct. 28, 2022

Claims (16)

  1. A method for generating a summary, the method being performed by at least one processor and comprising: calculating a likelihood loss for a summary model using a first text sample and a first summary sentence corresponding to the first text sample, wherein the summary model is a neural network-based model; calculating a first unlikelihood loss for the summary model using a second text sample and the first summary sentence and calculating a second unlikelihood loss for the summary model using a third text sample and the first summary sentence, the second text sample and the third text sample being negative samples generated from the first text sample in different manners, and the third text sample being generated by removing a portion associated with a main keyword from the first text sample; updating weight parameters of the summary model based on the likelihood loss, the first unlikelihood loss, and the second unlikelihood loss, in a manner such that a likelihood that the first summary sentence is generated from the first text sample increases, a likelihood that the first summary sentence is generated from the second text sample decreases, and a likelihood that the first summary sentence is generated from the third text sample decreases; updating the summary model to include the updated weight parameters; and generating a summary of an input text sample using the updated summary model.
  2. The method of claim 1, wherein the first or second unlikelihood loss is calculated based on a difference between a summary sentence of the second or third text sample generated through the summary model and the first summary sentence, and is calculated as a smaller value as the difference increases.
  3. The method of claim 1, wherein the calculating of the first unlikelihood loss comprises: extracting the main keyword from the first summary sentence; and generating the second text sample by masking a portion associated with the main keyword in the first text sample.
  4. The method of claim 3, wherein the main keyword is extracted by performing part-of-speech analysis or named entity recognition on the first summary sentence.
  5. The method of claim 3, wherein the main keyword comprises a keyword of which a part of speech is a numeral or a proper noun.
  6. The method of claim 3, wherein the portion associated with the main keyword is a token group including a matching token for the main keyword and adjacent tokens of the matching token, a sentence including the matching token, or a paragraph including the matching token.
  7. The method of claim 3, wherein the generating of the third text sample comprises: extracting a plurality of sentences from a text sample different from the first text sample; and generating the third text sample by inserting the plurality of sentences into the first text sample from which the portion associated with the main keyword is removed such that an order of the plurality of sentences is maintained.
  8. The method of claim 1, wherein the likelihood loss is a first likelihood loss, the second text sample is generated by replacing a portion of the first text sample with a mask token, the method further comprises: generating a fourth text sample by adding the mask token to the first text sample; and calculating a second likelihood loss for the summary model using the fourth text sample and the first summary sentence, and the weight parameters of the summary model are updated based on the first likelihood loss, the second likelihood loss, the first unlikelihood loss, and the second unlikelihood loss.
  9. The method of claim 1, wherein the updating of the weight parameters of the summary model comprises: summing up the likelihood loss, the first unlikelihood loss, and the second unlikelihood loss based on pre-assigned weights; and updating the weight parameters of the summary model based on the summed loss, and a weight assigned to the likelihood loss is higher than a weight assigned to the first or second unlikelihood loss.
  10. The method of claim 1, wherein the summary model is a model predicting tokens constituting a summary sentence of the input text sample in an auto-regressive manner, and the method further comprises: obtaining a text sample for evaluation and a summary sentence for evaluation, the text sample for evaluation being at least partially different from a text sample corresponding to the summary sentence for evaluation; calculating confidence scores for tokens constituting the summary sentence for evaluation by inputting the text sample for evaluation to the summary model; and evaluating performance of the summary model based on the calculated confidence scores.
  11. The method of claim 1, wherein the summary model is a model predicting tokens constituting a summary sentence of an input text sample in an auto-regressive manner, and the method further comprises: obtaining a text sample for evaluation and a summary sentence for evaluation; predicting a plurality of tokens by inputting the text sample for evaluation to the summary model and performing decoding through a teacher forcing technique, the teacher forcing technique being performed in a manner of providing the summary sentence for evaluation to the summary model; and evaluating performance of the summary model by comparing a first saliency of the summary model for the input text sample for evaluation appearing in a process of predicting the plurality of tokens and a second saliency for the provided summary sentence for evaluation with each other.
  12. The method of claim 11, wherein the first saliency is calculated based on gradient values for tokens of the text sample for evaluation obtained by back-propagating prediction losses of the plurality of tokens.
  13. The method of claim 1, wherein the summary model is a model predicting tokens constituting a summary sentence of the input text sample in an auto-regressive manner, and the method further comprises: obtaining a text sample for evaluation and a summary sentence for evaluation, the text sample for evaluation being at least partially different from a text sample corresponding to the summary sentence for evaluation; calculating a confidence score for each token by inputting the text sample for evaluation to the summary model and performing decoding through a teacher forcing technique, the teacher forcing technique being performed in a manner of providing the summary sentence for evaluation to the summary model; and evaluating performance of the summary model based on an entropy value for the confidence score for each token.
  14. The method of claim 13, wherein the entropy value is a first entropy value, and the evaluating of the performance of the summary model comprises: calculating a second entropy value for the text sample corresponding to the summary sentence for evaluation by inputting the text sample corresponding to the summary sentence for evaluation into the summary model and performing decoding through the teacher forcing technique; and evaluating the performance of the summary model based on a difference between the first entropy value and the second entropy value.
  15. A system for generating a summary, comprising: one or more processors; and a memory configured to store one or more instructions, wherein the one or more processors, by executing the stored one or more instructions, perform: calculating a likelihood loss for a summary model using a first text sample and a first summary sentence corresponding to the first text sample, wherein the summary model is a neural network-based model; calculating a first unlikelihood loss for the summary model using a second text sample and the first summary sentence and calculating a second unlikelihood loss for the summary model using a third text sample and the first summary sentence, the second text sample and the third text sample being negative samples generated from the first text sample in different manners, and the third text sample being generated by removing a portion associated with a main keyword from the first text sample; updating weight parameters of the summary model based on the likelihood loss, the first unlikelihood loss, and the second unlikelihood loss, in a manner such that a likelihood that the first summary sentence is generated from the first text sample increases, a likelihood that the first summary sentence is generated from the second text sample decreases, and a likelihood that the first summary sentence is generated from the third text sample decreases; updating the summary model using the updated weight parameters; and generating a summary of an input text sample using the updated summary model.
  16. A non-transitory computer-readable recording medium storing a computer program, which, when executed by at least one processor, causes the at least one processor to perform: calculating a likelihood loss for a summary model using a first text sample and a first summary sentence corresponding to the first text sample, wherein the summary model is a neural network-based model; calculating a first unlikelihood loss for the summary model using a second text sample and the first summary sentence and calculating a second unlikelihood loss for the summary model using a third text sample and the first summary sentence, the second text sample and the third text sample being negative samples generated from the first text sample in different manners, and the third text sample being generated by removing a portion associated with a main keyword from the first text sample; updating weight parameters of the summary model based on the likelihood loss, the first unlikelihood loss, and the second unlikelihood loss, in a manner such that a likelihood that the first summary sentence is generated from the first text sample increases, a likelihood that the first summary sentence is generated from the second text sample decreases, and a likelihood that the first summary sentence is generated from the third text sample decreases; updating the summary model to include the updated weight parameters; and generating a summary of an input text sample using the updated summary model.
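
Claims 10 through 14 evaluate factual consistency by teacher-forcing a summary sentence for evaluation through the model and reading off per-token confidence scores or entropy. Below is a hedged sketch of how such statistics might be computed; the `model(src, tgt_in)` call signature and the token-averaging are assumptions made for illustration, not the patent's exact metric.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def teacher_forced_stats(model, src_eval, ref_summary, pad_id=0):
    """Teacher-forced pass: feed the evaluation text and the reference
    summary, then read per-token statistics off the output distribution."""
    logits = model(src_eval, ref_summary[:, :-1])   # (B, T, V)
    probs = F.softmax(logits, dim=-1)
    mask = (ref_summary[:, 1:] != pad_id).float()

    # Confidence score (claim 10): probability assigned to each reference token.
    conf = probs.gather(-1, ref_summary[:, 1:].unsqueeze(-1)).squeeze(-1)

    # Entropy (claim 13): a model "surprised" by a perturbed source text
    # tends to spread its probability mass, raising the entropy.
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)

    conf_mean = (conf * mask).sum() / mask.sum()
    entropy_mean = (entropy * mask).sum() / mask.sum()
    return conf_mean.item(), entropy_mean.item()
```

Claim 14 then compares the entropy obtained with the matching source text against the entropy obtained with a perturbed one; a consistency-aware model should show a clear gap between the two.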
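Claims 11 and 12 instead compare gradient-based saliency between the source text and the teacher-forced summary. The sketch below assumes the model exposes `embed` and `forward_from_embeddings` hooks; these are illustrative placeholders, not part of the patent.

```python
import torch.nn.functional as F

def saliency_scores(model, src_eval, ref_summary, pad_id=0):
    """Back-propagate the teacher-forced prediction loss (claim 12) and
    read gradient norms on the input embeddings as per-token saliency."""
    src_emb = model.embed(src_eval).detach().requires_grad_(True)
    tgt_emb = model.embed(ref_summary[:, :-1]).detach().requires_grad_(True)

    logits = model.forward_from_embeddings(src_emb, tgt_emb)   # (B, T, V)
    loss = F.cross_entropy(
        logits.transpose(1, 2), ref_summary[:, 1:], ignore_index=pad_id
    )
    loss.backward()

    # First saliency: on the source text; second: on the forced summary.
    src_saliency = src_emb.grad.norm(dim=-1)
    tgt_saliency = tgt_emb.grad.norm(dim=-1)
    # A model that leans on the forced summary more than on the source
    # (relatively large tgt_saliency) is copying rather than grounding.
    return src_saliency, tgt_saliency
```
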

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority from Korean Patent Application No. 10-2022-0139083 filed on Oct. 26, 2022 in the Korean Intellectual Property Office, and all the benefits accruing therefrom under 35 U.S.C. 119, the contents of which in its entirety are herein incorporated by reference.

BACKGROUND

1. Technical Field

The disclosure relates to a method for generating a summary and a system therefor, and more particularly, to a method for generating a summary sentence for an original text in an abstractive or generative summarization manner and a system therefor.

2. Description of the Related Art

Text summarization manners are largely divided into an extractive summarization manner and an abstractive summarization (or generative summarization) manner. The extractive summarization manner extracts keywords or core sentences from an original text to generate a summary sentence, whereas the abstractive summarization manner generates new keywords or sentences based on a core context of the original text to summarize it. The abstractive summarization manner is known to be much more difficult than the extractive summarization manner.

Meanwhile, as deep learning technology related to natural language processing develops rapidly, methods for generating a summary sentence in the abstractive summarization manner through a deep learning model have recently been proposed. However, the proposed methods cannot guarantee factual consistency of the summary sentence with the original text. That is, the deep learning model changes keywords (or sentences) representing a main factual relationship of the original text, or generates keywords (or sentences) representing a new factual relationship; as a result, important information of the original text is distorted, or information that does not exist in the original text is included in the summary sentence. This problem is regarded as particularly serious given that the purpose of a summary task is to refine the important information in the original text.

SUMMARY

Aspects of the disclosure provide a method capable of accurately generating a summary sentence for an original text in an abstractive or generative summarization manner, and a system for performing the method. Aspects of the disclosure also provide a method capable of generating a high-quality summary sentence having high factual consistency, and a system for performing the method. Aspects of the disclosure also provide a method and evaluation metrics capable of accurately evaluating performance of a summary model with respect to factual consistency. However, aspects of the disclosure are not restricted to those set forth herein. The above and other aspects of the disclosure will become more apparent to one of ordinary skill in the art to which the disclosure pertains by referencing the detailed description given below.

According to some embodiments of the disclosure, there is provided a method for generating a summary performed by at least one computing device.
The method may include calculating a likelihood loss for a summary model using a first text sample and a first summary sentence corresponding to the first text sample, calculating an unlikelihood loss for the summary model using a second text sample and the first summary sentence, the second text sample being a negative sample generated from the first text sample, and updating the summary model based on the likelihood loss and the unlikelihood loss.

In some embodiments, the unlikelihood loss may be calculated based on a difference between a summary sentence of the second text sample generated through the summary model and the first summary sentence, and may be calculated as a smaller value as the difference increases.

In some embodiments, the calculating of the unlikelihood loss may include extracting a main keyword from the first summary sentence, and generating the second text sample by masking a portion associated with the main keyword in the first text sample.

In some embodiments, the main keyword may be extracted by performing part-of-speech analysis or named entity recognition on the first summary sentence.

In some embodiments, the main keyword may include a keyword of which a part of speech is a numeral or a proper noun.

In some embodiments, the portion associated with the main keyword may be a token group including a matching token for the main keyword and adjacent tokens of the matching token, a sentence including the matching token, or a paragraph including the matching token.

In some embodiments, the unlikelihood loss is a first unlikelihood loss, and the method may further include generating a third text sample by removing the portion associated with the main keyword from the first text sample, and calculating a second unlikelihood loss for the summary model using the
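
The two kinds of negative samples sketched above (masking the portion associated with a main keyword for the second text sample, and removing that portion for the third) might be constructed along the following lines. The `<mask>` token, the regex-based sentence splitting, and the string-level matching are illustrative assumptions; the patent operates at token level and may also insert sentences from another document into the third sample (claim 7).

```python
import re

MASK = "<mask>"

def make_second_sample(text: str, keyword: str) -> str:
    """Second text sample: mask every occurrence of the main keyword
    (e.g. a numeral or proper noun extracted from the reference summary)."""
    return re.sub(re.escape(keyword), MASK, text)

def make_third_sample(text: str, keyword: str) -> str:
    """Third text sample: drop every sentence containing the main keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(s for s in sentences if keyword not in s)
```

For example, with the summary keyword "1997", `make_second_sample` turns "The treaty was signed in 1997." into "The treaty was signed in <mask>.", while `make_third_sample` drops the sentence entirely.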