US-12626051-B1 - Real-time summary evaluation large language model
Abstract
The present disclosure generally relates to systems and methods for generating and evaluating a transcript summary based on a transcript of an active meeting. A summary evaluation system may evaluate a summary based on both completeness and accuracy. The summary evaluation system may generate, via a machine learning model, questions and correct answers based on a first representation of the transcript (e.g., the transcript summary). The summary evaluation system may determine possible answers to the questions based on a second representation of the transcript (e.g., transcript). Depending on a comparison between the answers, a completeness score of the summary is generated. In some embodiments, the first representation of the transcript may be the transcript, the second representation of the transcript may be the generated transcript summary, and the score may be an accuracy score. Regeneration of the summary may be prompted if the score is below a threshold.
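The evaluation loop described in the abstract can be sketched as follows. The callables `generate_qa`, `answer_questions`, and `similarity` are hypothetical placeholders for the large language model and embedding-model invocations, which the disclosure does not pin to any particular implementation; the match cutoff and score threshold are likewise illustrative values, not figures from the patent.

```python
def evaluate_summary(first_rep, second_rep, generate_qa, answer_questions,
                     similarity, match_cutoff=0.75, score_threshold=0.8):
    """Score one representation of a transcript against another.

    Per the abstract: with first_rep = transcript summary and
    second_rep = transcript, the result is a completeness-style score;
    swapping the two representations yields an accuracy-style score.
    """
    # Generate questions and correct answers from the first representation.
    qa_pairs = generate_qa(first_rep)            # [(question, correct_answer), ...]
    questions = [q for q, _ in qa_pairs]
    # Determine possible answers to those questions from the second representation.
    possible = answer_questions(questions, second_rep)
    # Score by the fraction of possible answers that match the correct ones.
    matches = sum(1 for (_, correct), ans in zip(qa_pairs, possible)
                  if similarity(correct, ans) >= match_cutoff)
    score = matches / len(qa_pairs) if qa_pairs else 0.0
    # Second value flags whether regeneration of the summary should be prompted.
    return score, score < score_threshold
```

A score below the threshold triggers regeneration of the summary, per the final sentence of the abstract.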
Inventors
- Devansh Shah
- Michael Mark Goodwin
- Srikanth Venkata Tenneti
- Mehmet Umut Isik
Assignees
- AMAZON TECHNOLOGIES, INC.
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2024-03-14
Claims (19)
- 1 . A system, comprising: a memory to store specific computer-executable instructions; and a processor in communication with the memory, wherein the processor is to execute the specific computer-executable instructions to at least: obtain a first representation of a transcript and a question generation prompt; provide the first representation of the transcript and the question generation prompt as input into a first machine learning model; determine, using the first machine learning model, a set of questions and a set of correct answers, based on the first representation of the transcript; provide the set of questions, a second representation of the transcript, and a question answering prompt as input into the first machine learning model; determine, using the first machine learning model, a set of possible answers corresponding to the set of questions based on the second representation of the transcript; provide the set of possible answers and the set of questions to a second machine learning model; determine, using the second machine learning model, a score based on a comparison between the set of correct answers and the set of possible answers; and generate, in response to the score being below a score threshold, a third representation of the transcript based on the set of correct answers.
- 2 . The system of claim 1 , wherein the transcript comprises a text-based transcript of audio, a conversation, a video conference, an audio conference, a phone call, a video, an audio file, a recording, or a communication of an active meeting.
- 3 . The system of claim 1 , wherein the first representation of the transcript comprises a transcript summary and the second representation of the transcript comprises a transcript.
- 4 . The system of claim 1 , wherein the first representation of the transcript comprises a transcript and the second representation of the transcript comprises a transcript summary.
- 5 . The system of claim 1 , wherein the third representation of the transcript comprises a transcript summary incorporating the set of correct answers.
- 6 . The system of claim 1 , wherein the score is one of a completeness score or an accuracy score.
- 7 . The system of claim 1 , wherein the comparison includes a semantic similarity between each correct answer of the set of correct answers and each possible answer of the set of possible answers.
- 8 . The system of claim 1 , wherein the score is based on a number of matches between the set of possible answers and the set of correct answers.
- 9 . The system of claim 1 , wherein the first machine learning model is a large language model and the second machine learning model is an embedding model.
- 10 . A method comprising: obtaining a transcript summary and a question generation prompt, wherein the transcript summary is based on a transcript; providing the transcript summary and the question generation prompt as input into a first machine learning model; determining, using the first machine learning model, a question and a correct answer, based on the transcript summary; providing the question, the transcript, and a question answering prompt as input into the first machine learning model; determining, using the first machine learning model, a possible answer to the question based on the transcript; providing the possible answer and the question to a second machine learning model; determining, using the second machine learning model, a score based on a comparison between the correct answer and the possible answer; and generating, in response to the score being below a score threshold, a new summary of the transcript based on the correct answer.
- 11 . The method of claim 10 , wherein the transcript comprises a text-based transcript of audio, a conversation, a video conference, an audio conference, a phone call, a video, an audio file, a recording, or a communication of an active meeting.
- 12 . The method of claim 10 , wherein the score is a completeness score.
- 13 . The method of claim 10 , wherein the comparison includes a semantic similarity between the correct answer and the possible answer.
- 14 . The method of claim 10 , wherein the first machine learning model is a large language model and the second machine learning model is an embedding model.
- 15 . A non-transitory, computer-readable medium comprising computer-executable instructions for evaluating a transcript summary, wherein the computer-executable instructions, when executed by a computer system, cause the computer system to: provide a transcript and a question generation prompt as input into a first machine learning model; determine, using the first machine learning model, a question and a correct answer, based on the transcript; provide the question, a summary transcript based on the transcript, and a question answering prompt as input into the first machine learning model; determine, using the first machine learning model, a possible answer corresponding to the question based on the summary transcript; determine, using a second machine learning model, a score based on a comparison between the correct answer and the possible answer; and update, in response to the score being above a score threshold, the summary transcript based on the correct answer.
- 16 . The non-transitory, computer-readable medium of claim 15 , wherein the transcript comprises a text-based transcript of audio, a conversation, a video conference, an audio conference, a phone call, a video, an audio file, a recording, or a communication of an active meeting.
- 17 . The non-transitory, computer-readable medium of claim 15 , wherein the score is an accuracy score.
- 18 . The non-transitory, computer-readable medium of claim 15 , wherein the comparison includes a semantic similarity between the correct answer and the possible answer.
- 19 . The non-transitory, computer-readable medium of claim 15 , wherein the computer system is further to generate, in response to the score being below a score threshold, a new summary of the transcript based on the correct answer.
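Claims 7, 13, and 18 compare answers by semantic similarity, and claims 9 and 14 identify the second machine learning model as an embedding model. The claims do not fix a particular metric, so the following cosine-similarity function over answer embeddings is an illustrative assumption, not the patented comparison:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two answer embeddings: 1.0 means the
    embeddings point in the same direction (answers judged equivalent),
    0.0 means orthogonal (unrelated)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

In practice the embedding model would map each correct answer and possible answer to a vector, and a similarity above some cutoff would count as a "match" for the match-count scoring of claim 8.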
Description
BACKGROUND

Participants of a meeting typically speak during the meeting to present content or contribute to a discussion. Once a meeting ends, speech recognition techniques may be performed on the meeting audio (e.g., a recording) to create a transcript. The transcript may be available for users to read and reference any speech that occurred during the meeting.

BRIEF DESCRIPTION OF THE DRAWINGS

Various features will now be described with reference to the following drawings. Throughout the drawings, reference numbers may be re-used to indicate correspondence between referenced elements. The drawings are provided to illustrate examples described herein and are not intended to limit the scope of the disclosure.

FIG. 1 is a block diagram depicting an illustrative environment in which a summary system provides generation and evaluation of transcript summaries.

FIG. 2 is a block diagram depicting a summary generation system to provide a summary of a transcript of an active meeting.

FIG. 3A is a block diagram depicting a summary evaluation system to provide an accuracy score of a summary generated by the summary generation system as described in FIG. 2.

FIG. 3B is a block diagram depicting a summary evaluation system to provide a completeness score of a summary generated by the summary generation system as described in FIG. 2.

FIG. 4A is a block diagram depicting an additional embodiment of a summary evaluation system to provide an accuracy score of a summary generated by the summary generation system as described in FIG. 2.

FIG. 4B is a block diagram depicting an additional embodiment of a summary evaluation system to provide a completeness score of a summary generated by the summary generation system as described in FIG. 2.

FIG. 5 is a block diagram that illustrates the general architecture of a computing system implementing the summary system 104 of FIG. 1.

FIG. 6 is an example routine for generating a transcript summary based on a transcript of an active meeting.

FIG. 7 is an example routine for evaluating a generated transcript summary based on completeness and accuracy.

DETAILED DESCRIPTION

Generally, aspects of the present disclosure relate to generating and evaluating a transcript summary of an active meeting. As used herein, an active meeting may refer to any meeting, such as a teleconference or online meeting, that is online, in-progress, in-session, or otherwise has been started and not yet ended. An active meeting may also refer to a meeting during which at least one user has joined or logged onto the meeting and at least one user remains in the meeting, a meeting that has not been ended by the host, etc.

As an active meeting progresses, a transcript of the dialogue between participants may be generated and stored. Although the transcript provides a raw script of the meeting content, it is often helpful to generate a summary based on the transcript to condense the meeting into its important points. Text summarization plays an important role in the context of an active meeting, condensing real-time conversations into a concise recap as the meeting progresses. The generation of a summary during an active meeting (as opposed to an end-of-call summarization) provides participants with highlights and important points of the ongoing meeting as it progresses.

Transcript summaries of active meetings are typically generated using machine learning models, such as large language models ("LLMs"), and the quality of a summary may be evaluated by certain metrics, such as completeness and accuracy. Text summarization of transcripts of active meetings is typically performed incrementally, such as at predefined checkpoints. Checkpoints may be predefined based on a fixed number of words or tokens.
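The token-count checkpointing described above can be sketched as follows; the interval of 500 tokens is an arbitrary illustrative value, not one taken from the disclosure:

```python
def checkpoints(tokens, interval=500):
    """Yield checkpoint positions at every fixed number of tokens,
    per the predefined-checkpoint scheme described above."""
    for i in range(interval, len(tokens) + 1, interval):
        yield i
```

At each yielded position, the summarization process would invoke the LLM on the transcript content accumulated so far (or on the newest chunk, depending on the approach described next).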
In a baseline or fixed approach, a summarization of the entire available transcript is generated at each checkpoint. This provides the LLM with the entire transcript of the meeting up to that point in time, and the summary is replaced with a newer version at each checkpoint. Another example is a rolling summary approach, in which a transcript chunk and the existing summary are fed into the LLM at each checkpoint to produce an updated summary. This approach allows continuity across the generated summaries.

Although both summarization techniques allow incremental generation of summaries (as opposed to post-call summarization), multiple calls to an LLM may cause several technical issues. Repeated calls to an LLM may become computationally expensive. For example, multiple calls may result in increased utilization of the processor and memory executing the summarization process. If calls are made to an LLM hosted in a cloud environment, repeated calls may also result in a reduction in the available network bandwidth. It is also noted that computational costs based on the prior a
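The rolling summary approach described above can be sketched as a minimal loop. Here `summarize` is a hypothetical placeholder for the LLM call that takes the existing summary plus the new transcript chunk; the checkpoint interval is an illustrative value:

```python
def rolling_summarize(transcript_tokens, summarize, checkpoint_tokens=500):
    """Rolling-summary approach: at each checkpoint, only the newest
    transcript chunk and the existing summary are fed to the LLM,
    rather than the entire transcript (the fixed/baseline approach)."""
    summary, chunk = "", []
    for token in transcript_tokens:
        chunk.append(token)
        if len(chunk) >= checkpoint_tokens:          # checkpoint reached
            summary = summarize(summary, " ".join(chunk))
            chunk = []
    if chunk:                                        # flush any trailing chunk
        summary = summarize(summary, " ".join(chunk))
    return summary
```

Compared with re-summarizing the whole transcript at every checkpoint, this bounds the input size per LLM call, which is one way to mitigate the computational and bandwidth costs noted above.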