US-20260129147-A1 - PREEMPTIVELY ESTABLISHED LIVE CONNECTIONS FOR REAL-TIME TRANSCRIPTIONS IN VIRTUAL MEETINGS
Abstract
Systems, methods, and other embodiments associated with efficient allocation of live connections for real-time transcriptions of virtual meetings are described. In one embodiment, an example method includes preemptively establishing a set of live connections to an automatic speech recognition service that are available for use, wherein the set includes fewer live connections than the number of participants of a virtual meeting. In response to a participant of the virtual meeting becoming active, the method dedicates one live connection from the set of live connections to real-time transcription of an individual audio stream from the participant. The method labels transcription results received back through the one live connection with a username of the participant. And, the method injects the labeled transcription results back into the virtual meeting for display in a user interface.
Inventors
- Prabhutva AGRAWAL
- Phanindra Vittal Rao Mankale
Assignees
- ORACLE INTERNATIONAL CORPORATION
Dates
- Publication Date
- 20260507
- Application Date
- 20241106
Claims (20)
- 1 . One or more non-transitory computer-readable media having stored thereon computer-executable instructions that, when executed by at least a processor of a computing system, cause the computing system to: preemptively establish a set of live connections to an automatic speech recognition service that are available for use, wherein the set of live connections includes fewer live connections than a total K of participants in a virtual meeting; in response to a participant of the virtual meeting becoming active, dedicate one live connection from the set of live connections to real-time transcription of an individual audio stream from the participant; in real-time, label transcription results received back through the one live connection with a username of the participant; and in real-time, inject the labeled transcription results back into the virtual meeting for display in a user interface of the virtual meeting.
- 2 . The one or more non-transitory computer-readable media of claim 1 , wherein the instructions, when executed by the processor, cause the computing system to: monitor a mute/unmute status of the participant to determine when the participant becomes active; in response to the mute/unmute status changing from mute to unmute, allocate the one live connection for sole use by the participant and connect the individual audio stream through the one live connection to an individual session of the automatic speech recognition service; and in response to the mute/unmute status changing from unmute to mute, disconnect the individual audio stream from the one live connection, and deallocate the one live connection back to the set of live connections that are available for use.
- 3 . The one or more non-transitory computer-readable media of claim 1 , wherein the instructions to dedicate one live connection from the set of live connections to real-time transcription of the individual audio stream from the participant, when executed by at least the processor, cause the computing system to: associate a session ID of the one live connection with a user ID of the participant; and send the individual audio stream of the participant to the automatic speech recognition service through the one live connection to cause the automatic speech recognition service to transcribe speech from the individual audio stream into the transcription results in real-time, wherein audio streams of other participants are not sent through the one live connection.
- 4 . The one or more non-transitory computer-readable media of claim 1 , wherein the instructions to preemptively establish the set of live connections to the automatic speech recognition service, when executed by at least the processor, cause the computing system to, prior to the participant becoming active, for at least the one live connection in the set of live connections: connect a client to an endpoint of the automatic speech recognition service; configure the client to capture the transcription results upon receipt from the automatic speech recognition service; transmit credentials for the client to the automatic speech recognition service; receive a session ID for the one live connection, wherein the session ID denotes an individual session of the automatic speech recognition service that is accessible through the one live connection; and add the session ID for the one live connection to a list of session IDs for the set of live connections.
- 5 . The one or more non-transitory computer-readable media of claim 1 , further comprising instructions that when executed by at least the processor cause the computing system to close live connections that are in excess of a baseline count C of live connections and which have been available for use longer than a threshold amount of time T.
- 6 . The one or more non-transitory computer-readable media of claim 1 , further comprising instructions that when executed by at least the processor cause the computing system to expand the set of live connections to the automatic speech recognition service by preemptively establishing additional live connections when the number of live connections that are available for use falls to a threshold number.
- 7 . The one or more non-transitory computer-readable media of claim 1 , wherein the live connections to the automatic speech recognition service are WebSocket connections.
- 8 . A computer-implemented method, comprising: preemptively establishing a set of live connections to an automatic speech recognition service that are available for use, wherein the set of live connections includes fewer live connections than a total K of participants in a virtual meeting; in response to a participant of the virtual meeting becoming active, dedicating one live connection from the set of live connections to real-time transcription of an individual audio stream from the participant; in real-time, labeling transcription results received back through the one live connection with a username of the participant; and in real-time, injecting the labeled transcription results back into the virtual meeting for display in a user interface of the virtual meeting.
- 9 . The computer-implemented method of claim 8 , further comprising associating a session ID of the one live connection with a user ID of the participant; and sending the individual audio stream of the participant to the automatic speech recognition service through the one live connection to cause the automatic speech recognition service to transcribe speech from the audio stream into the transcription results in real-time, wherein audio streams of other participants are not sent through the one live connection.
- 10 . The computer-implemented method of claim 8 , further comprising, in response to the participant of the virtual meeting becoming inactive, releasing the one live connection from dedication to the participant back into the set of live connections that are available for use.
- 11 . The computer-implemented method of claim 8 , further comprising closing live connections that are in excess of a baseline count C of live connections and which have been available for use longer than a threshold amount of time T.
- 12 . The computer-implemented method of claim 8 , further comprising expanding the set of live connections to the automatic speech recognition service by preemptively establishing additional live connections when the number of live connections that are available for use falls to a threshold number.
- 13 . The computer-implemented method of claim 8 , wherein the participant is considered active when the audio stream of the participant is unmuted, and wherein the participant is considered inactive when the audio stream of the participant is muted.
- 14 . The computer-implemented method of claim 8 , wherein the real-time transcription includes translation from a first human language to a second human language, wherein speech in the individual audio stream is in the first human language, and the transcription results are in the second human language.
- 15 . A computing system, comprising: at least one processor connected to at least one memory; one or more non-transitory computer-readable media having stored thereon computer-executable instructions that, when executed by at least a processor of the computing system, cause the computing system to: preemptively establish a set of WebSocket connections to an automatic speech recognition service that are available for use, wherein the set of WebSocket connections includes fewer WebSocket connections than a total K of participants in a virtual meeting; in response to a participant of the virtual meeting becoming active, dedicate one WebSocket connection from the set of WebSocket connections to real-time transcription of an individual audio stream from the participant; in real-time, label transcription results received back through the one WebSocket connection with a username of the participant; and in real-time, inject the labeled transcription results back into the virtual meeting for display in a user interface of the virtual meeting.
- 16 . The computing system of claim 15 , wherein the instructions to dedicate the one WebSocket connection from the set of WebSocket connections to the real-time transcription of the individual audio stream from the participant, when executed by at least the processor, cause the computing system to: associate a session ID of the one WebSocket connection with a user ID of the participant; and send the individual audio stream of the participant to the automatic speech recognition service through the one WebSocket connection to cause the automatic speech recognition service to transcribe speech from the audio stream into the transcription results in real-time, wherein audio streams of other participants are not sent through the one WebSocket connection.
- 17 . The computing system of claim 15 , wherein the instructions, when executed by at least the processor, cause the computing system to, in response to the participant of the virtual meeting becoming inactive, release the one WebSocket connection from dedication to the participant back into the set of WebSocket connections that are available for use.
- 18 . The computing system of claim 15 , wherein the instructions, when executed by at least the processor, cause the computing system to close WebSocket connections that are in excess of a baseline count C of WebSocket connections and which have been available for use longer than a threshold amount of time T.
- 19 . The computing system of claim 15 , wherein the instructions, when executed by at least the processor, cause the computing system to expand the set of WebSocket connections to the automatic speech recognition service by preemptively establishing additional WebSocket connections when the number of WebSocket connections that are available for use falls to a threshold number.
- 20 . The computing system of claim 15 , wherein the instructions, when executed by at least the processor, cause the computing system to join the virtual meeting as an additional participant to obtain the individual audio stream input by the participant.
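The claimed allocation scheme (claims 1-2 and 5-6) can be illustrated with a short sketch: a pool of preemptively established connections that is smaller than the participant count, with a connection dedicated on unmute, released on mute, trimmed above a baseline count C after an idle threshold T, and expanded when availability runs low. This is a hypothetical illustration only, not the patented implementation; the `LiveConnection` and `ConnectionPool` names, the session-ID format, and all parameters are invented for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in for a live (e.g., WebSocket) connection to the
# automatic speech recognition (ASR) service.
@dataclass
class LiveConnection:
    session_id: str
    idle_since: float = field(default_factory=time.monotonic)
    user_id: Optional[str] = None  # participant the connection is dedicated to


class ConnectionPool:
    """Preemptively established pool of ASR connections, kept smaller than
    the number K of meeting participants (cf. claim 1)."""

    def __init__(self, baseline_count: int, refill_threshold: int,
                 idle_timeout: float):
        self.baseline_count = baseline_count      # baseline count C (cf. claim 5)
        self.refill_threshold = refill_threshold  # expansion trigger (cf. claim 6)
        self.idle_timeout = idle_timeout          # idle threshold T (cf. claim 5)
        self._next = 0
        self.available: list = []
        self.dedicated: dict = {}  # user_id -> LiveConnection
        self._expand(baseline_count)  # establish connections before anyone speaks

    def _expand(self, n: int) -> None:
        # Preemptively establish n connections and record their session IDs.
        for _ in range(n):
            self._next += 1
            self.available.append(LiveConnection(session_id=f"sess-{self._next}"))

    def on_unmute(self, user_id: str) -> LiveConnection:
        # Dedicate one connection to the participant's individual audio stream.
        if len(self.available) <= self.refill_threshold:
            self._expand(self.baseline_count)  # stay ahead of demand
        conn = self.available.pop()
        conn.user_id = user_id  # session ID <-> user ID association (cf. claim 3)
        self.dedicated[user_id] = conn
        return conn

    def on_mute(self, user_id: str) -> None:
        # Release the connection back into the available set (cf. claim 2).
        conn = self.dedicated.pop(user_id)
        conn.user_id = None
        conn.idle_since = time.monotonic()
        self.available.append(conn)

    def trim(self) -> None:
        # Close connections above the baseline that have idled past T (cf. claim 5).
        now = time.monotonic()
        keep = self.available[:self.baseline_count]
        excess = self.available[self.baseline_count:]
        self.available = keep + [
            c for c in excess if now - c.idle_since < self.idle_timeout
        ]
```

Note that the pool never grows with the meeting roster: only participants who actually unmute consume a connection, which is the source of the resource savings the claims describe.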
Description
BACKGROUND

Virtual meeting and collaboration services allow a plurality of participants to communicate and collaborate remotely through video, audio, and chat, facilitating online meetings, presentations, and teamwork. Automated speech recognition services may be used to convert audio of speech into text. Live connections such as WebSocket connections are highly resource intensive, and take up substantial compute resources, such as allocated memory, to maintain.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate various systems, methods, and other embodiments of the disclosure. It will be appreciated that the illustrated element boundaries (e.g., boxes, groups of boxes, or other shapes) in the figures represent one embodiment of the boundaries. In some embodiments, one element may be implemented as multiple elements, or multiple elements may be implemented as one element. In some embodiments, an element shown as an internal component of another element may be implemented as an external component, and vice versa. Furthermore, elements may not be drawn to scale.

FIG. 1 illustrates one embodiment of a transcription management system that is associated with efficient, autonomous provisioning of preemptively established live connections for real-time transcription of speech in virtual meetings.

FIG. 2A illustrates one embodiment of a transcription management method that is associated with efficient, autonomous provisioning of preemptively established live connections for real-time transcription of speech in virtual meetings.

FIG. 2B illustrates one embodiment of a connection dedication step in which some sub-steps of dedication of the connection are indicated.

FIG. 3 illustrates a data flow diagram for an example real-time meeting transcription system that is associated with efficient, autonomous provisioning of preemptively established live connections for real-time transcription of speech in virtual meetings.

FIG. 4 illustrates one embodiment of a transcription socket manager that is associated with efficient, autonomous provisioning of preemptively established live connections for real-time transcription of speech in virtual meetings.

FIG. 5 illustrates one embodiment of a WebSocket handler that is associated with efficient, autonomous provisioning of preemptively established live connections for real-time transcription of speech in virtual meetings.

FIG. 6 illustrates one embodiment of a UserID handler that is associated with efficient, autonomous provisioning of preemptively established live connections for real-time transcription of speech in virtual meetings.

FIG. 7 illustrates an embodiment of a computing system configured with the example systems and/or methods disclosed.

DETAILED DESCRIPTION

Systems, methods, and other embodiments are described herein that provide for efficient allocation of preemptively established live connections for real-time transcriptions in virtual meetings. In one embodiment, a transcription management system actively allocates persistent live connections that have been preemptively established with an artificial intelligence (AI)-based transcription service (such as an automatic speech recognition (ASR) service) to those individual audio streams from a virtual meeting that are associated with participants that are active. For example, the transcription management system intelligently provisions a block of pre-established WebSocket connections to the AI transcription service on an as-needed basis to process audio streams of participants who are speaking.
In this way, the transcription management system dynamically interconnects an individual audio stream for an active participant to a session of an AI transcription service on an as-needed basis and maintains unambiguous associations between participant identity and transcript. Various embodiments of the transcription management system may provide one or more improvements to the technology of automated speech transcription. One improvement may be that the transcription management system enables the use of substantially fewer live connections than the number of participants in the virtual meeting, thereby substantially reducing the compute resources (e.g., memory and network bandwidth) consumed by the live connections to the AI transcription service. One improvement may be that the transcription management system enables independent (dedicated) transcription of speech from individual, active participants without creating and assigning a dedicated live connection for each participant. One improvement may be that the transcription management system ensures that the audio stream of one participant is transcribed without interference by the audio streams of other participants, thereby increasing transcription accuracy. One improvement may be that the transcription management system largely eliminates wait time for transcription (or captioning
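The unambiguous participant-to-transcript association described above can be sketched in a few lines: results arriving over a live connection carry only the ASR session ID, so a session-to-participant map restores the speaker's identity before the labeled text is injected into the meeting interface. This is an illustrative sketch only; the function names and data shapes are hypothetical, not taken from the disclosed system.

```python
# Map an incoming transcription result (identified only by its ASR session ID)
# back to the participant it belongs to, then label it with the username.
def label_result(session_to_user: dict, usernames: dict,
                 session_id: str, text: str) -> dict:
    user_id = session_to_user[session_id]  # session ID -> user ID association
    return {"username": usernames[user_id], "text": text}


def inject_caption(meeting_captions: list, labeled: dict) -> None:
    # Stand-in for posting the labeled result into the virtual meeting UI.
    meeting_captions.append(f'{labeled["username"]}: {labeled["text"]}')
```

Because each active participant's audio travels over its own dedicated connection, the session ID alone is sufficient to attribute every result, with no speaker diarization step needed.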