US-12621358-B2 - Methods and systems for verbal polling during a conference call discussion

Abstract

One or more content streams are obtained from a client device associated with a participant during a conference call. The one or more content streams indicate verbal phrases provided by the participant during the conference call. At least a portion of the one or more content streams are fed as input to a machine learning model. A polling question for polling other participants of the conference call and one or more answer options corresponding to the polling question are determined based on one or more outputs of the machine learning model. The polling question and the one or more answer options are provided for polling the other participants via one or more client devices associated with the other participants.
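The flow in the abstract can be illustrated with a minimal sketch. It assumes the content stream has already been converted to text, and it substitutes a simple keyword heuristic for the trained machine learning model, which the disclosure does not specify at this level of detail. The names `Poll`, `stand_in_model`, and `provide_poll` are hypothetical and do not come from the patent.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Poll:
    question: str
    options: list[str]


def stand_in_model(transcript: str) -> Optional[Poll]:
    """Hypothetical stand-in for the trained model: a text heuristic, not ML."""
    # Split the transcript into sentences at terminal punctuation.
    for sentence in re.split(r"(?<=[.?!])\s+", transcript.strip()):
        if not sentence.endswith("?"):
            continue
        # Treat "... A or B?" as a two-option poll; otherwise assume yes/no.
        match = re.search(r"(\w+) or (\w+)\?$", sentence)
        options = [match.group(1), match.group(2)] if match else ["Yes", "No"]
        return Poll(question=sentence, options=options)
    return None  # no polling question detected in the verbal phrases


def provide_poll(poll: Poll, other_participants: list[str]) -> dict[str, Poll]:
    """Deliver the poll to each other participant's client (modeled as a dict)."""
    return {participant: poll for participant in other_participants}


poll = stand_in_model("I think we are close. Should we ship on Monday or Friday?")
if poll is not None:
    delivered = provide_poll(poll, ["bob", "carol"])
```

In a real system the heuristic would be replaced by the model's outputs; the sketch only shows how a detected question and its answer options might move from the speaker's stream to the other participants.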

Inventors

Emily Burd
Akshat Sharma

Assignees

GOOGLE LLC

Dates

Publication Date: 2026-05-05
Application Date: 2023-10-30

Claims (20)

1. A method comprising: obtaining one or more content streams from a client device associated with a participant of a plurality of participants during a conference call, wherein the one or more content streams indicate verbal phrases provided by the participant during an ongoing discussion between the plurality of participants during the conference call; feeding the one or more content streams as input to a machine learning model trained to determine a semantic context associated with verbal phrases included in given content streams and predict, based on the determined context, whether the verbal phrases comprise polling questions for polling conference call participants; determining, based on one or more outputs of the machine learning model, a polling question for polling other participants of the plurality of participants and one or more answer options corresponding to the polling question; and providing the polling question and the one or more answer options for polling the other participants via one or more client devices associated with the other participants.
2. The method of claim 1, wherein obtaining the one or more content streams from the client device associated with the participant comprises: receiving, via a user interface (UI) of the client device, an indication that the participant is to provide the verbal phrases; and responsive to receiving the indication, initiating a recording of the verbal phrases provided by the participant.
3. The method of claim 2, wherein the UI of the client device comprises a polling UI element that is associated with initiating verbal polling of the plurality of participants, and wherein the indication that the participant is to provide the verbal phrases is received via the UI upon detection that the participant has interacted with the polling UI element.
4. The method of claim 1, wherein providing the polling question and the one or more answer options for polling the other participants via the one or more client devices comprises: updating a user interface (UI) of the one or more client devices to include a textual form of the polling question and the one or more answer options.
5. The method of claim 1, further comprising: extracting audio data from the one or more content streams obtained from the client device, wherein the audio data comprises one or more audio signals representing the verbal phrases provided by the participant, and wherein the extracted audio data is fed as the input to the machine learning model.
6. The method of claim 1, further comprising: converting an audio file comprising audio data of the one or more content streams into one or more text strings representing a textual version of the verbal phrases provided by the participant, wherein the one or more text strings are fed as the input to the machine learning model.
7. The method of claim 1, further comprising: obtaining one or more additional content streams from at least one of the one or more client devices associated with the other participants, wherein the one or more additional content streams indicate verbal phrases provided by an additional participant in response to the polling question; and updating at least one of the polling question or the one or more answer options provided for polling the other participants based on the obtained one or more additional content streams.
8. The method of claim 1, wherein the machine learning model was trained on a corpus of historical data associated with one or more users of a platform, the historical data comprising at least one of: audio data associated with one or more historical verbal phrases provided by the one or more users or historical polling data indicating one or more historical polling questions used for historical polling of other users of the platform.
9. A system comprising: a memory device; and a processing device coupled to the memory device, the processing device to perform operations comprising: obtaining one or more content streams from a client device associated with a participant of a plurality of participants during a conference call, wherein the one or more content streams indicate verbal phrases provided by the participant during an ongoing discussion between the plurality of participants during the conference call; feeding the one or more content streams as input to a machine learning model trained to determine a semantic context associated with verbal phrases included in given content streams and predict, based on the determined context, whether the verbal phrases comprise polling questions for polling conference call participants; determining, based on one or more outputs of the machine learning model, a polling question for polling other participants of the plurality of participants and one or more answer options corresponding to the polling question; and providing the polling question and the one or more answer options for polling the other participants via one or more client devices associated with the other participants.
10. The system of claim 9, wherein obtaining the one or more content streams from the client device associated with the participant comprises: receiving, via a user interface (UI) of the client device, an indication that the participant is to provide the verbal phrases; and responsive to receiving the indication, initiating a recording of the verbal phrases provided by the participant.
11. The system of claim 10, wherein the UI of the client device comprises a polling UI element that is associated with initiating verbal polling of the plurality of participants, and wherein the indication that the participant is to provide the verbal phrases is received via the UI upon detection that the participant has interacted with the polling UI element.
12. The system of claim 9, wherein providing the polling question and the one or more answer options for polling the other participants via the one or more client devices comprises: updating a user interface (UI) of the one or more client devices to include a textual form of the polling question and the one or more answer options.
13. The system of claim 9, wherein the operations further comprise: extracting audio data from the one or more content streams obtained from the client device, wherein the audio data comprises one or more audio signals representing the verbal phrases provided by the participant, and wherein the extracted audio data is fed as the input to the machine learning model.
14. The system of claim 9, wherein the operations further comprise: converting an audio file comprising audio data of the one or more content streams into one or more text strings representing a textual version of the verbal phrases provided by the participant, wherein the one or more text strings are fed as the input to the machine learning model.
15. The system of claim 9, wherein the operations further comprise: obtaining one or more additional content streams from at least one of the one or more client devices associated with the other participants, wherein the one or more additional content streams indicate verbal phrases provided by an additional participant in response to the polling question; and updating at least one of the polling question or the one or more answer options provided for polling the other participants based on the obtained one or more additional content streams.
16. A non-transitory computer readable storage medium comprising instructions for a server that, when executed by a processing device, cause the processing device to perform operations comprising: obtaining one or more content streams from a client device associated with a participant of a plurality of participants during a conference call, wherein the one or more content streams indicate verbal phrases provided by the participant during an ongoing discussion between the plurality of participants during the conference call; feeding the one or more content streams as input to a machine learning model trained to determine a semantic context associated with verbal phrases included in given content streams and predict, based on the determined context, whether the verbal phrases comprise polling questions for polling conference call participants; determining, based on one or more outputs of the machine learning model, a polling question for polling other participants of the plurality of participants and one or more answer options corresponding to the polling question; and providing the polling question and the one or more answer options for polling the other participants via one or more client devices associated with the other participants.
17. The non-transitory computer readable storage medium of claim 16, wherein obtaining the one or more content streams from the client device associated with the participant comprises: receiving, via a user interface (UI) of the client device, an indication that the participant is to provide the verbal phrases; and responsive to receiving the indication, initiating a recording of the verbal phrases provided by the participant.
18. The non-transitory computer readable storage medium of claim 17, wherein the UI of the client device comprises a polling UI element that is associated with initiating verbal polling of the plurality of participants, and wherein the indication that the participant is to provide the verbal phrases is received via the UI upon detection that the participant has interacted with the polling UI element.
19. The non-transitory computer readable storage medium of claim 16, wherein providing the polling question and the one or more answer options for polling the other participants via the one or more client devices comprises: updating a user interface (UI) of the one or more client devices to include a textual form of the polling question and the one or more answer options.
20. The non-transitory computer readable storage medium of claim 16, wherein the operations further comprise: extracting audio data from the one or more content streams obtained from the client device, wherein the audio data comprises one or more audio signals representing the verbal phrases provided by the participant, and wherein the extracted audio data is fed as the input to the machine learning model.
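The UI-triggered recording recited in the dependent claims (an indication received through a polling UI element, followed by initiation of a recording) might be modeled as below. This is a sketch under stated assumptions: the class `PollRecorder`, its handler names, and the byte-chunk audio representation are all hypothetical, and the stop handler is an illustrative addition rather than a claimed step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PollRecorder:
    """Hypothetical recorder driven by a polling UI element."""

    recording: bool = False
    chunks: list[bytes] = field(default_factory=list)

    def on_polling_element_pressed(self) -> None:
        # Indication that the participant is about to speak: start recording.
        self.chunks.clear()
        self.recording = True

    def on_audio_chunk(self, chunk: bytes) -> None:
        # Audio arriving outside an active recording is ignored.
        if self.recording:
            self.chunks.append(chunk)

    def on_finished(self) -> bytes:
        # Assumed stop signal: end the recording and return the captured audio.
        self.recording = False
        return b"".join(self.chunks)


recorder = PollRecorder()
recorder.on_audio_chunk(b"ignored")    # dropped: recording not yet started
recorder.on_polling_element_pressed()
recorder.on_audio_chunk(b"ab")
recorder.on_audio_chunk(b"cd")
audio = recorder.on_finished()
```

Gating capture on the UI indication keeps ordinary discussion audio out of the recording that is later fed to the model.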

Description

RELATED APPLICATIONS

This continuation application claims priority to U.S. patent application Ser. No. 17/538,593 filed on Nov. 30, 2021 and entitled “METHODS AND SYSTEMS FOR VERBAL POLLING DURING A CONFERENCE CALL DISCUSSION,” which claims the benefit of U.S. Provisional Patent Application No. 63/236,570 filed on Aug. 24, 2021 and entitled “METHODS AND SYSTEMS FOR VERBAL POLLING DURING A CONFERENCE CALL DISCUSSION,” which is incorporated by reference herein.

TECHNICAL FIELD

Aspects and implementations of the present disclosure relate to methods and systems for verbal polling during a conference call discussion.

BACKGROUND

Video or audio-based conference call discussions can take place between multiple participants via a conference platform. A conference platform includes tools that allow multiple client devices to be connected over a network and share each other's audio data (e.g., voice of a user recorded via a microphone of a client device) and/or video data (e.g., a video captured by a camera of a client device, or video captured from a screen image of the client device) for efficient communication. A conference platform can also include tools to allow a participant of a conference call to pose a question to other participants (e.g., via a conference platform user interface (UI)) during the conference call discussion to solicit responses (referred to as polling). The conference platform can collect responses provided by the other participants and generate polling results.

SUMMARY

The below summary is a simplified summary of the disclosure in order to provide a basic understanding of some aspects of the disclosure. This summary is not an extensive overview of the disclosure. It is intended neither to identify key or critical elements of the disclosure, nor delineate any scope of the particular implementations of the disclosure or any scope of the claims. Its sole purpose is to present some concepts of the disclosure in a simplified form as a prelude to the more detailed description that is presented later.

In some implementations, a method and system are disclosed for verbal polling during a conference call discussion. In an implementation, a user interface (UI) is provided to participants of a video conference call. The UI enables one of the participants to verbally provide a question for polling of one or more additional participants of the participants. An indication that a first participant is to provide a verbal question is received via the UI. The verbal question provided by the first participant is recorded. An indication that the first participant has finished providing the verbal question is received via the UI. A determination is made that the verbal question is to be used for polling of second participants of the video conference call. A textual form of the verbal question is provided to the one or more second participants of the video conference call in the UI.

BRIEF DESCRIPTION OF THE DRAWINGS

Aspects and implementations of the present disclosure will be understood more fully from the detailed description given below and from the accompanying drawings of various aspects and implementations of the disclosure, which, however, should not be taken to limit the disclosure to the specific aspects or implementations, but are for explanation and understanding only.

FIG. 1 illustrates an example system architecture, in accordance with implementations of the present disclosure.

FIG. 2 is a block diagram illustrating a conference platform and a polling engine for the conference platform, in accordance with implementations of the present disclosure.

FIGS. 3A-3H illustrate audio-based polling via a user interface (UI) provided by a conference platform, in accordance with implementations of the present disclosure.

FIG. 4 depicts a flow diagram of a method for verbal polling via a UI provided by a conference platform, in accordance with implementations of the present disclosure.

FIG. 5 is a block diagram illustrating an exemplary computer system, in accordance with implementations of the present disclosure.

DETAILED DESCRIPTION

Aspects of the present disclosure relate to verbal polling during a conference call discussion. A conference platform can enable video or audio-based conference call discussions between multiple participants via respective client devices that are connected over a network and share each other's audio data (e.g., voice of a user captured via a microphone of a client device) and/or video data (e.g., a video captured by a camera of a client device) during a conference call. In some instances, a conference platform can enable a significant number of client devices (e.g., up to one hundred or more client devices) to be connected via the conference call. A participant of a conference call may want to pose a question to the other participants of the conference call to solicit responses from the other participants (referred to herein as polling). The participant can provide a polling questio