US-12621409-B2 - Multi-party optimization for audiovisual enhancement

Abstract

A computer-implemented method for selectively applying audiovisual enhancement functions to an audiovisual communications stream includes transmitting, by a sender computing system, an audiovisual communication stream to a receiver computing system, obtaining, by the sender computing system, one or more receiver perception feedback signals associated with the audiovisual communication stream, the one or more receiver perception feedback signals obtained as output from one or more receiver perception feedback models at the receiver computing system and descriptive of perception of the audiovisual communication stream by a user operating the receiver computing system, and applying, by the sender computing system, one or more audiovisual enhancement functions to the audiovisual communication stream based at least in part on the one or more receiver perception feedback signals.
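
As an illustration only, the sender-side loop summarized above (obtain receiver perception feedback, then adjust the stream) can be sketched as follows. The sketch assumes that "fidelity" is approximated by a bitrate and that feedback arrives as a small message. The names `PerceptionFeedback`, `StreamSettings`, and `apply_enhancements`, along with every threshold and bitrate value, are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class PerceptionFeedback:
    """Hypothetical feedback message produced by receiver-side models."""
    ambient_noise_db: float   # output of an ambient noise level model
    asr_confidence: float     # speech recognition confidence, 0.0 to 1.0
    gaze_on_display: bool     # output of a gaze detection model


@dataclass
class StreamSettings:
    """Hypothetical fidelity knobs, approximated here as bitrates."""
    audio_kbps: int = 32
    video_kbps: int = 800


# Assumed limits and thresholds, chosen only for the example.
MAX_AUDIO_KBPS = 128
MIN_VIDEO_KBPS = 100
NOISE_RISE_DB = 6.0
ASR_DROP = 0.2


def apply_enhancements(settings: StreamSettings,
                       prev: PerceptionFeedback,
                       curr: PerceptionFeedback) -> StreamSettings:
    """Return new settings based on how the feedback changed."""
    audio, video = settings.audio_kbps, settings.video_kbps

    # Ambient noise at the receiver increased: raise audio fidelity.
    if curr.ambient_noise_db > prev.ambient_noise_db + NOISE_RISE_DB:
        audio = min(audio * 2, MAX_AUDIO_KBPS)

    # Speech recognition confidence dropped: raise audio fidelity.
    if curr.asr_confidence < prev.asr_confidence - ASR_DROP:
        audio = min(audio * 2, MAX_AUDIO_KBPS)

    # Gaze is off the display: lower video fidelity, raise audio fidelity.
    if not curr.gaze_on_display:
        video = max(video // 2, MIN_VIDEO_KBPS)
        audio = min(audio * 2, MAX_AUDIO_KBPS)

    return StreamSettings(audio_kbps=audio, video_kbps=video)
```

In this sketch the rules are independent and cumulative, so a noisy environment combined with an averted gaze raises the audio bitrate twice, up to the assumed cap. A real system could just as well pick a single dominant signal.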

Inventors

Matthew Sharifi

Assignees

GOOGLE LLC

Dates

Publication Date: 2026-05-05
Application Date: 2021-03-03

Claims (20)

1 . A computer-implemented method for selectively applying audiovisual enhancement functions to an audiovisual communications stream, the method comprising: transmitting, by a sender computing system, an audiovisual communication stream to a receiver computing system; obtaining, by the sender computing system, one or more receiver perception feedback signals associated with the audiovisual communication stream, the one or more receiver perception feedback signals obtained as output from one or more receiver perception feedback models at the receiver computing system and descriptive of perception of the audiovisual communication stream by a user operating the receiver computing system, wherein the one or more receiver perception feedback models comprise a gaze detection model; and applying, by the sender computing system, one or more audiovisual enhancement functions to the audiovisual communication stream based at least in part on the one or more receiver perception feedback signals, wherein the one or more audiovisual enhancement functions comprise an audio fidelity adjustment function, and wherein the method comprises: responsive to the one or more receiver perception feedback signals indicating that the user's gaze is on a particular spatial region of a visual component of the transmitted audiovisual communication stream, applying, by the sender computing system, an audio fidelity adjustment to a part of an audio component of the audiovisual communication stream corresponding to the particular spatial region.
2 . The computer-implemented method of claim 1 , wherein the one or more receiver perception feedback models comprise an ambient noise level recognition model, and wherein the method comprises: responsive to the one or more receiver perception feedback signals indicating that the ambient noise level of an environment of the receiver computing system has increased, applying, by the sender computing system, the one or more audiovisual enhancement functions to increase a fidelity of an audio component of the transmitted audiovisual communication stream.
3 . The computer-implemented method of claim 1 , wherein the one or more audiovisual enhancement functions comprise a focus filter configured to provide an improved fidelity at a focus region, and wherein the method comprises: applying, by the sender computing system, the focus filter to the audiovisual stream to provide an improved fidelity at a spatial region of a visual component of the transmitted audiovisual communication stream that is indicated by the receiver perception feedback signals to be a focus of the gaze of the user operating the receiver computing system.
4 . The computer-implemented method of claim 1 , wherein the one or more receiver perception feedback models comprise a user reaction recognition model and the one or more audiovisual enhancement functions comprise a video fidelity adjustment and/or an audio fidelity adjustment, and wherein the method comprises: responsive to the one or more receiver perception feedback signals indicating that the user may be having difficulty comprehending the transmitted audiovisual communication stream, applying, by the sender computer system, the one or more audiovisual enhancement functions to increase a fidelity of an audio component of the transmitted audiovisual communication stream and/or increase a fidelity of at least part of a visual component of the transmitted audiovisual communication stream.
5 . The computer-implemented method of claim 4 , wherein the method comprises: responsive to the one or more receiver perception feedback signals indicating that the user may be having difficulty comprehending the visual component of the transmitted audiovisual communication stream, applying by the sender computer system the one or more audiovisual enhancement functions to increase a fidelity of a region of a visual component of the transmitted audiovisual communication stream that is indicated to be a focus of the gaze of the user operating the receiver computing system.
6 . The computer-implemented method of claim 1 , wherein the method comprises: responsive to the one or more receiver perception feedback signals indicating that the user's gaze is on the particular spatial region of the visual component of the transmitted audiovisual communication stream, applying by the sender computer system the one or more audiovisual enhancement functions to increase a fidelity of a part of the audio component of the transmitted audiovisual communication stream that derives from an entity that is depicted in the particular spatial region of the visual component of the transmitted audiovisual communication stream.
7 . The computer-implemented method of claim 1 , wherein the one or more receiver perception feedback models comprise an automatic speech recognition model and the one or more audiovisual enhancement functions comprise an audio fidelity adjustment, and wherein the method comprises: responsive to the one or more receiver perception feedback signals indicating a decrease in a confidence associated with recognition, using the automatic speech recognition model, of speech that is present in an audio component of the audiovisual communication stream received at the receiver computing system, applying, by the sender computer system, the one or more audiovisual enhancement functions to increase a fidelity of the audio component of the transmitted audiovisual communication stream.
8 . The computer-implemented method of claim 1 , wherein the one or more audiovisual enhancement functions comprise a video fidelity adjustment and/or an audio fidelity adjustment, and wherein the method comprises: responsive to the one or more receiver perception feedback signals indicating that the gaze of the user operating the receiver computing system is focused outside a display region on which the visual component of the audiovisual communication stream is presented, applying, by the sender computer system, the one or more audiovisual enhancement functions to decrease a fidelity of the visual component of the transmitted audiovisual communication stream and/or to increase a fidelity of an audio component of the transmitted audiovisual communication stream.
9 . The computer-implemented method of claim 1 , wherein the one or more receiver perception feedback signals are continuously received from the receiver computing system to provide real-time feedback.
10 . The computer-implemented method of claim 1 , wherein the one or more audiovisual enhancement functions comprise one or more of: a choice of compression scheme type; a video fidelity adjustment; an audio fidelity adjustment; a focus filter configured to provide an improved fidelity at a focus region; and an ambient noise filter.
11 . The computer-implemented method of claim 1 , wherein the one or more receiver perception feedback models comprise one or more of: an ambient noise level recognition model; a user reaction recognition model; and an automatic speech recognition model.
12 . A computing system configured for selectively applying audiovisual enhancement functions to an audiovisual communications stream, the computing system comprising: a sender computing system comprising one or more processors and one or more memory devices storing computer readable instructions that, when implemented, cause the one or more processors to perform operations, the operations comprising: transmitting an audiovisual communication stream to a receiver computing system; receiving, from the receiver computing system, one or more receiver perception feedback signals; applying one or more audiovisual enhancement functions to the audiovisual communication stream based at least in part on the one or more receiver perception feedback signals, wherein the one or more receiver perception feedback signals are generated by a gaze detection model and the one or more audiovisual enhancement functions comprise a video fidelity adjustment and/or an audio fidelity adjustment; and responsive to the one or more receiver perception feedback signals indicating that the gaze of a user operating the receiver computing system is focused outside a display region on which a visual component of the audiovisual communication stream is presented, applying, by the sender computer system, the one or more audiovisual enhancement functions to decrease a fidelity of the visual component of the audiovisual communication stream and/or to increase a fidelity of an audio component of the audiovisual communication stream.
13 . The computing system of claim 12 , wherein the one or more receiver perception feedback signals are continuously received from the receiver computing system to provide real-time feedback.
14 . The computing system of claim 12 , wherein the sender computing system comprises a sender encoder model, the sender encoder model configured to encode the audiovisual stream prior to transmission to the receiver computing system.
15 . The computing system of claim 12 , wherein the one or more audiovisual enhancement functions comprise one or more of: a choice of compression scheme type; a focus filter configured to provide an improved fidelity at a focus region; and an ambient noise filter.
16 . The computing system of claim 12 , wherein the one or more receiver perception feedback signals are generated by one or more of: an ambient noise level recognition model; a user reaction recognition model; and an automatic speech recognition model.
17 . A computing system configured for selectively applying audiovisual enhancement functions to an audiovisual communications stream, the computing system comprising: a receiver computing system comprising one or more processors and one or more memory devices storing computer readable instructions that, when implemented, cause the one or more processors to perform operations, the operations comprising receiving, from a sender computing system, an audiovisual communication stream; obtaining one or more receiver perception feedback signals associated with the audiovisual communication stream, the one or more receiver perception feedback signals obtained as output from one or more receiver perception feedback models comprising a gaze detection model; and transmitting the one or more receiver perception feedback signals to the sender computing system, the one or more receiver perception feedback signals indicating that a gaze of a user is on a particular spatial region of a visual component of the audiovisual communication stream; and receiving, from the sender computing system, a modified audiovisual communication stream, the modified audiovisual communication stream including an audio fidelity adjustment to a part of an audio component of the audiovisual communication stream corresponding to the particular spatial region.
18 . The computing system of claim 17 , wherein the receiver computing system comprises a receiver decoder model, the receiver decoder model configured to decode the audiovisual stream from the sender computing system.
19 . The computing system of claim 17 , wherein the one or more audiovisual enhancement functions comprise one or more of: a choice of compression scheme type; a video fidelity adjustment; an audio fidelity adjustment; a focus filter configured to provide an improved fidelity at a focus region; and an ambient noise filter.
20 . The computing system of claim 17 , wherein the one or more receiver perception feedback models comprise one or any combination of: a gaze detection model; an ambient noise level recognition model; a user reaction recognition model; and an automatic speech recognition model.
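
Claims 1, 6, and 17 tie an audio fidelity adjustment to the spatial region of the video that the receiving user is looking at. The following is a minimal sketch of one way such a mapping could work. It assumes normalized screen coordinates, rectangular regions that each correspond to one audio source, and a per-source gain as a stand-in for "fidelity". The names `Region`, `source_under_gaze`, and `audio_gains` are hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical rectangle in normalized [0, 1) frame coordinates."""
    source_id: str  # audio source depicted in this region (assumed 1:1)
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px < self.x + self.w
                and self.y <= py < self.y + self.h)


def source_under_gaze(regions: List[Region],
                      gaze: Optional[Tuple[float, float]]) -> Optional[str]:
    """Return the audio source whose region contains the gaze point."""
    if gaze is None:
        return None
    px, py = gaze
    for region in regions:
        if region.contains(px, py):
            return region.source_id
    return None


def audio_gains(regions: List[Region],
                gaze: Optional[Tuple[float, float]],
                boost: float = 2.0) -> Dict[str, float]:
    """Boost the gazed-at source and leave all other sources at unity gain."""
    focused = source_under_gaze(regions, gaze)
    return {
        r.source_id: (boost if r.source_id == focused else 1.0)
        for r in regions
    }


# Example: two participants shown side by side; the gaze falls on the right half.
layout = [
    Region("alice", 0.0, 0.0, 0.5, 1.0),
    Region("bob", 0.5, 0.0, 0.5, 1.0),
]
gains = audio_gains(layout, gaze=(0.75, 0.5))
```

When the gaze point is missing or falls outside every region, no source is boosted. That case could instead trigger the off-display behavior recited in claims 8 and 12.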

Description

PRIORITY CLAIM

The present application is based upon and claims the right of priority under 35 U.S.C. § 371 to International Application No. PCT/US2021/020632 filed on Mar. 3, 2021, which is incorporated by reference herein.

FIELD

The present disclosure relates generally to multi-party optimization for audiovisual enhancement. More particularly, the present disclosure relates to systems and methods for selectively applying audiovisual enhancement functions to an audiovisual communications stream based at least in part on one or more receiver feedback signals from one or more receiver feedback models.

BACKGROUND

Audiovisual communication refers to a manner of communication by transmission of audio and/or video content from a sender to a receiver. For instance, audiovisual data can be captured or otherwise obtained at a sender and transmitted over the internet to a receiver. Digital audiovisual communication (e.g., video conferencing) has become increasingly common for business and/or social use. Multiple parties can participate in audiovisual communication, with each party being capable of transmitting and/or receiving audiovisual data to and/or from other parties.

SUMMARY

Aspects and advantages of embodiments of the present disclosure will be set forth in part in the following description, or can be learned from the description, or can be learned through practice of the embodiments.

One example aspect of the present disclosure is directed to a computer-implemented method for selectively applying audiovisual enhancement functions to an audiovisual communications stream. The computer-implemented method includes transmitting, by a sender computing system, an audiovisual communication stream to a receiver computing system. The computer-implemented method includes obtaining, by the sender computing system, one or more receiver perception feedback signals associated with the audiovisual communication stream, the one or more receiver perception feedback signals obtained as output from one or more receiver perception feedback models at the receiver computing system and descriptive of perception of the audiovisual communication stream by a user operating the receiver computing system. The computer-implemented method includes applying, by the sender computing system, one or more audiovisual enhancement functions to the audiovisual communication stream based at least in part on the one or more receiver perception feedback signals.

Another example aspect of the present disclosure is directed to a computing system configured for selectively applying audiovisual enhancement functions to an audiovisual communications stream. The computing system includes a sender computing system including one or more processors and one or more memory devices storing computer readable instructions that, when implemented, cause the one or more processors to perform operations. The operations include transmitting an audiovisual communication stream to a receiver computing system. The operations include receiving, from the receiver computing system, one or more receiver perception feedback signals. The operations include applying one or more audiovisual enhancement functions to the audiovisual communication stream based at least in part on the one or more receiver perception feedback signals.

Another example aspect of the present disclosure is directed to a computing system configured for selectively applying audiovisual enhancement functions to an audiovisual communications stream. The computing system includes a receiver computing system including one or more processors and one or more memory devices storing computer readable instructions that, when implemented, cause the one or more processors to perform operations. The operations include receiving, from a sender computing system, an audiovisual communication stream. The operations include obtaining one or more receiver perception feedback signals associated with the audiovisual communication stream, the one or more receiver perception feedback signals obtained as output from one or more receiver perception feedback models. The operations include transmitting the one or more receiver perception feedback signals to the sender computing system.

Another example aspect of the present disclosure is directed to a computer-implemented method for selectively applying audiovisual enhancement functions in a multiparty communication. The computer-implemented method includes receiving, by a receiver computing system, at least one audiovisual communication stream including a plurality of audiovisual communication channels from a plurality of sender computing systems. The computer-implemented method includes obtaining, by the receiver computing system, a plurality of receiver perception feedback signals respectively associated with each of the plurality of audiovisual communication channels, the one or more receiver perception feedback signals obtained as output from one or more receiver perception feed