EP-4740208-A1 - ENVIRONMENTAL NOISE COMPENSATION IN TELECONFERENCING

Abstract

A method of compensating for environmental noise during a teleconference may involve: estimating, by a control system, a current speech spectrum corresponding to speech of remote teleconference participants; estimating, by the control system, a current noise spectrum corresponding to environmental noise in a local environment in which a local teleconference participant is located; calculating, by the control system, a current speech intelligibility index (SII) based, at least in part, on the current speech spectrum and the current noise spectrum; determining, by the control system and based at least in part on the current SII, whether to make an adjustment of a local audio system used by the local teleconference participant, wherein the determining involves evaluating the current SII according to one or more target SII parameters; and updating at least one of the one or more target SII parameters responsive to user input corresponding to a playback volume change.
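
As an illustration of the calculating and evaluating steps only, the following Python sketch computes a simplified SII from per-band speech and noise levels and checks it against a target range. It assumes band levels in dB and uses the band-audibility mapping familiar from ANSI S3.5 (signal-to-noise ratio clipped to the range of -15 dB to +15 dB and scaled to [0, 1]); masking, hearing-threshold and level-distortion terms are omitted. The function names, the band layout and the importance weights are hypothetical and are not taken from the application.

```python
from typing import Sequence


def simplified_sii(speech_db: Sequence[float],
                   noise_db: Sequence[float],
                   band_importance: Sequence[float]) -> float:
    """Return an importance-weighted band audibility score in [0, 1].

    Simplification (assumption): masking and level-distortion terms of the
    full SII procedure are ignored.
    """
    if not (len(speech_db) == len(noise_db) == len(band_importance)):
        raise ValueError("all inputs must hold one value per frequency band")
    total = sum(band_importance)
    if total <= 0:
        raise ValueError("band importance weights must sum to a positive value")

    sii = 0.0
    for s, n, w in zip(speech_db, noise_db, band_importance):
        # Map the band SNR linearly from [-15, +15] dB onto [0, 1].
        audibility = min(1.0, max(0.0, (s - n + 15.0) / 30.0))
        # Normalise the weights so that they sum to one.
        sii += (w / total) * audibility
    return sii


def within_target_range(sii: float, low_target: float, high_target: float) -> bool:
    """Evaluate the current SII against a (hypothetical) target SII range."""
    return low_target <= sii <= high_target
```

For example, with four equally weighted bands, speech at 60 dB in every band and noise at 45, 60, 75 and 90 dB, the band audibilities are 1, 0.5, 0 and 0, so the sketch returns 0.375.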

Inventors

  • WANG, NING
  • THASARATHAN, SHANUSH PREMA
  • CHO, BYUNG HOON

Assignees

  • Dolby Laboratories Licensing Corporation

Dates

Publication Date: 2026-05-13
Application Date: 2024-07-01

Claims (18)

  1. A method of compensating for environmental noise during a teleconference, the method comprising: estimating, by a control system, a current speech spectrum corresponding to speech of remote teleconference participants; estimating, by the control system, a current noise spectrum corresponding to environmental noise in a local environment in which a local teleconference participant is located; calculating, by the control system, a current speech intelligibility index (SII) based, at least in part, on the current speech spectrum and the current noise spectrum; determining, by the control system and based at least in part on the current SII, whether to make an adjustment of a local audio system used by the local teleconference participant, wherein the determining involves evaluating the current SII according to one or more target SII parameters; and updating, by the control system, at least one of the one or more target SII parameters responsive to user input corresponding to a playback volume change.
  2. The method of claim 1, wherein the determining involves determining whether the current SII is within a target SII range.
  3. The method of claim 1 or claim 2, further comprising, responsive to determining that the adjustment should be made, adjusting, by the control system, at least a portion of the local audio system.
  4. The method of any one of claims 1-3, further comprising determining a confidence value corresponding to the current noise spectrum, the confidence value indicating a likelihood of a current input audio frame corresponding mainly to ambient noise.
  5. The method of claim 4, wherein the confidence value is a broadband confidence value and wherein determining whether to make the adjustment of the local audio system is based, at least in part, on the broadband confidence value.
  6. The method of claim 4 or claim 5, further comprising determining whether to update one or more noise statistics based, at least in part, on the confidence value.
  7. The method of claim 4, further comprising determining a band-based confidence value for each frequency band of a plurality of frequency bands and wherein determining whether to make the adjustment of the local audio system involves determining whether to update one or more noise statistics for each frequency band based, at least in part, on the band-based confidence value.
  8. The method of any one of claims 1-7, wherein estimating the current noise spectrum involves estimating an echo coupling gain corresponding to local loudspeaker playback captured by a local microphone system.
  9. The method of claim 8, wherein estimating the echo coupling gain involves: determining a maximum band loudspeaker reference power for each frequency band of a current audio frame and previous N-1 audio frames, where N is an integer corresponding to a number of audio frames in a delay line; tracking a minimum power for each frequency band of an input microphone signal; and estimating an echo coupling gain for each frequency band of the input microphone signal based, at least in part, on the maximum band loudspeaker reference power and the minimum power.
  10. The method of claim 9, further comprising determining whether to update an estimated echo coupling gain based, at least in part, on whether a current estimated echo coupling gain value has changed more than a threshold amount from a most recent estimated echo coupling gain value, on whether an estimated echo coupling gain has been updated within a threshold time interval, or on combinations thereof.
  11. The method of any one of claims 1-10, further comprising, responsive to determining that the adjustment should be made, adjusting, by the control system, at least a portion of the local audio system to maintain the current SII within a target SII range.
  12. The method of claim 11, wherein maintaining the current SII within a target SII range involves increasing a loudspeaker playback volume of one or more audio frames until the current SII is greater than a first target SII and less than a high target SII.
  13. The method of claim 12, wherein the first target SII is greater than a median target SII and less than the high target SII.
  14. The method of claim 12 or claim 13, wherein maintaining the current SII within a target SII range involves decreasing a loudspeaker playback volume of one or more audio frames until the current SII is less than a second target SII and greater than a low target SII.
  15. The method of claim 14, wherein the second target SII is less than a median target SII and greater than the low target SII.
  16. An apparatus configured to perform the method of any one of claims 1-15.
  17. A system configured to perform the method of any one of claims 1-15.
  18. One or more non-transitory media having instructions stored thereon for controlling one or more devices to perform the method of any one of claims 1-15.
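
The ordering of target values recited in claims 12-15 (low < second < median < first < high) suggests a hysteresis band. The Python sketch below is one possible reading, not the claimed implementation. It assumes that an adjustment is triggered when the current SII falls below the low target or rises above the high target, that volume moves in fixed dB steps small enough not to overshoot the opposite bound, and that a user volume change re-centers the targets on the SII observed after that change. The names `SiiTargets`, `adjust_volume`, `recenter_targets` and the `sii_at` callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SiiTargets:
    low: float     # low target SII
    second: float  # second target SII, between low and median
    median: float  # median target SII
    first: float   # first target SII, between median and high
    high: float    # high target SII

    def validate(self) -> None:
        if not (self.low < self.second < self.median < self.first < self.high):
            raise ValueError("expected low < second < median < first < high")


def adjust_volume(sii_at: Callable[[float], float], volume_db: float,
                  targets: SiiTargets, step_db: float = 1.0,
                  max_steps: int = 40) -> float:
    """Return a playback volume that brings the SII back inside the range.

    sii_at maps a candidate playback volume (dB) to the SII it would yield.
    """
    targets.validate()
    current = sii_at(volume_db)
    if current < targets.low:
        # Assumed trigger: SII too low. Raise volume until SII > first target.
        for _ in range(max_steps):
            if sii_at(volume_db) > targets.first:
                break
            volume_db += step_db
    elif current > targets.high:
        # Assumed trigger: SII too high. Lower volume until SII < second target.
        for _ in range(max_steps):
            if sii_at(volume_db) < targets.second:
                break
            volume_db -= step_db
    return volume_db


def recenter_targets(targets: SiiTargets, sii_after_user_change: float) -> SiiTargets:
    """Hypothetical update rule: shift all targets toward the user's choice."""
    shift = sii_after_user_change - targets.median
    # Limit the shift so that every target stays inside [0, 1] and in order.
    shift = max(-targets.low, min(1.0 - targets.high, shift))
    return SiiTargets(targets.low + shift, targets.second + shift,
                      targets.median + shift, targets.first + shift,
                      targets.high + shift)
```

Stopping above the first target (or below the second target) rather than at the edge of the range leaves a margin, so that small fluctuations in the noise estimate do not immediately trigger another adjustment in the opposite direction.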

Description

ENVIRONMENTAL NOISE COMPENSATION IN TELECONFERENCING

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] The present application claims priority to U.S. Provisional Patent Application No. 63/512,424, filed on July 7, 2023, and U.S. Provisional Patent Application No. 63/635,570, filed on April 17, 2024, all of which are incorporated by reference in their entirety.

TECHNICAL FIELD

[0002] This disclosure pertains to devices, systems and methods for environmental noise compensation (ENC), particularly ENC in the teleconferencing context. As used herein, the term “teleconferencing” encompasses both audio/video teleconferencing and audio teleconferencing.

BACKGROUND

[0003] Teleconferencing has become an important part of modern life. The ability to communicate clearly while teleconferencing is based mainly on speech intelligibility, which in turn is based in part on the presence or absence of noise in the audio signal(s). Although existing devices, systems and methods for speech intelligibility estimation, noise estimation and ENC provide benefits, improved systems and methods would be desirable.

SUMMARY

[0004] At least some aspects of the present disclosure may be implemented via methods. Some such methods involve compensating for environmental noise during a teleconference. For example, some methods may involve estimating, by a control system, a current speech spectrum corresponding to speech of remote teleconference participants and estimating, by the control system, a current noise spectrum corresponding to environmental noise in a local environment in which a local teleconference participant is located. Some methods may involve calculating, by the control system, a current speech intelligibility index (SII) based, at least in part, on the current speech spectrum and the current noise spectrum.
Some methods may involve determining, by the control system and based at least in part on the current SII, whether to make an adjustment of a local audio system used by the local teleconference participant. Some methods may involve updating, by the control system, at least one of the one or more target SII parameters responsive to user input corresponding to a playback volume change.

[0005] The determining may involve evaluating the current SII according to one or more target SII parameters. In some examples, the determining may involve determining whether the current SII is within a target SII range. Some methods may involve adjusting, by the control system, at least a portion of the local audio system responsive to determining that the adjustment should be made.

[0006] Some methods may involve determining a confidence value corresponding to the current noise spectrum. The confidence value may, for example, indicate a likelihood of a current input audio frame corresponding mainly to ambient noise. According to some examples, the confidence value may be a broadband confidence value. Determining whether to make the adjustment of the local audio system may be based, at least in part, on the broadband confidence value. Some methods may involve determining a band-based confidence value for each frequency band of a plurality of frequency bands. In some examples, determining whether to make the adjustment of the local audio system may involve determining whether to update one or more noise statistics for each frequency band based, at least in part, on the band-based confidence value. Some methods may involve determining whether to update one or more noise statistics based, at least in part, on the confidence value.

[0007] In some examples, estimating the current noise spectrum may involve estimating an echo coupling gain corresponding to local loudspeaker playback captured by a local microphone system.
In some such examples, estimating the echo coupling gain may involve determining a maximum band loudspeaker reference power for each frequency band of a current audio frame and previous N-1 audio frames, where N is an integer corresponding to a number of audio frames in a delay line. In some such examples, estimating the echo coupling gain may involve tracking a minimum power for each frequency band of an input microphone signal and estimating an echo coupling gain for each frequency band of the input microphone signal based, at least in part, on the maximum band loudspeaker reference power and the minimum power. Some methods may involve determining whether to update an estimated echo coupling gain based, at least in part, on whether a current estimated echo coupling gain value has changed more than a threshold amount from a most recent estimated echo coupling gain value, on whether an estimated echo coupling gain has been updated within a threshold time interval, or both.

[0008] Some methods may involve, responsive to determining that the adjustment should be made, adjusting at least a portion of the local audio system to maintain the current SII within a target SII range. Some methods may involve, responsive to determining th