US-12626714-B2 - Recovery of voice audio quality using a deep learning model
Abstract
Certain aspects provide methods and apparatus for recovering audio quality of voice when processing signals associated with a wearable audio output device. A method that may be performed includes receiving, by an in-ear microphone acoustically coupled to an environment inside an ear canal of a user, an audio signal having a first frequency band, predicting high-frequency band information for the audio signal using a model trained using training data of known high-frequency bands associated with low-frequency bands, generating an output signal having a second frequency band based, at least in part, on the first frequency band of the audio signal and the predicted high-frequency band information for the audio signal, and outputting, by the wearable audio output device, the output signal having the second frequency band.
Inventors
- CHUAN-CHE HUANG
- Somasundaram Meiyappan
- Nathan Blagrove
- Elio Dante Querze, III
- Shuo Zhang
- Isaac Keir Julien
- Francois Laberge
- Alaganandan Ganeshkumar
Assignees
- BOSE CORPORATION
Dates
- Publication Date
- 20260512
- Application Date
- 20220422
- Priority Date
- 20210429
Claims (20)
- 1 . A method for recovering audio quality of voice when processing signals associated with a wearable audio output device, comprising: receiving, by an in-ear microphone acoustically coupled to an environment inside an ear canal of a user, an audio signal having a first frequency band; receiving an external signal from an environment outside the ear canal of the user; determining the environment outside the ear of the user comprises a noisy environment by comparing a signal energy of the audio signal to a signal energy of the external signal; predicting high-frequency band information for the audio signal using a model trained using training data of known high-frequency bands associated with low-frequency bands; generating an output signal having a second frequency band based, at least in part, on the first frequency band of the audio signal and the predicted high-frequency band information for the audio signal; and outputting, by the wearable audio output device, the output signal having the second frequency band.
- 2 . The method of claim 1 , wherein the second frequency band of the output signal comprises a dynamic range greater than a dynamic range of the first frequency band.
- 3 . The method of claim 1 , wherein predicting high-frequency band information for the audio signal using the model trained using training data of known high-frequency bands associated with low-frequency bands comprises: extracting low-frequency band information of the first frequency band; and selecting the high-frequency band information based at least in part on a mapping between the low-frequency band information and the high-frequency band information in the trained model.
- 4 . The method of claim 1 , wherein the external signal is received by an external microphone acoustically coupled to the environment outside of the ear canal of the user.
- 5 . The method of claim 4 , further comprising: processing the audio signal using active noise reduction (ANR) to produce a noise reduced signal, wherein the noise reduced signal is generated in response to the external signal and has a third frequency band; predicting high-frequency band information for the noise reduced signal using the trained model; and wherein the output signal is based, at least in part, on the third frequency band of the noise reduced signal and the predicted high-frequency band information for the noise reduced signal.
- 6 . The method of claim 5 , wherein processing the audio signal using ANR to produce a noise reduced signal comprises: calculating a set of noise cancellation parameters in response to the external signal; and utilizing the set of noise cancellation parameters to process the audio signal.
- 7 . The method of claim 1 , further comprising: receiving feedback associated with a voice of the user of the wearable audio output device; and wherein the trained model is further trained based on the feedback.
- 8 . The method of claim 1 , wherein the trained model comprises a trained deep neural network.
- 9 . A wearable audio output device, comprising: at least one in-ear microphone acoustically coupled to an environment inside an ear canal of a user, the at least one in-ear microphone configured to receive an audio signal having a first frequency band; at least one processor and a memory coupled to the at least one in-ear microphone, the memory including instructions executable by the at least one processor to cause the wearable audio output device to: receive an external signal from an environment outside the ear canal of the user; determine the environment outside the ear of the user comprises a noisy environment by comparing a signal energy of the audio signal to a signal energy of the external signal; predict high-frequency band information for the audio signal using a model trained using training data of known high-frequency bands associated with low-frequency bands; and generate an output signal having a second frequency band based, at least in part, on the first frequency band of the audio signal and the predicted high-frequency band information for the audio signal; and at least one speaker coupled to the at least one in-ear microphone, the at least one speaker configured to: output the output signal having the second frequency band.
- 10 . The wearable audio output device of claim 9 , wherein the second frequency band of the output signal comprises a dynamic range greater than a dynamic range of the first frequency band.
- 11 . The wearable audio output device of claim 9 , wherein in order to predict high-frequency band information for the audio signal using the model trained using training data of known high-frequency bands associated with low-frequency bands, the memory further includes instructions executable by the at least one processor to cause the wearable audio output device to: extract low-frequency band information of the first frequency band; and select the high-frequency band information based at least in part on a mapping between the low-frequency band information and the high-frequency band information in the trained model.
- 12 . The wearable audio output device of claim 9 , further comprising: at least one external microphone acoustically coupled to the environment outside the ear canal of the user, wherein the at least one external microphone is configured to receive the external signal.
- 13 . The wearable audio output device of claim 12 , wherein the memory further includes instructions executable by the at least one processor to: process the audio signal using active noise reduction (ANR) to produce a noise reduced signal, wherein the noise reduced signal is generated in response to the external signal and has a third frequency band; predict high-frequency band information for the noise reduced signal using the trained model; and wherein the output signal is based, at least in part, on the third frequency band of the noise reduced signal and the predicted high-frequency band information for the noise reduced signal.
- 14 . The wearable audio output device of claim 13 , wherein in order to process the audio signal using ANR to produce a noise reduced signal, the memory further includes instructions executable by the at least one processor to cause the wearable audio output device to: calculate a set of noise cancellation parameters in response to the external signal; and utilize the set of noise cancellation parameters to process the audio signal.
- 15 . The wearable audio output device of claim 9 , wherein the memory further includes instructions executable by the at least one processor to: receive feedback associated with a voice of the user of the wearable audio output device; and wherein the trained model is further trained based on the feedback.
- 16 . The wearable audio output device of claim 9 , wherein the trained model comprises a trained deep neural network.
- 17 . A computer-readable device storing instructions which when executed by at least one processor performs a method for recovering audio quality of voice when processing signals associated with a wearable audio output device, the method comprising: receiving, by an in-ear microphone acoustically coupled to an environment inside an ear canal of a user, an audio signal having a first frequency band; receiving an external signal from an environment outside the ear canal of the user; determining the environment outside the ear of the user comprises a noisy environment by comparing a signal energy of the audio signal to a signal energy of the external signal; predicting high-frequency band information for the audio signal using a model trained using training data of known high-frequency bands associated with low-frequency bands; generating an output signal having a second frequency band based, at least in part, on the first frequency band of the audio signal and the predicted high-frequency band information for the audio signal; and outputting, by the wearable audio output device, the output signal having the second frequency band.
- 18 . The computer-readable device of claim 17 , wherein the second frequency band of the output signal comprises a dynamic range greater than a dynamic range of the first frequency band.
- 19 . The computer-readable device of claim 17 , wherein predicting high-frequency band information for the audio signal using the model trained using training data of known high-frequency bands associated with low-frequency bands comprises: extracting low-frequency band information of the first frequency band; and selecting the high-frequency band information based at least in part on a mapping between the low-frequency band information and the high-frequency band information in the trained model.
- 20 . The computer-readable device of claim 17 , wherein the external signal is received by an external microphone acoustically coupled to the environment outside of the ear canal of the user.
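The independent claims recite determining that the outside environment is noisy by comparing a signal energy of the in-ear audio signal to a signal energy of the external signal. The following is a minimal illustrative sketch of one way such a comparison might look, not the claimed implementation. The function names, the mean-square energy measure, the decibel ratio, and the 6 dB threshold are all assumptions introduced here; the claims do not specify them.

```python
import math


def signal_energy(samples):
    """Mean squared amplitude of one frame (assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def is_noisy_environment(in_ear, external, threshold_db=6.0):
    """Return True when the external frame carries much more energy
    than the in-ear frame.

    threshold_db is a hypothetical tuning value, not taken from the patent.
    """
    eps = 1e-12  # avoids log(0) and division by zero on silent frames
    ratio = (signal_energy(external) + eps) / (signal_energy(in_ear) + eps)
    ratio_db = 10.0 * math.log10(ratio)
    return ratio_db > threshold_db


# Toy frames: constant-amplitude placeholders rather than real audio.
in_ear_frame = [0.1] * 100
loud_outside = [0.5] * 100   # about 14 dB above the in-ear frame
quiet_outside = [0.05] * 100  # about 6 dB below the in-ear frame

noisy = is_noisy_environment(in_ear_frame, loud_outside)
quiet = is_noisy_environment(in_ear_frame, quiet_outside)
```

In this sketch a large positive ratio suggests the external microphone is dominated by ambient noise, which is the situation in which relying on the in-ear signal would be expected to help.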
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a national stage application under 35 U.S.C. 371 of PCT/US2022/026003, filed Apr. 22, 2022, which claims priority to and benefit of Indian Patent Application No. 202121019693, filed Apr. 29, 2021, the contents of which are herein incorporated by reference in their entirety as if fully set forth below.
FIELD
Aspects of the present disclosure generally relate to enhancing audio quality of voice when using an in-ear microphone. As described in more detail herein, high-frequency audio quality of voice may be recovered using a model trained to recognize patterns between high-frequency and low-frequency bands.
BACKGROUND
Wearable audio output devices, such as headphones or earbuds, may include any number of microphones. One or more microphones of the wearable audio output device may be contained in a structure proximal to a mouth of a user of the wearable audio output device to pick up speech produced by the user. However, voice signal quality may be degraded by outside interference where one or more microphones are exposed to an external environment. Advancements in wearable audio output devices incorporate in-ear microphones to mitigate such issues. An in-ear microphone may be placed inside an ear canal of the user, where it captures an in-ear voice signal. With a good seal of the ear canal, the in-ear voice signal may be relatively isolated from ambient external noise. As such, the in-ear microphone may be effective for communicating in environments where external microphones become unusable. Unfortunately, voice pickup by an in-ear microphone has its own limitations.
In-ear microphones significantly degrade the dynamic range (e.g., bandwidth) of a user's voice, and while it is possible to communicate with a narrow range, the user's voice may be muffled and have relatively low intelligibility, thereby making speech of the user less natural. Therefore, there is a need for improvements in the voice quality pickup when using in-ear microphones.
SUMMARY
All examples and features mentioned herein can be combined in any technically possible manner.
Aspects provide methods and apparatus for recovering audio quality of voice when processing signals associated with a wearable audio output device. According to aspects, the wearable audio output device may include an in-ear microphone acoustically coupled to an environment inside an ear canal of a user, and in some cases, additionally, an external microphone acoustically coupled to an environment outside the ear canal of the user.
Certain aspects provide a method performed by a wearable audio output device. The method includes receiving, by an in-ear microphone acoustically coupled to an environment inside an ear canal of a user, an audio signal having a first frequency band, predicting high-frequency band information for the audio signal using a model trained using training data of known high-frequency bands associated with low-frequency bands, generating an output signal having a second frequency band based, at least in part, on the first frequency band of the audio signal and the predicted high-frequency band information for the audio signal, and outputting, by the wearable audio output device, the output signal having the second frequency band.
In certain aspects, the second frequency band of the output signal comprises a dynamic range greater than a dynamic range of the first frequency band.
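The predict-and-generate steps of the method summarized above can be illustrated with a toy sketch. It assumes, purely for illustration, that the trained model is replaced by a small lookup table pairing known low-band feature vectors with known high-band vectors; the disclosure itself contemplates a trained model such as a deep neural network. The names `CODEBOOK`, `predict_high_band`, and `generate_output`, and all numeric values, are hypothetical.

```python
# Stand-in for training data of known high-frequency bands associated
# with low-frequency bands: (low_band_features, high_band_features) pairs.
# The values are invented placeholders, not measured spectra.
CODEBOOK = [
    ([1.0, 0.5], [0.2, 0.1]),
    ([0.2, 0.9], [0.6, 0.3]),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_high_band(low_band, codebook):
    """Select the high-band vector whose paired low-band vector is
    closest to the observed one (nearest-neighbour stand-in for the
    trained model's low-to-high mapping)."""
    _, best_high = min(
        codebook, key=lambda pair: squared_distance(pair[0], low_band)
    )
    return best_high


def generate_output(low_band, codebook):
    """Build a wider-band representation: the received low band
    followed by the predicted high band."""
    return list(low_band) + list(predict_high_band(low_band, codebook))


observed_low_band = [0.9, 0.6]  # hypothetical in-ear band magnitudes
output_bands = generate_output(observed_low_band, CODEBOOK)
```

The output has more bands than the input, which mirrors the idea of a second frequency band that is wider than the first. A real system would operate on spectral frames of actual audio and would learn the mapping rather than look it up.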
In certain aspects, predicting high-frequency band information for the audio signal using the model trained using training data of known high-frequency bands associated with low-frequency bands comprises extracting low-frequency band information of the first frequency band and selecting the high-frequency band information based at least in part on a mapping between the low-frequency band information and the high-frequency band information in the trained model.
In certain aspects, the method further comprises receiving, by an external microphone acoustically coupled to an environment outside the ear canal of the user, an external signal and determining the environment comprises a noisy environment by comparing a signal energy of the audio signal to a signal energy of the external signal.
In certain aspects, the method further comprises processing the audio signal using active noise reduction (ANR) to produce a noise reduced signal, wherein the noise reduced signal is generated in response to the external signal and has a third frequency band; predicting high-frequency band information for the noise reduced signal using the trained model; and wherein the output signal is based, at least in part, on the third frequency band of the noise reduced signal and the predicted high-frequency band information for the noise reduced sig