EP-4544788-B1 - AUDIO SIGNAL PROCESSING METHOD AND SYSTEM FOR CORRECTING A SPECTRAL SHAPE OF A VOICE SIGNAL MEASURED BY A SENSOR IN AN EAR CANAL OF A USER


Inventors

ROBBEN, STIJN
HUSSENBOCUS, ABDEL YUSSEF

Dates

Publication Date: 2026-05-06
Application Date: 2023-06-22

Claims (15)

An audio signal processing method (20) implemented by an audio system (10) which comprises at least an internal sensor (11), wherein the internal sensor (11) corresponds to an air conduction sensor located in an ear canal of a user of the audio system (10) and arranged to measure acoustic signals which propagate internally to a head of the user, wherein the audio signal processing method (20) comprises: producing an internal audio signal by the internal sensor, determining (210) an audio spectrum of the internal audio signal, determining (220) a spectral center of the audio spectrum, determining (230) a spectrum shape correction filter based on the spectral center, filtering (240) the internal audio signal by using the spectrum shape correction filter, thereby producing a filtered internal audio signal.
The audio signal processing method (20) according to claim 1, wherein the spectral center is a spectral centroid or a spectral median of the audio spectrum.
The audio signal processing method (20) according to any one of the preceding claims, wherein determining the spectrum shape correction filter comprises comparing the spectral center with one or more predetermined thresholds.
The audio signal processing method (20) according to claim 3, wherein, responsive to the spectral center being greater than at least one predetermined threshold, determining the spectrum shape correction filter comprises configuring said spectrum shape correction filter to modify the audio spectrum of the internal audio signal to reduce the spectral center of said audio spectrum.
The audio signal processing method (20) according to any one of claims 3 to 4, wherein one of the one or more predetermined thresholds is between 200 Hertz and 800 Hertz, or between 300 Hertz and 600 Hertz.
The audio signal processing method (20) according to any one of the preceding claims, further comprising one or more of: (a) evaluating a voice activity in the internal audio signal and, responsive to no voice activity being detected in the internal audio signal, not applying or not modifying the spectrum shape correction filter; (b) determining the spectrum shape correction filter comprises selecting, based on the spectral center, a spectrum shape correction filter among a plurality of predetermined different spectrum shape correction filters; (c) the internal audio signal comprises a plurality of successive audio frames, the spectrum shape correction filter determined by processing one or more previous audio frames of the internal audio signal is applied to a current audio frame before determining the spectral center for the current audio frame, and the audio signal processing method further comprises determining an inverse spectrum shape correction filter of the spectrum shape correction filter determined by processing the one or more previous audio frames and filtering the current audio frame by the inverse spectrum shape correction filter before determining the spectral center for the current audio frame; (d) filtering the internal audio signal is performed by applying the spectrum shape correction filter in the time domain or in the frequency domain; and/or (e) the audio system further comprises an external sensor arranged to measure acoustic signals which propagate externally to the user's head, said audio signal processing method further comprising: producing an external audio signal by the external sensor, and producing an output signal by combining the external audio signal with the filtered internal audio signal.
An audio system (10) comprising at least an internal sensor (11), wherein the internal sensor corresponds to an air conduction sensor to be located in an ear canal of a user of the audio system (10) and arranged to measure acoustic signals which propagate internally to a head of the user, wherein the internal sensor (11) is configured to produce an internal audio signal, wherein said audio system (10) further comprises a processing circuit (13) configured to: determine (210) an audio spectrum of the internal audio signal, determine (220) a spectral center of the audio spectrum, determine (230) a spectrum shape correction filter based on the spectral center, filter (240) the internal audio signal by using the spectrum shape correction filter, thereby producing a filtered internal audio signal.
The audio system (10) according to claim 7, wherein the spectral center is a spectral centroid or a spectral median of the audio spectrum.
The audio system (10) according to any one of claims 7 to 8, wherein the processing circuit (13) is configured to determine the spectrum shape correction filter by comparing the spectral center with one or more predetermined thresholds.
The audio system (10) according to claim 9, wherein, responsive to the spectral center being greater than at least one predetermined threshold, the processing circuit (13) is configured to determine the spectrum shape correction filter by configuring said spectrum shape correction filter to modify the audio spectrum of the internal audio signal to reduce the spectral center of said audio spectrum.
The audio system (10) according to any one of claims 9 to 10, wherein one of the one or more predetermined thresholds is between 200 Hertz and 800 Hertz, or between 300 Hertz and 600 Hertz.
The audio system (10) according to any one of claims 7 to 11, wherein the processing circuit (13) is further configured to: evaluate a voice activity in the internal audio signal and, responsive to no voice activity being detected in the internal audio signal, not apply or not modify the spectrum shape correction filter.
The audio system (10) according to any one of claims 7 to 12, wherein the processing circuit is configured to determine the spectrum shape correction filter by selecting, based on the spectral center, a spectrum shape correction filter among a plurality of predetermined different spectrum shape correction filters.
The audio system (10) according to any one of claims 7 to 13, wherein one or more of the following apply: (a) the internal audio signal comprises a plurality of successive audio frames, the spectrum shape correction filter determined by processing one or more previous audio frames of the internal audio signal is applied to a current audio frame before determining the spectral center for the current audio frame, and the processing circuit (13) is further configured to determine an inverse spectrum shape correction filter of the spectrum shape correction filter determined by processing the one or more previous audio frames and to filter the current audio frame by the inverse spectrum shape correction filter before determining the spectral center for the current audio frame; (b) filtering the internal audio signal is performed by applying the spectrum shape correction filter in the time domain or in the frequency domain; and/or (c) the audio system (10) further comprises an external sensor (12) arranged to measure acoustic signals which propagate externally to the user's head, wherein the external sensor (12) is configured to produce an external audio signal, wherein the processing circuit (13) is further configured to produce an output signal by combining the external audio signal with the filtered internal audio signal.
A non-transitory computer readable medium comprising computer readable code to be executed by an audio system (10) comprising at least an internal sensor (11), wherein the internal sensor (11) corresponds to an air conduction sensor to be located in an ear canal of a user of the audio system (10) and arranged to measure acoustic signals which propagate internally to a head of the user, wherein said audio system (10) further comprises a processing circuit (13), wherein said computer readable code causes said audio system (10) to: produce an internal audio signal by the internal sensor, determine (210) an audio spectrum of the internal audio signal, determine (220) a spectral center of the audio spectrum, determine (230) a spectrum shape correction filter based on the spectral center, filter (240) the internal audio signal by using the spectrum shape correction filter, thereby producing a filtered internal audio signal.
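The processing steps recited in the claims (210 to 240), together with the threshold comparison, filter selection, voice-activity gating and inverse-filter variants of the dependent claims, can be sketched in Python as follows. This is a minimal, non-authoritative illustration under stated assumptions: the sample rate, the frame length, the 450 Hz threshold (chosen inside the claimed 300 to 600 Hz range), the two-filter bank and its gains, and all function and class names (`spectral_centroid`, `spectral_median`, `make_filter_bank`, `select_filter`, `ShapeCorrector`) are hypothetical and not taken from the patent. Frames are processed with a rectangular window and no overlap for brevity, and the voice-activity decision is supplied by the caller rather than computed.

```python
import numpy as np

FS = 16000                # assumed sample rate in Hz (hypothetical)
FRAME = 512               # assumed frame length in samples (hypothetical)
THRESHOLDS_HZ = (450.0,)  # assumed threshold inside the claimed 300-600 Hz range


def spectral_centroid(freqs: np.ndarray, mag: np.ndarray) -> float:
    """Step 220, option 1: magnitude-weighted mean frequency."""
    total = mag.sum()
    return float((freqs * mag).sum() / total) if total > 0 else 0.0


def spectral_median(freqs: np.ndarray, mag: np.ndarray) -> float:
    """Step 220, option 2: lowest frequency where cumulative magnitude reaches half the total."""
    cumulative = np.cumsum(mag)
    if cumulative[-1] <= 0:
        return 0.0
    return float(freqs[np.searchsorted(cumulative, 0.5 * cumulative[-1])])


def make_filter_bank(freqs: np.ndarray) -> list[np.ndarray]:
    """Hypothetical predetermined correction filters, expressed as per-bin gains."""
    flat = np.ones_like(freqs)
    # Boost below 600 Hz and cut 600-1500 Hz to pull the spectral center down
    # (illustrative gains, not values from the patent).
    tilt = np.where(freqs < 600.0, 2.0, np.where(freqs < 1500.0, 0.5, 1.0))
    return [flat, tilt]


def select_filter(center_hz: float, bank: list[np.ndarray], thresholds=THRESHOLDS_HZ) -> np.ndarray:
    """Step 230: the filter index is the number of thresholds the center exceeds."""
    index = sum(center_hz > t for t in thresholds)
    return bank[min(index, len(bank) - 1)]


class ShapeCorrector:
    """Frame-by-frame sketch of steps 210-240 with the inverse-filter variant."""

    def __init__(self, fs=FS, frame_len=FRAME, center_fn=spectral_centroid):
        self.freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
        self.bank = make_filter_bank(self.freqs)
        self.center_fn = center_fn
        self.gains = self.bank[0]  # filter determined from previous frames

    def process(self, frame: np.ndarray, voice_active: bool = True) -> np.ndarray:
        spectrum = np.fft.rfft(frame)      # step 210 on a rectangular frame
        filtered = spectrum * self.gains   # filter from previous frames applied to this frame
        unfiltered = filtered / self.gains # inverse filter undoes it before analysis
        center = self.center_fn(self.freqs, np.abs(unfiltered))  # step 220
        if voice_active:                   # no voice: keep the current filter unchanged
            self.gains = select_filter(center, self.bank)  # step 230, used from the next frame
        return np.fft.irfft(filtered, n=len(frame))        # step 240 output frame
```

For example, feeding two frames of a 1 kHz tone to a fresh `ShapeCorrector` leaves the first frame unchanged (the flat filter is still active) and halves the second, because the first frame's centroid exceeded the threshold and selected the tilt filter for subsequent frames. Dividing by the previous gains before measuring the spectral center keeps the decision based on the sensor signal itself rather than on the correction already applied.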

Description

BACKGROUND OF THE INVENTION

Field of the Invention

The present disclosure relates to audio signal processing and, more specifically, to a method and computing system for correcting the spectral shape of a voice signal measured by an audio sensor located inside an ear canal of a user of an audio system. The present disclosure finds an advantageous, although in no way limiting, application in wearable devices such as earbuds, earphones or smart glasses used to pick up the user's voice during a voice call established using any voice communication system.

EP-A1-3742756 describes a method and a device for detecting a wearing state of an earphone, as well as an earphone. The method includes: acquiring a source audio signal input into a loudspeaker of the earphone and a feedback audio signal collected by a prepositive microphone; acquiring a transfer function between the source audio signal and the feedback audio signal from these two signals; and acquiring a wearing state of the earphone according to the transfer function, audio compensation processing then being performed on the source audio signal according to the wearing state.

EP-A1-3089475 describes a method and an apparatus for earphone sound effect compensation, as well as an earphone. The method comprises: obtaining monitored signal data in a current wearing state of the earphone user from a signal collected by a monitoring microphone and an audio signal played by a loudspeaker of the earphone; computing error data of the monitored signal data in the current wearing state relative to standard signal data in a standard wearing state of the earphone; and performing sound effect compensation on the earphone according to the error data.
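The transfer-function step of EP-A1-3742756 and the error computation of EP-A1-3089475 are summarized above only at a high level. To illustrate what such a computation can look like, the sketch below uses the standard H1 estimator (averaged cross-spectrum divided by averaged source auto-spectrum) and expresses the deviation from a reference response as a per-bin gain in decibels. Both the estimator and the dB error measure are assumptions chosen for illustration, not the methods of the cited documents, and the function names `estimate_transfer_function` and `compensation_db` are hypothetical.

```python
import numpy as np


def estimate_transfer_function(source, feedback, frame_len=256):
    """H1 estimate of the path from `source` (loudspeaker input) to `feedback` (microphone).

    Averages Hann-windowed cross- and auto-spectra over non-overlapping frames.
    """
    source = np.asarray(source, dtype=float)
    feedback = np.asarray(feedback, dtype=float)
    n_frames = min(len(source), len(feedback)) // frame_len
    window = np.hanning(frame_len)
    s_xy = np.zeros(frame_len // 2 + 1, dtype=complex)
    s_xx = np.zeros(frame_len // 2 + 1)
    for k in range(n_frames):
        seg = slice(k * frame_len, (k + 1) * frame_len)
        x = np.fft.rfft(window * source[seg])
        y = np.fft.rfft(window * feedback[seg])
        s_xy += np.conj(x) * y       # cross-spectrum accumulation
        s_xx += np.abs(x) ** 2       # source auto-spectrum accumulation
    return s_xy / np.maximum(s_xx, 1e-12)


def compensation_db(h_current, h_reference):
    """Hypothetical error measure: per-bin gain (dB) mapping the current response onto the reference."""
    ratio = np.abs(h_reference) / np.maximum(np.abs(h_current), 1e-12)
    return 20.0 * np.log10(ratio)
```

With white noise as the source and a feedback path that simply halves it, the estimate is 0.5 in every bin and the compensation relative to a unity reference is about +6 dB. How a wearing state would then be derived from such a response is not specified in the summaries above and is left out here.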
Description of the Related Art

To improve the pick-up of a user's voice in noisy environments, wearable devices such as earbuds or earphones are typically equipped with different types of audio sensors, such as microphones and/or accelerometers. These audio sensors are usually positioned such that at least one audio sensor, referred to as external sensor, picks up mainly air-conducted voice, and at least one other audio sensor, referred to as internal sensor, picks up mainly bone-conducted voice. Compared to an external sensor, an internal sensor picks up the user's voice with less ambient noise but with a limited spectral bandwidth (mainly low frequencies), such that the bone-conducted voice provided by the internal sensor can be used to enhance the air-conducted voice provided by the external sensor, and vice versa. External sensors are usually air conduction sensors (e.g. microphones), while internal sensors can be either air conduction sensors or bone conduction sensors (e.g. accelerometers).

Voice signals measured by a bone conduction sensor are usually unaffected by the fit of an earbud, where a tight fit corresponds to substantially no gap between the earbud and the user's ear and a loose fit corresponds to the presence of a gap between the earbud and the user's ear. As long as the earbud is in contact with the skin inside the ear canal, the voice signal is captured consistently with minimal ambient noise leakage. Voice signals captured by an internal air conduction sensor, on the other hand, are affected by the fit of the earbud. In particular, a loose fit usually reduces the low-frequency components (below about 600 Hertz) because the occlusion effect is weaker. A loose fit may also boost the mid-frequency components (roughly 600 Hertz to 1500 Hertz) because of stronger resonance in the ear canal and increased ambient noise leakage.
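The fit-dependent changes described above can be related to the spectral center used in the claims with a toy calculation. The sketch below builds a hypothetical voiced spectrum (harmonics of a 125 Hz fundamental whose amplitudes fall as 1/k², a steep roll-off chosen only to mimic the low-frequency-dominated output of an internal sensor), applies an illustrative loose-fit coloration following the frequency ranges in the text (a gain of 0.3 below 600 Hz and of 2 between 600 Hz and 1500 Hz), and compares the spectral centroids. The amplitudes, gains and the 450 Hz reference threshold are invented for illustration and are not measured data.

```python
import numpy as np


def centroid(freqs: np.ndarray, mags: np.ndarray) -> float:
    """Magnitude-weighted mean frequency."""
    return float(np.sum(freqs * mags) / np.sum(mags))


# Hypothetical voiced spectrum: 32 harmonics of 125 Hz (up to 4 kHz) with
# amplitudes falling as 1/k^2 (illustrative only, not measured data).
k = np.arange(1, 33)
harmonics = 125.0 * k
tight = 1.0 / k**2

# Illustrative loose-fit coloration following the text: weaker content below
# ~600 Hz (weaker occlusion), stronger content between ~600 and 1500 Hz.
gain = np.where(harmonics < 600.0, 0.3, np.where(harmonics < 1500.0, 2.0, 1.0))
loose = tight * gain

tight_centroid = centroid(harmonics, tight)
loose_centroid = centroid(harmonics, loose)
print(f"tight fit centroid: {tight_centroid:.0f} Hz")
print(f"loose fit centroid: {loose_centroid:.0f} Hz")
```

With these assumed values the tight-fit centroid falls below an illustrative 450 Hz threshold and the loose-fit centroid rises above it, which is the kind of shift a threshold on the spectral center, as recited in the claims, could detect.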
The use of an active noise cancellation (ANC) unit, especially a feedback ANC unit, may also affect voice signals captured by an internal air conduction sensor. More specifically, an ANC unit reduces the low-frequency components of voice signals captured by an internal air conduction sensor, thereby reducing the occlusion effect. In some existing solutions, audio signals from an internal sensor and an external sensor are mixed together to mitigate noise, using the audio signal provided by the internal sensor mainly for low frequencies and the audio signal provided by the external sensor for higher frequencies. However, when the earbud fits loosely or when an ANC unit is active, the reduction of the low-frequency components and/or the boost of the mid-frequency components of the audio signal provided by the internal sensor results in a voice that sounds inconsistent in the output signal. Audio signals from internal sensors may also be used for purposes other than mixing with audio signals from, e.g., external sensors. For instance, audio signals from internal sensors may be used for voice activity detection (VAD), speech level estimation, speech recognition, etc., which are also affected by loose fitting
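The band-split mixing of internal and external sensor signals described above can be sketched as a frequency-domain crossover: bins below a crossover frequency are taken from the internal sensor and the remaining bins from the external sensor. The 800 Hz crossover, the 16 kHz sample rate and the function name `mix_internal_external` are assumptions; a practical system would more likely use overlapping windowed frames or smoothly complementary crossover filters rather than this hard, single-frame split.

```python
import numpy as np


def mix_internal_external(internal, external, fs=16000, crossover_hz=800.0):
    """Hypothetical hard band split: internal sensor below crossover_hz, external above."""
    internal = np.asarray(internal, dtype=float)
    external = np.asarray(external, dtype=float)
    if internal.shape != external.shape:
        raise ValueError("internal and external frames must have the same length")
    n = len(internal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    use_internal = freqs < crossover_hz  # low band comes from the internal sensor
    mixed = np.where(use_internal, np.fft.rfft(internal), np.fft.rfft(external))
    return np.fft.irfft(mixed, n=n)
```

Because the low band of the output is copied directly from the internal sensor, any low-frequency loss caused by a loose fit or an active ANC unit passes straight into the mixed signal, which is the inconsistency described above.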