EP-4738885-A1 - AUDIO ENHANCEMENT METHOD AND APPARATUS, AND EARPHONE

Abstract

The present application relates to an audio enhancement method and apparatus, and an earphone. The audio enhancement method includes: acquiring (S202) an air conduction audio signal and a bone conduction audio signal; acquiring (S204) a first audio feature of the air conduction audio signal and a second audio feature of the bone conduction audio signal; performing (S206) feature extraction on the bone conduction audio signal through a trained feature extraction model to obtain a voiceprint feature of the bone conduction audio signal; inputting (S208) the voiceprint feature of the bone conduction audio signal, the first audio feature, and the second audio feature into a trained amplitude prediction model to obtain a predicted amplitude; and enhancing (S210) audio based on the predicted amplitude to obtain a target audio signal. The dual model mechanism of the present application integrates the advantages of bone conduction and air conduction, whereby the obtained target audio signal does not lose the frequency band and noise reduction is achieved, thereby improving audio quality.
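Read as a data flow, steps S204 to S210 can be wired together roughly as follows. This is only a sketch under stated assumptions: the log-magnitude features, the function names `enhance`, `toy_feature_extractor`, and `toy_amplitude_predictor`, and both stand-in models are hypothetical placeholders, not the trained models the application describes. The phase handling in the last step follows claim 3.

```python
import numpy as np


def enhance(air_spec, bone_spec, feature_extractor, amplitude_predictor):
    """Wire steps S204-S210 together on precomputed complex spectrograms.

    air_spec, bone_spec: complex arrays of shape (frames, bins) (S202 output).
    feature_extractor, amplitude_predictor: callables standing in for the
    trained models; their interfaces are assumptions made for this sketch.
    """
    # S204: first and second audio features (assumed here: log-magnitude).
    first_feature = np.log1p(np.abs(air_spec))
    second_feature = np.log1p(np.abs(bone_spec))

    # S206: voiceprint feature extracted from the bone conduction signal.
    voiceprint = feature_extractor(bone_spec)

    # S208: predicted amplitude from the voiceprint and both audio features.
    predicted_amplitude = amplitude_predictor(
        voiceprint, first_feature, second_feature
    )

    # S210: combine the predicted amplitude with the air conduction phase.
    return predicted_amplitude * np.exp(1j * np.angle(air_spec))


def toy_feature_extractor(bone_spec):
    # Hypothetical placeholder: time-averaged magnitude per frequency bin.
    return np.abs(bone_spec).mean(axis=0)


def toy_amplitude_predictor(voiceprint, first_feature, second_feature):
    # Hypothetical placeholder: a per-bin mask in [0, 1] derived from the
    # voiceprint, applied to the air conduction magnitude. The second
    # feature is accepted but unused in this toy version.
    mask = voiceprint / (voiceprint.max() + 1e-12)
    return np.expm1(first_feature) * mask


# Random spectrograms stand in for real microphone data.
rng = np.random.default_rng(0)
air_spec = rng.standard_normal((10, 129)) + 1j * rng.standard_normal((10, 129))
bone_spec = rng.standard_normal((10, 129)) + 1j * rng.standard_normal((10, 129))
enhanced_spec = enhance(
    air_spec, bone_spec, toy_feature_extractor, toy_amplitude_predictor
)
```

The two models are passed in as callables so that the wiring stays independent of whatever network architecture is actually trained.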

Inventors

  • ZHANG, XU
  • WEI, Jianqiang
  • LI, LEI
  • ZHU, Shikun
  • DENG, Xiangyu
  • FANG, Bo

Assignees

  • Anker Innovations Technology Co., Ltd.

Dates

Publication Date
2026-05-06
Application Date
2025-10-24

Claims (15)

  1. An audio enhancement method, comprising: - acquiring (S202) an air conduction audio signal from an air conduction microphone and a bone conduction audio signal from a bone conduction microphone; - acquiring (S204) a first audio feature of the air conduction audio signal and a second audio feature of the bone conduction audio signal; - performing (S206) feature extraction on the bone conduction audio signal through a trained feature extraction model to obtain a voiceprint feature of the bone conduction audio signal; - inputting (S208) the voiceprint feature of the bone conduction audio signal, the first audio feature, and the second audio feature into a trained amplitude prediction model to obtain a predicted amplitude; and - obtaining (S210) a target audio signal based on the predicted amplitude.
  2. The method according to claim 1, wherein the acquiring an air conduction audio signal and a bone conduction audio signal comprises: - collecting an original air conduction signal based on an air conduction microphone, and collecting an original bone conduction signal based on a bone conduction microphone; - performing short-time Fourier transformation on the original air conduction signal to obtain a frequency domain signal of the original air conduction signal, and obtaining the air conduction audio signal based on the original air conduction signal and the frequency domain signal of the original air conduction signal; and - performing short-time Fourier transformation on the original bone conduction signal to obtain a frequency domain signal of the original bone conduction signal, and obtaining the bone conduction audio signal based on the original bone conduction signal and the frequency domain signal of the original bone conduction signal.
  3. The method according to claim 2, wherein the obtaining a target audio signal based on the predicted amplitude comprises: - multiplying the predicted amplitude by a phase of the air conduction audio signal to obtain a multiplication result as a time-frequency domain enhanced signal; and - performing inverse short-time Fourier transformation on the enhanced signal to obtain the target audio signal.
  4. The method according to claim 3, wherein the collecting an original air conduction signal based on an air conduction microphone comprises: - collecting original air conduction signals from different directions based on a plurality of air conduction microphones to obtain a plurality of original air conduction signals.
  5. The method according to claim 4, wherein the plurality of air conduction microphones comprise a single-directional air conduction microphone and an all-directional air conduction microphone, and the collecting original air conduction signals from different directions based on a plurality of air conduction microphones to obtain a plurality of original air conduction signals comprises: - determining a target direction where audio signals are greater than a preset decibel threshold, and directionally collecting an audio signal in the target direction based on the single-directional air conduction microphone to obtain a single-directional original air conduction signal; and - collecting audio signals in all directions based on the all-directional air conduction microphone to obtain all-directional original air conduction signals.
  6. The method according to claim 5, wherein the air conduction audio signal is plural, and the performing short-time Fourier transformation on the original air conduction signal to obtain a frequency domain signal of the original air conduction signal, and obtaining the air conduction audio signal based on the original air conduction signal and the frequency domain signal of the original air conduction signal comprises: - performing short-time Fourier transformation on the plurality of original air conduction signals to obtain frequency domain signals of the original air conduction signals, and obtaining a plurality of air conduction audio signals based on the original air conduction signals and the corresponding frequency domain signals.
  7. The method according to claim 6, wherein the plurality of air conduction audio signals comprise a single-directional air conduction audio signal transformed from the single-directional original air conduction signal.
  8. The method according to claim 7, wherein the multiplying of the predicted amplitude by a phase of the air conduction audio signal to obtain a multiplication result as a time-frequency domain enhanced signal comprises: - multiplying the predicted amplitude by a phase of the single-directional air conduction audio signal to obtain a multiplication result as the time-frequency domain enhanced signal.
  9. The method according to any one of the preceding claims, wherein before inputting the voiceprint feature of the bone conduction audio signal, the first audio feature, and the second audio feature into a trained amplitude prediction model to obtain a predicted amplitude, the method further comprises: - acquiring (S302) an air conduction audio signal sample and a bone conduction audio signal sample; - acquiring (S304) a third audio feature of the air conduction audio signal sample and a fourth audio feature of the bone conduction audio signal sample; and - using (S306) the bone conduction audio signal sample as input data for a feature extraction model, using output of the feature extraction model, the third audio feature, and the fourth audio feature as input data for an amplitude prediction model, and using an amplitude of the air conduction audio signal sample as target output of the amplitude prediction model, to perform fusion training on the feature extraction model and the amplitude prediction model to obtain the trained feature extraction model and the trained amplitude prediction model.
  10. The method according to claim 9, wherein the acquiring an air conduction audio signal sample and a bone conduction audio signal sample comprises: - collecting an original air conduction signal sample based on the air conduction microphone, and collecting an original bone conduction signal sample based on the bone conduction microphone, wherein a single-directional original air conduction signal sample is directionally collected based on the single-directional air conduction microphone, and an all-directional original air conduction signal sample is collected based on the all-directional air conduction microphone; - performing short-time Fourier transformation on the original air conduction signal sample to obtain a frequency domain signal sample of the original air conduction signal sample, and obtaining the air conduction audio signal sample based on the original air conduction signal sample and the frequency domain signal sample of the original air conduction signal sample; and - performing short-time Fourier transformation on the original bone conduction signal sample to obtain a frequency domain signal sample of the original bone conduction signal sample, and obtaining the bone conduction audio signal sample based on the original bone conduction signal sample and the frequency domain signal sample of the original bone conduction signal sample.
  11. An audio enhancement apparatus, comprising: - a first acquisition module (701), configured to acquire an air conduction audio signal and a bone conduction audio signal; - a second acquisition module (702), configured to acquire a first audio feature of the air conduction audio signal and a second audio feature of the bone conduction audio signal; - a feature extraction module (703), configured to perform feature extraction on the bone conduction audio signal through a trained feature extraction model to obtain a voiceprint feature of the bone conduction audio signal; - an amplitude prediction module (704), configured to input the voiceprint feature of the bone conduction audio signal, the first audio feature, and the second audio feature into a trained amplitude prediction model to obtain a predicted amplitude; and - an audio enhancement module (705), configured to obtain a target audio signal based on the predicted amplitude.
  12. The audio enhancement apparatus of claim 11, wherein the first acquisition module (701) is further configured to: - collect an original air conduction signal based on an air conduction microphone, and collect an original bone conduction signal based on a bone conduction microphone; - perform short-time Fourier transformation on the original air conduction signal to obtain a frequency domain signal of the original air conduction signal, and obtain the air conduction audio signal based on the original air conduction signal and the frequency domain signal of the original air conduction signal; and - perform short-time Fourier transformation on the original bone conduction signal to obtain a frequency domain signal of the original bone conduction signal, and obtain the bone conduction audio signal based on the original bone conduction signal and the frequency domain signal of the original bone conduction signal.
  13. An earphone, comprising an air conduction microphone, a bone conduction microphone, a memory, and a processor, the air conduction microphone being configured to collect an air conduction signal, the bone conduction microphone being configured to collect a bone conduction signal, and the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 10.
  14. A computer-readable medium, on which computer-readable instructions are stored, wherein the computer-readable instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 10.
  15. A computer program comprising instructions which, when the computer program is executed by a computer, cause the computer to carry out the steps of the method according to any one of claims 1 to 10.
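Claims 2 and 3 describe a round trip: short-time Fourier transformation, multiplication of the predicted amplitude by the air conduction phase, and inverse short-time Fourier transformation. A minimal NumPy sketch of that reconstruction step follows. The frame length, hop size, Hann window, and the helper names `stft`, `istft`, and `reconstruct` are assumptions chosen for illustration; the claims do not specify them. The predicted amplitude would come from the trained amplitude prediction model, not from the placeholder scaling used here.

```python
import numpy as np

N_FFT = 256  # frame length (assumed; not specified in the claims)
HOP = 64     # hop size (assumed)


def stft(x, n_fft=N_FFT, hop=HOP):
    """Short-time Fourier transform; returns a complex (frames, bins) array."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=N_FFT, hop=HOP):
    """Inverse STFT by windowed overlap-add, normalized by window energy."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    length = n_fft + hop * (len(frames) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)


def reconstruct(predicted_amplitude, air_spec, n_fft=N_FFT, hop=HOP):
    """Claim 3: pair the predicted amplitude with the air conduction phase."""
    phase = np.angle(air_spec)
    enhanced_spec = predicted_amplitude * np.exp(1j * phase)
    return istft(enhanced_spec, n_fft, hop)


# Toy input: a 440 Hz tone sampled at 16 kHz stands in for the air signal.
t = np.arange(4096) / 16000.0
air = np.sin(2 * np.pi * 440.0 * t)
air_spec = stft(air)

# Placeholder for the model output: half the air conduction magnitude.
predicted_amplitude = 0.5 * np.abs(air_spec)
target = reconstruct(predicted_amplitude, air_spec)
```

Because only the amplitude is predicted, the phase is taken unchanged from the air conduction spectrum. With the placeholder above, the reconstructed signal is the input tone at half amplitude, apart from the first and last samples, where the window is zero.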

Description

Technical Field

The present disclosure relates to the technical field of audio processing, and specifically, to an audio enhancement method and apparatus, and an earphone.

Background

This section is intended to provide a background or context for embodiments of the present invention set forth in claims and detailed description. The description in this section should not be admitted as the prior art.

The popularity of mobile communication devices allows users to make calls or record audio anytime and anywhere. However, ambient noise and interfering human voice during these activities degrade the clarity of audio signals collected by the devices, leading to poor call or recording quality. Therefore, noise reduction is required to cancel ambient noise or interference with calls or recording, thereby improving audio quality.

Summary

In view of the above technical problems, it is an object of the present disclosure to provide an audio enhancement method and apparatus capable of improving audio quality, and an earphone.

As a solution, an audio enhancement method and an audio enhancement apparatus are provided according to the independent claims.

In a first aspect, an audio enhancement method is provided, including:

  • acquiring an air conduction audio signal, preferably from an air conduction microphone, and a bone conduction audio signal, preferably from a bone conduction microphone;
  • acquiring a first audio feature of the air conduction audio signal and a second audio feature of the bone conduction audio signal;
  • performing feature extraction on the bone conduction audio signal through a trained feature extraction model to obtain a voiceprint feature of the bone conduction audio signal;
  • inputting the voiceprint feature of the bone conduction audio signal, the first audio feature, and the second audio feature into a trained amplitude prediction model to obtain a predicted amplitude; and
  • obtaining a target audio signal based on the predicted amplitude.

In a second aspect, an audio enhancement apparatus is provided, including:

  • a first acquisition module, configured to acquire an air conduction audio signal and a bone conduction audio signal;
  • a second acquisition module, configured to acquire a first audio feature of the air conduction audio signal and a second audio feature of the bone conduction audio signal;
  • a feature extraction module, configured to perform feature extraction on the bone conduction audio signal through a trained feature extraction model to obtain a voiceprint feature of the bone conduction audio signal;
  • an amplitude prediction module, configured to input the voiceprint feature of the bone conduction audio signal, the first audio feature, and the second audio feature into a trained amplitude prediction model to obtain a predicted amplitude; and
  • an audio enhancement module, configured to obtain a target audio signal based on the predicted amplitude.

In a third aspect, an earphone is provided, including an air conduction microphone, a bone conduction microphone, a memory, and a processor, the air conduction microphone being configured to collect an air conduction (audio) signal, the bone conduction microphone being configured to collect a bone conduction (audio) signal, and the memory storing a computer program, where the processor, when executing the computer program, implements the following steps:

  • acquiring an air conduction audio signal and a bone conduction audio signal;
  • acquiring a first audio feature of the air conduction audio signal and a second audio feature of the bone conduction audio signal;
  • performing feature extraction on the bone conduction audio signal through a trained feature extraction model to obtain a voiceprint feature of the bone conduction audio signal;
  • inputting the voiceprint feature of the bone conduction audio signal, the first audio feature, and the second audio feature into a trained amplitude prediction model to obtain a predicted amplitude; and
  • obtaining a target audio signal based on the predicted amplitude.

According to the audio enhancement method and apparatus and the earphone, the air conduction audio signal and the bone conduction audio signal are acquired; the first audio feature of the air conduction audio signal and the second audio feature of the bone conduction audio signal are acquired; feature extraction is performed on the bone conduction audio signal through the trained feature extraction model to obtain the voiceprint feature of the bone conduction audio signal; the voiceprint feature of the bone conduction audio signal, the first audio feature, and the second audio feature are input into the trained amplitude prediction model to obtain the predicted amplitude; and the target audio signal is obtained based on the predicted amplitude. The audio signal collected through air conduction has a wide frequency range, while the audio signal collected through bone conduction is almost free of interference from environmental noise. The primary model is utilized to extract the voicepr