KR-20260066768-A - Dual-filter Kalman method for eliminating acoustic feedback in a hands-free karaoke environment

Abstract

A method performed by at least one processor comprises the steps of: receiving an output microphone signal generated by an omnidirectional microphone, wherein the output signal comprises a mixed signal including a user voice signal, an audio playback signal, and a voice reference signal of the user voice, and wherein the mixed signal is output from a loudspeaker; inputting the output microphone signal and the voice reference signal into a first Kalman filter to generate a first filtered signal; inputting the output signal and the audio playback signal into a second Kalman filter to generate a second filtered signal; estimating the user voice signal by subtracting the first filtered signal and the second filtered signal from the output microphone signal to generate a voice estimation signal of the user voice; and outputting the voice estimation signal through the loudspeaker.

Inventors

장 하오
위 둥

Assignees

Tencent America LLC (텐센트 아메리카 엘엘씨)

Dates

Publication Date: 2026-05-12
Application Date: 2025-03-14
Priority Date: 2024-07-01

Claims (20)

A method performed by at least one processor, the method comprising: receiving an output microphone signal generated by an omnidirectional microphone, wherein the output signal comprises a mixed signal including a user voice signal, an audio playback signal, and a voice reference signal of the user voice, and wherein the mixed signal is output from a loudspeaker; inputting the output microphone signal and the voice reference signal into a first Kalman filter to generate a first filtered signal; inputting the output signal and the audio playback signal into a second Kalman filter to generate a second filtered signal; estimating the user voice signal by subtracting the first filtered signal and the second filtered signal from the output microphone signal to generate a voice estimation signal of the user voice; and outputting the voice estimation signal through the loudspeaker.
The method of claim 1, wherein the voice reference signal comprises a previous voice estimation signal that is delayed by a system delay and multiplied by an amplifier gain.
The method of claim 1, further comprising updating the first Kalman filter and the second Kalman filter based on a second estimation signal.
The method of claim 3, wherein updating the first Kalman filter and the second Kalman filter further comprises determining a ratio between the square of the voice reference signal and the sum of the square of the audio playback signal and the square of the voice reference signal.
The method of claim 4, wherein updating the first Kalman filter and the second Kalman filter further comprises: determining a first transition coefficient of the first Kalman filter based on the sum of a global transition coefficient and the value obtained by multiplying the ratio by the value obtained by subtracting the global transition coefficient from 1; and determining a second transition coefficient of the second Kalman filter based on the sum of the global transition coefficient and the value obtained by multiplying the value obtained by subtracting the global transition coefficient from 1 by the value obtained by subtracting the ratio from 1.
The method of claim 5, wherein updating the first Kalman filter and the second Kalman filter further comprises: updating a first gain of the first Kalman filter and a first state estimation error covariance of the first Kalman filter based on the first transition coefficient; and updating a second gain of the second Kalman filter and a second state estimation covariance of the second Kalman filter based on the second transition coefficient.
The method of claim 1, wherein the omnidirectional microphone is a hands-free microphone.
A device comprising: at least one memory configured to store program code; and at least one processor configured to read the program code and operate as instructed by the program code, the program code comprising: receiving code configured to cause the at least one processor to receive an output microphone signal generated by an omnidirectional microphone, wherein the output signal comprises a mixed signal including a user voice signal, an audio playback signal, and a voice reference signal of the user voice, and wherein the mixed signal is output from a loudspeaker; first input code configured to cause the at least one processor to input the output microphone signal and the voice reference signal into a first Kalman filter to generate a first filtered signal; second input code configured to cause the at least one processor to input the output signal and the audio playback signal into a second Kalman filter to generate a second filtered signal; estimation code configured to cause the at least one processor to estimate the user voice signal by subtracting the first filtered signal and the second filtered signal from the output microphone signal to generate a voice estimation signal of the user voice; and output code configured to cause the at least one processor to output the voice estimation signal through the loudspeaker.
The device of claim 8, wherein the voice reference signal comprises a previous voice estimation signal that is delayed by a system delay and multiplied by an amplifier gain.
The device of claim 8, wherein the program code further comprises update code configured to cause the at least one processor to update the first Kalman filter and the second Kalman filter based on a second estimation signal.
The device of claim 10, wherein the update code further comprises first determination code configured to cause the at least one processor to determine a ratio between the square of the voice reference signal and the sum of the square of the audio playback signal and the square of the voice reference signal.
The device of claim 11, wherein the update code further comprises: second determination code configured to cause the at least one processor to determine a first transition coefficient of the first Kalman filter based on the sum of a global transition coefficient and the value obtained by multiplying the ratio by the value obtained by subtracting the global transition coefficient from 1; and third determination code configured to cause the at least one processor to determine a second transition coefficient of the second Kalman filter based on the sum of the global transition coefficient and the value obtained by multiplying the value obtained by subtracting the global transition coefficient from 1 by the value obtained by subtracting the ratio from 1.
The device of claim 12, wherein the update code further comprises: first filter update code configured to cause the at least one processor to update a first gain of the first Kalman filter and a first state estimation error covariance of the first Kalman filter based on the first transition coefficient; and second filter update code configured to cause the at least one processor to update a second gain of the second Kalman filter and a second state estimation covariance of the second Kalman filter based on the second transition coefficient.
The device of claim 8, wherein the omnidirectional microphone is a hands-free microphone.
A non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform a method comprising: receiving an output microphone signal generated by an omnidirectional microphone, wherein the output signal comprises a mixed signal including a user voice signal, an audio playback signal, and a voice reference signal of the user voice, and wherein the mixed signal is output from a loudspeaker; inputting the output microphone signal and the voice reference signal into a first Kalman filter to generate a first filtered signal; inputting the output signal and the audio playback signal into a second Kalman filter to generate a second filtered signal; estimating the user voice signal by subtracting the first filtered signal and the second filtered signal from the output microphone signal to generate a voice estimation signal of the user voice; and outputting the voice estimation signal through the loudspeaker.
The non-transitory computer-readable medium of claim 15, wherein the voice reference signal comprises a previous voice estimation signal that is delayed by a system delay and multiplied by an amplifier gain.
The non-transitory computer-readable medium of claim 15, wherein the method further comprises updating the first Kalman filter and the second Kalman filter based on a second estimation signal.
The non-transitory computer-readable medium of claim 17, wherein updating the first Kalman filter and the second Kalman filter further comprises determining a ratio between the square of the voice reference signal and the sum of the square of the audio playback signal and the square of the voice reference signal.
The non-transitory computer-readable medium of claim 17, wherein updating the first Kalman filter and the second Kalman filter further comprises: determining a first transition coefficient of the first Kalman filter based on the sum of a global transition coefficient and the value obtained by multiplying the ratio by the value obtained by subtracting the global transition coefficient from 1; and determining a second transition coefficient of the second Kalman filter based on the sum of the global transition coefficient and the value obtained by multiplying the value obtained by subtracting the global transition coefficient from 1 by the value obtained by subtracting the ratio from 1.
The non-transitory computer-readable medium of claim 19, wherein updating the first Kalman filter and the second Kalman filter further comprises: updating a first gain of the first Kalman filter and a first state estimation error covariance of the first Kalman filter based on the first transition coefficient; and updating a second gain of the second Kalman filter and a second state estimation covariance of the second Kalman filter based on the second transition coefficient.
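
The coefficient and update steps recited in the dependent claims can be illustrated with a short Python sketch. It is not the implementation from the disclosure: the function names, the per-frame energy reading of "square of the signal", the small `eps` guard, the noise parameters `q` and `r`, and the random-walk state model are all assumptions made for illustration. The ratio is taken as reference energy divided by the sum of reference and playback energy, which is one plausible reading of the claim language.

```python
def transition_coefficients(ref, playback, global_a, eps=1e-12):
    """Return (ratio, a1, a2) for one frame of samples (hypothetical helper).

    ratio = sum(ref^2) / (sum(ref^2) + sum(playback^2))   (assumed reading)
    a1    = A + ratio * (1 - A)        first Kalman filter
    a2    = A + (1 - A) * (1 - ratio)  second Kalman filter
    """
    p_ref = sum(v * v for v in ref)
    p_pb = sum(v * v for v in playback)
    ratio = p_ref / (p_ref + p_pb + eps)  # eps avoids division by zero on silence
    a1 = global_a + ratio * (1.0 - global_a)
    a2 = global_a + (1.0 - global_a) * (1.0 - ratio)
    return ratio, a1, a2


def kalman_update(w, P, x, y, a, q=1e-6, r=1e-3):
    """One textbook Kalman step for filter taps modeled as w[n] = a * w[n-1] + noise.

    w: current tap estimate, P: state estimation error covariance (symmetric),
    x: most recent input samples (newest first), y: observed sample,
    a: transition coefficient, q / r: assumed process / observation noise variances.
    Returns the updated taps, the updated covariance, and the Kalman gain.
    """
    n = len(w)
    # Prediction: the transition coefficient scales both the state and its covariance.
    w_pred = [a * wi for wi in w]
    P_pred = [[a * a * P[i][j] + (q if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    # Gain: K = P_pred x / (x^T P_pred x + r)
    Px = [sum(P_pred[i][j] * x[j] for j in range(n)) for i in range(n)]
    s = sum(x[i] * Px[i] for i in range(n)) + r
    gain = [v / s for v in Px]
    # Correction with the innovation (observed sample minus predicted sample).
    err = y - sum(x[i] * w_pred[i] for i in range(n))
    w_new = [w_pred[i] + gain[i] * err for i in range(n)]
    # P_new = P_pred - K x^T P_pred; P_pred is symmetric, so x^T P_pred equals Px^T.
    P_new = [[P_pred[i][j] - gain[i] * Px[j] for j in range(n)] for i in range(n)]
    return w_new, P_new, gain
```

In this sketch each filter would call `kalman_update` with its own coefficient (`a1` for the filter driven by the voice reference signal, `a2` for the filter driven by the audio playback signal), so that the gain and covariance of each filter depend on its transition coefficient as the claims describe.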

Description

Dual-filter Kalman method for eliminating acoustic feedback in a hands-free karaoke environment
This application is based on and claims priority to U.S. Patent Application No. 18/760,813, filed on July 1, 2024, the entire contents of which are incorporated herein by reference.
The present disclosure relates to a dual-filter Kalman method for removing acoustic feedback in a hands-free karaoke environment.
Hands-free karaoke systems represent a modern evolution in recreational singing, allowing users to sing without holding a microphone. These systems typically capture the singer's voice using wearable microphones or microphones mounted on or built into the surrounding environment. This setup enables a more immersive and interactive singing experience: users are free to engage with the audience and use expressive gestures without being constrained by a handheld microphone.
While hands-free karaoke systems improve performer mobility and interaction, they present specific challenges related to audio quality and system complexity. Capturing clear audio in a hands-free setup is particularly difficult in noisy environments. Because the microphone is not positioned near the mouth, the system must separate the singer's voice from reverberation, background noise, and music playback. Without the directional control that a handheld microphone provides, the risk of feedback and echo increases, which can degrade sound quality. The system therefore requires signal processing algorithms that separate the live vocals from the played-back vocals and music. The need for real-time, low-latency processing, which is critical in live performance environments, adds to this complexity.
Overcoming these challenges requires sophisticated audio processing and careful system design, so that the benefits of hands-free performance are obtained without degrading the quality of the karaoke experience.
According to one aspect of the present disclosure, a method performed by at least one processor comprises: receiving an output microphone signal generated by an omnidirectional microphone, wherein the output signal comprises a mixed signal including a user voice signal, an audio playback signal, and a voice reference signal of the user voice, and wherein the mixed signal is output from a loudspeaker; inputting the output microphone signal and the voice reference signal into a first Kalman filter to generate a first filtered signal; inputting the output signal and the audio playback signal into a second Kalman filter to generate a second filtered signal; estimating the user voice signal by subtracting the first filtered signal and the second filtered signal from the output microphone signal to generate a voice estimation signal of the user voice; and outputting the voice estimation signal through the loudspeaker.
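The signal flow of the method aspect can be sketched in Python as a toy simulation. This is an illustration under stated assumptions, not the disclosed implementation: the function and parameter names are hypothetical, the two acoustic paths are modeled as short FIR filters (`h_fb`, `h_pb`), and the two filters are represented by fixed tap vectors (`w_fb`, `w_pb`) rather than adaptive Kalman filters. The voice reference signal is formed as an earlier voice estimate, delayed and amplified, as recited in the dependent claims.

```python
def recent(x, t, n):
    """Return [x[t], x[t-1], ..., x[t-n+1]], treating samples before the start as 0."""
    return [x[t - k] if t - k >= 0 else 0.0 for k in range(n)]


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(u * v for u, v in zip(a, b))


def run_dual_filter(voice, playback, h_fb, h_pb, w_fb, w_pb, gain, delay):
    """Simulate the feedback loop sample by sample and return the voice estimate.

    voice, playback: source signals (lists of floats of equal length)
    h_fb, h_pb: assumed loudspeaker-to-microphone paths for the voice reference
                and the audio playback (FIR taps)
    w_fb, w_pb: tap estimates held by the first and second filters (fixed here)
    gain, delay: amplifier gain and system delay applied to the earlier estimate
    """
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    n = len(voice)
    est = [0.0] * n   # voice estimation signal
    ref = [0.0] * n   # voice reference signal
    for t in range(n):
        # Voice reference: an earlier voice estimate, delayed and amplified.
        if t >= delay:
            ref[t] = gain * est[t - delay]
        # Microphone picks up the singer plus both loudspeaker contributions.
        mic = (voice[t]
               + dot(h_fb, recent(ref, t, len(h_fb)))
               + dot(h_pb, recent(playback, t, len(h_pb))))
        first = dot(w_fb, recent(ref, t, len(w_fb)))        # first filtered signal
        second = dot(w_pb, recent(playback, t, len(w_pb)))  # second filtered signal
        est[t] = mic - first - second
    return est
```

When the tap vectors match the true paths, the estimate equals the singer's voice; when the first filter is zeroed, the delayed and amplified estimate leaks back into the output, which is the acoustic feedback the method is meant to remove.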
According to one aspect of the present disclosure, a device comprises: at least one memory configured to store program code; and at least one processor configured to read the program code and operate as instructed by the program code, wherein the program code comprises: receiving code configured to cause the at least one processor to receive an output microphone signal generated by an omnidirectional microphone, wherein the output signal comprises a mixed signal including a user voice signal, an audio playback signal, and a voice reference signal of the user voice, and wherein the mixed signal is output from a loudspeaker; first input code configured to cause the at least one processor to input the output microphone signal and the voice reference signal into a first Kalman filter to generate a first filtered signal; second input code configured to cause the at least one processor to input the output signal and the audio playback signal into a second Kalman filter to generate a second filtered signal; estimation code configured to cause the at least one processor to estimate the user voice signal by subtracting the first filtered signal and the second filtered signal from the output microphone signal to generate a voice estimation signal of the user voice; and output code configured to cause the at least one processor to output the voice estimation signal through the loudspeaker.
According to one aspect of the present disclosure, a non-transitory computer-readable medium stores instructions that, when executed by a processor, cause the processor to perform a method comprising: receiving an output microphone signal generated by an omnidirectional microphone, wherein the output signal comprises a mixed signal including a user voice signal, an audio playback signal, and a voice reference signal of the user voice, and wherein the mixed signal is output from a loudspeaker; inputting the output microphone signal and the voice reference signal into a first Kalman filter to generate a first filtered signal; inputti