US-12621389-B2 - Conference terminal and echo cancellation method

Abstract

A conference terminal, an echo cancellation method and apparatus, and a sound pickup device are provided. The conference terminal comprises a loudspeaker and at least one omnidirectional microphone group. The omnidirectional microphone group comprises at least two omnidirectional microphones. The conference terminal determines a weight vector of a beamformer that enables the at least two omnidirectional microphones to form a dipole beam pattern, so that an echo signal in the direction of the loudspeaker is suppressed and a sound signal in a target direction is enhanced. The sound signal is collected by means of the omnidirectional microphones. For the at least two omnidirectional microphones, the weighted sum of the at least two sound signals is determined according to the weight vector and serves as the echo-canceled signal.
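As an illustration only (it is not taken from the patent text), the weighted sum described in the abstract can be sketched for a single frequency bin. The sketch assumes a hypothetical geometry: two microphones 4 cm apart, a 1 kHz bin, far-field sources, and a loudspeaker equidistant from both microphones (broadside), so that fixed difference weights place the dipole null on the loudspeaker. The names `steering_vector`, `weighted_sum`, and `dipole_weights` are introduced here for the example.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_vector(freq_hz, spacing_m, angle_rad):
    """Far-field response of two mics on a line; angle is measured from the mic axis."""
    delay = spacing_m * math.cos(angle_rad) / SPEED_OF_SOUND
    return [1.0 + 0j, cmath.exp(-2j * math.pi * freq_hz * delay)]


def weighted_sum(weights, signals):
    """Beamformer output y = w^H x for one frequency bin."""
    return sum(w.conjugate() * x for w, x in zip(weights, signals))


# Assumed fixed dipole weights: subtract one microphone from the other.
dipole_weights = [0.5 + 0j, -0.5 + 0j]

# Hypothetical directions: loudspeaker at broadside, talker along the mic axis.
echo = steering_vector(1000.0, 0.04, math.pi / 2)
target = steering_vector(1000.0, 0.04, 0.0)

echo_out = weighted_sum(dipole_weights, echo)      # falls in the dipole null
target_out = weighted_sum(dipole_weights, target)  # falls in a dipole lobe
print(abs(echo_out), abs(target_out))
```

With these assumptions the echo component cancels almost exactly, while the target passes with a nonzero, frequency-dependent gain. Data-dependent weights, such as those obtained from the MVDR algorithm named in the claims, can keep the target gain at unity instead.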

Inventors

Cheng Xue
Weilong HUANG
Jinwei Feng

Assignees

Zhejiang Alibaba Robot Co., Ltd.

Dates

Publication Date: 20260505
Application Date: 20211022

Claims (20)

1. A conference terminal, comprising: a loudspeaker; at least one omnidirectional microphone set comprising at least two omnidirectional microphones; a processor; and a memory for storing a program which implements an echo cancellation method, the conference terminal, after being powered up and running the program for the echo cancellation method through the processor, performing the following steps: determining a weight vector of a beamformer that enables the at least two omnidirectional microphones to form a dipole beam pattern, wherein the weight vector is configured to suppress an echo signal in a direction of the loudspeaker and enhance a sound signal in a target direction; acquiring the sound signal through the at least two omnidirectional microphones, wherein the sound signal comprises the echo signal in the direction of the loudspeaker and the sound signal in the target direction; and determining, for the omnidirectional microphone set, a weighted sum of at least two sound signals acquired by the at least two omnidirectional microphones based on the weight vector of the beamformer, wherein the weighted sum is an echo-canceled sound signal.
2. The conference terminal according to claim 1, wherein the number of the at least two omnidirectional microphones is two.
3. The conference terminal according to claim 1, wherein the number of the at least one omnidirectional microphone set is three, three omnidirectional microphone sets are arranged centered on the loudspeaker, the three omnidirectional microphone sets covering target sound sources in all directions.
4. An echo cancellation method for a conference terminal, wherein the conference terminal comprises: a loudspeaker and at least one omnidirectional microphone set comprising at least two omnidirectional microphones; the echo cancellation method comprising: determining a weight vector of a beamformer that enables the at least two omnidirectional microphones to form a dipole beam pattern, wherein the weight vector is configured to suppress an echo signal in a direction of the loudspeaker and enhance a sound signal in a target direction; acquiring the sound signal through the at least two omnidirectional microphones, wherein the sound signal comprises the echo signal in the direction of the loudspeaker and the sound signal in the target direction; and determining, for the omnidirectional microphone set, a weighted sum of at least two sound signals acquired by the at least two omnidirectional microphones based on the weight vector of the beamformer, wherein the weighted sum is an echo-canceled sound signal.
5. The method according to claim 4, wherein determining the weight vector of the beamformer that enables the at least two omnidirectional microphones to form the dipole beam pattern, comprises: determining a noise covariance matrix and a steering vector for the conference terminal; and determining the weight vector based on the noise covariance matrix and the steering vector by means of a Minimum Variance Distortion-free Response (MVDR) beamforming algorithm.
6. The method according to claim 5, wherein the noise covariance matrix is determined by: playing data of a preset sound when the conference terminal is started; and determining a speech autocorrelation matrix as the noise covariance matrix based on the sound signal comprising the preset sound acquired by the at least two omnidirectional microphones.
7. The method according to claim 6, further comprising: updating the autocorrelation matrix as an updated noise covariance matrix based on the sound signal comprising a conference sound acquired by the at least two omnidirectional microphones, if it is detected that it is mute in the target direction during the operation of the conference terminal.
8. The method according to claim 4, further comprising: determining a signal-to-noise ratio of the at least two omnidirectional microphones if a movement of a target sound source is detected; and selecting, according to the signal-to-noise ratio, the echo-canceled sound signal corresponding to the at least two omnidirectional microphones in a target omnidirectional microphone set.
9. A sound pickup device, comprising: a loudspeaker; at least one omnidirectional microphone set comprising at least two omnidirectional microphones; a processor; and a memory for storing a program which implements the echo cancellation method according to claim 4, the terminal being powered up and running the program for the echo cancellation method through the processor.
10. The sound pickup device according to claim 9, wherein the program implements the following steps: determine a noise covariance matrix and a steering vector for the conference terminal; and determine the weight vector based on the noise covariance matrix and the steering vector by means of a Minimum Variance Distortion-free Response (MVDR) beamforming algorithm.
11. The sound pickup device according to claim 9, wherein the program implements the following steps: play data of a preset sound when the conference terminal is started; and determine a speech autocorrelation matrix as the noise covariance matrix based on the sound signal comprising the preset sound acquired by the at least two omnidirectional microphones.
12. The sound pickup device according to claim 9, wherein the program implements the following steps: update the autocorrelation matrix as an updated noise covariance matrix based on the sound signal comprising a conference sound acquired by the at least two omnidirectional microphones, if it is detected that it is mute in the target direction during the operation of the conference terminal.
13. The sound pickup device according to claim 9, wherein the program implements the following steps: determine a signal-to-noise ratio of the at least two omnidirectional microphones if a movement of a target sound source is detected; and select, according to the signal-to-noise ratio, the echo-canceled sound signal corresponding to the at least two omnidirectional microphones in a target omnidirectional microphone set.
14. A computer program, comprising: computer-readable codes which, when run on a computing processing device, cause the computing processing device to execute the echo cancellation method according to claim 4.
15. A non-transitory computer-readable medium storing the computer program of claim 14.
16. The computer program according to claim 14, wherein the computer-readable codes, when run on a computing processing device, cause the computing processing device to execute the following steps: determine a noise covariance matrix and a steering vector for the conference terminal; and determine the weight vector based on the noise covariance matrix and the steering vector by means of a Minimum Variance Distortion-free Response (MVDR) beamforming algorithm.
17. The computer program according to claim 14, wherein the computer-readable codes, when run on a computing processing device, cause the computing processing device to execute the following steps: play data of a preset sound when the conference terminal is started; and determine a speech autocorrelation matrix as the noise covariance matrix based on the sound signal comprising the preset sound acquired by the at least two omnidirectional microphones.
18. The computer program according to claim 14, wherein the computer-readable codes, when run on a computing processing device, cause the computing processing device to execute the following steps: update the autocorrelation matrix as an updated noise covariance matrix based on the sound signal comprising a conference sound acquired by the at least two omnidirectional microphones, if it is detected that it is mute in the target direction during the operation of the conference terminal.
19. The computer program according to claim 14, wherein the computer-readable codes, when run on a computing processing device, cause the computing processing device to execute the following steps: determine a signal-to-noise ratio of the at least two omnidirectional microphones if a movement of a target sound source is detected; and select, according to the signal-to-noise ratio, the echo-canceled sound signal corresponding to the at least two omnidirectional microphones in a target omnidirectional microphone set.
20. The non-transitory computer-readable medium according to claim 14, wherein the non-transitory computer-readable medium stores the computer program.
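The covariance update recited in claim 7 and the signal-to-noise-ratio selection recited in claim 8 might look roughly like the sketch below. The claims do not say how the matrix is updated or how the ratio is computed, so the exponential smoothing with factor `alpha`, the decibel definition in `snr_db`, and the function names are all assumptions made for illustration. Silence detection in the target direction is taken as a given boolean input.

```python
import math


def update_covariance(r, x, target_silent, alpha=0.95):
    """Blend a new two-channel snapshot x into the 2x2 covariance r.

    Assumption: exponential smoothing, applied only while the target
    direction is silent so that the talker's speech does not leak into r.
    """
    if not target_silent:
        return r
    return [
        [alpha * r[i][k] + (1 - alpha) * x[i] * x[k].conjugate() for k in range(2)]
        for i in range(2)
    ]


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels (assumed definition)."""
    return 10.0 * math.log10(signal_power / noise_power)


def select_output(outputs, snrs):
    """Return the index and echo-canceled output of the set with the highest SNR."""
    best = max(range(len(snrs)), key=lambda i: snrs[i])
    return best, outputs[best]


# Hypothetical values for three microphone sets after the talker has moved.
r0 = [[1 + 0j, 0j], [0j, 1 + 0j]]
r1 = update_covariance(r0, [1 + 0j, 1j], target_silent=True, alpha=0.5)
snrs = [snr_db(2.0, 1.0), snr_db(10.0, 1.0), snr_db(4.0, 1.0)]
index, chosen = select_output(["set A", "set B", "set C"], snrs)
print(r1, index, chosen)
```

A smoothing factor close to 1 makes the estimate change slowly. The right value depends on how quickly the room response changes and is not specified in the source.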

Description

CROSS REFERENCE TO RELATED APPLICATIONS

The present application is a U.S. national phase of International PCT Patent Application No. PCT/CN2021/125763, filed Oct. 22, 2021, which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The present application relates to the field of speech processing technology and, in particular, to a conference terminal, an echo cancellation method and apparatus, and a sound pickup device.

BACKGROUND

Internet technology has changed the tools people use to communicate, and cloud-based audio-visual conferencing systems are becoming increasingly popular. Echoes may be produced during use of an audio-visual conference terminal, so that a speaker hears his or her own voice, which degrades the conference experience. Echo cancellation in video conferencing environments has therefore long been an active research topic.

SUMMARY

The present application provides a conference terminal. The present application additionally provides an echo cancellation method and apparatus, and a sound pickup device.

The present application provides a conference terminal, including:
a loudspeaker;
at least one omnidirectional microphone set including at least two omnidirectional microphones;
a processor; and
a memory for storing a program which implements a method of echo cancellation, the terminal, after being powered up and running the program for the method through the processor, performing the following steps:
determining a weight vector of a beamformer that enables the at least two omnidirectional microphones to form a dipole beam pattern to suppress an echo signal in a direction of the loudspeaker and enhance a sound signal in a target direction;
acquiring the sound signal through the omnidirectional microphones;
determining, for the omnidirectional microphone set, a weighted sum of at least two sound signals corresponding to the at least two omnidirectional microphones based on the weight vector, as an echo-canceled sound signal.
Optionally, the at least two omnidirectional microphones are two omnidirectional microphones.

Optionally, the at least one omnidirectional microphone set is three omnidirectional microphone sets centered on the loudspeaker. The three omnidirectional microphone sets cover target sound sources in all directions.

The present application further provides an echo cancellation method for a conference terminal. The conference terminal includes: a loudspeaker and at least one omnidirectional microphone set including at least two omnidirectional microphones. The method includes:
determining a weight vector of a beamformer that enables the at least two omnidirectional microphones to form a dipole beam pattern to suppress an echo signal in a direction of the loudspeaker and enhance a sound signal in a target direction;
acquiring the sound signal through the omnidirectional microphones;
determining, for the omnidirectional microphone set, a weighted sum of at least two sound signals corresponding to the at least two omnidirectional microphones based on the weight vector of the beamformer, as an echo-canceled sound signal.

Optionally, determining the weight vector of the beamformer that enables the at least two omnidirectional microphones to form the dipole beam pattern includes:
determining a noise covariance matrix and a steering vector for the conference terminal;
determining the weight vector based on the noise covariance matrix and the steering vector by means of a Minimum Variance Distortion-free Response (MVDR) beamforming algorithm.

Optionally, the noise covariance matrix is determined in the following manner:
playing data of a preset sound when the conference terminal is started;
determining a speech autocorrelation matrix as the noise covariance matrix based on the sound signal including the preset sound acquired by the omnidirectional microphones.
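The two optional steps above can be sketched for one frequency bin and one pair of microphones. The sketch uses the textbook MVDR solution w = R^-1 d / (d^H R^-1 d), which the source names but does not write out. The sample echo amplitudes, the target steering vector `d`, the diagonal loading value, and all function names are hypothetical and stand in for quantities that a real terminal would measure.

```python
import cmath


def covariance(frames):
    """Average the outer products x x^H over two-channel snapshots."""
    n = len(frames)
    r = [[0j, 0j], [0j, 0j]]
    for x in frames:
        for i in range(2):
            for k in range(2):
                r[i][k] += x[i] * x[k].conjugate() / n
    return r


def mvdr_weights(r, d, loading=1e-3):
    """Textbook MVDR weights for a 2x2 covariance r and steering vector d.

    Assumption: a small diagonal loading keeps r invertible when the
    recorded preset sound is the only signal present.
    """
    a, b = r[0][0] + loading, r[0][1]
    c, e = r[1][0], r[1][1] + loading
    det = a * e - b * c
    inv = [[e / det, -b / det], [-c / det, a / det]]
    rd = [inv[0][0] * d[0] + inv[0][1] * d[1],
          inv[1][0] * d[0] + inv[1][1] * d[1]]
    denom = d[0].conjugate() * rd[0] + d[1].conjugate() * rd[1]
    return [rd[0] / denom, rd[1] / denom]


def beam_output(weights, x):
    """Beamformer output y = w^H x."""
    return sum(w.conjugate() * v for w, v in zip(weights, x))


# Hypothetical startup recording: the preset sound reaches both mics equally.
echo_dir = [1 + 0j, 1 + 0j]
amplitudes = [1.0, -0.5, 0.8j, 0.3 + 0.4j]
frames = [[a * echo_dir[0], a * echo_dir[1]] for a in amplitudes]

noise_cov = covariance(frames)
d = [1 + 0j, cmath.exp(-0.7327j)]  # assumed steering vector toward the talker
w = mvdr_weights(noise_cov, d)
print(abs(beam_output(w, d)), abs(beam_output(w, echo_dir)))
```

Under these assumptions the weights pass the target direction with unit gain and strongly attenuate the direction from which the preset sound arrived. The residual echo gain shrinks as the loading value is reduced.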
Optionally, the method further includes:
updating the autocorrelation matrix as an updated noise covariance matrix based on the sound signal including a conference sound acquired by the omnidirectional microphones, if silence is detected in the target direction while the conference terminal is operating.

Optionally, the method further includes:
determining a signal-to-noise ratio of the omnidirectional microphones if a movement of a target sound source is detected;
selecting, according to the signal-to-noise ratio, the echo-canceled sound signal corresponding to the at least two omnidirectional microphones in a target omnidirectional microphone set.

The present application further provides an echo cancellation apparatus which is located at a conference terminal. The conference terminal includes: a loudspeaker and at least one omnidirectional microphone set including at least two omnidirectional microphones. The apparatus includes:
a parameter determination unit for determining a weight vector of a beamformer that enables the at least two omnidirectional microphones to form a dipole beam pattern to suppress an echo s