CN-122027930-A - Pickup method and pickup system


Abstract

The application discloses a pickup method and a pickup system. The method comprises: collecting an audio signal in a preset area based on an initial pickup mode; determining scene characteristics in the preset area according to the audio signal, wherein the scene characteristics represent the acoustic environment of the preset area; selecting a target pickup mode based on the scene characteristics according to a preset matching rule; and, if the initial pickup mode and the target pickup mode are different, switching to the target pickup mode and collecting the audio signal in the preset area. By dynamically matching the pickup mode to the characteristics of the acoustic environment, the method improves audio collection in different scenes, which keeps the pickup effect stable and flexible and improves audio quality and user experience.

Inventors

ZHOU HAOCHENG
LIN LIFENG

Assignees

厦门亿联网络技术股份有限公司

Dates

Publication Date: 20260512
Application Date: 20251231

Claims (14)

1. A sound pickup method, comprising: collecting audio signals in a preset area based on an initial pickup mode; determining scene characteristics in the preset area according to the audio signal, wherein the scene characteristics are used for representing the acoustic environment of the preset area; selecting a target pickup mode based on the scene characteristics according to a preset matching rule; and if the initial pickup mode is different from the target pickup mode, switching to the target pickup mode and collecting the audio signals in the preset area.
2. The method of claim 1, wherein the scene characteristics include at least one of a number of sound sources, a distribution position of the sound sources, and a signal-to-noise ratio of the preset area.
3. The method of claim 2, wherein the target pickup mode is an omnidirectional pickup mode, a directional pickup mode, or a hybrid pickup mode combining omnidirectional and directional pickup; the selecting a target pickup mode based on the scene characteristics according to a preset matching rule includes: if the signal-to-noise ratio is higher than a first threshold value and a plurality of sound sources with different distribution positions exist, selecting the omnidirectional pickup mode; if the signal-to-noise ratio is lower than a second threshold value, selecting the directional pickup mode; and if the signal-to-noise ratio is between the first threshold value and the second threshold value and the number of sound sources is less than a third threshold value or the sound sources are in a moving state, selecting the hybrid pickup mode.
4. The method of claim 2, wherein the determining scene characteristics within the preset area comprises: determining the number and/or distribution position of sound sources from the audio signal based on a spatial spectrum algorithm.
5. The method of claim 2, wherein the determining scene characteristics within the preset area comprises: acquiring an image of the preset area; and carrying out image recognition on the image to determine the distribution position of the sound source.
6. The method of claim 5, wherein the determining scene characteristics within the preset area further comprises: determining first position information of a sound source based on a spatial spectrum algorithm according to the audio signal; determining second position information of the sound source according to the image recognition; and fusing the first position information and the second position information to obtain the distribution position of the sound source.
7. The method of claim 2, wherein the determining scene characteristics within the preset area comprises: determining a pickup subarea covered by the directional microphone based on the audio signal acquired by the directional microphone; and determining the number of sound sources in the pickup subarea according to a comparison result of the audio signal energy in the pickup subarea and a threshold value and combining a preset mapping relation between the size of the area and the number of people.
8. The method of claim 1, wherein the switching to the target pickup mode comprises: during the switching process, gradually reducing the energy weight of the audio signal collected in the pickup mode before switching and gradually increasing the energy weight of the audio signal collected in the target pickup mode, so as to achieve smooth switching.
9. The method of claim 1, further comprising, after switching to the target pickup mode and collecting audio signals within the preset area: continuously or periodically calculating a current signal-to-noise ratio based on the audio signal acquired in the target pickup mode; and if the current signal-to-noise ratio remains lower than a stability threshold value throughout a preset time period, triggering redetermination of the scene characteristics and reselection of the target pickup mode.
10. The method of claim 1, wherein when the target pickup mode is a hybrid pickup mode, the method further comprises, after the collecting of the audio signals within the preset area: mixing the omnidirectional audio signals collected by the omnidirectional microphones and the directional audio signals collected by the directional microphones; and outputting the audio signal after the mixing processing.
11. The method of claim 1, wherein the scene characteristics further comprise acoustic events within the preset area, the method further comprising: detecting whether the audio signal contains a preset acoustic event; and if so, executing a pickup mode switching strategy or an audio post-processing strategy associated with the acoustic event.
12. A sound pickup system for use with a microphone or other device, the sound pickup system comprising: an audio acquisition module, configured to collect audio signals in a preset area based on an initial pickup mode; an audio analysis module, configured to determine scene characteristics of the preset area according to the audio signals, wherein the scene characteristics are used for representing the acoustic environment of the preset area and comprise at least one of the number of sound sources, the distribution position of the sound sources, the signal-to-noise ratio of the preset area, and whether a preset acoustic event exists; a pickup mode selection module, configured to select a target pickup mode based on the scene characteristics according to a preset matching rule, wherein the target pickup mode comprises an omnidirectional pickup mode, a directional pickup mode, or a hybrid pickup mode combining omnidirectional and directional pickup; and a pickup mode execution module, configured to switch to the target pickup mode when the initial pickup mode is different from the target pickup mode, and to collect the audio signals in the preset area based on the target pickup mode.
13. A computer readable storage medium storing a computer program, which when executed by a processor implements the method of any one of claims 1 to 11.
14. An electronic device, comprising: at least one omnidirectional microphone; at least one directional microphone; a processor; and a memory storing a computer program which, when executed by the processor, implements the method of any one of claims 1 to 11.
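
The smooth switching recited in claim 8 (the energy weight of the old mode's signal falls while that of the target mode's signal rises) can be illustrated with a minimal sketch. The source does not specify the fade curve, so the equal-power cosine/sine gains, the function name `crossfade`, and the assumption that both modes deliver equally long sample blocks during the transition are all illustrative choices, not the applicant's implementation.

```python
import math
from typing import List, Sequence


def crossfade(old: Sequence[float], new: Sequence[float]) -> List[float]:
    """Blend one block from the old pickup mode into one from the target mode.

    Hypothetical helper: uses equal-power gains so that
    g_old**2 + g_new**2 == 1 at every sample.
    """
    if len(old) != len(new):
        raise ValueError("blocks must have the same length")
    n = len(old)
    out = []
    for i in range(n):
        # t runs from 0.0 at the first sample to 1.0 at the last sample.
        t = i / (n - 1) if n > 1 else 1.0
        g_old = math.cos(0.5 * math.pi * t)  # energy weight g_old**2 falls 1 -> 0
        g_new = math.sin(0.5 * math.pi * t)  # energy weight g_new**2 rises 0 -> 1
        out.append(g_old * old[i] + g_new * new[i])
    return out
```

A linear fade would also satisfy the wording of the claim; the equal-power curve is chosen here only because it keeps the summed energy weight constant during the transition.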

Description

Pickup method and pickup system

Technical Field

The application relates to the technical field of audio acquisition, and in particular to a sound pickup method and a sound pickup system.

Background

In modern video conference systems, audio acquisition and transmission are core links in ensuring that a conference runs efficiently. Microphones are the key devices for acquiring sound signals, and their pickup performance directly affects the communication quality of the conference. As video conference application scenarios continue to expand, from small offices to medium and large conference rooms, and from a few people speaking from fixed positions to many people speaking or even communicating while moving, higher requirements are placed on the pickup capability of microphones in different usage scenarios.

In existing conference systems, microphones are generally classified into omnidirectional microphones and directional microphones. Currently, in an actual conference, sound is generally picked up through a preset omnidirectional microphone or a preset directional microphone; that is, the pickup microphone is relatively fixed. When the conference scale, the number or distribution of speakers, or the environmental noise changes, a fixed microphone has difficulty adapting to the actual acoustic scene. This results in poor pickup flexibility, and incomplete pickup or noise interference may occur, which degrades the pickup effect and the user experience. Existing schemes generally treat the pickup mode as a preset, static system parameter, and no effective mechanism has yet been proposed for dynamically adjusting the pickup strategy based on the real-time acoustic scene.

Disclosure of Invention

The embodiments of the application provide a sound pickup method and a sound pickup system, which are used to improve the adaptability of the pickup mode to dynamic acoustic scenes.
In one aspect, an embodiment of the present application provides a sound pickup method, including: collecting audio signals in a preset area based on an initial pickup mode; determining scene features in the preset area according to the audio signal, wherein the scene features are used for representing the acoustic environment of the preset area; selecting a target pickup mode based on the scene features according to a preset matching rule; and if the initial pickup mode is different from the target pickup mode, switching to the target pickup mode and collecting the audio signals in the preset area.

In one possible design, the scene features include at least one of the number of sound sources, the distribution position of the sound sources, and the signal-to-noise ratio of the preset area.

In one possible design, the target pickup mode is an omnidirectional pickup mode, a directional pickup mode, or a hybrid pickup mode combining omnidirectional and directional pickup; the selecting a target pickup mode according to a preset matching rule includes: if the signal-to-noise ratio is higher than a first threshold value and a plurality of sound sources with different distribution positions exist, selecting the omnidirectional pickup mode; if the signal-to-noise ratio is lower than a second threshold value, selecting the directional pickup mode; and if the signal-to-noise ratio is between the first threshold value and the second threshold value and the number of sound sources is less than a third threshold value or the sound sources are in a moving state, selecting the hybrid pickup mode.

In one possible design, the determining the scene features in the preset area includes: determining the number and/or distribution position of sound sources from the audio signal based on a spatial spectrum algorithm.
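
The preset matching rule over signal-to-noise ratio, source count, source positions, and source movement described above can be sketched as follows. This is only an illustration under stated assumptions: the source gives no numeric thresholds, so the values below are hypothetical, as are the names `SceneFeatures`, `PickupMode`, and `select_target_mode`. The source also does not say what happens when no rule matches; the sketch assumes the current mode is kept.

```python
from dataclasses import dataclass
from enum import Enum


class PickupMode(Enum):
    OMNI = "omnidirectional"
    DIRECTIONAL = "directional"
    HYBRID = "hybrid"


@dataclass
class SceneFeatures:
    snr_db: float                 # signal-to-noise ratio of the preset area
    num_sources: int              # number of detected sound sources
    distinct_positions: int       # number of distinct source positions
    sources_moving: bool = False  # whether a source is in a moving state


# Hypothetical values; the source names the thresholds but gives no numbers.
FIRST_SNR_DB = 20.0   # "first threshold"
SECOND_SNR_DB = 10.0  # "second threshold"
THIRD_SOURCES = 3     # "third threshold"


def select_target_mode(f: SceneFeatures, current: PickupMode) -> PickupMode:
    """Apply the three matching rules in the order the text lists them."""
    # High SNR and several sources at different positions: omnidirectional.
    if f.snr_db > FIRST_SNR_DB and f.num_sources > 1 and f.distinct_positions > 1:
        return PickupMode.OMNI
    # Low SNR: directional.
    if f.snr_db < SECOND_SNR_DB:
        return PickupMode.DIRECTIONAL
    # SNR between the thresholds (endpoints included, by assumption) and
    # either few sources or moving sources: hybrid.
    in_band = SECOND_SNR_DB <= f.snr_db <= FIRST_SNR_DB
    if in_band and (f.num_sources < THIRD_SOURCES or f.sources_moving):
        return PickupMode.HYBRID
    # Assumption: no rule matched, so keep the current mode.
    return current
```

A caller would compare the returned mode with the mode currently in use and switch only when the two differ, mirroring the final step of the method.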
In one possible design, the determining the scene features in the preset area includes: acquiring an image of the preset area; and carrying out image recognition on the image to determine the distribution position of the sound source.

In one possible design, the determining the scene features in the preset area further includes: determining first position information of a sound source based on a spatial spectrum algorithm according to the audio signal; determining second position information of the sound source according to the image recognition; and fusing the first position information and the second position information to obtain the distribution position of the sound source.

In one possible design, the determining the scene features in the preset area includes: determining a pickup subarea covered by the directional microphone based on the audio signal acquired by the directional microphone; and determining the number of sound sources in the pickup subarea according to a comparison result of the audio signal energy in the pickup subarea and a threshold value and combining a preset mapping relation between the size of the area and the number of peopl