CN-121985251-A - Audio acquisition method, device, electronic equipment and computer readable medium

Abstract

The application discloses an audio acquisition method, an audio acquisition device, electronic equipment and a computer readable medium. The method is applied to electronic equipment that comprises an audio acquisition module. The audio acquisition module comprises a center point and a plurality of audio acquisition components arranged around the center point, the distance between each audio acquisition component and the center point is the same, and the reference lines of the acquisition areas corresponding to the audio acquisition components are parallel to each other. Audio in the pickup area can be collected without additional hardware equipment, which reduces the use cost and improves the convenience of audio acquisition.

Inventors

LI YATONG
LI RONGJIN
CHEN DONGPENG

Assignees

深圳市声扬科技有限公司

Dates

Publication Date: 20260505
Application Date: 20251222

Claims (15)

1. An audio acquisition method, characterized by being applied to electronic equipment, wherein the electronic equipment comprises an audio acquisition module, the audio acquisition module comprises a center point and a plurality of audio acquisition components arranged around the center point, the distance between each audio acquisition component and the center point is the same, and the reference lines of the acquisition areas corresponding to the audio acquisition components are parallel to each other, the method comprising: acquiring the sampled audio signal collected by each audio acquisition component; acquiring a rotation angle and a vertical distance of each audio acquisition component, wherein the rotation angle represents the angle formed between a reference direction and the line connecting the audio acquisition component to the center point, the reference direction comprises a direction pointing from the center point to a reference point on the circumference formed by the plurality of audio acquisition components, and the vertical distance represents the vertical length between the audio acquisition component and the center point; and performing audio recognition, through an audio acquisition model, on the sampled audio signal collected by each audio acquisition component and on the rotation angle and the vertical distance of the audio acquisition component, to obtain a target audio signal of the pickup area corresponding to the audio acquisition module.
2. The method of claim 1, wherein, before performing audio recognition through the audio acquisition model on the sampled audio signal collected by each audio acquisition component and on the rotation angle and the vertical distance of the audio acquisition component to obtain the target audio signal of the pickup area corresponding to the audio acquisition module, the method further comprises: acquiring a training data set and a reference data set, wherein the training data set comprises a plurality of sets of training data, each set of training data comprises a training audio signal received by each audio acquisition component, a training rotation angle and a training vertical distance, the reference data set comprises reference data corresponding to each set of training data, and each reference data comprises a reference audio signal corresponding to each audio acquisition component; performing audio recognition on each set of training data through an initial model to obtain an initial audio signal corresponding to each set of training data; training the initial model based on a first difference between the initial audio signal and the reference audio signal corresponding to the set of training data, so as to reduce the first difference; and taking the trained initial model as the audio acquisition model.
3. The method according to claim 2, wherein, in the case where the sound source is in the pickup area, the reference audio signal is the audio signal emitted by the sound source; and in the case where the sound source is not in the pickup area, the reference audio signal is a mute audio signal.
4. The method of claim 3, wherein the reference data further comprises a reference tag used to characterize whether the sound source is in the pickup area; the performing audio recognition on each set of training data through an initial model to obtain an initial audio signal corresponding to each set of training data comprises: performing audio recognition on each set of training data through the initial model to obtain an initial audio signal and an initial tag corresponding to each set of training data; and the training the initial model based on a first difference between the initial audio signal and the reference audio signal corresponding to the set of training data to reduce the first difference comprises: training the initial model based on the first difference and a second difference to reduce the first difference and the second difference, wherein the second difference characterizes the difference between the initial tag and the reference tag corresponding to the set of training data.
5. The method according to claim 2, wherein the performing audio recognition on each set of training data through the initial model to obtain an initial audio signal corresponding to each set of training data comprises: acquiring a real spectrum and an imaginary spectrum corresponding to the training audio signal received by each audio acquisition component; determining an amplitude spectrum feature vector of each set of training data based on the real spectrum and the imaginary spectrum corresponding to the training audio signals of the set; calculating, based on the real spectrum and the imaginary spectrum corresponding to the training audio signals of each set, phase difference feature vectors between the channels corresponding to the audio acquisition components in the set of training data; constructing an angle distance feature vector corresponding to the set of training data based on the training rotation angle and the training vertical distance corresponding to each audio acquisition component; and calculating an initial audio signal corresponding to the set of training data based on the real spectrum, the imaginary spectrum, the amplitude spectrum feature vector, the phase difference feature vector and the angle distance feature vector.
6. The method of claim 5, wherein determining the amplitude spectrum feature vector of each set of training data based on the real spectrum and the imaginary spectrum corresponding to the training audio signals of the set comprises: constructing an initial amplitude spectrum feature vector of the set of training data based on the square root of the sum of the squares of the real spectrum and the imaginary spectrum corresponding to the training audio signals of the set; and performing amplitude spectrum compression on the initial amplitude spectrum feature vector to obtain the amplitude spectrum feature vector of the set of training data.
7. The method of claim 5, wherein calculating the phase difference feature vectors between the channels corresponding to the audio acquisition components in the set of training data based on the real spectrum and the imaginary spectrum corresponding to the training audio signals of each set comprises: calculating an initial phase spectrum of each set of training data based on the real spectrum and the imaginary spectrum corresponding to the training audio signals of the set; and determining the phase difference feature vectors between the channels corresponding to the audio acquisition components based on the initial phase spectrum.
8. The method of claim 7, wherein determining the phase difference feature vectors between the channels corresponding to the audio acquisition components based on the initial phase spectrum comprises: representing, through the initial phase spectrum, initial phase difference feature vectors between the channels corresponding to the audio acquisition components in trigonometric form; and performing phase encoding processing on the initial phase difference feature vectors to obtain the phase difference feature vectors between the channels corresponding to the audio acquisition components.
9. The method of claim 5, wherein constructing the angle distance feature vector corresponding to the set of training data based on the training rotation angle and the training vertical distance corresponding to each audio acquisition component comprises: splicing the training rotation angles corresponding to the audio acquisition components to obtain a rotation angle feature vector; splicing the training vertical distances corresponding to the audio acquisition components to obtain a vertical distance feature vector; and splicing the rotation angle feature vector and the vertical distance feature vector to obtain the angle distance feature vector corresponding to the set of training data.
10. The method of claim 5, wherein calculating the initial audio signal corresponding to the set of training data based on the real spectrum, the imaginary spectrum, the amplitude spectrum feature vector, the phase difference feature vector and the angle distance feature vector comprises: splicing the amplitude spectrum feature vector, the phase difference feature vector and the angle distance feature vector to obtain a first mixed feature vector; performing encoding and decoding processing on the first mixed feature vector to obtain mask information corresponding to each audio acquisition component; performing noise reduction processing on the corresponding real spectrum and the corresponding imaginary spectrum based on the mask information to obtain a noise-reduced real spectrum and a noise-reduced imaginary spectrum, respectively; and calculating the initial audio signal corresponding to the set of training data based on the noise-reduced real spectrum and the noise-reduced imaginary spectrum.
11. The method of claim 10, wherein calculating the initial audio signal corresponding to the set of training data based on the noise-reduced real spectrum and the noise-reduced imaginary spectrum comprises: summing and then averaging, through a pooling layer, the noise-reduced real spectra of the multiple channels in the set of training data to obtain a single-channel real spectrum; summing and then averaging, through a pooling layer, the noise-reduced imaginary spectra of the multiple channels in the set of training data to obtain a single-channel imaginary spectrum; and performing an inverse Fourier transform based on the single-channel real spectrum and the single-channel imaginary spectrum to obtain the initial audio signal corresponding to the set of training data.
12. The method of claim 5, wherein acquiring the real spectrum and the imaginary spectrum corresponding to the training audio signal received by each audio acquisition component comprises: performing a Fourier transform on the training audio in each set of training audio signals to obtain a first expression; transforming the first expression through Euler's formula to obtain a second expression; and taking the real part of the second expression as the real spectrum corresponding to the training audio and the imaginary part of the second expression as the imaginary spectrum corresponding to the training audio.
13. An audio acquisition device, characterized by being applied to electronic equipment, wherein the electronic equipment comprises an audio acquisition module, the audio acquisition module comprises a center point and a plurality of audio acquisition components arranged around the center point, the distance between each audio acquisition component and the center point is the same, and the reference lines of the acquisition areas corresponding to the audio acquisition components are parallel to each other, the device comprising: a first acquisition unit, configured to acquire the sampled audio signal collected by each audio acquisition component; a second acquisition unit, configured to acquire a rotation angle and a vertical distance of each audio acquisition component, wherein the rotation angle represents the angle formed between a reference direction and the line connecting the audio acquisition component to the center point, the reference direction comprises a direction pointing from the center point to a reference point on the circumference formed by the plurality of audio acquisition components, and the vertical distance represents the vertical length between the audio acquisition component and the center point; and a recognition unit, configured to perform audio recognition, through an audio acquisition model, on the sampled audio signal collected by each audio acquisition component and on the rotation angle and the vertical distance of the audio acquisition component, to obtain a target audio signal of the pickup area corresponding to the audio acquisition module.
14. An electronic device, comprising: one or more processors; a memory; and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method of any of claims 1-12.
15. A computer readable storage medium, characterized in that the readable storage medium has stored therein a program code, which is callable by a processor for executing the method according to any one of claims 1-12.
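
As an illustration of the feature construction and signal reconstruction recited in claims 5 to 12, the following Python sketch uses NumPy. It is not the claimed implementation. It assumes a single Fourier transform over each whole channel rather than a framed transform, a power-law exponent of 0.3 for the amplitude spectrum compression, channel 0 as the reference for phase differences, and a caller-supplied mask standing in for the output of the encoding and decoding processing. The function names `extract_features` and `reconstruct` are hypothetical, and the phase encoding processing of claim 8 is not detailed in the claims and is therefore omitted.

```python
import numpy as np


def extract_features(frames, angles_deg, distances, power=0.3):
    """Build the first mixed feature vector from multi-channel audio.

    frames: array of shape (channels, n_samples), one row per audio
    acquisition component. angles_deg and distances hold one rotation
    angle and one vertical distance per component.
    """
    frames = np.asarray(frames, dtype=float)

    # Claim 12: Fourier transform, then split into real and imaginary spectra.
    spectrum = np.fft.rfft(frames, axis=-1)
    real, imag = spectrum.real, spectrum.imag

    # Claim 6: square root of the sum of squares, then compression.
    # The power-law form and the exponent are assumptions.
    magnitude = np.sqrt(real ** 2 + imag ** 2)
    magnitude_feat = magnitude ** power

    # Claims 7 and 8: phase spectrum, then inter-channel phase differences
    # expressed in trigonometric form (cosine and sine).
    # Using channel 0 as the reference channel is an assumption.
    phase = np.arctan2(imag, real)
    phase_diff = phase[1:] - phase[:1]
    phase_feat = np.concatenate([np.cos(phase_diff), np.sin(phase_diff)], axis=0)

    # Claim 9: splice rotation angles, splice vertical distances, splice both.
    angle_feat = np.deg2rad(np.asarray(angles_deg, dtype=float))
    distance_feat = np.asarray(distances, dtype=float)
    angle_distance_feat = np.concatenate([angle_feat, distance_feat])

    # Claim 10: splice all three feature vectors into one mixed vector.
    mixed = np.concatenate(
        [magnitude_feat.ravel(), phase_feat.ravel(), angle_distance_feat]
    )
    return real, imag, mixed


def reconstruct(real, imag, mask, n_samples):
    """Apply a mask, pool the channels, and return a time-domain signal."""
    # Claim 10: noise reduction of the real and imaginary spectra by the mask.
    denoised_real = real * mask
    denoised_imag = imag * mask

    # Claim 11: sum and average over channels (mean pooling).
    mono = denoised_real.mean(axis=0) + 1j * denoised_imag.mean(axis=0)

    # Claim 11: inverse Fourier transform back to the time domain.
    return np.fft.irfft(mono, n=n_samples)
```

With an all-ones mask the sketch reduces to averaging the input channels, which gives a simple sanity check that the forward and inverse transforms are consistent. In the claimed method the mask would instead come from the encoding and decoding processing applied to the mixed feature vector.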

Description

Audio acquisition method, device, electronic equipment and computer readable medium

Technical Field

The present application relates to the field of audio technologies, and in particular to an audio acquisition method, an audio acquisition device, electronic equipment, and a computer readable medium.

Background

With the development of electronic information technology, audio signals in the pickup area of a microphone can be collected. However, current approaches to collecting audio signals in the pickup area are costly and inconvenient.

Disclosure of Invention

The application provides an audio acquisition method, an audio acquisition device, electronic equipment and a computer readable medium to remedy the above defects.

In a first aspect, an embodiment of the present application provides an audio acquisition method applied to electronic equipment. The electronic equipment includes an audio acquisition module, the audio acquisition module includes a center point and a plurality of audio acquisition components arranged around the center point, the distance between each audio acquisition component and the center point is the same, and the reference lines of the acquisition areas corresponding to the audio acquisition components are parallel to each other. The method includes: obtaining the sampled audio signal collected by each audio acquisition component; obtaining a rotation angle and a vertical distance of each audio acquisition component, where the rotation angle represents the angle formed between a reference direction and the line connecting the audio acquisition component to the center point, the reference direction includes a direction pointing from the center point to a reference point on the circumference formed by the plurality of audio acquisition components, and the vertical distance represents the vertical length between the audio acquisition component and the center point; and performing audio recognition, through an audio acquisition model, on the sampled audio signal collected by each audio acquisition component and on the rotation angle and the vertical distance of the audio acquisition component, to obtain a target audio signal of the pickup area corresponding to the audio acquisition module.

Optionally, in some embodiments, before performing audio recognition on the sampled audio signal collected by each audio acquisition component and on the rotation angle and the vertical distance of the audio acquisition component to obtain the target audio signal of the pickup area corresponding to the audio acquisition module, the method further includes: obtaining a training data set and a reference data set, where the training data set includes a plurality of sets of training data, each set of training data includes a training audio signal received by each audio acquisition component, a training rotation angle, and a training vertical distance, the reference data set includes reference data corresponding to each set of training data, and each reference data includes a reference audio signal corresponding to each audio acquisition component; performing audio recognition on each set of training data through an initial model to obtain an initial audio signal corresponding to each set of training data; training the initial model based on a first difference between the initial audio signal and the reference audio signal corresponding to the set of training data, so as to reduce the first difference; and taking the trained initial model as the audio acquisition model.

Optionally, in some embodiments, the reference audio signal is the audio signal emitted by the sound source when the sound source is in the pickup area, and the reference audio signal is a mute audio signal when the sound source is not in the pickup area.
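
The choice of reference audio signal and the first difference described above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the source leaves the form of the first difference unspecified, so the mean squared error used here is an assumption, the mute signal is modeled as an all-zero array, and the function names `reference_signal` and `first_difference` are hypothetical.

```python
import numpy as np


def reference_signal(source_audio, in_pickup_area):
    """Return the training target for one audio acquisition component.

    The target is the sound source's own audio when the source is inside
    the pickup area, and a mute (all-zero) signal otherwise.
    """
    source_audio = np.asarray(source_audio, dtype=float)
    return source_audio if in_pickup_area else np.zeros_like(source_audio)


def first_difference(initial_audio, reference_audio):
    """Measure how far the model output is from the reference signal.

    Mean squared error is an assumed choice; training would adjust the
    initial model so that this value decreases.
    """
    initial_audio = np.asarray(initial_audio, dtype=float)
    reference_audio = np.asarray(reference_audio, dtype=float)
    return float(np.mean((initial_audio - reference_audio) ** 2))
```

Under this sketch, a model that outputs silence for a source outside the pickup area incurs zero first difference, which matches the intent of training the model to suppress audio from outside the pickup area.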
Optionally, in some embodiments, the reference data further includes a reference tag used to represent whether the sound source is in the pickup area. Performing audio recognition on each set of training data through the initial model to obtain an initial audio signal corresponding to each set of training data includes performing audio recognition on each set of training data through the initial model to obtain an initial audio signal and an initial tag corresponding to each set of training data. Training the initial model based on the first difference between the initial audio signal and the reference audio signal corresponding to the set of training data to reduce the first difference includes training the initial model based on the first difference and a second difference to reduce the first difference and the second difference, wherein the second difference represents the difference between the initial tag and the reference tag corresponding to the set of training data.

Optionally, in some embodiments, performing audio recognition on each set of training data through the initial model to obtain an initial audio signal corresponding to each set of training data includes obtaining