CN-121983018-A - Anti-interference optimization method and system for live broadcast audio of complex sound field

Abstract

The invention provides a method and a system for anti-interference optimization of live broadcast audio in a complex sound field, relating to the technical field of audio signal processing. Based on the azimuth of the interference source and its time-frequency evolution track, time-varying filter coefficients are dynamically generated, the interference frequency band in the time-frequency spectrum is adaptively attenuated, and purified audio is finally output through inverse transformation. The invention achieves accurate localization and dynamic suppression of spatial interference sources in a live broadcast, and effectively improves the clarity and interference resistance of the live broadcast audio.

Inventors

BAO TIANYANG
ZHOU SHICHAO

Assignees

北京弘晟研学教育咨询有限公司

Dates

Publication Date: 20260505
Application Date: 20260317

Claims (10)

1. A method for anti-interference optimization of live broadcast audio in a complex sound field, characterized by comprising the following steps: collecting multichannel audio data of a teaching live broadcast scene, and performing time-frequency transformation on the multichannel audio data to obtain time-frequency spectrum data; calculating the phase difference of each channel in the time-frequency spectrum data, and determining the spatial azimuth angle of each time-frequency point according to the phase difference to obtain time-frequency spatial distribution data; counting the energy accumulation distribution of each spatial azimuth interval in the time-frequency spatial distribution data, comparing the energy accumulation distribution with a preset teaching sound source spatial region, and identifying interference source azimuths outside the teaching sound source spatial region to obtain an interference source spatial positioning result; according to the interference source spatial positioning result, extracting the set of time-frequency points belonging to the interference source azimuth from the time-frequency spatial distribution data, establishing an interference source time-frequency evolution track, and dynamically classifying and marking the time-frequency points in the time-frequency spectrum data according to the interference source time-frequency evolution track to obtain time-frequency classification labels; generating time-varying filter coefficients synchronized with the evolution of the interference source according to the time-frequency classification labels and the time-varying characteristics of the interference source time-frequency evolution track, and applying the time-varying filter coefficients to the time-frequency spectrum data to adaptively attenuate the interference frequency band, obtaining filtered time-frequency spectrum data; and performing inverse time-frequency transformation on the filtered time-frequency spectrum data to generate an interference-cancelled audio output signal.
2. The method of claim 1, wherein collecting the multichannel audio data of the teaching live broadcast scene and performing time-frequency transformation on the multichannel audio data to obtain the time-frequency spectrum data comprises: synchronously acquiring audio signals of the teaching live broadcast scene through a plurality of spatially distributed audio acquisition units, recording the spatial position coordinates of each audio acquisition unit, and establishing a correspondence between the audio acquisition units and the spatial position coordinates to obtain multichannel audio data with spatial position identifiers; windowing and framing the multichannel audio data, dividing the continuous audio data into a plurality of temporally overlapping audio frames, and performing a short-time Fourier transform on each audio frame to obtain a frequency-domain complex spectrum corresponding to each audio frame; and extracting the amplitude value and phase value of each frequency point in the frequency-domain complex spectrum, and fusing the amplitude values and phase values with the time positions and frequency positions corresponding to the audio frames and with the spatial position coordinates in the channel phase space data to obtain time-frequency spectrum data containing time-frequency amplitude, time-frequency phase and spatial position information.
3. The method of claim 1, wherein calculating the phase difference of each channel in the time-frequency spectrum data and determining the spatial azimuth angle of each time-frequency point according to the phase difference to obtain the time-frequency spatial distribution data comprises: extracting the complex spectrum value of each channel at each time-frequency point from the time-frequency spectrum data, and performing phase calculation on the complex spectrum values to obtain the phase angle value of each channel at each time-frequency point; selecting a channel with known spatial position coordinates as a phase reference channel, and calculating the difference in phase angle value between each non-reference channel and the phase reference channel at the same time-frequency point to obtain inter-channel phase difference data comprising time-frequency point identifiers and phase difference values; extracting the phase difference value of each time-frequency point from the inter-channel phase difference data, calculating the propagation path difference of the sound wave from the sound source to different audio acquisition units by combining the spatial baseline distance between the audio acquisition units with the sound propagation speed, and calculating the spatial incidence angle value of the sound wave relative to the normal direction of the audio acquisition unit array according to the geometric relationship between the propagation path difference and the audio acquisition unit array; and binding the spatial incidence angle value of each time-frequency point to the corresponding time-frequency point identifier in the inter-channel phase difference data, constructing a correspondence between time-frequency point identifiers and spatial incidence angle values to obtain the time-frequency spatial distribution data.
4. The method of claim 1, wherein counting the energy accumulation distribution of each spatial azimuth interval in the time-frequency spatial distribution data, comparing the energy accumulation distribution with the preset teaching sound source spatial region, and identifying interference source azimuths outside the teaching sound source spatial region to obtain the interference source spatial positioning result comprises: extracting the spatial incidence angle value and the time-frequency spectrum amplitude value of each time-frequency point from the time-frequency spatial distribution data, dividing the spatial incidence angle values into a plurality of spatial azimuth intervals according to a preset angular spacing, and accumulating the time-frequency spectrum amplitude values of all time-frequency points in each spatial azimuth interval to obtain the energy accumulation distribution of each spatial azimuth interval; constructing a time-frequency energy matrix from the energy accumulation distribution of each spatial azimuth interval, calculating the energy variance of the time-frequency energy matrix in the time dimension and its energy entropy in the frequency dimension, generating a time-frequency stability index for each spatial azimuth interval, and marking each spatial azimuth interval in the energy accumulation distribution with its time-frequency stability index value; obtaining the spatial azimuth angle range corresponding to the preset teaching sound source spatial region, comparing the spatial positions in the energy accumulation distribution marked with time-frequency stability index values against the preset teaching sound source spatial region, and identifying as an interference source azimuth each spatial azimuth interval in the energy accumulation distribution that lies outside the spatial azimuth angle range and whose time-frequency stability index value is lower than a preset stability threshold; and extracting the spatial azimuth identifier and the time-frequency stability index value of each interference source azimuth to obtain the interference source spatial positioning result.
5. The method of claim 1, wherein extracting the set of time-frequency points belonging to the interference source azimuth from the time-frequency spatial distribution data according to the interference source spatial positioning result, establishing the interference source time-frequency evolution track, and dynamically classifying and marking the time-frequency points in the time-frequency spectrum data according to the interference source time-frequency evolution track to obtain the time-frequency classification labels comprises: according to the spatial azimuth identifier in the interference source spatial positioning result, retrieving the time-frequency points in the time-frequency spatial distribution data that match the spatial azimuth identifier to obtain the set of time-frequency points belonging to the interference source azimuth; sorting the time-frequency points in the set by time coordinate, calculating the frequency coordinate difference between time-frequency points at adjacent moments, sequentially connecting adjacent time-frequency points whose frequency coordinate difference is smaller than a preset continuity threshold to form time-frequency evolution paths, calculating the frequency change rate of each time-frequency evolution path, and aggregating all time-frequency evolution paths to establish the interference source time-frequency evolution track; extracting the termination frequency coordinate and the frequency change rate of each time-frequency evolution path in the interference source time-frequency evolution track, calculating a frequency extension amount, adding the frequency extension amount to the termination frequency coordinate to obtain a predicted frequency coordinate, locating the time-frequency point corresponding to the predicted frequency coordinate in the time-frequency spectrum data, and marking it as a predicted interference time-frequency point; and extracting the time-frequency points covered by the interference source time-frequency evolution track as historical interference time-frequency points, merging the historical interference time-frequency points with the predicted interference time-frequency points, marking the merged time-frequency points as interference time-frequency points, and marking the remaining time-frequency points as effective time-frequency points to obtain the time-frequency classification labels.
6. The method of claim 1, wherein generating the time-varying filter coefficients synchronized with the evolution of the interference source according to the time-frequency classification labels and the time-varying characteristics of the interference source time-frequency evolution track, and applying the time-varying filter coefficients to the time-frequency spectrum data to adaptively attenuate the interference frequency band and obtain the filtered time-frequency spectrum data comprises: extracting, from the interference source time-frequency evolution track, the frequency change rate of each time-frequency evolution path within a continuous time window as a time-varying feature, and extracting each time-frequency evolution path whose time-varying feature exceeds a preset rate threshold as a jump interference path; extracting, from the time-frequency classification labels, the frequency coordinates of the interference time-frequency points within the time window corresponding to the jump interference path, calculating the frequency drift amount corresponding to the frequency coordinates according to the time-varying feature of the jump interference path, and adding the frequency drift amount to the frequency coordinates to obtain a predicted interference frequency range; extracting a first amplitude value sequence of the time-frequency points in the predicted interference frequency range and a second amplitude value sequence of the interference time-frequency points on the jump interference path, calculating a correlation value between the first amplitude value sequence and the second amplitude value sequence, and selecting, according to the correlation value, the time-frequency points with a consistent amplitude evolution trend to form a confirmed interference coverage area; extracting the spectral feature difference between the time-frequency points inside the confirmed interference coverage area and the adjacent time-frequency points outside its boundary, setting the suppression strength for the time-frequency points inside the confirmed interference coverage area according to the spectral feature difference, and generating the time-varying filter coefficients synchronized with the evolution of the interference source; and multiplying each time-varying filter coefficient by the complex value of the corresponding time-frequency point in the time-frequency spectrum data to obtain the filtered time-frequency spectrum data.
7. The method of claim 1, wherein performing inverse time-frequency transformation on the filtered time-frequency spectrum data to generate the interference-cancelled audio output signal comprises: extracting the complex values of all time-frequency points from the filtered time-frequency spectrum data, and organizing the complex values by frequency dimension and time dimension to form a two-dimensional complex matrix; performing an inverse Fourier transform on the two-dimensional complex matrix along the frequency dimension, converting the frequency-domain complex values into time-domain real values, and generating a plurality of time-domain audio frames; extracting the length of the overlapping region between adjacent time-domain audio frames, and assigning a weighting coefficient to each sample point in the overlapping region according to the length of the overlapping region to perform weighted overlap-add; and splicing the weighted and overlapped sample points of the overlapping regions with the sample points of the non-overlapping regions of each time-domain audio frame in time order to generate the interference-cancelled audio output signal.
8. A system for anti-interference optimization of live broadcast audio in a complex sound field, for implementing the method of any one of claims 1 to 7, comprising: an audio acquisition unit, used for collecting multichannel audio data of a teaching live broadcast scene and performing time-frequency transformation on the multichannel audio data to obtain time-frequency spectrum data; a time-frequency conversion unit, used for calculating the phase difference of each channel in the time-frequency spectrum data and determining the spatial azimuth angle of each time-frequency point according to the phase difference to obtain time-frequency spatial distribution data; a space positioning unit, used for counting the energy accumulation distribution of each spatial azimuth interval in the time-frequency spatial distribution data, comparing the energy accumulation distribution with a preset teaching sound source spatial region, and identifying interference source azimuths outside the teaching sound source spatial region to obtain an interference source spatial positioning result; an interference identification unit, used for extracting the set of time-frequency points belonging to the interference source azimuth from the time-frequency spatial distribution data according to the interference source spatial positioning result, establishing an interference source time-frequency evolution track, and dynamically classifying and marking the time-frequency points in the time-frequency spectrum data according to the interference source time-frequency evolution track to obtain time-frequency classification labels; a track analysis unit, used for generating time-varying filter coefficients synchronized with the evolution of the interference source according to the time-frequency classification labels and the time-varying characteristics of the interference source time-frequency evolution track, and applying the time-varying filter coefficients to the time-frequency spectrum data to adaptively attenuate the interference frequency band and obtain filtered time-frequency spectrum data; and a time-frequency marking unit, used for performing inverse time-frequency transformation on the filtered time-frequency spectrum data to generate an interference-cancelled audio output signal.
9. An electronic device, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 7.
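
Illustrative note on claim 3: the chain from inter-channel phase difference to propagation path difference to incidence angle can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the applicant's implementation: it assumes a single two-microphone pair, a far-field plane wave, and a sound speed of 343 m/s. The function name `incidence_angle` and all parameter names are hypothetical and do not appear in the patent text.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not specified in the claims


def incidence_angle(phase_diff_rad, freq_hz, baseline_m, c=SPEED_OF_SOUND):
    """Estimate the incidence angle (radians, relative to the array normal)
    for one time-frequency point of a two-microphone pair.

    Assumes a far-field plane wave, so that
        path_diff = baseline * sin(theta).
    """
    if freq_hz <= 0 or baseline_m <= 0:
        raise ValueError("frequency and baseline must be positive")
    # Wrap the measured phase difference into (-pi, pi].
    wrapped = math.atan2(math.sin(phase_diff_rad), math.cos(phase_diff_rad))
    # Propagation path difference: delta_d = c * delta_phi / (2 * pi * f).
    path_diff = c * wrapped / (2.0 * math.pi * freq_hz)
    # Clamp to [-1, 1] so measurement noise cannot push asin out of its domain.
    ratio = max(-1.0, min(1.0, path_diff / baseline_m))
    return math.asin(ratio)


# Example: a 1 kHz component arriving 30 degrees off the array normal,
# observed by two microphones 5 cm apart.
true_angle = math.radians(30.0)
phase = 2.0 * math.pi * 1000.0 * 0.05 * math.sin(true_angle) / SPEED_OF_SOUND
estimate_deg = math.degrees(incidence_angle(phase, 1000.0, 0.05))
```

Under these assumptions the estimate is only unambiguous below the spatial aliasing frequency c / (2 × baseline), since above it the wrapped phase difference no longer maps to a unique path difference.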

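Illustrative note on claim 5: the path-linking and frequency-prediction steps can be sketched as follows. This is a simplified sketch under stated assumptions, not the claimed implementation: time-frequency points are represented as hypothetical `(frame_index, frequency_bin)` tuples, "adjacent moments" is taken to mean consecutive frames, and the frequency extension amount is assumed to be the frequency change rate multiplied by a prediction horizon in frames. All function names are illustrative.

```python
def link_paths(points, continuity_threshold):
    """Connect points in consecutive frames whose frequency-bin difference
    is below the continuity threshold into time-frequency evolution paths."""
    paths = []
    for t, f in sorted(points):
        for path in paths:
            last_t, last_f = path[-1]
            if t == last_t + 1 and abs(f - last_f) < continuity_threshold:
                path.append((t, f))
                break
        else:
            # No existing path can be extended: start a new one.
            paths.append([(t, f)])
    return paths


def frequency_change_rate(path):
    """Average frequency change per frame over the whole path."""
    (t0, f0), (t1, f1) = path[0], path[-1]
    return 0.0 if t1 == t0 else (f1 - f0) / (t1 - t0)


def predict_next(path, horizon=1):
    """Predicted point: termination frequency plus the extension amount
    (assumed here to be rate * horizon)."""
    t_end, f_end = path[-1]
    extension = frequency_change_rate(path) * horizon
    return (t_end + horizon, round(f_end + extension))


def interference_points(points, continuity_threshold, horizon=1):
    """Union of historical (on-path) and predicted interference points;
    every other time-frequency point would be labelled as effective."""
    paths = link_paths(points, continuity_threshold)
    historical = {p for path in paths for p in path}
    predicted = {predict_next(path, horizon) for path in paths}
    return historical | predicted


# Example: a rising interference tone plus one isolated point.
observed = [(0, 10), (1, 12), (2, 14), (1, 40)]
paths = link_paths(observed, continuity_threshold=3)
labels = interference_points(observed, continuity_threshold=3)
```

In this example the rising tone forms one three-point path with a rate of 2 bins per frame, so bin 16 in frame 3 is flagged in advance, while the isolated point forms its own single-point path with a rate of zero.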
Description

Anti-interference optimization method and system for live broadcast audio of complex sound field

Technical Field

The invention relates to audio signal processing technology, and in particular to a method and system for anti-interference optimization of live broadcast audio in a complex sound field.

Background

In existing live teaching systems, audio processing typically relies on conventional noise reduction techniques. A common practice is to capture the teacher's speech with a single-channel microphone or a fixed-beamforming microphone array and to combine it with a noise suppression algorithm, such as spectral subtraction or statistical-model-based Wiener filtering, to reduce ambient noise. These methods rely mainly on differences in the spectral characteristics of the audio signal, achieving speech enhancement by estimating the noise spectrum and subtracting it from the mixed signal. Some more advanced schemes use the spatial information of dual microphones to perform simple sound source localization, distinguishing a frontal target sound source from part of the ambient noise, and then apply directional pickup or fixed-direction filtering.

However, these conventional methods have significant drawbacks. Interference sources in a live teaching scene, such as students suddenly asking questions, temporary noise from people gathering outside the classroom, and noise generated by moving objects, are often dynamic and uncertain in their spatial orientation. Conventional methods based on fixed noise spectrum estimation or static beam pointing have difficulty tracking the continuous evolution of these sources in space and time in real time.
This results either in excessive suppression of interfering sounds whose spectrum resembles the teacher's speech, causing speech distortion, or in a failure to track and suppress moving or intermittent interference sources, so that significant interfering components remain in the purified audio and degrade the clarity and immersion of the live lesson.

Disclosure of Invention

The embodiments of the invention provide a method and system for anti-interference optimization of live broadcast audio in a complex sound field, which can solve the above problems in the prior art.

In a first aspect of the embodiments of the invention, a method for anti-interference optimization of live broadcast audio in a complex sound field is provided, comprising: collecting multichannel audio data of a teaching live broadcast scene, and performing time-frequency transformation on the multichannel audio data to obtain time-frequency spectrum data; calculating the phase difference of each channel in the time-frequency spectrum data, and determining the spatial azimuth angle of each time-frequency point according to the phase difference to obtain time-frequency spatial distribution data; counting the energy accumulation distribution of each spatial azimuth interval in the time-frequency spatial distribution data, comparing the energy accumulation distribution with a preset teaching sound source spatial region, and identifying interference source azimuths outside the teaching sound source spatial region to obtain an interference source spatial positioning result; according to the interference source spatial positioning result, extracting the set of time-frequency points belonging to the interference source azimuth from the time-frequency spatial distribution data, establishing an interference source time-frequency evolution track, and dynamically classifying and marking the time-frequency points in the time-frequency spectrum data according to the interference source time-frequency evolution track to obtain time-frequency classification labels; generating time-varying filter coefficients synchronized with the evolution of the interference source according to the time-frequency classification labels and the time-varying characteristics of the interference source time-frequency evolution track, and applying the time-varying filter coefficients to the time-frequency spectrum data to adaptively attenuate the interference frequency band, obtaining filtered time-frequency spectrum data; and performing inverse time-frequency transformation on the filtered time-frequency spectrum data to generate an interference-cancelled audio output signal.

Collecting the multichannel audio data of the teaching live broadcast scene and performing time-frequency transformation on the multichannel audio data to obtain the time-frequency spectrum data comprises the following steps: synchronously acquiring audio signals of the teaching live broadcast scene through a plurality of spatially distributed audio acquisition units, recording the spatial position coordinates of each audio acquisition unit, and establishing a correspondence between the audio acquisition units and the spatial position coordinates to obtain multichannel audio data with spatial position identifiers; windowing and framing the multichannel audio data, dividing continuo