CN-122024749-A - Audio digital signal processing method and system based on SOC (system on chip)


Abstract

The invention relates to the technical field of audio digital signal processing and discloses an audio digital signal processing method and system based on an SOC chip. The method comprises the following steps: S1, synchronously collecting multi-channel microphone signals, an audio playing reference signal and attitude data by using the SOC chip system clock, and marking the multi-channel microphone signals, the audio playing reference signal and the attitude data with time stamps; S2, generating a known characteristic sequence in the audio playing link where the audio playing reference signal is located, adjusting the amplitude gain of the known characteristic sequence, modulating the sequence into the 18 kHz to 20 kHz frequency band, and then mixing it into the original audio signal for output; and S3, carrying out normalized despreading processing on the multi-channel microphone signals. By combining physical attitude sensing with acoustic channel prediction and correction, the technical scheme pre-compensates for jumps in the acoustic path and responds quickly to changes in wearing posture and head rotation.

Inventors

SHI LEI
FAN GUANGFU

Assignees

北京融讯互联科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260320

Claims (10)

1. An audio digital signal processing method based on an SOC chip, characterized by comprising the following steps:
S1, synchronously acquiring multi-channel microphone signals, an audio playing reference signal and attitude data by using the SOC chip system clock, and marking the multi-channel microphone signals, the audio playing reference signal and the attitude data with time stamps;
S2, generating a known characteristic sequence in the audio playing link where the audio playing reference signal is located, adjusting the amplitude gain of the known characteristic sequence, modulating the sequence into the 18 kHz to 20 kHz frequency band, and then mixing it into the original audio signal for output;
S3, carrying out normalized despreading processing on the multi-channel microphone signals, and extracting a channel impulse response, a clock drift compensation value and a channel uncertainty based on the known characteristic sequence;
S4, establishing an acoustic channel state transition equation in combination with the attitude data, and carrying out prediction processing and correction processing on the channel impulse response by using the attitude data;
S5, driving an uplink echo cancellation algorithm, a local feedback suppression algorithm and an external privacy suppression algorithm based on the corrected channel impulse response, and allocating the upper limit of the suppression gain of each algorithm according to the channel uncertainty;
S6, monitoring a residual echo index output by the uplink echo cancellation algorithm, an environmental wind noise index and the channel uncertainty, and triggering an algorithm parameter update when the residual echo index, the environmental wind noise index or the channel uncertainty exceeds a preset threshold; and
S7, carrying out nonlinear residual component compensation on the audio signal obtained after the multi-channel microphone signals are processed by the uplink echo cancellation algorithm, the local feedback suppression algorithm and the external privacy suppression algorithm, and outputting the compensated audio signal.
2. The audio digital signal processing method according to claim 1, wherein in S1, the time-stamped multi-channel microphone signals, audio playing reference signal and attitude data are transmitted to a memory buffer through a direct memory access mechanism inside the SOC chip; marking the multi-channel microphone signals, the audio playing reference signal and the attitude data with time stamps in S1 comprises calling a global timer inside the SOC chip to mark the audio sample sequence with sequence numbers, and establishing an interpolation mapping between the attitude data sampling frequency and the audio sampling rate, so as to achieve phase alignment of the multi-source heterogeneous data at microsecond precision.
3. The method according to claim 1, wherein adjusting the amplitude gain of the known characteristic sequence in S2 includes obtaining, in real time, a psychoacoustic masking threshold of the original audio signal in the 18 kHz to 20 kHz frequency band, and dynamically determining the range of the amplitude gain according to the psychoacoustic masking threshold and the current environmental wind noise index, so that the mixed-in known characteristic sequence stays within the auditory masking range of the human ear while the observation signal-to-noise ratio is maintained.
4. The audio digital signal processing method according to claim 1, wherein the formula for extracting the channel impulse response in S3 is: (formula not reproduced in this text), wherein the quantities defined are, in order, the estimated channel impulse response of the i-th channel, the i-th microphone signal, the known characteristic sequence, and the time offset; the channel uncertainty in S3 is calculated based on the peak-to-average ratio of the correlation peak obtained after despreading with the known characteristic sequence, which comprises extracting the average energy of the correlation background within a preset duration and the maximum energy of the correlation peak after despreading, and calculating the ratio of the maximum energy of the correlation peak to the average energy of the correlation background, so that the reciprocal of the ratio characterizes the confidence level of the channel impulse response in the current acoustic environment.
5. The audio digital signal processing method according to claim 1, wherein the prediction processing in S4 includes calculating a process noise weight using the attitude data, the process noise weight being calculated by the formula: (formula not reproduced in this text), wherein the quantities defined are, in order, the preset base noise constant, the angular velocity value in the attitude data, the environmental wind noise index, and two preset weighting coefficients; establishing the acoustic channel state transition equation in S4 includes mapping the triaxial acceleration and the triaxial angular velocity in the attitude data into a relative spatial displacement vector of the wearable device relative to the head of the user, and compensating the nonlinear time-varying components in the acoustic channel state transition matrix based on the relative spatial displacement vector, so as to correct in advance the acoustic path phase offset caused by changes in physical position.
6. The method according to claim 1, wherein allocating the upper limit of the suppression gain of each algorithm according to the channel uncertainty in S5 includes preferentially reducing the gain weight of the external privacy suppression algorithm when the channel uncertainty rises to a first threshold; and further includes establishing a cooperative constraint mechanism between the uplink echo cancellation algorithm and the external privacy suppression algorithm, in which, when the channel uncertainty is in a fluctuating rising stage, the output weight of the external privacy suppression algorithm is reduced so as to reduce nonlinear interference with the acoustic reference signal at the microphone end and to ensure continued convergence of the uplink echo cancellation algorithm with respect to the acoustic channel state transition equation.
7. The audio digital signal processing method according to claim 1, wherein triggering the algorithm parameter update in S6 includes invoking a lightweight neural network enhancement module to perform nonlinear suppression processing on the components corresponding to the residual echo index; and in S6, if the residual echo index, the environmental wind noise index and the channel uncertainty are all below the preset threshold, the SOC chip reduces the update frequency of the channel impulse response.
8. The audio digital signal processing method according to claim 1, wherein triggering the algorithm parameter update in S6 includes starting a dedicated hardware acceleration unit inside the SOC chip according to the instantaneous rates of change of the residual echo index and the environmental wind noise index, using the synchronous computing capability of the hardware to increase the iterative update step size of the adaptive filter for the non-stationary acoustic path, and turning off the dedicated hardware acceleration unit when the residual echo index, the environmental wind noise index and the channel uncertainty fall below the preset threshold, so as to achieve event-triggered low-power operation.
9. The audio digital signal processing method according to claim 1, wherein in S7, the residual energy corresponding to the audio signal after the algorithm parameter update is fed back to S4 as a correction reference for the channel impulse response; the nonlinear residual component compensation in S7 comprises extracting the residual spectral-line energy that remains in the audio signal after algorithm processing, calculating a masking envelope by using a preset psychoacoustic model, and performing spectral refinement and shaping on the compensated audio signal according to the masking envelope, so as to suppress nonlinear distortion and improve the perceived transparency of the output audio.
10. An audio digital signal processing system based on an SOC chip, comprising:
an acquisition module configured to synchronously acquire multi-channel microphone signals, an audio playing reference signal and attitude data by using the SOC chip system clock, and to mark the multi-channel microphone signals, the audio playing reference signal and the attitude data with time stamps;
an adjusting module configured to generate a known characteristic sequence in the audio playing link where the audio playing reference signal is located, adjust the amplitude gain of the known characteristic sequence, modulate the sequence into the 18 kHz to 20 kHz frequency band, and then mix it into the original audio signal for output;
an extraction module configured to perform normalized despreading processing on the multi-channel microphone signals, and to extract a channel impulse response, a clock drift compensation value and a channel uncertainty based on the known characteristic sequence;
a correction module configured to establish an acoustic channel state transition equation in combination with the attitude data, and to perform prediction processing and correction processing on the channel impulse response by using the attitude data;
a distribution module configured to drive an uplink echo cancellation algorithm, a local feedback suppression algorithm and an external privacy suppression algorithm based on the corrected channel impulse response, and to allocate the upper limit of the suppression gain of each algorithm according to the channel uncertainty;
a triggering module configured to monitor a residual echo index output by the uplink echo cancellation algorithm, an environmental wind noise index and the channel uncertainty, and to trigger an algorithm parameter update when the residual echo index, the environmental wind noise index or the channel uncertainty exceeds a preset threshold; and
a compensation module configured to perform nonlinear residual component compensation on the audio signal obtained after the multi-channel microphone signals are processed by the uplink echo cancellation algorithm, the local feedback suppression algorithm and the external privacy suppression algorithm, and to output the compensated audio signal.
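
Illustrative sketch (not part of the claims). The despreading and uncertainty calculation described in claim 4, and the process noise weight described in claim 5, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the formulas themselves are not reproduced in this text, so the normalized cross-correlation used in despread and the linear form used in process_noise_weight are assumptions. All function names, the pseudo-random +1/-1 sequence, and the simulated microphone signal are hypothetical. The uncertainty follows the wording of claim 4: the reciprocal of the ratio of correlation peak energy to average background energy.

```python
import random


def despread(mic, code):
    """Normalized cross-correlation of one microphone signal with the known sequence.

    ASSUMPTION: h[tau] = sum_n mic[tau + n] * code[n] / sum_n code[n]**2.
    The claimed formula is not available in the text; this is a common
    correlation-based channel estimator used here for illustration only.
    """
    energy = sum(c * c for c in code)
    n_lags = len(mic) - len(code) + 1
    return [
        sum(mic[tau + n] * code[n] for n in range(len(code))) / energy
        for tau in range(n_lags)
    ]


def channel_uncertainty(h):
    """Reciprocal of (peak energy / mean background energy), as worded in claim 4."""
    energies = [v * v for v in h]
    peak_idx = max(range(len(energies)), key=energies.__getitem__)
    peak = energies[peak_idx]
    background = [e for i, e in enumerate(energies) if i != peak_idx]
    mean_background = sum(background) / len(background)
    return mean_background / peak if peak > 0.0 else float("inf")


def process_noise_weight(q0, angular_velocity, wind_index, alpha, beta):
    """ASSUMED linear combination of the quantities listed in claim 5.

    q0 is the base noise constant; alpha and beta are the preset weights.
    """
    return q0 + alpha * abs(angular_velocity) + beta * wind_index


def simulate_mic(code, delay, gain, noise_std, total_len, rng):
    """Hypothetical test signal: delayed, scaled code plus Gaussian noise."""
    mic = [rng.gauss(0.0, noise_std) for _ in range(total_len)]
    for n, c in enumerate(code):
        mic[delay + n] += gain * c
    return mic


rng = random.Random(0)
code = [rng.choice((-1.0, 1.0)) for _ in range(255)]  # stand-in known sequence

# Single-path channel: delay of 5 samples, gain 0.5, in quiet and noisy conditions.
h_clean = despread(simulate_mic(code, 5, 0.5, 0.01, 300, rng), code)
h_noisy = despread(simulate_mic(code, 5, 0.5, 3.0, 300, rng), code)

u_clean = channel_uncertainty(h_clean)
u_noisy = channel_uncertainty(h_noisy)
print("estimated delay (clean):", max(range(len(h_clean)), key=lambda i: abs(h_clean[i])))
print("uncertainty clean vs noisy:", u_clean, u_noisy)
```

In this sketch the correlation peak marks the path delay and its height approximates the path gain, while a noisier recording lowers the peak-to-background ratio and so raises the uncertainty value. A value like this could then feed the gain allocation of S5 and the threshold check of S6.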

Description

Audio digital signal processing method and system based on SOC (system on chip)

Technical Field

The invention relates to the technical field of audio digital signal processing, in particular to an audio digital signal processing method and system based on an SOC (system on chip).

Background

In call and audio playback scenarios with open audio devices, a highly challenging technical problem in the open audio field is how to effectively suppress the outward-radiated sound leakage caused by the open structure, and thereby protect the privacy of the user's calls, while still ensuring that the user obtains high-quality listening. At present, the prior art generally either introduces sound-wave cancellation at specific frequencies at the audio output end, or uses a microphone array to pick up an environmental reference signal and combines a traditional linear echo cancellation algorithm with an adaptive filtering algorithm to model and cancel the sound-leakage signal at the digital signal processing layer, so as to reduce the leakage volume and preserve pickup clarity. However, prior-art cancellation schemes based on traditional linear correlation show a significant deficiency in system stability when facing a dynamically changing acoustic environment.

Because open devices are highly susceptible to strongly time-varying factors such as wearing-posture shifts, head rotation, sudden environmental noise or wind gusts, the acoustic transmission path is non-stationary. Traditional adaptive filtering algorithms therefore struggle to track rapid path changes in real time, and their convergence speed often cannot keep up with the rate of change. As a result, the amount of echo cancellation fluctuates greatly when rapid changes occur, and severe residual echo or even positive-feedback howling can result, so that continuous and stable privacy protection and a high-quality listening experience cannot be provided. To solve these problems, the invention provides an audio digital signal processing method based on an SOC chip.

Disclosure of Invention

In view of the defects of the prior art, the invention provides an audio digital signal processing method based on an SOC chip, which aims to solve the problems described in the background.
To achieve this purpose, the invention is realized by the following technical scheme. The audio digital signal processing method based on the SOC chip comprises the following steps:
S1, synchronously acquiring multi-channel microphone signals, an audio playing reference signal and attitude data by using the SOC chip system clock, and marking the multi-channel microphone signals, the audio playing reference signal and the attitude data with time stamps;
S2, generating a known characteristic sequence in the audio playing link where the audio playing reference signal is located, adjusting the amplitude gain of the known characteristic sequence, modulating the sequence into the 18 kHz to 20 kHz frequency band, and then mixing it into the original audio signal for output;
S3, carrying out normalized despreading processing on the multi-channel microphone signals, and extracting a channel impulse response, a clock drift compensation value and a channel uncertainty based on the known characteristic sequence;
S4, establishing an acoustic channel state transition equation in combination with the attitude data, and carrying out prediction processing and correction processing on the channel impulse response by using the attitude data;
S5, driving an uplink echo cancellation algorithm, a local feedback suppression algorithm and an external privacy suppression algorithm based on the corrected channel impulse response, and allocating the upper limit of the suppression gain of each algorithm according to the channel uncertainty;
S6, monitoring a residual echo index output by the uplink echo cancellation algorithm, an environmental wind noise index and the channel uncertainty, and triggering an algorithm parameter update when the residual echo index, the environmental wind noise index or the channel uncertainty exceeds a preset threshold; and
S7, carrying out nonlinear residual component compensation on the audio signal obtained after the multi-channel microphone signals are processed by the uplink echo cancellation algorithm, the local feedback suppression algorithm and the external privacy suppression algorithm, and outputting the compensated audio signal.
Preferably, in S1, the time-stamped multi-channel microphone signals, audio playing reference signal and attitude data are transmitted to a memory buffer through a direct memory access mechanism inside the SOC chip; marking the multi-channel microphone signals, the audio playing reference signal and the attitude data with time stamps in S1 comprises the steps of calling a global ti