CN-122027964-A - Audio parameter correction model training method, audio correction method and terminal equipment


Abstract

The invention relates to an audio parameter correction model training method, an audio correction method, and terminal equipment. The training method comprises: extracting a first time domain feature of a target sample audio signal through a time domain feature sub-network; extracting a first frequency domain feature of the target sample audio signal through a frequency domain feature sub-network; fusing the first time domain feature, the first frequency domain feature, and a reference feature through a feature fusion sub-network to obtain a first fusion feature; constructing a first four-dimensional feature tensor corresponding to the first fusion feature through a four-dimensional tensor construction sub-network; performing four-dimensional feature consistency analysis on a first feature combination and a target reference feature combination through a consistency analysis sub-network to obtain a consistency feature difference value set; correcting the distortion of the sample audio signal through an adaptive parameter adjustment sub-network, based on a standard core parameter set and the consistency feature difference value set, to obtain a target correction parameter set; and optimizing the model parameters of the audio parameter correction model based on the standard core parameter set and the target correction parameter set.
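The training step described above can be illustrated with a toy sketch. All sub-networks are replaced by trivial stand-ins, and every function name here (`time_features`, `training_step`, etc.) is illustrative, not taken from the patent.

```python
# Toy sketch of one training step (S1) from the abstract, under the
# assumption that each sub-network can be reduced to a simple callable.

def time_features(signal):
    # Time-domain sub-network stand-in: mean and peak amplitude.
    return [sum(signal) / len(signal), max(abs(x) for x in signal)]

def freq_features(signal):
    # Frequency-domain sub-network stand-in: average absolute first
    # difference, a crude proxy for high-frequency content.
    d = [abs(signal[i + 1] - signal[i]) for i in range(len(signal) - 1)]
    return [sum(d) / len(d)]

def consistency_differences(features, reference):
    # Consistency-analysis stand-in: per-dimension differences between
    # the extracted features and the matched reference features.
    return [f - r for f, r in zip(features, reference)]

def training_step(signal, reference, standard_params, lr=0.1):
    feats = time_features(signal) + freq_features(signal)
    diffs = consistency_differences(feats, reference)
    # Adaptive-parameter-adjustment stand-in: nudge the standard core
    # parameters against the observed consistency differences.
    corrected = [p - lr * d for p, d in zip(standard_params, diffs)]
    # Optimization target: corrected parameters should stay close to
    # the standard core parameters (squared-error loss).
    loss = sum((c - p) ** 2 for c, p in zip(corrected, standard_params))
    return corrected, loss
```

A real implementation would use learned networks for each stage; this sketch only shows how the feature differences feed the parameter correction and the loss.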

Inventors

  • WANG XIANGXIANG

Assignees

  • 深圳信扬创智科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-01-06

Claims (10)

  1. An audio parameter correction model training method, comprising: acquiring a plurality of sample audio signals, the plurality of sample audio signals being distorted audio signals acquired by different microphone arrays respectively; taking any one of the plurality of sample audio signals as a target sample audio signal and executing the following step S1 to obtain a trained audio parameter correction model, wherein the audio parameter correction model comprises a time domain feature sub-network, a frequency domain feature sub-network, a feature fusion sub-network, a four-dimensional tensor construction sub-network, a consistency analysis sub-network, and an adaptive parameter adjustment sub-network, and the target sample audio signal corresponds to a target microphone array; wherein S1 comprises: extracting a first time domain feature of the target sample audio signal through the time domain feature sub-network; extracting a first frequency domain feature of the target sample audio signal through the frequency domain feature sub-network; fusing the first time domain feature, the first frequency domain feature, and a reference feature through the feature fusion sub-network to obtain a first fusion feature, the reference feature indicating a reference time domain feature and a reference frequency domain feature; constructing a first four-dimensional feature tensor corresponding to the first fusion feature through the four-dimensional tensor construction sub-network; performing four-dimensional feature consistency analysis on a first feature combination and a target reference feature combination through the consistency analysis sub-network to obtain a consistency feature difference value set, wherein the first feature combination comprises the first time domain feature, the first frequency domain feature, the first four-dimensional feature tensor, and microphone parameter features of the target microphone array; the target reference feature combination comprises the reference time domain feature, the reference frequency domain feature, a reference four-dimensional feature tensor, and calibration parameters; the consistency feature difference value set comprises a time domain dimension consistency feature difference value, a frequency domain dimension consistency feature difference value, a reference dimension consistency feature difference value, and a microphone dimension consistency feature difference value; and the target reference feature combination is the reference feature combination in a reference feature library that matches the target microphone array; correcting the distortion of the sample audio signal through the adaptive parameter adjustment sub-network based on a standard core parameter set and the consistency feature difference value set to obtain a target correction parameter set; and optimizing model parameters of the audio parameter correction model based on the standard core parameter set and the target correction parameter set.
  2. The method according to claim 1, further comprising: acquiring a plurality of reference audio signals; performing feature analysis on the plurality of reference audio signals respectively to obtain a plurality of reference feature combinations; and constructing the reference feature library based on the plurality of reference feature combinations.
  3. The method of claim 2, wherein the sources of the plurality of reference audio signals comprise at least one of: different reference audio signals acquired in a standard acoustic environment with a calibrated high-fidelity microphone; a corrected standard audio signal; and an external environment audio signal collected by the microphone.
  4. The method of claim 3, wherein the sources of the plurality of reference audio signals comprise external environment audio signals captured by a microphone, and the plurality of reference audio signals comprise a plurality of external environment audio signals; the performing feature analysis on the plurality of reference audio signals respectively to obtain a plurality of reference feature combinations comprises: performing dimension feature analysis on each external environment audio signal in a target external environment audio signal set to obtain a plurality of dimension feature sets respectively corresponding to a plurality of dimensions, wherein the target external environment audio signal set consists of the external environment audio signals belonging to a target category among the plurality of external environment audio signals; taking any one of the plurality of dimension feature sets as the dimension feature set of a target dimension and executing the following step S2 to obtain a plurality of consistency features respectively corresponding to the plurality of dimensions; wherein S2 comprises: performing consistency analysis on each dimension feature of the target dimension based on each dimension feature in the dimension feature set of the target dimension and the dimension feature mean of the target dimension, and determining the feature with the smallest fluctuation among the dimension features of the target dimension as a target consistency feature; and constructing the reference feature combination corresponding to the target category based on the consistency features.
  5. The method of claim 4, wherein the performing consistency analysis on each dimension feature of the target dimension based on each dimension feature in the dimension feature set of the target dimension and the dimension feature mean of the target dimension, and determining the feature with the smallest fluctuation among the dimension features of the target dimension as a target consistency feature, comprises: determining the absolute deviation of each dimension feature of the target dimension based on each dimension feature in the dimension feature set of the target dimension and the dimension feature mean of the target dimension; and determining the target consistency feature based on the ratio of the absolute deviation of each dimension feature to the maximum absolute deviation of the target dimension.
  6. The method of claim 4, wherein the performing consistency analysis on each dimension feature of the target dimension based on each dimension feature in the dimension feature set of the target dimension and the dimension feature mean of the target dimension, and determining the feature with the smallest fluctuation among the dimension features of the target dimension as a target consistency feature, comprises: determining the absolute deviation of each dimension feature of the target dimension based on each dimension feature in the dimension feature set of the target dimension and the dimension feature mean of the target dimension; and determining the target consistency feature based on the ratio of the absolute deviation of each dimension feature to the dimension feature mean of the target dimension.
  7. The method according to claim 4, further comprising: acquiring a plurality of collected audio signals of the target category; adding the plurality of audio signals to the target external environment audio signal set; and returning to the step of performing dimension feature analysis on each external environment audio signal in the target external environment audio signal set to obtain a plurality of dimension feature sets respectively corresponding to a plurality of dimensions, until the reference feature combination corresponding to the target category is reconstructed.
  8. An audio correction method, comprising: playing a test audio signal when a current microphone array is accessed; acquiring a first audio signal obtained by the current microphone array capturing the test audio signal; inputting the first audio signal into an audio parameter correction model and outputting a first correction parameter set, wherein the audio parameter correction model is trained by the audio parameter correction model training method according to any one of claims 1 to 7; correcting a first real-time audio signal acquired by the current microphone array based on the first correction parameter set to obtain a corrected second audio signal; and playing the second audio signal.
  9. The method of claim 8, wherein after the correcting of the first real-time audio signal acquired by the current microphone array based on the first correction parameter set to obtain the corrected second audio signal, the method further comprises: inputting the second audio signal into the audio parameter correction model and outputting a second correction parameter set, wherein the second correction parameter set is generated by optimizing the first correction parameter set based on the consistency feature difference value set corresponding to the second audio signal, when the consistency feature difference value set corresponding to the second audio signal differs from the consistency feature difference value set corresponding to the first real-time audio signal and does not meet a correction target; correcting a second real-time audio signal acquired by the current microphone array based on the second correction parameter set to obtain a corrected third audio signal; and playing the third audio signal.
  10. A terminal device, comprising: a memory configured to store a computer program; and a processor configured to, when the computer program is invoked, cause the terminal device to implement the audio parameter correction model training method of any one of claims 1 to 7 or the audio correction method of claim 8 or 9.
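The "smallest fluctuation" selection in claims 5 and 6 can be sketched directly: both claims score each feature by its absolute deviation from the dimension mean and differ only in the normalizer (claim 5 divides by the maximum absolute deviation, claim 6 by the dimension mean). The function name and signature below are illustrative, not from the patent.

```python
# Sketch of the target-consistency-feature selection of claims 5 and 6.

def target_consistency_feature(dim_features, normalizer="max"):
    mean = sum(dim_features) / len(dim_features)
    abs_dev = [abs(f - mean) for f in dim_features]
    if normalizer == "max":   # claim 5: ratio to the max absolute deviation
        denom = max(abs_dev) or 1.0
    else:                     # claim 6: ratio to the dimension feature mean
        denom = abs(mean) or 1.0
    ratios = [d / denom for d in abs_dev]
    # The feature with the smallest normalized deviation fluctuates least.
    return dim_features[ratios.index(min(ratios))]
```

Note that both normalizers scale all deviations uniformly, so they select the same feature; the choice of denominator matters only if the ratios are used downstream (e.g. thresholded against a correction target).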

Description

Audio parameter correction model training method, audio correction method and terminal equipment

Technical Field

Embodiments of the present disclosure relate to the field of audio processing, and more particularly to an audio parameter correction model training method, an audio correction method, and a terminal device.

Background

In a speaker, a loudspeaker, or a smart voice system, a user is often allowed to plug in an external microphone. However, different microphones have different frequency response characteristics, which may lead to unstable sound quality and even distortion. Traditional methods generally rely on manual debugging or frequency response database matching, and suffer from problems such as low adaptation efficiency and large error.

Disclosure of Invention

In order to solve the above technical problems, or at least partially solve them, embodiments of the present disclosure provide an audio parameter correction model training method, an audio correction method, and a terminal device.
In a first aspect, an embodiment of the present disclosure provides an audio parameter correction model training method, including: obtaining a plurality of sample audio signals, where the plurality of sample audio signals are distorted audio signals collected by different microphone arrays respectively; and taking any one of the plurality of sample audio signals as a target sample audio signal and performing the following step S1 to obtain a trained audio parameter correction model, where the audio parameter correction model includes a time domain feature sub-network, a frequency domain feature sub-network, a feature fusion sub-network, a four-dimensional tensor construction sub-network, a consistency analysis sub-network, and an adaptive parameter adjustment sub-network, and the target sample audio signal corresponds to a target microphone array. Step S1 includes: extracting a first time domain feature of the target sample audio signal through the time domain feature sub-network; extracting a first frequency domain feature of the target sample audio signal through the frequency domain feature sub-network; fusing the first time domain feature, the first frequency domain feature, and a reference feature through the feature fusion sub-network to obtain a first fusion feature, where the reference feature indicates a reference time domain feature and a reference frequency domain feature; constructing a first four-dimensional feature tensor corresponding to the first fusion feature through the four-dimensional tensor construction sub-network; performing four-dimensional feature consistency analysis on a first feature combination and a target reference feature combination through the consistency analysis sub-network to obtain a consistency feature difference value set, where the first feature combination includes the first time domain feature, the first frequency domain feature, the first four-dimensional feature tensor, and microphone parameter features of the target microphone array; the target reference feature combination includes the reference time domain feature, the reference frequency domain feature, a reference four-dimensional feature tensor, and calibration parameters; the consistency feature difference value set includes a time domain dimension consistency feature difference value, a frequency domain dimension consistency feature difference value, a reference dimension consistency feature difference value, and a microphone dimension consistency feature difference value; and the target reference feature combination is the reference feature combination in a reference feature library that matches the target microphone array; correcting the distortion of the sample audio signal through the adaptive parameter adjustment sub-network based on a standard core parameter set and the consistency feature difference value set to obtain a target correction parameter set; and optimizing model parameters of the audio parameter correction model based on the standard core parameter set and the target correction parameter set.

In the embodiment of the disclosure, an audio parameter correction model is trained on distorted audio signals collected by different microphone arrays. Time-frequency domain analysis is performed on the distorted audio signals, and the resulting first time domain feature, first frequency domain feature, first four-dimensional feature tensor, and microphone parameter features are compared, through four-dimensional feature consistency analysis, with the reference time domain feature, reference frequency domain feature, reference four-dimensional feature tensor, and calibration parameters to obtain a consistency feature difference value set. The distortion of the sample audio signals is corrected based on a standard core parameter set and the consistency feature difference value set to obtain a target correction parameter set, and the model parameters of the audio parameter correction model are optimized based on the standard core parameter set and the target correction parameter set.
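At inference time, the audio correction method of claims 8 and 9 amounts to an infer-apply-recheck loop: infer a correction parameter set from a captured signal, apply it, and re-infer if the correction target is not yet met. The sketch below replaces the trained model with a stub gain estimator; all names (`model_infer`, `correct_stream`, the `gain` parameter) are illustrative assumptions, not from the patent.

```python
# Sketch of the runtime correction loop of claims 8 and 9, assuming the
# correction parameter set can be reduced to a single gain value.

def model_infer(signal, target_peak=1.0):
    # Stub for the trained correction model: one gain parameter that
    # would scale the observed peak toward the target peak.
    peak = max(abs(x) for x in signal)
    return {"gain": target_peak / peak if peak else 1.0}

def apply_correction(signal, params):
    return [x * params["gain"] for x in signal]

def correct_stream(captured, max_passes=2, tol=0.05, target_peak=1.0):
    corrected = captured
    for _ in range(max_passes):
        params = model_infer(corrected, target_peak)
        # Correction target met: the inferred gain is close to unity,
        # so no further adjustment is needed (claim 9's re-check).
        if abs(params["gain"] - 1.0) <= tol:
            break
        corrected = apply_correction(corrected, params)
    return corrected
```

In the patent, the re-inference of claim 9 compares consistency feature difference value sets rather than a single gain; the loop structure, however, is the same.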