CN-121999762-A - Discrimination method, device, equipment and medium for adversarial training of an acoustic model

CN 121999762 A

Abstract

The application relates to the technical field of model training, and in particular to the intelligent medical and financial service fields; specifically, it provides a discrimination method, device, equipment, and medium for adversarial training of an acoustic model. When adversarial training of the acoustic model is performed, target acoustic data output by the acoustic model are acquired and features are extracted from them to obtain target acoustic features; the target acoustic features are up-sampled by a pre-trained intermediate feature extractor to obtain target intermediate features; the target intermediate features are down-sampled by a trained feature discriminator; and the authenticity of the target intermediate features is judged from the down-sampling results, producing a target discrimination result that is used to train the acoustic model. By judging the authenticity of the data generated by the acoustic model during adversarial training, the acoustic model can be trained to improve model performance.

Inventors

  • SHI YAN

Assignees

  • Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)

Dates

Publication Date
2026-05-08
Application Date
2026-01-07

Claims (10)

  1. A discrimination method for adversarial training of an acoustic model, comprising: S10, when performing adversarial training of the acoustic model, acquiring target acoustic data output by the acoustic model, and extracting features from the target acoustic data to obtain target acoustic features; S11, inputting the target acoustic features into a pre-trained intermediate feature extractor for up-sampling, and outputting the final up-sampling result as target intermediate features; S12, inputting the target intermediate features into a trained feature discriminator for down-sampling to obtain a down-sampling result for each down-sampling layer; and S13, judging the authenticity of the target intermediate features based on all the down-sampling results, and outputting a target discrimination result, wherein the target discrimination result is used to train the acoustic model to obtain a trained acoustic model.
  2. The method of claim 1, wherein the training process of the trained feature discriminator comprises: acquiring real acoustic data and the synthetic acoustic features of constructed synthetic acoustic data, and extracting features from the real acoustic data to obtain real acoustic features; inputting the real acoustic features into the pre-trained intermediate feature extractor for up-sampling and outputting the final up-sampling result as a real intermediate feature, and inputting the synthetic acoustic features into the pre-trained intermediate feature extractor for up-sampling and outputting the final up-sampling result as a synthesized intermediate feature, wherein the intermediate feature extractor comprises L up-sampling layers, L being an integer greater than zero; inputting the real intermediate features and the synthesized intermediate features respectively into the feature discriminator for down-sampling to obtain corresponding down-sampling results, wherein the feature discriminator comprises M down-sampling layers, M being an integer greater than zero; for any real intermediate feature, judging the authenticity of the real intermediate feature based on all the down-sampling results corresponding to it, and outputting a first discrimination result; for any synthesized intermediate feature, judging the authenticity of the synthesized intermediate feature based on all the down-sampling results corresponding to it, and outputting a second discrimination result; and determining a discrimination loss based on the first discrimination result and the second discrimination result, and training the feature discriminator according to the discrimination loss to obtain a trained feature discriminator.
  3. The discrimination method for adversarial training of an acoustic model of claim 2, wherein acquiring the synthetic acoustic features of the constructed synthetic acoustic data comprises: obtaining model parameters of an original generator in the acoustic model; updating a local generator based on the model parameters; and constructing the synthetic acoustic data based on the local generator, and acquiring the synthetic acoustic features generated in constructing the synthetic acoustic data.
  4. The discrimination method of claim 3, further comprising, before training the feature discriminator according to the discrimination loss to obtain a trained feature discriminator: determining a generation loss according to the second discrimination result; wherein training the feature discriminator according to the discrimination loss to obtain a trained feature discriminator comprises: training the feature discriminator according to the discrimination loss and the generation loss to obtain a trained feature discriminator.
  5. The discrimination method for adversarial training of an acoustic model of claim 4, further comprising, after determining the generation loss based on the second discrimination result: training the local generator according to the discrimination loss and the generation loss to obtain a trained local generator; and acquiring the updated parameters of the trained local generator, and updating the original generator based on the updated parameters.
  6. The discrimination method for adversarial training of an acoustic model according to claim 2, further comprising, before training the feature discriminator according to the discrimination loss to obtain a trained feature discriminator: for any down-sampling layer of the feature discriminator, calculating a contrast loss between the down-sampling result corresponding to the real intermediate feature output by that layer and the down-sampling result corresponding to the synthesized intermediate feature; wherein training the feature discriminator according to the discrimination loss to obtain a trained feature discriminator comprises: training the feature discriminator according to the discrimination loss and the contrast loss to obtain a trained feature discriminator.
  7. The discrimination method for adversarial training of an acoustic model according to any one of claims 1 to 6, wherein training the acoustic model to obtain a trained acoustic model comprises: taking the target acoustic data and the target discrimination result as a target training set; and training the acoustic model using the target training set to obtain a trained acoustic model.
  8. A discriminating apparatus for adversarial training of an acoustic model, comprising: an acoustic feature extraction module, configured to acquire target acoustic data output by the acoustic model when adversarial training of the acoustic model is performed, and to extract features from the target acoustic data to obtain target acoustic features; an intermediate feature extraction module, configured to input the target acoustic features into a pre-trained intermediate feature extractor for up-sampling, and to output the final up-sampling result as target intermediate features; a down-sampling module, configured to input the target intermediate features into a trained feature discriminator for down-sampling to obtain a down-sampling result for each down-sampling layer; and a target discrimination module, configured to judge the authenticity of the target intermediate features based on all the down-sampling results and to output a target discrimination result, wherein the target discrimination result is used to train the acoustic model to obtain a trained acoustic model.
  9. A computer device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the discrimination method for adversarial training of an acoustic model according to any one of claims 1 to 6.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the discrimination method for adversarial training of an acoustic model according to any one of claims 1 to 6.
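The pipeline recited in claim 1 (steps S10 to S13) can be sketched in NumPy as follows. This is a minimal illustration only: the toy frame-energy feature extraction, the nearest-neighbour up-sampling, the averaging down-sampling, the layer counts, and the tanh score-fusion rule are all assumptions of this sketch, not the application's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_acoustic_features(waveform, frame=4):
    """S10: toy feature extraction -- mean energy per fixed-size frame."""
    n = len(waveform) // frame
    return waveform[: n * frame].reshape(n, frame).mean(axis=1)

def intermediate_extractor(features, L=2):
    """S11: L up-sampling layers; each layer doubles the resolution."""
    x = features
    for _ in range(L):
        x = np.repeat(x, 2)          # nearest-neighbour up-sampling
    return x                          # final result = target intermediate feature

def feature_discriminator(intermediate, M=3):
    """S12: M down-sampling layers; the output of every layer is kept."""
    x = intermediate
    per_layer = []
    for _ in range(M):
        x = x.reshape(-1, 2).mean(axis=1)  # halve the resolution
        per_layer.append(x)
    return per_layer

def judge_authenticity(per_layer_results):
    """S13: fuse all down-sampling results into one real/fake score."""
    scores = [float(np.tanh(r.mean())) for r in per_layer_results]
    return sum(scores) / len(scores)   # > 0 leans "real", < 0 leans "fake"

target_acoustic_data = rng.standard_normal(64)   # stand-in for model output
feats = extract_acoustic_features(target_acoustic_data)
inter = intermediate_extractor(feats)
result = judge_authenticity(feature_discriminator(inter))
```

In an actual implementation the extractor and discriminator would be learned networks; the point of the sketch is only the data flow: features in, L up-sampling layers, M down-sampling layers whose per-layer outputs are all used for the final judgment.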

Description

Discrimination method, device, equipment and medium for adversarial training of an acoustic model

Technical Field

The application relates to the technical field of model training, and in particular to the intelligent medical and financial service fields; specifically, it relates to a discrimination method, device, equipment and medium for adversarial training of an acoustic model.

Background

Acoustic models are widely applied in the financial service and intelligent medical fields. They mainly describe the mapping between the acoustic features of a speech signal and the speech content, and their performance directly affects the accuracy and reliability of the whole speech processing system. For example, in an intelligent voice assistant, the acoustic model must accurately convert the user's speech signal into text for subsequent semantic understanding and task execution; in a speech synthesis system, the acoustic model generates natural, fluent speech from input text.

First, most conventional acoustic model training adopts supervised learning, i.e., training with a large amount of labeled speech data. During training, the model continuously adjusts its parameters so that its output matches the annotated ground truth as closely as possible. However, a conventional acoustic model generalizes poorly to unseen speech data: limited by the training data, it may merely memorize the feature patterns in the training set and fail to adapt to changes in speaking style, accent, environmental noise, and the like, so that recognition accuracy drops sharply in practical applications.
For example, in the financial service field, a bank's intelligent customer service system collects a large number of call recordings between customers and service staff, labels their transcripts, and trains an acoustic model on the labeled data. However, customers differ in accent, speaking rate, emotion, and so on, and a conventional acoustic model may misrecognize specialized financial terms and thus fail to respond accurately to customer needs.

Second, most acoustic models are trained adversarially by introducing a discriminator to distinguish real from fake acoustic data. However, conventional training methods struggle to analyze fine-grained information, so the discrimination results have a high error rate. For example, in a medical setting, speech data are subject to many disturbances, such as hospital environment noise and noise from medical devices. Common adversarial training methods lack an effective mechanism for handling acoustic data characteristics and can hardly capture the subtle feature variations in medical speech data; the discriminator's judgments are therefore inaccurate, the training of the acoustic model suffers, and descriptions of a patient's condition may be misjudged. Therefore, how to judge the authenticity of the data generated by the acoustic model during adversarial training, and thereby train the acoustic model to improve its performance, is a problem to be solved.
Disclosure of Invention

In view of the above, the embodiments of the present application provide a discrimination method, apparatus, device, and medium for adversarial training of an acoustic model, so as to solve the problem of how to judge the authenticity of the data generated by the acoustic model during adversarial training and thereby train the acoustic model to improve its performance.

In a first aspect, an embodiment of the present application provides a discrimination method for adversarial training of an acoustic model, including: when performing adversarial training of the acoustic model, acquiring target acoustic data output by the acoustic model, and extracting features from the target acoustic data to obtain target acoustic features; inputting the target acoustic features into a pre-trained intermediate feature extractor for up-sampling, and outputting the final up-sampling result as target intermediate features; inputting the target intermediate features into a trained feature discriminator for down-sampling to obtain a down-sampling result for each down-sampling layer; and judging the authenticity of the target intermediate features based on all the down-sampling results, and outputting a target discrimination result, wherein the target discrimination result is used to train the acoustic model to obtain a trained acoustic model.

In a second aspect, an embodiment of the present application provides a discriminating apparatus for adversarial training of an acoustic model, including: The acoustic feature extraction module is use
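As a concrete illustration of the three losses named in the claims (the discrimination loss of claim 2, the generation loss of claim 4, and the per-layer contrast loss of claim 6), the following Python sketch uses least-squares GAN-style terms. The least-squares form, the hinge-style contrast loss, the margin value, and all names are assumptions of this sketch; the application itself only names the losses without fixing their form.

```python
import numpy as np

def discrimination_loss(first_result, second_result):
    """Claim 2: real intermediates should score 1, synthetic ones 0
    (least-squares form, an assumption of this sketch)."""
    return (first_result - 1.0) ** 2 + second_result ** 2

def generation_loss(second_result):
    """Claim 4: the generator wants synthetic intermediates scored as real."""
    return (second_result - 1.0) ** 2

def contrast_loss(real_layers, synth_layers, margin=1.0):
    """Claim 6: per down-sampling layer, keep real and synthetic outputs
    at least `margin` apart (hinge form, an assumption of this sketch)."""
    losses = [max(0.0, margin - float(np.abs(r - s).mean()))
              for r, s in zip(real_layers, synth_layers)]
    return sum(losses) / len(losses)

first, second = 0.8, 0.3            # toy first / second discrimination results
d_loss = discrimination_loss(first, second)
g_loss = generation_loss(second)
real_layers = [np.ones(4), np.ones(2)]     # toy per-layer outputs
synth_layers = [np.zeros(4), np.zeros(2)]
c_loss = contrast_loss(real_layers, synth_layers)
total = d_loss + g_loss + c_loss    # claims 4 and 6: train on the combined loss
```

The combined `total` is what the feature discriminator would be trained on under claims 4 and 6 together; in a real system the discrimination results would come from the networks above rather than fixed toy scalars.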