CN-121983086-A - Voice forgery detection method and system based on dynamic feature ablation of the frequency-band critical region

Abstract

The invention provides a voice forgery detection method and system based on dynamic feature ablation of the frequency-band critical region, relating to the technical fields of voice signal processing and multimedia content security. The method comprises: extracting a time-frequency feature map from a voice signal to be detected; preprocessing the extracted time-frequency feature map; and inputting the preprocessed time-frequency feature map into a deep voice forgery detection model to obtain a detection result. The model first applies dynamic feature ablation to the critical transition region of the time-frequency feature map, then applies band-differential modeling to the ablated feature map to obtain voice features, and finally classifies the voice features as real voice or forged voice.

Inventors

ZHANG JIANQIANG
WU XIAOMING
WANG FUQIANG
ZHANG PENG
MA XIAOFENG
HAO QIUBIN
ZHAO WEI

Assignees

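Shandong Computer Science Center (National Supercomputer Center in Jinan)
Qilu University of Technology (Shandong Academy of Sciences)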

Dates

Publication Date: 2026-05-05
Application Date: 2026-02-12

Claims (10)

1. A voice forgery detection method based on dynamic feature ablation of the frequency-band critical region, characterized by comprising the following steps: extracting a time-frequency feature map from a voice signal to be detected; preprocessing the extracted time-frequency feature map, including dividing it into frequency bands and locating the critical transition region between the low-frequency region and the high-frequency region; and inputting the preprocessed time-frequency feature map into a deep voice forgery detection model to obtain a detection result; wherein the deep voice forgery detection model first applies dynamic feature ablation to the critical transition region of the time-frequency feature map, then applies band-differential modeling to the ablated time-frequency feature map to obtain voice features, and finally classifies the voice features to obtain a detection result of real voice or forged voice.
2. The voice forgery detection method based on dynamic feature ablation of the frequency-band critical region according to claim 1, wherein the time-frequency feature map is extracted by preprocessing the voice signal, performing a time-frequency transform, and then extracting a linear spectrogram, the linear spectrogram being used as the time-frequency feature map.
3. The voice forgery detection method based on dynamic feature ablation of the frequency-band critical region according to claim 1, wherein the frequency band division divides the time-frequency feature map along the frequency axis into a low-frequency region and a high-frequency region according to a preset frequency threshold or a proportion rule; and locating the critical transition region between the low-frequency region and the high-frequency region takes, according to a preset sensitive band width, the sensitive band covering the junction of the low-frequency and high-frequency features as the critical transition region.
4. The voice forgery detection method based on dynamic feature ablation of the frequency-band critical region according to claim 1, wherein the dynamic feature ablation changes the intensity or connectivity of the feature responses in the critical transition region according to a preset strategy, the strategy comprising feature elimination and dynamic weighted attenuation.
5. The voice forgery detection method based on dynamic feature ablation of the frequency-band critical region according to claim 4, wherein the feature elimination selects a portion of the frequency bins within the critical transition region and discards their features.
6. The voice forgery detection method based on dynamic feature ablation of the frequency-band critical region according to claim 4, wherein the dynamic weighted attenuation multiplies the feature values in the critical transition region by an attenuation coefficient, the attenuation coefficient being a random variable within a preset interval or a dynamically adjusted preset value.
7. The voice forgery detection method based on dynamic feature ablation of the frequency-band critical region according to claim 1, wherein the band-differential modeling extracts texture features from the low-frequency region using a max pooling operation and extracts noise-distribution features from the high-frequency region using an average pooling operation.
8. A voice forgery detection system based on dynamic feature ablation of the frequency-band critical region, characterized by comprising: an extraction module configured to extract a time-frequency feature map from a voice signal to be detected; a preprocessing module configured to preprocess the extracted time-frequency feature map, including dividing it into frequency bands and locating the critical transition region between the low-frequency region and the high-frequency region; and a detection module configured to input the preprocessed time-frequency feature map into a deep voice forgery detection model to obtain a detection result; wherein the deep voice forgery detection model first applies dynamic feature ablation to the critical transition region of the time-frequency feature map, then applies band-differential modeling to the ablated time-frequency feature map to obtain voice features, and finally classifies the voice features to obtain a detection result of real voice or forged voice.
9. A non-transitory computer-readable storage medium storing computer instructions which, when executed by a processor, implement the voice forgery detection method based on dynamic feature ablation of the frequency-band critical region according to any one of claims 1 to 7.
10. An electronic device comprising a processor, a memory, and a computer program, wherein the processor is connected to the memory, the computer program is stored in the memory, and, when the electronic device runs, the processor executes the computer program stored in the memory so that the electronic device performs the voice forgery detection method based on dynamic feature ablation of the frequency-band critical region according to any one of claims 1 to 7.
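
Claims 3 to 6 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names, the split ratio, the band width, the drop probability, and the attenuation interval are all hypothetical values chosen for illustration, and the feature map is assumed to be an array of shape (frequency bins, frames).

```python
import numpy as np


def locate_critical_region(n_bins, split_ratio=0.5, band_width=8):
    """Return (split, lo, hi): the low/high boundary bin and the critical band [lo, hi).

    split_ratio stands in for the "proportion rule" and band_width for the
    "preset sensitive band width"; both defaults are assumptions.
    """
    split = int(round(n_bins * split_ratio))
    half = band_width // 2
    lo = max(0, split - half)
    hi = min(n_bins, split + (band_width - half))
    return split, lo, hi


def feature_elimination(spec, lo, hi, drop_prob, rng):
    """Discard (zero out) a random subset of frequency bins inside [lo, hi)."""
    out = spec.copy()
    drop = rng.random(hi - lo) < drop_prob   # one keep/drop decision per bin
    bins = lo + np.flatnonzero(drop)         # absolute indices of dropped bins
    out[bins, :] = 0.0
    return out


def weighted_attenuation(spec, lo, hi, low, high, rng):
    """Multiply the critical band by a coefficient drawn uniformly from [low, high)."""
    out = spec.copy()
    alpha = rng.uniform(low, high)
    out[lo:hi, :] *= alpha
    return out, alpha


# Hypothetical usage on a dummy feature map with 64 bins and 10 frames.
rng = np.random.default_rng(0)
spec = np.ones((64, 10))
split, lo, hi = locate_critical_region(spec.shape[0])
eliminated = feature_elimination(spec, lo, hi, drop_prob=0.5, rng=rng)
attenuated, alpha = weighted_attenuation(spec, lo, hi, 0.2, 0.8, rng)
```

Both functions leave every bin outside the critical band untouched, which reflects the claims' restriction of the ablation to the critical transition region.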

Description

Voice forgery detection method and system based on dynamic feature ablation of the frequency-band critical region

Technical Field

The invention relates to the technical fields of voice signal processing and multimedia content security, and in particular to a voice forgery detection method and system based on dynamic feature ablation of the frequency-band critical region.

Background

With the iteration of generative adversarial networks (GANs) and various speech synthesis techniques, high-fidelity forged voice poses a substantial challenge to voiceprint authentication and financial payment security. Currently, voice forgery detection based on deep convolutional neural networks (CNNs) can identify most known attacks by analyzing time-frequency features such as Mel spectrograms.

To improve detection accuracy, the prior art usually adopts a band-differential modeling strategy: different pooling operations (such as max pooling for low frequencies and average pooling for high frequencies) are applied to the low-frequency part of the voice spectrum, which contains the main semantic and voiceprint information, and to the high-frequency part, which contains synthesis noise and artifacts. However, this hard band division creates a critical transition region at the junction of the low and high frequencies. The region is affected by two different feature extraction strategies at the same time, so its feature distribution is complex and sensitive. Under conventional training, the model tends to overfit to band-specific artifacts arising in the critical region rather than learning the intrinsic real-versus-forged characteristics of the voice signal. Once the forgery algorithm behind a test sample changes, the feature distribution of the critical region drifts accordingly, and model performance degrades.
Therefore, existing voice forgery detection techniques based on band division are prone to feature dependence at the junction of the low and high frequencies, which leads to poor generalization of the model to unknown attacks.

Disclosure of Invention

To solve the above problems, the invention provides a voice forgery detection method and system based on dynamic feature ablation of the frequency-band critical region, which effectively eliminates the detection blind spots caused by band boundary effects and significantly improves the detection accuracy of the model in cross-dataset scenarios.

According to some embodiments, the invention adopts the following technical solution:

A voice forgery detection method based on dynamic feature ablation of the frequency-band critical region comprises the following steps:

extracting a time-frequency feature map from a voice signal to be detected;
preprocessing the extracted time-frequency feature map, including dividing it into frequency bands and locating the critical transition region between the low-frequency region and the high-frequency region;
inputting the preprocessed time-frequency feature map into a deep voice forgery detection model to obtain a detection result.

The deep voice forgery detection model first applies dynamic feature ablation to the critical transition region of the time-frequency feature map, then applies band-differential modeling to the ablated time-frequency feature map to obtain voice features, and finally classifies the voice features to obtain a detection result of real voice or forged voice.
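
The order of operations in the steps above (ablate the critical transition region first, then model the two bands differently) can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the patented model: the STFT parameters, the split bin, the pooling kernel size, the fixed attenuation coefficient, and all function names are hypothetical; non-overlapping pooling along the frequency axis stands in for the pooling layers of a CNN; and the final classifier is omitted because its architecture is not specified here.

```python
import numpy as np


def linear_spectrogram(signal, n_fft=512, hop=160):
    """Magnitude STFT with a Hann window; returns shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T


def pool_freq(band, k, mode):
    """Non-overlapping pooling of size k along the frequency axis.

    Trailing bins that do not fill a whole window are dropped.
    """
    n = (band.shape[0] // k) * k
    blocks = band[:n].reshape(n // k, k, band.shape[1])
    return blocks.max(axis=1) if mode == "max" else blocks.mean(axis=1)


def ablate_then_pool(spec, split, lo, hi, alpha, k=4):
    """Attenuate the critical band [lo, hi), then pool each band differently."""
    x = spec.copy()
    x[lo:hi, :] *= alpha                       # step 1: ablation of the critical band
    low = pool_freq(x[:split], k, "max")       # step 2a: max pooling, low band
    high = pool_freq(x[split:], k, "mean")     # step 2b: average pooling, high band
    return np.concatenate([low, high], axis=0)  # features passed on to a classifier


# Hypothetical usage: one second of a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
spec = linear_spectrogram(np.sin(2 * np.pi * 440.0 * t))
features = ablate_then_pool(spec, split=128, lo=124, hi=132, alpha=0.5)
```

Because the attenuation is applied before pooling, the pooled values of the windows that overlap the critical band change with `alpha`, while windows entirely outside the band are unaffected.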
According to some embodiments, the invention adopts the following technical solution:

A voice forgery detection system based on dynamic feature ablation of the frequency-band critical region comprises:

an extraction module configured to extract a time-frequency feature map from a voice signal to be detected;
a preprocessing module configured to preprocess the extracted time-frequency feature map, including dividing it into frequency bands and locating the critical transition region between the low-frequency region and the high-frequency region;
a detection module configured to input the preprocessed time-frequency feature map into a deep voice forgery detection model to obtain a detection result.

The deep voice forgery detection model first applies dynamic feature ablation to the critical transition region of the time-frequency feature map, then applies band-differential modeling to the ablated time-frequency feature map to obtain voice features, and finally classifies the voice features to obtain a detection result of real voice or forged voice.

According to some embodiments, the invention adopts the following technical solution: