CN-117219086-B - Method, apparatus, device and medium for generating adversarial perturbations for voiceprint recognition

Abstract

The present disclosure provides an adversarial perturbation generation method, apparatus, device and medium for voiceprint recognition. The adversarial perturbation generation method comprises: obtaining a training voiceprint sample set; initializing an adversarial example generation network; determining a target recognition object for each original voiceprint sample in the training voiceprint sample set; inputting the original voiceprint samples into the adversarial example generation network to obtain voiceprint adversarial examples; inputting the voiceprint adversarial examples into a voiceprint recognition network to obtain a first recognition result vector; determining a sample loss function of the adversarial example generation network according to the original voiceprint samples, the voiceprint adversarial examples, and a first probability and a second probability contained in the first recognition result vector; and training the adversarial example generation network based on the sample loss function to generate the adversarial perturbation. Embodiments of the disclosure can improve the efficiency of generating adversarial perturbations and can also improve their dominance and concealment. The embodiments may be applied to artificial intelligence, communication security, and the like.

Inventors

LIU XIAOCHEN
GU ZHAOQUAN
TAN HAO
ZHANG JUNJIAN
HU NING
JING XIAO

Assignees

Peng Cheng Laboratory (鹏城实验室)

Dates

Publication Date: 20260505
Application Date: 20230807

Claims (11)

1. An adversarial perturbation generation method for voiceprint recognition, comprising: acquiring a training voiceprint sample set, wherein the training voiceprint sample set comprises a plurality of original voiceprint samples; initializing an adversarial example generation network, wherein the adversarial example generation network is a single-layer neural network, the number of neurons is equal to a preset number of frames of the adversarial perturbation, and each neuron corresponds to one frame of the adversarial perturbation; determining, for each of the original voiceprint samples in the training voiceprint sample set, a target recognition object that is different from a real recognition object of the original voiceprint sample; inputting the original voiceprint sample into the adversarial example generation network to obtain a voiceprint adversarial example; inputting the voiceprint adversarial example into a voiceprint recognition network to obtain a first recognition result vector, wherein the first recognition result vector comprises a first probability that the voiceprint adversarial example is recognized as the target recognition object and a second probability that the voiceprint adversarial example is recognized as an object other than the target recognition object; determining, based on the first probability and the second probability, an attack success rate with which the voiceprint adversarial example is recognized as the target recognition object; determining a perturbation concealment value for the voiceprint adversarial example based on the original voiceprint sample and the voiceprint adversarial example; and determining a sample loss function of the adversarial example generation network based on the attack success rate and the perturbation concealment value, and training the adversarial example generation network based on the sample loss function to generate the adversarial perturbation.
2. The adversarial perturbation generation method according to claim 1, wherein the acquiring of a training voiceprint sample set comprises: randomly sampling the training voiceprint sample set from a total voiceprint sample set; and wherein, after the training of the adversarial example generation network based on the sample loss function to generate the adversarial perturbation, the adversarial perturbation generation method further comprises: randomly acquiring a test voiceprint sample set from the total voiceprint sample set, wherein the test voiceprint sample set comprises a plurality of test voiceprint samples; inputting the test voiceprint sample into the adversarial example generation network to obtain a test adversarial example; inputting the test adversarial example into the voiceprint recognition network to obtain a second recognition result vector, inputting the test voiceprint sample into the voiceprint recognition network to obtain a third recognition result vector, and inputting the adversarial perturbation into the voiceprint recognition network to obtain a fourth recognition result vector; determining a first correlation based on the second recognition result vector and the third recognition result vector; determining a second correlation based on the second recognition result vector and the fourth recognition result vector; and comparing the first correlation with the second correlation to determine the degree of influence of the adversarial perturbation on the test voiceprint sample.
3. The adversarial perturbation generation method according to claim 1, wherein the determining of the attack success rate with which the voiceprint adversarial example is recognized as the target recognition object, based on the first probability and the second probability, comprises: taking the difference between the first probability and the maximum value of the second probability to obtain the attack success rate; and acquiring a first threshold and adjusting the attack success rate according to the first threshold, wherein the first threshold is taken as the attack success rate if the attack success rate is greater than or equal to the first threshold.
4. The adversarial perturbation generation method according to claim 1, wherein the determining of a perturbation concealment value for the voiceprint adversarial example based on the original voiceprint sample and the voiceprint adversarial example comprises: calculating the L2 norm of the difference between the original voiceprint sample and the voiceprint adversarial example as the perturbation concealment value.
5. The adversarial perturbation generation method according to claim 1, wherein the determining of a sample loss function of the adversarial example generation network based on the attack success rate and the perturbation concealment value comprises: obtaining a concealment coefficient; and subtracting the product of the concealment coefficient and the perturbation concealment value from the attack success rate to obtain the sample loss function.
6. The adversarial perturbation generation method according to claim 1, wherein the preset number of frames of the adversarial perturbation is 3200 frames, and the number of neurons of the adversarial example generation network is 3200.
7. The adversarial perturbation generation method according to claim 1, wherein the inputting of the original voiceprint sample into the adversarial example generation network comprises: if the number of frames of the original voiceprint sample is greater than the number of neurons of the adversarial example generation network, copying neurons of the adversarial example generation network and adding the copies to the adversarial example generation network; if the number of frames of the original voiceprint sample is less than the number of neurons of the adversarial example generation network, deleting some of the neurons in the adversarial example generation network, so that each frame of the original voiceprint sample corresponds to one of the neurons in the adversarial example generation network; and if the number of frames of the original voiceprint sample is equal to the number of neurons of the adversarial example generation network, inputting the original voiceprint sample into the adversarial example generation network.
8. An adversarial perturbation generation apparatus for voiceprint recognition, comprising: a first acquisition unit, configured to acquire a training voiceprint sample set, wherein the training voiceprint sample set comprises a plurality of original voiceprint samples; an initializing unit, configured to initialize an adversarial example generation network, wherein the adversarial example generation network is a single-layer neural network, the number of neurons is equal to a preset number of frames of the adversarial perturbation, and each neuron corresponds to one frame of the adversarial perturbation; a first determining unit, configured to determine, for each of the original voiceprint samples in the training voiceprint sample set, a target recognition object that is different from a real recognition object of the original voiceprint sample; a first input unit, configured to input the original voiceprint sample into the adversarial example generation network to obtain a voiceprint adversarial example; a second input unit, configured to input the voiceprint adversarial example into a voiceprint recognition network to obtain a first recognition result vector, wherein the first recognition result vector comprises a first probability that the voiceprint adversarial example is recognized as the target recognition object and a second probability that the voiceprint adversarial example is recognized as an object other than the target recognition object; and a training unit, configured to determine, based on the first probability and the second probability, an attack success rate with which the voiceprint adversarial example is recognized as the target recognition object, determine a perturbation concealment value of the voiceprint adversarial example based on the original voiceprint sample and the voiceprint adversarial example, determine a sample loss function of the adversarial example generation network based on the attack success rate and the perturbation concealment value, and train the adversarial example generation network based on the sample loss function to generate the adversarial perturbation.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the adversarial perturbation generation method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the adversarial perturbation generation method according to any one of claims 1 to 7.
11. A computer program product comprising a computer program which, when read and executed by a processor of a computer device, causes the computer device to perform the adversarial perturbation generation method according to any one of claims 1 to 7.
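
By way of non-limiting illustration, the quantities recited in claims 3 to 5 and the length adjustment of claim 7 may be sketched in Python as follows. The function names are hypothetical and do not appear in the claims. The sketch assumes that the attack success rate is the first probability minus the largest second probability, that deleted neurons are taken from the end of the network, and that copied neurons are appended in their original order; the claims do not fix these details.

```python
import math


def attack_success_rate(first_prob, second_probs, first_threshold):
    """Claim 3 as read here (assumption on the direction of subtraction):
    target probability minus the largest non-target probability,
    replaced by first_threshold when it reaches or exceeds that threshold."""
    rate = first_prob - max(second_probs)
    return min(rate, first_threshold)


def concealment_value(original, adversarial):
    """Claim 4: L2 norm of the difference between the original sample
    and the adversarial example (both given as lists of frames)."""
    return math.sqrt(sum((a - o) ** 2 for o, a in zip(original, adversarial)))


def sample_loss(first_prob, second_probs, original, adversarial,
                first_threshold, concealment_coeff):
    """Claim 5: attack success rate minus the product of the concealment
    coefficient and the perturbation concealment value."""
    rate = attack_success_rate(first_prob, second_probs, first_threshold)
    return rate - concealment_coeff * concealment_value(original, adversarial)


def fit_perturbation(weights, num_frames):
    """Claim 7 (assumed strategy): repeat the per-frame weights when the
    sample is longer, drop trailing weights when it is shorter, so that
    every frame of the sample has exactly one weight."""
    if num_frames <= len(weights):
        return weights[:num_frames]
    repeats = -(-num_frames // len(weights))  # ceiling division
    return (weights * repeats)[:num_frames]
```

With these signs, a larger value of `sample_loss` corresponds to a more successful and less audible perturbation, so the sketch treats it as a quantity to be increased during training; that reading is inferred from claims 3 to 5 rather than stated in them.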

Description

Method, apparatus, device and medium for generating adversarial perturbations for voiceprint recognition

Technical Field

The present disclosure relates to the field of artificial intelligence, and in particular to an adversarial perturbation generation method, apparatus, device, and medium for voiceprint recognition.

Background

Voiceprint recognition technology is being applied in more and more scenarios. As the recognition accuracy of voiceprint recognition systems has improved, adversarial example generation technology has developed alongside it. An adversarial example is a sample obtained by adding an adversarial perturbation to an input sample; it is used to attack a voiceprint recognition system so that the system produces a wrong recognition result. Attacking a voiceprint recognition system with adversarial examples can in turn be used to improve the robustness of that system. To obtain a better attack effect, the requirements placed on adversarial perturbation generation techniques are becoming increasingly high.

In the prior art, there are two types of adversarial perturbation generation techniques. The first generates a different adversarial perturbation for each input sample. With this approach it is difficult to specify the target recognition object of the adversarial example (the erroneous object as which the adversarial example is intended to be recognized), so the approach lacks practicality; it also requires the adversarial perturbation to be regenerated for every input sample, which takes a long time and consumes a large amount of computing resources. The second generates a universal adversarial perturbation that is valid for different input samples. This approach generates perturbations using a generative adversarial network, mainly by training a large-scale generative network against a discriminator network. However, because the network is large, training is difficult and takes a long time.
Disclosure of Invention

Embodiments of the disclosure provide an adversarial perturbation generation method, apparatus, device and medium for voiceprint recognition, which can improve the efficiency of generating adversarial perturbations and can also improve the concealment and dominance of the adversarial perturbations.

According to an aspect of the present disclosure, there is provided an adversarial perturbation generation method, including: acquiring a training voiceprint sample set, wherein the training voiceprint sample set comprises a plurality of original voiceprint samples; initializing an adversarial example generation network, wherein the adversarial example generation network is a single-layer neural network, the number of neurons is equal to a preset number of frames of the adversarial perturbation, and each neuron corresponds to one frame of the adversarial perturbation; determining, for each of the original voiceprint samples in the training voiceprint sample set, a target recognition object that is different from a real recognition object of the original voiceprint sample; inputting the original voiceprint sample into the adversarial example generation network to obtain a voiceprint adversarial example; inputting the voiceprint adversarial example into a voiceprint recognition network to obtain a first recognition result vector, wherein the first recognition result vector comprises a first probability that the voiceprint adversarial example is recognized as the target recognition object and a second probability that the voiceprint adversarial example is recognized as an object other than the target recognition object; and determining a sample loss function of the adversarial example generation network according to the original voiceprint sample, the voiceprint adversarial example, the first probability and the second probability, and training the adversarial example generation network based on the sample loss function to generate the adversarial perturbation.
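
As a rough, non-limiting sketch of the flow described above, the following Python code builds a single-layer generator with one weight per frame, passes the resulting adversarial example through a recognizer, and adjusts the weights over a small training set. Several parts are assumptions made only for illustration: `toy_recognizer` is a hypothetical stand-in for the voiceprint recognition network, the generator is assumed to add its weights to the sample frame by frame, every sample is assumed to have exactly `num_frames` frames, and the update rule is simple hill climbing rather than the gradient-based training a real implementation would normally use.

```python
import math
import random


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_recognizer(sample, templates):
    """Hypothetical recognizer: one dot-product score per enrolled
    speaker template, followed by softmax."""
    return softmax([sum(t * x for t, x in zip(tpl, sample)) for tpl in templates])


def generate_adversarial(sample, weights):
    """Single-layer generator (assumed additive): weight i is frame i of
    the perturbation and is added to frame i of the sample."""
    return [x + w for x, w in zip(sample, weights)]


def objective(sample, weights, target, templates, threshold, coeff):
    """Capped attack success rate minus coeff times the L2 concealment value."""
    adversarial = generate_adversarial(sample, weights)
    probs = toy_recognizer(adversarial, templates)
    first = probs[target]  # probability of the target recognition object
    second = [p for i, p in enumerate(probs) if i != target]  # all other objects
    rate = min(first - max(second), threshold)
    concealment = math.sqrt(sum((a - x) ** 2 for a, x in zip(adversarial, sample)))
    return rate - coeff * concealment


def train(dataset, num_frames, templates, threshold, coeff,
          steps=200, step_size=0.05, seed=0):
    """Hill climbing over the generator weights (stand-in for gradient
    training): a random single-weight change is kept only if the objective
    summed over the training set increases."""
    rng = random.Random(seed)
    weights = [0.0] * num_frames

    def total(w):
        return sum(objective(s, w, t, templates, threshold, coeff)
                   for s, t in dataset)

    best = total(weights)
    for _ in range(steps):
        i = rng.randrange(num_frames)
        previous = weights[i]
        weights[i] = previous + rng.choice((-step_size, step_size))
        candidate = total(weights)
        if candidate > best:
            best = candidate
        else:
            weights[i] = previous  # revert a change that did not help
    return weights, best
```

Here `dataset` is a list of `(sample, target)` pairs in which `target` is the index of a speaker other than the true one. Because the same weights are optimized across all training samples, the returned `weights` play the role of a single perturbation shared by every input.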
According to an aspect of the present disclosure, there is provided an adversarial perturbation generation apparatus, including: a first acquisition unit, configured to acquire a training voiceprint sample set, wherein the training voiceprint sample set comprises a plurality of original voiceprint samples; an initializing unit, configured to initialize an adversarial example generation network, wherein the adversarial example generation network is a single-layer neural network, the number of neurons is equal to a preset number of frames of the adversarial perturbation, and each neuron corresponds to one frame of the adversarial perturbation; a first determining unit, configured to determine, for each of the original voiceprint samples in the training voiceprint sample set, a target recognition object that is different from a real recognition object of the original voiceprint sample; a first input unit, configured to input the original voiceprint sample into the adversarial example generation network to obtain a voiceprint adversarial example; a second input unit, conf