CN-116259094-B - Face living body detection model generation method and device and electronic equipment
Abstract
The invention discloses a face living body detection model generation scheme, comprising: inputting a training sample into a pre-created twin (Siamese) neural network; augmenting the training sample through a first branch and a second branch of the twin neural network to generate augmented face images; inputting the augmented face images into the two branches respectively to obtain a first face feature, a second face feature, a first depth map and a second depth map; calculating a coding-level dense consistency loss from the first face feature and the second face feature; calculating a prediction-level consistency loss from the first depth map and the second depth map; determining the total loss of the twin neural network from the coding-level dense consistency loss and the prediction-level consistency loss; and optimizing the twin neural network parameters according to the total loss by stochastic gradient descent to generate the face living body detection model. A face living body detection model generated by this scheme yields face recognition results with both high accuracy and strong generalizability.
Inventors
- LI HONG
- XING JIANFEI
- QU SIYU
- YAN CAIPING
Assignees
- 杭州启源视觉科技有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20230320
Claims (9)
- 1. A method for generating a face living body detection model, characterized by comprising the following steps: inputting a training sample into a pre-created twin neural network, wherein the twin neural network comprises a first branch and a second branch, the first branch comprises a first feature encoder, a feature converter and a first classifier, and the second branch comprises a second feature encoder and a second classifier; augmenting the training sample through the first branch and the second branch to generate augmented face images; inputting the augmented face images into the first branch and the second branch respectively to obtain a first face feature, a second face feature, a first depth map and a second depth map; calculating a coding-level dense consistency loss of the twin neural network from the first face feature and the second face feature; calculating a prediction-level consistency loss of the twin neural network from the first depth map and the second depth map; determining a total loss of the twin neural network from the coding-level dense consistency loss and the prediction-level consistency loss; and optimizing the twin neural network parameters according to the total loss by stochastic gradient descent to generate the face living body detection model; wherein the coding-level dense consistency loss is L_edc = -(1/N) Σ_{i=1..N} t̂_i^1 · sg(f̂_i^2), where F^1 denotes the output of the feature encoder at view 1, F^2 denotes the output of the feature encoder at view 2, T^1 denotes the output of the feature converter at view 1, and sg(·) denotes the stop-gradient operation; for a feature matrix X with i-th row x_i, x̂_i = (x_i/||x_i|| - c)/||x_i/||x_i|| - c||, where c is the center of all normalized row vectors of X; the prediction-level consistency loss is L_pc = ||D^1 - D^2||_2^2, where D^1 and D^2 are the depth maps output by the first classifier and the second classifier respectively; the first feature encoder and the second feature encoder have the same structure, each comprising an input layer, a first convolution module, a second convolution module, a third convolution module, a first fraud attention module, a second fraud attention module, a third fraud attention module, a first maximum pooling layer, a second maximum pooling layer and a third maximum pooling layer; the output of the input layer is input into the first convolution module and the first fraud attention module respectively, and the outputs of the first convolution module and the first fraud attention module are added and then input into the first maximum pooling layer to generate low-level features; the low-level features are input into the second convolution module and the second fraud attention module, and the outputs of the second convolution module and the second fraud attention module are added and then input into the second maximum pooling layer to generate medium-level features; the medium-level features are input into the third convolution module and the third fraud attention module, and the outputs of the third convolution module and the third fraud attention module are input into the third maximum pooling layer to obtain high-level features.
- 2. The method of claim 1, wherein the step of inputting the augmented face images into the first branch and the second branch respectively to obtain a first face feature, a second face feature, a first depth map and a second depth map comprises: inputting the augmented face images into the first feature encoder and the second feature encoder respectively to generate a first face feature and a second face feature; and inputting the first face feature into the first classifier to obtain a first depth map, and inputting the second face feature into the second classifier to obtain a second depth map.
- 3. The method of claim 2, wherein the step of augmenting the training sample through the first branch and the second branch to generate augmented face images comprises: inputting the training sample into the first branch, which augments the training sample according to a first view to obtain an augmented first face image; and inputting the training sample into the second branch, which augments the training sample according to a second view to obtain an augmented second face image.
- 4. The method according to claim 3, wherein the step of inputting the augmented face images into the first feature encoder and the second feature encoder respectively to generate the first face feature and the second face feature comprises: inputting the augmented first face image into the first feature encoder to obtain low-, medium- and high-level face features, splicing these features, and inputting the spliced features into the feature converter to generate the first face feature; and inputting the augmented second face image into the second feature encoder to obtain low-, medium- and high-level face features, and splicing these features to generate the second face feature.
- 5. The method of claim 1, wherein after the step of generating the face living body detection model, the method further comprises: inputting a face image to be recognized into the second feature encoder of the face living body detection model to obtain low-, medium- and high-level face features, and splicing them to generate a third face feature; inputting the third face feature into the second classifier to obtain a third depth map; and judging whether the face image to be recognized is a real face image according to the third depth map and a preset threshold value.
- 6. The method of claim 1, wherein after the step of generating the face living body detection model, the method further comprises: inputting a face image to be recognized into the first branch and the second branch of the face living body detection model respectively; and judging whether the face image to be recognized is a real face image based on the prediction result obtained by the first branch and the prediction result obtained by the second branch.
- 7. A face living body detection model generating apparatus, characterized by comprising: a first input module, configured to input a training sample into a pre-created twin neural network, wherein the twin neural network comprises a first branch and a second branch, and the first branch comprises a first feature encoder, a feature converter and a first classifier; an augmentation module, configured to augment the training sample through the first branch and the second branch to generate augmented face images; a second input module, configured to input the augmented face images into the first branch and the second branch respectively to obtain a first face feature, a second face feature, a first depth map and a second depth map; a first calculation module, configured to calculate a coding-level dense consistency loss of the twin neural network from the first face feature and the second face feature; a second calculation module, configured to calculate a prediction-level consistency loss of the twin neural network from the first depth map and the second depth map; a total loss determination module, configured to determine a total loss of the twin neural network from the coding-level dense consistency loss and the prediction-level consistency loss; and a parameter adjusting module, configured to optimize the twin neural network parameters according to the total loss by stochastic gradient descent to generate the face living body detection model; wherein the coding-level dense consistency loss is L_edc = -(1/N) Σ_{i=1..N} t̂_i^1 · sg(f̂_i^2), where F^1 denotes the output of the feature encoder at view 1, F^2 denotes the output of the feature encoder at view 2, T^1 denotes the output of the feature converter at view 1, and sg(·) denotes the stop-gradient operation; for a feature matrix X with i-th row x_i, x̂_i = (x_i/||x_i|| - c)/||x_i/||x_i|| - c||, where c is the center of all normalized row vectors of X; the prediction-level consistency loss is L_pc = ||D^1 - D^2||_2^2, where D^1 and D^2 are the depth maps output by the first classifier and the second classifier respectively; the first feature encoder and the second feature encoder have the same structure, each comprising an input layer, a first convolution module, a second convolution module, a third convolution module, a first fraud attention module, a second fraud attention module, a third fraud attention module, a first maximum pooling layer, a second maximum pooling layer and a third maximum pooling layer; the output of the input layer is input into the first convolution module and the first fraud attention module respectively, and the outputs of the first convolution module and the first fraud attention module are added and then input into the first maximum pooling layer to generate low-level features; the low-level features are input into the second convolution module and the second fraud attention module, and the outputs of the second convolution module and the second fraud attention module are added and then input into the second maximum pooling layer to generate medium-level features; the medium-level features are input into the third convolution module and the third fraud attention module, and the outputs of the third convolution module and the third fraud attention module are input into the third maximum pooling layer to obtain high-level features.
- 8. The apparatus of claim 7, wherein the second input module comprises: a first submodule, configured to input the augmented face images into the first feature encoder and the second feature encoder respectively to generate a first face feature and a second face feature; and a second submodule, configured to input the first face feature into the first classifier to obtain a first depth map, and to input the second face feature into the second classifier to obtain a second depth map.
- 9. An electronic device, comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the face living body detection model generation method according to any one of claims 1 to 6.
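Claims 1 and 7 describe a three-stage shared encoder in which a convolution module and a fraud attention module process the same input in parallel, their outputs are added, and the sum is max-pooled to yield low-, medium- and high-level features. The following pure-Python sketch illustrates only that dataflow; the stand-in functions (identity "convolution", zero "attention", pairwise-max "pooling") are illustrative assumptions, not the patent's implementation, and the sketch adds the two paths in all three stages.

```python
def encoder_stage(x, conv, attention, pool):
    """One stage of the shared encoder: a convolution module and a fraud
    attention module run in parallel on the same input, their outputs are
    added element-wise, and the sum is max-pooled."""
    c = conv(x)
    a = attention(x)
    s = [ci + ai for ci, ai in zip(c, a)]
    return pool(s)

def encoder(x, stages):
    """Chain three stages to produce low-, mid- and high-level features."""
    feats = []
    for conv, attn, pool in stages:
        x = encoder_stage(x, conv, attn, pool)
        feats.append(x)
    return feats  # [low, mid, high]

# Toy stand-ins: identity "conv", zero "attention", pairwise-max "pool".
identity = lambda v: list(v)
zeros = lambda v: [0.0] * len(v)
halve_max = lambda v: [max(v[i], v[i + 1]) for i in range(0, len(v) - 1, 2)]

low, mid, high = encoder([1.0, 3.0, 2.0, 4.0, 0.0, 5.0, 6.0, 1.0],
                         [(identity, zeros, halve_max)] * 3)
```

Each stage halves the (toy, one-dimensional) resolution, so the three collected outputs play the roles of the low-, medium- and high-level features that claim 4 later splices together.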
Description
Face living body detection model generation method and device and electronic equipment

Technical Field

The invention relates to the technical field of face recognition, and in particular to a face living body detection model generation method and device and electronic equipment.

Background

With the progress of computer vision technology, face recognition has become a major method of biometric authentication, and face recognition systems have been successfully applied to various scenes such as access control systems and electronic payment. However, the security of recognition systems remains a hotspot concern in the industry: an attacker can present a deceptive face to the system, such as a print attack, a video replay attack or a 3D mask attack, misleading the system by impersonating a user in order to obtain legitimate rights. The face living body detection system is therefore an important component of a face detection system. Common face living body detection methods include the following. Conventional methods use hand-crafted features such as local binary patterns, scale-invariant feature transform, histograms of oriented gradients, speeded-up robust features and difference of Gaussians to capture spoof cues in a face image, and use dynamic cues such as eye blinking and small movements to extract spoof cues from video; a support vector machine then classifies these features to detect whether the input face image is a real face.
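The conventional pipeline above extracts hand-crafted texture features and feeds them to a support vector machine. As a small illustration, here is one common formulation of the basic local binary pattern code for a single 3x3 neighbourhood (neighbour ordering and the ">=" convention vary between implementations; this is a sketch, not a specific library's definition):

```python
def lbp_code(patch):
    """Local binary pattern of the centre pixel of a 3x3 patch: each of
    the 8 neighbours contributes one bit, set to 1 if the neighbour is
    >= the centre value, read clockwise from the top-left corner."""
    c = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2],
                  patch[1][2], patch[2][2], patch[2][1],
                  patch[2][0], patch[1][0]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= c:
            code |= 1 << bit
    return code
```

In a full pipeline, the histogram of such codes over the image (or over image blocks) forms the feature vector that the support vector machine then classifies as real or spoof.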
Deep-learning-based face living body detection initially treated the task as direct binary classification trained with a cross-entropy loss, but a binary classification task easily loses important information or captures wrong classification cues. Depth-map-supervised networks then appeared, which capture finer-grained information through full-pixel supervision. Building on depth map supervision, technicians have successively developed operators suited to depth map supervision, or auxiliary remote photoplethysmography (rPPG) signals, to improve the accuracy of face living body detection. However, when a face living body detection model is applied to data from a different domain or to an unseen attack means, serious performance degradation occurs and the accuracy of the recognition result is poor. Existing face living body detection models therefore generalize poorly.

Disclosure of Invention

The embodiments of the invention aim to provide a face living body detection model generation method, device and electronic equipment, which can solve the problem of poor generalizability of existing face living body detection models.
In order to solve the above technical problems, the invention provides the following technical scheme. An embodiment of the invention provides a face living body detection model generation method, which comprises: inputting a training sample into a pre-created twin neural network, wherein the twin neural network comprises a first branch and a second branch, the first branch comprises a first feature encoder, a feature converter and a first classifier, and the second branch comprises a second feature encoder and a second classifier; augmenting the training sample through the first branch and the second branch to generate augmented face images; inputting the augmented face images into the first branch and the second branch respectively to obtain a first face feature, a second face feature, a first depth map and a second depth map; calculating a coding-level dense consistency loss of the twin neural network from the first face feature and the second face feature; calculating a prediction-level consistency loss of the twin neural network from the first depth map and the second depth map; determining a total loss of the twin neural network from the coding-level dense consistency loss and the prediction-level consistency loss; and optimizing the twin neural network parameters according to the total loss by stochastic gradient descent to generate the face living body detection model.
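The two consistency losses and their combination can be sketched numerically. This is a minimal pure-Python illustration assuming a cosine-similarity form with stop-gradient for the coding-level dense consistency loss, a squared-error form for the prediction-level loss, and a weighted-sum total loss; these exact forms, names, and the weighting `lam` are assumptions for illustration, not the patent's reference formulas.

```python
import math

def _normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def _center_normalize(rows):
    """Normalize each row, subtract the center of the normalized rows,
    then renormalize -- mirroring the 'center of all normalized row
    vectors' described in the claims."""
    normed = [_normalize(r) for r in rows]
    d = len(normed[0])
    c = [sum(r[i] for r in normed) / len(normed) for i in range(d)]
    return [_normalize([x - cx for x, cx in zip(r, c)]) for r in normed]

def dense_consistency_loss(t1, f2):
    """Coding-level dense consistency: negative mean dot product between
    normalized converter rows (view 1) and encoder rows (view 2). The
    stop-gradient on f2 is implicit here, since no autograd is involved."""
    a = _center_normalize(t1)
    b = _center_normalize(f2)   # sg(.) in the claim: treated as constant
    n = len(a)
    return -sum(sum(x * y for x, y in zip(ra, rb))
                for ra, rb in zip(a, b)) / n

def prediction_consistency_loss(d1, d2):
    """Prediction-level consistency: mean squared error between the two
    predicted depth maps (flattened)."""
    n = len(d1)
    return sum((x - y) ** 2 for x, y in zip(d1, d2)) / n

def total_loss(t1, f2, d1, d2, lam=1.0):
    """Total loss as a weighted sum of the two consistency terms; the
    weighting lam is assumed -- the patent only states that the total
    loss is determined from both terms."""
    return dense_consistency_loss(t1, f2) + lam * prediction_consistency_loss(d1, d2)
```

When the two views agree perfectly, the dense consistency loss reaches its minimum of -1 and the prediction-level loss is 0; the total loss would then be minimized by stochastic gradient descent over the twin network's parameters.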
Optionally, the step of inputting the augmented face images into the first branch and the second branch to obtain a first face feature, a second face feature, a first depth map and a second depth map comprises: inputting the augmented face images into the first feature encoder and the second feature encoder respectively to generate a first face feature and a second face feature; and inputting the first face feature into the first classifier to obtain a first depth map, and inputting the second face feature into the second classifier to obtain a second depth map. Optionally, the training samples are augmented by the fir