CN-115909460-B - Face recognition system countermeasure method based on face depth information


Abstract

A face recognition system countermeasure method based on face depth information. In an offline stage, a depth estimation neural network based on a UNet encoder-decoder structure is built and trained to generate a depth map of the image to be detected, and a four-channel face recognition network is built and trained. In an online stage, a depth map is generated from the input color image, and the recognition and authentication result is then obtained through the four-channel face recognition network. By introducing face depth information into the face recognition system, the method greatly improves the robustness of the system to adversarial attack samples.

Inventors

GONG ZHIJUN
YI RAN
MA LIZHUANG

Assignees

上海交通大学

Dates

Publication Date: 20260512
Application Date: 20221130

Claims (7)

1. A face recognition system countermeasure method based on face depth information, characterized in that, in an offline stage, a depth estimation neural network based on a UNet encoder-decoder structure is built and trained to generate a depth map of an image to be detected; the offline stage comprises the following steps: 1) constructing a data set: preparing real face point cloud data and projecting it onto the X-Y plane to obtain a color image and a depth map; preparing an RGB face data set, uniformly selecting face color images according to face identity information, and fitting a virtual face depth map with a face 3D morphable statistical model (3DMM) to serve as the ground-truth label of the face color image in the depth estimation module; 2) constructing and pre-training a depth estimation module with a UNet-based encoder-decoder structure, wherein the up-sampling layer of the decoder uses nearest-neighbor up-sampling followed by a convolution layer in place of a deconvolution layer, which effectively avoids the checkerboard artifacts that easily arise in face generation tasks; 3) constructing a four-channel face data set for training a four-channel face recognition network: the depth estimation model pre-trained in step 2) generates the corresponding face depth data for a large-scale RGB face data set, and the RGB modality and the depth modality are then concatenated along the channel dimension; once the four-channel face data set is obtained, a face recognition network consisting of an IR50 network as the main body and an ArcFace head as the head is trained end to end with a cross-entropy loss function.
2. The face recognition system countermeasure method based on face depth information according to claim 1, wherein the depth estimation neural network is based on a UNet structure and is a fully convolutional network comprising an encoder part and a decoder part; in the encoder part, an input module converts the input into a 32-dimensional feature map, and four down-sampling modules, each with a down-sampling factor of 2, then increase the channel dimension to obtain a 512-dimensional feature map; to avoid the checkerboard effect, the up-sampling part first applies bilinear interpolation and then convolution: bilinear interpolation doubles the size of the feature map, and a 3x3 convolution then transforms the feature map; after passing through the encoder-decoder structure, a single-channel depth map with the same resolution as the input is obtained, the real depth map is used as the label to constrain the output, and the depth estimation neural network is trained accordingly.
3. The face recognition system countermeasure method based on face depth information according to claim 1, wherein the four-channel face recognition network comprises a main body structure and a classification head structure; the number of input channels of the convolution layer that receives the input in the main body structure is adjusted to four, and the classification head structure adopts the ArcFace classification head commonly used in face tasks, which receives a 512-dimensional vector and outputs a result vector whose dimension equals the number of classes.
4. The face recognition system countermeasure method based on face depth information according to claim 1, wherein the pre-training inputs the face color image into the depth estimation network, which outputs a predicted depth map of the same resolution; the pre-training loss function is a weighted sum of two terms, wherein the first loss term is the MSE loss between the predicted depth map and the real depth map, the second loss term is the MSE loss between the predicted and real normal maps, and two weights balance the two losses.
5. The face recognition system countermeasure method based on face depth information according to any one of claims 1 to 3, wherein the online stage includes extracting the features of a person to be recognized through the main body of the trained face recognition network, storing the extracted features in a feature library, and comparing the extracted features with the features in the feature library; when the similarity is greater than a threshold value, the face comparison is considered successful.
6. The face recognition system countermeasure method based on face depth information according to claim 5, wherein the extracting either directly captures a color map and a depth map of the face using a system with a depth camera, or inputs the captured color map of the face into the trained depth estimation module to obtain a predicted depth map; the color map and the depth map are then concatenated to obtain the four-channel input.
7. A system for realizing the face depth information-based face recognition system countermeasure method of any one of claims 1 to 6 is characterized by comprising a 3DMM virtual depth generation unit, a face monocular depth estimation unit and a main face recognition unit, wherein the 3DMM unit performs unsupervised fitting according to face color data to obtain fitted face depth data for pre-training of a depth estimation module, the face depth estimation unit generates a face depth map for subsequent recognition through network prediction according to face color map information, and the face recognition unit performs feature extraction by taking a four-channel face map obtained by splicing the color map and the depth map as input to obtain final face features.
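
As an illustration of the decoder up-sampling described in claims 1 and 2, the following is a minimal sketch, not the patented implementation. It uses nearest-neighbor up-sampling by a factor of 2 (as in claim 1) followed by a 3x3 convolution, in place of a deconvolution layer. The function names, the single-channel list-of-lists representation, the zero padding, and the identity kernel are all assumptions made for the example; claim 2 describes bilinear interpolation in the same position, which would replace `upsample_nearest_2x`.

```python
from typing import List

Grid = List[List[float]]  # one feature-map channel, height x width


def upsample_nearest_2x(x: Grid) -> Grid:
    """Double height and width by repeating every pixel in a 2x2 block."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def conv3x3_same(x: Grid, kernel: Grid) -> Grid:
    """3x3 cross-correlation with zero padding, so the output keeps the input size."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = acc
    return out


def upsample_block(x: Grid, kernel: Grid) -> Grid:
    """Resize first, then convolve: the alternative to a deconvolution layer."""
    return conv3x3_same(upsample_nearest_2x(x), kernel)


# Hypothetical 2x2 feature map and an identity kernel (assumed values).
feature = [[1.0, 2.0], [3.0, 4.0]]
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
result = upsample_block(feature, identity)
```

With the identity kernel, `result` is simply the 4x4 nearest-neighbor enlargement of `feature`; in a trained network the kernel weights would be learned. Because every output pixel is produced by the same 3x3 window over an evenly resized map, the uneven kernel overlap that causes checkerboard artifacts in deconvolution does not occur.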

Description

Face recognition system countermeasure method based on face depth information

Technical Field

The invention relates to a technology in the field of neural network applications, in particular to a face recognition system countermeasure method based on face depth information.

Background

An adversarial attack on a face means that, after pixel-level tampering of the face image, the recognition network misrecognizes the face as another person. Such tampering can also be realized physically, for example by applying make-up or wearing special ornaments, which creates a serious security risk. At present there are few adversarial defense methods designed for faces; most work that tests defense performance on faces uses defense methods for general tasks, the most common being adversarial training. Adversarial training uses the current model to generate adversarial samples and then adds these samples to training, which can greatly improve the robustness of the model to adversarial attacks. In essence, adversarial training lets the model "see" attack samples, thereby improving its recognition accuracy on such samples. However, adversarial training introduces a large training overhead, which is very unfriendly to the iteration of large models in face recognition systems.

Another technical route against adversarial attacks detects and screens out adversarial samples before they enter the network, or converts an adversarial image into a normal image through a network before inputting it into the recognition network; this approach is clearly unsuitable for a face recognition system. Regardless of whether the system judges the input image to be an adversarial sample, the face recognition system should output a final comparison result, which is what the application scenario requires.
Disclosure of Invention

Aiming at the above defects in the prior art, the invention provides a face recognition system countermeasure method based on face depth information to address potential adversarial attacks on the network; introducing face depth information into the face recognition system greatly improves its robustness to attack samples.

The invention is realized by the following technical scheme:

The invention relates to a face recognition system countermeasure method based on face depth information. In the offline stage, a depth estimation neural network with a UNet-based encoder-decoder structure is constructed and trained to generate a depth map of the image to be detected, and a four-channel face recognition network is constructed and trained. In the online stage, a depth map is generated from the input color image, and the recognition and authentication result is then obtained through the four-channel face recognition network.

The depth estimation neural network is based on the traditional UNet structure, a classical fully convolutional network divided into an encoder part and a decoder part.
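
The online flow described above (estimate a depth map from the color image, concatenate it with the color channels, extract features, compare) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: `toy_depth` and `toy_features` are hypothetical stand-ins for the trained depth estimation network and the four-channel recognition network, and cosine similarity is an assumed similarity measure, since the text only speaks of a similarity compared against a threshold.

```python
import math
from typing import Callable, List, Sequence

Image = List[List[List[float]]]  # channels x height x width


def splice_rgbd(rgb: Image, depth: Image) -> Image:
    """Concatenate a 3-channel color image and a 1-channel depth map on the channel axis."""
    if len(rgb) != 3 or len(depth) != 1:
        raise ValueError("expected a 3-channel color image and a 1-channel depth map")
    return rgb + depth


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def authenticate(rgb: Image,
                 estimate_depth: Callable[[Image], Image],
                 extract_features: Callable[[Image], List[float]],
                 enrolled: Sequence[float],
                 threshold: float) -> bool:
    """Online stage: depth map -> four-channel input -> features -> threshold test."""
    four_channel = splice_rgbd(rgb, estimate_depth(rgb))
    features = extract_features(four_channel)
    return cosine_similarity(features, enrolled) > threshold


def toy_depth(rgb: Image) -> Image:
    """Stand-in for the depth network: per-pixel mean of the color channels."""
    h, w = len(rgb[0]), len(rgb[0][0])
    return [[[sum(c[i][j] for c in rgb) / 3.0 for j in range(w)] for i in range(h)]]


def toy_features(img: Image) -> List[float]:
    """Stand-in for the recognition network: one mean value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in img]


# Hypothetical 3x2x2 color image (assumed values).
probe = [
    [[0.9, 0.8], [0.7, 0.6]],
    [[0.2, 0.3], [0.1, 0.2]],
    [[0.5, 0.5], [0.4, 0.6]],
]
four = splice_rgbd(probe, toy_depth(probe))
enrolled = toy_features(four)
accepted = authenticate(probe, toy_depth, toy_features, enrolled, threshold=0.5)
rejected = authenticate(probe, toy_depth, toy_features, [-1.0, 0.0, 0.0, 0.0], threshold=0.5)
```

Here the probe matches its own enrolled features and is accepted, while the dissimilar enrolled vector is rejected. The threshold of 0.5 is an arbitrary example value.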
The encoder part is the same as the classical structure: an input module converts the input into a feature map of dimension 32, and four down-sampling modules, each with a down-sampling factor of 2, then increase the channel dimension to obtain a 512-dimensional feature map. The decoder part is generally composed of up-sampling modules with an up-sampling factor of 2. This scheme differs from the traditional up-sampling module: to avoid the checkerboard effect, the up-sampling part does not use the common deconvolution structure but first applies bilinear interpolation and then convolution. Bilinear interpolation doubles the size of the feature map, and a 3x3 convolution then transforms the feature map. After passing through the encoder-decoder structure, a single-channel depth map with the same resolution as the input is obtained, the real depth map is used as the label to constrain the output, and the depth estimation neural network is trained accordingly.

The four-channel face recognition network comprises a main body structure and a classification head structure. The main body structure can adopt any common face recognition network; this scheme adopts the IR50 network proposed in the ArcFace paper, which extracts a 512-dimensional feature vector from the input face image. The network is modified by adjusting the number of input channels of the convolution layer that receives the input to four, and the classification head structure likewise adopts the ArcFace classification head commonly used in face tasks in place of softmax, so that the charac