CN-115565217-B - Face camouflage method and system based on expression action migration
Abstract
The invention provides a face camouflage method and system based on expression and action migration. A driving video, in which a third person performs the random instructions sent by a face recognition system, is captured; for each frame of the driving video, the difference between the frame's key points and the key points of a target face is computed; and from these differences an action video of the target face is generated. The third person's actions in the driving video are thereby migrated onto the target face: the appearance of the target face is retained while the facial motion of the third person is added, yielding an action video of the target face with which face recognition is completed.
Inventors
- OUYANG PAN
- HUANG GUANGQI
- QIN BINGQING
Assignees
- Aerospace Science and Industry Shenzhen (Group) Co., Ltd. (航天科工深圳(集团)有限公司)
Dates
- Publication Date
- 20260508
- Application Date
- 20220815
Claims (6)
- 1. A face camouflage method based on expression and action migration, characterized by comprising the following steps: Step 1, acquiring a target face image and displaying it on a display end for detection by a face recognition system; Step 2, a third person makes the corresponding expression actions according to random instructions sent by the face recognition system, and a driving video of the expression actions made by the third person is collected; Step 3, extracting key point information of the target face and key point information of the third person's face in each frame of the driving video in sequence, the key point information being extracted with a key point extraction network model; the training method of the key point extraction network model comprises the following steps: Step 3.1, taking two different frames of the same face action video and extracting their key point information, the key point information extracted from the previous frame X being denoted H and that extracted from the next frame X' being denoted H', so that the sparse motion of the key points between the two frames is represented as Hδ, where Hδ = H − H'; Step 3.2, inputting the previous frame X and the sparse key point motion Hδ into the dense motion representation model to obtain the feature vector of the dense key point motion; the conversion from the sparse motion representation to the dense motion representation is completed by a dense motion conversion network M, which converts the input X and Hδ through a U-Net into the format R^(H×W×2), where R denotes a vector space of size H×W×2, H and W are the height and width of the feature map, and 2 represents the movement values in the X and Y directions (a sketch of such a network appears after this claims list); Step 3.3, training the key point extraction network model with a conditional GAN: the previous frame X is input into the encoder of the conditional GAN generator to obtain the feature vector of X, the feature vector of X and the feature vector of the dense key point motion are spliced along the channel direction of each key point, and the spliced feature vector is decoded by a decoder to generate an image X̂; Step 3.4, inputting (X̂, H') and the true label (X', H') into the conditional GAN discriminator for discrimination, where the discriminator should judge (X̂, H') as true as far as possible when training the generator, and as false as far as possible when training the discriminator; Step 3.5, returning to Step 3.1 until the key point extraction network model is trained; Step 4, sequentially calculating the difference between the key point information of the third person's face and the key point information of the target face in all frames of the driving video, and sequentially computing one frame image of the target face from each difference, so as to form an action video of the target face that follows the instructions; Step 5, displaying the target action video on the display end for detection by the face recognition system.
- 2. The face camouflage method according to claim 1, wherein the method of obtaining a frame image of the target face from the difference in Step 4 is: splicing the key point information of the target face and the difference on the key point channel to obtain spliced key point information, and decoding the spliced key point information to obtain the image.
- 3. The face camouflage method according to claim 1 or 2, wherein ambient flicker changes in front of the face recognition system are collected, and the target face is subjected to flicker processing in real time according to the ambient flicker changes.
- 4. The face camouflage method according to claim 3, wherein the target face image is enhanced using a contrast enhancement method.
- 5. The face camouflage method of claim 4, wherein the contrast enhancement method is histogram equalization.
- 6. A face camouflage system based on expression and action migration, characterized by comprising the following modules: a target face acquisition module, used for acquiring a target face image and displaying it on a display end for detection by a face recognition system; a driving video acquisition module, used for having a third person make the corresponding expression actions according to random instructions sent by the face recognition system and collecting a driving video of the expression actions made by the third person; a key point information extraction module, used for extracting key point information of the target face and key point information of the third person's face in each frame of the driving video in sequence, the key point information being extracted with a key point extraction network model; the training method of the key point extraction network model comprises the following steps: 1) taking two different frames of the same face action video and extracting their key point information, the key point information extracted from the previous frame X being denoted H and that extracted from the next frame X' being denoted H', so that the sparse motion of the key points between the two frames is represented as Hδ, where Hδ = H − H'; 2) inputting the previous frame X and the sparse key point motion Hδ into the dense motion representation model to obtain the feature vector of the dense key point motion; the conversion from the sparse motion representation to the dense motion representation is completed by a dense motion conversion network M, which converts the input X and Hδ through a U-Net into the format R^(H×W×2), where R denotes a vector space of size H×W×2, H and W are the height and width of the feature map, and 2 represents the movement values in the X and Y directions; 3) training the key point extraction network model with a conditional GAN: the previous frame X is input into the encoder of the conditional GAN generator to obtain the feature vector of X, the feature vector of X and the feature vector of the dense key point motion are spliced along the channel direction of each key point, and the spliced feature vector is decoded by a decoder to generate an image X̂; 4) inputting (X̂, H') and the true label (X', H') into the conditional GAN discriminator for discrimination, where the discriminator should judge (X̂, H') as true as far as possible when training the generator, and as false as far as possible when training the discriminator; 5) returning to 1) until the key point extraction network model is trained; a target action video making module, used for sequentially calculating the difference between the key point information of the third person's face and the key point information of the target face in all frames of the driving video, and sequentially computing one frame image of the target face from each difference, so as to form an action video of the target face that follows the instructions; and a recognition module, used for displaying the target action video on the display end for detection by the face recognition system.
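The dense motion conversion network M in claims 1 and 6 is specified only as a U-Net that maps the previous frame X and the sparse key point motion Hδ to a dense motion field in R^(H×W×2). The following is a minimal PyTorch sketch of such a network; the layer sizes, channel counts, and names are illustrative assumptions, not the patent's actual architecture.

```python
# Minimal sketch of a dense motion conversion network M: a U-Net-style
# encoder-decoder mapping (previous frame X, sparse key-point motion) to a
# dense (H, W, 2) motion field, the 2 channels being per-pixel displacements
# in the X and Y directions. All layer choices are assumptions.
import torch
import torch.nn as nn

class DenseMotionNet(nn.Module):
    def __init__(self, in_channels: int = 3 + 2, base: int = 64):
        super().__init__()
        # Encoder: downsample the concatenated (frame, sparse motion) input.
        self.down1 = nn.Sequential(nn.Conv2d(in_channels, base, 3, 2, 1), nn.ReLU())
        self.down2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, 2, 1), nn.ReLU())
        # Decoder: upsample back to full resolution with a skip connection.
        self.up1 = nn.Sequential(nn.ConvTranspose2d(base * 2, base, 4, 2, 1), nn.ReLU())
        self.up2 = nn.ConvTranspose2d(base * 2, 2, 4, 2, 1)  # 2 = (dx, dy)

    def forward(self, x: torch.Tensor, h_delta: torch.Tensor) -> torch.Tensor:
        # x: (B, 3, H, W) previous frame; h_delta: (B, 2, H, W) sparse motion
        # rasterised onto the image grid (zero away from key points).
        feats = torch.cat([x, h_delta], dim=1)
        d1 = self.down1(feats)
        d2 = self.down2(d1)
        u1 = self.up1(d2)
        flow = self.up2(torch.cat([u1, d1], dim=1))  # (B, 2, H, W) dense motion
        return flow
```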
Description
Face camouflage method and system based on expression action migration
Technical Field
The invention belongs to the technical field of face recognition, and in particular relates to a face camouflage method and system based on expression and action migration.
Background
Face recognition technology is widely used in daily life, for example in mobile phone unlocking, account verification, access control, financial payment, and public security pursuit of fugitives, and it provides important support for the construction of smart and safe cities. However, many potential security risks still exist in current face recognition systems. Face camouflage technology seriously threatens the security and credibility of face recognition: it creates great risks to users' property and privacy and poses great challenges to public security management. It is therefore worth studying how a third-party face can be used for camouflage, so as to better plug the security loopholes of face recognition.
Disclosure of Invention
The invention provides a face camouflage method and system based on expression and action migration, aiming to solve the technical problem of how to camouflage a target face with a third person's facial expressions within the range permitted by laws and regulations. To solve this technical problem, the invention adopts the following technical scheme. A face camouflage method based on expression and action migration comprises the following steps: Step 1, acquiring a target face image and displaying it on a display end for detection by a face recognition system; Step 2, a third person makes the corresponding expression actions according to random instructions sent by the face recognition system, and a driving video of the expression actions made by the third person is collected; Step 3, extracting key point information of the target face and key point information of the third person's face in each frame of the driving video in sequence; Step 4, sequentially calculating the difference between the key point information of the third person's face and the key point information of the target face in all frames of the driving video, and sequentially computing one frame image of the target face from each difference, so as to form an action video of the target face that follows the instructions (a sketch of Steps 3 and 4 appears after this paragraph); Step 5, displaying the target action video on the display end for detection by the face recognition system. Further, the key point information in Step 3 is extracted with a key point extraction network model.
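As a concrete illustration of Steps 3 and 4 (the sketch referenced above), the following Python fragment shows the per-frame key point difference and the splicing-and-decoding of claim 2. The key point extractor and decoder are hypothetical stand-ins for the trained networks the patent describes; only the arithmetic follows the claims.

```python
# Minimal sketch, assuming PyTorch: transfer the driving video's facial motion
# onto the target face via per-frame key-point differences.
import torch

def make_target_action_video(target_img, driving_frames, extract_kp, decode):
    """target_img:     (3, H, W) tensor, the target face image.
    driving_frames: iterable of (3, H, W) tensors from the driving video.
    extract_kp:     trained key-point extraction network, frame -> (K, 2).
    decode:         trained decoder, spliced key-point info (K, 4) -> frame.
    """
    target_kp = extract_kp(target_img)        # (K, 2) target face key points
    output_frames = []
    for frame in driving_frames:
        driving_kp = extract_kp(frame)        # (K, 2) third person's key points
        diff = driving_kp - target_kp         # per-key-point difference (Step 4)
        # Claim 2: splice the target key points and the difference on the
        # key-point channel, then decode the spliced info into an image.
        spliced = torch.cat([target_kp, diff], dim=-1)   # (K, 4)
        output_frames.append(decode(spliced))
    return output_frames                      # the target action video
```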
Further, the training method of the key point extraction network model comprises the following steps: Step 3.1, taking two different frames of the same face action video and extracting their key point information, the key point information extracted from the previous frame X being denoted H and that extracted from the next frame X' being denoted H', so that the sparse motion of the key points between the two frames is represented as Hδ, where Hδ = H − H'; Step 3.2, inputting the previous frame X and the sparse key point motion Hδ between the two frames into the dense motion representation model to obtain the feature vector of the dense key point motion; Step 3.3, training the key point extraction network model with a conditional GAN: the previous frame X is input into the encoder of the conditional GAN generator to obtain the feature vector of X, the feature vector of X and the feature vector of the dense key point motion are spliced along the channel direction of each key point, and the spliced feature vector is decoded by a decoder to generate an image X̂; Step 3.4, inputting (X̂, H') and the true label (X', H') into the conditional GAN discriminator for discrimination, where the discriminator should judge (X̂, H') as true as far as possible when training the generator, and as false as far as possible when training the discriminator (a training-loop sketch follows below); Step 3.5, returning to Step 3.1 until the key point extraction network model is trained. Further, in Step 4, a frame image of the target face is obtained from the difference by splicing the key point information of the target face and the difference on the key point channel to obtain spliced key point information, and decoding the spliced key point information to obtain the image. Further, ambient flicker changes in front of the face recognition system are collected, and the target face is subjected to flicker processing in real time according to the ambient flicker changes (a contrast-enhancement sketch also follows below).
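The training-loop sketch referenced in Step 3.4: a minimal conditional-GAN step, assuming PyTorch. The generator, discriminator, and optimizers are hypothetical stand-ins; only the real/fake objective on (X̂, H') versus the true label (X', H') follows the description.

```python
# Minimal sketch of one conditional-GAN training step (Steps 3.3-3.4).
# generator(x, dense_motion) -> X_hat; discriminator(image, keypoints) -> logit.
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, g_opt, d_opt,
               x, x_next, h_next, dense_motion):
    # Generator pass: previous frame X + dense motion -> generated image X_hat.
    x_hat = generator(x, dense_motion)

    # Discriminator step: (X', H') is real, (X_hat, H') should be judged false.
    d_opt.zero_grad()
    real_score = discriminator(x_next, h_next)
    fake_score = discriminator(x_hat.detach(), h_next)
    d_loss = (F.binary_cross_entropy_with_logits(real_score, torch.ones_like(real_score))
              + F.binary_cross_entropy_with_logits(fake_score, torch.zeros_like(fake_score)))
    d_loss.backward()
    d_opt.step()

    # Generator step: push the discriminator to judge (X_hat, H') as true.
    g_opt.zero_grad()
    fake_score = discriminator(x_hat, h_next)
    g_loss = F.binary_cross_entropy_with_logits(fake_score, torch.ones_like(fake_score))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```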
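The contrast-enhancement sketch referenced above: for the histogram equalization of claims 4 and 5, a short OpenCV example on the luminance channel. The real-time adaptation to ambient flicker in claim 3 would wrap this per frame and is not shown.

```python
# Minimal sketch: histogram equalization as the contrast enhancement method,
# applied only to the Y (luminance) channel so colours stay stable.
import cv2

def enhance_contrast(bgr_frame):
    ycrcb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```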