CN-121982764-A - Construction method of face recognition system with enhanced data privacy
Abstract
The invention provides a method for constructing a face recognition system with enhanced data privacy. The method comprises constructing a training set and training an initial face recognition system on that training set until it converges, which yields the face recognition system. The training set comprises a plurality of virtual face images generated from hybrid virtual identities and from text attribute sequences sampled from a constructed text attribute library. The invention trains the face recognition system on virtual face images that have three properties: their style attributes are variable, their attribute distribution is close to that of the real world, and their intra-class identity variation is rich. As a result, the system achieves excellent face recognition capability while avoiding privacy and security problems.
Inventors
- YANG JUNZHE
- HE MINGJIE
- SHAN SHIGUANG
Assignees
- Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所)
Dates
- Publication Date
- 20260505
- Application Date
- 20260228
Claims (10)
- 1. A method for constructing a face recognition system with enhanced data privacy, the method comprising: S1, constructing a training set, comprising: S11, acquiring a plurality of real face images, extracting all identity-irrelevant attributes from each real face image, mapping each identity-irrelevant attribute to a text attribute, and grouping all the text attributes to obtain a text attribute library; S12, acquiring a plurality of virtual identities, and sampling the text attribute library a plurality of times according to a preset sampling rule to obtain a plurality of text attribute sequences, wherein each sampling draws a plurality of text attributes belonging to different groups from the text attribute library to form one text attribute sequence; and combining the plurality of virtual identities with the plurality of text attribute sequences to construct a plurality of groups of virtual face image generation conditions, wherein every two different virtual identities are combined with each text attribute sequence, the two different virtual identities serving as a target identity and a reference identity respectively, and the target identity and the reference identity are each combined with the same text attribute sequence into two different identity-text attribute pairs, which together form one group of virtual face image generation conditions; S13, acquiring a pre-trained bi-conditional diffusion model, using the pre-trained bi-conditional diffusion model to generate a plurality of virtual face images from the constructed groups of virtual face image generation conditions, and labeling the identity of the face in each generated virtual face image as the target identity, to obtain the training set; S2, acquiring an initial face recognition system; and S3, training the initial face recognition system on the training set until convergence, with the virtual face images as input and their identities as output, to obtain the face recognition system.
- 2. The method for constructing a face recognition system with enhanced data privacy according to claim 1, wherein the preset sampling rule is: $P_s(a) = \frac{P(a)^{\tau}}{\sum_{a' \in \mathcal{A}} P(a')^{\tau}}$, wherein $a$ represents an attribute to be sampled, $P_s(a)$ represents the sampling probability of attribute $a$, $P(a)$ represents the probability that attribute $a$ occurs when all identity-irrelevant attributes are extracted, $\tau$ represents an exponent controlling the degree of long-tail lifting, $\mathcal{A}$ represents the group of attributes to which $a$ belongs, $a'$ represents any one of all possible attributes in that group, and $P(a')$ represents the probability that attribute $a'$ occurs when all identity-irrelevant attributes are extracted.
- 3. The method for constructing a face recognition system with enhanced data privacy according to claim 1, wherein the pre-trained bi-conditional diffusion model is obtained by: S131, constructing an initial bi-conditional diffusion model following the framework of a latent diffusion model, and configuring the initial bi-conditional diffusion model to receive an identity-text attribute pair as its condition; S132, extracting the identity corresponding to each of the plurality of real face images obtained in step S11, constructing a text attribute sequence for each real face image from all the text attributes corresponding to that image, and combining the identity and the text attribute sequence of each real face image into the identity-text attribute pair of that image; S133, training the initial bi-conditional diffusion model until convergence with respect to a training objective, with the real face images and their corresponding identity-text attribute pairs as input and predicted face images as output, to obtain the pre-trained bi-conditional diffusion model.
- 4. The method for constructing a face recognition system with enhanced data privacy according to claim 3, wherein the text attribute sequence comprises a plurality of text attributes arranged in order, and, before the identity and the text attribute sequence corresponding to each real face image are combined into the identity-text attribute pair of that image, the order of the text attributes in the sequence is randomly shuffled and a portion of the text attributes is randomly dropped at a set ratio.
- 5. The method for constructing a face recognition system with enhanced data privacy according to claim 3, wherein the training objective in step S133 is expressed as: $\mathcal{L}(\theta) = \mathbb{E}_{x, t, \epsilon}\left[\left\| \epsilon - \epsilon_\theta(z_t, t, c_{id}, c_{txt}) \right\|_2^2\right]$, wherein $\theta$ represents the parameters of the noise predictor in the initial bi-conditional diffusion model, $x$ represents a real face image, $t$ represents the time step, $\epsilon$ represents the noise actually added, $\epsilon_\theta(z_t, t, c_{id}, c_{txt})$ represents the predicted noise, $\epsilon_\theta$ represents the noise predictor in the initial bi-conditional diffusion model, $c_{id}$ represents the identity embedding corresponding to $x$, $c_{txt}$ represents the text attribute sequence corresponding to $x$, and $z_t$ represents the representation of $x$ in the latent space at time $t$, obtained by: $z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$, $\epsilon \sim \mathcal{N}(0, I)$, wherein $\bar{\alpha}_t$ represents the cumulative product of the signal retention coefficients of the initial bi-conditional diffusion model up to time $t$, $z_0$ represents the representation in the latent space at the initial time, and $\mathcal{N}(0, I)$ represents a Gaussian distribution with mean 0 and variance 1, wherein $z_0$ is obtained by encoding $x$ with the initial bi-conditional diffusion model.
- 6. The method for constructing a face recognition system with enhanced data privacy according to claim 1, wherein in step S13 the pre-trained bi-conditional diffusion model generates a virtual face image from the virtual face image generation conditions by: generating random noise; and performing multiple rounds of iterative denoising on the random noise to generate the virtual face image, wherein each round of iterative denoising comprises: obtaining a first predicted noise and a second predicted noise for the current round according to the virtual face image generation conditions, the time step of the current round, and the latent variable of the current round; computing a weighted mixture of the first predicted noise and the second predicted noise of the current round to obtain the mixed predicted noise of the current round; and performing the current round of denoising with the mixed predicted noise of the current round to obtain the latent variable for the next round, wherein the latent variable obtained in the last round is decoded to generate the virtual face image.
- 7. The method for constructing a face recognition system with enhanced data privacy according to claim 6, wherein the first predicted noise and the second predicted noise of the current round are weighted and mixed by: $\hat{\epsilon}_t = \lambda\, \epsilon_t^{(1)} + (1 - \lambda)\, \epsilon_t^{(2)}$, wherein $\hat{\epsilon}_t$ represents the mixed predicted noise of the current round, $\lambda$ represents the mixing coefficient, $\epsilon_t^{(1)}$ represents the first predicted noise of the current round, and $\epsilon_t^{(2)}$ represents the second predicted noise of the current round.
- 8. The method for constructing a face recognition system with enhanced data privacy according to claim 7, wherein the mixing coefficient is 0.8.
- 9. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
- 10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
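The construction steps in claims 1, 2, 4, 5, 7 and 8 can be illustrated with a short NumPy sketch. This is a minimal, hypothetical illustration, not the patented implementation. The following are all assumptions made for the example: the toy attribute library and its probabilities, every function name, the use of ordered (target, reference) identity pairs, dropping an exact floor(n × ratio) count of attributes, the linear noise schedule, and the zero-output stand-in for the noise predictor. The sampling rule normalizes P(a)^τ within each attribute group. An exponent τ below 1 therefore flattens the distribution and raises the sampling probability of rare (long-tail) attributes, and τ = 1 reproduces the observed frequencies. The iterative denoising update of claim 6 is not sketched, because the claims do not specify the update rule.

```python
import itertools

import numpy as np

# Hypothetical toy text attribute library: group -> {text attribute: occurrence probability}.
# In the claimed method these probabilities come from attributes extracted from real faces.
ATTRIBUTE_LIBRARY = {
    "pose": {"frontal": 0.70, "left profile": 0.20, "looking up": 0.10},
    "expression": {"neutral": 0.60, "smiling": 0.35, "surprised": 0.05},
    "accessory": {"none": 0.85, "glasses": 0.12, "face mask": 0.03},
}


def lifted_probs(occurrence, tau):
    """Sampling rule of claim 2: P_s(a) = P(a)**tau / sum over the group of P(a')**tau."""
    names = list(occurrence)
    weights = np.array([occurrence[n] for n in names], dtype=float) ** tau
    return names, weights / weights.sum()


def sample_sequence(library, tau, rng):
    """Step S12: draw one attribute from each group to form one text attribute sequence."""
    sequence = []
    for occurrence in library.values():
        names, probs = lifted_probs(occurrence, tau)
        sequence.append(names[rng.choice(len(names), p=probs)])
    return tuple(sequence)


def shuffle_and_drop(sequence, drop_ratio, rng):
    """Claim 4: shuffle the order, then drop floor(len * drop_ratio) attributes (assumed rule)."""
    shuffled = [sequence[i] for i in rng.permutation(len(sequence))]
    n_drop = int(len(shuffled) * drop_ratio)
    return shuffled[n_drop:]


def build_condition_groups(identities, sequences):
    """Step S12: combine every ordered (target, reference) identity pair with every sequence.

    Each group holds two identity-text pairs sharing one sequence; the label is the target.
    """
    groups = []
    for target, reference in itertools.permutations(identities, 2):
        for seq in sequences:
            groups.append({"target": (target, seq), "reference": (reference, seq), "label": target})
    return groups


def mix_noise(eps_first, eps_second, lam=0.8):
    """Claims 7-8: weighted mix of the two predicted noises (mixing coefficient 0.8)."""
    return lam * eps_first + (1.0 - lam) * eps_second


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear beta schedule; alpha_bar_t is the cumulative product of (1 - beta)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def diffusion_loss(z0, t, alpha_bar, predictor, id_emb, text_seq, rng):
    """Claim 5, one-sample estimate: noise z0 to z_t, then squared L2 error of the predictor."""
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    eps_hat = predictor(z_t, t, id_emb, text_seq)
    return float(np.sum((eps - eps_hat) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seqs = [sample_sequence(ATTRIBUTE_LIBRARY, tau=0.5, rng=rng) for _ in range(3)]
    groups = build_condition_groups(["id_a", "id_b", "id_c"], seqs)
    print(len(groups))  # 3 identities -> 6 ordered pairs, times 3 sequences = 18
    print(shuffle_and_drop(seqs[0], drop_ratio=0.34, rng=rng))

    def zero_predictor(z, t, c_id, c_txt):
        # Hypothetical stand-in for the bi-conditional noise predictor.
        return np.zeros_like(z)

    loss = diffusion_loss(rng.standard_normal((4, 8, 8)), 500, alpha_bar_schedule(),
                          zero_predictor, id_emb=None, text_seq=seqs[0], rng=rng)
    print(loss)
```

Under the ordered-pair assumption, three virtual identities and three sampled sequences give 3 × 2 × 3 = 18 condition groups. Treating each identity pair as unordered would halve that count. The claims do not state which identity-text pair produces the first predicted noise and which produces the second, so `mix_noise` simply takes both as arguments.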
Description
Construction method of face recognition system with enhanced data privacy
Technical Field
The invention relates to the field of computer vision, in particular to face recognition, and more particularly to a method for constructing a face recognition system with enhanced data privacy.
Background
Face recognition (FR) is an important research direction in computer vision. Existing high-performance face recognition systems are generally trained on large-scale face image training sets, and their recognition performance improves as the training set grows. However, training sets built from real face images have two drawbacks. First, they carry privacy risks. Second, their sample distribution is usually strongly long-tailed: a few identities have a large number of samples while most identities have very few, and the variation in style (pose, illumination, expression, background, and so on) within each identity is limited. As a result, face recognition systems trained on such sets generalize poorly in complex real-world scenes.
To alleviate these privacy and distribution problems, one approach uses an image generation model to synthesize face images of virtual identities at large scale. This expands style variation such as pose, illumination, expression, and accessories while preserving identity consistency, and the synthesized virtual face images then replace real face images for training the face recognition system.
A typical virtual face generation scheme is driven by face recognition features. A pre-trained face recognition system first extracts an identity feature (for example, a 512-dimensional embedding vector). That feature is then used as the conditional input to a generative adversarial network or a diffusion model, which generates a batch of virtual face images preserving the identity feature. Building on this, some schemes add 3D face shape parameters (3D Morphable Model, 3DMM), facial landmarks, face style maps, or a small number of semantic labels (such as age and pose) as style control signals. These signals give some control over the pose, expression, or local appearance of the face apart from its identity. However, extensive experiments show that, even with the latest face generation frameworks, face recognition systems trained on virtual face training sets have significantly lower overall recognition capability than systems trained on real face training sets. The main reasons are the following problems of existing schemes:
- The style data of the virtual face training sets are mostly style maps or 3D parameters, or they directly inherit the pose and expression distribution of the underlying training set. Style control is therefore weak and entangled: changing one style attribute also changes others.
- The style data of the virtual face training sets follow a highly long-tailed distribution, so the training sets deviate from the real-world distribution of style attributes.
- During training, the model that generates virtual face images sees the local distribution of each identity within its neighborhood in the FR feature space. At inference, however, the model is forced to apply conditional classifier-free guidance (CFG) toward a single prototype vector, generating virtual face images that fit that prototype exactly. Images generated for the same virtual identity are therefore overly concentrated in feature space, and variation under the identity condition is insufficient, so training sets composed of such images lack intra-class diversity.
In summary, existing face recognition systems trained on virtual face image training sets lack recognition capability because their training sets have three problems: insufficient style variation in the virtual face samples, deviation between the style samples and the real-world distribution of style attributes, and insufficient intra-class variation of the virtual identities.
It should be noted that this background section only describes information related to the present invention to facilitate understanding of its technical solution, and does not imply that such information is necessarily prior art. Related information that is filed and published together with the present invention should not be regarded as prior art in the absence of evidence that it was published prior to the filing date of the pres