CN-122024292-A - Image processing method, device, equipment, medium and program product


Abstract

The embodiments of the present application disclose an image processing method, apparatus, device, medium, and program product. The method comprises: acquiring a face image and performing feature extraction on it to obtain a face feature vector; acquiring a randomly generated noise image; and performing multidirectional denoising on the noise image based on the face feature vector to generate an occlusion image. With the embodiments of the present application, a sensitive target face part of a face can be occluded in a targeted manner, achieving face privacy protection in images while improving the efficiency of face desensitization.

Inventors

YANG YIFAN

Assignees

Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技（深圳）有限公司)

Dates

Publication Date: 20260512
Application Date: 20241112

Claims (17)

1. An image processing method, comprising: acquiring a face image, and performing feature extraction on the face image to obtain a face feature vector, wherein the face feature vector indicates feature information of a face in the face image; acquiring a noise image, the noise image being randomly generated; and performing multidirectional denoising on the noise image based on the face feature vector to generate an occlusion image, wherein the occlusion image comprises the face with a target face part occluded, and the occluded face retains the facial appearance attributes of the original face.
2. The method of claim 1, wherein the noise image is a diffusion state at a time step t, t being an integer greater than 1; the diffusion state at the time step t is obtained by adding Gaussian noise to the diffusion state at the time step t-1; any time step is denoted as j, j being an integer in [0, t]; and the performing multidirectional denoising on the noise image based on the face feature vector to generate an occlusion image comprises: performing multidirectional noise prediction on the diffusion state at the time step t based on the feature information of the face indicated by the face feature vector, to obtain target noise information corresponding to the time step t in each direction; performing feature recovery on the diffusion state at the time step t according to the target noise information corresponding to the time step t in each direction, to obtain the diffusion state at the time step t-1; and repeating the above steps on the diffusion state at the time step t-1 until the time step j=0, to obtain the diffusion state at the time step 0, the diffusion state at the time step 0 serving as the occlusion image.
3. The method of claim 2, wherein the multiple directions comprise a conditional guidance direction, an unconditional guidance direction, and a classification guidance direction, and the performing multidirectional noise prediction on the diffusion state at the time step t based on the feature information of the face indicated by the face feature vector, to obtain the target noise information corresponding to the time step t in each direction, comprises: performing conditional guidance on the diffusion state at the time step t based on the feature information of the face indicated by the face feature vector, to obtain first noise information corresponding to the time step t in the conditional guidance direction, wherein the first noise information indicates that the feature information of the face in the occlusion image to be generated matches the feature information of the face indicated by the face feature vector and that the target face part of the face is occluded; performing unconditional guidance on the diffusion state at the time step t to obtain second noise information corresponding to the time step t in the unconditional guidance direction, wherein the second noise information indicates that the target face part of the face in the occlusion image to be generated is occluded; and performing classification guidance on the diffusion state at the time step t to obtain third noise information corresponding to the time step t in the classification guidance direction, wherein the third noise information indicates that the target face part of the face in the occlusion image to be generated is occluded.
4. The method of claim 3, wherein the performing conditional guidance on the diffusion state at the time step t based on the feature information of the face indicated by the face feature vector, to obtain the first noise information corresponding to the time step t in the conditional guidance direction, comprises: performing feature fusion on the face feature vector and the diffusion state at the time step t to obtain a fusion feature vector; performing conditional guidance noise prediction on the diffusion state at the time step t based on the fusion feature vector and the face feature vector, to obtain first prediction noise corresponding to the time step t in the conditional guidance direction; and calculating the first noise information corresponding to the time step t in the conditional guidance direction according to the diffusion state at the time step t and the first prediction noise corresponding to the time step t.
5. The method of claim 3, wherein the performing unconditional guidance on the diffusion state at the time step t to obtain the second noise information corresponding to the time step t in the unconditional guidance direction comprises: performing unconditional guidance noise prediction on the diffusion state at the time step t to obtain second prediction noise corresponding to the time step t in the unconditional guidance direction; and obtaining the second noise information corresponding to the time step t in the unconditional guidance direction according to the diffusion state at the time step t and the second prediction noise corresponding to the time step t.
6. The method of claim 3, wherein the performing classification guidance on the diffusion state at the time step t to obtain the third noise information corresponding to the time step t in the classification guidance direction comprises: performing classification recognition on the diffusion state at the time step t to obtain a classification result, wherein the classification result indicates the probability that the target face part of the face contained in the diffusion state at the time step t is occluded; taking the logarithm of the classification result to obtain a logarithmic result; and computing the gradient of the logarithmic result to obtain the third noise information corresponding to the time step t in the classification guidance direction.
7. The method of claim 2, wherein the noise image follows a Gaussian distribution, the target noise information comprises first noise information, second noise information, and third noise information, and the performing feature recovery on the diffusion state at the time step t according to the target noise information corresponding to the time step t in each direction, to obtain the diffusion state at the time step t-1, comprises: acquiring first weight information for the first noise information, second weight information for the second noise information, and third weight information for the third noise information; computing a weighted combination of the first noise information, the second noise information, and the third noise information using the first weight information, the second weight information, and the third weight information, to obtain the mean of a Gaussian function conforming to the Gaussian distribution; and calculating the diffusion state at the time step t-1 based on the Gaussian function mean.
8. The method of any one of claims 1-7, wherein the acquiring a face image comprises: acquiring an image to be processed, and performing face recognition on the image to be processed to obtain a face detection area containing the face in the image to be processed, wherein the image to be processed comprises at least one of a vehicle-mounted image and a training return image; and cropping the face detection area from the image to be processed to obtain the face image; and the method further comprises: fusing the occlusion image with the image to be processed to obtain a target image, wherein the target face part of the face in the target image is occluded.
9. The method of claim 1, wherein the method is performed by a target diffusion model comprising a target classification sub-model and a target denoising sub-model; the target classification sub-model is configured to perform classification recognition on the diffusion state at the time step j to obtain a classification result; the target denoising sub-model is configured to perform unconditional guidance noise prediction on the diffusion state at the time step j to obtain second prediction noise corresponding to the time step j in the unconditional guidance direction; and the target denoising sub-model is further configured to perform conditional guidance noise prediction on the diffusion state at the time step j based on a fusion feature vector and the face feature vector, to obtain first prediction noise corresponding to the time step j in the conditional guidance direction, wherein the fusion feature vector is obtained by performing feature fusion on the face feature vector and the noise image.
10. The method of claim 9, wherein the training process of the target diffusion model comprises: acquiring a sample data set, wherein the sample data set comprises a first data subset and a second data subset; the first data subset comprises a sample occlusion image for each of a plurality of sample faces, in which the target face part of the corresponding sample face is occluded; the second data subset comprises a sample non-occlusion image for each of the plurality of sample faces, in which the target face part of the corresponding sample face is not occluded; and, for the same sample face, the image areas other than the target face part in the sample occlusion image and the sample non-occlusion image are consistent; performing model optimization on an initial classification sub-model using the first data subset and the second data subset to obtain the target classification sub-model; and performing model optimization on an initial denoising sub-model using the first data subset and the second data subset to obtain the target denoising sub-model.
11. The method of claim 10, wherein the performing model optimization on the initial classification sub-model using the first data subset and the second data subset to obtain the target classification sub-model comprises: obtaining a target sample image from the sample data set, the target sample image belonging to the first data subset or the second data subset; performing diffusion on the target sample image to obtain a diffusion state of the target sample image at a time step t; performing face recognition on the diffusion state of the target sample image at the time step t using the initial classification sub-model to obtain a predicted recognition result, wherein the predicted recognition result indicates the probability that the target face part of the target sample face in the target sample image is occluded; acquiring label information of the target sample image, wherein the label information indicates whether the target sample image belongs to the first data subset or to the second data subset; and optimizing the initial classification sub-model in the direction of reducing the gap between the predicted recognition result and the label information of the target sample image, to obtain the target classification sub-model.
12. The method of claim 10, wherein the model optimization of the initial denoising sub-model comprises G training iterations, G being an integer greater than 1; H of the G training iterations use the sample non-occlusion images in the second data subset, and the remaining G-H training iterations do not use the sample non-occlusion images in the second data subset, H being an integer greater than zero and less than G; and any one of the H training iterations comprises: obtaining, from the first data subset, a target sample occlusion image corresponding to a target sample face, and obtaining, from the second data subset, a target sample non-occlusion image corresponding to the target sample face; performing diffusion on the target sample occlusion image to obtain Gaussian noise corresponding to the target sample occlusion image at a time step t and a diffusion state of the target sample occlusion image at the time step t; performing feature fusion on the diffusion state of the target sample occlusion image at the time step t and a sample feature vector corresponding to the target sample non-occlusion image to obtain a sample fusion feature vector, and, while the initial denoising sub-model performs noise prediction on the sample fusion feature vector, using the sample feature vector corresponding to the target sample non-occlusion image to guide feature recovery of the sample fusion feature vector, so as to obtain sample prediction noise corresponding to the target sample occlusion image at the time step t; and performing model optimization on the initial denoising sub-model in the direction of reducing the difference between the Gaussian noise corresponding to the target sample occlusion image at the time step t and the sample prediction noise corresponding to the target sample occlusion image at the time step t.
13. The method of claim 12, wherein any one of the G-H training iterations comprises: obtaining, from the first data subset, a target sample occlusion image corresponding to a target sample face; performing diffusion on the target sample occlusion image to obtain Gaussian noise corresponding to the target sample occlusion image at a time step t and a diffusion state of the target sample occlusion image at the time step t; performing noise prediction on the diffusion state of the target sample occlusion image at the time step t using the initial denoising sub-model, to obtain sample prediction noise corresponding to the target sample occlusion image at the time step t; and performing model optimization on the initial denoising sub-model in the direction of reducing the difference between the Gaussian noise corresponding to the target sample occlusion image at the time step t and the sample prediction noise corresponding to the target sample occlusion image at the time step t.
14. An image processing apparatus, comprising: an acquisition unit, configured to acquire a face image and perform feature extraction on the face image to obtain a face feature vector, wherein the face feature vector indicates feature information of a face in the face image, the acquisition unit being further configured to acquire a noise image, the noise image being randomly generated; and a processing unit, configured to perform multidirectional denoising on the noise image based on the face feature vector to generate an occlusion image, wherein the occlusion image comprises the face with a target face part occluded, and the occluded face retains the facial appearance attributes of the original face.
15. A computer device, comprising: a processor adapted to execute a computer program; and a computer-readable storage medium storing a computer program which, when executed by the processor, implements the image processing method according to any one of claims 1-13.
16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded by a processor and to perform the image processing method according to any of claims 1-13.
17. A computer program product comprising computer instructions which, when executed by a processor, implement the image processing method of any of claims 1-13.
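The cropping and fusion steps of claim 8 can be illustrated with a minimal NumPy sketch. It assumes an axis-aligned face detection box given as (x0, y0, x1, y1) pixel coordinates with exclusive upper bounds, and it assumes that "fusion" is a direct paste, optionally alpha-blended, of the occlusion image back into the detection area; the claims do not specify the fusion operation, and `crop_face` and `fuse` are hypothetical helper names.

```python
import numpy as np


def crop_face(image, box):
    """Cut the face detection area out of the image to be processed.

    `box` is assumed to be (x0, y0, x1, y1) in pixels, upper bounds exclusive.
    """
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1].copy()


def fuse(image, occlusion, box, alpha=1.0):
    """Hypothetical fusion: paste the occlusion image back into the detection area.

    alpha=1.0 replaces the area outright; 0 < alpha < 1 blends it with the original
    pixels. The input image is left unchanged and a new target image is returned.
    """
    x0, y0, x1, y1 = box
    target = image.astype(np.float64)  # float copy, so blending cannot overflow uint8
    region = target[y0:y1, x0:x1]
    if occlusion.shape != region.shape:
        raise ValueError("occlusion image must match the detection-area shape")
    target[y0:y1, x0:x1] = alpha * occlusion + (1.0 - alpha) * region
    return target


# Usage with toy data: an all-zero frame and an all-one stand-in occlusion image.
frame = np.zeros((6, 8, 3))
box = (2, 1, 5, 4)
face = crop_face(frame, box)            # shape (3, 3, 3); input to feature extraction
occluded = np.ones_like(face)           # stand-in for the generated occlusion image
target = fuse(frame, occluded, box)     # only the detection area changes
```

Returning a new array rather than editing the frame in place keeps the original image to be processed available, which matters if the same frame is reused.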
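The per-iteration training losses described in claims 11 to 13 can be sketched in NumPy as follows. This is a sketch under stated assumptions rather than the applicant's implementation: it assumes the standard DDPM closed-form forward diffusion x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, a mean-squared-error noise-prediction loss, and binary cross-entropy for the classification sub-model. The claims do not say which H of the G iterations use the non-occlusion image, so `uses_condition` simply takes the first H as a hypothetical choice. The denoiser is any callable `eps_model(x_t, t, cond)`, feature fusion is assumed to happen inside it, all helper names are hypothetical, and the optimizer update is omitted.

```python
import numpy as np


def q_sample(x0, t, alpha_bars, rng):
    """Closed-form forward diffusion of a clean sample x0 to time step t (1-indexed).

    Returns the diffusion state x_t and the Gaussian noise eps that produced it.
    """
    eps = rng.standard_normal(x0.shape)
    abar = alpha_bars[t - 1]
    x_t = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps
    return x_t, eps


def denoiser_loss(eps, eps_pred):
    """Mean squared difference between the true and the predicted noise."""
    return float(np.mean((eps - eps_pred) ** 2))


def classifier_loss(p_occluded, label):
    """Binary cross-entropy; label 1 = first data subset (occluded), 0 = second subset."""
    p = float(np.clip(p_occluded, 1e-7, 1.0 - 1e-7))
    return -(label * np.log(p) + (1 - label) * np.log(1.0 - p))


def uses_condition(iteration, G, H):
    """Hypothetical schedule: the first H of G iterations use the non-occlusion image."""
    if not 0 < H < G:
        raise ValueError("claim 12 requires 0 < H < G")
    return iteration < H


def denoiser_iteration(occluded, clean_feat, eps_model, alpha_bars, rng, conditioned):
    """Loss for one training iteration of the denoising sub-model.

    `occluded` is a sample occlusion image; `clean_feat` is the (assumed precomputed)
    feature vector of the matching sample non-occlusion image. With cond=None the
    model acts unconditionally, as in the G-H iterations of claim 13.
    """
    t = int(rng.integers(1, len(alpha_bars) + 1))  # uniform random time step
    x_t, eps = q_sample(occluded, t, alpha_bars, rng)
    eps_pred = eps_model(x_t, t, clean_feat if conditioned else None)
    return denoiser_loss(eps, eps_pred)
```

Because the same noise-prediction loss is used whether or not the condition is supplied, a single denoising sub-model learns both the conditional and the unconditional prediction that the sampling procedure of claims 3 to 5 relies on.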

Description

Image processing method, device, equipment, medium and program product

Technical Field

The present application relates to the field of computer technology, in particular to the field of artificial intelligence, and more particularly to an image processing method, an image processing apparatus, a computer device, a computer-readable storage medium, and a computer program product.

Background

Image desensitization refers to the process of removing sensitive information (such as faces, identification numbers, or license plate information) from an image. Taking faces as an example of sensitive information, general-purpose object detection techniques desensitize the faces in an image by erasing or smearing the detected face regions. However, such conspicuous erasure or smearing not only spoils the overall appearance of the desensitized image and the consistency of its content, but also hampers subsequent reuse of the image. How to desensitize faces in images has therefore become a research focus in the field of sensitive information processing.

Disclosure of Invention

The embodiments of the present application provide an image processing method, apparatus, device, medium, and program product, which can occlude a sensitive target face part of a face in a targeted manner, achieving face privacy protection in images while improving the efficiency of face desensitization.
In one aspect, an embodiment of the present application provides an image processing method, comprising: acquiring a face image, and performing feature extraction on the face image to obtain a face feature vector, wherein the face feature vector indicates feature information of a face in the face image; acquiring a noise image, wherein the noise image is randomly generated; and performing multidirectional denoising on the noise image based on the face feature vector to generate an occlusion image, wherein the occlusion image comprises a face whose target face part is occluded, and the occluded face retains the facial appearance attributes of the original face.

In another aspect, an embodiment of the present application provides an image processing apparatus, comprising: an acquisition unit, configured to acquire a face image and perform feature extraction on the face image to obtain a face feature vector, wherein the face feature vector indicates feature information of a face in the face image, the acquisition unit being further configured to acquire a noise image, wherein the noise image is randomly generated; and a processing unit, configured to perform multidirectional denoising on the noise image based on the face feature vector to generate an occlusion image, wherein the occlusion image comprises a face whose target face part is occluded, and the occluded face retains the facial appearance attributes of the original face.
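As an illustration only, the multidirectional denoising summarized above (and detailed in claims 2 to 7) can be sketched in NumPy as a DDPM-style reverse loop that starts from a randomly generated noise image and steps down to time step 0. Several details are assumptions not stated in this application: the linear beta schedule, the posterior variance σ_t² = β_t, the use of DDPM posterior means as the first and second noise information, scaling the classifier gradient by σ_t² (as in classifier guidance), and the default weights. The denoising and classification sub-models are replaced by hypothetical stand-ins (`toy_eps_model` and a logistic classifier given by `clf_w`, `clf_b`), so the output is not a meaningful image.

```python
import numpy as np


def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Assumed linear beta schedule; index i holds the values for time step i + 1."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def posterior_mean(x_t, eps, beta, alpha, alpha_bar):
    """DDPM mean of the step-(t-1) Gaussian, given the noise predicted at step t."""
    return (x_t - beta / np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha)


def log_prob_grad(x_t, clf_w, clf_b):
    """Gradient of log p(occluded | x_t) for a toy logistic classifier p = sigmoid(w.x + b)."""
    z = float(np.sum(clf_w * x_t) + clf_b)
    p = 0.5 * (1.0 + np.tanh(0.5 * z))  # numerically stable sigmoid
    return (1.0 - p) * clf_w             # d/dx log sigmoid(z) = (1 - p) * w


def toy_eps_model(x_t, t, face_vec):
    """Hypothetical stand-in for the denoising sub-model (face_vec=None: unconditional)."""
    shift = 0.0 if face_vec is None else 0.05 * float(np.mean(face_vec))
    return 0.1 * x_t + shift


def guided_step(x_t, t, face_vec, eps_model, clf_w, clf_b, sched, rng,
                weights=(1.5, -0.5, 1.0)):
    """One reverse step from the diffusion state at t to the state at t - 1.

    `weights` are hypothetical first/second/third weights for the three directions.
    """
    betas, alphas, alpha_bars = sched
    beta, alpha, abar = betas[t - 1], alphas[t - 1], alpha_bars[t - 1]
    w1, w2, w3 = weights
    mu_cond = posterior_mean(x_t, eps_model(x_t, t, face_vec), beta, alpha, abar)  # conditional
    mu_uncond = posterior_mean(x_t, eps_model(x_t, t, None), beta, alpha, abar)    # unconditional
    grad = log_prob_grad(x_t, clf_w, clf_b)                                        # classification
    var = beta  # assumed sigma_t^2 = beta_t
    mean = w1 * mu_cond + w2 * mu_uncond + w3 * var * grad
    if t == 1:
        return mean  # no noise is added on the final step
    return mean + np.sqrt(var) * rng.standard_normal(x_t.shape)


def sample(face_vec, shape, T, eps_model, clf_w, clf_b, seed=0):
    """Start from a random noise image (state T) and denoise down to state 0."""
    rng = np.random.default_rng(seed)
    sched = make_schedule(T)
    x = rng.standard_normal(shape)  # randomly generated noise image
    for t in range(T, 0, -1):
        x = guided_step(x, t, face_vec, eps_model, clf_w, clf_b, sched, rng)
    return x  # diffusion state at time step 0, used as the occlusion image
```

Because the DDPM mean is affine in the predicted noise, choosing the first two weights as (1 + s, -s) makes those two terms equivalent to classifier-free guidance with scale s; the third term then pushes each step toward states that the classifier scores as "target face part occluded".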
In one implementation, the noise image is a diffusion state at a time step t, t being an integer greater than 1; the diffusion state at the time step t is obtained by adding Gaussian noise to the diffusion state at the time step t-1; any time step is denoted as j, j being an integer in [0, t]; and, when performing multidirectional denoising on the noise image based on the face feature vector to generate the occlusion image, the processing unit is specifically configured to: perform multidirectional noise prediction on the diffusion state at the time step t based on the feature information of the face indicated by the face feature vector, to obtain target noise information corresponding to the time step t in each direction; perform feature recovery on the diffusion state at the time step t according to the target noise information corresponding to the time step t in each direction, to obtain the diffusion state at the time step t-1; and repeat the above steps on the diffusion state at the time step t-1 until the time step j=0, to obtain the diffusion state at the time step 0, the diffusion state at the time step 0 serving as the occlusion image.

In one implementation, the multiple directions comprise a conditional guidance direction, an unconditional guidance direction, and a classification guidance direction, and, when performing multidirectional noise prediction on the diffusion state at the time step t based on the feature information of the face indicated by the face feature vector to obtain the target noise information corresponding to the time step t in each direction, the processing unit is specifically configured to: