CN-121999088-A - Identity-preserving portrait hairstyle and occlusion removal method and system based on a diffusion model
Abstract
The invention relates to an identity-preserving portrait hairstyle and occlusion removal method and system based on a diffusion model, belonging to the technical fields of computer vision and image processing. The method comprises: acquiring a portrait image to be processed, performing semantic segmentation on it to obtain a hair mask and a hat occlusion mask, and constructing a joint hair-and-hat occlusion mask; performing facial key point detection on the portrait image to generate a hair-accessory occlusion mask; obtaining a combined occlusion region mask and taking the face-and-neck region as an identity preservation region; and inputting the portrait image to be processed and the combined occlusion region mask into a bald-head generation network based on a latent diffusion model to generate a bald-head portrait image with the hairstyle and occlusions removed and the identity information well preserved. The invention removes complex occluding objects while keeping the person's identity information as consistent as possible.
Inventors
- YAO XUNXIANG
- XU HUA
- XU YINGCHENG
- ZHANG QIUYUE
- ZHANG PENG
- XU MINFENG
- LIU PEIDE
Assignees
- Shandong University of Finance and Economics (山东财经大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-30
Claims (10)
- 1. A diffusion-model-based, identity-preserving portrait hairstyle and occlusion removal method, characterized by comprising the following steps: acquiring a portrait image to be processed, performing facial semantic segmentation on it to obtain a hair mask and a hat occlusion mask, and constructing a joint hair-and-hat occlusion mask; performing facial key point detection on the portrait image, fitting a Bézier curve to the eyebrow-arch key points, dividing the facial region into a face-and-neck identity region and a hair-accessory occlusion region, and generating a hair-accessory occlusion mask; merging the joint hair-and-hat occlusion mask with the hair-accessory occlusion mask to obtain a combined occlusion region mask, and taking the face-and-neck identity region as an identity preservation region; and inputting the portrait image to be processed and the combined occlusion region mask into a bald-head generation network based on a latent diffusion model, performing conditional generation only within the combined occlusion region while preserving, or only restrictively adjusting, the identity preservation region, to generate a bald-head portrait image with the hairstyle and occlusions removed and the identity information well preserved.
- 2. The diffusion-model-based, identity-preserving portrait hairstyle and occlusion removal method according to claim 1, wherein the portrait image to be processed contains a head region including at least the face, the neck, and a present occlusion; the acquired portrait image is preprocessed, including size normalization, color space conversion, histogram equalization, and interpolation scaling and normalization to a preset resolution and dynamic range; performing semantic segmentation on the portrait image to be processed to obtain a hair mask and a hat occlusion mask, and constructing the joint hair-and-hat occlusion mask, comprises the following steps: inputting the preprocessed portrait image into a deep-learning-based facial semantic segmentation model to obtain a segmentation result containing a plurality of semantic categories, including at least a hair category and a hat category, from which are correspondingly obtained: a hair mask representing the set of pixels belonging to the hair region of the image; and a hat occlusion mask representing the set of pixels belonging to the hat region of the image; and performing a logical OR operation on the hair mask and the hat occlusion mask to obtain the joint hair-and-hat occlusion mask, which represents, as a whole, the hair and hat regions to be removed.
- 3. The diffusion-model-based, identity-preserving portrait hairstyle and occlusion removal method according to claim 1, wherein performing facial key point detection on the portrait image to be processed, fitting a Bézier curve to the eyebrow-arch key points, dividing the facial region into a face-and-neck identity region and a hair-accessory occlusion region, and generating a hair-accessory occlusion mask, comprises the following steps: first, performing facial feature point detection on the portrait image and extracting the coordinates of a plurality of facial key points; then, selecting a group of key points near the left and right eyebrow arches and fitting a Bézier curve through them to obtain a curve approximately describing the geometry of the lower edge of the eyebrows; taking this curve as a boundary, dividing the facial region into the region below the curve, the region above the curve, and the region overlapping the hair-accessory position; constructing, from the division result, a hair-accessory occlusion mask representing the hair-accessory region to be removed; and meanwhile, based on the facial segmentation result, decoupling and reorganizing the face-and-neck identity region, the clothing region, and the background region.
- 4. The diffusion-model-based, identity-preserving portrait hairstyle and occlusion removal method according to claim 1, wherein selecting a group of key points near the left and right eyebrow arches and fitting a Bézier curve to obtain a curve approximately describing the geometry of the lower edge of the eyebrow comprises: acquiring the coordinate set {pᵢ} of the key points of both eyebrow arches; taking the two end points of the left or right eyebrow arch as the curve end points P₀ and P₃, and constructing a cubic Bézier curve as follows: B(t) = (1−t)³P₀ + 3(1−t)²tP₁ + 3(1−t)t²P₂ + t³P₃, t ∈ [0,1], wherein P₁ and P₂ are the control points to be estimated, t is the parameter along the Bézier curve, and B(t) is the Bézier curve expression; assigning a parameter value tᵢ to each key point pᵢ and estimating the control points by least-squares optimization: min over P₁, P₂ of Σᵢ ‖B(tᵢ) − pᵢ‖², wherein B(tᵢ) represents the position of the Bézier curve at parameter tᵢ, and tᵢ is the parameter associated with each key point; a curve approximately describing the geometry of the lower edge of the eyebrow is thereby obtained.
- 5. The diffusion-model-based, identity-preserving portrait hairstyle and occlusion removal method according to claim 1, wherein merging the joint hair-and-hat occlusion mask with the hair-accessory occlusion mask to obtain a combined occlusion region mask, and taking the face-and-neck identity region as the identity preservation region, comprises: logically fusing the joint hair-and-hat occlusion mask and the hair-accessory occlusion mask to obtain the combined occlusion region mask; and meanwhile marking the face-and-neck identity region that does not belong to the combined occlusion region mask as the identity preservation region.
- 6. The diffusion-model-based, identity-preserving portrait hairstyle and occlusion removal method according to claim 1, wherein inputting the portrait image to be processed and the combined occlusion region mask into a bald-head generation network based on a latent diffusion model, performing conditional generation only within the combined occlusion region, preserving or restrictively adjusting the identity preservation region, and generating a bald-head portrait image with the hairstyle and occlusions removed and the identity information well preserved, comprises: the bald-head generation network based on the latent diffusion model comprises a variational autoencoder (VAE), a denoising network (U-Net), and a control branch parallel to the denoising U-Net, wherein the VAE realizes the bidirectional mapping between pixel space and latent space, the U-Net predicts noise and performs stepwise denoising in the reverse diffusion process, and the control branch encodes the condition information and injects control features into each layer of the U-Net at multiple scales, thereby realizing controlled generation in the joint occlusion region and structure and texture preservation in the identity preservation region; in the latent encoding stage, the input portrait image x to be processed is mapped into latent space by the VAE encoder E to yield the latent representation z₀ = E(x); in the diffusion and noising stage, a forward diffusion process is executed on z₀, gradually adding Gaussian noise according to a preset noise schedule to obtain the noised latent representation at each time step t: z_t = √(ᾱ_t)·z₀ + √(1−ᾱ_t)·ε, ε ~ N(0, I), wherein ᾱ_t is the cumulative noise-schedule coefficient; the denoising U-Net ε_θ takes z_t as input and learns to predict the noise ε added at the current time step, and the noise prediction error is optimized with a mean-square-error loss, L = E‖ε − ε_θ(z_t, t, c)‖², so that the denoising U-Net acquires the reverse denoising capability of gradually restoring z_t back to z₀; in the condition denoising and control stage, the condition input c is sent into the control branch parallel to the denoising U-Net for encoding, and the resulting multi-scale control features are injected into each layer of the denoising U-Net, wherein the condition input c comprises at least the combined occlusion region mask M_occ, the identity preservation region mask M_id, and the clothing and background structure information S; the control branch comprises a condition encoder, a trainable U-Net copy with the same multi-scale topology as the denoising U-Net, and zero-convolution injection layers; the encoding and injection of the control branch proceed as follows: the condition encoder concatenates the condition maps along the channel dimension and then applies convolution and downsampling to obtain multi-scale condition features; the trainable U-Net copy further outputs control features F_s corresponding to each scale of the denoising U-Net; the zero-convolution injection layers map F_s to residuals ΔF_s, which are injected into the features of the corresponding scale layers of the denoising U-Net to realize smooth, controlled guidance; each stage of the denoising U-Net comprises the multi-scale downsampling layers (Down Blocks) at the encoding end, the middle bottleneck layer (Mid Block), and the multi-scale upsampling layers (Up Blocks) at the decoding end, with multi-scale features fused through skip connections; in the mask-constrained local generation mechanism, during reverse denoising sampling, the denoising U-Net generates a candidate updated latent representation z_t^gen under control-branch guidance; mask fusion is then performed according to the combined occlusion region mask M: z_t ← M ⊙ z_t^gen + (1 − M) ⊙ z_t^keep, so that the generated latent result is adopted only where M = 1, while the input latent features, or a restrictively updated version thereof, are kept in the non-occluded region, realizing local editing; in the latent decoding stage, at inference time a noise latent variable z_T is first initialized in latent space; under the guidance of the conditional control branch, the denoising U-Net executes reverse denoising sampling according to the preset noise scheduler, iterating stepwise from time step T down to 0 to obtain the denoised latent representation at the termination of reverse sampling, denoted as the final latent representation z₀*; the variational autoencoder VAE comprises the encoder E and the decoder D, wherein E maps a pixel-space image into latent space and D restores a latent representation to pixel space; accordingly, the final latent representation z₀* is input into the VAE decoder D and reconstructed to obtain the output image: x̂ = D(z₀*).
- 7. The diffusion-model-based, identity-preserving portrait hairstyle and occlusion removal method according to claim 6, wherein the condition input c further comprises a structure-enhancement prior; the condition map is formed as: c = M_occ ⊕ M_id ⊕ S, wherein M_occ denotes the combined occlusion region mask, M_id denotes the identity preservation region mask, S denotes the clothing and background structure information, and ⊕ denotes channel-wise concatenation.
- 8. A diffusion-model-based, identity-preserving portrait hairstyle and occlusion removal system, comprising: a portrait acquisition module configured to receive or acquire a portrait image to be processed and complete preprocessing; a semantic segmentation and mask generation module configured to perform facial semantic segmentation on the portrait image to be processed to obtain a hair mask and a hat occlusion mask, and to construct a joint hair-and-hat occlusion mask; an occlusion region analysis module configured to fuse the joint hair-and-hat occlusion mask and the hair-accessory occlusion mask to obtain a combined occlusion region mask, and to take the face-and-neck identity region as an identity preservation region; and a bald-head generation and image reconstruction module configured to input the portrait image to be processed and the combined occlusion region mask into a bald-head generation network based on a latent diffusion model, perform conditional generation only within the combined occlusion region, and preserve or restrictively adjust the identity preservation region, to generate a bald-head portrait image with the hairstyle and occlusions removed and the identity information well preserved.
- 9. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when invoked and executed by a processor, performs the operations of the diffusion-model-based, identity-preserving portrait hairstyle and occlusion removal method of any one of claims 1 to 7.
- 10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, performs the operations of the diffusion-model-based, identity-preserving portrait hairstyle and occlusion removal method of any one of claims 1 to 7.
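The mask-construction steps recited in claims 1, 2, and 5 — a logical OR of the hair and hat masks, union with the hair-accessory mask, and marking the remaining face-and-neck pixels as the identity preservation region — can be sketched with plain binary masks. This is a minimal illustration; the function and variable names are not taken from the patent:

```python
def combine_occlusion_masks(hair_mask, hat_mask, accessory_mask, face_neck_mask):
    """Build the combined occlusion mask and the identity-preservation mask.

    All inputs are equally sized 2-D lists of 0/1 values, as would be produced
    by a semantic segmentation model (hair/hat categories) and by the
    Bezier-curve face partition (hair-accessory region, face-and-neck region).
    """
    h, w = len(hair_mask), len(hair_mask[0])
    combined = [[0] * w for _ in range(h)]   # hair OR hat OR accessory
    identity = [[0] * w for _ in range(h)]   # face/neck pixels NOT occluded
    for y in range(h):
        for x in range(w):
            occluded = hair_mask[y][x] or hat_mask[y][x] or accessory_mask[y][x]
            combined[y][x] = 1 if occluded else 0
            identity[y][x] = 1 if (face_neck_mask[y][x] and not occluded) else 0
    return combined, identity
```

In practice the per-pixel OR/AND would be done with array operations over the segmentation output rather than Python loops, but the set relationships are the same.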
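The cubic Bézier fit in claim 4 fixes the curve end points at the eyebrow-arch end points and estimates the two inner control points P₁, P₂ by least squares. The sketch below illustrates one way this could be done; the chord-length parameterization and all helper names are assumptions of this example, not prescribed by the patent:

```python
def bezier_point(p0, p1, p2, p3, t):
    """Evaluate the cubic Bezier B(t) componentwise."""
    u = 1.0 - t
    return tuple(
        u**3 * a + 3 * u**2 * t * b + 3 * u * t**2 * c + t**3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )

def fit_cubic_bezier(points, ts=None):
    """Fit P1, P2 of a cubic Bezier to 2-D keypoints, with P0/P3 fixed at the
    first/last keypoint; `ts` are per-point parameter values (chord-length
    parameterization by default)."""
    p0, p3 = points[0], points[-1]
    if ts is None:
        # Chord-length parameterization: normalized cumulative distance.
        d = [0.0]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            d.append(d[-1] + ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5)
        ts = [di / d[-1] for di in d]
    # Normal equations of the least-squares problem over P1, P2; the 2x2
    # system matrix is the same for the x and y coordinates.
    saa = sab = sbb = 0.0
    sad, sbd = [0.0, 0.0], [0.0, 0.0]
    for (px, py), t in zip(points, ts):
        u = 1.0 - t
        a, b = 3 * u**2 * t, 3 * u * t**2
        saa += a * a; sab += a * b; sbb += b * b
        for k, (pk, p0k, p3k) in enumerate(zip((px, py), p0, p3)):
            dk = pk - u**3 * p0k - t**3 * p3k  # data minus fixed-endpoint terms
            sad[k] += a * dk
            sbd[k] += b * dk
    det = saa * sbb - sab * sab
    p1 = tuple((sad[k] * sbb - sab * sbd[k]) / det for k in range(2))
    p2 = tuple((saa * sbd[k] - sab * sad[k]) / det for k in range(2))
    return p0, p1, p2, p3
```

Fitting one such curve per eyebrow yields the boundary curve used in claim 3 to partition the face into the regions below and above the eyebrow lower edge.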
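The mask-constrained local generation mechanism of claim 6 — adopting the generated latent only inside the combined occlusion mask and keeping the known latent elsewhere — reduces to a per-element blend applied at each reverse-sampling step. The toy sketch below uses flat lists in place of latent tensors, with the U-Net's candidate output supplied as an argument; all names are illustrative:

```python
def masked_latent_update(z_generated, z_known, mask):
    """One mask-fusion step: z <- M * z_gen + (1 - M) * z_known.

    `z_generated` stands in for the candidate latent proposed by the
    control-branch-guided denoising U-Net at the current time step,
    `z_known` for the correspondingly noised latent of the input image, and
    `mask` for the combined occlusion region mask downsampled to latent
    resolution (1 = occluded region, regenerate; 0 = identity-preservation
    region, keep).
    """
    return [m * g + (1 - m) * k for g, k, m in zip(z_generated, z_known, mask)]
```

Applied at every reverse step, this pins the identity-preservation region to the (appropriately noised) input latent while the occluded region is synthesized; the same latent-inpainting blend is commonly used with latent-diffusion inpainting pipelines.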
Description
Identity-preserving portrait hairstyle and occlusion removal method and system based on a diffusion model

Technical Field

The invention belongs to the technical field of computer vision and image processing, and particularly relates to an identity-preserving portrait hairstyle and occlusion removal method and system based on a diffusion model.

Background

With the rapid development of computer technology and artificial intelligence, deep-learning-based computer vision and image generation techniques are widely applied in social media, virtual makeup, avatar customization, security monitoring, and other scenarios. Portrait image editing technology can flexibly adjust various attributes of a person's appearance (such as hairstyle, makeup, and expression). In particular, portrait hairstyle removal can remove the original hairstyle from an input portrait image and generate a bald-head image, which allows users to conveniently try on virtual hairstyles and provides unoccluded facial texture data for tasks such as three-dimensional face reconstruction and face recognition, improving the realism, detail expressiveness, and identity-feature modeling accuracy of three-dimensional face models. Existing portrait hairstyle removal methods fall mainly into two categories. One category is based on generative adversarial networks and editing in their latent space: the portrait image is mapped into the latent space of a generative model such as StyleGAN, and the latent direction associated with "hair" is edited to remove the hairstyle. The other category exploits the strong generative capability of diffusion models, performing conditional generation of the occluded region in pixel space or latent space to obtain a bald-head image, or an intermediate proxy image serving subsequent tasks such as hairstyle transfer.
These methods provide effective technical means for portrait hairstyle editing, but show obvious shortcomings in real, complex scenes. Specifically, on the one hand, because the representation capability of latent-space encoders on real images is limited, existing latent-space editing methods struggle to obtain a latent representation that accurately describes the input identity features, so the face with the hairstyle removed deviates from the original person in geometric structure and texture detail, and identity information is insufficiently preserved. On the other hand, existing semantic segmentation models have difficulty accurately distinguishing hair accessories from the facial region when processing a forehead region occluded by accessories such as headbands and hairpins; the accessories and the face are easily confused in the segmentation result, degrading the quality of the subsequent reconstruction. In addition, hair in real scenes is complex and variable in form, and head accessories such as hats, hairbands, and hair clips are widespread, forming various kinds of occlusion over the forehead, the hairline, and parts of the face. At the data level, existing research relies on paired "hair / bald-head" datasets, but paired "wearing a hat / bald-head" samples are generally lacking, so trained models are insufficiently capable of removing occluding objects such as hats and hair accessories.
Even when a diffusion model or a control network is introduced for conditional generation, the complex semantic and geometric coupling between the occluded and non-occluded regions often leads to results in which the hairstyle and hat are not completely removed, the skin tone of the generated region is discontinuous with that of the original face, and illumination and shadows are inconsistent; it is difficult to achieve identity-information preservation and complete removal of complex occlusions at the same time.

Disclosure of the Invention

To overcome the shortcomings of existing portrait hairstyle removal techniques in identity-information preservation, occlusion elimination, and background semantic consistency, the invention provides an identity-preserving portrait hairstyle and occlusion removal method and system based on a diffusion model. The invention builds on a pre-trained latent diffusion model and introduces mechanisms such as facial semantic segmentation, joint mask generation, fine-grained occlusion-region modeling, and condition-controlled generation, so that the model can remove various types of complex occluding objects such as hairstyles, hats, and hair accessories while preserving the identity characteristics, facial structure, and overall style consistency of the original person as far as possible.