CN-117152035-B - Fusion method for realizing different human body reconstruction models based on single RGB image
Abstract
The invention discloses a fusion method for realizing different human body reconstruction models based on a single RGB image. The method first meshes the implicit expression networks obtained by two different reconstruction methods through the Marching Cubes algorithm to obtain spatially aligned three-dimensional human body reconstruction models of equal spatial resolution; obtains a pixel-aligned depth map for each reconstruction model by rendering and calculates the z-space thickness; compares the thickness maps to obtain the identical and differing regions of the two models; trains on RGB images and depth maps rendered from a human body model dataset to obtain depth map networks for the visible and invisible faces of a person; inputs the RGB image into the networks to predict the visible-face and invisible-face depth maps and determine the z-space position; aligns the z-space thickness of the different models with a thickness scaling algorithm; and finally performs interpolation fusion of the differing regions of the two models and the identical region at their boundary using an edge distance marking algorithm. The invention improves reconstruction accuracy.
Inventors
- LIN ZHAOJI
- ZHAO JIEYU
- YAO LI
Assignees
- Sanjiang University (三江学院)
Dates
- Publication Date
- 20260508
- Application Date
- 20230830
Claims (9)
- 1. A fusion method for realizing different human body reconstruction models based on a single RGB image, characterized by comprising the following steps. Step S100, calibrating the model migration and fusion areas: gridding the implicit expression networks obtained by two different reconstruction methods through the Marching Cubes algorithm to obtain spatially aligned three-dimensional human body reconstruction models of equal spatial resolution; inputting two three-dimensional human body implicit reconstruction models based on a single RGB image; obtaining a space depth map pixel-aligned with each model by rendering; calculating the corresponding space thickness maps; and comparing the thickness maps to obtain the identical and differing regions of the two models. Step S200, training on single RGB pictures and corresponding space depth maps rendered from a human body model dataset to obtain depth map networks for the visible and invisible faces of a person; inputting the RGB image into the networks to obtain predicted visible-face and invisible-face depth maps; determining the z-space position; and aligning the z-space thickness of the different reconstruction models with a thickness scaling algorithm. Step S300, model fusion: performing interpolation fusion of the identical and differing regions of the two reconstructed models at their boundary. Step S400, post-processing the models by discrete triangle clipping and normal optimization. Step S200 includes: Step S210, using single RGB pictures rendered from the human body model dataset and the corresponding depth maps as training pairs, respectively training a visible-face depth map network and an invisible-face depth map network for the RGB image through an image generation network built on ResNet, obtaining trained depth map networks G_D for the RGB view and for the view rotated by 180 degrees, where G_D denotes a trained depth map network, G_D^vis the network that generates the visible-face depth map, and G_D^inv the network that generates the invisible-face depth map; obtaining the predicted visible-face and invisible-face depth maps from the trained networks (a minimal torch sketch appears under Illustrative sketches below). Step S220, calculating pixel-by-pixel differences between the predicted visible-face and invisible-face depth maps to obtain a predicted space thickness map, comparing it with the corresponding pixel thickness values in the space thickness maps of the first and second models, and determining the z-space thickness and z-space position. Step S230, scaling the thickness of the migrated part with a thickness scaling algorithm so that the first and second models match in z space.
- 2. The method for realizing fusion of different human body reconstruction models based on a single RGB image of claim 1, wherein step S100 comprises the following steps (a numpy sketch of steps S120 to S140 appears at the end of the description): Step S110, after cropping the same person's RGB picture and removing the background, inputting it into the two three-dimensional human body implicit reconstruction models to obtain two spatially aligned Marching Cubes cube spaces reconstructed by the two models. Step S120, traversing the two Marching Cubes cube spaces, obtaining the value at each point, and, for each point with cube[z][x][y] > v, counting its distance to the camera viewpoint on the x-y two-dimensional plane to obtain a first space depth map and a second space depth map, wherein cube[z][x][y] is the computed probability of whether the sampling point lies on the surface of the reconstructed model and v is the threshold on that probability. Step S130, calculating the corresponding first and second space thickness maps from the first and second space depth maps. Step S140, comparing the first and second space thickness maps pixel by pixel to obtain the coincident and non-coincident points of the two reconstruction models on the x-y two-dimensional plane of the Marching Cubes cubes, obtaining the difference and coincidence information in the two cube spaces, and determining the reserved area of the first model and the migration area of the second model. Step S150, marking the edge points of the reserved area of the first model and the migration area of the second model based on an edge distance marking algorithm.
- 3. The fusion method for realizing different human body reconstruction models based on a single RGB image according to claim 2, wherein in step S110, the first model is a three-dimensional human body implicit reconstruction model obtained by the ICON reconstruction method, and the second model is a three-dimensional human body implicit reconstruction model obtained by the PIFu reconstruction method.
- 4. The method of claim 2, wherein in step S120, v is 0.5: the value 0.5 marks the reconstruction surface, values greater than 0.5 are considered outside the reconstructed surface, and values less than 0.5 are considered inside the reconstructed human body surface.
- 5. The method for realizing fusion of different human body reconstruction models based on a single RGB image of claim 2, wherein step S150 comprises the following steps (see the marking sketch under Illustrative sketches below): Step S151, traversing all coincident points and marking the non-coincident points adjacent to them as distance 1; traversing the distance-1 points and marking all unmarked non-coincident points adjacent to them as distance 2; and so on, until repeated marking has labeled every reachable non-coincident point with its distance to the nearest coincident point. Step S152, setting a threshold x and marking the points whose distance exceeds x as the migration point set; repeatedly adding the adjacent points of the migration point set until the distance-1 points have been added, at which point all marked points form the point set requiring migration and splicing. Step S153, recursing through steps S151 and S152 to obtain interpolation areas of n pixels from the edges of the first model's reserved area and the second model's migration area respectively.
- 6. The method for realizing fusion of different human body reconstruction models based on a single RGB image of claim 1, wherein in step S210, when training the visible-face depth map network and the invisible-face depth map network, the feature matching loss is: L_FM(G, D_k) = E_(RGB,Deep) Σ_{i=1..T} (1/N_i) ||D_k^(i)(RGB, Deep) - D_k^(i)(RGB, G(RGB))||_1, wherein G represents the generator in the depth map network, D_k represents the k-th discriminator in the depth map network, D_k^(i) represents the i-th layer of that discriminator, RGB represents the input picture, Deep represents the corresponding depth map, E_(RGB,Deep) represents the expectation over (RGB, Deep) pairs, T represents the total number of layers of the neural network, N_i represents the number of elements in layer i, and each term is the L1 loss between the discriminator features of the true depth map and those of the generated depth map (see the loss sketch under Illustrative sketches below).
- 7. The method for realizing fusion of different human body reconstruction models based on a single RGB image of claim 1, wherein in step S230, the spatial average thickness T_A of the first model is calculated; the average thickness T_AD of the depth prediction map over the region corresponding to the first model is calculated; the predicted thickness T_AL of the part missing from the first model is calculated; the thickness T_B2A of the second model's part transferred to the first model is obtained by proportion; and the thickness of the corresponding part of the second model is converted to T_B2A (see the scaling sketch under Illustrative sketches below).
- 8. The method for realizing fusion of different human body reconstruction models based on a single RGB image of claim 1, wherein step S300 comprises performing interpolation calculation on the first model's reserved part and the second model's migration part, with the interpolation formula: R = X_0*(1-p) + X_1*p, wherein R is the interpolation result, X_0 and X_1 represent spatial interpolation points, 0 < p < 1 is the offset of the interpolated pixel, the offset of a pixel i pixels from the fusion area is i/n, and n represents the threshold of the fusion range (see the blending sketch under Illustrative sketches below).
- 9. The method for realizing fusion of different human body reconstruction models based on a single RGB image of claim 1, wherein step S400 comprises the following steps (see the mesh cleanup sketch under Illustrative sketches below): Step S410, removing discrete triangles: by marking the main body model, deleting triangle fragments formed by vertices and edges not connected to the main body mesh, obtaining a continuous reconstruction mesh. Step S420, re-meshing the model: adjusting the positions of the mesh vertices so that the previously widely varying areas of the mesh triangles in space become approximately equal. Step S430, using single RGB pictures rendered from the three-dimensional models in the training set and the corresponding normal maps as training pairs, respectively training a visible-face normal map network and an invisible-face normal map network for the RGB image through an image generation network built on ResNet, obtaining normal map networks for the RGB view and for the view rotated by 180 degrees, where G_N denotes the trained normal map network. Step S440, inputting the re-meshed vertices into a vertex fine-tuning neural network, outputting the fine-tuned vertices, differentiably rendering the mesh without changing its topological structure to obtain clothed-human normal maps of the visible and invisible faces, and computing the L1 loss between the normal maps predicted in step S430 and the rendered normal maps as the loss function for iterative optimization.
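Illustrative sketches

Claim 1, step S210 trains two image-to-image depth networks. The following is a minimal torch sketch of the inference side, assuming an untrained stand-in generator; the patent names only a ResNet backbone and gives no architecture details, so `TinyDepthGenerator` and all names here are placeholders.

```python
import torch
import torch.nn as nn

class TinyDepthGenerator(nn.Module):
    """Stand-in for the ResNet-based image generation network G_D in the claims."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),   # one-channel depth map output
        )

    def forward(self, rgb):
        return self.net(rgb)

# G_D^vis predicts the visible-face depth map from the RGB view; G_D^inv is
# trained against depth maps rendered from the view rotated by 180 degrees.
g_vis, g_inv = TinyDepthGenerator(), TinyDepthGenerator()
rgb = torch.rand(1, 3, 256, 256)
d_vis, d_inv = g_vis(rgb), g_inv(rgb)
thickness_pred = d_inv - d_vis   # step S220: pixel-wise z-thickness prediction
```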
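Claim 5's edge distance marking is, in effect, a multi-source breadth-first search from the coincident points. The sketch below is one way to realize steps S151 and S152, assuming boolean pixel masks and 4-connectivity; neither detail is specified in the patent.

```python
from collections import deque
import numpy as np

def mark_distances(same, diff):
    """same, diff: boolean (H, W) masks of coincident and non-coincident pixels.
    Returns integer distances on diff pixels (0 elsewhere): 1 = adjacent to a
    coincident pixel, 2 = the next ring outward, and so on (step S151)."""
    h, w = same.shape
    dist = np.zeros((h, w), dtype=int)
    queue = deque((y, x) for y, x in zip(*np.nonzero(same)))
    seen = same.copy()
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and diff[ny, nx] and not seen[ny, nx]:
                dist[ny, nx] = dist[y, x] + 1   # ring-by-ring labeling
                seen[ny, nx] = True
                queue.append((ny, nx))
    return dist

def migration_band(dist, threshold):
    """Step S152 (one reading): pixels beyond the threshold form the migrated
    core; the ring at distances 1..threshold is the blending band of step S153."""
    core = dist > threshold
    band = (dist >= 1) & (dist <= threshold)
    return core, band
```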
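The loss in claim 6 matches the multi-scale-discriminator feature matching loss popularized by pix2pixHD. A minimal torch sketch, assuming `feats_real` and `feats_fake` are the per-layer activations of the k-th discriminator on (RGB, real depth) and (RGB, generated depth):

```python
import torch

def feature_matching_loss(feats_real, feats_fake):
    """L_FM = sum_i (1/N_i) * ||D_k^(i)(RGB, Deep) - D_k^(i)(RGB, G(RGB))||_1,
    where T = number of layers and N_i = number of elements in layer i."""
    loss = torch.zeros((), dtype=torch.float32)
    for real, fake in zip(feats_real, feats_fake):
        n_i = real.numel()                       # N_i in the claim
        loss = loss + torch.abs(real.detach() - fake).sum() / n_i
    return loss

# Toy usage with two fake "layers" of discriminator activations.
feats_r = [torch.randn(1, 64, 32, 32), torch.randn(1, 128, 16, 16)]
feats_f = [t + 0.1 for t in feats_r]
print(feature_matching_loss(feats_r, feats_f))
```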
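Claim 7 states only that T_B2A is obtained "by proportion". The rule T_B2A = T_AL * T_A / T_AD used below is one plausible reading, not the patent's stated formula; the array layout and masks are likewise assumptions.

```python
import numpy as np

def scale_migrated_thickness(thick_a, thick_pred, thick_b, keep, migrate):
    """thick_a / thick_pred / thick_b: (H, W) thickness maps of the first model,
    the depth-prediction network, and the second model; keep / migrate: boolean
    masks for the first model's reserved region and the migration region."""
    t_a = thick_a[keep].mean()        # T_A: average thickness of the first model
    t_ad = thick_pred[keep].mean()    # T_AD: predicted thickness over that region
    t_al = thick_pred[migrate].mean() # T_AL: predicted thickness of the missing part
    t_b2a = t_al * t_a / t_ad         # assumed proportional rule for T_B2A
    scale = t_b2a / thick_b[migrate].mean()
    out = thick_b.copy()
    out[migrate] = thick_b[migrate] * scale  # match the second model to z space
    return out, t_b2a
```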
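The interpolation rule of claim 8 transcribes directly to code. In this sketch, i would come from the edge distances of the claim 5 marking pass, and n is the fusion-range threshold.

```python
def blend(x0: float, x1: float, i: int, n: int) -> float:
    """R = X0*(1-p) + X1*p: interpolate between the first model's value x0 and
    the second model's value x1, i pixels into a fusion band of width n."""
    p = i / n                    # offset 0 < p < 1 inside the band
    return x0 * (1.0 - p) + x1 * p
```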
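One way to realize the discrete triangle clipping of step S410 is to keep only the largest connected component of the mesh. The sketch uses a union-find over shared vertex indices; the (F, 3) face-array layout is an assumption.

```python
import numpy as np

def keep_main_body(faces: np.ndarray) -> np.ndarray:
    """faces: (F, 3) integer array of vertex indices. Returns the faces whose
    vertices belong to the largest connected component (the main body mesh)."""
    n = faces.max() + 1
    parent = np.arange(n)

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for f in faces:                          # union the vertices of each triangle
        a = find(f[0])
        for v in f[1:]:
            parent[find(v)] = a

    roots = np.array([find(v) for v in faces[:, 0]])
    labels, counts = np.unique(roots, return_counts=True)
    main = labels[counts.argmax()]           # component with the most faces
    return faces[roots == main]              # drop disconnected fragments
```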
Description
Fusion method for realizing different human body reconstruction models based on a single RGB image

Technical Field

The invention relates to the field of computer vision processing, and in particular to a fusion method for realizing different human body reconstruction models based on a single RGB image.

Background

Three-dimensional reconstruction of the human body is an important subdivision of the field of three-dimensional reconstruction. Reconstruction from a single image has the advantages of low cost and convenient acquisition of the three-dimensional human body model, with wide application space and application value, whereas traditional methods are time-consuming, require expensive equipment, and are difficult to apply to daily scenarios. At present, deep learning can learn a reasonable mapping from images by exploiting the strong fitting capacity of neural networks, but it has obvious shortcomings: limited by the information available to learn from, different single-image reconstruction methods struggle to combine reconstruction advantages on multiple fronts. On one hand, methods in the Skinned Multi-Person Linear Model (SMPL) family exploit pose and human body structure priors well, but struggle to produce good, complete clothing reconstruction; on the other hand, implicit-expression methods achieve high reconstruction precision and express complex high-dimensional three-dimensional structures well, but struggle to guarantee a reasonable human body structure. Single-image human body reconstruction thus balances two ends of a scale: the free expressive capability of the model versus pose regularization through parameterized priors. When a reconstruction method focuses on one end, the reconstruction advantages of the other end are limited.

Disclosure of the Invention

The invention aims to provide a fusion method for realizing different human body reconstruction models based on a single RGB image, and to improve reconstruction accuracy.
In order to solve the above technical problems, the technical scheme of the invention is a fusion method for realizing different human body reconstruction models based on a single RGB image, comprising the following steps.

Step S100, calibrating the model migration and fusion areas: gridding the implicit expression networks obtained by two different reconstruction methods through the Marching Cubes algorithm to obtain spatially aligned three-dimensional human body reconstruction models of equal spatial resolution; obtaining a space depth map pixel-aligned with each model by rendering; calculating the corresponding space thickness maps; and comparing the thickness maps to obtain the identical and differing regions of the two models.

Step S200, model thickness scaling and space alignment based on predicted depth maps: training on single RGB pictures and corresponding space depth maps rendered from a human body model dataset to obtain depth map networks for the visible and invisible faces of a person; inputting the RGB image into the networks to obtain the predicted visible-face and invisible-face depth maps; determining the z-space position; and aligning the z-space thickness of the different reconstruction models with a thickness scaling algorithm.

Step S300, model fusion: performing interpolation fusion of the identical and differing regions of the two reconstructed models at their boundary.

Step S400, post-processing: performing discrete triangle clipping and normal optimization on the model.

Further, step S100 includes the following steps (a numpy sketch of steps S120 to S140 follows). Step S110, after cropping the same person's RGB picture and removing the background, inputting it into the two three-dimensional human body implicit reconstruction models to obtain two spatially aligned Marching Cubes cube spaces reconstructed by the two models. Step S120, traversing the two Marching Cubes cube spaces, obtaining the value at each point, and, for each point with cube[z][x][y] > v, counting its distance to the camera viewpoint on the x-y two-dimensional plane to obtain a first space depth map and a second space depth map, wherein cube[z][x][y] is the computed probability of whether the sampling point lies on the surface of the reconstructed model and v is the threshold on that probability. Step S130, calculating the corresponding first and second space thickness maps from the first and second space depth maps. Step S140, comparing the first and second space thickness maps pixel by pixel to obtain the coincident and non-coincident points of the two reconstruction models on the x-y two-dimensional plane of the Marching Cubes cubes, obtaining the difference and coincidence information in the two cube spaces, and determining the reserved area of the first model and the migration area of the second model. Step S150, marking the edge points of the reserved area of the first model and the migration area of the second model based on an edge distance marking algorithm.
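As referenced above, the following is a minimal numpy sketch of steps S120 to S140: turning an occupancy volume indexed [z][x][y] into pixel-aligned depth and thickness maps, then comparing two models per pixel. The claims are ambiguous about the inside/outside convention (claim 2 counts points above v, while claim 4 calls values above 0.5 "outside"); the sketch follows claim 2, and flipping the comparison adapts it to the other reading. The 0.5 threshold is from claim 4; everything else, including the tolerance parameter, is an assumption.

```python
import numpy as np

def depth_and_thickness(vol, v=0.5):
    """vol: occupancy grid of shape (Z, X, Y); the camera looks along +z.
    Returns front depth (first z with vol > v), back depth (last such z),
    and thickness (back - front). Pixels with no surface hit get 0."""
    occupied = vol > v                      # claim 2's cube[z][x][y] > v test
    hit = occupied.any(axis=0)              # (X, Y): does any z slice hit?
    front = np.argmax(occupied, axis=0)     # first occupied z per (x, y)
    back = vol.shape[0] - 1 - np.argmax(occupied[::-1], axis=0)
    thickness = np.where(hit, back - front, 0)
    return np.where(hit, front, 0), np.where(hit, back, 0), thickness

def compare_regions(thick_a, thick_b, tol=0):
    """Step S140: split the x-y plane into coincident and differing pixels."""
    covered = (thick_a > 0) | (thick_b > 0)
    same = covered & (np.abs(thick_a - thick_b) <= tol)
    diff = covered & ~same
    return same, diff

# Toy usage: two random occupancy volumes at the same 64^3 resolution.
rng = np.random.default_rng(0)
vol_a, vol_b = rng.random((64, 64, 64)), rng.random((64, 64, 64))
(_, _, ta), (_, _, tb) = depth_and_thickness(vol_a), depth_and_thickness(vol_b)
same, diff = compare_regions(ta, tb, tol=2)
```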