US-12620174-B2 - Reconstruction of a 3D mesh of a person


Abstract

A 3D mesh of a person shall be reconstructed based on one single 2D image. For this purpose, 2D vertex projections and 3D vertex projections of the person are predicted based on the single 2D image. An approximated pose is estimated from the 3D vertex projections. The shape and/or pose, i.e. the 3D mesh of the person, is computed from the predicted 2D vertex projections and from the approximated pose by using a pregiven camera model and an articulated 3D mesh model of a human body.

Inventors

  • Karthik Shetty
  • Annette Birkhold

Assignees

  • SIEMENS HEALTHCARE GMBH

Dates

Publication Date
May 5, 2026
Application Date
Sep. 8, 2023
Priority Date
Sep. 12, 2022

Claims (20)

  1. A method of reconstructing a 3D mesh of a person, the method comprising: providing a single 2D image of the person; predicting 2D vertex projections and 3D vertex projections of the person based on the single 2D image; estimating an approximated pose from the 3D vertex projections; and computing a shape, a pose, or the shape and the pose of the person from the predicted 2D vertex projections and from the approximated pose by using a pregiven camera model and an articulated 3D mesh model of a human body.
  2. The method of claim 1, wherein computing includes computing rotations for individual segments of the human body.
  3. The method of claim 2, wherein pose parameters for the articulated 3D mesh model are obtained by inverse geometric transformation of the computed rotations.
  4. The method of claim 1, wherein computing includes a translation of coordinates of the predicted 2D vertex projections into a pregiven coordinate system.
  5. The method of claim 1, wherein the articulated 3D mesh model is based on a system of linear equations.
  6. The method of claim 5, wherein a depth constraint is added to the system of linear equations.
  7. The method of claim 6, wherein at least one further articulated 3D mesh model based on linear equations is added to the system of linear equations.
  8. A device for reconstructing a 3D mesh of a person, the device comprising: an image delivery component configured to provide one single 2D image of the person; a prediction component configured to predict 2D vertex projections and 3D vertex projections of the person based on the single 2D image; an estimation component configured to estimate an approximated pose from the 3D vertex projections; and a calculation component configured to compute a shape, a pose, or the shape and the pose of the person from the predicted 2D vertex projections and from the approximated pose by using a pregiven camera model and an articulated 3D mesh model of a human body.
  9. The device of claim 8, wherein computing includes computing rotations for individual segments of the human body.
  10. The device of claim 9, wherein pose parameters for the articulated 3D mesh model are obtained by inverse geometric transformation of the computed rotations.
  11. The device of claim 8, wherein computing includes a translation of coordinates of the predicted 2D vertex projections into a pregiven coordinate system.
  12. The device of claim 8, wherein the articulated 3D mesh model is based on a system of linear equations.
  13. The device of claim 12, wherein a depth constraint is added to the system of linear equations.
  14. The device of claim 13, wherein at least one further articulated 3D mesh model based on linear equations is added to the system of linear equations.
  15. A non-transitory computer readable storage medium comprising a set of computer-readable instructions stored thereon, the instructions, when executed by at least one processor, causing the processor to: provide a single 2D image of the person; predict 2D vertex projections and 3D vertex projections of the person based on the single 2D image; estimate an approximated pose from the 3D vertex projections; and compute a shape, a pose, or the shape and the pose of the person from the predicted 2D vertex projections and from the approximated pose by using a pregiven camera model and an articulated 3D mesh model of a human body.
  16. The non-transitory computer readable storage medium of claim 15, wherein the instructions for the at least one processor to compute include computing rotations for individual segments of the human body.
  17. The non-transitory computer readable storage medium of claim 16, wherein pose parameters for the articulated 3D mesh model are obtained by inverse geometric transformation of the computed rotations.
  18. The non-transitory computer readable storage medium of claim 15, wherein the instructions for the at least one processor to compute include a translation of coordinates of the predicted 2D vertex projections into a pregiven coordinate system.
  19. The non-transitory computer readable storage medium of claim 15, wherein the articulated 3D mesh model is based on a system of linear equations.
  20. The non-transitory computer readable storage medium of claim 15, wherein a depth constraint is added to the system of linear equations.
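Claims 5 and 6 describe the articulated 3D mesh model as a system of linear equations to which a depth constraint can be added. The following is a hypothetical numpy sketch of that idea, not the patented implementation: if vertex coordinates depend linearly on the model parameters, fitting the parameters is a linear least-squares problem, and a known depth (e.g., the patient lying on a table at a known distance) is simply one more equation appended to the system. The matrix sizes and weighting are illustrative assumptions.

```python
import numpy as np

# Linear mesh model: stacked vertex coordinates V = A @ theta.
# (Here: 10 vertices x 3 coordinates, 5 model parameters, random for illustration.)
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
theta_true = np.array([1.0, -0.5, 2.0, 0.3, -1.2])
b = A @ theta_true                       # "observed" vertex coordinates

# Depth constraint as an extra linear equation c @ theta = d,
# e.g. fixing the z-coordinate of one vertex to a known depth.
c = A[2:3]
d = b[2:3]

A_aug = np.vstack([A, 10.0 * c])         # weight the constraint strongly
b_aug = np.concatenate([b, 10.0 * d])

theta_hat, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
print(np.allclose(theta_hat, theta_true, atol=1e-6))   # True
```

Claim 7's further linear mesh models would, in the same spirit, contribute additional rows to the stacked system.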

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of EP 22195196.5, filed on Sep. 12, 2022, which is hereby incorporated by reference in its entirety.

FIELD

Embodiments relate to reconstruction of a 3D mesh of a person.

BACKGROUND

Estimating human surface meshes and poses from single images is one of the core research directions in computer vision, enabling multiple applications in computer graphics, robotics, and augmented reality. However, it is essentially an ill-posed problem: humans have complex body articulations, and scene parameters may be unknown, making the task challenging. The problem has become somewhat more tractable thanks to parametric models such as SMPL and SMPL-X, which represent various human poses and identities using only a few parameters (compare Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black; SMPL: A skinned multi-person linear model; ACM Trans. Graphics (Proc. SIGGRAPH Asia), 34(6):248:1-248:16, October 2015). Most state-of-the-art methods, like that described in Nikos Kolotouros, Georgios Pavlakos, Michael J. Black, and Kostas Daniilidis; Learning to reconstruct 3D human pose and shape via model-fitting in the loop; in ICCV, 2019, directly regress the shape and pose parameters from a given input image. These approaches rely completely on neural networks, even though the mapping between the parameters and the mesh is non-linear, while also making several assumptions about the image generation process. One of these assumptions lies in the use of a simplified camera model, i.e., the weak perspective camera. In this scenario the camera is assumed to be far from the subject, which is generally realized by setting a large focal length that is constant for all images. This weak camera may be modeled by three parameters: two for translation in the horizontal and vertical directions, and a third for scale.
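The three-parameter weak-perspective camera described above can be contrasted with a full perspective projection in a short sketch (parameter names and values are illustrative, not from the patent):

```python
import numpy as np

def weak_perspective(points_3d, s, tx, ty):
    """Weak-perspective camera: drop depth, then scale (s) and translate (tx, ty)."""
    return s * points_3d[:, :2] + np.array([tx, ty])

def perspective(points_3d, f):
    """Pinhole perspective projection with focal length f, for contrast."""
    return f * points_3d[:, :2] / points_3d[:, 2:3]

# Two points with identical x/y but different depths:
pts = np.array([[0.5, 0.2, 3.0],
                [0.5, 0.2, 5.0]])

# Weak perspective projects both to the same image point (depth is ignored)...
print(weak_perspective(pts, s=2.0, tx=0.1, ty=0.0))
# ...while a perspective camera maps them to different image points.
print(perspective(pts, f=1000.0))
```

This depth-blindness is exactly why the weak-perspective assumption leads to the misalignment and unrealistic depth estimates discussed in the following paragraphs.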
While these methods may estimate plausible shape and pose parameters, the networks struggle to balance the 2D re-projection loss against the 3D loss, so the resulting meshes are often misaligned either in 3D space or in image space. On the other hand, recent non-parametric or model-free approaches (compare Kevin Lin, Lijuan Wang, and Zicheng Liu; Mesh Graphormer; 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 12919-12928, 2021) directly regress the mesh vertex coordinates with their 2D projections, aligning well to the input image. However, even these methods suffer from the same obstacles as the parametric models by ignoring the effects of a perspective camera. The paper Ke Sun, Bin Xiao, Dong Liu, and Jingdong Wang; Deep high-resolution representation learning for human pose estimation; in CVPR, 2019 discloses the human pose estimator HRNet-W32. A mesh regressor is shown in Gyeongsik Moon and Kyoung Mu Lee; I2L-MeshNet: Image-to-lixel prediction network for accurate 3D human pose and mesh estimation from a single RGB image; ArXiv, abs/2008.03713, 2020. A graph convolutional network (GCN) is described in Thomas N. Kipf and Max Welling; Semi-Supervised Classification with Graph Convolutional Networks; in Proceedings of the 5th International Conference on Learning Representations, ICLR '17, 2017. W. Kabsch; A solution for the best rotation to relate two sets of vectors; Acta Crystallographica Section A, 32(5):922-923, 1976 presents the rotation-fitting method underlying an Approximate Rotation Estimator (ARE).

In interventional settings, a patient model may streamline the clinical workflow, help optimize radiation dose, and optimize C-arm trajectories and patient positioning. All of these tasks require a patient model that fits the actual patient as precisely as possible. In particular, patient poses may vary between procedures or may change during a procedure.
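The Kabsch method cited above computes the rotation that best aligns two point sets, which is the building block for estimating per-segment rotations from predicted vertex positions. A minimal sketch (the SVD-based formulation; the test rotation and point sets are illustrative):

```python
import numpy as np

def kabsch(P, Q):
    """Rotation R minimizing sum ||R @ p_i - q_i||^2 over point sets P, Q (N x 3)."""
    P = P - P.mean(axis=0)                   # center both sets
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                              # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # correct for possible reflection
    D = np.diag([1.0, 1.0, d])
    return Vt.T @ D @ U.T

# Recover a known z-axis rotation from noiseless correspondences:
rng = np.random.default_rng(1)
P = rng.standard_normal((10, 3))
angle = 0.3
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0,            0.0,           1.0]])
Q = P @ Rz.T                                 # q_i = Rz @ p_i
print(np.allclose(kabsch(P, Q), Rz))         # True
```

Applied per body segment between a template mesh and the predicted 3D vertex projections, this yields the approximated per-segment rotations from which pose parameters can be derived.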
Most current camera-based methods estimate 2D/3D key points on the patient body or regress the shape/pose parameters, in both cases assuming a fixed camera with a very large focal length (i.e., a weak perspective camera). This results in unrealistic depth estimation and pose estimation. Further, these methods provide no options to add any known constraints of the environment.

BRIEF SUMMARY AND DESCRIPTION

The scope of the present disclosure is defined solely by the claims and is not affected to any degree by the statements within this summary. The present embodiments may obviate one or more of the drawbacks or limitations in the related art. Embodiments provide a method and a device for improved reconstruction of a 3D mesh of a person based on one single 2D image of the person. Embodiments provide a method of reconstructing a 3D mesh of a person including the step of providing one single 2D image of the person. A human being is represented as a 3D mesh. Such a 3D mesh simplifies further data processing where the pose or shape of the person is essential. The first step of the method is to provide one single 2D image. The single 2D image m