EP-4573522-B1 - 3D RECONSTRUCTION OF A TARGET
Inventors
- ZHAO, Zi-chuan
- PENA-RIOS, Anasol
- CLARK, Adrian
- CONWAY, Anthony
Dates
- Publication Date
- 20260513
- Application Date
- 20230720
Claims (10)
- A computer-implemented method for 3D reconstruction of a physical object, the method comprising: obtaining an initial global reconstruction of the physical object in a 3D space, inferred by a global machine learning model trained to reconstruct a 3D representation of an entirety of the physical object in 3D space from data that has been sampled from the physical object; providing, to a user, an initial visualisation of the physical object based on the reconstruction; receiving, from the user, at least one indication of at least one point of interest in the visualisation; resampling at least one first subsection of the physical object based on the at least one point of interest to obtain local data, wherein the local data is associated with the subsection based on spatial information that associates the local data with a point in 3D space; inputting the resampled local data and spatial information into a local feature machine learning model to obtain at least one local 3D reconstruction of the physical object, wherein the local feature machine learning model has been trained to output a physical object reconstruction from local data of resampled subsections, and wherein the 3D coordinate system of the local 3D reconstruction aligns with the global 3D reconstruction; and merging the global 3D reconstruction with the local 3D reconstruction.
- The method of claim 1, wherein the steps of the method are performed iteratively using the merged reconstruction as the initial global reconstruction in the next iteration until receiving, from the user, an indication to stop.
- The method of claim 1 or 2, wherein the global model is an encoder-decoder network comprising a global encoder trained to infer a global latent code from the physical object data, and wherein the local feature machine learning model comprises a local feature encoder-decoder network, the local feature encoder-decoder network comprising: a local feature encoder trained to infer a local feature latent code from the resampled local data and spatial information; and a local feature decoder trained to infer a representation of the physical object in the 3D space from a combination of the local feature latent code and the global latent code.
- The method of claim 3 wherein the local 3D reconstruction comprises at least one reconstruction of at least one additional subsection of the physical object that corresponds to the at least one first subsection under symmetry, wherein the reconstruction of the at least one additional subsection is inferred by the local feature decoder based on the local data of the at least one first subsection and said symmetry, learned by the global encoder.
- The method of claim 3 or 4 wherein obtaining a global reconstruction of the physical object in a 3D space comprises: inputting the global latent code to the local feature decoder wherein the local feature decoder is trained to infer, from the global latent code, a global representation of the physical object in the 3D space.
- The method of any of claims 3 to 5, wherein merging the global 3D reconstruction and the local 3D reconstruction comprises: receiving, from the local feature decoder, a local reconstruction corresponding to each point of interest; receiving, from the local feature decoder, weight information comprising a weight value for each point in space for each local reconstruction; and merging the global reconstruction and the at least one local reconstruction based on the weight information; wherein the local feature decoder is trained to infer the weight information based on the combined local feature latent code and the global latent code by using a loss function based on the merged reconstruction, wherein the parts of the local 3D reconstruction that have been reconstructed based on the resampled at least one first subsection of the physical object are weighted higher.
- The method of any preceding claim wherein resampling the physical object based on the at least one point of interest comprises resampling a subspace in the global 3D space centred on the at least one point of interest.
- The method of any preceding claim, wherein the local and global 3D reconstructions are each a scalar field representing an occupation probability of a point in space and wherein merging the global 3D reconstruction with the local 3D reconstruction comprises: combining the scalar field values of the global 3D reconstruction with the scalar field values of the local 3D reconstruction; and, optionally, extracting a probability iso-surface from the combined scalar field to represent the shape of the physical object for visualisation.
- A computer system comprising a processor and a memory storing instructions executable by the processor to cause the processor to perform the method of any preceding claim.
- A computer readable medium comprising computer program code to, when loaded on and executed by a computer, cause the computer to carry out the method of any of claims 1 to 8.
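The weighted merge of claims 6 and 8 can be illustrated with a minimal sketch. It assumes both reconstructions are scalar occupancy fields sampled at the same query points in the shared coordinate system, with a per-point weight that is higher near the resampled subsection; all function and variable names here are illustrative, not taken from the patent.

```python
# Hypothetical sketch of the weighted merge in claims 6 and 8. The global and
# local reconstructions are scalar occupancy fields evaluated at the same
# query points, and the weight field favours the local reconstruction near
# the user's point of interest.

def merge_fields(global_field, local_field, weights):
    """Blend two occupancy fields point by point.

    Each argument is an equal-length list of floats, one value per query
    point in the shared 3D coordinate system.
    """
    merged = []
    for g, l, w in zip(global_field, local_field, weights):
        merged.append(w * l + (1.0 - w) * g)  # local dominates where w is high
    return merged

def extract_isosurface_mask(field, iso=0.5):
    """Mark the points on or inside the probability iso-surface (claim 8)."""
    return [v >= iso for v in field]

# Toy example: three query points; the local model refines the middle one.
global_field = [0.9, 0.4, 0.1]
local_field  = [0.9, 0.8, 0.1]
weights      = [0.0, 1.0, 0.0]   # the point of interest is the middle point

merged = merge_fields(global_field, local_field, weights)
print(merged)                           # [0.9, 0.8, 0.1]
print(extract_isosurface_mask(merged))  # [True, True, False]
```

In a full implementation, the weights would be inferred by the local feature decoder rather than supplied by hand, and the iso-surface would be extracted as a mesh (e.g. by marching cubes) rather than a point mask.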
Description
The present invention relates to a computer-implemented method and system for interactive 3D reconstruction of a target.
Background
A digital twin system may be used for the creation of a 3D model of a target such as an object. Augmented Reality (AR) and Virtual Reality (VR) technologies and their applications with digital twin systems often rely upon 3D virtual representations of complex physical objects. Improvements in the accuracy of the virtual representation and in the level of detail that can be reproduced can assist in the optimization of AR and VR applications and provide an improved user experience. Fast, automatic creation of virtual representations from physical objects remains a challenge for the AR and VR industry.
3D reconstruction is a technique which attempts to recover the original 3D shape of an object or scene from input data such as, for example, one or more images, or from a point cloud acquired from a scanning device. One technique for 3D reconstruction is implicit field reconstruction, in which the output target is represented as a scalar field in the 3D space. Deep-learning-based implicit reconstruction systems can be classified into two categories: the forward class and the converging class. In forward-class algorithms, the input data is first encoded into a latent code by an encoder neural network, and then decoded into the implicit field by a decoder neural network using its learned parameters. This class of architecture is capable of reconstructing 3D shapes from learned priors, which reduces noise and prevents missing parts in the reconstruction. However, it performs poorly when reproducing targets not encountered in the training set and tends to over-smooth the output. Converging-class algorithms instead learn a neural network that represents the entire implicit field for each individual object. This class performs better at reproducing details but is less reliable at reproducing the overall shape of an object and takes longer per object.
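The forward-class pipeline described above can be sketched as follows. This is an illustrative toy, not the patented networks: a real encoder and decoder would be learned neural networks, whereas here the "latent code" is simply the centroid of the sampled points and the "decoder" is a hand-written fall-off function, so that the implicit scalar field can be evaluated at any query point in space.

```python
# Illustrative sketch of a forward-class implicit reconstruction: an encoder
# maps sampled input data to a latent code, and a decoder maps (latent code,
# 3D query point) to a scalar occupancy value.

def encode(samples):
    """Toy 'encoder': summarise the input point cloud as its centroid.
    A real system would use a learned encoder neural network here."""
    n = len(samples)
    return tuple(sum(p[i] for p in samples) / n for i in range(3))

def decode(latent, query, radius=1.0):
    """Toy 'decoder': occupancy falls off with squared distance from the
    latent centroid, mimicking a scalar implicit field in 3D space."""
    d2 = sum((q - c) ** 2 for q, c in zip(query, latent))
    return max(0.0, 1.0 - d2 / (radius ** 2))

cloud = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]   # data sampled from the object
z = encode(cloud)                             # latent code: (1.0, 0.0, 0.0)
print(decode(z, (1.0, 0.0, 0.0)))             # 1.0 at the centre
print(decode(z, (3.0, 0.0, 0.0)))             # 0.0 far from the object
```

Because the field is a function of an arbitrary query point, it can be sampled on any grid and thresholded to recover a surface, which is what makes the implicit representation convenient for the global and local reconstructions discussed below.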
It is therefore desirable to overcome the shortcomings of the two classes so as to produce a 3D reconstruction in a manner which accurately reproduces the shape of the target while also efficiently recreating complex details. WO2021/184933 A1 discloses a method of 3D human body model reconstruction based on artificial intelligence. Based on a target image, a 3D model of the entire human body and a 3D sub-model of a part of the human body can be obtained by using a neural network model and/or a parameterization model, which are fused to obtain a fused 3D human body model.
Summary of the invention
According to a first aspect of the present invention, there is provided a method for 3D reconstruction of a physical object, the method comprising: obtaining an initial global reconstruction of the physical object in a 3D space, inferred by a global machine learning model trained to reconstruct a 3D representation of an entirety of the physical object in 3D space from data that has been sampled from the physical object; providing, to a user, an initial visualisation of the physical object based on the reconstruction; receiving, from the user, at least one indication of at least one point of interest in the visualisation; resampling at least one first subsection of the physical object based on the at least one point of interest to obtain local data, wherein the local data is associated with the subsection based on spatial information that associates the local data with a point in 3D space; inputting the resampled local data and spatial information into a local feature machine learning model to obtain at least one local 3D reconstruction of the physical object, wherein the local feature machine learning model has been trained to output a target reconstruction from local data of resampled subsections, and wherein the 3D coordinate system of the local 3D reconstruction aligns with the global 3D reconstruction; and merging the global 3D reconstruction with the local 3D reconstruction. Preferably, the steps of the method are performed iteratively using the merged reconstruction and the further visualisation based on the merged reconstruction as the initial global reconstruction and initial visualisation in the next iteration until receiving, from the user, an indication to stop. Preferably, the global model is an encoder-decoder network comprising a global encoder trained to infer a global latent code from the physical object data, and the local feature machine learning model comprises a local feature encoder-decoder network, the local feature encoder-decoder network comprising: a local feature encoder trained to infer a local feature latent code from the resampled local data and spatial information; and a local feature decoder trained to infer a representation of the physical object in the 3D space from a combination of the local feature latent code and the global latent code. Preferably, the local 3D reconstruction comprises at le