
CN-116324895-B - Deformable neural radiance fields

CN 116324895 B

Abstract

An image synthesis technique using a neural radiance field (NeRF) includes generating a deformation model of the movement experienced by a subject in a non-rigidly deforming scene. For example, when the image synthesis system uses a NeRF, the system takes as input a plurality of poses of the subject as training data. In contrast to a conventional NeRF, this solution first expresses the position of the subject, seen from various perspectives, in an observation frame. The solution then derives a deformation model, i.e., a mapping between the observation frame and a canonical frame, that accounts for the movement of the subject. The mapping is conditioned on a latent deformation code for each pose and is computed using a multi-layer perceptron (MLP). A second MLP then derives the NeRF from the position in the canonical frame and the direction of the projected ray. The NeRF can then be used to render the subject in a new pose.
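As a concrete, purely illustrative reading of this pipeline, the sketch below wires two MLPs together in JAX. The layer widths, the 8-dimensional latent deformation code, and the offset-style warp are assumptions made for brevity; the claims describe a richer rotation/pivot/translation mapping, sketched after the claims.

    import jax
    import jax.numpy as jnp

    def init_mlp(key, sizes):
        # He-style initialization for a stack of dense layers.
        params = []
        for d_in, d_out in zip(sizes[:-1], sizes[1:]):
            key, sub = jax.random.split(key)
            w = jax.random.normal(sub, (d_in, d_out)) * jnp.sqrt(2.0 / d_in)
            params.append((w, jnp.zeros(d_out)))
        return params

    def mlp(params, x):
        # Plain fully connected network with ReLU hidden activations.
        for w, b in params[:-1]:
            x = jax.nn.relu(x @ w + b)
        w, b = params[-1]
        return x @ w + b

    key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
    deform_params = init_mlp(key_a, [3 + 8, 128, 128, 3])  # point + latent -> offset
    nerf_params = init_mlp(key_b, [3 + 3, 256, 256, 4])    # canonical point + dir -> rgba

    def query(x_obs, view_dir, latent_code):
        # 1) Deformation MLP: observation-frame point plus the per-pose
        #    latent deformation code -> point in the canonical frame
        #    (modeled here as a simple offset field).
        x_canon = x_obs + mlp(deform_params, jnp.concatenate([x_obs, latent_code]))
        # 2) Template NeRF MLP: canonical position plus ray direction
        #    -> RGB color and optical density.
        out = mlp(nerf_params, jnp.concatenate([x_canon, view_dir]))
        return jax.nn.sigmoid(out[:3]), jax.nn.softplus(out[3])

    rgb, density = query(jnp.zeros(3), jnp.array([0.0, 0.0, 1.0]), jnp.zeros(8))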

Inventors

  • Ricardo Martin-Brualla
  • Keunhong Park
  • Utkarsh Sinha
  • Sofien Bouaziz
  • Daniel Goldman
  • Jonathan Tilton Barron
  • Steven Maxwell Seitz

Assignees

  • Google LLC

Dates

Publication Date
2026-05-12
Application Date
2021-01-14
Priority Date
2020-11-16

Claims (20)

  1. A method, comprising: acquiring image data representing a plurality of images, each image of the plurality of images comprising an image of a scene within a viewing frame, the scene comprising a non-rigidly deforming object viewed from a respective perspective; generating a deformation model based on the image data, the deformation model describing movement by the non-rigidly deforming object while the image data was generated, the deformation model being represented by a differentiable nonlinear mapping between positions in the observation frame and positions in a canonical frame; and generating a neural radiance field based on a position and a viewing direction of projected rays through the positions in the canonical frame, the neural radiance field providing a mapping from the position and viewing direction to a color and optical density at each position in the viewing frame, the color and optical density at each position in the viewing frame enabling viewing of the non-rigidly deforming object from a new perspective.
  2. The method of claim 1, wherein the deformation model is conditioned on a latent code that encodes a state of the scene in a frame.
  3. The method of claim 1, wherein the deformation model comprises a rotation, a pivot point corresponding to the rotation, and a translation.
  4. The method of claim 3, wherein the rotation is encoded as a pure logarithmic quaternion.
  5. The method of claim 3, wherein the deformation model comprises the sum of (i) a similarity transformation applied to the difference between the position and the pivot point, (ii) the pivot point, and (iii) the translation (see the warp sketch following the claims).
  6. The method of claim 1, wherein the deformation model comprises a multi-layer perceptron (MLP) within a neural network.
  7. The method of claim 6, wherein an elastic loss function component of the MLP is based on a norm of a matrix representing the deformation model.
  8. The method of claim 7, wherein the matrix is a Jacobian of the deformation model with respect to the positions in the observation frame.
  9. The method of claim 7, wherein the elastic loss function component is based on a singular value decomposition of the matrix representing the deformation model.
  10. The method of claim 9, wherein the elastic loss function component is based on a logarithm of the matrix of singular values produced by the singular value decomposition.
  11. The method of claim 7, wherein the elastic loss function component is composed with a rational function to produce a robust elastic loss function (see the elastic-loss sketch following the claims).
  12. The method of claim 6, wherein a background loss function component involves designating points in the scene as static points, with a penalty for movement of those points.
  13. The method of claim 12, wherein the background loss function component is based on a difference between a static point and the mapping of that static point from the observation frame to the canonical frame according to the deformation model (see the background-loss sketch following the claims).
  14. The method of claim 6, wherein generating the deformation model comprises applying a positional encoding to position coordinates within the scene to produce periodic functions of position having frequencies that increase with training iterations of the MLP.
  15. The method of claim 14, wherein each periodic function of the positional encoding is multiplied by a weight indicating whether a given training iteration includes a particular frequency (see the positional-encoding sketch following the claims).
  16. A computer program product comprising a non-transitory storage medium, the computer program product comprising code that, when executed by processing circuitry of a computing device, causes the processing circuitry to perform a method comprising: acquiring image data representing a plurality of images, each image of the plurality of images comprising an image of a scene within a viewing frame, the scene comprising a non-rigidly deforming object viewed from a respective perspective; generating a deformation model based on the image data, the deformation model describing movement by the non-rigidly deforming object while the image data was generated, the deformation model being represented by a differentiable nonlinear mapping between positions in the observation frame and positions in a canonical frame; and generating a neural radiance field based on a position and a viewing direction of projected rays through the positions in the canonical frame, the neural radiance field providing a mapping from the position and viewing direction to a color and optical density at each position in the viewing frame, the color and optical density at each position in the viewing frame enabling viewing of the non-rigidly deforming object from a new perspective.
  17. The computer program product of claim 16, wherein the deformation model comprises a multi-layer perceptron (MLP) within a neural network.
  18. The computer program product of claim 17, wherein an elastic loss function component of the MLP is based on a norm of a matrix representing the deformation model.
  19. The computer program product of claim 18, wherein the matrix is a Jacobian of the deformation model with respect to the positions in the observation frame.
  20. An electronic device, comprising: a memory; and control circuitry coupled to the memory, the control circuitry configured to: acquire image data representing a plurality of images, each image of the plurality of images comprising an image of a scene within a viewing frame, the scene comprising a non-rigidly deforming object viewed from a respective perspective; generate a deformation model based on the image data, the deformation model describing movement by the non-rigidly deforming object while the image data was generated, the deformation model being represented by a differentiable nonlinear mapping between positions in the observation frame and positions in a canonical frame; and generate a neural radiance field based on a position and a viewing direction of projected rays through the positions in the canonical frame, the neural radiance field providing a mapping from the position and viewing direction to a color and optical density at each position in the viewing frame, the color and optical density at each position in the viewing frame enabling viewing of the non-rigidly deforming object from a new perspective.
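The mapping in claims 3 through 5 can be illustrated with the warp sketch below. This is a minimal sketch, not the patented implementation: the similarity transformation is realized here as a pure rotation (Rodrigues' formula applied to the axis-angle form of the pure logarithmic quaternion), and the MLP that predicts the log-quaternion, pivot, and translation from a position and latent code is omitted.

    import jax.numpy as jnp

    def rotation_from_log_quaternion(log_q, eps=1e-8):
        # exp of a pure quaternion (0, v) is a unit quaternion rotating by
        # angle 2*|v| about axis v/|v|; build the rotation matrix with
        # Rodrigues' formula.
        theta = 2.0 * jnp.linalg.norm(log_q)
        axis = log_q / (jnp.linalg.norm(log_q) + eps)
        K = jnp.array([[0.0, -axis[2], axis[1]],
                       [axis[2], 0.0, -axis[0]],
                       [-axis[1], axis[0], 0.0]])
        return jnp.eye(3) + jnp.sin(theta) * K + (1.0 - jnp.cos(theta)) * (K @ K)

    def warp(x, log_q, pivot, translation):
        # Claim 5: rotate the offset from the pivot, then add back the
        # pivot and the translation.
        R = rotation_from_log_quaternion(log_q)
        return R @ (x - pivot) + pivot + translation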
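The elastic-loss sketch below illustrates the regularizer of claims 7 through 11, assuming warp_fn is any differentiable observation-to-canonical mapping (for example, warp above with its parameters bound). The Geman-McClure-style robustifier and the scale constant are assumptions consistent with, but not named by, claim 11's rational function.

    import jax
    import jax.numpy as jnp

    def elastic_loss(warp_fn, x, scale=0.03):
        J = jax.jacfwd(warp_fn)(x)                   # 3x3 Jacobian at x (claim 8)
        sigma = jnp.linalg.svd(J, compute_uv=False)  # singular values (claim 9)
        r = jnp.linalg.norm(jnp.log(sigma))          # log-singular-value norm (claim 10)
        sq = (r / scale) ** 2
        return 2.0 * sq / (sq + 4.0)                 # rational robustifier (claim 11)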
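The background-loss sketch for claims 12 and 13 is equally short: points designated (or assumed) static should map to themselves under the deformation, and any movement is penalized.

    import jax
    import jax.numpy as jnp

    def background_loss(warp_fn, static_points):
        # Map each designated static point to the canonical frame and
        # penalize its squared displacement (claims 12-13).
        mapped = jax.vmap(warp_fn)(static_points)
        return jnp.mean(jnp.sum((mapped - static_points) ** 2, axis=-1))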
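The positional-encoding sketch below illustrates the coarse-to-fine scheme of claims 14 and 15. The cosine-ramp window is a standard schedule consistent with the claims; treating the progress parameter alpha as linear in the training iteration count is an assumption.

    import jax.numpy as jnp

    def windowed_positional_encoding(x, num_freqs, alpha):
        js = jnp.arange(num_freqs)
        # Per-frequency weight (claim 15): 0 before band j is enabled,
        # ramping to 1 as training progress alpha sweeps past index j.
        w = 0.5 * (1.0 - jnp.cos(jnp.pi * jnp.clip(alpha - js, 0.0, 1.0)))
        xb = (2.0 ** js)[:, None] * jnp.pi * x[None, :]      # (num_freqs, dim)
        feats = jnp.concatenate([w[:, None] * jnp.sin(xb),
                                 w[:, None] * jnp.cos(xb)], axis=0)
        return jnp.concatenate([x, feats.ravel()])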

Description

Deformable neural radiance fields

Cross Reference to Related Applications

The present application is a non-provisional of, and claims priority to, U.S. Provisional Patent Application No. 63/198,841, entitled "DEFORMABLE NEURAL RADIANCE FIELDS," filed on November 16, 2020, the disclosure of which is incorporated herein by reference in its entirety.

Technical Field

The present description relates to image synthesis using a neural radiance field (NeRF).

Background

Some computers configured to render computer graphics objects may render the objects at a specified view given a plurality of existing views. For example, given several depth images and color images captured from a camera with respect to a scene comprising such a computer graphics object, a new view of the scene may be synthesized from a different viewpoint. The scene may be real, in which case the views are captured using physical color and depth sensors, or synthetic, in which case the views are captured using a rendering algorithm such as rasterization. For real scenes, there are many depth-sensing technologies, such as time-of-flight sensors, structured-light-based sensors, and stereo or multi-view stereo algorithms. These techniques may involve visible or infrared sensors having passive or active illumination patterns, where the patterns may vary in time.

Disclosure of Invention

In one general aspect, a method may include obtaining image data representing a plurality of images, each of the plurality of images including an image of a scene within a viewing frame, the scene including a non-rigidly deforming object viewed from a respective perspective. The method may further include generating a deformation model based on the image data, the deformation model describing movement by the non-rigidly deforming object when the image data was generated, the deformation model being represented by a mapping between positions in the observation frame and positions in the canonical frame. The method may further include generating a deformable neural radiance field (D-NeRF) based on the position and viewing direction of projected rays through the positions in the canonical frame, the D-NeRF providing a mapping from the position and viewing direction to a color and optical density at each position in the viewing frame, the color and optical density at each position in the viewing frame enabling viewing of the non-rigidly deforming object from a new perspective.

In another general aspect, a computer program product includes a non-transitory storage medium including code that, when executed by processing circuitry of a computing device, causes the processing circuitry to perform a method. The method may include obtaining image data representing a plurality of images, each of the plurality of images including an image of a scene within a viewing frame, the scene including a non-rigidly deforming object viewed from a respective perspective. The method may further include generating a deformation model based on the image data, the deformation model describing movement by the non-rigidly deforming object when the image data was generated, the deformation model being represented by a mapping between positions in the observation frame and positions in the canonical frame. The method may further include generating a deformable neural radiance field (D-NeRF) based on the position and viewing direction of projected rays through the positions in the canonical frame, the D-NeRF providing a mapping from the position and viewing direction to a color and optical density at each position in the viewing frame, the color and optical density at each position in the viewing frame enabling viewing of the non-rigidly deforming object from a new perspective.

In another general aspect, an electronic device includes a memory and control circuitry coupled to the memory. The control circuitry may be configured to obtain image data representing a plurality of images, each image of the plurality of images comprising an image of a scene within the viewing frame, the scene comprising a non-rigidly deforming object viewed from a respective viewing angle. The control circuitry may be further configured to generate a deformation model based on the image data, the deformation model describing movements made by the non-rigidly deforming object when the image data was generated, the deformation model being represented by a mapping between positions in the observation frame and positions in the canonical frame. The control circuitry may be further configured to generate a deformable neural radiance field (D-NeRF) based on the position and viewing direction of projected rays through the positions in the canonical frame, the D-NeRF providing a mapping from the position and viewing direction to a color and optical density at each position in the viewing frame, the color and optical density at each position in the viewing frame enabling viewing of the non-rigidly deforming object from a new perspective.
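How the per-point color and optical density become pixels is not spelled out above, but standard NeRF volume rendering along each projected ray fits the description. The quadrature below is a generic sketch under assumed conventions (uniform depth sampling, a vectorized radiance_fn returning colors and densities), not the patent's specific renderer.

    import jax.numpy as jnp

    def render_ray(origin, direction, radiance_fn, near=0.1, far=5.0, n_samples=64):
        # Sample points uniformly in depth along the projected ray.
        t = jnp.linspace(near, far, n_samples)
        points = origin[None, :] + t[:, None] * direction[None, :]
        rgb, density = radiance_fn(points)           # (N, 3) colors, (N,) densities
        # Per-segment opacity from optical density and segment length.
        delta = jnp.concatenate([t[1:] - t[:-1], jnp.array([1e10])])
        alpha = 1.0 - jnp.exp(-density * delta)
        # Transmittance: probability the ray reaches each sample unoccluded.
        trans = jnp.concatenate([jnp.ones(1), jnp.cumprod(1.0 - alpha[:-1] + 1e-10)])
        weights = alpha * trans
        return jnp.sum(weights[:, None] * rgb, axis=0)  # composited pixel color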