US-12620167-B2 - Using vector graphics to create 3D content
Abstract
Deep learning techniques are used with vector graphics to create 3D content and assets for metaverse applications. Vector graphics is a scalable format that provides rich metadata for creating 3D content. A vector graphics encoder, such as a deep neural network (e.g., a recurrent neural network (RNN) or transformer), receives vector graphics and generates an encoded output. The encoded output is decoded by a 3D decoder, such as another deep neural network, whose output is converted to 2D graphics for comparison with the original image. Loss is computed between the original image and the output derived from the 3D decoder. The loss is back-propagated to train the vector graphics encoder to generate 3D content.
Inventors
- Sudha Krishnamurthy
Assignees
- SONY INTERACTIVE ENTERTAINMENT INC.
Dates
- Publication Date: 2026-05-05
- Application Date: 2022-09-26
Claims (18)
- 1. A method for training a machine learning (ML) model using vector graphics and raster graphics in parallel, the method comprising:
  inputting vector graphics representing a first two dimensional (2D) image to the ML model;
  receiving from the ML model a first 3D asset;
  inputting raster graphics representing a second 2D image to the ML model;
  receiving from the ML model a second 3D asset; and
  using the first 3D asset and the second 3D asset to train the ML model, including:
    converting the first 3D asset to a first converted 2D image;
    converting the second 3D asset to a second converted 2D image;
    comparing the first converted 2D image to the first 2D image to generate a first loss indication;
    comparing the second converted 2D image to the second 2D image to generate a second loss indication; and
    providing the first loss indication and the second loss indication back to the ML model.
- 2. The method of claim 1, wherein the first 2D image and the second 2D image are the same image.
- 3. The method of claim 1, wherein the first 2D image and the second 2D image are different images.
- 4. The method of claim 1, wherein the first loss indication is generated by a first loss function and the second loss indication is generated by a second loss function, the first loss function and the second loss function being the same loss function.
- 5. The method of claim 1, wherein the first loss indication is generated by a first loss function and the second loss indication is generated by a second loss function, the first loss function and the second loss function being different loss functions.
- 6. The method of claim 1, wherein the ML model comprises at least one recurrent neural network (RNN).
- 7. One or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations for training a machine learning (ML) model using vector graphics and raster graphics in parallel, the operations comprising:
  inputting vector graphics representing a first two dimensional (2D) image to the ML model;
  receiving from the ML model a first 3D asset;
  inputting raster graphics representing a second 2D image to the ML model;
  receiving from the ML model a second 3D asset; and
  using the first 3D asset and the second 3D asset to train the ML model, including:
    converting the first 3D asset to a first converted 2D image;
    converting the second 3D asset to a second converted 2D image;
    comparing the first converted 2D image to the first 2D image to generate a first loss indication;
    comparing the second converted 2D image to the second 2D image to generate a second loss indication; and
    providing the first loss indication and the second loss indication back to the ML model.
- 8. The one or more non-transitory computer storage media of claim 7, wherein the first 2D image and the second 2D image are the same image.
- 9. The one or more non-transitory computer storage media of claim 7, wherein the first 2D image and the second 2D image are different images.
- 10. The one or more non-transitory computer storage media of claim 7, wherein the first loss indication is generated by a first loss function and the second loss indication is generated by a second loss function, the first loss function and the second loss function being the same loss function.
- 11. The one or more non-transitory computer storage media of claim 7, wherein the first loss indication is generated by a first loss function and the second loss indication is generated by a second loss function, the first loss function and the second loss function being different loss functions.
- 12. The one or more non-transitory computer storage media of claim 7, wherein the ML model comprises at least one recurrent neural network (RNN).
- 13. A system comprising: a processor; and memory coupled to the processor and having stored therein instructions that, when executed by the processor, cause the processor to perform operations for training a machine learning (ML) model using vector graphics and raster graphics in parallel, the operations comprising:
  inputting vector graphics representing a first two dimensional (2D) image to the ML model;
  receiving from the ML model a first 3D asset;
  inputting raster graphics representing a second 2D image to the ML model;
  receiving from the ML model a second 3D asset; and
  using the first 3D asset and the second 3D asset to train the ML model, including:
    converting the first 3D asset to a first converted 2D image;
    converting the second 3D asset to a second converted 2D image;
    comparing the first converted 2D image to the first 2D image to generate a first loss indication;
    comparing the second converted 2D image to the second 2D image to generate a second loss indication; and
    providing the first loss indication and the second loss indication back to the ML model.
- 14. The system of claim 13, wherein the first 2D image and the second 2D image are the same image.
- 15. The system of claim 13, wherein the first 2D image and the second 2D image are different images.
- 16. The system of claim 13, wherein the first loss indication is generated by a first loss function and the second loss indication is generated by a second loss function, the first loss function and the second loss function being the same loss function.
- 17. The system of claim 13, wherein the first loss indication is generated by a first loss function and the second loss indication is generated by a second loss function, the first loss function and the second loss function being different loss functions.
- 18. The system of claim 13, wherein the ML model comprises at least one recurrent neural network (RNN).
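Independent claims 1, 7, and 13 recite the same parallel training loop: a vector graphics branch and a raster graphics branch each produce a 3D asset, each asset is converted back to a 2D image, and the resulting losses are fed back to the model. The following is a minimal NumPy sketch of that loop, not the claimed implementation: the linear "model," the fixed projection standing in for 3D-to-2D conversion, and all shapes and names are hypothetical, since the claims do not mandate any particular architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG, VOX = 8, 12  # flattened sizes of a toy 2D image and a toy 3D asset
W = rng.normal(scale=0.1, size=(VOX, IMG))  # trainable model weights
P = rng.normal(scale=0.1, size=(IMG, VOX))  # fixed 3D-to-2D projection (renderer stand-in)

def model(x):
    """Toy ML model: 2D representation -> 3D asset."""
    return W @ x

def render(v):
    """Toy conversion of a 3D asset back to a 2D image."""
    return P @ v

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Stand-ins for a vectorized and a rasterized 2D image (claims 2-3 allow
# them to be the same image or different images; here they differ).
vector_img = rng.normal(size=IMG)
raster_img = rng.normal(size=IMG)

lr = 0.05
losses = []
for step in range(200):
    grad = np.zeros_like(W)
    total = 0.0
    for x in (vector_img, raster_img):  # the two parallel branches
        asset = model(x)                # first/second 3D asset
        recon = render(asset)           # first/second converted 2D image
        err = recon - x                 # compare with the original 2D image
        total += mse(recon, x)
        # Back-propagate the MSE loss through render() and model() to W.
        grad += np.outer(P.T @ err, x) * (2.0 / IMG)
    W -= lr * grad                      # provide the loss indications back to the model
    losses.append(total)

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because the two branches here share one loss function, the sketch corresponds to claims 4, 10, and 16; claims 5, 11, and 17 would instead apply a different loss function per branch.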
Description
FIELD

The present application relates generally to using vector graphics to create 3D content.

BACKGROUND

Graphics for computer simulations such as computer games may include three dimensional (3D) objects mixed with 2D objects; in particular, foreground objects may be rendered in 3D and background objects in 2D. 2D objects from, e.g., artists or game developers that are desired to be rendered in 3D in the final product can be created using multiple 2D raster graphics images.

SUMMARY

As understood herein, in rendering 3D graphics, instead of using multiple 2D raster graphics images, vector graphics may be used by a trained machine learning (ML) model to generate 3D assets for computer simulations in a more scalable manner using relatively richer metadata.

In greater detail, conventional 2D-to-3D reconstruction methods typically reconstruct 3D from 2D rasterized images. However, as understood herein, a vector graphics representation of a 2D object provides rich metadata that contains geometric information about the vertices and edges of the object. This can be leveraged along with the pixel information from rasterized images of the object to train machine learning models to volumetrically render a 2D representation into a 3D object. The 3D reconstructed objects can then be embedded into gaming and metaverse environments.

Accordingly, a device includes at least one computer storage that is not a transitory signal and that in turn includes instructions executable by at least one processor to input to at least one machine learning (ML) model vector graphics representing at least one two dimensional (2D) image. The instructions are executable to receive from the ML model at least one three dimensional (3D) asset responsive to the input, and to present, in at least one computer simulation, the 3D asset.
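The summary's key observation is that a vector graphics description already carries explicit vertex and edge geometry, unlike raster pixels. As a purely illustrative sketch (the application does not specify any parsing scheme), a toy parser for absolute M/L/Z commands in an SVG-style path string shows the kind of geometric metadata that would be available to an encoder:

```python
import re

def parse_path(d):
    """Extract vertex and edge metadata from a toy SVG-style path string.

    Handles only absolute moveto (M), lineto (L), and closepath (Z)
    commands; real vector graphics also include curves, transforms, etc.
    """
    tokens = re.findall(r"[MLZ]|-?\d+(?:\.\d+)?", d)
    vertices, edges = [], []
    i = 0
    start = prev = None
    while i < len(tokens):
        cmd = tokens[i]
        if cmd == "M":                     # start a new subpath
            prev = (float(tokens[i + 1]), float(tokens[i + 2]))
            start = prev
            vertices.append(prev)
            i += 3
        elif cmd == "L":                   # straight edge to a new vertex
            pt = (float(tokens[i + 1]), float(tokens[i + 2]))
            vertices.append(pt)
            edges.append((prev, pt))
            prev = pt
            i += 3
        elif cmd == "Z":                   # close the subpath
            edges.append((prev, start))
            i += 1
    return vertices, edges

# A closed triangle: three vertices, three edges.
verts, edges = parse_path("M 0 0 L 10 0 L 10 10 Z")
print(len(verts), len(edges))
```

In contrast, a rasterized version of the same triangle is just a grid of pixels; the vertex and edge structure recovered above is exactly the "rich metadata" the summary proposes to leverage alongside pixel information.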
In some embodiments the instructions can be executable to input to the ML model raster graphics representing at least one 2D image and to receive from the ML model at least one 3D asset responsive to the input of the raster graphics. The instructions can be executable to present, in the at least one computer simulation, the 3D asset. The 2D image represented by the raster graphics can be the same 2D image represented by the vector graphics, or it can be a different image.

In another aspect, an apparatus includes at least one computer storage that is not a transitory signal and that in turn includes instructions executable by at least one processor to receive vector graphics representing original two dimensional (2D) images. The instructions are executable to input the vector graphics to at least one machine learning (ML) model, and to receive from the ML model three dimensional (3D) representations of each of the 2D images. The instructions are executable to convert at least some of the 3D representations to converted 2D images, and based at least in part on plural of the converted 2D images and respective original 2D images, generate a loss indication that is provided to the ML model to train the model.

The original 2D images can be considered original first 2D images, the 3D representations can be considered first 3D representations, the converted 2D images can be considered first converted 2D images, the loss indication can be considered a first loss indication, and the instructions may be executable to receive raster graphics representing original second 2D images. In these examples, the instructions may be executable to input the raster graphics to the at least one ML model, receive from the ML model second 3D representations of each of the original second 2D images, and convert at least some of the second 3D representations to second converted 2D images.
Based at least in part on plural of the second converted 2D images and respective original second 2D images, a second loss indication can be generated and provided to the ML model to train the model. In some implementations, the original second 2D images are the same as the original first 2D images. In other implementations, the original second 2D images are not the same as the original first 2D images. In some implementations, the first loss indication is generated by a first loss function and the second loss indication is generated by a second loss function that is the same as the first loss function. In other implementations, the first and second loss functions are not the same.

In another aspect, a method includes inputting vector graphics representing at least one original two dimensional (2D) image to at least one machine learning (ML) model. The method includes receiving from the ML model at least one 3D asset, using the 3D asset to train the ML model, and/or presenting the 3D asset on at least one display.

The details of the present application, both as to its structure and operation, can be best understood in reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which:

BRIEF DESCRIPTION OF