
US-12626450-B1 - Point cloud enhancement using an infill mask and synthesized representation


Abstract

A point cloud having occluded regions may be infilled with additional points by creating an infill mask and a synthesized representation, wherein the synthesized representation comprises generated information for points of the occluded regions. Both may be generated from a two-dimensional (2D) version of the point cloud produced by rasterizing the three-dimensional (3D) point cloud: a first machine learning technique generates the infill mask and a second machine learning technique generates the synthesized representation. Points identified in the occluded regions may be selected, matched with the information generated in the synthesized representation, and infilled into the point cloud.
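The flow the abstract describes can be illustrated with a short sketch. The following Python function is hypothetical; the array shapes and the probability threshold are illustrative assumptions rather than details from the patent. It samples pixels flagged by an infill mask and pairs them with colors and depths from a synthesized representation to produce new points:

```python
import numpy as np

def infill_from_mask(infill_mask, synthesized, threshold=0.5):
    """Turn occluded pixels into new 3D points.

    infill_mask: (H, W) per-pixel probability that the pixel is occluded.
    synthesized: (H, W, 4) per-pixel R, G, B, and depth values generated
                 for the full image, including occluded regions.
    Returns (N, 3) point positions and (N, 3) point colors.
    """
    rows, cols = np.nonzero(infill_mask > threshold)  # select occluded pixels
    colors = synthesized[rows, cols, :3]              # attribute values
    depths = synthesized[rows, cols, 3]               # depth values
    # Project selected pixels back into 3D: width/height come from the
    # pixel grid, depth from the synthesized representation.
    points = np.stack([cols, rows, depths], axis=1).astype(np.float32)
    return points, colors
```

The returned points would then be merged into the original point cloud to form the infilled version.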

Inventors

  • Magnus H Johnson
  • Eric Geusz
  • Jeremy R Bernstein
  • Novaira Masood
  • Pravalika Avvaru
  • Randal W Lamore

Assignees

  • APPLE INC.

Dates

Publication Date
2026-05-12
Application Date
2022-09-16

Claims (19)

  1. A non-transitory computer-readable medium storing program instructions that, when executed using one or more processors, cause the one or more processors to: generate, using a first machine learning algorithm, an infill mask for a point cloud, wherein the infill mask indicates occluded regions of the point cloud; generate, using a second machine learning algorithm, a synthesized representation of the point cloud comprising attribute values and depth values for points of the point cloud including occluded points; and at least partially infill occluded regions of the point cloud, wherein to infill the occluded regions, the program instructions cause the one or more processors to: select points to be added to the point cloud amongst the occluded regions indicated in the infill mask; determine, based on the synthesized representation, attribute values and depth values for the points selected to be added to the point cloud; at least partially infill the occluded regions of the point cloud using the determined attribute values and depth values for the points to be added to the point cloud; and cause the point cloud comprising infilled points to be rendered on a display of a device.
  2. The non-transitory computer-readable medium of claim 1, wherein the point cloud comprises points located in three-dimensional (3D) space, and wherein: the infill mask comprises a two-dimensional (2D) image comprising pixels located at width and height locations corresponding to width and height dimensions of the point cloud, wherein the pixels of the infill mask further comprise an infill value indicating a probability of whether a corresponding point in the point cloud at the width and height dimensions corresponding to the width and height location of the pixel is an occluded point; and the synthesized representation comprises a 2D image comprising pixels located at width and height locations corresponding to the width and the height dimensions of the point cloud, wherein the pixels of the synthesized representation further comprise pixel values indicating one or more attribute values and a depth value for a corresponding point in the point cloud located at a width and a height dimension corresponding to the width and height location of the pixel in the synthesized representation.
  3. The non-transitory computer-readable medium of claim 2, wherein the program instructions, when executed using the one or more processors, further cause the one or more processors to: receive attribute values and spatial information for points of the point cloud, wherein the spatial information comprises information for determining locations of the points of the point cloud in three-dimensional (3D) space; and generate, based on the received attribute values and spatial information, a two-dimensional (2D) representation of the point cloud, wherein depth values of the points of the point cloud in 3D space are represented as an additional attribute value of pixels in the 2D representation that correspond to the points of the point cloud in 3D space; wherein the 2D image of the infill mask and the 2D image of the synthesized representation are generated by the first machine learning algorithm and the second machine learning algorithm using the 2D representation of the point cloud as an input to the respective machine learning algorithms.
  4. The non-transitory computer-readable medium of claim 3, wherein the program instructions, when executed using the one or more processors, further cause the one or more processors to: receive attribute values and spatial information for a plurality of frames of the point cloud corresponding to versions of the point cloud at a plurality of moments in time; and generate 2D representations for the point cloud for respective ones of the frames, wherein the first and second machine learning algorithms further use temporal correlations between the plurality of frames of the point cloud to generate the infill mask and the synthesized representation.
  5. The non-transitory computer-readable medium of claim 4, wherein the second machine learning algorithm comprises: recurrent convolutional long short-term memory (LSTM) layers that utilize the plurality of frames to generate the synthesized representation of the point cloud comprising the attribute values and the depth values for the points of the point cloud including the occluded points.
  6. The non-transitory computer-readable medium of claim 4, wherein the second machine learning algorithm comprises: a recurrent generative adversarial network (GAN) that utilizes the plurality of frames to generate the synthesized representation of the point cloud comprising the attribute values and the depth values for the points of the point cloud including the occluded points.
  7. The non-transitory computer-readable medium of claim 2, wherein the program instructions, when executed using the one or more processors, further cause the one or more processors to: up-scale the point cloud in the height, width, or depth direction, wherein the infill mask and the synthesized representation are generated for the up-scaled version of the point cloud.
  8. The non-transitory computer-readable medium of claim 1, wherein the first machine learning algorithm that generates the infill mask is trained to differentiate between occluded regions and naturally sparse regions of the point cloud.
  9. The non-transitory computer-readable medium of claim 1, wherein to generate the infill mask, the program instructions, when executed using the one or more processors, further cause the one or more processors to: determine depth gradients between sets of points of the point cloud; and for points in one or more regions of the point cloud with a depth gradient greater than a threshold value, exempt the points in the one or more regions with high depth gradients from being candidates for sampling for points to be added to the point cloud.
  10. The non-transitory computer-readable medium of claim 1, wherein the second machine learning algorithm comprises: a generative adversarial network (GAN).
  11. The non-transitory computer-readable medium of claim 1, wherein the second machine learning algorithm comprises: a sinusoidal representation network.
  12. The non-transitory computer-readable medium of claim 1, wherein to generate the infill mask, the program instructions, when executed using the one or more processors, further cause the one or more processors to: apply object heuristics to identify objects in the point cloud; and use the identified objects to determine occluded regions of the point cloud.
  13. A device comprising: a display; a memory storing program instructions; and one or more processors, wherein the program instructions, when executed using the one or more processors, cause the one or more processors to: generate, via a first machine learning algorithm, an infill mask for a point cloud, wherein the infill mask indicates occluded regions of the point cloud; generate, via a second machine learning algorithm, a synthesized representation of the point cloud comprising attribute values and depth values for points of the point cloud including occluded points; and at least partially infill the occluded regions of the point cloud, wherein to infill the occluded regions, the program instructions cause the one or more processors to: sample the occluded regions of the infill mask to determine points to be added to the point cloud; determine, based on the synthesized representation, attribute values and depth values for the points to be added to the point cloud; project the points sampled from the occluded regions of the infill mask into the point cloud, wherein the projected points have the attribute values and depth values determined using the synthesized representation; and cause the point cloud comprising infilled points to be rendered on the display of the device.
  14. The device of claim 13, further comprising: a LiDAR sensor, wherein the program instructions, when executed using the one or more processors, further cause the one or more processors to: cause the point cloud to be captured using the LiDAR sensor of the device.
  15. The device of claim 14, wherein the program instructions, when executed using the one or more processors, further cause the one or more processors to: encode spatial information and attribute information for the point cloud comprising infilled points.
  16. The device of claim 13, wherein the program instructions, when executed using the one or more processors, further cause the one or more processors to: receive an encoded bit stream comprising attribute values and spatial information for points of the point cloud; and decode the encoded bit stream to determine the attribute values and spatial information for the points of the point cloud.
  17. The device of claim 16, wherein the program instructions, when executed using the one or more processors, further cause the one or more processors to: generate, based on the decoded attribute values and spatial information, a two-dimensional (2D) representation of the point cloud, wherein depth values of the points of the point cloud in 3D space are represented as an additional attribute value of pixels in the 2D representation that correspond to the points of the point cloud in 3D space; wherein the 2D image of the infill mask and the 2D image of the synthesized representation are generated by the first machine learning algorithm and the second machine learning algorithm using the 2D representation of the point cloud as an input to the respective machine learning algorithms.
  18. A method, comprising: generating, via a first machine learning algorithm, an infill mask for a point cloud, wherein the infill mask indicates occluded regions of the point cloud; generating, via a second machine learning algorithm, a synthesized representation of the point cloud comprising attribute values and depth values for points of the point cloud including occluded points; at least partially filling the occluded regions of the point cloud, wherein said filling the occluded regions comprises: sampling the occluded regions of the infill mask to determine points to be added to the point cloud; determining, based on the synthesized representation, attribute values and depth values for the points to be added to the point cloud; and projecting the points sampled from the occluded regions of the infill mask into the point cloud, wherein the projected points have the attribute values and depth values determined using the synthesized representation; and causing the point cloud comprising infilled points to be rendered on a display of a device.
  19. The method of claim 18, wherein the second machine learning algorithm comprises one or more of: a generative adversarial network (GAN); or a sinusoidal representation network.
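Claim 9's depth-gradient exemption can be sketched as a simple post-processing step on the infill mask. The following is a minimal illustration, assuming a dense 2D depth map aligned with the mask; the function name and threshold value are hypothetical:

```python
import numpy as np

def exempt_high_gradient_regions(infill_mask, depth_map, grad_threshold=5.0):
    """Zero out infill-mask pixels where the depth gradient is steep, so
    those pixels are not candidates for sampling points to be added."""
    d_row, d_col = np.gradient(depth_map)        # per-axis depth gradients
    grad_magnitude = np.hypot(d_row, d_col)
    masked = infill_mask.copy()
    masked[grad_magnitude > grad_threshold] = 0.0
    return masked
```

The intuition is that a steep depth gradient usually marks a true silhouette edge between foreground and background rather than a hole to be filled.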
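Claim 11 names a sinusoidal representation network (SIREN) as one option for the second machine learning algorithm. Below is a minimal PyTorch sketch of such a network, mapping 2D pixel coordinates to R, G, B, and depth; the layer sizes and the w0 frequency follow common SIREN conventions and are not values from the patent:

```python
import torch
import torch.nn as nn

class Sine(nn.Module):
    """Sinusoidal activation used in SIREN layers."""
    def __init__(self, w0=30.0):
        super().__init__()
        self.w0 = w0

    def forward(self, x):
        return torch.sin(self.w0 * x)

class Siren(nn.Module):
    """Maps (x, y) pixel coordinates to (R, G, B, depth) values."""
    def __init__(self, hidden=256, num_layers=3):
        super().__init__()
        layers, in_dim = [], 2
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, hidden), Sine()]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, 4))
        self.net = nn.Sequential(*layers)

    def forward(self, coords):   # coords: (N, 2)
        return self.net(coords)  # values: (N, 4)
```

Once fit to the visible pixels of the 2D representation, such a network could be queried at occluded pixel coordinates to produce the synthesized values.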

Description

This application claims benefit of priority to U.S. Provisional Application Ser. No. 63/247,768, entitled "Point Cloud Enhancement Using an Infill Mask and Synthesized Representation," filed Sep. 23, 2021, which is hereby incorporated herein by reference in its entirety.

BACKGROUND

Technical Field

This disclosure relates generally to techniques for rendering a scene from a point cloud using an infill mask.

Background

Various types of sensors, such as light detection and ranging (LiDAR) systems, 3D cameras, 3D scanners, etc., may capture data indicating positions of points in three-dimensional (3D) space, for example positions in the X, Y, and Z planes. Such systems may further capture attribute information in addition to spatial information for the respective points, such as color information (e.g., RGB values), intensity attributes, reflectivity attributes, motion-related attributes, modality attributes, or various other attributes. In some circumstances, additional attributes may be assigned to the respective points, such as a time stamp indicating when the point was captured. Points captured by such sensors may make up a "point cloud" comprising a set of points, each having associated spatial information and one or more associated attributes. In some circumstances, a point cloud may include thousands, hundreds of thousands, millions, or a greater number of points. Also, in some circumstances, point clouds may be generated, for example in software, as opposed to being captured by one or more sensors.

SUMMARY

In some aspects, a point cloud infill module is configured to generate an infill mask for a point cloud using a first machine learning algorithm, wherein the infill mask indicates occluded regions of the point cloud. For example, points in an occluded region may be omitted from a captured point cloud because the sensors that capture the point cloud are obstructed, such that information for the points in the occluded region is not captured. The point cloud infill module is further configured to generate, using a second machine learning algorithm, a synthesized representation of the point cloud. In some aspects, to generate the infill mask and the synthesized representation, a version of the point cloud in three-dimensional (3D) space may be converted to a two-dimensional (2D) representation, wherein the 2D representation is used by the first and second machine learning algorithms to generate the infill mask and the synthesized representation. In some aspects, the second machine learning algorithm may determine values for pixels of the 2D representation corresponding to occluded points based on values of other, non-occluded points included in the 2D representation provided to it. Various machine learning techniques, as further described herein, may be used to implement the first and second machine learning algorithms that generate the infill mask and the synthesized representation. The point cloud infill module may use the infill mask and the synthesized representation to at least partially infill the occluded regions of the point cloud. For example, instead of randomly adding points to the point cloud, or adding points without regard to which regions are occluded, the point cloud infill module may use the infill mask to determine regions of the point cloud that are occluded and therefore need more points.
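The 3D-to-2D conversion described above, with depth carried as an additional pixel attribute, could look roughly like the following sketch. It assumes the points are already expressed in image-plane coordinates (x as column, y as row, z as depth), an illustrative simplification of whatever projection an actual implementation would use:

```python
import numpy as np

def rasterize_to_2d(xyz, rgb, height, width):
    """Build an (H, W, 4) image holding R, G, B, and depth per pixel,
    keeping the nearest point when several fall on the same pixel."""
    image = np.zeros((height, width, 4), dtype=np.float32)
    zbuffer = np.full((height, width), np.inf, dtype=np.float32)
    cols = np.clip(xyz[:, 0].round().astype(int), 0, width - 1)
    rows = np.clip(xyz[:, 1].round().astype(int), 0, height - 1)
    for r, c, z, color in zip(rows, cols, xyz[:, 2], rgb):
        if z < zbuffer[r, c]:        # z-buffer test: keep the closest point
            zbuffer[r, c] = z
            image[r, c, :3] = color
            image[r, c, 3] = z       # depth stored as an extra attribute
    return image
```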
The point cloud infill module may further use the synthesized representation to determine values to assign to the points added at locations determined using the infill mask. For example, pixels in occluded regions of the infill mask may be sampled to determine points to be added to the point cloud, and corresponding pixels in the synthesized representation may be used to determine depth and/or other attribute values, such as color values, to be assigned to those points. The point cloud infill module may then include the added points in an augmented (e.g., infilled) version of the point cloud. In some aspects, the point cloud infill module may be implemented on an encoder side, wherein the point cloud is augmented with additional points in occluded regions prior to being encoded. In some aspects, the point cloud infill module may be implemented on a decoder side, wherein a received point cloud is infilled as part of reconstructing the point cloud from a received encoded version. In some aspects, a point cloud infill module may be implemented at both an encoder side and a decoder side. Also, in some aspects, a point cloud infill module may be implemented in other locations, such as in a network between an encoder and a decoder.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A illustrates a system comprising a sensor that captures information