DE-102025141256-A1 - Iterative determination of the attention area on spatially related point cloud buckets


Abstract

A method comprises receiving a three-dimensional (3D) point cloud and generating a plurality of buckets by applying a spatial hash function to the coordinates of points in the 3D point cloud. The method includes arranging the points into a contiguous block of memory based on the spatial hash function. The method further includes performing a plurality of attention iterations. Each attention iteration includes selecting a subset of the plurality of buckets to define an attention area for the attention iteration, loading point features corresponding to the points in the attention area from the contiguous block of memory into a cache, and generating updated point features by performing a multi-head self-attention operation on the loaded point features. The method also includes generating a feature representation of the 3D point cloud based on the updated point features from the plurality of attention iterations.
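The pipeline summarized in the abstract can be sketched end to end in pure Python. This is a minimal illustration under stated assumptions, not the claimed implementation. The spatial hash is assumed to be a Z-order "div" hash (Morton code of a cell in a 2**bits grid per axis, integer-divided by `span`). An attention area is assumed to be a window of consecutive bucket ids whose start shifts by half a window on alternate iterations. The cache is modeled as a list slice. The attention uses identity query/key/value projections with a residual add, and the final feature representation is a mean over points. The names `morton3`, `arrange`, `mhsa`, and `attend` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Feats = List[List[float]]


def morton3(ix: int, iy: int, iz: int, bits: int) -> int:
    """Interleave the low `bits` bits of three cell indices into a Z-order (Morton) code."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


def arrange(points: Sequence[Point], feats: Feats, bits: int, span: int):
    """Bucket points with an assumed Z-order 'div' hash and store them bucket by bucket."""
    lo = [min(p[k] for p in points) for k in range(3)]
    ext = [max(p[k] for p in points) - lo[k] + 1e-9 for k in range(3)]
    n = 1 << bits  # grid cells per axis
    ids = []
    for p in points:
        cell = [min(int((p[k] - lo[k]) / ext[k] * n), n - 1) for k in range(3)]
        ids.append(morton3(cell[0], cell[1], cell[2], bits) // span)
    num_buckets = ((1 << (3 * bits)) + span - 1) // span
    order = sorted(range(len(points)), key=lambda i: ids[i])  # stable sort by bucket id
    block = [list(feats[i]) for i in order]  # same-bucket points become adjacent
    offsets = [0] * (num_buckets + 1)
    for b in ids:
        offsets[b + 1] += 1
    for b in range(num_buckets):
        offsets[b + 1] += offsets[b]  # bucket b is block[offsets[b]:offsets[b + 1]]
    return block, offsets, [ids[i] for i in order]


def mhsa(x: Feats, heads: int) -> Feats:
    """Multi-head self-attention with identity Q/K/V projections (a simplification)."""
    d = len(x[0])
    assert d % heads == 0, "feature width must split evenly across heads"
    dh = d // heads
    out = [[0.0] * d for _ in x]
    for h in range(heads):
        s = h * dh  # first channel of this head
        for i, xi in enumerate(x):
            q = xi[s:s + dh]
            scores = [sum(a * b for a, b in zip(q, xj[s:s + dh])) / math.sqrt(dh) for xj in x]
            m = max(scores)
            w = [math.exp(v - m) for v in scores]  # numerically stable softmax
            z = sum(w)
            for j, xj in enumerate(x):
                for k in range(dh):
                    out[i][s + k] += (w[j] / z) * xj[s + k]
    return out


def attend(block: Feats, offsets: List[int], window: int, iters: int, heads: int) -> List[float]:
    """Run attention iterations over windows of buckets, then mean-pool a global feature."""
    nb = len(offsets) - 1
    for t in range(iters):
        shift = (t % 2) * (window // 2)  # odd iterations use half-window-shifted areas
        for b0 in range(-shift, nb, window):
            lo, hi = offsets[max(b0, 0)], offsets[min(b0 + window, nb)]
            if hi == lo:
                continue  # empty attention area
            cache = block[lo:hi]  # "load" the area; block order is never changed
            upd = mhsa(cache, heads)
            block[lo:hi] = [[a + b for a, b in zip(x, y)] for x, y in zip(cache, upd)]  # residual
    d = len(block[0])
    return [sum(row[k] for row in block) / len(block) for k in range(d)]


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.uniform(0, 10), random.uniform(0, 10), random.uniform(0, 2)) for _ in range(200)]
    feats = [[random.gauss(0, 1) for _ in range(8)] for _ in pts]
    block, offsets, sorted_ids = arrange(pts, feats, bits=4, span=64)
    rep = attend(block, offsets, window=4, iters=2, heads=2)
    print(len(rep), rep[:3])
```

Because each attention area is only a range in `offsets`, choosing a different subset of buckets never moves points within `block`. Only the features inside the selected slice are rewritten.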

Inventors

Liyan Chen

Assignees

GM CRUISE HOLDINGS LLC

Dates

Publication Date: 20260513
Application Date: 20251009
Priority Date: 20250914

Claims (10)

1. A computer-implemented method executed on data processing hardware that causes the data processing hardware to perform operations comprising: receiving a three-dimensional (3D) point cloud; generating a plurality of buckets by applying a spatial hash function to coordinates of points in the 3D point cloud, each bucket containing a corresponding group of spatially close points; arranging, based on the spatial hash function, the points into a contiguous block of memory, wherein points assigned to the same bucket are located at adjacent memory locations within the contiguous block of memory; performing a plurality of attention iterations, each attention iteration comprising: selecting a subset of the plurality of buckets to define an attention area for the attention iteration, wherein the selecting is performed without physically reordering the points within the contiguous block of memory; loading point features corresponding to the points in the attention area from the contiguous block of memory into a cache; and generating updated point features by performing a multi-head self-attention operation on the loaded point features; and generating, based on the updated point features from the plurality of attention iterations, a feature representation of the 3D point cloud.
2. The method of claim 1, wherein the spatial hash function comprises at least one of: an XOR mod hash; an XOR div hash; a Z-order mod hash; or a Z-order div hash.
3. The method of claim 2, wherein performing the plurality of attention iterations comprises: performing a first set of iterations using a first spatial hash function; and performing a second set of iterations using a second spatial hash function different from the first spatial hash function.
4. The method of claim 1, wherein selecting the subset of the plurality of buckets in a subsequent attention iteration comprises selecting a shifted subset of buckets relative to a previous attention iteration.
5. The method of claim 1, wherein: the data processing hardware comprises a graphics processing unit (GPU); and the cache comprises a local high-speed cache of the GPU.
6. The method of claim 5, wherein a size of each bucket of the plurality of buckets is aligned with a memory tile size of the GPU to optimize loading operations.
7. The method of claim 1, wherein the operations further comprise, after at least one attention iteration, performing an in-bucket pooling operation on at least one bucket of the plurality of buckets to downsample the points within the at least one bucket.
8. The method of claim 7, wherein performing the in-bucket pooling operation comprises: partitioning the points within the at least one bucket into a plurality of sub-buckets; and applying a reduction function to the point features within each sub-bucket to generate a reduced feature vector representing the respective sub-bucket.
9. The method of claim 8, wherein partitioning the points into the plurality of sub-buckets and applying the reduction function are performed entirely within the cache.
10. The method of claim 1, wherein the operations further comprise controlling a vehicle based on the feature representation of the 3D point cloud.
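The in-bucket pooling of claims 7 to 9 can be illustrated with a short sketch. This is one possible reading, not the claimed implementation. It assumes a bucketed feature block plus an `offsets` array in which bucket b occupies rows `offsets[b]` to `offsets[b + 1]`. Each run of `sub_size` adjacent points within a bucket is treated as a sub-bucket, and channel-wise max is the default reduction function. Working on a per-bucket copy stands in for keeping the pooling within the cache. `in_bucket_pool` and `sub_size` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Feats = List[List[float]]


def in_bucket_pool(block: Feats, offsets: Sequence[int], sub_size: int,
                   reduce_fn: Callable[[Sequence[float]], float] = max) -> Tuple[Feats, List[int]]:
    """Downsample each bucket by reducing runs of `sub_size` adjacent points (sub-buckets)."""
    pooled: Feats = []
    new_offsets = [0]
    for b in range(len(offsets) - 1):
        cache = block[offsets[b]:offsets[b + 1]]  # one bucket, loaded once
        for s in range(0, len(cache), sub_size):
            sub = cache[s:s + sub_size]  # a sub-bucket of up to sub_size adjacent points
            pooled.append([reduce_fn(col) for col in zip(*sub)])  # channel-wise reduction
        new_offsets.append(len(pooled))  # pooled output keeps the bucket layout
    return pooled, new_offsets


if __name__ == "__main__":
    block = [[1.0, 0.0], [3.0, -1.0], [2.0, 5.0], [0.0, 0.0], [4.0, 4.0]]
    offsets = [0, 3, 5]  # bucket 0 = rows 0..2, bucket 1 = rows 3..4
    pooled, new_offsets = in_bucket_pool(block, offsets, sub_size=2)
    print(pooled)       # [[3.0, 0.0], [2.0, 5.0], [4.0, 4.0]]
    print(new_offsets)  # [0, 2, 3]
```

Returning `new_offsets` lets the downsampled block keep the same contiguous, bucket-by-bucket layout, so later attention iterations can select buckets in the same way. A mean reduction can be passed as `lambda col: sum(col) / len(col)`.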

Description

Cross-reference to related applications

This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Application No. 63/720,108, filed on November 13, 2024. The disclosure of this prior application is considered part of the disclosure of this application and is hereby incorporated by reference in its entirety.

Introduction

The information provided in this section is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.

This disclosure relates generally to processing three-dimensional (3D) point cloud data with deep learning models and, more specifically, to extracting features from such data. Systems that analyze 3D point clouds often use specialized neural network architectures, such as point transformers, to learn meaningful representations from raw spatial data for applications including autonomous driving and robotics. These transformer-based models typically employ multi-head self-attention mechanisms to capture complex geometric structures and relationships between points. To manage the computational cost of processing point clouds at scale, attention operations can be applied to localized groups of points. Information across different groups can then be aggregated by processing different groupings of points in subsequent stages of the network, with the computations often performed on parallel processing hardware such as graphics processing units (GPUs).

Summary

One aspect of the disclosure provides a computer-implemented method that, when executed on data processing hardware, causes the data processing hardware to perform operations.
These operations include receiving a three-dimensional (3D) point cloud and generating a plurality of buckets by applying a spatial hash function to the coordinates of points in the 3D point cloud. Each bucket contains a corresponding group of spatially close points. The operations also include arranging the points, based on the spatial hash function, into a contiguous block of memory, such that points assigned to the same bucket are located at adjacent memory locations within the contiguous block. The operations further include performing a plurality of attention iterations. Each attention iteration involves selecting a subset of the plurality of buckets to define an attention area for the iteration, loading point features corresponding to the points in the attention area from the contiguous block of memory into a cache, and generating updated point features by performing a multi-head self-attention operation on the loaded point features. The selection is made without physically reordering the points within the contiguous block of memory. Finally, the operations include generating a feature representation of the 3D point cloud based on the updated point features from the plurality of attention iterations.

Implementations of the disclosure may include one or more of the following optional features. In some implementations, the spatial hash function includes at least one of an XOR mod hash, an XOR div hash, a Z-order mod hash, or a Z-order div hash. In these implementations, performing the plurality of attention iterations may include performing a first set of iterations using a first spatial hash function and performing a second set of iterations using a second spatial hash function that differs from the first. Selecting the subset of the plurality of buckets in a subsequent attention iteration may involve selecting a shifted subset of buckets relative to a previous attention iteration.
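The four hash variants are named but not defined in this passage, so the following sketch rests on an assumed reading. It assumes that "XOR" and "Z-order" describe how a key is formed from integer grid-cell coordinates: an XOR of scaled cell indices, or a Morton code. It also assumes that "mod" and "div" describe how the key is reduced to a bucket. Under "mod", the bucket is the key modulo the bucket count. Under "div", the key is first integer-divided by a `span`, which keeps runs of consecutive keys together. The multiplier constants, `span`, and all function names are hypothetical. The schedule helper shows one way to switch from a first set of iterations to a second set that uses a different hash. A plausible motivation is that regrouping lets points separated by a bucket boundary in one set attend to each other in the next.

```python
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int, int]  # non-negative integer grid-cell coordinates

# Large odd multipliers commonly used in 3D spatial hashing (illustrative choice).
K1, K2, K3 = 73856093, 19349663, 83492791


def xor_key(c: Cell) -> int:
    """XOR of scaled cell indices: scatters neighboring cells across the key space."""
    return (c[0] * K1) ^ (c[1] * K2) ^ (c[2] * K3)


def zorder_key(c: Cell, bits: int = 10) -> int:
    """Morton code: interleaves coordinate bits so nearby cells tend to get nearby keys."""
    code = 0
    for b in range(bits):
        for axis in range(3):
            code |= ((c[axis] >> b) & 1) << (3 * b + axis)
    return code


def make_hashes(num_buckets: int, span: int) -> Dict[str, Callable[[Cell], int]]:
    """Assumed reading: 'mod' takes key % num_buckets; 'div' groups runs of `span` keys."""
    return {
        "xor_mod": lambda c: xor_key(c) % num_buckets,
        "xor_div": lambda c: (xor_key(c) // span) % num_buckets,
        "zorder_mod": lambda c: zorder_key(c) % num_buckets,
        "zorder_div": lambda c: (zorder_key(c) // span) % num_buckets,
    }


def hash_for_iteration(t: int, iters_per_set: int, schedule: List[str]) -> str:
    """Switch to the next hash in `schedule` after every `iters_per_set` iterations."""
    return schedule[(t // iters_per_set) % len(schedule)]


if __name__ == "__main__":
    h = make_hashes(num_buckets=64, span=8)
    a, b = (0, 0, 0), (1, 0, 0)
    print(h["zorder_div"](a) == h["zorder_div"](b))  # True: keys 0 and 1 fall in one span
    print(h["zorder_mod"](a) == h["zorder_mod"](b))  # False: keys 0 and 1 map to different buckets
    print([hash_for_iteration(t, 2, ["zorder_div", "xor_mod"]) for t in range(4)])
```

Under this reading, a "div" reduction of a Z-order key yields spatially compact buckets, while a "mod" reduction interleaves neighboring cells across buckets. Alternating between them changes which points share an attention area.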
In some examples, the data processing hardware includes a graphics processing unit (GPU), and the cache includes a local high-speed cache of the GPU. In these examples, the size of each bucket of the plurality of buckets may be aligned with a memory tile size of the GPU to optimize loading operations. In some implementations, the operations further include, after at least one attention iteration, performing an in-bucket pooling operation on at least one bucket of the plurality of buckets to downsample the points within that bucket. In these implementations, performing the in-bucket pooling operation may include partitioning the points within the at least one bucket into a plurality of sub-buckets and applying a reduction function to the point features within each sub-bucket to generate a reduced feature vector representing each respective sub-bucket. Here, the partitioning of the points into the plurality of sub-buckets and the application of the reduction function may be performed entirely within the cache. The operations can further include controlling a vehicle based on the feature rep