US-20260127748-A1 - HIGH-ACCURACY NON-CAUSAL TRACKING THROUGH ITERATIVE FORWARD-BACKWARD POINT-CLOUD AGGREGATION
Abstract
A method for tracking objects of interest includes obtaining input data generated by sensors of a vehicle; generating, based on the input data, a point cloud sequence comprising a plurality of point clouds, wherein each point cloud in the point cloud sequence represents a picture of an environment surrounding the vehicle at a specific moment in time; processing the point cloud sequence in a forward direction to generate a first set of tracking identifiers (IDs) for objects detected in the point cloud sequence; processing the point cloud sequence in a backward direction to generate a second set of tracking IDs for objects detected in the point cloud sequence; combining the first set and the second set of tracking IDs to generate a combined set of tracking IDs for the objects; and tracking the objects using the point cloud sequence and the combined set of tracking IDs to generate tracking output.
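The abstract's forward pass, backward pass, and ID-combination steps can be sketched end to end. Everything below is a hypothetical stand-in, not the patented implementation: the publication does not specify the tracker, so a greedy nearest-centroid matcher stands in for each pass, a union-find merge stands in for the combining step, and `greedy_pass`, `combine`, and the toy detections are all invented for illustration.

```python
import math

def greedy_pass(frames, max_dist=2.0):
    """One tracking pass: greedily link each detection centroid to the
    nearest live track from the previous frame (a stand-in tracker).
    Returns a map from (frame index, detection index) to a track ID."""
    ids, prev, next_id = {}, {}, 0
    for t, frame in enumerate(frames):
        current = {}
        for i, c in enumerate(frame):
            best, best_d = None, max_dist
            for tid, pc in prev.items():
                d = math.dist(c, pc)
                if d < best_d and tid not in current:
                    best, best_d = tid, d
            if best is None:  # no track close enough: start a new one
                best, next_id = next_id, next_id + 1
            ids[(t, i)] = best
            current[best] = c
        prev = current
    return ids

def combine(fwd, bwd):
    """Fuse forward- and backward-pass IDs: detections that share a
    forward ID or a backward ID belong to the same combined track."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for key in fwd:
        parent[find(("f", fwd[key]))] = find(("b", bwd[key]))
    fresh = {}
    return {k: fresh.setdefault(find(("f", fwd[k])), len(fresh)) for k in fwd}

# one object moving right; a second object appears in the last two frames
frames = [[(0.0, 0.0)], [(0.5, 0.0), (5.0, 0.0)], [(1.0, 0.0), (5.5, 0.0)]]
n = len(frames)
fwd = greedy_pass(frames)
# backward pass: track the reversed sequence, then map frame indices back
raw = greedy_pass(frames[::-1])
bwd = {(n - 1 - t, i): tid for (t, i), tid in raw.items()}
tracks = combine(fwd, bwd)
```

Running the same pass over the reversed sequence and fusing the two ID sets is what lets an object that one direction misses or splits inherit a consistent ID from the other direction.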
Inventors
- Madhumitha Sakthi
- Amin Ansari
- Sai Madhuraj Jadhav
Assignees
- QUALCOMM INCORPORATED
Dates
- Publication Date: 2026-05-07
- Application Date: 2024-11-01
Claims (20)
- 1. A method for tracking objects of interest comprising: obtaining input data generated by one or more sensors of a vehicle; generating, based on the input data, a point cloud sequence comprising a plurality of point clouds, wherein each point cloud in the point cloud sequence represents a picture of an environment surrounding the vehicle at a specific moment in time; processing the point cloud sequence in a forward direction to generate a first set of tracking identifiers (IDs) for one or more objects detected in the point cloud sequence; processing the point cloud sequence in a backward direction to generate a second set of tracking IDs for one or more objects detected in the point cloud sequence; combining the first set of tracking IDs and the second set of tracking IDs to generate a combined set of tracking IDs for the one or more objects; and tracking the one or more objects using the point cloud sequence and the combined set of tracking IDs to generate tracking output.
- 2. The method of claim 1, further comprising: iteratively aggregating a plurality of LiDAR points to the point cloud sequence until a termination condition is met.
- 3. The method of claim 2, wherein the termination condition comprises no changes in the combined set of tracking IDs between two consecutive iterations.
- 4. The method of claim 2, wherein iteratively aggregating the plurality of LiDAR points further comprises: generating pseudo-LiDAR data for one or more tracking IDs in the combined set based on the input data generated by one or more cameras of the vehicle.
- 5. The method of claim 4, further comprising: incorporating the pseudo-LiDAR data into the aggregated plurality of LiDAR data points for an object associated with a corresponding tracking ID.
- 6. The method of claim 5, wherein the termination condition comprises no changes in the aggregated plurality of LiDAR data points between two consecutive iterations.
- 7. The method of claim 1, wherein combining the first set of tracking IDs and the second set of tracking IDs comprises: assigning consistent track IDs to objects detected in both the processing the point cloud sequence in the forward direction and the processing the point cloud sequence in the backward direction.
- 8. The method of claim 4, further comprising: refining the tracking output based on the aggregated plurality of LiDAR points.
- 9. The method of claim 1, further comprising operating an Advanced Driver Assistance System (ADAS) based on the tracking output.
- 10. A system for tracking objects of interest, the system comprising: a memory for storing input data; and processing circuitry in communication with the memory, wherein the processing circuitry is configured to: obtain the input data generated by one or more sensors of a vehicle; generate, based on the input data, a point cloud sequence comprising a plurality of point clouds, wherein each point cloud in the point cloud sequence represents a picture of an environment surrounding the vehicle at a specific moment in time; process the point cloud sequence in a forward direction to generate a first set of tracking identifiers (IDs) for one or more objects detected in the point cloud sequence; process the point cloud sequence in a backward direction to generate a second set of tracking IDs for one or more objects detected in the point cloud sequence; combine the first set of tracking IDs and the second set of tracking IDs to generate a combined set of tracking IDs for the one or more objects; and track the one or more objects using the point cloud sequence and the combined set of tracking IDs to generate tracking output.
- 11. The system of claim 10, wherein the processing circuitry is further configured to: iteratively aggregate a plurality of LiDAR points to the point cloud sequence until a termination condition is met.
- 12. The system of claim 11, wherein the termination condition comprises no changes in the combined set of tracking IDs between two consecutive iterations.
- 13. The system of claim 11, wherein the processing circuitry configured to iteratively aggregate the plurality of LiDAR points is further configured to: generate pseudo-LiDAR data for one or more tracking IDs in the combined set based on the input data generated by one or more cameras of the vehicle.
- 14. The system of claim 13, wherein the processing circuitry is further configured to: incorporate the pseudo-LiDAR data into the aggregated plurality of LiDAR data points for an object associated with a corresponding tracking ID.
- 15. The system of claim 14, wherein the termination condition comprises no changes in the aggregated plurality of LiDAR data points between two consecutive iterations.
- 16. The system of claim 10, wherein the processing circuitry configured to combine the first set of tracking IDs and the second set of tracking IDs is further configured to: assign consistent track IDs to objects detected in both the processing the point cloud sequence in the forward direction and the processing the point cloud sequence in the backward direction.
- 17. The system of claim 13, wherein the processing circuitry is further configured to: refine the tracking output based on the aggregated plurality of LiDAR points.
- 18. The system of claim 10, wherein the processing circuitry is further configured to: operate an Advanced Driver Assistance System (ADAS) based on the generated tracking output.
- 19. Non-transitory computer-readable storage media having instructions encoded thereon, the instructions configured to cause processing circuitry to: obtain input data generated by one or more sensors of a vehicle; generate, based on the input data, a point cloud sequence comprising a plurality of point clouds, wherein each point cloud in the point cloud sequence represents a picture of an environment surrounding the vehicle at a specific moment in time; process the point cloud sequence in a forward direction to generate a first set of tracking identifiers (IDs) for one or more objects detected in the point cloud sequence; process the point cloud sequence in a backward direction to generate a second set of tracking IDs for one or more objects detected in the point cloud sequence; combine the first set of tracking IDs and the second set of tracking IDs to generate a combined set of tracking IDs for the one or more objects; and track the one or more objects using the point cloud sequence and the combined set of tracking IDs to generate tracking output.
- 20. The non-transitory computer-readable storage media of claim 19, wherein the instructions are further configured to cause the processing circuitry to: iteratively aggregate a plurality of LiDAR points to the point cloud sequence until a termination condition is met.
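The claims leave the aggregation loop itself abstract. The sketch below, under stated assumptions, shows only the control flow of claims 2-3 and 20: re-run the tracker, re-aggregate LiDAR points per track ID, and stop when the combined ID set is unchanged between two consecutive iterations. `aggregate_until_stable` and `toy_track` are hypothetical names; `toy_track` is a dummy tracker that merely converges, not a real detector.

```python
def aggregate_until_stable(cloud_seq, track_fn, max_iters=10):
    """Iterate the (forward+backward) tracker, re-aggregating LiDAR
    points per track ID each round, until the combined set of tracking
    IDs stops changing between two consecutive iterations (the claim-3
    termination condition). Claims 4-5 would additionally fold
    camera-derived pseudo-LiDAR points into `aggregates` inside this
    loop before the next tracking round."""
    prev_ids, aggregates = None, {}
    for _ in range(max_iters):
        ids, aggregates = track_fn(cloud_seq, aggregates)
        if ids == prev_ids:  # no change between consecutive iterations
            return ids, aggregates
        prev_ids = ids
    return prev_ids, aggregates

# hypothetical tracker that discovers one more track per round, then settles
state = {"n": 0}
def toy_track(cloud_seq, aggregates):
    state["n"] = min(state["n"] + 1, 3)
    ids = frozenset(range(state["n"]))
    return ids, {i: aggregates.get(i, []) for i in ids}

ids, agg = aggregate_until_stable([], toy_track)
```

The denser per-track aggregates produced by each round can change which detections the next round links, which is why the loop keys its exit on ID-set stability rather than a fixed iteration count.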
Description
TECHNICAL FIELD

This disclosure relates to image processing.

BACKGROUND

Among other challenges, autonomous driving systems need to accurately detect and track moving objects such as vehicles, pedestrians, and cyclists in real time. In autonomous driving, tracking may involve annotations for every frame (picture) of a sensor output, while detection can often get by with sparse annotations (e.g., once every 10 pictures). This is because tracking involves continuously updating the location of an object over time, whereas detection may only involve identifying the presence or absence of the object in a given picture. Tracking annotations may also specify the identity of each object, which adds another layer of complexity: tracking involves following the same object across multiple pictures, so the object being tracked must be distinguished from other objects. In many contemporary autonomous driving systems, the annotations for tracking may also be more complex. For example, tracking annotations may specify the bounding box, orientation, and potentially other attributes of the object, while detection annotations may include only a bounding box. In some examples, annotating a medium-sized dataset, even with experienced annotators, may take several months.

SUMMARY

This disclosure describes techniques for object tracking. These techniques may involve tracking objects in a video sequence from the first picture in the video sequence to the last picture in the video sequence using a forward pass. During the forward pass, the disclosed techniques may assign a unique ID to each tracked object. The forward pass may provide an initial estimate of the trajectory of the object. The disclosed techniques may also track objects in the video sequence from the last picture in the video sequence to the first picture in the video sequence using a backward pass.
During the backward pass, the disclosed techniques may assign unique IDs to tracked objects in this reverse direction. The backward pass may provide a complementary perspective on the trajectory of the object.

In one example, a method for tracking objects of interest includes obtaining input data generated by one or more sensors of a vehicle; generating, based on the input data, a point cloud sequence comprising a plurality of point clouds, wherein each point cloud in the point cloud sequence represents a picture of an environment surrounding the vehicle at a specific moment in time; processing the point cloud sequence in a forward direction to generate a first set of tracking identifiers (IDs) for one or more objects detected in the point cloud sequence; processing the point cloud sequence in a backward direction to generate a second set of tracking IDs for one or more objects detected in the point cloud sequence; combining the first set of tracking IDs and the second set of tracking IDs to generate a combined set of tracking IDs for the one or more objects; and tracking the one or more objects using the point cloud sequence and the combined set of tracking IDs to generate tracking output.

In another example, a system for tracking objects of interest includes a memory for storing input data; and processing circuitry in communication with the memory. The processing circuitry is configured to obtain the input data generated by one or more sensors of a vehicle and generate, based on the input data, a point cloud sequence comprising a plurality of point clouds. Each point cloud in the point cloud sequence represents a picture of an environment surrounding the vehicle at a specific moment in time.
The processing circuitry is also configured to process the point cloud sequence in a forward direction to generate a first set of tracking identifiers (IDs) for one or more objects detected in the point cloud sequence and to process the point cloud sequence in a backward direction to generate a second set of tracking IDs for one or more objects detected in the point cloud sequence. The processing circuitry is further configured to combine the first set of tracking IDs and the second set of tracking IDs to generate a combined set of tracking IDs for the one or more objects and to track the one or more objects using the point cloud sequence and the combined set of tracking IDs to generate tracking output.

In yet another example, non-transitory computer-readable storage media have instructions encoded thereon, the instructions configured to cause processing circuitry to obtain input data generated by one or more sensors of a vehicle and generate, based on the input data, a point cloud sequence comprising a plurality of point clouds. Each point cloud in the point cloud sequence represents a picture of an environment surrounding the vehicle at a specific moment in time. Additionally, the instructions are configured to cause processing circuitry to process the point cloud sequence in a forward direction to generate a first set of tracking identifiers (ID