US-12626515-B2 - Method and system for annotating sensor data


Abstract

A computer-implemented method for annotating driving scenario sensor data, including the steps of: receiving raw sensor data, the raw sensor data comprising a plurality of successive LIDAR point clouds and/or a plurality of successive camera images; recognizing objects in each image of the camera data and/or each point cloud using one or more neural networks; correlating objects within successive images and/or point clouds; removing false positive results on the basis of plausibility criteria; and exporting the annotated sensor data of the driving scenario.

Inventors

  • Daniel Roedler
  • Phillip Thomas
  • Simon Romanski
  • Georgi Urumov
  • Tobias Biester
  • Ruben Jacob
  • Boris Neubert

Assignees

  • dSPACE GmbH

Dates

Publication Date
May 12, 2026
Application Date
Mar. 31, 2023
Priority Date
Nov. 17, 2020

Claims (13)

  1. A computer-implemented method for annotating driving scenario sensor data, wherein the method comprises: receiving raw sensor data, wherein the raw sensor data comprises a plurality of successive point clouds from a LIDAR sensor and/or a plurality of successive images from one or more cameras; recognizing objects in each image and/or each point cloud using one or more neural networks; correlating objects within successive images and/or point clouds; removing false positive results on the basis of plausibility criteria; and exporting annotated sensor data of a driving scenario, wherein the plausibility criteria for removing false positive results are based on a height above the ground, a duration of existence, and/or a confidence level of a neural network, wherein the raw sensor data comprises LIDAR point clouds and simultaneously acquired camera data, wherein said correlating objects is performed taking into account a relative spatial orientation of a LIDAR sensor and a camera, and wherein at least one neural network for attribute recognition is applied to a detected object and at least one attribute of the object is determined using a camera image and assigned to the object in the LIDAR point cloud.
  2. The method according to claim 1, wherein the raw sensor data comprise point clouds from the LIDAR sensor, wherein a point cloud is divided into at least two regions, wherein a neural network of a first architecture is used to recognize objects in the first region and a neural network of a second architecture is used in the second region, and wherein the first architecture is different from the second architecture.
  3. The method according to claim 2, wherein the first region comprises the immediate vicinity of a measuring vehicle, whereas the second region has a minimum distance from the measuring vehicle, and wherein a center-point-based architecture is used in the first region and a PointRCNN-based architecture is used in the second region for the neural network for object recognition.
  4. The method according to claim 1, further comprising: removing duplicates before objects are correlated, wherein said removing the duplicates is based on an overlap criterion and/or a confidence level of a neural network.
  5. The method according to claim 4, wherein said removing the duplicates of recognized objects is performed within an image and/or point cloud, wherein the objects are checked pairwise for overlap, a first object being recognized with a first confidence level and a second object being recognized with a second confidence level, wherein the first confidence level is higher than the second confidence level, wherein it is determined whether the overlap or an intersection over union exceeds a predefined threshold value, and wherein, in case of such an overlap, the second object is discarded as a duplicate.
  6. The method according to claim 1, wherein said correlating objects comprises linking objects in successive frames, that is, images and/or point clouds, and wherein an object in a first frame is correlated with an object in a second frame if the objects belong to the same object class and the overlap or an intersection over union exceeds a predefined threshold value.
  7. The method according to claim 6, further comprising correcting missed objects, wherein more than two consecutive frames are analyzed, and wherein, if an object in a first frame was correlated with an object in a third frame but no object was detected in the intervening second frame, the object is then inserted in the second frame.
  8. The method according to claim 1, wherein said correlating objects comprises predicting a position of an object on a subsequent image and/or a subsequent point cloud by means of a Gaussian process regression or a Kalman filter, and/or wherein tracking of objects in successive images occurs by means of a factor graph, that is, a bipartite graph for factoring the probability distribution.
  9. The method according to claim 1, further comprising optimizing an object size and/or an object position in each image of the raw sensor data and/or each point cloud by regression.
  10. A non-transitory, computer-readable data storage medium containing instructions which, when executed by a processor of a computer system, cause the computer system to execute the method according to claim 1.
  11. A computer system comprising a processor, a human-machine interface, and a nonvolatile memory, wherein the nonvolatile memory contains instructions which, when executed by the processor, cause the computer system to execute the method according to claim 1.
  12. A computer-implemented method for annotating driving scenario sensor data, wherein the method comprises: receiving raw sensor data, wherein the raw sensor data comprises a plurality of successive point clouds from a LIDAR sensor and/or a plurality of successive images from one or more cameras; recognizing objects in each image and/or each point cloud using one or more neural networks; correlating objects within successive images and/or point clouds; removing false positive results on the basis of plausibility criteria; and exporting annotated sensor data of a driving scenario, wherein the raw sensor data comprises point clouds from a LIDAR sensor, wherein a point cloud is divided into at least two regions, wherein a neural network of a first architecture is used to recognize objects in a first region and a neural network of a second architecture is used in a second region, wherein the first architecture is different from the second architecture, wherein the first region comprises an immediate vicinity of a measuring vehicle, whereas the second region has a minimum distance from the measuring vehicle, and wherein a center-point-based architecture is used in the first region and a PointRCNN-based architecture is used in the second region for the neural network for object recognition.
  13. A computer-implemented method for annotating driving scenario sensor data, wherein the method comprises: receiving raw sensor data, wherein the raw sensor data comprises a plurality of successive point clouds from a LIDAR sensor and/or a plurality of successive images from one or more cameras; recognizing objects in each image and/or each point cloud using one or more neural networks; correlating objects within successive images and/or point clouds; removing false positive results on the basis of plausibility criteria; and exporting annotated sensor data of a driving scenario, wherein the raw sensor data comprise point clouds from a LIDAR sensor, wherein a point cloud is divided into at least two regions, wherein a neural network of a first architecture is used to recognize objects in a first region and a neural network of a second architecture is used in a second region, wherein the first architecture is different from the second architecture, and wherein said correlating objects comprises predicting a position of an object on a subsequent image and/or a subsequent point cloud by means of a Gaussian process regression or a Kalman filter, and/or tracking of objects in successive images occurs by means of a factor graph, that is, a bipartite graph for factoring a probability distribution.
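As an illustration of the plausibility filtering recited in claim 1, the following is a minimal Python sketch that removes false positives based on height above the ground, duration of existence, and detector confidence. The class layout, field names, and threshold values are hypothetical; the claims name the criteria but fix no parameters.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the claims name the criteria but not their values.
MAX_HEIGHT_ABOVE_GROUND_M = 4.0  # assumed ceiling for real traffic objects
MIN_EXISTENCE_FRAMES = 3         # assumed minimum duration of existence
MIN_CONFIDENCE = 0.2             # assumed confidence floor of the neural network

@dataclass
class Track:
    object_class: str
    confidences: list  # per-frame confidence reported by the detector
    heights_m: list    # per-frame height of the box bottom above ground, in meters

    def is_plausible(self) -> bool:
        """Check the three plausibility criteria named in claim 1."""
        lives_long_enough = len(self.confidences) >= MIN_EXISTENCE_FRAMES
        confident_enough = max(self.confidences, default=0.0) >= MIN_CONFIDENCE
        on_the_ground = all(h <= MAX_HEIGHT_ABOVE_GROUND_M for h in self.heights_m)
        return lives_long_enough and confident_enough and on_the_ground

def remove_false_positives(tracks):
    """Keep only the detections that satisfy all plausibility criteria."""
    return [t for t in tracks if t.is_plausible()]
```

Keeping detectors tuned for high recall and pruning afterwards, as the description suggests, lets each criterion stay simple: an implausible track must fail only one test to be discarded.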

Description

This nonprovisional application is a continuation of International Application No. PCT/EP2021/081845, which was filed on Nov. 16, 2021, and which claims priority to German Patent Application No. 10 2020 130 335.1, which was filed in Germany on Nov. 17, 2020, and which are both herein incorporated by reference.

BACKGROUND OF THE INVENTION

Field of the Invention

The invention relates to a computer-implemented method for automatically annotating driving scenario sensor data, a computer-readable data carrier, and a computer system.

Description of the Background Art

Autonomous driving promises an unprecedented level of comfort and safety in everyday traffic. Despite enormous investments by various companies, however, existing approaches can only be used under limited conditions or provide only a limited degree of truly autonomous behavior. One reason for this is the lack of a sufficient number and variety of driving scenarios: annotated sensor data from driving scenarios are required for the training of autonomous driving functions.

A common approach to data annotation, also called data enrichment, uses a large number of workers to manually annotate each image. As a result, conventional data enrichment methods are extremely time-consuming, error-prone, and therefore extremely expensive.

A semi-automatic approach comprising keyframe annotation and interpolation/extrapolation provides some, albeit limited, improvement. This approach is shown schematically in FIG. 2 and comprises the selection of a number of images as keyframes, which are annotated manually, aided by propagation/extrapolation. After the manual editing of the keyframes, the annotations for the intervening frames are generated by interpolation. The recognition of objects on keyframes is thus performed by humans, who also link related objects using extrapolation. The interpolation process then uses this information (object recognition and linking) to generate annotations for the same objects on all frames between keyframes.

Theoretically, the efficiency of this mechanism can be increased by increasing the distance between keyframes, because more annotations will then be created automatically. However, greater spacing between keyframes is associated with a dramatic increase in the manual corrections required: objects that can only be seen briefly on non-keyframes, for example, must be covered by manual intervention. This is where this automation approach reaches its limits rather quickly. Only small- to medium-sized data enrichment projects can therefore be tackled by conventional annotation strategies, whereas other higher-level functions such as validation of autonomous driving functions, selection of data, or the creation of scenario libraries are unattainable due to the enormous manual effort and associated costs.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide improved methods for annotating driving scenario sensor data; an automated annotation method with minimal need for human intervention would be especially desirable. The object is achieved by a method for annotating driving scenario sensor data, a computer-readable data carrier, and a computer system.
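Before turning to the exemplary embodiment, the conventional keyframe interpolation described in the background can be made concrete with a minimal Python sketch. The annotation layout and field names are hypothetical; only the idea of deriving intermediate annotations linearly from two manually labeled keyframes comes from the text above.

```python
def interpolate_box(kf_a, kf_b, frame):
    """Linearly interpolate a 2-D bounding box between two keyframes.

    kf_a and kf_b are hypothetical annotations of the same object, e.g.
    {"frame": 0, "x": 100.0, "y": 50.0, "w": 40.0, "h": 20.0}.
    """
    t = (frame - kf_a["frame"]) / (kf_b["frame"] - kf_a["frame"])
    box = {k: (1 - t) * kf_a[k] + t * kf_b[k] for k in ("x", "y", "w", "h")}
    box["frame"] = frame
    return box

# Frames 1..9 receive automatic annotations derived from keyframes 0 and 10.
kf_0 = {"frame": 0, "x": 100.0, "y": 50.0, "w": 40.0, "h": 20.0}
kf_10 = {"frame": 10, "x": 150.0, "y": 55.0, "w": 42.0, "h": 21.0}
between = [interpolate_box(kf_0, kf_10, f) for f in range(1, 10)]
```

The sketch also makes the limitation visible: any object that appears and disappears entirely between the two keyframes is never interpolated and must be added by hand.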
In an exemplary embodiment, a computer-implemented method for annotating driving scenario sensor data is provided, comprising: receiving raw sensor data, wherein the raw sensor data comprise a plurality of successive LIDAR point clouds and/or a plurality of successive camera images; recognizing objects in each image of the camera data and/or each point cloud using one or more neural networks, wherein preferably an object class, an object position, an object size, and/or object extents are assigned to a recognized object, in particular the coordinates of a bounding box surrounding the object; correlating objects within successive images and/or point clouds; removing false positive results on the basis of plausibility criteria; and exporting the annotated sensor data of the driving scenario.

Advantageously, the neural networks for object recognition can be optimized for high recall, that is, for recognizing as high a percentage as possible of the objects actually present, because the later removal of false positive results on the basis of plausibility criteria effectively minimizes false recognitions without manual intervention.

The invention is based on the idea that semantic information derived from the temporal correlation of objects across the individual frames of a recording can be effectively utilized by a series of steps/techniques; a sketch of the correlation and gap-filling steps follows below. Objects are recognized first and are then expediently tracked, or correlated with one another, over a series of images. The employed techniques further comprise removing false positive results, in which an object was erroneously recognized, filling gaps, optimizing object size and position by regression, and smoothing trajectories. Some steps/techniques, such as a regression of object size and/or position, can be optional. The
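The following minimal Python sketch, referenced above, illustrates the cross-frame correlation and gap-filling steps under stated assumptions: the IoU threshold, box format, and data structures are hypothetical, and only the class-and-overlap linking and the insertion of a missed intermediate detection follow the description and claims 6 and 7.

```python
IOU_THRESHOLD = 0.5  # assumed; the claims only require "a predefined threshold value"

def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def correlate(frame_a, frame_b):
    """Link detections of the same class across successive frames (cf. claim 6).

    Each detection is a hypothetical dict like {"cls": "car", "box": (x1, y1, x2, y2)}.
    Returns index pairs (i, j) of linked detections.
    """
    links, used = [], set()
    for i, det_a in enumerate(frame_a):
        for j, det_b in enumerate(frame_b):
            if j in used:
                continue
            if det_a["cls"] == det_b["cls"] and iou(det_a["box"], det_b["box"]) > IOU_THRESHOLD:
                links.append((i, j))
                used.add(j)
                break
    return links

def fill_gap(box_first, box_third):
    """Insert a missed detection into the intervening frame (cf. claim 7)
    by averaging the boxes of the neighboring frames."""
    return tuple((p + q) / 2.0 for p, q in zip(box_first, box_third))
```

In a production pipeline the greedy per-frame matching sketched here would typically be replaced by the prediction-based association the claims mention (Kalman filter or Gaussian process regression, or a factor graph over whole tracks), which is more robust when objects move quickly between frames.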