
US-12623666-B2 - System for localizing three-dimensional objects


Abstract

Disclosed herein are system, method, and computer program product embodiments for localizing three-dimensional objects relative to a vehicle. The system includes: at least one sensor for generating two-dimensional (2D) data and a three-dimensional (3D) point cloud of an environment external to a vehicle. The 3D point cloud includes object points associated with a stationary traffic control object. The localization system also includes a memory and at least one processor coupled to the memory. The processor is programmed to: select a bounding box associated with the object from the memory based on the 2D data; arrange the bounding box proximate to the object points in the 3D point cloud; assign a weight to each point of the 3D point cloud based on a position of the point relative to the bounding box; filter the weighted points; and generate a 3D location of the object based on the filtered points.
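For illustration only (not part of the patent text), the pipeline the abstract describes — select a 2D-seeded bounding box, weight the cloud points by position relative to it, filter, then localize — can be outlined as a minimal Python sketch. The axis-aligned box, the binary inside/outside weighting, and all names are simplifying assumptions, not the claimed implementation.

```python
import numpy as np

def localize_object(points, box_center, box_size, weight_threshold=1.0):
    """Minimal sketch of the localization pipeline, assuming `points` is an
    (N, 3) point cloud and the bounding box is axis-aligned and already
    arranged around the candidate object points."""
    half = np.asarray(box_size) / 2.0
    offsets = np.abs(points - np.asarray(box_center))

    # Weight each point by its position relative to the bounding box:
    # inside -> 1.0, outside -> 0.0 (a stand-in for the richer weighting
    # scheme the patent describes).
    weights = np.all(offsets <= half, axis=1).astype(float)

    # Filter the weighted points against a threshold.
    kept = points[weights >= weight_threshold]
    if kept.size == 0:
        return None  # no supporting points, so no 3D location

    # Generate the 3D location from the surviving points (their centroid).
    return kept.mean(axis=0)
```

A caller would pass the lidar cloud and a 3D box seeded from the 2D detection, e.g. `localize_object(cloud, center, (0.4, 0.4, 1.2))`, where the box dimensions here are purely hypothetical.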

Inventors

  • Nikolaus Jonathan Mitchell
  • Yong-Dian Jian

Assignees

  • FORD GLOBAL TECHNOLOGIES, LLC

Dates

Publication Date
2026-05-12
Application Date
2021-11-10

Claims (20)

  1. A localization system comprising: at least one sensor for generating two-dimensional (2D) data and a three-dimensional (3D) point cloud of an environment external to a vehicle, wherein the 3D point cloud includes object points associated with an object; a memory; and at least one processor coupled to the memory and programmed to: select a bounding box associated with the object from the memory based on the 2D data; arrange the bounding box about the object points in the 3D point cloud; assign a weight to each point of the 3D point cloud based on a position of the point relative to the bounding box and based on a result of an occlusion test, wherein the result is determined by: imposing a first representation on a first collection of points, the first representation being a first predefined 2D shape that is independent of a shape of the first collection of points, imposing a second representation on a second collection of points, the second representation being a second predefined 2D shape that is independent of a shape of the second collection of points, the second predefined 2D shape being different from the first predefined 2D shape, and responsive to determining that the first predefined 2D shape and the second predefined 2D shape overlap within the bounding box, assigning a higher weight to the first collection of points within the bounding box than to the second collection of points within the bounding box, the first collection of points being closer to the at least one sensor than the second collection of points; filter the weighted points; and generate a 3D location of the object based on the filtered points.
  2. The localization system of claim 1, wherein the at least one processor is further programmed to: generate a polyhedron extending between a position of the at least one sensor and the bounding box in the 3D point cloud; and increment the weight of each point of the 3D point cloud that is located within the polyhedron.
  3. The localization system of claim 1, wherein the at least one processor is further programmed to increment the weight of each point of the 3D point cloud that is located distal to the bounding box along a longitudinal axis extending between a position of the at least one sensor and the bounding box.
  4. The localization system of claim 1, wherein the at least one processor is further programmed to increment the weight of each point of the 3D point cloud that is located radially adjacent to a longitudinal axis extending between a position of the at least one sensor and the bounding box.
  5. The localization system of claim 1, wherein the at least one processor is further programmed to filter the weighted points by removing points whose weights are less than a threshold value.
  6. The localization system of claim 1, wherein the at least one processor is further programmed to: cluster the filtered points; and generate the 3D location of the object based on at least one of a comparison of the clustered points to predetermined data associated with a size and shape of the object, and a centroid of the clustered points.
  7. The localization system of claim 1, wherein the at least one processor is further programmed to: cluster the filtered points to form potential clustered points; compare the potential clustered points to predetermined data associated with a size and shape of the object to determine final clustered points; and generate the 3D location of the object based on a centroid of the final clustered points.
  8. The localization system of claim 1, wherein the object comprises a stationary traffic control object.
  9. The localization system of claim 1, wherein the at least one sensor comprises a lidar system, the lidar system comprising: at least one emitter for projecting light pulses away from the vehicle; at least one detector for receiving at least a portion of the light pulses that reflect off of one or more objects in the environment as reflected light pulses; and wherein the lidar system provides the 3D point cloud based on the reflected light pulses.
  10. The localization system of claim 1, wherein the at least one sensor comprises a camera for providing the 2D data, the 2D data comprising an image of the object.
  11. A method for localizing an object relative to a vehicle, comprising: receiving two-dimensional (2D) data and a three-dimensional (3D) point cloud of an environment external to the vehicle from at least one sensor, wherein the 3D point cloud includes object points associated with a stationary traffic control object; selecting a bounding box associated with the stationary traffic control object based on the 2D data; arranging the bounding box about the object points in the 3D point cloud; assigning a weight to each point of the 3D point cloud based on a position of the point relative to the bounding box and based on a result of an occlusion test, wherein the result is determined by: imposing a first representation on a first collection of points, the first representation being a first custom shape that is independent of a shape of the first collection of points, imposing a second representation on a second collection of points, the second representation being a second custom shape that is independent of a shape of the second collection of points, the second custom shape being different from the first custom shape, and assigning a higher weight to the first collection of points than to the second collection of points, the first collection of points being closer to the at least one sensor than the second collection of points; filtering the weighted points; and generating a 3D location of the stationary traffic control object based on the filtered points.
  12. The method of claim 11, wherein assigning a weight to each point of the 3D point cloud based on a position of the point relative to the bounding box comprises: generating a polyhedron extending between a position of the at least one sensor and the bounding box in the 3D point cloud; and incrementing the weight of each point of the 3D point cloud that is located: within the polyhedron, distal to the bounding box along a longitudinal axis extending between the position of the at least one sensor and the bounding box, or radially adjacent to the longitudinal axis.
  13. The method of claim 11, wherein filtering the weighted points comprises removing points whose weights are less than a threshold value.
  14. The method of claim 11, further comprising: clustering the filtered points; and generating the 3D location of the stationary traffic control object based on at least one of a comparison of the clustered points to predetermined data associated with a size and shape of the stationary traffic control object, and a centroid of the clustered points.
  15. The method of claim 11, further comprising: clustering the filtered points to form potential clustered points; comparing the potential clustered points to predetermined data associated with a size and shape of the stationary traffic control object to determine final clustered points; and generating the 3D location of the stationary traffic control object based on a centroid of the final clustered points.
  16. A non-transitory computer-readable medium having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: selecting a bounding box associated with a stationary traffic control object based on two-dimensional (2D) data; arranging the bounding box about object points in a three-dimensional (3D) point cloud of an environment external to a vehicle generated via one or more sensors, wherein the object points are associated with the stationary traffic control object; assigning a weight to each point of the 3D point cloud based on a position of the point relative to the bounding box and based on a result of an occlusion test, wherein the result is determined by: imposing a first representation on a first collection of points, the first representation being a first predefined shape, imposing a second representation on a second collection of points, the second representation being a second predefined shape different from the first predefined shape, and assigning a higher weight to the first collection of points than to the second collection of points, the first collection of points being closer to the one or more sensors than the second collection of points; filtering the weighted points; and generating a 3D location of the stationary traffic control object based on the filtered points.
  17. The non-transitory computer-readable medium of claim 16, wherein the operations further comprise: generating a square frustum extending between the position of the vehicle and the bounding box in the 3D point cloud; and incrementing the weight of each point of the 3D point cloud that is located within the square frustum.
  18. The non-transitory computer-readable medium of claim 16, wherein the operations further comprise incrementing the weight of each point of the 3D point cloud that is located distal to the bounding box along a longitudinal axis extending between the position of the vehicle and the bounding box, or located radially adjacent to the longitudinal axis.
  19. The non-transitory computer-readable medium of claim 16, wherein the operations further comprise: clustering the filtered points; and generating the 3D location of the stationary traffic control object based on a comparison of the clustered points to predetermined data associated with a size and shape of the stationary traffic control object.
  20. The non-transitory computer-readable medium of claim 16, wherein the operations further comprise: clustering the filtered points to form potential clustered points; comparing the potential clustered points to predetermined data associated with a size and shape of the stationary traffic control object to determine final clustered points; and generating the 3D location of the stationary traffic control object based on a centroid of the final clustered points.
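For illustration only (this sketch is not part of the patent), the occlusion test recited in claims 1, 11, and 16 could be prototyped as follows: each collection of points is summarized by a predefined 2D shape that ignores the collection's own geometry, and the collection nearer the sensor receives the higher weight when the two shapes overlap within the bounding box. The circle/rectangle shape pair, the pinhole projection, and all names are assumptions.

```python
import numpy as np

def occlusion_weights(first_pts, second_pts, sensor_pos, box_2d):
    """Hedged sketch of the claimed occlusion test, assuming a circle and a
    rectangle as the two predefined 2D shapes."""
    def project(pts):
        # Hypothetical pinhole projection to the image plane (assumes z > 0).
        return pts[:, :2] / pts[:, 2:3]

    p1, p2 = project(first_pts), project(second_pts)

    # First representation: a bounding circle (center + radius).
    c1 = p1.mean(axis=0)
    r1 = np.linalg.norm(p1 - c1, axis=1).max()

    # Second representation: an axis-aligned rectangle (min/max corners).
    lo2, hi2 = p2.min(axis=0), p2.max(axis=0)

    # Circle-rectangle overlap: clamp the circle center to the rectangle
    # and test whether the nearest rectangle point lies within the circle.
    nearest = np.clip(c1, lo2, hi2)
    overlap = np.linalg.norm(c1 - nearest) <= r1

    # Restrict the test to the 2D bounding box region.
    (bx0, by0), (bx1, by1) = box_2d
    in_box = (bx0 <= c1[0] <= bx1) and (by0 <= c1[1] <= by1)

    # The nearer (likely unoccluded) collection gets the higher weight.
    d1 = np.linalg.norm(first_pts.mean(axis=0) - sensor_pos)
    d2 = np.linalg.norm(second_pts.mean(axis=0) - sensor_pos)
    if overlap and in_box and d1 < d2:
        return 2.0, 1.0
    return 1.0, 1.0
```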
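Similarly, the viewing-volume weighting of claims 2 through 4 (echoed in claims 12, 17, and 18) might be approximated with a cylinder around the sensor-to-box axis. The cylinder stand-in for the claimed polyhedron/square frustum, the fixed radius, and all names below are assumptions.

```python
import numpy as np

def frustum_weights(points, sensor_pos, box_center, radius=0.5):
    """Increment point weights relative to the sensor-to-box viewing volume,
    approximated here by a cylinder of the given radius."""
    axis = np.asarray(box_center) - np.asarray(sensor_pos)
    length = np.linalg.norm(axis)
    axis_dir = axis / length

    rel = points - np.asarray(sensor_pos)
    along = rel @ axis_dir                                   # distance along the axis
    radial = np.linalg.norm(rel - np.outer(along, axis_dir), axis=1)

    weights = np.zeros(len(points))
    # Within the viewing volume between sensor and box (cf. claim 2).
    weights[(along >= 0) & (along <= length) & (radial <= radius)] += 1.0
    # Distal to the box along the longitudinal axis (cf. claim 3).
    weights[(along > length) & (radial <= radius)] += 1.0
    # Radially adjacent to the longitudinal axis (cf. claim 4).
    weights[(radial > radius) & (radial <= 2 * radius)] += 1.0
    return weights
```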

Description

TECHNICAL FIELD

One or more embodiments relate to a system and method for localizing a three-dimensional object relative to a vehicle.

BACKGROUND

A vehicle may include a system that monitors its external environment to detect specific objects, e.g., traffic lights, street signs, and other vehicles. The system may also determine the three-dimensional (3D) locations of these objects relative to the vehicle, and the vehicle may control one or more other vehicle systems based on these locations. For example, the vehicle may control a brake system to stop the vehicle based on the location and/or status of a traffic light or a remote vehicle. The system may include sensors or cameras for detecting the objects and may use one or more strategies to determine the locations of the objects based on data from the sensors or cameras. A number of localization methods exist. One method uses generic multi-view geometry algorithms based on triangulation of two-dimensional camera images; however, such methods are typically inaccurate, with errors exceeding one meter. Another method uses a deep learning network to regress the object location directly from the sensor data; however, such approaches typically require extensive manual labeling and storage of predetermined data derived from that labeling. Other methods exploit characteristics of a specific object (e.g., that a sign is flat) and create a custom algorithm for that object; however, such methods often cannot distinguish between similarly shaped objects. For example, if a scene contains multiple instances of the same or a similar object, e.g., multiple traffic lights, these strategies may not be able to tell them apart without an additional, complicated tracking algorithm.

SUMMARY

In one embodiment, a localization system includes at least one sensor for generating two-dimensional (2D) data and a three-dimensional (3D) point cloud of an environment external to a vehicle. The 3D point cloud includes object points associated with an object. The localization system also includes a memory and at least one processor coupled to the memory. The processor is programmed to: select a bounding box associated with the object from the memory based on the 2D data; arrange the bounding box proximate to the object points in the 3D point cloud; assign a weight to each point of the 3D point cloud based on a position of the point relative to the bounding box; filter the weighted points; and generate a 3D location of the object based on the filtered points. In another embodiment, a method is provided for localizing an object relative to a vehicle. Two-dimensional (2D) data and a three-dimensional (3D) point cloud of an environment external to the vehicle are received from at least one sensor. The 3D point cloud includes object points associated with a stationary traffic control object. A bounding box associated with the stationary traffic control object is selected based on the 2D data. The bounding box is arranged proximate to the object points in the 3D point cloud. A weight is assigned to each point of the 3D point cloud based on a position of the point relative to the bounding box. The weighted points are filtered. A 3D location of the stationary traffic control object is generated based on the filtered points. In yet another embodiment, a non-transitory computer-readable medium having instructions stored thereon is provided.
The instructions, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: selecting a bounding box associated with a stationary traffic control object based on two-dimensional (2D) data; arranging the bounding box proximate to object points in a three-dimensional (3D) point cloud of an environment external to a vehicle, wherein the object points are associated with the stationary traffic control object; assigning a weight to each point of the 3D point cloud based on a position of the point relative to the bounding box; filtering the weighted points; and generating a 3D location of the stationary traffic control object based on the filtered points.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic view of an autonomous vehicle with a system for localizing a 3D object, in accordance with one or more embodiments.
FIG. 2 is a schematic diagram illustrating communication between the system and other systems.
FIG. 3 is a flow chart illustrating a method for localizing the 3D object, in accordance with one or more embodiments.
FIG. 4 is a schematic diagram illustrating an occlusion test to weight points according to the method of FIG. 3.
FIG. 5 is a diagram illustrating a projection test to weight points according to the method of FIG. 3.
FIG. 6 illustrates a 3D point cloud generated by the system according to the method of FIG. 3.
FIG. 7 illustrates a filtered 3D point cloud generated by the system,
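As a hedged sketch of the clustering and centroid steps summarized above (and recited in claims 7, 15, and 20), the following uses DBSCAN as a stand-in for the unspecified clustering step and a bounding-extent comparison as a stand-in for the size-and-shape check; `expected_size`, `tol`, and the DBSCAN parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_and_localize(filtered_pts, expected_size, tol=0.5):
    """Cluster the filtered points, keep the cluster whose extent matches the
    expected object size, and return its centroid as the 3D location."""
    labels = DBSCAN(eps=0.3, min_samples=5).fit(filtered_pts).labels_
    for label in set(labels) - {-1}:            # -1 marks DBSCAN noise points
        cluster = filtered_pts[labels == label]
        extent = cluster.max(axis=0) - cluster.min(axis=0)
        # Compare the cluster's extent to the expected size and shape.
        if np.all(np.abs(extent - np.asarray(expected_size)) <= tol):
            return cluster.mean(axis=0)         # centroid = 3D location
    return None                                 # no cluster passed the check
```

For example, a traffic light with a roughly 0.4 m x 0.4 m x 1.2 m housing (a hypothetical size) would be located via `cluster_and_localize(pts, (0.4, 0.4, 1.2))`.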