CN-121999451-A - Feature detection model for autonomous and semi-autonomous systems and applications
Abstract
The present disclosure relates to feature detection models for autonomous and/or semi-autonomous systems and applications. The systems and methods described herein may use one or more trained machine learning models to automatically generate representations of traffic features, such as road markings and/or road edges, corresponding to maps. For example, a model may take as input an image representing at least a portion of a map that includes one or more traffic features, together with one or more indications of one or more points associated with the traffic features represented by the image. Based at least on processing these inputs, the model may generate and/or output data representing additional points and/or heat maps (which represent one or more lines corresponding to the traffic features) associated with the traffic features. The output data may then be used to determine a representation of the traffic features that is used to annotate the map.
Inventors
- CHEN KEZHAO
- ZHAO RUIQI
- LI YUJIAN
Assignees
- NVIDIA Corporation (辉达公司)
Dates
- Publication Date: 2026-05-08
- Application Date: 2024-11-01
Claims (20)
- 1. A method, comprising: generating one or more input markers representing one or more first points associated with a road marking depicted by an image associated with a map; generating one or more embeddings associated with the image; generating, using one or more machine learning models and based at least on the one or more input markers and the one or more embeddings, one or more output markers representing one or more second points associated with the road marking; generating a line representation of the road marking based at least on the one or more second points; and updating the map to include a label associated with the road marking based at least on the line representation.
- 2. The method of claim 1, further comprising at least one of: receiving input data representing the one or more first points associated with the road marking; or determining the one or more first points associated with the road marking based at least on an analysis of at least one of the map or the image.
- 3. The method of claim 1, wherein generating the one or more output markers comprises: generating, using the one or more machine learning models and based at least on the one or more input markers and the one or more embeddings, one or more first output markers representing a first portion of the one or more second points; and generating, using the one or more machine learning models and based at least on the one or more first output markers, one or more second output markers representing a second portion of the one or more second points.
- 4. The method of claim 1, further comprising: generating, using the one or more machine learning models and based at least on the one or more input markers and the one or more embeddings, one or more image markers associated with the image, wherein generating the line representation is further based at least on the one or more image markers.
- 5. The method of claim 1, further comprising: appending the one or more input markers to one or more learnable markers to generate one or more appended input markers, wherein generating the one or more output markers is based at least on the one or more appended input markers and the one or more embeddings.
- 6. The method of claim 1, further comprising: determining one or more classifications associated with the one or more second points based at least on the one or more output markers, wherein generating the line representation is further based at least on the one or more classifications.
- 7. The method of claim 1, further comprising: generating, using one or more decoders and based at least on the one or more output markers, one or more coordinates associated with the one or more second points in the image, wherein generating the line representation is based at least on the one or more coordinates.
- 8. The method of claim 1, further comprising: generating a heat map associated with the road marking based at least on at least one of the one or more output markers or one or more image markers associated with the image, wherein generating the line representation is further based at least on the heat map.
- 9. A data center, comprising: one or more central processing units (CPUs); one or more graphics processing units (GPUs); one or more isolated trusted execution environments (TEEs); one or more interconnects for multi-GPU communication; one or more data processing units (DPUs); and one or more network interface chips (NICs); wherein one or more components of the data center are configured to: determine one or more first points associated with a traffic feature from a sensor data representation corresponding to a map; determine, using one or more machine learning models and based at least on input data associated with the one or more first points and the sensor data representation, one or more second points associated with the traffic feature; generate a representation of the traffic feature based at least on the one or more second points; and update the map to include information associated with the traffic feature based at least on the representation.
- 10. The data center of claim 9, wherein the one or more components are further to: generate one or more input markers based at least on the one or more first points; and generate one or more embeddings based at least on the sensor data representation, wherein the input data is associated with the one or more input markers and the one or more embeddings.
- 11. The data center of claim 10, wherein the one or more components are further to: append the one or more input markers to one or more learnable markers to generate one or more appended input markers, wherein the input data is associated with the one or more appended input markers and the one or more embeddings.
- 12. The data center of claim 9, wherein determining the one or more second points associated with the traffic feature comprises: generating one or more output markers using the one or more machine learning models and based at least on the input data; and determining the one or more second points associated with the traffic feature based at least on the one or more output markers.
- 13. The data center of claim 9, wherein the one or more components are further to perform at least one of: receiving one or more inputs representing the one or more first points associated with the traffic feature; or determining the one or more first points associated with the traffic feature based at least on an analysis of at least one of the map or the sensor data representation.
- 14. The data center of claim 9, wherein determining the one or more second points associated with the traffic feature comprises: determining at least a first portion of the one or more second points using the one or more machine learning models and based at least on the input data; and determining at least a second portion of the one or more second points using the one or more machine learning models and based at least on second input data associated with the at least first portion of the one or more second points.
- 15. The data center of claim 9, wherein the one or more components are further to: determine, using the one or more machine learning models and based at least on the input data, one or more classifications associated with the one or more second points, wherein the representation is further generated based at least on the one or more classifications.
- 16. The data center of claim 9, wherein the one or more components are further to: determine a heat map associated with the traffic feature using the one or more machine learning models and based at least on the input data, wherein the representation is further generated based at least on the heat map.
- 17. The data center of claim 9, wherein: the traffic feature includes a road marking represented by the sensor data representation corresponding to the map; the one or more processors are further configured to determine a marking type associated with the road marking based at least on the sensor data representation; and the map is further updated to indicate the marking type.
- 18. The data center of claim 9, wherein the data center is included in or used in combination with at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing one or more simulation operations; a system for performing one or more digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system providing one or more cloud gaming applications; a system for performing one or more deep learning operations; a system implemented using edge devices; a system implemented using a robot; a system for performing one or more generative artificial intelligence (AI) operations; a system for performing operations using one or more large language models (LLMs); a system for performing operations using one or more vision language models (VLMs); a system for performing operations using one or more multimodal language models; a system for performing one or more conversational AI operations; a system for generating synthetic data; a system for presenting at least one of virtual reality content, augmented reality content, or mixed reality content; a system implementing one or more multimodal language models; a system that uses or deploys one or more inference microservices; a system that deploys one or more machine learning models in a service or microservice together with an operating system (OS)-level virtualization package (e.g., a container); a system comprising one or more virtual machines (VMs); a system implemented at least in part in a data center; or a system implemented at least in part using cloud computing resources.
- 19. One or more processors, comprising: processing circuitry to generate a line representation associated with a traffic feature represented by a map, wherein the line representation is generated based at least on: one or more encoders of one or more machine learning models generating one or more input markers associated with one or more first points of the traffic feature and one or more image embeddings associated with an image of the traffic feature; and one or more decoders of the one or more machine learning models processing the one or more input markers and the one or more image embeddings to determine one or more second points associated with the line representation.
- 20. The one or more processors of claim 19, wherein the one or more processors are included in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing one or more simulation operations; a system for performing one or more digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system providing one or more cloud gaming applications; a system for performing one or more deep learning operations; a system implemented using edge devices; a system implemented using a robot; a system for performing one or more generative AI operations; a system for performing operations using one or more large language models (LLMs); a system for performing operations using one or more vision language models (VLMs); a system for performing operations using one or more multimodal language models; a system for performing one or more conversational AI operations; a system for generating synthetic data; a system for presenting at least one of virtual reality content, augmented reality content, or mixed reality content; a system implementing one or more multimodal language models; a system that uses or deploys one or more inference microservices; a system that deploys one or more machine learning models in a service or microservice together with an OS-level virtualization package (e.g., a container); a system comprising one or more virtual machines (VMs); a system implemented at least in part in a data center; or a system implemented at least in part using cloud computing resources.
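The data flow recited in claims 1 and 3 can be illustrated with a minimal Python sketch. It is not the claimed implementation: `MapTile`, `stub_model`, `generate_points`, and `update_map` are hypothetical names, and `stub_model` is a placeholder that merely extends the last segment linearly in place of a trained model (it also ignores image embeddings). Including the first points in the stored polyline is likewise an assumption. The "markers" of the claims read like the tokens of a transformer-style model, but that reading is an assumption as well, so the sketch works directly on point coordinates.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (x, y) coordinates in the map image


@dataclass
class MapTile:
    """Hypothetical container for one map image and its annotations."""
    image_id: str
    labels: List[dict] = field(default_factory=list)


def stub_model(context: List[Point], count: int) -> List[Point]:
    """Placeholder for the trained model (assumption, not the real model).

    Continues the direction of the last two context points linearly.
    """
    if len(context) < 2:
        raise ValueError("the stub needs at least two context points")
    (x0, y0), (x1, y1) = context[-2], context[-1]
    dx, dy = x1 - x0, y1 - y0
    return [(x1 + dx * k, y1 + dy * k) for k in range(1, count + 1)]


def generate_points(
    first_points: List[Point],
    model: Callable[[List[Point], int], List[Point]],
    total: int,
    batch: int,
) -> List[Point]:
    """Generate `total` second points in portions of at most `batch` points.

    Each portion is fed back as context, so later portions depend on
    earlier outputs, in the spirit of claim 3.
    """
    context = list(first_points)
    second_points: List[Point] = []
    while len(second_points) < total:
        count = min(batch, total - len(second_points))
        portion = model(context, count)
        second_points.extend(portion)
        context.extend(portion)
    return second_points


def update_map(tile: MapTile, first_points: List[Point],
               second_points: List[Point], marking_type: str) -> MapTile:
    """Store a polyline label on the tile (first points included by assumption)."""
    polyline = list(first_points) + list(second_points)
    tile.labels.append({"type": marking_type, "polyline": polyline})
    return tile


if __name__ == "__main__":
    prompts = [(0.0, 0.0), (1.0, 0.5)]
    points = generate_points(prompts, stub_model, total=5, batch=2)
    print(update_map(MapTile("tile_0"), prompts, points, "lane_line").labels)
```

Passing the model as a callable keeps the loop independent of how points are actually predicted, so a real model could be substituted for the stub without changing the surrounding code.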
Description
Feature detection model for autonomous and semi-autonomous systems and applications
Background
For a vehicle (e.g., an autonomous vehicle, a semi-autonomous vehicle, a robot, etc.) to operate safely in an environment, the vehicle must be able to efficiently perform vehicle maneuvers, such as lane keeping, lane changing, lane splitting, turning, stopping and starting at intersections, crosswalks, etc., and/or other vehicle or machine maneuvers. For example, to navigate surface streets (e.g., city streets, side streets, neighborhood streets, etc.) and highways (e.g., multi-lane roads), the vehicle needs to travel within one or more divisions or demarcations (e.g., lanes, intersections, crosswalks, boundaries, etc.) of the road, which are typically marked using traffic features, such as road markings that include arrows, text, graphics, and/or other types of content. It is therefore important that the vehicle be able to detect traffic features in the environment so that it can determine how to drive based on the rules associated with those traffic features.
To detect traffic features, a vehicle may rely, at least in part, on a map corresponding to the environment in which the vehicle is traveling. For example, the map may be annotated to indicate the locations of important traffic features that the vehicle needs to identify while traveling, such as road edges, road markings, traffic signs, and the like. Some conventional methods for annotating such maps have a user view portions of the map and manually enter labels for traffic features. For example, for a given length of road marking, a user may manually indicate the location of the road marking by selecting a plurality of points (e.g., hundreds and/or thousands of points) located along the road marking.
Having a user manually indicate the locations of the traffic features represented by the map may be time consuming, prone to user error, and/or require significant computing resources (e.g., multiple user devices). For road markings in particular, other conventional methods may therefore use curve fitting functions to connect existing road markings that have already been annotated on a map. For example, if a user has separately annotated a first portion and a second portion of a road marking, these conventional methods would use only a curve fitting function to join the two portions together. However, because they simply connect existing road markings using a curve fitting function, these conventional methods may be accurate for straight road markings but inaccurate for road markings that include one or more curves. Further, because these conventional methods operate on the entire map, the generated road marking annotations may not be aligned when the map is divided into sub-portions (e.g., images), such as when providing the map to a vehicle for navigation.
Disclosure of Invention
Embodiments of the present disclosure relate to feature detection models for autonomous and/or semi-autonomous systems and applications. The systems and methods described herein may use one or more trained machine learning models (one or more models) to automatically generate representations of traffic features corresponding to a map, such as road markings and/or road edges. For example, the one or more models may take as input an image representing at least a portion of a map that includes one or more traffic features, together with one or more indications of one or more points (e.g., one or more prompts) associated with the one or more traffic features represented by the image.
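As a rough illustration of the two inputs just described (an image covering part of a map, plus points on a traffic feature), the following Python sketch converts map-frame points into normalized coordinates for one map sub-portion. The disclosure does not specify this step, so `TileSpec`, `map_to_pixel`, `build_prompts`, the meters-per-pixel scale, the downward-increasing y axis, and the normalization to the range [0, 1) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class TileSpec:
    """Hypothetical description of one map sub-portion rendered as an image."""
    origin_x: float          # map x of the tile's left edge, in meters
    origin_y: float          # map y of the tile's top edge, in meters
    meters_per_pixel: float  # assumed uniform scale of the rendered image
    width_px: int
    height_px: int


def map_to_pixel(point: Point, tile: TileSpec) -> Point:
    """Convert a map-frame point to pixel coordinates in the tile image.

    Assumes both axes increase in the same direction in map and image frames.
    """
    x = (point[0] - tile.origin_x) / tile.meters_per_pixel
    y = (point[1] - tile.origin_y) / tile.meters_per_pixel
    return (x, y)


def build_prompts(points_map: List[Point], tile: TileSpec) -> List[Point]:
    """Keep the points that fall inside the tile, normalized to [0, 1)."""
    prompts: List[Point] = []
    for point in points_map:
        x, y = map_to_pixel(point, tile)
        if 0 <= x < tile.width_px and 0 <= y < tile.height_px:
            prompts.append((x / tile.width_px, y / tile.height_px))
    return prompts


if __name__ == "__main__":
    tile = TileSpec(origin_x=100.0, origin_y=200.0, meters_per_pixel=0.5,
                    width_px=512, height_px=512)
    # The second point lies outside the tile and is dropped.
    print(build_prompts([(110.0, 210.0), (400.0, 210.0)], tile))
```

Expressing the points relative to each image, rather than to the whole map, is one way the inputs could be kept consistent when the map is divided into sub-portions.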
Based at least on processing these inputs, the one or more models may generate and/or output data representing additional points and/or heat maps (which represent one or more lines corresponding to the one or more traffic features) associated with the one or more traffic features. In some examples, the one or more models and/or another post-processing component may then use the output (e.g., a line representation of the road marking and/or road edge) to determine one or more final representations of the one or more traffic features, which may then be used to annotate the map. In contrast to conventional systems, in some embodiments, the system of the present disclosure is able to automatically determine the locations of traffic features represented by a map using prompts and/or input images. As such, the system of the present disclosure does not require the user to manually enter all points (e.g., hundreds and/or thousands of points) of a traffic feature when annotating the map. Further, as described in more detail herein, the one or more models may be trained to determine a plurality of points (e.g., up to one hundred or more points) associated with traffic features, which are then used to determine a final representation (e.g., a line represe