US-20260127963-A1 - TRAFFIC LIGHT CLASSIFICATION FOR AUTONOMOUS OR SEMI-AUTONOMOUS SYSTEMS AND APPLICATIONS


Abstract

In various examples, the systems and methods of the present disclosure may train and use machine learning models to determine attributes and, in some instances, classifications associated with traffic lights to determine traffic rules for operating a machine (e.g., an autonomous or semi-autonomous machine or vehicle) in an environment. For instance, an image depicting a traffic light device may be applied to a machine learning model that includes a plurality of component heads. Each of the component heads may be trained to detect different attributes and/or combinations of attributes associated with the traffic light device. Additionally, in some examples, the machine learning model may include a fusion head that is trained to classify the traffic light device. For instance, the fusion head may classify the traffic light device using the detected attributes and/or using a combined feature vector of multiple feature vectors applied to the plurality of component heads.

Inventors

Rui Shen
Dong Zhang

Assignees

NVIDIA CORPORATION

Dates

Publication Date: 20260507
Application Date: 20241105

Claims (20)

1. A method comprising: applying, as input to one or more deep neural networks (DNNs), image data representing an image depicting one or more traffic light devices; determining, using one or more component heads of the one or more DNNs, one or more attributes associated with the one or more traffic light devices; determining, using a fusion head of the one or more DNNs and based at least on the one or more attributes, one or more classifications associated with the one or more traffic light devices; and causing a machine to perform one or more control operations based on at least one of the one or more attributes or the one or more classifications associated with the one or more traffic light devices.
2. The method of claim 1, wherein the one or more attributes associated with the one or more traffic light devices include at least one of: one or more orientations of the one or more traffic light devices; one or more housing shapes of the one or more traffic light devices; one or more active bulb colors of the one or more traffic light devices; or one or more active bulb shapes of the one or more traffic light devices.
3. The method of claim 1, wherein the one or more component heads include at least a component head that is to output a combination of detected attributes associated with the one or more traffic light devices.
4. The method of claim 3, wherein the detected attributes of the combination include at least one or more color and shape combinations of one or more active bulbs of the one or more traffic light devices.
5. The method of claim 1, wherein the one or more classifications associated with the one or more traffic light devices include at least a subset of the one or more attributes, the one or more classifications determined using the fusion head based at least on a combination of the one or more attributes.
6. The method of claim 1, further comprising: generating, based at least on the one or more DNNs processing the image data, one or more component feature vectors corresponding to the one or more traffic light devices depicted in the image, wherein: the one or more attributes are determined using the one or more component heads based at least on applying the one or more component feature vectors to the one or more component heads, and the one or more attributes are determined using the fusion head based at least on applying, to the fusion head, a combined feature vector including a combination of the one or more component feature vectors.
7. The method of claim 1, wherein the one or more component heads include at least a first component head and a second component head, the first component head to classify one or more first attributes of the one or more traffic light devices and the second component head to classify one or more second attributes of the one or more traffic light devices.
8. A system comprising: one or more processors to: determine, based at least on one or more first layers of a machine learning model processing sensor data obtained using one or more sensors having fields of view or sensory fields including a traffic light device, first data corresponding to one or more first attributes associated with the traffic light device; determine, based at least on one or more second layers of the machine learning model processing the first data, second data corresponding to one or more second attributes associated with the traffic light device; and perform one or more operations associated with a machine based on at least one of the one or more first attributes or the one or more second attributes.
9. The system of claim 8, wherein at least one of the one or more first attributes or the one or more second attributes include at least one of: an orientation of the traffic light device; a housing shape of the traffic light device; active bulb colors of the traffic light device; active bulb shapes of the traffic light device; a bulb count of the traffic light device; a road user of the traffic light device; or a blinking state of the traffic light device.
10. The system of claim 8, the one or more processors further to: determine, using one or more fusion layers of the machine learning model, a classification associated with the traffic light device, wherein the performance of the one or more operations associated with the machine is further based at least on the classification.
11. The system of claim 10, wherein the determination of the classification associated with the traffic light device is based on a combination of at least one of: the one or more first attributes and the one or more second attributes; or a first feature vector and one or more second feature vectors, the first feature vector applied as input to the one or more first layers and the one or more second feature vectors applied as input to the one or more second layers.
12. The system of claim 8, wherein the one or more first attributes determined using the one or more first layers include at least one or more color and shape combinations of one or more active bulbs of the traffic light device.
13. The system of claim 8, wherein at least one of the one or more first attributes or the one or more second attributes include one or more housing shapes associated with the traffic light device, the one or more housing shapes corresponding to at least one of: a vertical housing shape; a horizontal housing shape; a doghouse housing shape; or a pedestrian hybrid beacon housing shape.
14. The system of claim 8, wherein the machine learning model is trained, at least, by: obtaining an image depicting a second traffic light device; updating one or more portions of the image to generate an updated image depicting the second traffic light device having one or more updated attributes; and updating one or more parameters associated with the one or more first layers or the one or more second layers of the machine learning model based at least on applying the updated image as a training input to the machine learning model.
15. The system of claim 8, wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing one or more simulation operations; a system for performing one or more digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing one or more deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing one or more generative AI operations; a system for performing operations using a large language model; a system for performing operations using one or more vision language models (VLMs); a system for performing operations using one or more multi-modal language models; a system for using or deploying one or more inference microservices; a system for performing one or more conversational AI operations; a system for generating synthetic data; a system for presenting at least one of virtual reality content, augmented reality content, or mixed reality content; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.
16. One or more processors comprising: processing circuitry to perform one or more operations corresponding to a machine based at least on one or more attributes associated with a first traffic light device, the one or more attributes determined using one or more machine learning models, the one or more machine learning models trained, at least, by: obtaining an image depicting a second traffic light device having one or more first attributes; updating one or more portions of the image to generate an updated image, the updated image depicting the second traffic light device having one or more second attributes; and updating one or more parameters associated with one or more component heads of the one or more machine learning models based at least on applying the updated image as a training input to the one or more machine learning models.
17. The one or more processors of claim 16, wherein the updating of the one or more portions of the image to generate the updated image comprises modifying one or more values of one or more pixels of the image corresponding to one or more active bulbs of the second traffic light device, wherein the one or more values are modified such that the one or more active bulbs are depicted as non-active bulbs in the updated image.
18. The one or more processors of claim 16, wherein the image depicts the second traffic light device in a first state and the updating of the one or more portions of the image to generate the updated image comprises updating the image such that the updated image depicts the second traffic light device in a second state that is different from the first state.
19. The one or more processors of claim 16, wherein the updating of the one or more portions of the image to generate the updated image comprises at least one of: updating one or more shapes of one or more bulbs of the second traffic light device; updating an orientation of the second traffic light device; updating a housing shape of the second traffic light device; or updating a number of bulbs associated with the second traffic light device.
20. The one or more processors of claim 16, wherein the one or more processors are comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing one or more simulation operations; a system for performing one or more digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing one or more deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing one or more generative AI operations; a system for performing operations using a large language model; a system for performing operations using one or more vision language models (VLMs); a system for performing operations using one or more multi-modal language models; a system for using or deploying one or more inference microservices; a system for performing one or more conversational AI operations; a system for generating synthetic data; a system for presenting at least one of virtual reality content, augmented reality content, or mixed reality content; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.
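
The image-updating step recited in the training claims (modifying pixel values so that an active bulb is depicted as non-active) may be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the disclosure: the function `deactivate_bulb`, the bounding-box convention, the `dim_percent` parameter, the label key `active_bulb_color`, and the choice of simple darkening as the pixel modification.

```python
def deactivate_bulb(image, box, dim_percent=20):
    """Return a copy of `image` with the pixels inside `box` darkened.

    image: list of rows, each row a list of (r, g, b) integer tuples (0..255).
    box: (top, left, bottom, right), with bottom and right exclusive.
    dim_percent: hypothetical knob; percentage of brightness that is kept.
    """
    top, left, bottom, right = box
    updated = [list(row) for row in image]  # copy rows; the input stays intact
    for y in range(top, bottom):
        for x in range(left, right):
            r, g, b = updated[y][x]
            # Integer scaling keeps channel values as ints in 0..255.
            updated[y][x] = (
                r * dim_percent // 100,
                g * dim_percent // 100,
                b * dim_percent // 100,
            )
    return updated


# A tiny 4x4 all-red stand-in for a traffic light crop with one lit bulb.
crop = [[(255, 0, 0)] * 4 for _ in range(4)]
labels = {"active_bulb_color": "red"}  # hypothetical "first attributes"

# Darken the bulb region and relabel the sample with "second attributes".
updated_crop = deactivate_bulb(crop, box=(1, 1, 3, 3))
updated_labels = dict(labels, active_bulb_color="none")
```

In a training pipeline, the pair `(updated_crop, updated_labels)` would then be supplied as an additional training input, so that the component heads see the same housing in a different state.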

Description

BACKGROUND

For an autonomous or semi-autonomous vehicle to navigate safely through an environment, the vehicle may, at times, need to correctly determine states of traffic lights. This capability may help ensure that the vehicle understands the traffic rules currently in place at a certain location or junction. However, the physical characteristics (e.g., appearance) of traffic lights generally vary from one geographic region to another. For instance, across different geographic regions, traffic lights may include a variety of different shapes, orientations, numbers of light bulbs, colors of light bulbs, shapes of light bulbs, and/or other physical features. As such, correctly identifying the traffic rules conveyed by the state of a specific traffic light may be difficult across a wide variety of scenarios.

SUMMARY

Embodiments of the present disclosure relate to traffic light classification for autonomous or semi-autonomous systems and applications. Systems and methods are disclosed that may train and use machine learning models to determine attributes and, in some instances, classifications associated with traffic lights to determine traffic rules for operating a machine (e.g., an autonomous or semi-autonomous machine or vehicle) in an environment. For instance, an image depicting a traffic light device may be applied to a machine learning model that includes a plurality of component heads. Each component head of the plurality of component heads may be trained to detect different attributes and/or combinations of attributes associated with the traffic light device, such as active bulb colors and/or shapes, number of bulbs, housing orientation, and/or any other attributes. In some examples, the machine learning model may include a fusion head that is trained to classify the traffic light device.
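
The arrangement of component heads plus a fusion head described above may be sketched in a few lines. This is a toy illustration under stated assumptions, not the disclosed implementation: the class `LinearHead`, the function `classify`, the attribute label sets, the fusion class names, the feature dimension, the random (untrained) weights, and the use of plain concatenation to form the combined feature vector are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearHead:
    """One linear layer plus softmax; a stand-in for a trained head."""

    def __init__(self, in_dim, labels, rng):
        self.labels = list(labels)
        # Random weights only; a real head would be learned from data.
        self.weights = [[rng.uniform(-1, 1) for _ in range(in_dim)]
                        for _ in self.labels]

    def __call__(self, features):
        logits = [sum(w * x for w, x in zip(row, features))
                  for row in self.weights]
        return softmax(logits)

    def predict(self, features):
        probs = self(features)
        return self.labels[max(range(len(probs)), key=probs.__getitem__)]


def classify(component_features, component_heads, fusion_head):
    """Run each component head, then the fusion head on the combined vector."""
    attributes = {name: component_heads[name].predict(vec)
                  for name, vec in component_features.items()}
    # Assumed combination rule: concatenate vectors in a fixed name order.
    combined = [x for name in sorted(component_features)
                for x in component_features[name]]
    return attributes, fusion_head.predict(combined)


# Hypothetical label sets and sizes, chosen only for this example.
ATTRIBUTE_LABELS = {
    "orientation": ["vertical", "horizontal"],
    "bulb_color": ["red", "yellow", "green"],
    "bulb_shape": ["circle", "arrow"],
}
FUSION_CLASSES = ["red_circle", "yellow_circle", "green_circle", "green_arrow"]
FEATURE_DIM = 4

rng = random.Random(0)
component_heads = {name: LinearHead(FEATURE_DIM, labels, rng)
                   for name, labels in ATTRIBUTE_LABELS.items()}
fusion_head = LinearHead(FEATURE_DIM * len(ATTRIBUTE_LABELS), FUSION_CLASSES, rng)

# Stand-ins for per-component feature vectors produced by a shared backbone.
component_features = {name: [rng.uniform(-1, 1) for _ in range(FEATURE_DIM)]
                      for name in ATTRIBUTE_LABELS}
attributes, traffic_light_class = classify(
    component_features, component_heads, fusion_head)
```

Here `attributes` holds one predicted label per component head and `traffic_light_class` holds the fusion head's prediction. Because both are computed from the same feature vectors, a downstream consumer could compare them as a consistency check.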
For instance, the fusion head may classify the traffic light device using the detected attributes or embeddings from the plurality of component heads, and/or using a combined feature vector of multiple feature vectors applied to the plurality of component heads. Using the detected attributes and/or the classification of the traffic light device, the systems of the present disclosure may cause the machine to perform one or more control operations.

In contrast to conventional systems, the systems of the present disclosure, in some embodiments, are able to use a multi-component machine learning architecture to classify each component (e.g., where each component may represent one or more attributes of a traffic light) and, in some instances, use a fusion classifier to fuse the features from each component head to predict a final class of the traffic light. For instance, the systems of the present disclosure may use a multi-component machine learning model to decompose a traffic light into multiple components, where active bulb state may be one of the components, and then a fusion head within the model may be used to predict the final traffic light class by combining all these components, allowing cross-checks between components and removing or reducing post-processing, in some instances. Additionally, in contrast to conventional systems, the systems of the present disclosure may apply implicit negative training targets to the machine learning models during training for each negative sample, and map the negative samples to a uniform distribution, which allows the models to better distinguish valid samples from unknown or invalid samples, thereby reducing false-positive activations in each component and in fusion.

BRIEF DESCRIPTION OF THE DRAWINGS

The present systems and methods for traffic light classification for autonomous or semi-autonomous systems and applications are described in detail below with reference to the attached drawing figures, wherein:

FIG. 1 is a data flow diagram illustrating an example of a process for traffic light classification using a multi-component machine learning model, in accordance with some embodiments of the present disclosure;

FIG. 2 is an illustration of an image crop depicting a traffic light, the image crop obtained from image data representing an image of an environment, in accordance with some embodiments of the present disclosure;

FIG. 3 illustrates various examples of traffic light configurations and states that may be classified using a multi-component machine learning model, in accordance with some embodiments of the present disclosure;

FIG. 4 illustrates an example of using a multi-component machine learning model architecture to predict attributes and/or classifications of traffic lights from image data, in accordance with some embodiments of the present disclosure;

FIG. 5 is a data flow diagram illustrating an example process for training one or more machine learning models to predict traffic light classes and/or attributes, in accordance with some embodiments of the present disclosure;

FIG. 6 illustrates an example of a system that may perform one or more of the processes de