CN-121999265-A - Traffic light classification for autonomous or semi-autonomous systems and applications


Abstract

The present disclosure relates to traffic light classification for autonomous or semi-autonomous systems and applications. In various examples, the systems and methods of the present disclosure may train and use machine learning models to determine attributes associated with traffic lights and, in some cases, classifications, in order to determine traffic rules for operating machines (e.g., autonomous or semi-autonomous machines or vehicles) in an environment. For example, an image depicting a traffic light apparatus may be applied to a machine learning model that includes a plurality of component heads. Each of the component heads may be trained to detect different attributes and/or combinations of attributes associated with the traffic light apparatus. Further, in some examples, the machine learning model may include a fusion head trained to classify traffic light apparatuses. For example, the fusion head may classify traffic light apparatuses using the detected attributes and/or using a combined feature vector formed from the plurality of feature vectors applied to the plurality of component heads.
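The component-head and fusion-head arrangement described in the abstract can be illustrated with a minimal sketch. It assumes that each head is a single linear layer followed by a softmax, that per-component feature vectors have already been produced by a backbone (random stand-ins here), and that the fusion head classifies the concatenation of those vectors (the "combined feature vector" variant). The label sets, the final classes, and all names (`HEAD_LABELS`, `FUSION_LABELS`, `MultiComponentClassifier`) are hypothetical illustrations, not the disclosed implementation.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical label sets: the disclosure names attribute types such as
# orientation and active bulb color/shape, but not these exact vocabularies.
HEAD_LABELS: Dict[str, List[str]] = {
    "orientation": ["vertical", "horizontal"],
    "active_bulb_color": ["red", "yellow", "green", "none"],
    "active_bulb_shape": ["circle", "left_arrow", "right_arrow", "none"],
}
# Hypothetical final classes predicted by the fusion head.
FUSION_LABELS: List[str] = ["stop", "caution", "go", "unknown"]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: List[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)


class LinearHead:
    """One fully connected layer mapping a feature vector to class logits."""

    def __init__(self, in_dim: int, out_dim: int, rng: random.Random) -> None:
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, x: List[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]


class MultiComponentClassifier:
    """One component head per attribute plus a fusion head over the combined features."""

    def __init__(self, feat_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.component_heads = {name: LinearHead(feat_dim, len(labels), rng)
                                for name, labels in HEAD_LABELS.items()}
        # The fusion head consumes the concatenated (combined) feature vector.
        self.fusion_head = LinearHead(feat_dim * len(HEAD_LABELS), len(FUSION_LABELS), rng)

    def __call__(self, features: Dict[str, List[float]]) -> Tuple[Dict[str, str], str, List[float]]:
        # Each component head predicts one attribute from its own feature vector.
        attributes = {name: HEAD_LABELS[name][argmax(softmax(head(features[name])))]
                      for name, head in self.component_heads.items()}
        combined = [v for name in HEAD_LABELS for v in features[name]]
        fusion_probs = softmax(self.fusion_head(combined))
        return attributes, FUSION_LABELS[argmax(fusion_probs)], fusion_probs


# Usage with random stand-in features (a real system would take them from a DNN backbone).
model = MultiComponentClassifier(feat_dim=8, seed=42)
feat_rng = random.Random(1)
features = {name: [feat_rng.gauss(0.0, 1.0) for _ in range(8)] for name in HEAD_LABELS}
attrs, cls, probs = model(features)
print(attrs, cls)
```

Because the weights are random, the predicted labels carry no meaning; the sketch only shows how per-attribute predictions and a fused class come out of a single forward pass. For the other variant mentioned, where the fusion head uses the detected attributes, the component heads' outputs would be fed to the fusion head instead of the raw concatenated features.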

Inventors

SHEN RUI
ZHANG DONG

Assignees

NVIDIA Corporation (辉达公司)

Dates

Publication Date: 20260508
Application Date: 20251104
Priority Date: 20241105

Claims (20)

1. A method, comprising: applying image data representing an image depicting one or more traffic light apparatuses as input to one or more deep neural networks (DNNs); determining one or more attributes associated with the one or more traffic light apparatuses using one or more component heads of the one or more DNNs; determining one or more classifications associated with the one or more traffic light apparatuses using a fusion head of the one or more DNNs and based at least on the one or more attributes; and causing a machine to perform one or more control operations based at least on at least one of the one or more attributes or the one or more classifications associated with the one or more traffic light apparatuses.
2. The method of claim 1, wherein the one or more attributes associated with the one or more traffic light apparatuses comprise at least one of: one or more orientations of the one or more traffic light apparatuses; one or more housing shapes of the one or more traffic light apparatuses; one or more active bulb colors of the one or more traffic light apparatuses; or one or more active bulb shapes of the one or more traffic light apparatuses.
3. The method of claim 1, wherein the one or more component heads comprise at least a component head for outputting a combination of detected attributes associated with the one or more traffic light apparatuses.
4. The method of claim 3, wherein the detected attributes in the combination include at least one or more color and shape combinations of one or more active light bulbs of the one or more traffic light apparatuses.
5. The method of claim 1, wherein the one or more classifications associated with the one or more traffic light apparatuses include at least a subset of the one or more attributes, the one or more classifications being determined using the fusion head based at least on a combination of the one or more attributes.
6. The method of claim 1, further comprising generating, based at least on processing the image data using the one or more DNNs, one or more component feature vectors corresponding to the one or more traffic light apparatuses depicted in the image, wherein: the one or more attributes are determined using the one or more component heads based at least on applying the one or more component feature vectors to the one or more component heads; and the one or more classifications are determined using the fusion head based at least on applying, to the fusion head, a combined feature vector comprising a combination of the one or more component feature vectors.
7. The method of claim 1, wherein the one or more component heads include at least a first component head for classifying one or more first attributes of the one or more traffic light apparatuses and a second component head for classifying one or more second attributes of the one or more traffic light apparatuses.
8. A system, comprising: one or more processors to: determine first data corresponding to one or more first attributes associated with a traffic light apparatus based at least on processing, using one or more first layers of a machine learning model, sensor data obtained using one or more sensors having a field of view or sensory field that includes the traffic light apparatus; determine, based at least on processing the first data using one or more second layers of the machine learning model, second data corresponding to one or more second attributes associated with the traffic light apparatus; and perform one or more operations associated with a machine based at least on at least one of the one or more first attributes or the one or more second attributes.
9. The system of claim 8, wherein at least one of the one or more first attributes or the one or more second attributes comprises at least one of: an orientation of the traffic light apparatus; a housing shape of the traffic light apparatus; an active bulb color of the traffic light apparatus; an active bulb shape of the traffic light apparatus; a number of bulbs of the traffic light apparatus; one or more road users to which the traffic light apparatus applies; or a flashing state of the traffic light apparatus.
10. The system of claim 8, wherein the one or more processors are further to determine, using one or more fusion layers of the machine learning model, a classification associated with the traffic light apparatus, wherein performing the one or more operations associated with the machine is further based at least on the classification.
11. The system of claim 10, wherein determining the classification associated with the traffic light apparatus is based on a combination of at least one of: the one or more first attributes and the one or more second attributes; or a first feature vector and one or more second feature vectors, the first feature vector being applied as input to the one or more first layers and the one or more second feature vectors being applied as input to the one or more second layers.
12. The system of claim 8, wherein the one or more first attributes determined using the one or more first layers comprise at least one or more color and shape combinations of one or more active light bulbs of the traffic light apparatus.
13. The system of claim 8, wherein at least one of the one or more first attributes or the one or more second attributes comprises one or more housing shapes associated with the traffic light apparatus, the one or more housing shapes corresponding to at least one of: a vertical housing shape; a horizontal housing shape; a doghouse housing shape; or a pedestrian hybrid beacon housing shape.
14. The system of claim 8, wherein the machine learning model is trained at least by: obtaining an image depicting a second traffic light apparatus; updating one or more portions of the image to generate an updated image depicting the second traffic light apparatus having one or more updated attributes; and updating one or more parameters associated with the one or more first layers or the one or more second layers of the machine learning model based at least on applying the updated image to the machine learning model as training input.
15. The system of claim 8, wherein the system is included in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing one or more simulation operations; a system for performing one or more digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing one or more deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing one or more generative AI operations; a system for performing operations using a large language model; a system for performing operations using one or more vision language models (VLMs); a system for performing operations using one or more multimodal language models; a system for deploying or using one or more inference microservices; a system for performing one or more conversational AI operations; a system for generating synthetic data; a system for presenting at least one of virtual reality content, augmented reality content, or mixed reality content; a system comprising one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.
16. One or more processors comprising: processing circuitry to perform one or more operations corresponding to a machine based at least on one or more attributes associated with a first traffic light apparatus, the one or more attributes determined using one or more machine learning models trained at least by: obtaining an image depicting a second traffic light apparatus having one or more first attributes; updating one or more portions of the image to generate an updated image depicting the second traffic light apparatus having one or more second attributes; and updating one or more parameters associated with one or more component heads of the one or more machine learning models based at least on applying the updated image as training input to the one or more machine learning models.
17. The one or more processors of claim 16, wherein updating the one or more portions of the image to generate the updated image comprises modifying one or more values of one or more pixels of the image corresponding to one or more active light bulbs of the second traffic light apparatus, wherein the one or more values are modified such that the one or more active light bulbs are depicted as inactive light bulbs in the updated image.
18. The one or more processors of claim 16, wherein the image depicts the second traffic light apparatus in a first state, and updating the one or more portions of the image to generate the updated image comprises updating the image such that the updated image depicts the second traffic light apparatus in a second state different from the first state.
19. The one or more processors of claim 16, wherein updating the one or more portions of the image to generate the updated image comprises at least one of: updating one or more shapes of one or more bulbs of the second traffic light apparatus; updating an orientation of the second traffic light apparatus; updating a housing shape of the second traffic light apparatus; or updating a number of bulbs associated with the second traffic light apparatus.
20. The one or more processors of claim 16, wherein the one or more processors are included in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing one or more simulation operations; a system for performing one or more digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing one or more deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing one or more generative AI operations; a system for performing operations using a large language model; a system for performing operations using one or more vision language models (VLMs); a system for performing operations using one or more multimodal language models; a system for deploying or using one or more inference microservices; a system for performing one or more conversational AI operations; a system for generating synthetic data; a system for presenting at least one of virtual reality content, augmented reality content, or mixed reality content; a system comprising one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.
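Claims 14 and 16 to 19 describe training on edited images, for example making an active bulb appear inactive (claim 17) or changing the orientation of the apparatus (claim 19). The following is a minimal sketch under stated assumptions: images are nested lists of RGB tuples, an inactive bulb is approximated by desaturating and dimming the bulb region, an orientation change is approximated by a 90-degree rotation, and the label keys and function names (`deactivate_bulb`, `rotate_clockwise`, `augment`) are hypothetical. The attribute labels are updated together with the pixels so that the updated image can serve as a consistent training input.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # image[row][col] -> (R, G, B)


def deactivate_bulb(image: Image, box: Tuple[int, int, int, int], dim: float = 0.2) -> Image:
    """Return a copy of `image` whose pixels inside `box` look like an unlit bulb.

    `box` is (top, left, bottom, right) with exclusive bottom/right bounds.
    Assumption: an unlit lens is approximated as a dimmed gray pixel.
    """
    top, left, bottom, right = box
    out = [list(row) for row in image]  # copy rows so the input is not modified
    for r in range(top, bottom):
        for c in range(left, right):
            red, green, blue = out[r][c]
            gray = int((red + green + blue) / 3 * dim)
            out[r][c] = (gray, gray, gray)
    return out


def rotate_clockwise(image: Image) -> Image:
    """Rotate 90 degrees clockwise, e.g. turning a vertical housing horizontal."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image: Image, labels: Dict[str, str],
            bulb_box: Tuple[int, int, int, int]) -> Tuple[Image, Dict[str, str]]:
    """Edit the pixels and update the hypothetical attribute labels to match."""
    new_image = rotate_clockwise(deactivate_bulb(image, bulb_box))
    new_labels = dict(labels)
    new_labels["active_bulb_color"] = "none"
    new_labels["orientation"] = (
        "horizontal" if labels["orientation"] == "vertical" else "vertical")
    return new_image, new_labels


# A toy 6x2 vertical signal whose top bulb (rows 0-1) is lit red.
RED, DARK = (230, 30, 30), (20, 20, 20)
image = [[RED, RED], [RED, RED]] + [[DARK, DARK] for _ in range(4)]
labels = {"orientation": "vertical", "active_bulb_color": "red"}
aug_image, aug_labels = augment(image, labels, bulb_box=(0, 0, 2, 2))
print(len(aug_image), len(aug_image[0]), aug_labels)
```

A production pipeline would more likely operate on image tensors and render a more realistic unlit lens; the point of the sketch is only that each pixel edit is paired with the corresponding label edit.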

Description

Traffic light classification for autonomous or semi-autonomous systems and applications

Background

For an autonomous or semi-autonomous vehicle to travel safely through an environment, the vehicle may need to correctly determine the state of traffic lights. This capability helps ensure that the vehicle knows which traffic rules are currently in effect at a location or intersection. However, the physical characteristics (e.g., appearance) of traffic signals typically vary from one geographic area to another. For example, traffic signals in different geographic areas may have different housing shapes, orientations, numbers of bulbs, bulb colors, bulb shapes, and/or other physical features. As a result, across a wide variety of scenarios, it may be difficult to correctly identify the traffic rules conveyed by the state of a particular traffic light.

Disclosure of Invention

Embodiments of the present disclosure relate to traffic light classification for autonomous or semi-autonomous systems and applications. Systems and methods are disclosed that can train and use machine learning models to determine attributes associated with traffic signals, and in some cases classifications associated with traffic signals, in order to determine traffic rules for operating machines (e.g., autonomous or semi-autonomous machines or vehicles) in an environment. For example, an image depicting a traffic light apparatus may be applied to a machine learning model that includes a plurality of component heads. Each of the component heads may be trained to detect different attributes and/or combinations of attributes associated with the traffic light apparatus, such as the color and/or shape of active light bulbs, the number of light bulbs, the housing orientation, and/or any other attribute. In some examples, the machine learning model may include a fusion head trained to classify traffic light apparatuses.
For example, the fusion head may classify traffic light apparatuses using attributes or embeddings detected by multiple component heads, and/or using a combined feature vector formed from the feature vectors applied to the multiple component heads. Using the detected attributes and/or classifications of traffic light apparatuses, the system of the present disclosure may cause a machine to perform one or more control operations.

In contrast to conventional systems, the system of the present disclosure may, in some embodiments, use a multi-component machine learning architecture to classify each component separately (e.g., each component may represent one or more attributes of a traffic signal) and, in some cases, use a fusion classifier to fuse the features from each component head to predict a final traffic signal class. For example, the system may use a multi-component machine learning model to decompose a traffic signal into multiple components, where the active bulb state may be one of the components, and a fusion head within the model may then predict the final traffic signal class by combining all of these components. This allows the components to be cross-checked against one another and, in some cases, eliminates or reduces post-processing. Furthermore, in contrast to conventional systems, the system may apply an implicit negative training objective to the machine learning model when training on each negative sample, mapping the negative samples to a uniform distribution. This enables the model to better distinguish valid samples from unknown or invalid samples, thereby reducing false-positive activations in each component head and in the fusion head.

Drawings

The present systems and methods for traffic light classification for autonomous or semi-autonomous systems and applications are described in detail below with reference to the attached drawing figures, wherein:

FIG. 1 is a data flow diagram illustrating an example process for traffic light classification using a multi-component machine learning model, according to some embodiments of the present disclosure;
FIG. 2 is a diagram illustrating an image crop depicting a traffic light, the image crop being obtained from image data representing an image of an environment, according to some embodiments of the present disclosure;
FIG. 3 illustrates various examples of traffic light configurations and states that may be classified using a multi-component machine learning model, according to some embodiments of the present disclosure;
FIG. 4 illustrates an example of predicting attributes and/or classifications of traffic signals from image data using a multi-component machine learning model architecture, according to some embodiments of the present disclosure;
FIG. 5 is a data flow diagram illustrating an example process for training one or more machine learning models to predict traffic light categories and/or attributes, according to some embodiments of the present disclosure;
FIG. 6 illustrates an example of