US-20260127858-A1 - Automatic License Plate Layout Generation Using Machine Learning


Abstract

A method or system for improving vehicle identification accuracy. The system receives a guidance prompt specifying elements of a license plate, including a jurisdiction of the license plate. The system uses a first language model to parse the guidance prompt to extract these elements, which are then applied to a pre-trained layout generation model to produce a license plate layout. The layout generation model is trained on real license plate images with labeled bounding boxes identifying each element. The system then uses a second language model to generate a set of condition embeddings based on the license plate layout. The system then applies a diffusion model conditioned on the set of condition embeddings to generate synthetic license plate images, and uses the synthetic license plate images to train or retrain a license plate identification model.
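The first two stages described above (prompt parsing and layout generation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: a regular expression stands in for the first language model, a hand-written lookup table stands in for the pre-trained layout generation model, and the names `TEMPLATES`, `Box`, `parse_prompt`, and `generate_layout`, the prompt format, and all coordinate values are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical template: element name -> (x, y, width, height), each given as a
# fraction of the plate size. The numbers are invented for illustration only.
TEMPLATES = {
    "California": {
        "jurisdiction": (0.25, 0.05, 0.50, 0.20),
        "plate_number": (0.10, 0.30, 0.80, 0.45),
        "month": (0.03, 0.05, 0.12, 0.12),
        "year": (0.85, 0.05, 0.12, 0.12),
    },
}


@dataclass
class Box:
    """One labeled bounding box of the layout, in pixels."""
    element: str
    text: str
    x: int
    y: int
    width: int
    height: int


def parse_prompt(prompt: str) -> dict:
    """Regex stand-in for the first language model: extract 'key: value' pairs."""
    pairs = re.findall(r"(\w+)\s*:\s*([^,;]+)", prompt)
    return {key.lower(): value.strip() for key, value in pairs}


def generate_layout(elements: dict, plate_w: int = 512, plate_h: int = 256) -> list:
    """Lookup stand-in for the layout model: pick the template for the
    jurisdiction, then emit a position and a size for each known element."""
    template = TEMPLATES[elements["jurisdiction"]]
    boxes = []
    for name, text in elements.items():
        if name not in template:
            continue  # the template has no slot for this element
        fx, fy, fw, fh = template[name]
        boxes.append(Box(name, text,
                         round(fx * plate_w), round(fy * plate_h),
                         round(fw * plate_w), round(fh * plate_h)))
    return boxes
```

Calling `generate_layout(parse_prompt("jurisdiction: California, plate_number: 7ABC123"))` returns one `Box` per extracted element. In the described system, the resulting layout would then be passed to the second language model to produce condition embeddings.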

Inventors

Ji Sung HWANG
Anil Kumar Nayak
Long Ngo Hoang Truong

Assignees

METROPOLIS IP HOLDINGS, LLC

Dates

Publication Date: 2026-05-07
Application Date: 2024-11-07

Claims (20)

1. A method for improving vehicle identification accuracy in a vehicle management system, comprising: parsing, by a first language model, a guidance prompt to extract elements of a license plate, wherein the guidance prompt specifies an instruction for generating a synthetic license plate image; applying a pre-trained layout generation model to the elements of the license plate to output a layout of the license plate, wherein the pre-trained layout generation model is trained using real license plate images labeled with bounding boxes for each of the elements; generating, by a second language model, a set of condition embeddings based on the layout of the license plate; generating a synthetic license plate image by applying a generative model conditioned on the set of condition embeddings; and training a license plate identification model using the synthetic license plate image.
2. The method of claim 1, wherein the pre-trained layout generation model is configured to: identify a license plate template based on the guidance prompt; and for each element of the license plate, generate a position of the element on the license plate based on the license plate template; and generate a size of the element on the license plate based on the license plate template.
3. The method of claim 2, wherein the identified license plate template includes one or more of the following elements: a jurisdiction associated with the license plate, a month of registration, a year of registration, and a slogan; and the guidance prompt includes one or more elements corresponding to the one or more elements of the identified license plate template.
4. The method of claim 3, wherein identifying the license plate template is based on the jurisdiction specified in the guidance prompt, and the identified template is associated with the jurisdiction.
5. The method of claim 2, wherein the license plate template includes a two-dimensional coordinate corresponding to a position of an element of the license plate template; and generating the position of the element on the license plate includes generating a two-dimensional coordinate corresponding to the position of the element on the license plate based on the two-dimensional coordinate corresponding to the element of the license plate template.
6. The method of claim 2, wherein the license plate template includes a width and a height corresponding to an element of the license plate template, and generating the size of the element on the license plate includes generating a width and a height corresponding to the size of the element on the license plate based on the width and height corresponding to the element of the license plate template.
7. The method of claim 1, wherein the set of condition embeddings includes: a layout embedding corresponding to the layout of the license plate, and a mask embedding corresponding to one or more regions of the license plate that are to be masked and not to be modified by the generative model.
8. The method of claim 7, further comprising applying dual attention to the guidance prompt based on the mask embedding to generate: a text embedding corresponding to words or sub-words in the guidance prompt, and a character embedding corresponding to characters in the guidance prompt.
9. The method of claim 1, wherein the generative model is a diffusion model trained to: receive a real license plate image as input; encode the real license plate image into a vector; generate a noisy vector by applying forward diffusion to the vector to incrementally add noise, the noisy vector associated with the real license plate image with added noise; generate a denoised vector by applying reverse diffusion to the noisy vector to incrementally remove noise, the denoised vector associated with the synthetic image; and decode the denoised vector to create the synthetic license plate image.
10. The method of claim 1, further comprising: generating random values for the elements of the license plate; and generating the guidance prompt based on the random values of the elements of the license plate.
11. A non-transitory computer-readable medium comprising memory with instructions encoded thereon, the instructions comprising instructions to cause one or more processors to perform steps comprising: parsing, by a first language model, a guidance prompt to extract elements of a license plate, wherein the guidance prompt specifies an instruction for generating a synthetic license plate image; applying a pre-trained layout generation model to the elements of the license plate to output a layout of the license plate, wherein the pre-trained layout generation model is trained using real license plate images labeled with bounding boxes for each of the elements; generating, by a second language model, a set of condition embeddings based on the layout of the license plate; generating a synthetic license plate image by applying a generative model conditioned on the set of condition embeddings; and training a license plate identification model using the synthetic license plate image.
12. The non-transitory computer-readable medium of claim 11, wherein the pre-trained layout generation model is configured to: identify a license plate template based on the guidance prompt; and for each element of the license plate, generate a position of the element on the license plate based on the license plate template; and generate a size of the element on the license plate based on the license plate template.
13. The non-transitory computer-readable medium of claim 12, wherein the identified license plate template includes one or more of the following elements: a jurisdiction associated with the license plate, a month of registration, a year of registration, and a slogan; and the guidance prompt includes one or more elements corresponding to the one or more elements of the identified license plate template.
14. The non-transitory computer-readable medium of claim 13, wherein identifying the license plate template is based on the jurisdiction specified in the guidance prompt, and the identified template is associated with the jurisdiction.
15. The non-transitory computer-readable medium of claim 12, wherein the license plate template includes a two-dimensional coordinate corresponding to a position of an element of the license plate template; and generating the position of the element on the license plate includes generating a two-dimensional coordinate corresponding to the position of the element on the license plate based on the two-dimensional coordinate corresponding to the element of the license plate template.
16. The non-transitory computer-readable medium of claim 12, wherein the license plate template includes a width and a height corresponding to an element of the license plate template, and generating the size of the element on the license plate includes generating a width and a height corresponding to the size of the element on the license plate based on the width and height corresponding to the element of the license plate template.
17. The non-transitory computer-readable medium of claim 11, wherein the set of condition embeddings includes: a layout embedding corresponding to the layout of the license plate, and a mask embedding corresponding to regions of the license plate that are to be masked and not to be modified by the generative model.
18. The non-transitory computer-readable medium of claim 17, further comprising applying dual attention to the guidance prompt based on the mask embedding to generate: a text embedding corresponding to words or sub-words in the guidance prompt, and a character embedding corresponding to characters in the guidance prompt.
19. The non-transitory computer-readable medium of claim 11, wherein the generative model is a diffusion model trained to: receive a real license plate image as input; encode the real license plate image into a vector; apply forward diffusion to the vector to incrementally add noise, generating a noisy vector associated with the real license plate image with added noise; apply reverse diffusion to the noisy vector to produce a denoised vector representing a synthetic license plate image, conditioned on the condition embeddings; and decode the denoised vector to create the synthetic license plate image.
20. A system comprising: memory with instructions encoded thereon; and one or more processors that, when executing the instructions, are caused to perform operations comprising: parsing, by a first language model, a guidance prompt to extract elements of a license plate, wherein the guidance prompt specifies an instruction for generating a synthetic license plate image; applying a pre-trained layout generation model to the elements of the license plate to output a layout of the license plate, wherein the pre-trained layout generation model is trained using real license plate images labeled with bounding boxes for each of the elements; generating, by a second language model, a set of condition embeddings based on the layout of the license plate; generating a synthetic license plate image by applying a diffusion model conditioned on the set of condition embeddings; and training or retraining a license plate identification model using the synthetic license plate image.
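The forward and reverse diffusion steps recited in claims 9 and 19 can be illustrated on a plain list of floats standing in for the encoded image vector. This is a sketch only, assuming a DDPM-style Gaussian process with a linear noise schedule; the claims do not specify a schedule, and all function names and parameter values here are hypothetical. The reverse step is shown with the true noise supplied by hand, whereas a trained diffusion model would predict that noise with a network conditioned on the condition embeddings.

```python
import math
import random


def linear_betas(steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances (assumed linear schedule, not from the source)."""
    if steps == 1:
        return [beta_start]
    delta = (beta_end - beta_start) / (steps - 1)
    return [beta_start + i * delta for i in range(steps)]


def forward_diffusion(x0, betas, rng):
    """Incrementally add noise: x_t = sqrt(1 - b_t) * x_{t-1} + sqrt(b_t) * eps."""
    x = list(x0)
    for beta in betas:
        x = [math.sqrt(1.0 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0)
             for v in x]
    return x


def alpha_bar(betas):
    """Cumulative product of (1 - beta): the share of signal variance left."""
    product = 1.0
    for beta in betas:
        product *= 1.0 - beta
    return product


def noise_closed_form(x0, a_bar, eps):
    """Jump to step t directly; same distribution as stepping incrementally."""
    return [math.sqrt(a_bar) * v + math.sqrt(1.0 - a_bar) * e
            for v, e in zip(x0, eps)]


def denoise_with_noise_estimate(xt, a_bar, eps_hat):
    """Invert the closed form given a noise estimate eps_hat.

    A trained model would supply eps_hat; passing the true noise here only
    demonstrates the algebra of the reverse direction."""
    return [(v - math.sqrt(1.0 - a_bar) * e) / math.sqrt(a_bar)
            for v, e in zip(xt, eps_hat)]
```

For example, with `betas = linear_betas(100)` and `a = alpha_bar(betas)`, noising a vector with `noise_closed_form` and then calling `denoise_with_noise_estimate` with the same noise recovers the original vector up to floating-point error. The quality of a real reverse process depends entirely on how well the model estimates that noise.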

Description

TECHNICAL FIELD

The disclosure generally relates to the field of computer vision, and more particularly relates to using a diffusion model to generate synthetic images.

BACKGROUND

Computer vision is a field of artificial intelligence (AI) focused on enabling computers to interpret and make decisions based on visual data. Computer vision includes the use of algorithms and machine learning models to analyze images and videos, extract meaningful information, and perform tasks that typically require human visual understanding. Such machine learning models may perform tasks including image recognition, image classification, object detection, image segmentation, pose estimation, and scene understanding, among others.

For example, a license plate recognition model may be trained to automatically detect and recognize license plates within images or video frames. Such a model may be implemented for access control in restricted areas, such as parking facilities, residential or commercial areas, toll roads, etc.

Notably, a sufficient amount of training data is generally required for training a computer vision model due to the complexity and diversity of visual information in real-world applications. Real-world images exhibit large variability, such as differences in lighting, angle, backgrounds, object appearances, sizes, and poses. A model trained on limited data may only perform well on a narrow range of conditions, failing to generalize to unseen scenarios. Further, with a small dataset, a model may “memorize” specific features of the training images instead of learning generalized patterns. This can lead to overfitting, where the model performs well on training data but poorly on new data.

Furthermore, real-world datasets may have insufficient or uneven coverage. For example, for a license plate recognition model, the real-world dataset may have gaps in data for certain license plate layouts, states, characters, lighting conditions, angles, or environmental factors.
This can lead to poor model performance in recognizing certain types of plates or underrepresented variations.

SUMMARY

Embodiments described herein address the above-described limitations by training and applying a machine learning model (e.g., a diffusion model) to generate synthetic license plate images conditioned on layout conditions of a license plate. The generated synthetic license plate images can then be used to train or retrain a license plate identification model.

Embodiments described herein include a system or a method for enhancing vehicle identification accuracy. The system identifies a gap in a training dataset for training a license plate identification model. The gap represents underrepresented visual characteristics in misidentified license plates. A guidance prompt is generated based on the visual characteristics of the misidentified license plate. Condition embeddings are then generated from the guidance prompt, which are used to condition a diffusion model to create synthetic license plate images. The diffusion model is trained to receive a real license plate image, encode it into a vector, apply forward diffusion to add noise, and then apply reverse diffusion to remove the noise, resulting in a denoised vector that represents a synthetic license plate image conditioned on the condition embeddings. The denoised vector is then decoded to produce the synthetic license plate image. The synthetic images are then used to retrain the license plate identification model.

In some embodiments, the system receives a guidance prompt specifying elements of a license plate, including a jurisdiction of the license plate. The system uses a first language model to parse the guidance prompt to extract these elements, which are then applied to a pre-trained layout generation model to produce a license plate layout. The layout generation model is trained on real license plate images with labeled bounding boxes identifying each element.
The system then uses a second language model to generate a set of condition embeddings based on the license plate layout. The system then applies a machine learning model (e.g., a diffusion model conditioned on the set of condition embeddings) to generate synthetic license plate images, and uses the synthetic license plate images to train or retrain a license plate identification model.

BRIEF DESCRIPTION OF DRAWINGS

The disclosed embodiments have other advantages and features which will be more readily apparent from the detailed description, the appended claims, and the accompanying figures (or drawings). A brief introduction of the figures is below.

FIG. 1 illustrates one embodiment of a system environment for managing vehicle parking or transit through a managed facility, using an edge device and a vehicle management server.

FIG. 2 illustrates one embodiment of exemplary modules operated by an edge device in accordance with one or more embodiments.

FIG. 3 illustrates one embodiment of exemplary modules operated by a vehicle management server in accord