KR-102964803-B1 - COMPUTER-IMPLEMENTED METHOD AND SYSTEM FOR GENERATING A SYNTHETIC TRAINING DATA SET FOR TRAINING A MACHINE LEARNING COMPUTER VISION MODEL
Abstract
A computer-implemented method for generating a synthetic training data set for training a machine learning computer vision model for performing at least one user-defined computer vision task, in which spatially resolved sensor data is processed and evaluated with respect to at least one user-defined object of interest, comprising the following steps:
- receiving at least one user-defined object of interest, in particular a 2D or 3D model (10), based on user input data;
- determining at least one render parameter (60), preferably a plurality of render parameters (56, 58, 60, 62, 64, 66), based on user input data;
- generating a set of training images (I2, I3) by rendering the at least one model (10) of the object of interest based on the at least one render parameter (60);
- generating annotation data for the set of training images (I1) with respect to the at least one object of interest;
- providing a training data set, comprising the set of training images (I2, I3) and the annotation data, for output to a user and/or for training a computer vision model.
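The generation pipeline summarized in the abstract can be illustrated with a minimal sketch. All names (`RenderParameter`, `generate_training_set`, the parameter names, and the model identifier) are illustrative assumptions, not taken from the patent; an actual implementation would call into a 3D rendering engine rather than returning dictionaries.

```python
import random
from dataclasses import dataclass

@dataclass
class RenderParameter:
    """A render parameter with user-defined boundary values (hypothetical structure)."""
    name: str
    low: float
    high: float

    def sample(self) -> float:
        # Draw a random value within the user-supplied bounds.
        return random.uniform(self.low, self.high)

def generate_training_set(model_id, params, n_images):
    """Render n_images of the object model and emit per-image annotation data.

    The dictionaries stand in for rendered images and labels; a real system
    would produce pixel data and e.g. bounding boxes or segmentation masks.
    """
    dataset = []
    for i in range(n_images):
        sampled = {p.name: p.sample() for p in params}
        image = {"id": i, "model": model_id, "render_params": sampled}
        annotation = {"image_id": i, "object": model_id}
        dataset.append((image, annotation))
    return dataset

# Example boundary values a user might enter through the user interface.
params = [RenderParameter("view_angle_deg", 0.0, 360.0),
          RenderParameter("zoom", 0.5, 2.0)]
data = generate_training_set("cad_model_10", params, n_images=3)
```

The key design point mirrored here is that the user supplies only boundary values; each training image draws concrete parameter values at random within those bounds.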
Inventors
- Rangarajan Pooja
- Gupta Nikhil
- Breitenfeld Andre
- Müller Andreas
- Schulz Sebastian
- Ling Sheng
- Kammerlocher Thomas
- Baier Fabian
Assignees
- Volkswagen Aktiengesellschaft
Dates
- Publication Date: 2026-05-13
- Application Date: 2022-06-15
- Priority Date: 2021-06-16
Claims (15)
- A computer-implemented method for generating a synthetic training data set for training a machine learning computer vision model for performing at least one user-defined computer vision task, in which spatially resolved sensor data is processed and evaluated with respect to at least one user-defined object of interest, the method comprising: receiving at least one model of a user-defined object of interest based on user input data, wherein the user input data is received through a user interface configured to provide a user input device into which the user enters the user input data in relation to the at least one model of the object of interest; determining a plurality of render parameters based on user input data, wherein the user input data sets boundary values for at least one of the plurality of render parameters; generating a set of training images by rendering the at least one model of the object of interest based on at least one of the plurality of render parameters; generating annotation data for the set of training images with respect to the at least one object of interest, wherein the at least one model of the user-defined object of interest is received based on user input data received through the user interface, and the at least one render parameter of the plurality of render parameters is determined based on user input data received through the user interface; and providing a training data set, comprising the set of training images and the annotation data, for output to the user and/or for training the computer vision model.
- The method of claim 1, further comprising communicating with the user interface for user input data entered by the user in relation to at least one render parameter of the plurality of render parameters used to generate the set of training images and/or in relation to the annotation data to be generated.
- The method of claim 1 or 2, wherein at least one of the plurality of render parameters is determined from the user input data in a randomized manner.
- The method of claim 1 or 2, wherein the set of training images is generated based on a plurality of background images determined based on user input data.
- The method of claim 4, wherein at least one of the plurality of background images is used to generate at least one training image.
- The method of claim 1 or 2, wherein the set of training images is generated based on a set of background images randomly selected from the plurality of background images.
- The method of claim 1 or 2, wherein each training image of the set of training images is generated based on a photorealistic background image.
- The method of claim 1 or 2, wherein at least one of the plurality of render parameters is selected from the group consisting of: a parameter characteristic of the view of the object of interest; a parameter characteristic of the field of view of a camera used for the rendering process; a parameter characteristic of the size and/or zoom range of the object of interest; a parameter characteristic of the orientation and/or position of the at least one rendered object of interest within the training image; a parameter characteristic of the viewing angle; a parameter characteristic of the roll of the rendered model; a parameter characteristic of the rotation and/or translation of the at least one object of interest; a parameter characteristic of the cropping of the at least one object of interest; a parameter characteristic of the occlusion of the object of interest; a parameter characteristic of the number of model instances; and combinations thereof.
- The method of claim 1 or 2, wherein at least one of the plurality of render parameters is selected from the group consisting of: a parameter characteristic of the maximum number of distraction objects; a parameter characteristic of the lighting conditions of the training image; a parameter characteristic of the lighting of the object and/or background in the training image; a parameter characteristic of the number of light sources; a parameter characteristic of the variation of light intensity; a parameter characteristic of the color variation; a parameter characteristic of the inclusion of shadows, blur, and/or noise; a parameter characteristic of the variation of noise intensity and noise size in the rendered image and/or the training image; and combinations thereof.
- The method of claim 9, wherein at least one distraction object randomly selected from a plurality of distraction objects is included in at least one training image of the plurality of training images.
- The method of claim 10, further comprising: determining at least one texture parameter characteristic of the texture of the user-defined object of interest; and adapting at least one distraction object to be included in the at least one training image based on the at least one determined texture parameter.
- A computer-implemented method for training a machine learning computer vision model for performing at least one user-defined computer vision task, in which spatially resolved sensor data generated by at least one sensor device is processed and evaluated with respect to at least one user-defined object of interest, wherein the machine learning computer vision model comprises a set of trainable parameters, the method comprising: generating a training data set according to claim 1 or 2; and training the machine learning computer vision model based on the training data set.
- The method of claim 12, further comprising: evaluating the computer vision model trained with the provided training data set; and determining an evaluation parameter characteristic of the accuracy of the computer vision model.
- The method of claim 13, further comprising generating and providing a further training data set based on the evaluation parameter.
- A computer system for generating a synthetic training data set for training a machine learning computer vision model for performing at least one user-defined computer vision task, in which spatially resolved sensor data is processed and evaluated with respect to at least one user-defined object of interest, the computer system comprising: a user interface configured to provide a user input device into which user input data is entered by a user in relation to at least one model of an object of interest; and a training data generation unit configured to determine a plurality of render parameters based on user input data, wherein the user input data sets boundary values for at least one of the plurality of render parameters, to generate a set of training images by rendering the at least one model of the object of interest based on the at least one render parameter, and to generate annotation data for the set of training images with respect to the at least one object of interest; wherein the training data generation unit is configured to receive the at least one model of the user-defined object of interest based on user input data received through the user interface, to determine the at least one render parameter of the plurality of render parameters based on user input data received through the user interface, and to provide a training data set, comprising the set of training images and the annotation data, for output to the user and/or for training the computer vision model.
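The claims covering randomly selected background images and a user-bounded number of distraction objects describe a scene-composition step that can be sketched as follows. This is an illustrative assumption of how such a step might look; the function and argument names are invented, and a real system would composite actual images in a renderer rather than return a scene description.

```python
import random

def compose_training_image(object_model, backgrounds, distractors, max_distractors):
    """Describe one synthetic training image: a randomly chosen background
    plus a random number of distraction objects, capped by the user-set
    maximum (hypothetical sketch of the claimed composition step)."""
    background = random.choice(backgrounds)
    # The number of distractors is drawn at random, never exceeding the
    # user-defined maximum or the pool of available distractor models.
    n = random.randint(0, min(max_distractors, len(distractors)))
    chosen = random.sample(distractors, k=n)
    return {"background": background, "object": object_model, "distractors": chosen}

# Example call with invented asset names.
image = compose_training_image(
    "cad_model_10",
    backgrounds=["factory_floor.jpg", "warehouse.jpg"],
    distractors=["cube", "sphere", "cylinder"],
    max_distractors=2,
)
```

Randomizing both the background and the distractor count per image is what gives the resulting data set the variation the claims aim for, without the user specifying each image individually.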
Description
The present invention relates to a computer-implemented method and system for generating a synthetic training data set for training a machine learning computer vision model. Many everyday problems can be solved quickly by artificial intelligence and machine learning, among them object detection, object classification, and robot training. However, the processes of data generation, collection, and preparation alone, which involve the manual labeling of data, consume a massive amount of time and cost. Factors such as the availability of hardware, including cameras, or environmental factors such as indoor lighting or outdoor weather also play a significant role. This process takes days, weeks, or sometimes even months before the data is passed on to computer vision engineers. Computer vision engineers constantly devote time to generating and collecting vast amounts of data to create and train neural networks. Once the data is collected, they must write machine learning algorithms to train models on these images, which requires experience and knowledge in computer vision. The biggest challenges in this process are the time and effort required, as well as the expertise demanded for writing machine learning algorithms and training neural networks. The goal is to minimize this time-consuming and tedious process while making the creation and training of neural networks easy, even for individuals without knowledge of computer vision. There was therefore a need for an alternative that requires less time and manual effort, making artificial intelligence tasks accessible and easy to use even without domain expertise. Current solutions on the market provide manual labeling of image data.
These solutions originate from companies such as Google (retrieved from <https://cloud.google.com/ai-platform/data-labeling/pricing#labeling_costs>), Scale.AI (retrieved from <https://scale.com/>), or Understand.AI (retrieved from <https://understand.ai/>). Additionally, some companies generate synthetic data based on 3D data. For example, AI.Reverie (retrieved from <https://aireverie.com/>) or CVEDIA (retrieved from <https://www.cvedia.com/>) generate images based on 3D virtual environments. These solutions can generate labeled images within a short period of time, but they require a modeled 3D environment, which can itself be time-consuming to create. Additionally, Unity 3D announced a cloud-based solution that takes CAD files and renders 2D images, which are also labeled (see <https://unity.com/de/products/unity-simulation>). The NVIDIA Dataset Synthesizer, on the other hand, is an add-on for Unreal Engine (see <https://github.com/NVIDIA/Dataset_Synthesizer>). It uses Unreal Studio to render CAD files and, in addition to RGB images, can generate depth maps, segmentation masks, and other information useful for machine learning (ML) applications. Publicly available solutions for training neural networks also include libraries from Google, such as TensorFlow, which simplify the process of creating neural networks and training data. However, this still requires knowledge of programming languages such as Python and is often difficult to use without it. Regarding common data sets for training, quite a few sources, such as Kaggle, provide extensive data sets containing images and annotations of commonly needed data, such as vehicle data and geographic data. The publicly available approach otherwise involves manual image generation and time-intensive manual labeling of the data to be used for training neural networks (for part detection).
Writing algorithms to train neural networks is also a time- and effort-intensive process. To utilize the data effectively, knowledge and experience in computer vision and neural networks are required as well. Manually capturing 500 images takes several hours, and manually labeling them takes several days. While there are some tools to assist with the labeling process, they still require manual work to identify objects in the images, which does not significantly reduce the time required. The training process, which includes generating the neural network and/or writing algorithms to train it, takes another few weeks of work; this represents a tremendous amount of time and effort consumed throughout the entire process. Even though synthetic data generators such as the NVIDIA Dataset Synthesizer have somewhat reduced the drawbacks of the time-consuming process of manually generating, capturing, and labeling images of actual objects, they still require an extensive amount of technical knowledge and experience in computer vision. Other applications built on platforms s