EP-4742181-A1 - A METHOD FOR A TRAINING AND/OR TESTING OF A MACHINE LEARNING SYSTEM


Abstract

The invention relates to a method (100) for a training and/or testing of a machine learning system (50) for a specific technical application, comprising: - Providing (101) at least one instruction (310) for an image generation process (340) to generate synthetic images (320) that represent a scene specific to the application, - Providing (102) at least one layout specification (330) that specifies spatial restrictions for the generation process (340), - Providing (103) a classification specification (350) that provides different classes for the represented scene, - Dividing (104) the different classes of the classification specification into at least two groups that represent different levels of relevance to the application, - Determining (105) at least one modification for the layout specification (330) based on the divided classes, - Initiating (106) the generation process (340) to generate the images (320) based on the at least one instruction (310) and the at least one modified layout specification (330).

Inventors

Borges, Julio
Kugele, Alexander
Laube, Kevin Alexander
Cheng, Shin-I
Youett, Evgenia

Assignees

Robert Bosch GmbH
CARIAD SE

Dates

Publication Date: 2026-05-13
Application Date: 2024-11-07

Claims (12)

1. A method (100) for a training and/or testing of a machine learning system (50) for a specific technical application, comprising: - Providing (101) at least one instruction (310) for an image generation process (340) to generate synthetic images (320) that represent a scene specific to the application, - Providing (102) at least one layout specification (330) that specifies spatial restrictions for the generation process (340), - Providing (103) a classification specification (350) that provides different classes for the represented scene, - Dividing (104) the different classes of the classification specification into at least two groups that represent different levels of relevance to the application, - Determining (105) at least one modification for the layout specification (330) based on the divided classes, - Initiating (106) the generation process (340) to generate the images (320) based on the at least one instruction (310) and the at least one modified layout specification (330).
2. The method (100) of claim 1, characterized in that the method (100) further comprises at least one of the following steps: - Providing training and/or evaluation data for the training and/or testing of the machine learning system (50) based on the generated images (320), the training and/or evaluation data particularly comprising the generated images (320) and/or further modified generated images (320), - Carrying out the training and/or testing of the machine learning system (50) using the generated images (320) as training and/or evaluation data, particularly for the training and/or testing of the machine learning system (50) for the specific technical application, particularly an object and/or scene detection based on images that are recorded by a vehicle.
3. The method (100) of any one of the preceding claims, characterized in that the generation process (340) is spatially constrained differently, controlled by the at least one modified layout specification (330), and is thereby more constrained in spatial regions of the images (320) where pixels of the images (320) are classified into at least a first one of the groups for higher relevance to the application, and is less constrained in spatial regions of the images (320) where pixels of the images (320) are classified into at least a second one of the groups for lower relevance to the application.
4. The method (100) of any one of the preceding claims, characterized in that the layout specification (330) specifies the spatial restrictions in relation to the different classes, and that the determination (105) of the at least one modification comprises: removing those of the spatial restrictions, particularly edge information according to Canny Edges, that are related to at least one of the groups that particularly represent the non-critical classes for the application.
5. The method (100) of any one of the preceding claims, characterized in that initially synthetic and/or sensor images are provided, particularly generated, that represent the scene, and the generated images (320) are generated based on the initially provided images, particularly to be used as training or evaluation data for the machine learning system (50).
6. The method (100) of any one of the preceding claims, characterized in that the provided classification specification (350) provides the different classes in the form of categories to classify the images and particularly different objects represented in each of the images, wherein the classification is carried out based on pixels of the images (320) and the provided categories.
7. The method (100) of any one of the preceding claims, characterized in that a semantic label map is provided for the represented scene and that the division (104) of the different classes comprises: creating a mask from the semantic label map to isolate those of the classes that are relevant to the application, thereby dividing the different classes into the groups critical and non-critical classes.
8. The method (100) of any one of the preceding claims, characterized in that the scene is a traffic scene, and the machine learning system (50) is trained and/or tested for being used in a driver assistant and/or automated driving system, the technical application particularly comprising at least one of the following: a classification and preferably detection of objects in images received from a camera of the driving system, a scene recognition based on the images, a control of a vehicle based on the output of the machine learning system (50).
9. A machine learning system (50) trained and/or tested using the images (320) generated by a method (100) of any one of the preceding claims as training and/or evaluation data.
10. A computer program (20), comprising instructions which, when the computer program (20) is executed by at least one computer (10), cause the computer (10) to carry out the method (100) of any one of claims 1 to 8.
11. A data processing apparatus (10), comprising means for carrying out the method (100) of any one of claims 1 to 8.
12. A computer-readable storage medium (15) comprising instructions which, when executed by a computer (10), cause the computer (10) to carry out the steps of the method (100) of any one of claims 1 to 8.

Description

The invention relates to a method for a training and/or testing of a machine learning system. Furthermore, the invention relates to a machine learning system, a computer program, an apparatus, and a storage medium for this purpose.

State of the art

Generative diffusion models like Stable Diffusion, when paired with ControlNet, have ushered in a new era of applications with controllable spatial layouts. These models, fine-tuned on proprietary image datasets, are capable of transforming images from driving simulators into photorealistic outputs closely resembling footage from vehicle cameras. Additionally, these images can be dynamically altered using text prompts.

A common solution for image synthesis is disclosed by Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.

Disclosure of the invention

According to aspects of the invention, a method with the features of claim 1, a machine learning system with the features of claim 9, a computer program with the features of claim 10, a data processing apparatus with the features of claim 11, and a computer-readable storage medium with the features of claim 12 are provided. Further features and details of the invention are disclosed in the respective dependent claims, the description and the drawings. Features and details described in the context of the method according to the invention also apply to the machine learning system according to the invention, the computer program according to the invention, the data processing apparatus according to the invention, and the computer-readable storage medium according to the invention, and vice versa in each case.
According to an aspect of the invention, a method, particularly for a training and/or testing of a machine learning system for a specific - and particularly technical - application, comprises (preferably as automatically carried out steps):
- Providing at least one instruction for an image generation process to generate synthetic images that represent a scene specific to said application or a specific technical application,
- Providing at least one layout specification that specifies spatial restrictions for the generation process, wherein the spatial restrictions are particularly associated with different pixels of the images,
- Providing a classification specification that provides different classes for the represented scene, particularly for the images,
- Dividing the different classes of the classification specification into at least (or into exactly) two groups that represent different levels of relevance to the application, wherein the groups and/or the levels of relevance are preferably manually predefined, wherein, preferably, each of the pixels of the layout specification is then mapped to one of the groups,
- Determining at least one modification for the layout specification based on the divided classes, preferably by removing restrictions for pixels that are mapped to a particular group,
- Initiating the generation process to generate the images based on the at least one instruction and the at least one modified layout specification.

The method makes it possible to improve the training and/or testing of machine learning systems for specific technical applications by generating synthetic images that accurately reflect the target environment. Using the different groups, the method may restrict the generation process in essential areas of the image while allowing flexibility in less critical areas. This leads to more diverse and representative training data, enhancing the accuracy and performance of the machine learning system.
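The dividing and determining steps can be illustrated with a minimal sketch. It assumes that the layout specification is an edge map (for example, one produced beforehand by a Canny edge detector) and that a semantic label map of the same size assigns one class ID to each pixel. The class names, the class IDs, the split into critical and non-critical classes, and the function names `critical_mask` and `modify_layout` are all hypothetical and chosen only for illustration; the description does not prescribe any of them.

```python
import numpy as np

# Hypothetical class IDs for a traffic scene (not taken from the patent).
CLASS_IDS = {"road": 0, "sky": 1, "building": 2, "vehicle": 3, "pedestrian": 4}

# Hypothetical, manually predefined group of classes with higher relevance.
# Every class not listed here falls into the non-critical group.
CRITICAL_CLASSES = {"road", "vehicle", "pedestrian"}


def critical_mask(label_map: np.ndarray) -> np.ndarray:
    """Return a boolean mask that is True where a pixel belongs to a critical class."""
    critical_ids = [CLASS_IDS[name] for name in CRITICAL_CLASSES]
    return np.isin(label_map, critical_ids)


def modify_layout(edge_map: np.ndarray, label_map: np.ndarray) -> np.ndarray:
    """Remove edge constraints for pixels that are mapped to the non-critical group."""
    if edge_map.shape != label_map.shape:
        raise ValueError("edge map and label map must have the same shape")
    # Keep edge values in critical regions, set them to 0 everywhere else.
    return np.where(critical_mask(label_map), edge_map, 0).astype(edge_map.dtype)


# Toy 3x4 semantic label map: sky and buildings on top, vehicles and a
# pedestrian in the middle row, road at the bottom.
label_map = np.array([
    [1, 1, 2, 2],
    [3, 3, 2, 4],
    [0, 0, 0, 0],
])

# Toy edge map in which every pixel is marked as an edge (value 255).
edge_map = np.full(label_map.shape, 255, dtype=np.uint8)

modified = modify_layout(edge_map, label_map)
print(modified)
```

In this toy case the edge constraints survive only on road, vehicle and pedestrian pixels, so a generation process conditioned on `modified` would remain constrained there and be free to vary sky and buildings. How the modified layout is passed to the image generation model is outside the scope of this sketch.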
By focusing on relevant classes, the system can learn to identify and interpret crucial information within the synthetic images, leading to improved decision-making in the specific application.

Each of the above-mentioned method steps may be carried out automatically. For example, the instructions and/or the at least one layout specification and/or the classification specification may be provided as digital data, for example on the basis of a user input. The division and/or determination may be carried out by a computer program using a predefined set of rules. The initiation of the generation process may be carried out using a digital interface to an image generation model that uses the at least one instruction and the at least one modified layout specification as digital inputs.

The at least one instruction may comprise a text prompt and/or at least one initial image, particularly from a simulator like a driving simulator and/or from a camera, and/or the like. Generative models, particularly generative diffusion models such as Stable Diffusion, can convert images from simulators like driving simulators into photorealistic outputs that are very similar to the images from vehicle cameras. In other words, based on the images from a driving simulator, synthetic images can be gen