CN-121998127-A - Method for training and/or testing a machine learning system


Abstract

The invention relates to a method (100) of training and/or testing a machine learning system (50) for a specific technical application, comprising: providing (101) at least one instruction (310) to an image generation process (340) for generating a synthetic image (320), the synthetic image representing an application-specific scene; providing (102) at least one layout specification (330) specifying spatial constraints for the generation process (340); providing (103) a classification specification (350) providing different classes for the represented scene; dividing (104) the different classes of the classification specification into at least two groups representing different levels of relevance to the application; determining (105) at least one modification of the layout specification (330) based on the divided classes; and initiating (106) the generation process (340) to generate the image (320) based on the at least one instruction (310) and the at least one modified layout specification (330).

Inventors

J. Borges
KUGLER ANDREAS
K. A. Lao Bu
ZHENG XINYI
E. Yuete

Assignees

Robert Bosch GmbH (罗伯特博世股份公司)
CARIAD SE (凯瑞达欧洲公司)

Dates

Publication Date: 2026-05-08
Application Date: 2025-10-27
Priority Date: 2024-11-07

Claims (12)

1. A method (100) of training and/or testing a machine learning system (50) for a specific technical application, the method comprising the steps of:
- Providing (101) an image generation process (340) with at least one instruction (310) to generate a synthetic image (320), the synthetic image representing a scene specific to the application,
- Providing (102) at least one layout specification (330) specifying spatial constraints for the generation process (340),
- Providing (103) a classification specification (350) providing different classes for the represented scene,
- Dividing (104) the different classes of the classification specification into at least two groups representing different levels of relevance to the application,
- Determining (105) at least one modification to the layout specification (330) based on the divided classes,
- Initiating (106) the generation process (340) to generate the image (320) based on the at least one instruction (310) and the at least one modified layout specification (330).
2. The method (100) according to claim 1, wherein the method (100) further comprises at least one of the following steps:
- Providing training and/or evaluation data for training and/or testing the machine learning system (50) based on the generated image (320), said training and/or evaluation data comprising in particular the generated image (320) and/or a further modified version of the generated image (320),
- Training and/or testing the machine learning system (50) using the generated image (320) as training and/or evaluation data, in particular for the specific technical application, in particular object and/or scene detection based on images recorded by a vehicle.
3. The method (100) according to claim 1 or 2, wherein the generation process (340) is subject to different spatial constraints controlled by the at least one modified layout specification (330), and is thereby subject to more constraints in spatial regions of the image (320) whose pixels are assigned to at least a first group of higher relevance to the application, and to fewer constraints in spatial regions of the image (320) whose pixels are assigned to at least a second group of lower relevance to the application.
4. The method (100) according to any one of the preceding claims, wherein the layout specification (330) specifies spatial constraints related to the different classes, and wherein the step of determining (105) the at least one modification comprises removing those spatial constraints, in particular edge information such as Canny edges, that relate to at least one of the groups, in particular the group of classes that are non-critical to the application.
5. The method (100) according to any one of the preceding claims, wherein an initial synthetic image and/or a sensor image representing the scene is provided, in particular generated, and the image (320) is generated based on the initially provided image, in particular for use as training or evaluation data for the machine learning system (50).
6. The method (100) according to any one of the preceding claims, wherein the provided classification specification (350) provides different classes in the form of categories for classifying images, in particular different objects represented in each image, wherein classification is performed based on pixels of the image (320) and the provided categories.
7. The method (100) according to any one of the preceding claims, wherein a semantic label map is provided for the represented scene, and wherein the dividing (104) of the different classes comprises creating a mask from the semantic label map to isolate those classes that are relevant to the application, thereby dividing the different classes into a group of critical classes and a group of non-critical classes.
8. The method (100) according to any one of the preceding claims, wherein the scene is a traffic scene and the machine learning system (50) is trained and/or tested for a driver assistance system and/or an automated driving system, the technical application comprising in particular at least one of: classification, and preferably detection, of objects in an image received from a camera of the driving system; scene recognition based on the image; and vehicle control based on an output of the machine learning system (50).
9. A machine learning system (50) trained and/or tested using images (320) generated by the method (100) according to any one of claims 1 to 8 as training and/or evaluation data.
10. A computer program (20) comprising instructions which, when the computer program (20) is executed by at least one computer (10), cause the computer (10) to perform the method (100) according to any one of claims 1 to 8.
11. A data processing device (10) comprising means for performing the method (100) according to any one of claims 1 to 8.
12. A computer readable storage medium (15) comprising instructions which, when executed by a computer (10), cause the computer (10) to perform the steps of the method (100) according to any one of claims 1 to 8.

Description

Method for training and/or testing a machine learning system

Technical Field

The present invention relates to a method for training and/or testing a machine learning system. Furthermore, the invention relates to a machine learning system, a computer program, a device and a storage medium for this purpose.

Background

Generative diffusion models such as Stable Diffusion, when combined with ControlNet, have opened up a new era of applications with controllable spatial layout. These models can be fine-tuned on proprietary image datasets, enabling the transformation of images from a driving simulator into realistic outputs that closely resemble images from a vehicle camera. In addition, these images can be dynamically modified using text prompts.

Rombach, Robin, et al., "High-resolution image synthesis with latent diffusion models," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, discloses a general image synthesis solution.

Disclosure of Invention

According to aspects of the present invention, there is provided a method, a machine learning system, a computer program, a data processing device, and a computer readable storage medium. Further features and details of the invention are disclosed in the respective dependent claims, the description and the drawings. The features and details described in the context of the method according to the invention also apply to the machine learning system, the computer program, the data processing device and the computer readable storage medium according to the invention, and vice versa.
According to an aspect of the invention, a method of training and/or testing a machine learning system, in particular for a specific (in particular technical) application, comprises the following steps, which are preferably performed automatically:
- Providing at least one instruction for an image generation process to generate a synthetic image representing a scene specific to the application, in particular to a specific technical application,
- Providing at least one layout specification specifying spatial constraints for the generation process, wherein the spatial constraints are in particular associated with different pixels of the image,
- Providing a classification specification that provides different classes for the represented scene, in particular for the image,
- Dividing the different classes of the classification specification into at least (or exactly) two groups representing different levels of relevance to the application, wherein the groups and/or levels of relevance are preferably predefined manually, and wherein preferably each pixel of the layout specification is then mapped to one of the groups,
- Determining at least one modification to the layout specification based on the divided classes, preferably by removing constraints on pixels mapped to a specific group,
- Initiating the generation process to generate the image based on the at least one instruction and the at least one modified layout specification.

The method allows for improved training and/or testing of machine learning systems for specific technical applications by generating synthetic images that accurately reflect the target environment. By using the different groups, the method can constrain the generation process in important areas of the image while allowing flexibility in less critical areas. This results in more diverse and representative training data, thereby improving the accuracy and performance of the machine learning system.
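The dividing and modification steps described above can be illustrated with a minimal sketch. It is not the claimed implementation: the class names, the choice of critical classes, the toy label map and the binary edge map standing in for the layout specification are all assumptions made for illustration, and plain Python lists are used in place of real image arrays.

```python
# Illustrative sketch only; all names and data below are hypothetical.

# Assumed manual grouping: classes treated as highly relevant to the application.
CRITICAL_CLASSES = {"car", "pedestrian", "traffic_sign"}


def critical_mask(label_map, critical_classes):
    """Map each pixel to a group: 1 if its class is critical, else 0."""
    return [
        [1 if label in critical_classes else 0 for label in row]
        for row in label_map
    ]


def modify_layout(edge_map, mask):
    """Keep edge constraints on critical pixels and remove them elsewhere."""
    return [
        [edge if keep else 0 for edge, keep in zip(edge_row, mask_row)]
        for edge_row, mask_row in zip(edge_map, mask)
    ]


# Toy semantic label map (one class name per pixel).
label_map = [
    ["sky", "sky", "building"],
    ["road", "car", "building"],
    ["road", "car", "pedestrian"],
]

# Toy layout specification: binary edge information (1 = edge constraint).
edge_map = [
    [0, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
]

mask = critical_mask(label_map, CRITICAL_CLASSES)
modified_layout = modify_layout(edge_map, mask)
```

In this sketch the modified layout retains edge constraints only where the mask marks a critical class, so a generation process conditioned on it would be constrained around cars and pedestrians and left free in regions such as sky, road and buildings.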
By focusing on the relevant classes, the system can learn to identify and interpret key information within the synthetic image, thereby improving decision making in the particular application.

Each of the above method steps may be performed automatically. For example, the at least one instruction and/or the at least one layout specification and/or the classification specification may be provided as digital data, e.g., based on user input. The dividing and/or determining may be performed by a computer program using a predefined rule set. The generation process may be initiated via a digital interface to an image generation model, using the at least one instruction and the at least one modified layout specification as digital inputs.

The at least one instruction may comprise a text prompt and/or at least one initial image, in particular from a simulator (e.g. a driving simulator) and/or from a camera. A generative model, in particular a generative diffusion model such as Stable Diffusion, can convert images from a simulator such as a driving simulator into realistic outputs very similar to images from a vehicle camera. In other words, based on the image from the driving simulator, the generative model may generate a synthetic image with higher fidelity. In addition, the generated image may be dynamically modified by tex