EP-3786845-B1 - DIFFICULTY-ADAPTIVE TRAINING FOR MACHINE LEARNING MODULES


Inventors

ZHANG, DAN
Beluch, William Harris
PFEIFFER, MICHAEL
Patel, Kanil

Dates

Publication Date: 20260506
Application Date: 20190829

Claims (13)

A method (200) for training a machine learning module (1), comprising: obtaining (210) a training data set (11*) for a machine learning module (1) that is to map input image data (11) to output data (13) with respect to classification of objects within images using a method (100) that comprises: • providing (110) samples (11a) of training input data that are associated with labels (13a) in the sense that if the machine learning module (1) maps this sample (11a) of training input data to output data (13) corresponding to the label (13a), this is deemed to be the ground truth with respect to said classification; • obtaining (120), from the samples (11a) of training input data and the associated labels (13a), a difficulty function (14) that is configured to map a sample (11a) of training input data, or a representation (21a) thereof in a working space (20), onto a measure for the difficulty (14a) of evaluating this sample (11a) with respect to said classification; • obtaining (130) at least one candidate sample (15) of training input data, and/or a representation (25) thereof in the working space (20); • calculating (140), by means of the difficulty function (14), a measure for the difficulty (14a) of evaluating this candidate sample (15), and/or the representation (25) thereof, with respect to said classification; and • in response to this difficulty meeting a predetermined criterion (150), including (160) the candidate sample (15) in the training data set (11*), and further comprising: • optimizing (220) trainable parameters (12) of the machine learning module (1) such that output data (13) to which the machine learning module (1) maps the samples (11a) of training input data of the obtained training dataset (11*) best matches the respective labels (13a) associated with the samples (11a) of training input data, wherein • the classification comprises mapping each record of input data (11) to a record of output data (13) that indicates, for each class of a set of multiple discrete classes, a probability and/or a confidence that the record of input data (11) belongs to the respective class, • the obtaining (120) of the difficulty function (14) comprises training (121), based on the samples (11a) of training input data and the associated labels (13a), a discriminative classifier (16) that maps a sample (11a) of training input data, and/or a representation (21a) thereof in the working space (20), to a classification (16a) indicative of a probability and/or a confidence that this sample (11a) of training input data belongs to each of the discrete classes, and wherein the difficulty (14a) determined by the difficulty function (14) depends on this classification (16a), wherein the difficulty (14a) determined by the difficulty function (14) is based at least in part on an entropy of the classification (16a).
The method (100) of claim 1, further comprising: associating (141) the candidate sample (15) with a label corresponding to the classification (16a) determined by the discriminative classifier (16).
The method (100) of any one of claims 1 to 2, wherein the working space (20) is chosen (115) such that: the similarity of samples (11a) of training input data with respect to the classification of objects within images is correlated to the same or a higher degree to the distance between representations (21a) of these samples (11a) in the working space (20) than it is correlated to the distance between these samples (11a).
The method (100) of claim 3, further comprising: performing (131) a search within the working space (20) for a candidate representation (25) of a candidate sample (15) of training input data such that the difficulty (14a) determined by the difficulty function (14) meets the predetermined criterion (150), and transforming (132) this candidate representation (25) into a candidate sample (15).
The method (100) of claim 4, wherein the performing (131) of the search comprises: obtaining (131a), based on a representation (25) in the working space (20), a candidate representation (25) by solving an optimization problem in the working space (20) with respect to a merit function, wherein the merit function is based at least in part on the difficulty function (14).
The method (100) of claim 4, wherein the performing (131) of the search comprises drawing (131b) random representations (25) from the working space (20) and evaluating whether these random representations (25) meet the predetermined criterion (150).
The method (100) of any one of claims 1 to 6, further comprising: training (116), based on the samples (11a) of training input data, an encoder (17) for transforming these samples (11a) into representations (21a) in the working space (20) and a decoder (18) for reconstructing the samples (11a) from the representations (21a), such that the result (11a') of the reconstructing best matches the respective samples (11a).
A method (300) for evaluating physical measurement data (3a), comprising: • providing (310) a machine learning module (1); • training (320) the machine learning module (1) using the method of claim 1; • acquiring (330) the physical measurement data (3a) using at least one sensor (3); and • providing (340) the physical measurement data (3a) to the trained machine learning module (1*) as input image data (11), such that the trained machine learning module (1*) maps (350) the input data (11) to output data (13) which is a ground truth with respect to classification of objects within images, and to an associated entropy (13b).
The method (300) of claim 8, further comprising: • calculating (360), from the output data (13), according to a strategy that depends on the entropy (13b) associated with the output data (13), an actuation signal (4); and • supplying (370) the actuation signal (4) to a vehicle (50), to a classification system (60), to a security surveillance system (70), to a quality control system (80), and/or to a medical imaging system (90).
The method (300) of any one of claims 8 to 9, further comprising, in response to the entropy (13b) produced from at least one record of input data (11) meeting a predetermined criterion (380): • storing (381) the record of input data (11) in a memory for later analysis and/or labelling; and/or • determining (382) that the record of input data (11) corresponds to a corner case with respect to said classification problem to be solved by the machine learning module (1).
A computer program, comprising machine-readable instructions that, when executed by one or more computers, cause the one or more computers to perform a method (100, 200, 300) according to any one of claims 1 to 10.
A non-transitory storage medium and/or a downloadable product with the computer program of claim 11.
A computer with the computer program of claim 11, and/or with the non-transitory storage medium and/or downloadable product of claim 12.
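Illustrative sketch (not part of the claims): the entropy-based difficulty measure of claim 1, combined with the random search of claim 6 and the labelling of claim 2, might be approximated as follows. Everything named here is an assumption made for illustration only: the nearest-centroid `toy_classifier` stands in for the trained discriminative classifier (16), the two-dimensional uniform sampling stands in for drawing representations (25) from the working space (20), and "entropy at or above `min_entropy`" is one possible reading of the predetermined criterion (150). The decoder step that turns a representation back into a sample (claim 4) is omitted.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into a probability vector."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def toy_classifier(z, centroids):
    """Hypothetical stand-in for the discriminative classifier (16):
    class scores are negative squared distances to per-class centroids."""
    scores = [-sum((a - b) ** 2 for a, b in zip(z, c)) for c in centroids]
    return softmax(scores)


def select_candidates(classify, n_draws, dim, min_entropy, rng):
    """Draw random representations and keep those whose entropy-based
    difficulty meets the (assumed) criterion: entropy >= min_entropy."""
    selected = []
    for _ in range(n_draws):
        # Assumption: the working space is sampled uniformly from [-2, 2]^dim.
        z = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
        probs = classify(z)
        difficulty = entropy(probs)
        if difficulty >= min_entropy:
            # Label the candidate with the classifier's most probable class.
            label = max(range(len(probs)), key=probs.__getitem__)
            selected.append((z, label, difficulty))
    return selected


# Two hypothetical class centroids in a 2-D working space.
centroids = [(-1.0, 0.0), (1.0, 0.0)]
rng = random.Random(0)
hard = select_candidates(
    lambda z: toy_classifier(z, centroids),
    n_draws=1000, dim=2, min_entropy=0.6, rng=rng,
)
print(len(hard), "candidates met the difficulty criterion")
```

In this toy setting the selected candidates cluster near the boundary between the two centroids, where the classifier is least certain. A real implementation would replace the classifier, the sampling scheme and the threshold with whatever the trained system actually provides.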

Description

The invention relates to the training of machine learning modules for purposes such as classification, in a manner that is particularly suitable for safety-critical applications with highly variable uncertainties.

Background

For the automation of many safety-critical tasks, such as the at least partially automated driving of a vehicle in road traffic, the use of trainable machine learning modules is envisaged. For example, such modules may take in physical measurement data from the environment surrounding the vehicle and classify objects, such as road markings, road signs, pedestrians, or other vehicles.
The advantage of using such modules is that, based on training on a finite number of situations, the vehicle can reasonably be expected to behave correctly even in situations that were not part of the training. In this sense, the training process may be considered akin to the training process for human drivers. A human driver spends only a few tens of hours at the wheel during training, but is then expected to handle any unexpected situation that may come up in a lifetime.
For this and other safety-critical applications, it is crucial to know the uncertainties of the output of the machine learning module. Such uncertainties may, for example, be caused by a poor quality of the physical measurement data (such as poor visibility conditions), by an inherent difficulty of recognizing certain objects in certain situations (such as a car that is partly occluded by a truck), or even by deliberate manipulation of road signs or the camera with "adversarial" patterns that may trick the machine learning module into a wrong classification.
US 2019/122 120 A1 discloses a method for augmenting a training data set for a generative adversarial network (GAN). For augmentation, samples generated by the generator, as well as unlabeled samples from the training dataset, may be used.
US 2018/373 963 A1 discloses an image classification system that aggregates the outputs of two distinct classifiers. One classifier is a common instance classifier that is trained to identify and recognize commonly occurring objects. The other classifier is a rare instance classifier that is trained to compute rarity scores representing likelihoods that an input image is correctly classified by the common instance classifier.

Summary of the invention

The invention is defined by the appended claims. Embodiments and examples not covered by the claims are presented to illustrate, and facilitate the understanding of, the claimed invention.

Disclosure of the invention

The invention provides a method for obtaining and/or augmenting a training data set for a machine learning module that is to map input data to output data which is meaningful with respect to a given to-be-solved problem. One prime example of such a problem is the classification of objects from physical measurement data.
The term "machine learning module" shall specifically comprise modules that embody a function that is parametrized with adjustable parameters and ideally has a high power to generalize. When the machine learning module is trained, the parameters may specifically be adjusted such that when training input values are fed into the module, associated ground truth labels are reproduced as exactly as possible. Specifically, the machine learning module may comprise, or be, an artificial neural network.
In the course of the method, samples of training input data are provided. Each such sample is associated with a label. This label constitutes "ground truth" with respect to the to-be-solved problem. That is, if the machine learning module maps a sample of training input data to output data that corresponds to the label of this sample (e.g., it maps a photo of a stop sign to "stop sign"), then this is deemed to be meaningful with respect to the given problem.
From the samples of training input data and the associated labels, a difficulty function is obtained that is configured to map a sample of training input data, or a representation thereof in a working space, onto a measure for the difficulty of evaluating this sample with respect to the given problem. For example, the difficulty may be a measure for the uncertainty that arises when classification of objects is to be performed on physical measurement data.
As mentioned above, the reasons why the difficulty may vary are numerous. For example, as every human driver knows, it is harder to recognize objects at a distance at night or in heavy rain. The recognition of objects may also be impeded by motion blur of camera pictures in a situation where the lighting is marginal and the vehicle is moving very fast. Even under optimal conditions, some objects have an inherent potential to be confused with each other. For example, in Germany, the green light of a set of traffic lights may be in the form of an arrow pointing to the right to indicate that a right turn is allowed and unimpeded in the sense that no one else (such as pedestrians) is allowed to cross the path o