DE-102024210908-A1 - Computer-implemented method for training a classification system


Abstract

A computer-implemented method for training a classification device comprising an embedding part and a classification part includes: providing a training dataset containing multiple training examples, each training example having an input signal and a desired classification; generating a knowledge graph containing additional information associated with at least one desired classification, the additional information being represented by multiple knowledge graph entities and multiple knowledge graph relationships linking the knowledge graph entities; providing input signal embeddings by embedding the input signals in a latent space; providing knowledge graph embeddings by embedding the knowledge graph in the latent space; and performing training based on the input signal embeddings and the knowledge graph embeddings according to a training objective function composed of a regularization loss function and a cross-entropy loss function.

Inventors

  • Hongkuan ZHOU
  • Sebastian Monka
  • Stefan Schmid
  • Lavdim Halilaj

Assignees

  • Robert Bosch Gesellschaft mit beschränkter Haftung

Dates

Publication Date
2026-05-13
Application Date
2024-11-13

Claims (13)

  1. Computer-implemented method (100) for training a classification device (10) comprising: an embedding part (12) configured to provide input signal embeddings (z_i^I) by embedding input signals (x_i) in a latent space, and a classification part (14) configured to determine a classification (y_i) on the basis of an input signal embedding (z_i^I), wherein the method (100) comprises: - providing (102) a training dataset (16) comprising multiple training examples (16_1, ..., 16_Nb), wherein each training example (16_1, ..., 16_Nb) has an input signal (x_i) and a desired classification, and wherein the training dataset (16) includes synthetically generated training examples (16_1, 16_3, 16_5) as visual prior knowledge, - generating (104) a knowledge graph (20) containing additional information as symbolic prior knowledge associated with at least one desired classification, wherein the additional information is represented by multiple knowledge graph entities (22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46) and by multiple knowledge graph relations (22-1, 26-1, 26-2, 28-1, 30-1, 30-2, 32-1, 34-1, 38-1, 40-1, 44-1, 46-1) which link the knowledge graph entities (22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46), - providing (106) input signal embeddings (z_1^I, ..., z_{Nb}^I) by embedding the input signals (x_i) in the latent space, - providing (108) knowledge graph embeddings (z_1^o, ..., z_{No}^o, z_0^r) by embedding the knowledge graph (20) in the latent space, - performing (110) a training based on the input signal embeddings (z_1^I, ..., z_{Nb}^I) and the knowledge graph embeddings (z_1^o, ..., z_{No}^o, z_0^r) according to a training objective function composed of: • a regularization loss function (L_reg) to align the input signal embeddings (z_1^I, ..., z_{Nb}^I) with the knowledge graph embeddings (z_1^o, ..., z_{No}^o, z_0^r) in the latent space, and • a cross-entropy loss function (L_CE) to assign the input signals (x_i) to the corresponding classifications (y_i).
  2. Method (100) according to Claim 1, wherein the method further comprises: forming triples of the form <z^I, z_i^r, z_j^o>, where z^I is an input signal embedding vector, z_i^r is a knowledge graph relation embedding vector and z_j^o is a knowledge graph entity embedding vector, and wherein the regularization loss function (L_reg) is used to maximize a scoring function for triples corresponding to a true statement and to minimize the scoring function for triples corresponding to a false statement.
  3. Method (100) according to Claim 2, wherein the knowledge graph entity embedding vectors z_j^o are represented as Gaussian embeddings.
  4. Method (100) according to Claim 2 or 3, wherein a number N_I · N_o · N_R of triples of the form <z^I, z_i^r, z_j^o> is formed, where N_I is the number of training examples, N_o is the number of knowledge graph entities (22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46) and N_R is the number of knowledge graph relations (22-1, 26-1, 26-2, 28-1, 30-1, 30-2, 32-1, 34-1, 38-1, 40-1, 44-1, 46-1).
  5. Method (100) according to one of Claims 1 to 4, wherein the classification device (10) is configured to classify images, and wherein the training examples (16_1, ..., 16_Nb) are training example images.
  6. Method (100) according to one of Claims 1 to 5, wherein the knowledge graph (20) contains object categories (24, 36, 42) and object category elements (22, 26, 28, 30, 32, 34, 38, 40, 44, 46) as the knowledge graph entities (22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46), and relationships between the object categories (24, 36, 42) and the object category elements (22, 26, 28, 30, 32, 34, 38, 40, 44, 46) as the knowledge graph relations (22-1, 26-1, 26-2, 28-1, 30-1, 30-2, 32-1, 34-1, 38-1, 40-1, 44-1, 46-1).
  7. Method (100) according to Claim 6, wherein the visual prior knowledge for the object category elements (22, 26, 28, 30, 32, 34, 38, 40, 44, 46) is provided in the form of synthetically generated images.
  8. Method (100) according to Claim 7, wherein the object category elements (22, 26, 28, 30, 32, 34, 38, 40, 44, 46) include: object category elements corresponding to images taken with a camera, and object category elements corresponding to the synthetically generated images.
  9. Method for classifying sensor data, wherein the method comprises: - training a classification device (10) according to a method (100) according to one of Claims 1 to 8, - acquiring sensor data, - classifying the acquired sensor data using the classification device (10).
  10. Data processing device configured to carry out a method according to one of Claims 1 to 9.
  11. Computer program comprising instructions which, when executed by a processor, cause the processor to carry out a method according to one of Claims 1 to 9.
  12. Computer-readable medium storing instructions which, when executed by a processor, cause the processor to carry out a method according to one of Claims 1 to 9.
  13. Control system (204) for controlling an actuator (210), wherein the control system (204) has a classification device (10) trained according to a method (100) according to one of Claims 1 to 8.

Description

The present disclosure relates to a computer-implemented method for training a classification system. Machine learning techniques that involve training on data (sensor data) frequently encounter overfitting problems. These problems arise particularly when there are significant differences between the training and target domains, for example, when the data distribution in the training domain differs from that in the target domain. This impairs the predictive capability of the classification system. It is therefore desirable to provide a method for training a classification system that can achieve high prediction accuracy even when the training domain differs from the target domain. According to a first aspect of the present disclosure, a computer-implemented method for training a classification device is provided, comprising: an embedding part (e.g., an encoder) configured to provide input signal embeddings by embedding input signals in a latent space, and a classification part (e.g., a decoder) configured to determine a classification based on an input signal embedding, wherein the method comprises: - Providing a training dataset containing multiple training examples, each training example having an input signal and a desired classification, and the training dataset containing synthetically generated training examples as visual prior knowledge, - Generating a knowledge graph that contains additional information as symbolic prior knowledge, linked to at least one desired classification, where the additional information is represented by several knowledge graph entities (nodes) and by several knowledge graph relationships that link the knowledge graph entities. 
- Providing input signal embeddings by embedding the input signals of the training dataset in the latent space, - Providing knowledge graph embeddings by embedding the knowledge graph in the latent space, - Performing training based on the input signal embeddings and the knowledge graph embeddings according to a training objective function composed of: • a regularization loss function to align the input signal embeddings with the knowledge graph embeddings in the latent space, and • a cross-entropy loss function to assign the input signals to the corresponding classifications. The method described in this disclosure allows the modeling of numerous relationships in the latent space that go beyond "similar" or "dissimilar." By aligning the input signal embeddings with the knowledge graph embeddings, a regularization, i.e. a shaping, of the latent space can be achieved, thereby improving the generalizability of the classification device. The regularization loss function can be, for example, a categorical or a relational loss function. A categorical loss function ensures that an input signal embedding lies within the node distribution of the knowledge graph. A relational loss function ensures that an input signal embedding has defined relationships with other node distributions in the knowledge graph. The provision of input signal embeddings is known; an exemplary method is disclosed, for example, in A. Dosovitskiy et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," arXiv:2010.11929. The provision of knowledge graph embeddings is also known; by way of example, reference is made to the methods disclosed in S. Monka et al., "Learning Visual Models using a Knowledge Graph as a Trainer," arXiv:2102.08747, or in S. Monka et al., "Context-driven Visual Object Recognition based on Knowledge Graphs," arXiv:2210.11233.
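The training objective described above, a sum of a cross-entropy term and a regularization term that aligns input signal embeddings with knowledge graph embeddings, can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: the function names, the squared-Euclidean alignment term, and the weighting factor `lam` are assumptions made for the example.

```python
import numpy as np

def cross_entropy_loss(logits, labels):
    # Mean softmax cross-entropy over a batch (L_CE).
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def alignment_reg_loss(img_emb, kg_emb, labels):
    # A simple alignment regularizer (L_reg): pull each input signal
    # embedding toward the knowledge graph entity embedding of its
    # desired class (squared Euclidean distance, averaged over the batch).
    target = kg_emb[labels]
    return ((img_emb - target) ** 2).sum(axis=1).mean()

def training_objective(logits, labels, img_emb, kg_emb, lam=0.1):
    # Combined objective: L = L_CE + lam * L_reg (lam is an assumed weight).
    return cross_entropy_loss(logits, labels) + lam * alignment_reg_loss(img_emb, kg_emb, labels)
```

In a real training loop the gradients of this objective would be back-propagated through both the embedding part and the classification part; here plain NumPy is used only to make the composition of the two loss terms explicit.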
Knowledge graphs store information about real-world entities and their relationships in the form of triples, such as (subject, predicate, object). The semantic and structural information of these triples is preserved even after the knowledge graph is embedded. This semantic and structural information can be used, in the method according to the present disclosure, for the regularization of the latent space. The method can further comprise: forming triples of the form <z^I, z_i^r, z_j^o>, where z^I is an input signal embedding vector, z_i^r is a knowledge graph relation embedding vector and z_j^o is a knowledge graph entity embedding vector, wherein a regularization loss function is used to maximize a scoring function for triples corresponding to a true statement and to minimize it for triples corresponding to a false statement. The knowledge graph entity embedding vectors z_j^o can be represented (implemented) as Gaussian embeddings. This allows inclusion relationships, such as "is a subclass of", to be represented precisely. It may also be provided that a number N_I · N_o · N_R of triples of the form <z^I, z_i^r, z_j^o> is formed, where N_I is the number of training examples, N_o is the number of knowledge graph entities, and N_R is the number of knowledge graph relations. This further development enables maximum use of the information contained in
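The triple formation and scoring described above might be sketched as follows. The excerpt does not specify the exact scoring function, so the particular form used here, a TransE-style score evaluated as the log-density (up to a constant) of z^I + z^r under a diagonal Gaussian entity embedding N(mu_o, diag(sigma_o^2)), is an assumption; only the enumeration of all N_I · N_R · N_o candidate triples follows directly from the text.

```python
import itertools
import numpy as np

def score_triple(z_i, z_r, mu_o, sigma_o):
    # Assumed scoring function: higher is better. Log-density (up to a
    # constant) of the translated input embedding z_i + z_r under the
    # Gaussian entity embedding N(mu_o, diag(sigma_o^2)).
    return -0.5 * float((((z_i + z_r - mu_o) / sigma_o) ** 2).sum())

def all_triples(n_inputs, n_relations, n_entities):
    # Enumerate all N_I * N_R * N_o candidate index triples <i, r, o>,
    # as in the "maximum use of the information" variant above.
    return list(itertools.product(range(n_inputs),
                                  range(n_relations),
                                  range(n_entities)))
```

A training step would then, for instance, maximize `score_triple` over triples labeled true and minimize it over triples labeled false, which is what the regularization loss function in the text does.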