US-20260127491-A1 - Device for Training, in Particular Continuously, a Machine Learning Model
Abstract
A device for training, in particular continuously, a machine learning model is disclosed. The device includes an evaluation and computing device that is designed to perform the following steps: (i) providing trainable task feature vectors and associated, pre-known task modules for a selection of pre-known tasks from a data pool containing data on tasks that have been solved using the pre-known task modules of the machine learning model, (ii) providing a task input embedding and a trainable task module with an associated trainable task feature vector for a task input to be solved, (iii) calculating comparison weights between the task input embedding and the trainable task feature vectors to form a weighted sum, in particular a temporary weighted sum, from the task modules and the comparison weights, (iv) combining the weighted sum with the machine learning model, and (v) training the machine learning model to solve the task input by training the task feature vector and the task module depending on a training criterion.
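The following is a minimal sketch of how the weighting and composition steps (iii) and (iv) might be realized, assuming the comparison weights are plain cosine similarities and the task modules are parameter tensors of equal shape; the function names and the omission of the downstream model are illustrative assumptions, not details from the disclosure.

```python
import numpy as np

def cosine_similarity(a, b):
    # Comparison weight between the task input embedding and one
    # trainable task feature vector.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def compose_task_modules(x_n, feature_vectors, task_modules):
    """Form the temporary weighted sum of task modules, steps (iii)/(iv).

    x_n             -- embedding of the task input to be solved
    feature_vectors -- [v_0, ..., v_m], one trainable vector per module
    task_modules    -- [P_0, ..., P_m], parameter arrays of equal shape
    """
    weights = [cosine_similarity(x_n, v) for v in feature_vectors]
    weighted_sum = sum(a * P for a, P in zip(weights, task_modules))
    return weighted_sum, weights  # the sum is then combined with the model
```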
Inventors
- MingYang Wang
- Heike Adel-Vu
- Lukas Lange
Assignees
- ROBERT BOSCH GMBH
Dates
- Publication Date
- 2026-05-07
- Application Date
- 2025-05-01
- Priority Date
- 2024-11-04
Claims (9)
- 1 . A device for training a machine learning model, wherein the device comprises an evaluation and computing device that is designed to perform the following: providing trainable task feature vectors and associated, pre-known task modules for a selection of pre-known tasks from a data pool containing data on tasks that have been solved using the pre-known task modules of the machine learning model; providing a task input embedding and a trainable task module with an associated trainable task feature vector for a task input to be solved; calculating comparison weights between the task input embedding and the trainable task feature vectors to form a temporary weighted sum from the task modules and the comparison weights; combining the weighted sum with the machine learning model; and training the machine learning model to solve the task input by training the task feature vector and the task module depending on a training criterion.
- 2 . The device according to claim 1 , wherein, after training the machine learning model to solve the task input, the comparison weight of the task module is compared with a predefined threshold value, and wherein, based on the comparison, it is determined whether the task module is to be removed from the weighted sum of task modules or remains therein.
- 3 . The device according to claim 1 , wherein training the machine learning model to solve the task input includes finding the task module and the task feature vector which minimize a cross-entropy loss over training examples and simultaneously maximize a cosine similarity between the task-specific task feature vector and the corresponding task input embedding.
- 4 . The device according to claim 1 , wherein the machine learning model comprises a language model or a convolutional neural network.
- 5 . The device according to claim 1 , wherein calculating the comparison weights comprises calculating the cosine similarity between the task input embedding and the trainable task feature vectors.
- 6 . The device according to claim 1 , wherein the task modules each have parameter-efficient fine-tuning parameters.
- 7 . The device according to claim 1 , wherein the plurality of tasks and/or the input task comprises/comprise data classification and/or sentence classification and/or multilingual sentiment analysis and/or image classification and/or control of a technical system and/or provision of a knowledge database for controlling a technical system.
- 8 . The device according to claim 1 , wherein the device for training the machine learning model is a device for continuously training the machine learning model.
- 9 . The device according to claim 1 , wherein the machine learning model comprises a large language model or a convolutional neural network.
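As an illustration of the training criterion in claim 3 and the pruning decision in claim 2, the following is a minimal PyTorch-style sketch; the function names, the balancing factor lam, and the threshold value are assumptions for illustration and are not taken from the disclosure.

```python
import torch
import torch.nn.functional as F

def training_criterion(logits, labels, v_m, x_n, lam=0.1):
    # Claim 3: minimize the cross-entropy loss over training examples while
    # simultaneously maximizing the cosine similarity between the trainable
    # task feature vector v_m and the task input embedding x_n.
    # `lam` is an assumed balancing hyperparameter.
    ce = F.cross_entropy(logits, labels)
    sim = F.cosine_similarity(v_m.unsqueeze(0), x_n.unsqueeze(0)).mean()
    return ce - lam * sim  # minimizing this maximizes `sim`

def keep_task_module(alpha_m, threshold=0.05):
    # Claim 2: after training, the comparison weight of the new task module
    # is compared with a predefined threshold to decide whether the module
    # remains in the weighted sum; the value 0.05 is illustrative only.
    return alpha_m >= threshold
```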
Description
This application claims priority under 35 U.S.C. § 119 to patent application no. DE 20 2024 106 283.3, filed on Nov. 4, 2024 in Germany, the disclosure of which is incorporated herein by reference in its entirety.

The disclosure relates to a device for training a machine learning model, in particular continuously.

BACKGROUND

Machine learning should follow a continuous learning approach in order to evolve and remain effective over time. Continuous learning faces three particular challenges. First, forgetting what has already been learned should be avoided, i.e., newly acquired information should not interfere with or impair previously acquired knowledge. Second, knowledge transfer should be facilitated, i.e., knowledge gained from previous tasks should be reused for the efficient learning of new tasks. Third, parameter efficiency should be maintained, i.e., machine learning models must remain lightweight and effective even when the continuous learning sequence scales to hundreds of tasks.

To address these challenges, previous work has employed the idea of parameter isolation and parameter-efficient fine-tuning for continuous learning. Despite its effectiveness in terms of performance on continuous learning tasks, the progressive expansion of task-specific parameters, when the tasks in a continuous learning sequence number in the hundreds, impairs parameter efficiency and significantly increases computing and memory costs.

It is an object of the disclosure to provide an improved device. This object is achieved by a device according to the features set forth below.

SUMMARY

According to a first aspect, a device for training a machine learning model, in particular continuously, is proposed, wherein the device comprises an evaluation and computing device that is designed to perform the following steps:

providing trainable task feature vectors v0 . . . vm-1 and associated, pre-known task modules P0 . . . Pm-1 for a selection of pre-known tasks from a data pool containing data on tasks that have been solved using the pre-known task modules P0 . . . Pm-1 of the machine learning model;

providing a task input embedding xn and a trainable task module Pm with an associated trainable task feature vector vm for a task input Tn to be solved;

calculating comparison weights α1 . . . αm between the task input embedding xn and the trainable task feature vectors v0 . . . vm to form a weighted sum P′m, in particular a temporary weighted sum, from the task modules P0 . . . Pm and the comparison weights α1 . . . αm;

combining the weighted sum P′m with the machine learning model; and

training the machine learning model to solve the task input Tn by training the task feature vector vm and the task module Pm depending on a training criterion.

It is understood that the steps according to the disclosure and further optional steps do not necessarily have to be carried out in the order shown, but may also be carried out in a different order. Furthermore, intermediate steps may also be provided. The individual steps may also comprise one or more sub-steps without departing from the scope of the disclosure.

The present disclosure proposes a machine learning model that follows a continuous learning approach (also referred to as MoCL-P for short). This machine learning model offers a lightweight approach to continuous learning that uses task-oriented module composition and adaptive thinning to address, in whole or in part, the three challenges of continuous learning mentioned above.
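Written out, and assuming the comparison weights are the cosine similarities referenced in claim 5 and that the sum runs over all m+1 available task modules, the composition step can be read as follows (an illustrative reading, not a verbatim formula from the disclosure):

```latex
\alpha_i = \cos(x_n, v_i) = \frac{x_n^{\top} v_i}{\lVert x_n \rVert \, \lVert v_i \rVert},
\qquad
P'_m = \sum_{i=0}^{m} \alpha_i \, P_i
```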
The machine learning model and the associated method are not only characterized by their high performance, but also surpass previous algorithms in terms of parameter efficiency many times over, as demonstrated by a corresponding benchmark. MoCL-P thus offers a sustainable path for continuous learning, so that models remain simple and effective even as the number of tasks grows.

The present device proposes a solution for parameter-isolated continuous learning. In this process, task-specific parameters are assigned to each task in the continuous learning sequence. Each task-specific module is preferably "frozen" as soon as training for a specific task is complete, thus preventing catastrophic forgetting, since the knowledge in the respective specific module is retained for the subsequent training process. Furthermore, the present approach features modular and compositional learning and ensures that the machine learning model exhibits effective knowledge transfer through the reuse of relevant knowledge from previous tasks.

In addition, there may be a set of parameters that is shared across all tasks. These make up the majority of the parameters, stem from a pre-trained model (such as BERT or DINO), and are likewise frozen. The task-specific modules are then inserted between the existing layers.

The present disclosure can be used to analyze data of various types that can be represented with vector representations.
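As a rough architectural illustration of frozen shared parameters with task-specific modules inserted between existing layers, consider the following sketch; the bottleneck-adapter form of the parameter-efficient fine-tuning parameters (claim 6), the class name, and the dimensions are assumptions, not details from the disclosure.

```python
import torch
import torch.nn as nn

class TaskModule(nn.Module):
    # Hypothetical parameter-efficient task module: a small residual
    # bottleneck inserted between frozen layers of a pre-trained backbone.
    def __init__(self, d_model=768, d_bottleneck=16):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))

# The shared, pre-trained parameters (e.g. a BERT- or DINO-like backbone)
# and all previously trained task modules stay frozen; only the current
# task's module and its task feature vector receive gradient updates.
```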