EP-4740135-A1 - MULTITASK MACHINE LEARNING USING DISJOINT DATASETS
Abstract
Certain aspects of the present disclosure provide techniques and apparatus for improved machine learning. A first dataset comprising one or more labeled exemplars for a first machine learning task and a second dataset comprising one or more labeled exemplars for a second machine learning task are accessed. A combined loss is generated based on the first and second datasets, comprising generating a supervised loss for the first machine learning task based on the one or more labeled exemplars from the first dataset, and generating a self-supervised loss for the first machine learning task based on the one or more labeled exemplars from the second dataset. One or more parameters of a multitask machine learning model are updated based on the combined loss.
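The combined-loss computation summarized in the abstract can be sketched as follows. This is a minimal, hypothetical NumPy illustration: the tiny linear model, the augmentation functions, and the fixed weighting are assumptions made for this sketch and are not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(4, 8))   # shared encoder weights (stand-in)
W_dec1 = rng.normal(size=(8, 1))  # task-1 decoder weights (stand-in)

def encode(x):
    # shared encoder, used by every task
    return np.tanh(x @ W_enc)

def decode_task1(h):
    # decoder head specific to the first machine learning task
    return h @ W_dec1

def supervised_loss(x, y):
    # mean squared error against the task-1 labels (first dataset)
    pred = decode_task1(encode(x))
    return float(np.mean((pred - y) ** 2))

def self_supervised_loss(x, augment_a, augment_b):
    # consistency between two augmented views of an exemplar from the
    # second dataset, whose task-2 label is not used for task 1
    pa = decode_task1(encode(augment_a(x)))
    pb = decode_task1(encode(augment_b(x)))
    return float(np.mean((pa - pb) ** 2))

# first dataset: labeled for task 1; second dataset: labeled for task 2 only
x1, y1 = rng.normal(size=(16, 4)), rng.normal(size=(16, 1))
x2 = rng.normal(size=(16, 4))

noise = lambda v: v + 0.01 * rng.normal(size=v.shape)
# combined loss: supervised term plus a weighted self-supervised term
combined = supervised_loss(x1, y1) + 0.5 * self_supervised_loss(x2, noise, noise)
print(combined)
```

A gradient step on `combined` would then update the shared encoder (and the task heads) from both datasets at once, which is the point of the technique: the second dataset contributes a task-1 training signal despite carrying no task-1 labels.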
Inventors
- JANSSON MINNE, Teo, Desmond, Per, Siki, Amon, Lee
- DABBAGHCHIAN, SAEED
- MALEK-MOHAMMADI, Mohammadreza
Assignees
- Qualcomm Incorporated
Dates
- Publication Date
- 20260513
- Application Date
- 20240503
Claims (20)
- 1. A processing system comprising: a memory comprising computer-executable instructions; and one or more processors configured to execute the computer-executable instructions and cause the processing system to: access a first dataset comprising one or more labeled exemplars for a first machine learning task; access a second dataset comprising one or more labeled exemplars for a second machine learning task; generate a combined loss based on the first and second datasets, wherein to generate the combined loss, the one or more processors are configured to execute the computer-executable instructions to cause the processing system to: generate a first supervised loss for the first machine learning task based on the one or more labeled exemplars from the first dataset; and generate a first self-supervised loss for the first machine learning task based on the one or more labeled exemplars from the second dataset; and update one or more parameters of a multitask machine learning model based on the combined loss.
- 2. The processing system of claim 1, wherein to generate the combined loss, the one or more processors are configured to execute the computer-executable instructions to cause the processing system to aggregate the first supervised loss and the first self-supervised loss based at least in part on a first weight for the first self-supervised loss, the first weight being determined based on a current epoch of training the multitask machine learning model.
- 3. The processing system of claim 2, wherein the first weight is assigned a relatively lower value during relatively earlier epochs of training the multitask machine learning model, as compared to relatively later epochs of training the multitask machine learning model.
- 4. The processing system of claim 2, wherein the first supervised loss and the first self-supervised loss are aggregated based further on a second weight for the first machine learning task, the second weight having a constant value during training of the multitask machine learning model.
- 5. The processing system of claim 1, wherein to generate the combined loss, the one or more processors are configured to execute the computer-executable instructions to cause the processing system to generate a second supervised loss for the second machine learning task based on the one or more labeled exemplars from the second dataset.
- 6. The processing system of claim 1, wherein to generate the combined loss, the one or more processors are configured to execute the computer-executable instructions to cause the processing system to generate a second self-supervised loss for the second machine learning task based on the one or more labeled exemplars from the first dataset.
- 7. The processing system of claim 1, wherein to generate the first self-supervised loss, the one or more processors are configured to execute the computer-executable instructions to cause the processing system to, for a first labeled exemplar from the second dataset: generate a first output based on the first labeled exemplar augmented according to a first set of augmentations; generate a second output based on the first labeled exemplar augmented according to a second set of augmentations; generate a pseudo-label based on modifying the first output using the first and second sets of augmentations; and compare the pseudo-label and the second output.
- 8. The processing system of claim 1, wherein the multitask machine learning model comprises an encoder component shared by both the first and second machine learning tasks, a first decoder component for the first machine learning task, and a second decoder component for the second machine learning task.
- 9. The processing system of claim 1, wherein the first and second machine learning tasks are computer vision tasks and comprise at least one of: monocular depth estimation, semantic segmentation, object detection, surface normal estimation, or edge detection.
- 10. A processor-implemented method, comprising: accessing a first dataset comprising one or more labeled exemplars for a first machine learning task; accessing a second dataset comprising one or more labeled exemplars for a second machine learning task; generating a combined loss based on the first and second datasets, comprising: generating a first supervised loss for the first machine learning task based on the one or more labeled exemplars from the first dataset; and generating a first self-supervised loss for the first machine learning task based on the one or more labeled exemplars from the second dataset; and updating one or more parameters of a multitask machine learning model based on the combined loss.
- 11. The processor-implemented method of claim 10, wherein generating the combined loss further comprises aggregating the first supervised loss and the first self-supervised loss based at least in part on a first weight for the first self-supervised loss, the first weight being determined based on a current epoch of training the multitask machine learning model.
- 12. The processor-implemented method of claim 11, wherein the first weight is assigned a relatively lower value during relatively earlier epochs of training the multitask machine learning model, as compared to relatively later epochs of training the multitask machine learning model.
- 13. The processor-implemented method of claim 11, wherein the first supervised loss and the first self-supervised loss are aggregated based further on a second weight for the first machine learning task, the second weight having a constant value during training of the multitask machine learning model.
- 14. The processor-implemented method of claim 10, wherein generating the combined loss further comprises generating a second supervised loss for the second machine learning task based on the one or more labeled exemplars from the second dataset.
- 15. The processor-implemented method of claim 10, wherein generating the combined loss further comprises generating a second self-supervised loss for the second machine learning task based on the one or more labeled exemplars from the first dataset.
- 16. The processor-implemented method of claim 10, wherein generating the first self-supervised loss comprises, for a first labeled exemplar from the second dataset: generating a first output based on the first labeled exemplar augmented according to a first set of augmentations; generating a second output based on the first labeled exemplar augmented according to a second set of augmentations; generating a pseudo-label based on modifying the first output using the first and second sets of augmentations; and comparing the pseudo-label and the second output.
- 17. The processor-implemented method of claim 10, wherein the multitask machine learning model comprises an encoder component shared by both the first and second machine learning tasks, a first decoder component for the first machine learning task, and a second decoder component for the second machine learning task.
- 18. The processor-implemented method of claim 10, wherein the first and second machine learning tasks are computer vision tasks and comprise at least one of: monocular depth estimation, semantic segmentation, object detection, surface normal estimation, or edge detection.
- 19. A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by one or more processors of a processing system, cause the processing system to: access a first dataset comprising one or more labeled exemplars for a first machine learning task; access a second dataset comprising one or more labeled exemplars for a second machine learning task; generate a combined loss based on the first and second datasets, wherein to generate the combined loss, the one or more processors are configured to execute the computer-executable instructions to cause the processing system to: generate a first supervised loss for the first machine learning task based on the one or more labeled exemplars from the first dataset; and generate a first self-supervised loss for the first machine learning task based on the one or more labeled exemplars from the second dataset; and update one or more parameters of a multitask machine learning model based on the combined loss.
- 20. The non-transitory computer-readable medium of claim 19, wherein to generate the combined loss, the one or more processors are configured to execute the computer-executable instructions to cause the processing system to aggregate the first supervised loss and the first self-supervised loss based at least in part on a first weight for the first self-supervised loss, the first weight being determined based on a current epoch of training the multitask machine learning model.
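Two of the claimed mechanisms lend themselves to a short sketch: the pseudo-labeling scheme of claims 7 and 16 (compare a label derived from one augmented view against the output on a second view) and the epoch-dependent weighting of claims 2-3 and 11-12. The illustration below is hypothetical; the stand-in model, the flip/identity augmentations, and the linear ramp are assumptions for this sketch, not details from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)

def model(x):
    # stand-in for the shared encoder plus task-1 decoder; a fixed linear map
    return x * 2.0

def flip(x):
    # first set of augmentations: reverse along the last axis
    return x[..., ::-1]

def identity(x):
    # second set of augmentations: no-op
    return x

def self_supervised_loss(x):
    out_a = model(flip(x))       # first output, from the first augmented view
    out_b = model(identity(x))   # second output, from the second augmented view
    # "modify the first output using the first and second sets of
    # augmentations": undo the flip so the pseudo-label aligns with out_b
    pseudo_label = flip(out_a)
    # compare the pseudo-label and the second output
    return float(np.mean((pseudo_label - out_b) ** 2))

def self_supervised_weight(epoch, total_epochs, max_weight=1.0):
    # lower weight during earlier epochs, higher during later ones
    # (claims 3 and 12); a linear ramp is one simple realization
    return max_weight * min(epoch / max(total_epochs - 1, 1), 1.0)

x = rng.normal(size=(4, 8))
loss = self_supervised_loss(x)
print(loss)  # 0.0 here: this linear model commutes with the flip
```

For this toy model the loss is exactly zero because the linear map commutes with the flip; a real network would not be exactly equivariant, and the nonzero discrepancy is the training signal that the ramped weight gradually admits into the combined loss.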
Description
MULTITASK MACHINE LEARNING USING DISJOINT DATASETS

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims priority to U.S. Patent Application No. 18/346,374, filed July 3, 2023, which is assigned to the assignee hereof and hereby expressly incorporated by reference in its entirety as if fully set forth below and for all applicable purposes.

INTRODUCTION

[0002] Aspects of the present disclosure relate to machine learning.

[0003] Multitask learning, where a single machine learning model (e.g., a neural network) is trained to perform multiple tasks, has become increasingly useful in a variety of environments (e.g., in autonomous driving applications). Generally, supervised training of a multitask model demands large amounts of multi-labeled ground truth data (e.g., where each input sample has multiple labels or ground truths, one for each desired task). Labeled data (even for a single task) is generally difficult and/or expensive to obtain (e.g., it is often not feasible to generate highly granular ground truth for depth estimation tasks). As such, multi-labeled data is virtually non-existent in most scenarios. Additionally, when such multi-labeled data can be created, this data is generally very expensive in terms of cost and computational resources used. Further, in some conventional systems, new features or tasks cannot be added or learned without a massive re-gathering of such multi-labeled data.
BRIEF SUMMARY

[0004] Certain aspects provide a method, comprising: accessing a first dataset comprising one or more labeled exemplars for a first machine learning task; accessing a second dataset comprising one or more labeled exemplars for a second machine learning task; generating a combined loss based on the first and second datasets, comprising generating a first supervised loss for the first machine learning task based on the one or more labeled exemplars from the first dataset, and generating a first self-supervised loss for the first machine learning task based on the one or more labeled exemplars from the second dataset; and updating one or more parameters of a multitask machine learning model based on the combined loss.

[0005] Other aspects provide processing systems configured to perform the aforementioned methods as well as those described herein; non-transitory, computer-readable media comprising instructions that, when executed by one or more processors of a processing system, cause the processing system to perform the aforementioned methods as well as those described herein; a computer program product embodied on a computer-readable storage medium comprising code for performing the aforementioned methods as well as those further described herein; and a processing system comprising means for performing the aforementioned methods as well as those further described herein.

[0006] The following description and the related drawings set forth in detail certain illustrative features of one or more aspects.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] The appended figures depict certain features of one or more aspects of the present disclosure and are therefore not to be considered limiting of the scope of this disclosure.

[0008] FIG. 1 illustrates an example environment for multitask machine learning using disjoint datasets.

[0009] FIG. 2 illustrates an example workflow for multitask training using disjoint datasets.

[0010] FIG. 3 depicts an example workflow for self-supervised training in disjoint multitask learning.

[0011] FIG. 4 is a flow diagram depicting an example method for training multitask machine learning models using disjoint datasets.

[0012] FIG. 5 is a flow diagram depicting an example method for aggregating losses to train multitask machine learning models using disjoint datasets.

[0013] FIG. 6 is a flow diagram depicting an example method for training a multitask machine learning model.

[0014] FIG. 7 depicts an example processing system configured to perform various aspects of the present disclosure.

[0015] To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the drawings. It is contemplated that elements and features of one aspect may be beneficially incorporated in other aspects without further recitation.

DETAILED DESCRIPTION

[0016] Aspects of the present disclosure provide apparatuses, methods, processing systems, and computer-readable mediums for multitask machine learning models based on disjoint datasets.

[0017] In some conventional approaches, supervised learning can be used to train multitask models. However, such supervised approaches often rely on training (and evaluation) data with ground truth annotations for all tasks to be learned by the model. As the quantity and quality of training data often has a vital impact on model performance, this reliance on multi-labeled data is a major limiting factor. Collecting and labeling such data is generally a costly (and impractical) process