DE-102025144165-A1 - SELECTIVE ADAPTATION OF PRE-TRAINED NEURAL NETWORK MODELS
Abstract
A method adapts a pre-trained neural network model having layers of pre-trained weight matrices. At least one layer is augmented with multiple parameter-efficient adaptation modules, each module associated with a respective learnable scoring parameter. The model is fine-tuned on a target dataset while the pre-trained weight matrices are kept frozen. The fine-tuning involves performing a forward pass in which an indicator function selectively applies a weight update from each module based on its scoring parameter and a threshold. A total loss value is determined from a task-specific loss and a sparsity-inducing regularization term. Parameters of the adaptation modules and the scoring parameters are updated based on the total loss value. A final fine-tuned model, featuring a sparse subset of activated adaptation modules, is provided for an inference task.
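The gating mechanism in the abstract can be sketched in PyTorch. The class below is illustrative, not the patent's implementation: a frozen linear layer is augmented with one low-rank (LoRA-style) update whose application is controlled by a learnable scoring parameter compared against a threshold. The straight-through estimator used to keep the score trainable through the hard indicator is an assumption; the patent does not specify how gradients pass through the gate.

```python
import torch
import torch.nn as nn

class GatedLoRALinear(nn.Module):
    """Frozen linear layer plus a low-rank update that an indicator
    function switches on or off based on a learnable score and a
    fixed threshold (illustrative sketch of the patent's idea)."""

    def __init__(self, base: nn.Linear, rank: int = 4, threshold: float = 0.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pre-trained weights stay frozen
        # Low-rank factorization: delta_W = B @ A
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.score = nn.Parameter(torch.zeros(1))  # learnable scoring parameter
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Indicator: apply the update only when the score exceeds the threshold.
        # Straight-through trick (assumption): forward uses the hard 0/1 gate,
        # backward passes the gradient to the score as if the gate were identity.
        gate_hard = (self.score > self.threshold).float()
        gate = gate_hard + self.score - self.score.detach()
        return self.base(x) + gate * (x @ self.A.t() @ self.B.t())
```

With the score initialized to zero and a zero threshold, the gate is off and the layer reproduces the frozen base layer exactly; during fine-tuning, the regularized score decides whether the module ever activates.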
Inventors
- Reza Akbarian Bafghi
- Carden Bagwell
- Avinash Aghoram Ravichandran
Assignees
- GM CRUISE HOLDINGS LLC
Dates
- Publication Date: 2026-05-13
- Application Date: 2025-10-29
- Priority Date: 2025-09-12
Claims (10)
- A computer-implemented method executed by data processing hardware that causes the data processing hardware to perform operations comprising: obtaining a pre-trained neural network model comprising a plurality of layers, each layer comprising a plurality of corresponding pre-trained weight matrices; augmenting at least one of the plurality of layers with a plurality of parameter-efficient adaptation modules, each parameter-efficient adaptation module configured to generate a respective weight update matrix based on a low-rank factorization and associated with a respective learnable scoring parameter; fine-tuning the pre-trained neural network model on a target dataset to produce a fine-tuned model while keeping the pre-trained weight matrices frozen, wherein the fine-tuning comprises, for each of a plurality of training iterations: performing a forward pass, wherein, for each parameter-efficient adaptation module, an indicator function selectively applies the respective weight update matrix based on a comparison of the respective learnable scoring parameter with a predefined threshold; determining a total loss value based on a combination of a task-specific loss and a regularization term, wherein the regularization term is configured to induce sparsity by applying a penalty proportional to a norm of the respective learnable scoring parameters; and updating parameters of the plurality of parameter-efficient adaptation modules and the respective learnable scoring parameters based on the total loss value; and providing the fine-tuned model, comprising a sparse subset of activated parameter-efficient adaptation modules, to perform an inference task.
- The method of claim 1, wherein the parameter-efficient adaptation modules are low-rank adaptation (LoRA) modules.
- The method of claim 1, wherein the regularization term is proportional to a sum of absolute values of the respective learnable scoring parameters.
- The method of claim 1, wherein the pre-trained neural network model is a transformer-based model selected from the group consisting of a vision model and a vision/language model.
- The method of claim 4, wherein the parameter-efficient adaptation modules augment at least one of a query component, a key component, a value component, or a feed-forward network component in a transformer block of the pre-trained neural network model.
- The method of claim 1, wherein the sparse subset of activated parameter-efficient adaptation modules comprises less than twenty-five percent of the plurality of parameter-efficient adaptation modules augmenting the at least one layer.
- The method of claim 1, wherein fine-tuning the pre-trained neural network model further comprises setting a hyperparameter that controls a magnitude of the penalty applied by the regularization term to control a trade-off between accuracy on the target dataset and performance on an out-of-distribution dataset.
- The method of claim 1, wherein the target dataset corresponds to a first vehicle operating context and the operations further comprise: generating a second fine-tuned model by fine-tuning the pre-trained neural network model on a second target dataset corresponding to a second vehicle operating context, wherein the second fine-tuned model comprises a second sparse subset of activated parameter-efficient adaptation modules; and selecting, based on a current operating context of a vehicle, one of the fine-tuned model or the second fine-tuned model to perform the inference task.
- The method of claim 1, wherein performing the inference task using the fine-tuned model requires fewer floating-point operations (FLOPs) than performing the inference task using a second fine-tuned model for which the regularization term is omitted from the total loss value during fine-tuning.
- The method of claim 1, wherein the target dataset comprises sensor data recorded by a vehicle, and the inference task comprises processing real-time sensor data from the vehicle to provide an output for an advanced driver assistance system (ADAS) of the vehicle.
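The loss combination recited in claim 1, with the penalty-magnitude hyperparameter of claim 7 and the L1-style norm of claim 3, can be sketched as follows. The function name and the name `lam` are illustrative; the patent specifies only a penalty proportional to a norm of the scoring parameters.

```python
import torch

def total_loss(task_loss: torch.Tensor,
               scores: list,
               lam: float = 1e-3) -> torch.Tensor:
    """Total loss value = task-specific loss + lam * L1 norm of the
    learnable scoring parameters. The L1 penalty drives scores toward
    zero, so most indicator gates stay closed and only a sparse subset
    of adaptation modules activates. `lam` is the hyperparameter that
    trades target-dataset accuracy against out-of-distribution
    performance (illustrative sketch)."""
    l1 = sum(s.abs().sum() for s in scores)
    return task_loss + lam * l1
```

A larger `lam` prunes more modules, preserving more of the pre-trained model's general-purpose behavior at some cost in target-task accuracy.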
Description
CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application Serial No. 63/720,110, filed on November 13, 2024. The disclosure of this prior application is considered part of the disclosure of this application and is hereby incorporated by reference in its entirety.

INTRODUCTION

The information provided in this section serves to give a general overview of the context of the disclosure. Work of the presently named inventors, to the extent it is described in this section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure. The present disclosure relates generally to machine learning and, in particular, to systems and methods for adapting pre-trained neural network models to new tasks or domains. Large pre-trained neural network models, such as vision models and vision/language models, are often trained on large datasets and are known for their ability to generalize across a wide variety of applications. To apply these general-purpose models to a more specialized task, such as object detection in a specific environment or domain-specific image classification, a process of adaptation or fine-tuning is typically performed. This adaptation aims to improve the model's performance on the new, specialized dataset. One approach to adaptation is full fine-tuning, which involves retraining a substantial portion or all of a model's pre-trained weights on the new task-specific data. This process can be computationally intensive, requiring significant processing resources and time. Furthermore, a recognized challenge of full fine-tuning is a phenomenon known as catastrophic forgetting.
In this scenario, as the model adapts to the new task, its performance on its original general-purpose tasks, or on tasks outside the training distribution, can decline significantly. This can also degrade the model's zero-shot classification and retrieval capabilities, which are valuable properties of the original pre-trained model.

SUMMARY

One aspect of the disclosure provides a method that is executed by data processing hardware and causes the data processing hardware to perform operations. The operations include obtaining a pre-trained neural network model that has multiple layers, each layer containing multiple corresponding pre-trained weight matrices. The operations include augmenting at least one of the multiple layers with multiple parameter-efficient adaptation modules. Each of the parameter-efficient adaptation modules is configured to generate a respective weight update matrix based on a low-rank factorization and is associated with a respective learnable scoring parameter. The operations include fine-tuning the pre-trained neural network model on a target dataset to produce a fine-tuned model while keeping the pre-trained weight matrices frozen. The fine-tuning includes, for each of several training iterations, performing a forward pass, determining a total loss value, and updating parameters. During the forward pass, for each parameter-efficient adaptation module, an indicator function selectively applies the respective weight update matrix based on a comparison of the respective learnable scoring parameter with a predefined threshold. The total loss value is determined based on a combination of a task-specific loss and a regularization term, where the regularization term is configured to induce sparsity by applying a penalty proportional to a norm of the respective learnable scoring parameters.
The parameters of the parameter-efficient adaptation modules and the respective learnable scoring parameters are updated based on the total loss value. The operations also include deploying the fine-tuned model, which has a sparse subset of activated parameter-efficient adaptation modules, to perform an inference task. Implementations of the disclosure may include one or more of the following optional features. In some implementations, the parameter-efficient adaptation modules are low-rank adaptation (LoRA) modules. The regularization term can be proportional to a sum of absolute values of the respective learnable scoring parameters. Optionally, the pre-trained neural network model is a transformer-based model selected from the group consisting of a vision model and a vision/language model. In some of these examples, the parameter-efficient adaptation modules augment at least one of a query component, a key component, a value component, or a feed-forward network component in a transformer block of the pre-trained neural network model. In some examples, the sparse subset of activated parameter-efficient adaptation modules contains less than 25 percent of the parameter-efficient adaptation modules that augment the at least one layer.
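The inference-time benefit of sparsity (fewer FLOPs, as in claim 9) can be illustrated by folding only the activated low-rank updates into the frozen base weights after fine-tuning and discarding the rest. The attribute names (`base`, `A`, `B`, `score`) are hypothetical conventions for a gated-LoRA layer, not names from the patent.

```python
import torch
import torch.nn as nn

def merge_active_modules(layers, threshold: float = 0.0) -> int:
    """After fine-tuning, fold each *activated* low-rank update
    (score above threshold) into its frozen base weight and return
    how many modules survived. Inactive modules are dropped entirely,
    so inference incurs no extra cost for them (illustrative sketch;
    assumes each layer exposes .base (nn.Linear), .A, .B, .score)."""
    active = 0
    with torch.no_grad():
        for layer in layers:
            if layer.score.item() > threshold:
                # delta_W = B @ A is merged once, offline
                layer.base.weight += layer.B @ layer.A
                active += 1
    return active
```

After merging, the deployed model is a plain stack of linear layers again: the sparse subset of activated modules costs nothing at inference beyond the one-time weight addition.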