WO-2026096212-A1 - DYNAMICALLY SWITCHING PARAMETER-EFFICIENT FINE-TUNING PROFILES DURING MACHINE-LEARNED MODEL EXECUTION


Abstract

An example method includes inputting, to a machine-learned model instantiated with a first parameter profile, a first input; generating, by a computing system executing the machine-learned model instantiated with the first parameter profile, and based on the first input, one or more first outputs; mapping one or more profile identifier elements of the one or more first outputs to a second parameter profile; inputting, to the machine-learned model instantiated with the second parameter profile, a second input; generating, by the computing system executing the machine-learned model instantiated with the second parameter profile, and based on the second input, one or more second outputs; and generating a response based on the one or more first outputs and the one or more second outputs.
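
The flow in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the `<swap>` marker, the profile names `general` and `math`, and the toy functions that stand in for the model instantiated with each parameter profile. A real system would select learned parameter values rather than Python functions.

```python
from typing import Callable, Dict, List

# Hypothetical marker token; the source does not fix a concrete syntax.
SWAP_TOKEN = "<swap>"

# Toy stand-in for "the model instantiated with a parameter profile".
Profile = Callable[[List[str]], List[str]]


def general_profile(context: List[str]) -> List[str]:
    # Emits some text, then a swap marker followed by a profile identifier.
    return ["The", "answer", "is", SWAP_TOKEN, "math"]


def math_profile(context: List[str]) -> List[str]:
    return ["42"]


# Mapping from profile identifier elements to parameter profiles.
PROFILES: Dict[str, Profile] = {"general": general_profile, "math": math_profile}


def run_with_profile_swaps(prompt: List[str], start: str = "general",
                           max_swaps: int = 4) -> List[str]:
    context = list(prompt)   # grows into the second input
    response: List[str] = []
    active = start
    for _ in range(max_swaps + 1):
        outputs = PROFILES[active](context)
        if SWAP_TOKEN not in outputs:
            response.extend(outputs)
            break
        i = outputs.index(SWAP_TOKEN)
        visible, ident = outputs[:i], outputs[i + 1:i + 2]
        response.extend(visible)
        context.extend(visible)  # next input builds on prior input and outputs
        if not ident or ident[0] not in PROFILES:
            break                # unknown identifier: stop rather than guess
        active = ident[0]        # map the identifier to the next profile
    return response


print(run_with_profile_swaps(["What", "is", "6*7?"]))
```

The swap marker and identifier are kept out of the response in this sketch, so the response combines only the visible first and second outputs.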

Inventors

HARTMANN, Florian Nils
SHARIFI, Matthew

Assignees

GDM HOLDING LLC

Dates

Publication Date: 20260507
Application Date: 20251016
Priority Date: 20241031

Claims (20)

1. A computer-implemented method, comprising: inputting, to a machine-learned model instantiated with a first parameter profile that defines one or more first learned parameter values for the machine-learned model, a first input; generating, by a computing system executing the machine-learned model instantiated with the first parameter profile, and based on the first input, one or more first outputs; mapping one or more profile identifier elements of the one or more first outputs to a second parameter profile, wherein the second parameter profile defines one or more second learned parameter values for the machine-learned model; inputting, to the machine-learned model instantiated with the second parameter profile, a second input, wherein the second input is based on at least one of the first input or the one or more first outputs; generating, by the computing system executing the machine-learned model instantiated with the second parameter profile, and based on the second input, one or more second outputs; and generating a response based on the one or more first outputs and the one or more second outputs.
2. The computer-implemented method of claim 1, comprising: loading, into a memory of the computing system, a first plurality of values respectively for a plurality of learned parameters of the machine-learned model, the first plurality of values defined according to the first parameter profile; generating the one or more first outputs using the first plurality of values; loading, into the memory, a second plurality of values respectively for the plurality of learned parameters, the second plurality of values defined according to the second parameter profile; and generating the one or more second outputs using the second plurality of values.
3. The computer-implemented method of claim 2, wherein the second parameter profile defines delta values, and wherein the second plurality of values are obtained by combining the delta values with corresponding baseline values for the plurality of learned parameters.
4. The computer-implemented method of claim 3, wherein the first plurality of values are the baseline values.
5. The computer-implemented method of any one of claims 2 to 4, wherein the loading, into the memory, of the second plurality of values is performed responsive to the mapping of the one or more profile identifier elements to the second parameter profile.
6. The computer-implemented method of any preceding claim, wherein the second input comprises the first input.
7. The computer-implemented method of claim 6, wherein the second input comprises the one or more first outputs.
8. The computer-implemented method of claim 6 or 7, comprising: generating one or more first-profile activations based on the first input using the first parameter profile; generating the one or more first outputs based on the first-profile activations based on the first input; generating one or more second-profile activations based on the first input using the second parameter profile; and generating the one or more second outputs based on the second-profile activations based on the first input.
9. The computer-implemented method of claim 8, wherein: the one or more first-profile activations based on the first input comprises attention values computed between elements of the first input using the one or more first learned parameter values; the one or more second-profile activations based on the first input comprises attention values computed between the elements in the first input using the one or more second learned parameter values.
10. The computer-implemented method of claim 8 or 9, comprising: caching the one or more first-profile activations based on the first input; mapping, to the first parameter profile, one or more second profile identifier elements generated by the machine-learned model based on a third input that comprises the first input; generating one or more first-profile activations based on the third input using the one or more first learned parameter values, wherein generating the one or more first-profile activations based on the third input comprises, for a portion of the third input corresponding to the first input, retrieving the cached one or more first-profile activations based on the first input; generating, by the computing system executing the machine-learned model instantiated with the first parameter profile, and based on the one or more first-profile activations based on the third input, one or more third outputs.
11. The computer-implemented method of claim 10, wherein: the third input is the second input; and the one or more second outputs comprise the one or more second profile identifier elements.
12. The computer-implemented method of any preceding claim, wherein generating the one or more first outputs comprises: generating a swap profile element that signals a profile swap; and generating the one or more profile identifier elements.
13. The computer-implemented method of claim 12, wherein the swap profile element is a token sampled from an output vocabulary of tokens of the machine-learned model based on a prediction value associated with the swap profile element, the prediction value conditioned on one or more preceding tokens in a context window of the machine-learned model.
14. The computer-implemented method of any preceding claim, comprising: training the one or more first parameter values using a first training dataset, wherein training the one or more first parameter values comprises: for a respective batch of one or more first training examples in the first training dataset: inputting, to the machine-learned model, at least a portion of the respective batch of one or more first training examples; generating, by the machine-learned model instantiated with the first parameter profile, one or more respective first outputs; computing a first respective loss based on the one or more respective first outputs; and generating, based on the first respective loss, a first respective training update for the first parameter profile; training the one or more second parameter values using a second training dataset, wherein training the one or more second parameter values comprises: for a respective batch of one or more second training examples in the second training dataset: inputting, to the machine-learned model, at least a portion of the respective batch of one or more second training examples; generating, by the machine-learned model instantiated with the second parameter profile, one or more respective second outputs; computing a second respective loss based on the one or more respective second outputs; and generating, based on the second respective loss, a second respective training update for the second parameter profile.
15. The computer-implemented method of claim 14, comprising: storing the one or more first parameter values in association with a first identifier; and storing the one or more second parameter values in association with a second identifier indicated by the one or more profile identifier elements.
16. The computer-implemented method of any preceding claim, comprising: training the one or more first parameter values using a training example that comprises the one or more profile identifier elements; wherein training the one or more first parameter values using the training example that comprises the one or more profile identifier elements comprises: providing a masked training input to the machine-learned model instantiated with the first parameter profile, wherein the masked training input comprises a portion of the training example with the one or more profile identifier elements masked; generating, by the machine-learned model instantiated with the first parameter profile, one or more training outputs associated with one or more training output tokens; computing a training loss that indicates an alignment between the one or more training outputs and the masked one or more profile identifier elements; and training the first parameter profile based on the training loss.
17. The computer-implemented method of claim 16, comprising: sampling tokens from the training example to mask based on a distribution over the tokens in the training example, wherein one or more distribution values associated with the one or more profile identifier elements are selected to indicate a higher likelihood of being sampled, on a normalized basis, than a baseline value associated with a proportion of training example corresponding to the one or more profile identifier elements.
18. The computer-implemented method of claim 16, comprising: training the one or more first parameter values using a first training dataset, wherein training the one or more first parameter values comprises: for a respective batch of one or more first training examples in the first training dataset: inputting, to the machine-learned model, at least a portion of the respective batch of one or more first training examples; generating, by the machine-learned model instantiated with the first parameter profile, one or more first respective outputs; computing a first respective loss based on the one or more first respective outputs; and generating, based on the first respective loss, a first respective training update for the first parameter profile; training the one or more second parameter values using a second training dataset, wherein training the one or more second parameter values comprises: for a respective batch of one or more second training examples in the second training dataset: inputting, to the machine-learned model, at least a portion of the respective batch of one or more second training examples; generating, by the machine-learned model instantiated with the second parameter profile, one or more second respective outputs; computing a second respective loss based on the one or more second respective outputs; and generating, based on the second loss, a second training update for the second parameter profile.
19. The computer-implemented method of any one of claims 14 to 18, comprising: inputting training example source material associated with a candidate training example to a machine-learned example generation model; generating, by the machine-learned example generation model and based on the training example source material, an output indicating a proposed profile swap from the first parameter profile to the second parameter profile; computing a performance measure of the machine-learned model using the proposed profile swap over the training example source material; updating, based on the performance measure, the training example to include the proposed profile swap; and storing the candidate training example in a training dataset.
20. A computer-implemented method of generating training data for training a machine-learned sequence processing model to swap parameter profiles, the method comprising: inputting, to a training sequence multiplexer, first training example source material associated with a first domain; inputting, to the training sequence multiplexer, second training example source material associated with a second domain; generating, by the training sequence multiplexer, a multiplexed training sequence comprising: first elements corresponding to the first training example source material; one or more elements indicating a parameter profile associated with the second domain; and second elements corresponding to the second training example source material; and storing the multiplexed training sequence in a training dataset.
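
The training sequence multiplexer of claim 20 can be pictured with a short sketch. It is an illustration under stated assumptions, not the claimed implementation: the function name `multiplex`, the `<swap>` marker, the profile identifier `math`, and the use of a plain list as the training dataset are all hypothetical.

```python
from typing import List


def multiplex(first_elements: List[str], second_elements: List[str],
              second_profile_id: str, swap_token: str = "<swap>") -> List[str]:
    """Join material from two domains around a profile indicator.

    The swap_token and second_profile_id together play the role of the
    "elements indicating a parameter profile associated with the second
    domain"; using a separate swap marker is an assumption of this sketch.
    """
    return [*first_elements, swap_token, second_profile_id, *second_elements]


# Hypothetical source material from two domains.
prose = ["Compute", "the", "total", ":"]
math = ["3", "+", "4", "=", "7"]

training_dataset: List[List[str]] = []
training_dataset.append(multiplex(prose, math, second_profile_id="math"))

print(training_dataset[0])
```

A model trained on such sequences sees the profile indicator exactly at the domain boundary, which is where a swap would be expected at inference time.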

Description

DYNAMICALLY SWITCHING PARAMETER-EFFICIENT FINE-TUNING PROFILES DURING MACHINE-LEARNED MODEL EXECUTION

BACKGROUND

[0001] A computer can receive inputs. The computer can execute instructions to process the inputs to generate outputs using a parameterized model. The computer can obtain feedback on its performance in generating the outputs with the model. The computer can generate feedback by evaluating its performance. The computer can receive feedback from an external source. The computer can update parameters of the model based on the feedback to improve its performance. In this manner, the computer can iteratively "learn" to generate the desired outputs. The resulting model is often referred to as a machine-learned model.

SUMMARY

[0002] Aspects and advantages of embodiments of the present disclosure will be set forth in part in the following description, or can be learned from the description, or can be learned through practice of the embodiments.
[0003] In an aspect, the present disclosure provides a first example computer-implemented method. In some implementations, the first example computer-implemented method includes inputting, to a machine-learned model instantiated with a first parameter profile that defines one or more first learned parameter values for the machine-learned model, a first input. In some implementations, the first example computer-implemented method includes generating, by a computing system executing the machine-learned model instantiated with the first parameter profile, and based on the first input, one or more first outputs. In some implementations, the first example computer-implemented method includes mapping one or more profile identifier elements of the one or more first outputs to a second parameter profile, wherein the second parameter profile defines one or more second learned parameter values for the machine-learned model.
In some implementations, the first example computer-implemented method includes inputting, to the machine-learned model instantiated with the second parameter profile, a second input, wherein the second input is based on at least one of the first input or the one or more first outputs. In some implementations, the first example computer-implemented method includes generating, by the computing system executing the machine-learned model instantiated with the second parameter profile, and based on the second input, one or more second outputs. In some implementations, the first example computer-implemented method includes generating a response based on the one or more first outputs and the one or more second outputs.
[0004] In some implementations, the first example computer-implemented method includes loading, into a memory of the computing system, a first plurality of values respectively for a plurality of learned parameters of the machine-learned model, the first plurality of values defined according to the first parameter profile. In some implementations, the first example computer-implemented method includes generating the one or more first outputs using the first plurality of values. In some implementations, the first example computer-implemented method includes loading, into the memory, a second plurality of values respectively for the plurality of learned parameters, the second plurality of values defined according to the second parameter profile. In some implementations, the first example computer-implemented method includes generating the one or more second outputs using the second plurality of values.
[0005] In some implementations of the first example computer-implemented method, the second parameter profile defines delta values. In some implementations of the first example computer-implemented method, the second plurality of values are obtained by combining the delta values with corresponding baseline values for the plurality of learned parameters.
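
One way to picture the delta-value case is the sketch below. It assumes that "combining" means element-wise addition, which the text above does not specify; the parameter names `w_q` and `w_v`, the function `apply_delta_profile`, and the numeric values are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def apply_delta_profile(baseline: Params, deltas: Params) -> Params:
    """Return the values to load for a delta-based profile.

    Assumption: deltas are added element-wise to the baseline values.
    Parameters that the profile does not mention keep their baseline values.
    """
    loaded: Params = {}
    for name, base in baseline.items():
        delta = deltas.get(name)
        if delta is None:
            loaded[name] = list(base)
            continue
        if len(delta) != len(base):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        loaded[name] = [b + d for b, d in zip(base, delta)]
    return loaded


# Hypothetical baseline values and a second profile that only touches w_q.
baseline = {"w_q": [1.0, 2.0], "w_v": [0.5, 0.5]}
second_profile_deltas = {"w_q": [0.25, -0.5]}

second_values = apply_delta_profile(baseline, second_profile_deltas)
print(second_values)
```

Because the profile stores only the deltas, switching profiles in this sketch means recomputing a small set of values while the baseline stays resident in memory.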
[0006] In some implementations of the first example computer-implemented method, the second parameter profile defines replacement values. In some implementations of the first example computer-implemented method, the second plurality of values are obtained by using the replacement values in lieu of corresponding baseline values for the plurality of learned parameters.
[0007] In some implementations of the first example computer-implemented method, the first plurality of values are the baseline values.
[0008] In some implementations of the first example computer-implemented method, the loading, into the memory, of the second plurality of values is performed responsive to the mapping of the one or more profile identifier elements to the second parameter profile.
[0009] In some implementations of the first example computer-implemented method, the second input includes the first input.
[0010] In some implementations of the first example computer-implemented method, the second input includes the one or more first outputs.
[0011] In some implementations, the first example computer-implemented method incl