KR-20260062620-A - COMPUTER SYSTEM AND METHOD FOR LIGHTWEIGHTING TRANSFORMER-BASED MODEL BASED ON VARIABLE PERIODIC LEARNING RATE SCHEDULING

KR-20260062620-A

Abstract

A method for lightweighting a transformer-based model having a multi-head attention structure is provided. The method includes obtaining an initial model and a lightweighting goal for the initial model; obtaining a lightweighted model by performing head pruning on the initial model and then training it; and obtaining a final lightweighted model, in which the initial model is lightweighted according to the lightweighting goal, by performing neuron pruning on the lightweighted model and training the neuron-pruned model according to variable periodic learning rate scheduling set based on the lightweighting goal.

Inventors

  • 심경환
  • 최신국

Assignees

  • 주식회사 노타 (Nota Inc.)

Dates

Publication Date
2026-05-07
Application Date
2024-10-29

Claims (18)

  1. A method for lightweighting a transformer-based model, performed by a computer system, the method comprising: obtaining an initial model, which is a transformer-based model having a multi-head attention structure, and a lightweighting goal for the initial model; performing head pruning on the initial model to remove at least one head among a plurality of heads of the multi-head attention structure; and obtaining a final lightweighted model, in which the initial model is lightweighted according to the lightweighting goal, by performing, according to variable periodic learning rate scheduling set based on the lightweighting goal, neuron pruning that removes at least one neuron among a plurality of neurons of the model lightweighted according to the head pruning of the initial model, and learning on the neuron-pruned model, wherein the obtaining of the final lightweighted model includes obtaining at least one intermediate model lightweighted according to the head pruning, and wherein the final lightweighted model is the intermediate model or a model obtained by further lightweighting the intermediate model.
  2. The method of claim 1, wherein the obtaining of the intermediate model includes generating the intermediate model by, for each cycle according to the variable periodic learning rate scheduling, performing neuron pruning on the model lightweighted in the previous cycle and learning on the neuron-pruned model.
  3. The method of claim 2, wherein, as the cycle for generating the intermediate model according to the variable periodic learning rate scheduling is repeated, at least one of the maximum value of the learning rate or the number of learning units in the learning for generating the intermediate model is reduced.
  4. The method of claim 3, wherein the maximum value of the learning rate in the learning for generating the intermediate model in each cycle is calculated based on the learning rate of the initial model and the ratio of the amount of computation or the number of parameters of the model neuron-pruned in that cycle to that of the initial model.
  5. The method of claim 3, wherein the number of learning units in the learning for generating the intermediate model in each cycle is calculated based on the number of learning units of the training dataset of the initial model and the ratio of the amount of computation or the number of parameters of the model neuron-pruned in that cycle to that of the initial model.
  6. The method of claim 1, further comprising obtaining the model lightweighted according to the head pruning by training the head-pruned initial model with a first number of learning units of the training dataset, determined based on the number of learning units of the training dataset of the initial model, or with a first learning rate determined based on the learning rate of the initial model.
  7. The method of claim 6, wherein the first number of learning units of the training dataset is calculated based on the number of learning units of the training dataset of the initial model and the ratio of the amount of computation or the number of parameters of the head-pruned initial model to that of the initial model.
  8. The method of claim 1, wherein the obtaining of the intermediate model includes obtaining a plurality of intermediate models lightweighted according to the head pruning, wherein the model lightweighted according to the head pruning is a model installable on a cloud server, a first intermediate model among the intermediate models, obtained by further lightweighting the head-pruned model, is a model installable on a mobile terminal having relatively smaller resources or computing power than the cloud server, and a second intermediate model among the intermediate models, obtained by further lightweighting the first intermediate model, is a model installable on an edge device having relatively smaller resources or computing power than the mobile terminal.
  9. The method of claim 2, further comprising receiving a first parameter indicating the number of lightweighted models to be generated from the initial model, wherein the lightweighting ratio for the model in each cycle according to the variable periodic learning rate scheduling is determined based on the lightweighting goal and the first parameter.
  10. The method of claim 2, further comprising receiving a second parameter indicating the lightweighting ratio for a lightweighted model generated from the initial model, wherein the number of cycles according to the variable periodic learning rate scheduling is determined based on the lightweighting goal and the second parameter.
  11. The method of claim 1, wherein, for generating the model lightweighted according to the head pruning, the learning rate for training the head-pruned initial model is scheduled to be the same as the learning rate used during training of the initial model.
  12. The method of claim 3, wherein the learning rate in each cycle for generating the intermediate model according to the variable periodic learning rate scheduling is scheduled based on cosine annealing.
  13. The method of claim 2, further comprising receiving a third parameter indicating the number of target devices, wherein the number of cycles according to the variable periodic learning rate scheduling is determined based on the third parameter.
  14. The method of claim 13, wherein the number of cycles and the lightweighting ratio for the model in each cycle are determined by: obtaining information on the available resources of each of the target devices; determining a degree of lightweighting of the transformer-based model for each target device based on the obtained information; and determining the number of cycles and the lightweighting ratio based on the degree of lightweighting and the lightweighting goal.
  15. The method of claim 13, wherein the number of cycles is determined to be a value greater than or equal to the number of target devices.
  16. The method of claim 1, further comprising outputting the obtained intermediate model.
  17. A computer-readable recording medium for executing the method of claim 1 on a computer system.
  18. A computer system for lightweighting a transformer-based model, comprising: at least one processor implemented to execute computer-readable instructions in the computer system, wherein the at least one processor is configured to: obtain an initial model, which is a transformer-based model having a multi-head attention structure, and a lightweighting goal for the initial model; perform head pruning on the initial model to remove at least one head among a plurality of heads of the multi-head attention structure; and obtain a final lightweighted model, in which the initial model is lightweighted according to the lightweighting goal, by performing, according to variable periodic learning rate scheduling set based on the lightweighting goal, neuron pruning that removes at least one neuron among a plurality of neurons of the model lightweighted according to the head pruning of the initial model, and learning on the neuron-pruned model, wherein, in obtaining the final lightweighted model, at least one intermediate model lightweighted according to the head pruning is obtained, and the final lightweighted model is the intermediate model or a model obtained by further lightweighting the intermediate model.
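
For illustration only, and not as part of the claims: the variable periodic scheduling recited in claims 2 through 5 and claim 12 admits a simple reading in which, within each pruning cycle, the learning rate follows cosine annealing, while both the maximum learning rate and the number of learning units shrink in proportion to the pruned model's remaining computation or parameter count relative to the initial model. The following minimal Python sketch encodes that reading; the 0.8 per-cycle ratio and the `base_lr` and `base_epochs` values are assumptions, not figures from the patent.

```python
import math

def cycle_schedule(base_lr, base_epochs, param_ratio):
    """Per-cycle max learning rate and epoch budget, scaled by the ratio of
    the pruned model's parameters (or FLOPs) to the initial model's
    (one illustrative reading of claims 4 and 5)."""
    max_lr = base_lr * param_ratio                 # shrinks as the model shrinks
    epochs = max(1, round(base_epochs * param_ratio))
    return max_lr, epochs

def cosine_annealed_lr(max_lr, step, total_steps):
    """Cosine annealing within one cycle (claim 12)."""
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * step / total_steps))

# Example: three pruning cycles, each keeping 80% of the previous parameters.
ratio = 1.0
for cycle in range(3):
    ratio *= 0.8                                   # assumed lightweighting ratio per cycle
    max_lr, epochs = cycle_schedule(1e-3, 10, ratio)
    mid_lr = cosine_annealed_lr(max_lr, epochs // 2, epochs)
    print(f"cycle {cycle}: max_lr={max_lr:.2e}, epochs={epochs}, mid-cycle lr={mid_lr:.2e}")
```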

Description

Computer System and Method for Lightweighting Transformer-Based Model Based on Variable Periodic Learning Rate Scheduling

The present disclosure relates to a computer system and method for lightweighting a transformer-based model by performing pruning and learning on the transformer-based model based on variable periodic learning rate scheduling.

Transformer-based models are constructed to include an attention mechanism and are widely used for tasks involving the processing of inputs (e.g., sequential input data such as natural language). Transformer-based models may include a Multi-Head Self-Attention (MHSA) mechanism, which allows the model to attend to each part of the input sequence from multiple perspectives simultaneously. By using multiple attention heads, the model can better capture complex dependencies within the input. However, because MHSA significantly increases model complexity and requires a large number of parameters and a high computational load, transformer-based models containing MHSA demand substantial computational resources and memory during inference. This can limit the practicality of transformer-based models in resource-constrained environments. It is therefore necessary to lighten transformer-based models effectively, while minimizing performance degradation, by removing heads of the transformer-based model and/or through additional lightweighting at the neuron or filter level.

Meanwhile, Korean Patent Publication No. 10-2024-0102610 (published July 3, 2024) discloses an electronic device for performing kernel pruning on a neural network model and a method of operating the same. The information described above is for illustrative purposes only and may include content that does not form part of the prior art, and may not include what the prior art would suggest to a person skilled in the art.

FIG. 1 illustrates a method for lightweighting a transformer-based model based on variable periodic learning rate scheduling according to one embodiment.
FIG. 2 shows a computer system for performing a method for lightweighting a transformer-based model according to one embodiment.
FIG. 3 is a flowchart illustrating a method for lightweighting a transformer-based model based on variable periodic learning rate scheduling according to one embodiment.
FIG. 4 is a flowchart illustrating a method for determining the number of cycles, and the lightweighting ratio in each cycle, for generating lightweighted models according to variable periodic learning rate scheduling, according to one example.
FIG. 5 illustrates a method for generating a lightweighted model through pruning and learning in each cycle according to variable periodic learning rate scheduling, according to one example.
FIG. 6 illustrates a method for calculating the number of learning units (or the learning rate) in each cycle for generating a lightweighted model, according to one example.

Hereinafter, embodiments of the present invention will be described in detail with reference to the attached drawings.

FIG. 1 illustrates a method for lightweighting a transformer-based model based on variable periodic learning rate scheduling according to one embodiment. Referring to FIG. 1, a method by which a computer system (100) lightweights an initial model (50) (hereinafter also referred to as the model (50) for convenience), which is a transformer-based model, is described.
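
As a purely illustrative sketch of what removing attention heads can look like in practice (the patent does not prescribe this implementation), the following PyTorch snippet drops heads from a toy MHSA layer by slicing its stacked Q/K/V projection and the corresponding columns of the output projection. The L2-norm importance proxy, the 8-head/512-dimension shapes, and all variable names are assumptions introduced for the example.

```python
import torch

def prune_heads(w_qkv, w_out, num_heads, keep):
    """Drop attention heads by slicing projection matrices.

    w_qkv: (3*d_model, d_model) stacked Q/K/V projection weights
    w_out: (d_model, d_model) output projection weights
    keep:  indices of heads to retain
    """
    d_model = w_out.shape[0]
    d_head = d_model // num_heads
    # Split Q, K, V, then reshape so heads form the leading axis.
    q, k, v = w_qkv.chunk(3, dim=0)
    pick = lambda w: w.reshape(num_heads, d_head, d_model)[keep].reshape(-1, d_model)
    q, k, v = pick(q), pick(k), pick(v)
    # The output projection consumes head outputs along its input axis.
    out = w_out.reshape(d_model, num_heads, d_head)[:, keep, :].reshape(d_model, -1)
    return torch.cat([q, k, v], dim=0), out

# Example: keep the 6 heads whose output-projection weights have the largest
# L2 norm (one possible importance proxy, assumed here for illustration).
num_heads, d_model = 8, 512
w_qkv = torch.randn(3 * d_model, d_model)
w_out = torch.randn(d_model, d_model)
norms = w_out.reshape(d_model, num_heads, -1).pow(2).sum(dim=(0, 2)).sqrt()
keep = norms.topk(6).indices.sort().values
w_qkv_p, w_out_p = prune_heads(w_qkv, w_out, num_heads, keep)
print(w_qkv_p.shape, w_out_p.shape)  # torch.Size([1152, 512]) torch.Size([512, 384])
```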
A transformer-based model (50) is a model widely used for input-processing tasks and can be configured to include an attention mechanism. The transformer-based model (50) can be, for example, a model configured to process input data such as natural language. The attention mechanism included in the model (50) can be configured to calculate the association between the elements of the input sequence represented by the input (X) using query (Q), key (K), and value (V) vectors. The attention mechanism included in the model (50) may include multiple heads; for example, it may include a Multi-Head Self-Attention (MHSA) mechanism comprising multiple heads. The model (50) may be a mega-scale or large-scale model composed of a very large number of parameters, for example a mega-scale model for performing natural language processing (NLP), computer vision (CV), speech recognition, and the like. High-performance hardware may be required for inference with such a mega-scale model, and such a large model can be the subject of lightweighting by the lightweighting method of the embodiment. In an embodiment, the computer system (100) can lightweight the model (50) by pruning and training the model (50) according to variable periodic learning rate scheduling. For example, as illustrated in FIG. 1, the computer system (100) may receive at least one of the following parameters input by a user or administrator who wishes to lighten the model (50).
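
Putting the pieces together, one plausible shape of the overall procedure described above (head pruning followed by repeated neuron pruning and retraining under a shrinking schedule, with each intermediate model deployable on a different class of device) is sketched below. Every helper here (`head_prune`, `neuron_prune`, `train`, and the dict-based model stand-in) is a hypothetical placeholder, not the patented embodiment.

```python
# --- Hypothetical stand-ins for the patent's steps (not real APIs) ---
def head_prune(m):         return {**m, "params": m["params"] * 0.9}
def neuron_prune(m, keep): return {**m, "params": m["params"] * keep}
def train(m, lr, epochs):  return m   # placeholder: fine-tune the model here

def lightweight(model, goal_ratio, cycles, base_lr=1e-3, base_epochs=10):
    """Head-prune once, then per cycle neuron-prune and retrain while the
    maximum LR and epoch budget shrink with the model (cf. claims 2-5, 11-12)."""
    model = head_prune(model)
    model = train(model, lr=base_lr, epochs=base_epochs)  # claim 11: reuse the initial LR
    initial = model["params"]
    per_cycle = goal_ratio ** (1.0 / cycles)  # spread the goal evenly over the cycles
    for c in range(cycles):
        model = neuron_prune(model, keep=per_cycle)
        ratio = model["params"] / initial
        max_lr = base_lr * ratio
        epochs = max(1, round(base_epochs * ratio))
        model = train(model, lr=max_lr, epochs=epochs)
        print(f"cycle {c}: params={model['params']:.0f}, max_lr={max_lr:.2e}, epochs={epochs}")
        yield model  # each intermediate model may target a different device class (cf. claim 8)

list(lightweight({"params": 1e8}, goal_ratio=0.5, cycles=3))
```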