JP-2026075404-A - Machine learning device, machine learning method, and program
Abstract
[Problem] To further train a neural network with low computational cost and high accuracy. [Solution] The machine learning device comprises a weight addition unit that adds weight parameters between the final layer of a neural network having a plurality of layers and each layer other than the final layer, and an additional learning unit that learns the weight parameters based on training data. [Selection Diagram] Figure 6
Inventors
- 松谷 宏紀 (Hiroki Matsutani)
- 近藤 正章 (Masaaki Kondo)
Assignees
- 慶應義塾 (Keio University)
Dates
- Publication Date: 2026-05-08
- Application Date: 2024-10-22
Claims (8)
- A machine learning device comprising: a weight addition unit configured to add weight parameters between the final layer of a neural network having a plurality of layers and each layer other than the final layer; and an additional learning unit configured to learn the weight parameters based on training data.
- The machine learning device according to claim 1, wherein the additional learning unit comprises: a forward propagation unit configured to forward-propagate the training data through the neural network using the weight parameters; and a backpropagation unit configured to update the weight parameters by backpropagating the output data of the neural network through the neural network.
- The machine learning device according to claim 2, wherein the backpropagation unit is configured to backpropagate the output data only to the final layer of the neural network.
- The machine learning device according to claim 2, further comprising a storage unit configured to store calculation results from the forward propagation unit, wherein the forward propagation unit is configured to use the calculation results read from the storage unit to calculate the output data that the neural network outputs when the training data is forward-propagated through the neural network.
- The machine learning device according to any one of claims 1 to 4, wherein the weight parameters are low-rank approximation matrices.
- The machine learning device according to any one of claims 1 to 4, wherein the additional learning unit learns a plurality of weight parameters corresponding to tasks to be performed by the neural network.
- A machine learning method in which a computer performs: a procedure of adding weight parameters between the final layer of a neural network having a plurality of layers and each layer other than the final layer; and a procedure of learning the weight parameters based on training data.
- A program for causing a computer to execute: a procedure of adding weight parameters between the final layer of a neural network having a plurality of layers and each layer other than the final layer; and a procedure of learning the weight parameters based on training data.
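The claim language can be related to concrete computation with a short sketch. The following PyTorch code is a minimal illustration under stated assumptions, not the patented implementation: it assumes a stack of fully connected (nn.Linear) layers, and the names AdapterAugmentedNet, base_layers, final_layer, and rank are hypothetical. It adds one trainable low-rank weight pair between each non-final layer and the final layer (claims 1 and 5), freezes the pre-trained weights, and detaches the base activations so that backpropagation reaches only the added parameters and stops at the final layer (claim 3).

```python
import torch
import torch.nn as nn


class AdapterAugmentedNet(nn.Module):
    """Hypothetical sketch: low-rank adapters running from every
    non-final layer to the final layer of a frozen pre-trained net."""

    def __init__(self, base_layers, final_layer, rank=4):
        super().__init__()
        self.base_layers = nn.ModuleList(base_layers)
        self.final_layer = final_layer
        # Freeze all pre-trained weights; only the adapters are trained.
        for p in self.parameters():
            p.requires_grad_(False)
        # One low-rank pair per non-final layer (claims 1 and 5):
        # layer.out_features -> rank -> final_layer.in_features.
        self.adapters = nn.ModuleList(
            nn.Sequential(
                nn.Linear(layer.out_features, rank, bias=False),
                nn.Linear(rank, final_layer.in_features, bias=False),
            )
            for layer in base_layers
        )

    def forward(self, x):
        skips = []
        for layer, adapter in zip(self.base_layers, self.adapters):
            x = torch.relu(layer(x))
            # detach(): gradients stop at the adapters and the final
            # layer; none flow back into the base layers (claim 3).
            skips.append(adapter(x.detach()))
        return self.final_layer(x.detach() + sum(skips))
```

With the base frozen, each adapter contributes only rank × (d_layer + d_final) trainable entries, which is what keeps the additional-training footprint small.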
Description
This disclosure relates to a machine learning device, a machine learning method, and a program.

Techniques are known for further training a machine learning model in order to apply it to a new task. This type of technique is also called fine-tuning or transfer learning. For example, Non-Patent Document 1 discloses a technique in which weight parameters called adapters are added to a pre-trained neural network and only the added weight parameters are trained.

[Non-Patent Document 1] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-Rank Adaptation of Large Language Models," arXiv:2106.09685, 2021.

[Brief Description of Drawings]
- Figure 1 is a block diagram showing an example of the overall configuration of a machine learning system.
- Figure 2 shows an example of a microcomputer.
- Figure 3 shows a first example of a neural network according to conventional technology.
- Figure 4 shows a second example of a neural network according to conventional technology.
- Figure 5 shows an example of a neural network according to the embodiment.
- Figure 6 is a block diagram showing an example of the functional configuration of the machine learning device.
- Figure 7 is a flowchart showing an example of a machine learning method.
- Figure 8 shows an example of evaluation results for calculation speed.
- Figure 9 shows an example of evaluation results for prediction accuracy.

Embodiments of this disclosure are described below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted.

[Embodiment] One embodiment of this disclosure is an example of an information processing system that performs various processes related to a neural network. Hereinafter, the information processing system according to this embodiment is referred to as the "machine learning system". In this embodiment, the machine learning system has a function for further training a trained neural network. The further training may be fine-tuning or transfer learning.

Conventionally, a technique for further training a neural network is known in which weight parameters called adapters are added to the network and only the added weight parameters are trained. For example, Non-Patent Document 1 discloses a technique for adding adapters represented by low-rank approximation matrices to a neural network. This type of technique is also known as LoRA (Low-Rank Adaptation).

Conventional techniques have proposed several adapter placements. The first type adds adapters between every pair of adjacent layers of the neural network. Because of the large number of adapters, this type offers high expressive power, but the computational cost of additional training is high. The second type adds adapters only between the final layer and the layer preceding it. Because of the small number of adapters, this type has a low computational cost for additional training, but its expressive power is low.

This embodiment aims to further train a neural network with low computational cost and high accuracy. To this end, in this embodiment, weight parameters are added between the final layer of a multi-layer neural network and each layer other than the final layer, and these weight parameters are learned based on training data.
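As a minimal sketch of how such additional learning can proceed at low cost, the following continues the hypothetical AdapterAugmentedNet above (again an illustration under assumptions, not the patent's implementation; train_adapters, epochs, and the classification loss are all hypothetical choices). Because the base network is frozen, its forward results for the fixed training data can be computed once, stored, and reused every epoch, which plays the role of the storage unit of claim 4; each update then evaluates only the adapters and the final layer.

```python
import torch


def train_adapters(model, train_x, train_y, epochs=100, lr=1e-3):
    """Hypothetical additional learning unit for the sketch above."""
    loss_fn = torch.nn.CrossEntropyLoss()
    # Storage unit (claim 4): the base network is frozen, so its forward
    # results for the fixed training data are computed once and reused.
    with torch.no_grad():
        cache, h = [], train_x
        for layer in model.base_layers:
            h = torch.relu(layer(h))
            cache.append(h)
    opt = torch.optim.Adam(model.adapters.parameters(), lr=lr)
    for _ in range(epochs):
        # Forward propagation unit (claim 2): reuse cached results, so
        # each epoch evaluates only the adapters and the final layer.
        skips = [ad(h) for ad, h in zip(model.adapters, cache)]
        out = model.final_layer(cache[-1] + sum(skips))
        # Backpropagation unit (claims 2 and 3): gradients reach only
        # the added weight parameters.
        loss = loss_fn(out, train_y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

Under this caching, the per-epoch cost scales with the adapters and the final layer alone, which is the sense in which the embodiment matches the cost of adding weight parameters only at the final layer while keeping skip connections from every layer.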
One aspect of this embodiment is that, while maintaining the same expressive power as when weight parameters are added between all layers, the weight parameters can be updated at the same computational cost as when weight parameters are added only to the final layer. The neural network can therefore be further trained accurately with little computation. Another aspect of this embodiment is that, because the neural network can be further trained with little computation, a highly accurate neural network can be obtained by further training even on devices with limited computing resources.

<Overall Configuration> The overall configuration of the machine learning system according to this embodiment is described with reference to Figure 1. Figure 1 is a block diagram showing an example of the overall configuration of the machine learning system. As shown in Figure 1, the machine learning system 1000 includes a machine learning device 10.

The machine learning device 10 is an example of an information processing device that performs various processes related to neural networks. The machine learning device 10 may be a computer such as a microcomputer, a personal computer, a workstation, or a server. In this embodiment, the machine learning device 10 may be a computer with limited computing resources; as an example, it may be a single-board computer usable as an edge device.

The machine learning device 10 includes a neural network NN. The neural network NN may be a pre-trained neural network. The neural network NN may be pre-trained within the machine learning device 10, or a neural network trained in another information processing device may