KR-102962371-B1 - Learning method and apparatus using neural network separation and combination
Abstract
A learning method and apparatus using neural network separation and combination are disclosed. The learning method using neural network separation and combination comprises: (a) defining an intermediate feature of a neural network model as a bottleneck feature z; (b) separating, based on the bottleneck feature z, the neural network model into a forward network having a first parameter θ_f and a backward network having a second parameter θ_b; (c) updating the backward network by applying gradient descent to a loss function, defined as the loss between the output value of the backward network and the ground-truth label (GT), to update the second parameter θ_b and the bottleneck feature z; and (d) updating the forward network by jointly considering the gradient of the loss function with respect to the first parameter θ_f and the gradient with respect to the bottleneck feature z.
Inventors
- Byung-Woo Hong (홍병우)
- Kensuke Nakamura (나카무라 켄수케)
Assignees
- Chung-Ang University Industry-Academic Cooperation Foundation (중앙대학교 산학협력단)
Dates
- Publication Date: 2026-05-07
- Application Date: 2023-10-10
Claims (10)
- A learning method using neural network separation and combination performed on a computing device, the method comprising: (a) defining an intermediate feature of a neural network model as a bottleneck feature z; (b) separating, based on the bottleneck feature z, the neural network model into a forward network having a first parameter θ_f and a backward network having a second parameter θ_b; (c) updating the backward network by applying gradient descent to a loss function, defined as the loss between an output value of the backward network and a ground-truth label (GT), to update the second parameter θ_b and the bottleneck feature z; and (d) updating the forward network by jointly considering the gradient of the loss function with respect to the first parameter θ_f and the gradient with respect to the bottleneck feature z.
- The method of claim 1, wherein the neural network model has a plurality of convolution layers, and the position of the bottleneck feature z is randomly determined among the convolution layers located in the middle of the plurality of convolution layers.
- The method of claim 1, wherein in step (c), the backward network is updated using the following mathematical formula: $\theta_b^{(n+1)} = \theta_b^{(n)} - \eta_b\,\nabla_{\theta_b}\mathcal{L}(\hat{y},\,y)$ with $\hat{y} = g(z^{(n)};\,\theta_b^{(n)})$, where n represents the number of iterations, η_b represents the learning rate for the second parameter, L represents the loss function, ŷ represents the output value, g represents the backward network, and ∇_{θ_b} represents the gradient with respect to the second parameter θ_b.
- The method of claim 1, wherein the forward network is updated using the following mathematical formula: $z^{(n+1)} = z^{(n)} - \eta\,\nabla_{z}\mathcal{L}(\hat{y},\,y)$ and $\theta_f^{(n+1)} = \theta_f^{(n)} - \eta\,\nabla_{\theta_f}\big\|f(x;\,\theta_f^{(n)}) - z^{(n+1)}\big\|_2^2$, where n represents the number of iterations, η represents the learning rate for the bottleneck feature z and the first parameter θ_f, L represents the loss function, ŷ represents the output value, f represents the forward network, ∇_{θ_f} and ∇_z represent the gradients with respect to the first parameter θ_f and the bottleneck feature z, respectively, and ‖·‖₂ represents the L2-norm operator.
- A learning method using neural network separation and combination performed on a computing device, the method comprising: sampling a bottleneck feature z using the output of an encoder in an autoencoder model, wherein the output values are a mean μ and a variance σ, and the autoencoder model includes a decoder that takes the bottleneck feature z as input and generates an output value; updating the decoder by updating the decoder parameter θ_d and the bottleneck feature z, with the bottleneck feature z as the input of the decoder, in consideration of the gradient with respect to the decoder parameter θ_d; and updating the encoder by updating the encoder parameter θ_e, in consideration of the gradient with respect to the encoder parameter θ_e obtained by summing the gradients of the loss function with respect to the mean μ and the variance σ of the updated bottleneck feature z.
- The method of claim 5, wherein the bottleneck feature z is defined by the following mathematical formula: $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0,\,I)$, where ε represents a random vector sampled from a Gaussian normal distribution, μ represents the mean, and σ represents the variance.
- The method of claim 5, wherein the encoder is updated using the following mathematical formula: $\theta_e^{(n+1)} = \theta_e^{(n)} - \eta\,\nabla_{\theta_e}\mathcal{L}(\hat{y},\,y)$ with $\nabla_{\theta_e}\mathcal{L} = (\partial\mu/\partial\theta_e)^{\top}\nabla_{\mu}\mathcal{L} + (\partial\sigma/\partial\theta_e)^{\top}\nabla_{\sigma}\mathcal{L}$ and $\hat{y} = g(z^{(n+1)})$, where n represents the number of iterations, η represents the learning rate for the bottleneck feature z and the encoder parameter θ_e, L represents the loss function, g represents the decoder, ŷ represents the output value, ∇_{θ_e} represents the gradient with respect to the encoder parameter θ_e, and ∇_μ and ∇_σ represent the gradients with respect to the mean and the variance, respectively.
- A computer-readable recording medium having recorded thereon program code for performing the method according to any one of claims 1 to 7.
- A computing device comprising: a memory for storing at least one instruction; and a processor that executes the instruction stored in the memory, wherein the instruction executed by the processor performs the steps of: (a) defining an intermediate feature of a neural network model as a bottleneck feature z; (b) separating, based on the bottleneck feature z, the neural network model into a forward network having a first parameter θ_f and a backward network having a second parameter θ_b; (c) updating the backward network by applying gradient descent to a loss function, defined as the loss between an output value of the backward network and a ground-truth label (GT), to update the second parameter θ_b and the bottleneck feature z; and (d) updating the forward network by jointly considering the gradient of the loss function with respect to the first parameter θ_f and the gradient with respect to the bottleneck feature z.
- A computing device comprising: a memory for storing at least one instruction; and a processor that executes the instruction stored in the memory, wherein the instruction executed by the processor performs the steps of: sampling a bottleneck feature z using the output of an encoder in an autoencoder model, wherein the output values are a mean μ and a variance σ, and the autoencoder model includes a decoder that takes the bottleneck feature z as input and generates an output value; updating the decoder by updating the decoder parameter θ_d and the bottleneck feature z, with the bottleneck feature z as the input of the decoder, in consideration of the gradient with respect to the decoder parameter θ_d; and updating the encoder by updating the encoder parameter θ_e, in consideration of the gradient with respect to the encoder parameter θ_e obtained by summing the gradients of the loss function with respect to the mean μ and the variance σ of the updated bottleneck feature z.
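For illustration, the autoencoder procedure of claims 5 to 7 can be sketched in PyTorch-style Python. This is a minimal sketch under assumptions, not the patent's reference implementation (which FIG. 4 gives as pseudocode): the layer shapes, the mean-squared-error reconstruction loss, the log-variance parameterization, and the shared learning rate `eta` are illustrative choices made for the example.

```python
import torch
import torch.nn as nn

enc = nn.Linear(784, 2 * 32)   # encoder head: emits mean and log-variance (assumption)
dec = nn.Linear(32, 784)       # decoder: reconstructs the input from z
eta = 1e-2                     # shared learning rate for all updates (assumption)

def split_vae_step(x):
    # Sample the bottleneck feature z = mu + sigma * eps (reparameterization,
    # claim 6), then cut the computation graph at z so encoder and decoder separate.
    mu, log_var = enc(x).chunk(2, dim=-1)
    sigma = torch.exp(0.5 * log_var)
    eps = torch.randn_like(sigma)
    z = (mu + sigma * eps).detach().requires_grad_(True)

    # Decoder step (claim 5): gradient descent on the reconstruction loss
    # with respect to the decoder parameters theta_d and the bottleneck feature z.
    loss = ((dec(z) - x) ** 2).mean()
    grads = torch.autograd.grad(loss, [z] + list(dec.parameters()))
    grad_z, grads_theta_d = grads[0], grads[1:]
    with torch.no_grad():
        z_new = z - eta * grad_z                       # updated bottleneck feature
        for p, g in zip(dec.parameters(), grads_theta_d):
            p -= eta * g

    # Encoder step (claims 5 and 7): propagate the z-update back through the
    # reparameterization; minimizing ||(mu + sigma*eps) - z_new||^2 moves
    # (mu, sigma) along the gradients of the loss w.r.t. the mean and variance.
    surrogate = 0.5 * ((mu + sigma * eps - z_new) ** 2).sum()
    grads_theta_e = torch.autograd.grad(surrogate, enc.parameters())
    with torch.no_grad():
        for p, g in zip(enc.parameters(), grads_theta_e):
            p -= eta * g
    return loss.item()

print(split_vae_step(torch.rand(8, 784)))
```

The essential structure the sketch reflects is cutting the computation graph at the sampled bottleneck feature z, updating the decoder parameters and z by plain gradient descent, and then propagating the z-update to the encoder through the gradients with respect to the mean and variance.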
Description
Learning method and apparatus using neural network separation and combination

The present invention relates to a learning method and apparatus using neural network separation and combination.

Deep learning has had a significant impact on machine learning and demonstrates successful performance in various fields. Since deep learning trains neural networks on large-scale datasets, efficient optimization methods are required. A widely used optimization method in deep learning is Stochastic Gradient Descent (SGD), and various variants such as AdaGrad and Adam have been developed based on it. SGD and its variants use the backpropagation algorithm on probabilistically selected mini-batches to calculate gradients. They determine the direction of network parameter updates and use an adaptive learning rate and momentum to keep the parameters from deviating significantly from that direction; based on this approach, they update the parameters to facilitate convergence to the optimal point of the objective function. However, because SGD and its variants determine update directions from probabilistically partitioned data, convergence is not always guaranteed. Therefore, research is needed on applying Information Bottleneck (IB) theory to improve the efficiency of parameter update directions in SGD and its variants.

FIG. 1 is a flowchart illustrating a learning method using neural network separation and combination according to an embodiment of the present invention. FIG. 2 is a diagram illustrating pseudocode for a learning method of a neural network model according to an embodiment of the present invention. FIG. 3 is a flowchart illustrating a learning method for an autoencoder model according to another embodiment of the present invention. FIG. 4 is a diagram illustrating pseudocode for a learning method of an autoencoder model according to another embodiment of the present invention. FIG. 5 is a block diagram schematically illustrating the internal configuration of a computing device according to one embodiment of the present invention.

As used in this specification, singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "composed of" or "comprising" should not be interpreted as necessarily including all of the components or steps described in the specification; some components or steps may be excluded, and additional components or steps may be included. Furthermore, terms such as "...part" and "module" refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software. Hereinafter, embodiments of the present invention will be described in detail with reference to the attached drawings.

FIG. 1 is a flowchart illustrating a learning method using neural network separation and combination according to an embodiment of the present invention, and FIG. 2 is a diagram illustrating pseudocode for a learning method of a neural network model according to an embodiment of the present invention.

In step 110, the computing device (100) defines an intermediate feature of a neural network model having multiple convolutional layers as a bottleneck feature z. The neural network model may be composed of multiple convolutional layers.
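In the notation of the claims, the separation performed in steps 110 and 115 below can be summarized as

$z = f(x;\,\theta_f), \qquad \hat{y} = g(z;\,\theta_b) = g\big(f(x;\,\theta_f);\,\theta_b\big)$

so that learning can alternate between updating the pair (θ_b, z) and updating θ_f.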
Here, the position of the bottleneck feature z is determined to correspond to a middle convolution layer among the multiple convolution layers, but the position need not be fixed to a specific layer.

In step 115, based on the bottleneck feature z, the computing device (100) defines a forward network having a first parameter θ_f and a backward network having a second parameter θ_b. To facilitate understanding and explanation, the forward network will be denoted f(·; θ_f) and the backward network g(·; θ_b). In one embodiment of the present invention, the forward network refers to the front portion of the network separated by the bottleneck feature z, and the backward network refers to the rear portion. For example, assume a data pair (x, y). In this case, since z is the bottleneck feature obtained by passing the input data x through the forward network, the constraint can be expressed as z = f(x; θ_f).

In step 120, the computing device (100) updates the backward network by applying gradient descent, based on the loss function defined as the loss between the output value of the backward network and the ground-truth label (GT), updating the second parameter θ_b and the bottleneck feature z.

In step 125, using the updated bottleneck feature z, the computing device (100) updates the forward network by jointly considering the gradient of the loss function with respect to the first parameter θ_f and the gradient with respect to the updated bottleneck feature z. This will be described in more detail below.
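The alternating update of steps 110 to 125 can be illustrated with a minimal PyTorch-style sketch. It is a sketch under assumptions, not the pseudocode of FIG. 2: the two sub-networks, the cross-entropy loss, the learning rates, and the L2 matching objective used for the forward-network update (one plausible reading of the claim-4 formula) are illustrative choices.

```python
import torch
import torch.nn as nn

fwd = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())    # forward network f(.; theta_f)
bwd = nn.Sequential(nn.Flatten(), nn.Linear(8 * 28 * 28, 10))    # backward network g(.; theta_b)
loss_fn = nn.CrossEntropyLoss()
eta_b, eta = 1e-2, 1e-2    # learning rates for theta_b and for (z, theta_f) (assumption)

def split_step(x, y):
    # Steps 110/115: define the bottleneck feature z and cut the graph there,
    # separating the model into forward and backward networks.
    z = fwd(x).detach().requires_grad_(True)

    # Step 120: gradient descent on the task loss w.r.t. theta_b and z.
    loss = loss_fn(bwd(z), y)
    grads = torch.autograd.grad(loss, [z] + list(bwd.parameters()))
    grad_z, grads_theta_b = grads[0], grads[1:]
    with torch.no_grad():
        z_new = z - eta * grad_z                      # updated bottleneck feature
        for p, g in zip(bwd.parameters(), grads_theta_b):
            p -= eta_b * g

    # Step 125: update the forward network so that f(x) tracks the updated
    # bottleneck feature, combining the gradients w.r.t. theta_f and z
    # (here via an L2 matching objective, an assumption of this sketch).
    match = ((fwd(x) - z_new) ** 2).mean()
    grads_theta_f = torch.autograd.grad(match, fwd.parameters())
    with torch.no_grad():
        for p, g in zip(fwd.parameters(), grads_theta_f):
            p -= eta * g
    return loss.item()

print(split_step(torch.rand(4, 1, 28, 28), torch.randint(0, 10, (4,))))
```

The essential structure is that the graph is cut at the bottleneck feature z, the backward network and z are first updated by gradient descent on the task loss, and the forward network is then updated toward the updated bottleneck feature.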