
KR-102964288-B1 - Apparatus and method for extracting deep learning models

KR102964288B1

Abstract

A method and apparatus for extracting a deep learning model are disclosed. An apparatus for extracting a deep learning model according to one embodiment includes one or more first artificial neural networks pre-trained to perform a specific function; N (N≥2) second artificial neural networks trained to perform the specific function based on the first artificial neural network; and a learning unit that trains the N second artificial neural networks, wherein the learning unit calculates a total loss function based on a ground truth for a specific input value, a probability score of the first artificial neural network, and a probability score of each of the N second artificial neural networks, and trains each of the N second artificial neural networks based on the total loss function.

Inventors

  • 오규삼
  • 서지현
  • 민찬호
  • 윤용근
  • 조성우

Assignees

  • 삼성에스디에스 주식회사 (Samsung SDS Co., Ltd.)

Dates

Publication Date
2026-05-13
Application Date
2020-09-28
Priority Date
2020-06-05

Claims (20)

  1. A deep learning model extraction apparatus comprising: one or more first artificial neural networks pre-trained to perform a specific function; N (N≥2) second artificial neural networks trained to perform the specific function based on the first artificial neural network, each initialized with different initial weight values; and a learning unit that trains the N second artificial neural networks, wherein the learning unit calculates a total loss function based on a ground truth for a specific input value, a probability score of the first artificial neural network, and a probability score of each of the N second artificial neural networks, and trains each of the N second artificial neural networks based on the total loss function.
  2. The apparatus of claim 1, wherein the number of layers included in each second artificial neural network is less than or equal to the number of layers included in the first artificial neural network, or the number of parameters included in each second artificial neural network is less than or equal to the number of parameters included in the first artificial neural network.
  3. A deep learning model extraction apparatus comprising: one or more first artificial neural networks pre-trained to perform a specific function; N (N≥2) second artificial neural networks trained to perform the specific function based on the first artificial neural network; and a learning unit that trains the N second artificial neural networks, wherein the learning unit calculates a total loss function based on a ground truth for a specific input value, a probability score of the first artificial neural network, and a probability score of each of the N second artificial neural networks, and trains each of the N second artificial neural networks based on the total loss function, and wherein, for the n-th (1≤n≤N) second artificial neural network, the learning unit calculates: a first loss function generated based on the ground truth and the probability score of the n-th second artificial neural network; a second loss function generated based on the probability score of the first artificial neural network and the probability score of the n-th second artificial neural network; and a third loss function generated based on the probability score of the n-th second artificial neural network and a probability score generated using the probability scores of all second artificial neural networks other than the n-th second artificial neural network.
  4. The apparatus of claim 3, wherein the learning unit calculates the total loss function by summing the first, second, and third loss functions calculated for each of the N second artificial neural networks.
  5. The apparatus of claim 3, wherein the second loss function is generated based on the probability score of the n-th second artificial neural network and at least one of: a probability score generated based on the score having the maximum value among the scores output from the last layer of each of the one or more first artificial neural networks; the probability score having the maximum value among the probability scores of the one or more first artificial neural networks; and a probability score generated by averaging the probability scores of the one or more first artificial neural networks.
  6. The apparatus of claim 3, wherein the third loss function is based on a conditional probability of a second probability score of the n-th second artificial neural network given a first probability score generated based on the score of the second artificial neural network having the largest magnitude among the scores output from the last layers of all second artificial neural networks other than the n-th second artificial neural network.
  7. The apparatus of claim 3, wherein the third loss function is generated based on a result of subtracting the entropy of a first probability score of the n-th second artificial neural network from the cross entropy between the first probability score of the n-th second artificial neural network and a second probability score generated based on the scores output from the last layers of all second artificial neural networks other than the n-th second artificial neural network.
  8. The apparatus of claim 7, wherein the second probability score is generated based on the score of the second artificial neural network having the largest score among the scores output from the last layers of all second artificial neural networks other than the n-th second artificial neural network.
  9. The apparatus of claim 7, wherein the learning unit applies a reverse Kullback-Leibler divergence to optimize the first probability score of the n-th second artificial neural network.
  10. The apparatus of claim 1, wherein the ground truth is a one-hot vector and the probability score is a probability vector generated by applying a softmax function to the scores output from the last layer of an artificial neural network.
  11. (deleted)
  12. A deep learning model extraction method performed by a deep learning model extraction apparatus, the method comprising: receiving a probability score for a specific input from one or more first artificial neural networks pre-trained to perform a specific function; receiving a probability score for the specific input from each of N (N≥2) second artificial neural networks, each initialized with different initial weight values and trained to perform the specific function based on the first artificial neural network; and training the N second artificial neural networks, wherein the training comprises calculating a total loss function based on a ground truth for the specific input value, the probability score of the first artificial neural network, and the probability score of each of the N second artificial neural networks, and training each of the N second artificial neural networks based on the total loss function.
  13. The method of claim 12, wherein the number of layers included in each second artificial neural network is less than or equal to the number of layers included in the first artificial neural network, or the number of parameters included in each second artificial neural network is less than or equal to the number of parameters included in the first artificial neural network.
  14. A deep learning model extraction method performed by a deep learning model extraction apparatus, the method comprising: receiving a probability score for a specific input from one or more first artificial neural networks pre-trained to perform a specific function; receiving a probability score for the specific input from each of N (N≥2) second artificial neural networks trained to perform the specific function based on the first artificial neural network; and training the N second artificial neural networks, wherein the training comprises calculating a total loss function based on a ground truth for the specific input value, the probability score of the first artificial neural network, and the probability score of each of the N second artificial neural networks, and training each of the N second artificial neural networks based on the total loss function, and wherein the training further comprises calculating, for the n-th (1≤n≤N) second artificial neural network: a first loss function generated based on the ground truth and the probability score of the n-th second artificial neural network; a second loss function generated based on the probability score of the first artificial neural network and the probability score of the n-th second artificial neural network; and a third loss function generated based on the probability score of the n-th second artificial neural network and a probability score generated using the probability scores of all second artificial neural networks other than the n-th second artificial neural network.
  15. The method of claim 14, wherein the training comprises calculating the total loss function by summing the first, second, and third loss functions calculated for each of the N second artificial neural networks.
  16. The method of claim 14, wherein the second loss function is generated based on the probability score of the n-th second artificial neural network and at least one of: a probability score generated based on the score having the maximum value among the scores output from the last layer of each of the one or more first artificial neural networks; the probability score having the maximum value among the probability scores of the one or more first artificial neural networks; and a probability score generated by averaging the probability scores of the one or more first artificial neural networks.
  17. The method of claim 14, wherein the third loss function is based on a conditional probability of a second probability score of the n-th second artificial neural network given a first probability score generated based on the score of the second artificial neural network having the largest magnitude among the scores output from the last layers of all second artificial neural networks other than the n-th second artificial neural network.
  18. The method of claim 14, wherein the third loss function is generated based on a result of subtracting the entropy of a first probability score of the n-th second artificial neural network from the cross entropy between the first probability score of the n-th second artificial neural network and a second probability score generated based on the scores output from the last layers of all second artificial neural networks other than the n-th second artificial neural network.
  19. The method of claim 18, wherein the second probability score is generated based on the score of the second artificial neural network having the largest score among the scores output from the last layers of all second artificial neural networks other than the n-th second artificial neural network.
  20. The method of claim 18, wherein the training comprises applying a reverse Kullback-Leibler divergence to optimize the first probability score of the n-th second artificial neural network.
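
The loss composition described in the claims above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the patented implementation: the function names (`total_loss`, `cross_entropy`) are my own, and the peer score for student n is simplified here to the element-wise maximum over the other students' raw scores (the claims describe selecting the network with the largest score, which may differ in detail). Per claim 7, the third term is the cross entropy of the student's probability against the peer ensemble minus the student's own entropy, which equals a reverse KL divergence as referenced in claims 9 and 20.

```python
import math

def softmax(scores):
    # Probability vector from raw last-layer scores (claim 10).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i); epsilon guards against log(0).
    return -sum(pi * math.log(qi + 1e-12) for pi, qi in zip(p, q))

def entropy(p):
    return cross_entropy(p, p)

def total_loss(ground_truth, teacher_prob, student_scores):
    """Sum of first, second, and third loss terms over N students
    (claims 1, 3, 4, 7). Simplifying assumption: the peer score for
    student n is the element-wise max of the other students' scores."""
    student_probs = [softmax(s) for s in student_scores]
    total = 0.0
    for n, p_n in enumerate(student_probs):
        # First loss: cross entropy with the one-hot ground truth.
        l1 = cross_entropy(ground_truth, p_n)
        # Second loss: cross entropy with the teacher's probability score.
        l2 = cross_entropy(teacher_prob, p_n)
        # Third loss (claim 7): CE(p_n, peer) - H(p_n) = KL(p_n || peer),
        # i.e. a reverse KL divergence (claim 9).
        others = [s for m, s in enumerate(student_scores) if m != n]
        peer_scores = [max(col) for col in zip(*others)]
        peer_prob = softmax(peer_scores)
        l3 = cross_entropy(p_n, peer_prob) - entropy(p_n)
        total += l1 + l2 + l3
    return total
```

Each student then receives gradients from all three terms, so the ensemble is pulled toward the ground truth, the teacher, and a consensus of its peers simultaneously.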

Description

Apparatus and method for extracting deep learning models

The disclosed embodiments relate to deep learning model extraction technology. Recently, active research has been conducted on knowledge distillation, which transfers knowledge from a pre-trained large network to a small network usable on mobile devices. However, currently disclosed technology has difficulty surpassing the accuracy of the large deep learning model when extracting a small deep learning model with a network of the same type or a heterogeneous network.

FIG. 1 is a configuration diagram of a deep learning model extraction device according to one embodiment. FIG. 2 is a configuration diagram of a deep learning model extraction device according to one embodiment. FIG. 3 is a flowchart illustrating a deep learning model extraction method according to one embodiment. FIG. 4 is a flowchart illustrating a deep learning model extraction method according to one embodiment. FIG. 5 is a block diagram illustrating a computing environment including a computing device according to one embodiment.

Hereinafter, specific embodiments of the present invention will be described with reference to the drawings. The following detailed description is provided to facilitate a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, it is merely illustrative, and the present invention is not limited thereto. In describing the embodiments of the present invention, detailed descriptions of known technologies related to the present invention are omitted where such descriptions might unnecessarily obscure the essence of the present invention. Furthermore, the terms described below are defined in consideration of their functions within the present invention and may vary depending on the intentions or practices of the user or operator.
Therefore, such definitions should be based on the content throughout this specification. The terms used in the detailed description are intended merely to describe the embodiments of the present invention and should not be limiting. Unless explicitly stated otherwise, singular expressions include the plural. In this description, expressions such as "include" or "comprise" refer to certain characteristics, numbers, steps, actions, elements, parts thereof, or combinations thereof, and should not be interpreted as excluding the existence or possibility of one or more other characteristics, numbers, steps, actions, elements, parts thereof, or combinations thereof.

FIG. 1 is a configuration diagram of a deep learning model extraction device according to one embodiment. Referring to FIG. 1, a deep learning model extraction device (100) may include a first artificial neural network (110) pre-trained to perform a specific function, N (N≥2) second artificial neural networks (121, 123, 125) trained to perform the specific function based on the first artificial neural network (110), and a learning unit (130) that trains the N second artificial neural networks (121, 123, 125). For convenience of explanation, each of the N (N≥2) second artificial neural networks (121, 123, 125) may be referred to as a second artificial neural network (120); accordingly, embodiments of the second artificial neural network (120) may be interpreted as embodiments of each of the second artificial neural networks (121, 123, 125). According to one example, the first artificial neural network (110) may be referred to as a large network (or teacher model), and the second artificial neural network (120) as a small network (or student model).
For example, the number of layers included in the second artificial neural network (120) may be fewer than the number of layers included in the first artificial neural network (110); for instance, each of the second artificial neural networks (121, 123, 125) may include three layers while the first artificial neural network (110) includes four layers. As another example, the number of parameters included in the second artificial neural network (120) may be fewer than the number of parameters included in the first artificial neural network (110); for instance, each of the second artificial neural networks (121, 123, 125) may include 100 parameters while the first artificial neural network (110) includes 150 parameters. According to one embodiment, the second artificial neural networks (121, 123, 125) may each be initialized with different initial weight values. Accordingly, the second artificial neural networks (121, 123, 125) may each output different results for the same input data. According to one embodiment, the learning unit (130) calculates a total loss function based on a ground truth for a specific input value, a probability score of the first artificial neural network (110), and the probability scores of each of the N second artificial neural networks (121, 123, 125), and trains each of the N second artificial neural networks (121, 123, 125) based on the total loss function.
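
The effect of initializing the N student networks with different weight values can be demonstrated with a toy sketch. This is an illustration only, assuming a hypothetical single-layer `TinyStudent` class with made-up dimensions; the patent does not prescribe any particular architecture. Each instance seeds its random weights differently, so all students produce different probability scores for the same input, which is what makes their peer-ensemble term informative during training.

```python
import math
import random

class TinyStudent:
    """A minimal single-layer 'second network' for illustration only.
    Each instance draws its weights from a different random seed, so
    the N students start from different initial weight values and
    therefore produce different probability scores for the same input."""

    def __init__(self, seed, in_dim=4, out_dim=3):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
                  for _ in range(out_dim)]

    def scores(self, x):
        # Raw scores of the last (only) layer.
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]

    def probs(self, x):
        # Softmax over the last-layer scores, as in claim 10.
        s = self.scores(x)
        m = max(s)
        exps = [math.exp(v - m) for v in s]
        t = sum(exps)
        return [e / t for e in exps]

# Three students with different seeds disagree on the same input.
students = [TinyStudent(seed) for seed in (0, 1, 2)]
x = [0.5, -1.0, 0.25, 2.0]
outputs = [s.probs(x) for s in students]
```

In a full system the same pattern would apply to deeper networks: only the initialization differs, while the architecture and training objective are shared across the N students.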