
US-12620217-B2 - Training device, training method, and training program

US 12620217 B2

Abstract

A learning device includes processing circuitry configured to calculate a degree of deviation between a first output obtained by inputting first training data to a learned first model and a second output obtained by inputting second training data created by giving noise to the first training data to a second model, and a degree of deviation between an intermediate representation of the first model generated in a process of obtaining the first output and an intermediate representation of the second model generated in a process of obtaining the second output, and update a parameter of the second model so that the degree of deviation between the first output and the second output and the degree of deviation between the intermediate representation of the first model and the intermediate representation of the second model are reduced.
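The processing described in the abstract can be illustrated with a minimal, hypothetical PyTorch-style sketch (the patent specifies neither a framework, a particular deviation measure, nor an optimizer; all names below are illustrative): the learned first model is kept frozen and receives the first training data, the second model receives the second training data created by adding noise, and the second model's parameters are updated so that both its output and its intermediate representation deviate less from those of the first model.

```python
import torch
import torch.nn.functional as F

def training_step(first_model, second_model, optimizer, x_clean, noise):
    """One hypothetical update of the second model, following the abstract.

    first_model  : learned (frozen) first model; assumed to return (intermediate, output)
    second_model : model being trained; assumed to use the same return convention
    noise        : perturbation used to create the second training data
    """
    x_noisy = x_clean + noise  # second training data: first training data plus noise

    with torch.no_grad():              # the first model is already learned, so no gradient
        h1, y1 = first_model(x_clean)  # intermediate representation and first output

    h2, y2 = second_model(x_noisy)     # intermediate representation and second output

    # Degrees of deviation (MSE is only one possible choice of deviation measure).
    output_deviation = F.mse_loss(y2, y1)
    feature_deviation = F.mse_loss(h2, h1)

    loss = output_deviation + feature_deviation  # reduce both degrees of deviation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                             # update parameters of the second model
    return loss.item()

# Usage (illustrative):
# optimizer = torch.optim.SGD(second_model.parameters(), lr=1e-2)
# loss = training_step(first_model, second_model, optimizer, x_clean, noise)
```

Mean squared error stands in here for the unspecified "degree of deviation"; a KL divergence between output distributions would be an equally plausible reading of the abstract.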

Inventors

  • Tomokatsu Takahashi
  • Masanori Yamada
  • Tomoya Yamashita
  • Yuki Yamanaka

Assignees

  • NTT, INC.

Dates

Publication Date
2026-05-05
Application Date
2021-11-08

Claims (5)

  1. A learning device comprising: processing circuitry configured to: calculate a degree of deviation between a first output obtained by inputting first training data to a learned first model and a second output obtained by inputting second training data created by giving noise to the first training data to a second model, and a degree of deviation between an intermediate representation of the first model generated in a process of obtaining the first output and an intermediate representation of the second model generated in a process of obtaining the second output; and update a parameter of the second model so that the degree of deviation between the first output and the second output and the degree of deviation between the intermediate representation of the first model and the intermediate representation of the second model are reduced.
  2. The learning device according to claim 1, wherein the processing circuitry is further configured to calculate a degree of deviation between the first output, which is an output of a final layer of the first model that is a neural network, and the second output, which is an output of a final layer of the second model that is a neural network having a same topology as the first model, and a degree of deviation between a first intermediate representation, which is an output of an intermediate layer of the first model, and a second intermediate representation, which is an output of an intermediate layer of the second model in a same layer as the intermediate layer.
  3. The learning device according to claim 1, wherein the processing circuitry is further configured to calculate a degree of deviation between the first output obtained by inputting the first training data that is an image to the first model and the second output obtained by inputting the second training data that is an image created by giving noise to the first training data to the second model.
  4. A learning method executed by a learning device, the learning method comprising: calculating a degree of deviation between a first output obtained by inputting first training data to a learned first model and a second output obtained by inputting second training data created by giving noise to the first training data to a second model, and a degree of deviation between an intermediate representation of the first model generated in a process of obtaining the first output and an intermediate representation of the second model generated in a process of obtaining the second output; and updating a parameter of the second model so that the degree of deviation between the first output and the second output and the degree of deviation between the intermediate representation of the first model and the intermediate representation of the second model are reduced.
  5. A non-transitory computer-readable recording medium storing therein a learning program that causes a computer to execute a process comprising: calculating a degree of deviation between a first output obtained by inputting first training data to a learned first model and a second output obtained by inputting second training data created by giving noise to the first training data to a second model, and a degree of deviation between an intermediate representation of the first model generated in a process of obtaining the first output and an intermediate representation of the second model generated in a process of obtaining the second output; and updating a parameter of the second model so that the degree of deviation between the first output and the second output and the degree of deviation between the intermediate representation of the first model and the intermediate representation of the second model are reduced.
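Claim 2 ties the compared intermediate representations to the same layer of two neural networks sharing the same topology. One way to capture such same-layer outputs is a forward hook; the following is a minimal sketch assuming PyTorch and a hypothetical layer name ("layer3") — the patent does not prescribe any particular extraction mechanism.

```python
import torch
import torch.nn as nn

def get_intermediate(model: nn.Module, layer_name: str, x: torch.Tensor):
    """Capture the output of a named intermediate layer with a forward hook.

    Hypothetical helper: claim 2 only requires that the two models share a
    topology and that the representations come from the same layer.
    """
    captured = {}

    def hook(_module, _inputs, output):
        captured["value"] = output

    handle = dict(model.named_modules())[layer_name].register_forward_hook(hook)
    final_output = model(x)   # output of the final layer
    handle.remove()
    return captured["value"], final_output

# Usage (illustrative): same layer name in both same-topology models.
# h1, y1 = get_intermediate(first_model, "layer3", x_clean)
# h2, y2 = get_intermediate(second_model, "layer3", x_noisy)
```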

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application is a national stage application, pursuant to 35 U.S.C. § 371, of International Patent Application No. PCT/JP2021/041037, filed Nov. 8, 2021, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The present invention relates to a learning device, a learning method, and a learning program.

BACKGROUND ART

In the related art, adversarial training is known as a technique for creating deep learning models that are robust against adversarial examples (adversarial samples). An adversarial example is created by adding a small artificial perturbation, imperceptible to humans, to a certain sample (clean sample). Adversarial examples may be used as adversarial input samples to perturb the output of deep learning. For example, in image classification, an adversarial example image is created by applying an artificial perturbation to a certain image. Such an image causes deep learning to classify it as a different image even though its appearance remains that of the original image. For example, it is conceivable that a perturbed road sign causes an automatically driving vehicle to recognize it as a sign of a different type. Non Patent Literature 1 describes adversarial training, which enhances the robustness of a deep learning model by incorporating adversarial examples into the training data in advance.

CITATION LIST

Non Patent Literature

Non Patent Literature 1: Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu, "Towards Deep Learning Models Resistant to Adversarial Attacks" (https://arxiv.org/abs/1706.06083)

SUMMARY OF INVENTION

Technical Problem

However, the related art has a problem in that accuracy for the clean sample may be reduced when the robustness of the model to the adversarial example is enhanced. For example, a deep learning model trained by the adversarial training described in Non Patent Literature 1 shows a certain degree of robustness to adversarial examples, but its accuracy for clean samples may decrease.

Solution to Problem

In order to solve the above-described problems and achieve the object, a learning device includes: processing circuitry configured to: calculate a degree of deviation between a first output obtained by inputting first training data to a learned first model and a second output obtained by inputting second training data created by giving noise to the first training data to a second model, and a degree of deviation between an intermediate representation of the first model generated in a process of obtaining the first output and an intermediate representation of the second model generated in a process of obtaining the second output; and update a parameter of the second model so that the degree of deviation between the first output and the second output and the degree of deviation between the intermediate representation of the first model and the intermediate representation of the second model are reduced.

Advantageous Effects of Invention

According to the present invention, it is possible to suppress a decrease in accuracy for a clean sample when enhancing robustness of a model to an adversarial example.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram illustrating a configuration example of a learning device according to a first embodiment. FIG. 2 is a diagram illustrating learning processing. FIG. 3 is a flowchart illustrating a flow of processing of the learning device according to the first embodiment. FIG. 4 is a diagram illustrating test results. FIG. 5 is a diagram illustrating test results. FIG. 6 is a diagram illustrating a configuration example of a vehicle control system. FIG. 7 is a diagram illustrating an example of a computer that executes a learning program.

DESCRIPTION OF EMBODIMENTS

Hereinafter, embodiments of a learning device, a learning method, and a learning program according to the present application will be described in detail with reference to the drawings. Note that the present invention is not limited to the embodiments described below.

Here, conventional adversarial training minimizes an error function $L_{ce}$ as illustrated in Formula (1).

[Math. 1]

$$\min_{\theta}\; L_{ce}\bigl(\phi_{\theta}(x+\eta),\, y\bigr) \tag{1}$$

Furthermore, the noise η (adversarial noise) is set as in Formula (2) so that the error function is maximized under a constraint S on the noise magnitude.

[Math. 2]

$$\eta = \arg\max_{\eta \in S}\; L_{ce}\bigl(\phi_{\theta}(x+\eta),\, y\bigr) \tag{2}$$

Here, x is an input training sample and y is the label attached to the sample; the training data is the pair of x and y. In addition, $\phi_{\theta}$ is a model having a parameter θ (for example, a deep learning model). Note that x corresponds to a clean sample, and x+η corresponds to an adversarial example. When x+η in Formula (1) is replaced with x, an error function for learning the clean sample is obtained. Therefore, it can be said that adversarial training trains the model on the adversarial example instead of the clean sample.
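Formula (2) leaves the maximization procedure open; Non Patent Literature 1 approximates it with projected gradient descent (PGD). Below is a minimal sketch under the assumption that the constraint S is an L∞ ball of radius eps; the step size, iteration count, and all names are illustrative and not taken from the patent.

```python
import torch
import torch.nn.functional as F

def adversarial_noise(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Approximate Formula (2): eta = argmax_{eta in S} L_ce(phi_theta(x + eta), y).

    Sketch under assumptions: S is an L-infinity ball of radius eps, and the
    maximization is carried out by projected gradient ascent (as in the PGD
    attack of Non Patent Literature 1).
    """
    eta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + eta), y)   # L_ce(phi_theta(x + eta), y)
        grad = torch.autograd.grad(loss, eta)[0]
        with torch.no_grad():
            eta += alpha * grad.sign()              # ascend the error function
            eta.clamp_(-eps, eps)                   # project back onto the constraint S
    return eta.detach()
```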