
KR-102962743-B1 - OPERATING METHOD AND LEARNING METHOD OF NEURAL NETWORK AND NEURAL NETWORK THEREOF

KR 102962743 B1

Abstract

According to one embodiment, a method of operating a neural network including a first network and a second network acquires state information output based on input information using the first network, and determines whether the state information satisfies a preset condition using the second network. When the state information does not satisfy the condition, the state information is cyclically applied back to the first network; when it does, the state information is output.

Inventors

  • 최준휘
  • 김영석
  • 박정훈
  • 옥성민
  • 전재훈

Assignees

  • 삼성전자주식회사 (Samsung Electronics Co., Ltd.)

Dates

Publication Date
2026-05-08
Application Date
2018-09-28

Claims (20)

  1. A method of operating a neural network comprising a first network, a second network, and a third network, the method performed by a processor and comprising: obtaining state information output based on input information using the first network, the state information corresponding to a result of processing the input information in the first network; obtaining a first evaluation score output from the second network based on the state information, the first evaluation score indicating whether the state information is sufficient for the third network, which provides an application service, to operate; determining whether the state information satisfies a preset condition by comparing the first evaluation score with a predetermined threshold; based on a determination that the state information does not satisfy the condition, repeatedly performing steps (1), (2), and (3) below until currently updated state information satisfies the condition: (1) applying previously updated state information previously output by the first network as a next input to the first network to output currently updated state information; (2) applying the currently updated state information currently output from the first network as an input to the second network to output an updated evaluation score; and (3) determining whether the currently updated state information satisfies the condition by comparing the updated evaluation score with the threshold; and based on a determination that the state information satisfies the condition, providing the application service by decoding the last updated state information using the third network.
  2. (canceled)
  3. The method of claim 1, wherein the determining further comprises comparing the number of times the state information is cyclically applied to the first network as the next input with a preset number.
  4. The method of claim 1, wherein the first network cyclically processes the input information to provide the predetermined application service, and the second network evaluates the state information corresponding to a result of the cyclic processing of the first network.
  5. (canceled)
  6. The method of claim 1, further comprising: encoding the input information into a dimension of the state information; and applying the input information encoded in the dimension of the state information to the first network.
  7. The method of claim 1, wherein cyclically applying the state information to the first network as the next input comprises: encoding the state information into a dimension of the input information; and applying the state information encoded in the dimension of the input information to the first network.
  8. The method of claim 1, wherein the input information comprises at least one of single data and sequential data.
  9. The method of claim 1, further comprising, when the input information is sequential data: encoding the sequential data into an embedding vector of an input dimension of the first network; and applying the embedding vector to the first network.
  10. The method of claim 9, wherein outputting the state information comprises, when the input information is sequential data, decoding the state information into sequential data and outputting the decoded data.
  11. The method of claim 1, wherein the first network comprises a neural network for speech recognition or a neural network for image recognition.
  12. The method of claim 1, wherein the first network comprises at least one of a fully-connected layer, a simple recurrent neural network, a Long Short-Term Memory (LSTM) network, and Gated Recurrent Units (GRUs).
  13. A method of training a neural network comprising a first network and a third network, the method performed by a training device and comprising: generating per-iteration state information, the per-iteration state information corresponding to a result of the first network cyclically processing input information, by cyclically applying input information corresponding to training data to the first network as a next input according to a preset number of iterations, wherein for each iteration: currently updated state information is repeatedly generated based on previously updated state information until a calculated evaluation of the per-iteration state information indicates that the generated state information satisfies a condition; and an application service is provided by predicting a result corresponding to the currently updated state information using the third network; training the first network based on first losses between the per-iteration predicted results and a ground truth corresponding to the input information; and training a second network, configured to perform inference on the currently updated state information, based on evaluation scores for the per-iteration predicted results, wherein the second network evaluates the per-iteration state information by comparing an evaluation score, indicating whether the output of the first network is sufficient to provide the application service, with a threshold for determining whether the state information is saturated.
  14. (canceled)
  15. The method of claim 13, wherein training the second network comprises determining the evaluation scores by evaluating the per-iteration predicted results against the ground truth.
  16. The method of claim 13, wherein training the second network comprises adding noise to at least some of the per-iteration state information.
  17. The method of claim 13, wherein training the first network comprises training the third network based on the first losses.
  18. The method of claim 13, further comprising: encoding the input information into a dimension of the state information; and applying the input information encoded in the dimension of the state information to the first network.
  19. The method of claim 13, wherein generating the per-iteration state information comprises: encoding the state information into a dimension of the input information; and applying the state information encoded in the dimension of the input information to the first network.
  20. A computer program stored on a computer-readable recording medium to execute, in combination with hardware, the method of any one of claims 1, 3 to 4, 6 to 13, and 15 to 19.
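
The iterative operating method of claim 1 can be sketched in code. The following is a minimal, hypothetical illustration only: the three "networks" are toy stand-in functions (a contracting state update, a mean-activation score, and a rounding decoder), not the patent's learned models, and the names, dimensions, and threshold are all assumptions chosen for the example.

```python
# Hypothetical sketch of the claimed operating method: a "first network"
# updates a state vector, a "second network" scores it, and a "third
# network" decodes once the score clears a threshold (or a step cap).
# All three networks here are toy stand-ins, not the patent's models.

STATE_DIM = 4
THRESHOLD = 0.9
MAX_STEPS = 10  # claim 3: also compare the loop count with a preset number


def first_network(state):
    """Recurrently refine the state (stand-in for an RNN/LSTM/GRU step)."""
    return [0.5 * s + 0.5 for s in state]  # contracts toward 1.0


def second_network(state):
    """Return an evaluation score: how 'sufficient' the state is."""
    return sum(state) / len(state)  # mean activation as a toy score


def third_network(state):
    """Decode the last updated state into an application-level result."""
    return [round(s, 3) for s in state]


def operate(input_info):
    state = first_network(input_info)  # initial state from the input
    score = second_network(state)
    steps = 1
    # Loop until the evaluation score satisfies the condition, with a
    # step cap so the recursion always terminates (claims 1 and 3).
    while score < THRESHOLD and steps < MAX_STEPS:
        state = first_network(state)   # feed the state back as next input
        score = second_network(state)
        steps += 1
    return third_network(state), score, steps


result, score, steps = operate([0.0] * STATE_DIM)
print(result, score, steps)
```

With the toy update above, the state converges toward 1.0, so the loop stops after a few iterations once the score crosses the threshold; a real embodiment would stop when the second network judges the first network's output sufficient for the application service.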

Description

Operating Method and Learning Method of Neural Network and Neural Network Thereof

The following embodiments relate to a method of operating and training a neural network, and to the neural network itself.

Neural networks can have a structure that derives results by passing through a fixed number of layers or operations. Generally, to ensure high interpretability or performance with respect to feature vectors, neural networks can be configured as deep neural networks. However, deep neural networks include multiple layers with various weights, and storing these layers requires a significant amount of storage space. Additionally, recurrent neural networks that process sequential data perform operations by repeating a fixed number of times (e.g., the length of the sequential data). They are therefore not easily applicable to general feature vectors that are not sequential data, and processing time may become prolonged if the sequential data is too long.

FIG. 1 is a flowchart illustrating an operating method of a neural network according to one embodiment. FIGS. 2 and 3 are drawings for explaining operating methods of a neural network according to embodiments. FIG. 4 is a diagram illustrating the structure of a neural network according to one embodiment. FIG. 5 is a diagram illustrating the structure of a neural network according to another embodiment. FIG. 6 is a flowchart illustrating a training method of a neural network according to one embodiment. FIGS. 7 to 9 are drawings for explaining training methods of a neural network according to embodiments. FIG. 10 is a block diagram of a neural network configuration according to one embodiment.

Hereinafter, embodiments are described in detail with reference to the attached drawings. However, the scope of the claims is not limited or restricted by these embodiments. Identical reference numerals in each drawing indicate identical components.
Various modifications may be made to the embodiments described below. The embodiments described below are not intended to limit the forms of practice and should be understood to include all modifications, equivalents, and substitutions thereof.

The terms used in the embodiments are used merely to describe specific embodiments and are not intended to limit the embodiments. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprising" or "having" indicate the presence of the features, numbers, steps, actions, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof. Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meanings as generally understood by those skilled in the art to which the embodiments pertain. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the relevant technology, and should not be interpreted in an ideal or overly formal sense unless explicitly so defined in this application.

FIG. 1 is a flowchart illustrating a method of operating a neural network according to one embodiment. Referring to FIG. 1, a neural network according to one embodiment obtains state information output based on input information using a first network (110). The first network may recurrently process the input information to provide a predetermined application service.
The first network may include, for example, a neural network for speech recognition or a neural network for image recognition. The first network may be composed of a single network or of a recurrent network. The first network may include, for example, a fully-connected layer, a simple recurrent neural network, a Long Short-Term Memory (LSTM) network, or Gated Recurrent Units (GRUs). The input information may be at least one of single data and sequential data; it may be, for example, an image or speech. The input information may have the same dimension as the state information or a different dimension. The state information may correspond to the result of processing the input information in the first network and/or the result of the recurrent processing of the first network. The state information may, for example, be a multidimensional intermediate output vector (or output vector) according to a task for providing a predetermined application service. The state information is not necessarily limited to a multidimensional vector and may take various forms of information.
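
When the input dimension differs from the state dimension, the description (and claims 6 and 7) calls for encoding the input into the state dimension before it enters the first network, and conversely encoding state back into the input dimension before it is fed back. The following toy sketch illustrates only that dimension handling; the averaging and tiling "encoders" and all dimension values are hypothetical stand-ins, not the patent's learned encoding layers.

```python
# Hypothetical sketch of the dimension handling described above: fold an
# input of INPUT_DIM into STATE_DIM before the first network (claim 6),
# and expand state back to INPUT_DIM before feeding it back (claim 7).
# The maps below are toy averaging/tiling functions, not learned layers.

INPUT_DIM = 6
STATE_DIM = 3


def encode_to_state_dim(x):
    """Fold an INPUT_DIM vector into STATE_DIM by averaging groups."""
    k = INPUT_DIM // STATE_DIM
    return [sum(x[i * k:(i + 1) * k]) / k for i in range(STATE_DIM)]


def encode_to_input_dim(h):
    """Expand a STATE_DIM vector back to INPUT_DIM by repetition."""
    k = INPUT_DIM // STATE_DIM
    return [v for v in h for _ in range(k)]


x = [1.0, 3.0, 2.0, 4.0, 5.0, 7.0]
h = encode_to_state_dim(x)       # state-dimension encoding of the input
x_next = encode_to_input_dim(h)  # state fed back in the input dimension
print(h, x_next)
```

In a real embodiment these maps would be trainable layers (e.g., fully-connected projections) so that the first network can consume either fresh input or its own recycled state through a consistent interface.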