KR-20260063315-A - METHOD FOR SEARCHING NEURAL NETWORK ARCHITECTURE


Abstract

The artificial neural network architecture search method of the present invention searches for a first artificial neural network architecture using cell-based architecture search over a configured operation search space, and then searches for a second artificial neural network architecture by performing kernel pattern search on the convolution operations, with the first architecture as a backbone model. The second artificial neural network architecture is then refined through hyperparameter tuning and retraining.
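The method above is a four-stage pipeline driven by a four-way data split, with each subset dedicated to one stage. A minimal Python sketch of that split follows; the function name, equal fractions, and the stage mapping in the comments are illustrative assumptions, not details taken from the patent:

```python
import random

def partition_dataset(data, fractions=(0.25, 0.25, 0.25, 0.25), seed=0):
    """Shuffle `data` and split it into four disjoint subsets."""
    assert abs(sum(fractions) - 1.0) < 1e-9, "fractions must sum to 1"
    items = list(data)
    random.Random(seed).shuffle(items)
    splits, start = [], 0
    for frac in fractions[:-1]:
        end = start + int(len(items) * frac)
        splits.append(items[start:end])
        start = end
    splits.append(items[start:])  # remainder goes to the last subset
    return splits

# The four subsets feed the four stages described above:
#   d1 -> cell-based architecture search (first architecture)
#   d2 -> kernel pattern search on the discovered backbone (second architecture)
#   d3 -> hyperparameter tuning
#   d4 -> retraining with the tuned hyperparameters
```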

Inventors

  • 이정우 (Lee Jung-woo)
  • 이정은 (Lee Jung-eun)
  • 한승엽 (Han Seung-yeop)

Assignees

  • Seoul National University R&DB Foundation (서울대학교산학협력단)
  • Hodoo AI Lab Co., Ltd. (호두에이아이랩 주식회사)

Dates

Publication Date
2026-05-07
Application Date
2024-10-30

Claims (4)

  1. A method for searching an artificial neural network architecture, performed in a computing device comprising one or more processors and a memory storing program instructions executable by the processors, the method comprising: a data partitioning step of partitioning a dataset into a first dataset, a second dataset, a third dataset, and a fourth dataset; a cell-based architecture search step of training on the first dataset, searching for cells through cell-based architecture search over a configured operation search space so as to assign operations to the edges connecting the nodes of a cell defined as a directed acyclic graph (DAG) with N nodes, and constructing a first artificial neural network architecture by duplicating the discovered cells and stacking them into L layers; a kernel pattern search step of constructing a second artificial neural network architecture by training on the second dataset with the discovered first artificial neural network architecture as a backbone architecture, and searching, over a configured kernel search space, for kernel patterns of the convolution operations included in the first artificial neural network architecture; a hyperparameter tuning step of tuning hyperparameters of the second artificial neural network architecture by training on the third dataset; and a retraining step of retraining the second artificial neural network architecture using the fourth dataset and the tuned hyperparameters.
  2. The method of claim 1, wherein the operation search space consists of a separable convolution operation with a kernel size of 3, a dilated separable convolution operation with a kernel size of 5 and a dilation rate of 2, an average pooling operation with a kernel size of 3, an identity operation, and a zero operation.
  3. The method of claim 1, wherein the operation search space consists of a separable convolution operation with a kernel size of 3, a dilated separable convolution operation with a kernel size of 5 and a dilation rate of 2, an identity operation, and a zero operation.
  4. The method of claim 1, wherein the cell-based architecture search step searches for operations to be assigned to the edges of normal cells through cell-based architecture search over a configured operation search space, searches for operations to be assigned to the edges of reduction cells through cell-based architecture search over a configured operation search space, and then constructs L-2 layers of normal cells and 2 layers of reduction cells, the reduction cells being placed at index positions determined by the following mathematical formula: <Mathematical Formula> where i is the reduction cell index and L is the number of layers.
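Claim 4 places two reduction cells among L-2 normal cells at indices given by a formula that is not reproduced in this text. The sketch below assumes the DARTS/NASNet convention of placing reduction cells at one-third and two-thirds of the network depth; the actual claimed formula may differ:

```python
def reduction_cell_indices(num_layers):
    """Indices of the 2 reduction cells among `num_layers` stacked cells.

    Assumption: reduction cells sit at roughly 1/3 and 2/3 of the depth,
    as in DARTS/NASNet; the patent's own formula is not reproduced here.
    """
    return {num_layers // 3, 2 * num_layers // 3}

def build_layer_plan(num_layers):
    """Return the per-layer cell type: L-2 normal cells, 2 reduction cells."""
    reduce_at = reduction_cell_indices(num_layers)
    return ["reduction" if i in reduce_at else "normal"
            for i in range(num_layers)]
```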

Description

Method for Searching Artificial Neural Network Architecture

The present invention relates to artificial neural network technology and, more specifically, to a technique for searching artificial neural network architectures.

In the field of machine learning, there has been demand to automate the development of artificial neural network architectures, and AutoML, which automates their creation, has emerged and attracted significant attention. AutoML simplifies and democratizes the complex tasks of designing and fine-tuning neural network architectures, enabling practitioners across various disciplines to leverage advanced algorithms without requiring specialized knowledge of complex model architectures. Among the numerous approaches within AutoML, Neural Architecture Search (NAS) has emerged as a pivotal technique for automating model development, aiming to simplify the design of deep neural networks. Among NAS methods, Differentiable Architecture Search (DARTS) introduces differentiability into the architecture search process, enabling gradient-descent-based optimization by transforming discrete architecture search into a continuous optimization problem. However, despite improvements in search speed and state-of-the-art performance on vision datasets such as CIFAR and ImageNet, NAS methods, including DARTS, often fail to deliver optimal performance when applied to tasks outside their original scope.

FIG. 1 conceptually illustrates the artificial neural network architecture search method of the present invention. FIG. 2 is a flowchart of an artificial neural network architecture search method according to one aspect of the present invention. FIG. 3 is an example of a cell discovered using cell-based search. FIG. 4 illustrates an exemplary architecture, including normal cells and reduction cells, discovered by the artificial neural network architecture search method of the present invention.

The foregoing and additional aspects are embodied in the embodiments described with reference to the attached drawings. It is understood that the components of each embodiment may be combined in various ways within an embodiment unless otherwise stated or contradictory. Each block in the block diagrams may represent a physical part in some cases, but in others it may be a logical representation of part of the function of a single physical part, or of a function spanning multiple physical parts. The entity of a block, or a part thereof, may be a set of program instructions. These blocks may be implemented in whole or in part by hardware, software, or a combination thereof.

Neural Architecture Search (NAS) is a research field aimed at automating the search for network architectures. NAS selects and evaluates candidate network architectures for a given dataset and chooses the next candidate based on the evaluation results. NAS techniques can be broadly divided into approaches that use reinforcement learning or evolutionary algorithms to propose and select architecture candidates, and approaches that use differentiable architectures. The reinforcement learning approach selects the structure best suited to the given data, but its biggest problem is the time required for search: even on the CIFAR-10 dataset, which is not especially large, it requires very long search times. Although methods that reuse learned parameters (e.g., Efficient-NAS) have been proposed to significantly reduce training time, they still require a considerable amount of time.
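The differentiable (DARTS-style) alternative mentioned above avoids training each candidate separately by relaxing the discrete choice of operation on each edge into a softmax-weighted mixture. A minimal sketch of that relaxation, with each operation reduced to a toy scalar function so no deep-learning framework is needed (the operation names mirror the search space in the claims, but the scalar stand-ins are purely illustrative):

```python
import math

# Candidate operations on one edge. Real implementations would be tensor
# ops (separable conv, pooling, etc.); scalar stand-ins keep this runnable.
OPS = {
    "sep_conv_3x3": lambda x: 0.9 * x,
    "dil_conv_5x5": lambda x: 0.8 * x,
    "avg_pool_3x3": lambda x: 0.7 * x,
    "identity":     lambda x: x,
    "zero":         lambda x: 0.0,
}

def softmax(values):
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alphas):
    """Continuous relaxation: softmax-weighted sum over all candidate ops."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, OPS.values()))

def discretize(alphas):
    """After search, keep only the operation with the largest weight."""
    names = list(OPS)
    return names[max(range(len(alphas)), key=alphas.__getitem__)]
```

Because `mixed_op` is differentiable in `alphas`, the architecture parameters can be optimized by gradient descent alongside the network weights, which is what makes the search a continuous optimization problem.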
It is known that reinforcement-learning-based NAS requires more than 1,000 GPU-days, while Efficient-NAS requires about 4 GPU-days. Even when evolutionary algorithms are applied, more than 3,000 GPU-days of search time are reportedly consumed, because training must still be performed in order to search. The DARTS method, which defines and exploits a differentiable structure, organizes the network into repeating structures called cells and learns a graph representing how operations are connected within those cells; even this method is known to require a search time of about 4 GPU-days.

FIG. 1 conceptually illustrates the artificial neural network architecture search of the present invention, which consists of two stages: cell-based architecture search and kernel pattern search. Cell-based architecture search, in the same way as NASNet's cell-based search technique, searches for cells rather than for each network layer individually, and constructs the architecture by stacking the discovered cells. A cell is defined as a directed acyclic graph (DAG) composed of multiple nodes, where each node is a latent representation
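A DAG cell of this kind, with one operation assigned to each edge, can be represented minimally as follows. The structure (every edge i → j with i < j carries an operation name) and the stacking of duplicated cells follow the description above; the helper names are illustrative, and in the actual method the operation on each edge is chosen by the search rather than supplied directly:

```python
def make_cell(num_nodes, op_for_edge):
    """Build a cell as {(i, j): op_name} for every edge i -> j with i < j."""
    return {(i, j): op_for_edge(i, j)
            for j in range(num_nodes)
            for i in range(j)}

def stack_cells(cell, num_layers):
    """Construct an architecture by duplicating one discovered cell L times."""
    return [dict(cell) for _ in range(num_layers)]
```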