
CN-115688874-B - Recording medium, information processing apparatus, and computer-implemented method


Abstract

The present disclosure relates to a recording medium storing a machine learning program, an information processing apparatus, and a computer-implemented method. In one embodiment, the machine learning program controls machine learning of distributed neural network models generated by partitioning a neural network. The program includes instructions for causing a processor to perform a process that, for each distributed neural network model, adds individual noise to the non-parallel processing blocks of that model such that its individual noise differs from the individual noise of the other distributed neural network models, and assigns the noise-added distributed neural network models to a plurality of processes such that each process performs machine learning on the distributed neural network model assigned to it.
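As a concrete illustration of the abstract, the following is a minimal Python/PyTorch sketch, not the patented implementation: each distributed model receives its own individual noise in a non-parallel block by parameterizing a dropout ("discard") layer differently per model. The class name, layer sizes, dropout rate, and seeds are illustrative assumptions.

    # Minimal sketch (not the patented implementation): each distributed model
    # gets its own "individual noise" in a non-parallel block by parameterizing
    # the dropout ("discard") layer differently per model. Names, sizes, and
    # seeds are illustrative assumptions.
    import torch
    import torch.nn as nn

    class NonParallelBlock(nn.Module):
        """Fully connected block that is not model-parallelized."""

        def __init__(self, in_features, out_features, drop_rate, noise_seed):
            super().__init__()
            self.fc = nn.Linear(in_features, out_features)
            self.drop = nn.Dropout(p=drop_rate)
            self.noise_seed = noise_seed  # distinct per distributed model

        def forward(self, x):
            # Re-seed before sampling the dropout mask so that this model draws
            # different noise from the other distributed models, even on
            # identical inputs.
            torch.manual_seed(self.noise_seed)
            return self.drop(torch.relu(self.fc(x)))

    # Two distributed models built from the same network, each with its own noise.
    blocks = [NonParallelBlock(128, 10, drop_rate=0.5, noise_seed=s) for s in (11, 23)]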

Inventors

  • Tian Yuanjingda

Assignees

  • Fujitsu Limited (富士通株式会社)

Dates

Publication Date
2026-05-12
Application Date
2022-04-12
Priority Date
2021-07-26

Claims (3)

  1. A non-transitory computer-readable recording medium storing a machine learning program that controls machine learning of a plurality of distributed neural network models that are generated by dividing a neural network and are executed in parallel on a plurality of information processing apparatuses, respectively, the machine learning program including instructions for causing a processor to execute a process comprising: adding, in each of the plurality of distributed neural network models, individual noise for the distributed neural network model by setting parameters of a discard layer in a non-parallel processing block of the distributed neural network model such that the individual noise added by the discard layer in the non-parallel processing block is different from the individual noise of the other distributed neural network models among the plurality of distributed neural network models; and assigning the plurality of distributed neural network models to which the individual noise is added to a corresponding plurality of processes processed in parallel by the plurality of information processing apparatuses, such that each of the plurality of processes performs the machine learning on an assigned distributed neural network model among the plurality of distributed neural network models to which the individual noise is added.
  2. An information processing apparatus that controls machine learning of a plurality of distributed neural network models generated by dividing a neural network, the information processing apparatus comprising: a plurality of information processing devices configured to execute the plurality of distributed neural network models in parallel; and a processing unit configured to: add, in each of the plurality of distributed neural network models, individual noise for the distributed neural network model by setting parameters of a discard layer in a non-parallel processing block of the distributed neural network model such that the individual noise added by the discard layer in the non-parallel processing block is different from the individual noise of the other distributed neural network models among the plurality of distributed neural network models; and assign the plurality of distributed neural network models to which the individual noise is added to a corresponding plurality of processes processed in parallel by the plurality of information processing devices, such that each of the plurality of processes performs the machine learning on an assigned distributed neural network model among the plurality of distributed neural network models to which the individual noise is added.
  3. A computer-implemented method of controlling machine learning of a plurality of distributed neural network models that are generated by dividing a neural network and are executed in parallel on a plurality of information processing apparatuses, respectively, the method comprising: adding, in each of the plurality of distributed neural network models, individual noise for the distributed neural network model by setting parameters of a discard layer in a non-parallel processing block of the distributed neural network model such that the individual noise added by the discard layer in the non-parallel processing block is different from the individual noise of the other distributed neural network models among the plurality of distributed neural network models; and assigning the plurality of distributed neural network models to which the individual noise is added to a corresponding plurality of processes processed in parallel by the plurality of information processing apparatuses, such that each of the plurality of processes performs the machine learning on an assigned distributed neural network model among the plurality of distributed neural network models to which the individual noise is added.
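A hedged sketch of the assignment step recited in the claims above, assuming PyTorch and Python multiprocessing: each process is handed one noise-added distributed model and performs machine learning on it. The function name train_assigned_model, the data, the seeds, and the layer sizes are illustrative assumptions, not taken from the patent.

    # Hedged sketch of the assignment step in the claims: each process is
    # assigned one distributed model with its own dropout noise and runs
    # machine learning on it. Names, data, seeds, and sizes are illustrative.
    import multiprocessing as mp
    import torch
    import torch.nn as nn

    def train_assigned_model(rank, noise_seed):
        torch.manual_seed(0)                       # identical weight initialization
        model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                              nn.Dropout(p=0.5), nn.Linear(64, 10))
        x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))  # same data everywhere
        torch.manual_seed(noise_seed)              # individual dropout noise per process
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        print(f"process {rank}: loss = {loss.item():.4f}")

    if __name__ == "__main__":
        seeds = [11, 23]                           # one distinct seed per model
        procs = [mp.Process(target=train_assigned_model, args=(r, s))
                 for r, s in enumerate(seeds)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()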

Description

Recording medium, information processing apparatus, and computer-implemented method

Technical Field

Embodiments discussed herein relate to a machine learning program, an information processing apparatus, and a machine learning method.

Background

In recent years, as machine learning models in neural networks have grown in size, it has become necessary to increase learning speed. For example, in a simulation using CosmoFlow to estimate cosmological parameters from dark matter data, the data volume is 5.1 TB and machine learning on a single V100 Graphics Processing Unit (GPU) takes one week. Data parallelism, the mainstream acceleration method in machine learning, has a limitation in terms of accuracy: as parallelism increases, the batch size increases, which may adversely affect learning accuracy.

Therefore, model parallelism, in which a machine learning model in a neural network is divided and processed in parallel by a plurality of computers, has been proposed. Hereinafter, the machine learning model in the neural network may be referred to simply as a neural network model or a model. By having a plurality of computers process in parallel the models created by dividing a neural network model, the speed of machine learning can be improved without affecting learning accuracy.

Fig. 8 is a diagram for explaining a conventional model parallel method in a neural network. In Fig. 8, reference sign A indicates a neural network model that is not parallelized, and reference sign B indicates a model-parallelized neural network model, that is, two models (process #0 and process #1) created by dividing the single model indicated by reference sign A. In the model-parallelized neural network indicated by reference sign B, all layers of the neural network indicated by reference sign A, including the convolutional layers and the fully connected layers, are divided and parallelized. However, in this configuration, communication between process #0 and process #1 (allgather and allreduce) occurs frequently before and after each layer. This increases the communication load and causes delays due to waiting for synchronization and the like.

Therefore, a method of parallelizing only the convolutional layers, which have a large calculation amount, among the layers included in the neural network is considered. Fig. 9 is a diagram for explaining such a conventional model parallel method. Fig. 9 shows a model-parallelized neural network created from the non-parallelized neural network model indicated by reference sign A in Fig. 8, in which only the convolutional layers are divided into two. In other words, the processing of the convolutional layers is performed in parallel by process #0 and process #1, while the processing of the fully connected layers is performed only by process #0. In general, although the convolutional layers require a large amount of calculation, the only communication they require is the exchange of data between adjacent portions. Therefore, the disadvantages of dividing the convolutional layers are small.
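To make the communication argument concrete, the following minimal numpy sketch (the signal length, split point, and filter are arbitrary assumptions, not from the patent) shows a one-dimensional convolution split across two workers: each worker only needs a one-sample "halo" from its neighbour, and the concatenated partial results equal the single-worker result.

    # Illustrative sketch: splitting only the convolution is cheap to
    # communicate, since each worker convolves its own half of the input and
    # only exchanges a small boundary "halo" with its neighbour.
    import numpy as np

    kernel = np.array([0.25, 0.5, 0.25])          # width-3 filter, so halo = 1
    signal = np.arange(16, dtype=float)           # full input of the model

    halo = len(kernel) // 2
    left, right = signal[:8], signal[8:]          # split across process #0 / #1

    # Only the boundary samples are exchanged between adjacent portions.
    left_ext = np.concatenate([left, right[:halo]])     # process #0 receives halo
    right_ext = np.concatenate([left[-halo:], right])   # process #1 receives halo

    out0 = np.convolve(left_ext, kernel, mode="valid")
    out1 = np.convolve(right_ext, kernel, mode="valid")

    # The concatenated partial outputs match the single-process convolution.
    assert np.allclose(np.concatenate([out0, out1]),
                       np.convolve(signal, kernel, mode="valid"))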
Further, since the number of neurons in the subsequent fully connected layers is small, the processing time does not increase even without parallelization, and in some cases the processing is faster than when those layers are model-parallelized. Examples of the related art include Japanese National Publication of International Patent Application No. 2017-514251 and U.S. Patent Application Publication No. 2020/0372337.

Disclosure of Invention

Technical problem

However, in the conventional model-parallelized neural network shown in Fig. 9, process #1 performs no processing other than the convolutional layers, so its computational resources are wasted and the configuration is inefficient. Furthermore, in the model-parallelized neural network shown in Fig. 9, data communication from process #0 to process #1 is performed in order to share with process #1 the loss finally calculated by process #0. To shorten the time spent on this data communication, it is conceivable to have process #1 also calculate each fully connected layer and the loss, as process #0 does. In this case, however, the same fully connected layer calculation is performed redundantly by process #0 and process #1, which is inefficient.

In one aspect, an object of an embodiment is to use computing resources efficiently in machine learning of a plurality of distributed neural network models for model parallel processing.

Solution to the problem

According to asp
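As a concrete illustration of the redundancy described under "Technical problem," and of the direction indicated by the abstract and claims, the following minimal PyTorch sketch (seeds and layer sizes are illustrative assumptions) shows that when process #0 and process #1 both run the same fully connected block but each applies its own dropout noise, the duplicated computation no longer produces identical outputs.

    # Illustrative PyTorch sketch (seeds and sizes are assumptions): both
    # processes evaluate the same fully connected block on the same features,
    # but each applies its own dropout noise, so the duplicated computation
    # yields diverse outputs instead of identical, redundant ones.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    fc_block = nn.Sequential(nn.Linear(64, 32), nn.ReLU(),
                             nn.Dropout(p=0.5), nn.Linear(32, 10))
    features = torch.randn(4, 64)        # shared output of the parallel conv layers

    outputs = []
    for noise_seed in (11, 23):          # one seed per process
        torch.manual_seed(noise_seed)    # individual noise for this process
        outputs.append(fc_block(features))

    # Same block, same input, but the two processes no longer compute
    # identical results.
    assert not torch.allclose(outputs[0], outputs[1])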