CN-121997998-A - Method and apparatus for memory planning for code generation for creating program code for artificial neural network computation in a hardware environment
Abstract
The present invention relates to a computer-implemented method for performing memory planning during code generation for creating code for a neural network computation in a hardware environment.
Inventors
- B. WAGNER
- S. Bob Lester
- D.K.Wu
- L. Suleiman
- M. Lokman
- U. Hote
Assignees
- Robert Bosch GmbH
Dates
- Publication Date
- 20260508
- Application Date
- 20251106
- Priority Date
- 20241106
Claims (7)
- 1. A computer-implemented method for performing memory planning during code generation for generating code for a neural network computation in a hardware environment, the method comprising the steps of:
  - Providing (S1) successive calculation steps of a network layer of the neural network, wherein for each calculation step the size of an input data block (EB), of an output data block (AB) and of one or more model parameter blocks (MB) is determined according to the type of the calculation step, the one or more model parameter blocks (MB) containing the model parameters for the respective calculation step,
  - Determining rules for memory planning for specific calculation steps that use model parameters and for which blocking is to be applied, wherein blocking splits a specific calculation step into a plurality of individual calculation steps, each processing the input data of an individual section of the input data block (EB), and wherein the rules specify that the model parameters for all individual calculation steps (O1-O4) are stored in a defined memory area,
  - Executing (S2) a memory plan taking the determined rules into account, in which memory areas in the working memory are specified for the respective input data block (EB), output data blocks (AB1, AB2, AB3) and model parameter block (MB) for each calculation step,
  - Performing (S3) code generation, wherein the model parameter block (MB) is copied into the working memory only once, before the first individual calculation step (O1) is executed, and the remaining individual calculation steps (O2-O4) are calculated using the model parameters already stored in the model parameter block (MB).
- 2. The method of claim 1, wherein executing the memory plan comprises applying an optimization method whose objective function minimizes the total memory requirement in the working memory.
- 3. The method according to any of claims 1 to 2, wherein, in creating the rule for memory planning for a specific calculation step, the rule further describes a lifetime of the corresponding model parameter block (MB) storing the model parameters, wherein, according to the lifetime description, the model parameter block (MB) remains stored in the working memory and accessible to the blocked calculation step for the duration of executing the individual calculation steps.
- 4. The method according to any one of claims 1 to 3, wherein the specific calculation step to which blocking is applied comprises element-wise operations, convolution-layer calculations or pooling-layer calculations.
- 5. An apparatus for performing the method of any one of claims 1 to 4.
- 6. A computer program product comprising instructions which, when executed by at least one data processing apparatus, cause the data processing apparatus to carry out the steps of the method according to any one of claims 1 to 4.
- 7. A machine-readable storage medium comprising instructions which, when executed by at least one data processing apparatus, cause the data processing apparatus to carry out the steps of the method according to any one of claims 1 to 4.
Description
Method and apparatus for memory planning for code generation for creating program code for artificial neural network computation in a hardware environment

Technical Field
The present invention relates to the execution of program code on a hardware environment, represented for example by a control device or the like controlled by a microcontroller. The invention further relates to a method for memory planning for processing input data, output data and model parameters.

Background
A particular hardware environment, such as a microcontroller in a control device, requires that matching executable program code be created that takes into account the characteristics and limitations of that hardware environment. For instance, the size of the working memory directly accessible by the microcontroller or the acceleration hardware may be limited, or memory move and copy operations from a data memory, such as a flash memory or an external memory, into the working memory may be particularly costly on the given hardware. The calculation steps of the network layers of a neural network computation can require a large amount of memory, since for each calculation step the input data block, the model parameter block and the output data block must be stored in the working memory so that they are retrievable and usable by the microcontroller. During memory planning, an existing code generator specifies in which region of the working memory the data blocks required for each calculation step are stored. In addition to the memory areas of the working memory assigned to the input data blocks, the output data blocks and the necessary model parameters, corresponding memory areas are assigned to the data generated during the calculation step.
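The memory-planning idea described above can be illustrated with a small sketch. This is not the patent's algorithm; the names `Block` and `plan_memory`, the greedy first-fit strategy, and the lifetime representation as step intervals are all illustrative assumptions. The sketch assigns each data block (input, output, model parameters) an offset in a single working-memory arena, letting blocks whose lifetimes do not overlap reuse the same region:

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str        # e.g. "EB", "MB", "AB" (illustrative labels)
    size: int        # bytes needed in working memory
    first_step: int  # first calculation step that uses the block
    last_step: int   # last calculation step that uses the block

def plan_memory(blocks):
    """Assign a working-memory offset to every block (greedy sketch).

    Blocks whose lifetimes [first_step, last_step] overlap must not
    share addresses; time-disjoint blocks may reuse the same region,
    which keeps the peak working-memory requirement low.
    """
    placed = []  # list of (offset, Block) already assigned
    plan = {}
    for blk in sorted(blocks, key=lambda b: -b.size):  # largest first
        offset = 0
        for off, other in sorted(placed, key=lambda p: p[0]):
            lifetimes_overlap = not (blk.last_step < other.first_step
                                     or other.last_step < blk.first_step)
            regions_overlap = (offset < off + other.size
                               and off < offset + blk.size)
            if lifetimes_overlap and regions_overlap:
                offset = off + other.size  # bump past the conflict
        placed.append((offset, blk))
        plan[blk.name] = offset
    peak = max(off + b.size for off, b in placed)
    return plan, peak
```

For example, an input block used only in step 0 and an output block used from step 1 onward can share offset 0, while a model parameter block alive across both steps must be placed after them.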
Conventional code generators for neural networks typically do not start from a limited working memory and typically allocate a different memory area for storing the input data blocks, network parameter blocks and output data blocks of each of the successively calculated calculation steps. It has so far been common to distribute the model parameters freely in the available memory in order to minimize the overall memory demand. However, this approach may require the model parameters to be copied section by section into different areas of the working memory before a calculation step is carried out. Copying memory regions from the data memory into the working memory, and between different regions of the working memory, is often a time-consuming memory operation, so memory planning must aim to reduce the total computation time caused by the execution time of these memory operations. To reduce the maximum required working memory, a blocking (tiling) algorithm may be applied, in which the calculation of a layer of the neural network is divided into separate successive individual calculation steps. This partitioning is suitable for element-wise operations, convolution-layer calculations, pooling-layer calculations and the like. The separately calculated partial results are then joined together again by a final stitching step. Blocking influences the memory planning: for successive calculation steps, the input data and/or model parameters of the layer to be calculated are loaded into the working memory section by section and removed again after each individual calculation step ends, so that the overall maximum memory requirement (across the calculation of all neural network layers) can be reduced. Blocking can therefore be used in particular on systems with limited working memory.
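The trade-off between blocking and memory operations can be made concrete with a small sketch. This is illustrative, not taken from the patent: the function `run_tiled_layer`, the stand-in per-element computation, and the copy counter are all assumptions. It contrasts re-copying the model parameter block (MB) before every tile with keeping it resident in the working memory across all tiles, which is the reduction the patent aims at:

```python
def run_tiled_layer(input_block, params, num_tiles, keep_params_resident):
    """Compute a layer tile by tile; return (output, copy_ops).

    Each tile processes one section of the input block. Without the
    resident strategy, `params` is copied into working memory before
    every individual calculation step; with it, the copy happens only
    once, before the first tile.
    """
    copy_ops = 0
    working_params = None
    tile_size = len(input_block) // num_tiles
    output = []
    for t in range(num_tiles):
        if working_params is None or not keep_params_resident:
            working_params = list(params)  # copy MB into working memory
            copy_ops += 1
        tile = input_block[t * tile_size:(t + 1) * tile_size]
        # stand-in computation: scale each element by the first parameter
        output.extend(x * working_params[0] for x in tile)
    return output, copy_ops
```

With four tiles, the naive variant performs four parameter copies while the resident variant performs one, and both produce the same output; this is the kind of reduction in memory operations that motivates pinning the model parameter block to a defined memory area.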
However, dividing the calculation of a particular neural network layer into individual calculation steps through blocking means that, for each of the individual calculation steps, an increased number of memory operations must be performed to copy the model parameter blocks into the working memory. The number of memory operations therefore increases significantly when blocking is applied. The object of the present invention is to provide a method for executing a memory plan for code generation for creating code for artificial neural network computation in a hardware environment in which the number of memory operations can be reduced.

Disclosure of Invention
This object is achieved by a method for code generation of code for performing a calculation for a layer of a neural network in a hardware environment according to claim 1 and by an apparatus according to the parallel claim. Further embodiments are specified in the dependent claims. According to a first aspect, a computer-implemented method for performing a memory plan for code generation for determining code for neural network computation in a hardware environment is provided, the method having the steps of: Providing successive calculation steps of a net