US-20260127012-A1 - Method and Device for Memory Planning for Code Generation to Create Program Code for Computing an Artificial Neural Network in a Hardware Environment


Abstract

A computer-implemented method for performing memory planning for code generation to generate code for computing a neural network in a hardware environment is disclosed.

Inventors

  • Benjamin Wagner
  • Sebastian Boblest
  • Duy Khoi Vo
  • Leif Sulaiman
  • Markus Lochmann
  • Ulrik Hjort

Assignees

  • ROBERT BOSCH GMBH

Dates

Publication Date
2026-05-07
Application Date
2025-11-04
Priority Date
2024-11-06

Claims (7)

  1. A computer-implemented method for performing memory planning for code generation to generate code for computing a neural network in a hardware environment, comprising: providing successive calculation steps of network layers of the neural network, wherein for each calculation step the size of an input data block, of an output data block and, depending on the type of calculation step, of one or more model parameter blocks is determined, wherein the one or more model parameter blocks contain the model parameters for a respective calculation step; determining a rule for memory planning for a specific calculation step for which the model parameters are used and for which a tiling calculation step is to be provided, wherein the tiling calculation step provides for performing a tiling for the specific calculation step in a plurality of individual calculation steps for processing the input data of a respective section of the input data block (EB), and wherein the rule specifies that the model parameters for all individual calculation steps are stored in a specified memory space; performing memory planning, in which the memory space of the respective input data block, output data block, and model parameter block in the working memory is determined for each calculation step, taking into account the determined rules; and performing code generation, wherein the model parameter block is copied into the working memory only before executing the first individual calculation step, and the calculations of the remaining individual calculation steps are performed using the model parameters in the model parameter block.
  2. The method according to claim 1, wherein performing the memory planning comprises applying an optimization method in which the target function takes into account minimizing the total memory requirement in the working memory.
  3. The method according to claim 1, wherein: when creating the rule for memory planning for the specific calculation step, the rule further specifies a lifetime of the corresponding model parameter block for storing the model parameters; and the lifetime specifies that the model parameter block remains stored and accessible in the working memory for the duration of the execution of the individual calculation steps of the tiling calculation step.
  4. The method according to claim 1, wherein the specific calculation step to which tiling is applied comprises an element-wise operation, a convolutional layer calculation, or a pooling layer calculation.
  5. A device for performing the method according to claim 1.
  6. A computer program product comprising commands which, when the program is executed by at least one data processing device, cause the data processing device to perform the steps of the method according to claim 1.
  7. A machine-readable storage medium comprising commands which, when executed by at least one data processing device, cause the data processing device to perform the steps of the method according to claim 1.

Description

This application claims priority under 35 U.S.C. § 119 to patent application no. DE 10 2024 210 661.5, filed on Nov. 6, 2024 in Germany, the disclosure of which is incorporated herein by reference in its entirety.

The disclosure relates to the implementation of program code on a hardware environment, such as one found in microcontroller-controlled control devices and the like. The disclosure further relates to methods for memory planning for handling input data, output data, and model parameters.

BACKGROUND

Certain hardware environments, such as microcontrollers in control devices, require the creation of an adapted executable program code that takes into account the characteristics and limitations of the specific hardware environment. In particular, the available size of the working memory that can be directly accessed by the microcontroller or acceleration hardware may be limited, or memory shift or copy operations from a data memory, such as flash or external memory, to the working memory may be particularly costly due to hardware constraints.

The calculation steps for computing the network layers of a neural network can require considerable memory, since for each calculation step an input data block, a model parameter block, and an output data block must be retrieved and stored in the working memory in a form that can be used by the microcontroller. During memory planning, existing code generators determine in which region of the working memory the data blocks required for each calculation step are stored. In addition to assigning the input data blocks, output data blocks and, if necessary, the model parameter blocks to memory spaces of the working memory, the data generated during a calculation step is also assigned a corresponding memory space.
Conventional code generators for neural networks do not usually assume limited working memory and typically allocate distinct memory spaces for storing the input data blocks, network parameter blocks, and output data blocks for each of the successive calculation steps. Until now, it has therefore been common practice to distribute the model parameters freely in the available memory in order to minimize the total memory requirement. However, this approach can result in the model parameters having to be copied section by section into different memory spaces of the working memory before a calculation step is executed. Copying memory from the data memory to the working memory, as well as between memory spaces within the working memory, is generally a time-consuming operation, so memory planning must aim to reduce the total computing time caused by the execution of memory operations.

To reduce the maximum required working memory, a tiling algorithm can be used in which the calculation of a layer of a neural network is divided into separate, successive individual calculation steps. Tiling is suitable for element-wise operations, convolutional layer calculations, pooling layer calculations, and the like. The partial results of the separate calculations are then combined again in a final concatenation step. Tiling influences memory planning: the input data and/or model parameters of the layer to be calculated are loaded into the working memory section by section for the successive individual calculation steps and are discarded again after each individual calculation step has been completed. The total maximum working-memory requirement for computing all layers of the neural network is thereby reduced, so that tiling is particularly useful on systems with limited working memory.
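The tiling scheme described above can be illustrated with a small, purely illustrative Python sketch (the names `apply_elementwise`, `tiled_elementwise`, and `tile_size` are made up for this example; the disclosure does not prescribe any particular implementation): an element-wise layer is computed tile by tile, so that only one input section and its partial result need to be live in working memory at a time, and the partial results are combined in a final concatenation step.

```python
def apply_elementwise(section, scale, bias):
    # Individual calculation step: process one section of the input block.
    return [scale * x + bias for x in section]

def tiled_elementwise(input_block, scale, bias, tile_size):
    """Compute an element-wise layer tile by tile.

    Only one section of the input data block is loaded at a time;
    each section is discarded after its individual calculation step,
    and the partial results are concatenated at the end.
    """
    partial_results = []
    for start in range(0, len(input_block), tile_size):
        section = input_block[start:start + tile_size]  # load one section
        partial_results.append(apply_elementwise(section, scale, bias))
        # 'section' goes out of use here, freeing working memory for the next tile
    # final concatenation step: combine the partial results
    return [y for tile in partial_results for y in tile]

full = apply_elementwise(list(range(8)), 2, 1)
tiled = tiled_elementwise(list(range(8)), 2, 1, tile_size=3)
assert full == tiled  # tiling does not change the result, only peak memory use
```

The sketch only models the data flow; on a real microcontroller each "load one section" line would correspond to a copy from data memory into the limited working memory.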
However, splitting the calculation step for certain layers of a neural network into individual calculation steps during tiling requires an increased number of memory operations, since a model parameter block must be copied into the working memory for each of the separate individual calculation steps. This considerably increases the number of memory operations when tiling is used. It is the object of the present disclosure to provide a method for performing memory planning for code generation for creating code for computing artificial neural networks in a hardware environment, in which the number of memory operations can be reduced.

SUMMARY

This object is achieved by the method for performing memory planning for code generation of code for computing layers of a neural network in a hardware environment as set forth in the description below, as well as by the device also set forth in the description below. Further embodiments are specified in the description below.

According to a first aspect, there is provided a computer-implemented method for performing memory planning for code generation to generate code for computing a neural network in a hardware environment, comprising the steps of (i) providing successive calculation steps of network layers of the neural network,
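The lifetime rule at the heart of the claims (the model parameter block stays resident across all individual calculation steps of a tiling calculation step, so it is copied into working memory only once) can be sketched as follows. This is a toy stand-in, not the disclosed planner: the names `Block` and `plan_memory` are invented for illustration, and a simple first-fit interval placement is used where the disclosure would allow any optimization method (cf. claim 2).

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    size: int
    first_use: int  # index of first individual calculation step using the block
    last_use: int   # index of last individual calculation step (end of lifetime)

def plan_memory(blocks):
    """Toy first-fit planner: assign each block an offset in working memory
    so that blocks with overlapping lifetimes never share addresses."""
    placed = []  # list of (offset, Block)
    plan = {}
    for blk in sorted(blocks, key=lambda b: b.first_use):
        offset = 0
        for off, other in sorted(placed, key=lambda p: p[0]):
            lifetimes_overlap = not (blk.last_use < other.first_use
                                     or other.last_use < blk.first_use)
            if lifetimes_overlap and offset < off + other.size and off < offset + blk.size:
                offset = off + other.size  # bump past the conflicting block
        placed.append((offset, blk))
        plan[blk.name] = offset
    return plan

# Per the rule: the model parameter block lives across ALL individual
# calculation steps (0..2), so it is copied once and never relocated,
# while the per-tile input sections can reuse the same memory space.
blocks = [
    Block("params", size=4, first_use=0, last_use=2),
    Block("tile0_in", size=2, first_use=0, last_use=0),
    Block("tile1_in", size=2, first_use=1, last_use=1),
]
plan = plan_memory(blocks)
assert plan["params"] == 0
assert plan["tile0_in"] == plan["tile1_in"]  # disjoint lifetimes share memory
```

Because the parameter block's lifetime spans every individual calculation step, the planner never assigns its memory space to another block, which is exactly what allows the generated code to copy it into working memory only before the first individual calculation step.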