EP-3895024-B1 - CACHING DATA IN ARTIFICIAL NEURAL NETWORK COMPUTATIONS


Inventors

  • DELERSE, Sebastien
  • CHAPPET DE VANGEL, Benoit
  • CAGNAC, Thomas

Dates

Publication Date
2026-05-06
Application Date
2018-12-12

Claims (13)

  1. A method for caching data in artificial neural network, ANN, computations, the method comprising:
     receiving, by a communication unit, data and a logical address of the data, the data being associated with the ANN (500), wherein the logical address of the data is indicative of a neuron, a layer of the ANN, an input of one or more layers (505-535) of the ANN (500) or a weight of the one or more layers (505-535) of the ANN (500);
     determining, by a processing unit (410) coupled to the communication unit and to a plurality of physical memories (425) and based on the logical address and physical parameters of the physical memories (425), a physical address of a physical memory of the plurality of physical memories (425), wherein the plurality of physical memories (425) includes at least two physical memories associated with different physical parameters, the different physical parameters including one of: different access speeds, different read speeds, different write speeds, and different latencies for the operation, wherein the processing unit (410) is configured with a configuration (415) including rules for mapping the logical addresses to the physical memories (425), wherein the configuration (415) of the processing unit (410) is determined based on a structure of the ANN; and
     performing, by the processing unit (410), an operation associated with the data and the physical address, wherein the operation includes a write of the data to the physical memory at the physical address.
  2. The method of claim 1, wherein the determination of the physical address of the physical memory (425) is based on a usage count of the data in the ANN computation.
  3. The method of any of the preceding claims, wherein: the physical memories (425) are associated with priorities, the priorities being based on physical parameters associated with the physical memories (425); and the processing unit (410) is configured to select, based on an order of the priorities, the physical memory from the plurality of physical memories (425), the physical memory being selected to perform the operation.
  4. The method of claim 3, wherein a priority of the physical memory of the plurality of physical memories (425) is determined based on a time lapse between a time the data is written to the physical memory and a time the data is used in the ANN computation.
  5. The method of claim 3 or 4, wherein the data include a sequence of inputs and the processing unit (410) is configured to: determine that a size of the data exceeds a size of the selected physical memory; and in response to the determination: select a further physical memory from the plurality of memories (425), the further physical memory being associated with a further priority, the further priority being lower than a priority of the selected physical memory; and write the inputs from the sequence to the selected physical memory and to the further physical memory.
  6. The method of any of claims 3-5, wherein the configuration (415) includes information related to mapping the data associated with one or more layers (505-535) of the ANN (500) to the priorities.
  7. A system for caching data in artificial neural network, ANN, computations, the system comprising:
     a communication unit configured to receive data and a logical address of the data, the data being associated with the ANN (500);
     a plurality of physical memories (425), the physical memories (425) being associated with physical addresses and physical parameters, wherein the plurality of physical memories (425) includes at least two physical memories associated with different physical parameters, the different physical parameters including one of: different access speeds, different read speeds, different write speeds, and different latencies for the operation; and
     a processing unit (410) coupled to the communication unit and to the plurality of physical memories (425),
     wherein the system (400) is configured to perform the method of any of claims 1-6.
  8. The system of claim 7, wherein the processing unit (410) resides in one of: a field-programmable gate array or an application-specific integrated circuit.
  9. The system of claim 8, wherein the plurality of physical memories (425) includes a memory integrated with the processing unit (410).
  10. The system of claim 8 or 9, wherein the plurality of physical memories (425) includes a memory storage external to the processing unit (410).
  11. The system of any of claims 8-10, wherein a bit of the physical address is used as a control of the physical memory, the physical memory receiving or providing the data associated with an ANN (500), or wherein a bit of the physical address is used as a selector of a type of a multiplexer, wherein an input or an output of the multiplexer includes a bit of the data associated with an ANN (500).
  12. The system of any of claims 7-11, wherein the determination of the physical address of the physical memory by the processing unit (410) is based on one of: a usage count of the data in the ANN computation or a time lapse between a time the data is written to the physical memory and a time the data is used in the ANN computation.
  13. A computer program for caching data in artificial neural network, ANN, computations, wherein the computer program comprises control instructions which, when executed by one or more processors (910) of a computing system (900) which implements a communication unit and a processing unit (410) coupled to the communication unit, cause the one or more processors (910) to perform any of the methods according to claims 1-6.
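
For illustration only, the following Python sketch shows one way the mapping recited in claims 1, 3 and 5 could work: memory banks are ordered by priority derived from their physical parameters, a configuration derived from the ANN structure assigns each logical region a starting priority, and data that does not fit in the selected bank spills into the next, lower-priority bank. All names (MemoryBank, AddressMapper, the example bank sizes) are assumptions made for the sketch, not the patented implementation.

    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    @dataclass
    class MemoryBank:
        name: str
        size: int            # capacity in words
        access_cycles: int   # physical parameter; lower means faster
        used: int = 0        # words already allocated

    @dataclass
    class AddressMapper:
        banks: List[MemoryBank]  # ordered by priority, fastest first
        # configuration (415): maps a logical region to a starting priority
        layer_priority: Dict[str, int] = field(default_factory=dict)
        # logical address -> (bank name, offset within the bank)
        table: Dict[int, Tuple[str, int]] = field(default_factory=dict)

        def write(self, logical_addr: int, region: str, data: List[int]) -> None:
            """Map a logical address to a physical bank and record the mapping.

            Starts at the priority configured for the region; if the data does
            not fit, the remainder spills into the next, lower-priority bank,
            mirroring claim 5.
            """
            start = self.layer_priority.get(region, 0)
            offset = 0
            for bank in self.banks[start:]:
                room = bank.size - bank.used
                if room <= 0:
                    continue
                chunk = data[offset:offset + room]
                # physical address of this chunk: (bank, offset within the bank)
                self.table[logical_addr + offset] = (bank.name, bank.used)
                bank.used += len(chunk)
                offset += len(chunk)
                if offset == len(data):
                    return
            raise MemoryError("data does not fit into any configured bank")

    # Toy run: a small, fast on-chip memory backed by a large, slower one.
    mapper = AddressMapper(
        banks=[MemoryBank("sram", size=4, access_cycles=1),
               MemoryBank("dram", size=1024, access_cycles=20)],
        layer_priority={"layer0.weights": 0},
    )
    mapper.write(0x1000, "layer0.weights", list(range(6)))
    print(mapper.table)  # {4096: ('sram', 0), 4100: ('dram', 0)}

In this toy run, four of the six data words land in the fast bank and the remaining two spill into the slower one, which is the behavior recited in claim 5.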

Description

TECHNICAL FIELD

The present disclosure relates generally to data processing and, more particularly, to systems and methods for caching data in artificial neural network (ANN) computations.

BACKGROUND

Artificial neural networks (ANNs) are simplified models that reproduce the behavior of the human brain. The human brain contains 10 to 20 billion neurons connected through synapses. Electrical and chemical messages are passed from neuron to neuron based on input information and the neurons' resistance to passing information. In an ANN, a neuron can be represented by a node performing a simple operation of addition coupled with a saturation function. A synapse can be represented by a connection between two nodes, each connection being associated with a multiplication by a constant. ANNs are particularly useful for solving problems that cannot easily be solved by classical computer programs.

While the forms of ANNs may vary, they all share the same basic elements, similar to the human brain. A typical ANN is organized into layers, and each layer may include many neurons sharing similar functionality. The inputs of a layer may come from the previous layer, multiple previous layers, any other layer, or even the layer itself. Major ANN architectures include the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), and the Long Short-Term Memory (LSTM) network, but other architectures can be developed for specific applications. While some operations have a natural sequence, for example a layer depending on previous layers, most operations within a layer can be carried out in parallel. ANNs can therefore be computed in parallel on many different computing elements, similar to the neurons of the brain.

A single ANN may have hundreds of layers, and each layer can involve millions of connections. Thus, a single ANN may require billions of simple operations such as multiplications and additions. Because of the large number of operations and their parallel nature, ANNs can place a very heavy load on processing units (e.g., CPUs), even ones running at high rates.

To overcome the limitations of CPUs, graphics processing units (GPUs) are sometimes used to process large ANNs, because GPUs have a much higher throughput of operations than CPUs. Because this approach at least partially solves the throughput limitation, GPUs appear more efficient than CPUs for ANN computations. However, GPUs are not well suited to ANN computations because they were specifically designed to compute graphical images. GPUs provide a certain level of parallelism, but they constrain the computations into long pipelines, which results in latency and a lack of reactivity. To deliver maximum throughput, very large GPUs can be used, which may involve excessive power consumption, a typical issue with GPUs. Since GPUs may require more power for ANN computations, deploying them can be difficult.

To summarize, CPUs provide a very generic engine that can execute a few sequences of instructions with minimal programming effort, but they lack the computing power required for ANNs. GPUs are somewhat more parallel and require a larger programming effort than CPUs, which can be hidden behind libraries at some performance cost, but they are not well suited to ANNs.
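
As a concrete illustration of the neuron model described at the start of this background section, a node can be sketched in a few lines of Python: each connection multiplies its input by a constant weight, the node sums the results, and a saturation function bounds the output. The choice of tanh as the saturation function is an assumption made for the example.

    import math

    def neuron(inputs, weights, bias=0.0):
        # Weighted sum over the incoming connections (the "synapses"),
        # followed by a saturation function (tanh, assumed for the example).
        total = sum(w * x for w, x in zip(weights, inputs)) + bias
        return math.tanh(total)

    # Two inputs arriving over two weighted connections:
    print(neuron([0.5, -1.0], [0.8, 0.3]))  # tanh(0.5*0.8 + (-1.0)*0.3) = tanh(0.1)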
Field-programmable gate arrays (FPGAs) are components that can be programmed at the hardware level after they are manufactured. FPGAs can be configured to perform computations in parallel and are therefore well suited to computing ANNs. Programming FPGAs, however, is challenging and requires a much larger effort than programming CPUs or GPUs; thus, adapting FPGAs to ANN computations can be more challenging than adapting CPUs and GPUs. Most attempts at programming FPGAs to compute ANNs have focused on a specific ANN or a subset of ANNs, have required modifying the ANN structure to fit a specific limited accelerator, or have provided only basic functionality without globally solving the problem of computing ANNs on FPGAs. Existing FPGA solutions typically do not take the computation scale into account, much of the research being limited to a single computation engine or a few engines that could be replicated. Furthermore, existing FPGA solutions do not solve the problem of the massive data movement required at large scale by the actual ANNs involved in real industrial applications.

The inputs computed with an ANN are typically provided by an artificial intelligence (AI) framework. These programs are used by the AI community to develop new ANNs or global solutions based on ANNs. However, FPGAs typically lack integration with AI frameworks.

The document US 2005/0262323 A1 describes a memory system that employs group levelization, wherein different latencies are enabled