US-12619371-B2 - Generating data movement networks for machine learning models
Abstract
Implementing a data movement network includes tiling one or more layers of a machine learning model based, at least in part, on amounts of addressable memory available in different memory levels of a memory architecture of an electronic system. Logical connections specifying compute tiles of the electronic system and logical address spaces corresponding to the compute tiles are generated. Physical connections are generated within the memory architecture by binding ports of direct memory access circuits of the memory architecture to the logical connections. Data transfers for memories between the different memory levels are scheduled based, at least in part, on a loop order of the tiling. Buffers for data of the data transfers are placed within the memories based on the scheduling.
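The tiling step described above divides each layer so that its working set fits the addressable memory available at each level of the hierarchy. A minimal Python sketch of that idea follows; the function name, the example sizes, and the min-based splitting rule are all illustrative assumptions, not the patent's actual algorithm:

```python
def tile_layer(layer_bytes, l2_bytes, l1_bytes, num_compute_tiles):
    """Return (l3_to_l2_tile, l2_to_l1_tile) tile sizes in bytes.

    Hypothetical: the patent describes tiling against the memory available
    per level; the specific formulas here are invented for illustration.
    """
    # L3 -> L2 tiling is bounded by the addressable level 2 memory.
    l3_to_l2 = min(layer_bytes, l2_bytes)
    # L2 -> L1 tiling is bounded by each compute tile's data memory and by
    # how many compute tiles consume the L2 tile in parallel.
    l2_to_l1 = min(l3_to_l2 // num_compute_tiles, l1_bytes)
    return l3_to_l2, l2_to_l1

print(tile_layer(layer_bytes=4 << 20, l2_bytes=512 << 10,
                 l1_bytes=64 << 10, num_compute_tiles=4))
# → (524288, 65536): 512 KiB L3->L2 tiles, 64 KiB L2->L1 tiles
```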
Inventors
- Hua Sun
- Bryan Bowyer
- Tayyar Rzayev
- Kaushik Barman
Assignees
- XILINX, INC.
Dates
- Publication Date: 2026-05-05
- Application Date: 2024-05-31
Claims (16)
- 1. A method of implementing a data movement network, comprising: tiling one or more layers of a machine learning model based, at least in part, on amounts of addressable memory available in different memory levels of a memory architecture of an electronic system; generating logical connections specifying compute tiles of the electronic system and logical address spaces corresponding to the compute tiles; creating physical connections within the memory architecture by binding ports of direct memory access circuits of the memory architecture to the logical connections; scheduling data transfers for memories between the different memory levels; and placing buffers for data of the data transfers within the memories based on the scheduling, wherein the different memory levels are hierarchically ordered and include level 3 memory, level 2 memory comprising a plurality of memory tiles each having a direct memory access (DMA) circuit configured to move data, and level 1 memory comprising data memories of the compute tiles.
- 2. The method of claim 1, wherein the tiling is performed based, at least in part, on an amount of addressable memory available in the level 2 memory, an amount of addressable memory available in the level 1 memory, and a number of the compute tiles available.
- 3. The method of claim 1, wherein the tiling comprises: generating a level 3 memory to level 2 memory tiling based on an amount of available level 2 memory.
- 4. The method of claim 3, wherein the tiling comprises: generating a level 2 memory to level 1 memory tiling based on a size of the data memories of the level 1 memory and a number of available compute tiles.
- 5. The method of claim 1, wherein the ports of the direct memory access circuits are bound to the logical connections based, at least in part, on address ranges of the level 2 memory accessed by the compute tiles concurrently.
- 6. The method of claim 5, wherein, for two or more compute tiles located in a same column that access same address ranges in the level 2 memory concurrently, data is provided to the two or more compute tiles from the level 2 memory using a same direct memory access circuit of the level 2 memory.
- 7. The method of claim 1, further comprising: generating a hardware implementation that implements the data movement network within the memory architecture, wherein the hardware implementation specifies the physical connections, a schedule, and the buffers as placed.
- 8. A system for implementing a data movement network, comprising: one or more hardware processors configured to execute operations including: tiling one or more layers of a machine learning model based, at least in part, on amounts of addressable memory available in different memory levels of a memory architecture of an electronic system; generating logical connections specifying compute tiles of the electronic system and logical address spaces corresponding to the compute tiles; creating physical connections within the memory architecture by binding ports of direct memory access circuits of the memory architecture to the logical connections; scheduling data transfers for memories between the different memory levels; and placing buffers for data of the data transfers within the memories based on the scheduling, wherein the different memory levels are hierarchically ordered and include level 3 memory, level 2 memory comprising a plurality of memory tiles, and level 1 memory comprising data memories of the compute tiles; and wherein the tiling comprises generating a level 3 memory to level 2 memory tiling based on an amount of available level 2 memory.
- 9. The system of claim 8, wherein the tiling is performed based, at least in part, on an amount of addressable memory available in the level 2 memory, an amount of addressable memory available in the level 1 memory, and a number of the compute tiles available.
- 10. The system of claim 8, wherein the tiling comprises: generating a level 2 memory to level 1 memory tiling based on a size of the data memories of the level 1 memory and a number of available compute tiles.
- 11. The system of claim 8, wherein the ports of the direct memory access circuits are bound to the logical connections based, at least in part, on address ranges of the level 2 memory accessed by the compute tiles concurrently.
- 12. The system of claim 11, wherein, for two or more compute tiles located in a same column that access same address ranges in the level 2 memory concurrently, data is provided to the two or more compute tiles from the level 2 memory using a same direct memory access circuit of the level 2 memory.
- 13. The system of claim 8, wherein the one or more hardware processors execute operations including: generating a hardware implementation that implements the data movement network within the memory architecture, wherein the hardware implementation specifies the physical connections, a schedule, and the buffers as placed.
- 14. A computer program product comprising one or more computer readable storage mediums having program instructions embodied therewith, wherein the program instructions are executable by computer hardware to cause the computer hardware to initiate executable operations comprising: tiling one or more layers of a machine learning model based, at least in part, on amounts of addressable memory available in different memory levels of a memory architecture of an electronic system; generating logical connections specifying compute tiles of the electronic system and logical address spaces corresponding to the compute tiles; creating physical connections within the memory architecture by binding ports of direct memory access circuits of the memory architecture to the logical connections; scheduling data transfers for memories between the different memory levels; and placing buffers for data of the data transfers within the memories based on the scheduling, wherein the different memory levels are hierarchically ordered and include level 3 memory, level 2 memory comprising a plurality of memory tiles, and level 1 memory comprising data memories of the compute tiles; and wherein the tiling comprises generating a level 3 memory to level 2 memory tiling based on an amount of available level 2 memory.
- 15. The computer program product of claim 14, wherein the tiling is performed based, at least in part, on an amount of addressable memory available in the level 2 memory, an amount of addressable memory available in the level 1 memory, and a number of the compute tiles available.
- 16. The computer program product of claim 14, wherein the tiling comprises: generating a level 2 memory to level 1 memory tiling based on a size of the data memories of the level 1 memory and a number of available compute tiles.
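Claims 5-6 and 11-12 describe binding DMA ports according to which level 2 address ranges the compute tiles access concurrently, with same-column tiles that read the same range sharing a single DMA circuit. A hypothetical sketch of that grouping rule; the tuple-based data model and all names are invented for illustration:

```python
from collections import defaultdict

def bind_dma_ports(accesses):
    """Group concurrent L2 accesses so that compute tiles in the same column
    reading the same address range share one DMA circuit.

    accesses: iterable of (column, addr_range, tile_id) tuples -- a
    hypothetical representation, not the patent's data structures.
    Returns {(column, addr_range): [tile_ids served by one DMA circuit]}.
    """
    bindings = defaultdict(list)
    for column, addr_range, tile_id in accesses:
        bindings[(column, addr_range)].append(tile_id)
    return dict(bindings)

accesses = [
    (0, (0x0000, 0x3FFF), "tile_a"),  # same column, same L2 range ...
    (0, (0x0000, 0x3FFF), "tile_b"),  # ... so tile_a and tile_b share a DMA
    (1, (0x4000, 0x7FFF), "tile_c"),  # distinct column/range: its own DMA
]
print(bind_dma_ports(accesses))
```

With this input, two DMA bindings result rather than three, because tile_a and tile_b satisfy the same-column, same-range sharing condition.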
Description
RESERVATION OF RIGHTS IN COPYRIGHTED MATERIAL

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

TECHNICAL FIELD

This disclosure relates to machine learning and, more particularly, to generating data movement networks for machine learning models.

BACKGROUND

Some varieties of hardware platforms include a plurality of interconnected compute circuits. Such hardware platforms are capable of providing significant computational power and a high degree of parallelism. Applications that execute on these platforms, e.g., machine learning models, are often specified in graph form. In general, a graph specifying the machine learning model is formed of a plurality of nodes connected by edges. The nodes represent operations performed on input data and the edges represent communication links that convey data among the nodes. Because the machine learning model typically consumes a significant amount of data, the ability to efficiently provide data to the various compute circuits and the ability to efficiently output resulting data from the compute circuits has a significant effect on runtime performance and power consumption of the application as executed on the hardware platform.

SUMMARY

One or more embodiments of the disclosed technology are directed to a method of implementing a data movement network. The method includes tiling one or more layers of a machine learning model based, at least in part, on amounts of addressable memory available in different memory levels of a memory architecture of an electronic system.
The method includes generating logical connections specifying compute tiles of the electronic system and logical address spaces corresponding to the compute tiles. The method includes creating physical connections within the memory architecture by binding ports of direct memory access circuits of the memory architecture to the logical connections. The method includes scheduling data transfers for memories between the different memory levels based, at least in part, on a loop order of the tiling. The method includes placing buffers for data of the data transfers within the memories based on the scheduling.

One or more embodiments of the disclosed technology are directed to a system for implementing a data movement network. The system includes one or more hardware processors configured to execute operations. The operations include tiling one or more layers of a machine learning model based, at least in part, on amounts of addressable memory available in different memory levels of a memory architecture of an electronic system. The operations include generating logical connections specifying compute tiles of the electronic system and logical address spaces corresponding to the compute tiles. The operations include creating physical connections within the memory architecture by binding ports of direct memory access circuits of the memory architecture to the logical connections. The operations include scheduling data transfers for memories between the different memory levels based, at least in part, on a loop order of the tiling. The operations include placing buffers for data of the data transfers within the memories based on the scheduling.

One or more embodiments of the disclosed technology are directed to a computer program product. The computer program product includes one or more computer readable storage mediums having program instructions embodied therewith. The program instructions are executable by computer hardware to cause the computer hardware to execute operations that implement a data movement network. The operations include tiling one or more layers of a machine learning model based, at least in part, on amounts of addressable memory available in different memory levels of a memory architecture of an electronic system. The operations include generating logical connections specifying compute tiles of the electronic system and logical address spaces corresponding to the compute tiles. The operations include creating physical connections within the memory architecture by binding ports of direct memory access circuits of the memory architecture to the logical connections. The operations include scheduling data transfers for memories between the different memory levels based, at least in part, on a loop order of the tiling. The operations include placing buffers for data of the data transfers within the memories based on the scheduling.

This Summary section is provided merely to introduce certain concepts and not to identify any key or essential features of the claimed subject matter. Other features of the inventive arrangements will be apparent from the accompanying drawings and from the following detailed description.
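The last two steps in the summary, scheduling transfers according to the tiling's loop order and then placing buffers based on that schedule, can be sketched as below. The nested-loop schedule and the ping-pong buffer policy are assumptions made for illustration, not the patent's implementation:

```python
def schedule_transfers(loop_order, tiles_per_stream):
    """Emit transfer events by walking the tiling loop nest outer-to-inner.

    loop_order: two stream names, outer then inner (e.g. L3->L2, L2->L1).
    tiles_per_stream: number of tiles moved per iteration of each stream.
    Both are hypothetical representations for this sketch.
    """
    outer, inner = loop_order
    schedule = []
    for i in range(tiles_per_stream[outer]):
        schedule.append((outer, i))            # refill L2 from L3
        for j in range(tiles_per_stream[inner]):
            schedule.append((inner, i, j))     # feed a compute tile's L1
    return schedule

def place_buffers(schedule):
    """Reserve two (ping-pong) buffers per transfer stream so the next tile
    can be fetched while the current one is consumed."""
    streams = sorted({event[0] for event in schedule})
    return {s: [f"{s}_buf0", f"{s}_buf1"] for s in streams}

sched = schedule_transfers(["L3->L2", "L2->L1"], {"L3->L2": 2, "L2->L1": 3})
print(len(sched))          # → 8: two outer refills plus 2 * 3 inner feeds
print(place_buffers(sched))
```

The double buffer per stream reflects the general idea that scheduling determines how much memory each transfer needs in flight, which is why buffer placement follows scheduling in the claimed flow.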