US-12619370-B2 - Memory unit partitioning for reconfigurable dataflow computers

US 12619370 B2

Abstract

A system and method for memory unit partitioning for reconfigurable dataflow computing systems includes a parser that receives and parses source code for a reconfigurable dataflow processor, a tensor expression extractor that extracts tensor indexing expressions from the source code, a logical memory constraint generator that converts the tensor indexing expressions to logical memory indexing constraints, a grouping module that groups the logical memory indexing constraints into concurrent access groups, and a memory partitioning module that determines a memory unit partitioning solution for each concurrent access group.

Inventors

  • Yaqi Zhang
  • Matthew Feldman

Assignees

  • SambaNova Systems, Inc.

Dates

Publication Date
2026-05-05
Application Date
2024-09-17

Claims (11)

  1. A system for controlling memory unit partitioning for reconfigurable dataflow computing systems, comprising: a parser configured to receive and parse source code for a reconfigurable dataflow processor that comprises an array of compute units and an array of memory units interconnected with a switching fabric, the source code comprising a plurality of tensor indexing expressions; a tensor expression extractor configured to extract the plurality of tensor indexing expressions from the source code; a logical memory constraint generator configured to convert the plurality of tensor indexing expressions to a plurality of logical memory indexing constraints; a grouping module configured to group the plurality of logical memory indexing constraints into concurrent access groups; and a memory partitioning module configured to determine a memory unit partitioning solution for each concurrent access group that supports the plurality of logical memory indexing constraints without concurrent usage conflicts including memory unit and memory port conflicts, wherein the reconfigurable dataflow processor is configured to execute the plurality of tensor indexing expressions and access the array of memory units according to the memory unit partitioning solution, wherein memory units in the array of memory units comprise address generators that generate, for each memory cycle, a physical address comprising a bank identifier and a bank offset, wherein said memory units in the array of memory units are configured to respond to a specific bank identifier, and wherein the memory partitioning module is further configured to determine the memory unit partitioning solution by selecting a set of logical-to-physical mapping parameters, and wherein the set of logical-to-physical mapping parameters comprise a logical memory unit count N, a blocking parameter B, a scaling vector alpha and a packing vector P.
  2. A system for providing memory unit partitioning solutions for reconfigurable dataflow computing systems, the system comprising: a parser configured to receive and parse source code for a reconfigurable dataflow processor that comprises an array of compute units and an array of memory units interconnected with a switching fabric, the source code comprising a plurality of tensor indexing expressions; a tensor expression extractor configured to extract the plurality of tensor indexing expressions from the source code; a logical memory constraint generator configured to convert the plurality of tensor indexing expressions to a plurality of logical memory indexing constraints; a grouping module configured to group the plurality of logical memory indexing constraints into concurrent access groups; and a memory partitioning module configured to determine a memory unit partitioning solution for each concurrent access group that supports the plurality of logical memory indexing constraints without concurrent usage conflicts including memory unit and memory port conflicts, wherein the dataflow processor is configured to execute the plurality of tensor indexing expressions and access the array of memory units according to the memory unit partitioning solution, wherein memory units in the array of memory units comprise address generators that generate, for each memory cycle, a physical address comprising a bank identifier and a bank offset, and wherein said memory units in the array of memory units are configured to respond to a specific bank identifier, and wherein the memory partitioning module is further configured to determine the memory unit partitioning solution by selecting a set of logical-to-physical mapping parameters, and wherein the set of logical-to-physical mapping parameters comprise a logical memory unit count N, a blocking parameter B, a scaling vector alpha and a packing vector P.
  3. The system of claim 2, wherein the selecting comprises testing legal combinations of N, B and alpha.
  4. The system of claim 2, further comprising a capacity modification module configured to perform a capacity modification to legalize the memory unit partitioning solution.
  5. The system of claim 4, wherein the capacity modification comprises scaling packing vector P or increasing a logical memory unit count N of a set of logical-to-physical mapping parameters.
  6. A method for controlling memory unit partitioning solutions for reconfigurable dataflow computing systems, the method comprising: receiving source code for a reconfigurable dataflow processor that comprises an array of compute units and an array of memory units interconnected with a switching fabric, the source code comprising a plurality of tensor indexing expressions; converting the plurality of tensor indexing expressions to a plurality of logical memory indexing constraints; grouping the plurality of logical memory indexing constraints into concurrent access groups; determining a memory unit partitioning solution for each concurrent access group that supports the plurality of logical memory indexing constraints without concurrent usage conflicts including memory unit and memory port conflicts; and accessing the array of memory units according to the memory unit partitioning solution in conjunction with executing the plurality of tensor indexing expressions with the reconfigurable dataflow processor, wherein determining the memory unit partitioning solution comprises selecting a set of logical-to-physical mapping parameters, and wherein the set of logical-to-physical mapping parameters comprise a logical memory unit count N, a blocking parameter B, a scaling vector alpha and a packing vector P.
  7. The method of claim 6, wherein selecting comprises testing legal combinations of N, B and alpha.
  8. The method of claim 6, wherein the capacity modification comprises scaling a packing vector P or increasing a logical memory unit count N of a set of logical-to-physical mapping parameters.
  9. The method of claim 6, wherein the selecting comprises testing legal combinations of N, B and alpha.
  10. The method of claim 8, wherein the set of logical-to-physical mapping parameters define a hyperplane partitioning or a parallel-piped partitioning.
  11. The method of claim 6, wherein the array of compute units operate on vectors and the memory unit partitioning solution is vectorized.
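The claims recite address generators that map each logical tensor index to a physical address made of a bank identifier and a bank offset, parameterized by a logical memory unit count N, a blocking parameter B, a scaling vector alpha, and a packing vector P. The sketch below illustrates one plausible hyperplane-style mapping using those parameter names; the exact mapping equations (shown in FIG. 9 of the patent) are not reproduced in this text, so the formulas here are an assumption for illustration only.

```python
def bank_and_offset(index, N, B, alpha, P):
    """Map a logical tensor index to (bank_id, bank_offset).

    Assumed hyperplane scheme: the bank is chosen by dotting the index
    with the scaling vector alpha, blocking by B, and wrapping around
    the N logical memory units; the offset flattens the index with the
    packing (strides-like) vector P, divided by N because each bank
    holds every N-th block.
    """
    dot = sum(a, )if False else sum(a * i for a, i in zip(alpha, index))
    bank_id = (dot // B) % N
    flat = sum(p * i for p, i in zip(P, index))
    bank_offset = flat // N
    return bank_id, bank_offset


def conflict_free(group, N, B, alpha, P):
    """A concurrent access group is conflict-free when all of its
    accesses land on pairwise-distinct banks in the same cycle."""
    banks = [bank_and_offset(ix, N, B, alpha, P)[0] for ix in group]
    return len(set(banks)) == len(banks)
```

With N=4, B=1, alpha=(1, 1) and row-major strides P=(4, 1) for a 4x4 tensor, both a row access group and a column access group hit four distinct banks, which is the classic motivation for skewed (hyperplane) banking; an unskewed alpha=(0, 1) would serialize the column accesses onto one bank.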

Description

PRIORITY APPLICATION

This application is a continuation of U.S. patent application Ser. No. 18/208,343, filed Jun. 12, 2023, now U.S. Pat. No. 12,093,551, entitled “Memory Unit Partitioning Solutions for Reconfigurable Dataflow Computing Systems,” which is a continuation of U.S. patent application Ser. No. 17/878,504, filed Aug. 1, 2022, now U.S. Pat. No. 11,709,611, entitled “Determining and Using Memory Unit Partitioning Solutions for Reconfigurable Dataflow Computing Systems,” which claims priority to U.S. Provisional Patent Application No. 63/271,906, filed Oct. 26, 2021, entitled “Automatic Tensor Partitioning,” which is incorporated by reference herein for any and all purposes.

RELATED APPLICATIONS AND DOCUMENTS

This application is related to the following papers and commonly owned applications:

  • U.S. Nonprovisional patent application Ser. No. 17/031,679, filed Sep. 24, 2020, entitled “SYSTEMS AND METHODS FOR MEMORY LAYOUT DETERMINATION AND CONFLICT RESOLUTION,” now U.S. Pat. No. 11,645,057;
  • U.S. Nonprovisional patent application Ser. No. 16/922,975, filed Jul. 7, 2020, entitled “RUNTIME VIRTUALIZATION OF RECONFIGURABLE DATA FLOW RESOURCES,” now U.S. Pat. No. 11,809,908;
  • U.S. Nonprovisional patent application Ser. No. 17/216,647, filed Mar. 29, 2021, entitled “TENSOR PARTITIONING AND PARTITION ACCESS ORDER,” now U.S. Pat. No. 11,204,889;
  • U.S. Provisional Patent Application No. 63/271,906, filed Oct. 26, 2021, entitled “AUTOMATIC TENSOR PARTITIONING.”

All of the related applications and documents listed above are hereby incorporated by reference herein for all purposes.

BACKGROUND

The present subject matter relates to determining and using memory unit partitioning solutions for reconfigurable dataflow computing systems. Reconfigurable processors can be configured to implement a variety of functions more efficiently or faster than might be achieved using a general-purpose processor executing a computer program. For example, coarse-grained reconfigurable architectures (e.g., CGRAs) have been proposed that can enable implementation of energy-efficient accelerators for machine learning and artificial intelligence workloads. See Prabhakar, et al., “Plasticine: A Reconfigurable Architecture for Parallel Patterns,” ISCA '17, Jun. 24-28, 2017, Toronto, ON, Canada. Memory unit management can dramatically affect the performance of dataflow computing systems.

SUMMARY OF THE INVENTION

A system for determining and using memory unit partitioning solutions for reconfigurable dataflow computing systems includes a parser that receives and parses source code for a reconfigurable dataflow processor, a tensor expression extractor that extracts tensor indexing expressions from the source code, a logical memory constraint generator that converts the tensor indexing expressions to logical memory indexing constraints, a grouping module that groups the logical memory indexing constraints into concurrent access groups, and a memory partitioning module that determines a memory unit partitioning solution for each concurrent access group. The system also includes a reconfigurable dataflow processor that comprises an array of compute units and an array of memory units interconnected with a switching fabric. The reconfigurable dataflow processor may be configured to execute the plurality of tensor indexing expressions and access the array of memory units according to the memory unit partitioning solution. A corresponding method and computer-readable medium are also disclosed herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a layout diagram illustrating a CGRA (Coarse-Grained Reconfigurable Architecture) suitable for dataflow computing.
FIG. 1B is a block diagram of a compiler stack suitable for a CGRA.
FIG. 1C is a system diagram illustrating a system including a host, a memory, and a reconfigurable data processor.
FIG. 2 is a simplified block diagram of a top-level network and components of a CGRA.
FIG. 3A is a simplified diagram of a tile and an array level network usable in the configuration of FIG. 2, where the configurable units are nodes on the array level network.
FIG. 3B illustrates an example switch unit connecting elements in an array level network.
FIG. 4 is a block diagram illustrating an example configurable compute unit.
FIG. 5 is a block diagram illustrating an example configurable memory unit.
FIG. 6A and FIG. 6B illustrate two classes of memory unit partitioning in accordance with embodiments disclosed herein.
FIG. 7 is a block diagram depicting one example of a system for determining and using memory unit partitioning solutions.
FIG. 8 is a flowchart depicting one example of a method for determining and using memory unit partitioning solutions.
FIG. 9 shows one example of a set of logical-to-physical address mapping equations.
FIG. 10A is a flowchart depicting one example of a partitioning solution optimization method.
FIG. 10B is a flow
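Claims 3-5 and 7-8 describe selecting a partitioning solution by testing legal combinations of N, B, and alpha, and legalizing via a capacity modification such as increasing N. The sketch below illustrates that search loop under assumed conditions: the bank-selection formula is a hypothetical hyperplane scheme, the candidate alpha vectors are restricted to small 0/1 entries, and the search bounds (max_N, max_B) are illustrative, none of which is quoted from the patent.

```python
from itertools import product


def bank(index, N, B, alpha):
    # Assumed hyperplane bank selection (illustrative, not the
    # patent's actual mapping equations).
    return (sum(a * i for a, i in zip(alpha, index)) // B) % N


def find_partitioning(access_groups, dims, max_N=8, max_B=4):
    """Search legal (N, B, alpha) combinations so that every
    concurrent access group hits pairwise-distinct banks.

    Iterating N upward models the capacity-modification idea from
    the claims: if no combination is conflict-free at the current
    logical memory unit count, grow N and retry.
    """
    alphas = list(product(range(2), repeat=dims))  # small 0/1 vectors
    for N in range(1, max_N + 1):
        for B in range(1, max_B + 1):
            for alpha in alphas:
                if all(
                    len({bank(ix, N, B, alpha) for ix in g}) == len(g)
                    for g in access_groups
                ):
                    return N, B, alpha
    return None  # no legal combination within the search bounds
```

For a 4x4 tensor accessed concurrently by rows and by columns, this search settles on N=4 with the skewed vector alpha=(1, 1), since any axis-aligned alpha serializes one of the two access groups onto a single bank.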