
US-12619567-B2 - Network on chip for high performance computing and a method of using the same

US12619567B2

Abstract

A network on chip for high performance computing includes a first fat-tree network topology that connects each of a number of initiator nodes to one of a number of initiator core switches through one or more levels of initiator edge switches. A second fat-tree network topology connects each of a number of target nodes to one of a number of target core switches through one or more levels of target edge switches, and the first fat-tree network topology is joined to the second fat-tree network topology through connections between the initiator core switches and the target core switches.

Inventors

  • Suresh Govindachar

Assignees

  • Mercedes-Benz Group AG

Dates

Publication Date
2026-05-05
Application Date
2023-12-12

Claims (18)

  1. A system comprising: a plurality of initiator nodes; a plurality of target nodes; a first fat-tree network topology connecting each of the plurality of initiator nodes to one of a plurality of initiator core switches through one or more levels of initiator edge switches; a second fat-tree network topology connecting each of the plurality of target nodes to one of a plurality of target core switches through one or more levels of target edge switches, the first fat-tree network topology being joined to the second fat-tree network topology through connections between the plurality of initiator core switches and the plurality of target core switches; a system memory with a contiguous address space; and a last level cache between the system memory and the second fat-tree network topology, wherein the plurality of target nodes comprise a plurality of virtual memory channels connected to the system memory through the last level cache.
  2. The system of claim 1, wherein addressing of the system memory strides across the virtual memory channels, such that each of the initiator nodes has uniform access to the contiguous address space at a granularity of a predetermined stride width.
  3. The system of claim 1, wherein a total bandwidth between the last level cache and the second fat-tree network topology is greater than the total bandwidth between the last level cache and the system memory.
  4. The system of claim 1, further comprising: a last level cache between the first fat-tree topology and the second fat-tree topology.
  5. The system of claim 1, wherein total bandwidth at the plurality of target nodes is at least equal to the total bandwidth at the plurality of initiator nodes.
  6. The system of claim 1, wherein the connections between the plurality of initiator core switches and the plurality of target core switches are in an any-to-any configuration.
  7. The system of claim 1, wherein at least some of the plurality of initiator nodes correspond to direct memory access interfaces for digital signal processors.
  8. The system of claim 1, wherein at least some of the plurality of initiator nodes correspond to chiplets comprising a system on chip.
  9. The system of claim 1, wherein each of the plurality of initiator nodes has topologically equidistant connectivity to each of the plurality of target nodes.
  10. A method comprising: receiving, at an initiator node of a network on chip, a memory access request; splitting the memory access request into a plurality of subrequests based on an amount of memory specified by the memory access request, a plurality of target nodes in the network on chip, and a predetermined stride width for the network on chip; and transmitting the plurality of subrequests across the network on chip through (a) a first fat-tree network topology connecting the initiator node to one of a plurality of initiator core switches through one or more levels of initiator edge switches, and (b) a second fat-tree network topology connecting each of the plurality of target nodes to one of a plurality of target core switches through one or more levels of target edge switches, the first fat-tree network topology being joined to the second fat-tree network topology through connections between the plurality of initiator core switches and the plurality of target core switches.
  11. The method of claim 10, further comprising: receiving responses to the plurality of subrequests; and combining the responses to the plurality of subrequests to provide a combined response to the memory access request.
  12. The method of claim 10, wherein the memory access request is a read request specifying a memory address and the amount of memory to read.
  13. The method of claim 10, wherein the memory access request is a write request specifying a memory address, and the amount of memory is a data payload to be written starting at that memory address.
  14. The method of claim 11, wherein the initiator node splits the memory access request into the plurality of subrequests, receives responses to the plurality of subrequests, and combines the responses to the plurality of subrequests.
  15. The method of claim 11, wherein one or more of the initiator core switches, the initiator edge switches, the target core switches, and/or the target edge switches split the memory access request into the plurality of subrequests, receive responses to the plurality of subrequests, and combine the responses to the plurality of subrequests.
  16. A network on chip comprising: a plurality of initiator nodes; a plurality of target nodes; a first fat-tree network topology connecting each of the plurality of initiator nodes to one of a plurality of initiator core switches through one or more levels of initiator edge switches; a second fat-tree network topology connecting each of the plurality of target nodes to one of a plurality of target core switches through one or more levels of target edge switches, the first fat-tree network topology being joined to the second fat-tree network topology through connections between the plurality of initiator core switches and the plurality of target core switches; and a last level cache, wherein the last level cache is between a system memory with a contiguous address space and the second fat-tree network topology, and wherein the plurality of target nodes comprise a plurality of virtual memory channels connected to the system memory through the last level cache.
  17. The network on chip of claim 16, wherein addressing of the system memory strides across the virtual memory channels, such that each of the initiator nodes has uniform access to the contiguous address space at a granularity of a predetermined stride width.
  18. The network on chip of claim 16, wherein a total bandwidth between the last level cache and the second fat-tree network topology is greater than the total bandwidth between the last level cache and the system memory.
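
The strided addressing and request splitting described in claims 2 and 10 can be illustrated with a short sketch. This is a minimal model, not the patented implementation: the stride width, channel count, and function names are assumptions chosen for illustration.

```python
# Illustrative model of strided addressing across virtual memory channels
# (claims 2 and 10). STRIDE_WIDTH and NUM_CHANNELS are assumed values.

STRIDE_WIDTH = 256   # bytes per stride (hypothetical)
NUM_CHANNELS = 8     # number of virtual memory channels (hypothetical)

def channel_for(addr: int) -> int:
    """Virtual memory channel owning this address under strided interleaving."""
    return (addr // STRIDE_WIDTH) % NUM_CHANNELS

def split_request(addr: int, length: int):
    """Split a memory access into stride-aligned subrequests, each
    targeting a single virtual memory channel, as in claim 10."""
    subrequests = []
    while length > 0:
        # Take at most the bytes remaining in the current stride.
        chunk = min(length, STRIDE_WIDTH - (addr % STRIDE_WIDTH))
        subrequests.append((channel_for(addr), addr, chunk))
        addr += chunk
        length -= chunk
    return subrequests

# A 1 KiB read starting mid-stride fans out across successive channels.
for ch, a, n in split_request(0x1080, 1024):
    print(f"channel {ch}: addr {a:#x}, {n} bytes")
```

Because consecutive strides map to consecutive channels, every initiator sees uniform access to the contiguous address space, and large accesses are naturally load-balanced across the target nodes.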

Description

BACKGROUND

In modern electronics and integrated circuit design, the demand for higher performance, reduced power consumption, and increased functionality has led to the development of advanced integration techniques. Two architecture designs that have emerged as solutions to these challenges are systems on chip (SoC) and networks on chip (NoC). These technologies are pivotal in advancing the design and implementation of complex electronic systems, particularly in the context of microprocessors, application-specific integrated circuits (ASICs), and other semiconductor devices.

A system on chip is an integrated circuit that integrates multiple functional blocks and components, such as processors, memory units, input/output interfaces, and various peripheral devices, onto a single silicon chip. The primary motivation behind the development of SoCs is to achieve a high level of integration, allowing for compact and efficient designs that deliver improved performance, lower power consumption, and reduced manufacturing costs compared to traditional discrete designs. By incorporating diverse functionalities into a single chip, SoCs have facilitated the creation of highly sophisticated devices, ranging from smartphones and wearable gadgets to automotive control systems and industrial automation equipment.

A network on chip is an architectural paradigm that addresses the intricate communication requirements within an SoC. As SoCs continue to grow in complexity and include a myriad of functional blocks, efficient and scalable communication mechanisms become more important. By providing a dedicated communication infrastructure, NoCs ensure efficient and reliable data transfer among the numerous functional blocks within an SoC, regardless of their physical placement on the chip. They offer a robust framework for data exchange, synchronization, and coordination among different modules while accommodating the increasing demand for bandwidth, low latency, and fault tolerance.
BRIEF DESCRIPTION OF THE DRAWINGS

The disclosure herein is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.

FIG. 1 is a block diagram depicting an example system in which embodiments described herein may be implemented, in accordance with some aspects.
FIG. 2 is a block diagram depicting one embodiment of a network on chip for high performance computing.
FIG. 3 is a block diagram depicting another embodiment of a network on chip for high performance computing.
FIG. 4 is a block diagram depicting an example system in which embodiments described herein may be implemented, in accordance with some aspects.
FIG. 5 is a block diagram depicting an example network on chip configuration.
FIG. 6 is a block diagram depicting another example network on chip configuration.
FIG. 7 is a flow chart describing a method of using an example network on chip, in accordance with some aspects.
FIG. 8 is a flow chart describing another method of using an example network on chip, in accordance with further aspects.

DETAILED DESCRIPTION

A network on chip for high performance computing is described herein that implements a network topology combining two fat-tree topologies in an hourglass configuration. A first fat-tree network topology connects initiator nodes, which issue memory read and write requests, through one or more levels of edge switches to initiator core switches. A second fat-tree network topology connects target nodes, which receive the memory read and write requests, through one or more levels of edge switches to target core switches. The number of switches at each level is reduced hierarchically until it is possible to perform any-to-any connections directly. The first fat-tree network topology is joined to the second fat-tree network topology through the any-to-any connections between the initiator core switches and the target core switches.
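The hourglass topology described above can be modeled as a small graph to show why every initiator node is topologically equidistant from every target node (claim 9). The switch counts and radices below are illustrative assumptions, not values taken from the patent.

```python
# Illustrative hourglass graph: two single-edge-level fat trees joined
# any-to-any at their cores. Node/switch counts are hypothetical.
from collections import deque

def build_hourglass(n_nodes=8, n_edge=4, n_core=2):
    """Adjacency lists: initiator nodes -> initiator edge switches ->
    initiator core switches <-> target core switches -> target edge
    switches -> target nodes."""
    adj = {}
    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    for i in range(n_nodes):                       # initiator fat tree
        link(("ini", i), ("iedge", i % n_edge))
    for e in range(n_edge):
        for c in range(n_core):
            link(("iedge", e), ("icore", c))
    for ci in range(n_core):                       # any-to-any core joins
        for ct in range(n_core):
            link(("icore", ci), ("tcore", ct))
    for e in range(n_edge):                        # mirrored target fat tree
        for c in range(n_core):
            link(("tedge", e), ("tcore", c))
    for t in range(n_nodes):
        link(("tgt", t), ("tedge", t % n_edge))
    return adj

def hops(adj, src, dst):
    """Breadth-first search hop count between two vertices."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        v, d = queue.popleft()
        if v == dst:
            return d
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append((w, d + 1))

adj = build_hourglass()
dists = {hops(adj, ("ini", i), ("tgt", j)) for i in range(8) for j in range(8)}
print(dists)  # a single hop count: every initiator-target pair is equidistant
```

Every path climbs the initiator tree to a core switch, crosses one core-to-core link, and descends the target tree, so all initiator-target pairs see the same hop count regardless of physical placement.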
Today's complex system on chip (SoC) designs can contain tens to hundreds of blocks using diverse intellectual property (IP). Each IP block has its own data width, clock frequency, and interface protocol. Connecting all of these IPs is a significant challenge in SoC design, one that has been addressed by network on chip (NoC) architectures. In the case of an initiator IP, a network interface packetizes the data generated by the IP, assigns an ID to the packet, and dispatches it into the network. When the packet arrives at its destination IP, the associated interface extracts the data from the packet and transforms it into the protocol required by the IP. Depending on the topology of the NoC, a large number of packets can be in flight throughout the network at any given time. The topology is a fundamental aspect of NoC design, and it has a profound effect on the overall network cost and performance. The topology determines the physical layout and connections between nodes and channels. Nodes represent indivi