US-12619537-B2 - Processor with high-capacity last-level cache
Abstract
A processor with a high-capacity last-level cache is shown, which includes a plurality of cores, a primary storage node, a plurality of cache slices corresponding to the cores, and a mesh-type interconnection structure. The cache slices are combined as a last-level cache. The mesh-type interconnection structure connects the primary storage node and the cache slices in a ring, and connects at least two cache slices non-adjacent to each other in the ring.
Inventors
- Chen Chen
- Qi Li
- Yongjie Zhang
Assignees
- SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
Dates
- Publication Date: 20260505
- Application Date: 20240918
- Priority Date: 20231019
Claims (13)
- 1. A processor with a high-capacity last-level cache, comprising: a plurality of cores; a primary storage node; a plurality of cache slices corresponding to the plurality of cores, configured to form a last-level cache; and a mesh-type interconnection structure, configured to connect the primary storage node and the cache slices in a ring; wherein: the cache slices are arranged in a first column and a second column; and the mesh-type interconnection structure connects at least two pairs of cache slices, adjacent to each other in a lateral direction, in the first and second columns.
- 2. The processor with a high-capacity last-level cache as claimed in claim 1, wherein: the number of cache slices and the number of cores are both N; N is greater than 2; the cache slices are numbered from 0 to (N−1); the cores are numbered from 0 to (N−1); a ring connection formed by the primary storage node and the cache slices starts from the primary storage node, and then goes through the cache slices in numerical order, and then comes back to the primary storage node; and the mesh-type interconnection structure connects every pair of cache slices with non-adjacent numbers.
- 3. The processor with a high-capacity last-level cache as claimed in claim 2, wherein: N is an even number greater than 2; the cache slices numbered from 0 to (N/2−1) are arranged from a head row to a bottom row in the first column in an order in which they are numbered; and the cache slices numbered from N/2 to (N−1) are arranged from a bottom row to a head row in the second column in the order in which they are numbered.
- 4. The processor with a high-capacity last-level cache as claimed in claim 3, wherein: the cache slices in the first column are aligned laterally to the cache slices in the second column.
- 5. The processor with a high-capacity last-level cache as claimed in claim 4, wherein: the core and cache slice with the same number are connected to each other.
- 6. The processor with a high-capacity last-level cache as claimed in claim 5, wherein: the cores numbered from 0 to (N/2−1) corresponding to the cache slices numbered from 0 to (N/2−1) are located on a first side of the cache slices numbered from 0 to (N/2−1); and the cores numbered from N/2 to (N−1) corresponding to the cache slices numbered from N/2 to (N−1) are located on a second side of the cache slices numbered from N/2 to (N−1), and the second side is opposite to the first side.
- 7. The processor with a high-capacity last-level cache as claimed in claim 6, wherein: the primary storage node is arranged on an upper side of the first column and the second column.
- 8. The processor with a high-capacity last-level cache as claimed in claim 4, wherein: the mesh-type interconnection structure includes pairs of first-type communication lines between adjacent cache slices in the first column for bidirectional communication between the adjacent cache slices in the first column, pairs of first-type communication lines between adjacent cache slices in the second column for bidirectional communication between the adjacent cache slices in the second column, and pairs of first-type communication lines between laterally-aligned cache slices in the first and second columns for bidirectional communication between the laterally-aligned cache slices in the first and second columns, so as to realize communication between the cores and the last-level cache.
- 9. The processor with a high-capacity last-level cache as claimed in claim 8, wherein: the mesh-type interconnection structure includes a pair of second-type communication lines between the primary storage node and a cache slice numbered 0 for bidirectional communication between the primary storage node and the cache slice numbered 0, pairs of second-type communication lines between adjacent cache slices in the first column for bidirectional communication between the adjacent cache slices in the first column, a pair of second-type communication lines between the primary storage node and a cache slice numbered (N−1) for bidirectional communication between the primary storage node and the cache slice numbered (N−1), and pairs of second-type communication lines between adjacent cache slices in the second column for bidirectional communication between the adjacent cache slices in the second column, so as to realize communication between the primary storage node and the last-level cache.
- 10. The processor with a high-capacity last-level cache as claimed in claim 9, wherein: the mesh-type interconnection structure includes third-type communication lines connecting the primary storage node and the cache slices in a unidirectional manner, so as to transfer control signals.
- 11. The processor with a high-capacity last-level cache as claimed in claim 3, wherein: the mesh-type interconnection structure that connects the primary storage node and the cache slices in the ring further connects the cache slice numbered j with the cache slice numbered (N−1−j); and j=0˜[(N/2)−2].
- 12. The processor with a high-capacity last-level cache as claimed in claim 11, wherein: N is 8.
- 13. The processor with a high-capacity last-level cache as claimed in claim 2, wherein: the cache slices numbered from 0 to M are arranged from a head row to a bottom row in the first column in the order in which they are numbered; the cache slices numbered from (M+1) to (N−1) are arranged from a bottom row to a head row in the second column in the order in which they are numbered; and M is a number greater than 0 and less than N−2.
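As an illustration of the topology recited in claims 2, 11, and 12, the following Python sketch builds the ring through the primary storage node and the N cache slices, adds the links between slice j and slice (N−1−j), and compares hop counts with a breadth-first search. The names `build_links` and `hops` and the node label `"MSN"` are hypothetical and do not come from the patent. The hop count is only a rough proxy: the sketch assumes a single kind of link and ignores the distinction between the first-, second-, and third-type communication lines.

```python
from collections import deque

MSN = "MSN"  # hypothetical label for the primary storage node


def build_links(n, cross_links=True):
    """Return an adjacency map for the ring, optionally with cross links."""
    if n <= 2 or n % 2:
        raise ValueError("this sketch assumes an even N greater than 2")
    ring = [MSN] + list(range(n))  # MSN -> 0 -> 1 -> ... -> N-1 -> MSN
    adj = {node: set() for node in ring}
    for a, b in zip(ring, ring[1:] + ring[:1]):
        adj[a].add(b)
        adj[b].add(a)
    if cross_links:
        for j in range(n // 2 - 1):  # j = 0 .. N/2 - 2
            adj[j].add(n - 1 - j)
            adj[n - 1 - j].add(j)
    return adj


def hops(adj, src, dst):
    """Breadth-first search: minimum number of links between two nodes."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("nodes are not connected")


if __name__ == "__main__":
    ring_only = build_links(8, cross_links=False)
    mesh = build_links(8)
    for a, b in [(0, 7), (1, 6), (2, 5)]:
        print(a, b, hops(ring_only, a, b), hops(mesh, a, b))
```

Under these assumptions and with N = 8, slices 1 and 6 are four links apart on the plain ring and one link apart once the links between slice j and slice (N−1−j) are added.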
Description
CROSS REFERENCE TO RELATED APPLICATIONS
This Application claims priority of China Patent Application No. 202311360470.4, filed on Oct. 19, 2023, the entirety of which is incorporated by reference herein.
BACKGROUND OF THE DISCLOSURE
Field of the Disclosure
The present disclosure relates to processors, and, in particular, it relates to processors with a high-capacity last-level cache (LLC).
Description of the Related Art
A processor generally includes cache memories. Data read from a primary memory may be temporarily stored in the cache memories. In kernel calculations, the cache memories are searched first. If the requested data can be found in the cache memories, there is no need to access the slow-speed primary memory. Cache memories typically adopt a multi-level architecture. For a multi-core processor, the last-level cache (LLC) is external to the cores so that it can be shared by the different cores. As computer systems have developed, the last-level cache (LLC) may be composed of multiple cache slices to achieve a large capacity. How to use the multiple cache slices to create a large-capacity LLC without affecting the data transmission bandwidth among the cores, the LLC, and the primary memory is an important issue in this technical field.
BRIEF SUMMARY OF THE DISCLOSURE
This application introduces a mesh-type interconnection structure to improve the data transmission bandwidth among the cores, the last-level cache (LLC), and the primary memory.
A processor in accordance with an exemplary embodiment of the disclosure includes a plurality of cores, a primary storage node, a plurality of cache slices corresponding to the cores, and a mesh-type interconnection structure. The cache slices are combined as a last-level cache (LLC). The mesh-type interconnection structure connects the primary storage node and the cache slices in a ring, and connects at least two cache slices that are not adjacent to each other in the ring.
In this manner, data transmission bandwidth among the cores, the last-level cache (LLC), and the primary memory is considerably improved.
In an exemplary embodiment, the number of cache slices and the number of cores are both N. N is greater than 2. The cache slices are numbered from 0 to (N−1). The cores are numbered from 0 to (N−1). A ring connection formed by the primary storage node and the cache slices starts from the primary storage node, and then goes through the cache slices in numerical order, and then comes back to the primary storage node. The mesh-type interconnection structure connects two cache slices with non-adjacent numbers.
In an exemplary embodiment, N is an even number greater than 2. The cache slices numbered from 0 to (N/2−1) are arranged from a head row to a bottom row in a first column in the order in which they are numbered. The cache slices numbered from N/2 to (N−1) are arranged from a bottom row to a head row in a second column in the order in which they are numbered.
In an exemplary embodiment, the cache slices in the first column are aligned laterally to the cache slices in the second column. The mesh-type interconnection structure connects laterally-aligned cache slices in the first and second columns.
In an exemplary embodiment, the core and cache slice with the same number are connected to each other. The cores numbered from 0 to (N/2−1) corresponding to the cache slices numbered from 0 to (N/2−1) are located on a first side of the cache slices numbered from 0 to (N/2−1). The cores numbered from N/2 to (N−1) corresponding to the cache slices numbered from N/2 to (N−1) are located on a second side of the cache slices numbered from N/2 to (N−1). The second side is opposite to the first side. The primary storage node is arranged on the upper side of the first column and the second column.
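The two-column arrangement described above can be sketched as a small placement function. The Python below is an illustrative reading of the text, assuming an even N, with row 0 taken as the head row and the columns indexed 0 (first) and 1 (second); `slice_position` and `lateral_pairs` are hypothetical names that do not appear in the disclosure.

```python
def slice_position(k, n):
    """Return (row, column) of cache slice k; row 0 is the head row."""
    if n <= 2 or n % 2:
        raise ValueError("this sketch assumes an even N greater than 2")
    if not 0 <= k < n:
        raise ValueError("slice number out of range")
    half = n // 2
    if k < half:
        return (k, 0)          # first column, head row to bottom row
    return (n - 1 - k, 1)      # second column, bottom row to head row


def lateral_pairs(n):
    """Pair up the slices that share a row in the two columns."""
    by_row = {}
    for k in range(n):
        row, col = slice_position(k, n)
        by_row.setdefault(row, {})[col] = k
    return [(by_row[r][0], by_row[r][1]) for r in sorted(by_row)]


if __name__ == "__main__":
    for left, right in lateral_pairs(8):
        print(f"slice {left} <-> slice {right}")
```

For N = 8 this yields the pairs (0, 7), (1, 6), (2, 5), and (3, 4). The last pair is already adjacent in the ring, so only the first N/2 − 1 pairs correspond to links between cache slices with non-adjacent numbers.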
In an exemplary embodiment, the mesh-type interconnection structure that connects the primary storage node and the cache slices in the ring further connects the cache slice numbered j with the cache slice numbered (N−1−j), where j=0˜[(N/2)−2].
In an exemplary embodiment, the two columns may have different numbers of cache slices. The cache slices numbered from 0 to M are arranged from a head row to a bottom row in a first column in the order in which they are numbered. The cache slices numbered from (M+1) to (N−1) are arranged from a bottom row to a head row in a second column in the order in which they are numbered. M is a number greater than 0 and less than N−2. The mesh-type interconnection structure connects laterally-adjacent cache slices in the first and second columns.
In an exemplary embodiment, the core and cache slice with the same number are connected to each other; and the mesh-type interconnection structure includes pairs of first-type communication lines between adjacent cache slices in the first column for bidirectional communication between the adjacent cache slices in the first column, pairs of first-type communication lines between adjacent cache slices in th