KR-102963294-B1 - TECHNOLOGIES FOR PROVIDING A SCALABLE ARCHITECTURE FOR PERFORMING COMPUTE OPERATIONS IN MEMORY

KR-102963294-B1

Abstract

Technologies for providing a scalable architecture for efficiently performing compute operations in memory include a memory having media access circuitry coupled to a memory medium. The media access circuitry accesses data from the memory medium to perform a requested operation, concurrently performs the requested operation on the accessed data using each of a plurality of compute logic units included in the media access circuitry, and writes result data produced by performance of the requested operation to the memory medium.
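The flow the abstract describes — access data from a partition, fan it out to multiple compute logic units, perform the operation concurrently, and write the result back — can be sketched as a software simulation. This is purely illustrative: the class names (`ComputeLogicUnit`, `MediaAccessCircuitry`) and the thread-based concurrency are assumptions for the sketch, not details from the patent, which describes hardware inside the memory itself.

```python
from concurrent.futures import ThreadPoolExecutor

class ComputeLogicUnit:
    """Stands in for one compute logic unit inside the media access circuitry."""
    def compute(self, data, op):
        return [op(x) for x in data]

class MediaAccessCircuitry:
    def __init__(self, memory_medium, num_units=4):
        self.medium = memory_medium                 # partition id -> stored data
        self.units = [ComputeLogicUnit() for _ in range(num_units)]

    def perform(self, partition, op, result_partition):
        data = self.medium[partition]               # 1. access data from a partition
        # 2. "broadcast": every unit receives the same accessed data
        # 3. perform the requested operation concurrently on all units
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda u: u.compute(data, op), self.units))
        self.medium[result_partition] = results[0]  # 4. write result data back
        return results[0]

medium = {0: [1, 2, 3], 1: None}
mac = MediaAccessCircuitry(medium)
print(mac.perform(0, lambda x: x * 2, 1))  # [2, 4, 6]
```

In hardware, step 2 would move data once over an internal bus rather than copying it per unit; the simulation only mirrors the logical dataflow of the claims.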

Inventors

  • Tomishima, Shigeki
  • Srinivasan, Srikanth
  • Chauhan, Chetan
  • Sundaram, Rajesh
  • Khan, Jawad B.

Assignees

  • Intel Corporation

Dates

Publication Date
2026-05-13
Application Date
2020-02-27
Priority Date
2019-03-29

Claims (20)

  1. A memory comprising media access circuitry coupled to a memory medium, the media access circuitry to: access, using one of a plurality of compute logic units included in the media access circuitry, data from a partition of the memory medium to perform a requested operation; broadcast, using the one of the plurality of compute logic units, the accessed data to the other compute logic units of the plurality of compute logic units; concurrently perform, using each of the plurality of compute logic units included in the media access circuitry, the requested operation on the accessed data; and write result data produced by performance of the requested operation to the memory medium.
  2. The memory of claim 1, wherein performing the requested operation using each of the plurality of compute logic units comprises performing the requested operation using a plurality of compute logic units associated with different partitions of the memory medium.
  3. The memory of claim 1, wherein performing the requested operation using each of the plurality of compute logic units comprises performing the requested operation using each of a plurality of compute logic units located on a die or package.
  4. The memory of claim 1, wherein performing the requested operation using each of the plurality of compute logic units comprises performing the requested operation using each of a plurality of compute logic units located in multiple dies or packages of a dual in-line memory module.
  5. The memory of claim 1, wherein performing the requested operation using each of the plurality of compute logic units comprises performing the requested operation using each of a plurality of compute logic units distributed across multiple dual in-line memory modules.
  6. (deleted)
  7. The memory of claim 1, wherein accessing data from a partition of the memory medium comprises reading the data into a scratch pad associated with the partition.
  8. The memory of claim 7, wherein reading the data into the scratch pad comprises reading the data into a register file or a static random access memory of the media access circuitry.
  9. The memory of claim 1, wherein accessing data from a partition of the memory medium comprises accessing data from a partition of a memory medium having a three-dimensional cross-point architecture.
  10. The memory of claim 1, wherein performing the requested operation comprises performing a tensor operation.
  11. The memory of claim 1, wherein performing the requested operation comprises performing a matrix multiplication operation.
  12. The memory of claim 1, wherein performing the requested operation comprises performing a bitwise operation on the accessed data.
  13. The memory of claim 1, wherein the media access circuitry is further to provide, to a component of a computing device, data indicating completion of the requested operation.
  14. A method comprising: accessing, by one of a plurality of compute logic units in media access circuitry included in a memory, data from a partition of a memory medium coupled to the media access circuitry to perform a requested operation; broadcasting, by the one of the plurality of compute logic units, the accessed data to the other compute logic units of the plurality of compute logic units; concurrently performing, by each of the plurality of compute logic units in the media access circuitry, the requested operation on the accessed data; and writing, by the media access circuitry, result data produced by performance of the requested operation to the memory medium.
  15. The method of claim 14, wherein performing the requested operation by each of the plurality of compute logic units comprises performing the requested operation by a plurality of compute logic units associated with different partitions of the memory medium.
  16. The method of claim 14, wherein performing the requested operation by each of the plurality of compute logic units comprises performing the requested operation by each of a plurality of compute logic units located on a die or package.
  17. The method of claim 14, wherein performing the requested operation by each of the plurality of compute logic units comprises performing the requested operation by each of a plurality of compute logic units located in multiple dies or packages of a dual in-line memory module.
  18. The method of claim 14, wherein performing the requested operation by each of the plurality of compute logic units comprises performing the requested operation by each of a plurality of compute logic units distributed across multiple dual in-line memory modules.
  19. (deleted)
  20. One or more machine-readable storage media comprising a plurality of instructions stored thereon that, in response to being executed, cause media access circuitry included in a memory to: access, using one of a plurality of compute logic units included in the media access circuitry, data from a partition of a memory medium to perform a requested operation; broadcast, using the one of the plurality of compute logic units, the accessed data to the other compute logic units of the plurality of compute logic units; concurrently perform, using each of the plurality of compute logic units included in the media access circuitry, the requested operation on the accessed data; and write result data produced by performance of the requested operation to the memory medium.
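Claims 2, 10, and 11 together cover a tensor operation such as matrix multiplication performed by compute logic units associated with different partitions. A rough Python sketch of one way such work might be split — each unit multiplying the rows held in its own partition — is given below. The row-wise partitioning scheme and the function name are assumptions for illustration, not details taken from the claims.

```python
def matmul_partitioned(a, b, num_units=2):
    """Multiply matrices a and b, splitting the rows of `a` across
    `num_units` chunks, mimicking compute logic units that each operate
    on the portion of the data held in their own memory partition."""
    n = len(a)
    chunk = (n + num_units - 1) // num_units   # rows per partition
    result = []
    for u in range(num_units):
        rows = a[u * chunk:(u + 1) * chunk]    # this unit's partition of `a`
        for row in rows:                       # standard dot-product per cell
            result.append([sum(x * y for x, y in zip(row, col))
                           for col in zip(*b)])
    return result

print(matmul_partitioned([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

Because each unit only needs its own rows of `a` (plus the broadcast `b`), the partitions can be processed independently, which is what makes the concurrent execution in claim 1 possible.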

Description

Technologies for Providing a Scalable Architecture for Performing Compute Operations in Memory

Artificial intelligence applications, such as those that train neural networks and/or use them for inference (e.g., identifying objects in images, performing speech recognition, etc.), typically use a relatively large amount of compute capacity to perform tensor operations (e.g., matrix calculations, such as matrix multiplication) on matrix data. On some computing devices, compute operations that support artificial intelligence applications can be offloaded from general-purpose processors to accelerator devices such as graphics processing units (GPUs). However, while GPUs can perform tensor operations faster than processors, the efficiency (e.g., energy consumption and speed) with which a computing device performs those operations is still limited by the fact that the data to be operated on (e.g., matrix data) resides in memory and must be sent over a bus from the memory to the device (e.g., the GPU) that performs the operations, consuming both time and energy. As the complexity and volume of the data to be processed grow (for example, with increasingly complex artificial intelligence applications), the energy and speed inefficiencies of existing systems may grow correspondingly.

Background art for the present invention includes Korean Patent Publication No. 10-1858597, Korean Patent Publication No. 10-1647907, U.S. Patent Publication No. 2015/0199266, and Korean Patent Publication No. 10-2017-0102418.

The concepts described herein are illustrated by way of example, and not by way of limitation, in the accompanying figures. For simplicity and clarity of illustration, elements illustrated in the figures are not necessarily drawn to scale. Where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.

FIG. 1 is a simplified diagram of at least one embodiment of a computing device having a scalable architecture for efficiently performing compute operations in memory. FIG. 2 is a simplified diagram of at least one embodiment of a memory medium included in the computing device of FIG. 1. FIG. 3 is a simplified diagram of at least one embodiment of components of memory media access circuitry and partitions of a memory medium included in the computing device of FIG. 1. FIG. 4 is a simplified diagram of at least one embodiment of a set of dual in-line memory modules that may be included in the computing device of FIG. 1. FIG. 5 is a simplified diagram of at least one embodiment of a tensor operation that may be performed in the memory of the computing device of FIG. 1. FIGS. 6-8 are simplified diagrams of at least one embodiment of a method for performing efficient compute operations in memory that may be performed by the computing device of FIG. 1. FIG. 9 is a simplified diagram of two dies from the memory of the computing device of FIG. 1 combined into a package.

While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will be described in detail herein. It should be understood, however, that there is no intent to limit the concepts of the present disclosure to the particular forms disclosed; on the contrary, the intent is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.

References in the specification to "one embodiment," "an embodiment," "an illustrative embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases do not necessarily refer to the same embodiment.
Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such a feature, structure, or characteristic in connection with other embodiments, whether or not explicitly described. Additionally, it should be appreciated that items included in a list in the form of "at least one of A, B, and C" can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C). Similarly, items listed in the form of "at least one of A, B, or C" can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C).

The disclosed embodiments may be implemented, in some cases, in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on a transitory or non-transitory machine-readable (e.g., computer-readable) storage medium, which may be read and executed by one or more processors. A machine-readable storage medium may be