EP-4738122-A1 - MEMORY COPY

Abstract

Apparatuses, systems, and techniques to perform one or more memory copy operations via a single application programming interface. In at least one embodiment, said memory copy operations are performed with a single set of startup and/or shutdown operations between them. In at least one embodiment, processors comprise one or more circuits to perform an application programming interface (API) to cause information to be copied from two or more first storage locations to two or more second storage locations based, at least in part, on one or more parameters of the APIs to indicate the two or more first storage locations and the two or more second storage locations.
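
The effect of sharing a single set of startup and/or shutdown operations can be illustrated with a minimal sketch. It is not the implementation described in this publication: `CopyEngine`, `copy_one`, `copy_batch`, and the `(source offset, destination offset, size)` tuples are hypothetical names chosen for illustration, and startup and shutdown are modeled only as counters.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Op = Tuple[int, int, int]  # hypothetical: (source offset, destination offset, size in bytes)


def _copy(src: bytearray, src_off: int, dst: bytearray, dst_off: int, size: int) -> None:
    # Reject out-of-range regions so the slice assignment can never resize dst.
    if min(src_off, dst_off, size) < 0 or src_off + size > len(src) or dst_off + size > len(dst):
        raise ValueError("copy region is out of bounds")
    dst[dst_off:dst_off + size] = src[src_off:src_off + size]


@dataclass
class CopyEngine:
    """Hypothetical engine that only counts its startup and shutdown operations."""
    startups: int = 0
    shutdowns: int = 0

    def copy_one(self, src: bytearray, dst: bytearray, op: Op) -> None:
        # Individual call: one startup/shutdown pair per copy.
        self.startups += 1
        try:
            _copy(src, op[0], dst, op[1], op[2])
        finally:
            self.shutdowns += 1

    def copy_batch(self, src: bytearray, dst: bytearray, ops: Iterable[Op]) -> None:
        # Batched call: one startup/shutdown pair shared by every copy in ops.
        self.startups += 1
        try:
            for src_off, dst_off, size in ops:
                _copy(src, src_off, dst, dst_off, size)
        finally:
            self.shutdowns += 1


src = bytearray(b"AAAA....BBBB....CCCC")   # three noncontiguous source regions
ops = [(0, 0, 4), (8, 4, 4), (16, 8, 4)]

individual, dst_a = CopyEngine(), bytearray(12)
for op in ops:
    individual.copy_one(src, dst_a, op)

batched, dst_b = CopyEngine(), bytearray(12)
batched.copy_batch(src, dst_b, ops)
```

Both paths produce the same destination contents. The individual path records three startup/shutdown pairs, while the batched path records one.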

Inventors

RAMESH, Fnu Vishnuswaroop
KINI, Vivek Belve
CHAUHAN, Jitendra Pratap Singh
FOOTE, Andrew Robert
IVERSON, Jeremy
SHAH, Amber
BUJAK, Jakub
GOWDA, Harsha Banuli Nanje
PAPADOPOULOU, Misel Myrto

Assignees

Nvidia Corporation

Dates

Publication Date: 2026-05-06
Application Date: 2025-10-29

Claims (15)

1. A processor comprising: one or more circuits to perform an application programming interface (API) to cause information to be copied from two or more first storage locations to two or more second storage locations based, at least in part, on one or more parameters of the APIs to indicate the two or more first storage locations and the two or more second storage locations.
2. The processor of claim 1, wherein the two or more first storage locations are noncontiguous or the two or more second storage locations are noncontiguous.
3. The processor of claim 1 or 2, wherein the one or more parameters indicate a size of the information to be copied.
4. The processor of any preceding claim, wherein the one or more parameters indicate an order in which the information is to be copied.
5. The processor of any preceding claim, wherein the two or more second storage locations comprise one or more memory addresses of a size based, at least in part, on a size of the two or more first storage locations.
6. The processor of any preceding claim, wherein the two or more first storage locations and the two or more second storage locations are associated with different storage devices.
7. The processor of any preceding claim, wherein the information is to be used to perform one or more instructions based, at least in part, on accessing the two or more second storage locations.
8. A system comprising: one or more circuits to perform an application programming interface (API) to cause information to be copied from two or more first storage locations to two or more second storage locations based, at least in part, on one or more parameters of the APIs to indicate the two or more first storage locations and the two or more second storage locations.
9. The system of claim 8, wherein the two or more first storage locations are noncontiguous or the two or more second storage locations are noncontiguous.
10. A method comprising: causing performance of an application programming interface (API) to cause information to be copied from two or more first storage locations to two or more second storage locations based, at least in part, on one or more parameters of the APIs to indicate the two or more first storage locations and the two or more second storage locations.
11. The method of claim 10, wherein the two or more first storage locations are noncontiguous or the two or more second storage locations are noncontiguous.
12. The method of claim 10 or 11, wherein the one or more parameters indicate a size of the information to be copied.
13. The method of claim 10, 11, or 12, wherein the one or more parameters indicate an order in which the information is to be copied.
14. The method of any of claims 10-13, wherein the two or more second storage locations comprise one or more memory addresses of a size based, at least in part, on a size of the two or more first storage locations.
15. The method of any of claims 10-14, wherein the information is to be used to perform one or more instructions based, at least in part, on accessing the two or more second storage locations.
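
The parameters recited in the claims (first and second storage locations, size, order, and locations on different storage devices) can be pictured with a short sketch. This is an illustration under stated assumptions, not the claimed API: the function name `memcpy_batch`, the `Location` pair, and the use of Python `bytearray` objects to stand in for separate storage devices are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Location = Tuple[bytearray, int]  # hypothetical: (storage device buffer, byte offset)


def memcpy_batch(srcs: Sequence[Location], dsts: Sequence[Location],
                 sizes: Sequence[int],
                 order: Optional[Sequence[int]] = None) -> List[int]:
    """Copy sizes[i] bytes from srcs[i] to dsts[i]; return the order performed."""
    if not (len(srcs) == len(dsts) == len(sizes)):
        raise ValueError("srcs, dsts, and sizes must have the same length")
    indices = list(range(len(sizes))) if order is None else list(order)
    if sorted(indices) != list(range(len(sizes))):
        raise ValueError("order must be a permutation of the copy indices")
    # Validate every copy before moving any data, so a bad entry leaves memory untouched.
    for i in indices:
        (src_buf, src_off), (dst_buf, dst_off), n = srcs[i], dsts[i], sizes[i]
        if (min(src_off, dst_off, n) < 0
                or src_off + n > len(src_buf) or dst_off + n > len(dst_buf)):
            raise ValueError(f"copy {i} is out of bounds")
    for i in indices:
        (src_buf, src_off), (dst_buf, dst_off), n = srcs[i], dsts[i], sizes[i]
        dst_buf[dst_off:dst_off + n] = src_buf[src_off:src_off + n]
    return indices


host = bytearray(b"0123456789abcdef")   # stands in for one storage device
device = bytearray(16)                   # stands in for a different storage device

performed = memcpy_batch(
    srcs=[(host, 0), (host, 10)],        # noncontiguous first storage locations
    dsts=[(device, 4), (device, 12)],    # noncontiguous second storage locations
    sizes=[4, 2],                        # size of each copy, in bytes
    order=[1, 0],                        # perform the second copy first
)
```

In this sketch a single call carries every source, destination, and size, and the optional `order` argument selects the sequence in which the copies are performed.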

Description

TECHNICAL FIELD

At least one embodiment pertains to batching of memory copy application programming interface (API) operations. At least one embodiment pertains to eliminating the need for multiple individual memory copy API operations, to prevent startup and/or shutdown operations from being performed repeatedly. At least one embodiment pertains to performing an application programming interface (API) to cause information to be copied from two or more first storage locations to two or more second storage locations based, at least in part, on one or more parameters of the APIs to indicate the two or more first storage locations and the two or more second storage locations.

BACKGROUND

Memory copy APIs are performed for memory stored in non-contiguous memory selections, or to be copied to non-contiguous memory selections. Such APIs perform startup operations and shutdown operations repeatedly, once each time the APIs are performed. Methods to perform multiple memory copy operations can be improved.

SUMMARY

The invention is defined by the claims. In order to illustrate the invention, aspects and embodiments which may or may not fall within the scope of the claims are described herein.

Apparatuses, systems, and techniques to perform one or more memory copy operations via a single application programming interface are described. In at least one embodiment, said memory copy operations are performed with a single set of startup and/or shutdown operations between them. In at least one embodiment, processors comprise one or more circuits to perform an application programming interface (API) to cause information to be copied from two or more first storage locations to two or more second storage locations based, at least in part, on one or more parameters of the APIs to indicate the two or more first storage locations and the two or more second storage locations.

Any feature of one aspect or embodiment may be applied to other aspects or embodiments, in any appropriate combination.
In particular, any feature of a method aspect or embodiment may be applied to an apparatus aspect or embodiment, and vice versa.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 illustrates an example batched memory copy operation, in accordance with at least one embodiment;
FIG. 2 illustrates example input requirements for a batched memory copy operation, in accordance with at least one embodiment;
FIG. 3 illustrates example variations of differentiation for batched memory copy operations, in accordance with at least one embodiment;
FIG. 4 illustrates an example batched memory copy operation using separate engines, in accordance with at least one embodiment;
FIG. 5 illustrates an example memory copy operation performed across a network, in accordance with at least one embodiment;
FIG. 6 illustrates an example batched memory copy process, in accordance with at least one embodiment;
FIG. 7 illustrates an example processor, in accordance with at least one embodiment;
FIG. 8 illustrates an example API call to perform a batched memory copy operation, in accordance with at least one embodiment;
FIG. 9 illustrates an example data center system, in accordance with at least one embodiment;
FIG. 10 illustrates a system-on-a-chip (SOC), in accordance with at least one embodiment;
FIG. 11A illustrates a parallel processor, in accordance with at least one embodiment;
FIG. 11B illustrates a processing cluster, in accordance with at least one embodiment;
FIG. 11C illustrates a graphics multiprocessor, in accordance with at least one embodiment;
FIG. 12 illustrates an accelerator processor, in accordance with at least one embodiment;
FIG. 13A illustrates a central processing unit, in accordance with at least one embodiment;
FIG. 13B illustrates a core of the central processing unit in FIG. 13A, in accordance with at least one embodiment;
FIG. 14 illustrates another accelerator processor, in accordance with at least one embodiment;
FIG. 15 illustrates a neuromorphic processor, in accordance with at least one embodiment;
FIG. 16 illustrates a supercomputer, in accordance with at least one embodiment;
FIG. 17 illustrates another accelerator processor, in accordance with at least one embodiment;
FIG. 18 illustrates another processor, in accordance with at least one embodiment;
FIG. 19 illustrates another accelerator processor, in accordance with at least one embodiment;
FIG. 20 illustrates a tensor processing unit, in accordance with at least one embodiment;
FIG. 21 illustrates a RISC-V-compatible processor, in accordance with at least one embodiment;
FIGS. 22A and 22B illustrate a language processing unit, in accordance with at least one embodiment;
FIG. 23 illustrates a software stack of a programming platform, in accordance with at least one embodiment;
FIG. 24 illustrates software that is supported by a programming platform, in accordance with at least one embodiment;
FIG. 25 illustrates compiling code to execute on programming platforms of FIG. 24, in accordance with at least one embodiment;
FIG. 26 illustrate