US-12619548-B2 - Apparatus and method for molecular dynamics simulation


Abstract

Disclosed are an apparatus and a method. The method includes: generating dispatchable streams and binding the dispatchable streams one-to-one to cache slices, where the cache slices are pre-partitioned from an accelerated cache; for each of the dispatchable streams, binding a dispatchable kernel function, determined for a corresponding dispatchable stream, to the corresponding dispatchable stream; for a first cache slice of the cache slices, first duplicating the dispatchable kernel function to the first cache slice and starting the first duplicated dispatchable kernel function with respect to the first cache slice; and, for a second cache slice of the cache slices, second duplicating the dispatchable kernel function to the second cache slice and starting the second duplicated dispatchable kernel function with respect to the second cache slice, wherein the starting of the first duplicated dispatchable kernel function is performed asynchronously with the starting of the second duplicated dispatchable kernel function.

Inventors

Zhen Zhang
Jiali PANG
Xiaohui SU
Lin Chen
VASYLTSOV IHOR

Assignees

SAMSUNG ELECTRONICS CO., LTD.

Dates

Publication Date: 20260505
Application Date: 20241112
Priority Date: 20231113

Claims (20)

1. A processor-implemented method, the method comprising: generating a plurality of dispatchable streams and binding the plurality of dispatchable streams one-to-one to a plurality of cache slices, where the plurality of cache slices are pre-partitioned from an accelerated cache; and binding a respective subset of a plurality of dispatchable kernel functions to each respective dispatchable stream of the plurality of dispatchable streams, and to the cache slice of the plurality of cache slices which is bound to the respective dispatchable stream; for a first cache slice, of the plurality of cache slices, first duplicating a respective bound dispatchable kernel function bound to the first cache slice as a first duplicated bound dispatchable kernel function and starting the first duplicated bound dispatchable kernel function with respect to the first cache slice; and for a second cache slice, of the plurality of cache slices, second duplicating a respective bound dispatchable kernel function bound to the second cache slice as a second duplicated bound dispatchable kernel function and starting the second duplicated bound dispatchable kernel function with respect to the second cache slice, wherein the starting of the first duplicated bound dispatchable kernel function is performed asynchronously with the starting of the second duplicated bound dispatchable kernel function.
2. The method of claim 1, wherein the accelerated cache includes molecular dynamics (MD) data, and the plurality of dispatchable kernel functions are kernel functions of a MD simulation.
3. The method of claim 2, further comprising: determining a size of each dispatchable kernel function among the plurality of kernel functions; and performing the pre-partitioning of the accelerated cache into the plurality of cache slices based on a determined largest size among the determined sizes.
4. The method of claim 3, wherein a total number of the plurality of dispatchable streams is same as a total number of the plurality of cache slices.
5. The method of claim 1, wherein the binding a respective subset of a plurality of dispatchable kernel functions to a respective dispatchable stream of the plurality of dispatchable streams comprises: collecting all dispatchable kernel functions determined for the respective dispatchable stream and sequentially arranging the collected all dispatchable kernel functions in an execution order; and binding each of the collected all dispatchable kernel functions to the respective dispatchable stream sequentially according to the sequential arranging.
6. The method of claim 1, further comprising: generating a first event object for the first cache slice indicating whether the first cache slice is occupied by the first duplicated bound dispatchable kernel function; and generating a second event object for the second cache slice indicating whether the second cache slice is occupied by the second duplicated bound dispatchable kernel function.
7. The method of claim 6, wherein the binding a respective subset of a plurality of dispatchable kernel functions to a respective dispatchable stream of the plurality of dispatchable streams comprises binding each of a plurality of dispatchable kernel functions, determined for the respective dispatchable stream, to the respective dispatchable stream according to sequential execution order of the plurality of dispatchable kernel functions, and wherein the method further comprises, in response to the first cache slice being determined unoccupied based on the first event object indicating that the first cache slice is not occupied by the first duplicated bound dispatchable kernel function, duplicating, to the first cache slice, an unexecuted dispatchable kernel function that is directly subsequent to the first duplicated bound dispatchable kernel function in the sequential execution order of the plurality of dispatchable kernel functions.
8. The method of claim 7, wherein the unexecuted dispatchable kernel function that is directly subsequent to the first duplicated bound dispatchable kernel function in the sequential execution order of the plurality of dispatchable kernel function is determined at a very front of all remaining unexecuted dispatchable kernel functions, among the plurality of dispatchable kernel function, in a pageable memory, and wherein the duplicating of the unexecuted dispatchable kernel function includes copying the unexecuted dispatchable kernel function from the pageable memory to a page-pinned memory, and duplicating, to the cache slice, the copied unexecuted dispatchable kernel function.
9. The method of claim 7, further comprising: marking the first event object of the first cache slice as being occupied in response to the duplicating, to the first cache slice, of the unexecuted dispatchable kernel function; and marking the first event object of the first cache slice as unoccupied in response to completion of an execution of the first duplicated bound dispatchable kernel function in response to the starting of the first duplicated bound dispatchable kernel function.
10. The method of claim 1, wherein the starting of the second duplicated bound dispatchable kernel function is performed while the first duplicated bound dispatchable kernel function is executing.
11. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1.
12. An apparatus, the apparatus comprising: one or more processors configured to: generate a plurality of dispatchable streams and bind the plurality of dispatchable streams one-to-one to a plurality of cache slices, where the plurality of cache slices are pre-partitioned from an accelerated cache; and bind a respective subset of a plurality of dispatchable kernel functions to each respective dispatchable stream of the plurality of dispatchable streams, and to the cache slice of the plurality of cache slices which is bound to the respective dispatchable stream; for a first cache slice, of the plurality of cache slices, perform a first duplicating of a respective bound dispatchable kernel function bound to the first cache slice as a first duplicated bound dispatchable kernel function and a starting of the first duplicated bound dispatchable kernel function with respect to the first cache slice; and for a second cache slice, of the plurality of cache slices, perform a second duplicating of a respective bound dispatchable kernel function bound to the second cache slice as a second duplicated bound dispatchable kernel function and a starting of the second duplicated bound dispatchable kernel function with respect to the second cache slice, wherein the starting of the first duplicated bound dispatchable kernel function is performed asynchronously with the starting of the second duplicated bound dispatchable kernel function.
13. The apparatus of claim 12, wherein the accelerated cache includes molecular dynamics (MD) data, and the plurality of dispatchable kernel functions are kernel functions of a MD simulation.
14. The apparatus of claim 13, wherein the one or more processors are further configured to: determine a size of each dispatchable kernel function among the plurality of kernel functions; and perform the pre-partitioning of the accelerated cache into the plurality of cache slices based on a determined largest size among the determined sizes.
15. The apparatus of claim 14, wherein a total number of the plurality of dispatchable streams is same as a total number of the plurality of cache slices.
16. The apparatus of claim 12, wherein, for the binding a respective subset of a plurality of dispatchable kernel functions to a respective dispatchable stream of the plurality of dispatchable streams, the one or more processors are configured to: collect all dispatchable kernel functions determined for the respective dispatchable stream and sequentially arrange the collected all dispatchable kernel functions in an execution order; and bind each of the collected all dispatchable kernel functions to the respective dispatchable stream sequentially according to the sequential arranging.
17. The apparatus of claim 12, wherein the one or more processors are further configured to: generate a first event object for the first cache slice indicating whether the first cache slice is occupied by the first duplicated bound dispatchable kernel function; and generate a second event object for the second cache slice indicating whether the second cache slice is occupied by the second duplicated bound dispatchable kernel function.
18. The apparatus of claim 17, wherein, for the binding a respective subset of a plurality of dispatchable kernel functions to a respective dispatchable stream of the plurality of dispatchable streams, the one or more processors are configured to bind each of a plurality of dispatchable kernel functions, determined for the respective dispatchable stream, to the respective dispatchable stream according to sequential execution order of the plurality of dispatchable kernel functions, and wherein the one or more processors are further configured to, in response to the first cache slice being determined unoccupied based on the first event object indicating that the first cache slice is not occupied by the first duplicated bound dispatchable kernel function, duplicate, to the first cache slice, an unexecuted dispatchable kernel function that is directly subsequent to the first duplicated bound dispatchable kernel function in the sequential execution order of the plurality of dispatchable kernel functions.
19. The apparatus of claim 18, wherein the unexecuted dispatchable kernel function that is directly subsequent to the first duplicated bound dispatchable kernel function in the sequential execution order of the plurality of dispatchable kernel function is determined at a very front of all remaining unexecuted dispatchable kernel functions, among the plurality of dispatchable kernel function, in a pageable memory, and wherein, for the duplicating of the unexecuted dispatchable kernel function, the one or more processors are configured to copy the unexecuted dispatchable kernel function from the pageable memory to a page-pinned memory, and duplicate, to the cache slice, the copied unexecuted dispatchable kernel function.
20. The apparatus of claim 18, wherein the one or more processors are further configured to: mark the first event object of the first cache slice as occupied in response to performance of the duplicating, to the first cache slice, of the unexecuted dispatchable kernel function; and mark the first event object of the first cache slice as unoccupied in response to completion of an execution of the first duplicated bound dispatchable kernel function in response to the starting of the first duplicated bound dispatchable kernel function.
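
The occupancy bookkeeping recited in claims 6, 7, 9, and 10 can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the claimed implementation: the names `SliceState`, `run_stream`, and `make_kernel` are hypothetical, each kernel function is stood in for by a plain Python callable, one worker thread plays the role of each dispatchable stream, and a `threading.Event` models the event object (set means the slice is unoccupied). The pageable-to-pinned staging of claim 8 is not modeled.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class SliceState:
    """Hypothetical model of one cache slice and its event object."""

    def __init__(self, slice_id, kernels):
        self.slice_id = slice_id
        self.pending = deque(kernels)   # unexecuted kernels, in execution order
        self.free = threading.Event()   # assumption: set == slice unoccupied
        self.free.set()                 # every slice starts unoccupied
        self.resident = None            # kernel currently duplicated into the slice


def run_stream(state, log, lock):
    """Drain one stream: refill its slice whenever the event says it is free."""
    while state.pending:
        state.free.wait()                 # proceed only if the slice is unoccupied
        kernel = state.pending.popleft()  # front of the remaining unexecuted kernels
        state.resident = kernel           # stands in for duplicating to the slice
        state.free.clear()                # mark the slice as occupied
        result = kernel()                 # start the duplicated kernel
        with lock:
            log.append((state.slice_id, result))
        state.resident = None
        state.free.set()                  # mark unoccupied once execution completes


def make_kernel(name):
    """Build a stand-in kernel that simply reports its own name."""
    return lambda: name


states = [
    SliceState(0, [make_kernel("k0"), make_kernel("k2")]),
    SliceState(1, [make_kernel("k1"), make_kernel("k3")]),
]
log, lock = [], threading.Lock()

# One worker per slice, so kernels on different slices start asynchronously.
with ThreadPoolExecutor(max_workers=len(states)) as pool:
    for state in states:
        pool.submit(run_stream, state, log, lock)
```

Within a slice the execution order is preserved, while the interleaving between the two slices is left to the scheduler, which is the property the asynchronous-start limitation describes.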

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit under 35 USC § 119(a) of Chinese Patent Application No. 202311512383.6, filed on Nov. 13, 2023, in the China National Intellectual Property Administration, and Korean Patent Application No. 10-2024-0110414, filed on Aug. 19, 2024, in the Korean Intellectual Property Office, the entire disclosures of which are incorporated herein by reference for all purposes.

BACKGROUND

1. Field

The following description relates to an apparatus and method for molecular dynamics simulation.

2. Description of Related Art

Molecular dynamics (MD) simulation is a computational numerical approach to simulate and study structures and properties of a molecular system, such as by solving the equation of motion of the molecular system, based on classical mechanics, quantum mechanics, and/or statistical mechanics, as non-limiting examples. MD simulation is used in various scientific and technological fields such as chemistry, chemical engineering, materials science, engineering, physics, and biomedicine. MD simulation may obtain a motion trajectory of an atom and may observe various details during a motion process of the atom, so MD simulation may be a powerful complement to previous theoretical and experimental approaches.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

In one general aspect, a processor-implemented method includes: generating a plurality of dispatchable streams and binding the plurality of dispatchable streams one-to-one to a plurality of cache slices, where the plurality of cache slices are pre-partitioned from an accelerated cache; for each of the plurality of dispatchable streams, binding a dispatchable kernel function, determined for a corresponding dispatchable stream, to the corresponding dispatchable stream; for a first cache slice of the plurality of cache slices, first duplicating the bound dispatchable kernel function to the first cache slice and starting the first duplicated bound dispatchable kernel function with respect to the first cache slice; and, for a second cache slice of the plurality of cache slices, second duplicating the bound dispatchable kernel function to the second cache slice and starting the second duplicated bound dispatchable kernel function with respect to the second cache slice, wherein the starting of the first duplicated bound dispatchable kernel function is performed asynchronously with the starting of the second duplicated bound dispatchable kernel function. The accelerated cache may include molecular dynamics (MD) data, and the dispatchable kernel function may be a kernel function among a plurality of kernel functions of an MD simulation. The method may further include determining a size of each dispatchable kernel function among the plurality of kernel functions, and performing the pre-partitioning of the accelerated cache into the plurality of cache slices based on a determined largest size among the determined sizes. A total number of the plurality of dispatchable streams may be the same as a total number of the plurality of cache slices.
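
As a rough illustration of the partitioning and binding steps summarized above, the following sketch sizes every cache slice to the largest kernel function, creates one stream per slice, and binds kernel functions to streams in execution order. It rests on several assumptions that the text does not state: the names `CacheSlice`, `Stream`, `partition_cache`, and `bind` are hypothetical, the kernel names and byte sizes are made up, the slice count is taken as the cache size divided by the slice size, and kernels are distributed round-robin.

```python
from dataclasses import dataclass, field


@dataclass
class CacheSlice:
    offset: int  # byte offset of the slice inside the accelerated cache
    size: int    # slice capacity in bytes


@dataclass
class Stream:
    stream_id: int
    cache_slice: CacheSlice                      # one-to-one binding to a slice
    kernels: list = field(default_factory=list)  # bound kernels, in execution order


def partition_cache(cache_size, kernel_sizes):
    """Pre-partition the cache into equal slices that fit the largest kernel."""
    slice_size = max(kernel_sizes.values())
    count = cache_size // slice_size  # assumption: use as many slices as fit
    if count == 0:
        raise ValueError("cache is smaller than the largest kernel function")
    return [CacheSlice(offset=i * slice_size, size=slice_size) for i in range(count)]


def bind(slices, ordered_kernels):
    """Create one stream per slice and bind kernels round-robin (an assumption)."""
    streams = [Stream(i, s) for i, s in enumerate(slices)]
    for index, name in enumerate(ordered_kernels):
        streams[index % len(streams)].kernels.append(name)
    return streams


# Hypothetical MD kernel functions and their sizes in bytes, in execution order.
kernel_sizes = {"bonded": 96, "nonbonded": 256, "integrate": 64, "constraints": 128}

slices = partition_cache(cache_size=600, kernel_sizes=kernel_sizes)
streams = bind(slices, list(kernel_sizes))

for stream in streams:
    print(stream.stream_id, stream.cache_slice.offset, stream.kernels)
```

Sizing each slice to the largest kernel function means any bound kernel function fits in any slice, so the number of streams can simply equal the number of slices.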
The binding of the dispatchable kernel function may include collecting all dispatchable kernel functions determined for the corresponding dispatchable stream and sequentially arranging all of the collected dispatchable kernel functions in an execution order, and binding each of the collected dispatchable kernel functions to the corresponding dispatchable stream sequentially according to the sequential arranging. The method may further include generating a first event object for the first cache slice indicating whether the first cache slice is occupied by the first duplicated bound dispatchable kernel function, and generating a second event object for the second cache slice indicating whether the second cache slice is occupied by the second duplicated bound dispatchable kernel function. The binding of the dispatchable kernel function may include binding each of a plurality of dispatchable kernel functions, determined for the corresponding dispatchable stream, to the corresponding dispatchable stream according to a sequential execution order of the plurality of dispatchable kernel functions, and the method may further include, in response to the first cache slice being determined unoccupied based on the first event object indicating that the first cache slice is not occupied by the first duplicated bound dispatchable kernel function, duplicating, to the first cache slice, an unexecuted dispatchable kernel function that is directly subsequent to the first duplicated bound dispatchable kernel function in the sequential execution order of the plurality of dispatchable kern