US-12626163-B2 - Model parameter sharing between inference application instances in processing unit of information processing system

US 12626163 B2

Abstract

Techniques for model parameter sharing between inference model instances are disclosed. For example, a method performed by a first process obtains a representation of an inference model for which multiple instances of the inference model are to be executed on at least one processing unit. The method determines, from the representation of the inference model, one or more model parameters that are a pre-trained type of model parameter. The method allocates a shared memory for storing the one or more model parameters that are the pre-trained type of model parameter. The method stores the one or more model parameters that are the pre-trained type of model parameter in the shared memory for access by the multiple instances of the inference model to be executed on the at least one processing unit.
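For concreteness, the first-process flow described in the abstract can be sketched in a few lines. This is only an illustrative sketch, not the patented implementation: it assumes the model is available as a Python dict of NumPy arrays, uses a hypothetical is_pretrained() predicate to decide which parameters count as the pre-trained (immutable) type, and invents the shared-memory segment name "model_params".

```python
# Illustrative sketch only (not the patented implementation): a first process
# copies the pre-trained (immutable) parameters of an inference model into a
# named shared-memory segment so that multiple inference instances can map them.
import numpy as np
from multiprocessing import shared_memory

def is_pretrained(name: str) -> bool:
    # Hypothetical rule for deciding the "pre-trained type" of parameter;
    # a real system would take this from the model representation itself.
    return name.endswith((".weight", ".bias"))

def publish_pretrained_params(model: dict, shm_name: str = "model_params"):
    """First process: allocate shared memory sized to the immutable parameters
    and store each one at a recorded offset."""
    pretrained = {k: v for k, v in model.items() if is_pretrained(k)}
    total_bytes = sum(v.nbytes for v in pretrained.values())
    shm = shared_memory.SharedMemory(name=shm_name, create=True, size=total_bytes)
    layout, offset = {}, 0
    for name, arr in pretrained.items():
        view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf, offset=offset)
        view[:] = arr                      # copy the parameter into shared memory
        layout[name] = (offset, arr.shape, str(arr.dtype))
        offset += arr.nbytes
    return shm, layout                     # layout tells instances where each parameter lives
```

Each inference instance can then map the same segment by name instead of holding its own copy of the immutable weights.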

Inventors

  • Jinpeng LIU
  • Danqing Sha
  • Zhen Jia
  • Christopher S. Maclellan

Assignees

  • EMC IP Holding Company LLC

Dates

Publication Date
2026-05-12
Application Date
2021-03-30

Claims (20)

  1. An apparatus comprising: at least one memory storing program code; and at least one processing platform comprising at least one processor coupled to the at least one memory, the at least one processing platform, when executing the program code, is configured to: obtain, via a first process, an inference model for which multiple instances of the inference model are to be executed on at least one processing unit; differentiate, using a host manager, from the inference model, via the first process, one or more model parameters that are a pre-trained type of model parameter, the differentiating comprising: parsing the inference model; generating a computation table for each computation of a plurality of computations defined in the inference model, the computation table comprising a plurality of computation nodes, the computation nodes comprising computation node numbers; generating a parameter table for each parameter used by the plurality of computations, the parameter table comprising a plurality of parameter nodes, the parameter nodes comprising parameter node numbers, wherein each computation of the plurality of computations is indicated by a computation node number of the computation table and one or more parameter node numbers of the parameter table; generating a memory model and associating the computation table and the parameter table using the generated memory model to determine one or more parameter node numbers of the parameter table associated with each computation node number of the plurality of computations, and wherein one or more parameter nodes of the plurality of parameter nodes are associated with two or more computation nodes of the plurality of computation nodes; inferring a shape of each parameter node of each computation; and identifying the one or more model parameters that are the pre-trained type of model parameter; determine, using the host manager, based on the inferred shape of the parameter nodes that are identified as the one or more model parameters that are the pre-trained type of model parameter, an amount of memory to store the one or more model parameters that are the pre-trained type of model parameter in a shared memory storage space; allocate, via the first process, a shared memory of the shared memory storage space corresponding to the amount of memory for storing the one or more model parameters that are the pre-trained type of model parameter; extract, using the host manager, the one or more model parameters that are the pre-trained type of model parameter from the inference model; store, via the first process, the one or more model parameters that are the pre-trained type of model parameter in the shared memory for access by the multiple instances of the inference model to be executed on the at least one processing unit; and permit access to the shared memory of the shared memory storage space created by the first process to allow the multiple instances of the inference model to utilize the one or more model parameters that are the pre-trained type of model parameter in the shared memory of the shared memory storage space while executing the inference model.
  2. The apparatus of claim 1, wherein the at least one processing platform, when executing the program code, is further configured to: obtain, via a second process associated with a given one of the multiple instances of the inference model, the inference model; determine from the inference model, via the second process, one or more model parameters that are not the pre-trained type of model parameter; allocate, via the second process, a local memory for storing the one or more model parameters that are not the pre-trained type of model parameter; and store, via the second process, the one or more model parameters that are not the pre-trained type of model parameter in the local memory for the given one of the multiple instances of the inference model.
  3. The apparatus of claim 2, wherein the at least one processing platform, when executing the program code, is further configured to adjust one or more pointers to point to the local memory.
  4. The apparatus of claim 2, wherein the at least one processing platform, when executing the program code, is further configured to adjust pointers to point to the shared memory of the shared memory storage space.
  5. The apparatus of claim 2, wherein the one or more model parameters that are the pre-trained type of model parameter comprise one or more immutable model parameters, and the one or more model parameters that are not the pre-trained type of model parameter comprise one or more mutable model parameters.
  6. The apparatus of claim 2, wherein the first process comprises a host process and the second process comprises a guest process.
  7. The apparatus of claim 1, wherein the at least one processing unit comprises at least one graphic processing unit.
  8. The apparatus of claim 7, wherein the at least one graphic processing unit is part of an edge computing network.
  9. The apparatus of claim 1, wherein each of the multiple instances of the inference model are configured to receive and process data sets received from multiple users.
  10. The apparatus of claim 1, wherein the inference model comprises a deep learning model.
  11. A method, comprising: obtaining, via a first process, an inference model for which multiple instances of the inference model are to be executed on at least one processing unit; differentiating, using a host manager, from the inference model, via the first process, one or more model parameters that are a pre-trained type of model parameter, the differentiating comprising: parsing the inference model; generating a computation table for each computation of a plurality of computations defined in the inference model, the computation table comprising a plurality of computation nodes, the computation nodes comprising computation node numbers; generating a parameter table for each parameter used by the plurality of computations, the parameter table comprising a plurality of parameter nodes, the parameter nodes comprising parameter node numbers, wherein each computation of the plurality of computations is indicated by a computation node number of the computation table and one or more parameter node numbers of the parameter table; generating a memory model and associating the computation table and the parameter table using the generated memory model to determine one or more parameter node numbers of the parameter table associated with each computation node number of the plurality of computations, and wherein one or more parameter nodes of the plurality of parameter nodes are associated with two or more computation nodes of the plurality of computation nodes; inferring a shape of each parameter node of each computation; and identifying the one or more model parameters that are the pre-trained type of model parameter; determining, using the host manager, based on the inferred shape of the parameter nodes that are identified as the one or more model parameters that are the pre-trained type of model parameter, an amount of memory to store the one or more model parameters that are the pre-trained type of model parameter in a shared memory storage space; allocating, via the first process, a shared memory of the shared memory storage space corresponding to the amount of memory for storing the one or more model parameters that are the pre-trained type of model parameter; extracting, using the host manager, the one or more model parameters that are the pre-trained type of model parameter from the inference model; storing, via the first process, the one or more model parameters that are the pre-trained type of model parameter in the shared memory for access by the multiple instances of the inference model to be executed on the at least one processing unit; and permitting access to the shared memory of the shared memory storage space created by the first process to allow the multiple instances of the inference model to utilize the one or more model parameters that are the pre-trained type of model parameter in the shared memory of the shared memory storage space while executing the inference model.
  12. The method of claim 11, further comprising: obtaining, via a second process associated with a given one of the multiple instances of the inference model, the inference model; determining from the inference model, via the second process, one or more model parameters that are not the pre-trained type of model parameter; allocating, via the second process, a local memory for storing the one or more model parameters that are not the pre-trained type of model parameter; and storing, via the second process, the one or more model parameters that are not the pre-trained type of model parameter in the local memory for the given one of the multiple instances of the inference model.
  13. The method of claim 12, further comprising adjusting one or more pointers to point to the local memory.
  14. The method of claim 12, further comprising adjusting pointers to point to the shared memory of the shared memory storage space.
  15. A computer program product stored on a non-transitory computer-readable medium and comprising machine executable instructions, the machine executable instructions, when executed, causing a processing platform to: obtain, via a first process, an inference model for which multiple instances of the inference model are to be executed on at least one processing unit; differentiate, using a host manager, from the inference model, via the first process, one or more model parameters that are a pre-trained type of model parameter, the differentiating comprising: parsing the inference model; generating a computation table for each computation of a plurality of computations defined in the inference model, the computation table comprising a plurality of computation nodes, the computation nodes comprising computation node numbers; generating a parameter table for each parameter used by the plurality of computations, the parameter table comprising a plurality of parameter nodes, the parameter nodes comprising parameter node numbers, wherein each computation of the plurality of computations is indicated by a computation node number of the computation table and one or more parameter node numbers of the parameter table; generating a memory model and associating the computation table and the parameter table using the generated memory model to determine one or more parameter node numbers of the parameter table associated with each computation node number of the plurality of computations, and wherein one or more parameter nodes of the plurality of parameter nodes are associated with two or more computation nodes of the plurality of computation nodes; inferring a shape of each parameter node of each computation; and identifying the one or more model parameters that are the pre-trained type of model parameter; determine, using the host manager, based on the inferred shape of the parameter nodes that are identified as the one or more model parameters that are the pre-trained type of model parameter, an amount of memory to store the one or more model parameters that are the pre-trained type of model parameter in a shared memory storage space; allocate, via the first process, a shared memory of the shared memory storage space corresponding to the amount of memory for storing the one or more model parameters that are the pre-trained type of model parameter; extract, using the host manager, the one or more model parameters that are the pre-trained type of model parameter from the inference model; store, via the first process, the one or more model parameters that are the pre-trained type of model parameter in the shared memory for access by the multiple instances of the inference model to be executed on the at least one processing unit; and permit access to the shared memory of the shared memory storage space created by the first process to allow the multiple instances of the inference model to utilize the one or more model parameters that are the pre-trained type of model parameter in the shared memory of the shared memory storage space while executing the inference model.
  16. The computer program product of claim 15, wherein the processing platform is further caused to: obtain, via a second process associated with a given one of the multiple instances of the inference model, the inference model; determine from the inference model, via the second process, one or more model parameters that are not the pre-trained type of model parameter; allocate, via the second process, a local memory for storing the one or more model parameters that are not the pre-trained type of model parameter; and store, via the second process, the one or more model parameters that are not the pre-trained type of model parameter in the local memory for the given one of the multiple instances of the inference model.
  17. The computer program product of claim 16, wherein the one or more model parameters that are the pre-trained type of model parameter comprise one or more immutable model parameters, and the one or more model parameters that are not the pre-trained type of model parameter comprise one or more mutable model parameters.
  18. The method of claim 12, wherein the one or more model parameters that are the pre-trained type of model parameter comprise one or more immutable model parameters, and the one or more model parameters that are not the pre-trained type of model parameter comprise one or more mutable model parameters.
  19. The computer program product of claim 16, wherein the processing platform is further caused to adjust one or more pointers to point to the local memory.
  20. The computer program product of claim 16, wherein the processing platform is further caused to adjust pointers to point to the shared memory of the shared memory storage space.
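Independent claims 1, 11, and 15 recite a computation table and a parameter table associated through a memory model, with parameter shapes inferred so the shared allocation can be sized. The sketch below is a rough, assumption-laden illustration of that bookkeeping, not the claimed host manager: it presumes the parsed model is available as a list of (op, parameter-name) pairs and that parameter shapes and dtypes are already known.

```python
# Rough sketch of the bookkeeping recited in claims 1, 11, and 15: computation and
# parameter tables keyed by node numbers, with parameter nodes shared between
# computations, plus sizing of the shared allocation (assumptions noted above).
import numpy as np

def build_tables(parsed_ops, param_shapes, param_dtypes):
    """parsed_ops: list of (op_name, [parameter names]) pairs from a parsed model."""
    computation_table = {}   # computation node number -> (op, parameter node numbers)
    parameter_table = {}     # parameter node number -> (name, shape, dtype)
    param_index = {}         # parameter name -> node number (reused across computations)
    for comp_no, (op, param_names) in enumerate(parsed_ops):
        node_numbers = []
        for name in param_names:
            if name not in param_index:    # one parameter node may serve several computations
                param_index[name] = len(parameter_table)
                parameter_table[param_index[name]] = (
                    name, param_shapes[name], param_dtypes[name])
            node_numbers.append(param_index[name])
        computation_table[comp_no] = (op, node_numbers)
    return computation_table, parameter_table

def shared_bytes_needed(parameter_table, is_pretrained):
    """Sum the sizes of parameter nodes identified as the pre-trained (immutable) type."""
    return sum(int(np.prod(shape)) * np.dtype(dtype).itemsize
               for name, shape, dtype in parameter_table.values()
               if is_pretrained(name))
```

Because parameter node numbers are reused across computations, a parameter that feeds several computations is counted once when sizing the shared memory, mirroring the claim language that a parameter node may be associated with two or more computation nodes.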

Description

FIELD

The field relates generally to information processing systems, and more particularly to artificial intelligence (AI) model management implemented in an information processing system.

BACKGROUND

In recent years, with the progress of artificial intelligence (AI) technology, application programs that employ AI models (such as, but not limited to, machine learning (ML) applications, deep learning (DL) applications, and data mining (DM) applications) have enabled significant development in many fields. Typically, an AI model is initially trained, and an AI inference model (e.g., inference program or inference application) is generated from the trained model. The inference model governs how to make predictions on new data. In some scenarios, multiple instances of the same inference application can be deployed in a computing node to satisfy real-time requirements of the inference application.

SUMMARY

Embodiments provide an artificial intelligence model framework with model parameter sharing between inference application instances in an information processing system such as, but not limited to, an edge computing network. For example, in an illustrative embodiment, a method for model parameter sharing between inference model instances performed by a first process comprises the following steps. The method obtains a representation of an inference model for which multiple instances of the inference model are to be executed on at least one processing unit. The method determines, from the representation of the inference model, one or more model parameters that are a pre-trained type of model parameter. The method allocates a shared memory for storing the one or more model parameters that are the pre-trained type of model parameter. The method stores the one or more model parameters that are the pre-trained type of model parameter in the shared memory for access by the multiple instances of the inference model to be executed on the at least one processing unit.

In a further illustrative embodiment, the method may further comprise: obtaining, via a second process associated with a given one of the multiple instances of the inference model, the representation of the inference model; determining from the representation of the inference model, via the second process, one or more model parameters that are not the pre-trained type of model parameter; allocating, via the second process, a local memory for storing the one or more model parameters that are not the pre-trained type of model parameter; and storing, via the second process, the one or more model parameters that are not the pre-trained type of model parameter in the local memory for the given one of the multiple instances of the inference model.

In yet another illustrative embodiment, the method may further comprise: determining from the representation of the inference model, via the second process, one or more model parameters that are the pre-trained type of model parameter; and accessing, via the second process, the shared memory created by the first process and obtaining the one or more model parameters that are the pre-trained type of model parameter.
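A companion sketch for the second-process (guest) side of the embodiments above, again only illustrative and not the patented implementation: it assumes the guest learns the segment name and the layout produced by the first process through some out-of-band channel (for example, a file or socket, neither of which is specified here), and that the layout format follows the hypothetical publish_pretrained_params() sketch shown earlier.

```python
# Illustrative guest-process sketch: attach to the shared segment created by the
# first process, map the immutable parameters as views, and allocate local memory
# for everything else. Segment name and layout are assumed to be shared out of band.
import numpy as np
from multiprocessing import shared_memory

def attach_instance(model: dict, layout: dict, shm_name: str = "model_params"):
    shm = shared_memory.SharedMemory(name=shm_name)   # attach only; do not create
    params = {}
    for name, arr in model.items():
        if name in layout:                            # pre-trained: view into shared memory
            offset, shape, dtype = layout[name]
            params[name] = np.ndarray(shape, dtype=dtype, buffer=shm.buf, offset=offset)
        else:                                         # mutable: instance-local allocation
            params[name] = np.array(arr, copy=True)
    return shm, params                                # keep shm referenced while params are in use
```

This is the point at which an implementation would adjust its parameter pointers so that immutable parameters reference the shared memory and mutable parameters reference the local memory, in the spirit of claims 3, 4, 13, and 14.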
Advantageously, illustrative embodiments differentiate model parameters that are pre-trained (and thus are considered immutable) from model parameters that are not pre-trained (and thus are considered mutable). While each of the multiple inference model instances maintains its own local memory for the mutable parameters, the multiple inference model instances access the same shared memory for the immutable parameters.

These and other illustrative embodiments include, without limitation, apparatus, methods and computer program products comprising processor-readable storage media.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a multiple instance deployment of a data parallelism inference application in a processing unit of an edge computing network with which one or more illustrative embodiments can be implemented.

FIG. 2 illustrates memory and computing requirements in a processing unit of an edge computing network with which one or more illustrative embodiments can be implemented.

FIG. 3 illustrates a memory management process in an inference framework with which one or more illustrative embodiments can be implemented.

FIG. 4 illustrates a computation graph associated with a memory management process in an inference framework with which one or more illustrative embodiments can be implemented.

FIG. 5 illustrates an inference result of a convolution computation associated with a memory management process in an inference framework with which one or more illustrative embodiments can be implemented.

FIG. 6 illustrates computation and parameter pointer tables associated with a memory management process in an inference framework with which one or more illustrative embodiments can be implemented.

FIG. 7 illustrates allocation of memory for model parameters associated with a memory management process in an inference framework with which one or more illustrative embodiments can be implemented.
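Finally, a hypothetical forward pass illustrating the shared/local split described in the summary: weight and bias pointers reference the shared segment (via the attach_instance() sketch above), while the activation buffer is allocated in the instance's local memory. The layer name "fc1" and the params dict layout are assumptions for illustration only.

```python
# Hypothetical forward pass: parameters read from shared memory, activations local.
import numpy as np

def dense_forward(x: np.ndarray, params: dict, layer: str = "fc1") -> np.ndarray:
    w = params[f"{layer}.weight"]        # points into shared memory
    b = params[f"{layer}.bias"]          # points into shared memory
    return np.maximum(x @ w + b, 0.0)    # ReLU output allocated in local memory
```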