EP-4742100-A1 - METHOD TO MAP A SPLIT BRANCH NAME

EP 4742100 A1

Abstract

In an example of a split inferencing method, a first endpoint obtains information describing a graph structure of a machine learning, ML, model with a plurality of nodes whose outputs are tensors, each tensor having an original branch name. The first endpoint obtains information indicating a split of the model into first and second sub-models. For at least a first one of the tensors, the first tensor being an output of a node in the first sub-model, the first endpoint sends, to a second endpoint, information associating the original branch name of the first tensor with a replacement branch name. The first endpoint runs the first sub-model to obtain the first tensor and sends the first tensor to the second endpoint, the first tensor being identified by the replacement branch name. The second endpoint may run the second sub-model using the first tensor as an input.

Inventors

  • Quinquis, Cyril
  • Filoche, Thierry
  • Onno, Stephane

Assignees

  • InterDigital CE Patent Holdings, SAS

Dates

Publication Date
2026-05-13
Application Date
2024-11-08

Claims (15)

  1. A method comprising: obtaining information describing a graph structure of a machine learning, ML, model, wherein the ML model has a plurality of nodes, each node having an output of at least one tensor, each tensor having an original branch name; obtaining information indicating a split of the ML model into at least a first sub-model and a second sub-model, each of the nodes being in one of the sub-models; for at least a first one of the tensors, the first tensor being an output of a node in the first sub-model, sending, to a second endpoint, information associating the original branch name of the first tensor with a replacement branch name; running the first sub-model to obtain the first tensor; and sending the first tensor to the second endpoint, wherein the first tensor is identified by the replacement branch name.
  2. An apparatus comprising one or more processors configured to perform at least: obtaining information describing a graph structure of a machine learning, ML, model, wherein the ML model has a plurality of nodes, each node having an output of at least one tensor, each tensor having an original branch name; obtaining information indicating a split of the ML model into at least a first sub-model and a second sub-model, each of the nodes being in one of the sub-models; for at least a first one of the tensors, the first tensor being an output of a node in the first sub-model, sending, to a second endpoint, information associating the original branch name of the first tensor with a replacement branch name; running the first sub-model to obtain the first tensor; and sending the first tensor to the second endpoint, wherein the first tensor is identified by the replacement branch name.
  3. The method of claim 1 or the apparatus of claim 2, further comprising, at the first endpoint, assigning the replacement branch name to the first tensor by substituting a string or an index for the original branch name.
  4. The method of claim 1 or the apparatus of claim 2, further comprising, at the first endpoint, assigning the replacement branch name to the first tensor by performing a hash of the original branch name.
  5. The method of claim 1, or claims 3-4 as they depend from claim 1, or the apparatus of claim 2, or claims 3-4 as they depend from claim 2, wherein the information associating the original branch name of the first tensor with a replacement branch name comprises a table associating a plurality of original branch names with respective replacement branch names.
  6. The method of claim 5 as it depends from claim 1, or the apparatus of claim 5 as it depends from claim 2, wherein the table further includes tensor metadata including a data type or a tensor dimension for each of the tensors.
  7. The method of claim 5 as it depends from claim 1, or the apparatus of claim 5 as it depends from claim 2, wherein the table identifies a renaming method of each of the tensors.
  8. The method of claim 5 as it depends from claim 1, or the apparatus of claim 5 as it depends from claim 2, wherein the table associates all tensors of the ML model with a respective replacement branch name.
  9. A method comprising: obtaining information describing a graph structure of a machine learning, ML, model, wherein the ML model has a plurality of nodes, each node having an input of at least one tensor, each tensor having an original branch name; obtaining information indicating a split of the ML model into at least a first sub-model and a second sub-model, each of the nodes being in one of the sub-models; receiving, from a first endpoint, information associating at least one original branch name with a replacement branch name, the original branch name identifying an input to a node in the second sub-model; receiving a first tensor from the first endpoint, wherein the first tensor is identified by the replacement branch name; and running the second sub-model using the first tensor as an input to the node associated with the original branch name.
  10. An apparatus comprising one or more processors configured to perform at least: obtaining information describing a graph structure of a machine learning, ML, model, wherein the ML model has a plurality of nodes, each node having an input of at least one tensor, each tensor having an original branch name; obtaining information indicating a split of the ML model into at least a first sub-model and a second sub-model, each of the nodes being in one of the sub-models; receiving, from a first endpoint, information associating at least one original branch name with a replacement branch name, the original branch name identifying an input to a node in the second sub-model; receiving a first tensor from the first endpoint, wherein the first tensor is identified by the replacement branch name; and running the second sub-model using the first tensor as an input to the node associated with the original branch name.
  11. The method of claim 9, or the apparatus of claim 10, further comprising decoding the replacement branch name to obtain the original branch name, wherein using the first tensor comprises routing the first tensor to the node associated with the original branch name.
  12. The method of claim 9, or claim 11 as it depends from claim 9, or the apparatus of claim 10, or claim 11 as it depends from claim 10, wherein the replacement branch name is a hash of the original branch name, further comprising performing a hash of the original branch name to check the validity of the replacement branch name.
  13. The method of claim 9, or claims 11-12 as they depend from claim 9, or the apparatus of claim 10, or claims 11-12 as they depend from claim 10, wherein the information associating the original branch name of the first tensor with a replacement branch name comprises a table associating a plurality of original branch names with respective replacement branch names.
  14. The method of claim 13 as it depends from claim 9, or the apparatus of claim 13 as it depends from claim 10, wherein the table further includes tensor metadata including a data type or a tensor dimension for each of the tensors.
  15. The method of claim 13 as it depends from claim 9, or the apparatus of claim 13 as it depends from claim 10, wherein the table identifies a renaming method of each of the tensors.
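The renaming table recited in claims 4-8 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the branch names, the SHA-256 truncation, and the metadata fields are all hypothetical choices standing in for the hash-based renaming method and per-tensor metadata (data type, dimensions) that the claims describe.

```python
import hashlib

def build_rename_table(output_branch_names, tensor_metadata):
    """First-endpoint side: associate each original branch name with a
    replacement name (here derived by hashing, one renaming method the
    claims mention; substituting a string or an index is another) along
    with per-tensor metadata and the renaming method used."""
    table = {}
    for name in output_branch_names:
        # Truncated SHA-256 as an example replacement branch name
        replacement = hashlib.sha256(name.encode("utf-8")).hexdigest()[:16]
        table[name] = {
            "replacement": replacement,
            "method": "sha256-truncated",             # renaming method per tensor
            "dtype": tensor_metadata[name]["dtype"],  # tensor metadata: data type
            "shape": tensor_metadata[name]["shape"],  # tensor metadata: dimensions
        }
    return table

# Hypothetical boundary output of the first sub-model
meta = {"conv1/relu:0": {"dtype": "float32", "shape": [1, 64, 56, 56]}}
table = build_rename_table(["conv1/relu:0"], meta)
entry = table["conv1/relu:0"]
# The first endpoint would send `table` to the second endpoint, then
# transmit each boundary tensor identified by entry["replacement"].
```

The table covers only the boundary tensors here; per claim 8, it may instead associate all tensors of the ML model with respective replacement branch names.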

Description

BACKGROUND

The present disclosure relates to machine learning (ML) models. New uses for machine learning models are constantly being developed. As the usefulness of such models increases, the computational complexity of those models continues to grow. It may be difficult to implement a complete machine learning model entirely on one computing device, particularly where the computing device is consumer-level user equipment. The present disclosure thus relates to techniques in which an ML model is implemented partly on one endpoint (such as user equipment, UE) and partly on another endpoint (which may be a network entity), referred to as split inferencing.

SUMMARY

Briefly stated, in one embodiment, a method comprises: obtaining information describing a graph structure of a machine learning, ML, model, wherein the ML model has a plurality of nodes, each node having an output of at least one tensor, each tensor having an original branch name; obtaining information indicating a split of the ML model into at least a first sub-model and a second sub-model, each of the nodes being in one of the sub-models; for at least a first one of the tensors, the first tensor being an output of a node in the first sub-model, sending, to a second endpoint, information associating the original branch name of the first tensor with a replacement branch name; running the first sub-model to obtain the first tensor; and sending the first tensor to the second endpoint, wherein the first tensor is identified by the replacement branch name.
An apparatus according to some embodiments comprises one or more processors configured to perform at least: obtaining information describing a graph structure of a machine learning, ML, model, wherein the ML model has a plurality of nodes, each node having an output of at least one tensor, each tensor having an original branch name; obtaining information indicating a split of the ML model into at least a first sub-model and a second sub-model, each of the nodes being in one of the sub-models; for at least a first one of the tensors, the first tensor being an output of a node in the first sub-model, sending, to a second endpoint, information associating the original branch name of the first tensor with a replacement branch name; running the first sub-model to obtain the first tensor; and sending the first tensor to the second endpoint, wherein the first tensor is identified by the replacement branch name.

A method according to some embodiments comprises: obtaining information describing a graph structure of a machine learning, ML, model, wherein the ML model has a plurality of nodes, each node having an input of at least one tensor, each tensor having an original branch name; obtaining information indicating a split of the ML model into at least a first sub-model and a second sub-model, each of the nodes being in one of the sub-models; receiving, from a first endpoint, information associating at least one original branch name with a replacement branch name, the original branch name identifying an input to a node in the second sub-model; receiving a first tensor from the first endpoint, wherein the first tensor is identified by the replacement branch name; and running the second sub-model using the first tensor as an input to the node associated with the original branch name.
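The receiver-side method above can be sketched as follows. This is an illustrative assumption, not the claimed implementation: the table format and the truncated SHA-256 replacement names are hypothetical, chosen to show how the second endpoint might decode a replacement branch name, check its validity by re-hashing (in the manner of claims 11-12), and route the received tensor to the consuming node.

```python
import hashlib

def route_incoming_tensor(rename_table, replacement_name, tensor):
    """Second-endpoint side: decode a replacement branch name back to the
    original branch name, verify it by re-hashing the original name, and
    return the original name so the caller can route the tensor to the
    node associated with that branch."""
    # Invert the table received from the first endpoint
    inverse = {v["replacement"]: k for k, v in rename_table.items()}
    original = inverse[replacement_name]
    # Validity check: re-hash the original name and compare
    expected = hashlib.sha256(original.encode("utf-8")).hexdigest()[:16]
    if expected != replacement_name:
        raise ValueError("replacement branch name failed hash check")
    return original, tensor  # caller feeds `tensor` into the node named `original`

# Hypothetical table as received from the first endpoint
table = {"conv1/relu:0": {
    "replacement": hashlib.sha256(b"conv1/relu:0").hexdigest()[:16]}}
name, _ = route_incoming_tensor(
    table, table["conv1/relu:0"]["replacement"], object())
```

A lookup in the inverted table replaces any string manipulation of the wire identifier, so the original branch names never need to appear in the tensor transfer itself.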
An apparatus according to some embodiments comprises one or more processors configured to perform at least: obtaining information describing a graph structure of a machine learning, ML, model, wherein the ML model has a plurality of nodes, each node having an input of at least one tensor, each tensor having an original branch name; obtaining information indicating a split of the ML model into at least a first sub-model and a second sub-model, each of the nodes being in one of the sub-models; receiving, from a first endpoint, information associating at least one original branch name with a replacement branch name, the original branch name identifying an input to a node in the second sub-model; receiving a first tensor from the first endpoint, wherein the first tensor is identified by the replacement branch name; and running the second sub-model using the first tensor as an input to the node associated with the original branch name.

BRIEF DESCRIPTION OF THE DRAWINGS

The following detailed description will be better understood when read in conjunction with the appended drawings, in which there are shown examples of one or more of the multiple embodiments of the present disclosure. It should be understood, however, that the embodiments described herein are not limited to the precise arrangements and instrumentalities shown in the drawings.

FIG. 1 illustrates an architecture for a split inference between the UE (user equipment) and network, with the media data source in the UE.
FIG. 2 illustrates an architecture for a split inference between the UE (user equipment) and network, with the media data source in the network.
FIGs. 3A-3C illust