EP-4742103-A1 - METHOD TO INDEX AN INTERMEDIATE DATA TENSOR


Abstract

In an example method for use in a split inferencing architecture, information is obtained at a first endpoint describing at least a first sub-model of a split-inferencing machine learning model. In communication with a second endpoint, a tensor indexing algorithm is selected. The first sub-model is executed to obtain at least one tensor. The tensor is sent to the second endpoint, each tensor being identified by a respective reference determined according to the selected tensor indexing algorithm.

Inventors

  • Filoche, Thierry
  • Quinquis, Cyril
  • Onno, Stephane

Assignees

  • InterDigital CE Patent Holdings, SAS

Dates

Publication Date
2026-05-13
Application Date
2024-11-08

Claims (15)

  1. A method comprising: at a first endpoint, obtaining information describing at least a first sub-model of a split-inferencing machine learning model; in communication with a second endpoint, selecting a tensor indexing algorithm; executing the first sub-model to obtain at least one tensor; and sending the tensor to the second endpoint, wherein each tensor is identified by a respective reference determined according to the selected tensor indexing algorithm.
  2. An apparatus comprising one or more processors configured to perform at least: at a first endpoint, obtaining information describing at least a first sub-model of a split-inferencing machine learning model; in communication with a second endpoint, selecting a tensor indexing algorithm; executing the first sub-model to obtain at least one tensor; and sending the tensor to the second endpoint, wherein each tensor is identified by a respective reference determined according to the selected tensor indexing algorithm.
  3. The method of claim 1, or the apparatus of claim 2, wherein the selected tensor indexing algorithm assigns an integer to each tensor.
  4. The method of claim 1, or the apparatus of claim 2, wherein the selected tensor indexing algorithm performs a loop through each tensor output by each node of the first sub-model and sequentially assigns an integer to each of the tensors.
  5. The method of claim 1, or the apparatus of claim 2, wherein the selected tensor indexing algorithm assigns a string including at least one integer to each tensor.
  6. The method of claim 1, or the apparatus of claim 2, wherein the selected tensor indexing algorithm assigns to each tensor a reference that includes an array or a pair of integers.
  7. The method of claim 6 as it depends from claim 1, or the apparatus of claim 6 as it depends from claim 2, wherein a first integer in the array or pair identifies a node in the first sub-model of which the respective tensor is an output, and a second integer in the array or pair identifies the respective tensor from among all outputs of the same node.
  8. The method of claim 1, or claims 3-7 as they depend from claim 1, or the apparatus of claim 2, or claims 3-7 as they depend from claim 2, wherein selecting the tensor indexing algorithm comprises sending to the second endpoint, or receiving from the second endpoint, a tensor name encoding algorithm identifier.
  9. A method comprising: at a second endpoint, obtaining information describing at least a second sub-model of a split-inferencing machine learning model; in communication with a first endpoint, selecting a tensor indexing algorithm; receiving at least one tensor from the first endpoint, wherein each tensor is identified by a respective reference determined according to the selected tensor indexing algorithm; and executing the second sub-model on the at least one tensor.
  10. An apparatus comprising one or more processors configured to perform at least: at a second endpoint, obtaining information describing at least a second sub-model of a split-inferencing machine learning model; in communication with a first endpoint, selecting a tensor indexing algorithm; receiving at least one tensor from the first endpoint, wherein each tensor is identified by a respective reference determined according to the selected tensor indexing algorithm; and executing the second sub-model on the at least one tensor.
  11. The method of claim 9, or the apparatus of claim 10, further comprising obtaining information describing a first sub-model of the split-inferencing machine learning model, wherein the reference of the at least one tensor is determined based at least in part on the information describing the first sub-model.
  12. The method of claim 11 as it depends from claim 9, or the apparatus of claim 11 as it depends from claim 10, wherein the selected tensor indexing algorithm performs a loop through each tensor output by each node of the first sub-model and sequentially assigns an integer to each of the tensors.
  13. The method of claim 11 as it depends from claim 9, or the apparatus of claim 11 as it depends from claim 10, wherein the selected tensor indexing algorithm assigns to each tensor a reference that includes an array or a pair of integers, and wherein a first integer in the array or pair identifies a node in the first sub-model of which the respective tensor is an output, and a second integer in the array or pair identifies the respective tensor from among all tensors that are outputs of the same node.
  14. The method of claim 9, or the apparatus of claim 10, wherein the selected tensor indexing algorithm assigns an integer to each tensor.
  15. The method of claim 9, or claims 11-14 as they depend from claim 9, or the apparatus of claim 10, or claims 11-14 as they depend from claim 10, wherein selecting the tensor indexing algorithm comprises sending to the first endpoint, or receiving from the first endpoint, a tensor name encoding algorithm identifier.
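The indexing algorithms recited in claims 3-7 can be sketched in code. The following is an illustrative reading of the claim language, not the patented implementation; the node and tensor names are hypothetical, and a sub-model is modeled simply as an ordered list of nodes, each with a list of output tensor names.

```python
def sequential_index(sub_model_nodes):
    """Claim 4: loop through each tensor output by each node of the
    first sub-model and sequentially assign an integer to each tensor."""
    refs = {}
    counter = 0
    for node in sub_model_nodes:
        for tensor_name in node["outputs"]:
            refs[tensor_name] = counter
            counter += 1
    return refs

def pair_index(sub_model_nodes):
    """Claims 6-7: assign each tensor a pair of integers, where the
    first integer identifies the node that produces the tensor and the
    second identifies the tensor among that node's outputs."""
    refs = {}
    for node_idx, node in enumerate(sub_model_nodes):
        for out_idx, tensor_name in enumerate(node["outputs"]):
            refs[tensor_name] = (node_idx, out_idx)
    return refs

# Hypothetical two-node sub-model; the second node has two outputs.
nodes = [{"outputs": ["conv1_out"]},
         {"outputs": ["split_a", "split_b"]}]
print(sequential_index(nodes))  # {'conv1_out': 0, 'split_a': 1, 'split_b': 2}
print(pair_index(nodes))        # {'conv1_out': (0, 0), 'split_a': (1, 0), 'split_b': (1, 1)}
```

Under either scheme the two endpoints obtain the same reference for each intermediate tensor from the shared sub-model description alone, which is what allows claim 8's negotiation to exchange only a compact algorithm identifier rather than a full name mapping.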

Description

BACKGROUND

The present disclosure relates to machine learning (ML) models. New uses for machine learning models are constantly being developed. As the usefulness of such models increases, the computational complexity of those models continues to grow. It may be difficult to implement a complete machine learning model entirely on one computing device, particularly where the computing device is consumer-level user equipment. The present disclosure thus relates to techniques, referred to as split inferencing, in which an ML model is implemented partly on one endpoint (such as user equipment, UE) and partly on another endpoint (which may be a network entity).

SUMMARY

Briefly stated, in one embodiment, a method comprises: at a first endpoint, obtaining information describing at least a first sub-model of a split-inferencing machine learning model; in communication with a second endpoint, selecting a tensor indexing algorithm; executing the first sub-model to obtain at least one tensor; and sending the tensor to the second endpoint, wherein each tensor is identified by a respective reference determined according to the selected tensor indexing algorithm. An apparatus according to some embodiments comprises one or more processors configured to perform such a method. A method according to some embodiments comprises: at a second endpoint, obtaining information describing at least a second sub-model of a split-inferencing machine learning model; in communication with a first endpoint, selecting a tensor indexing algorithm; receiving at least one tensor from the first endpoint, wherein each tensor is identified by a respective reference determined according to the selected tensor indexing algorithm; and executing the second sub-model on the at least one tensor. An apparatus according to some embodiments comprises one or more processors configured to perform such a method.
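The end-to-end flow summarized above can be sketched as follows. This is a minimal, assumed illustration: the four "layers" are toy functions, the split point k and the message structure are hypothetical, and no real negotiation protocol or network transport is shown.

```python
def run_layers(layers, x):
    """Run a chain of layers on input x, returning the final output."""
    for layer in layers:
        x = layer(x)
    return x

# Hypothetical 4-layer model; each layer is a simple function of x.
model = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
k = 2  # split point: layers 1..k on EndPoint1, layers k+1..n on EndPoint2

# EndPoint1 (e.g. a UE): execute the first sub-model on the input data,
# then send the intermediate tensor identified by a reference determined
# by the negotiated tensor indexing algorithm (here, simply index 0).
intermediate = run_layers(model[:k], 5)
message = {"tensor_ref": 0, "payload": intermediate}

# EndPoint2 (e.g. a network entity): use the reference to route the
# received tensor to the correct input of the second sub-model.
result = run_layers(model[k:], message["payload"])
print(result)  # ((5 + 1) * 2 - 3) ** 2 = 81
```

The point of the shared reference is visible even in this toy: EndPoint2 never sees EndPoint1's internal tensor names, only a reference both sides can compute from the agreed indexing algorithm.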
BRIEF DESCRIPTION OF THE DRAWINGS

The following detailed description will be better understood when read in conjunction with the appended drawings, in which there are shown examples of one or more of the multiple embodiments of the present disclosure. It should be understood, however, that the embodiments described herein are not limited to the precise arrangements and instrumentalities shown in the drawings.

FIG. 1 illustrates an architecture for a split inference between the UE (user equipment) and the network, with the media data source in the UE.
FIG. 2 illustrates an architecture for a split inference between the UE (user equipment) and the network, with the media data source in the network.
FIGs. 3A-3C illustrate a split inferencing model in which only one connection is present. FIG. 3A illustrates Part I, which is the sub-model used to initiate the inference on EndPoint1 (e.g. a UE). FIG. 3B illustrates Part II, which is the sub-model used to finalize the inference process on EndPoint2 (e.g. a network). FIG. 3C illustrates the topology of the entire model (with names removed for clarity of illustration).
FIGs. 4A-4D illustrate an example of a split inferencing model with two connections. FIG. 4A illustrates Part I. FIGs. 4B and 4C together illustrate Part II. FIG. 4D illustrates the complete model topology (with names removed for clarity of illustration).
FIG. 5 illustrates a split inference architecture that may be used to implement an indexing method with negotiation and transmission of intermediate data with an encoded tensor name.
FIG. 6 is a call flow diagram illustrating a negotiation method performed in some embodiments.
FIG. 7 is a call flow diagram illustrating an inference loop method performed in some embodiments.
FIG. 8 provides an overview of the topology of an ML model used in some embodiments. For clarity of illustration, most nodes are not labelled.
FIG. 9 illustrates Submodel I based on the model of FIG. 8 split at the node with index 5.
FIG. 10 illustrates Submodel II based on the model of FIG. 8 split at the node with index 5.
FIG. 11 is a call flow diagram illustrating implementation of split inferencing in a 3GPP framework.
FIG. 12 is a functional block diagram of an apparatus on which some embodiments may be implemented.

DETAILED DESCRIPTION

In describing the various embodiments of the present disclosure, certain terminology is used herein for convenience only and should not be considered as limiting such embodiments. In the drawings, the same reference numerals are employed for designating the same elements throughout the several figures and the present description.

Overview of Split Inferencing of a Trained Model

FIG. 1 provides an overview of an architecture (3GPP SA4 AI4Media) for split inference of a model composed of n layers or nodes (1..n) between the network and a user equipment (UE), where a first inference implemented on a first endpoint processes a first part of the model, i.e. layers 1..k, and a second inference implemented on a second endpoint processes the second part of the model, i.e. layers k+1..n. The architecture shows the de