EP-4742099-A1 - METHOD TO COMMUNICATE INTERMEDIATE DATA VALUES WITHOUT INTERMEDIATE DATA NAMES

EP 4742099 A1

Abstract

In an example of a split-inferencing method, a first endpoint obtains information describing at least a first sub-model of a split-inferencing machine learning model. In communication with a second endpoint, a tensor sorting algorithm is selected from among a plurality of available tensor sorting algorithms. The first endpoint runs the first sub-model to obtain a plurality of tensors, and it sends the tensors to the second endpoint in an order determined by the selected tensor sorting algorithm. The tensors may be sent without tensor names. The second endpoint may interpret the tensors based on the selected tensor sorting algorithm and run the second sub-model.
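The selection step in the abstract can be sketched in code. The following is a minimal, hypothetical illustration of negotiating a tensor sorting algorithm from a shared table of available algorithms; the identifiers, sort keys, and function names are illustrative assumptions, not defined by the patent.

```python
# Hypothetical shared table mapping sorting-method identifiers to tensor
# orderings. Both endpoints hold the same table, so agreeing on a single
# identifier fixes the transmission order of the intermediate tensors.
SORTING_TABLE = {
    1: lambda names: sorted(names),   # alphabetical order of tensor names
    2: lambda names: list(names),     # order of the first sub-model's outputs
}

def negotiate(offered_ids, supported_ids):
    """Pick the first sorting-method identifier both endpoints support."""
    for method_id in offered_ids:
        if method_id in supported_ids:
            return method_id
    raise ValueError("no common tensor sorting algorithm")

# First endpoint offers methods 2 then 1; second endpoint supports only 1.
chosen = negotiate([2, 1], {1})
order = SORTING_TABLE[chosen](["t_b", "t_a"])
# chosen == 1, order == ["t_a", "t_b"]
```

Once an identifier is agreed, neither side needs to transmit tensor names; the table entry alone determines how the tensors are ordered on the wire.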

Inventors

  • FILOCHE, THIERRY
  • QUINQUIS, CYRIL
  • ONNO, STEPHANE

Assignees

  • InterDigital CE Patent Holdings, SAS

Dates

Publication Date
2026-05-13
Application Date
2024-11-08

Claims (15)

  1. A method comprising: at a first endpoint, obtaining information describing at least a first sub-model of a split-inferencing machine learning model; in communication with a second endpoint, selecting a tensor sorting algorithm from among a plurality of available tensor sorting algorithms; running the first sub-model to obtain a plurality of tensors; and sending the plurality of tensors to the second endpoint, wherein the plurality of tensors are sent in an order determined by the selected tensor sorting algorithm.
  2. An apparatus comprising one or more processors configured to perform at least: at a first endpoint, obtaining information describing at least a first sub-model of a split-inferencing machine learning model; in communication with a second endpoint, selecting a tensor sorting algorithm from among a plurality of available tensor sorting algorithms; running the first sub-model to obtain a plurality of tensors; and sending the plurality of tensors to the second endpoint, wherein the plurality of tensors are sent in an order determined by the selected tensor sorting algorithm.
  3. The method of claim 1 or the apparatus of claim 2, wherein the first endpoint does not send names for the plurality of tensors to the second endpoint.
  4. The method of claim 1, or claim 3 as it depends from claim 1, or the apparatus of claim 2, or claim 3 as it depends from claim 2, further comprising, in communication with the second endpoint, selecting a table of tensor sorting algorithms from among a plurality of available tables, wherein the selection of a tensor sorting algorithm comprises selecting the tensor sorting algorithm from the selected table.
  5. The method of claim 1, or claims 3-4 as they depend from claim 1, or the apparatus of claim 2, or claims 3-4 as they depend from claim 2, wherein selecting a tensor sorting algorithm comprises sending or receiving a sorting method identifier.
  6. The method of claim 1, or claims 3-5 as they depend from claim 1, or the apparatus of claim 2, or claims 3-5 as they depend from claim 2, wherein the selected tensor sorting algorithm comprises sorting the tensors according to ranks of the tensors within the machine learning model.
  7. The method of claim 1, or claims 3-5 as they depend from claim 1, or the apparatus of claim 2, or claims 3-5 as they depend from claim 2, wherein the selected tensor sorting algorithm comprises sorting the tensors according to an alphabetical order of tensor names.
  8. The method of claim 1, or claims 3-5 as they depend from claim 1, or the apparatus of claim 2, or claims 3-5 as they depend from claim 2, wherein the selected tensor sorting algorithm comprises sorting the tensors according to an order of outputs of the first sub-model.
  9. The method of claim 1, or claims 3-5 as they depend from claim 1, or the apparatus of claim 2, or claims 3-5 as they depend from claim 2, wherein the machine learning model further comprises a second sub-model, and wherein the selected tensor sorting algorithm comprises sorting the tensors according to an order of inputs of the second sub-model.
  10. A method comprising: at a second endpoint, obtaining information describing at least a second sub-model of a split-inferencing machine learning model; in communication with a first endpoint, selecting a tensor sorting algorithm from among a plurality of available tensor sorting algorithms; receiving a plurality of tensors from the first endpoint, wherein the plurality of tensors are received in an order determined by the selected tensor sorting algorithm; and running the second sub-model on the plurality of tensors.
  11. An apparatus comprising one or more processors configured to perform at least: at a second endpoint, obtaining information describing at least a second sub-model of a split-inferencing machine learning model; in communication with a first endpoint, selecting a tensor sorting algorithm from among a plurality of available tensor sorting algorithms; receiving a plurality of tensors from the first endpoint, wherein the plurality of tensors are received in an order determined by the selected tensor sorting algorithm; and running the second sub-model on the plurality of tensors.
  12. The method of claim 10, or the apparatus of claim 11, wherein names for the plurality of tensors are not received from the first endpoint.
  13. The method of claim 10, or claim 12 as it depends from claim 10, or the apparatus of claim 11, or claim 12 as it depends from claim 11, further comprising, in communication with the first endpoint, selecting a table of tensor sorting algorithms from among a plurality of available tables, wherein the selection of a tensor sorting algorithm comprises selecting the tensor sorting algorithm from the selected table.
  14. The method of claim 10, or claims 12-13 as they depend from claim 10, or the apparatus of claim 11, or claims 12-13 as they depend from claim 11, wherein selecting a tensor sorting algorithm comprises sending or receiving a sorting method identifier.
  15. The method of claim 10, or claims 12-14 as they depend from claim 10, or the apparatus of claim 11, or claims 12-14 as they depend from claim 11, further comprising determining a name for each of the plurality of tensors, wherein the second sub-model is run according to the determined names.
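The sending and receiving sides of claims 1, 7, 10, and 15 can be sketched together. The following is a hypothetical illustration, assuming the alphabetical sorting algorithm of claim 7: the first endpoint sorts its intermediate tensors by name and transmits only the values; the second endpoint recovers the name-to-value mapping by applying the same sort to its own copy of the expected input names. All tensor names and values here are illustrative, not from the patent.

```python
SORT_ALPHABETICAL = "alpha"  # hypothetical sorting-method identifier (claim 5)

def send_tensors(named_tensors, algorithm=SORT_ALPHABETICAL):
    """First endpoint: sort tensors by name, then send values only (claim 3)."""
    assert algorithm == SORT_ALPHABETICAL
    ordered = sorted(named_tensors.items(), key=lambda kv: kv[0])
    return [value for _name, value in ordered]  # names are NOT transmitted

def receive_tensors(values, expected_names, algorithm=SORT_ALPHABETICAL):
    """Second endpoint: rebuild the name-to-value mapping (claim 15)."""
    assert algorithm == SORT_ALPHABETICAL
    return dict(zip(sorted(expected_names), values))

# Example: two intermediate tensors produced at the split point.
produced = {"conv2_out": [1.0, 2.0], "attn_out": [3.0]}
wire_payload = send_tensors(produced)  # [[3.0], [1.0, 2.0]]
restored = receive_tensors(wire_payload, ["conv2_out", "attn_out"])
# restored == {"attn_out": [3.0], "conv2_out": [1.0, 2.0]}
```

Because both endpoints apply the same deterministic sort, the payload carries no per-tensor naming overhead, yet the second sub-model can still bind each received value to the correct input.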

Description

BACKGROUND

The present disclosure relates to machine learning (ML) models. New uses for machine learning models are constantly being developed, and as the usefulness of such models increases, their computational complexity continues to grow. It may be difficult to implement a complete machine learning model entirely on one computing device, particularly where the computing device is consumer-level user equipment. The present disclosure thus relates to techniques, referred to as split inferencing, in which an ML model is implemented partly on one endpoint (such as user equipment, UE) and partly on another endpoint (which may be a network entity).

SUMMARY

Briefly stated, in one embodiment, a method comprises: at a first endpoint, obtaining information describing at least a first sub-model of a split-inferencing machine learning model; in communication with a second endpoint, selecting a tensor sorting algorithm from among a plurality of available tensor sorting algorithms; running the first sub-model to obtain a plurality of tensors; and sending the plurality of tensors to the second endpoint, wherein the plurality of tensors are sent in an order determined by the selected tensor sorting algorithm.

An apparatus according to some embodiments comprises one or more processors configured to perform at least: at a first endpoint, obtaining information describing at least a first sub-model of a split-inferencing machine learning model; in communication with a second endpoint, selecting a tensor sorting algorithm from among a plurality of available tensor sorting algorithms; running the first sub-model to obtain a plurality of tensors; and sending the plurality of tensors to the second endpoint, wherein the plurality of tensors are sent in an order determined by the selected tensor sorting algorithm. In some embodiments, the first endpoint does not send names for the plurality of tensors to the second endpoint.
A method according to some embodiments comprises: at a second endpoint, obtaining information describing at least a second sub-model of a split-inferencing machine learning model; in communication with a first endpoint, selecting a tensor sorting algorithm from among a plurality of available tensor sorting algorithms; receiving a plurality of tensors from the first endpoint, wherein the plurality of tensors are received in an order determined by the selected tensor sorting algorithm; and running the second sub-model on the plurality of tensors.

An apparatus according to some embodiments comprises one or more processors configured to perform at least: at a second endpoint, obtaining information describing at least a second sub-model of a split-inferencing machine learning model; in communication with a first endpoint, selecting a tensor sorting algorithm from among a plurality of available tensor sorting algorithms; receiving a plurality of tensors from the first endpoint, wherein the plurality of tensors are received in an order determined by the selected tensor sorting algorithm; and running the second sub-model on the plurality of tensors.

BRIEF DESCRIPTION OF THE DRAWINGS

The following detailed description will be better understood when read in conjunction with the appended drawings, in which there are shown examples of one or more of the multiple embodiments of the present disclosure. It should be understood, however, that the embodiments described herein are not limited to the precise arrangements and instrumentalities shown in the drawings.

FIG. 1 illustrates an architecture for a split inference between the UE (user equipment) and network, with the media data source in the UE.
FIG. 2 illustrates an architecture for a split inference between the UE (user equipment) and network, with the media data source in the network.
FIGs. 3A-3C illustrate a split inferencing model in which only one connection is present. FIG. 3A illustrates Part I, which is the sub-model used to initiate the inference on EndPoint1 (e.g. a UE). FIG. 3B illustrates Part II, which is the sub-model used to finalize the inference process on EndPoint2 (e.g. a network). FIG. 3C illustrates the topology of the entire model (with names removed for clarity of illustration).
FIGs. 4A-4D illustrate an example of a split inferencing model with two connections. FIG. 4A illustrates Part I. FIGs. 4B and 4C together illustrate Part II. FIG. 4D illustrates the complete model topology (with names removed for clarity of illustration).
FIG. 5 illustrates a split inference architecture that may be used to implement a method in which intermediate tensor data is sent without including tensor names.
FIG. 6 is a call flow diagram illustrating a negotiation method performed in some embodiments.
FIG. 7 is a call flow diagram illustrating a negotiation method performed in other embodiments.
FIG. 8 is a call flow diagram illustrating an inference loop method performed in some embodiments.
FIG. 9 provides an overview of the topology of an ML model used in some e