EP-4736006-A1 - ANOMALY DETECTION IN DISTRIBUTED SPLIT LEARNING ENVIRONMENT
Abstract
A computer-implemented method performed by a first node in a distributed split learning environment for anomaly detection is provided. The method includes identifying (1100) a first indication of whether a data sample from input data contains an anomaly; sending (1102) to a second node an output of an activation function from a first portion of a machine learning (ML) model; receiving (1104) a second indication that indicates whether the anomaly is detected at the second node as well; and performing (1106) one of (i) excluding the data sample from the input data to the first node and from the output of the second node when the anomaly is detected at the second node as well, followed by updating a threshold in a comparator at the first node, and (ii) updating parameters of the ML model for a task when the second indication indicates that the anomaly is not detected at the second node.
Inventors
- ICKIN, Selim
- VANDIKAS, Konstantinos
- LARSSON, Hannes
- LAN, Xiaoyu
- RIAZ, Hassam
Assignees
- Telefonaktiebolaget LM Ericsson (publ)
Dates
- Publication Date
- 2026-05-06
- Application Date
- 2023-11-02
Claims (20)
- 1. A computer-implemented method performed by a first node in a distributed split learning environment for anomaly detection, the distributed split learning environment comprising a machine learning, ML, model comprising at least a first portion and a second portion at the first node and at least a third portion at a second node, the method comprising: identifying (1100) a first indication that indicates whether a data sample from input data contains an anomaly based on comparing (i) the data sample input to the first portion of the ML model of the first node with (ii) an output comprising a reconstruction of the data sample from the second portion of the ML model of the first node; sending (1102) to the second node an output of an activation function from the first portion of the ML model that defines the output from the second portion of the ML model of the first node when the data sample is input to the first portion of the ML model of the first node; based on the output of the activation function, receiving (1104) from the second node a second indication that indicates whether the second node detected that the data sample contains the anomaly; and performing (1106) one of (i) excluding the data sample from the input data to the first node and from the output of the second node when the second indication indicates that the anomaly is detected at the second node, followed by updating a threshold in a comparator at the first node, and (ii) updating parameters of the ML model for a task when the second indication indicates that the anomaly is not detected at the second node.
- 2. The method of Claim 1, wherein the second node comprises a plurality of second nodes, and at least the sending (1102), the receiving (1104), and the performing (1106) are repeated for respective data samples and respective second nodes in the plurality of second nodes, the method further comprising: assigning (1204) a rank to the respective data samples based on the second indication received from the first node and the respective second nodes in the plurality of second nodes.
- 3. The method of Claim 2, further comprising: resampling (1206) the input data, when the anomaly is detected at the first node and the second node, with consideration of the rank of the respective data samples.
- 4. The method of any one of Claims 1 to 3, wherein the comparing comprises (i) computing a distance between the reconstruction of the data sample from the second portion of the ML model and the data sample input to the first portion of the ML model, (ii) based on the distance, setting a value of the threshold to identify whether the data sample from the input data contains an anomaly, and (iii) comparing the reconstruction of the data sample from the second portion of the ML model with the threshold value.
- 5. The method of any one of Claims 1 to 4, wherein the updating of parameters of the ML model for the task when the second indication indicates that the anomaly is not detected at the second node comprises (i) calculating a first weight of a first loss of the second portion of the ML model, (ii) receiving the second indication from the second node that the anomaly is not detected at the second node, (iii) receiving a second weight from the second node based on a second loss calculated at the second node comprising a difference between an accuracy of the third portion of the ML model and an expected accuracy of the third portion of the ML model based on a historical accuracy or a trajectory of accuracy, and (iv) updating the ML model with the first weight and the second weight.
- 6. The method of Claim 5, wherein the first loss comprises a reconstruction loss of the second portion of the ML model and the second loss comprises at least one of a classification loss and a regressor loss.
- 7. The method of any of Claims 1 to 6, wherein the second node comprises a plurality of second nodes, and the method further comprises: when the anomaly is detected at the first node and the second node, propagating (1208) an identifier for the data sample having the anomaly to the plurality of second nodes for multi-task learning to exclude the data sample related to the identifier.
- 8. The method of any one of Claims 1 to 7, wherein the identifying (1100), the sending (1102), the receiving (1104), and the performing (1106) to exclude a data sample are performed in a forward propagation pass among the first and second portions of the first node and the second node, and the updating is performed in a backward propagation pass among the second node and the first node.
- 9. The method of any one of Claims 1 to 8, further comprising: calculating (1202) a reconstruction loss of the second portion of the ML model with the data sample and the reconstruction of the data sample, wherein the updating of parameters of the ML model of the first node is performed in a backward propagation pass with the reconstruction loss and a loss of the third portion of the ML model from the second node.
- 10. The method of any one of Claims 1 to 9, wherein the distributed split learning environment further comprises a plurality of respective third portions of the ML model at a plurality of respective second nodes, and the method further comprises: training (1302) the first portion of the ML model with the plurality of respective third portions of the ML model.
- 11. The method of Claim 10, wherein the ML model comprises a neural network, wherein the sending, to the second node, of an output of the activation function is performed in a forward propagation pass from the first portion of the ML model comprising a first number of layers to (i) the second portion of the ML model comprising a second number of layers and (ii) the second node comprising a third number of layers, and wherein the receiving, from the second node, of the second indication that indicates whether to remove the data sample from the input data is performed in a backward propagation pass wherein the second node obtains a second gradient at a last layer of the third number of layers.
- 12. The method of Claim 11, wherein the updating of parameters of the ML model when the second indication indicates that the anomaly is not detected at the second node comprises merging a first gradient and the second gradient, the method further comprising: receiving (1300) the ML model from the second node in a plurality of second nodes; performing (1304) a forward propagation pass on the output of the activation function with the second portion of the ML model; computing (1306) a loss at the output of the second portion of the ML model; performing (1308) a backward propagation pass with the second portion of the ML model to obtain the first gradient of a last layer of the second number of layers; and receiving (1310), at the first portion of the first node, a merged gradient comprising the first and the second gradients.
- 13. The method of Claim 12, wherein, when the plurality of second nodes have a same task, the method further comprises: sending (1312) the updated ML model to the second node in the plurality of second nodes.
- 14. The method of any one of Claims 1 to 13, wherein the second portion of the ML model comprises a decoder for anomaly detection and the third portion of the ML model comprises at least one of a tail portion of the ML model for at least one of a classification task and a regression task.
- 15. The method of any one of Claims 1 to 14, wherein the ML model comprises a neural network, the first portion of the ML model comprises a head portion of the neural network comprising a first number of layers of the neural network and an encoder, the second portion comprises a first tail portion of the neural network comprising a second number of layers, and the third portion comprises a second tail portion of the neural network comprising a third number of layers.
- 16. The method of any one of Claims 1 to 15, wherein the distributed split learning environment comprises a cloud implementation, wherein the ML model comprises a neural network, the first portion comprises an encoder, the second portion comprises a decoder and a comparator connected to the output of the decoder, and the third portion comprises at least one of a classifier ML model and a regressor ML model for a task.
- 17. The method of Claim 16, wherein the distributed split learning environment further comprises a second comparator connected to the output of the third portion of the ML model.
- 18. The method of any one of Claims 1 to 17, wherein the distributed split learning environment comprises a network data analytics function, NWDAF, implementation, the ML model comprises a neural network, the first node comprises a first model training logical function, MTLF, client, the second node comprises a second MTLF client, the first portion comprises an encoder, the second portion comprises a decoder, the third portion comprises at least one of a classifier ML model and a regressor ML model for a task, and wherein the comparator is connected to the output of the decoder and to the output of the at least one of a classifier ML model and a regressor ML model.
- 19. The method of any one of Claims 1 to 17, wherein the distributed split learning environment comprises an open radio access network, O-RAN, implementation, the ML model comprises a neural network, the first portion comprises an encoder, the second portion comprises a decoder, the second node comprises an rApp or an xApp, the third portion comprises at least one of a classifier ML model and a regressor ML model for a task, and wherein the comparator is connected to the output of the decoder and to the output of the at least one of a classifier ML model and a regressor ML model.
- 20. The method of any one of Claims 1 to 19, wherein the task comprises a task in a telecommunications network.
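As a non-normative illustration only, the per-sample flow recited in claim 1 (identify a first indication, send the cut-layer activation, receive a second indication, then either exclude the sample and update the comparator threshold or update the model parameters) might be sketched as follows. The toy weights, tied-weight decoder, norm-based comparator, and multiplicative threshold-update rule are hypothetical stand-ins, not prescribed by the claims.

```python
import numpy as np

class FirstNode:
    """Illustrative per-sample decision flow of claim 1 (hypothetical stand-ins)."""

    def __init__(self, threshold=1.0):
        rng = np.random.default_rng(2)
        self.W = 0.1 * rng.normal(size=(4, 4))   # toy first+second portion weights
        self.threshold = threshold               # comparator threshold

    def head(self, x):
        """First portion: encode the sample up to the cut layer."""
        return np.tanh(x @ self.W)

    def reconstruct(self, h):
        """Second portion: decode the cut-layer activation (tied weights)."""
        return h @ self.W.T

    def process(self, sample, second_node_detects, update_params):
        h = self.head(sample)                    # activation sent to the second node
        first = np.linalg.norm(sample - self.reconstruct(h)) > self.threshold
        second = second_node_detects(h)          # second indication received back
        if first and second:
            # (i) anomaly confirmed at both nodes: exclude the sample and
            # update the comparator threshold (illustrative update rule)
            self.threshold *= 0.9
            return "excluded"
        # (ii) anomaly not confirmed at the second node: train on the sample
        update_params(self, sample)
        return "trained"
```

The `second_node_detects` and `update_params` callables stand in for the cut-layer exchange with the second node and the backward-pass parameter update, respectively.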
Description
Anomaly Detection In Distributed Split Learning Environment
TECHNICAL FIELD
[0001] The present disclosure relates generally to computer-implemented methods performed by a first node in a distributed split learning environment for collaborative anomaly detection, where the distributed split learning environment includes a machine learning (ML) model.
BACKGROUND
[0002] Artificial intelligence (AI) architecture in sixth generation (6G) networks may be described as having four main components: machine learning (ML) operations (Ops) (e.g., intelligence anywhere when beneficial), DataOps, Zero-touch, and AI as a service (AIaaS). In MLOps, there may be a need for efficient ML model management as the number of ML models for different tasks grows exponentially. For example, it may be important to simplify and reduce the number of managed ML models as much as possible while still sustaining ML model accuracy. In addition, 6G architecture is forecasted to include a split architecture with distributed network elements. There may be a need for ML model generalization mechanisms applicable to a distributed network architecture.
[0003] Split learning leverages a split neural network, a distributed deep learning technique that can train deep neural networks over multiple data sources without directly sharing raw labeled data. In split learning, a deep neural network can be split into multiple partitions, each of which can be trained on a different client. In forward propagation, each client trains a partial neural network up to its last layer, which may be referred to as a cut layer. The outputs at the cut layers from the clients are sent to the server and used to train the rest of the network. In backward propagation, gradients are back-propagated at the server down to its cut layer, and the cut-layer gradients are then sent back to the clients so that the clients can complete the backward pass.
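One forward and backward pass of this kind can be sketched in a few lines of code. This is a minimal illustration only: the toy layer sizes, tanh activation, mean-squared-error loss, learning rate, and synthetic data are all assumptions, not details from the disclosure.

```python
# One split-learning training step: the client holds the head up to the cut
# layer, the server holds the tail, and only activations/gradients cross.
import numpy as np

rng = np.random.default_rng(0)
lr = 0.1

W_head = 0.1 * rng.normal(size=(4, 3))   # client-side head weights
W_tail = 0.1 * rng.normal(size=(3, 1))   # server-side tail weights

x = rng.normal(size=(8, 4))              # raw data stays on the client
y = rng.normal(size=(8, 1))              # labels held by the server

# Forward pass: client computes up to the cut layer, sends activations only.
smashed = np.tanh(x @ W_head)            # "smashed" cut-layer activations
pred = smashed @ W_tail                  # server completes the forward pass
loss = np.mean((pred - y) ** 2)

# Backward pass: server back-propagates to the cut layer, returns that gradient.
d_pred = 2 * (pred - y) / len(y)
grad_W_tail = smashed.T @ d_pred
d_smashed = d_pred @ W_tail.T            # cut-layer gradient sent to the client

# Client completes back-propagation through its head using the returned gradient.
d_pre = d_smashed * (1 - smashed ** 2)   # derivative of tanh
grad_W_head = x.T @ d_pre

W_tail -= lr * grad_W_tail
W_head -= lr * grad_W_head
```

In this sketch only `smashed` travels client-to-server and only `d_smashed` travels back; the raw data `x` never leaves the client.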
This process continues back and forth, completing forward and backward propagations to train the distributed deep neural network without sharing the clients' raw data. The process can be performed either with all clients feeding into the server in parallel or in sequence.
[0004] In a split learning setting, a client can have orthogonal or overlapping input features, without necessarily having labels. An example use case in a fifth generation (5G) core network is one in which each function handles certain features of a dataset related to the radio, the network, the end device (e.g., user equipment (UE)), metadata related to the user context, and quality of experience (QoE). Using split learning, each function in the core network may join the training by transferring its local model's encoded output (which may be referred to as "smashed feature values") to a server without sending raw data. Split learning can also perform training in a sequential fashion when collaborating clients have the same set of input features, which may yield advantages over federated learning, for example when the ML models are large.
SUMMARY
[0005] There currently exist certain challenges. Training separate ML models may be costly, for example in terms of computation, communication, storage, and model management. Moreover, in a decentralized or distributed split neural network setting that includes a head model and a tail model, anomaly detection may be a challenge because, for example, input data can be distributed over multiple computation nodes, and in some cases an anomaly detected in the input data does not necessarily impact the overall results of a prediction. Thus, anomaly detection that is performed both from an input raw data quality perspective and an ML model efficacy perspective may be lacking.
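The input-data-quality side of such a check, flagging a sample whose reconstruction is far from the sample itself as in the comparator of the claims, can be sketched as follows. The rank-2 linear (PCA-style) encoder/decoder, the synthetic data, and the 1.5x-of-maximum threshold rule are illustrative assumptions, not details from the disclosure.

```python
# Reconstruction-distance comparator: a sample is flagged as anomalous when
# the distance to its decoder reconstruction exceeds a calibrated threshold.
import numpy as np

rng = np.random.default_rng(1)

# "Normal" training samples live near a 2-D subspace of a 5-D feature space.
basis = rng.normal(size=(2, 5))
normal = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 5))

# Fit a linear encoder/decoder: top-2 right singular vectors of the normal data.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:2]

def reconstruct(x):
    """Encode into the 2-D latent space and decode back (linear autoencoder)."""
    return (x - mean) @ components.T @ components + mean

def reconstruction_distance(x):
    return np.linalg.norm(x - reconstruct(x), axis=-1)

# Calibrate the comparator threshold from distances observed on normal data.
threshold = 1.5 * reconstruction_distance(normal).max()

def is_anomaly(sample):
    """First indication: True when the sample's distance exceeds the threshold."""
    return reconstruction_distance(sample) > threshold

clean = rng.normal(size=(1, 2)) @ basis + 0.01 * rng.normal(size=(1, 5))
corrupt = clean + 10.0                   # grossly off-subspace sample
```

The sketch covers only the local comparator; in the claimed method this first indication is combined with a second indication received from the second node before a sample is excluded.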
Moreover, when there are multiple use cases, an abnormal accuracy detected for one use case may not be detected for another use case, which may indicate that the reason for the lack of detection is not the shared encoder model (e.g., the head model in a split ML model architecture) responsible for input data feature extraction, but the use case's tail model itself (e.g., a decoder and/or a classifier or regressor ML model in the split ML model architecture). Looking at both perspectives may also avoid unnecessary input data removal at the head model that receives the input data.
[0006] Certain aspects of the disclosure and their embodiments may provide solutions to these or other challenges.
[0007] Some embodiments provide a computer-implemented method performed by a first node in a distributed split learning environment for anomaly detection. The distributed split learning environment comprises an ML model comprising at least a first portion and a second portion at the first node and at least a third portion at a second node. The method comprises identifying a first indication that indicates whether a data sample from the input data contains an anomaly based on comparing (i) the data sample input to the first portion of the ML model of the first node with (ii) an output com