EP-4736011-A1 - SYSTEM AND METHOD FOR DATA LABELING USING KNOWLEDGE GRAPH IN THE CLOUD


Abstract

A method, system and apparatus are disclosed. A method implemented in a management node configured with a knowledge graph including a plurality of links for a plurality of nodes in a communication network is provided. The method includes detecting anomalous metric data associated with at least one of the plurality of nodes of the knowledge graph based on a first historical dataset including metric data representative of a normal state of the plurality of nodes of the knowledge graph, determining at least one anomalous node of the plurality of nodes based on the anomalous metric data, and determining at least one label for the anomalous metric data based on at least one link from the knowledge graph associated with the at least one anomalous node.
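The abstract detects anomalous metric data against a first historical dataset that represents the normal state of the nodes, but it does not name a detection technique. A minimal sketch, assuming a per-node, per-metric z-score test against the historical mean and standard deviation (a swapped-in technique, not necessarily the disclosed one). The nested-dictionary data layout, the threshold of 3.0, and the functions `fit_baseline`, `detect_anomalies` and `anomalous_nodes` are all hypothetical.

```python
from statistics import mean, pstdev


def fit_baseline(history):
    """Summarize normal-state history {node: {metric: [values]}} as (mean, std)."""
    return {
        node: {m: (mean(vals), pstdev(vals)) for m, vals in metrics.items()}
        for node, metrics in history.items()
    }


def detect_anomalies(baseline, sample, z_threshold=3.0):
    """Return {node: {metric: value}} for values whose z-score exceeds z_threshold."""
    anomalies = {}
    for node, metrics in sample.items():
        for metric, value in metrics.items():
            if metric not in baseline.get(node, {}):
                continue  # no normal-state reference for this node/metric pair
            mu, sigma = baseline[node][metric]
            if sigma == 0:
                # Constant history: any deviation at all is treated as anomalous.
                is_anomalous = value != mu
            else:
                is_anomalous = abs(value - mu) / sigma > z_threshold
            if is_anomalous:
                anomalies.setdefault(node, {})[metric] = value
    return anomalies


def anomalous_nodes(anomalies):
    """Nodes that have at least one anomalous metric."""
    return sorted(node for node, metrics in anomalies.items() if metrics)
```

A z-score test is used here only because it needs nothing beyond a mean and a standard deviation per metric; any detector that compares new samples with the normal-state dataset would fit the same role.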

Inventors

MOURADIAN, Carla
SOUALHIA, Mbarka
GÉHBERGER, Dániel

Assignees

Telefonaktiebolaget LM Ericsson (publ)

Dates

Publication Date: 20260506
Application Date: 20230630

Claims (20)

1. A management node (21) in a communication network, the management node (21) being configured with a knowledge graph comprising a plurality of links for a plurality of nodes (16) in the communication network, the management node (21) comprising processing circuitry (36) configured to: detect anomalous metric data associated with at least one of the plurality of nodes (16) of the knowledge graph based on a first historical dataset comprising metric data representative of a normal state of the plurality of nodes (16) of the knowledge graph; determine at least one anomalous node (16) of the plurality of nodes (16) based on the anomalous metric data; and determine at least one label for the anomalous metric data based on at least one link from the knowledge graph associated with the at least one anomalous node (16).
2. The management node (21) of Claim 1, wherein the processing circuitry (36) is further configured to: train a machine learning (ML) model using the anomalous metric data and the at least one label.
3. The management node (21) of Claim 2, wherein the processing circuitry (36) is further configured to: detect additional anomalous metric data associated with at least one of the plurality of nodes (16) of the knowledge graph; and determine at least one additional label for the additional anomalous metric data based on the trained ML model.
4. The management node (21) of any one of Claims 1-3, wherein the determining of the at least one label for the anomalous metric data further includes: determining a plurality of candidate paths for traversing the knowledge graph, each of the plurality of candidate paths traversing at least one anomalous node (16) of the plurality of nodes (16); and determining the at least one label based on metrics associated with at least one of the plurality of candidate paths.
5. The management node (21) of Claim 4, wherein the determining of the at least one label for the anomalous metric data further includes: determining a rank order of the plurality of candidate paths; and determining at least one first label based on at least one corresponding highest-ranked candidate path of the plurality of candidate paths.
6. The management node (21) of Claim 5, wherein the determining of the rank order of the plurality of candidate paths includes: determining a respective score for each candidate path of the plurality of candidate paths, the respective score being based on at least one of: a number of anomalous nodes (16) of the respective candidate path; a rate of anomaly of the respective candidate path; a severity of anomaly of the respective candidate path; and a number of anomalous metrics of the respective candidate path.
7. The management node (21) of any one of Claims 5 and 6, wherein the at least one highest-ranked candidate path traverses a first set of nodes (16); and the determining of the at least one first label based on the at least one highest-ranked candidate path includes: determining a start node (16) of the first set of nodes (16) based on a hierarchical relationship among the first set of nodes (16); determining a first plurality of anomalous metrics associated with a first number of nodes (16) of the first set of nodes (16), the first plurality of anomalous metrics including a first plurality of metric names and a corresponding first plurality of metric values; determining a first keyword based on the first plurality of metric names; and determining the at least one first label based on the first keyword.
8. The management node (21) of Claim 7, wherein the processing circuitry (36) is further configured to: validate the at least one first label based on the first number of nodes (16) associated with the first plurality of anomalous metrics exceeding a preconfigured threshold number of nodes (16).
9. The management node (21) of Claim 8, wherein the processing circuitry (36) is further configured to: responsive to a failure to validate the at least one first label, determine a second candidate path traversing a second set of nodes (16); determine a second plurality of anomalous metrics associated with a second number of nodes (16) of the second set of nodes (16), the second plurality of anomalous metrics including a second plurality of metric names and a corresponding second plurality of metric values; determine a second keyword based on the second plurality of metric names; determine at least one second label based on the second keyword; and validate the at least one second label based on the second number of nodes (16) associated with the second plurality of anomalous metrics exceeding the preconfigured threshold number of nodes (16).
10. The management node (21) of Claim 9, wherein the second candidate path is one of: a second-highest-ranked candidate path of the plurality of candidate paths traversing the second set of nodes (16); and a sub-path of the at least one highest-ranked candidate path, the sub-path traversing the second set of nodes (16), the second set of nodes (16) being a subset of the first set of nodes (16), the second set of nodes (16) including at least one dependent node (16) of the start node (16).
11. The management node (21) of any one of Claims 1-10, wherein the processing circuitry (36) is further configured to: receive, prior to the detecting of the anomalous metric data, the first historical dataset including at least one of monitoring metrics, traces, and log files associated with the plurality of nodes (16) of the knowledge graph.
12. The management node (21) of any one of Claims 1-11, wherein the processing circuitry (36) is further configured to at least one of: receive the knowledge graph from at least one network node (16) in the communication network; and receive the anomalous metric data from at least one network node (16) in the communication network.
13. A method implemented in a management node (21) in a communication network, the management node (21) being configured with a knowledge graph comprising a plurality of links for a plurality of nodes (16) in the communication network, the method comprising: detecting anomalous metric data associated with at least one of the plurality of nodes (16) of the knowledge graph based on a first historical dataset comprising metric data representative of a normal state of the plurality of nodes (16) of the knowledge graph; determining at least one anomalous node (16) of the plurality of nodes (16) based on the anomalous metric data; and determining at least one label for the anomalous metric data based on at least one link from the knowledge graph associated with the at least one anomalous node (16).
14. The method of Claim 13, wherein the method further comprises: training a machine learning (ML) model using the anomalous metric data and the at least one label.
15. The method of Claim 14, wherein the method further comprises: detecting additional anomalous metric data associated with at least one of the plurality of nodes (16) of the knowledge graph; and determining at least one additional label for the additional anomalous metric data based on the trained ML model.
16. The method of any one of Claims 13-15, wherein the determining of the at least one label for the anomalous metric data further includes: determining a plurality of candidate paths for traversing the knowledge graph, each of the plurality of candidate paths traversing at least one anomalous node (16) of the plurality of nodes (16); and determining the at least one label based on metrics associated with at least one of the plurality of candidate paths.
17. The method of Claim 16, wherein the determining of the at least one label for the anomalous metric data further includes: determining a rank order of the plurality of candidate paths; and determining at least one first label based on at least one corresponding highest-ranked candidate path of the plurality of candidate paths.
18. The method of Claim 17, wherein the determining of the rank order of the plurality of candidate paths includes: determining a respective score for each candidate path of the plurality of candidate paths, the respective score being based on at least one of: a number of anomalous nodes (16) of the respective candidate path; a rate of anomaly of the respective candidate path; a severity of anomaly of the respective candidate path; and a number of anomalous metrics of the respective candidate path.
19. The method of any one of Claims 17 and 18, wherein the at least one highest-ranked candidate path traverses a first set of nodes (16); and the determining of the at least one first label based on the at least one highest-ranked candidate path includes: determining a start node (16) of the first set of nodes (16) based on a hierarchical relationship among the first set of nodes (16); determining a first plurality of anomalous metrics associated with a first number of nodes (16) of the first set of nodes (16), the first plurality of anomalous metrics including a first plurality of metric names and a corresponding first plurality of metric values; determining a first keyword based on the first plurality of metric names; and determining the at least one first label based on the first keyword.
20. The method of Claim 19, wherein the method further comprises: validating the at least one first label based on the first number of nodes (16) associated with the first plurality of anomalous metrics exceeding a preconfigured threshold number of nodes (16).
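Claims 4 to 10 (and 16 to 20) describe ranking candidate paths through the knowledge graph, deriving a label from the highest-ranked path, validating it against a node-count threshold, and falling back to another path when validation fails. A minimal sketch of that flow, under assumptions the claims leave open: the knowledge graph is reduced to an acyclic parent-to-children hierarchy; candidate paths are root-to-leaf paths containing at least one anomalous node; the score compares the number of anomalous nodes, then the number of anomalous metrics, then the anomaly rate (severity is omitted); the start node is the topmost anomalous node; the keyword is the most frequent anomalous metric name; and validation counts the nodes carrying that metric. Only the first alternative of claim 10 (the next-ranked path) is shown, not the sub-path alternative. The function names, the label format and the `min_nodes` threshold are hypothetical.

```python
from collections import Counter

# Hypothetical inputs:
#   graph:     {parent: [children]}, assumed acyclic (e.g., dc -> host -> vm)
#   anomalies: {node: {metric_name: value}} produced by an anomaly detector


def candidate_paths(graph, root, anomalies):
    """Root-to-leaf paths (depth-first) that traverse at least one anomalous node."""
    paths, stack = [], [[root]]
    while stack:
        path = stack.pop()
        children = graph.get(path[-1], [])
        if not children and any(anomalies.get(n) for n in path):
            paths.append(path)
        for child in children:
            stack.append(path + [child])
    return paths


def score(path, anomalies):
    """(anomalous nodes, anomalous metrics, anomaly rate), compared lexicographically."""
    bad = [n for n in path if anomalies.get(n)]
    n_metrics = sum(len(anomalies[n]) for n in bad)
    return (len(bad), n_metrics, len(bad) / len(path))


def label_path(path, anomalies):
    """Derive a label from one path and count the nodes that support it."""
    bad = [n for n in path if anomalies.get(n)]
    start = bad[0]  # topmost anomalous node, since paths run root -> leaf
    names = Counter(m for n in bad for m in anomalies[n])
    # Each node lists a metric at most once, so a name's count equals the
    # number of nodes on the path that report that metric as anomalous.
    keyword, support = names.most_common(1)[0]
    return f"{keyword}_anomaly@{start}", support


def determine_label(graph, root, anomalies, min_nodes=1):
    """Try paths in rank order; accept the first label supported by > min_nodes nodes."""
    ranked = sorted(candidate_paths(graph, root, anomalies),
                    key=lambda p: score(p, anomalies), reverse=True)
    for path in ranked:
        label, support = label_path(path, anomalies)
        if support > min_nodes:
            return label
    return None  # no candidate path could be validated


if __name__ == "__main__":
    graph = {"dc": ["host-1", "host-2"], "host-1": ["vm-a", "vm-b"], "host-2": ["vm-c"]}
    anomalies = {
        "host-1": {"mem": 0.97},
        "vm-a": {"disk_io": 880.0, "net_drop": 0.2},
        "host-2": {"cpu": 0.99},
        "vm-c": {"cpu": 0.95},
    }
    # The top-ranked path (dc, host-1, vm-a) has three unrelated metrics and
    # fails validation; the next path (dc, host-2, vm-c) supplies the label.
    print(determine_label(graph, "dc", anomalies))  # prints "cpu_anomaly@host-2"
```

Ranking with a plain tuple keeps the ordering transparent; a deployment that weighs the claimed criteria differently could replace `score` with a weighted sum without touching the rest of the flow.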

Description

SYSTEM AND METHOD FOR DATA LABELING USING KNOWLEDGE GRAPH IN THE CLOUD
TECHNICAL FIELD
The present disclosure relates to wireless communications, and in particular, to performance metric data labeling using knowledge graphs in a wireless communications network.
BACKGROUND
The Third Generation Partnership Project (3GPP) has developed and is developing standards for Fourth Generation (4G) (also referred to as Long Term Evolution (LTE)) and Fifth Generation (5G) (also referred to as New Radio (NR)) wireless communication systems. Such systems provide, among other features, broadband communication between network nodes, such as base stations, and mobile wireless devices (WD), as well as communication between network nodes and between WDs. The 3GPP is also developing standards for Sixth Generation (6G) wireless communication networks.
Knowledge bases have been in use for decades in telecom systems. Knowledge bases store facts about (a subset of) the world, and these facts can, for example, be used to reason and deduce new facts or to highlight inconsistencies. Initially, knowledge bases were mainly parts of expert systems. This technology is gaining momentum again as part of machine reasoning solutions. The “knowledge” of a knowledge base is typically stored in graph form (e.g., as a graph data structure), and as a result, the term “knowledge graph” may be used to refer to a knowledge base, or some aspect thereof. Expert and machine reasoning systems may use knowledge graphs to infer new facts and answer questions or, for example, to pinpoint root causes, identify undesirable system states, etc.
As knowledge may be complex in nature, knowledge graphs may quickly become complex, e.g., with thousands of nodes and edges. Storing such graphs may consume significant storage and memory, and traversing them may consume significant time and computational resources. For real-time applications, decision making by traversing large knowledge graphs may not be practicable or feasible.
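The passage above describes knowledge stored in graph form and notes that traversal cost grows with the size of the graph. A minimal sketch, assuming facts are held as (subject, relation, object) triples; the entity and relation names (`runs_on`, `located_in`) and the helpers `build_index` and `reachable` are hypothetical. Each query walks the graph breadth-first and reports how many nodes it visited, which is the per-question work that is repeated whenever a different question is asked.

```python
from collections import defaultdict, deque


def build_index(triples):
    """Index (subject, relation, object) facts as an adjacency list."""
    adj = defaultdict(list)
    for subj, rel, obj in triples:
        adj[subj].append((rel, obj))
    return adj


def reachable(adj, start, relation):
    """Breadth-first traversal that follows only edges labeled `relation`.

    Returns the set of nodes reached (excluding `start`) and the number of
    nodes visited, i.e. the work this single query cost.
    """
    seen, queue = {start}, deque([start])
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for rel, nxt in adj.get(node, []):
            if rel == relation and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}, visited
```

For example, asking which infrastructure `app-1` transitively runs on visits only the nodes along `runs_on` edges; on a graph with thousands of nodes and edges, the same call may visit a large part of the graph for every new question.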
In edge cloud contexts and/or IoT environments, there may be significant memory and/or computational limitations, and thus an entire knowledge base may not be available for decision making. As a result, knowledge graph traversal may be more expensive, especially when the knowledge graph must be traversed repeatedly to answer different questions (e.g., “Is the system in a faulty state?”, “Is there a security threat?”, etc.).
With the rise of data-centric technologies in telecom systems, Machine Learning (ML) techniques are increasingly used in various domains, including fault, performance and security management. Such ML techniques may use the data collected from the system(s) to train detection and/or prediction models, and then use the models for online detection and/or prediction. ML techniques have become an important stepping-stone in building automated and intelligent cloud systems. For instance, they allow applications running in a cloud system to identify their outcomes more accurately without necessarily being explicitly programmed to do so. Different ML algorithms have been proposed in the literature, including supervised machine learning, which uses labeled datasets to train algorithms to classify data or predict outcomes accurately. For example, some existing supervised machine learning systems have been characterized as answering questions (e.g., similar to the questions answered by knowledge graphs) with various levels of accuracy and efficiency.
Labeling is an important function in configuring supervised machine learning, as the quality of the labels may directly affect the performance of supervised learning. Data labeling is usually a manual and costly process that requires prior knowledge about the monitored system. Such manual solutions may not be able to generically identify all labels in large cloud systems, and may also not be able to guarantee full coverage of the labels, especially for large datasets.
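Supervised learning, as described above, fits a model to labeled examples and then predicts labels for unseen data; claims 2 and 3 apply this pattern by training an ML model on labeled anomalous metric data and using it to label further anomalies. A minimal, dependency-free sketch, assuming each anomaly is encoded as a fixed-length numeric feature vector and using a nearest-centroid classifier purely as an illustrative stand-in for whatever model a deployment would use. The feature layout, the label strings and the functions `train_nearest_centroid` and `predict` are hypothetical.

```python
import math
from collections import defaultdict


def train_nearest_centroid(samples, labels):
    """Fit one centroid (mean feature vector) per label.

    Assumes every sample has the same number of features.
    """
    sums, counts = {}, defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(model, x):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    return min(model, key=lambda y: math.dist(model[y], x))
```

Nearest-centroid is chosen here only because it needs no third-party library; the point is the train-then-predict split, in which the quality of the training labels bounds the quality of the predictions.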
In addition, such manual solutions may not be able to identify rare or unknown events when they occur. Traditional label learning methods require the labeler to be an expert with prior knowledge about the monitored system to guarantee the correctness of the output labels. This results in high labeling costs and requires substantial time and effort to provide such manual configuration and knowledge. Moreover, such methods may not generalize to large datasets, especially for large systems.
Some existing systems have built knowledge graphs from facts extracted from large-scale systems. One of the challenges in this process is inferring labels consistently. Some existing systems have considered an approach based on determining co-reference entities in the knowledge graph to produce consistent sets of labels and relations for each node, and on using classification methods to label nodes by taking into account ontological information and neighboring labels. In some existing systems, a know