US-12621233-B2 - QoS aware reinforcement learning prevention intrusion system
Abstract
Methods, systems, and apparatuses are disclosed. A network node configured for performing network routing associated with a plurality of wireless devices, WDs, in a communication system is described. The network node includes processing circuitry configured to collect, from a control plane, a plurality of graph states associated with a plurality of graphs. Each graph of the plurality of graphs has at least one graph node associated with one WD of the plurality of WDs. At least one action is determined, using self-learning, to update at least one route in at least one graph of the plurality of graphs based on the collected plurality of graph states. The at least one action is transmitted to a controller for instructing at least one WD to update at least one network route based on the at least one action.
Inventors
- Amine BOUKHTOUTA
- Hyame ALAMEDDINE
- Taous MADI
- Christian Miranda MOREIRA
- GEORGES KADDOUM
Assignees
- TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
Dates
- Publication Date
- 20260505
- Application Date
- 20211022
Claims (20)
- 1 . A network node configured for performing network routing associated with a plurality of wireless devices, WDs, in a communication system, the network node comprising processing circuitry configured to: collect, from a control plane, a plurality of graph states associated with a plurality of graphs, each graph of the plurality of graphs having at least one graph node associated with one WD of the plurality of WDs; determine, using self-learning, at least one action to update at least one route in at least one graph of the plurality of graphs based on the collected plurality of graph states, the self-learning comprising: entering a warm-up phase comprising exploring the plurality of graph states as a ground truth to self-learn optimizing network routes; and entering a production phase including: exploiting the explored plurality of graph states of the warm-up phase; and determining a plurality of actions including the at least one action to update the at least one route in at least one graph; and cause the network node to transmit the at least one action to a controller for instructing at least one WD to update at least one network route based on the at least one action.
- 2 . The network node of claim 1 , wherein the self-learning is based at least in part on a quality of service parameter.
- 3 . The network node of claim 1 , wherein the plurality of graph states includes flowtables and metrics, the metrics including at least one of a transmission delay, a packet loss rate, and a queue delay.
- 4 . The network node of claim 1 , wherein the self-learning further includes any one of: monitoring a topology of at least one graph of the plurality of graphs; when at least one WD has been one of removed from and added to the at least one graph, one of enter and continue with the warm-up phase; and when at least one WD has not been one of removed from and added to the at least one graph, one of enter and continue with the production phase.
- 5 . The network node of claim 1 , wherein the self-learning further includes: selecting a random state depicting a graph snapshot and a random plurality of actions for each graph node; and evaluating the selected random state and the random plurality of actions using a probabilistic policy based on a derived quality value.
- 6 . The network node of claim 5 , wherein the self-learning further includes: determining a reward based on a cost of the random plurality of actions, the selected random state, an overall transmission delay, a queue delay, and an overall packet loss rate; determining a future state and a future action; evaluating an action selection policy based on the derived quality value; learning another quality value by evaluating an impact of the reward and how the future state and the future action compare with the selected random state and random plurality of actions; updating a current state with the future state and a current action with the future action; and capturing an overall quality value to determine a convergence.
- 7 . The network node of claim 6 , wherein updating the current state and the current action includes selecting a graph node that is a parent to another graph node, the selecting being based at least on the derived quality value.
- 8 . The network node of claim 1 , wherein the controller is in the control plane, the at least one WD is in a data plane, and transmitting the at least one action triggers the WD to update the at least one network route.
- 9 . The network node of claim 1 , wherein the plurality of graphs is a plurality of Destination Oriented Directed Acyclic Graphs, DODAGs.
- 10 . The network node of claim 1 , wherein the communication system includes a wireless sensor network, the network routing is a Quality of Service, QoS, awareness-based routing that is performed in Routing Protocol for low Power and Lossy networks, RPL, in the wireless sensor network, the network node is a border router to the wireless sensor network, and the wireless sensor network is an IPv6 low power wireless personal area network, SD6LowPAN, network.
- 11 . A method implemented in a network node configured for performing network routing associated with a plurality of wireless devices, WDs, in a communication system, the method comprising: collecting, from a control plane, a plurality of graph states associated with a plurality of graphs, each graph of the plurality of graphs having at least one graph node associated with one WD of the plurality of WDs; determining, using self-learning, at least one action to update at least one route in at least one graph of the plurality of graphs based on the collected plurality of graph states, the self-learning comprising: entering a warm-up phase comprising exploring the plurality of graph states as a ground truth to self-learn optimizing network routes; and entering a production phase including: exploiting the explored plurality of graph states of the warm-up phase; and determining a plurality of actions including the at least one action to update the at least one route in at least one graph; and transmitting the at least one action to a controller for instructing at least one WD to update at least one network route based on the at least one action.
- 12 . The method of claim 11 , wherein the self-learning is based at least in part on a quality of service parameter.
- 13 . The method of claim 11 , wherein the plurality of graph states includes flowtables and metrics, the metrics including at least one of a transmission delay, a packet loss rate, and a queue delay.
- 14 . The method of claim 11 , wherein the self-learning further includes any one of: monitoring a topology of at least one graph of the plurality of graphs; when at least one WD has been one of removed from and added to the at least one graph, one of enter and continue with the warm-up phase; and when at least one WD has not been one of removed from and added to the at least one graph, one of enter and continue with the production phase.
- 15 . The method of claim 11 , wherein the self-learning further includes: selecting a random state depicting a graph snapshot and a random plurality of actions for each graph node; and evaluating the selected random state and the random plurality of actions using a probabilistic policy based on a derived quality value.
- 16 . The method of claim 15 , wherein the self-learning further includes: determining a reward based on a cost of the random plurality of actions, the selected random state, an overall transmission delay, a queue delay, and an overall packet loss rate; determining a future state and a future action; evaluating an action selection policy based on the derived quality value; learning another quality value by evaluating an impact of the reward and how the future state and the future action compare with the selected random state and random plurality of actions; updating a current state with the future state and a current action with the future action; and capturing an overall quality value to determine a convergence.
- 17 . The method of claim 16 , wherein updating the current state and the current action includes selecting a graph node that is a parent to another graph node, the selecting being based at least on the derived quality value.
- 18 . The method of claim 11 , wherein the controller is in the control plane, the at least one WD is in a data plane, and transmitting the at least one action triggers the WD to update the at least one network route.
- 19 . The method of claim 11 , wherein the plurality of graphs is a plurality of Destination Oriented Directed Acyclic Graphs, DODAGs.
- 20 . The method of claim 11 , wherein the communication system includes a wireless sensor network, the network routing is a Quality of Service, QoS, awareness-based routing that is performed in Routing Protocol for low Power and Lossy networks, RPL, in the wireless sensor network, the network node is a border router to the wireless sensor network, and the wireless sensor network is an IPv6 low power wireless personal area network, SD6LowPAN, network.
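Claims 5-7 (and their method counterparts 15-17) describe an update loop that resembles SARSA-style reinforcement learning: a random state depicting a graph snapshot and random actions are selected, evaluated under a probabilistic policy on derived quality values, and the quality value is updated from a QoS-based reward together with a future state and future action. The sketch below illustrates such a loop in minimal form; the toy metrics, constants, and names are hypothetical stand-ins for illustration, not the claimed implementation.

```python
import random

# Hypothetical toy setting: states index graph snapshots, actions index
# candidate parent choices for a graph node. The reward penalizes delay
# and packet loss, echoing the QoS terms recited in claim 6.
N_STATES, N_ACTIONS = 4, 3
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

Q = {(s, a): 0.0 for s in range(N_STATES) for a in range(N_ACTIONS)}

def reward(state, action):
    # Stand-in metrics; a real agent would derive these from collected
    # graph states (flowtables, transmission/queue delays, loss rates).
    delay = (state + action) % 3
    loss = 0.1 * action
    return -(delay + loss)

def choose_action(state):
    # Probabilistic (epsilon-greedy) policy over derived quality values:
    # exploration dominates the warm-up phase, exploitation the production phase.
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[(state, a)])

# Select a random initial state and action (claim 5), then iterate the
# quality-value update of claim 6.
state = random.randrange(N_STATES)
action = choose_action(state)
for _ in range(1000):
    r = reward(state, action)
    next_state = random.randrange(N_STATES)   # future state (next snapshot)
    next_action = choose_action(next_state)   # future action
    # Learn another quality value from the reward and the future pair.
    Q[(state, action)] += ALPHA * (
        r + GAMMA * Q[(next_state, next_action)] - Q[(state, action)]
    )
    # Update the current state/action with the future state/action.
    state, action = next_state, next_action

# Parent selection (claim 7) would pick the highest-quality action per node.
best = max(range(N_ACTIONS), key=lambda a: Q[(0, a)])
```

In this reading, "capturing an overall quality value to determine a convergence" corresponds to monitoring the Q-table until its updates become negligibly small.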
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a Submission Under 35 U.S.C. § 371 for U.S. National Stage Patent Application of International Application No. PCT/IB2021/059777, filed Oct. 22, 2021, entitled “QoS AWARE REINFORCEMENT LEARNING PREVENTION INTRUSION SYSTEM,” which claims priority to U.S. Provisional Application No. 63/104,607, filed Oct. 23, 2020, entitled “QoS AWARE REINFORCEMENT LEARNING PREVENTION INTRUSION SYSTEM,” the entireties of both of which are incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to wireless communications, and in particular, to rank attack prevention and/or preventive security controls based at least on quality of service, QoS, awareness-based routing in a wireless sensor network.
BACKGROUND
6LowPAN Networks
Wireless sensor networks (WSNs) are considered an important application of the Internet of Things (IoT). In general, WSNs can be considered Low Power and Lossy Networks (LLNs), which present constraints on their deployment in critical and large-scale scenarios (e.g., massively distributed and heterogeneous networks). These resource constraints may prevent the deployment of WSNs in scenarios where operation is subject to strict reliability and performance requirements. At the same time, WSNs lack flexibility because of their rigidity towards policy changes, making these networks difficult to adapt. The possibility of direct and bidirectional access to wireless devices using IP technology in WSNs may considerably reduce these difficulties, but other issues emerge concerning the complexity of interconnections. In WSNs, one goal is to provide end-to-end communication, which allows wireless devices to be accessed without requiring gateways to use network adaptation techniques to enhance the efficiency and quality of wireless transmissions.
In this context, the 6LoWPAN standard (IPv6 over low power wireless personal area network) has been developed to help avoid such adaptation techniques, thereby making it possible to reach WSN devices with IPv6 addresses. Nevertheless, due to common factors such as limited bandwidth, node failures, etc., the wireless links in multi-hop 6LowPAN networks are unstable and therefore not reliable. These difficulties can severely impact the performance of the entire network. Routing decisions in IP-based networks are made by distributed protocols (e.g., Routing Information Protocol (RIP), Open Shortest Path First (OSPF), Border Gateway Protocol (BGP)) that may be used to maintain topology while reducing control overhead in the overall network. Low-power devices have a reduced radio range compared to typical wireless devices/nodes that communicate with a single base station, so a multi-hop mesh allows WSNs to be extended over a greater area. Unfortunately, by introducing multiple hops, link uncertainty is compounded across the hop distance, increasing the chance of packets being dropped along the way. Further, the Routing Protocol for Low Power and Lossy Networks (RPL) has been adopted to manage routing in 6LowPAN networks; the RPL protocol is described in detail below.
RPL Protocol
RPL is an IPv6 routing protocol designed by the Internet Engineering Task Force (IETF) as a proposed standard. RPL organizes the network topology as DAGs (Directed Acyclic Graphs). A DAG can be partitioned into one or more Destination Oriented DAGs (DODAGs), where each DODAG has a root (sink) node. Multiple sinks are connected through a backbone network consisting of border routers that connect them to the internet. RPL is a proactive routing protocol that starts to find routes based on a pre-defined Objective Function (OF) established as soon as the RPL network is initialized. The OF is used to deliver traffic over different routes according to traffic requirements.
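The compounding of link uncertainty across hops noted above can be made concrete: if each hop delivers a packet independently with the same probability, end-to-end delivery falls geometrically with hop count. A brief illustration, with probabilities chosen only for the example:

```python
# End-to-end delivery over a multi-hop path is, to a first approximation
# (assuming independent links), the product of per-hop delivery
# probabilities, so loss compounds with hop distance.
def end_to_end_delivery(per_hop_success: float, hops: int) -> float:
    return per_hop_success ** hops

p = 0.95  # illustrative per-hop delivery probability
one_hop = end_to_end_delivery(p, 1)    # 0.95
five_hops = end_to_end_delivery(p, 5)  # ~0.774: roughly 23% of packets lost
```

This is why a mesh that extends coverage also degrades reliability, motivating QoS-aware route selection.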
These requirements are encoded within the OF and used by RPL during routing operations. RPL makes use of three different types of control messages, namely DIO (DODAG Information Object), DIS (DODAG Information Solicitation), and DAO (Destination Advertisement Object), as illustrated in FIG. 1. The sink node (e.g., node A) transmits DIO messages at regular intervals determined by a trickle algorithm. The DIO message provides information to the sensor nodes that enables them to discover RPL instances, learn the configuration parameters, and select the preferred parent set. For the selection of the parent set, RPL uses the OF, which comprises one or more routing metrics. The DIS message is used by a new sensor node or a floating DODAG to solicit DIO information from another node in its vicinity in order to join a DODAG. DAO messages are propagated by the sensor nodes to the sink node to update the topological view of the DODAG. Thus, the formation of the DODAG topology is maintained by the sink node. The RPL operations include neighborhood discovery, route generation, DAG construction, data path validation, and