US-20260129449-A1 - A SYSTEM AND METHOD FOR CHANNEL ACCESS IN OPPORTUNISTIC REINFORCEMENT LEARNING-BASED 802.11 NETWORKS
Abstract
The invention relates to an opportunistic reinforcement learning-based system developed for channel access and selection in 802.11 networks, which allows users to improve the quality of service received from the network and a method which enables the system to operate.
Inventors
- Mehmet ARIMAN
- Lal Verda ÇAKIR
- Mehmet ÖZDEM
- Berk Canberk
- Gökhan YURDAKUL
Assignees
- BTS KURUMSAL BILISIM TEKNOLOJILERI ANONIM SIRKETI
Dates
- Publication Date
- 20260507
- Application Date
- 20231229
- Priority Date
- 20231222
Claims (5)
- 1 . An opportunistic reinforcement learning-based system with computer-aided machine learning that includes at least one processor, which is developed for channel access and selection in 802.11 networks and allows users to improve the quality of service received from the network, characterized in that it comprises: at least one device ( 1 ) which is located in an 802.11 network and communicates over that network, at least one channel selection controller ( 2 ) based on opportunistic reinforcement learning, which provides data transmission in wireless communication and performs channel selection between the networks, at least one software module ( 3 ), which is a deep Q network (DQN) agent performing action selection to carry out the channel selection, at least one rule network module ( 4 ), which is a deep neural network that takes a medium status as input and estimates the probabilities for each action, at least one destination network module ( 5 ), which prevents the evaluation of the updated network from being distorted by the successive application of the actions to the medium, at least one optimization module ( 6 ), which allows the weights of the destination network module ( 5 ) to be optimized, at least one data storage unit ( 7 ), which is an experience memory unit in which the actions taken by the software module ( 3 ), which is a DQN agent, the rewards obtained, and the states returned by the medium in response to the actions are recorded, at least one reward calculation module ( 8 ), which calculates the success (reward) of the channel (action) to be selected considering the channel density in data transmission, and at least one status module ( 9 ), which generates status data using the device's ( 1 ) location data in two and three dimensions, timestamp data, and the signal values read from the channels.
- 2 . A method for operating an opportunistic reinforcement learning-based system with computer-aided machine learning that includes at least one processor, which is developed for channel access and selection in 802.11 networks and allows users to improve the quality of service received from the network, characterized in that it comprises the steps of: initiating a training process by the software module ( 3 ), which is a deep Q network agent, and creating the rule network module ( 4 ), the destination network module ( 5 ) and the data storage unit ( 7 ) ( 1000 ), determining the network training repetition limit and setting the network training repetition counter to 0 ( 1001 ), checking whether the network training counter value is greater than the network training repetition limit ( 1002 ), generating status data (s_t) by the status module ( 9 ), inputting the status data (s_t) to the rule network module ( 4 ), selecting an action by exploration or exploitation, and transmitting the selected action data to the medium ( 1003 ), calculating, by the reward calculation module ( 8 ), the reward received in response to the applied action ( 1004 ), generating new status data by the status module ( 9 ) ( 1005 ), recording the status, action, reward and new status data in the data storage unit ( 7 ) and increasing the network training repetition counter by 1 ( 1006 ), checking the number of samples in the data storage unit ( 7 ) ( 1007 ), if there are enough samples, taking a batch of samples from the data storage unit ( 7 ) ( 1008 ), inputting the taken samples to the rule network module ( 4 ) and the destination network module ( 5 ) ( 1009 ), calculating the mean squared difference using the outputs of the rule network module ( 4 ) and the destination network module ( 5 ) ( 1010 ), transferring the mean squared difference to the optimization module ( 6 ) and updating the rule network module ( 4 ) in the optimization module ( 6 ) ( 1011 ), checking whether the training of the rule network module ( 4 ) has been completed and the network training repetition limit has been reached ( 1012 ), if it has been completed, transferring the weights in the rule network module ( 4 ) to the destination network module ( 5 ) ( 1013 ), or, if the training has not been completed, returning to checking whether the network training counter value is greater than the network training repetition limit ( 1002 ), and completing the training of the software module ( 3 ), which is the deep Q network agent, and performing new channel selection and access via the destination network module ( 5 ) by the controller ( 2 ) ( 1014 ).
- 3 . A method as claimed in claim 2 , characterized by comprising randomly selecting an action by exploration, or selecting the action with the highest probability among the actions produced by the destination module ( 5 ) in response to the input of the status information.
- 4 . A method as claimed in claim 3 , characterized by calculating the mean squared difference using the outputs of the rule network module ( 4 ) and the destination network module ( 5 ) according to the Mean Squared Error (MSE) formula.
- 5 . A method as claimed in claim 4 , characterized in that, upon completion of the training phase, the destination network module ( 5 ) and the software module ( 3 ), which is a deep Q network agent, operate in an inference mode.
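The training loop recited in claims 2 through 5 follows the standard DQN pattern: epsilon-greedy action selection, an experience memory, a policy ("rule") network trained against a periodically synchronized target ("destination") network with an MSE loss. The following is a minimal sketch of steps 1000 to 1014, assuming tiny linear networks in place of the deep networks, hypothetical hyperparameters (`EPS`, `GAMMA`, `LR`, `BATCH`, `LIMIT`), and a synthetic medium; it illustrates the claimed flow and is not the patented implementation.

```python
import random
from collections import deque
import numpy as np

rng = np.random.default_rng(0)

N_CHANNELS = 3   # actions: candidate 802.11 channels (assumption)
STATE_DIM = 5    # e.g. x, y location, timestamp, per-channel signal (assumption)

class LinearQNet:
    """Tiny linear stand-in for the rule (4) / destination (5) networks."""
    def __init__(self):
        self.W = rng.normal(0, 0.1, (STATE_DIM, N_CHANNELS))
    def q_values(self, s):
        return s @ self.W
    def copy_from(self, other):
        self.W = other.W.copy()

def reward(action, channel_load):
    """Module (8) sketch: higher reward for less dense channels (assumption)."""
    return 1.0 - channel_load[action]

rule_net = LinearQNet()      # module (4)
dest_net = LinearQNet()      # module (5), target network
dest_net.copy_from(rule_net)
memory = deque(maxlen=1000)  # data storage unit (7), step 1000

EPS, GAMMA, LR, BATCH, LIMIT = 0.2, 0.9, 0.01, 16, 200  # step 1001

s = rng.normal(size=STATE_DIM)             # status module (9) output
for step in range(LIMIT):                  # counter check, step 1002
    # Step 1003: epsilon-greedy selection (exploration vs. exploitation).
    if rng.random() < EPS:
        a = int(rng.integers(N_CHANNELS))
    else:
        a = int(np.argmax(rule_net.q_values(s)))
    load = rng.random(N_CHANNELS)          # synthetic per-channel density
    r = reward(a, load)                    # step 1004
    s_next = rng.normal(size=STATE_DIM)    # step 1005
    memory.append((s, a, r, s_next))       # step 1006
    if len(memory) >= BATCH:               # steps 1007-1008
        batch = random.sample(list(memory), BATCH)
        for bs, ba, br, bs2 in batch:      # steps 1009-1011
            target = br + GAMMA * np.max(dest_net.q_values(bs2))
            q = rule_net.q_values(bs)
            grad = np.zeros_like(rule_net.W)
            grad[:, ba] = 2.0 * (q[ba] - target) * bs  # gradient of MSE
            rule_net.W -= LR * grad        # optimization module (6)
    s = s_next
dest_net.copy_from(rule_net)               # step 1013: weight transfer
```

After the final weight transfer the destination network can serve inference-mode channel selection as in claim 5, by taking the argmax of its Q-values for the current status vector.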
Description
TECHNICAL FIELD
The invention relates to an opportunistic reinforcement learning-based system developed for channel access and selection in 802.11 networks, which allows users to improve the quality of service received from the network, and a method which enables the system to operate.
STATE OF THE ART
802.11 networks are based on the IEEE 802.11 standard. This standard defines data transfer, network security, and other related characteristics of wireless communications. Because 802.11 networks are widely used, the communication medium becomes crowded and, ultimately, the quality of service received from the network by the users decreases over time. For the effective use of resources in 802.11 networks in the state of the art, the density should be distributed finely and evenly across the channels. Different channels need to be assigned to different devices located in close proximity; this problem is similar to the vertex coloring problem in graphs. Said problem is also called the k-coloring problem and is NP-hard to solve. This NP-hardness (Non-Deterministic Polynomial-Time Hardness) also applies to channel selection. To overcome the computational complexity caused by the variable nature of the wireless medium, Access Point (AP) vendor scoring systems, vertical frequency selection, and selection-based algorithms are utilized. In particular, if the APs are produced by the same manufacturer, improvements in computation and channel density and an increase in efficiency may be observed. However, the channel selection problem persists for the other 802.11 networks located in the same areas. Another problem arising from channel density is interference. In dense network regions, the data conflicts that occur when many devices try to use the channel at the same time are called interference.
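The channel-assignment analogy above can be made concrete: assigning non-conflicting channels to neighbouring access points is an instance of vertex coloring, which is NP-hard to solve exactly, so greedy heuristics are common in practice. The sketch below is a hypothetical illustration of such a heuristic; the function name, the interference graph, and the channel list (1, 6, 11, the classic non-overlapping 2.4 GHz channels) are assumptions for the example, not part of the invention.

```python
def greedy_channel_assignment(interference_graph, channels):
    """Greedy vertex coloring: interfering APs receive different channels.

    interference_graph: dict mapping an AP name to the set of APs it
    interferes with (assumed symmetric). Returns a dict AP -> channel,
    or raises ValueError if the given channels do not suffice.
    """
    assignment = {}
    # Color the most-constrained (highest-degree) vertices first.
    for ap in sorted(interference_graph, key=lambda a: -len(interference_graph[a])):
        used = {assignment[n] for n in interference_graph[ap] if n in assignment}
        free = [c for c in channels if c not in used]
        if not free:
            raise ValueError(f"no free channel for {ap}")
        assignment[ap] = free[0]
    return assignment

# Hypothetical topology: AP1 interferes with both AP2 and AP3.
graph = {"AP1": {"AP2", "AP3"}, "AP2": {"AP1"}, "AP3": {"AP1"}}
result = greedy_channel_assignment(graph, [1, 6, 11])
print(result)  # AP2 and AP3 may share a channel; neither shares with AP1
```

A greedy pass like this avoids the exponential cost of exact k-coloring but can fail or waste channels on dense graphs, which is one motivation for the learning-based controller of the invention.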
This may degrade wireless network performance, slow down connection speeds, or cause connections to drop. To avoid this, there are many methods based on transmit power control. These include centralized mechanisms, in which data is sent to a central controller, and distributed mechanisms, in which different network users exchange data among themselves and decide on the appropriate values. However, the fact that these mechanisms require the access points to be controlled by a common structure reduces their applicability. In order to eliminate the above-mentioned disadvantages of the state of the art, new systems and methods need to be developed.
SUMMARY OF THE INVENTION
The present invention relates to an opportunistic reinforcement learning-based system developed for channel access and selection in 802.11 networks, which allows users to improve the quality of service received from the network, and a method which enables the system to operate, in order to eliminate the above-mentioned disadvantages and provide the relevant technical field with new advantages. The invention detects the channel expected to have the minimum density among the channels by collecting data from the medium, using an opportunistic reinforcement learning-based system and a method that ensures the operation of the system, and thereby increases the quality of service received from the network(s) by the users. The system and method of the invention allow the surrounding networks, which may operate in the 802.11 network and communicate in the same medium, to evaluate the opportunities arising from their operating mechanisms. The opportunistic reinforcement learning-based channel selection controller included in the system of the invention reduces the computational complexity in the networks and increases the rate of correct channel selection.
DESCRIPTION OF THE DRAWINGS
Embodiments of the present invention, which are briefly summarized above and discussed in more detail below, may be understood by referring to the exemplary embodiments of the invention illustrated in the accompanying drawings. It should be noted, however, that the accompanying drawings only depict typical embodiments of this invention and are not to be construed as limiting its scope.
FIG. 1 is a representative view of a diagram showing the operating principle of the invention.
FIG. 2 is a representative view of a diagram showing the operating principle of the method according to the invention.
DESCRIPTION OF THE REFERENCES IN THE DRAWINGS
In order to provide a better understanding of the invention, the numerals in the drawings are listed below:
1. Device
2. Controller
3. Software Module
4. Rule Network Module
5. Destination Network Module
6. Optimization Module
7. Data Storage Unit
8. Calculation Module
9. Status Module
1000. Initiating a training process by the software module, which is a deep Q network agent, and creating the rule network module, the destination network module and the data storage unit
1001. Determining the network training repetition limit and equating the network training repetition counter to 0
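The status module ( 9 ) is described as combining the device's location data, timestamp data, and per-channel signal values into the status used by the networks. A minimal sketch of such a status vector is shown below; the function name, the fixed ordering of the fields, and the use of a seconds-of-day timestamp are all assumptions made for illustration only.

```python
import time
import numpy as np

def build_status(x, y, z, rssi_per_channel):
    """Status module (9) sketch (assumption): concatenate 2D/3D position,
    a timestamp feature, and per-channel signal readings (e.g. RSSI in dBm)
    into a single state vector for the rule/destination networks."""
    timestamp = time.time() % 86400  # seconds since midnight (assumption)
    return np.array([x, y, z, timestamp] + list(rssi_per_channel))

# Hypothetical device at (1.0, 2.0, 0.0) observing three channels.
s = build_status(1.0, 2.0, 0.0, [-40.0, -62.0, -71.0])
```

In a real deployment the fields would likely be normalized before being fed to the networks; this sketch only shows the composition of the state.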