CN-121986535-A - Apparatus and method for multi-target radio resource allocation


Abstract

The present disclosure relates to resource allocation in a wireless communication network. The present disclosure proposes a network device for efficient multi-target radio resource allocation for user equipment. The network device includes a preference module and a policy module. The preference module is to determine a global preference vector, wherein the global preference vector describes an overall weight of each network performance indicator in a set of network performance indicators of one or more user devices, and provide the global preference vector to the policy module. The policy module is to obtain a global preference vector from the preference module and to make a resource allocation decision based on the global preference vector.

Inventors

Aiman Shuayak
Dimitris Tislimontos
Theodore Ross Giannacas
Babakar Toure

Assignees

Huawei Technologies Co., Ltd. (华为技术有限公司)

Dates

Publication Date: 2026-05-05
Application Date: 2023-10-18

Claims (20)

1. A network device (100) for allocating resources to one or more user devices (200), characterized in that the network device (100) comprises a preference module (110) and a policy module (120), wherein the preference module (110) is configured to: determine a global preference vector (111), wherein the global preference vector (111) describes an overall weight of each network performance indicator in a set of network performance indicators of the one or more user devices (200); and provide the global preference vector (111) to the policy module (120); and wherein the policy module (120) is configured to: obtain the global preference vector (111) from the preference module (110); and make a resource allocation decision (121) based on the global preference vector (111).
2. The network device (100) of claim 1, wherein the preference module (110) is configured to: obtain one or more preference vectors (201) from the one or more user devices (200), wherein each preference vector (201) describes a weight of each network performance indicator in a set of network performance indicators of one of the one or more user devices (200), and generate the global preference vector (111) based on the one or more preference vectors (201); or obtain the global preference vector (111) from a network operator.
3. The network device (100) according to claim 1 or 2, wherein the policy module (120) is further configured to: acquire one or more local states of the one or more user devices (200); and make the resource allocation decision (121) based on the global preference vector (111) and the one or more local states of the one or more user devices (200).
4. The network device (100) according to claim 3, wherein the policy module (120) is further configured to: determine a first neural network model; obtain a set of output values by inputting the global preference vector (111) and the one or more local states of the one or more user devices (200) to the first neural network model; and make the resource allocation decision (121) based on the set of output values.
5. The network device (100) of claim 4, wherein the policy module (120) is further configured to: make the resource allocation decision (121) by randomly selecting output values from the set of output values according to a probability distribution of the set of output values; or make the resource allocation decision (121) by selecting, from the set of output values, an output value having a preset value.
6. The network device (100) according to any one of claims 1 to 5, wherein the network device (100) further comprises a training module (130), the training module (130) being configured to: determine a second neural network model; and determine, using the second neural network model, whether to update a first neural network model and/or the second neural network model used by the policy module (120).
7. The network device (100) of claim 6, wherein the preference module (110) is further configured to: determine a set of global preference vectors (111); obtain a set of preference samples (112) by sampling the set of global preference vectors (111); and provide the set of preference samples (112) to the training module (130); and wherein the training module (130) is further configured to: receive the set of preference samples (112) from the preference module (110).
8. The network device (100) according to claim 6 or 7 and claim 4 or 5, characterized in that the policy module (120) is further configured to: provide one or more local states of the one or more user devices (200) and one or more actions to the training module (130), wherein each action is a result of the resource allocation decision (121); and the training module (130) is further configured to: store the one or more local states and at least one action in a buffer.
9. The network device (100) of any of claims 2 to 8, wherein the policy module (120) is further configured to: acquire, after making the resource allocation decision (121), one or more next local states of the one or more user devices (200); and/or obtain at least one reward after making the resource allocation decision (121).
10. The network device (100) of claim 9 and any of claims 6 to 8, wherein the policy module (120) is further configured to: provide the one or more next local states of the one or more user devices (200) and/or the at least one reward to the training module (130); and the training module (130) is further configured to: store the one or more next local states and/or the at least one reward in the buffer.
11. The network device (100) of claim 10, wherein the training module (130) is further configured to: generate a set of transition tuples by sampling the information stored in the buffer; train the first neural network model of the policy module (120) using the set of transition tuples and the set of preference samples; and/or train the second neural network model using the set of transition tuples and the set of preference samples.
12. The network device (100) of claim 11, wherein the training module (130) is further configured to: use one or more loss functions to update one or more parameters associated with the first neural network model and/or the second neural network model.
13. The network device (100) of claim 12, wherein the training module (130) is further configured to: provide the one or more parameters associated with the first neural network model to the policy module (120); and the policy module (120) is further configured to: update the first neural network model based on the one or more parameters.
14. The network device (100) of any one of claims 4 to 13, wherein the first and second neural network models form a reinforcement learning model.
15. The network device (100) of any of claims 7 to 14, wherein the reinforcement learning model is based on a policy gradient method.
16. The network device (100) of any of claims 2 to 15, wherein the preference module (110) is configured to: generate the global preference vector (111) based on the one or more preference vectors (201) using at least one solution selected from a negotiation solution, a mean function, a game theory solution, a priority solution, or a neural network.
17. The network device (100) of any one of claims 1 to 16, wherein the set of network performance indicators comprises one or more of rate, throughput, latency, reliability, energy efficiency, fairness, and network coverage.
18. A user equipment (200) for assisting in allocating resources to one or more user equipments (200), characterized in that the user equipment (200) is adapted to: provide a preference vector (201) associated with the user equipment (200) to a network device (100), wherein the preference vector (201) describes a weight of each network performance indicator in a set of network performance indicators of one of the one or more user equipments (200).
19. A method for allocating resources to one or more user equipments (200), the method comprising: determining a global preference vector (111), wherein the global preference vector (111) describes an overall weight of each network performance indicator in a set of network performance indicators of the one or more user equipments (200); and making a resource allocation decision (121) based on the global preference vector (111).
20. A method for assisting in allocating resources to one or more user equipments (200), the method comprising: providing a preference vector (201) associated with a user equipment (200) to a network device (100), wherein the preference vector (201) describes a weight of each network performance indicator in a set of network performance indicators of one of the one or more user equipments (200).
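
As an illustration of the mean-function option named in claim 16, the following is a minimal sketch of how per-device preference vectors (201) might be combined into a global preference vector (111). The function name `aggregate_mean`, the renormalization so that the weights sum to 1, and the example weights are assumptions made for illustration; they are not taken from the claims.

```python
from typing import List, Sequence


def aggregate_mean(preference_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-device preference vectors into one global vector.

    Hypothetical helper: takes the element-wise mean over all devices and
    renormalizes so the weights sum to 1 (the renormalization is an
    assumption, not something the claims require).
    """
    if not preference_vectors:
        raise ValueError("at least one preference vector is required")
    n_kpis = len(preference_vectors[0])
    if any(len(v) != n_kpis for v in preference_vectors):
        raise ValueError("all preference vectors must cover the same KPIs")

    n_devices = len(preference_vectors)
    # Element-wise mean: one averaged weight per KPI.
    mean = [sum(v[i] for v in preference_vectors) / n_devices for i in range(n_kpis)]

    total = sum(mean)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in mean]


# Illustrative weights ordered as (throughput, latency, reliability).
throughput_heavy_device = [0.8, 0.1, 0.1]
latency_reliability_device = [0.2, 0.4, 0.4]
global_preference = aggregate_mean([throughput_heavy_device, latency_reliability_device])
print(global_preference)
```

The other options listed in claim 16 (negotiation, game theory, priority, or a neural network) would replace the averaging step and are not sketched here.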

Description

Apparatus and method for multi-target radio resource allocation

Technical Field

The present disclosure relates to wireless communication networks, and more particularly to resource allocation in wireless communication networks. To allocate radio resources to devices with different performance requirements, the present disclosure proposes a network device and a method for efficient multi-target radio resource allocation for user equipment (UE) in a wireless network.

Background

In existing wireless networks, an access point (AP) is connected to a very large number of devices. One of the most important tasks of an AP is to allocate radio resources to the connected devices so that they can communicate with the outside world. For this purpose, the AP relies on state information of the devices (e.g., queue length, channel gain) to optimize its performance. Typically, this is achieved by optimizing one of the many important performance indicators of the wireless network, commonly referred to as key performance indicators (KPIs), such as rate, latency, reliability, energy efficiency, and network coverage. The goal of the operator is therefore to devise a resource allocation policy, essentially a function that maps state information to actions (resource allocation decisions), that performs well on the target KPIs.

However, this approach is becoming outdated because next-generation wireless networks are expected to support a variety of services, such as enhanced mobile broadband (eMBB) applications and ultra-reliable low-latency communications (URLLC). Importantly, these services may need to perform well on different KPIs. For example, eMBB traffic requires a high-throughput connection, whereas URLLC traffic requires robust data exchange with stringent reliability and latency requirements.
To support these services, a key requirement for an AP is the ability to perform well on more than one KPI/target, and possibly on a combination of several. One convenient way to model priorities in the target space is to introduce preferences (linear weights). These preferences describe the relative importance of the different KPIs. A new and challenging problem is therefore how to design a resource allocation strategy under a so-called multi-objective optimization paradigm, that is, how to construct strategies that perform well under a number of different services (or even combinations of them).

Disclosure of Invention

In view of the above challenges, the present disclosure is directed to providing a scheme for efficient multi-target radio resource allocation in a wireless network. In particular, the objective is to devise a single resource allocation strategy that is optimized for any relative importance among multiple objectives. Another object is to provide a flexible solution that offers a single policy across the entire preference space, thereby achieving optimal performance for a variety of different preference vectors.

These and other objects are achieved by the solutions of the present disclosure provided in the independent claims. Advantageous implementations are further defined in the dependent claims.

A first aspect of the present disclosure provides a network device for allocating resources to one or more user devices, the network device comprising a preference module and a policy module. The preference module is to determine a global preference vector, wherein the global preference vector describes an overall weight of each network performance indicator in a set of network performance indicators of one or more user devices, and provide the global preference vector to the policy module.
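
The linear preference weights introduced earlier in this section can be illustrated with a short sketch of weighted-sum scalarization, in which a preference vector turns several KPI values into a single score. The function name `scalarize`, the KPI ordering, and all numeric values are hypothetical; the sketch also assumes every KPI has been normalized to [0, 1] with higher meaning better (so latency is expressed as a score rather than a delay).

```python
from typing import Sequence


def scalarize(kpis: Sequence[float], preference: Sequence[float]) -> float:
    """Return the weighted sum of KPI values under a linear preference vector."""
    if len(kpis) != len(preference):
        raise ValueError("kpis and preference must have the same length")
    if any(w < 0 for w in preference):
        raise ValueError("preference weights must be non-negative")
    return sum(w * k for w, k in zip(preference, kpis))


# Hypothetical normalized KPI scores: (throughput, latency score, reliability).
kpi_scores = [0.9, 0.4, 0.7]

# Two illustrative preference vectors: one weights throughput heavily
# (eMBB-like), the other weights latency and reliability (URLLC-like).
embb_like = [0.8, 0.1, 0.1]
urllc_like = [0.1, 0.5, 0.4]

print(scalarize(kpi_scores, embb_like))
print(scalarize(kpi_scores, urllc_like))
```

The same KPI outcome scores differently under the two preference vectors, which is why a single policy intended to cover the whole preference space would need the preference vector as an input.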
The policy module is to obtain the global preference vector from the preference module and to make a resource allocation decision based on the global preference vector. The present disclosure thus enables a network device to make resource allocation decisions based on a global preference vector designed to steer the AP (i.e., the network device) toward good performance on the desired KPIs.

In one implementation of the first aspect, the preference module is configured to obtain one or more preference vectors from one or more user devices, wherein each preference vector describes a weight of each network performance indicator in a set of network performance indicators for one of the one or more user devices, and to generate the global preference vector based on the one or more preference vectors, or to obtain the global preference vector from a network operator. For example, individual device preferences may be received from each associated device, and the network device then maps the device preferences to the AP preferences (i.e., the global preference vector). In another implementation, the global preference vector may be obtained directly from the network operator, for example by monitoring network conditions and daily/hourly traffic curves