
CN-121984546-A - Reinforcement learning-based method, device, and medium for cell-free massive MIMO unmanned aerial vehicle association and power control

CN121984546A

Abstract

The invention discloses a reinforcement learning-based cell-free massive MIMO unmanned aerial vehicle (UAV) association and power control method, device, and medium, and relates to the technical field of wireless communication. The method comprises: acquiring the parameters of a communication system comprising multiple UAVs and access points; constructing a cell-free massive MIMO transmission model; determining a state space, an action space, and a reward function based on the cell-free massive MIMO transmission model, and constructing separate agents for the problem of optimizing the UAVs' uplink power control coefficients and the problem of clustering UAVs with access points; training the agents with the deep deterministic policy gradient (DDPG) reinforcement learning algorithm; and using the agents, on the basis of the optimized uplink power allocation strategy and access-point UAV clustering strategy, to obtain the UAVs' uplink power control coefficients and the UAV clustering result of each access point. The invention achieves efficient joint optimization of UAV access-point association and uplink power control under rapidly changing channel conditions, and has strong practical applicability in the context of the low-altitude economy.

Inventors

  • ZHANG QI
  • JIA XURI
  • CAI SHU

Assignees

  • 南京邮电大学 (Nanjing University of Posts and Telecommunications)

Dates

Publication Date
20260505
Application Date
20251229

Claims (10)

  1. A reinforcement learning-based cell-free massive MIMO UAV association and power control method, characterized by comprising the following steps: acquiring the parameters of a communication system comprising multiple UAVs and access points; constructing a cell-free massive MIMO transmission model based on the acquired communication system parameters; based on the cell-free massive MIMO transmission model, determining a state space, an action space, and a reward function, and constructing separate agents for the problem of optimizing the UAVs' uplink power control coefficients in the system and the problem of clustering UAVs with access points; training the agents with the deep deterministic policy gradient (DDPG) reinforcement learning algorithm to optimize the UAV uplink power allocation strategy and the access-point UAV clustering strategy, respectively; and obtaining, with the agents, the UAVs' uplink power control coefficients and the UAV clustering results of all access points in the system, based on the optimized uplink power allocation strategy and access-point UAV clustering strategy.
  2. The reinforcement learning-based cell-free massive MIMO UAV association and power control method of claim 1, wherein the agents include a clustering agent and a power agent, the clustering agent and the power agent each being composed of a policy network and a value network, and each creating its own experience replay pool for storing interaction data.
  3. The reinforcement learning-based cell-free massive MIMO UAV association and power control method of claim 2, wherein training the agents with the deep deterministic policy gradient reinforcement learning algorithm uses a phased training strategy comprising respectively executing a clustering agent training phase and a power agent training phase during each training round, one round corresponding to one flight communication process completed by the UAVs along a given trajectory, wherein: in the clustering agent training phase, the power control parameters are determined from a pilot-allocation baseline, the clustering agent's access-point UAV clustering strategy is optimized, and the clustering agent obtains the UAV clustering result from the environment; in the power agent training phase, the clustering control parameters are set to the result obtained in the current round's clustering agent training phase, the power agent's UAV uplink power allocation strategy is trained, and the power agent obtains the UAVs' uplink power control coefficients in the system from the environment.
  4. The reinforcement learning-based cell-free massive MIMO UAV association and power control method of claim 1, wherein the cell-free massive MIMO transmission model includes an uplink transmission rate model of the communication system comprising multiple UAVs and access points, expressed as:

     R_k = log2(1 + SINR_k),
     SINR_k = ρ η_k (Σ_{m∈M_k} γ_{mk})² / ( ρ Σ_{k'≠k} η_{k'} (Σ_{m∈M_k} γ_{mk} β_{mk'} / β_{mk})² |φ_{k'}^H φ_k|² + ρ Σ_{k'=1}^{K} η_{k'} Σ_{m∈M_k} γ_{mk} β_{mk'} + Σ_{m∈M_k} γ_{mk} )   (1)

     wherein γ_{mk} is the variance of the estimated channel between the m-th access point and the k-th UAV; ρ is the uplink transmit power of the UAVs; η_k is the uplink power control coefficient of the k-th UAV, obtained through reinforcement learning training; M_k is the set of access points serving the k-th UAV; D_m is the set of UAVs served by the m-th access point; β_{mk} is the fading coefficient between the m-th access point and the k-th UAV; β_{mk'} is the fading coefficient between the m-th access point and the k'-th UAV; K is the total number of UAVs; φ_{k'}^H is the conjugate transpose of the pilot used by the k'-th UAV; φ_k is the pilot used by the k-th UAV; and η_{k'} is the uplink power control coefficient of the k'-th UAV.
  5. The reinforcement learning-based cell-free massive MIMO UAV association and power control method of claim 4, wherein the fading coefficient between a UAV and an access point is determined according to the line-of-sight (LoS) and non-line-of-sight (NLoS) propagation probabilities of the wireless link in the cell-free massive MIMO transmission model, expressed as:

     β_{mk} = P^{LoS}_{mk} · d_{mk}^{-α_LoS} + (1 − P^{LoS}_{mk}) · d_{mk}^{-α_NLoS}   (2)

     wherein P^{LoS}_{mk} is the probability of line-of-sight propagation between the m-th access point and the k-th UAV; α_LoS and α_NLoS are the path attenuation factors for LoS and NLoS propagation, respectively; β_{mk} is the fading coefficient between the m-th access point and the k-th UAV; and d_{mk} is the Euclidean distance between the k-th UAV and the m-th access point, calculated as:

     d_{mk} = sqrt( ‖q_k − w_m‖² + h² )   (3)

     wherein q_k is the horizontal coordinate of the k-th UAV, w_m is the horizontal coordinate of the m-th access point, and h is the flying height of the UAV.
  6. The reinforcement learning-based cell-free massive MIMO UAV association and power control method of claim 4 or 5, wherein the variance of the estimated channel is given by:

     γ_{mk} = τ ρ_p β_{mk}² / ( τ ρ_p Σ_{k'=1}^{K} β_{mk'} |φ_{k'}^H φ_k|² + 1 )   (4)

     wherein γ_{mk} is the variance of the estimated channel between the m-th access point and the k-th UAV; τ is the length of the pilot signal; ρ_p is the average transmit power of the pilot signal; β_{mk'} is the fading coefficient between the m-th access point and the k'-th UAV; φ_{k'}^H is the conjugate transpose of the pilot used by the k'-th UAV; and φ_k is the pilot used by the k-th UAV.
  7. The reinforcement learning-based cell-free massive MIMO UAV association and power control method of claim 1 or 4, wherein the state space is defined as the matrix of large-scale fading coefficients between all UAVs and access points, each element β_{mk} of the matrix representing the channel gain between the k-th UAV and the m-th access point; the action space comprises a clustering action vector a^c and a power control action vector a^p, wherein a^c_{km} represents the association degree of UAV k to access point m, and a^p_k represents the uplink power control coefficient of UAV k.
  8. The reinforcement learning-based cell-free massive MIMO UAV association and power control method of claim 1 or 4, wherein the reward function is defined as:

     r_t = Σ_{k=1}^{K} R_k(t)   (5)

     wherein K is the total number of UAVs, R_k(t) is the uplink transmission rate of the k-th UAV, and r_t is the total uplink transmission rate of the UAVs at time t; the agents are trained with the deep deterministic policy gradient reinforcement learning algorithm to optimize the UAV uplink power allocation strategy and the access-point UAV clustering strategy, respectively, the optimization problem being expressed as:

     max_{η, {M_k}} Σ_{k=1}^{K} R_k   s.t. 0 ≤ η_k ≤ 1, ∀k   (6)

     wherein η_k is the uplink power control coefficient of the k-th UAV, and |M_k| denotes the number of access points in the set serving the k-th UAV.
  9. An electronic device, comprising a processor and a storage medium, wherein the storage medium is configured to store instructions, and the processor is configured to operate according to the instructions to execute a computer program implementing the steps of the reinforcement learning-based cell-free massive MIMO UAV association and power control method of any one of claims 1 to 8.
  10. A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the reinforcement learning-based cell-free massive MIMO UAV association and power control method of any one of claims 1 to 8.
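The channel and rate model of claims 4 to 6 can be sketched numerically. The following is a minimal illustration, not the patent's implementation: it assumes orthogonal pilots, a common elevation-angle LoS-probability model as a stand-in for the unspecified P^{LoS}_{mk}, full association (every access point serves every UAV), and arbitrary illustrative values for the powers, path attenuation factors, and geometry.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- illustrative parameters (assumed, not from the patent) ---
M, K = 8, 3              # access points, UAVs
tau, rho_p = 4, 1e4      # pilot length, noise-normalized pilot power
rho = 1e4                # noise-normalized UAV uplink transmit power
h = 100.0                # UAV flying height (m)
a_los, a_nlos = 2.0, 3.5 # LoS / NLoS path attenuation factors

ap_xy = rng.uniform(0.0, 500.0, (M, 2))    # AP horizontal coordinates w_m
uav_xy = rng.uniform(0.0, 500.0, (K, 2))   # UAV horizontal coordinates q_k

# Eq.(3): Euclidean distance d_mk between AP m and UAV k
d = np.sqrt(((ap_xy[:, None, :] - uav_xy[None, :, :]) ** 2).sum(-1) + h ** 2)

# Stand-in elevation-angle LoS probability (illustrative air-to-ground model)
theta = np.degrees(np.arcsin(h / d))
p_los = 1.0 / (1.0 + 9.61 * np.exp(-0.16 * (theta - 9.61)))

# Eq.(2): large-scale fading beta_mk mixes LoS and NLoS path attenuation
beta = p_los * d ** (-a_los) + (1.0 - p_los) * d ** (-a_nlos)

# Orthogonal pilots: |phi_j^H phi_k|^2 = 1 iff j == k
pilot_corr = np.eye(K)

# Eq.(4): variance gamma_mk of the estimated channel
gamma = tau * rho_p * beta ** 2 / (tau * rho_p * (beta @ pilot_corr) + 1.0)

# Eq.(1): uplink SINR and rate, with all APs serving all UAVs and eta_k = 0.5
eta = np.full(K, 0.5)
g = gamma.sum(axis=0)               # sum_m gamma_mk
A = (gamma / beta).T @ beta         # A[k,k'] = sum_m gamma_mk * beta_mk' / beta_mk
B = gamma.T @ beta                  # B[k,k'] = sum_m gamma_mk * beta_mk'
off = 1.0 - np.eye(K)               # exclude k' = k from pilot contamination
pilot_cont = rho * ((A ** 2) * pilot_corr * off) @ eta  # zero here (orthogonal pilots)
interference = rho * (B @ eta)
sinr = rho * eta * g ** 2 / (pilot_cont + interference + g)
rate = np.log2(1.0 + sinr)          # per-UAV uplink rate R_k
print(np.round(rate, 3))
```

With orthogonal pilots the coherent pilot-contamination term vanishes, so only the non-coherent interference and the effective noise (the last denominator term) limit the rate; the reinforcement learning agents in the claims would adjust eta and the per-UAV serving sets instead of the fixed choices used here.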

Description

Reinforcement learning-based method, device, and medium for cell-free massive MIMO UAV association and power control

Technical Field

The invention relates to the technical field of wireless communication, and in particular to a reinforcement learning-based cell-free massive MIMO unmanned aerial vehicle (UAV) association and power control method, device, and medium.

Background

In recent years, fifth-generation mobile communication (5G) has been widely deployed worldwide, and research toward the full deployment of sixth-generation mobile communication (6G) for 2030 and beyond is under way. 6G inherits and integrates the characteristics of 5G, such as network densification, high throughput, low power consumption, and massive connectivity, and can support emerging multi-scenario applications including intelligent healthcare, autonomous driving, and industrial control. However, with the continued expansion of application scenarios and the continued growth in the number of users, conventional communication architectures face numerous challenges. In this context, the concept of cell-free massive multiple-input multiple-output (MIMO) has been developed, providing new ideas and directions for the further development of wireless communications. Cell-free massive MIMO breaks with the architecture of the traditional cellular network: each user equipment is served by a large number of distributed access points (APs) connected to a central processing unit (CPU). The limitation of cell boundaries is thus effectively eliminated, inter-cell interference can be greatly mitigated or even removed, the balanced distribution of nodes removes the disparity between cell-center and cell-edge users, and seamless service quality is ensured under high mobility.
With the rapid development of the low-altitude economy in China, UAVs are evolving from auxiliary equipment into key components of the network system, serving as important information acquisition and communication nodes.

Disclosure of the Invention

The invention aims to provide a reinforcement learning-based cell-free massive MIMO UAV association and power control method, device, and medium, which train on the rate expression in stages through a reinforcement learning algorithm so as to efficiently compute the optimal UAV access-point clustering relation and uplink power control coefficients. To achieve the above purpose, the invention adopts the following technical scheme.

In one aspect, the invention provides a reinforcement learning-based cell-free massive MIMO UAV association and power control method, comprising the following steps: acquiring the parameters of a communication system comprising multiple UAVs and access points; constructing a cell-free massive MIMO transmission model based on the acquired communication system parameters; based on the cell-free massive MIMO transmission model, determining a state space, an action space, and a reward function, and constructing separate agents for the problem of optimizing the UAVs' uplink power control coefficients in the system and the problem of clustering UAVs with access points; training the agents with the deep deterministic policy gradient (DDPG) reinforcement learning algorithm to optimize the UAV uplink power allocation strategy and the access-point UAV clustering strategy, respectively; and obtaining, with the agents, the UAVs' uplink power control coefficients and the UAV clustering results of all access points in the system, based on the optimized uplink power allocation strategy and access-point UAV clustering strategy.

Optionally, the agents include a clustering agent and a power agent, each composed of a policy network and a value network, and each creating its own experience replay pool for storing interaction data. Training the agents by reinforcement learning does not depend on an analytic expression of the model; the optimal action strategy is learned through continuous interaction with the environment, so that adaptive optimization of access-point association and uplink power control can be achieved under rapidly changing channel states.

Optionally, training the agents with the deep deterministic policy gradient reinforcement learning algorithm uses a phased training strategy comprising respectively executing a clustering agent training phase and a power agent training phase
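The DDPG machinery named above (per-agent policy and value networks, per-agent experience replay pools, and soft-updated target networks) can be sketched as follows. This is a toy illustration under assumptions, not the patent's architecture: the networks are single linear layers with a tanh output, the state is a randomly drawn stand-in for the flattened fading matrix, and the reward is a random stand-in for the total uplink rate.

```python
import numpy as np

rng = np.random.default_rng(1)

class ReplayPool:
    """Experience replay pool storing (state, action, reward, next_state) tuples."""
    def __init__(self, capacity):
        self.capacity, self.data = capacity, []
    def push(self, transition):
        if len(self.data) >= self.capacity:   # drop the oldest when full
            self.data.pop(0)
        self.data.append(transition)
    def sample(self, batch):
        idx = rng.choice(len(self.data), size=batch, replace=False)
        return [self.data[i] for i in idx]

class LinearNet:
    """Toy stand-in for the policy (actor) / value (critic) neural networks."""
    def __init__(self, n_in, n_out):
        self.W = rng.normal(0.0, 0.1, (n_out, n_in))
    def __call__(self, x):
        return np.tanh(self.W @ x)            # bounded output (rescale to [0,1] for eta_k)

def soft_update(target, source, tau=0.005):
    """DDPG target-network tracking: W_target <- tau*W_source + (1-tau)*W_target."""
    target.W = tau * source.W + (1.0 - tau) * target.W

state_dim, act_dim = 24, 3   # e.g. an M*K fading matrix flattened; K power coefficients
actor, actor_tgt = LinearNet(state_dim, act_dim), LinearNet(state_dim, act_dim)
critic, critic_tgt = LinearNet(state_dim + act_dim, 1), LinearNet(state_dim + act_dim, 1)
pool = ReplayPool(capacity=1000)             # one pool per agent, as in claim 2

# One interaction step: observe the fading state, act with exploration noise, store.
s = rng.normal(size=state_dim)
a = actor(s) + 0.1 * rng.normal(size=act_dim)   # exploration noise on the action
r = float(rng.random())                          # stand-in reward: total uplink rate
s2 = rng.normal(size=state_dim)
pool.push((s, a, r, s2))
soft_update(actor_tgt, actor)
```

In the phased scheme of claim 3, the clustering agent and the power agent would each own one such actor/critic/pool set and take turns interacting with the environment within a training round, the power agent reusing the clustering result produced earlier in the same round.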