CN-122028158-A - LDACS station network power optimization method and system based on reinforcement learning

CN 122028158 A

Abstract

The invention discloses an LDACS station network power optimization method and system based on reinforcement learning, belonging to the technical field of aviation communication network resource management and intelligent decision making. The method comprises the following steps: S1, problem formalization and mathematical modeling; S2, constructing a Markov decision process model of the network power decision; S3, training the reinforcement learning agent; S4, applying the trained agent to make power decisions; S5, post-processing and output of the scheme. By adaptively regulating the physical parameter of ground station transmitting power, the LDACS station network power optimization method and system based on reinforcement learning change the propagation distance of radio signals in space, thereby improving the coverage performance and energy efficiency indexes of the aviation communication network.

Inventors

  • WANG ZHIPENG
  • HUANG SIQI
  • ZHU YANBO
  • GUO KAI

Assignees

  • Beihang University (Beijing University of Aeronautics and Astronautics)

Dates

Publication Date
2026-05-12
Application Date
2026-03-19

Claims (8)

  1. An LDACS station network power optimization method based on reinforcement learning, characterized by comprising the following steps: S1, performing problem formalization and mathematical modeling, namely performing mathematical modeling of the LDACS ground network power allocation problem, and defining the decision variables, objective function and constraint conditions; S2, constructing a Markov decision process model of the network power decision, namely formulating the power allocation problem as a sequential decision process based on the mathematical modeling of S1, and defining the four core elements of state space, action space, state transition function and reward function; S3, training the reinforcement learning agent, namely constructing a deep reinforcement learning algorithm framework, and learning the optimal power allocation strategy through interactive training of the agent with a simulation environment; S4, performing power decisions with the trained agent, namely deploying the training-converged reinforcement learning agent model as a power decision engine, and performing power optimization on a specific network instance; and S5, carrying out scheme post-processing and output, namely carrying out local optimization, verification and final configuration of the power configuration scheme output by the agent.
  2. The LDACS station network power optimization method based on reinforcement learning according to claim 1, wherein S1 specifically comprises the following steps:
     S11, defining the network as containing N ground stations, denoted as a set G = {g_1, g_2, ..., g_N}; the i-th ground station g_i has a known geographical location (x_i, y_i) and a pre-assigned operating frequency f_i; defining the M key waypoints to be covered in the network, recorded as a set W = {w_1, w_2, ..., w_M}; the geographic position of the j-th key waypoint w_j is (u_j, v_j);
     S12, defining the transmission power of each ground station g_i as a level p_i selected from a discrete set of power levels {0, 1, ..., L}, wherein level 0 indicates that the station is turned off and L is the highest power level; the power configurations of all stations constitute a decision vector P = (p_1, p_2, ..., p_N);
     S13, according to the wireless propagation model, the effective coverage radius R_i(p_i) of ground station g_i at power level p_i is determined by a link budget equation; using the free-space propagation model as the basis, the path loss PL (in dB) as a function of distance d (in km) and frequency f (in MHz) is: PL(d, f) = 32.45 + 20 lg d + 20 lg f; the coverage indication function c_ij(p_i) of each ground station g_i for each key waypoint w_j is: c_ij(p_i) = 1 if d_ij ≤ R_i(p_i), and c_ij(p_i) = 0 otherwise, where d_ij is the geographic distance between g_i and w_j;
     S14, defining the overall coverage rate C(P) of the network over all waypoints as the proportion of waypoints covered by at least one active ground station: C(P) = (1/M) · Σ_{j=1}^{M} I( Σ_{i=1}^{N} c_ij(p_i) ≥ 1 ), where I(·) is an indication function; the total transmit power of the network is P_total(P) = Σ_{i=1}^{N} w_i · P_tx(p_i), where w_i is a weight coefficient with value 1;
     S15, formulating the power allocation problem as a constrained optimization problem: minimize P_total(P), subject to C(P) ≥ C_min and p_i ∈ {0, 1, ..., L} for all i, where C_min is a preset overall coverage threshold.
  3. The LDACS station network power optimization method based on reinforcement learning according to claim 2, wherein S2 specifically comprises the following steps:
     S21, state space S: a state vector s_t represents the overall condition of the network at a given decision moment; the state vector includes the current power level encoding of the N ground stations, the current overall coverage rate of the network over the key waypoints, and abstract information obtained by feature extraction of the currently uncovered waypoints; the feature extraction gathers the waypoints not covered by any ground station into K clusters using a clustering algorithm, and takes the center coordinates of each cluster and the number of waypoints it contains as features, thereby encoding the variable-length uncovered-waypoint information into a fixed-length feature vector;
     S22, action space A: the action space defines the operations the agent can execute at each step; each action corresponds to one adjustment of the transmitting power of a specific ground station, namely increasing the power of the station by one predefined level, reducing it by one level, or keeping the current level unchanged; the action space is of scale 2N + 1;
     S23, state transition: after the agent executes an action, the coverage areas of all ground stations in the network are recalculated according to the wireless propagation model and the covered state of the waypoints is updated, so that the environment transitions to the next network state;
     S24, reward function: the reward function evaluates the instantaneous quality of the action executed by the agent, and its mathematical expression is: r_t = -α·ΔP_t + β·ΔC_t - γ·max(0, C_min - C_t); in the formula, t indexes the training iteration step; r_t is the instant reward obtained by the agent at time step t; α, β and γ are the weight coefficients of the three objectives, used respectively to balance energy saving, coverage improvement and coverage penalty; ΔP_t is the total power variation; C_t is the overall coverage rate achieved by the current network; ΔC_t is the coverage gain; max(0, C_min - C_t) is the penalty term for coverage failing to reach the threshold; P_t is the current power configuration vector; and a_t is the action executed by the agent at time step t.
  4. The LDACS station network power optimization method based on reinforcement learning according to claim 3, wherein S3 specifically comprises the following steps:
     S31, constructing the deep reinforcement learning agent, namely adopting a proximal policy optimization (PPO) algorithm framework; the agent comprises two core networks: a policy network π_θ, which takes the state s_t as input and outputs a probability distribution over the action space A, the network adopting a multi-layer perceptron or an encoder combined with an attention mechanism; and a value network V_φ, which takes the state s_t as input and assesses the long-term expected return of that state;
     S32, interaction and experience collection, namely placing the agent in the simulation environment constructed from S1 and S2; in each training round the environment is initialized to a default state, and the agent observes the current state s_t and samples an action a_t according to its current policy network and executes it; after receiving the action, the environment updates the power configuration, calculates the new coverage rate, transitions to the next state s_{t+1}, and computes the instant reward r_t; the experience of this interaction (s_t, a_t, r_t, s_{t+1}) is stored in an experience replay buffer;
     S33, model updating and learning, namely periodically sampling a batch of historical experience data from the experience replay buffer to update the policy network and the value network of the agent; the policy-gradient-based reinforcement learning algorithm ensures training stability by limiting the amplitude of each policy update, and its objective function L^CLIP(θ) is: L^CLIP(θ) = E_t[ min( r_t(θ)·Â_t, clip(r_t(θ), 1-ε, 1+ε)·Â_t ) ], with r_t(θ) = π_θ(a_t|s_t) / π_θold(a_t|s_t); in the formula, r_t(θ) is the probability ratio of the new and old policies selecting action a_t in state s_t; π_θ is the current policy network; π_θold is the old policy network; E_t is the empirical expectation operator; Â_t is the advantage function estimate; clip(·) is the clipping function; θ is the policy network parameter; and ε is the clipping hyperparameter;
     S34, iterative training, namely the agent gradually learning the power allocation strategy through a large number of rounds of iterative training.
  5. The LDACS station network power optimization method based on reinforcement learning according to claim 4, wherein S4 specifically comprises: deploying the reinforcement learning agent model after training convergence as a power decision engine; when power optimization is required for an LDACS ground network, inputting the network parameters, letting the decision engine simulate the decision process of the agent, starting from an initial state, automatically generating a series of power adjustment instructions until the network state meets a termination condition, and outputting the optimized power configuration scheme.
  6. The LDACS station network power optimization method based on reinforcement learning according to claim 5, wherein S5 specifically comprises: performing an iterative local search with a greedy strategy, traversing the ground stations in working state and reducing the power level of each on the premise of not reducing the coverage rate; verifying the coverage performance of the final scheme; and generating a power configuration instruction list for each station.
  7. An LDACS station network power optimization system based on reinforcement learning, applied to the LDACS station network power optimization method based on reinforcement learning as set forth in any one of claims 1 to 6, characterized in that a three-layer architecture design is adopted, comprising: an application layer, provided with a network planning management terminal that offers a human-machine interaction interface for the user and is used to start optimization tasks, configure network parameters, monitor the training and decision-making processes, and inspect and confirm the optimization results; a service layer, comprising a model training server, a decision reasoning server and a simulation calculation server, wherein the model training server runs the reinforcement learning training service, constructs the simulation training environment according to the input network parameters, and executes the training process of the reinforcement learning agent, including the policy network, the value network, the experience replay buffer and the model optimizer, the decision reasoning server runs the trained agent as the power decision engine, and the simulation calculation server provides network coverage simulation and verification; and a data and resource layer, comprising a network parameter database for storing the configured network parameters, waypoint information, power levels and propagation model parameters of the ground stations, and a computing resource pool providing elastic computing power support for the computing tasks of the service layer.
  8. The LDACS station network power optimization system based on reinforcement learning according to claim 7, characterized in that the output of the network planning management terminal is connected to the model training server, the decision reasoning server and the simulation calculation server; the model training server is bidirectionally connected with the simulation calculation server for the interaction of training data and the simulation environment; the output of the model training server is connected to the decision reasoning server; the output of the decision reasoning server is verified by the simulation calculation server and then returned to the network planning management terminal to generate the final configuration report; and the transmitting power control instructions are issued to the ground station radio frequency units through a base station control interface.
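The link-budget and coverage model of claim 2 (S13-S14) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the free-space path loss constant 32.45 (d in km, f in MHz) is the standard formula the claim names, while the receiver sensitivity value and all function names are illustrative assumptions.

```python
import numpy as np

def path_loss_db(d_km, f_mhz):
    """Free-space path loss in dB: PL(d, f) = 32.45 + 20*lg(d) + 20*lg(f)."""
    return 32.45 + 20 * np.log10(d_km) + 20 * np.log10(f_mhz)

def coverage_radius_km(p_tx_dbm, f_mhz, rx_sensitivity_dbm=-100.0):
    """Invert the link budget: largest d such that p_tx - PL(d, f) >= sensitivity.
    The sensitivity default is a placeholder, not a value from the patent."""
    max_loss = p_tx_dbm - rx_sensitivity_dbm
    return 10 ** ((max_loss - 32.45 - 20 * np.log10(f_mhz)) / 20)

def coverage_rate(stations, waypoints):
    """C(P): fraction of waypoints covered by at least one active station.
    stations: iterable of (x, y, radius_km); waypoints: iterable of (u, v)."""
    covered = sum(
        1 for (u, v) in waypoints
        if any(np.hypot(u - x, v - y) <= r for (x, y, r) in stations)
    )
    return covered / len(waypoints)
```

A station transmitting at 46 dBm on 1000 MHz with the placeholder sensitivity yields a radius of roughly 476 km, and a round trip through `path_loss_db` recovers the 146 dB budget exactly.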
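Claim 3 (S21) encodes the variable number of uncovered waypoints as a fixed-length state feature by clustering them into K groups. A minimal sketch using plain Lloyd's k-means is below; K, the feature layout [cx, cy, count] per cluster, and the function name are illustrative assumptions, and a production system would likely use a library clustering routine instead.

```python
import numpy as np

def uncovered_features(points, k=3, iters=20, seed=0):
    """Cluster uncovered waypoints into k groups and return a fixed-length
    feature vector [cx_1, cy_1, n_1, ..., cx_k, cy_k, n_k]."""
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers from the points themselves (with replacement if few points).
    centers = pts[rng.choice(len(pts), size=k, replace=len(pts) < k)]
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers.
        labels = np.argmin(((pts[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = pts[labels == c].mean(axis=0)
    counts = np.bincount(labels, minlength=k)
    return np.column_stack([centers, counts]).ravel()
```

The output length is always 3·K regardless of how many waypoints are uncovered, which is what lets the state vector feed a fixed-size policy network.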
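The reward of claim 3 (S24) balances three terms: a power-change cost, a coverage gain, and a shortfall penalty. A direct transcription follows; the default weights alpha, beta, gamma are arbitrary placeholders, since the patent does not disclose their values.

```python
def reward(delta_p, delta_c, coverage, c_min, alpha=1.0, beta=10.0, gamma=100.0):
    """r_t = -alpha*dP_t + beta*dC_t - gamma*max(0, C_min - C_t).
    Weight defaults are illustrative, not from the patent."""
    penalty = max(0.0, c_min - coverage)
    return -alpha * delta_p + beta * delta_c - gamma * penalty
```

For example, an action that spends one power unit to gain 2% coverage while staying above a 95% threshold scores -1 + 0.2 - 0 = -0.8 under these weights, whereas the same action with coverage stuck at 90% additionally pays the shortfall penalty.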
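The clipped surrogate objective of claim 4 (S33) is the standard PPO loss: the update is limited by clipping the new/old policy probability ratio to [1-eps, 1+eps]. A NumPy sketch of the per-batch objective (to be maximized):

```python
import numpy as np

def ppo_clip_objective(ratio, adv, eps=0.2):
    """Mean of min(r*A, clip(r, 1-eps, 1+eps)*A) over a batch.
    ratio: pi_new(a|s) / pi_old(a|s); adv: advantage estimates A_hat."""
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    return np.mean(np.minimum(ratio * adv, clipped * adv))
```

With eps = 0.2, a sample whose ratio has grown to 2.0 with positive advantage contributes only 1.2·A rather than 2.0·A, and a ratio that has shrunk to 0.5 with negative advantage is floored at 0.8·A, which is what keeps each policy update small.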
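The greedy post-processing of claim 6 (S5) repeatedly tries to lower each active station's power by one level and keeps the reduction only when coverage remains acceptable. A sketch follows; using a coverage threshold `c_min` as the acceptance test is a simplification of the claim's "not reducing the coverage rate" wording, and the function name is invented.

```python
def greedy_power_trim(levels, coverage_fn, c_min):
    """Iteratively lower each active station's power level by one whenever
    overall coverage stays >= c_min; repeat until no reduction is possible.
    levels: list of integer power levels; coverage_fn: levels -> coverage rate."""
    levels = list(levels)
    improved = True
    while improved:
        improved = False
        for i, lvl in enumerate(levels):
            if lvl > 0:
                levels[i] = lvl - 1           # tentatively lower this station
                if coverage_fn(levels) >= c_min:
                    improved = True           # reduction accepted
                else:
                    levels[i] = lvl           # coverage broke: revert
    return levels
```

Because every accepted step strictly decreases total power and levels are bounded below by zero, the loop always terminates.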

Description

LDACS station network power optimization method and system based on reinforcement learning

Technical Field

The invention relates to the technical field of aviation communication network resource management and intelligent decision making, and in particular to an LDACS station network power optimization method and system based on reinforcement learning.

Background

As the L-band Digital Aeronautical Communication System (LDACS) moves from the technical research and verification phase to the actual deployment phase, the fine-grained planning and operational optimization of its ground network become key links in realizing its commercial value. After the allocation of candidate ground station working frequencies under electromagnetic compatibility constraints is completed, network planning faces a core engineering decision problem: how to allocate an appropriate transmitting power to each ground station in the network so as to guarantee the service performance of the ground station network while saving resources. Currently, a conservative but non-optimal strategy is often adopted in practical engineering, namely setting the transmitting power of all ground stations to the regulatory maximum allowed at their working frequency. While this strategy maximizes the potential coverage of a single station and ensures basic coverage redundancy, it has significant drawbacks. Firstly, it results in an unnecessary increase in the total radiated power of the network, raising the overall energy consumption and operating costs of the system. Secondly, excessive power may exacerbate inter-station interference or create higher aggregate interference levels at the system level, placing additional stress on the already crowded L-band spectrum environment.
Therefore, from the standpoint of the overall efficiency and economy of the network, there is a need for an intelligent power allocation scheme that minimizes the total transmit power of the network while ensuring that the target routes reach a predetermined coverage level.

Disclosure of Invention

The invention aims to provide an LDACS station network power optimization method and system based on reinforcement learning, which change the propagation distance of radio signals in space by adaptively regulating the physical parameter of ground station transmitting power, thereby improving the coverage performance and energy efficiency indexes of the aviation communication network. To achieve the above purpose, the invention provides an LDACS station network power optimization method based on reinforcement learning, comprising the following steps: S1, performing problem formalization and mathematical modeling, namely performing mathematical modeling of the LDACS ground network power allocation problem, and defining the decision variables, objective function and constraint conditions; S2, constructing a Markov decision process model of the network power decision, namely formulating the power allocation problem as a sequential decision process based on the mathematical modeling of S1, and defining the four core elements of state space, action space, state transition function and reward function; S3, training the reinforcement learning agent, namely constructing a deep reinforcement learning algorithm framework, and learning the optimal power allocation strategy through interactive training of the agent with a simulation environment; S4, performing power decisions with the trained agent, namely deploying the training-converged reinforcement learning agent model as a power decision engine, and performing power optimization on a specific network instance; and S5, carrying out scheme post-processing and output, namely carrying out local optimization, verification and final configuration of the power configuration scheme output by the agent.

Preferably, S1 specifically comprises the following steps: S11, defining the network as containing N ground stations, denoted as a set G = {g_1, g_2, ..., g_N}; the i-th ground station g_i has a known geographical location (x_i, y_i) and a pre-assigned operating frequency f_i; defining the M key waypoints to be covered in the network, recorded as a set W = {w_1, w_2, ..., w_M}; the geographic position of the j-th key waypoint w_j is (u_j, v_j); S12, defining the transmission power of each ground station g_i as a level p_i selected from a discrete set of power levels {0, 1, ..., L}, wherein level 0 indicates that the station is turned off and L is the highest power level; the power configurations of all stations constitute a decision vector P = (p_1, p_2, ..., p_N); S13, according to the wireless propagation model, the effective coverage radius R_i(p_i) of ground station g_i at power level p_i is determined by a link budget equation; using the free-space propagation model as the basis, the path loss PL (in dB) as a function of distance d (in km) and frequency f (in MHz) is: PL(d, f) = 32.45 + 20 lg d + 20 lg f; the coverage indication function c_ij(p_i) of each ground station g_i for each key waypoint w_j is: c_ij(p_i) = 1 if d_ij ≤ R_i(p_i), and c_ij(p_i) = 0 otherwise; in the formula, d_ij is the geographic distance between g_i and w_j.