CN-122028109-A - Satellite-ground communication network user association method based on multi-agent deep reinforcement learning


Abstract

The invention discloses a satellite-ground communication network user association method based on multi-agent deep reinforcement learning. The method comprises: establishing a caching-assisted satellite-ground fusion network system; constructing an optimization problem P0 for maximizing network utility, taking the maximization of network utility within a cache replacement period as the optimization target and taking the restriction of each user on using a base station or a low earth orbit satellite, the upper limit of the service quantity of the base station and the upper limit of the service quantity of the low earth orbit satellite as constraint conditions; converting the optimization problem P0 into a partially observable Markov process and constructing a neural network structure framework for decentralized execution and centralized training; and, based on the neural network structure framework, selecting the optimal user association allocation by adopting a multi-agent deep learning strategy based on a value decomposition network. The invention takes energy-efficient, low-delay, high-quality service as its core goal and solves the user association problem by designing a multi-agent deep reinforcement learning algorithm.

Inventors

GU SHUSHI
CHEN ZIPENG
ZHANG LONG
SUN JINGHAO
ZHANG QINYU

Assignees

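Harbin Institute of Technology (Shenzhen) (Harbin Institute of Technology Shenzhen Institute of Science and Technology Innovation)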

Dates

Publication Date: 2026-05-12
Application Date: 2025-12-25

Claims (6)

1. A satellite-ground communication network user association method based on multi-agent deep reinforcement learning, characterized by comprising the following steps: establishing a caching-assisted satellite-ground fusion network system, wherein the system comprises N access points, each access point has M users, each access point is provided with a specific base station and a low earth orbit satellite, and a geosynchronous orbit satellite provides services for each access point and each user; constructing an optimization problem P0 for maximizing network utility, taking the maximization of network utility within a cache replacement period as the optimization target and taking the restriction of each user on using a base station or a low earth orbit satellite, the upper limit of the service quantity of the base station and the upper limit of the service quantity of the low earth orbit satellite as constraint conditions; converting the optimization problem P0 for maximizing the network utility into a partially observable Markov process, and constructing a neural network structure framework for decentralized execution and centralized training; and, based on the neural network structure framework, selecting the optimal user association allocation by adopting a multi-agent deep learning strategy based on a value decomposition network.
2. The satellite-ground communication network user association method based on multi-agent deep reinforcement learning according to claim 1, wherein the method further comprises deriving the network energy efficiency of user m in access point n in time slot t and the delay with which the local edge caching device obtains the content requested by user m in access point n in time slot t, wherein the network energy efficiency of user m in access point n in time slot t is positively correlated with the signal-to-noise ratio of the association type selected by user m, and the delay in obtaining the content requested by user m in access point n in time slot t is negatively correlated with the cache hit rate of the association type selected by user m, wherein the association type is any one of a base station, a low earth orbit satellite and a geosynchronous orbit satellite.
3. The satellite-ground communication network user association method based on multi-agent deep reinforcement learning according to claim 1, wherein the constraint conditions in the optimization problem P0 for maximizing the network utility specifically comprise: a user m in the access point n is associated with one of a base station, a low earth orbit satellite and a geosynchronous orbit satellite in a time slot t; the number of services of the base station must not exceed an upper limit; and the number of services of the low earth orbit satellite must not exceed an upper limit.
4. The satellite-ground communication network user association method based on multi-agent deep reinforcement learning according to claim 1, wherein constructing the neural network structure framework for decentralized execution and centralized training specifically comprises: designing, based on the partially observable Markov process, a state space S, an action space A and a reward function R, wherein the state space S is defined as the channel state between user m and the three-layer cache architecture of base station, low earth orbit satellite and geosynchronous orbit satellite, the content requested by user m, and the cache hit state of the three-layer architecture for user m; in the centralized training stage, each user agent selects an action according to the state space S, and a global joint action value network is constructed through the value decomposition network to update the individual action value networks by back-propagation, so as to improve network utility while meeting the constraint conditions; in the decentralized execution stage, each user agent makes its user association decision from its partially observable state by using the trained neural network.
5. The satellite-ground communication network user association method based on multi-agent deep reinforcement learning according to claim 4, wherein each user agent in the centralized training stage adopts an epsilon-greedy strategy to perform action selection.
6. The satellite-ground communication network user association method based on multi-agent deep reinforcement learning according to claim 1, wherein the multi-agent deep learning strategy based on a value decomposition network specifically comprises: initializing the training parameters of the neural network; entering a training-round loop, and initializing the system parameters of the caching-assisted satellite-ground fusion network system in each round; entering a time-slot loop in each round, wherein in each time slot each user agent observes its own environment and selects a user association action with an epsilon-greedy strategy according to its own action value network; combining the selections of all user agents into the global action A(t), obtaining the global reward R(t) and the global state S(t+1) from the environment feedback, and storing each tuple in the experience replay buffer; when the experience replay buffer holds enough samples, randomly drawing a batch of samples B, training the global action value function according to a double deep Q-learning network structure by minimizing the temporal-difference error, and back-propagating to the individual action value networks to update the network parameters with learning rate α; after every f_u training time slots, copying the currently obtained global action value network parameters to the target action value network; after all time slots in a training round have ended, re-initializing the system parameters of the caching-assisted satellite-ground fusion network system and continuing with the next training round until training ends; and outputting the global target action value network after training is finished.
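
As an illustration only, the quantities named in claims 3 to 6 can be sketched for a single access point as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: plain lists of Q-values stand in for the individual action value networks, the integer encoding of the association types and all function names are hypothetical, and the discount factor gamma is an assumed hyperparameter that the claims do not name.

```python
import random

# Association types from the claims; the integer encoding is an assumption.
BS, LEO, GEO = 0, 1, 2


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def satisfies_caps(actions, bs_cap, leo_cap):
    """Check the service-quantity upper limits of the base station and LEO satellite."""
    return actions.count(BS) <= bs_cap and actions.count(LEO) <= leo_cap


def vdn_total(per_agent_q, actions):
    """Value decomposition: the joint value is the sum of each agent's chosen value."""
    return sum(q[a] for q, a in zip(per_agent_q, actions))


def double_dqn_target(reward, next_q_online, next_q_target, gamma):
    """Double-DQN style target: online values pick the next actions,
    target values evaluate them."""
    greedy = [max(range(len(q)), key=q.__getitem__) for q in next_q_online]
    return reward + gamma * vdn_total(next_q_target, greedy)


def td_error(per_agent_q, actions, reward, next_q_online, next_q_target, gamma):
    """Temporal-difference error whose square a training step would minimize."""
    target = double_dqn_target(reward, next_q_online, next_q_target, gamma)
    return target - vdn_total(per_agent_q, actions)


if __name__ == "__main__":
    # Two user agents, three association types each (made-up values).
    q_now = [[1.0, 0.5, 0.2], [0.3, 0.9, 0.1]]
    chosen = [epsilon_greedy(q, epsilon=0.1) for q in q_now]
    print("actions:", chosen, "caps ok:", satisfies_caps(chosen, bs_cap=1, leo_cap=1))
    print("joint value:", vdn_total(q_now, chosen))
```

Because the joint value is a plain sum, each agent can act greedily on its own values at execution time while the shared error signal still reaches every individual value during training, which is the property that decentralized execution with centralized training relies on.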

Description

Satellite-ground communication network user association method based on multi-agent deep reinforcement learning

Technical Field

The invention relates to the technical field of network communication, in particular to a satellite-ground communication network user association method based on multi-agent deep reinforcement learning.

Background

By virtue of its wide coverage and high reliability, the satellite-ground fusion network has become a promising architecture for providing seamless global coverage and high-quality service. However, the inherently long propagation delay of satellite links, together with massive repeated download requests for the same content, places heavy communication pressure on the backhaul link. A caching-assisted satellite-ground fusion network places highly popular content in advance on edge caching devices close to the users and delivers the cached content to them directly, avoiding additional backhaul link overhead. Although this relieves the backhaul transmission pressure on the core network to a certain extent, the approach still faces serious challenges. On the one hand, the movement of users and satellites makes the communication channel highly dynamic: when modeling large-scale fading for non-terrestrial networks in Release 18, 3GPP considers a channel dynamic range exceeding 20 dB, while the terrestrial network exhibits Rayleigh-distributed dynamics due to multipath small-scale fast fading. On the other hand, the hot content repeatedly requested by users and the data cached on edge network devices change continuously over time: the popularity of news videos may be updated every 2-3 hours, that of movies every week, and that of music every month.
These challenges make it difficult to decide how users should be associated with edge devices so as to achieve energy-efficient, low-latency, high-quality communication.

Disclosure of Invention

In view of the characteristics of the caching-assisted satellite-ground fusion network, the invention provides a satellite-ground communication network user association method based on multi-agent deep reinforcement learning. The method analyzes the energy efficiency and delay of completing communication services, takes energy-efficient, low-delay, high-quality service as its core goal, and designs a multi-agent deep reinforcement learning algorithm to solve the user association problem.

The technical scheme of the invention is as follows:

A satellite-ground communication network user association method based on multi-agent deep reinforcement learning comprises the following steps:

establishing a caching-assisted satellite-ground fusion network system, wherein the system comprises N access points, each access point has M users, each access point is provided with a specific base station and a low earth orbit satellite, and a geosynchronous orbit satellite provides services for each access point and each user;

constructing an optimization problem P0 for maximizing network utility, taking the maximization of network utility within a cache replacement period as the optimization target and taking the restriction of each user on using a base station or a low earth orbit satellite, the upper limit of the service quantity of the base station and the upper limit of the service quantity of the low earth orbit satellite as constraint conditions;

converting the optimization problem P0 for maximizing the network utility into a partially observable Markov process, and constructing a neural network structure framework for decentralized execution and centralized training;

based on the neural network structure framework, selecting the optimal user association allocation by adopting a multi-agent deep learning strategy based on a value decomposition network.

The method further comprises deriving, based on the established caching-assisted satellite-ground fusion network system, the network energy efficiency of user m in access point n in time slot t and the delay with which the local edge caching device obtains the content requested by user m in access point n in time slot t, wherein the network energy efficiency of user m in access point n in time slot t is positively correlated with the signal-to-noise ratio of the association type selected by user m, and the delay in obtaining the content requested by user m in access point n in time slot t is negatively correlated with the cache hit rate of the association type selected by user m, wherein the association type is any one of a base station, a low earth orbit satellite and a geosynchronous orbit satellite.

In a further technical scheme of the invention, in the optimization problem P0 for maximizing the network utility, constraint cond