CN-121984872-A - Network planning method and device based on hybrid reinforcement learning strategy


Abstract

The application discloses a network planning method and device based on a hybrid reinforcement learning strategy. The method comprises the following steps. First, a data link network topology model and a strategy support library are established to provide the basic network structure and the adjustment strategies. Next, an optimization strategy model based on multiple relation constraints is constructed, and network planning decisions are optimized efficiently through a hybrid reinforcement learning strategy so as to cope with complex network environments. Then, an initial network planning structure is generated from small sample data input. Finally, a security inspection is performed through a trusted network planning decision model to obtain a target network planning structure that meets the standard. By adopting a hybrid-strategy reinforcement learning method, the application achieves high-dimensional objective optimization and agile, efficient decision-making under dynamically changing environments, introduces intelligent planning based on small data sets and experience knowledge, and enhances the credibility and security of network planning decisions.

Inventors

XIE WEI
YUAN SHANDONG
CHENG YONGJING
ZHOU HAN
ZHANG CHEN
WANG MIN
REN YUN
YAN KAI

Assignees

Information Support Force Engineering University, Chinese People's Liberation Army (中国人民解放军信息支援部队工程大学)

Dates

Publication Date: 20260505
Application Date: 20251231

Claims (10)

1. A network planning method based on a hybrid reinforcement learning strategy, comprising: establishing a data link network topology model, and presetting data link network structure adjustment optimization strategies for different service requirements based on the data link network topology model to obtain a strategy support library; constructing, based on the strategy support library, an optimization strategy model of the data link network structure under multiple relation constraints; performing network planning decision optimization on the optimization strategy model by an agile, efficient hybrid-strategy reinforcement learning decision method to obtain an efficient decision model; taking small sample data from a real scene as input to the efficient decision model, and acquiring an initial network planning structure output by the efficient decision model; and taking the initial network planning structure as input to a trusted network planning decision model to obtain a target network planning structure conforming to a security inspection standard, wherein the trusted network planning decision model is an expert-in-the-loop decision model established based on a preset action library, and is obtained by: constructing the preset action library based on the security inspection standard; obtaining the actions of the initial network planning structure; judging, before each action is executed, whether the action is a dangerous action; and, if the action is a dangerous action, replacing it with a safe action so as to enhance the reliability of the decision results.
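The pre-execution check performed by the trusted decision model behaves like a safety shield placed over the planner's actions. The following is a minimal sketch under stated assumptions: the `Action` and `Rule` types, the action kinds (`set_power`, `move`, `noop`) and the numeric limits are hypothetical stand-ins for the preset action library and the security inspection standard, neither of which the application specifies, and the expert's role is reduced to authoring the rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Action:
    """Hypothetical action format: target node, kind of adjustment, magnitude."""
    node_id: int
    kind: str
    value: float

@dataclass(frozen=True)
class Rule:
    """One entry of the (hypothetical) preset action library."""
    is_safe: Callable[[float], bool]
    make_safe: Callable[[float], float]

def clamp(v: float, lo: float, hi: float) -> float:
    return max(lo, min(v, hi))

def build_action_library() -> Dict[str, Rule]:
    """Illustrative limits standing in for the security inspection standard."""
    return {
        "set_power": Rule(lambda p: 0.0 <= p <= 10.0, lambda p: clamp(p, 0.0, 10.0)),
        "move": Rule(lambda d: abs(d) <= 5.0, lambda d: clamp(d, -5.0, 5.0)),
    }

def shield(actions: List[Action], library: Dict[str, Rule]) -> List[Action]:
    """Check every action before execution and replace dangerous ones with safe ones."""
    checked: List[Action] = []
    for a in actions:
        rule = library.get(a.kind)
        if rule is None:
            # Kinds absent from the library are treated as dangerous and replaced by a no-op.
            checked.append(Action(a.node_id, "noop", 0.0))
        elif rule.is_safe(a.value):
            checked.append(a)
        else:
            checked.append(Action(a.node_id, a.kind, rule.make_safe(a.value)))
    return checked
```

Clamping to the nearest admissible value is only one possible replacement policy; an expert could equally map each dangerous action to a predefined fallback stored in the library.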
2. The network planning method based on a hybrid reinforcement learning strategy according to claim 1, wherein presetting the data link network structure adjustment optimization strategies for different service requirements based on the data link network topology model to obtain the strategy support library comprises: configuring node function attributes and node position features for the network nodes according to the service requirements, wherein the node function attributes include command nodes, relay nodes and common nodes, and the position features include air, land and sea; generating a coastline by Perlin noise simulation followed by rotation and translation, dividing the action area, and obtaining a position coordinate set; and determining the data link network structure adjustment optimization strategies according to the node function attributes, the position features and the position coordinate set to obtain the strategy support library.
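A minimal sketch of the coastline step, assuming a classic one-dimensional gradient (Perlin) noise curve that is then rotated and translated. The amplitude, frequency, angle, shift, seed and the land/sea convention in `side_of_coast` are illustrative assumptions, not parameters taken from the application.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]

def make_grads(seed: int = 0, size: int = 256) -> List[float]:
    """Random gradient (slope) at each integer lattice point."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]

def perlin_1d(x: float, grads: List[float]) -> float:
    """Classic 1-D gradient noise; it equals 0 at every integer x."""
    i0 = math.floor(x)
    t = x - i0
    n0 = grads[i0 % len(grads)] * t                   # left gradient contribution
    n1 = grads[(i0 + 1) % len(grads)] * (t - 1.0)     # right gradient contribution
    fade = t * t * t * (t * (t * 6.0 - 15.0) + 10.0)  # 6t^5 - 15t^4 + 10t^3
    return n0 + fade * (n1 - n0)

def coastline(width: float, n_points: int, amplitude: float, freq: float,
              angle: float, shift: Point, grads: List[float]) -> List[Point]:
    """Sample y = amplitude * noise(freq * x), rotate by `angle`, translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    pts = []
    for k in range(n_points):
        x = width * k / (n_points - 1)
        y = amplitude * perlin_1d(x * freq, grads)
        pts.append((c * x - s * y + shift[0], s * x + c * y + shift[1]))
    return pts

def side_of_coast(p: Point, amplitude: float, freq: float, angle: float,
                  shift: Point, grads: List[float]) -> str:
    """Undo translation and rotation, then compare with the curve (assumed: land above, sea below)."""
    dx, dy = p[0] - shift[0], p[1] - shift[1]
    c, s = math.cos(angle), math.sin(angle)
    x, y = c * dx + s * dy, -s * dx + c * dy
    return "land" if y > amplitude * perlin_1d(x * freq, grads) else "sea"
```

Sampling random points in the action area and sorting them with `side_of_coast` would yield candidate coordinate sets for land and sea nodes, while air nodes could be drawn from the whole area.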
3. The network planning method based on a hybrid reinforcement learning strategy according to claim 1, wherein constructing the optimization strategy model of the data link network structure under multiple relation constraints based on the strategy support library comprises: handling the high-dimensional continuous action space with a deep deterministic policy gradient (DDPG) algorithm; and handling the cooperative and competitive relationships among multiple agents with a multi-agent deep deterministic policy gradient (MADDPG) algorithm to generate the optimization strategy model.
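To make the division of labour between DDPG and MADDPG concrete, the following NumPy sketch shows only the structural ingredients of MADDPG: each agent's deterministic actor sees only its own observation, while each agent's critic is centralized and sees every agent's observation and action. Linear function approximators stand in for neural networks, and gradient steps and the replay buffer are omitted; all class names and dimensions are hypothetical.

```python
import numpy as np
from typing import List, Optional

class LinearActor:
    """Deterministic policy a_i = tanh(W o_i); agent i sees only its own observation."""
    def __init__(self, obs_dim: int, act_dim: int, rng: np.random.Generator):
        self.W = rng.normal(scale=0.1, size=(act_dim, obs_dim))

    def act(self, obs: np.ndarray, noise_std: float = 0.0,
            rng: Optional[np.random.Generator] = None) -> np.ndarray:
        a = np.tanh(self.W @ obs)
        if noise_std > 0.0 and rng is not None:
            a = a + rng.normal(scale=noise_std, size=a.shape)  # Gaussian exploration noise
        return np.clip(a, -1.0, 1.0)

class CentralCritic:
    """Q_i(o_1..o_N, a_1..a_N): conditioned on all agents (centralized training)."""
    def __init__(self, total_obs_dim: int, total_act_dim: int, rng: np.random.Generator):
        self.w = rng.normal(scale=0.1, size=total_obs_dim + total_act_dim)

    def q(self, all_obs: List[np.ndarray], all_acts: List[np.ndarray]) -> float:
        x = np.concatenate(all_obs + all_acts)
        return float(self.w @ x)

def td_target(reward: float, next_q: float, gamma: float = 0.95, done: bool = False) -> float:
    """Critic target y = r + gamma * Q'(o', mu'(o')), without bootstrapping at episode end."""
    return reward + (0.0 if done else gamma * next_q)

def soft_update(target: np.ndarray, source: np.ndarray, tau: float = 0.01) -> np.ndarray:
    """Polyak averaging of target-network parameters, as in DDPG."""
    return (1.0 - tau) * target + tau * source
```

With a single agent whose observation is the full state, the critic input collapses to one observation and one action, which is the plain DDPG case used for the high-dimensional continuous action space.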
4. The network planning method based on a hybrid reinforcement learning strategy according to claim 1, wherein performing network planning decision optimization on the optimization strategy model by the agile, efficient hybrid-strategy reinforcement learning decision method to obtain the efficient decision model comprises: establishing strategy constraint conditions and implementation modes separately for network node planning, power adjustment planning, frequency adjustment planning and temporary node planning; and, based on the established strategy constraint conditions and implementation modes, controlling the actions of the relay nodes and the common nodes with an MADDPG model and adjusting them by scoring, so as to optimize the network plan and obtain the efficient decision model.
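Read together with the dependent claims on each planning task, this step assigns each task its own learner. The table below is a hypothetical Python encoding of that assignment; the task keys, node-role names and the `count_agents` helper are illustrative and do not come from the application. It reflects MADDPG with one agent per node for node, frequency and temporary-node planning, and a single DDPG agent over all relay nodes for relay power, with common-node power computed analytically rather than learned.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class LearnerSpec:
    role: str             # which nodes the learner acts on
    algorithm: str        # "MADDPG" or "DDPG"
    agent_per_node: bool  # True: one agent per node; False: one joint agent for the role

# Hypothetical encoding of the hybrid strategy (one entry list per planning task).
HYBRID_PLAN: Dict[str, List[LearnerSpec]] = {
    "network_node": [LearnerSpec("relay_outside_max_group", "MADDPG", True),
                     LearnerSpec("common", "MADDPG", True)],
    # Common-node power follows from the distance to the nearest relay, so it has no learner.
    "power_adjustment": [LearnerSpec("relay", "DDPG", False)],
    "frequency_adjustment": [LearnerSpec("relay_outside_max_group", "MADDPG", True),
                             LearnerSpec("common", "MADDPG", True)],
    "temporary_node": [LearnerSpec("temporary_relay", "MADDPG", True)],
}

def count_agents(task: str, node_counts: Dict[str, int]) -> int:
    """Number of RL agents instantiated for one planning task."""
    total = 0
    for spec in HYBRID_PLAN[task]:
        n = node_counts.get(spec.role, 0)
        if n > 0:
            total += n if spec.agent_per_node else 1
    return total
```

Keeping the assignment in data rather than code makes it easy to see where the strategy is "hybrid": the algorithm and the agent granularity change per task while the scoring-based training loop can stay shared.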
5. The network planning method based on a hybrid reinforcement learning strategy according to claim 4, wherein the implementation mode of the network node planning comprises: determining each relay node that is not in the maximum relay node group as an independent agent, wherein, after an action is executed, the closer the agent is to the maximum relay node group, the higher its score, and otherwise a penalty is applied; and controlling and score-adjusting the actions of the common nodes with the MADDPG algorithm, wherein each common node is determined as an independent agent and, taking as reference a circle centered on the adjacent relay node with a radius equal to the maximum communication distance, the closer the common node is to the circle, the higher its score, and a penalty is applied when the common node goes beyond the circle; and the implementation mode of the power adjustment planning comprises: determining the maximum transmission distance of each common node as the distance to its nearest relay node, and calculating the transmission power of the common node from that maximum transmission distance; and controlling the transmission power of the relay nodes with a deep deterministic policy gradient algorithm, wherein all relay nodes together are determined as a single agent, the agent is scored only when the constraint conditions are met and otherwise receives a score of zero, and the agent is penalized if the constraint conditions cannot be met within a set number of steps.
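The scoring rules above fix the direction of each reward but not its magnitude. The following sketch assumes simple shapes: the relay reward equals the reduction in distance to the maximum relay node group, the common-node reward is a constant bonus inside the circle and a penalty proportional to the overshoot outside it, common-node power inverts the free-space Friis equation for an assumed receiver sensitivity, and the relay-power reward follows the score/zero/penalty gating described in the claim. All function names, bonuses and penalties are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def relay_move_reward(dist_before: float, dist_after: float) -> float:
    """Assumed shaping: how much closer the relay got to the max group (negative if it moved away)."""
    return dist_before - dist_after

def common_node_reward(node: Point, relay: Point, max_range: float,
                       inside_bonus: float = 1.0) -> float:
    """Assumed shaping: constant bonus inside the circle, penalty growing beyond it."""
    d = math.dist(node, relay)
    if d <= max_range:
        return inside_bonus
    return -(d - max_range)

def common_node_tx_power_w(node: Point, relays: List[Point], freq_hz: float,
                           rx_sensitivity_w: float, gt: float = 1.0, gr: float = 1.0) -> float:
    """Power so that the nearest relay receives exactly rx_sensitivity_w (inverted free-space Friis)."""
    d = min(math.dist(node, r) for r in relays)  # maximum transmission distance
    wavelength = SPEED_OF_LIGHT / freq_hz
    return rx_sensitivity_w * (4.0 * math.pi * d / wavelength) ** 2 / (gt * gr)

def relay_power_reward(constraints_met: bool, step: int, max_steps: int,
                       score: float = 1.0, penalty: float = -1.0) -> float:
    """Single DDPG agent over all relays: score if constraints hold, else zero, penalty once steps run out."""
    if constraints_met:
        return score
    return penalty if step >= max_steps else 0.0
```

Because free-space power grows with the square of distance, doubling the distance to the nearest relay quadruples the required common-node transmit power.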
6. The network planning method based on a hybrid reinforcement learning strategy according to claim 4, wherein the implementation mode of the frequency adjustment planning comprises: when a channel is changed, adjusting the input power, the received power and the maximum transmission distance of each node according to the Friis propagation loss model; controlling and score-adjusting the actions of the relay nodes with the MADDPG algorithm, wherein each relay node that is not in the maximum relay node group is determined as an independent agent, and, after an action is executed, the closer the relay node is to the maximum relay node group, the higher its score, and otherwise a penalty is applied; and controlling and score-adjusting the actions of the common nodes with the MADDPG algorithm, wherein each common node is determined as an independent agent and, taking as reference a circle centered on the adjacent relay node with a radius equal to the maximum communication distance, the closer the common node is to the circle, the higher its score, a penalty is applied when the common node goes beyond the circle, and, when the common node is inside the circle, the score is kept unchanged and the common node is adjusted according to its distances from the other common nodes so as to prevent common nodes from coinciding.
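The channel-change step relies on the free-space Friis equation, P_r = P_t G_t G_r (λ / 4πd)², with λ = c / f. The sketch below computes the received power at a given distance and, by solving for d at an assumed receiver sensitivity, the maximum transmission distance on the new channel; it also adds an illustrative separation penalty for common nodes inside the circle. Function names, the sensitivity value and the penalty weight are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def friis_rx_power_w(pt_w: float, d_m: float, freq_hz: float,
                     gt: float = 1.0, gr: float = 1.0) -> float:
    """Free-space received power P_r = P_t * G_t * G_r * (lambda / (4 pi d))^2."""
    lam = SPEED_OF_LIGHT / freq_hz
    return pt_w * gt * gr * (lam / (4.0 * math.pi * d_m)) ** 2

def friis_max_range_m(pt_w: float, rx_sensitivity_w: float, freq_hz: float,
                      gt: float = 1.0, gr: float = 1.0) -> float:
    """Distance at which P_r drops to the receiver sensitivity (Friis solved for d)."""
    lam = SPEED_OF_LIGHT / freq_hz
    return lam / (4.0 * math.pi) * math.sqrt(pt_w * gt * gr / rx_sensitivity_w)

def separation_penalty(node: Point, others: List[Point], min_sep: float,
                       weight: float = 1.0) -> float:
    """Illustrative penalty for common nodes closer than min_sep, to keep them from coinciding."""
    pen = 0.0
    for o in others:
        d = math.dist(node, o)
        if d < min_sep:
            pen -= weight * (min_sep - d)
    return pen
```

Since the maximum range is proportional to the wavelength, moving a node from a 300 MHz channel to a 600 MHz channel at the same transmit power halves its free-space range, which is why the communication circles must be recomputed on every channel change.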
7. The network planning method based on a hybrid reinforcement learning strategy according to claim 4, wherein the implementation mode of the temporary node planning comprises: determining each temporary relay node as an independent agent, and controlling the on/off state and the position of each temporary relay node with the MADDPG algorithm; determining each set of connected relay nodes as a relay node group, taking the node information of the maximum relay node group and of the adjacent relay node groups as input, and outputting whether the node is activated and, after activation, its effective position and moving distance; after all relay nodes, including the activated temporary relay nodes, have merged into a single group, locating the isolated common nodes; and taking the information of an isolated common node and of its nearest relay node as input, outputting whether the node is activated, its activation position and its moving distance, wherein a reward is given if the isolated common node can communicate with the adjacent relay node group after the node is activated, the shorter the moving distance the higher the reward, and a penalty is applied if it cannot communicate.
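Identifying relay node groups amounts to finding connected components of the relay graph. The sketch below assumes that two relays are linked when they lie within a common communication range, uses a union-find structure to build the groups (largest first, so the first group is the maximum relay node group), locates isolated common nodes, and scores a temporary relay with an assumed reward that decays with moving distance. The range model, names and reward shape are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def relay_groups(relays: List[Point], comm_range: float) -> List[List[int]]:
    """Connected components of the relay graph, largest first (index 0 = maximum group)."""
    parent = list(range(len(relays)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(relays)):
        for j in range(i + 1, len(relays)):
            if math.dist(relays[i], relays[j]) <= comm_range:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(relays)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)

def isolated_common_nodes(commons: List[Point], relays: List[Point],
                          comm_range: float) -> List[int]:
    """Common nodes with no relay (activated temporary relays included) in range."""
    return [k for k, c in enumerate(commons)
            if all(math.dist(c, r) > comm_range for r in relays)]

def temp_relay_reward(connected: bool, move_dist: float,
                      scale: float = 1.0, penalty: float = -1.0) -> float:
    """Assumed shape: reward shrinks as the moving distance grows; penalty if still disconnected."""
    if not connected:
        return penalty
    return scale / (1.0 + move_dist)
```

A single remaining group returned by `relay_groups` signals that the relay backbone is fully merged, at which point the planner can move on to the isolated common nodes.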
8. The network planning method based on a hybrid reinforcement learning strategy according to claim 1, wherein taking the small sample data from the real scene as input to the efficient decision model and acquiring the initial network planning structure output by the efficient decision model comprises: extracting corresponding state-action pairs from the decision data available in experience knowledge to construct a new decision set; decoupling the different latent information in each state into evidence entities, decoupling the action information in each joint action into action entities, and establishing, based on the experience knowledge, development relationships from the evidence entities to the action entities; and constructing a relationship graph from the evidence entities, the action entities and the relationships, guiding the learning of the network planning model by an adversarial imitation learning method, and generating an initial network planning structure suited to small sample conditions.
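A minimal sketch of the relationship-graph step, assuming that each state can be decoupled into `key=value` evidence entities and each joint action into `agent:action` action entities, and that an edge weight simply counts how often an evidence entity co-occurs with an action entity in the expert decision set. The encoding, the counting rule and the `suggest` helper are hypothetical, and the adversarial imitation learning stage that the graph would guide is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

State = Dict[str, object]
JointAction = Dict[str, object]

def evidence_entities(state: State) -> List[str]:
    """Decouple a state into atomic evidence entities (assumed key=value encoding)."""
    return [f"{k}={v}" for k, v in sorted(state.items())]

def action_entities(joint_action: JointAction) -> List[str]:
    """Decouple a joint action into per-agent action entities."""
    return [f"{k}:{v}" for k, v in sorted(joint_action.items())]

def build_relation_graph(pairs: Iterable[Tuple[State, JointAction]]):
    """Edge weight = co-occurrence count of (evidence entity, action entity) in expert data."""
    graph = defaultdict(lambda: defaultdict(int))
    for state, action in pairs:
        for e in evidence_entities(state):
            for a in action_entities(action):
                graph[e][a] += 1
    return graph

def suggest(graph, state: State, top_k: int = 1) -> List[str]:
    """Rank action entities by total edge weight from the state's evidence entities."""
    scores: Dict[str, int] = defaultdict(int)
    for e in evidence_entities(state):
        for a, w in graph.get(e, {}).items():
            scores[a] += w
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_k]
```

Such graph-derived suggestions could serve as a prior or as extra expert demonstrations when only a handful of real-scene samples are available.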
9. A network planning apparatus based on a hybrid reinforcement learning strategy, comprising: a data link network topology model building module, configured to build a data link network topology model and to predetermine data link network structure adjustment optimization strategies for different service requirements based on the data link network topology model to obtain a strategy support library; a target optimization decision module, configured to construct, based on the strategy support library, an optimization strategy model of the data link network structure under multiple relation constraints; a reinforcement learning decision module, configured to perform network planning decision optimization on the optimization strategy model by an agile, efficient hybrid-strategy reinforcement learning decision method to obtain an efficient decision model; a small sample planning module, configured to take small sample data from a real scene as input to the efficient decision model and to acquire an initial network planning structure output by the efficient decision model; and an expert-in-the-loop trusted planning module, configured to take the initial network planning structure as input to a trusted network planning decision model to obtain a target network planning structure conforming to a security inspection standard, wherein the trusted network planning decision model is an expert-in-the-loop decision model established based on a preset action library, and is obtained by: constructing the preset action library based on the security inspection standard; obtaining the actions of the initial network planning structure; judging, before each action is executed, whether the action is a dangerous action; and, if it is, replacing the dangerous action with a safe action so as to enhance the credibility of the decision results.
10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the steps of a hybrid reinforcement learning strategy based network planning method according to any of claims 1 to 8.

Description

Network planning method and device based on hybrid reinforcement learning strategy

Technical Field

The application relates to the technical field of network planning, and in particular to a network planning method and device based on a hybrid reinforcement learning strategy.

Background

Data link networks currently play a key role in modern communications, transportation, logistics and related fields, and their efficient planning and optimization strongly affect system stability and resource utilization. However, as networks grow in scale and complexity, conventional planning methods struggle to adapt in real time to diverse network environments and resource requirements. The prior art includes some studies based on modeling network topology and control strategies, but most of these methods depend on large amounts of historical data or on global optimization algorithms and cannot cope with rapidly changing dynamic scenes. Furthermore, applying conventional reinforcement learning algorithms to data link networks, especially for network planning, is often limited by large-scale data requirements and low sample efficiency.

One improved approach models the data link network topology and its adjustment strategies so that the system can adapt quickly to different link or node faults. In actual deployment, however, this approach generally suffers from large data requirements and poor generalization. Some studies have therefore begun to explore small-sample learning strategies to improve adaptability when data are scarce. These methods still leave considerable room for improvement in inference speed and accuracy, and in particular fall short in decision agility and real-time planning.
In addition, some studies that adopt "expert-in-the-loop" strategies show that, with the real-time participation of human experts, a system can achieve higher decision reliability and efficiency in complex network planning scenarios. Such methods use expert domain knowledge to introduce feedback mechanisms into the reinforcement learning framework and thereby enhance the credibility of the system. However, because they lack automation and real-time guarantees, existing schemes still face significant challenges in balancing reliability and decision efficiency in highly complex data link networks. Overall, the prior art offers certain advantages for data link network planning, such as efficient planning for specific scenes, but generally lacks an effective method that adapts to small sample data in rapidly changing network environments while balancing decision agility and reliability.

Disclosure of Invention

To address at least one defect or improvement need of the prior art, the invention provides a network planning method and device based on a hybrid reinforcement learning strategy that can rapidly adapt to dynamically changing network environments, meet real-time decision requirements, achieve efficient real-time planning, and balance credibility against decision efficiency in complex network environments.
To achieve the above object, according to a first aspect of the present invention, there is provided a network planning method based on a hybrid reinforcement learning strategy, including: establishing a data link network topology model, and presetting data link network structure adjustment optimization strategies for different service requirements based on the data link network topology model to obtain a strategy support library; constructing, based on the strategy support library, an optimization strategy model of the data link network structure under multiple relation constraints; performing network planning decision optimization on the optimization strategy model by an agile, efficient hybrid-strategy reinforcement learning decision method to obtain an efficient decision model; taking small sample data from a real scene as input to the efficient decision model, and acquiring an initial network planning structure output by the efficient decision model; and taking the initial network planning structure as input to a trusted network planning decision model to obtain a target network planning structure conforming to a security inspection standard, wherein the trusted network planning decision model is an expert-in-the-loop decision model established based on a preset action library, and comprises the steps of constructing the preset action library based on the security inspection standard, obtaining actions of the initial network planning structure, judging whether the actions are dangerous