CN-121985392-A - Reinforcement learning route optimization method considering the dynamic underwater acoustic environment and node mobility
Abstract
The invention discloses a reinforcement learning route optimization method that accounts for the dynamic underwater acoustic environment and node mobility. The method comprises: initializing the parameters of an underwater sensor network; monitoring and updating underwater environment parameters and node positions in real time; dynamically updating the network topology connection state; constructing a multidimensional state vector when data is to be transmitted; selecting a routing action with an epsilon-greedy strategy to generate a routing decision; ending the current iteration if data packet transmission fails during the routing action, and otherwise calculating a multi-objective reward value and updating the Q-value table to optimize the routing decision; monitoring network performance indexes in real time and adaptively adjusting the agent's reinforcement learning parameters for the next iteration until the termination condition is met; and outputting an adaptive route selection model. The method intelligently copes with the challenges of time-varying channels and node mobility, and achieves a multi-objective balance among energy consumption, delay, and stability while significantly improving the data delivery rate.
Inventors
- Wei Yan
- Yuan Yu
- Chen Zi
- Nie Foyuan
- Qu Fengzhong
Assignees
- Zhejiang University
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2026-04-03
Claims (10)
- 1. A reinforcement learning route optimization method considering the dynamic underwater acoustic environment and node mobility, characterized by comprising the following steps: step one, initializing the parameters of an underwater sensor network; step two, evaluating the quality and stability of inter-node links based on node positions and channel conditions to dynamically update the network topology connection state; step three, when data needs to be transmitted, an agent in the node constructs a multidimensional state vector for reinforcement learning, adopts an epsilon-greedy strategy to balance exploitation and exploration, and selects a routing action to generate a routing decision; step four, executing the routing action; if the data packet transmission fails, recording the failure information and jumping to step six, otherwise calculating a multi-objective reward value, updating the Q-value table according to the transmission result and the multi-objective reward value, and optimizing the routing decision; step five, monitoring the performance indexes of the underwater sensor network in real time and adaptively adjusting the agent's reinforcement learning parameters according to the network performance indexes and underwater environment parameters; and step six, judging whether the iteration termination condition is met; if not, repeating steps two through six with the adaptively adjusted agent, and if so, ending the iteration and outputting the adaptive routing model.
- 2. The reinforcement learning route optimization method considering the dynamic underwater acoustic environment and node mobility according to claim 1, wherein in step one the parameter initialization comprises the initial deployment of the underwater sensor network nodes, the initial configuration of the underwater environment parameters, communication range setting, and the initialization of the reinforcement learning agent in each node; the underwater environment parameters comprise sound velocity propagation conditions, noise level parameters, and ocean current velocity.
- 3. The reinforcement learning route optimization method considering the dynamic underwater acoustic environment and node mobility according to claim 1, wherein the multidimensional state vector comprises the node's own state, neighbor node states, channel conditions, a mobility prediction result, and data packet characteristics; the node's own state comprises the node's residual energy, the neighbor node state comprises the neighbor node's energy level and the inter-node distance, the channel condition comprises the signal-to-noise ratio and channel quality, the mobility prediction result comprises the relative node movement speed and a link stability prediction value, and the data packet characteristics comprise the data packet priority.
- 4. The reinforcement learning route optimization method considering the dynamic underwater acoustic environment and node mobility according to claim 1, wherein in step three the agent adopts an epsilon-greedy strategy: according to the current multidimensional state vector S_t, an optimal or exploratory action a_t is selected from the independent Q-value table that each node constructs based on a state discretization method, thereby determining the optimal next-hop forwarding node for the data packet, i.e., selecting a routing action to generate a routing decision. The specific process is as follows (see the first sketch after the claims): a random number rand is generated in the interval [0,1]; if rand < epsilon, where epsilon is the current exploration rate, the agent enters exploration mode and randomly selects a node from the neighbor node list as the next hop; if rand ≥ epsilon, the agent enters exploitation mode, queries the Q-value table, and selects the action with the maximum Q value in the current state S_t, i.e., uses existing experience to select the path currently considered optimal; as the number of training steps t increases, the exploration rate decays exponentially.
- 5. The reinforcement learning route optimization method considering the dynamic underwater acoustic environment and node mobility according to claim 1, wherein the multi-objective reward value is calculated by comprehensively considering the packet delivery success rate, energy consumption, transmission delay, and link stability (see the second sketch after the claims).
- 6. The reinforcement learning route optimization method considering the dynamic underwater acoustic environment and node mobility according to claim 1, wherein in step four the Q-value table is learned and updated by a Q-learning algorithm combined with an experience replay mechanism, specifically through the following operations (also sketched after the claims): (1) storing: at each time step, the agent's interaction tuple is stored in the experience replay buffer; when the buffer is full, the earliest data is overwritten on a first-in, first-out basis; (2) sampling: a batch of samples is randomly drawn from the experience buffer for training; (3) updating: the Q value update is calculated according to the Bellman equation, and the corresponding entry in the Q-value table is updated iteratively.
- 7. The reinforcement learning route optimization method considering the dynamic underwater acoustic environment and node mobility according to claim 1, wherein in step five the network performance indexes comprise the data packet delivery rate, end-to-end delay, network energy consumption, and node survival time, and the agent's reinforcement learning parameters are adaptively adjusted by a performance-gradient-based dynamic parameter adjustment mechanism, specifically through the following operations: (1) adjusting the learning rate alpha: when the underwater sensor network detects that the variance of the data packet delivery rate exceeds a preset threshold, or that the change in the channel quality fading index exceeds a set threshold, indicating an abrupt change in the underwater environment, the learning rate alpha of the nodes is increased; (2) adjusting the discount factor gamma: when a node's residual energy is above the preset upper energy limit, the gamma value is increased, and when the residual energy is below the preset lower energy limit, the gamma value is decreased, making the agent more inclined to select the action with the minimum immediate energy consumption.
- 8. The reinforcement learning route optimization method considering the dynamic underwater acoustic environment and node mobility according to claim 1, wherein in step six the iteration termination condition comprises the following three logical judgments: (1) judging whether the data buffer of the current node is empty and no data packet to be forwarded remains in the underwater sensor network; (2) judging whether the residual energy of the current node is below the death threshold for maintaining basic communication, or whether the number of surviving nodes in the underwater sensor network is below the minimum number for maintaining connectivity; (3) judging whether no valid path to a water-surface sink node remains in the current network topology connection state, i.e., whether no neighbor node has a Q value larger than the preset unreachable penalty value; if any one of the three logical judgments holds, the iteration is terminated; otherwise, the iteration continues.
- 9. The reinforcement learning route optimization method considering the dynamic underwater acoustic environment and node mobility according to claim 1, wherein in step six, when the iteration ends, a mapping relation between underwater environment parameters and routing decisions is established, and the routing decisions are further optimized through historical data analysis and pattern recognition techniques to form the adaptive route selection model that is output.
- 10. A reinforcement learning route optimization system considering the dynamic underwater acoustic environment and node mobility, used for implementing the reinforcement learning route optimization method considering the dynamic underwater acoustic environment and node mobility according to any one of claims 1-9, characterized by comprising a state input module, a reinforcement learning agent, an environment interaction module, a multi-objective reward value calculation module, and a learning update module; the state input module is used for constructing the multidimensional state vector according to the current state of the node; the reinforcement learning agent is used for determining the optimal next-hop forwarding node using an epsilon-greedy strategy; the environment interaction module is used for monitoring underwater environment parameters in real time, executing routing actions, and feeding back transmission results; the multi-objective reward value calculation module is used for calculating the multi-objective reward value according to the transmission result; and the learning update module is used for updating the Q-value table and storing the agent's interaction tuples in the experience replay buffer.
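The following is a minimal sketch of the state discretization and epsilon-greedy selection described in claims 3-4, assuming a tabular agent whose Q-value table is a dictionary keyed by (state, neighbor) pairs. The identifiers, bin sizes, and decay constants are illustrative assumptions, not names from the patent.

```python
import math
import random

def discretize_state(residual_energy, snr, link_stability, packet_priority,
                     bins=5):
    """Map continuous state components onto coarse integer bins (the state
    discretization of claim 4). Inputs are assumed normalized to [0, 1]."""
    q = lambda x: min(bins - 1, int(x * bins))
    return (q(residual_energy), q(snr), q(link_stability), packet_priority)

def select_next_hop(q_table, state, neighbors, step,
                    epsilon_0=0.9, decay=1e-3, epsilon_min=0.05):
    """Epsilon-greedy choice of the next-hop forwarding node."""
    # Exploration rate decays exponentially with the training step t,
    # as claim 4 specifies.
    epsilon = max(epsilon_min, epsilon_0 * math.exp(-decay * step))
    if random.random() < epsilon:
        # Exploration mode: uniform random choice from the neighbor list.
        return random.choice(neighbors)
    # Exploitation mode: neighbor with the maximum Q value in this state.
    return max(neighbors, key=lambda n: q_table.get((state, n), 0.0))
```

Note that `q_table.get((state, n), 0.0)` treats unvisited state-action pairs as zero, so early exploitation degrades gracefully to an arbitrary neighbor rather than raising an error.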
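The second sketch covers claims 5-6: a weighted multi-objective reward and a tabular Q-learning update driven by a first-in, first-out experience replay buffer. The weights, failure penalty, buffer size, and batch size are assumptions chosen for illustration; the patent states only which quantities enter the reward and the FIFO/sampling/Bellman structure of the update.

```python
import random
from collections import deque

def multi_objective_reward(delivered, energy_cost, delay, link_stability,
                           w=(1.0, 0.3, 0.3, 0.4), fail_penalty=-10.0):
    """Fold delivery success, energy, delay, and link stability into a scalar."""
    if not delivered:
        return fail_penalty
    # Reward delivery; penalize energy and delay; reward stable links.
    return w[0] - w[1] * energy_cost - w[2] * delay + w[3] * link_stability

# (1) Storing: a bounded deque overwrites the oldest tuple first-in, first-out.
replay_buffer = deque(maxlen=1000)

def store_transition(state, action, reward, next_state, next_neighbors):
    replay_buffer.append((state, action, reward, next_state, next_neighbors))

def replay_update(q_table, batch_size=32, alpha=0.1, gamma=0.9):
    """(2) Sampling and (3) updating: draw a random batch and apply the
    Bellman update to the Q-value table in place."""
    if len(replay_buffer) < batch_size:
        return
    for s, a, r, s2, neigh2 in random.sample(list(replay_buffer), batch_size):
        # Target: r + gamma * max over the next hop's neighbors of Q(s', a').
        best_next = max((q_table.get((s2, n), 0.0) for n in neigh2),
                        default=0.0)
        q_old = q_table.get((s, a), 0.0)
        q_table[(s, a)] = q_old + alpha * (r + gamma * best_next - q_old)
```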
Description
Reinforcement learning route optimization method considering the dynamic underwater acoustic environment and node mobility

Technical Field

The invention relates to the technical field of underwater acoustic communication and network routing, and in particular to a reinforcement learning route optimization method considering the dynamic underwater acoustic environment and node mobility, used for realizing dynamic, efficient, and reliable data forwarding in an underwater acoustic sensor network.

Background

The underwater acoustic sensor network is a core infrastructure for marine environment monitoring, resource exploration, disaster early warning, and related fields. The inherent characteristics of the underwater acoustic channel used for communication, such as high delay, narrow bandwidth, strong time variation, and high bit error rate, pose severe challenges to the design of underwater network routing protocols.

Conventional underwater routing protocols, such as static-path routing, geographic-location-based routing, or depth-information-based routing, typically rely on preset rules or a simplified network model. These methods often show insufficient adaptability when facing dynamically changing underwater acoustic propagation conditions, node position drift caused by ocean currents and biological activity, and the limited energy supply of the nodes. In particular, static routing strategies cannot cope with link interruptions caused by channel quality fluctuations or node movement, and decisions that depend only on distance or depth ignore energy consumption balance and link stability, which easily creates network energy holes and shortens the overall network lifetime.

In recent years, to improve network adaptivity, some research has introduced intelligent optimization algorithms or mechanisms based on network state feedback. However, most of these methods focus on a single optimization objective (e.g., minimizing the hop count or energy consumption) or rely on the collection and processing of global information, which can introduce considerable control overhead and communication delay in a dynamic, distributed deep-water environment and makes real-time, efficient path optimization difficult. Meanwhile, existing schemes generally lack the ability to systematically model and learn the coupling between underwater environment dynamics and node mobility, so routing decisions in complex time-varying scenarios rarely reach optimal performance.

Therefore, an intelligent routing method is needed that can autonomously learn and adapt to the dynamic underwater environment and node mobility. Such a method must comprehensively sense the multidimensional network state, evaluate link quality and stability online, and make distributed decisions guided by multi-objective optimization (e.g., balancing data delivery rate, end-to-end delay, and network energy consumption). Reinforcement learning offers a potential solution to these problems owing to its ability to optimize policies by trial and error in an unknown environment. However, directly applying reinforcement learning to underwater routing still faces challenges such as complex state space design, the difficulty of accurately characterizing multi-objective trade-offs in the reward function, and low learning efficiency in dynamic environments.
Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a reinforcement learning route optimization method considering the dynamic underwater acoustic environment and node mobility. The specific technical scheme is as follows.

A reinforcement learning route optimization method considering the dynamic underwater acoustic environment and node mobility comprises the following steps: step one, initializing the parameters of an underwater sensor network; step two, evaluating the quality and stability of inter-node links based on node positions and channel conditions to dynamically update the network topology connection state; step three, when data needs to be transmitted, an agent in the node constructs a multidimensional state vector for reinforcement learning, adopts an epsilon-greedy strategy to balance exploitation and exploration, and selects a routing action to generate a routing decision; step four, executing the routing action; if the data packet transmission fails, recording the failure information and jumping to step six, otherwise calculating a multi-objective reward value, updating the Q-value table according to the transmission result and the multi-objective reward value, and optimizing the routing decision; step five, monitoring the performance indexes of the underwater sensor network in real time and adaptively adjusting the agent's reinforcement learning parameters according to the network performance indexes and underwater environment parameters (a sketch of this adaptation follows below); and step six, judging whether the iteration termination condition is met; if not, repeating steps two through six with the adaptively adjusted agent, and if so, ending the iteration and outputting the adaptive routing model.
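Below is a hedged sketch of the step-five parameter adaptation (claim 7) and the step-six termination test (claim 8). All thresholds and adjustment factors are assumptions chosen for illustration; the patent specifies only the direction of each adjustment and the three logical judgments.

```python
def adapt_parameters(alpha, gamma, delivery_rate_variance, channel_fade_delta,
                     residual_energy, variance_threshold=0.05,
                     fade_threshold=0.2, e_high=0.8, e_low=0.2):
    """Return adjusted (alpha, gamma) for the next iteration."""
    # (1) Learning rate: an abrupt environment change (delivery-rate variance
    # or channel-fading change above its threshold) raises alpha so the agent
    # re-learns the changed environment faster.
    if (delivery_rate_variance > variance_threshold
            or channel_fade_delta > fade_threshold):
        alpha = min(1.0, alpha * 1.5)
    # (2) Discount factor: energy-rich nodes look further ahead (larger gamma);
    # energy-poor nodes shrink gamma to favor minimum immediate energy cost.
    if residual_energy > e_high:
        gamma = min(0.99, gamma + 0.05)
    elif residual_energy < e_low:
        gamma = max(0.1, gamma - 0.05)
    return alpha, gamma

def should_terminate(buffer_empty, packets_pending, residual_energy,
                     alive_nodes, death_threshold, min_alive,
                     q_table, state, neighbors, unreachable_penalty=-100.0):
    """True if any of the three logical judgments of claim 8 holds."""
    # (1) No data left to forward anywhere in the network.
    if buffer_empty and not packets_pending:
        return True
    # (2) This node, or the network as a whole, can no longer communicate.
    if residual_energy < death_threshold or alive_nodes < min_alive:
        return True
    # (3) No valid path to a surface sink remains: every neighbor's Q value
    # has fallen to the unreachable-penalty level.
    if all(q_table.get((state, n), 0.0) <= unreachable_penalty
           for n in neighbors):
        return True
    return False
```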