CN-122001917-A - Underwater AUV service caching and switch-state switching method based on D3QN


Abstract

The invention discloses a D3QN-based method for underwater AUV service caching and switch-state switching. First, a coupled control model of AUV service cache states and dormant/active switch states is established for a marine network composed of surface buoy nodes and AUVs. Next, the service cache write actions are bound consistently to the target switch states, and a dormancy isolation constraint and a dormancy cache freezing rule are applied. Together with the cache capacity constraint and the anti-duplicate write constraint, these rules define an actionable mask. A comprehensive cost function is built from the cache update time overhead, the write energy consumption, the switch-state switching cost and the invalid activation penalty. Finally, D3QN solves the problem within the legal action subspace and outputs the AUV service caching and switch-state switching control strategy. The invention can effectively update AUV service caches and switch AUV switch states under dynamic load, reducing both the energy consumption of the underwater network and the service cache update delay.
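The comprehensive cost named in the abstract (detailed in claims 2 and 3) can be illustrated with a minimal Python sketch. The extracted patent text does not preserve the formulas, so every concrete form here is an assumption:

- Write energy is proportional to the size of each written service.
- The update cost is a weighted sum of update time and write energy.
- The switching cost charges a wake-up or shutdown cost per state transition.
- Invalid activation means "active but caching no requested service". This is a hypothetical rule.
- The reward is the negative weighted sum of normalized costs.

All names (`CostParams`, `write_energy`, `update_cost`, `switching_cost`, `invalid_activation_penalty`, `reward`) are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical notation for illustration only (not the patent's symbols):
#   write[i][k]   1 if AUV i writes service k into its cache this period, else 0
#   size[k]       memory size of service k
#   delay[i][k]   time overhead for AUV i to receive service k from a buoy and write it
#   prev_on[i]    switch state of AUV i in the previous period (1 = active, 0 = dormant)
#   target_on[i]  target switch state of AUV i in the current period


@dataclass
class CostParams:
    energy_per_unit: float  # unit energy coefficient of cache writing
    w_time: float           # trade-off weight on update time overhead
    w_energy: float         # trade-off weight on write energy
    wake_cost: float        # dormant -> active switching cost
    sleep_cost: float       # active -> dormant switching cost


def write_energy(write, size, p):
    # Assumed form: energy grows linearly with the size of every written service.
    return p.energy_per_unit * sum(
        size[k] * row[k] for row in write for k in range(len(size)))


def update_cost(write, size, delay, p):
    # Weighted combination of total update time and total write energy.
    time = sum(delay[i][k] * write[i][k]
               for i in range(len(write)) for k in range(len(size)))
    return p.w_time * time + p.w_energy * write_energy(write, size, p)


def switching_cost(prev_on, target_on, p):
    # Indicator-style cost: wake_cost per 0 -> 1 transition, sleep_cost per 1 -> 0.
    cost = 0.0
    for before, after in zip(prev_on, target_on):
        if before == 0 and after == 1:
            cost += p.wake_cost
        elif before == 1 and after == 0:
            cost += p.sleep_cost
    return cost


def invalid_activation_penalty(target_on, cache_after, request_freq, coef):
    # Hypothetical rule: an active AUV whose cache holds no requested service
    # counts as invalidly activated and incurs its own penalty coefficient.
    total = 0.0
    for i, on in enumerate(target_on):
        demand = sum(f * c for f, c in zip(request_freq, cache_after[i]))
        if on == 1 and demand == 0:
            total += coef[i]
    return total


def reward(update_c, switch_c, penalty_c, scales, weights):
    # Normalize each cost by an assumed scale; reward = -(weighted normalized cost).
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    parts = (update_c / scales[0], switch_c / scales[1], penalty_c / scales[2])
    return -sum(w * x for w, x in zip(weights, parts))


p = CostParams(energy_per_unit=0.5, w_time=0.6, w_energy=0.4,
               wake_cost=2.0, sleep_cost=1.0)
write = [[1, 0], [0, 0]]
u = update_cost(write, [2.0, 3.0], [[1.0, 1.5], [1.0, 1.5]], p)
s = switching_cost([0, 1], [1, 0], p)
pen = invalid_activation_penalty([1, 0], [[1, 0], [0, 1]], [0.0, 0.7], [5.0, 5.0])
r = reward(u, s, pen, scales=(2.0, 4.0, 10.0), weights=(0.5, 0.3, 0.2))
```

In this toy instance, AUV 0 wakes up and writes a service whose request frequency is zero. It is therefore charged both the wake-up cost and the hypothetical invalid-activation penalty. AUV 1 goes dormant and only adds a shutdown cost.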

Inventors

LIU SHUAI
LI WENFENG
ZHONG YUN
MENG XIANGXU
ZHAO KANGLIAN

Assignees

Nanjing University

Dates

Publication Date: 20260508
Application Date: 20260409

Claims (4)

1. A D3QN-based method for underwater AUV service caching and switch-state switching, characterized in that it is applied to a marine network consisting of surface buoy nodes and AUVs. The surface buoy nodes are preloaded with all services and serve as the service update source nodes of the AUVs. Each AUV has limited cache capacity and supports dynamic switching between a dormant state and an active state. The method comprises the following steps:
S1, constructing a marine network system model formed by the surface buoy nodes and the AUVs, in which the following are defined:
- the set of AUVs, the set of surface buoy nodes, the set of services and the set of decision periods;
- the maximum storage capacity of each AUV and the memory size of each service;
- for each AUV, each service and each period, a service cache state variable indicating whether the AUV caches that service;
- for each AUV, a switch state variable recording its current switch state, equal to 1 when the AUV is in the active state and 0 when it is in the dormant state;
- the normalized request frequency of each service in each period, which characterizes the popularity of that service.
S2, establishing a coupled control model for AUV service cache updating and switch-state switching. This step defines the AUV target switch state and the service cache write action, binds the service cache write action consistently to the target switch state, and sets a dormancy isolation constraint and a dormancy cache freezing rule.
S3, constructing the system state space from the AUVs' current service cache states, their switch states in the previous period, the service request statistics and the system load statistics. The system action space is constructed from the AUV target switch state vector and the service cache update matrix. An actionable mask is generated according to the cache capacity constraint, the consistency binding constraint, the dormancy isolation constraint and the anti-duplicate write constraint, and dynamically masks illegal joint actions. The reward function of a Markov decision process is constructed by taking a weighted combination of the service cache update time cost, the service write energy consumption, the switch-state switching cost and the invalid activation penalty as the comprehensive system cost.
S4, solving the Markov decision process with D3QN within the legal action subspace that remains after actionable-mask screening, and outputting the service caching and switch-state switching control strategy of each AUV.
2. The D3QN-based method for underwater AUV service caching and switch-state switching according to claim 1, wherein step S2 is specifically as follows.
For each AUV in each period, a target switch state variable and a service cache write decision variable are defined. The service cache write decision variable is bound consistently to the target switch state variable under the dormancy isolation constraint: an AUV may write a service into its cache in a period only if its target state in that period is the active state. When the target switch state variable equals 0, the AUV is in the dormant state and its service cache state is frozen. When the target switch state variable equals 1, the AUV is in the active state and its service cache state is updated according to the service cache write decisions.
A unit storage energy consumption coefficient of AUV cache writing is defined, and the service write energy consumption is computed from it. An update time overhead is defined for an AUV to receive a service from a surface buoy node and complete the write. The service cache update cost in each period combines the update time overhead and the service write energy consumption, weighted by trade-off coefficients that balance the two terms respectively.
A wake-up switching cost is defined for an AUV switching from the dormant state to the active state, and a shutdown switching cost for switching from the active state to the dormant state. An indicator function is defined that equals 1 when the condition in brackets holds and 0 otherwise. The switch-state switching cost in each period is obtained by applying the indicator function to each AUV's switch state variable in the previous period and its target switch state variable in the current period. Each dormant-to-active transition therefore incurs the wake-up cost, and each active-to-dormant transition incurs the shutdown cost.
An AUV that satisfies the invalid-activation condition is defined as an invalidly activated AUV. Accordingly, an invalid activation penalty term is introduced, weighted by the invalid activation penalty coefficient of each AUV.
3. The D3QN-based method for underwater AUV service caching and switch-state switching according to claim 1, wherein step S3 is specifically as follows.
In each period, the system state consists of the AUVs' current service cache states, their switch states in the previous period, the service request statistics and the normalized system load statistic of that period. The joint action of the system in each period consists of the target switch state vector and the service cache update matrix.
The global actionable set of the system in a given state is constructed from the following conditions:
- the capacity constraint is satisfied after the cache update;
- the consistency binding constraint and the dormancy isolation constraint are satisfied;
- the anti-duplicate write constraint is satisfied.
Any joint action that violates any of these conditions is determined to be an illegal joint action and is masked by the actionable mask, which yields the global actionable set.
The service cache update cost, the switch-state switching cost and the invalid activation penalty of each period are normalized, and the reward function is defined from the normalized terms. It uses weight coefficients that balance the service cache update cost, the switch-state switching cost and the invalid activation penalty respectively, and these weights sum to 1.
4. The D3QN-based method for underwater AUV service caching and switch-state switching according to claim 1, wherein step S4 is specifically as follows.
An online network and a target network of the D3QN are constructed, both using the same dueling two-branch structure. In each decision period, the system state is input into the online network, and Q values are evaluated only for the legal joint actions in the global actionable set. The joint action with the maximum Q value within the global actionable set is selected.
After the system executes the joint action, the reward and the next state are obtained, and the state transition quadruple (state, joint action, reward, next state) is stored in the experience replay pool. When the target Q value is calculated, only actions in the global actionable set of the next state are considered. After each update of the online network parameters, the target network parameters are updated smoothly with a soft update strategy. The AUV service cache update and switch-state switching control strategy is obtained through this training process.
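The actionable-mask construction of claim 3, together with the dormancy freezing rule of claim 2, can be sketched in Python for a toy instance. The sketch rests on several assumptions that are not in the patent text:

- Binary cache and write variables.
- A post-update cache in which an active AUV only adds written services. No eviction is modeled.
- Brute-force enumeration of all joint actions, which only works at toy sizes.

The helper names `next_cache`, `is_legal` and `legal_actions` are hypothetical.

```python
from itertools import product

# Hypothetical encoding for illustration (not the patent's symbols):
#   cache[i][k]   1 if AUV i currently caches service k
#   target_on[i]  target switch state of AUV i (1 = active, 0 = dormant)
#   write[i][k]   1 if AUV i writes service k this period
#   capacity[i], size[k]  storage capacity of AUV i and memory size of service k


def next_cache(cache, target_on, write):
    # Dormancy freezing: a dormant AUV keeps its cache unchanged.
    # Assumption: an active AUV only adds written services (no eviction modeled).
    return [list(c) if on == 0 else [min(1, ck + wk) for ck, wk in zip(c, w)]
            for c, on, w in zip(cache, target_on, write)]


def is_legal(cache, capacity, size, target_on, write):
    for c, on, w in zip(cache, target_on, write):
        # Consistency binding / dormancy isolation: only active AUVs may write.
        if on == 0 and any(w):
            return False
        # Anti-duplicate writing: never write a service that is already cached.
        if any(ck == 1 and wk == 1 for ck, wk in zip(c, w)):
            return False
    # Capacity constraint evaluated on the post-update cache.
    for c_new, cap in zip(next_cache(cache, target_on, write), capacity):
        if sum(s * x for s, x in zip(size, c_new)) > cap:
            return False
    return True


def legal_actions(cache, capacity, size):
    # Brute-force enumeration of the joint action space; the boolean result
    # of is_legal plays the role of the actionable mask. Exponential: toy sizes only.
    n_auv, n_svc = len(cache), len(size)
    legal = []
    for target_on in product((0, 1), repeat=n_auv):
        for bits in product((0, 1), repeat=n_auv * n_svc):
            write = [list(bits[i * n_svc:(i + 1) * n_svc]) for i in range(n_auv)]
            if is_legal(cache, capacity, size, target_on, write):
                legal.append((target_on, write))
    return legal


cache = [[1, 0], [0, 0]]
capacity = [5.0, 2.0]
size = [2.0, 3.0]
actions = legal_actions(cache, capacity, size)
```

In this instance, 9 of the 2^2 × 2^4 = 64 joint actions survive the mask. A practical implementation would compute the mask directly from the constraints rather than by enumeration.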

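Claim 4 combines a dueling network, Double-DQN target computation and soft target updates, all restricted to the global actionable set. The Python sketch below shows only the arithmetic of these pieces on plain lists, with no neural network. These details are assumptions:

- The function names.
- Taking the advantage mean over all actions. This is the standard dueling choice; the patent does not say whether the mean covers only legal actions.
- The example values of the discount factor and tau.

```python
# Minimal, framework-free sketch of the D3QN ingredients named in claim 4.
# Q-values are plain lists indexed by joint-action id; mask[a] is True when
# joint action a belongs to the global actionable set. Names are hypothetical.


def dueling_q(value, advantages):
    # Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def masked_argmax(q_values, mask):
    # Greedy choice restricted to the legal (unmasked) joint actions.
    legal = [a for a, ok in enumerate(mask) if ok]
    if not legal:
        raise ValueError("actionable set is empty")
    return max(legal, key=lambda a: q_values[a])


def double_dqn_target(reward, gamma, q_online_next, q_target_next, next_mask, done=False):
    # Double DQN: the online network selects the next action (within the
    # next state's actionable set) and the target network evaluates it.
    if done:
        return reward
    a_star = masked_argmax(q_online_next, next_mask)
    return reward + gamma * q_target_next[a_star]


def soft_update(target_params, online_params, tau):
    # Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target.
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online_params, target_params)]


q = dueling_q(1.0, [1.0, 2.0, 3.0])
a = masked_argmax(q, [True, True, False])
y = double_dqn_target(-0.5, 0.9, [0.2, 0.8, 0.5], [1.0, 2.0, 3.0], [True, False, True])
theta = soft_update([0.0, 1.0], [1.0, 3.0], tau=0.1)
```

Consider the target computation. Action 1 has the highest online Q value but is masked out in the next state, so the online network picks action 2 instead. The target network then evaluates action 2. This is how restricting the target to the next state's actionable set changes the bootstrap value.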
Description

Underwater AUV service caching and switch-state switching method based on D3QN

Technical Field

The invention belongs to the field of marine edge computing and intelligent control. It relates in particular to an underwater AUV service caching and switch-state switching method based on the Dueling Double Deep Q-Network (D3QN).

Background

Applications such as smart oceans, marine environment monitoring, underwater target detection and marine resource development continue to advance. As a result, the number of autonomous underwater vehicles (Autonomous Underwater Vehicle, AUV) deployed in marine Internet of Things systems keeps increasing. When performing sensing, recognition, analysis and cooperative tasks, AUVs typically need to invoke different types of computing services, such as target detection, data analysis and intelligent recognition. AUVs have limited energy, computing power and storage resources. They can hardly store all services over long periods, and they can hardly remain in efficient operation at all times under dynamic traffic loads. Surface buoy nodes (Buoy Node, BN) can act as service support nodes that provide service delivery and edge coordination for AUVs. This has become an important technical approach to improving task processing capability in marine edge networks.

However, AUV service cache control and switch-state control in such marine edge networks still face significant challenges. On the one hand, AUV cache capacity is limited and cannot hold all services at once. When service demand changes, untimely cache updates leave required services missing and increase system processing delay. Conversely, overly frequent cache writes bring higher service delivery delay and cache write energy consumption.
On the other hand, an AUV need not remain active at all times in a dynamic marine environment, and reasonable dormancy and wake-up control helps reduce system energy consumption. However, switching states too frequently introduces extra switching cost. Blindly activating an AUV when there is no effective service demand adds invalid activation overhead. Both degrade system stability and resource utilization efficiency.

Furthermore, the coupling between actions is difficult to describe when AUV service cache updating and switch-state switching are controlled jointly. If an AUV whose target state is dormant is still allowed to perform service cache writes, the control logic conflicts. Repeatedly writing an already cached service wastes bandwidth and energy. If the cache capacity constraint, the state switching constraint and action consistency are not handled in a unified way, a large number of illegal joint actions are easily generated. This enlarges the decision space and lowers control efficiency. Existing work on these problems mainly focuses on task offloading, computing resource allocation, or single-dimensional service deployment optimization.
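The growth of the joint action space, and the share of it that violates dormancy isolation, can be made concrete with a short count. The sketch assumes a hypothetical encoding: each AUV chooses one binary target switch state and one binary write vector over all services. The function names are illustrative.

```python
def joint_action_count(n_auv, n_svc):
    # Each AUV picks a target switch state (2 options) and a binary write
    # vector over all services (2 ** n_svc options).
    return 2 ** (n_auv * (n_svc + 1))


def isolation_consistent_count(n_auv, n_svc):
    # Dormancy isolation alone: a dormant AUV has one write vector (all zeros),
    # an active AUV may choose any of the 2 ** n_svc write vectors.
    return (1 + 2 ** n_svc) ** n_auv


total = joint_action_count(4, 5)
kept = isolation_consistent_count(4, 5)
print(total, kept, f"{kept / total:.1%}")
```

For 4 AUVs and 5 services there are 2^24 = 16,777,216 joint actions. Only 33^4 = 1,185,921 of them (about 7%) respect dormancy isolation, and the capacity and anti-duplicate constraints prune the set further.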
Three groups of prior methods illustrate these limitations.
- Some marine edge computing methods assume that the edge nodes already host the required services. They optimize mainly task transmission, link scheduling or resource allocation, and do not explicitly consider dynamic cache updates under limited AUV cache capacity.
- Some service caching methods adjust cache contents according to service demand, but they are designed for terrestrial edge networks or static scenarios. They do not account for AUV switch states being switchable, or for service write actions having to be consistent with the target switch state. They can hardly account for the service cache update cost and the node switching cost together.
- Some methods use heuristic strategies, fixed rules or conventional reinforcement learning algorithms for caching or node control. They struggle with the high-dimensional discrete joint action space formed by service cache write actions and switch-state actions. This space tends to contain many illegal actions, which leads to slow policy convergence and insufficient adaptability to dynamic load scenarios.
In summary, the prior art has the following defects in marine network scenarios. First, AUV service cache updating and switch-state switching are often handled separately, and no unified modeling method exists for their coupled control. Second, the service cache update time cost, the service write energy consumption, the switch-state switching cost and the additional cost of invalid activation are not considered together, so the comprehensive system cost is hard to reflect accurately. Third, constraints such as dormancy isolation, anti-duplicate writing and cache capacity are not su