CN-121996369-A - Post-disaster Internet of vehicles resource allocation method based on unmanned aerial vehicle assistance
Abstract
The invention discloses a post-disaster Internet of Vehicles resource allocation method assisted by unmanned aerial vehicles (UAVs), relating to the technical field of the Internet of Vehicles. The method first constructs a system model integrating sensing, communication, and computation, formulates an optimization problem targeting the sensing-task completion rate, and solves it by jointly optimizing the vehicle scheduling sequence and the task offloading ratio. The problem is modeled as a Markov decision process and solved with an improved TD3 algorithm, termed ACBC-TD3, which effectively accelerates early policy convergence and enhances training stability. The invention suppresses Q-value overestimation through a double-Critic network structure and a delayed-update mechanism, improves initial exploration efficiency through behavior cloning, and improves post-convergence performance through an adaptive optimization mechanism and a cosine-annealing strategy. Experiments show that the method markedly improves the task completion rate and system resource utilization efficiency in post-disaster emergency scenarios and has promising application prospects.
Inventors
- Ji Baofeng
- Guo Binfan
- Qiao Dandan
- Chai Wenjuan
- Wang Xian
- Liu Yihao
- Lu Hangxiao
- Liu Dan
- Huo Zhan
- Lu Rongfeng
- Yan Chenggang
- Zhang Hui
- Xing Chengwen
- Wang Dongming
- Fan Huitao
- Ding Benkang
Assignees
- Henan University of Science and Technology (河南科技大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-12-01
Claims (7)
- 1. A post-disaster Internet of Vehicles resource allocation method based on unmanned aerial vehicle assistance, characterized by comprising the following steps: S1, constructing a UAV-assisted post-disaster Internet of Vehicles resource allocation system model integrating sensing, communication, and computation, wherein the system model comprises a UAV-assisted task offloading model, the task offloading model comprises a UAV-mounted edge server, and in each discrete time slot the UAV can select one vehicle for task coordination and determine its task offloading ratio; S2, constructing, based on the system model, an optimization problem taking the sensing-task completion rate as the optimization objective, the problem being solved by jointly optimizing the vehicle scheduling sequence and the task offloading ratio, subject to system-resource and task-demand constraints; S3, modeling the optimization problem as a Markov decision process and solving it with the ACBC-TD3 algorithm based on deep reinforcement learning, wherein the ACBC-TD3 algorithm, built on the twin delayed deep deterministic policy gradient (TD3) algorithm, dynamically adjusts the policy noise and the soft-update rate with an adaptive optimizer, adjusts the learning rates of the Actor network and the Critic network with a cosine-annealing method, and introduces a behavior-cloning algorithm to accelerate early policy learning; and S4, training the ACBC-TD3 algorithm in a simulation environment defined by the UAV-assisted post-disaster Internet of Vehicles resource allocation system model, and obtaining an optimized task offloading policy by iteratively updating the algorithm parameters until the algorithm converges to a preset performance index.
- 2. The post-disaster Internet of Vehicles resource allocation method based on unmanned aerial vehicle assistance according to claim 1, wherein in step S1 the system model comprises K vehicles and at least one UAV equipped with a mobile edge computing (MEC) server; the UAV hovers at a fixed location and covers a specific service area, the vehicles are randomly distributed in the area, and both the vehicles and the UAV are equipped with sensors.
- 3. The post-disaster Internet of Vehicles resource allocation method based on unmanned aerial vehicle assistance according to claim 2, wherein in step S2 the optimization problem is expressed as: maximize (1/K) Σ_{k=1}^{K} F_k, subject to T_k ≤ T_k^max for every vehicle k and Σ_t Σ_k E_k(t) ≤ E_total; where F_k indicates whether the cooperative sensing task of vehicle k can be completed within the prescribed time, K denotes the number of vehicles requiring cooperative sensing, a_k(t) indicates whether vehicle k is scheduled for task offloading in time slot t, ρ_k(t) denotes the offloading ratio of vehicle k in time slot t, b_k(t) indicates whether there is blockage between the UAV and vehicle k in time slot t, (x_k, y_k) denotes the plane position coordinates of vehicle k, T_k and T_k^max denote the actual delay and the maximum tolerable delay of the task respectively, E_k(t) denotes the energy the UAV consumes executing the task of vehicle k in time slot t, and E_total denotes the total available energy of the UAV.
- 4. The post-disaster Internet of Vehicles resource allocation method based on unmanned aerial vehicle assistance according to claim 3, wherein step S3 specifically comprises: S31, dynamically adjusting the exploration noise and the target-network soft-update rate based on the total variance of the Q values over a sliding window of recent training rounds; S32, adjusting the learning rates of the Actor network and the Critic network with a learning-rate scheduling strategy comprising a warm-up stage and an annealing stage; S33, introducing a behavior-cloning mechanism that aligns the Actor network output with actions in the experience-replay data through supervised learning, so as to accelerate early policy convergence.
- 5. The post-disaster Internet of Vehicles resource allocation method based on unmanned aerial vehicle assistance according to claim 4, wherein step S31 specifically comprises: after each training round ends, computing the total variance of the Q values recently output by the double Critic networks over a sliding window of preset size, and judging the agent's training state from the trend of this total variance: if the total variance of the Q values increases markedly, the policy exploration noise and the target-network soft-update rate are reduced; if the total variance of the Q values remains stable, the policy exploration noise and the target-network soft-update rate are increased.
- 6. The post-disaster Internet of Vehicles resource allocation method based on unmanned aerial vehicle assistance according to claim 5, wherein step S32 specifically comprises: in the warm-up stage, gradually increasing the learning rates of the Actor network and the Critic network from an initial value to a preset maximum; and in the cosine-annealing stage, decaying the learning rates according to the periodic variation of the cosine function.
- 7. The post-disaster Internet of Vehicles resource allocation method based on unmanned aerial vehicle assistance according to claim 5, wherein step S33 specifically comprises: sampling state-action data from the experience replay buffer as expert data; computing, based on the expert data, a behavior-cloning loss between the action output by the Actor network and the expert action; and combining the behavior-cloning loss with the Q value output by the Critic network to construct the total loss function of the Actor network, wherein the weight of the behavior-cloning loss is related to statistics of the Q values in the experience replay buffer.
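The double-Critic structure and delayed-update mechanism named in the abstract follow standard TD3 practice: the bootstrap target takes the minimum of the two target-critic estimates to suppress Q-value overestimation, and the actor (with the target networks) is updated less often than the critics. A minimal sketch, with the networks replaced by hypothetical scalar stand-ins; this illustrates the generic TD3 mechanism, not the patented implementation:

```python
# Sketch of TD3's clipped double-Q target and delayed actor update.
# q1_next/q2_next stand in for the two target critics' estimates at the
# next state; real networks are omitted for brevity.

GAMMA = 0.99          # discount factor (illustrative value)
POLICY_DELAY = 2      # actor updated once per POLICY_DELAY critic updates

def td3_target(reward, q1_next, q2_next, done, gamma=GAMMA):
    """Clipped double-Q bootstrap target: min of the two target critics."""
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap

def training_loop(batches):
    """Count delayed actor updates over a stream of (r, q1', q2', done)."""
    actor_updates = 0
    for step, (reward, q1_next, q2_next, done) in enumerate(batches, start=1):
        y = td3_target(reward, q1_next, q2_next, done)
        # ... regress both critics toward y here ...
        if step % POLICY_DELAY == 0:   # delayed policy and target update
            actor_updates += 1
    return actor_updates
```

Taking the minimum of the two critics biases the target downward, which counteracts the overestimation that a single bootstrapped critic tends to accumulate.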
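The adaptive mechanism of claim 5 can be sketched as follows: after each round, the total variance of recent Q values in a sliding window is compared with the previous round's value; a marked increase shrinks the exploration noise and the soft-update rate, while a stable variance enlarges them. The window size, growth threshold, and scale factor below are illustrative assumptions, not values from the patent:

```python
from collections import deque

class AdaptiveOptimizer:
    """Sketch of claim 5: adjust exploration noise sigma and soft-update
    rate tau from the trend of the Q-value variance over a sliding window.
    All hyperparameters here are illustrative placeholders."""

    def __init__(self, window=10, sigma=0.2, tau=0.005,
                 grow_thresh=1.2, scale=0.9):
        self.q_window = deque(maxlen=window)   # recent Q values
        self.sigma, self.tau = sigma, tau
        self.grow_thresh, self.scale = grow_thresh, scale
        self.prev_var = None

    @staticmethod
    def _variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    def end_of_round(self, q_values):
        """Call after each training round with that round's Q estimates."""
        self.q_window.extend(q_values)
        var = self._variance(self.q_window)
        if self.prev_var is not None:
            if var > self.grow_thresh * self.prev_var:
                # variance rising sharply: damp exploration and target drift
                self.sigma *= self.scale
                self.tau *= self.scale
            else:
                # variance stable: explore a little more aggressively
                self.sigma /= self.scale
                self.tau /= self.scale
        self.prev_var = var
        return self.sigma, self.tau
```

The intuition is that a sharply growing Q-variance signals unstable value estimates, so both the exploration noise and the target-network tracking speed are throttled until training settles.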
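The two-stage learning-rate schedule of claim 6 (warm-up, then cosine annealing) can be sketched as a single function of the training step. The step counts and rate values below are illustrative assumptions, and a linear ramp is assumed for the warm-up stage since the claim only says the rate is "gradually increased":

```python
import math

def lr_schedule(step, warmup_steps=100, total_steps=1000,
                lr_init=1e-5, lr_max=1e-3, lr_min=1e-6):
    """Sketch of claim 6: linear warm-up from lr_init to lr_max,
    then cosine annealing from lr_max down to lr_min.
    All step counts and rates here are illustrative placeholders."""
    if step < warmup_steps:
        # warm-up stage: linear ramp toward the preset maximum
        frac = step / warmup_steps
        return lr_init + frac * (lr_max - lr_init)
    # cosine-annealing stage over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
```

At `step == warmup_steps` the cosine term equals 1, so the schedule hands over exactly at `lr_max`; by `total_steps` it has decayed to `lr_min`. The same function would be applied to the Actor and Critic optimizers, possibly with different constants.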
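The behavior-cloning term of claim 7 can be sketched as a mean-squared-error penalty between the actor's action and the "expert" action sampled from the replay buffer, added to the usual negated-Q actor objective. The claim only says the penalty weight is related to replay-buffer Q statistics, so the specific weighting rule below is a hypothetical placeholder:

```python
def actor_loss(policy_actions, expert_actions, q_values, replay_q_stat):
    """Sketch of claim 7's total Actor loss:
        loss = -mean(Q) + lambda * MSE(policy action, expert action)
    where lambda is derived from a Q statistic of the replay buffer.
    The weighting rule lam = 1/(1 + replay_q_stat) is an illustrative
    assumption, not the patented formula."""
    n = len(q_values)
    q_term = -sum(q_values) / n                      # policy-gradient term
    bc_mse = sum((a - e) ** 2                        # behavior-cloning term
                 for a, e in zip(policy_actions, expert_actions)) / n
    lam = 1.0 / (1.0 + replay_q_stat)                # hypothetical weighting
    return q_term + lam * bc_mse
```

Early in training, when the critic's Q statistic is small, the behavior-cloning term dominates and pulls the actor toward replayed actions; as Q estimates grow, the weight shrinks and the policy-gradient term takes over, which matches the stated goal of accelerating early policy convergence.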
Description
Post-disaster Internet of vehicles resource allocation method based on unmanned aerial vehicle assistance

Technical Field

The invention relates to the technical field of the Internet of Vehicles, in particular to a post-disaster Internet of Vehicles resource allocation method based on unmanned aerial vehicle assistance.

Background

After natural disasters such as earthquakes and floods, roadside units (RSUs) and other traffic infrastructure are often destroyed, leaving the sensing and communication capabilities of the Internet of Vehicles (IoV) severely limited. The sensing range of on-board sensors is limited and sensing blind zones form easily, seriously impairing a vehicle's ability to perceive its surroundings. Unmanned aerial vehicles (UAVs), by virtue of their flexible deployment, rapid coverage, and on-board computing capability, have become an important substitute for destroyed RSUs in post-disaster emergency communication and computation. However, UAVs have limited energy and computing resources, and IoV sensing tasks are random and highly dynamic, so achieving efficient task scheduling and resource allocation remains a difficult problem. Traditional rule-based scheduling and convex optimization methods struggle to cope with the dynamic environment of complex post-disaster scenarios. In recent years, deep reinforcement learning (DRL) has shown strong potential in mobile edge computing (MEC) and IoV resource optimization, but existing methods still suffer from slow convergence and poor stability.
Disclosure of Invention

The technical problem the invention aims to solve is to provide a post-disaster Internet of Vehicles resource allocation method based on unmanned aerial vehicle assistance, which achieves intelligent optimization of task scheduling and computation offloading by constructing a UAV-vehicle cooperative system model and introducing an improved deep reinforcement learning algorithm, thereby improving the completion rate of tasks in which vehicles sense surrounding visual blind zones in post-disaster emergency scenarios.

To achieve this purpose, the technical scheme adopted by the invention is a post-disaster Internet of Vehicles resource allocation method based on unmanned aerial vehicle assistance, comprising the following steps: S1, constructing a UAV-assisted post-disaster Internet of Vehicles resource allocation system model integrating sensing, communication, and computation, wherein the system model comprises a UAV-assisted task offloading model, the task offloading model comprises a UAV-mounted edge server, and in each discrete time slot the UAV can select one vehicle for task coordination and determine its task offloading ratio; S2, constructing, based on the system model, an optimization problem taking the sensing-task completion rate as the optimization objective, the problem being solved by jointly optimizing the vehicle scheduling sequence and the task offloading ratio, subject to system-resource and task-demand constraints; S3, modeling the optimization problem as a Markov decision process and solving it with the ACBC-TD3 algorithm based on deep reinforcement learning, wherein the ACBC-TD3 algorithm, built on the twin delayed deep deterministic policy gradient (TD3) algorithm, dynamically adjusts the policy noise and the soft-update rate with an adaptive optimizer, adjusts the learning rates of the Actor network and the Critic network with a cosine-annealing method, and introduces a behavior-cloning algorithm to accelerate early policy learning; and S4, training the ACBC-TD3 algorithm in a simulation environment defined by the UAV-assisted post-disaster Internet of Vehicles resource allocation system model to obtain an optimized task offloading policy.

Further, in step S1, the system model includes K vehicles and at least one UAV equipped with a mobile edge computing server; the UAV hovers at a fixed location and covers a specific service area, the vehicles are randomly distributed in the area, and both the vehicles and the UAV are equipped with sensors.

Further, in step S2, the optimization problem is expressed as above, where F_k indicates whether the cooperative sensing task of vehicle k can be completed within the prescribed time, K denotes the number of vehicles requiring cooperative sensing, a_k(t) indicates whether vehicle k is scheduled for task offloading in time slot t, ρ_k(t) denotes the offloading ratio of vehicle k in time slot t, b_k(t) indicates whether there is blockage between the UAV and vehicle k in time slot t, and (x_k, y_k) denotes the plane position coordinates