CN-121996312-A - Vehicle-mounted augmented reality task unloading decision and resource allocation method based on multi-agent reinforcement learning

CN121996312A

Abstract

The invention discloses a mobile edge computing task offloading method for the Internet of Vehicles, which optimizes resource allocation with a multi-agent deep deterministic policy gradient task offloading algorithm. Targeting the resource allocation problem that arises during task offloading, the method combines an augmented reality service model and splits the problem into two key sub-problems: task offloading decision and resource allocation. The task offloading decision sub-problem selects an optimal offloading policy with a deep reinforcement learning algorithm, while the resource allocation sub-problem determines an optimal allocation scheme with several conventional optimization methods. Simulation results show that the algorithm outperforms the existing distributed deep deterministic policy gradient algorithm in balancing energy consumption and latency, achieving lower energy consumption while guaranteeing quality of service.

Inventors

  • Wang Jiarui
  • Tan Guoping
  • Yi Wenxiong

Assignees

  • Hohai University (河海大学)

Dates

Publication Date
2026-05-08
Application Date
2024-11-06

Claims (1)

  1. The vehicle-mounted augmented reality task offloading decision and resource allocation method based on multi-agent reinforcement learning is characterized by comprising the following steps:

Step one, constructing the application scenario of a vehicular edge computing cooperative task offloading and resource allocation system based on non-orthogonal multiple access. In this scenario, the system is assumed to consist of a plurality of 5G base stations and roadside units deployed along the road, capable of providing highly reliable wireless communication services to vehicles within their coverage. In addition, the system is equipped with edge servers of different computing capabilities that provide the necessary computing resources to the vehicles. The system is designed for urban road environments but can be adapted to other application scenarios with appropriate adjustments to the model.

Step two, designing the multi-agent deep deterministic policy gradient task offloading algorithm. The Markov decision process of the algorithm is described as follows: an agent is deployed on each base station and obtains local state information by communicating with all vehicles within its coverage area. Based on this information, the agent produces an offloading scheme for each subtask. After task computation completes, the system computes the average service rate and energy consumption and generates a reward signal accordingly. The environment then randomly transitions to the next state and feeds the new state back to the base station. Each base station agent repeats these steps, continuously updating its task offloading policy with the goal of optimizing the trade-off between latency and energy consumption.
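The per-agent interaction loop described in step two can be sketched as follows. This is a minimal illustrative sketch, not the patent's actual model: the state layout, the placeholder delay/energy formulas, and the negated-cost reward are all assumptions.

```python
import random

def run_episode(num_steps, zeta=0.5):
    """Hypothetical sketch of the per-base-station agent loop.

    Each step: observe the local state, choose a binary offloading action per
    subtask, receive delay/energy feedback, form a reward balancing the two,
    and let the environment transition randomly to the next state.
    """
    state = [random.random() for _ in range(4)]  # local state (distances, task sizes, ...)
    total_reward = 0.0
    for t in range(num_steps):
        action = [random.randint(0, 1) for _ in state]      # 1 = offload, 0 = compute locally
        delay = sum(a * 0.2 + 0.1 for a in action)          # placeholder delay model
        energy = sum((1 - a) * 0.3 + 0.05 for a in action)  # placeholder energy model
        reward = -(zeta * delay + (1 - zeta) * energy)      # trade off latency vs. energy
        total_reward += reward
        state = [random.random() for _ in range(4)]         # environment transitions randomly
    return total_reward
```

In a real implementation the action would come from the agent's policy network rather than a random draw, and the delay/energy terms would come from the communication and computation models of the system.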
(a) State space design. In the proposed multi-agent deep deterministic policy gradient task offloading algorithm, multiple agents exist, and each agent independently makes decisions based on local state information perceived through vehicle communication. The local state space of each agent is defined over: the edge node index e; the slot index t; the set of distances between edge node e and the vehicles within its coverage at time t; and, for each subtask to be computed on those vehicles at time t, its size and the number of CPU cycles required to compute one bit of data, together with an attribute of each task on the vehicles within the communication range of edge node e. The global state space is the collection of all agents' local state spaces.
(b) Action space design. The invention considers the practical situation in which a subtask may either be computed at the local edge node or be transmitted to another node in the system for computation. Thus, the action space of edge node e is constituted by the subtask offloading decisions of the vehicles within its coverage area, where each component indicates whether the corresponding subtask is computed on that edge node; the global action space at time t is the collection of all edge nodes' actions.
(c) Reward function design. The invention aims to achieve the optimal balance between system energy consumption and latency so as to find an optimal mobile edge computing task offloading and resource allocation scheme.
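The formulas for these spaces did not survive text extraction. A plausible reconstruction, using assumed symbols (the vehicle set notation, the indicator variable name, and the exact grouping are guesses based on the surrounding prose), is:

```latex
% Local state of edge node e at slot t: distances, subtask sizes,
% and CPU cycles per bit for the vehicles v in its coverage set V_e(t).
s_e(t) = \bigl\{\, d_{e,v}(t),\; z_v(t),\; c_v(t) \;\bigm|\; v \in \mathcal{V}_e(t) \,\bigr\}

% Global state: concatenation over all E edge nodes.
s(t) = \bigl( s_1(t), \dots, s_E(t) \bigr)

% Action of edge node e: one binary offloading decision per subtask,
% x_v(t) = 1 if the subtask is computed on edge node e.
a_e(t) = \bigl\{\, x_v(t) \in \{0,1\} \;\bigm|\; v \in \mathcal{V}_e(t) \,\bigr\},
\qquad
a(t) = \bigl( a_1(t), \dots, a_E(t) \bigr)
```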
The optimization objective is achieved by solving an optimization problem in which ζ ∈ [0,1] is a balancing factor that trades off latency against energy consumption; its value depends on the user's preference (lower latency or lower energy consumption). The reward function of edge node e at time t is the corresponding weighted cost, and the global cumulative reward function at time t aggregates the rewards of all edge nodes.
Step three, constructing the vehicular edge computing cooperative task offloading decision and resource allocation system model based on non-orthogonal multiple access. The invention models the task offloading problem as a Markov decision process. Since the state space grows with the number of vehicles, a single-agent deep reinforcement learning algorithm is prone to the curse of dimensionality and convergence problems. To address these challenges, the invention provides a multi-agent deep deterministic policy gradient algorithm suited to complex environments, aiming to optimize task offloading and jointly optimize latency and energy consumption. The multi-agent deep deterministic policy gradient task offloading algorithm is composed of multiple agents, each corresponding to a base station, which obtains local state information by communicating with vehicles within its coverage area and thereby makes decisions independently. The algorithm is an extension of the deep deterministic policy gradient algorithm. Based on the actor-critic architecture, each agent contains two parts: an actor and a critic. The actor consists of a current policy network and a target policy network, while the critic consists of a current Q network and a target Q network.
The current policy network is responsible for exploring the environment and outputting actions; the target policy network takes as input the next state from the transition tuples stored in the replay buffer and outputs the corresponding next action, which helps update the current Q network and improves training stability. The current Q network evaluates the actions produced by the current policy network and is used to update that policy network, while the target Q network computes the target Q value from the tuples stored in the replay buffer, including the next state and the action output by the target policy network. This process enhances the stability of model training. In the multi-agent deep deterministic policy gradient task offloading algorithm, each agent can only observe local state information in the environment; it feeds this local state into its current policy network and acts according to the network's output. The proposed algorithm differs from a simple combination of several deep deterministic policy gradient algorithms: the current Q network obtains global state and action information by interacting with the other agents in the system, and this information is used during training to alleviate partial observability and thus improve training stability.
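The structure just described (decentralized actors on local observations, centralized critics on global state and actions, plus slowly updated target copies) can be sketched with linear layers standing in for the neural networks. This is an illustrative skeleton, not the patent's actual networks; the dimensions, tanh squashing, and Polyak rate are assumptions.

```python
import numpy as np

class Agent:
    """Minimal MADDPG-style agent sketch.

    The actor maps the agent's *local* observation to an action; the critic
    scores the *global* state-action vector, which is what distinguishes
    MADDPG from running independent DDPG agents. Target copies of both are
    kept and updated by Polyak averaging for training stability.
    """
    def __init__(self, obs_dim, act_dim, global_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.actor = rng.normal(size=(act_dim, obs_dim))   # current policy "network"
        self.actor_target = self.actor.copy()              # target policy network
        self.critic = rng.normal(size=(1, global_dim))     # current Q "network"
        self.critic_target = self.critic.copy()            # target Q network

    def act(self, local_obs):
        # Decentralized execution: only the agent's own observation is used.
        return np.tanh(self.actor @ local_obs)

    def q_value(self, global_state_actions):
        # Centralized training: the critic sees all agents' states and actions.
        return float(self.critic @ global_state_actions)

    def soft_update(self, tau=0.01):
        # Polyak averaging keeps the targets slowly tracking the current
        # networks, the stabilization mechanism described above.
        self.actor_target = tau * self.actor + (1 - tau) * self.actor_target
        self.critic_target = tau * self.critic + (1 - tau) * self.critic_target
```

In a full implementation the actor and critic would be trained from replay-buffer tuples, with the target networks supplying the next action and the target Q value for the bootstrapped update.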

Description

Vehicle-mounted augmented reality task offloading decision and resource allocation method based on multi-agent reinforcement learning

Technical Field

The invention belongs to the technical field of the Internet of Vehicles and mainly relates to task offloading in mobile edge computing and the associated problem of allocating and optimizing computing resources and transmission power.

Background

In recent years, task offloading methods in the Internet of Vehicles have been studied increasingly, but relatively few studies address vehicle-mounted computing services that require extremely low latency and process large amounts of data, such as augmented reality. Because of their stringent latency requirements and large packet volumes, such services make it difficult for conventional task offloading and resource allocation schemes to meet quality-of-service requirements. Augmented reality services demand a large amount of computing resources to satisfy their quality of service, and this demand also implies higher energy consumption. Since a vehicle, as a terminal with limited computing resources, often cannot meet the demands of such resource-intensive services, the computing tasks of the augmented reality service are transferred to an edge server for processing to alleviate this problem. In a dynamically changing, complex Internet of Vehicles scenario, conventional optimization methods often suffer from high solving difficulty, poor results, and low speed; conventional task offloading schemes therefore cannot effectively solve the offloading problem of augmented reality services.
Therefore, new optimization methods are needed that adapt to the dynamics of the Internet of Vehicles environment and account for the specificity of the traffic, so as to achieve more efficient task offloading and resource allocation.

Disclosure of Invention

The invention aims to solve the cooperative task offloading and resource allocation problem in a multi-user, complex Internet of Vehicles scenario, and adopts a scheme combining deep reinforcement learning with conventional optimization methods. The invention decouples the joint problem into a task offloading decision problem and a resource allocation problem and solves them separately. To address the curse of dimensionality, high computational complexity, and convergence problems of traditional single-agent deep reinforcement learning algorithms, the invention provides a multi-agent deep deterministic policy gradient algorithm for the task offloading problem. For the resource allocation problem, a variety of conventional optimization techniques are used to determine the optimal allocation scheme. The technical scheme adopted to solve these problems is as follows:

S1, constructing the application scenario of a vehicular edge computing cooperative task offloading decision and resource allocation system based on non-orthogonal multiple access. The system is assumed to contain a plurality of 5G base stations and roadside units and can provide highly reliable wireless communication services for vehicles within the road coverage area. In addition, the system is equipped with edge servers of different computing power to provide the necessary computing services for the vehicles.
The system is mainly designed for urban road environments but can be easily extended to other scenarios by replacing the corresponding models.

S2, designing a task offloading algorithm based on the multi-agent deep deterministic policy gradient. The Markov decision process of the algorithm may be described as follows: an agent is deployed on each base station, obtains local state information by communicating with all vehicles within the base station's coverage area, and decides the offloading scheme for each subtask. After task computation completes, the agent computes the average service rate and energy consumption and derives the reward accordingly. The environment then randomly transitions to the next state and sends feedback to the base station. The base station repeats this process, continuously updating its task offloading policy with the goal of optimizing the trade-off between latency and energy consumption. Multiple agents exist in the multi-agent deep deterministic policy gradient task offloading algorithm, and each agent makes decisions only from local state information sensed through communication with the vehicles. Each agent's local state space is defined as: Where e is the number of the edge node, t is the slot number, Rep