CN-121478504-B - Mobile edge computing task offloading optimization method and device
Abstract
The invention relates to the technical field of computer science and information engineering, in particular to a method and a device for optimizing the offloading of mobile edge computing tasks, comprising the following steps: S1, performing environment modeling; S2, constructing a neural network structure; S3, carrying out algorithm training, namely first configuring environment parameters and initializing network parameters, then carrying out environment interaction and experience storage, and finally carrying out network updating and loop iteration; S4, after the training of S3 is completed, deploying the final policy network into the dynamic mobile edge computing system for real-time task offloading decisions, its input being the graph embedding vector.
Inventors
- LI LIN
- LIN ZHIYAO
- ZHU JIANHUA
- YANG MIAO
Assignees
- Xiamen University of Technology (厦门理工学院)
Dates
- Publication Date
- 20260512
- Application Date
- 20260109
Claims (5)
- 1. A method for optimizing the offloading of a mobile edge computing task, characterized by comprising the following steps:
  S1, performing environment modeling, namely constructing a graph-enhanced state space, constructing an action space, and setting a causally de-confounded reward function, specifically comprising: (1) constructing the graph-enhanced state space, namely constructing a heterogeneous graph G = (V, E), wherein V is a node set comprising user nodes, task nodes, and server nodes, and E is an edge set comprising edges representing task generation relationships, edges representing network connection relationships, and edges representing potential processing relationships; a graph attention network is employed as an encoder to encode the heterogeneous graph G and generate a fixed-dimension graph embedding vector that serves as the agent's perception of the state; (2) constructing the action space: an action defines the operational variables of the system in task offloading and resource allocation decisions, including the resource allocation amount for offloading the task, the bandwidth allocation amount occupied during migration, and the selection of an offloading target; the action space is kept as a mixed design of continuous and discrete components; (3) setting the causally de-confounded reward function for eliminating pollution of the reward signal by environmental confounding variables, including defining a causal graph, performing a counterfactual reward calculation, and performing a purified reward calculation, wherein the key variables of the causal graph include the user context C, the offloading action A, and the task result Y, with causal dependencies C → A, C → Y, and A → Y, indicating that C has a direct effect on both A and Y; the counterfactual reward calculation specifically estimates a baseline return through a pre-trained counterfactual prediction model, which predicts the return obtained when an arbitrary action vector is taken given the state and the user context; the net reward is the difference between the actual benefit and the baseline benefit, wherein s_t denotes the state of the system at time t, a_t denotes the action taken by the agent in state s_t, a_{t+1} denotes the action of the agent at the next moment, and c_t denotes the context variable of the agent at time t;
  S2, constructing a neural network structure, wherein the neural network structure comprises an online policy network, a target policy network, an online Q network, and a target Q network; the online policy network and the target policy network are responsible for generating actions according to states, their input being the graph embedding vector; the networks adopt a multi-branch structure to accommodate the mixed action space, specifically: the graph embedding vector is processed through a shared fully-connected layer to form a shared feature layer, followed by a resource allocation branch outputting a continuous-valued resource allocation amount, a bandwidth allocation branch outputting a continuous-valued bandwidth allocation amount, and a server selection branch outputting a selection logit value for each available server; the online Q network and the target Q network are used for evaluating the value of state-action pairs, their inputs being the graph embedding vector and the action vector; the Q network structure is a three-layer fully-connected neural network whose learning objective is to minimize the temporal-difference error of the net reward;
  S3, carrying out algorithm training, namely first configuring environment parameters and initializing network parameters, then carrying out environment interaction and experience storage, and finally carrying out network updating and loop iteration;
  S4, after the training of S3 is completed, deploying the final policy network into the dynamic mobile edge computing system for real-time task offloading decisions, its input being the graph embedding vector.
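As a reading aid (not part of the claims), the multi-branch policy network of step S2 can be sketched in Python/NumPy roughly as follows. All layer sizes, the embedding dimension, and the server count are illustrative assumptions, and the graph attention encoder is replaced by a random stand-in vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    # He-style random initialization for an illustrative layer
    return rng.normal(0, np.sqrt(2.0 / in_dim), (in_dim, out_dim)), np.zeros(out_dim)

class MultiBranchActor:
    """Maps a graph embedding to a hybrid action: two continuous
    allocations (resource, bandwidth) plus a discrete server choice."""
    def __init__(self, embed_dim=32, hidden=64, n_servers=4):
        self.shared_W, self.shared_b = dense(embed_dim, hidden)   # shared feature layer
        self.res_W, self.res_b = dense(hidden, 1)                 # resource-allocation branch
        self.bw_W, self.bw_b = dense(hidden, 1)                   # bandwidth-allocation branch
        self.srv_W, self.srv_b = dense(hidden, n_servers)         # server-selection logits

    def forward(self, z):
        h = np.maximum(z @ self.shared_W + self.shared_b, 0.0)    # shared ReLU layer
        res = 1.0 / (1.0 + np.exp(-(h @ self.res_W + self.res_b)))  # sigmoid -> (0, 1)
        bw = 1.0 / (1.0 + np.exp(-(h @ self.bw_W + self.bw_b)))
        logits = h @ self.srv_W + self.srv_b
        server = int(np.argmax(logits))                           # discrete server choice
        return float(res[0]), float(bw[0]), server

actor = MultiBranchActor()
z = rng.normal(size=32)          # stand-in for the GAT graph embedding vector
res, bw, server = actor.forward(z)
```

The sigmoid outputs keep the continuous allocations in a normalized (0, 1) range, and the logits realize the "selection logit value per available server" of the claim.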
- 2. The mobile edge computing task offloading optimization method of claim 1, wherein normally distributed exploration noise that decays over time is added to the continuous actions, namely the resource allocation amount and the bandwidth allocation amount, and the noisy action is mapped back into its actual valid range by clipping and scaling, wherein one exploration noise sample corresponds to the resource allocation amount and another to the bandwidth allocation amount, and the variances of the respective normal distributions govern the exploration noise of the resource allocation amount and of the bandwidth allocation amount.
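The decaying exploration noise of claim 2 can be illustrated with a short sketch; the initial noise scale, decay rate, and [0, 1] action range are assumptions, not values from the patent.

```python
import numpy as np

def explored_action(a, sigma0=0.2, decay=0.995, step=0, low=0.0, high=1.0, rng=None):
    """Add time-decaying Gaussian exploration noise to a continuous action
    and clip the result back into its valid range (range is an assumption)."""
    rng = rng or np.random.default_rng()
    sigma = sigma0 * decay ** step        # noise scale decays as training proceeds
    noisy = a + rng.normal(0.0, sigma)    # normally distributed exploration noise
    return float(np.clip(noisy, low, high))

# with zero noise scale the action passes through unchanged
clean = explored_action(0.7, sigma0=0.0)
# with noise, the result is still guaranteed to lie in the valid range
noisy = explored_action(0.5, step=10, rng=np.random.default_rng(1))
```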
- 3. The mobile edge computing task offloading optimization method according to claim 1, wherein the initialization in the algorithm training of step S3 comprises the following steps: (1) network parameter initialization, namely randomly initializing the parameters of the policy networks and the Q networks; (2) experience replay buffer initialization, namely setting a unified experience replay buffer for storing the experience quadruple (s_t, a_t, r_t, s_{t+1}) of the agent's interactions with the environment, wherein s_{t+1} is the next state obtained after the given action is taken; (3) initialization of the counterfactual prediction model and preparation of historical data for its pre-training; (4) hyper-parameter setting, namely determining the learning rate, the soft update rate, the reward discount factor, and the exploration noise parameters; (5) environment setting, comprising configuring mobile edge environment parameters, namely edge server parameters (maximum available computing resources, maximum available bandwidth resources), mobile user attribute parameters (user positions, task demands), and task characteristic parameters (data volume, deadline).
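The unified replay buffer and hyper-parameter setup of claim 3 might look like the following minimal sketch; the buffer capacity and all hyper-parameter values are assumptions for illustration only.

```python
import random
from collections import deque

class ReplayBuffer:
    """Unified experience replay buffer storing (s, a, r, s_next) quadruples;
    the oldest experience is evicted once capacity is reached."""
    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)

# illustrative hyper-parameter setting (values are assumptions, not from the patent)
HYPERPARAMS = dict(lr=1e-3, tau=0.005, gamma=0.99, sigma0=0.2)

buf = ReplayBuffer(capacity=2)
buf.store(1, 2, 3, 4)
buf.store(5, 6, 7, 8)
buf.store(9, 10, 11, 12)   # evicts the oldest quadruple
```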
- 4. The mobile edge computing task offloading optimization method of claim 3, wherein performing environment interaction and experience storage in step S3 comprises the following steps: (1) acquiring current environment information, constructing a heterogeneous graph, and encoding the heterogeneous graph through the graph attention network to obtain the graph embedding vector; (2) action generation, namely generating an action from the graph embedding vector through the policy network and adding exploration noise; (3) reward calculation, namely obtaining the actual benefit as the environmental return and calculating a purified reward using the counterfactual prediction model; (4) experience storage, namely storing the experience in the experience replay buffer.
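The interaction-and-storage loop of claim 4, including the purified (net) reward computed against the counterfactual baseline, can be sketched with stand-in components; every name here (DummyEnv, ListBuffer, the lambda encoder/actor/counterfactual model) is a hypothetical placeholder, not an API from the patent.

```python
def net_reward(actual_return, baseline_return):
    """Purified reward: actual benefit minus the counterfactual baseline."""
    return actual_return - baseline_return

def interact_once(env, encoder, actor, cf_model, buffer):
    """One interaction step: encode the heterogeneous graph, act,
    compute the purified reward, and store the experience quadruple."""
    graph = env.observe()                 # current heterogeneous graph
    z = encoder(graph)                    # graph embedding from the GAT encoder
    action = actor(z)                     # hybrid action (noise omitted here)
    actual = env.step(action)             # actual benefit from the environment
    baseline = cf_model(z)                # counterfactual baseline benefit
    r = net_reward(actual, baseline)
    z_next = encoder(env.observe())       # embedding of the next state
    buffer.store(z, action, r, z_next)
    return r

# --- tiny stand-ins to demonstrate the loop (illustrative only) ---
class DummyEnv:
    def observe(self):
        return "heterogeneous-graph"      # placeholder for the real graph
    def step(self, action):
        return 5.0                        # fixed actual benefit

class ListBuffer(list):
    def store(self, *exp):
        self.append(exp)

buffer = ListBuffer()
r = interact_once(DummyEnv(), encoder=lambda g: 0.0,
                  actor=lambda z: "offload-to-server-1",
                  cf_model=lambda z: 3.0, buffer=buffer)
```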
- 5. A mobile edge computing task offloading optimization device for realizing the mobile edge computing task offloading optimization method according to any one of claims 1-4, characterized by comprising:
  an environment modeling module for constructing a graph-enhanced state space, constructing an action space, and setting a causally de-confounded reward function, specifically: (1) constructing the graph-enhanced state space, namely constructing a heterogeneous graph G = (V, E), wherein V is a node set comprising user nodes, task nodes, and server nodes, and E is an edge set comprising edges representing task generation relationships, edges representing network connection relationships, and edges representing potential processing relationships; a graph attention network is employed as an encoder to encode the graph of the environment and generate a fixed-dimension graph embedding vector that serves as the agent's perception of the state; (2) constructing the action space: an action defines the operational variables of the system in task offloading and resource allocation decisions, including the resource allocation amount for offloading the task, the bandwidth allocation amount occupied during migration, and the selection of an offloading target; the action space is kept as a mixed design of continuous and discrete components; (3) setting the causally de-confounded reward function for eliminating pollution of the reward signal by environmental confounding variables, including defining a causal graph, performing a counterfactual reward calculation, and performing a purified reward calculation, wherein the key variables of the causal graph include the user context C, the offloading action A, and the task result Y, with causal dependencies C → A, C → Y, and A → Y, indicating that C has a direct effect on both A and Y; the counterfactual reward calculation specifically estimates a baseline return through a pre-trained counterfactual prediction model, which predicts the return obtained when an arbitrary action vector is taken given the state and the user context; the net reward is the difference between the actual benefit and the baseline benefit, wherein s_t denotes the state of the system at time t, a_t denotes the action taken by the agent in state s_t, a_{t+1} denotes the action of the agent at the next moment, and c_t denotes the context variable of the agent at time t;
  a neural network structure construction module, wherein the neural network structure comprises an online policy network, a target policy network, an online Q network, and a target Q network; the online policy network and the target policy network are responsible for generating actions according to states, their input being the graph embedding vector; the networks adopt a multi-branch structure to accommodate the mixed action space, specifically: the graph embedding vector is processed through a shared fully-connected layer to form a shared feature layer, followed by a resource allocation branch outputting a continuous-valued resource allocation amount, a bandwidth allocation branch outputting a continuous-valued bandwidth allocation amount, and a server selection branch outputting a selection logit value for each available server; the online Q network and the target Q network are used for evaluating the value of state-action pairs, their inputs being the graph embedding vector and the action vector; the Q network structure is a three-layer fully-connected neural network whose learning objective is to minimize the temporal-difference error of the net reward;
  an algorithm training module for first configuring environment parameters and initializing network parameters, then performing environment interaction and experience storage, and finally performing network updating and loop iteration;
  a model deployment and continuous optimization module for deploying the final policy network, obtained after training by the algorithm training module, into the dynamic mobile edge computing system for real-time task offloading decisions, its input being the graph embedding vector.
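The critic's learning objective (minimizing the temporal-difference error of the net reward) and the soft update of the target networks, both referenced in claims 1 and 5, can be illustrated as follows; the gamma and tau values are assumptions.

```python
def td_target(net_reward, q_next, gamma=0.99):
    """One-step TD target built on the purified (net) reward:
    the critic is trained to regress Q(s, a) toward this value."""
    return net_reward + gamma * q_next

def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging of target-network parameters toward the online
    network (tau is the soft update rate of claim 3; value is assumed)."""
    return [(1 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

# tiny worked example: target = r + gamma * Q'(s', a')
y = td_target(1.0, 2.0, gamma=0.5)
updated = soft_update([0.0], [1.0], tau=0.1)
```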
Description
Mobile edge computing task offloading optimization method and device

Technical Field

The invention relates to the technical field of computer science and information engineering, in particular to a method and a device for optimizing the offloading of mobile edge computing tasks.

Background

In the age of rapid evolution of Internet of Things (IoT) devices and 5G/6G communication technologies, many computation-intensive and delay-sensitive applications are emerging, such as augmented reality (AR), autonomous driving, and smart healthcare. These applications are extremely demanding in terms of computational resources and response delay. However, because mobile terminals have limited computing and storage capabilities, the traditional cloud computing mode suffers from significant network transmission delay and centralization bottlenecks, and struggles to meet real-time and energy-consumption requirements. For this reason, mobile edge computing (MEC) has evolved to provide low-latency local computing and data services to users by deploying computing nodes at the network edge, near the terminals. Although the MEC architecture alleviates the computational load and delay problems to a certain extent, its running environment is highly dynamic and uncertain: mobile terminals frequently cross the coverage areas of different edge servers, so task migration and connection interruption occur frequently; the computing and bandwidth resources of edge nodes change at any time; and multi-user resource sharing gives the system load a strongly time-varying character. These factors make task offloading and resource allocation complicated and dynamic. In recent years, researchers have attempted to solve such problems using deep reinforcement learning (DRL) algorithms.
DRL continuously learns a policy through interaction with the environment and can realize automatic task offloading and resource optimization in a high-dimensional state space. However, as research has deepened, the existing DRL methods have gradually exposed two deep-seated technical bottlenecks that remain unsolved when dealing with the intrinsic complexity of the MEC environment. (1) Deficient structural characterization of the state space. Existing methods typically model the MEC environment state as a stack of independent physical parameters (e.g., server resources, channel bandwidth, user coordinates). This "flattened" vector representation completely ignores the topological connections and complex interactions inherent among users, tasks, and edge servers. Because the agent cannot perceive the structural information of the whole system, the learned policy struggles to understand structural dynamics such as "a change of the associated server caused by user movement" or "multiple users competing for the same resource chain"; as a result, generalization is poor and decisions fail when the environment changes. (2) Confounding bias in the reward signal. Existing DRL methods rely heavily on artificially designed reward functions (e.g., weighted sums of delay, energy consumption, etc.). Such reward signals mix the inherent fluctuations of the environment (confounding variables) with the real causal effects of the decision actions. For example, the success of a task may be mainly due to the user moving into an area with better channel quality, rather than the offloading decision itself being better. The biased reward can mislead the policy into learning spurious statistical associations instead of real causal rules, severely restricting the policy's performance upper bound and learning stability and causing it to fluctuate sharply in dynamic environments.
In order to alleviate the above problems, research attempts have been made to introduce techniques such as imitation learning, but the two fundamental bottlenecks have not been broken through in essence, and new problems such as expert-sample dependence and unstable training have been introduced.

Disclosure of Invention

Aiming at the two fundamental problems faced by task offloading decisions in a mobile edge computing (MEC) environment, namely shallow state perception and biased reward signals, the invention provides a mobile edge computing task offloading optimization method, which comprises the following steps: S1, performing environment modeling, namely constructing a graph-enhanced state space, constructing an action space, and setting a causally de-confounded reward function, specifically comprising: (1) constructing the graph-enhanced state space, namely constructing a heterogeneous graph G = (V, E), wherein V is a node set comprising user nodes, task nodes, and server nodes, and E is an edge set comprising edges representing task generation relationships, edges representing network connection relationships, and edges representing potential processing relationships; employing a graph atte