CN-121981207-A - Unmanned aerial vehicle assisted asynchronous federated learning online scheduling method and system based on convergence awareness

Abstract

The invention relates to the field of mobile edge computing and discloses an unmanned aerial vehicle assisted asynchronous federated learning online scheduling method and system based on convergence awareness. The method comprises: establishing a system model and defining model staleness; constructing, from theoretical derivation, a convergence cost penalty function that captures the effects of data heterogeneity and staleness; formulating a joint optimization problem that minimizes the long-term convergence cost subject to energy budget and bandwidth constraints; using Lyapunov optimization to construct virtual queues and convert the long-term problem into a deterministic optimization sub-problem for each time slot; and solving for the flight position and the client selection strategy online with a deep reinforcement learning algorithm based on an encoder-decoder structure. The system comprises the modules and computing devices configured to perform these method steps. The invention can effectively mitigate the model deviation caused by asynchronous updates and significantly improve the convergence speed and accuracy of the model in an energy-limited environment.

Inventors

LU JIANFENG
LI LEI
WANG QIANXUN

Assignees

Wuhan University of Science and Technology (武汉科技大学)

Dates

Publication Date: 20260505
Application Date: 20260126

Claims (10)

1. An unmanned aerial vehicle assisted asynchronous federated learning online scheduling method based on convergence awareness, characterized by comprising the following steps:
S1, establishing an unmanned aerial vehicle assisted asynchronous federated learning system model, which comprises an unmanned aerial vehicle serving as a mobile parameter server and a plurality of heterogeneous Internet of Things clients distributed on the ground;
S2, establishing a communication model between the unmanned aerial vehicle and the clients and an energy consumption model of the unmanned aerial vehicle, and defining model staleness based on the asynchronous federated learning process;
S3, analyzing the convergence of asynchronous federated learning theoretically, deriving a convergence upper bound, and constructing from that bound a convergence cost penalty function that captures the influence of data heterogeneity and model staleness;
S4, establishing a joint optimization problem that minimizes the long-term convergence cost subject to the unmanned aerial vehicle energy budget, communication bandwidth and maximum staleness constraints;
S5, converting the joint optimization problem into a deterministic optimization sub-problem for each time slot by means of Lyapunov optimization, and constructing corresponding virtual queues to handle the long-term constraints;
S6, solving the single-time-slot optimization sub-problem online with a deep reinforcement learning algorithm based on an encoder-decoder structure, and outputting a target flight position and a client selection strategy for the unmanned aerial vehicle, the unmanned aerial vehicle flying to the target position and receiving the model gradients uploaded by the selected clients so as to complete asynchronous model aggregation and updating.
2. The unmanned aerial vehicle assisted asynchronous federated learning online scheduling method based on convergence awareness of claim 1, wherein in S1, constructing the unmanned aerial vehicle assisted asynchronous federated learning system model specifically comprises: the system comprises an unmanned aerial vehicle serving as a mobile parameter server and a set of multiple ground clients; each client holds a local data set whose data distribution differs from the global data distribution, the difference being quantified by the KL divergence as a data heterogeneity indicator; the training goal of the federated learning is to find the optimal global model that minimizes the global loss function; and the global loss function is defined as a weighted average of the local loss functions of all clients, in which each client's local loss function is weighted by that client's data-amount weight.
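The objective described in claim 2 can be written compactly. The notation below is assumed for illustration and is not necessarily the patent's own: $\mathcal{K}$ is the client set, $\mathcal{D}_k$ the local data set of client $k$, $P_k$ and $P$ the local and global data distributions, $\delta_k$ the data heterogeneity indicator, and $p_k$ the data-amount weight, taken here to be the client's share of the total data (a common choice, assumed rather than stated in the claim).

```latex
\min_{w} \; F(w) = \sum_{k \in \mathcal{K}} p_k \, F_k(w),
\qquad
p_k = \frac{|\mathcal{D}_k|}{\sum_{j \in \mathcal{K}} |\mathcal{D}_j|},
\qquad
\delta_k = D_{\mathrm{KL}}\!\left(P_k \,\|\, P\right)
```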
3. The unmanned aerial vehicle assisted asynchronous federated learning online scheduling method based on convergence awareness of claim 1, wherein in S2, establishing the communication model between the unmanned aerial vehicle and the clients and defining model staleness specifically comprises: dividing the operation of the system into a series of discrete time slots; in any time slot, the unmanned aerial vehicle selects a subset of the ready clients to participate in model aggregation; owing to the asynchronous nature of the system, the gradient uploaded by a selected client is calculated on a global model from a historical moment, and the model staleness is defined as the difference between the current time slot and the time slot in which the client last participated in aggregation; establishing a line-of-sight channel model between the unmanned aerial vehicle and the clients, in which the channel gain between the unmanned aerial vehicle and a client is expressed in terms of the flight altitude of the unmanned aerial vehicle, its horizontal position in the time slot, and the position of the client; calculating the achievable transmission rate of the client according to Shannon's formula, and imposing the constraint that each client must complete its gradient upload within a fixed communication duration, so as to determine the minimum bandwidth required by each client; and establishing an energy consumption model of the unmanned aerial vehicle, wherein the total energy consumption of the unmanned aerial vehicle in a time slot consists of propulsion energy consumption, communication energy consumption and computation energy consumption, the propulsion energy consumption being related to the flying speed of the unmanned aerial vehicle and the communication energy consumption to the data transmission duration, and the total energy consumption of the unmanned aerial vehicle over the whole task period is constrained to be less than or equal to the maximum energy budget.
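As a minimal sketch of the communication model in claim 3, the following Python code computes a line-of-sight channel gain, a Shannon rate, and the smallest bandwidth that lets a client finish its upload within a fixed duration. The free-space form of the gain (`beta0` divided by squared distance), the parameter names and all default values are assumptions for illustration; the patent's exact expressions are not reproduced here.

```python
import math

def los_channel_gain(uav_xy, client_xy, altitude_m, beta0=1e-4):
    """Assumed free-space LoS gain: beta0 / (squared 3-D distance).

    beta0 is a hypothetical reference gain at 1 m.
    """
    dx = uav_xy[0] - client_xy[0]
    dy = uav_xy[1] - client_xy[1]
    return beta0 / (altitude_m ** 2 + dx ** 2 + dy ** 2)

def shannon_rate(bandwidth_hz, gain, tx_power_w=0.1, noise_psd=1e-20):
    """Achievable rate in bit/s: B * log2(1 + P*h / (N0*B))."""
    snr = tx_power_w * gain / (noise_psd * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)

def min_bandwidth(gradient_bits, deadline_s, gain, b_max_hz=20e6, tol_hz=1.0):
    """Smallest bandwidth (Hz) that uploads gradient_bits within deadline_s.

    The Shannon rate increases with bandwidth, so bisection applies.
    Returns None when even b_max_hz is not enough.
    """
    target = gradient_bits / deadline_s
    if shannon_rate(b_max_hz, gain) < target:
        return None
    lo, hi = tol_hz, b_max_hz
    if shannon_rate(lo, gain) >= target:
        return lo
    while hi - lo > tol_hz:
        mid = 0.5 * (lo + hi)
        if shannon_rate(mid, gain) >= target:
            hi = mid  # mid is sufficient, search below it
        else:
            lo = mid  # mid is insufficient, search above it
    return hi
```

A client farther from the unmanned aerial vehicle sees a lower gain and therefore needs more bandwidth for the same upload, which is why the flight position and the client selection are coupled through the bandwidth constraint.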
4. The unmanned aerial vehicle assisted asynchronous federated learning online scheduling method based on convergence awareness of claim 1, wherein in S3, constructing the convergence cost penalty function specifically comprises: performing error decomposition and bounding on the global loss function of asynchronous federated learning, deriving an upper bound on the expected convergence, and extracting the terms of the upper bound that vary with each round's scheduling decision as the convergence cost penalty function, which is defined in terms of the learning rate of the global model update, an instant cost term and a historical accumulated cost term; the instant cost term quantifies the impact of the quality of the client set selected in the current time slot on model convergence, its value being positively correlated with the weighted sum of the data heterogeneity indicators of the selected clients and representing the convergence deviation caused by differences in data distribution; and the historical accumulated cost term quantifies the lagged influence of historical decisions on the convergence of the current model and is related to the model staleness accumulated by unselected clients over a number of past time slots.
5. The unmanned aerial vehicle assisted asynchronous federated learning online scheduling method based on convergence awareness of claim 1, wherein in S4 and S5, handling the long-term constraints and establishing the virtual queues by means of Lyapunov optimization specifically comprises: establishing a joint optimization problem that minimizes the long-term time average of the convergence cost penalty function over the whole task period, the problem being subject to a total energy budget constraint of the unmanned aerial vehicle over the full period and a maximum staleness constraint for the clients; to handle the energy budget constraint, constructing an energy virtual queue defined in terms of the average single-slot energy budget, which is calculated from the total energy budget and the total number of time slots; and to handle the maximum staleness constraint of the clients, constructing a client staleness virtual queue defined in terms of a preset maximum staleness threshold and a binary variable indicating whether the client is selected in the current time slot, the variable being 1 if the client is selected and 0 otherwise.
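The virtual queues of claim 5 can be sketched as follows. The energy queue uses the standard Lyapunov form (backlog grows when a slot spends more than the average budget). The staleness queue update shown here is only one plausible form built from the two quantities the claim names, the maximum staleness threshold and the binary selection variable; it is an assumption, not the patent's formula, and all function names are hypothetical.

```python
def average_energy_budget(total_energy, num_slots):
    """Average single-slot energy budget from the total budget and slot count."""
    return total_energy / num_slots

def update_energy_queue(q, energy_used, avg_budget):
    """Standard virtual queue: Q(t+1) = max(Q(t) + E(t) - E_avg, 0)."""
    return max(q + energy_used - avg_budget, 0.0)

def update_staleness_queues(z, selected, tau_max):
    """Assumed form: Z_k(t+1) = max(Z_k(t) + 1 - tau_max * a_k(t), 0).

    The backlog of client k grows by one for every slot in which it is
    skipped (a_k = 0) and is drained by tau_max when it is selected (a_k = 1).
    """
    return [max(z_k + 1.0 - tau_max * a_k, 0.0) for z_k, a_k in zip(z, selected)]
```

Keeping both queues stable over time is what turns the long-term energy and staleness constraints into quantities that can be checked slot by slot.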
6. The unmanned aerial vehicle assisted asynchronous federated learning online scheduling method based on convergence awareness of claim 5, wherein in S5, the joint optimization problem is converted into a deterministic optimization sub-problem for each time slot by minimizing an upper bound of the drift-plus-penalty expression, and the objective function of the deterministic optimization sub-problem combines a non-negative control parameter, used to balance convergence performance against constraint satisfaction, with the convergence cost penalty function of claim 4; the objective function consists of three weighted terms: a first term that minimizes the convergence cost of the current time slot, a second term that penalizes excessive energy consumption when the energy virtual queue is backlogged, and a third term that rewards selecting the corresponding client when its staleness virtual queue is backlogged, the third term involving the maximum staleness threshold.
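The three-term structure of claim 6 can be illustrated with a small drift-plus-penalty sketch. The exact weights are an assumption: the energy term multiplies the queue backlog by the slot's excess over the average budget, and the staleness term scales the backlog of each selected client by the maximum staleness threshold. The function and parameter names are hypothetical.

```python
def slot_objective(conv_cost, energy_used, avg_budget, q_energy,
                   z_staleness, selected, v, tau_max):
    """Assumed per-slot objective (to be minimized).

    v * conv_cost                        : convergence cost of this slot
    q_energy * (energy_used - avg_budget): penalty for overspending while
                                           the energy queue is backlogged
    -tau_max * sum(Z_k * a_k)            : reward for serving clients whose
                                           staleness queue is backlogged
    """
    penalty = v * conv_cost
    energy_term = q_energy * (energy_used - avg_budget)
    staleness_term = tau_max * sum(z * a for z, a in zip(z_staleness, selected))
    return penalty + energy_term - staleness_term

# Two candidate decisions with equal cost and energy: picking the client
# with the larger staleness backlog gives the lower objective.
stale_first = slot_objective(0.5, 12.0, 10.0, 2.0, [1.0, 0.0, 3.0], [0, 0, 1], 10.0, 4)
fresh_first = slot_objective(0.5, 12.0, 10.0, 2.0, [1.0, 0.0, 3.0], [1, 0, 0], 10.0, 4)
```

A larger control parameter `v` favors convergence cost over queue stability, which is the trade-off the claim attributes to the non-negative control parameter.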
7. The unmanned aerial vehicle assisted asynchronous federated learning online scheduling method based on convergence awareness of claim 1, wherein in S6, the deep reinforcement learning algorithm adopts an encoder-decoder neural network architecture, and the construction of the encoder part comprises the following steps:
A1, input preprocessing: dividing the system state of the current time slot into global state information and individual state information of each client, wherein the global state information comprises the unmanned aerial vehicle energy virtual queue state, the historical convergence cost term reflecting the cumulative impact of historical decisions, and the total communication bandwidth constraint of the system, and the individual state information comprises the position encoding of the client relative to the unmanned aerial vehicle, the data heterogeneity indicator quantifying the client's local data distribution, the local data set size ratio, the client staleness virtual queue state, and the client's current computation cooldown state or remaining computation time;
A2, feature extraction: constructing independent feature extraction networks that map the global state information and the individual state information of each client into initial feature vectors;
A3, context aggregation: processing the initial feature vectors of all clients jointly with a self-attention module and calculating the attention weights among the clients, so as to capture the interdependence of the clients in geographic distribution and training state;
A4, output generation: outputting a set of client feature embeddings containing global context information, together with an aggregated graph embedding vector, as the input of the decoder.
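Steps A3 and A4 can be sketched with a single-head scaled dot-product self-attention layer over the client feature vectors. This is a simplified illustration in plain Python: the projection matrices are random stand-ins for trained weights, and taking the mean of the client embeddings as the graph embedding vector is an assumption, not something the claim specifies.

```python
import math
import random

def linear(rows, weight):
    """Multiply each row vector by a weight matrix given as a list of rows."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*weight)]
            for row in rows]

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def encode(client_feats, wq, wk, wv):
    """Return (client embeddings, graph embedding, attention weights)."""
    q = linear(client_feats, wq)
    k = linear(client_feats, wk)
    v = linear(client_feats, wv)
    scale = math.sqrt(len(q[0]))
    # A3: attention weight of client i on client j (each row sums to 1).
    attn = [softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
            for qi in q]
    # Each embedding is an attention-weighted mix of all clients' values.
    embeddings = [[sum(w * vj[d] for w, vj in zip(row, v))
                   for d in range(len(v[0]))] for row in attn]
    # A4: graph embedding as the mean of the client embeddings (assumed).
    graph = [sum(col) / len(embeddings) for col in zip(*embeddings)]
    return embeddings, graph, attn

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained projection weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
```

Because every client attends to every other client, each embedding carries information about the whole client population, which is the "global context" the decoder consumes.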
8. The unmanned aerial vehicle assisted asynchronous federated learning online scheduling method based on convergence awareness of claim 7, wherein the construction of the decoder part and the action generation strategy comprise:
B1, dual-head structure construction: the decoder comprises a displacement decision head and a client selection decision head;
B2, displacement decision generation: the displacement decision head calculates, from the graph embedding vector, a probability distribution over the discretized action space of the unmanned aerial vehicle and samples from it to generate the displacement decision of the unmanned aerial vehicle, the displacement decision being clipped to the geographic boundary so as to satisfy the flight area constraint;
B3, client sequence generation: the client selection decision head adopts an autoregressive structure based on a pointer network, with a recurrent neural network or gated recurrent unit at its core, and sequentially generates the sequence of clients participating in aggregation;
B4, feasibility constraint handling: at each decoding step of the client selection decision head, a dynamic mask mechanism forces the selection probability of infeasible clients to zero, wherein the infeasible clients comprise clients in the computation cooldown state, clients already selected in the current time slot, and clients whose addition would cause the total bandwidth requirement to exceed the remaining bandwidth resources of the system;
B5, sequence termination: the action space of the client selection decision head comprises a special end marker, and the client selection process of the current time slot stops when this marker is sampled.
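The dynamic mask of step B4 and the end marker of step B5 can be sketched as a sampling loop. For brevity the logits are fixed per client; in a pointer-network decoder they would be recomputed by the recurrent core at every step. The function names and the fixed-logit simplification are assumptions for illustration.

```python
import math
import random

def feasible(k, cooling, chosen, need, remaining_bw):
    """A client is feasible if not cooling, not yet chosen, and affordable."""
    return (not cooling[k]) and (k not in chosen) and (need[k] <= remaining_bw)

def masked_probs(logits, mask):
    """Softmax in which masked-out entries get probability exactly zero."""
    m = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def select_clients(client_logits, end_logit, cooling, need, total_bw, rng):
    """Sample clients one at a time until the end marker is drawn."""
    n = len(client_logits)
    chosen, remaining = [], total_bw
    while True:
        # The end marker (index n) is always feasible, so the loop terminates.
        mask = [feasible(k, cooling, chosen, need, remaining) for k in range(n)]
        mask.append(True)
        probs = masked_probs(client_logits + [end_logit], mask)
        pick = rng.choices(range(n + 1), weights=probs)[0]
        if pick == n:
            return chosen
        chosen.append(pick)
        remaining -= need[pick]
```

Masking before normalization, rather than rejecting infeasible samples afterwards, keeps every sampled sequence valid and keeps its log-probability well defined for policy-gradient training.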
9. The unmanned aerial vehicle assisted asynchronous federated learning online scheduling method based on convergence awareness of claim 7, wherein in S6, the construction of the deep reinforcement learning reward function and the training of the network parameters comprise:
C1, reward function design: defining the instant reward of the current time slot as the negative of the objective function of the deterministic optimization sub-problem, so as to guide the agent to minimize the long-term convergence cost and satisfy the constraints by maximizing the cumulative expected return;
C2, advantage function estimation: introducing a baseline function to reduce the variance of the policy gradient, and calculating the advantage function as the difference between the discounted cumulative reward starting from the current time slot and the baseline;
C3, loss function construction: the total loss function consists of a policy gradient loss, a value estimation loss and an entropy regularization term, the entropy regularization term being used to encourage exploration by the policy and to prevent the algorithm from converging prematurely to a suboptimal policy;
C4, parameter updating: adopting the REINFORCE algorithm or another policy-gradient algorithm, and iteratively updating the network parameters of the encoder and the decoder by gradient descent.
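Steps C2 and C3 can be sketched numerically as below. The loss coefficients `c_value` and `c_entropy` and the averaging over time steps are assumed conventions, not values from the patent; in actual training the advantage would be treated as a constant when differentiating the policy term.

```python
def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

def advantages(returns, baselines):
    """A_t = G_t - b_t, where b_t is the baseline (value) estimate."""
    return [g - b for g, b in zip(returns, baselines)]

def total_loss(log_probs, returns, baselines, entropies,
               c_value=0.5, c_entropy=0.01):
    """Policy-gradient loss + value loss - entropy bonus (assumed weights)."""
    n = len(log_probs)
    advs = advantages(returns, baselines)
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, advs)) / n
    value_loss = sum((g - b) ** 2 for g, b in zip(returns, baselines)) / n
    entropy = sum(entropies) / n
    return policy_loss + c_value * value_loss - c_entropy * entropy
```

Subtracting the entropy term means that minimizing the loss pushes entropy up, which is how the regularizer discourages premature convergence to a suboptimal policy.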
10. An unmanned aerial vehicle assisted asynchronous federated learning system based on convergence awareness, characterized by comprising an unmanned aerial vehicle, a plurality of heterogeneous ground clients, and a computer program stored in a memory and executed by a processor, wherein the unmanned aerial vehicle serves as a mobile parameter server and is provided with a flight control module and a wireless communication module; the heterogeneous ground clients hold local data sets and are provided with a model training module; the processor executes the computer program to implement the steps of the method of any one of claims 1 to 9; and the system executes an asynchronous federated learning process in which, in each time slot, the unmanned aerial vehicle plans its flight path and selects clients based on a deep reinforcement learning strategy, the selected clients calculate gradients from their local data and a historical global model and upload them, and the unmanned aerial vehicle asynchronously aggregates the received gradients and updates the global model.

Description

Unmanned aerial vehicle assisted asynchronous federated learning online scheduling method and system based on convergence awareness

Technical Field

The invention relates to the intersection of mobile edge computing, wireless communication and distributed machine learning, and in particular to an unmanned aerial vehicle assisted asynchronous federated learning online scheduling method based on convergence awareness.

Background

Federated learning (FL) has been widely used in Internet of Things scenarios as a paradigm that enables collaborative training on distributed data while protecting data privacy. However, in remote or post-disaster areas where terrestrial communication infrastructure is scarce, conventional terrestrial base stations have difficulty meeting coverage requirements. Introducing an unmanned aerial vehicle (UAV) as a mobile parameter server, and using its high mobility and line-of-sight (LoS) communication advantages to assist federated learning, has become an effective way to address this problem. In practice, however, the computing and communication capabilities of Internet of Things devices are heterogeneous, so conventional synchronous federated learning suffers from a severe "straggler effect": the UAV hovers idly in the air while waiting for slow devices, which drains its limited onboard energy and shortens the mission lifetime. To improve time and energy efficiency, asynchronous federated learning (AFL) was introduced into UAV networks, allowing devices to upload updates at any time without global synchronization. However, asynchronous mechanisms inevitably introduce the problem of "model staleness": the gradient that a client computes on a historical global model deviates from the current global model. When staleness is coupled with the inherent data heterogeneity (non-IID data) of Internet of Things devices, the convergence stability of the model can be seriously damaged and the accuracy of the final model reduced. Existing studies, while solving the communication efficiency problem to some extent, tend to ignore the combined negative impact of staleness and data heterogeneity on model convergence.

Disclosure of Invention

In view of the above, the invention aims to provide an unmanned aerial vehicle assisted asynchronous federated learning online scheduling method and system based on convergence awareness, so as to solve the problem that the coupling of model staleness and data heterogeneity in asynchronous updating makes model convergence difficult, and to overcome the defect that conventional scheduling schemes struggle to achieve long-term optimal performance in an energy-limited dynamic environment.

To achieve the above purpose, the invention provides an unmanned aerial vehicle assisted asynchronous federated learning online scheduling method based on convergence awareness, which comprises the following steps:
S1, establishing an unmanned aerial vehicle assisted asynchronous federated learning system model, which comprises an unmanned aerial vehicle serving as a mobile parameter server and a plurality of heterogeneous Internet of Things clients distributed on the ground;
S2, establishing a communication model between the unmanned aerial vehicle and the clients and an energy consumption model of the unmanned aerial vehicle, and defining model staleness based on the asynchronous federated learning process;
S3, analyzing the convergence of asynchronous federated learning theoretically, deriving a convergence upper bound, and constructing from that bound a convergence cost penalty function that captures the influence of data heterogeneity and model staleness;
S4, establishing a joint optimization problem that minimizes the long-term convergence cost subject to the unmanned aerial vehicle energy budget, communication bandwidth and maximum staleness constraints;
S5, converting the joint optimization problem into a deterministic optimization sub-problem for each time slot by means of Lyapunov optimization, and constructing corresponding virtual queues to handle the long-term constraints;
S6, solving the single-time-slot optimization sub-problem online with a deep reinforcement learning algorithm based on an encoder-decoder structure, and outputting a target flight position and a client selection strategy for the unmanned aerial vehicle, the unmanned aerial vehicle flying to the target position and receiving the model gradients uploaded by the selected clients so as to complete asynchronous model aggregation and updating.

Further, in step S1, constructing the unmanned aerial vehicle assisted asynchronous federated learning system model, which