
CN-121985343-A - Differential privacy client selection policy based on priority in vehicular edge computing

CN-121985343-A

Abstract

The invention discloses an Internet of Vehicles client selection and resource allocation strategy based on deep reinforcement learning combined with a federated learning algorithm. First, a three-tier collaborative architecture consisting of vehicles, edge servers, and cloud servers is built. Second, an FL-MADDPG algorithm, combining deep reinforcement learning with federated learning, is proposed for Internet of Vehicles client selection and resource allocation. The client selection and resource allocation policy is then derived from the FL-MADDPG algorithm. Finally, when a client is selected to participate in federated aggregation, personalized differential privacy noise is added to the transmitted model parameters.

Inventors

  • Chen Yishan
  • Cheng Decan
  • Teng Mengfan
  • Xie Runshan
  • Cheng Guanjie

Assignees

  • 江西理工大学

Dates

Publication Date
2026-05-05
Application Date
2025-11-17

Claims (6)

  1. An Internet of Vehicles client selection and resource allocation strategy based on deep reinforcement learning combined with a federated learning algorithm, characterized by comprising the following steps: 1) constructing a three-layer collaborative architecture consisting of vehicles, edge servers and cloud servers; 2) proposing an FL-MADDPG algorithm, combining deep reinforcement learning with federated learning, for Internet of Vehicles client selection and resource allocation; 3) deriving the client selection and resource allocation strategy; 4) adding personalized differential privacy noise to the transmitted model parameters when a client is selected to participate in federated aggregation.
  2. The Internet of Vehicles client selection and resource allocation strategy based on deep reinforcement learning combined with the federated learning algorithm according to claim 1, wherein the heterogeneous computing architecture consisting of vehicles, edge servers and cloud servers described in step 1) includes three parts: 1) the vehicle layer, used for generating computing tasks and locally training the model; 2) the edge server layer, used for providing computing and communication resources for local model aggregation; 3) the cloud server layer, used for coordinating global model aggregation, verifying privacy budgets and synchronizing the federated learning process among the multiple edge servers.
  3. The Internet of Vehicles client selection and resource allocation strategy based on deep reinforcement learning combined with the federated learning algorithm according to claim 2, wherein the FL-MADDPG algorithm for the Internet of Vehicles client selection and resource allocation policy in step 2) is as follows: the algorithm is a multi-agent deep reinforcement learning algorithm based on an Actor-Critic architecture that adopts a "centralized training, decentralized execution" paradigm: during training, all agents share global state information to optimize their policies, while each agent relies only on its local observation when executing decisions, so that the complex interactions and resource competition among multiple agents can be handled effectively; the algorithm is used to solve the dynamic client selection problem in the Internet of Vehicles environment. The proposed FL-MADDPG algorithm first models the joint optimization problem as a Markov Decision Process (MDP): the state space includes the vehicle task queues and the computing resources available at the vehicles and edge servers; the action space jointly determines, within the same decision time slot, the client selection indicator variable, the communication sub-channel allocation variable and the computing resource allocation proportion; the objective is to optimize the task processing time and energy consumption and to maximize the task completion rate. Taking the state space as input, the algorithm outputs a joint strategy consisting of a resource allocation strategy for each vehicle and a multi-client selection strategy deciding whether each vehicle participates in federated aggregation.
  4. The Internet of Vehicles client selection and resource allocation strategy according to claim 3, wherein the objective function of the client selection and resource allocation strategy based on deep reinforcement learning combined with the federated learning algorithm in step 3) is:
     ∑_{n∈N} F_{n,t} / (λ_n T_{n,t} + (1 − λ_n) E_{n,t}) + φ A_t   (1)
     wherein: 1) the first part is the total system cost of the vehicles, composed of the number of completed tasks F_{n,t} weighted against the time and energy consumption; λ_n T_{n,t} + (1 − λ_n) E_{n,t} represents the weighted sum of the time and energy consumption of vehicle n, and λ_n is the weight the vehicle assigns to time versus energy consumption; in the second part, A_t is the model accuracy of federated learning and φ is a scaling factor; within the system cost, the values of delay and energy consumption are determined by the client selection strategy; 2) when vehicle n is not selected to participate in federated learning, only the local processing time T_n^loc needs to be considered; if vehicle n is selected to participate in federated aggregation, the time for parameter upload and download and the time taken by aggregation must also be considered: T_{n,e}^up represents the time for transmitting the model parameters to edge server e, T_{n,e'}^up the time for transferring the model parameters through edge server e to the adjacent edge server e', and T_{e,c}^up the time for uploading the model parameters to the cloud. The times are calculated as follows:
     T_n^loc = V_n / f_n,  T_{n,e}^up = D_n / R_{n,e}^up,  T_{n,e'}^up = D_n / R_{n,e'}^up,  T_{e,c}^up = D_n / R_{e,c}^up,  T_{e,n}^down = D_n / R_{e,n}^down
     where V_n represents the computing resources required to aggregate the model parameters, f_n the computing resources owned by the vehicle, D_n the size of the model parameters, R_{n,e}^up, R_{n,e'}^up and R_{e,c}^up the uplink transmission rates from the vehicle to edge server e, from the vehicle to the adjacent edge server e' and from edge server e to the cloud server, respectively, and R_{e,n}^down the downlink transmission rate from the edge server to the vehicle. The time taken by aggregation once the model parameters reach a server must also be considered; the aggregation times spent at edge servers e and e', T_e^agg and T_{e'}^agg, are given by:
     T_e^agg = V_n / (α_e f_e),  T_{e'}^agg = V_n / (α_{e'} f_{e'})
     where f_e and f_{e'} represent the computing resources owned by edge server e and the adjacent edge server e', respectively, and α_e and α_{e'} the proportions of computing resources they allocate to this aggregation. In this process, not only the time spent but also the corresponding energy consumption must be considered. The energy consumption mainly originates from the transmission process of the wireless communication module, and its magnitude is determined by the power of each link together with the corresponding transmission time; consistent with the time model, the energy consumption also depends on whether the vehicle is selected and on the aggregation path adopted. Specifically, when vehicle n is not selected to participate in aggregation, only the local computation energy E_n^loc needs to be considered; when vehicle n is selected, the energy spent on transmission is also counted: E_{n,e}^up, E_{n,e'}^up and E_{e,c}^up represent the energy spent transmitting to the edge server, to the adjacent edge server and to the cloud, respectively. The energy consumption is calculated as follows:
     E_n^loc = ζ_n f_n² V_n,  E_{n,e}^up = p_{n,e}^up T_{n,e}^up,  E_{n,e'}^up = p_{n,e'}^up T_{n,e'}^up,  E_{e,c}^up = p_{e,c}^up T_{e,c}^up
     where ζ_n is determined by the CPU architecture of the vehicle's chip, and p_{n,e}^up, p_{n,e'}^up and p_{e,c}^up represent the transmission power from the vehicle to the edge server, from the vehicle to the adjacent edge server, and from the edge server to the cloud server, respectively.
  5. The Internet of Vehicles client selection and resource allocation strategy based on deep reinforcement learning combined with the federated learning algorithm according to claim 4, wherein, when a client is selected to participate in federated aggregation, the personalized differential privacy noise added to the transmitted model parameters in step 4) is as follows: 1) in the standard ε-differential privacy (ε-DP) definition, a random mechanism M satisfies ε-DP if, for adjacent datasets D and D' differing in at most one element and any subset S of outputs,
     Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S]
     On this basis, a personalized differential privacy mechanism is introduced, whose core is the privacy configuration ξ, i.e. a mapping
     ξ: U → R⁺   (13)
     which maps each user i to a personalized privacy preference ξ_i and replaces the uniform privacy configuration ε with ξ_i. This is defined as ξ-personalized differential privacy (ξ-PDP), which requires that the random mechanism M satisfies, for any records d and d' of user i,
     Pr[M(d) ∈ S] ≤ e^{ξ_i} · Pr[M(d') ∈ S]
     To achieve privacy protection, the Laplace mechanism is introduced to inject noise. The mechanism requires the ℓ1-sensitivity Δf of the query function f, which measures the maximum impact of a single data record change on the query result and is defined as
     Δf = max_{D,D'} ‖f(D) − f(D')‖₁
     Privacy protection is achieved by adding, to the query result f(D), noise whose scale is determined by the sensitivity Δf and the personalized privacy configuration ξ_i; the added noise is expressed as
     M(D) = f(D) + Lap(Δf / ξ_i)
     2) According to the method, the clients designated by the client selection strategy participate in federated aggregation; each obtains its privacy protection level from its task queue, adds local differential privacy noise to its own model parameters, and transmits its encrypted privacy protection level. After receiving the model parameters of all selected clients, the server first decrypts the clients' local privacy levels, obtains the corresponding local privacy budgets ξ through a mapping table, and judges whether the current global aggregation result satisfies the preset global differential privacy threshold ε_max; if not, global calibration noise is re-injected into the aggregated model so that it meets the differential privacy requirement.
  6. The Internet of Vehicles client selection and resource allocation strategy based on deep reinforcement learning combined with the federated learning algorithm according to any one of claims 1-5, wherein the process of client selection based on the strategy is as follows: 1) federated learning: each selected client vehicle downloads the current global model W_t and performs multiple rounds of training on its local dataset; after training is completed, the client adopts the personalized differential privacy mechanism described in claim 5 and, according to its own privacy configuration ξ_i, adds noise to the model parameter update to obtain the perturbed model parameters, which are uploaded together with the encrypted privacy protection level; the cloud server executes the final global aggregation and at this stage checks whether the preset privacy threshold ε_max is satisfied; if not, global calibration noise is injected, and finally the model accuracy A_t is obtained; 2) deep reinforcement learning: from the vehicle task queue information and the condition of each communication link, a reward given by the environment is obtained by integrating the time spent, the energy consumption and the satisfaction of the constraint conditions; the obtained global model accuracy is added to this reward to obtain the reward value r_t, which is stored together with the current state s_t, the executed action a_t and the new state s_{t+1} in the experience replay buffer as an experience tuple (s_t, a_t, r_t, s_{t+1}); 3) steps 1) and 2) are repeated until a client selection and resource allocation strategy is obtained that balances time, minimizes energy consumption, maximizes the task completion rate and maximizes the accuracy after clients participate in aggregation.
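The personalized Laplace mechanism of claims 5 and 6 can be illustrated with a minimal sketch. This is not the patented implementation: the function names (`laplace_noise`, `perturb_parameters`) and the flat-list representation of the model parameters are assumptions; only the noise form Lap(Δf / ξ_i) with a per-client budget ξ_i follows the claim.

```python
import random


def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. Exp(1) variables follows Laplace(0, 1);
    # multiplying by `scale` gives a Laplace(0, scale) sample.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))


def perturb_parameters(params: list[float], sensitivity: float, xi: float) -> list[float]:
    """Add personalized Laplace noise Lap(sensitivity / xi) to each parameter.

    A smaller privacy budget xi (a stronger privacy preference) gives a
    larger noise scale, matching the xi-PDP definition in claim 5.
    """
    scale = sensitivity / xi
    return [p + laplace_noise(scale) for p in params]
```

With a very large budget ξ_i the noise scale approaches zero and the parameters pass through almost unchanged; a small ξ_i buries them in noise, which is how a client's privacy preference trades utility for protection.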

Description

Differential privacy client selection policy based on priority in vehicular edge computing

Technical Field

The invention relates to the problem of client selection and resource allocation strategies in Internet of Vehicles scenarios, in particular to an Internet of Vehicles client selection and resource allocation strategy based on deep reinforcement learning combined with a federated learning algorithm.

Background

With the evolution of 5G/6G communication technology, the Internet of Vehicles (IoV) has become the core of future intelligent transportation systems. The key applications it supports, such as automatic driving and high-precision map updating, place extremely high requirements on the real-time performance and reliability of data processing. However, the computing power and storage capability of on-board units are limited, and the delay and bandwidth pressure caused by uploading massive data to a remote cloud center make the traditional cloud computing architecture inadequate. Vehicular Edge Computing (VEC) has therefore emerged: by sinking computing resources to roadside units or base stations, the data processing path is significantly shortened, delay is effectively reduced and the backbone network load is relieved, making VEC a key technology for meeting the strict QoS requirements of IoV. However, while improving efficiency, VEC brings new challenges: vehicles continuously generate sensitive data such as position, trajectory and behavior, and if these data are uploaded directly and processed centrally, the network burden increases and privacy leakage is more easily caused by security vulnerabilities in data aggregation or transmission links. Against the background of increasingly strict global data regulation, guaranteeing user privacy while releasing the value of the data has become a core bottleneck restricting large-scale IoV deployment.
Under such a background, it is particularly urgent and necessary to cooperatively optimize client selection, resource allocation and the privacy protection mechanism in a dynamic, heterogeneous Internet of Vehicles environment, and to achieve a multi-objective balance of low delay, high energy efficiency and fine-grained privacy security. This concerns not only the practical deployment and performance of federated learning in edge computing scenarios, but is also of key practical significance for building a safe, reliable, efficient and intelligent next-generation transportation system.

Disclosure of the Invention

Aiming at client selection and resource allocation in the Internet of Vehicles edge network environment, the invention provides a client selection and resource allocation strategy based on deep reinforcement learning combined with a federated learning algorithm. The invention is realized by the following technical scheme. An Internet of Vehicles client selection and resource allocation strategy based on deep reinforcement learning combined with a federated learning algorithm comprises the following steps: 1) constructing a three-layer collaborative architecture consisting of vehicles, edge servers and cloud servers; 2) proposing an FL-MADDPG algorithm, combining deep reinforcement learning with federated learning, for Internet of Vehicles client selection and resource allocation; 3) deriving the client selection and resource allocation strategy; 4) adding personalized differential privacy noise to the transmitted model parameters when a client is selected to participate in federated aggregation.
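The objective the strategy optimizes, formula (1) of claim 4, can be sketched numerically. This is an illustrative computation only; the dictionary keys `F`, `T`, `E` and `lam` are hypothetical names for the per-vehicle task completions F_{n,t}, delay T_{n,t}, energy E_{n,t} and weight λ_n.

```python
def system_utility(vehicles: list[dict], phi: float, accuracy: float) -> float:
    """Evaluate objective (1): the sum over vehicles of completed tasks
    divided by the weighted time/energy cost, plus the scaled federated
    learning accuracy A_t."""
    total = 0.0
    for v in vehicles:
        # Weighted cost lambda_n * T + (1 - lambda_n) * E of vehicle n.
        cost = v["lam"] * v["T"] + (1.0 - v["lam"]) * v["E"]
        total += v["F"] / cost
    return total + phi * accuracy
```

For example, one vehicle with F = 2 completed tasks, T = 1 s, E = 1 J and λ = 0.5, under φ = 0.1 and A_t = 0.9, yields 2 / 1.0 + 0.09 = 2.09; the learned policy is the one that drives this quantity up.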
In the above technical solution, further, the collaborative architecture of vehicles, edge servers and cloud servers described in step 1) includes three parts: 1) the vehicle layer, used for generating computing tasks and locally training the model; 2) the edge server layer, used for providing computing and communication resources for local model aggregation; 3) the cloud server layer, used for coordinating global model aggregation, verifying privacy budgets and synchronizing the federated learning process among the multiple edge servers. Further, the FL-MADDPG algorithm for client selection and resource allocation policies described in step 2) is as follows: the algorithm is a multi-agent deep reinforcement learning algorithm based on an Actor-Critic architecture, whose core is the "centralized training, decentralized execution" paradigm. During training, all agents may share global state information to optimize their policies, while each relies only on its local observation when executing decisions, enabling the complex interactions and resource competition among multiple agents to be handled effectively. The algorithm is used to solve the dynamic client selection problem in the Internet of Vehicles environment. Vehicles produce many different tasks during their travel, and these tasks are subject to varying demands on time and privacy.
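The "centralized training, decentralized execution" split described above can be sketched structurally. This is a schematic of the data flow, not the FL-MADDPG networks themselves: the observation and action fields follow the state and action spaces named in claim 3, while the class and function names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VehicleObservation:
    """Local observation of one vehicle agent (field names assumed): its
    task queue length and the computing resources it can currently reach."""
    task_queue: int
    local_cpu: float
    edge_cpu: float


@dataclass
class VehicleAction:
    """Joint action from claim 3: the participate-in-aggregation indicator,
    the sub-channel index, and the computing-resource allocation proportion."""
    selected: bool
    subchannel: int
    cpu_share: float


def decentralized_execute(observations: List[VehicleObservation],
                          actors: List[Callable]) -> List[VehicleAction]:
    """Decentralized execution: each actor sees ONLY its own observation."""
    return [actor(obs) for actor, obs in zip(actors, observations)]


def centralized_critic_input(observations: List[VehicleObservation],
                             actions: List[VehicleAction]) -> List[float]:
    """Centralized training: the critic conditions on the global state,
    i.e. every agent's observation and action concatenated."""
    feats: List[float] = []
    for obs, act in zip(observations, actions):
        feats += [float(obs.task_queue), obs.local_cpu, obs.edge_cpu,
                  float(act.selected), float(act.subchannel), act.cpu_share]
    return feats
```

During training, each agent's critic would score `centralized_critic_input(...)`, while at run time a vehicle only ever calls its own actor on its own `VehicleObservation`, which is what makes the scheme deployable across independent vehicles.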