CN-121351939-B - Hierarchical federated learning method and system based on split meta-learning
Abstract
The invention discloses a hierarchical federated learning method and system based on split meta-learning. In the method, a cloud server initializes a global model and distributes it to each edge server, and the edge server layer distributes the global model to the clients; each client locally updates the feature extraction module of the global model based on local data while freezing the classifier module; the edge server layer aggregates the updated feature extraction parameters from multiple clients and performs meta-optimization on the classifier module using a local validation data set to generate an edge-local model; and the cloud server layer periodically collects model parameters from multiple edge servers and updates the global model with a gradient-sensitive momentum aggregation strategy. Through the three-layer "vehicle-edge-cloud" architecture and a two-stage collaborative optimization mechanism, the invention jointly optimizes privacy protection, communication efficiency and model adaptability in the Internet of Vehicles environment.
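The three-layer flow summarized above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: the data structures (`cloud_model` as a dict of parameter lists), the plain averaging at the edge, and the fixed momentum placeholder are simplifications, not the patented implementation (which weights by data volume, meta-optimizes the classifier, and derives the momentum coefficient dynamically).

```python
def hierarchical_round(cloud_model, edges):
    """One illustrative vehicle-edge-cloud round: each edge averages its
    clients' feature-extractor parameters, then the cloud blends the edge
    average with the previous global model via a momentum term."""
    edge_models = []
    for clients in edges:  # each entry: list of per-client feature params
        n, dim = len(clients), len(clients[0])
        # Edge layer: plain average of client feature-extractor parameters
        # (the claimed scheme weights by data-volume proportion and also
        # meta-optimizes the classifier; both are elided here).
        agg = [sum(c[i] for c in clients) / n for i in range(dim)]
        edge_models.append(agg)
    dim = len(cloud_model["features"])
    mean_feat = [sum(e[i] for e in edge_models) / len(edge_models)
                 for i in range(dim)]
    beta = 0.5  # placeholder for the gradient-sensitive momentum coefficient
    new_features = [beta * h + (1 - beta) * m
                    for h, m in zip(cloud_model["features"], mean_feat)]
    return {"features": new_features, "classifier": cloud_model["classifier"]}
```

Note that the classifier parameters are passed through untouched here, mirroring the split design in which clients freeze the classifier and only the feature extractor circulates between layers.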
Inventors
- LIANG WEI
- XIE RUI
- DIAO ZULONG
- CHEN YUXIANG
- MENG XIANGWEI
- HE DACHENG
Assignees
- 湖南科技大学 (Hunan University of Science and Technology)
Dates
- Publication Date
- 20260505
- Application Date
- 20251217
Claims (9)
- 1. A hierarchical federated learning method based on split meta-learning, characterized by comprising the following steps: constructing a client layer, an edge server layer and a cloud server layer, wherein the client layer comprises a plurality of clients, the edge server layer comprises a plurality of edge servers, and the cloud server layer comprises at least one cloud server; initializing a global model by the cloud server and distributing the global model to each edge server, and distributing the global model to the affiliated clients through the edge server layer; enabling each client to locally update the feature extraction module of the global model based on local data while freezing the classifier module of the global model, and asynchronously uploading the parameters of the updated feature extraction module to the associated edge server; aggregating, at the edge server layer, the updated feature extraction parameters from the plurality of clients, and performing meta-optimization on the classifier module using a local validation data set to generate an edge-local model; periodically collecting, at the cloud server layer, model parameters from the plurality of edge servers, and updating the global model with a gradient-sensitive momentum aggregation strategy; receiving the updated global model at the edge server layer, and computing a personalized fusion weight according to the loss difference between the edge-local model and the updated global model on local validation data; performing, based on the personalized fusion weight, a weighted fusion of the edge-local model and the global model to generate a personalized local model, and distributing the personalized local model to the affiliated clients through the edge server; wherein aggregating the updated feature extraction parameters from the plurality of clients at the edge server layer and performing meta-optimization on the classifier module using the local validation data set to generate the edge-local model specifically comprises: computing aggregation weight coefficients from the updated feature extraction parameters received by the edge server layer from the plurality of clients, combined with each client's data volume proportion, and performing a weighted aggregation operation to generate aggregated feature extraction parameters; performing, by the edge server, a meta-optimization process on the classifier module using the aggregated feature extraction parameters together with the local validation data set, and updating the classifier parameters; and combining the updated classifier parameters with the aggregated feature extraction parameters to construct the edge-local model.
- 2. The method according to claim 1, wherein initializing the global model by the cloud server and distributing it to each edge server, and distributing the global model to the clients through the edge server layer, specifically comprises: constructing an initial global model at the cloud server using a normal-distribution initialization strategy, wherein the initial global model comprises initialization parameters including feature extraction module parameters and classifier module parameters; issuing a lightweight diagnosis model from the cloud server to each edge server, and collecting the real-time computation load and storage state of each edge server to obtain a diagnosis result; distributing, based on the diagnosis result, a complete initial global model or a simplified initial global model to each edge server through the cloud server according to each edge server's load; and adaptively distributing the model to the clients through the edge servers according to the network state of the clients within each edge server's coverage area.
- 3. The method according to claim 1, wherein enabling each client to locally update the feature extraction module of the global model based on local data while freezing the classifier module of the global model, and asynchronously uploading the parameters of the updated feature extraction module to the associated edge server, comprises: configuring a local training task according to the client's local computing resources; sampling mini-batch data from the local data set to execute local training epochs based on the configured task, computing the loss through forward propagation and back-propagating gradients, while updating only the parameters of the feature extraction module; after a preset number of local training epochs is completed, computing, at the client, a feature extraction parameter increment, wherein the increment is the difference between the locally updated parameters and the originally received feature extraction module parameters; and asynchronously uploading the feature extraction parameter increment from the client to the associated edge server.
- 4. The method according to claim 1, wherein periodically collecting model parameters from the plurality of edge servers at the cloud server layer and updating the global model with the gradient-sensitive momentum aggregation strategy specifically comprises: collecting, through the cloud server, the respective model parameter sets from the plurality of edge servers based on a preset synchronization period; computing, at the cloud server, the validation loss gradient of each edge server's model for the current round based on the model parameter set, and analyzing the spatial distribution differences among the model parameters in the set to obtain a parameter dispersion index; computing, at the cloud server, a dynamic momentum coefficient through a dynamic adjustment mechanism based on the validation loss gradient and the parameter dispersion index, wherein the dynamic momentum coefficient is inversely related to the trend of the validation loss gradient and positively related to the parameter dispersion index; and performing a weighted fusion of the historical global model parameters and the model parameter set using the dynamic momentum coefficient, and executing a global model update to generate a new round of the global model.
- 5. The method according to claim 1, wherein receiving the updated global model at the edge server layer and computing the personalized fusion weight based on the loss difference between the edge-local model and the updated global model on the local validation data specifically comprises: receiving, at the edge server, the updated global model parameters from the cloud server, comparing them with the parameters of the edge-local model maintained by the edge server, and computing a local-model loss value and a global-model loss value on the edge server's local validation data set; obtaining a loss difference value by computing the difference between the local-model loss value and the global-model loss value; feeding the loss difference value into a hyperbolic tangent activation function for nonlinear transformation to generate a weight adjustment factor; and computing the personalized fusion weight from the weight adjustment factor combined with a preset reference retention rate and a sensitivity scaling factor.
- 6. The method of any one of claims 1 to 5, wherein the feature extraction module is configured to process the time-series data collected by the client using a multi-head self-attention mechanism based on a lightweight Transformer architecture, and to extract a time-series feature vector containing driving behavior intention, including: receiving raw time-series data acquired by the client's vehicle-mounted sensors, denoising and normalizing the raw data, and generating standardized sequence data; segmenting the standardized sequence data and applying an embedding mapping to generate sequence embedding vectors; adding positional encoding information to the sequence embedding vectors to preserve temporal dependencies and generate position-aware embedding vectors; feeding the position-aware embedding vectors into the multi-head self-attention mechanism, computing the attention distribution through query, key and value matrices, and aggregating context features to generate an enhanced feature representation; performing, based on the enhanced feature representation, nonlinear transformation and dimension adjustment through a feed-forward neural network to extract high-order time-series feature vectors; and selecting the features of key time steps from the high-order time-series feature vectors based on the attention weights to generate the time-series feature vector containing the driving behavior intention.
- 7. The method according to any one of claims 1 to 5, wherein the classifier module is configured to fuse multi-client features through a graph convolution operation based on a graph neural network aggregation layer to extract spatial position relationships and interaction risk features between vehicles, and includes: constructing a dynamic graph structure from the initial feature vectors received by the edge server from the plurality of clients, based on the relative positions or communication connection states between the vehicles corresponding to the clients, wherein the graph nodes of the dynamic graph structure are the clients and the graph edges are the connections between vehicles; aggregating the node features of the dynamic graph structure through the graph convolution operation, and transforming the initial feature vectors based on the adjacency matrix and the degree matrix to generate enhanced node features fusing the multi-client features; and extracting the spatial position relationship features and the interaction risk features between vehicles from the enhanced node features, performing collaborative scene understanding based on these features, and outputting a collaborative decision result.
- 8. The method according to any one of claims 1 to 5, further comprising: training and maintaining, for each edge server, a regional reference risk model according to the historical traffic accident data and traffic flow characteristics of the edge server's region, and generating reference model parameters; extracting driving behavior features from local CAN bus data at the client, updating the local feature extraction module based on the driving behavior features, and generating feature increment parameters; adjusting, based on the feature increment parameters, the classifier boundary of the regional reference risk model at the edge server through a meta-optimization strategy, and generating optimized local model parameters; computing, at the edge server, a first loss difference between the optimized local model parameters and the reference model parameters on the local validation data, and generating a first personalized fusion weight based on the first loss difference; and performing a weighted fusion of the optimized local model parameters and the reference model parameters using the first personalized fusion weight to generate a personalized risk assessment model, wherein the personalized risk assessment model is used for risk scoring and real-time adjustment of the vehicle's base insurance rate.
- 9. A hierarchical federated learning system based on split meta-learning, the system comprising: a first federated learning module for constructing a client layer, an edge server layer and a cloud server layer, wherein the client layer comprises a plurality of clients, the edge server layer comprises a plurality of edge servers, and the cloud server layer comprises at least one cloud server; a second federated learning module for initializing a global model at the cloud server and distributing it to each edge server, and distributing the global model to the affiliated clients through the edge server layer; a third federated learning module for enabling each client to locally update the feature extraction module of the global model based on local data while freezing the classifier module of the global model, and asynchronously uploading the parameters of the updated feature extraction module to the associated edge server; a fourth federated learning module for aggregating, at the edge server layer, the updated feature extraction parameters from the plurality of clients, and performing meta-optimization on the classifier module using a local validation data set to generate an edge-local model; a fifth federated learning module configured to periodically collect model parameters from the plurality of edge servers at the cloud server layer, and to update the global model using a gradient-sensitive momentum aggregation policy; a sixth federated learning module for receiving the updated global model at the edge server layer, and computing a personalized fusion weight according to the loss difference between the edge-local model and the updated global model on the local validation data; and a seventh federated learning module for performing, based on the personalized fusion weight, a weighted fusion of the edge-local model and the global model to generate a personalized local model, and distributing the personalized local model to the affiliated clients through the edge server; wherein aggregating the updated feature extraction parameters from the plurality of clients at the edge server layer and performing meta-optimization on the classifier module using the local validation data set to generate the edge-local model specifically comprises: computing aggregation weight coefficients from the updated feature extraction parameters received by the edge server layer from the plurality of clients, combined with each client's data volume proportion, and performing a weighted aggregation operation to generate aggregated feature extraction parameters; performing, by the edge server, a meta-optimization process on the classifier module using the aggregated feature extraction parameters together with the local validation data set, and updating the classifier parameters; and combining the updated classifier parameters with the aggregated feature extraction parameters to construct the edge-local model.
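As a concrete illustration of the quantities named in claims 1 and 5 (data-volume aggregation weights, the tanh-based weight adjustment factor, and the weighted model fusion), the following sketch shows one plausible form. The exact combination rule and the `base_retention`/`sensitivity` hyperparameters are assumptions for illustration, not the claimed formulas.

```python
import math

def aggregation_weights(sample_counts):
    """Aggregation weight coefficients proportional to each client's
    data volume (claim 1)."""
    total = sum(sample_counts)
    return [n / total for n in sample_counts]

def personalized_fusion_weight(local_loss, global_loss,
                               base_retention=0.5, sensitivity=1.0):
    """Personalized fusion weight from the local/global loss difference
    passed through tanh (claim 5); the way the adjustment factor combines
    with the reference retention rate is illustrative."""
    adjustment = math.tanh(sensitivity * (global_loss - local_loss))
    # Clip into [0, 1]: the weight placed on the edge-local model.
    return min(1.0, max(0.0, base_retention + 0.5 * adjustment))

def fuse(local_params, global_params, alpha):
    """Weighted fusion of edge-local and global parameters (claim 1)."""
    return [alpha * l + (1.0 - alpha) * g
            for l, g in zip(local_params, global_params)]
```

With this form, a global model that performs worse on the edge's validation set (higher `global_loss`) pushes the fusion weight above the reference retention rate, so the personalized model leans toward the edge-local parameters, matching the intent of claim 5.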
Description
Hierarchical federated learning method and system based on split meta-learning
Technical Field
The invention belongs to the technical field of distributed machine learning and Internet of Vehicles communication, and particularly relates to a hierarchical federated learning method and system based on split meta-learning.
Background
With the deep convergence of intelligent connected vehicles (ICVs) and 5G communication technology, the Internet of Vehicles (IoV) is rapidly evolving into a key infrastructure supporting autonomous driving, intelligent transportation and telematics. The Internet of Vehicles realizes collaborative awareness, intelligent decision-making and fine-grained control through large-scale information interaction (V2X) among vehicles, Road Side Units (RSUs), pedestrians and cloud control centers. With hundreds of millions of vehicle terminals coming online, the Internet of Vehicles generates massive multi-modal data including high-definition images, lidar point clouds and driving behaviors, and how to safely and efficiently use these data to train higher-performing artificial intelligence models has become a core problem for the development of the industry. Traditional centralized machine learning schemes require the raw data of all vehicles to be uploaded to the cloud for unified training, which faces almost insurmountable obstacles in the Internet of Vehicles environment. First is the privacy and security red line: vehicle data contains highly sensitive user information such as precise trajectories, driving habits and in-vehicle audio and video, whose transmission and centralized storage face a huge risk of privacy leakage and are strictly limited by data security regulations worldwide.
Second are the communication bottleneck and latency challenges of V2X communication: applications that safeguard driving safety, such as cooperative driving and collision early warning, require millisecond-level ultra-low latency. If massive raw data were uploaded to the cloud in real time, it would occupy enormous network bandwidth, incur high communication costs, and introduce unacceptable communication delays, seriously threatening driving safety. Federated learning (FL), as a distributed machine learning paradigm in which the data stays put and the model moves, provides a brand-new solution for data processing in the Internet of Vehicles. It allows each vehicle terminal to train the model locally on private data and upload only encrypted model parameter updates to the server, ensuring data privacy at the source and greatly reducing the communication load. However, directly applying standard federated learning frameworks (e.g., FedAvg) to the extremely complex Internet of Vehicles environment still faces four unique and serious challenges. Prior art such as patent CN112487123A (a federated learning method for the Internet of Vehicles) and US20210056789A1 (a hierarchical federated learning system) relates to federated learning, but does not effectively solve the problems of asynchronous split meta-learning and high-dynamics topology adaptation in the Internet of Vehicles. Second, ultra-high dynamic network topology: during high-speed movement of a vehicle, its connection with an edge server (e.g., an RSU) is short-lived and unstable, a so-called connection opportunity window.
Vehicles frequently join and leave the coverage of a given RSU, causing the set of nodes participating in federated learning to change rapidly. Standard synchronous federated learning mechanisms require all participants to complete training and uploading at the same time step, which is almost impossible to achieve in a highly dynamic Internet of Vehicles and leads to substantial waiting overhead and training interruptions. Third, severe device resource heterogeneity: terminals in the Internet of Vehicles differ enormously in computing power, ranging from high-end intelligent vehicles with strong compute to traditional vehicles or embedded devices with limited compute. Requiring all vehicles to perform the same complex model training task (e.g., the meta-learning-based MAML algorithm, which requires computing second-order gradients) is impractical, can cause a large number of low-power nodes to become stragglers or fail training, and severely affects the overall efficiency and fairness of the federated learning system. Fourth, stringent real-time requirements: many Internet of Vehicles applications (e.g., collaborative lane changes