
CN-119940455-B - End-edge cloud model compression and deployment method based on split learning

CN 119940455 B

Abstract

The invention relates to the technical field of deep learning and discloses a method for compressing and deploying an end-edge cloud model based on split learning. The method comprises: splitting a deep learning model into a front model, a middle model and a rear model, and deploying them on a client, an edge side and a cloud, respectively; the client forward-propagates through the front model and sends the forward propagation result to the edge side; the edge side receives forward propagation results from at least one client, forward-propagates the received results through the middle model, and sends the result to the cloud; the cloud receives forward propagation results from at least one edge side, completes forward propagation with the rear model, calculates a loss function, and performs backward propagation to update the front model, the middle model and the rear model; iterative pruning is applied to the front, middle and rear models, the pruned models are fine-tuned by knowledge distillation, and the lightweight front model is deployed to the client so that inference runs locally on the client.
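To make the split training round described above concrete, the following is a minimal, hypothetical PyTorch sketch of one client-edge-cloud forward/backward pass. The module definitions, tensor shapes, and the element-wise mean used to aggregate client activations are illustrative assumptions, not the architecture specified by the patent; in a real deployment the activations and gradients would cross the network between devices rather than live in one process.

```python
# Hypothetical sketch of one split-learning training round (client -> edge -> cloud).
# Model definitions, shapes and the mean-aggregation step are illustrative assumptions.
import torch
import torch.nn as nn

front = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())    # client-side front model
middle = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())  # edge-side middle model
rear = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 32, 10))     # cloud-side rear model

params = list(front.parameters()) + list(middle.parameters()) + list(rear.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)
criterion = nn.CrossEntropyLoss()

# S2: each client runs its front model on local data up to the cut layer.
client_batches = [torch.randn(8, 3, 32, 32) for _ in range(2)]
client_outputs = [front(x) for x in client_batches]

# S3: the edge aggregates the client activations and runs the middle model.
edge_input = torch.stack(client_outputs).mean(dim=0)   # assumed aggregation: element-wise mean
edge_output = middle(edge_input)

# S4: the cloud finishes the forward pass, computes the loss and backpropagates;
# here all three parts share one autograd graph, so a single backward() updates them all.
labels = torch.randint(0, 10, (8,))
logits = rear(edge_output)
loss = criterion(logits, labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```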

Inventors

  • Cai Jun
  • Hu Zhunyi
  • Liu Yan
  • Luo Jianzhen
  • Liao Liping

Assignees

  • Guangdong Polytechnic Normal University (广东技术师范大学)

Dates

Publication Date
2026-05-08
Application Date
2025-01-31

Claims (5)

  1. The end-edge cloud model compression and deployment method based on split learning is characterized by comprising the following steps: S1, splitting a deep learning model to be compressed into a front model, a middle model and a rear model, and deploying them respectively on a client, an edge side and a cloud; S2, the client forward-propagates through the front model deployed on it using local data, and sends the forward propagation result to the edge side; S3, the edge side receives forward propagation results from at least one client, forward-propagates the received results through the middle model deployed on it, and sends the forward propagation result to the cloud; S4, the cloud receives forward propagation results from at least one edge side, completes forward propagation with the rear model deployed on the cloud, calculates a loss function, and performs backward propagation based on the loss function to update the parameters of the front model, the middle model and the rear model; S5, iteratively pruning the front model deployed at the client, the middle model deployed at the edge side and the rear model deployed at the cloud, and fine-tuning the pruned models by knowledge distillation; S6, deploying the lightweight front model to the client so that inference is performed locally on the client; in step S1, the deep learning model is manually split into the front model, the middle model and the rear model according to the resource limitations of the client, the edge side and the cloud; in step S2, the client processes the local data and performs forward propagation of the front model up to a predefined cut layer, and the output of the client's forward propagation is expressed as $a_i = f_i(x_i; \theta_i)$, where $f_i$ is the $i$-th front model, $x_i$ is the input data of the $i$-th client device, and $\theta_i$ is the parameter of the $i$-th front model; in step S3, the edge side receives the intermediate results $a_i$ sent by all clients, aggregates them to obtain $A_j$, and continues forward propagation with $A_j$ as the input of the edge side, the output of which is expressed as $b_j = g_j(A_j; \phi_j)$, where $g_j$ is the $j$-th middle model and $\phi_j$ is the parameter of the $j$-th middle model; in step S4, the cloud receiving the forward propagation result from at least one edge side and completing forward propagation with the rear model deployed on the cloud includes: the cloud receives the intermediate results $b_j$ sent by the edge sides, aggregates them to obtain $B$, and completes the final forward propagation to obtain the model output $\hat{y}$, the output of the cloud forward propagation being expressed as $\hat{y} = h(B; \psi)$, where $h$ is the cloud model and $\psi$ is the parameter of the cloud model; in step S4, calculating a loss function and back-propagating based on the loss function to update the parameters of the front model, the middle model and the rear model includes: the cloud calculates the loss function $L(\hat{y}, y)$ from $\hat{y}$ and the real label $y$; the cloud calculates the gradient of the loss function with respect to its model parameters and transmits the gradient to the edge side; the edge side updates its model part according to the gradient and calculates the gradient to be transmitted to the client; and the client updates its model part according to the gradient, thereby completing the model update.
  2. The method for compressing and deploying the end-edge cloud model based on split learning according to claim 1, wherein in step S5 the fine-tuning of the pruned model by knowledge distillation comprises: taking the unpruned model part as the teacher and the pruned model part as the student, and fine-tuning with split multi-stage knowledge distillation, wherein the multi-stage distillation fine-tuning comprises an intermediate feature loss and a soft label loss, and a hard-label cross entropy loss is also introduced to form the total loss of the multi-stage distillation fine-tuning used to train the student model.
  3. The end-edge cloud model compression and deployment method based on split learning according to claim 2, wherein the intermediate feature loss is expressed as $L_{feat} = \left\| R(F^{S}) - F^{T} \right\|_2^2$, where $F^{S}$ is the feature map of the student model, $R(\cdot)$ is a regressor consisting of a 1×1 convolution layer and a BN layer, and $\left\| \cdot \right\|_2^2$ measures the L2 distance between the student and teacher feature maps; $F^{S}$ is an intermediate feature of the front model on the end side and $F^{T}$ is an intermediate feature of the middle model on the edge side; the soft label loss is expressed as $L_{soft} = -\sum_{i}\sum_{j} p^{T}_{i,j} \log p^{S}_{i,j}$, where $z^{S}_{i,j}$ denotes the student-model logit output of the $j$-th class of the $i$-th sample, and $p^{S}_{i,j}$ and $p^{T}_{i,j}$ denote the soft outputs of the student model and the teacher model for the $j$-th class of the $i$-th sample, respectively; the introduced hard-label cross entropy loss is expressed as $L_{hard} = -\sum_{i}\sum_{j} y_{i,j} \log\big(\mathrm{softmax}(z_{i,j})\big)$, where $z_{i,j}$ denotes the logit output of the $j$-th class of the $i$-th sample and $y_{i,j}$ denotes the $j$-th class hard label of the $i$-th sample; the total loss of the multi-stage distillation fine-tuning is expressed as $L_{total} = \alpha L_{feat} + \beta L_{soft} + \gamma L_{hard}$, where $\alpha$, $\beta$ and $\gamma$ denote the weight values of the intermediate feature loss, the soft label loss and the hard label loss, respectively, and $\alpha + \beta + \gamma = 1$.
  4. The end-edge cloud model compression and deployment method based on split learning according to claim 3, wherein in step S6, deploying the lightweight front model to the client comprises: combining the front model with the middle model and the rear model; and deploying the combined model to the client so that the inference task is completed locally at the client; during local inference at the client, no data needs to be transmitted to the edge side or the cloud, so that communication overhead and inference delay are reduced.
  5. The method for compressing and deploying the end-edge cloud model based on split learning according to claim 4, wherein the combining of the front model with the middle model and the rear model is specifically $M_j = f_j + g_j + h$, where $M_j$ is the combination of the cloud model $h$, the $j$-th edge-side model $g_j$ and the $j$-th client model $f_j$.
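Claims 2 and 3 describe fine-tuning the pruned parts with a multi-stage distillation objective that mixes an intermediate-feature term, a soft-label term and a hard-label cross entropy term. The sketch below is one plausible PyTorch rendering of that total loss under stated assumptions: the 1×1-conv-plus-BN regressor, the softmax temperature, and the default weights alpha/beta/gamma are illustrative choices, not values fixed by the patent.

```python
# Hypothetical multi-stage distillation loss: L = a*L_feat + b*L_soft + c*L_hard (a+b+c=1).
# The regressor R (1x1 conv + BN), the temperature and the loss weights are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistillLoss(nn.Module):
    def __init__(self, s_channels, t_channels, temperature=4.0,
                 alpha=0.3, beta=0.3, gamma=0.4):
        super().__init__()
        # R(.): maps the pruned (student) feature map to the teacher's channel dimension.
        self.regressor = nn.Sequential(
            nn.Conv2d(s_channels, t_channels, kernel_size=1),
            nn.BatchNorm2d(t_channels),
        )
        self.T = temperature
        self.alpha, self.beta, self.gamma = alpha, beta, gamma

    def forward(self, s_feat, t_feat, s_logits, t_logits, labels):
        # Intermediate-feature loss: L2 distance between regressed student and teacher features.
        l_feat = F.mse_loss(self.regressor(s_feat), t_feat)
        # Soft-label loss: divergence between temperature-softened student/teacher outputs.
        l_soft = F.kl_div(F.log_softmax(s_logits / self.T, dim=1),
                          F.softmax(t_logits / self.T, dim=1),
                          reduction="batchmean") * (self.T ** 2)
        # Hard-label loss: ordinary cross entropy against the ground-truth labels.
        l_hard = F.cross_entropy(s_logits, labels)
        return self.alpha * l_feat + self.beta * l_soft + self.gamma * l_hard

# Example usage with random tensors standing in for student/teacher activations.
loss_fn = DistillLoss(s_channels=8, t_channels=16)
loss = loss_fn(torch.randn(4, 8, 16, 16), torch.randn(4, 16, 16, 16),
               torch.randn(4, 10), torch.randn(4, 10), torch.randint(0, 10, (4,)))
```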
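Claims 4 and 5 then combine the pruned front, middle and rear parts into one network that runs entirely on the client. A minimal sketch of that concatenation, reusing the illustrative modules from the earlier sketch, is shown below; realizing the "+" of claim 5 as sequential composition with nn.Sequential is an assumption for illustration.

```python
# Hypothetical merge of the front, middle and rear parts into one client-side model,
# i.e. M = f + g + h realized as sequential composition of the three split parts.
import torch
import torch.nn as nn

front = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
middle = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
rear = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 32, 10))

combined = nn.Sequential(front, middle, rear)   # single model deployed to the client
combined.eval()

# Local inference: no activations are sent to the edge side or the cloud.
with torch.no_grad():
    prediction = combined(torch.randn(1, 3, 32, 32)).argmax(dim=1)
```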

Description

End-edge cloud model compression and deployment method based on split learning

Technical Field

The invention relates to the technical field of deep learning, and in particular to a method for compressing and deploying an end-edge cloud model based on split learning.

Background

In recent years, deep learning has made remarkable breakthroughs in many fields and is widely applied to tasks such as image recognition and natural language processing. However, deploying computationally intensive deep learning models on resource-constrained end-side devices (e.g., smartphones and Internet of Things devices) still faces significant challenges. The traditional cloud inference mode offers powerful computing resources, but it requires uploading terminal data to the cloud for processing, which not only brings significant communication overhead but also introduces higher inference delay and may cause a risk of user privacy leakage. To address these challenges, the end-edge-cloud collaborative architecture has been developed. This architecture aims to use the computing capacity of the edge side to share the computing pressure of the cloud and to reduce data transmission between the end side and the cloud, thereby alleviating the delay and communication-overhead problems to a certain extent. However, even under the end-edge-cloud architecture, it is impractical to deploy a complete deep learning model directly on resource-limited end-side devices. Model compression therefore becomes the key to efficient inference on the end side.

Currently, the mainstream model compression methods include pruning, quantization and knowledge distillation. Conventional pruning methods are typically applied directly to the complete model, reducing model size and computational complexity by removing unimportant connections or neurons. However, under the end-edge-cloud architecture, if only the cloud model is pruned and then deployed directly to the end side, the resource limitations of the end-side device may still not be satisfied, and the distributed characteristics of the end-edge-cloud architecture are ignored. Furthermore, simple pruning operations tend to cause significant degradation of model accuracy. Split learning, as an emerging distributed training method, splits the deep learning model across different devices for training and can effectively protect data privacy. However, existing split learning methods mainly focus on the distributed training process and rarely consider model compression and deployment. Directly deploying a split model to the end side may still leave the model too large, so the advantages of local inference on the end side cannot be fully exploited. Specifically:

Traditional model compression methods do not fit the end-edge-cloud architecture well: conventional pruning and quantization are usually carried out on complete models and do not fully consider the distributed characteristics and communication constraints of the end-edge-cloud architecture, so the compressed model may still be too large, or the communication overhead may remain high, when deployed under this architecture.
Traditional knowledge distillation transfers insufficient information: it mainly focuses on knowledge transfer at the output layer and ignores the rich feature information in the intermediate layers of the model, so that under the end-edge-cloud architecture, and especially when the model is split, the student model cannot fully learn the knowledge of the teacher model and the accuracy recovery effect is limited. There is also a lack of a collaborative optimization strategy for the end-edge-cloud architecture: the prior art rarely considers model splitting, compression and deployment as a whole and lacks a collaborative optimization strategy tailored to the characteristics of the end-edge-cloud architecture, so the advantages of that architecture cannot be fully exploited to achieve end-side inference with low delay and low communication overhead. Owing to these defects in the prior art, deploying a deep learning model under the end-edge-cloud architecture still suffers from problems such as high communication overhead and high inference delay. A technical scheme that fully utilizes the characteristics of the end-edge-cloud architecture to compress the model efficiently while guaranteeing accuracy is therefore urgently needed, so as to realize local end-side inference with low delay and low communication overhead.

Disclosure of Invention

The invention provides a method for compressing and deploying an end-edge cloud model based on split learning, which solves