CN-122019146-A - DNN partitioning method considering robustness

CN122019146A

Abstract

The invention discloses a DNN partitioning method that takes robustness into account. The method evaluates the importance of each neuron in a neural network through three designed indicators: weight magnitude, connection sensitivity, and output-distribution change. It calculates a structure-dependent contribution from information obtained by graph attention, then normalizes the indicators and fuses them with weights to obtain a comprehensive importance score for each neuron. From device failure rates it calculates each neuron's survival probability and the minimum number of replicas required, and preferentially replicates neurons or channels of high comprehensive importance within the memory and energy-consumption budget. Comprehensively considering communication cost and load balance, it constructs a comprehensive cost function, defines constraint conditions, and solves in two stages, replication within budget followed by graph partitioning and load balancing, with fine adjustment. All original elements and their replicas are mapped onto specific edge devices, realizing the DNN partition. The method addresses the problems that different layers contribute differently to the final prediction result, and that the parts of the model, the network bandwidth, the device computing capacity, and the link quality may change at any time.

Inventors

  • Shi Chuxuan
  • Xie Jinhui
  • Xu Jie
  • Wang Chenglong

Assignees

  • East China Institute of Computing Technology (The 32nd Research Institute of China Electronics Technology Group Corporation)

Dates

Publication Date
2026-05-12
Application Date
2026-01-06

Claims (10)

  1. A DNN partitioning method considering robustness, comprising the steps of: representing a deep learning model as a directed acyclic graph, modeling the devices and network links of the deployment environment as nodes and edges, and constructing a heterogeneous graph containing both model-structure information and deployment-environment information; learning the importance relations among the nodes with a graph attention mechanism to obtain a weighted adjacency matrix and updated node embeddings, in which the dependency relations of the model structure and the device/link state information are encoded; designing weight magnitude, connection sensitivity, and output-distribution change indicators, evaluating the importance of each neuron in the neural network, calculating a structure-dependent contribution from the information obtained by graph attention, and carrying out normalization and weighted fusion to obtain the comprehensive importance of each neuron; letting the device set be $D=\{d_1,\dots,d_M\}$, where device $d_i$ has failure probability $p_i$, and letting the allocation matrix $X=(x_{ij})$ indicate whether neuron $n_j$ is deployed on device $d_i$, calculating the survival probability of neuron $n_j$ as $P_j = 1-\prod_{i=1}^{M} p_i^{\,x_{ij}}$; when the failure rates of the devices are approximately the same value $p$, calculating the minimum number of replicas required to meet a target reliability $R$ as $k = \lceil \ln(1-R)/\ln p \rceil$; preferentially replicating neurons or channels with high comprehensive importance within the memory and energy-consumption budget; comprehensively considering communication cost and load balance, constructing a comprehensive cost function, and defining constraint conditions; and solving in two stages, replication within budget followed by graph partitioning and load balancing, with fine adjustment, mapping all originals and replicas onto specific edge devices, thereby realizing the DNN partition.
  2. The DNN partitioning method according to claim 1, wherein the importance relations between nodes are learned by a graph attention mechanism, and the attention weight is obtained as $\alpha_{ij}^{\phi} = \operatorname{softmax}_{j\in\mathcal{N}_i^{\phi}}\big(\operatorname{LeakyReLU}(a_{\phi}^{\top}[W_{\phi}h_i \,\Vert\, W_{\phi}h_j])\big)$, wherein $h_i$ and $h_j$ are respectively the embedded representations of nodes $i$ and $j$, $W_{\phi}$ is a type-specific weight matrix, $a_{\phi}$ is a type-specific attention vector, $\Vert$ denotes vector concatenation, and $\mathcal{N}_i^{\phi}$ denotes the neighbor set of node $i$ under type $\phi$.
  3. The DNN partitioning method as set forth in claim 1, wherein the weight magnitude is computed from $W_l$ and $b_l$, respectively the weight matrix and the bias vector of layer $l$.
  4. The DNN partitioning method as set forth in claim 1, wherein the connection sensitivity is computed from $\partial f/\partial w$, the partial derivative of the network output $f$ with respect to the weight $w$.
  5. The DNN partitioning method as set forth in claim 1, wherein the output-distribution change is measured by the Jensen-Shannon divergence, an indicator of the difference between two probability distributions.
  6. The DNN partitioning method as set forth in claim 1, wherein the comprehensive importance is a weighted fusion of the normalized indicators, wherein the fusion weights may be learned from a small-scale validation set or from historical data, and $\epsilon$ is a very small constant used for numerical stability.
  7. The DNN partitioning method of claim 1, wherein the communication cost is computed from the output data volume of each neuron and a coefficient reflecting the link bandwidth and delay.
  8. The DNN partitioning method of claim 1, wherein the load balance is computed from the computation amount of each neuron.
  9. The DNN partitioning method of claim 1, wherein the comprehensive cost function combines the communication-cost and load-balance terms.
  10. The DNN partitioning method of claim 1, wherein the constraint conditions comprise the memory and energy-consumption budgets.
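The reliability calculation in claim 1 can be sketched numerically. This is an illustrative pure-Python sketch, not the patent's implementation; the failure probabilities and target reliability used below are hypothetical values.

```python
import math

def survival_probability(failure_probs, placed_on):
    """Probability that at least one device hosting the neuron survives.
    failure_probs[i] is device i's failure probability; placed_on is the
    set of device indices that hold a copy of the neuron."""
    prob_all_fail = 1.0
    for i in placed_on:
        prob_all_fail *= failure_probs[i]
    return 1.0 - prob_all_fail

def min_replicas(p, target_reliability):
    """Smallest k with 1 - p**k >= target reliability, assuming every
    device fails independently with the same probability p."""
    return math.ceil(math.log(1.0 - target_reliability) / math.log(p))

# with p = 0.5 and target 0.9: three copies give 0.875, four give 0.9375
k = min_replicas(0.5, 0.9)
surv = survival_probability([0.1, 0.1, 0.1], {0, 1})  # two replicas survive
```

With independent failures, adding a replica multiplies the "all copies fail" probability by another factor p, which is why the minimum count follows from solving 1 - p^k >= R for k.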
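The normalization and weighted fusion of claim 6 can likewise be sketched. The metric names and fusion weights below are hypothetical (the patent learns the weights from a validation set or historical data); only the min-max normalization with a stabilizing epsilon and the weighted sum follow the claim.

```python
def min_max_normalize(values, eps=1e-8):
    """Scale raw scores into [0, 1]; eps keeps the division stable
    when all scores are (nearly) identical."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo + eps) for v in values]

def comprehensive_importance(metrics, weights):
    """Weighted fusion of several per-neuron metrics.
    metrics: dict name -> list of raw scores, one per neuron.
    weights: dict name -> fusion weight for that metric."""
    normalized = {name: min_max_normalize(vals) for name, vals in metrics.items()}
    n_neurons = len(next(iter(metrics.values())))
    return [sum(weights[name] * normalized[name][j] for name in metrics)
            for j in range(n_neurons)]

scores = comprehensive_importance(
    {"magnitude": [0.2, 1.5, 0.9], "sensitivity": [0.1, 0.4, 0.8]},
    {"magnitude": 0.5, "sensitivity": 0.5},
)
```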

Description

DNN partitioning method considering robustness

Technical Field

The invention belongs to the technical field of model partitioning, and in particular relates to a DNN partitioning method that takes robustness into account.

Background

Conventional AI applications generally use one of two deployment modes: either all computation is placed on a cloud server, or inference relies entirely on the local device. Both of these extreme approaches have significant drawbacks. Fully cloud-based inference suffers in real-time performance because of network delay and bandwidth limits, while fully local inference is constrained by the computing power of the edge device. To address these problems, the concept of collaborative inference has emerged in recent years: the computing tasks of a deep learning model are distributed sensibly between terminal devices and edge servers. This approach exploits the proximity of edge devices while still benefiting from the powerful computing capacity of the server. Most existing collaborative-inference methods, however, have the following problems.

First, conventional methods typically treat a deep network simply as a linear sequence of layers and select a single split point to divide the model into two parts. This ignores the fact that a deep network is actually a complex graph structure with intricate data dependencies between layers. A poorly chosen split can force large amounts of intermediate data to be transmitted between devices, causing network congestion.

Second, in practical applications, edge devices may go offline for various reasons (e.g., battery exhaustion, network interruption, hardware failure). When a device fails, the computing tasks originally assigned to it cannot be completed, which may cause the entire inference process to fail.
While some approaches propose improving reliability by replicating critical computations, there is still no systematic answer to which parts should be replicated under a limited memory and energy budget, or to where the originals and their copies should be placed. Furthermore, existing approaches tend to optimize a single objective, such as minimizing delay or reducing traffic, while ignoring the trade-offs among multiple objectives. In practical applications, several factors must be considered at once, including computation load balance, communication overhead, and reliability guarantees.

In recent years, graph neural networks and attention mechanisms have shown great capability in analyzing complex relational data. The dependencies between layers of a deep learning model, as well as the association between the model structure and the deployment environment, can be better understood through a graph attention network. At the same time, the importance of different neurons in the neural network to the final prediction result can be estimated from indicators such as weight magnitude, gradient sensitivity, and output change. By fully exploiting the graph attention network's ability to learn structural dependencies, guiding replication decisions with multidimensional importance indicators, and pairing them with a reasonable graph-partition mapping algorithm, a stable, efficient, and easily deployed collaborative inference scheme can be realized.

Disclosure of Invention

Based on this background, the following key technical problems need to be solved.

Question 1: How can the importance of each layer in a neural network be accurately assessed? Deep learning models typically contain tens or even hundreds of layers, and different layers contribute to the final prediction result to very different extents.
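The attention weight mentioned above can be illustrated with a small sketch. This follows the standard GAT-style score, softmax over LeakyReLU(a . [W h_i || W h_j]); the vectors and matrices below are fixed toy values, whereas in the method they would be learned.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x >= 0.0 else slope * x

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, h):
    return [dot(row, h) for row in W]

def attention_weights(h_i, neighbors, W, a):
    """Attention of node i over its neighbors: softmax of
    LeakyReLU(a . [W h_i || W h_j]) for each neighbor j."""
    wi = matvec(W, h_i)
    # list '+' concatenates the two transformed embeddings
    scores = [leaky_relu(dot(a, wi + matvec(W, h_j))) for h_j in neighbors]
    m = max(scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

alpha = attention_weights(
    h_i=[1.0, 0.0],
    neighbors=[[0.5, 0.5], [0.0, 1.0]],
    W=[[1.0, 0.0], [0.0, 1.0]],          # identity, for illustration
    a=[0.3, -0.1, 0.2, 0.4],
)
# alpha is a probability distribution over the two neighbors
```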
An evaluation system must be established that accounts for each layer's own characteristics (such as its weights and its sensitivity to input changes) as well as the structural dependencies between layers. More importantly, the assessment must reflect how much overall performance is affected when a layer fails in the actual deployment environment.

Question 2: How can an optimal replication strategy be realized under resource constraints? The memory and power of edge devices are limited, so parts of the model cannot be replicated without limit. An intelligent selection mechanism is therefore needed that, within the limited resource budget, preferentially replicates the portions most critical to system reliability. Meanwhile, the deployment positions of the originals and the copies must also be considered, so that the system keeps working normally even if some devices fail, and the load distribution remains reasonably balanced.

Question 3: How can a dynamically changing network environment be adapted to? Ed
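One simple way to approach Question 2 is a greedy knapsack-style selection. This is a sketch under the assumption that only a memory budget binds; the actual method also weighs energy, placement, and load balance, and the importance scores and memory costs below are hypothetical.

```python
def greedy_replication(importance, mem_cost, budget):
    """Pick neurons to replicate under a memory budget, taking the most
    'important per unit of memory' candidates first."""
    order = sorted(range(len(importance)),
                   key=lambda j: importance[j] / mem_cost[j],
                   reverse=True)
    chosen, used = [], 0.0
    for j in order:
        if used + mem_cost[j] <= budget:   # skip anything over budget
            chosen.append(j)
            used += mem_cost[j]
    return sorted(chosen)

replicas = greedy_replication(
    importance=[0.9, 0.2, 0.6, 0.1],
    mem_cost=[3.0, 1.0, 2.0, 1.0],
    budget=5.0,
)
```

Ranking by importance density rather than raw importance avoids spending the whole budget on one large, moderately important part while cheap critical parts go unprotected.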