EP-3970076-B1 - METHOD FOR DEEP NEURAL NETWORK COMPRESSION
Inventors
- SENN, Melanie
Dates
- Publication Date
- 2026-05-06
- Application Date
- 2020-05-14
Claims (4)
- A computer-implemented method for compressing at least one layer of a deep neural network, DNN, wherein the DNN has at least one convolutional layer with a plurality of filters and is configured for processing and analyzing images captured by at least one sensor (162) of a vehicle (158) in order to identify objects and/or features in the environment of the vehicle (158), comprising:
  training (100) the DNN by a processor;
  calculating similarity measures between the filters of the at least one convolutional layer of the trained DNN;
  removing one or more of the filters in the convolutional layer based on the calculated similarity measures;
  clustering (102), by the processor, remaining ones of the filters of the at least one layer;
  applying (104) dimension reduction, by the processor, on the clustered filters of the at least one layer;
  after the applying dimension reduction step (104), determining (106) whether a data storage size of the DNN has been compressed to a predefined threshold, wherein the clustering step and the applying dimension reduction step are iteratively performed based on whether the data storage size of the DNN has been compressed to the predefined threshold;
  after the determining whether a data storage size of the DNN has been compressed to a predefined threshold step (106), determining (108) whether the DNN meets a predefined level of accuracy;
  retraining (110), by the processor, the DNN until the predefined level of accuracy is met; and
  storing the retrained DNN in a storage device.
- The computer-implemented method of claim 1 further comprising the step, after the storing step, of transmitting the retrained DNN to a mobile device.
- The computer-implemented method of claim 1 or 2 wherein the mobile device is a vehicle and wherein the retrained DNN is transmitted via an over-the-air wireless broadband network to the vehicle.
- The computer-implemented method of claim 1, 2 or 3, wherein, if a predefined number of consecutive retrains of the DNN is reached, the clustering step and the applying dimension reduction step are iteratively applied.
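The claimed flow of clustering the filters, applying dimension reduction, and checking the stored size against a threshold can be sketched as follows. This is a minimal illustration only: the function names, the plain k-means and SVD-based PCA implementations, and the numeric size budget are assumptions for the sketch, not taken from the claims.

```python
import numpy as np

def kmeans(filters, k, iters=20, seed=0):
    """Minimal k-means over flattened filters; returns centroids and labels."""
    rng = np.random.default_rng(seed)
    centroids = filters[rng.choice(len(filters), size=k, replace=False)]
    for _ in range(iters):
        # Assign each filter to its nearest centroid, then recompute centroids.
        d = np.linalg.norm(filters[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = filters[labels == j].mean(axis=0)
    return centroids, labels

def pca_reduce(filters, n_components):
    """Project the centered filter matrix onto its top principal components."""
    mean = filters.mean(axis=0)
    _, _, vt = np.linalg.svd(filters - mean, full_matrices=False)
    return (filters - mean) @ vt[:n_components].T

def compress_layer(filters, k=4, n_components=2, size_budget=16):
    """Iteratively cluster and reduce until the stored size meets the budget
    (illustrative stand-in for the claimed size-threshold check)."""
    reduced = filters
    while reduced.size > size_budget and reduced.shape[1] > 1:
        centroids, _ = kmeans(reduced, k=min(k, len(reduced)))
        reduced = pca_reduce(centroids, min(n_components, centroids.shape[1]))
    return reduced

rng = np.random.default_rng(1)
filters = rng.normal(size=(8, 27))   # 8 conv filters of shape 3x3x3, flattened
compressed = compress_layer(filters)
print(compressed.shape, compressed.size)
```

In the full method, the accuracy check of step (108) and the retraining of step (110) would follow this loop; they are omitted here because they require the actual network and training data.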
Description
Field

The present disclosure relates to methods, apparatuses, and systems for a deep neural network and, more particularly, to compressing the deep neural network.

Background

Deep neural networks ("DNNs") are widely used in many artificial intelligence ("AI") applications, including computer vision, autonomous vehicles, speech recognition, language translation, advertising, cancer detection, and robotics. While DNNs deliver state-of-the-art accuracy for many tasks, they require substantial computational processing power and data storage to perform those tasks. In particular, for mobile applications (e.g., mobile phones, tablets, and vehicles), transmitting DNN model updates over the air via wireless broadband communication (e.g., LTE) can be challenging due to limited bandwidth and the costs associated with such communication. Thus, for computing devices such as mobile phones, laptops, internet-of-things ("IoT") devices, in-vehicle computing systems, etc., the processing and storage requirements of a DNN are limiting factors as to whether such a device can process the DNN locally. Furthermore, if multiple DNNs run simultaneously on the computing system, it becomes even more imperative to manage the storage size and complexity of the DNNs. Therefore, it is important to develop techniques that compress DNNs to improve processing efficiency, reduce data storage without sacrificing application accuracy or increasing hardware cost, and/or reduce the transmission size of the DNN or updates thereof.
A filter-level pruning method based on similar feature extraction, for compressing and accelerating convolutional neural networks using the k-means++ algorithm, is disclosed in Lianqiang Li et al.: "Filter Level Pruning Based on Similar Feature Extraction for Convolutional Neural Networks", IEICE Transactions on Information and Systems, 1 January 2018, Vol. E101.D, No. 4, pages 1203-1206, XP055720613, JP, ISSN 0916-8532, DOI: 10.1587/transinf.2017EDL8248.

Dong Wang et al.: "Exploring Linear Relationship in Feature Map Subspace for ConvNets Compression", arXiv.org, Cornell University Library, 201 Olin Library, Cornell University, Ithaca, NY 14853, 15 March 2018, XP080864876, propose a filter pruning method to compress and accelerate CNNs based on linear relationships identified in different feature map subspaces via visualization of feature maps.

Zhang Qingyang et al.: "OpenVDAP: An Open Vehicular Data Analytics Platform for CAVs", 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), 2 July 2018, pages 1310-1320, XP033375981, DOI: 10.1109/ICDCS.2018.00131, propose an Open Vehicular Data Analytics Platform (OpenVDAP) for connected and autonomous vehicles (CAVs), which is a full-stack edge-based platform including an onboard computing/communication unit, an isolation-supported and security- and privacy-preserving vehicle operating system, as well as an edge-aware application library.
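The idea behind similarity-based filter pruning, as in the filter-level pruning literature cited above, can be illustrated with a toy sketch (not the cited authors' code): filters that are nearly parallel to an already-kept filter extract redundant features and are dropped. The cosine measure, the greedy keep-first strategy, and the 0.95 threshold are example choices for this sketch only.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two flattened filters."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def prune_similar_filters(filters, threshold=0.95):
    """Greedily keep filters whose similarity to every kept filter is below threshold."""
    kept = []
    for f in filters:
        if all(cosine_similarity(f, k) < threshold for k in kept):
            kept.append(f)
    return np.stack(kept)

rng = np.random.default_rng(0)
base = rng.normal(size=(6, 27))   # 6 distinct filters of shape 3x3x3, flattened
dupes = base[:3] * 1.01           # 3 near-duplicates (scaled copies, cosine = 1)
pruned = prune_similar_filters(np.vstack([base, dupes]))
print(pruned.shape)               # the near-duplicates are removed
```

In practice the retained filters would be written back into the convolutional layer, and downstream layers adjusted to the reduced channel count, before any clustering or dimension reduction step.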
Brief Description of the Figures

Figure 1 illustrates a simplified example of a deep neural network having a hidden layer.
Figure 2 illustrates a simplified example of a convolutional neural network having a plurality of convolutional layers.
Figure 3 illustrates a three-dimensional representation of a plurality of filters for a convolutional layer of a CNN.
Figure 4 illustrates a block diagram of an embodiment for filter similarity pruning of the present disclosure.
Figure 5 illustrates a flow chart of another example embodiment for filter similarity pruning of the present disclosure.
Figure 6 illustrates an example embodiment for dimension reduction using a principal component analysis ("PCA").
Figure 7 illustrates a graphical representation for clustering in accordance with at least one embodiment.
Figure 8 illustrates a simplified block diagram of a vehicle in accordance with an example embodiment.

Detailed Description of the Embodiments

The figures and descriptions provided herein may have been simplified to illustrate aspects that are relevant for a clear understanding of the herein described devices, systems, and methods, while eliminating, for the purpose of clarity, other aspects that may be found in typical devices, systems, and methods. Those of ordinary skill in the art may recognize that other elements and/or operations may be desirable and/or necessary to implement the devices, systems, and methods described herein. Because such elements and operations are well known in the art, and because they do not facilitate a better understanding of the present disclosure, a discussion of such elements and operations may not be provided herein. Neural networks are machine-learning models that employ one or more layers of neurons (also referred to as "nodes") to generate an output, e.g., a classification, for a received input. Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden