US-12626142-B2 - System and method for automated design space determination for deep neural networks
Abstract
There is provided a system and method of automated design space determination for deep neural networks. The method includes obtaining a teacher model and one or more constraints associated with an application and/or target device or process used in the application configured to utilize a deep neural network; learning an optimal architecture using the teacher model, constraints, a training data set, and a validation data set; and deploying the optimal architecture on the target device or process for use in the application.
Inventors
- Ehsan SABOORI
- Davis Mangan SAWYER
- MohammadHossein ASKARIHEMMAT
- Olivier MASTROPIETRO
Assignees
- DEEPLITE INC.
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2019-11-18
Claims (20)
- 1 . A method of automated design space exploration for deep neural networks, the method comprising: obtaining a teacher model deep neural network; obtaining one or more constraints, the one or more constraints being associated with at least one of: a computer-implemented application, a target computing device or a process used in the computer-implemented application, the computer-implemented application configured to utilize a deep neural network having multiple layers with associated weights and computational parameters; learning an optimal deep neural network using the teacher model deep neural network, the one or more constraints, a training data set, and a validation data set, wherein learning the optimal deep neural network comprises iteratively: generating a new student deep neural network by applying network transformation operations to the teacher model deep neural network, wherein the network transformation operations comprise at least one of: altering a number of filters in a layer, altering a number of nodes in the layer, adding the layer to the teacher model deep neural network, or removing the layer from the teacher model deep neural network, and wherein generating the new student deep neural network further comprises applying function-preserving transformations and reusing weights from the teacher model to maintain functional equivalence prior to retraining with knowledge distillation; evaluating the new student deep neural network with a performance estimator, performance criteria, and the training data set, wherein the performance estimator measures at least one of: inference time in milliseconds, memory footprint in megabytes, or computational complexity measured in floating-point operations per second of the new student deep neural network, wherein evaluating the new student deep neural network with the performance estimator comprises estimating performance on the target hardware so as to reduce computational overhead relative to full model retraining; in response to the new student deep neural network failing the performance criteria, bypassing evaluation with the validation data set; in response to the new student deep neural network satisfying the performance criteria, designating the new student deep neural network as being a satisfactory new student deep neural network and evaluating a performance of the satisfactory new student deep neural network with the validation data set; and generating a reward based on the evaluated performance of the satisfactory new student deep neural network, the reward being based on limitations of the target computing device, wherein the reward comprises a compression reward term and an accuracy reward term that are computed to incentivize production of smaller deep neural networks without sacrificing functional accuracy, wherein the reward is further computed based on hardware constraints of the target computing device, and wherein portions of the deep neural network are automatically mapped onto hardware cores of the target computing device maximizing energy efficiency and execution time on the target computing device; wherein the iterations converge, as measured by the reward, on an acceptable iteration of the optimal deep neural network; and deploying the optimal deep neural network on the target computing device or utilizing the optimal deep neural network in the process for use in the computer-implemented application, wherein deploying comprises deploying on a resource-constrained computing device.
- 2 . The method of claim 1 , wherein the new student deep neural network is generated by shrinking or expanding the teacher model deep neural network by altering the network configuration.
- 3 . The method of claim 1 , wherein the reward can be positive or negative according to whether the new student deep neural network is promising or not.
- 4 . The method of claim 1 , wherein the reward comprises one or more terms that reflect a desired accuracy and/or compression rate to incentivize production of smaller deep neural networks without sacrificing functional accuracy.
- 5 . The method of claim 1 , wherein retraining with knowledge distillation comprises the new student deep neural network receiving information from the teacher model deep neural network or a larger previously determined deep neural network.
- 6 . The method of claim 1 , wherein the constraints further comprise at least one of: accuracy, speed, power, target hardware memory.
- 7 . The method of claim 1 , wherein the application is an artificial intelligence-based application.
- 8 . The method of claim 1 , comprising generating at least two student deep neural networks, and wherein generating the at least two student deep neural networks comprises: formulating the at least two student deep neural networks at least in part based on a quantizer that learns the optimal deep neural network using lower precision weights, as compared to an input model, across the at least two student deep neural networks.
- 9 . The method of claim 8 , comprising: implementing one or more mapping algorithms to map the at least two student deep neural networks to the target device, where the mapping algorithms include transformations for different hardware configurations.
- 10 . The method of claim 1 , wherein evaluating the new student deep neural network with the performance estimator comprises estimating, in less than one second, performance on the target hardware so as to reduce computational overhead relative to full model retraining.
- 11 . The method of claim 1 , wherein deploying comprises deploying on a resource-constrained computing device requiring inference latency of less than 10 ms.
- 12 . A non-transitory computer readable medium comprising computer executable instructions for automated design space exploration for deep neural networks, the computer executable instructions comprising instructions for: obtaining a teacher model deep neural network; obtaining one or more constraints, the one or more constraints being associated with at least one of: a computer-implemented application, a target computing device or a process used in the computer-implemented application, the computer-implemented application configured to utilize a deep neural network having multiple layers with associated weights and computational parameters; learning an optimal deep neural network using the teacher model deep neural network, the one or more constraints, a training data set, and a validation data set, wherein learning the optimal deep neural network comprises iteratively: generating a new student deep neural network by applying network transformation operations to the teacher model deep neural network, wherein the network transformation operations comprise at least one of: altering a number of filters in a layer, altering a number of nodes in the layer, adding the layer to the teacher model deep neural network, or removing the layer from the teacher model deep neural network, and wherein generating the new student deep neural network further comprises applying function-preserving transformations and reusing weights from the teacher model to maintain functional equivalence prior to retraining with knowledge distillation; evaluating the new student deep neural network with a performance estimator, performance criteria, and the training data set, wherein the performance estimator measures at least one of: inference time in milliseconds, memory footprint in megabytes, or computational complexity measured in floating-point operations per second of the new student deep neural network, wherein evaluating the new student deep neural network with the performance estimator comprises estimating, in less than one second, performance on the target hardware so as to reduce computational overhead relative to full model retraining; in response to the new student deep neural network failing the performance criteria, bypassing evaluation with the validation data set; in response to the new student deep neural network satisfying the performance criteria, designating the new student deep neural network as being a satisfactory new student deep neural network and evaluating a performance of a satisfactory new student deep neural network with the validation data set; and generating a reward based on the evaluated performance of the satisfactory new student deep neural network, the reward being based on limitations of the target computing device, wherein the reward comprises a compression reward term and an accuracy reward term that are computed to incentivize production of smaller deep neural networks without sacrificing functional accuracy, wherein the reward is further computed based on hardware constraints of the target computing device, and wherein portions of the deep neural network are automatically mapped onto hardware cores of the target computing device maximizing energy efficiency and execution time on the target computing device; wherein the iterations converge, as measured by the reward, on an acceptable iteration of the optimal deep neural network; and deploying the optimal deep neural network on the target computing device or utilizing the optimal deep neural network in the process for use in the 
computer-implemented application, wherein deploying comprises deploying on a resource-constrained computing device.
- 13 . The non-transitory computer readable medium of claim 12, wherein evaluating the new student deep neural network with the performance estimator comprises estimating, in less than one second, performance on the target hardware so as to reduce computational overhead relative to full model retraining.
- 14 . The non-transitory computer readable medium of claim 12, wherein deploying comprises deploying on a resource-constrained computing device requiring inference latency of less than 10 ms.
- 15 . A deep neural network optimization engine configured to perform automated design space exploration for deep neural networks, the engine comprising a processor and memory, the memory comprising computer executable instructions for: obtaining a teacher model deep neural network; obtaining one or more constraints, the one or more constraints being associated with at least one of: a computer-implemented application, a target computing device, or a process used in the computer-implemented application, the computer-implemented application configured to utilize a deep neural network having multiple layers with associated weights and computational parameters; learning an optimal deep neural network using the teacher model deep neural network, the one or more constraints, a training data set, and a validation data set, wherein learning the optimal deep neural network comprises iteratively: generating a new student deep neural network, and wherein generating the new student deep neural network further comprises applying function-preserving transformations and reusing weights from the teacher model to maintain functional equivalence prior to retraining with knowledge distillation; evaluating the new student deep neural network with a performance estimator, performance criteria, and the training data set by applying network transformation operations to the teacher model deep neural network, wherein the performance estimator measures at least one of: inference time in milliseconds, memory footprint in megabytes, or computational complexity measured in floating-point operations per second of the new student deep neural network, wherein the network transformation operations comprise at least one of: altering a number of filters in a layer, altering a number of nodes in the layer, adding the layer to the teacher model deep neural network, or removing the layer from the teacher model deep neural network, wherein evaluating a new student deep neural network with the performance estimator comprises estimating performance on the target hardware so as to reduce computational overhead relative to full model retraining; in response to the new student deep neural network failing the performance criteria, bypassing evaluation with the validation data set; in response to the new student deep neural network satisfying the performance criteria, designating the new student deep neural network as being a satisfactory new student deep neural network and evaluating a performance of a satisfactory new student deep neural network with the validation data set; and generating a reward based on the evaluated performance of the satisfactory new student deep neural network, the reward being based on limitations of the target computing device, wherein the reward comprises a compression reward term and an accuracy reward term that are computed to incentivize production of smaller deep neural networks without sacrificing functional accuracy, wherein the reward is further computed based on hardware constraints of the target computing device, and wherein portions of the deep neural network are automatically mapped onto hardware cores of the target computing device maximizing energy efficiency and execution time on the target computing device; wherein the iterations converge, as measured by the reward, on an acceptable iteration of the optimal deep neural network; and deploying the optimal deep neural network on the target computing device or utilizing the optimal deep neural network in the process for use in the 
computer-implemented application, wherein deploying comprises deploying on a resource-constrained computing device.
- 16 . The engine of claim 15 , further comprising instructions for: transferring knowledge from the teacher model deep neural network to train with a knowledge distillation process.
- 17 . The engine of claim 16 , wherein the reward function is negative for instances of bypassing evaluation for failed student deep neural networks.
- 18 . The engine of claim 16 , wherein the reward function comprises one or more terms that reflect a desired accuracy and/or compression rate to incentivize production of smaller deep neural networks without sacrificing functional accuracy.
- 19 . The engine of claim 16 , wherein the knowledge distillation process comprises the new student deep neural network receiving information from the teacher model deep neural network or a larger previously determined deep neural network.
- 20 . The engine of claim 15 , wherein the new student deep neural network is generated by shrinking or expanding the teacher model deep neural network by altering the network configuration.
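For orientation only, and not as part of the claims, the iterative generate-estimate-reward loop recited in claims 1, 12 and 15 can be pictured with the simplified Python sketch below. Every element here is a toy stand-in: a network is reduced to a list of layer widths, the performance estimator is a FLOPs-style proxy, and the accuracy proxy substitutes for actual knowledge-distillation retraining and validation.

```python
import random

# Toy illustration of the claimed exploration loop. A "network" is reduced to
# a list of layer widths; cost and accuracy below are stand-in proxies, not
# the patented performance estimator or distillation procedure.

def transform(widths):
    """Toy network transformation: halve or double one hidden layer's width."""
    i = random.randrange(len(widths) - 1)      # keep the output layer fixed
    out = list(widths)
    out[i] = max(1, out[i] // 2 if random.random() < 0.5 else out[i] * 2)
    return out

def estimated_cost(widths):
    """Stand-in performance estimate (a FLOPs/memory-style proxy)."""
    return sum(a * b for a, b in zip(widths, widths[1:]))

def estimated_accuracy(student, teacher):
    """Stand-in for accuracy after knowledge-distillation retraining."""
    t = estimated_cost(teacher)
    return max(0.0, 1.0 - 0.5 * abs(estimated_cost(student) - t) / t)

def explore(teacher, cost_budget, iters=200):
    """Generate students, pre-screen against the budget, reward survivors."""
    best, best_reward = None, float("-inf")
    for _ in range(iters):
        student = transform(teacher)
        if estimated_cost(student) > cost_budget:
            continue                            # fails criteria: bypass validation
        # Reward = compression term + accuracy term, echoing the claimed reward.
        compression = 1.0 - estimated_cost(student) / estimated_cost(teacher)
        reward = compression + estimated_accuracy(student, teacher)
        if reward > best_reward:
            best, best_reward = student, reward
    return best, best_reward

if __name__ == "__main__":
    print(explore([128, 128, 64, 10], cost_budget=20000))
```

The control flow mirrors the claim language: candidates failing the cheap hardware-level check bypass validation entirely, and the reward blends a compression term with an accuracy term so that smaller networks are favored only when accuracy is preserved.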
Description
CROSS-REFERENCE TO RELATED APPLICATION(S)
This application is a National Entry of PCT Application No. PCT/CA2019/051642, filed on Nov. 18, 2019, which claims priority to U.S. Provisional Patent Application No. 62/769,403, filed on Nov. 19, 2018, the contents of which are incorporated herein by reference in their entirety.
TECHNICAL FIELD
The following relates to systems and methods for automated design space determination for deep neural networks, for example by enabling design space exploration.
BACKGROUND
The emergence of deep neural networks (DNNs) in recent years has enabled ground-breaking abilities and applications for modern intelligent systems. State-of-the-art DNNs have been found to achieve high accuracy on tasks in computer vision and natural language processing, even outperforming humans on object recognition tasks. Concurrently, the increasing complexity and sophistication of DNNs are predicated on significant power consumption, model size and computing resources. These factors have been found to limit deep learning's performance in real-time applications, in large-scale systems, and on low-power devices.
Modern DNN models require as many as billions of expensive floating-point operations for a single input classification. This problem is exacerbated in high-throughput systems that perform millions of inference computations per second, requiring large and expensive Graphics Processing Units (GPUs). Furthermore, many low-end and cost-effective devices do not have the resources to execute DNN inference, causing users to sacrifice privacy and offload processing to the cloud. Furthermore, tasks with strict latency constraints, such as in automotive and mobility applications, often require that inference be performed in a matter of milliseconds, often with limited hardware.
To address these problems, there has been a significant push in academia and industry to make deep learning models more resource-efficient and applicable for real-time, on-device applications. Many techniques have been proposed for model optimization and inference acceleration, as well as hardware implementations of DNNs. Prior solutions include a variety of core optimization techniques for compressing, accelerating and mapping DNNs on various hardware platforms. The main approach to model optimization is to approximate the original DNN. Techniques include the removal of redundant connections, nodes, filters and layers in the network, also referred to as “pruning”. An alternative approach to optimization is knowledge distillation, whereby a “teacher” network is adapted to produce a smaller, “student” network. However, these techniques are generally implemented manually by a domain expert, relying on heuristics and intensive feature engineering. Additionally, these approaches are often found to sacrifice too much accuracy or limit network performance on complex and large data sets.
At present, two fundamental challenges exist with current optimization techniques, namely: 1) hand-crafted features and domain expertise are required for model optimization, and 2) time-consuming fine-tuning is often necessary to maintain accuracy. There exists a need for scalable, automated processes for model optimization on diverse DNN architectures and hardware back-ends. Generally, it is found that the current capacity for model optimization is outpaced by the rapid development of new DNNs and disparate hardware platforms that are applicable, yet largely inefficient, for deep learning workloads.
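The background above refers to knowledge distillation, in which a “teacher” network guides a smaller “student” network. For context only, a commonly used formulation (Hinton-style, not taken from this specification) blends a temperature-softened KL-divergence term against the teacher's outputs with the ordinary label loss. A minimal PyTorch sketch, with the temperature and weighting as assumed hyperparameters, is shown below.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Generic knowledge-distillation loss, shown for background context only.

    Blends the KL divergence between temperature-softened teacher and student
    distributions with the ordinary cross-entropy on the hard labels.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In a typical training loop the student would be updated with something like `loss = distillation_loss(student(x), teacher(x).detach(), y)`, keeping the teacher frozen.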
It is an object of the following to address at least one of the above-mentioned challenges.
SUMMARY
It is recognized that a general approach that is agnostic to both the architecture and the target hardware(s) is needed to optimize DNNs, making them faster, smaller and more energy-efficient for use in daily life. The following relates to deep learning algorithms, for example, deep neural networks. A method for automated optimization, specifically design space exploration, is described. The following relates to the design of a learning process that leverages trade-offs in different deep neural network designs using computation constraints as inputs. The learning process trains an optimizer agent to adapt large, initial networks into smaller networks of similar performance that satisfy target constraints in a data-driven way. By design, the learning process and agents are agnostic to both the network architecture and the target hardware platform.
In one aspect, there is provided a method of automated design space exploration for deep neural networks, the method comprising: obtaining a teacher model and one or more constraints associated with an application and/or target device or process used in the application configured to utilize a deep neural network; learning an optimal student architecture using the teacher model architecture, constraints, a training data set, and a validation data set; and deploying the optimal architecture on the target device or process for use in the application.
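The summary describes computation constraints (claim 6 lists accuracy, speed, power and target hardware memory) being supplied as inputs to the learning process. A minimal sketch of how such constraints might be represented and pre-checked against a fast hardware estimate is given below; the field names, units and thresholds are illustrative assumptions rather than definitions from the specification.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Constraints:
    """Illustrative target constraints; names and units are assumptions."""
    min_accuracy: float            # checked after validation
    max_latency_ms: float          # per-inference latency budget
    max_memory_mb: float           # model footprint on the target device
    max_power_w: Optional[float] = None

@dataclass
class Estimate:
    """Fast hardware-level estimate for a candidate student network."""
    latency_ms: float
    memory_mb: float
    power_w: Optional[float] = None

def passes_hardware_budget(est: Estimate, c: Constraints) -> bool:
    """Cheap pre-check; candidates failing it would bypass validation."""
    if est.latency_ms > c.max_latency_ms or est.memory_mb > c.max_memory_mb:
        return False
    if c.max_power_w is not None and est.power_w is not None and est.power_w > c.max_power_w:
        return False
    return True

# Example: a 10 ms latency budget on a memory-limited edge device.
edge = Constraints(min_accuracy=0.90, max_latency_ms=10.0, max_memory_mb=8.0)
print(passes_hardware_budget(Estimate(latency_ms=7.5, memory_mb=6.0), edge))
```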