CN-116601645-B - Finding hardware characteristics of deep learning accelerator via compiler for optimization
Abstract
Systems, devices, and methods related to deep learning accelerators and memory are described. For example, an integrated circuit device may be configured to execute instructions having matrix operands and configured with random access memory. A computing device running a compiler may interact with and/or probe an integrated circuit device to identify hardware characteristics of the integrated circuit device when performing matrix calculations. The compiler may generate and optimize a compilation result from a description of an artificial neural network based at least in part on the hardware characteristics of the integrated circuit device. The compilation result may include first data representing parameters of the artificial neural network and second data representing instructions executable by the integrated circuit device to generate an output of the artificial neural network based on the first data and an input to the artificial neural network.
Inventors
- A. T. Zaidy
- M. Vitez
- E. Culurciello
- J. Cummins
- A. X. Ming Chang
Assignees
- Micron Technology, Inc.
Dates
- Publication Date: 2026-05-12
- Application Date: 2021-10-19
- Priority Date: 2020-11-06
Claims (10)
- 1. A method, comprising: transmitting, by a computing device, one or more commands to an integrated circuit device, each command associated with one or more requests for a response that indicates a hardware characteristic of the integrated circuit device when performing matrix calculations; receiving, at the computing device and after determining the hardware characteristic of the integrated circuit device, data representing a description of an artificial neural network; and generating, by the computing device, a compilation result from the data representative of the description of the artificial neural network based at least in part on the hardware characteristic of the integrated circuit device, the compilation result including first data representative of parameters of the artificial neural network and second data representative of instructions executable by the integrated circuit device to generate an output of the artificial neural network based on the first data and an input to the artificial neural network.
- 2. The method of claim 1, wherein the hardware characteristic of the integrated circuit device identifies a feature, option, behavior, performance, or latency of at least one processing unit of the integrated circuit device, or any combination thereof.
- 3. The method of claim 2, wherein transmitting the one or more commands to the integrated circuit device includes loading a test program into the integrated circuit device to receive a response to executing the test program in the integrated circuit device.
- 4. The method of claim 3, further comprising: determining the hardware characteristic from the response to executing the test program in the integrated circuit device.
- 5. The method of claim 3, further comprising: determining the hardware characteristic from a timestamp associated with execution of the test program in the integrated circuit device.
- 6. The method of claim 3, further comprising: identifying, based on the response to executing the test program in the integrated circuit device, a specification of the integrated circuit device among a plurality of predetermined specifications of the integrated circuit device.
- 7. The method of claim 3, further comprising: probing hardware options of a matrix processing unit of the integrated circuit device based on the response to executing the test program in the integrated circuit device.
- 8. The method of claim 3, further comprising: generating the test program based on specifications of a plurality of hardware platforms of integrated circuit devices configured to perform matrix calculations.
- 9. The method of claim 3, further comprising: transforming the compilation result generated from the description of the artificial neural network, based on the hardware characteristic of the integrated circuit device, to improve performance of the compilation result when executed in the integrated circuit device.
- 10. A computing device, comprising: a memory; and at least one microprocessor coupled to the memory and configured to probe an integrated circuit device to identify a hardware characteristic of the integrated circuit device when performing matrix calculations, and to generate a compilation result from data representative of a description of an artificial neural network based at least in part on the hardware characteristic of the integrated circuit device, the compilation result including first data representative of parameters of the artificial neural network and second data representative of instructions executable by the integrated circuit device to generate an output of the artificial neural network based on the first data and an input to the artificial neural network.
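The probe-and-compile flow recited in the claims can be sketched in code. The sketch below is only illustrative: the class names, the table of candidate specifications, and the tiling heuristic are all assumptions of this example; the patent does not define any programming API.

```python
# Hypothetical sketch of the claimed flow: probe a device with a test
# program (claim 3), derive characteristics from the response and a
# timestamp (claims 4-5), match a predetermined specification (claim 6),
# then compile an ANN description against those characteristics (claim 1).
import time

# Assumed table of predetermined DLA hardware specifications (claim 6).
KNOWN_SPECS = {
    "dla-v1": {"matrix_unit_width": 32},
    "dla-v2": {"matrix_unit_width": 64},
}

class SimulatedDevice:
    """Stand-in for the integrated circuit device under test."""
    def __init__(self, spec_name):
        self.spec = KNOWN_SPECS[spec_name]

    def run_test_program(self, program):
        # A real device would execute matrix instructions; here we just
        # report the matrix-unit width as the "response".
        return {"matrix_unit_width": self.spec["matrix_unit_width"]}

def probe_hardware(device, test_program):
    """Load a test program and derive hardware characteristics from the
    response and an execution timestamp."""
    start = time.monotonic()
    response = device.run_test_program(test_program)
    latency = time.monotonic() - start
    return {"latency_s": latency, **response}

def identify_spec(characteristics):
    """Match observed characteristics to a predetermined specification."""
    for name, spec in KNOWN_SPECS.items():
        if spec["matrix_unit_width"] == characteristics["matrix_unit_width"]:
            return name
    return None

def compile_ann(description, characteristics):
    """Produce a compilation result: first data (parameters) plus second
    data (instructions), tiled to the discovered matrix-unit width."""
    width = characteristics["matrix_unit_width"]
    tiles = -(-description["layer_size"] // width)  # ceiling division
    instructions = [("matmul_tile", i, width) for i in range(tiles)]
    return {"parameters": description["weights"], "instructions": instructions}

device = SimulatedDevice("dla-v2")
chars = probe_hardware(device, test_program=b"\x00")
print(identify_spec(chars))  # -> dla-v2
result = compile_ann({"layer_size": 100, "weights": [0.0]}, chars)
print(len(result["instructions"]))  # -> 2 tiles (100/64 rounded up)
```

In this toy version the same compiler front end could target any device in `KNOWN_SPECS`, which mirrors the patent's point: the instruction stream is specialized only after the hardware characteristics are discovered.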
Description
Finding hardware characteristics of deep learning accelerator via compiler for optimization

RELATED APPLICATIONS

The present application claims priority to U.S. patent application No. 17/092,033, entitled "DISCOVERY OF HARDWARE CHARACTERISTICS OF DEEP LEARNING ACCELERATORS FOR OPTIMIZATION VIA COMPILER" and filed on November 6, 2020, the entire disclosure of which is hereby incorporated herein by reference.

Technical Field

At least some embodiments disclosed herein relate generally to compilers, and more particularly, but not limited to, compilers for generating instructions executable by accelerators for Artificial Neural Networks (ANNs), such as ANNs configured through machine learning and/or deep learning.

Background

An Artificial Neural Network (ANN) uses a network of neurons to process inputs to the network and to generate outputs from the network. Deep learning has been applied in many fields, such as computer vision, speech/audio recognition, natural language processing, machine translation, bioinformatics, drug design, medical image processing, and games.

Drawings

The embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like reference numerals refer to similar elements.

FIG. 1 shows an integrated circuit device with a deep learning accelerator and random access memory configured in accordance with one embodiment.

FIG. 2 shows a processing unit configured to perform matrix-matrix operations according to one embodiment.

FIG. 3 shows a processing unit configured to perform matrix-vector operations according to one embodiment.

FIG. 4 shows a processing unit configured to perform vector-vector operations according to one embodiment.

FIG. 5 shows a deep learning accelerator and random access memory configured to autonomously apply input to a trained artificial neural network, according to one embodiment.

FIG. 6 shows a technique for generating instructions executable by a deep learning accelerator to implement an artificial neural network, according to one embodiment.

FIGS. 7 and 8 illustrate techniques to map the compilation results of a generic deep learning accelerator to instructions executable by a specific deep learning accelerator to implement an artificial neural network, according to one embodiment.

FIG. 9 shows another technique for generating instructions executable by a deep learning accelerator to implement an artificial neural network, according to one embodiment.

FIG. 10 shows an integrated circuit device with a deep learning accelerator with configurable hardware capabilities and random access memory configured in accordance with one embodiment.

FIG. 11 illustrates different hardware configurations of a processing unit of a deep learning accelerator configurable via options stored in registers, according to one embodiment.

FIG. 12 illustrates a technique for generating instructions executable by a deep learning accelerator with an optimized hardware configuration to implement an artificial neural network, according to one embodiment.

FIG. 13 shows a technique for discovering hardware characteristics of a deep learning accelerator, according to one embodiment.

FIG. 14 illustrates a technique for generating instructions executable by a deep learning accelerator and optimized according to hardware characteristics of the deep learning accelerator, according to one embodiment.

FIG. 15 shows a method for compiling instructions to implement an artificial neural network on a deep learning accelerator based on hardware characteristics of the deep learning accelerator, according to one embodiment.

FIG. 16 shows a block diagram of an example computer system in which embodiments of the present disclosure may operate.
Detailed Description

At least some embodiments disclosed herein provide integrated circuits to implement the computation of Artificial Neural Networks (ANNs) with reduced energy consumption and computation time. The integrated circuit device is programmable. A compiler may be used to generate instructions executable in an integrated circuit device from a description of an Artificial Neural Network (ANN). The instructions, when executed in the device, cause the integrated circuit device to perform a calculation of an Artificial Neural Network (ANN).

The compiler may discover hardware characteristics (e.g., capabilities and behaviors) of a Deep Learning Accelerator (DLA) and use the discovered hardware characteristics to optimize the generated instructions to implement computation of an Artificial Neural Network (ANN) on the Deep Learning Accelerator (DLA).

For example, an integrated circuit device may include a Deep Learning Accelerator (DLA) and random access memory. The random access memory is configured to store parameters of an Artificial Neural Network (ANN) and instructions having matrix operands. The instructions stored in the random access memory may