CN-113853600-B - Hybrid analog-to-digital matrix processor


Abstract

Described herein are techniques for performing matrix operations on arbitrarily large matrices using a hybrid analog-to-digital matrix processor of finite size. Also described are techniques for gain adjustment in a finite-size hybrid analog-to-digital matrix processor that enable the system to achieve higher energy efficiency, greater physical density, and improved numerical accuracy. In some embodiments, these techniques maximize the prediction accuracy of GEMM-based convolutional neural networks that use low-precision data representations.
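The first idea in the abstract, computing arbitrarily large matrices on a finite-size processor, amounts to tiling. The large matrix is split into sub-matrices that fit the processor, each tile is processed in its own pass, and the partial outputs are summed. The following is a minimal sketch under stated assumptions: a hypothetical square core of size t x t, and an exact, noise-free stand-in for the analog computation. The names `core_matvec` and `tiled_matvec` are illustrative and do not come from the source.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def core_matvec(tile: Matrix, x: Vector) -> Vector:
    """Stand-in for one pass through a hypothetical fixed-size t x t core.

    Computed exactly here; real analog hardware would add noise and
    quantization error.
    """
    return [sum(w * xi for w, xi in zip(row, x)) for row in tile]


def tiled_matvec(a: Matrix, x: Vector, t: int) -> Vector:
    """Compute a @ x for an arbitrarily large matrix using only t x t tiles."""
    m, k = len(a), len(x)
    y = [0.0] * m
    for r0 in range(0, m, t):          # tile row (block of outputs)
        for c0 in range(0, k, t):      # tile column (chunk of inputs)
            # Zero-pad edge tiles so every pass uses the full t x t core.
            tile = [
                [a[r][c] if r < m and c < k else 0.0 for c in range(c0, c0 + t)]
                for r in range(r0, r0 + t)
            ]
            x_chunk = [x[c] if c < k else 0.0 for c in range(c0, c0 + t)]
            partial = core_matvec(tile, x_chunk)
            # Accumulate only the rows that exist in the original matrix.
            for i in range(min(t, m - r0)):
                y[r0 + i] += partial[i]
    return y
```

For example, `tiled_matvec(a, x, 3)` on a 5 x 7 matrix runs six passes (two tile rows by three tile columns) through a 3 x 3 core. Because this stand-in core is exact, the result equals the direct product. In hardware, each pass would carry its own analog error, which motivates the per-pass scaling described in the claims.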

Inventors

Tyler J. Kenney
Martin B. Z. Forsyth
Tomo Lazovich
Darius Bunandar

Assignees

Lightmatter, Inc.

Dates

Publication Date: 2026-05-08
Application Date: 2020-02-25
Priority Date: 2019-02-26

Claims (20)

1. A hybrid analog-to-digital processor comprising: a circuit comprising an analog processor, wherein the circuit is configured to perform a mathematical operation using a plurality of passes, wherein for each of the plurality of passes, the circuit is configured to: determine one or more scaling factors for the pass based on a set of parameters representing a portion of an arbitrary matrix, wherein the one or more scaling factors are configured to scale data based on a dynamic range of the analog processor; scale at least some parameters of the parameter set based on the one or more scaling factors to produce a scaled parameter set; program the analog processor based on the scaled parameter set; generate a plurality of input analog signals based on an input data set; generate a plurality of output analog signals based on the plurality of input analog signals and the scaled parameter set; generate a partial output data set based on the plurality of output analog signals; and scale the partial output data set based on the one or more scaling factors to produce a scaled partial output data set, wherein the circuit is further configured to generate an accumulated output data set by accumulating the scaled partial output data sets generated by at least two of the plurality of passes, wherein the accumulated output data set represents a result of the mathematical operation.
2. The hybrid analog-to-digital processor of claim 1, wherein generating a plurality of output analog signals based on the plurality of input analog signals and the scaled parameter set comprises performing a matrix-to-matrix multiplication based on the plurality of input analog signals and the scaled parameter set.
3. The hybrid analog-to-digital processor of claim 1, wherein generating a plurality of output analog signals based on the plurality of input analog signals and the scaled parameter set comprises performing a convolution based on the plurality of input analog signals and the scaled parameter set.
4. The hybrid analog-to-digital processor of claim 1, wherein programming the analog processor based on the scaled parameter set comprises setting respective gains or attenuations for a plurality of analog amplifiers or attenuators of the analog processor based on the scaled parameter set.
5. The hybrid analog-to-digital processor of claim 1, wherein the analog processor comprises a photonic processor comprising a plurality of programmable photonic devices, and wherein programming the analog processor based on the scaled parameter set comprises setting respective characteristics for the plurality of programmable photonic devices based on the scaled parameter set.
6. The hybrid analog-to-digital processor of claim 5, wherein the plurality of programmable photonic devices comprises Mach-Zehnder interferometers, and wherein setting respective characteristics for the plurality of programmable photonic devices based on the scaled parameter set comprises setting respective optical characteristics for the Mach-Zehnder interferometers based on the scaled parameter set.
7. The hybrid analog-to-digital processor of claim 1, wherein accumulating the scaled partial output data sets generated by at least two of the plurality of passes comprises, for at least some of the passes, adding the scaled partial output data set generated by a pass to the scaled partial output data set generated by a previous pass.
8. A hybrid analog-to-digital processor configured to perform mathematical operations, comprising: a circuit comprising an analog processor and an analog scaling unit, wherein the circuit is configured to: generate a plurality of input analog signals based on an input data set; set a gain of the analog scaling unit based on one or more scaling factors, wherein the one or more scaling factors are configured to scale data based on a dynamic range of the analog processor; program the analog processor with a set of parameters representing an arbitrary matrix; generate a plurality of output analog signals based on the plurality of input analog signals and the parameter set; generate a plurality of amplified or attenuated output analog signals by amplifying or attenuating the plurality of input analog signals and/or the plurality of output analog signals using the analog scaling unit; and generate an output data set based on the plurality of amplified or attenuated output analog signals.
9. The hybrid analog-to-digital processor of claim 8, wherein the analog scaling unit comprises an analog amplifier or attenuator.
10. The hybrid analog-to-digital processor of claim 8, wherein the hybrid analog-to-digital processor is further configured to perform a multi-pass computation based on the mathematical operation, and wherein the hybrid analog-to-digital processor is further configured to: set the gain of the analog scaling unit to a first value during a first pass of the multi-pass computation; and set the gain of the analog scaling unit to a second value, different from the first value, during a second pass of the multi-pass computation.
11. The hybrid analog-to-digital processor of claim 8, wherein generating a plurality of output analog signals based on the plurality of input analog signals and the parameter set comprises performing a matrix-to-matrix multiplication based on the plurality of input analog signals and the parameter set.
12. The hybrid analog-to-digital processor of claim 8, wherein generating a plurality of output analog signals based on the plurality of input analog signals and the parameter set comprises performing a convolution based on the plurality of input analog signals and the parameter set.
13. The hybrid analog-to-digital processor of claim 8, wherein the circuit comprises a plurality of analog-to-digital converters (ADCs), and the plurality of ADCs are configured to generate the output data set based on the plurality of output analog signals.
14. The hybrid analog-to-digital processor of claim 13, wherein the plurality of ADCs comprises an n-bit ADC, where n is equal to or less than 12.
15. The hybrid analog-to-digital processor of claim 8, wherein the circuitry is further configured to determine the one or more scaling factors based on the parameter set and the input data set.
16. A method for performing a mathematical operation, the method comprising: generating a plurality of input analog signals based on an input data set; setting a gain of an analog scaling unit based on one or more scaling factors, wherein the one or more scaling factors are configured to scale data based on a dynamic range of an analog processor; programming the analog processor with a set of parameters representing an arbitrary matrix; generating a plurality of output analog signals based on the plurality of input analog signals and the parameter set; generating a plurality of amplified or attenuated output analog signals by amplifying or attenuating the plurality of input analog signals and/or the plurality of output analog signals using the analog scaling unit; and generating an output data set based on the plurality of amplified or attenuated output analog signals.
17. The method of claim 16, wherein the analog scaling unit comprises an analog amplifier or attenuator.
18. The method of claim 16, further comprising performing a multi-pass computation based on the mathematical operation using a hybrid analog-to-digital processor, wherein the hybrid analog-to-digital processor comprises circuitry configured to: set the gain of the analog scaling unit to a first value during a first pass of the multi-pass computation; and set the gain of the analog scaling unit to a second value, different from the first value, during a second pass of the multi-pass computation.
19. The method of claim 16, wherein generating a plurality of output analog signals based on the plurality of input analog signals and the parameter set comprises performing a matrix-to-matrix multiplication based on the plurality of input analog signals and the parameter set.
20. The method of claim 16, wherein generating a plurality of output analog signals based on the plurality of input analog signals and the parameter set comprises performing a convolution based on the plurality of input analog signals and the parameter set.
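Claims 8, 10, 14, and 15 describe setting an analog gain from scaling factors tied to the processor's dynamic range. These factors may be derived from both the parameter set and the input data set, and the result is read out through ADCs of at most 12 bits. One way this could work is sketched below, with the following assumptions. The gain is chosen from the worst-case bound |y_i| <= sum_j |w_ij| * |x_j| so that the amplified outputs never clip at an n-bit ADC with full scale +/-1. The gain is then divided out digitally after conversion. The bound-based rule and the names `adc`, `choose_gain`, and `readout` are hypothetical; the claims do not say how the scaling factors are computed.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def adc(v: float, n_bits: int, full_scale: float = 1.0) -> float:
    """Uniform signed n-bit quantizer over [-full_scale, full_scale], with clipping."""
    levels = 2 ** (n_bits - 1) - 1
    v = max(-full_scale, min(full_scale, v))
    return round(v * levels / full_scale) * full_scale / levels


def choose_gain(w: Matrix, x: Vector, full_scale: float = 1.0) -> float:
    """Pick a gain from both the parameters and the inputs (hypothetical rule).

    Since |y_i| <= sum_j |w_ij| * |x_j|, scaling by full_scale / max_i(bound_i)
    keeps every amplified output inside the ADC range.
    """
    bound = max(sum(abs(wij) * abs(xj) for wij, xj in zip(row, x)) for row in w)
    return full_scale / bound if bound > 0 else 1.0


def readout(w: Matrix, x: Vector, gain: float, n_bits: int = 8) -> Vector:
    """Exact stand-in for the analog matvec, then analog gain, n-bit ADC, digital un-scaling."""
    analog = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
    amplified = [gain * y for y in analog]          # analog scaling unit
    return [adc(v, n_bits) / gain for v in amplified]  # convert, then undo the gain
```

For example, with `w = [[0.001, 0.002], [0.003, -0.001]]` and `x = [1.0, 0.5]`, `readout(w, x, 1.0)` returns `[0.0, 0.0]` at 8 bits because both outputs fall below half an ADC step. By contrast, `readout(w, x, choose_gain(w, x))` recovers both outputs to within roughly 2e-5. In a multi-pass computation (claims 10 and 18), calling `choose_gain` separately for each pass would naturally give each pass its own gain value.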

Description

Cross Reference to Related Applications

The present application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application Serial No. 62/810851, filed February 26, 2019, titled "GENERAL MATRIX MULTIPLICATION WITH SUB-ARRAY TILING AND MULTIPLE SCALING FOR HYBRID ANALOG-DIGITAL MATRIX PROCESSORS," attorney docket No. L0858.70011US00, which is incorporated herein by reference in its entirety.

Technical Field

The present disclosure relates to hybrid analog-to-digital processors configured to perform mathematical matrix operations.

Background

Deep learning, machine learning, latent variable models, neural networks, and other matrix-based differentiable programs are used to solve a variety of problems, including natural language processing and object recognition in images. Solving these problems with deep neural networks typically requires long processing times to perform the necessary computations. The most computationally intensive operations involved are typically mathematical matrix operations, such as general matrix multiplication (GEMM) or multi-channel convolution.

A conventional approach to accelerating deep learning algorithms is to develop specialized hardware architectures. This is because conventional computer processors, such as central processing units (CPUs), consist of circuits with hundreds of millions of transistors that implement logic gates on information bits represented by electrical signals. They are designed for general-purpose computing and are therefore not optimized for the particular patterns of data movement and computation required by deep learning and other matrix-based differentiable algorithms. One conventional example of dedicated hardware for deep learning is the graphics processing unit (GPU), whose highly parallel architecture makes it more efficient than a CPU at image processing and graphics operations.
After their development for graphics processing, GPUs were found to be more efficient than CPUs for other parallelizable algorithms as well, such as those used in neural networks and deep learning. Deep learning with neural networks typically involves two phases: a training phase and an evaluation phase (sometimes referred to as "inference"). Before a deep learning algorithm can be meaningfully executed on a processor during the evaluation phase (e.g., to classify an image or a speech sample), the neural network must first be trained. The training phase can be very time-consuming and computationally intensive.

Disclosure of Invention

Some embodiments relate to a hybrid analog-to-digital processor comprising a circuit comprising an analog processor, wherein the circuit is configured to perform a mathematical operation using a plurality of passes, wherein for each of the plurality of passes the circuit is configured to: determine one or more scaling factors for the pass based on a set of parameters representing a portion of a matrix; scale at least some parameters of the set of parameters based on the one or more scaling factors to produce a scaled set of parameters; program the analog processor based on the scaled set of parameters; generate a plurality of input analog signals based on a set of input data; generate a plurality of output analog signals based on the plurality of input analog signals and the scaled set of parameters; generate a partial output data set based on the plurality of output analog signals; and scale the partial output data set based on the one or more scaling factors to produce a scaled partial output data set. The circuit is further configured to generate an accumulated output data set by accumulating the scaled partial output data sets generated by at least two of the plurality of passes, the accumulated output data set representing a result of the mathematical operation.
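The per-pass procedure summarized above (determine scaling factors, scale and program the parameters, compute, digitize, rescale, accumulate) can be sketched in software. This is a minimal sketch with several assumptions. The analog computation is replaced by an exact dot product, and each pass covers a hypothetical block of `width` input columns. The scaling factors are taken to be the largest parameter magnitude and the largest input magnitude in the pass; the text does not specify how they are chosen. A single uniform quantizer stands in for both programming precision and the ADC. The names `quantize` and `multipass_matvec` are illustrative and do not come from the source.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def quantize(v: float, n_bits: int, full_scale: float) -> float:
    """Uniform signed n-bit quantizer over [-full_scale, full_scale] (clips)."""
    levels = 2 ** (n_bits - 1) - 1
    v = max(-full_scale, min(full_scale, v))
    return round(v * levels / full_scale) * full_scale / levels


def multipass_matvec(w: Matrix, x: Vector, width: int, n_bits: int = 8) -> Vector:
    """Compute w @ x in passes of `width` columns with per-pass scaling."""
    m, k = len(w), len(x)
    acc = [0.0] * m
    for c0 in range(0, k, width):
        cols = range(c0, min(c0 + width, k))
        # Per-pass scaling factors (assumed rule): largest |parameter| and |input|.
        s_w = max(abs(w[r][c]) for r in range(m) for c in cols) or 1.0
        s_x = max(abs(x[c]) for c in cols) or 1.0
        # Scale into the processor's [-1, 1] range, then "program" at n-bit precision.
        tile = [[quantize(w[r][c] / s_w, n_bits, 1.0) for c in cols] for r in range(m)]
        xin = [quantize(x[c] / s_x, n_bits, 1.0) for c in cols]
        for r in range(m):
            analog = sum(tw * xi for tw, xi in zip(tile[r], xin))  # exact stand-in
            # ADC with full scale set to the worst case |y| <= len(cols).
            partial = quantize(analog, n_bits, float(len(cols)))
            acc[r] += partial * s_w * s_x  # undo the scaling, then accumulate
    return acc
```

Because each pass gets its own scaling factors, a block of very small parameters is rescaled to fill [-1, 1]. If a single scale were shared across the whole matrix, those parameters would instead round to zero. For instance, with weights near 100 in one block and near 0.01 in another, `multipass_matvec(w, x, width=k)` (a single pass) would quantize the 0.01 weights away at 8 bits, while splitting them into their own pass preserves their contribution.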
In some embodiments, generating the plurality of output analog signals based on the plurality of input analog signals and the scaled parameter set includes performing a matrix-to-matrix multiplication based on the plurality of input analog signals and the scaled parameter set. In some embodiments, generating the plurality of output analog signals based on the plurality of input analog signals and the scaled parameter set includes performing convolution based on the plurality of input analog signals and the scaled parameter set. In some embodiments, programming the analog processor based on the scaled parameter set includes setting respective gains or attenuations for a plurality of analog amplifiers or attenuators of the analog processor based on the scaled parameter set. In some embodiments, the analog processor includes a photonic processor including a plurality of programmable photonic devices, and wherein programming the analog processor based on the scaled parameter set includes setting corresponding characteristics for the plurality of p