US-12619863-B2 - Deep neural network based on analog FLASH computing array
Abstract
A deep neural network based on an analog FLASH computing array includes a number of computing arrays, a number of subtractors, a number of activation circuit units and a number of integral-recognition circuit units. The computing array includes a number of computing units, a number of word lines, a number of bit lines and a number of source lines. Each of the computing units includes a FLASH cell. The gate electrodes of the FLASH cells in the same column are connected to the same word line. The source electrodes of the FLASH cells in the same column are connected to the same source line, and the drain electrodes of the FLASH cells in the same row are connected to the same bit line. Each of the subtractors includes a positive terminal, a negative terminal and an output terminal.
Inventors
- Peng Huang
- Guihai YU
- Jinfeng Kang
- Yachen Xiang
- Xiaoyan Liu
- Lifeng LIU
Assignees
- PEKING UNIVERSITY
Dates
- Publication Date: 2026-05-05
- Application Date: 2019-12-31
- Priority Date: 2019-07-22
Claims (5)
- 1 . A deep neural network based on analog FLASH computing array, comprising: a plurality of computing arrays, a plurality of subtractors, a plurality of activation circuit units and a plurality of integral-recognition circuit units; wherein each of the computing arrays comprises a plurality of computing units comprising a plurality of FLASH cells, a plurality of word lines, a plurality of bit lines and a plurality of source lines; each of the computing units comprises a FLASH cell, wherein each of the computing arrays comprises an array of rows and columns such that the plurality of FLASH cells are arranged in the rows and the columns, and gate electrodes of the FLASH cells in the same column are connected to the same word line, source electrodes of the FLASH cells in the same column are connected to the same source line, and drain electrodes of the FLASH cells in the same row are connected to the same bit line; each of the subtractors comprises a positive terminal, a negative terminal and an output terminal, the positive terminal and the negative terminal are respectively connected to two adjacent bit lines, and the output terminal is connected to an input terminal of an activation circuit or an integral-recognition circuit, wherein the number of the word lines corresponds to the number of columns in the computing array, and the word lines are configured to apply control signals to the gate electrodes of the FLASH cells, so as to control the FLASH cells to participate in an operation or not, wherein the number of the source lines corresponds to the number of columns in the computing array, and the source lines are configured to apply input signals to the source electrodes of the FLASH cells, the input signals are the analog voltages representing element values of DNN input vectors, the analog voltages are arranged in a row and input to the source electrodes of the FLASH cells in each column through the corresponding source lines, wherein the number of the bit lines corresponds to the number of rows in the computing array, the bit lines are configured to output the signals of the drain electrodes of the FLASH cells, and each of the bit lines is configured to superimpose the drain signals of the FLASH cells in one row and to output the superimposed drain signal as an output signal, wherein the plurality of computing arrays forms convolutional layers, wherein threshold voltages of the FLASH cells in the convolutional layers represent elements in a weight matrix, which are set by pre-programming, wherein the convolutional layers are implemented by setting idle FLASH cells to a threshold voltage state by setting only k*k FLASH cells corresponding to a size of a convolution kernel on the two bit lines connected to each of the plurality of subtractors, and a shift operation of the convolution kernel is implemented by a relative shift of the threshold voltage arrangement between the two bit lines connected to each of the plurality of subtractors in a respective computing array.
- 2 . The deep neural network of claim 1 , wherein the plurality of computing arrays forms fully connected layers, threshold voltages of the FLASH cells represent elements in a weight matrix and are set by pre-programming.
- 3 . The deep neural network of claim 1 , further comprising integral-recognition circuits and activation circuits, wherein the output terminal of each of the subtractors is connected to the integral-recognition circuit or the activation circuit.
- 4 . The deep neural network of claim 1 , further comprising pooling layers, which achieve pooling functions by connecting the output terminals of the subtractors together.
- 5 . The deep neural network of claim 1 , wherein the FLASH cell comprises a floating gate memory, a split gate memory, a charge trap memory, or an embedded flash memory device.
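For illustration only (not part of the patent text), the convolution mapping of claim 1 can be sketched numerically: per subtractor bit-line pair, only the cells corresponding to the kernel are programmed, and that placement is shifted between pairs; all other (idle) cells stay in an effectively zero-conductance threshold state. The helper name and the 1-D simplification below are hypothetical.

```python
import numpy as np

def conv_array_weights(kernel, n_inputs, stride=1):
    """Build the per-bit-line-pair weight rows described in claim 1:
    only len(kernel) cells are programmed per pair, shifted by the
    stride between pairs; idle cells remain at zero conductance.
    (1-D illustration of the k*k placement-and-shift idea.)
    """
    k = len(kernel)
    n_out = (n_inputs - k) // stride + 1
    w = np.zeros((n_out, n_inputs))
    for j in range(n_out):
        w[j, j * stride : j * stride + k] = kernel
    return w

kernel = np.array([1.0, -1.0])      # signed kernel (split over a bit-line pair in hardware)
x = np.array([3.0, 5.0, 2.0, 8.0])  # input voltages on the source lines
w = conv_array_weights(kernel, len(x))
y = w @ x                           # summed bit-line currents = valid 1-D convolution outputs
```

Sliding the kernel in software corresponds in hardware to the "relative shift of the threshold voltage arrangement" between adjacent subtractor pairs.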
Description
TECHNICAL FIELD

The present disclosure relates to the field of semiconductor devices and integrated circuits, and in particular to a deep neural network based on an analog FLASH computing array.

BACKGROUND

Deep neural networks (DNNs) perform excellently in speech recognition, image recognition and other tasks, and are therefore widely used in the construction of modern artificial intelligence systems. The development of deep neural networks imposes stringent requirements on the energy efficiency and hardware consumption of data operations. In the conventional computing architecture, the data-transmission bottleneck between the central processing unit (CPU) and the memory greatly restricts computing speed, while energy consumption and hardware requirements are also very high.

SUMMARY

According to an aspect of the present disclosure, a deep neural network based on an analog FLASH computing array is provided, which includes: a plurality of computing arrays, a plurality of subtractors, a plurality of activation circuit units and a plurality of integral-recognition circuit units. Each computing array includes a plurality of computing units, a plurality of word lines, a plurality of bit lines and a plurality of source lines. Each of the computing units includes a FLASH cell. The gate electrodes of the FLASH cells in each column are connected to the same word line. The source electrodes of the FLASH cells in each column are connected to the same source line. The drain electrodes of the FLASH cells in each row are connected to the same bit line. Each of the subtractors includes a positive terminal, a negative terminal and an output terminal. The positive terminal and the negative terminal are respectively connected to two adjacent bit lines. The output terminal is connected to the input terminal of an activation circuit or an integral-recognition circuit. The present disclosure provides a deep neural network based on an analog FLASH computing array.
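The array operation summarized above can be sketched numerically (an illustration, not the patent's circuit): assume each cell's drain current is modeled as an effective conductance, set by threshold-voltage programming, times its source-line voltage; each bit line sums the currents of its row, and each subtractor takes the difference of two adjacent bit lines so that signed weights can be represented. Function and variable names are illustrative.

```python
import numpy as np

def analog_mvm(g_pos, g_neg, v_in):
    """Numerical sketch of the array's matrix-vector multiply.

    g_pos, g_neg: (rows, cols) effective cell conductances on the two
                  adjacent bit lines feeding each subtractor.
    v_in:         (cols,) analog source-line voltages (the input vector).

    Each bit line superimposes the drain currents of its row; each
    subtractor outputs the difference of its two bit lines, realizing
    a signed weight W = g_pos - g_neg.
    """
    i_pos = g_pos @ v_in   # currents on the "positive" bit lines
    i_neg = g_neg @ v_in   # currents on the "negative" bit lines
    return i_pos - i_neg   # subtractor outputs

# Example: encode W = [[1, -2], [0.5, 3]] differentially.
w = np.array([[1.0, -2.0], [0.5, 3.0]])
g_pos = np.maximum(w, 0.0)    # positive parts on one bit line
g_neg = np.maximum(-w, 0.0)   # negative-part magnitudes on the adjacent line
v = np.array([0.2, 0.1])
out = analog_mvm(g_pos, g_neg, v)   # equals w @ v
```

The differential pair of bit lines is what allows negative weights even though each cell's conductance is non-negative.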
The threshold voltage of each FLASH cell in the computing array is set according to the corresponding weight value in the deep neural network. Analog matrix-vector multiplication and other operations are performed using the computing array, so as to construct hidden layers such as convolutional layers, pooling layers and fully connected layers of the deep neural network. In order to make the above objectives, features and advantages of the present disclosure more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure, the accompanying drawings required in the description of the embodiments are briefly introduced below. It should be understood that the following accompanying drawings only show some embodiments of the present disclosure, and therefore should not be considered as limiting the scope. For those skilled in the art, other related drawings may be obtained from these accompanying drawings without inventive effort.

FIG. 1 illustrates a schematic diagram of implementing a deep neural network by using an analog FLASH computing array.

FIG. 2 illustrates a schematic diagram of the relationship between the voltage Vds and the drain current Id of a FLASH cell, where Vds is the voltage applied between the source and drain electrodes of the FLASH cell.

FIG. 3 illustrates a schematic diagram of the signal input and output of an analog FLASH computing unit.

FIG. 4 illustrates a schematic diagram of achieving an example matrix-vector multiplication operation by using an analog FLASH computing array.

FIG. 5 illustrates a schematic diagram of implementing a convolutional layer, a pooling layer and a fully connected layer by using an analog FLASH computing array.
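Among the hidden layers above, the pooling layer is formed (per claim 4) by tying subtractor output terminals together, so their output currents add. A minimal numerical sketch of that wired-sum pooling, under the same illustrative current model and with a hypothetical helper name:

```python
import numpy as np

def wired_sum_pooling(subtractor_outputs, window):
    """Sketch of claim 4's pooling: joining subtractor outputs makes
    their currents sum, so each pooled value is the sum over a window
    of adjacent outputs (scale by 1/window for average pooling).
    """
    n = len(subtractor_outputs) // window
    grouped = np.reshape(subtractor_outputs[: n * window], (n, window))
    return grouped.sum(axis=1)

outs = np.array([1.0, 3.0, 2.0, 6.0])   # example subtractor outputs
pooled = wired_sum_pooling(outs, 2)     # windows of 2 wired together
```

Because the summing happens by Kirchhoff's current law on the shared wire, no extra arithmetic circuitry is needed for this style of pooling.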
DETAILED DESCRIPTION OF EMBODIMENTS

The analog FLASH computing array can efficiently perform matrix-vector multiplication operations and accelerate deep neural network computation at the hardware level. It can also greatly reduce the use of analog-to-digital and digital-to-analog conversion circuits (ADCs and DACs) in the peripheral circuitry, which greatly improves the utilization efficiency of energy and hardware resources. The analog FLASH computing array is therefore of great significance for the future hardware realization of artificial intelligence. The present disclosure provides a deep neural network based on an analog FLASH computing array, which mainly implements: (1) computing in memory; (2) analog computation based on FLASH cells; and (3) a deep neural network system based on the analog FLASH computing array. In the present disclosure, the threshold voltage of a FLASH cell can be adjusted by programming/erasing. During programming, hot electrons are injected into a floating gate, and the threshold volta