CN-115762598-B - Ping-pong type memory computing circuit based on capacitive coupling


Abstract

The invention discloses a ping-pong in-memory computing circuit based on capacitive coupling, belonging to the field of application-specific integrated circuit design. The circuit comprises a computing array formed by ping-pong in-memory computing units, a sparsity-dependent precision-adjustable quantization circuit, a decoding circuit, a read-write driving circuit, a timing control circuit and a shift-and-add circuit. Compared with conventional macros for neural network acceleration, the design can update the weight values in the static random access memory array while performing multiply-accumulate operations, so that computing throughput is greatly improved when accelerating networks with a large number of weights.

Inventors

SI XIN
WANG YUFEI

Assignees

Southeast University (东南大学)

Dates

Publication Date: 20260512
Application Date: 20221107

Claims (6)

1. A ping-pong in-memory computing circuit based on capacitive coupling, characterized by comprising a capacitive-coupling ping-pong in-memory computing array and a sparsity-dependent precision-adjustable quantization array, wherein weight values are stored in the capacitive-coupling ping-pong in-memory computing array, input feature values are sent to the array, the multiply-accumulate analog values obtained in each column of the computing array are converted into digital signals by the corresponding quantization circuits and then sent to a peripheral shift-and-add circuit to obtain the final computed values, and meanwhile the weight values to be updated are written into the static random access memory in the computing array through a read-write driving circuit; the capacitive-coupling ping-pong in-memory computing array (CPPCIM) is composed of 64 computing columns, each computing column comprises 32 CPPCIM blocks, and each CPPCIM block is composed of 1 upper multiplication unit UBMU, 1 lower multiplication unit DBMU and 1 ping-pong computing unit PPCC; the ping-pong computing unit PPCC is placed between the upper multiplication unit UBMU and the lower multiplication unit DBMU; each of the upper and lower multiplication units UBMU and DBMU is composed of 4 compact 6T static random access memory cells and 4 NMOS transistors M1-M4 for gating, wherein the gates of the first NMOS transistor (M1) and the second NMOS transistor (M2) are connected to the write gating signal (USEL), the gates of the third NMOS transistor (M3) and the fourth NMOS transistor (M4) are connected to the compute gating signal (CSEL), the sources and drains of the first NMOS transistor (M1) and the second NMOS transistor (M2) are connected to the global bit lines (GBL, GBLB) and the local bit lines (LBL, LBLB), and the sources and drains of the third NMOS transistor (M3) and the fourth NMOS transistor (M4) are connected to the local bit lines (LBL, LBLB) and the compute bit lines (CBL,
CBLB); the ping-pong computing unit PPCC comprises a fifth NMOS transistor (M5), a PMOS transistor (M6), a seventh NMOS transistor (M7) and a metal-layer MOM capacitor (Cc), wherein the fifth NMOS transistor (M5) and the PMOS transistor (M6) are connected to the compute bit line (CBL) and the input node, the gate of the seventh NMOS transistor (M7) is connected to the compute bit lines (CBL, CBLB), its source and drain are connected to the internal node (Vc) and ground, and the upper and lower plates of the metal-layer MOM capacitor (Cc) are connected to the internal node (Vc) and the multiply-accumulate bit line (MBL) respectively.
2. A ping-pong in-memory computing circuit based on capacitive coupling, characterized by comprising an input sensing detector and a single-ended successive-approximation quantization circuit (SAR ADC), wherein the input sensing detector counts the number of zeros among the 32 input feature value signals (IA) in each cycle, compares this number with a preset threshold to generate the corresponding precision configuration control signals, and sends them to the quantizer; the single-ended successive-approximation quantization circuit comprises a capacitive DAC array, a SAR logic control quantization circuit, a dynamic latch amplifier and a shift register; the capacitive DAC array uses metal-layer MOM capacitors with capacitance values of 1C, 2C, 4C, 8C and 16C, the upper plates of the capacitors are connected in common to the N terminal of the differential input of the amplifier, and the lower plates are each connected to the reference voltages VREF1 and VREF2 through a switch circuit; the P terminal of the differential input of the dynamic latch amplifier is connected to the multiply-accumulate bit line MBL obtained in the computing array, the output of the amplifier is connected to the shift register, and the shift register, under the control of the control signals, performs the shift-and-add operation on the single-bit multiply-accumulate results.
3. The capacitive coupling-based ping-pong in-memory computing circuit of claim 1, wherein the circuit supports four operating modes: reading, writing, computing, and updating weights while computing; when a computing operation is performed, 32 single-bit input feature values are sent simultaneously to the computing units of each row within one cycle; at the beginning of the computing cycle, the multiply-accumulate bit line (MBL) and the input feature value signal (IA) of each computing row are pulled to ground by a self-timed asynchronous reset signal, so that the charge on the computing capacitors in the computing units is cleared; the computation is then performed, followed by quantization.
4. The capacitive coupling-based ping-pong in-memory computing circuit of claim 2, wherein the threshold of the input sensing detector is obtained by training a specific neural network in external software and counting the sparsity of the input feature values of each layer of the network, yielding a threshold for configuring the quantization precision that reduces quantization power consumption and latency overhead while preserving accuracy.
5. The capacitive coupling-based ping-pong in-memory computing circuit of claim 4, wherein the input sensing detector generates the corresponding precision configuration control signal by comparison with the threshold, and the quantization circuit quantizes to a different number of bits according to that signal: if the precision is configured as full-precision 5 bits, the SAR logic control quantization circuit performs a conventional full-precision binary successive-approximation quantization; if the precision is configured as reduced-precision 3 bits, the SAR logic control quantization circuit skips quantization of the upper 2 bits, which are set directly to 0, and quantizes only the lower 3 bits; if the configuration indicates that no quantization is needed, the quantization result is directly 0 and the quantizer is not started.
6. The capacitive coupling-based ping-pong in-memory computing circuit of claim 1, wherein the sparsity-dependent precision-adjustable quantization array adopts asynchronous SAR logic, that is, the quantization operation can be completed within the same clock cycle regardless of how many bits are quantized; different quantization precision configurations can affect the operating frequency, and the higher the sparsity of the input feature values, the smaller the quantization power consumption and latency overhead.
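The precision selection of claims 2, 4 and 5 can be illustrated with a small behavioral sketch. It is not the claimed circuit: the function names, the `Precision` enum and the two threshold values (25 and 32 zeros) are hypothetical, since the claims state only that the thresholds come from software training. The analog value on MBL is idealized as a number expressed in LSB units.

```python
from enum import Enum


class Precision(Enum):
    """Quantizer configurations named in claim 5 (value = bits resolved)."""
    FULL_5BIT = 5
    LOW_3BIT = 3
    SKIP = 0


def select_precision(input_bits, low_threshold, skip_threshold):
    """Model of the input sensing detector: count zeros, compare to thresholds.

    Both thresholds are assumptions for illustration; the source obtains them
    from per-layer sparsity statistics of a trained network.
    """
    zeros = sum(1 for bit in input_bits if bit == 0)
    if zeros >= skip_threshold:
        return Precision.SKIP
    if zeros >= low_threshold:
        return Precision.LOW_3BIT
    return Precision.FULL_5BIT


def sar_quantize(level, precision):
    """Idealized binary-search quantization of `level` (in LSB units).

    FULL_5BIT resolves bits 4..0. LOW_3BIT leaves the upper 2 bits at 0 and
    resolves only bits 2..0. SKIP returns 0 without any comparison.
    """
    code = 0
    for bit in range(precision.value - 1, -1, -1):
        trial = code | (1 << bit)
        if level >= trial:  # one comparator decision per resolved bit
            code = trial
    return code


# Hypothetical thresholds: 25 or more zeros selects 3 bits, 32 zeros skips.
inputs = [0] * 28 + [1] * 4
mode = select_precision(inputs, low_threshold=25, skip_threshold=32)
code = sar_quantize(3.6, mode)
```

In this model the comparison loop runs five times, three times or not at all depending on the configuration, which is the source of the power and latency saving that the claims attribute to sparse inputs.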

Description

Ping-pong type memory computing circuit based on capacitive coupling

Technical Field

The invention discloses a ping-pong in-memory computing circuit based on capacitive coupling, and belongs to the technical field of application-specific integrated circuit design.

Background

In recent years, artificial intelligence has developed rapidly and has been deeply integrated into application scenarios across many fields, bringing great convenience to daily life. Convolutional neural networks (CNNs) have become an important machine learning technology and are widely used in image recognition and in speech and video recognition. In implementing deep learning networks, the huge volume of data raises problems of workload, real-time performance and security. The design of low-power, high-performance AI chips and CNN-specific hardware accelerators has therefore become a research hotspot in both industry and academia.

In the traditional von Neumann computer architecture, the central processor and the memory exchange large amounts of data over a bus; the latency and wasted power caused by the huge data volume and the limited transfer bandwidth have become the biggest bottleneck of this architecture. This has motivated architectures that unify storage and computation. Because deep neural networks must perform a large number of multiply-accumulate (MAC) operations on input data and weight data, and these operations exhibit high data reuse and high parallelism, they are well suited to deployment in computing-in-memory (CIM) circuits, which greatly reduce the energy and latency wasted on data transfer, improve the energy efficiency and real-time performance of the system, and have broad application prospects in energy-efficient artificial intelligence systems.

However, as the complexity and data size of deep neural network models grow, the mismatch between ever larger parameter sets and limited on-chip memory capacity increases the need to update weights. Because the write bandwidth of conventional on-chip memory is limited and computing and updating are difficult to carry out simultaneously, weight updates incur serious overhead and further degrade circuit performance. Meanwhile, existing analog-domain in-memory computing circuits suffer from problems such as linearity, variation within the array and quantization overhead, which limit computing precision and parallelism and constrain improvements in CIM energy efficiency.

Disclosure of Invention

To overcome the shortcomings of the background art, the invention provides a ping-pong in-memory computing circuit based on capacitive coupling, with high energy efficiency and low power consumption as its goals.

The invention provides a capacitive coupling-based ping-pong in-memory computing circuit comprising a capacitive-coupling ping-pong in-memory computing array and a sparsity-dependent precision-adjustable quantization array. Weight values are stored in the capacitive-coupling ping-pong in-memory computing array and input feature values are sent to the array. The multiply-accumulate analog values obtained in each column of the computing array are converted into digital signals by the corresponding quantization circuits and then sent to a peripheral shift-and-add circuit to obtain the final computed values. Meanwhile, the weight values to be updated are written into the static random access memory in the computing array through a read-write driving circuit, and ping-pong operation is realized by alternately using the upper multiplication unit UBMU and the lower multiplication unit DBMU, which improves throughput.
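The ping-pong alternation and the shift-and-add step described above can be sketched as a purely digital behavioral model. Several simplifications are assumptions made for illustration and are not taken from the source: each bank holds one 1-bit weight per row, multi-bit inputs are applied bit-serially with the least significant bit first, and the analog charge accumulation and quantization are treated as ideal. The names `PingPongColumn` and `shift_add` are hypothetical.

```python
class PingPongColumn:
    """One computing column with two weight banks standing in for UBMU/DBMU."""

    def __init__(self, rows=32):
        self.banks = {"UBMU": [0] * rows, "DBMU": [0] * rows}
        self.active = "UBMU"  # bank currently used for computing

    @property
    def idle(self):
        """Name of the bank that is not computing and may be rewritten."""
        return "DBMU" if self.active == "UBMU" else "UBMU"

    def update_idle(self, weights):
        """Write new weights into the idle bank; the active bank is untouched."""
        self.banks[self.idle] = list(weights)

    def swap(self):
        """Exchange the roles of the two banks (the ping-pong step)."""
        self.active = self.idle

    def mac(self, input_bits):
        """Ideal column result: rows where input bit and weight bit are both 1."""
        return sum(a & w for a, w in zip(input_bits, self.banks[self.active]))


def shift_add(column, inputs, input_width):
    """Combine single-bit MAC results of each input bit plane by shift-and-add."""
    total = 0
    for k in range(input_width):
        plane = [(x >> k) & 1 for x in inputs]  # bit k of every input
        total += column.mac(plane) << k
    return total


col = PingPongColumn(rows=4)
col.update_idle([1, 0, 1, 1])            # load DBMU while UBMU is active
col.swap()                               # DBMU now computes
first = shift_add(col, [3, 1, 2, 0], 2)  # dot product with [1, 0, 1, 1]
col.update_idle([0, 1, 0, 0])            # refill UBMU during computation
still = shift_add(col, [3, 1, 2, 0], 2)  # unaffected by the update
col.swap()
second = shift_add(col, [3, 1, 2, 0], 2)
```

Because `update_idle` never touches the bank selected for computing, a weight refresh does not stall the MAC sequence in this model, which is the throughput argument the paragraph makes.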
The capacitive-coupling ping-pong in-memory computing array (CPPCIM) is composed of 64 computing columns, and each computing column comprises 32 CPPCIM blocks. Each CPPCIM block is composed of 1 upper multiplication unit UBMU, 1 lower multiplication unit DBMU and 1 ping-pong computing unit PPCC, with the ping-pong computing unit PPCC placed between the upper multiplication unit UBMU and the lower multiplication unit DBMU. Each of the upper multiplication unit UBMU and the lower multiplication unit DBMU is composed of 4 compact 6T static random access memory cells and 4 NMOS transistors M1-M4 for gating, wherein the first and second NMOS transistors are connected to the write gating signal, the third and fourth NMOS transistors are connected to the compute gating signal, the sources and drains of the first and second NMOS transistors are connected to the global bit line and the local bit line respectively, and the sources and drains of the third and fourth NMOS transistors are connected to the local bit line and the compute bit line respectively. Each ping-pong computing unit PPCC is composed of