CN-115691608-B - In-memory computing circuit, in-memory writeable multiplication computing circuit and chip
Abstract
The present invention relates to the field of in-memory computing technology, and in particular to an in-memory computing circuit, an in-memory writeable multiplication computing circuit, and a chip. The in-memory computing circuit comprises a weight layer, a computing layer, a first storage layer and a second storage layer arranged in sequence from top to bottom. The weight layer stores a binary weight; the computing layer multiplies an externally input binary weight by the binary weight stored in the weight layer; the first storage layer stores the high four bits of the result, and the second storage layer stores the low four bits. When the in-memory computing circuit performs a multiplication, the product of the four-bit weight input on the input signal line IN_B and the four-bit weight stored in the weight layer is split into a four-cycle addition, and the result is stored in the first storage layer and the second storage layer. The circuit moves multiplication from the analog domain to the digital domain and stores the result back in memory, thereby avoiding the problems encountered by analog-domain multiplication.
Inventors
- LIN ZHITING
- ZHOU YONGLIANG
- ZHANG SHAOYING
- WU XIULONG
- PENG CHUNYU
- LI XIN
- HAO LICAI
- LIU YU
- ZHAO QIANG
- LU WENJUAN
Assignees
- Anhui University (安徽大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20221031
Claims (10)
- 1. An in-memory computing circuit, characterized by comprising a weight layer, a computing layer, a first storage layer and a second storage layer arranged in sequence from top to bottom, wherein the computing layer comprises two groups of input ends A0~A3 and B0~B3, original code output ends S0~S4 and inverse code output ends S̄0~S̄4; one group of input ends B0~B3 of the computing layer is connected one-to-one with the 4 storage nodes QB20~QB23 of the weight layer, and the other group of input ends A0~A3 is connected one-to-one with the 4 storage nodes Q10~Q13 of the first storage layer; the original code output ends S1~S4 of the computing layer are connected one-to-one with the 4 storage nodes Q10~Q13 of the first storage layer, and the original code output end S0 is connected with each of the 4 storage nodes Q00~Q03 of the second storage layer; the inverse code output ends S̄1~S̄4 of the computing layer are connected one-to-one with the 4 storage nodes QB10~QB13 of the first storage layer, and the inverse code output end S̄0 is connected with each of the 4 storage nodes QB00~QB03 of the second storage layer; the input end of the weight layer is connected with a control signal line WL<2>, the input ends B0~B3 of the computing layer are connected with an input signal line IN_B, the input end of the first storage layer is connected with a control signal line WL<1>, and the input end of the second storage layer is connected with a control signal line WL<0>; when the in-memory computing circuit performs a multiplication, the product of the four-bit weight input on the input signal line IN_B and the four-bit weight stored in the weight layer is split into a four-cycle addition, and the result is stored in the first storage layer and the second storage layer.
- 2. The in-memory computing circuit of claim 1, wherein the computing layer comprises a full adder, 4 NOR gates, and 20 switching transistors M0~M19; one input end of each of the 4 NOR gates is connected with the input signal line IN_B, the other input ends of the 4 NOR gates are connected one-to-one with the 4 storage nodes QB20~QB23 of the weight layer and serve as the input ends B0~B3 of the computing layer, and the output ends of the 4 NOR gates are connected one-to-one with the 4 input ends B0-B3 of the full adder; the drains of the switching transistors M16~M19 are connected one-to-one with the 4 input ends A0-A3 of the full adder, the sources of M16~M19 are connected one-to-one with the 4 storage nodes Q10~Q13 of the first storage layer and serve as the input ends A0~A3 of the computing layer, and the gates of M16~M19 are connected with a control signal line WL_A; the output end C0 of the full adder is connected with the source of M15 and the output end C̄0 with the drain of M14; the output end S3 is connected with the source of M13 and the output end S̄3 with the drain of M12; the output end S2 is connected with the source of M11 and the output end S̄2 with the drain of M10; the output end S1 is connected with the source of M9 and the output end S̄1 with the drain of M8; the output end S0 of the full adder is connected with the sources of M1, M3, M5 and M7 respectively, and the output end S̄0 with the drains of M0, M2, M4 and M6 respectively; the gates of M8~M15 are connected with the control signal line WL_SH, the gates of M0 and M1 with the control signal line WL_SL0, the gates of M2 and M3 with the control signal line WL_SL1, the gates of M4 and M5 with the control signal line WL_SL2, and the gates of M6 and M7 with the control signal line WL_SL3; the drains of M15, M13, M11 and M9 serve respectively as the original code output ends S4~S1 of the computing layer, and the sources of M14, M12, M10 and M8 serve respectively as the inverse code output ends S̄4~S̄1 of the computing layer; the drains of M7, M5, M3 and M1 serve as the original code output end S0 of the computing layer, and the sources of M6, M4, M2 and M0 serve as the inverse code output end S̄0 of the computing layer.
- 3. The in-memory computing circuit of claim 1, wherein the weight layer, the first storage layer and the second storage layer each consist of four memory cells, each memory cell being a 6T memory cell comprising 6 transistors.
- 4. The in-memory computing circuit of claim 3, wherein the 6T memory cell comprises 2 PMOS transistors P1-P2 and 4 NMOS transistors N1-N4; P1 and N1 form one inverter, P2 and N2 form another inverter, and N3 and N4 serve as pass transistors; the sources of P1 and P2 are connected with VDD, and the sources of N1 and N2 are grounded; the drain of P1, the drain of N1, the gate of P2 and the gate of N2 are connected together as a storage node Q, which is connected with the drain of N3, and the source of N3 is connected with a bit line BL; the drain of P2, the drain of N2, the gate of P1 and the gate of N1 are connected together as a storage node QB, which is connected with the drain of N4, and the source of N4 is connected with a bit line BLB; the gates of N3 and N4 are connected with a word line WL.
- 5. The in-memory computing circuit of claim 2, wherein, when performing a write operation, the in-memory computing circuit writes a four-bit binary weight into the weight layer through the control signal line WL<2>, and writes "0" into the first storage layer and the second storage layer through the control signal line WL<1> and the control signal line WL<0>, respectively.
- 6. The in-memory computing circuit of claim 5, wherein, when performing a multiplication operation, the in-memory computing circuit inputs an external four-bit binary weight […] to the computing layer through the input signal line IN_B and performs a four-cycle operation with the stored four-bit binary weight […], the four-cycle operation being as follows: in the first cycle, the input signal line IN_B inputs the weight […], which is NORed with the four-bit weight […] in the weight layer, and the result is input to the input ends B3-B0 of the full adder; the control signal line […] is set so that the value 0000 stored in the first storage layer is input to the input ends A3-A0 of the full adder; the full adder generates a five-bit original code output and a five-bit inverse code output, […] and […] respectively, the calculation being given by the formula […]; the control signal lines […] are then set so that the upper four bits of the full adder output are stored in the first storage layer and the lowest bit is stored in the lowest bit of the second storage layer; in the second cycle, the input signal line IN_B inputs the weight […], and the operation is the same as in the first cycle, except that the first storage layer outputs the upper four bits […] stored in the first cycle to the full adder input ends A3-A0, the calculation being given by the formula […]; the control signal lines […] are then set so that the upper four bits of the full adder output are stored in the first storage layer and the lowest bit is stored in the second-lowest bit of the second storage layer; in the third cycle, the input signal line IN_B inputs the weight […], and the operation is the same as in the second cycle, except that the first storage layer outputs the upper four bits […] stored in the second cycle to the full adder input ends A3-A0, the calculation being given by the formula […]; the control signal lines […] are then set so that the upper four bits of the full adder output are stored in the first storage layer and the lowest bit is stored in the second-highest bit of the second storage layer; in the fourth cycle, the input signal line IN_B inputs the weight […], and the operation is the same as in the third cycle, except that the first storage layer outputs the upper four bits […] stored in the third cycle to the full adder input ends A3-A0, the calculation being given by the formula […]; the control signal lines […] are then set so that the upper four bits of the full adder output are stored in the first storage layer and the lowest bit is stored in the highest bit of the second storage layer; the product of the four-bit binary weight […] and the four-bit binary weight […] is thereby stored in the first storage layer and the second storage layer.
- 7. The in-memory computing circuit of claim 2, wherein the switching transistors M0~M19 are NMOS transistors.
- 8. An in-memory writeback multiplication computation circuit, comprising: an in-memory computing unit array, in the form of an N×M array formed by N×M in-memory computing units, wherein N represents the number of rows and M represents the number of columns of the in-memory computing unit array; word lines WL for switching the pass transistors in each in-memory computing unit of the in-memory computing unit array on and off during reading and writing; bit line pairs, comprising 2M pairs of bit lines BL and BLB, wherein the in-memory computing units in each column are connected to the same group of bit lines BL and BLB; an input signal line IN_B for inputting a four-bit binary weight to the computing layers in the in-memory computing unit array; a decoding circuit for decoding an externally input row address selection signal and controlling the word lines WL according to the decoding result; a switching circuit for selecting the four-bit binary weight input on the input signal line IN_B to match the operation of the different cycles of each in-memory computing unit of the in-memory computing unit array; a mode control circuit for processing an externally input mode selection signal to switch between the different operation modes of the circuit; a timing circuit for supplying the pulse signals required for reading, writing, and multiplication to the in-memory computing unit array; and an output circuit, connected through sense amplifiers SA to the bit lines BL and BLB of each column of in-memory computing units in the in-memory computing unit array, so as to output the data or operation results stored by the in-memory computing units in any row; wherein each in-memory computing unit adopts the circuit structure of the in-memory computing circuit of any one of claims 1 to 7 and realizes the complete function of that in-memory computing circuit.
- 9. An in-memory writeback multiplication computation chip, obtained by packaging the in-memory writeback multiplication computation circuit of claim 8.
- 10. The in-memory writeback multiplication computation chip of claim 9, wherein the interface of the chip comprises at least: a power interface VDD for connecting to a power source; a ground interface VSS for grounding; a row address selection interface A for inputting a row strobe signal to the circuit, the row strobe signal being used to adjust the access state of the in-memory computing units on each word line; an enable signal interface CEN for inputting an enable signal that adjusts the operation state of the circuit; an external clock signal interface CLKIN for inputting an external clock signal to the circuit, the clock signal being used to adjust the different operation modes of the circuit and the clock frequencies required by the in-memory computing units in the in-memory computing unit array; a row data interface DATA_IN for inputting pre-stored data to each in-memory computing unit of the in-memory computing unit array; a read-write control interface WEN for inputting control signals that adjust the read-write operation of each in-memory computing unit; a mode selection interface MODE for inputting a mode selection signal to the circuit, the mode selection signal being used to switch each in-memory computing unit in the in-memory computing unit array between read-write operation and multiplication operation, so as to adjust the different operation modes of the circuit; an input signal interface MUL_IN for inputting the external weights to be multiplied to the respective in-memory computing units; and an output signal interface OUT for outputting the data or operation results stored by each in-memory computing unit.
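The four-cycle multiplication described in claims 1, 2 and 6 can be read as a shift-and-add loop. The Python model below is a minimal behavioural sketch under stated assumptions, not the patented circuit itself. The function names `nor` and `multiply_4bit` are hypothetical. The model assumes that IN_B carries the inverted external bit (so that the NOR of IN_B with the inverted storage nodes QB20~QB23 equals the AND of the external bit with the stored weight), that the external weight is applied least-significant bit first, and that cycle k stores the lowest sum bit in bit k of the second storage layer.

```python
def nor(a: int, b: int) -> int:
    """Single-bit NOR gate."""
    return 1 - (a | b)


def multiply_4bit(stored: int, external: int) -> int:
    """Behavioural model of the four-cycle shift-and-add multiplication.

    stored   -- four-bit weight held in the weight layer
    external -- four-bit weight applied one bit per cycle (assumed LSB first)
    """
    assert 0 <= stored < 16 and 0 <= external < 16

    # Write phase: both storage layers are initialised to 0.
    first_layer = 0    # models Q10..Q13 (running upper four bits)
    second_layer = 0   # models Q00..Q03 (final low four bits)

    # Inverted weight-layer nodes QB20..QB23.
    qb = [1 - ((stored >> i) & 1) for i in range(4)]

    for cycle in range(4):
        x = (external >> cycle) & 1
        in_b = 1 - x  # assumption: IN_B carries the inverted external bit

        # NOR(IN_B, QB2i) = x AND stored_i, i.e. one partial product.
        b = sum(nor(in_b, qb[i]) << i for i in range(4))

        # Full adder: A3..A0 (first storage layer) + B3..B0 -> five bits S4..S0.
        total = first_layer + b

        # S4..S1 are written back to the first storage layer.
        first_layer = total >> 1
        # S0 is written to bit `cycle` of the second storage layer.
        second_layer |= (total & 1) << cycle

    # High four bits in the first layer, low four bits in the second layer.
    return (first_layer << 4) | second_layer
```

Under these assumptions the upper four bits of the product end up in the first storage layer and the lower four bits in the second storage layer, matching the roles the abstract assigns to the two layers.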
Description
In-memory computing circuit, in-memory writeable multiplication computing circuit and chip
Technical Field
The present invention relates to the field of in-memory computing technology, and more particularly to an in-memory computing circuit, and to an in-memory writeable multiplication computing circuit and chip that use the in-memory computing circuit as a base circuit.
Background
The conventional von Neumann architecture separates the processor's compute unit from memory: the processor reads data from memory when operating and writes the data back to memory after processing it. However, in computation-heavy fields such as machine learning and image recognition, for example convolutional neural networks (CNNs), memory cannot keep pace with the processor. The memory access speed lags far behind the processor's computing speed, which seriously limits the overall processing speed.
Embedding the computation in memory has two significant benefits. First, data movement in and out of memory is greatly reduced, since the filter weights are not explicitly read and only the output of the computation is sent out of memory. Second, the massively parallel nature of CNNs can be exploited by accessing multiple memory addresses simultaneously. This approach therefore achieves higher memory bandwidth and overcomes some of the major limitations imposed by the traditional von Neumann bottleneck.
Existing in-memory multiplication operations are mostly performed in the analog domain. In the proposed analog IMC (In Memory Compute) architectures, one operand is usually pre-stored in the SRAM array, while the other operand is modulated onto the voltage level of the word line or the number of word line pulses. The multiplication result of the two operands is then represented by the different discharge amounts of the bit cells.
When a plurality of rows of word lines are activated simultaneously, the bit lines are discharged through the corresponding bit cells, so that the multiplication results are accumulated on the bit lines, and the multiply-accumulate result is finally output by an analog-to-digital converter (ADC). Implementing multiplication in the analog domain in this way faces challenges in terms of read disturb, computational accuracy, ADC quantization, and so on.
Disclosure of Invention
Based on this, and to address the read disturbance, computation accuracy error, and ADC quantization accuracy error of in-memory analog-domain multiplication, it is necessary to provide an in-memory computing circuit, and an in-memory writeable multiplication computing circuit and chip that use the in-memory computing circuit as a base circuit.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
An in-memory computing circuit comprises a weight layer, a computing layer, a first storage layer and a second storage layer arranged in sequence from top to bottom.
The computing layer comprises two groups of input ends A0~A3 and B0~B3, original code output ends S0~S4 and inverse code output ends S̄0~S̄4. One group of input ends B0~B3 of the computing layer is connected one-to-one with the 4 storage nodes QB20~QB23 of the weight layer, and the other group of input ends A0~A3 is connected one-to-one with the 4 storage nodes Q10~Q13 of the first storage layer. The original code output ends S1~S4 of the computing layer are connected one-to-one with the 4 storage nodes Q10~Q13 of the first storage layer, and the original code output end S0 is connected with each of the 4 storage nodes Q00~Q03 of the second storage layer.
The inverse code output ends S̄1~S̄4 of the computing layer are connected one-to-one with the 4 storage nodes QB10~QB13 of the first storage layer, and the inverse code output end S̄0 is connected with each of the 4 storage nodes QB00~QB03 of the second storage layer.
The input end of the weight layer is connected with the control signal line WL<2>, the input ends B0~B3 of the computing layer are connected with the input signal line IN_B, the input end of the first storage layer is connected with the control signal line WL<1>, and the input end of the second storage layer is connected with the control signal line WL<0>.
When the in-memory computing circuit performs a multiplication, the product of the four-bit weight input on the input signal line IN_B and the four-bit weight stored in the weight layer is split into a four-cycle addition, and the result is stored in the first storage layer and the second storage layer.
Further, the computing layer includes a full adder, 4 NOR gates, and 20 switching transistors M0~M19. One of the input terminals of the 4 NOR gates is connected to