US-12626114-B2 - Discrete three-dimensional processor
Abstract
A discrete three-dimensional (3-D) processor comprises a plurality of storage-processing units (SPU's), each of the SPU's comprising a non-memory circuit, at least a memory array and at least an off-die peripheral-circuit component thereof. The 3-D processor further comprises first and second dice. The first die comprises the memory arrays, whereas the second die comprises the non-memory circuit and the off-die peripheral-circuit component.
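The partition described in the abstract can be sketched as a small data model. This is only an illustrative sketch: the names `Die`, `Block`, `SPU`, `make_spu`, and `follows_partition` are hypothetical and do not appear in the patent, and the count of four SPUs is an arbitrary assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Die(Enum):
    FIRST = "first die"    # carries the memory arrays
    SECOND = "second die"  # carries non-memory circuits and off-die peripherals


@dataclass(frozen=True)
class Block:
    name: str
    die: Die


@dataclass(frozen=True)
class SPU:
    memory_arrays: tuple         # at least one memory array
    off_die_peripherals: tuple   # at least one off-die peripheral-circuit component
    non_memory_circuit: Block


def make_spu(index):
    """Build one hypothetical SPU following the partition in the abstract."""
    return SPU(
        memory_arrays=(Block(f"array_{index}", Die.FIRST),),
        off_die_peripherals=(Block(f"sense_amp_{index}", Die.SECOND),),
        non_memory_circuit=Block(f"logic_{index}", Die.SECOND),
    )


def follows_partition(spu):
    """Check that memory arrays sit on the first die and everything else on the second."""
    return (
        len(spu.memory_arrays) >= 1
        and len(spu.off_die_peripherals) >= 1
        and all(b.die is Die.FIRST for b in spu.memory_arrays)
        and all(b.die is Die.SECOND for b in spu.off_die_peripherals)
        and spu.non_memory_circuit.die is Die.SECOND
    )


# Assumed processor with four SPUs; the real number is not stated in the abstract.
processor = [make_spu(i) for i in range(4)]
print(all(follows_partition(s) for s in processor))
```

The model captures only which die each block belongs to; it says nothing about the inter-die connections or physical layout.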
Inventors
- Guobiao Zhang
Assignees
- HONG KONG HAICUN TECHNOLOGY CO., LIMITED
Dates
- Publication Date: 2026-05-12
- Application Date: 2023-08-04
- Priority Date: 2018-12-10
Claims (20)
- 1. A discrete three-dimensional (3-D) processor, comprising: a plurality of storage-processing units (SPU's), wherein each of said plurality of SPU's comprises a non-memory circuit, at least a memory array and an off-die peripheral-circuit component thereof; a first die including 3-D structures, further comprising the memory arrays of said plurality of SPU's; a second die including only 2-D circuits, further comprising the non-memory circuits and the off-die peripheral-circuit components of said plurality of SPU's; a plurality of inter-die connections between said first and second dice communicatively coupling the memory arrays with the off-die peripheral-circuit components and the non-memory circuits of said plurality of SPU's; wherein said non-memory circuit is not a part of any memory; and, said off-die peripheral-circuit component is not an input/output (I/O) circuit.
- 2. The 3-D processor according to claim 1, wherein said off-die peripheral-circuit component is a non-I/O circuit selected from a group consisting of an address decoder, a sense amplifier, a programming circuit, and a charge-pump circuit.
- 3. The 3-D processor according to claim 1, wherein said first and second dice are vertically stacked and have a same die size.
- 4. The 3-D processor according to claim 1, wherein said first and second dice are vertically stacked; and, all edges of said first and second dice are aligned.
- 5. The 3-D processor according to claim 1, wherein said first and second dice are vertically stacked; and, a relative placement between said memory array and said non-memory circuit within said each of said plurality of SPU's is the same for said plurality of SPU's.
- 6. The 3-D processor according to claim 1, wherein said first and second dice are vertically stacked; said each of said plurality of SPU's occupies a first area on said first die and a second area on said second die; and, said first and second areas are vertically aligned.
- 7. The 3-D processor according to claim 1, wherein said first die has a first back-end-of-line structure (BEOL); said second die has a second BEOL; and, said first and second BEOL's are different.
- 8. The 3-D processor according to claim 7, wherein said first BEOL has a first thickness; said second BEOL has a second thickness; and, said first thickness is larger than said second thickness.
- 9. The 3-D processor according to claim 7, wherein said first BEOL has a first number of layers; said second BEOL has a second number of layers; and, said first number of layers is larger than said second number of layers.
- 10. The 3-D processor according to claim 7, wherein an in-die peripheral-circuit component of said memory array is disposed under said memory array in said first die; and, said in-die peripheral-circuit component has a third BEOL.
- 11. The 3-D processor according to claim 10, wherein said third BEOL has a third number of layers; said second BEOL has a second number of layers; and, said third number of layers is smaller than said second number of layers.
- 12. The 3-D processor according to claim 10, wherein said third BEOL comprises a third interconnect material; said second BEOL comprises a second interconnect material; and, said third interconnect material has a higher resistivity than said second interconnect material.
- 13. The 3-D processor according to claim 2, wherein said first and second dice are vertically stacked and have a same die size.
- 14. The 3-D processor according to claim 2, wherein said first and second dice are vertically stacked; and, all edges of said first and second dice are aligned.
- 15. The 3-D processor according to claim 2, wherein said first and second dice are vertically stacked; and, a relative placement between said memory array and said non-memory circuit within said each of said plurality of SPU's is the same for said plurality of SPU's.
- 16. The 3-D processor according to claim 2, wherein said first and second dice are vertically stacked; said each of said plurality of SPU's occupies a first area on said first die and a second area on said second die; and, said first and second areas are vertically aligned.
- 17. The 3-D processor according to claim 2, wherein: said first die has a first back-end-of-line structure (BEOL); said second die has a second BEOL; and, said first and second BEOL's are different.
- 18. The 3-D processor according to claim 17, wherein said first BEOL has a first thickness; said second BEOL has a second thickness; and, said first thickness is larger than said second thickness.
- 19. The 3-D processor according to claim 17, wherein said first BEOL has a first number of layers; said second BEOL has a second number of layers; and, said first number of layers is larger than said second number of layers.
- 20. The 3-D processor according to claim 17, wherein an in-die peripheral-circuit component of said memory array is disposed under said memory array in said first die; and, said in-die peripheral-circuit component has a third BEOL.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. patent application Ser. No. 18/117,472, filed Mar. 5, 2023, which is a continuation of U.S. patent application Ser. No. 17/964,888, filed Oct. 12, 2022, now U.S. Pat. No. 11,695,001, issued Jul. 4, 2023, which is a division of U.S. patent application Ser. No. 16/249,021, filed Jan. 16, 2019, now U.S. Pat. No. 11,527,523, issued Dec. 13, 2022.
This application is also a continuation of U.S. patent application Ser. No. 18/096,013, filed Jan. 12, 2023, which is a division of U.S. patent application Ser. No. 17/467,436, filed Sep. 6, 2021, which is a continuation-in-part of U.S. patent application Ser. No. 16/249,021, filed Jan. 16, 2019, now U.S. Pat. No. 11,527,523, issued Dec. 13, 2022.
U.S. Pat. No. 11,527,523 claims priority from the following Chinese patent applications:
1) Chinese Patent Application No. 201811506212.1, filed Dec. 10, 2018;
2) Chinese Patent Application No. 201811508130.0, filed Dec. 11, 2018;
3) Chinese Patent Application No. 201811520357.7, filed Dec. 12, 2018;
4) Chinese Patent Application No. 201811527885.5, filed Dec. 13, 2018;
5) Chinese Patent Application No. 201811527911.4, filed Dec. 13, 2018;
6) Chinese Patent Application No. 201811528014.5, filed Dec. 14, 2018;
7) Chinese Patent Application No. 201811546476.X, filed Dec. 15, 2018;
8) Chinese Patent Application No. 201811546592.1, filed Dec. 15, 2018;
9) Chinese Patent Application No. 201910002944.5, filed Jan. 2, 2019;
10) Chinese Patent Application No. 201910029523.1, filed Jan. 13, 2019,
in the State Intellectual Property Office of the People's Republic of China (CN), the disclosures of which are incorporated herein by reference in their entireties.
BACKGROUND
1. Technical Field of the Invention
The present invention relates to the field of integrated circuits, and more particularly to a processor.
2. Prior Art
Processors (including CPU, GPU, FPGA, and others) are extensively used in mathematical computation, computer simulation, configurable gate arrays, pattern processing and neural networks. A conventional processor is based on two-dimensional (2-D) integration, i.e. its processing circuit (e.g. arithmetic logic unit, control unit) and memory circuit (internal memory, including RAM for cache and ROM for look-up table) are disposed on a same plane, i.e. the top surface of a semiconductor substrate. Because arithmetic logic operations are its primary function, the processor die contains a limited amount of internal memory.
The conventional computer is based on the von Neumann architecture, where processor and memory are physically separated. Most memory takes the form of external memory (e.g. main memory, secondary memory). When a processor requests a large amount of data, it fetches the data from an external memory. Because the processor and the external memory are distant and the system bus between them has a relatively narrow width, data transfer between them has a limited bandwidth. As the amount of data increases, the conventional processor and its associated von Neumann architecture become inefficient.
The following paragraphs provide an overview of the fields of application of the conventional processors and their limitations.
[A] Mathematical Computing
One important application of processors is mathematical computing, including computing of mathematical functions and mathematical models. For mathematical computing, the conventional processors use logic-based computation (LBC), which carries out computation primarily with processing circuits (generally known as arithmetic logic units, or ALU's). In fact, the arithmetic operations that can be directly implemented by the ALU consist of addition, subtraction and multiplication. These arithmetic operations are collectively referred to as basic arithmetic operations.
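As an illustration of the basic arithmetic operations named above, a polynomial can be evaluated using nothing but addition and multiplication (subtraction enters through negative coefficients). The following is a minimal sketch in Python using Horner's rule; the function name `horner` and the example coefficients are assumptions chosen for illustration and are not taken from the patent.

```python
def horner(coefficients, x):
    """Evaluate a polynomial at x using only additions and multiplications.

    `coefficients` are ordered from the highest power down to the constant term.
    """
    result = 0
    for c in coefficients:
        # One multiplication and one addition per coefficient.
        result = result * x + c
    return result


# p(x) = 2x^3 - 3x + 1, evaluated at x = 2
print(horner([2, 0, -3, 1], 2))  # prints 11
```

Each loop iteration maps onto one multiply and one add, which is the kind of work the text attributes to an ALU.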
The ALU's are suitable for arithmetic functions, but not for non-arithmetic functions. From the standpoint of a processor computing mathematical functions, an arithmetic function is a mathematical function which can be represented by a combination of the processor's basic arithmetic operations, whereas a non-arithmetic function is a mathematical function which cannot be represented by such a combination. Exemplary non-arithmetic functions include transcendental functions and special functions. Because it includes more operations than the arithmetic operations provided by the ALU's, a non-arithmetic function cannot be implemented by the ALU's alone. The hardware implementation of the non-arithmetic functions has been a major challenge.
For the conventional processors, only a few basic functions (i.e. single-variable non-arithmetic functions, e.g. basic algebraic functions and basic transcendental functions) are implemented by hardware, and they are referred to as built-in functions. These built-in functions are realized by a combination of processing circuits and look-up tables (LUT). In prior art, there are many ways to implement built