CN-122029516-A - Processing unit employing micro-op Random Access Memory (RAM) as main program memory
Abstract
Aspects disclosed in the detailed description include a Processing Unit (PU) that employs a micro-op Random Access Memory (RAM) as main program memory. The micro-op RAM includes row circuits, each associated with a micro-op and configured to store control signal parameters, and output ports configured to be coupled to a register file and one or more execution units. In contrast to a conventional PU, which fetches and decodes Instruction Set Architecture (ISA) instructions, the processing unit loads a main program comprising a plurality of micro-ops into the row circuits of the micro-op RAM. When executing the individual micro-ops of the main program, the processing unit activates the row circuits in the micro-op RAM so that the control signal parameters stored in the row circuits are transferred through the output ports of the micro-op RAM to the register file and/or the one or more execution units. This avoids the need for decode stage circuits, thereby advantageously reducing processing latency.
Inventors
- BORKAR SHEKHAR Y
- N. Y. Berkal
- R. Kahn
Assignees
- Qualcomm Incorporated (高通股份有限公司)
Dates
- Publication Date: 2026-05-12
- Application Date: 2024-09-12
- Priority Date: 2023-10-20
Claims (20)
- 1. An apparatus comprising: a processing unit, the processing unit comprising: a micro-op Random Access Memory (RAM) comprising a plurality of row circuits configured to store a plurality of register control signal parameters corresponding to register micro-ops to be processed and a plurality of execution control signal parameters corresponding to execution micro-ops to be executed; a register file; and an execution circuit; wherein the processing unit is configured to: activate a first row circuit of the plurality of row circuits to couple the plurality of register control signal parameters in the first row circuit to the register file to cause one or more operands to be provided to the execution circuit; and activate a second row circuit of the plurality of row circuits to couple the plurality of execution control signal parameters in the second row circuit to the execution circuit to select an operation of the execution circuit; and wherein the execution circuit is configured to perform the operation based on the one or more operands and the plurality of execution control signal parameters.
- 2. The apparatus of claim 1, wherein the first row circuit and the second row circuit are disposed in a common row circuit of the plurality of row circuits.
- 3. The apparatus of claim 1, wherein: the register file further comprises a register input port; the execution circuit further comprises an execution circuit input port; the micro-op RAM further comprises: a plurality of first output ports coupled to the register input port; and a plurality of second output ports coupled to the execution circuit input port; the first row circuit is coupled to the plurality of first output ports; and the second row circuit is coupled to the plurality of second output ports.
- 4. The apparatus of claim 3, wherein the register file further comprises a register output port, and wherein the one or more operands are transferred from the register file to the execution circuit through the register output port.
- 5. The apparatus of claim 1, further comprising a memory storing a program including a plurality of micro-ops, wherein the processing unit is further configured to load the plurality of micro-ops into the plurality of row circuits in the micro-op RAM.
- 6. The apparatus of claim 1, wherein the micro-op RAM is directly coupled to the register file.
- 7. The apparatus of claim 1, wherein the micro-op RAM is directly coupled to the execution circuit.
- 8. The apparatus of claim 1, wherein the processing unit is further configured to activate the first row circuit in a clock cycle.
- 9. The apparatus of claim 1, integrated into an Integrated Circuit (IC).
- 10. The apparatus of claim 1 integrated into a device selected from the group consisting of a set top box, an entertainment unit, a navigation device, a communications device, a fixed location data unit, a mobile location data unit, a Global Positioning System (GPS) device, a mobile phone, a cellular phone, a smart phone, a Session Initiation Protocol (SIP) phone, a tablet, a server, a computer, a portable computer, a mobile computing device, a wearable computing device, a desktop computer, a Personal Digital Assistant (PDA), a monitor, a computer monitor, a television, a tuner, a radio, a satellite radio, a music player, a digital music player, a portable music player, a digital video player, a Digital Video Disc (DVD) player, a portable digital video player, an automobile, a vehicle component, an avionics system, and a multi-rotor aircraft.
- 11. A method of operating a processing unit employing a micro-op Random Access Memory (RAM), the method comprising: providing the micro-op RAM, wherein the micro-op RAM comprises a plurality of row circuits configured to store a plurality of register control signal parameters corresponding to register micro-ops to be processed and a plurality of execution control signal parameters corresponding to execution micro-ops to be executed; activating a first row circuit of the plurality of row circuits to couple the plurality of register control signal parameters in the first row circuit to a register file to cause one or more operands to be provided to an execution circuit; activating a second row circuit of the plurality of row circuits to couple the plurality of execution control signal parameters in the second row circuit to the execution circuit to select an operation of the execution circuit; and performing the operation based on the one or more operands and the plurality of execution control signal parameters.
- 12. The method of claim 11, wherein the first row circuit and the second row circuit are disposed in a common row circuit of the plurality of row circuits.
- 13. The method of claim 11, wherein: the register file further comprises a register input port; the execution circuit further comprises an execution circuit input port; the micro-op RAM further comprises: a plurality of first output ports coupled to the register input port; and a plurality of second output ports coupled to the execution circuit input port; the first row circuit is coupled to the plurality of first output ports; and the second row circuit is coupled to the plurality of second output ports.
- 14. The method of claim 13, wherein the register file further comprises a register output port, and wherein activating the first row circuit further comprises communicating the one or more operands from the register file to the execution circuit through the register output port.
- 15. The method of claim 11, further comprising: storing a plurality of micro-ops in a memory; and loading the plurality of micro-ops into the plurality of row circuits in the micro-op RAM.
- 16. The method of claim 11, further comprising directly coupling the micro-op RAM to the register file.
- 17. The method of claim 11, further comprising directly coupling the micro-op RAM to the execution circuit.
- 18. The method of claim 11, wherein activating the first row circuit further comprises activating the first row circuit in a clock cycle.
- 19. The method of claim 12, wherein: activating the first row circuit further comprises activating the first row circuit in a clock cycle; and activating the second row circuit further comprises activating the second row circuit in the clock cycle.
- 20. An apparatus comprising: a processing unit, the processing unit comprising: a micro-op Random Access Memory (RAM) comprising a plurality of row circuits configured to store a plurality of register control signal parameters corresponding to register micro-ops to be processed and a plurality of execution control signal parameters corresponding to execution micro-ops to be executed; a register file; an execution circuit; means for activating a first row circuit of the plurality of row circuits to couple the plurality of register control signal parameters in the first row circuit to the register file to cause one or more operands to be provided to the execution circuit; means for activating a second row circuit of the plurality of row circuits to couple the plurality of execution control signal parameters in the second row circuit to the execution circuit to select an operation of the execution circuit; and means for performing the operation based on the one or more operands and the plurality of execution control signal parameters.
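The row-activation behavior recited in claims 1, 2, and 5 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the names `RegisterControl`, `ExecutionControl`, `RowCircuit`, and `ProcessingUnit`, the three example operations, and the write-back register field are all hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RegisterControl:
    """Register control signal parameters (assumed here: two read addresses)."""
    read_a: int
    read_b: int


@dataclass(frozen=True)
class ExecutionControl:
    """Execution control signal parameters (assumed: operation select, write-back register)."""
    op_select: str
    write_reg: int


@dataclass(frozen=True)
class RowCircuit:
    """One row of the micro-op RAM; a row holding both fields models the common row of claim 2."""
    register: Optional[RegisterControl] = None
    execution: Optional[ExecutionControl] = None


# Hypothetical operations the execution circuit can be told to perform.
OPERATIONS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
}


class ProcessingUnit:
    def __init__(self, num_registers: int = 8) -> None:
        self.register_file: List[int] = [0] * num_registers
        self.micro_op_ram: List[RowCircuit] = []
        self._operands: Tuple[int, int] = (0, 0)

    def load_program(self, rows: List[RowCircuit]) -> None:
        """Load the program's micro-ops into the row circuits (claims 5 and 15)."""
        self.micro_op_ram = list(rows)

    def activate_row(self, index: int) -> None:
        """Drive the stored parameters straight to their consumers; no opcode is decoded."""
        row = self.micro_op_ram[index]
        if row.register is not None:
            # Register parameters cause the register file to supply the operands.
            self._operands = (
                self.register_file[row.register.read_a],
                self.register_file[row.register.read_b],
            )
        if row.execution is not None:
            # Execution parameters select the operation applied to those operands.
            a, b = self._operands
            result = OPERATIONS[row.execution.op_select](a, b)
            self.register_file[row.execution.write_reg] = result

    def run(self) -> None:
        for index in range(len(self.micro_op_ram)):
            self.activate_row(index)


pu = ProcessingUnit()
pu.register_file[1], pu.register_file[2] = 5, 7
pu.load_program([
    RowCircuit(register=RegisterControl(read_a=1, read_b=2)),      # first row circuit
    RowCircuit(execution=ExecutionControl("add", write_reg=3)),    # second row circuit
    RowCircuit(register=RegisterControl(read_a=3, read_b=1),       # common row (claim 2)
               execution=ExecutionControl("sub", write_reg=4)),
])
pu.run()
print(pu.register_file[3], pu.register_file[4])  # prints "12 7"
```

In this model the first two rows correspond to separate first and second row circuits, while the third row stores both parameter sets and stands in for the common-row arrangement of claims 2 and 19. The model does not attempt to represent timing, ports, or any other circuit-level detail.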
Description
Processing unit employing micro-op Random Access Memory (RAM) as main program memory
Priority application
The present application claims priority from U.S. patent application Ser. No. 18/491,455, entitled "PROCESSING UNIT EMPLOYING MICRO-OPERATIONS (MICRO-OPS) RANDOM ACCESS MEMORY (RAM) AS MAIN PROGRAM MEMORY," filed October 20, 2023, which is hereby incorporated by reference in its entirety.
Technical Field
The technology of the present disclosure relates generally to computer microarchitectures.
Background
Microprocessors, also known as Processing Units (PUs), perform computing tasks in a wide variety of applications. One type of conventional microprocessor or PU is a Central Processing Unit (CPU). Another type of microprocessor or PU is a specialized processing unit called a Graphics Processing Unit (GPU). GPUs are designed with specialized hardware to accelerate the rendering of graphics and video data to be displayed. The GPU may be implemented as an integrated component of a general purpose CPU or as a discrete hardware component separate from the CPU.
The PU executes software instructions stored in a memory system that includes external memory and an instruction cache. The software instructions instruct the processor to fetch data from locations in the memory and use the fetched data to perform one or more processor operations. The results may then be stored in memory. Instructions defined by modern Instruction Set Architectures (ISAs), such as RISC-V, Intel® x86, and Arm® v8, are examples of software instructions that program PUs. Higher-level languages such as C/C++ are used by programmers and automation tools to operate in a more abstract programming environment. Programs written in higher-level languages are compiled and linked into ISA instructions to be run on the PU. The PU includes a series of pipeline stage circuits.
Modern PUs have pipelines of various depths to handle programs that include ISA instructions stored in memory, including an instruction cache. Typical pipeline stages include fetching ISA instructions from a memory that includes an instruction cache for storing recently used ISA instructions, decoding the ISA instructions, reading input registers from a register file, executing the decoded ISA instructions with the values read from the registers, and writing the results of the executed ISA instructions to the register file or memory. The decode pipeline stage includes combinational and sequential logic circuitry for decoding ISA instructions into hundreds of bits of control settings, known as micro-operations (micro-ops). The micro-ops are used to control data movement and the operation of subsequent pipeline stages.
The decode pipeline stage is costly in terms of latency because it decodes the ISA instructions of the program in real time. To optimize the decode pipeline stage, some PUs include a decode pipeline stage with Read Only Memory (ROM) that pre-decodes some of the ISA instructions into control signals by mapping ISA opcodes to a set of control signals. However, such optimization is limited to a specific ISA, which limits the flexibility of the hardware of the PU. Moreover, the pre-decoded instructions need to be retrieved through an expensive look-up table.
To optimize clock cycles in the pipeline, the pipeline stages may analyze a window of instructions in the pipeline to find dependencies between instructions in the window, utilize temporary registers, and arrange the timing for accessing the registers. This optimization, which increases the throughput of instructions in the window, comes at the cost of the clock cycles required for the analysis.
Disclosure of Invention
Aspects disclosed in the detailed description include a processing unit employing micro-op Random Access Memory (RAM) as a main program memory.
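For contrast with the disclosed approach, the conventional decode step described in the Background, including the ROM-style mapping from opcodes to control signals, can be sketched roughly as follows. The 16-bit instruction layout, the `OPCODE_ROM` table, and the `ControlSignals` fields are invented for illustration and do not correspond to RISC-V, x86, Arm, or any other real ISA.

```python
from typing import Dict, NamedTuple


class ControlSignals(NamedTuple):
    """A tiny stand-in for the control settings a decode stage produces."""
    read_a: int
    read_b: int
    op_select: str
    write_reg: int


# Hypothetical encoding: bits [15:12] opcode, [11:8] rd, [7:4] rs1, [3:0] rs2.
# The table plays the role of the pre-decode ROM lookup mentioned in the Background.
OPCODE_ROM: Dict[int, str] = {0x0: "add", 0x1: "sub", 0x2: "and"}


def decode(instruction: int) -> ControlSignals:
    """Extract fields and look up the opcode; this work recurs for every fetched instruction."""
    opcode = (instruction >> 12) & 0xF
    rd = (instruction >> 8) & 0xF
    rs1 = (instruction >> 4) & 0xF
    rs2 = instruction & 0xF
    return ControlSignals(read_a=rs1, read_b=rs2,
                          op_select=OPCODE_ROM[opcode], write_reg=rd)


print(decode(0x0312))
# prints "ControlSignals(read_a=1, read_b=2, op_select='add', write_reg=3)"
```

A micro-op RAM as described in this disclosure would hold the equivalent of the `ControlSignals` values directly in its rows, so the field extraction and table lookup shown here would not be performed at execution time.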
Decoding circuitry creates latency problems by decoding instructions in an instruction pipeline in real time according to an Instruction Set Architecture (ISA). The micro-op RAM includes a plurality of row circuits, each associated with one micro-op and configured to store control signal parameters, and a plurality of output ports configured to be coupled to a register file and one or more execution circuits. In contrast to a conventional processing unit, which fetches ISA instructions, the processing unit loads a main program comprising a plurality of micro-ops into the row circuits of the micro-op RAM. When executing the individual micro-ops of the main program, the processing unit activates a row circuit in the micro-op RAM so that its stored control signal parameters are transferred through the output ports of the micro-op RAM to the register file and/or the one or more execution circuits. This avoids the need for decode stage circuitry, thereby advantageously reducing processing latency. Furthermore, since the micro-op RAM does not store instructions from the ISA, the processing unit does not utilize conventional instruction ca