CN-121996299-A - Instruction issue processing device and method for an in-order multi-issue processor

Abstract

The invention discloses an instruction issue processing device and method for an in-order multi-issue processor. The device comprises an instruction fetch unit, an instruction decoding module, an instruction issue validity arbitration module, an issue queue module and an execution unit. The instruction fetch unit reads an instruction block from an instruction memory and pre-decodes it to form an issuable instruction block. The instruction decoding module decodes the instruction block received from the instruction fetch unit to obtain information about each instruction to be processed. The instruction issue validity arbitration module adopts a fully associative storage structure and comprises a plurality of entries, each of which stores state information of instructions. The issue queue module is connected to the instruction issue validity arbitration module and buffers instructions that have passed arbitration and are waiting to be issued. The execution unit is connected to the issue queue module, executes the instructions, and writes execution results back to the corresponding entries of the instruction issue validity arbitration module.

Inventors

  • LI DONGSHENG
  • ZHANG XIRAN
  • WU YE
  • LI WEI

Assignees

  • 南京英麒智能科技有限公司

Dates

Publication Date
20260508
Application Date
20251230

Claims (9)

  1. An instruction issue processing apparatus of an in-order multi-issue processor, comprising: an instruction fetch unit for reading an instruction block from an instruction memory and pre-decoding the instruction block to identify and handle instructions that cannot be issued back-to-back in the same cycle, so as to form an issuable instruction block; an instruction decoding module for decoding the instruction block received from the instruction fetch unit to obtain category information, source and destination register information, and execution latency information of each instruction to be processed; an instruction issue validity arbitration module that adopts a fully associative storage structure and comprises a plurality of entries, wherein state information of instructions is stored in each entry and the entries are directly indexed by register address; an issue queue module connected to the instruction issue validity arbitration module and used for buffering instructions that have passed arbitration and are waiting to be issued; and an execution unit connected to the issue queue module and used for executing instructions and writing execution results back to the corresponding entries of the instruction issue validity arbitration module; wherein the instruction decoding module sends the register address to be accessed by an instruction to be processed to the instruction issue validity arbitration module, and the instruction issue validity arbitration module accesses the corresponding entry according to the register address, determines from the state information of the preceding instruction stored in the entry whether the current instruction has a data hazard or resource conflict, and, if it has none, writes the current instruction's information into the entry and sends the current instruction to the issue queue module.
  2. The apparatus of claim 1, wherein each entry of the instruction issue validity arbitration module comprises N independent sub-slots that can be accessed in parallel, N being equal to the designed instruction issue width, each sub-slot independently storing state information for one instruction, the state information including at least an instruction execution function, a fixed-cycle execution count, an indefinite-cycle execution flag, and write-back data.
  3. The apparatus of claim 1, wherein the instruction issue validity arbitration module determines that the current instruction has no data hazard or resource conflict when the entry indexed by the register address to be accessed by the instruction to be processed is empty, and otherwise: when the execution cycle count of the preceding instruction is not fixed, determines whether the preceding instruction has finished executing by scanning the set state of the corresponding indefinite-cycle execution flag in the entry, and determines that the current instruction has no data hazard or resource conflict once completion is confirmed; and when the execution cycle count of the preceding instruction is fixed: if the operation to be executed by the current instruction is a read operation, determines that the current instruction has no data hazard or resource conflict when the remaining execution cycles of the preceding instruction are less than or equal to the read operation latency of the current instruction or the preceding instruction has finished executing; and if the operation to be executed by the current instruction is a write operation, determines that the current instruction has no data hazard or resource conflict when the remaining execution cycles of the preceding instruction are less than or equal to the write operation latency of the current instruction or the preceding instruction has finished executing.
  4. The apparatus of claim 1, wherein result data is forwarded when it is determined, based on the preceding-instruction state information stored in the entry of the instruction issue validity arbitration module, that the preceding instruction has finished executing but its result data has not yet been written back.
  5. The apparatus of claim 1, wherein the pre-decode processing by the instruction fetch unit includes identifying branch instructions or multi-cycle compute instructions in the instruction block and ensuring that the two instructions are sent to the instruction decoding module in different cycles.
  6. A method for instruction issue processing of an in-order multi-issue processor, comprising the steps of: S1, an instruction decoding module decodes an instruction block received from an instruction fetch unit and extracts category information, source and destination register information, and execution latency information of each instruction to be processed; S2, using the destination register address of the instruction to be processed as an index, the corresponding entry in the instruction issue validity arbitration module is accessed, and if the entry is empty, S4 is executed, otherwise S3 is executed; S3, the state information of the preceding instruction in the entry is read and it is determined whether the execution cycle count of the preceding instruction is fixed; if it is not fixed, the current instruction waits until the preceding instruction has finished executing, and S4 is then executed; if it is fixed, whether the current instruction has a data hazard or resource conflict is determined according to the type of operation the current instruction is to execute, and S4 is executed when no data hazard or resource conflict exists; if either condition is not met, the current instruction is stalled until the condition is met; S4, the current instruction's information is written into the entry and the instruction is sent to the issue queue module; S5, the issue queue module issues the instructions to the execution unit in order for execution; and S6, after the execution unit completes execution, the execution result is written back to the corresponding entry of the instruction issue validity arbitration module.
  7. The method according to claim 6, wherein in step S3, when the execution cycle count of the preceding instruction is fixed, the following sub-steps are performed: if the operation to be executed by the current instruction is a read operation, it is determined that the current instruction has no data hazard or resource conflict when the remaining execution cycles of the preceding instruction are less than or equal to the read operation latency of the current instruction or the preceding instruction has finished executing; and if the operation to be executed by the current instruction is a write operation, it is determined that the current instruction has no data hazard or resource conflict when the remaining execution cycles of the preceding instruction are less than or equal to the write operation latency of the current instruction or the preceding instruction has finished executing.
  8. The method of claim 6, wherein in step S3, when the execution cycle count of the preceding instruction is not fixed, all indefinite-cycle execution flags in the entry are scanned; if any flag is set, indicating that the corresponding indefinite-cycle instruction has not yet finished executing, the current instruction must be stalled, and step S4 can be executed only when all the relevant flag bits in the entry are cleared.
  9. The method according to claim 6, further comprising a data forwarding determination step between step S4 and step S5, wherein when the entry shows that the execution result data of the preceding instruction is ready but has not yet been written into the register file, the result data of the preceding instruction is forwarded directly to the execution unit of the current instruction.
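
The entry layout of claim 2 and the issue check of claims 3, 7 and 8 can be illustrated with a minimal Python sketch. It is not the patented circuit: the names `SubSlot`, `Entry`, `can_issue` and `tick`, the `valid` bit, and the issue width of 2 are all assumptions made for illustration, and the sketch models only a single entry.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ISSUE_WIDTH = 2  # N in claim 2; the value 2 is an assumption for illustration


@dataclass
class SubSlot:
    """State kept for one in-flight instruction (field names are hypothetical)."""
    function: str = ""               # instruction execution function
    cycles_left: int = 0             # remaining fixed-cycle execution count
    indefinite_busy: bool = False    # indefinite-cycle execution flag (set = still running)
    writeback: Optional[int] = None  # write-back data, once available
    valid: bool = False              # assumed occupancy bit, not named in the claims


@dataclass
class Entry:
    """One arbitration entry; in the claims it is indexed directly by register address."""
    slots: List[SubSlot] = field(
        default_factory=lambda: [SubSlot() for _ in range(ISSUE_WIDTH)]
    )

    def is_empty(self) -> bool:
        return not any(s.valid for s in self.slots)


def can_issue(entry: Entry, op_latency: int) -> bool:
    """Return True when the current instruction has no hazard on this entry.

    op_latency is the current instruction's read or write operation latency,
    depending on whether it reads or writes the register.
    """
    if entry.is_empty():
        return True                  # empty entry: no preceding instruction to wait for
    for s in entry.slots:
        if not s.valid:
            continue
        if s.indefinite_busy:
            return False             # an indefinite-cycle predecessor is still running
        if s.cycles_left > op_latency:
            return False             # a fixed-cycle predecessor would finish too late
    return True


def tick(entry: Entry) -> None:
    """Advance one clock cycle: count down fixed-cycle predecessors."""
    for s in entry.slots:
        if s.valid and not s.indefinite_busy and s.cycles_left > 0:
            s.cycles_left -= 1
```

For example, with a three-cycle predecessor recorded in an entry, an instruction whose operation latency is one cycle is held back until `tick` has been called twice, at which point the remaining count equals the latency and `can_issue` returns True.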

Description

Instruction issue processing device and method for an in-order multi-issue processor

Technical Field

The invention relates to in-order multi-issue processors, and in particular to an instruction issue processing device and method for an in-order multi-issue processor.

Background

In-order multi-issue processors are widely used in edge computing and terminal devices, where the design must balance performance, power consumption and area (PPA). Traditional in-order single-issue processors offer insufficient performance, while out-of-order multi-issue processors incur excessive hardware overhead and are unsuitable for embedded scenarios. The in-order multi-issue architecture is therefore a compromise choice.

In the prior art, Chinese patent CN115576610A discloses an instruction dispatch processing method and apparatus suitable for a general-purpose in-order issue processor, in which the front end of the instruction dispatch apparatus is connected to the instruction fetch unit of the processor and the back end is connected to the execution unit of the processor. The instruction dispatch apparatus comprises an instruction decoding module, an instruction execution state tracking module, an operand queue module, a write-back data forwarding module, a data hazard and resource conflict detection module, and an instruction issue module. The front end of the instruction decoding module is connected to the instruction fetch unit, and its back end is connected to the instruction execution state tracking module and the operand queue module respectively. The instruction execution state tracking module is connected to the write-back data forwarding module and to the data hazard and resource conflict detection module, the operand queue module is connected to the instruction issue module, and the instruction issue module and the write-back data forwarding module are each connected to the execution unit. That invention is stated to significantly improve instruction scheduling efficiency and processor execution efficiency and to reduce the instruction scheduling overhead of the compiler.

The technical problems of that scheme are as follows:

  • Structural redundancy: the state table is stored separately from the operand queue, which duplicates information and adds consistency maintenance overhead.
  • Low indexing efficiency: the state table must be accessed indirectly through the operand queue, so the critical path delay is large.
  • Coarse handling of indefinite-cycle instructions: fixed-cycle and indefinite-cycle instructions are not distinguished, and a conservative waiting strategy is applied to fixed-cycle instructions as well, which creates pipeline bubbles.
  • Lack of front-end filtering: the instruction fetch unit does not preprocess instruction combinations that cannot be issued in parallel, so the back-end arbitration module processes invalid instruction combinations and wastes hardware resources.

These problems lead to a large hardware area, difficult timing closure and limited dynamic execution efficiency, and make it hard to meet the demand of edge computing scenarios for an extreme energy efficiency ratio.
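
The kind of front-end filtering whose absence is criticized above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the patented pre-decode logic: the function name `split_block`, the kind labels `"branch"` and `"multicycle"`, and the rule that at most one such instruction may sit in an issuable block are all hypothetical readings of claim 5.

```python
from typing import List

# Instruction kinds named in claim 5; the string labels are hypothetical.
RESTRICTED = {"branch", "multicycle"}


def split_block(kinds: List[str]) -> List[List[str]]:
    """Split a fetched block into issuable blocks, preserving program order.

    Assumption: at most one restricted instruction (branch or multi-cycle)
    may sit in any one issuable block, so two of them never reach the
    decoder in the same cycle.
    """
    blocks: List[List[str]] = []
    current: List[str] = []
    for kind in kinds:
        if kind in RESTRICTED and any(k in RESTRICTED for k in current):
            blocks.append(current)  # close the block before a second restricted instruction
            current = []
        current.append(kind)
    if current:
        blocks.append(current)
    return blocks
```

Under these assumptions, a fetched block of the kinds `["alu", "branch", "multicycle", "alu"]` is split into `["alu", "branch"]` and `["multicycle", "alu"]`, so the back-end arbitration never sees the disallowed pair in one cycle.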
Disclosure of Invention

The invention aims to provide an instruction issue processing device and method for an in-order multi-issue processor that have a small hardware area and high processing efficiency and that handle instructions differently according to the type of the preceding instruction, so as to avoid pipeline stalls.

The technical scheme is as follows. The instruction issue processing device of the in-order multi-issue processor comprises: an instruction fetch unit for reading an instruction block from an instruction memory and pre-decoding the instruction block to identify and handle instructions that cannot be issued back-to-back in the same cycle, so as to form an issuable instruction block; an instruction decoding module for decoding the instruction block received from the instruction fetch unit to obtain category information, source and destination register information, and execution latency information of each instruction to be processed; an instruction issue validity arbitration module that adopts a fully associative storage structure and comprises a plurality of entries, wherein state information of instructions is stored in each entry and the entries are directly indexed by register address; an issue queue module connected to the instruction issue validity arbitration module and used for buffering instructions that have passed arbitration and are waiting to be issued; an execution unit connected to the issue queue module and used for executing the instruction and writing the execution result back to the correspon