CN-122018992-A - Micro-operation result merging and buffering device, processor, chip, equipment and method
Abstract
The application relates to the technical field of processors and provides a micro-operation result merging and buffering device, a processor, a chip, equipment and a method. The device is arranged on the write-back pipeline of a processor and comprises a buffer management unit and a write-back control unit. The buffer management unit stores dynamically allocated buffer entries. When a micro-operation result enters the write-back pipeline, the write-back control unit matches or allocates a buffer entry according to the target register identifier carried by the result, merges the result into the result field of that entry, and updates the entry's progress field. When the progress field indicates that all intermediate results for the corresponding target register have been collected, the write-back control unit writes the complete result from the result field into the corresponding target register in the physical register file. Embodiments of the application can reduce the hardware area and timing overhead of the processor and preserve its out-of-order execution capability while supporting whole-register write-back of micro-operation results.
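The merge-then-write-back flow described above can be illustrated with a minimal behavioral sketch. It is not the claimed hardware: the 16-byte register width, the names `BufferEntry`, `MergeBuffer` and `write_back`, the byte-offset interface, and the dictionary standing in for the physical register file are all assumptions made for illustration, and each byte of a register is assumed to be written exactly once.

```python
from dataclasses import dataclass, field

REG_BYTES = 16  # assumed full register width in bytes (hypothetical)


@dataclass
class BufferEntry:
    dest_reg: int  # index field: target register identifier
    # result field: partially merged register contents
    result: bytearray = field(default_factory=lambda: bytearray(REG_BYTES))
    progress: int = 0  # progress field: bytes collected so far


class MergeBuffer:
    def __init__(self):
        self.entries = {}  # dest_reg -> BufferEntry, allocated on demand
        self.prf = {}      # stand-in for the physical register file

    def write_back(self, dest_reg, offset, data):
        """Merge one micro-operation result; return True once the register is written."""
        entry = self.entries.get(dest_reg)
        if entry is None:  # miss: allocate an entry dynamically
            entry = BufferEntry(dest_reg)
            self.entries[dest_reg] = entry
        # Merge the partial result into the result field.
        entry.result[offset:offset + len(data)] = data
        # Accumulate the width contributed by this micro-operation.
        entry.progress += len(data)
        if entry.progress < REG_BYTES:
            return False  # incomplete: no write to the register file yet
        self.prf[dest_reg] = bytes(entry.result)  # one full-width write
        del self.entries[dest_reg]                # release the entry
        return True
```

In this sketch, four 4-byte results for the same target register produce three calls that return `False` and leave the register file untouched, followed by a fourth call that performs a single full-width write and frees the entry.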
Inventors
- TIAN QIANG
Assignees
- 成都群芯微电子科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260115
Claims (20)
- 1. A micro-operation result merging and buffering device, arranged on a write-back pipeline of a processor, the device comprising: a buffer management unit for storing dynamically allocable buffer entries, each buffer entry comprising an index field, a result field and a progress field, wherein the index field stores a target register identifier; and a write-back control unit for, in response to a micro-operation result entering the write-back pipeline, matching or allocating a corresponding buffer entry according to the target register identifier carried by the micro-operation result, merging the micro-operation result into the result field of the buffer entry matched or allocated for it, updating the progress field of that buffer entry, and, when the progress field indicates that all intermediate results for the corresponding target register have been collected, writing the complete result in the result field of that buffer entry into the corresponding target register in the physical register file.
- 2. The apparatus of claim 1, wherein the apparatus is located between a memory management unit and a physical register file.
- 3. The apparatus of claim 1, wherein the buffer management unit comprises a two-level buffer: a result merging cache serving as a first-level buffer and comprising a plurality of first-type entries, each first-type entry comprising the index field, the result field and the progress field; and a score table serving as a second-level buffer and comprising a plurality of second-type entries, each second-type entry comprising the index field and the progress field; wherein the depth of the score table is greater than that of the result merging cache.
- 4. The apparatus of claim 3, wherein said matching or allocating a corresponding buffer entry according to the carried target register identifier comprises: searching the result merging cache for an entry corresponding to the target register identifier; and, on a miss, allocating an entry for the micro-operation result carrying the target register identifier.
- 5. The apparatus of claim 4, wherein each first-type entry further comprises an activity indication field indicating how actively the corresponding first-type entry is accessed; and said allocating an entry for the micro-operation result carrying the target register identifier comprises: when the number of first-type entries in the result merging cache has reached its upper limit and no first-type entry corresponding to the target register identifier exists in the result merging cache, selecting a first-type entry according to the activity indication field, writing the intermediate result stored in the selected first-type entry into the corresponding target register in the physical register file, and clearing the selected first-type entry so that it serves as the first-type entry allocated for the micro-operation result carrying the target register identifier.
- 6. The apparatus of claim 5, wherein said selecting a first-type entry according to the activity indication field comprises: selecting the first-type entry with the lowest activity according to the activity indication field.
- 7. The apparatus of claim 5 or 6, wherein said allocating an entry for the micro-operation result carrying the target register identifier further comprises: when the number of first-type entries in the result merging cache has not reached the upper limit, if no first-type entry corresponding to the target register identifier exists in the result merging cache and a second-type entry corresponding to the target register identifier exists in the score table, allocating a first-type entry in the result merging cache for the micro-operation result carrying the target register identifier, reading back the corresponding intermediate result from the physical register file according to the target register identifier, reading back the corresponding collection progress from the score table, and loading the intermediate result and the collection progress into the corresponding fields of the first-type entry.
- 8. The apparatus of claim 4, wherein said allocating an entry for the micro-operation result carrying the target register identifier further comprises: if no first-type entry corresponding to the target register identifier exists in the result merging cache and no second-type entry corresponding to the target register identifier exists in the score table, allocating a first-type entry in the result merging cache and a second-type entry in the score table, respectively, for the micro-operation result carrying the target register identifier.
- 9. The apparatus of claim 1, wherein said updating the progress field of the corresponding buffer entry comprises: accumulating an element width into the corresponding progress field according to element width information carried in the micro-operation result, wherein the element width information represents the width that the micro-operation result occupies in the complete result of the corresponding target register.
- 10. The apparatus of claim 9, wherein the progress field indicating that all intermediate results for the corresponding target register have been collected comprises: the accumulated element width in the progress field reaching the total element width of the complete result of the corresponding target register.
- 11. The apparatus of claim 1, wherein the write-back control unit is further configured to: after the complete result in the result field of the corresponding buffer entry is written to the corresponding target register in the physical register file, release the resources occupied by that buffer entry.
- 12. The apparatus of claim 1, wherein the write-back control unit is further configured to: when the progress field indicates that the intermediate results for the corresponding target register have not all been collected, invalidate the operation of writing the micro-operation result into the physical register file.
- 13. The apparatus of claim 3, wherein the depth of the score table is determined according to the formula D2 = D0 - D1, wherein D2 is the depth of the score table, D0 is the depth of a load store queue loaded by a memory management unit, and D1 is the depth of the result merge cache.
- 14. A processor comprising the apparatus of any one of claims 1-13.
- 15. A chip comprising the processor of claim 14.
- 16. An electronic device comprising the chip of claim 15.
- 17. A method for merging and buffering micro-operation results, applied to the device of claim 1, the method comprising: in response to a micro-operation result entering a write-back pipeline, matching or allocating a corresponding buffer entry according to the target register identifier carried by the micro-operation result; merging the micro-operation result into the result field of the buffer entry matched or allocated for it, and updating the progress field of that buffer entry; and, when the progress field indicates that all intermediate results for the corresponding target register have been collected, writing the complete result in the result field of that buffer entry into the corresponding target register in the physical register file.
- 18. The method of claim 17, wherein the buffer management unit comprises a two-level buffer: a result merging cache serving as a first-level buffer and comprising a plurality of first-type entries, each first-type entry comprising the index field, the result field and the progress field; and a score table serving as a second-level buffer and comprising a plurality of second-type entries, each second-type entry comprising the index field and the progress field; wherein the depth of the score table is greater than that of the result merging cache.
- 19. The method of claim 18, wherein said matching or allocating a corresponding buffer entry according to the carried target register identifier comprises: searching the result merging cache for an entry corresponding to the target register identifier; and, on a miss, allocating an entry for the micro-operation result carrying the target register identifier.
- 20. The method of claim 19, wherein each first-type entry further comprises an activity indication field indicating how actively the corresponding first-type entry is accessed; and said allocating an entry for the micro-operation result carrying the target register identifier comprises: when the number of first-type entries in the result merging cache has reached its upper limit and no first-type entry corresponding to the target register identifier exists in the result merging cache, selecting a first-type entry according to the activity indication field; writing the intermediate result stored in the selected first-type entry into the corresponding target register in the physical register file; and clearing the selected first-type entry so that it serves as the first-type entry allocated for the micro-operation result carrying the target register identifier.
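The two-level arrangement recited in claims 3 to 8 and 13 can be sketched behaviorally as follows. This is a simplified model under stated assumptions, not the claimed circuit: the class and method names, the 16-byte register width, the use of a plain access counter as the activity indication, and the choice to mirror progress into the score table on every merge are all hypothetical. Score-table capacity is computed per claim 13 but not enforced.

```python
REG_BYTES = 16  # assumed full register width in bytes (hypothetical)


class TwoLevelBuffer:
    def __init__(self, d0, d1):
        self.d1 = d1        # depth of the result merging cache (first level)
        self.d2 = d0 - d1   # depth of the score table, D2 = D0 - D1 (not enforced here)
        self.cache = {}     # reg -> {"result", "progress", "activity"}
        self.score = {}     # reg -> progress (second level, no result data)
        self.prf = {}       # stand-in for the physical register file

    def _evict(self):
        # Select the least active first-type entry and spill its
        # intermediate result to the register file as a full-width write.
        victim = min(self.cache, key=lambda r: self.cache[r]["activity"])
        self.prf[victim] = bytes(self.cache.pop(victim)["result"])

    def _allocate(self, reg):
        if len(self.cache) >= self.d1:  # cache full: free one entry first
            self._evict()
        if reg in self.score:
            # Score-table hit: read the intermediate result back from the
            # register file and the collection progress from the score table.
            result, progress = bytearray(self.prf[reg]), self.score[reg]
        else:
            # Miss in both levels: allocate an entry in each.
            result, progress = bytearray(REG_BYTES), 0
            self.score[reg] = 0
        self.cache[reg] = {"result": result, "progress": progress, "activity": 0}

    def write_back(self, reg, offset, data):
        """Merge one micro-operation result; return True once the register is complete."""
        if reg not in self.cache:
            self._allocate(reg)
        entry = self.cache[reg]
        entry["result"][offset:offset + len(data)] = data
        entry["progress"] += len(data)
        entry["activity"] += 1               # assumption: activity = access count
        self.score[reg] = entry["progress"]  # assumption: mirrored on every merge
        if entry["progress"] < REG_BYTES:
            return False
        self.prf[reg] = bytes(entry["result"])  # final full-width write
        del self.cache[reg], self.score[reg]    # release both entries
        return True
```

With `TwoLevelBuffer(d0=8, d1=2)`, a third target register arriving while two partially filled registers occupy the cache forces the less frequently accessed one out to the register file. A later result for the evicted register reloads its partial contents and progress before merging continues.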
Description
Micro-operation result merging and buffering device, processor, chip, equipment and method
Technical Field
The present application relates to the field of processor technologies, and in particular to a micro-operation result merging and buffering device, a processor, a chip, equipment, and a method.
Background
With the wide adoption of open instruction set architectures such as RISC-V (Reduced Instruction Set Computer V), the vector extension (Vector Extension), with its high flexibility and configurability, provides powerful parallel processing capability for fields such as high-performance computing and artificial intelligence. The RISC-V vector architecture allows software to dynamically configure the width and number of vector elements. This facilitates software development but also poses a significant challenge to the microarchitectural design of processors, particularly to the hardware implementation of key components such as register files.
In modern high-performance out-of-order processors, the physical register file (Physical Register File, PRF) is one of the key components in terms of area and power consumption. To improve vector processing performance, the register width is typically designed to be 16 bytes, 32 bytes, 64 bytes, or even larger. However, RISC-V vector instructions (e.g., discretely addressed strided/indexed load instructions) support byte-granularity element access, so one instruction may generate multiple small-granularity write requests to different locations of the same target register. If such micro-operation results were written directly into the PRF, the PRF would have to support complex and costly small-granularity write circuitry, significantly increasing chip area, routing difficulty, and timing closure challenges.
To relieve this direct pressure on the PRF, the prior art offers two main schemes. In the first, the results of the micro-operations split from a vector instruction are written back to the PRF directly at small granularity, which incurs the hardware complexity and area problems described above. In the second, a dedicated merging buffer unit is pre-allocated at the instruction issue stage, and the results are merged and written back after all related micro-operations have completed. The second scheme reduces write pressure on the PRF, but because it statically binds limited buffer resources at the issue stage, older instructions in the instruction sequence are easily blocked by out-of-order allocation, and buffer entries have an excessively long life cycle, which seriously harms out-of-order execution efficiency and overall processor performance. Therefore, how to support whole-register write-back of complex micro-operation results while avoiding excessive hardware area and timing overhead and without harming the out-of-order execution capability of the processor is a technical problem to be solved in the art.
Disclosure of Invention
The embodiments of the application aim to provide a micro-operation result merging and buffering device, a processor, a chip, equipment and a method that reduce the hardware area and timing overhead of the processor and preserve its out-of-order execution capability while supporting whole-register write-back of complex micro-operation results.
To achieve the above object, in one aspect, an embodiment of the present application provides a micro-operation result merging and buffering device, arranged on a write-back pipeline of a processor, the device comprising: a buffer management unit for storing dynamically allocable buffer entries, each buffer entry comprising an index field, a result field and a progress field, wherein the index field stores a target register identifier; and a write-back control unit for, in response to a micro-operation result entering the write-back pipeline, matching or allocating a corresponding buffer entry according to the target register identifier carried by the micro-operation result, merging the micro-operation result into the result field of the buffer entry matched or allocated for it, updating the progress field of that buffer entry, and, when the progress field indicates that all intermediate results for the corresponding target register have been collected, writing the complete result in the result field of that buffer entry into the corresponding target register in the physical register file.
In the device of the embodiment of the application, the device is located between the memory management unit and the physical register file.
In the device of the embodiment of the application, the buffer management unit comprises a two-level buffer: The result merging cache is used as a first-level buffer and comprises a plurality of first-class entries, and each first-class