US-20260126998-A1 - VECTOR PROCESSING CIRCUIT AND VECTOR PROCESSING METHOD WITH REUSED CALCULATION CIRCUIT
Abstract
The present disclosure provides a vector processing circuit and a vector processing method. The vector processing circuit includes an instruction queue, multiple calculation circuits, and a control circuit. The instruction queue includes a first reduction instruction and a second reduction instruction. The calculation circuits have multiple pipeline stages. The control circuit is electrically connected to the instruction queue and the calculation circuits. The calculation circuits alternately generate results of the first reduction instruction and the second reduction instruction over multiple clocks.
Inventors
- Zhong-Ho Chen
- Jui Chieh Liao
- Shu-Yu Chang
Assignees
- ANDES TECHNOLOGY CORPORATION
Dates
- Publication Date
- 20260507
- Application Date
- 20241107
Claims (12)
- 1. A vector processing circuit, comprising: an instruction queue, wherein the instruction queue comprises a first reduction instruction and a second reduction instruction, a plurality of calculation circuits, wherein the calculation circuits have a plurality of pipeline stages, the plurality of the calculation circuits are identical to each other, and the plurality of the calculation circuits comprises a first calculation circuit and a second calculation circuit; and a control circuit, comprising a source operand and a selection circuit, wherein the source operand is electrically connected to the second calculation circuit, and the selection circuit is electrically connected to the source operand, the first calculation circuit and the second calculation circuit, wherein the plurality of calculation circuits alternatively generates results of the first reduction instruction and the second reduction instruction over a plurality of clocks, wherein in a first iteration, the selection circuit transmits a first element of the source operand to the first calculation circuit and the second calculation circuit receives a second element of the operand, wherein in a second iteration after the first iteration, the selection circuit transmits a temporary result generated by the second calculation circuit to the first calculation circuit, and the selection circuit transmits a temporary result generated by the first calculation circuit back to the first calculation circuit, wherein the results of the first reduction instruction and the second reduction instruction are generated by the first calculation circuit.
- 2. The vector processing circuit according to claim 1, wherein the plurality of calculation circuits sequentially generates temporary results of the first reduction instruction, temporary results of the second reduction instruction, a final result of the first reduction instruction, and a final result of the second reduction instruction.
- 3. (canceled)
- 4. (canceled)
- 5. The vector processing circuit according to claim 1, wherein the selection circuit comprises a multiplex, wherein inputs of the multiplex are connected to the source operand and the second calculation circuit, wherein an output of the multiplex is connected to the first calculation circuit.
- 6. The vector processing circuit according to claim 1, wherein the first and the second reduction instructions are floating-point reduction instructions, wherein the pipeline stages comprise a shift stage.
- 7. The vector processing circuit according to claim 1, wherein the first and the second reduction instructions are floating point reduction sum instructions, wherein the pipeline stages comprise a normalization stage.
- 8. A vector processing method performed by a vector processing circuit, the vector processing method comprising: storing a first reduction instruction and a second reduction instruction in an instruction queue; and alternatively generating, by a plurality of calculation circuits, results of the first reduction instruction and the second reduction instruction over a plurality of clocks, wherein the calculation circuits have a plurality of pipeline stages, the plurality of the calculation circuits are identical to each other, and the plurality of the calculation circuits comprises a first calculation circuit and a second calculation circuit, and the step of alternatively generating the results of the first reduction instruction and the second reduction instruction comprises: in a first iteration, transmitting a first element of a source operand to the first calculation circuit and the second calculation circuit receives a second element of the operand; and in a second iteration after the first iteration, transmitting a temporary result generated by the second calculation circuit to the first calculation circuit, and transmitting a temporary result generated by the first calculation circuit back to the first calculation circuit, wherein the results of the first reduction instruction and the second reduction instruction are generated by the first calculation circuit.
- 9. The vector processing method according to claim 8, wherein the step of alternatively generating the results of the first reduction instruction and the second reduction instruction comprises: sequentially generating temporary results of the first reduction instruction, temporary results of the second reduction instruction, a final result of the first reduction instruction, and a final result of the second reduction instruction.
- 10. (canceled)
- 11. The vector processing method according to claim 8, wherein the first and the second reduction instructions are floating-point reduction instructions, wherein the pipeline stages comprise a shift stage.
- 12. The vector processing method according to claim 8, wherein the first and the second reduction instructions are floating point reduction sum instructions, wherein the pipeline stages comprise a normalization stage.
Description
BACKGROUND
Technical Field
This disclosure relates to a vector processing circuit and a vector processing method, and particularly to a circuit and a method for performing reduction operations on vectors.
Description of Related Art
In vector processing, a reduction operation is a frequently used operation that reduces multiple elements in a vector to a single result through a specific calculation (such as addition, multiplication, or a logical operation). However, the order in which a reduction operation is executed can significantly affect the final result, especially in floating-point calculations, where different calculation orders may lead to precision loss or result discrepancies. This order dependency poses challenges for parallel processing in multi-core or multi-threaded environments, as different threads may access and process data in different orders. Since reduction operations often appear in fields such as scientific computing, machine learning, and signal processing, accelerating these operations to improve overall system performance has become an important issue.
SUMMARY
This disclosure proposes a vector processing circuit and a vector processing method that execute reduction instructions in an interleaved manner.
Embodiments of the present disclosure provide a vector processing circuit including an instruction queue, multiple calculation circuits, and a control circuit. The instruction queue includes a first reduction instruction and a second reduction instruction. The calculation circuits have multiple pipeline stages. The control circuit is electrically connected to the instruction queue and the calculation circuits. The calculation circuits alternately generate results of the first reduction instruction and the second reduction instruction over multiple clocks.
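The benefit of interleaving two reduction instructions on a multi-stage pipeline can be illustrated with a small clock-counting sketch. This is only a toy model under stated assumptions, not the disclosed circuit: it assumes a single pipelined calculation unit that accepts one operation per clock, a result latency equal to the number of pipeline stages, and a reduction whose operations form a strictly dependent chain. The function name `clocks_to_finish` and its parameters are hypothetical.

```python
def clocks_to_finish(num_adds, stages, num_instructions):
    """Count clocks until every reduction has produced its final result.

    Hypothetical model: one pipelined adder accepts at most one add per clock,
    and each result appears `stages` clocks after issue. The adds of one
    reduction form a dependent chain, so its next add must wait for the
    previous result. Different reductions are independent of each other.
    """
    remaining = [num_adds] * num_instructions   # adds left per instruction
    ready_at = [0] * num_instructions           # earliest issue clock per instruction
    clock = 0
    finish = 0
    turn = 0                                    # round-robin pointer
    while any(remaining):
        for offset in range(num_instructions):
            i = (turn + offset) % num_instructions
            if remaining[i] and ready_at[i] <= clock:
                remaining[i] -= 1
                ready_at[i] = clock + stages    # result available after the pipeline
                finish = max(finish, ready_at[i])
                turn = i + 1
                break
        clock += 1
    return finish


if __name__ == "__main__":
    alone = clocks_to_finish(num_adds=4, stages=3, num_instructions=1)
    interleaved = clocks_to_finish(num_adds=4, stages=3, num_instructions=2)
    # One reduction, two reductions back to back, two reductions interleaved.
    print(alone, 2 * alone, interleaved)
```

Under these assumptions, a four-add reduction on a three-stage pipeline takes 12 clocks on its own, two such reductions run back to back take 24 clocks, and the same two reductions interleaved finish in 13 clocks, because the second instruction issues in clocks the first would otherwise leave idle.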
In some embodiments, the calculation circuits sequentially generate temporary results of the first reduction instruction, temporary results of the second reduction instruction, a final result of the first reduction instruction, and a final result of the second reduction instruction.
In some embodiments, the calculation circuits include a first calculation circuit and a second calculation circuit. The first calculation circuit generates a temporary result and the final result of the first and the second reduction instructions. The second calculation circuit generates a temporary result of the first and the second reduction instructions.
In some embodiments, the control circuit includes: a source operand electrically connected to the second calculation circuit; and a selection circuit electrically connected to the source operand, the first calculation circuit and the second calculation circuit.
In some embodiments, the selection circuit includes a multiplexer. Inputs of the multiplexer are connected to the source operand and the second calculation circuit. An output of the multiplexer is connected to the first calculation circuit.
In some embodiments, the first and the second reduction instructions are floating-point reduction instructions. The pipeline stages include a shift stage.
In some embodiments, the first and the second reduction instructions are floating-point reduction sum instructions. The pipeline stages include a normalization stage.
From another aspect, embodiments of the present disclosure provide a vector processing method performed by a vector processing circuit. The vector processing method includes: storing a first reduction instruction and a second reduction instruction in an instruction queue; and alternately generating, by multiple calculation circuits, results of the first reduction instruction and the second reduction instruction over multiple clocks, wherein the calculation circuits have multiple pipeline stages.
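The routing role of the selection circuit summarized above can be sketched in a few lines. This is a minimal illustration under assumptions, not the disclosed hardware: the text here does not specify what operation each calculation circuit applies, so the sketch assumes a two-input add with 0.0 as a neutral second input in the first iteration, and it reduces only two elements. The names `mux`, `add`, and `reduce_pair` are hypothetical.

```python
def mux(select_second, source_element, second_temp):
    """Two-input selector feeding the first calculation circuit.

    Inputs: an element of the source operand, or the temporary result of the
    second calculation circuit.
    """
    return second_temp if select_second else source_element


def add(a, b):
    """Assumed operation of each calculation circuit (a two-input add)."""
    return a + b


def reduce_pair(source):
    """Reduce two source elements over two iterations."""
    # First iteration: the selector passes source element 0 to the first
    # circuit, while the second circuit receives source element 1 directly.
    first_temp = add(mux(False, source[0], None), 0.0)
    second_temp = add(source[1], 0.0)
    # Second iteration: the selector passes the second circuit's temporary
    # result to the first circuit, which also takes back its own temporary
    # result and produces the final result.
    return add(mux(True, None, second_temp), first_temp)


if __name__ == "__main__":
    print(reduce_pair([1.5, 2.25]))
```

In this sketch the final result always comes out of the first circuit, and the second circuit only ever contributes a temporary result, which mirrors the division of roles described in the embodiments.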
In some embodiments, the step of alternately generating the results of the first reduction instruction and the second reduction instruction includes: sequentially generating temporary results of the first reduction instruction, temporary results of the second reduction instruction, a final result of the first reduction instruction, and a final result of the second reduction instruction.
In some embodiments, the calculation circuits include a first calculation circuit and a second calculation circuit. The vector processing method includes: generating, by the first calculation circuit, a temporary result and the final result of the first and the second reduction instructions; and generating, by the second calculation circuit, a temporary result of the first and the second reduction instructions.
To make the aforementioned features and advantages of this disclosure more evident and understandable, examples are provided below with detailed explanations in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a partial block diagram illustrating an electronic device according to an embodiment.
FIG. 2 is a block diagram illustrating a core according to an embodiment.
FIG. 3