CN-121996888-A - Vector processing circuit and vector processing method

Abstract

The present disclosure provides a vector processing circuit and a vector processing method. The vector processing circuit includes an instruction queue, a plurality of computation circuits, and a control circuit. The instruction queue includes a first reduction instruction and a second reduction instruction. The computation circuits have a plurality of pipeline stages. The control circuit is electrically connected to the instruction queue and the computation circuits. The computation circuits alternately generate results of the first reduction instruction and the second reduction instruction over a plurality of clock cycles, which may increase overall throughput.

Inventors

  • CHEN ZHONGHE
  • LIAO RUIJIE
  • ZHANG SHUYU

Assignees

  • Andes Technology Corporation (晶心科技股份有限公司)

Dates

Publication Date
20260508
Application Date
20241126
Priority Date
20241107

Claims (12)

  1. A vector processing circuit, comprising: an instruction queue, wherein the instruction queue includes a first reduction instruction and a second reduction instruction; a plurality of computation circuits, wherein the plurality of computation circuits have a plurality of pipeline stages; and a control circuit, wherein the control circuit is electrically connected to the instruction queue and the plurality of computation circuits, wherein the plurality of computation circuits alternately generate results of the first reduction instruction and the second reduction instruction over a plurality of clock cycles.
  2. The vector processing circuit of claim 1, wherein the plurality of computation circuits sequentially generate a temporary result of the first reduction instruction, a temporary result of the second reduction instruction, a final result of the first reduction instruction, and a final result of the second reduction instruction.
  3. The vector processing circuit of claim 2, wherein the plurality of computation circuits comprise a first computation circuit and a second computation circuit, wherein the first computation circuit generates temporary results and the final results of the first reduction instruction and the second reduction instruction, and wherein the second computation circuit generates temporary results of the first reduction instruction and the second reduction instruction.
  4. The vector processing circuit of claim 3, wherein the control circuit comprises: a source operand electrically connected to the second computation circuit; and a selection circuit electrically connected to the source operand, the first computation circuit, and the second computation circuit.
  5. The vector processing circuit of claim 4, wherein the selection circuit comprises a multiplexer, wherein a plurality of inputs of the multiplexer are connected to the source operand and the second computation circuit, and wherein an output of the multiplexer is connected to the first computation circuit.
  6. The vector processing circuit of claim 1, wherein the first reduction instruction and the second reduction instruction are floating-point reduction instructions, and wherein the plurality of pipeline stages includes a shift stage.
  7. The vector processing circuit of claim 1, wherein the first reduction instruction and the second reduction instruction are floating-point reduction sum instructions, and wherein the plurality of pipeline stages includes a normalization stage.
  8. A vector processing method, performed by a vector processing circuit, the vector processing method comprising: storing a first reduction instruction and a second reduction instruction in an instruction queue; and alternately generating, by a plurality of computation circuits, results of the first reduction instruction and the second reduction instruction over a plurality of clock cycles, wherein the plurality of computation circuits have a plurality of pipeline stages.
  9. The vector processing method of claim 8, wherein alternately generating the results of the first reduction instruction and the second reduction instruction comprises: sequentially generating a temporary result of the first reduction instruction, a temporary result of the second reduction instruction, a final result of the first reduction instruction, and a final result of the second reduction instruction.
  10. The vector processing method of claim 9, wherein the plurality of computation circuits include a first computation circuit and a second computation circuit, and the vector processing method comprises: generating, by the first computation circuit, temporary results and the final results of the first reduction instruction and the second reduction instruction; and generating, by the second computation circuit, temporary results of the first reduction instruction and the second reduction instruction.
  11. The vector processing method of claim 8, wherein the first reduction instruction and the second reduction instruction are floating-point reduction instructions, and wherein the plurality of pipeline stages includes a shift stage.
  12. The vector processing method of claim 8, wherein the first reduction instruction and the second reduction instruction are floating-point reduction sum instructions, and wherein the plurality of pipeline stages includes a normalization stage.
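
The result ordering recited in claims 2 and 9 can be illustrated with a small software sketch. This is only a model under stated assumptions, not the claimed circuit: the function name `interleaved_reduce` is hypothetical, and the choice to let each temporary result cover the first half of a vector is an assumption that the claims do not make.

```python
from functools import reduce
import operator


def interleaved_reduce(vec_a, vec_b, op=operator.add):
    """Reduce two vectors, emitting results in the order recited in claims 2 and 9.

    Assumption (not from the claims): each instruction's temporary result
    covers the first half of its vector, and its final result folds in the rest.
    """
    if len(vec_a) < 2 or len(vec_b) < 2:
        raise ValueError("each vector needs at least two elements")
    half_a, half_b = len(vec_a) // 2, len(vec_b) // 2
    events = []

    # Temporary result of the first, then of the second reduction instruction.
    temp_a = reduce(op, vec_a[:half_a])
    events.append(("temporary", "first", temp_a))
    temp_b = reduce(op, vec_b[:half_b])
    events.append(("temporary", "second", temp_b))

    # Final result of the first, then of the second reduction instruction.
    final_a = reduce(op, vec_a[half_a:], temp_a)
    events.append(("final", "first", final_a))
    final_b = reduce(op, vec_b[half_b:], temp_b)
    events.append(("final", "second", final_b))
    return events


for kind, instruction, value in interleaved_reduce([1, 2, 3, 4], [10, 20, 30, 40]):
    print(f"{kind} result of the {instruction} reduction instruction: {value}")
```

In this sketch each vector is still combined strictly left to right, so interleaving changes when a result appears but not the order in which one instruction's elements are combined.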

Description

Vector processing circuit and vector processing method

Technical Field

The present disclosure relates to a vector processing circuit and a vector processing method, and more particularly, to a circuit and a method for performing a reduction operation on a vector.

Background

In vector processing, a reduction operation is a frequently used operation that reduces multiple elements of a vector to a single result by a particular calculation (e.g., addition, multiplication, or a logical operation). However, the order in which a reduction is evaluated can significantly affect the final result, especially in floating-point calculations, where different calculation orders may lead to loss of precision or differing results. This order dependency presents challenges for parallel processing in multi-core or multi-threaded environments, because different threads may access and process data in different orders. Since reduction operations are common in scientific computing, machine learning, signal processing, and other fields, accelerating them to improve overall system performance has become an important issue.

Disclosure of Invention

The present disclosure proposes a vector processing circuit and a vector processing method that execute reduction instructions in an interleaved manner.

Embodiments of the present disclosure provide a vector processing circuit including an instruction queue, a plurality of computation circuits, and a control circuit. The instruction queue includes a first reduction instruction and a second reduction instruction. The computation circuits have a plurality of pipeline stages. The control circuit is electrically connected to the instruction queue and the computation circuits. The computation circuits alternately generate results of the first reduction instruction and the second reduction instruction over a plurality of clock cycles.
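
Why interleaving two reduction instructions on pipelined computation circuits may raise throughput can be seen in a toy cycle-count model. The sketch below is a hypothetical simplification and does not describe the disclosed circuit: it assumes a single pipelined unit that accepts one operation per cycle, a fixed `latency` in cycles, and ordered reductions in which every operation waits for the previous result of the same instruction. The name `finish_cycle` is illustrative only.

```python
def finish_cycle(lengths, latency):
    """Cycle at which the last result leaves a single pipelined unit.

    Assumptions (hypothetical model, not taken from the disclosure):
    - reduction i needs lengths[i] dependent operations (an ordered reduction);
    - the unit accepts at most one operation per cycle;
    - a result is available `latency` cycles after its operation issues.
    """
    remaining = list(lengths)
    ready_at = [0] * len(lengths)     # earliest cycle each reduction may issue
    last_issue = [-1] * len(lengths)  # used to alternate between reductions
    cycle = 0
    finish = 0
    while any(remaining):
        ready = [i for i, n in enumerate(remaining) if n > 0 and ready_at[i] <= cycle]
        if ready:
            # Issue the ready reduction that was issued least recently.
            i = min(ready, key=lambda k: last_issue[k])
            remaining[i] -= 1
            last_issue[i] = cycle
            ready_at[i] = cycle + latency
            finish = max(finish, ready_at[i])
        cycle += 1
    return finish


lengths, latency = [4, 4], 2
back_to_back = sum(finish_cycle([n], latency) for n in lengths)
interleaved = finish_cycle(lengths, latency)
print(f"back to back: {back_to_back} cycles, interleaved: {interleaved} cycles")
```

Under these assumptions, two four-operation reductions on a two-stage unit take 16 cycles when run one after the other and 9 cycles when interleaved, because the second instruction fills the pipeline slots that the first leaves idle while waiting on its own previous result.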
In some embodiments, the computation circuits sequentially generate a temporary result of the first reduction instruction, a temporary result of the second reduction instruction, a final result of the first reduction instruction, and a final result of the second reduction instruction.

In some embodiments, the computation circuits include a first computation circuit and a second computation circuit. The first computation circuit generates temporary results and final results of the first reduction instruction and the second reduction instruction. The second computation circuit generates temporary results of the first reduction instruction and the second reduction instruction.

In some embodiments, the control circuit includes a source operand electrically connected to the second computation circuit, and a selection circuit electrically connected to the source operand, the first computation circuit, and the second computation circuit.

In some embodiments, the selection circuit includes a multiplexer. The inputs of the multiplexer are connected to the source operand and the second computation circuit. The output of the multiplexer is connected to the first computation circuit.

In some embodiments, the first reduction instruction and the second reduction instruction are floating-point reduction instructions. The pipeline stages described above include a shift stage.

In some embodiments, the first reduction instruction and the second reduction instruction are floating-point reduction sum instructions. The pipeline stages include a normalization stage.

Viewed from another aspect, embodiments of the present disclosure provide a vector processing method performed by a vector processing circuit.
The vector processing method includes storing a first reduction instruction and a second reduction instruction in an instruction queue, and alternately generating, by a plurality of computation circuits, results of the first reduction instruction and the second reduction instruction over a plurality of clock cycles, wherein the computation circuits have a plurality of pipeline stages.

In some embodiments, the step of alternately generating the results of the first reduction instruction and the second reduction instruction includes sequentially generating a temporary result of the first reduction instruction, a temporary result of the second reduction instruction, a final result of the first reduction instruction, and a final result of the second reduction instruction.

In some embodiments, the computation circuits include a first computation circuit and a second computation circuit. The vector processing method includes generating, by the first computation circuit, temporary results and final results of the first reduction instruction and the second reduction instruction, and generating, by the second computation circuit, temporary results of the first reduction instruction and the second reduction instruction.

In order that the above-recited features and advantages of the present disclosure will be readily apparent and readily understood, a more particular description of the invention will be rendered by referen