CN-121364892-B - Vector data processor, instruction processing method and system on chip

Abstract

Embodiments of the invention provide a vector data processor, an instruction processing method and a system on chip. After a vector instruction requiring vector operation is acquired, an instruction calculation mode of the vector instruction is determined according to vector mode configuration information, where the vector mode configuration information includes configuration information for adjusting, according to data block size, the data path on which the current vector is executed. One or more of the N execution pipelines are then determined for the vector instruction according to the instruction calculation mode, and the determined execution pipelines in the vector floating point unit each execute the vector operation corresponding to the vector instruction. The scheme supports a mixed-width calculation mode, which gives the vector data processor the ability to adjust, according to data block size, the data path mode on which the current vector is executed. This improves the throughput of processing small data blocks and improves the performance of processors with wide vector registers when running software workloads that mix data blocks of different sizes.
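
As a rough illustration of the throughput argument above, the following Python sketch counts how many instructions the N pipelines can accept per cycle with and without a mixed-width mode. The widths (a 512-bit vector register, 256-bit pipelines, N = 4) are hypothetical values chosen only to satisfy the relations stated in the claims; the patent does not disclose them, and this simplified model ignores every other pipeline hazard.

```python
# Hypothetical parameters: the patent gives only their relations, not values.
FIRST_BIT_WIDTH = 512   # vector register width (assumed)
SECOND_BIT_WIDTH = 256  # maximum width of one execution pipeline (assumed)
N_PIPES = 4             # number of execution pipelines (assumed)

# Constraints stated in the claims.
assert FIRST_BIT_WIDTH > SECOND_BIT_WIDTH
assert N_PIPES * SECOND_BIT_WIDTH > FIRST_BIT_WIDTH
assert N_PIPES % 2 == 0


def pipes_needed(active_bits: int) -> int:
    """Pipelines one instruction occupies if its active data spans active_bits."""
    # Ceiling division; every instruction occupies at least one pipeline.
    return max(1, -(-active_bits // SECOND_BIT_WIDTH))


def instructions_per_cycle(active_bits: int, mixed_width: bool) -> int:
    """Instructions the N pipelines can accept per cycle in this simplified model."""
    # Without a mixed-width mode, every instruction is treated as using the
    # full register-width datapath, however little data it actually touches.
    bits = active_bits if mixed_width else FIRST_BIT_WIDTH
    return N_PIPES // pipes_needed(bits)


for active in (128, 512):
    print(active, instructions_per_cycle(active, False), instructions_per_cycle(active, True))
```

With these assumed numbers, a 128-bit data block is accepted at 4 instructions per cycle instead of 2, while full-width 512-bit instructions stay at 2 per cycle either way.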

Inventors

LIU CHANG
YU JINGCHAO

Assignees

知合行一技术(上海)有限公司

Dates

Publication Date: 2026-05-08
Application Date: 2025-12-22

Claims (14)

1. An instruction processing method applied to a vector data processor, wherein the vector data processor comprises a vector register, a vector floating point unit, an instruction dispatch unit and an issue queue unit, the vector register has a first bit width, N execution pipelines are provided in the vector floating point unit, the maximum data bit width supported by each execution pipeline for an operation is a second bit width, the first bit width is larger than the second bit width, the sum of the N second bit widths is larger than the first bit width, and N is a multiple of 2, the method comprising: acquiring a vector instruction requiring vector operation; determining an instruction calculation mode of the vector instruction according to vector mode configuration information, wherein the vector mode configuration information comprises configuration information for adjusting, according to data block size, the data path on which the current vector is executed, and the vector mode configuration information comprises a vector element width, a vector length and a predicate processing mode, together with a rule for configuring the instruction processing mode according to information comprising the vector element width, the vector length and the predicate processing mode; determining one or more of the N execution pipelines for the vector instruction according to the instruction calculation mode; and executing, by each of the determined execution pipelines in the vector floating point unit, the vector operation corresponding to the vector instruction.
2. The method of claim 1, further comprising, prior to determining the instruction calculation mode of the vector instruction according to the vector mode configuration information: reading the vector mode configuration information from a configuration information register.
3. The method of claim 1 or 2, wherein determining the instruction calculation mode of the vector instruction according to the vector mode configuration information comprises: tagging the vector instruction requiring vector operation according to the vector mode configuration information, the tag representing the instruction calculation mode.
4. The method of claim 3, wherein determining one or more of the N execution pipelines for the vector instruction according to the instruction calculation mode comprises: allocating an issue queue to the vector instruction according to the instruction calculation mode; and issuing, according to the instruction calculation mode corresponding to each issue queue, the vector instruction in that issue queue to one or more of the N execution pipelines in the vector floating point unit.
5. The method of claim 4, wherein allocating an issue queue to the vector instruction according to the instruction calculation mode comprises: assigning a tag to the vector instruction according to the instruction calculation mode; identifying the tag by the issue queue unit; and determining the issue mode and the execution mode of the vector instruction as indicated by the tag.
6. The method of claim 5, wherein issuing the vector instruction in each issue queue to one or more of the N execution pipelines in the vector floating point unit according to the instruction calculation mode corresponding to that issue queue comprises: issuing, as indicated by the tag, the vector instruction in the issue queue to one or more of the N execution pipelines in the vector floating point unit.
7. The method of claim 1, wherein acquiring the vector instruction requiring vector operation further comprises decoding the vector instruction requiring vector operation; and tagging the vector instruction requiring vector operation according to the vector mode configuration information comprises: tagging the vector instruction requiring vector operation when it is decoded.
8. The method of claim 3, wherein determining one or more of the N execution pipelines for the vector instruction according to the instruction calculation mode comprises: after the vector instruction enters an issue queue, tagging the vector instruction requiring vector operation according to the vector mode configuration information; and selecting a corresponding execution pipeline for execution according to the tag.
9. The method of claim 1, wherein executing, by each of the execution pipelines in the vector floating point unit, the vector operation corresponding to the vector instruction further comprises: executing a vector operation instruction by the vector floating point unit, and sending the vector instruction to a corresponding operator engine; performing, by the operator engine, the corresponding vector operation on the vector instruction; and writing, by the operator engine, the vector operation result of the vector instruction back to the vector register.
10. The method of claim 9, wherein operators in the execution pipeline include a symmetric operator that performs symmetric operations and an asymmetric operator that performs asymmetric operations.
11. The method of claim 10, wherein, when the symmetric operator performs a vector operation on data of the first bit width, the high-order data and the low-order data within the first bit width undergo identical operation processes that do not depend on each other, the high-order data being the high-order second-bit-width portion of the first-bit-width data and the low-order data being the low-order second-bit-width portion of the first-bit-width data; and when the asymmetric operator processes a vector operation of the first bit width, the high-order data and the low-order data within the first bit width undergo different operation processes that depend on each other, the operation result for the low-order source data depending on the operation on the high-order data.
12. The method of claim 1, wherein determining the instruction calculation mode of the vector instruction according to the vector mode configuration information, which includes configuration information for adjusting, according to data block size, the data path on which the current vector is executed, further comprises: determining the instruction calculation mode of the vector instruction according to at least one of the vector length configuration information and the predicate processing mode in the vector mode configuration information.
13. A vector data processor, comprising a vector register, a vector floating point unit, an instruction dispatch unit and an issue queue unit, the vector register having a first bit width, N execution pipelines being provided in the vector floating point unit, the maximum data bit width supported by each execution pipeline for an operation being a second bit width, the first bit width being greater than the second bit width, and the sum of the N second bit widths being greater than the first bit width, wherein N is a multiple of 2, and wherein the vector data processor further comprises: an instruction fetch unit configured to acquire a vector instruction requiring vector operation; a mode configuration unit configured to determine an instruction calculation mode of the vector instruction according to vector mode configuration information, wherein the vector mode configuration information includes configuration information for adjusting, according to data block size, the data path on which the current vector is executed, and includes a vector element width, a vector length and a predicate processing mode, together with a rule for configuring the instruction processing mode according to information including the vector element width, the vector length and the predicate processing mode; the instruction dispatch unit, configured to determine one or more of the N execution pipelines for the vector instruction according to the instruction calculation mode; and the vector floating point unit, configured to control the determined execution pipelines to each execute the vector operation corresponding to the vector instruction.
14. A system on a chip, comprising: a control unit and a plurality of on-chip components including a vector data processor; wherein the control unit is configured to control and manage the plurality of on-chip components, and the vector data processor is configured to execute a computer program or instructions which, when executed by the vector data processor, implement the method of any one of claims 1 to 12.
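
Claims 10 and 11 distinguish operators whose high-order and low-order halves are independent from operators whose halves depend on each other. The Python sketch below models a first-bit-width register as eight 64-bit elements split into two second-bit-width halves of four elements each; these sizes are assumptions. An element-wise add stands in for a symmetric operation, and a slide-down (result element i takes source element i + k) is a hypothetical stand-in for an asymmetric operation in which, as claim 11 states, the low-order result depends on the high-order data. The patent does not name specific operations for either class.

```python
from typing import List

ELEMS = 8            # elements per first-bit-width register (assumed: 8 x 64 bits)
HALF = ELEMS // 2    # elements per second-bit-width pipeline (assumed)
MASK64 = (1 << 64) - 1


def pipe_add(a: List[int], b: List[int]) -> List[int]:
    """Work done by one pipeline: element-wise wrapping 64-bit add."""
    return [(x + y) & MASK64 for x, y in zip(a, b)]


def symmetric_add(a: List[int], b: List[int]) -> List[int]:
    # The low and high halves run the identical operation and never exchange
    # data, so each half can be issued to a separate pipeline.
    lo = pipe_add(a[:HALF], b[:HALF])
    hi = pipe_add(a[HALF:], b[HALF:])
    return lo + hi


def slide_down_hi(hi_src: List[int], k: int) -> List[int]:
    # The high half of the result needs only high-half source data;
    # zeros shift in past the top of the register.
    return [hi_src[i + k] if i + k < HALF else 0 for i in range(HALF)]


def slide_down_lo(lo_src: List[int], hi_src: List[int], k: int) -> List[int]:
    # The low half of the result also reads high-half source data, so it
    # cannot be computed independently of the high half.
    joined = lo_src + hi_src
    return [joined[i + k] for i in range(HALF)]


def asymmetric_slide_down(src: List[int], k: int) -> List[int]:
    """Slide elements down by k positions (0 <= k <= HALF in this sketch)."""
    if not 0 <= k <= HALF:
        raise ValueError("k must be between 0 and HALF")
    lo_src, hi_src = src[:HALF], src[HALF:]
    return slide_down_lo(lo_src, hi_src, k) + slide_down_hi(hi_src, k)


print(symmetric_add(list(range(8)), [10] * 8))
print(asymmetric_slide_down(list(range(8)), 2))
```

In this model the symmetric add splits cleanly across two pipelines, whereas the low half of the slide-down must see the high-half source, which is the kind of cross-half dependency that would prevent splitting an asymmetric operation into two independent second-bit-width operations.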

Description

Technical Field

The present invention relates to the field of electronic information technology, and in particular to a vector data processor, an instruction processing method and a system on chip.

Background

With advances in semiconductor technology and processor architecture, vector data processors, also known as vector accelerators, are widely used in fields such as high-performance computing, artificial intelligence and edge computing because of their high performance and low power consumption.
Data centers are a typical application scenario for vector data processors. Their core tasks include AI inference and training, media processing and transcoding, and encryption and decryption, and the associated software workloads exhibit high data parallelism: the same or highly similar operations must be performed on large amounts of different data. This nature of data center workloads means that a server-class high-performance processor must execute not only scalar operations but also vector operations efficiently.
With the rapid development of artificial intelligence and large language models, the demand placed on the parallel computing power of server-class high-performance processors keeps growing, which poses greater challenges for processor designers building high-performance vector operation architectures.

Disclosure of Invention

In view of the above, embodiments of the present invention provide an instruction processing method that solves some or all of the above problems.
According to a first aspect of the embodiments of the present invention, an instruction processing method is provided, applied to a vector data processor that includes a vector register, a vector floating point unit and an issue unit, the vector register having a first bit width, N execution pipelines being provided in the vector floating point unit, the maximum data bit width supported by each execution pipeline for an operation being a second bit width, the first bit width being greater than the second bit width, and the sum of the N second bit widths being greater than the first bit width, where N is a multiple of 2. The method includes: acquiring a vector instruction requiring vector operation; determining an instruction calculation mode of the vector instruction according to vector mode configuration information, where the vector mode configuration information includes configuration information for adjusting, according to data block size, the data path on which the current vector is executed; determining one or more of the N execution pipelines for the vector instruction according to the instruction calculation mode; and executing, by each of the determined execution pipelines in the vector floating point unit, the vector operation corresponding to the vector instruction.
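
The method steps of the first aspect, together with the tagging and issue-queue refinements of claims 3 to 6, can be sketched in Python as follows. Everything concrete here is an assumption for illustration: the 512-bit register, 256-bit pipelines and N = 4; the two mode names; the string values of the predicate processing mode; the rule inside determine_mode; and the wide-first issue priority. The patent states only that the rule draws on the vector element width, vector length and predicate processing mode, and does not disclose the rule itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional

FIRST_BIT_WIDTH = 512    # vector register width (assumed)
SECOND_BIT_WIDTH = 256   # maximum width of one pipeline (assumed)
N_PIPES = 4              # assumed; a multiple of 2
PIPES_PER_WIDE = FIRST_BIT_WIDTH // SECOND_BIT_WIDTH  # 2 with these values


class Mode(Enum):
    """Hypothetical instruction calculation modes, also used as the tag."""
    NARROW = "narrow"   # active data fits in one pipeline
    WIDE = "wide"       # uses PIPES_PER_WIDE pipelines


@dataclass
class VectorModeConfig:
    element_width: int    # bits per element
    vector_length: int    # number of active elements
    predicate_mode: str   # hypothetical values: "agnostic" or "undisturbed"


@dataclass
class VectorInstr:
    name: str
    tag: Optional[Mode] = None


def determine_mode(cfg: VectorModeConfig) -> Mode:
    """Illustrative rule only: narrow if the active bits fit one pipeline and
    the inactive part of the destination need not be preserved."""
    active_bits = cfg.element_width * cfg.vector_length
    if active_bits <= SECOND_BIT_WIDTH and cfg.predicate_mode == "agnostic":
        return Mode.NARROW
    return Mode.WIDE


def tag_and_enqueue(instr: VectorInstr, cfg: VectorModeConfig,
                    queues: Dict[Mode, List[VectorInstr]]) -> None:
    # Tag the instruction with its mode and place it in that mode's issue queue.
    instr.tag = determine_mode(cfg)
    queues[instr.tag].append(instr)


def issue_one_cycle(queues: Dict[Mode, List[VectorInstr]]) -> Dict[str, List[int]]:
    """Issue as many queued instructions as free pipelines allow in one cycle;
    returns instruction name -> pipeline indices."""
    free = list(range(N_PIPES))
    issued: Dict[str, List[int]] = {}
    # Wide instructions first (assumed priority), each on a group of pipelines.
    while queues[Mode.WIDE] and len(free) >= PIPES_PER_WIDE:
        instr = queues[Mode.WIDE].pop(0)
        group = free[:PIPES_PER_WIDE]
        free = free[PIPES_PER_WIDE:]
        issued[instr.name] = group
    # Narrow instructions each take a single free pipeline.
    while queues[Mode.NARROW] and free:
        instr = queues[Mode.NARROW].pop(0)
        issued[instr.name] = [free.pop(0)]
    return issued


queues: Dict[Mode, List[VectorInstr]] = {Mode.NARROW: [], Mode.WIDE: []}
tag_and_enqueue(VectorInstr("vfadd_512"), VectorModeConfig(64, 8, "agnostic"), queues)
for name in ("vfmul_a", "vfmul_b", "vfmul_c"):
    tag_and_enqueue(VectorInstr(name), VectorModeConfig(32, 8, "agnostic"), queues)
print(issue_one_cycle(queues))
```

Here the wide instruction occupies pipelines 0 and 1 while two narrow instructions use the remaining pipelines in the same cycle, and the third narrow instruction waits in its queue. A real issue stage would also track operand readiness and pipeline latency, which this sketch omits.
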
According to a second aspect of the embodiments of the present invention, a vector data processor is provided, including a vector register, a vector floating point unit and an issue unit, the vector register having a first bit width, N execution pipelines being provided in the vector floating point unit, the maximum data bit width supported by each execution pipeline for an operation being a second bit width, the first bit width being greater than the second bit width, and the sum of the N second bit widths being greater than the first bit width, where N is a multiple of 2. The vector data processor further includes: an instruction fetch unit configured to acquire a vector instruction requiring vector operation; a mode configuration unit configured to determine an instruction calculation mode of the vector instruction according to vector mode configuration information, where the vector mode configuration information includes configuration information for adjusting, according to data block size, the data path on which the current vector is executed; an instruction dispatch unit configured to determine one or more of the N execution pipelines for the vector instruction according to the instruction calculation mode; and the vector floating point unit, configured to control the determined execution pipelines to each execute the vector operation corresponding to the vector instruction.
According to a third aspect of the embodiments of the present invention, a system on a chip is provided, including: a control unit and a plurality of on-chip components including a vector data processor; the control unit is configured to control and manage the plurality of on-chip components, and the vector data processor is configured to execute a computer program or instructions, which