CN-121979578-A - Access address configuration method, processor, multiprocessor system, medium and product


Abstract

The invention discloses an access address configuration method, a processor, a multiprocessor system, a medium and a product. The method comprises: providing an independent register set for each thread bundle in the processor, wherein each register set is used for storing coordinate information of tensor data used during execution of the corresponding thread bundle's program; reading the coordinate information of the tensor data to be processed from the register set; and sending the read coordinate information to a coprocessor to perform access address calculation of the corresponding tensor data.

Inventors

Request for anonymity
Request for anonymity

Assignees

Shanghai Biren Technology Co., Ltd. (上海壁仞科技股份有限公司)

Dates

Publication Date: 20260505
Application Date: 20260108

Claims (10)

1. An access address configuration method, comprising: providing an independent register set for each thread bundle in a processor, wherein each register set is used for storing coordinate information of tensor data used during execution of the corresponding thread bundle's program; and reading the coordinate information of the tensor data to be processed from the register set, and sending the read coordinate information to a coprocessor for access address calculation of the corresponding tensor data.
2. The access address configuration method according to claim 1, wherein the register set includes a plurality of first registers, each first register being used for storing a coordinate component of one dimension in the coordinate information.
3. The access address configuration method of claim 2, wherein the method further comprises: for each first register, when the coordinate component corresponding to the first register changes, updating, by the first register, the locally stored coordinate component.
4. The access address configuration method of claim 3, wherein the first register supports updating the locally stored coordinate component with at least one of an addition operation, a subtraction operation, a multiplication operation, a division operation, a shift operation, and a logic operation.
5. The access address configuration method of claim 2, wherein the bit width of the first register is 32 bits or 64 bits.
6. The access address configuration method according to claim 2, wherein reading the coordinate information from the register set includes: when the thread bundle receives a storage access instruction indicating handling of tensor data, reading the currently stored coordinate component from each first register in the register set corresponding to the thread bundle.
7. A processor, characterized by comprising an execution unit, a plurality of thread bundles and a plurality of register sets, wherein the register sets are configured in one-to-one correspondence with the thread bundles, and each register set is used for storing coordinate information of tensor data used during execution of the corresponding thread bundle; the execution unit is configured to: read the coordinate information of the tensor data to be processed from the register set, and send a storage access instruction carrying the read coordinate information to a coprocessor to perform access address calculation of the corresponding tensor data.
8. A multiprocessor system, characterized by comprising a processor and a coprocessor; the processor comprises an execution unit, a plurality of thread bundles and a plurality of register sets, wherein the register sets are configured in one-to-one correspondence with the thread bundles, and each register set is used for storing coordinate information of tensor data used during execution of the corresponding thread bundle; the execution unit is configured to: read the coordinate information of the tensor data to be processed from the register set, and send a storage access instruction carrying the read coordinate information to the coprocessor; the address calculation unit is configured to: perform access address calculation of the corresponding tensor data according to the coordinate information.
9. A computer readable storage medium, wherein the computer readable storage medium stores a computer program, and wherein the computer program, when executed, controls a device in which the computer readable storage medium is located to perform the access address configuration method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the access address configuration method of any one of claims 1 to 6.

Description

Access address configuration method, processor, multiprocessor system, medium and product

Technical Field

The present invention relates to the field of artificial intelligence chips, and in particular, to an access address configuration method, a processor, a multiprocessor system, a medium, and a product.

Background

In artificial intelligence chips, such as graphics processors (Graphics Processing Unit, GPU), the processed data often exists in the form of multi-dimensional arrays. For example, planar graphics are typically two-dimensional arrays, with each pixel represented by (x, y) in a rectangular coordinate system; three-dimensional graphics are typically three-dimensional arrays, with each pixel represented by (x, y, z) in a rectangular spatial coordinate system, or by (x, y, z, w) in a homogeneous three-dimensional coordinate system. The data processed by an artificial intelligence chip is primarily tensors, with 3, 4, 5, or even more dimensions. Before such data can be processed (for example by addition, subtraction, multiplication, division, or convolution), the addresses where the data are located must first be calculated from information such as the dimensions, data type (such as 32 bits or 16 bits), and data layout. The data are then read from an external memory into the chip through a storage access instruction, and only then can the calculation start.

Currently, to compute data addresses, many coprocessors, such as storage access units, tensor cores, and texture units, include an address calculation module that calculates the addresses where the data are located from information such as the dimensions, data type, and data layout. Supplying this information to the coprocessor often requires many instructions.
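As a rough illustration of the address calculation described above, the following Python sketch maps a coordinate tuple to a byte address. It assumes a densely packed row-major layout, which the description does not specify; the function names `row_major_strides` and `element_address` are hypothetical and are not taken from the invention.

```python
def row_major_strides(shape):
    """Element strides for a densely packed row-major tensor (assumed layout)."""
    strides = []
    step = 1
    # Walk from the innermost dimension outward, accumulating extents.
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def element_address(base, coords, shape, elem_bytes):
    """Byte address of the element at `coords`, given dimensions and data type size."""
    if len(coords) != len(shape):
        raise ValueError("coords and shape must have the same number of dimensions")
    for c, extent in zip(coords, shape):
        if not 0 <= c < extent:
            raise IndexError("coordinate out of range")
    strides = row_major_strides(shape)
    # Linear element offset is the dot product of coordinates and strides.
    offset = sum(c * s for c, s in zip(coords, strides))
    return base + offset * elem_bytes


# Example: a 2 x 3 x 4 tensor of 32-bit (4-byte) elements starting at 0x1000.
addr = element_address(0x1000, (1, 2, 3), (2, 3, 4), elem_bytes=4)
```

Every input to this calculation (base, coordinates, shape, element size) has to reach the unit that performs it, which is why the number of instructions needed to pass them matters.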
In a massively parallel processor, multiple thread bundles are continuously switched in and out of execution. As a result, even if a data request from a thread bundle changes only one dimension, such as an increment in the x direction, all of the dimension information needs to be updated. In addition, each data request of the processor needs to transmit the information of every dimension to the coprocessor. This dimension information updating logic brings a great deal of redundant instruction overhead and wastes the execution time of the processor.

Disclosure of Invention

In view of the problems existing in the prior art, the embodiments of the present invention provide an access address configuration method, a processor, a multiprocessor system, a medium and a product, which can effectively reduce instruction overhead, save execution time of the processor, and improve processing efficiency of the processor.

In a first aspect, an embodiment of the present invention provides an access address configuration method, including: providing an independent register set for each thread bundle in a processor, wherein each register set is used for storing coordinate information of tensor data used during execution of the corresponding thread bundle; and reading the coordinate information of the tensor data to be processed from the register set, and sending the read coordinate information to a coprocessor for access address calculation of the corresponding tensor data.

As an improvement of the above solution, the register set includes a plurality of first registers, each of which is used for storing a coordinate component of one dimension in the coordinate information.

As an improvement of the above solution, the method further includes: for each first register, when the coordinate component corresponding to the first register changes, the first register updates the locally stored coordinate component.
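The per-thread-bundle register sets described above could be modeled roughly as follows. This is a minimal software sketch under stated assumptions, not the hardware design: the class names `CoordinateRegisterSet` and `Processor`, the `toy_coprocessor` stand-in, the 32-bit wraparound, and the fixed strides are all hypothetical choices made for illustration.

```python
class CoordinateRegisterSet:
    """Hypothetical register set: one 'first register' per coordinate dimension."""

    def __init__(self, num_dims, bit_width=32):
        self._mask = (1 << bit_width) - 1  # values wrap at the register width
        self._regs = [0] * num_dims

    def write(self, dim, value):
        self._regs[dim] = value & self._mask

    def add(self, dim, delta):
        # Update only the component that changed; other registers are untouched.
        self._regs[dim] = (self._regs[dim] + delta) & self._mask

    def read_all(self):
        return tuple(self._regs)


class Processor:
    """Hypothetical processor holding one independent register set per thread bundle."""

    def __init__(self, num_bundles, num_dims):
        self.register_sets = [CoordinateRegisterSet(num_dims) for _ in range(num_bundles)]
        self.update_count = 0  # counts coordinate-update operations issued

    def update_coordinate(self, bundle_id, dim, delta):
        self.register_sets[bundle_id].add(dim, delta)
        self.update_count += 1

    def storage_access(self, bundle_id, coprocessor):
        # Read the stored components and hand them to the coprocessor.
        coords = self.register_sets[bundle_id].read_all()
        return coprocessor(coords)


def toy_coprocessor(coords, strides=(12, 4, 1), elem_bytes=4, base=0):
    """Stand-in address calculation; strides and element size are assumed values."""
    return base + sum(c * s for c, s in zip(coords, strides)) * elem_bytes


cpu = Processor(num_bundles=2, num_dims=3)
for dim, value in enumerate((1, 2, 0)):
    cpu.register_sets[0].write(dim, value)

# The two thread bundles interleave; each one changes a single dimension.
cpu.update_coordinate(bundle_id=0, dim=2, delta=1)
cpu.update_coordinate(bundle_id=1, dim=1, delta=2)

addr0 = cpu.storage_access(0, toy_coprocessor)
addr1 = cpu.storage_access(1, toy_coprocessor)
```

In this model, switching between thread bundles does not disturb the other bundle's coordinates, so each step costs one update per changed dimension rather than a rewrite of every dimension.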
As an improvement of the above solution, the first register supports updating the locally stored coordinate component by at least one of an addition operation, a subtraction operation, a multiplication operation, a division operation, a shift operation, and a logic operation.

As an improvement of the above solution, the bit width of the first register may be 32 bits, 64 bits, or another bit width.

As an improvement of the above solution, reading the coordinate information from the register set includes: when the thread bundle receives a storage access instruction indicating handling of tensor data, reading the currently stored coordinate component from each first register in the register set corresponding to the thread bundle.

In a second aspect, an embodiment of the present invention provides a processor, including an execution unit, a plurality of thread bundles, and a plurality of register sets, where the register sets are configured in one-to-one correspondence with the thread bundles, and each register set is configured to store coordinate information of tensor data used during execution of the corresponding thread bundle's program. The execution unit is configured to read the coordinate information of the tensor data to be processed from