CN-114816323-B - Arithmetic unit, related apparatus and method

Abstract

The present disclosure provides an arithmetic unit, a related apparatus and a method. The arithmetic unit comprises a first format conversion unit and a second format conversion unit. The first format conversion unit converts data in a source data format into data in an intermediate data format according to a first rule, and the second format conversion unit converts the data in the intermediate data format into data in a target data format according to a second rule. The intermediate data format comprises at least all fields of the source data format and all fields of the target data format, and the number of bits in each field of the intermediate data format is not smaller than the number of bits in the corresponding field of either the source data format or the target data format. Embodiments of the disclosure improve the generality of hardware implementations of data type conversion and reduce the hardware resources that data type conversion consumes.

Inventors

ZHOU JINYUAN
ZOU YUNXIAO
WANG ZIHAN

Assignees

Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)

Dates

Publication Date: 20260512
Application Date: 20210119

Claims (17)

1. An arithmetic unit applied as an instruction execution unit of a processing core in a processing unit or as a tensor engine in an acceleration unit core in an acceleration unit, wherein the arithmetic unit as an instruction execution unit is for receiving and processing an instruction transmitted by an instruction transmitting unit and decoded by an instruction decoding unit, and the arithmetic unit as a tensor engine is for processing convolution and matrix multiplication operations in a deep learning model, the arithmetic unit comprising: a first format conversion unit for converting data in a source data format into data in an intermediate data format according to a first rule; and a second format conversion unit for converting the data in the intermediate data format into data in a target data format according to a second rule; wherein the intermediate data format comprises at least all fields of the source data format and all fields of the target data format, and the number of bits in each field of the intermediate data format is not less than the number of bits in the corresponding field of either the source data format or the target data format; the source data format is any one of a plurality of data types, and the target data format is any one of the plurality of data types other than the source data format; the arithmetic unit comprises a plurality of data paths, wherein the set of related units used for performing arithmetic processing of one data type is one data path, and the first format conversion unit and the second format conversion unit are arranged in one data path of the plurality of data paths and are multiplexed for the data format conversion of each data path; and the arithmetic unit includes an integer data path for arithmetic processing of integers and a floating-point data path for arithmetic processing of floating-point numbers, the first format conversion unit and the second format conversion unit are included in the integer data path or the floating-point data path, and the integer data path includes a logical arithmetic unit that performs logical operations on integers and an integer adder that performs addition of integers.
2. The arithmetic unit of claim 1, wherein the intermediate data format comprises a sign bit, exponent bits, integer bits and mantissa bits, and the intermediate data format represents the value $(-1)^{\text{sign}} \times 2^{(\text{exponent}-255)} \times (\text{integer}[0] + \text{fraction} \times 2^{-31} - 2 \times \text{integer}[1])$, wherein ^ denotes exponentiation; sign denotes the sign bit, with sign=0 representing a positive number and sign=1 representing a negative number; exponent denotes the value corresponding to the exponent bits; integer[0] is the last bit of the integer bits, with 0 representing a denormalized floating-point number and 1 representing a normalized floating-point number; fraction denotes the mantissa bits; and integer[1] is the second-to-last bit of the integer bits, with 0 representing that the mantissa is a positive number and 1 representing that the mantissa is a negative number.
3. The arithmetic unit of claim 2, wherein the exponent bits are 9 bits and the mantissa bits are 31 bits.
4. The arithmetic unit of claim 2, further comprising: a leading zero count unit for counting the number of consecutive 0s, starting from the most significant bit, in the sequence formed by the last bit of the integer bits and the mantissa bits; a left shift unit for shifting the intermediate data format left by the number of consecutive 0s; and an exponent adjusting unit for adjusting the exponent bits so that the value corresponding to the exponent bits is reduced by that number.
5. The arithmetic unit of claim 2, further comprising: an overflow detection unit for determining that the data in the intermediate data format overflows after being converted into data in the target data format; and an overflow processing unit for performing preset processing on the overflowed data in the target data format.
6. The arithmetic unit of claim 5, wherein the overflow detection unit determines overflow by at least one of: if the target data format is a 32-bit tensor floating-point number and the value corresponding to the exponent bits is 383, determining upward overflow; if the target data format is a 16-bit floating-point number, determining upward overflow if the value corresponding to the exponent bits is greater than or equal to 271, and determining downward overflow if the value corresponding to the exponent bits is less than or equal to 240; if the target data format is a 16-bit brain floating-point number, determining upward overflow if the value corresponding to the exponent bits is greater than or equal to 383, and determining downward overflow if the value corresponding to the exponent bits is less than or equal to 128; if the target data format is a 32-bit signed integer, determining upward overflow if the value corresponding to the exponent bits is greater than or equal to 286 and the sign bit is 0, and determining downward overflow if the value corresponding to the exponent bits is greater than or equal to 286 and the sign bit is 1; if the target data format is a 32-bit unsigned integer, determining upward overflow if the value corresponding to the exponent bits is greater than or equal to 287, and determining downward overflow if the value corresponding to the exponent bits is less than or equal to 254; if the target data format is a 16-bit signed integer, determining upward overflow if the value corresponding to the exponent bits is greater than or equal to 270 and the sign bit is 0, and determining downward overflow if the value corresponding to the exponent bits is greater than or equal to 270 and the sign bit is 1; if the target data format is a 16-bit unsigned integer, determining upward overflow if the value corresponding to the exponent bits is greater than or equal to 271, and determining downward overflow if the value corresponding to the exponent bits is less than or equal to 254; if the target data format is an 8-bit signed integer, determining upward overflow if the value corresponding to the exponent bits is greater than or equal to 262 and the sign bit is 0; if the target data format is an 8-bit unsigned integer, determining upward overflow if the value corresponding to the exponent bits is greater than or equal to 263, and determining downward overflow if the value corresponding to the exponent bits is less than or equal to 254.
7. The arithmetic unit of claim 2, further comprising: a post-conversion denormalization determining unit configured to determine whether the data in the intermediate data format, after being converted into data in the target data format, is denormalized data or an integer; and a right shift unit for shifting the mantissa bits of the intermediate data format to the right according to the result of determining whether the data is denormalized data or an integer, so that the shifted data in the intermediate data format, once converted into the target data format, is normalized data.
8. The arithmetic unit of claim 7, further comprising: a rounding unit for determining, based on the mantissa bits shifted out to the right by the right shift unit, whether to perform a carry on the mantissa bits remaining in the intermediate data format.
9. The arithmetic unit of claim 7, wherein the right shift unit shifts the mantissa bits of the intermediate data format to the right according to at least one of: if the target data format is a 32-bit floating-point number or a 32-bit tensor floating-point number and the data is determined to be denormalized data, with T being the value corresponding to the exponent bits, shifting the intermediate data format right by (137-T) bits; if the target data format is a 16-bit floating-point number and the data is determined to be denormalized data, with T being the value corresponding to the exponent bits, shifting the intermediate data format right by (262-T) bits; if the target data format is a 16-bit brain floating-point number and the data is determined to be denormalized data, with T being the value corresponding to the exponent bits, shifting the intermediate data format right by (153-T) bits; if the target data format is a 32-bit signed integer, 32-bit unsigned integer, 16-bit signed integer, 16-bit unsigned integer, 8-bit signed integer, or 8-bit unsigned integer, with T being the value corresponding to the exponent bits, shifting the intermediate data format right by (286-T) bits.
10. The arithmetic unit of claim 2, wherein the first rule comprises at least one of: if the source data format is a 32-bit floating-point number or a 32-bit tensor floating-point number, filling the sign bit of the 32-bit floating-point number or 32-bit tensor floating-point number into the sign bit of the intermediate data format, filling its last 7 exponent bits into the last 7 exponent bits of the intermediate data format, filling its first exponent bit into the first exponent bit of the intermediate data format, filling the inverse of its first exponent bit into the second exponent bit of the intermediate data format, filling the last bit of the integer bits of the intermediate data format with 0 if the number is a denormalized floating-point number, and filling the last bit of the integer bits of the intermediate data format with 1 if the number is a normalized floating-point number; if the source data format is a 16-bit floating-point number, filling the sign bit of the 16-bit floating-point number into the sign bit of the intermediate data format, filling its last 4 exponent bits into the last 4 exponent bits of the intermediate data format, filling its first exponent bit into the first exponent bit of the intermediate data format, filling the inverse of its first exponent bit into the 5th- to 8th-from-last exponent bits of the intermediate data format, filling the last bit of the integer bits of the intermediate data format with 0 if the number is a denormalized floating-point number, filling the last bit of the integer bits of the intermediate data format with 1 if the number is a normalized floating-point number, placing the mantissa bits of the 16-bit floating-point number sequentially into the mantissa bits of the intermediate data format starting from the first mantissa bit, and setting the remaining mantissa bits of the intermediate data format to 0; if the source data format is a 16-bit brain floating-point number, filling the sign bit of the 16-bit brain floating-point number into the sign bit of the intermediate data format, filling its last 7 exponent bits into the last 7 exponent bits of the intermediate data format, filling its first exponent bit into the first exponent bit of the intermediate data format, filling the inverse of its first exponent bit into the second exponent bit of the intermediate data format, filling the last bit of the integer bits of the intermediate data format with 0 if the number is a denormalized floating-point number, filling the last bit of the integer bits of the intermediate data format with 1 if the number is a normalized floating-point number, placing the mantissa bits of the 16-bit brain floating-point number sequentially into the mantissa bits of the intermediate data format starting from the first mantissa bit, and setting the remaining mantissa bits of the intermediate data format to 0; if the source data format is a 32-bit signed integer, 32-bit unsigned integer, 16-bit signed integer, 16-bit unsigned integer, 8-bit signed integer, or 8-bit unsigned integer, converting the data in the source data format to a 33-bit signed integer, setting the sign bit of the intermediate data format to 0, setting the exponent bits of the intermediate data format to the binary value corresponding to 286, and filling the integer bits and mantissa bits of the intermediate data format with the 33-bit signed integer.
11. The arithmetic unit of claim 10, wherein converting the data in the source data format to a 33-bit signed integer, if the source data format is a 32-bit signed integer, a 32-bit unsigned integer, a 16-bit signed integer, a 16-bit unsigned integer, an 8-bit signed integer, or an 8-bit unsigned integer, comprises: appending 1, 17 or 25 zero bits to the back of the 32-bit signed integer, the 16-bit signed integer or the 8-bit signed integer, respectively, to form a 33-bit signed integer; adding one 0 bit to the front of the 32-bit unsigned integer to form a 33-bit signed integer; and adding one 0 bit to the front of the 16-bit unsigned integer or the 8-bit unsigned integer and appending 16 or 24 zero bits to its back, respectively, to obtain a 33-bit signed integer.
12. The arithmetic unit of claim 2, wherein the second rule comprises at least one of: if the target data format is a 32-bit floating-point number or a 32-bit tensor floating-point number, filling the sign bit of the intermediate data format into the sign bit of the 32-bit floating-point number or 32-bit tensor floating-point number, filling the last 7 exponent bits of the intermediate data format into the last 7 exponent bits of the 32-bit floating-point number or 32-bit tensor floating-point number, and filling the first exponent bit of the intermediate data format into the first exponent bit of the 32-bit floating-point number or 32-bit tensor floating-point number; if the target data format is a 16-bit floating-point number, filling the sign bit of the intermediate data format into the sign bit of the 16-bit floating-point number, filling the last 4 exponent bits of the intermediate data format into the last 4 exponent bits of the 16-bit floating-point number, and filling the first exponent bit of the intermediate data format into the first exponent bit of the 16-bit floating-point number; if the target data format is a 16-bit brain floating-point number, filling the sign bit of the intermediate data format into the sign bit of the 16-bit brain floating-point number, filling the last 7 exponent bits of the intermediate data format into the last 7 exponent bits of the 16-bit brain floating-point number, and filling the first exponent bit of the intermediate data format into the first exponent bit of the 16-bit brain floating-point number; if the target data format is a 32-bit signed integer, a 16-bit signed integer, or an 8-bit signed integer, concatenating the integer bits of the intermediate data format with the mantissa bits and excluding the first 1 bit, 17 bits or 25 bits, respectively, to obtain the 32-bit signed integer, 16-bit signed integer, or 8-bit signed integer, respectively; if the target data format is a 32-bit unsigned integer, taking the integer bits and the mantissa bits of the intermediate data format and removing the first bit to obtain the 32-bit unsigned integer; if the target data format is a 16-bit unsigned integer, concatenating the integer bits of the intermediate data format with the first bit removed and the mantissa bits with the first 16 bits removed to obtain the 16-bit unsigned integer; if the target data format is an 8-bit unsigned integer, concatenating the integer bits of the intermediate data format with the first 24 bits removed to obtain the 8-bit unsigned integer.
13. A processing unit, comprising: an instruction execution unit that is an arithmetic unit according to any one of claims 1 to 12; and a register, wherein the data in the source data format is read from the register by the arithmetic unit, and the data in the target data format is written into the register by the arithmetic unit.
14. An acceleration unit, comprising: a tensor engine that is an arithmetic unit according to any one of claims 1 to 12; and an on-chip memory, wherein the data in the source data format is read from the on-chip memory by the tensor engine, and the data in the target data format is written into the on-chip memory by the tensor engine.
15. A computing device comprising a processing unit according to claim 13 or an acceleration unit according to claim 14.
16. A system on a chip comprising a processing unit according to claim 13 or an acceleration unit according to claim 14.
17. A data center comprising the computing device of claim 15.
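
The exponent wiring recited in claim 10 can be checked numerically. The sketch below is an illustration, not the claimed circuit: it assumes the "first" exponent bit is the most significant one, and the helper names `widen_exponent` and `bias` are hypothetical. Under that assumption, copying the first bit, inserting its inverse, and keeping the remaining low bits gives the same result as adding the bias difference (255 - 127 = 128 for an 8-bit exponent, 255 - 15 = 240 for a 5-bit exponent).

```python
def widen_exponent(exp_field: int, src_bits: int, dst_bits: int = 9) -> int:
    """Re-bias an exponent field by bit copying, following claim 10.

    Assumption: the "first" exponent bit is the most significant bit.
    Output layout: first bit, then (dst_bits - src_bits) copies of its
    inverse, then the remaining low bits of the source exponent.
    """
    low_width = src_bits - 1
    top = (exp_field >> low_width) & 1
    low = exp_field & ((1 << low_width) - 1)
    pad_width = dst_bits - src_bits
    pad = ((1 << pad_width) - 1) if top == 0 else 0  # inverted copies of top
    return (top << (dst_bits - 1)) | (pad << low_width) | low


def bias(bits: int) -> int:
    """IEEE-style exponent bias for an exponent field of the given width."""
    return (1 << (bits - 1)) - 1


# The bit copying equals adding the bias difference for every exponent value.
for src_bits in (8, 5):  # 8: F32/TF32/BF16 exponent, 5: F16 exponent
    delta = bias(9) - bias(src_bits)  # 128 and 240 respectively
    for e in range(1 << src_bits):
        assert widen_exponent(e, src_bits) == e + delta
```

With this mapping an all-ones 8-bit exponent becomes 383 and an all-ones 5-bit exponent becomes 271, while an all-zeros exponent becomes 128 and 240 respectively, which are the same numbers that appear as thresholds in claim 6.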
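
The normalization recited in claim 4 (leading zero count, left shift, exponent adjustment) can be modeled in a few lines. This is a minimal software sketch under stated assumptions, not the hardware: the sequence formed by integer[0] and the 31 mantissa bits is assumed to be held as a 32-bit unsigned value with integer[0] on top, an all-zero sequence is simply returned unchanged, and exponent underflow is ignored. The names `normalize`, `magnitude` and `sig32` are hypothetical.

```python
from fractions import Fraction

BIAS = 255       # exponent bias of the intermediate format
FRAC_BITS = 31   # mantissa width of the intermediate format


def normalize(exponent: int, sig32: int) -> tuple:
    """Shift out leading zeros and lower the exponent by the same count.

    Assumption: sig32 packs integer[0] (top bit) followed by the 31
    mantissa bits into one 32-bit unsigned value.
    """
    if sig32 == 0:
        return exponent, sig32  # nothing to shift (assumed handling of zero)
    leading_zeros = 32 - sig32.bit_length()         # leading zero count unit
    sig32 = (sig32 << leading_zeros) & 0xFFFFFFFF   # left shift unit
    exponent -= leading_zeros                       # exponent adjusting unit
    return exponent, sig32


def magnitude(exponent: int, sig32: int) -> Fraction:
    """Exact magnitude 2^(exponent - 255) * sig32 / 2^31."""
    return Fraction(2) ** (exponent - BIAS) * Fraction(sig32, 1 << FRAC_BITS)


e, s = normalize(286, 5)   # the quantity 5 held unnormalized with exponent 286
assert s >> 31 == 1        # integer[0] is 1 after normalization
assert magnitude(e, s) == magnitude(286, 5) == 5
```

The final assertion is the point of the step: shifting left by the leading zero count and subtracting the same count from the exponent leaves the represented magnitude unchanged.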

Description

Arithmetic unit, related apparatus and method

Technical Field

The present disclosure relates to the field of chips, and more particularly, to an arithmetic unit, a related apparatus, and a method.

Background

Data processing often involves operations on several different data types, such as integers and floating-point numbers. To improve operation efficiency in data processing, chips are currently designed to support operations on a plurality of different data types in hardware. The data types include, but are not limited to, 32-bit floating-point number (F32), 32-bit tensor floating-point number (TF32), 16-bit floating-point number (F16), 16-bit brain floating-point number (BF16), 32-bit signed integer (S32), 32-bit unsigned integer (U32), 16-bit signed integer (S16), 16-bit unsigned integer (U16), 8-bit signed integer (S8), and 8-bit unsigned integer (U8). Conversion between different data types is indispensable.
In the prior art, the conversion hardware, i.e. a dedicated data path, is generally designed separately for each pair of different data types according to the conversion method between those two types. Because any two different data types can be paired with each other, there are many combinations and therefore many data paths. This scheme has poor generality, requires many data paths, and cannot reuse existing data paths.

Disclosure of Invention

In view of this, the present disclosure aims to improve the generality of hardware implementations of data type conversion and to reduce the hardware resource consumption caused by data type conversion.
According to an aspect of the present disclosure, there is provided an arithmetic unit including: a first format conversion unit for converting data in a source data format into data in an intermediate data format according to a first rule; and a second format conversion unit for converting the data in the intermediate data format into data in a target data format according to a second rule, wherein the intermediate data format comprises at least all fields of the source data format and all fields of the target data format, and the number of bits in each field of the intermediate data format is not less than the number of bits in the corresponding field of either the source data format or the target data format.
Optionally, the intermediate data format includes a sign bit, exponent bits, integer bits, and mantissa bits, and the intermediate data format represents the value $(-1)^{\text{sign}} \times 2^{(\text{exponent}-255)} \times (\text{integer}[0] + \text{fraction} \times 2^{-31} - 2 \times \text{integer}[1])$, where ^ denotes exponentiation; sign denotes the sign bit, with sign=0 representing a positive number and sign=1 representing a negative number; exponent denotes the value corresponding to the exponent bits; integer[0] is the last bit of the integer bits, with 0 representing a denormalized floating-point number and 1 representing a normalized floating-point number; fraction denotes the mantissa bits; and integer[1] is the second-to-last bit of the integer bits, with 0 representing that the mantissa is a positive number and 1 representing that the mantissa is a negative number.
Optionally, the exponent bits are 9 bits and the mantissa bits are 31 bits.
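
The value expression for the intermediate data format can be evaluated directly. The sketch below is illustrative only: it assumes a 1-bit sign and a 2-bit integer field (the text names only integer[0] and integer[1]), takes the fields as already-separated integers rather than assuming any packing order, and the function name `value` is hypothetical. Exact rational arithmetic is used so that no rounding hides an error.

```python
from fractions import Fraction


def value(sign: int, exponent: int, integer: int, fraction: int) -> Fraction:
    """Evaluate (-1)^sign * 2^(exponent-255) * (integer[0] + fraction*2^-31 - 2*integer[1]).

    Assumptions: integer is a 2-bit field whose last bit is integer[0] and
    whose second-to-last bit is integer[1]; fraction is the 31-bit mantissa.
    """
    int0 = integer & 1
    int1 = (integer >> 1) & 1
    significand = int0 + Fraction(fraction, 1 << 31) - 2 * int1
    return (-1) ** sign * Fraction(2) ** (exponent - 255) * significand


# 1.0: normalized (integer[0] = 1), zero mantissa, exponent equal to the bias
assert value(0, 255, 0b01, 0) == 1
# -1.5: sign bit set, mantissa 0.5 = 2^30 / 2^31
assert value(1, 255, 0b01, 1 << 30) == Fraction(-3, 2)
# integer[1] = 1 gives a negative significand: (0 + 0 - 2) * 2^0 = -2
assert value(0, 255, 0b10, 0) == -2
```

Read this way, the two integer bits and the 31 mantissa bits together behave like a 33-bit two's-complement number with 31 fractional bits, scaled by a power of two with bias 255.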
Optionally, the arithmetic unit further includes: a leading zero count unit for counting the number of consecutive 0s, starting from the most significant bit, in the sequence formed by the last bit of the integer bits and the mantissa bits; a left shift unit for shifting the intermediate data format left by the number of consecutive 0s; and an exponent adjusting unit for adjusting the exponent bits so that the value corresponding to the exponent bits is reduced by that number.
Optionally, the arithmetic unit further includes: an overflow detection unit for determining that the data in the intermediate data format overflows after being converted into data in the target data format; and an overflow processing unit for performing preset processing on the overflowed data in the target data format.
Optionally, the overflow detection unit determines overflow by at least one of: if the target data format is a 32-bit tensor floating-point number and the value corresponding to the exponent bits is 383, determining upward overflow; if the target data format is a 16-bit floating-point number, determining upward overflow if the value corresponding to the exponent bits is greater than or equal to 271, and determining downward overflow if the value corresponding to the exponent bits is less than or equal to 240; if the target data format is a 16-bit brain floating-point number, determining upward overflow if the value corresponding to the exponent bits is greater than or equal to 383, and determining downward overflow if the value corresponding to the exponent bits is less than or equal to 128; if the target data format is a 32-bit signed integer, the value corresponding to the exponent bits is greater than or equal to 286 and the sig