EP-4141647-B1 - APPARATUS AND METHOD WITH MULTI-FORMAT DATA SUPPORT


Inventors

YU, HYEONGSEOK
KWON, DONGHYUK
KIM, CHANNOH
PARK, SEONGWOOK
CHO, YEONGON

Dates

Publication Date: 2026-05-06
Application Date: 2022-08-31

Claims (15)

An apparatus with multi-format data support including floating point data, the apparatus comprising: a receiver (110) configured to receive a plurality of data corresponding to a plurality of data formats; and one or more processors (130) configured to: multiply (1230) the plurality of data using one or more multipliers; perform (1250) a first alignment on a result of the multiplication based on an exponent value of the plurality of data; add (1270) a result of the first alignment and data from a register (1125); and perform (1290) a second alignment on a result of the addition based on the exponent value and an operation result of a previous cycle, wherein the one or more multipliers comprises four multipliers (211 to 214) each having 8x4-bit input and a single multiplier (215) having a 4x4-bit input for performing unsigned multiplication on 4-bit, 8-bit, 16-bit integers and mantissa of a 16-bit floating point.
The apparatus of claim 1, wherein, for the multiplying, the one or more processors (130) are configured to: multiply a first bit input and a second bit input included in the plurality of data; convert a sign of a result of the multiplication of the first bit input and the second bit input; and combine the result of the multiplication of the first bit input and the second bit input with the converted sign to generate the result of the multiplying of the plurality of data.
The apparatus of claim 1 or 2, wherein, for the multiplying, the one or more processors (130) are configured to multiply a plurality of first bit inputs of the plurality of data.
The apparatus of one of claims 1 to 3, wherein the one or more processors (130) are configured to: add exponent values of all input pairs of the multiplication (1230) of the plurality of data; obtain a maximum exponent value based on the exponent values; determine a sum of remaining exponent values; and determine a difference between the maximum exponent value and the sum.
The apparatus of one of claims 1 to 4, wherein, for the performing of the first alignment, the one or more processors (130) are configured to shift the result of the multiplication based on a difference between a maximum exponent value obtained based on the exponent value and a sum of remaining exponent values.
The apparatus of claim 2, wherein, for the performing of the second alignment, the one or more processors (130) are configured to shift the result of the addition based on a plurality of exponent values from which a maximum is obtained and the operation result of the previous cycle.
The apparatus of claim 6, wherein, for the shifting of the result of the addition, the one or more processors (130) are configured to shift the result of the addition based on a difference between the maximum exponent value and an exponent value stored according to the operation result of the previous cycle.
The apparatus of one of claims 1 to 7, wherein, for the performing of the second alignment, the one or more processors (130) are configured to: extend a sign bit of the plurality of data based on a predetermined radix point; and add the extended sign bit to the exponent value.
The apparatus of one of claims 1 to 8, wherein the one or more processors (130) are configured to accumulate a result of the second alignment.
The apparatus of claim 9, wherein the one or more processors (130) are configured to: remove one or more sign bits with a predetermined length from an output of a result of the accumulation; and perform normalization on the output in which the one or more sign bits are removed.
The apparatus of one of claims 1 to 10, wherein the one or more processors (130) comprises: one or more multipliers (211, 212, 213, 214, 215) configured to perform the multiplying of the plurality of data; a first aligner (133, 1123) configured to perform the first alignment on a result of the multiplication; an adder tree (135, 1127) configured to perform the adding of the result of the first alignment and data from the register (1125); and a second aligner (1137) configured to perform the second alignment on the result of the addition.
A processor-implemented method with multi-format data support including floating point data, the method comprising: multiplying (1230) a plurality of data corresponding to a plurality of data formats using one or more multipliers; performing (1250) a first alignment on a result of the multiplication based on a difference between a maximum exponent value among exponent values of the plurality of data and a sum of remaining exponent values; adding (1270) a result of the first alignment and data from a register (1125); and performing (1290) a second alignment on a result of the addition based on a difference between the maximum exponent value and an exponent value of an operation result of a previous cycle, wherein the one or more multipliers comprises four multipliers (211 to 214) each having 8x4-bit input and a single multiplier (215) having a 4x4-bit input for performing unsigned multiplication on 4-bit, 8-bit, 16-bit integers and mantissa of a 16-bit floating point.
The method of claim 12, wherein performing (1250) the first alignment comprises performing a right-shift and the second alignment comprises performing a left-shift, or further comprising adding a predetermined value to an exponent value of an output of a result of an accumulation of a result of the second alignment, and performing normalization on the output in which the sign bit is removed.
A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors (130) of an apparatus with multi-format data support including floating point data, configure the one or more processors (130) to: multiply (1230) the plurality of data by routing data of a plurality of data corresponding to a plurality of data formats to one or more corresponding multipliers of a multiplier-accumulator, MAC, array (131, 1115) determined based on the plurality of data formats; perform (1250) a first alignment on a result of the multiplication based on an exponent value of the plurality of data; add (1270) a result of the first alignment and data from a register (1125); and perform (1290) a second alignment on a result of the addition based on the exponent value and an operation result of a previous cycle, wherein the multiplier-accumulator, MAC, array (131, 1115) comprises four multipliers (211 to 214) each having 8x4-bit input and a single multiplier (215) having a 4x4-bit input for performing unsigned multiplication on 4-bit, 8-bit, 16-bit integers and mantissa of a 16-bit floating point.
The computer-readable storage medium of claim 14, wherein the multipliers of the MAC array comprise a plurality of multipliers corresponding to a larger bit input and another multiplier corresponding to a smaller bit input.
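
The multiplier arrangement recited in claims 1, 12, and 14 (four 8x4-bit multipliers and one 4x4-bit multiplier) can be illustrated with a small software model. The claims do not state how the partial products are combined, so the decomposition below is an assumption: it treats a 16-bit floating point mantissa as 11 bits (10 stored bits plus the implicit leading bit) and splits each operand into 8 low bits and 3 high bits. The function names `mul8x4`, `mul4x4`, and `mul_mantissa11` are hypothetical.

```python
def mul8x4(a8, b4):
    """Model of one 8x4-bit unsigned multiplier (hypothetical helper)."""
    assert 0 <= a8 < 256 and 0 <= b4 < 16
    return a8 * b4


def mul4x4(a4, b4):
    """Model of the single 4x4-bit unsigned multiplier (hypothetical helper)."""
    assert 0 <= a4 < 16 and 0 <= b4 < 16
    return a4 * b4


def mul_mantissa11(a, b):
    """Multiply two 11-bit unsigned mantissas using four 8x4 products
    and one 4x4 product. The split is an assumed decomposition, not
    one taken from the claims."""
    assert 0 <= a < 2048 and 0 <= b < 2048
    a_lo, a_hi = a & 0xFF, a >> 8            # 8 low bits, 3 high bits
    b_lo, b_hi = b & 0xFF, b >> 8
    b_nib0, b_nib1 = b_lo & 0xF, b_lo >> 4   # two nibbles of b_lo

    p0 = mul8x4(a_lo, b_nib0)   # weight 2^0
    p1 = mul8x4(a_lo, b_nib1)   # weight 2^4
    p2 = mul8x4(a_lo, b_hi)     # weight 2^8
    p3 = mul8x4(b_lo, a_hi)     # weight 2^8
    p4 = mul4x4(a_hi, b_hi)     # weight 2^16

    return p0 + (p1 << 4) + ((p2 + p3) << 8) + (p4 << 16)


product = mul_mantissa11(0x7FF, 0x7FF)
```

Under the same assumption, an 8x8-bit integer product needs only `p0` and `p1`, and a 4x4-bit product needs only `mul4x4`, which is one way the same array could serve several data formats.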

Description

BACKGROUND
1. Field
The following description relates to an apparatus and method with multi-format data support.
2. Description of Related Art
To support operations on multi-format data, either a separate operation apparatus may be provided for each data format, or the input of an operation apparatus that supports the maximum data type may be distributed so that a plurality of sub-type data are produced and concatenated at the output. In the case of performing an operation on floating point data, a floating point adder used for accumulation may require a long processing time. Therefore, a pipeline data hazard may occur in a high-speed operation.
The publication by Hamzah Abdel-Aziz et al., titled "Rethinking floating point overheads for mixed precision DNN accelerators", arXiv.org, Cornell University Library, 201 Olin Library Cornell University Ithaca, NY 14853, 28 January 2021 (2021-01-28), relates to floating point overheads for mixed precision DNN accelerators. A mixed-precision convolution unit architecture is proposed which supports different integer and floating point (FP) precisions. The architecture is based on low-bit inner product units and realizes higher precision based on temporal decomposition.
EP 3396524 A1 is further prior art.
SUMMARY
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. The invention is claimed by the independent claims. Preferred embodiments are specified by the dependent claims.
According to the present invention, there is provided an apparatus with multi-format data support including floating point data, the apparatus including: a receiver configured to receive a plurality of data corresponding to a plurality of data formats; and one or more processors configured to: multiply the plurality of data using one or more multipliers; perform a first alignment on a result of the multiplication based on an exponent value of the plurality of data; add a result of the first alignment and data from a register; and perform a second alignment on a result of the addition based on the exponent value and an operation result of a previous cycle, wherein the one or more multipliers comprise four multipliers each having an 8x4-bit input and a single multiplier having a 4x4-bit input for performing unsigned multiplication on 4-bit, 8-bit, and 16-bit integers and the mantissa of a 16-bit floating point number. For the multiplying, the one or more processors may be configured to: multiply a first bit input and a second bit input included in the plurality of data; convert a sign of a result of the multiplication of the first bit input and the second bit input; and combine the result of the multiplication of the first bit input and the second bit input with the converted sign to generate the result of the multiplying of the plurality of data. For the multiplying, the one or more processors may be configured to multiply a plurality of first bit inputs of the plurality of data. The one or more processors may be configured to: add exponent values of the input pairs of the multiplication; obtain a maximum exponent value based on the exponent values; determine a sum of remaining exponent values; and determine a difference between the maximum exponent value and the sum.
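
The exponent handling described above can be sketched in software under one reading of the text: the exponents of each input pair are added, the largest pair sum is taken as the maximum exponent value, and every other product is shifted by its distance from that maximum. The right-shift direction follows claim 13. The function name `first_alignment`, its arguments, and the sample values are hypothetical, and the sketch simply truncates the bits shifted out, which the source does not specify.

```python
def first_alignment(products, exponent_pairs):
    """Align signed integer mantissa products to a common exponent.

    products: one signed integer product per input pair
    exponent_pairs: (exponent_a, exponent_b) for each input pair
    Returns the aligned products and the maximum exponent value.
    """
    # Add the exponent values of each input pair.
    pair_exponents = [ea + eb for ea, eb in exponent_pairs]
    # Obtain the maximum exponent value among the pair sums.
    max_exponent = max(pair_exponents)
    # Right-shift each product by its difference from the maximum.
    # Python's >> on negative integers is an arithmetic shift.
    aligned = [p >> (max_exponent - e)
               for p, e in zip(products, pair_exponents)]
    return aligned, max_exponent


aligned, max_exponent = first_alignment(
    [12, 5, -8], [(3, 2), (1, 1), (2, 2)])
partial_sum = sum(aligned)  # stands in for the adder tree
```

Here the pair sums are 5, 2, and 4, so the products are shifted right by 0, 3, and 1 bits before they are added.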
For the performing of the first alignment, the one or more processors may be configured to shift the result of the multiplication based on a difference between a maximum exponent value obtained based on the exponent value and a sum of remaining exponent values. For the performing of the second alignment, the one or more processors may be configured to shift the result of the addition based on a maximum exponent value obtained based on the exponent value and the operation result of the previous cycle. For the shifting of the result of the addition, the one or more processors may be configured to shift the result of the addition based on a difference between the maximum exponent value and an exponent value stored according to the operation result of the previous cycle. For the performing of the second alignment, the one or more processors may be configured to: extend a sign bit of the plurality of data based on a predetermined radix point; and add the extended sign bit to the exponent value. The one or more processors may be configured to accumulate a result of the second alignment. The one or more processors may be configured to: remove a sign bit with a predetermined length from an output of a result of the accumulation; and perform normalization on the output in which the sign bit is removed. The one or more processors may include: one or more multipliers configured to perform the multiplying of the plurality of data; a first aligner configured to perform the first alignment on a result of the multiplication; an adder tree configured to perform the adding of the result of the first alignment; and a second aligner configured to