US-12619864-B2 - Efficient look-up table based functions for artificial intelligence (AI) accelerator
Abstract
A method for approximating an activation function, the method including: receiving an input value of the activation function; determining that the input value is within a range, the range includes a set of non-uniform intervals; determining a selected interval from among the set of non-uniform intervals including the input value; retrieving, by a hardware accelerator, from a look-up table (LUT) associated with a type of the activation function, values of one or more quadratic interpolation parameters associated with the selected interval; performing a quadratic interpolation on the input value to approximate the input value using the values of the one or more quadratic interpolation parameters; and determining a first approximated output of the activation function based on a result of the quadratic interpolation performed on the input value.
Inventors
- Johannes Boonstra
Assignees
- SYNOPSYS, INC.
Dates
- Publication Date
- 20260505
- Application Date
- 20220526
Claims (20)
- 1 . A method for approximating an activation function, the method comprising: receiving an input value of the activation function; determining, by a hardware accelerator, a range of input values that falls in a quadratic interpolation range of the activation function; determining, by the hardware accelerator, whether the input value is within the quadratic interpolation range, wherein the quadratic interpolation range comprises a set of non-uniform intervals; determining, by the hardware accelerator, a selected interval from among the set of non-uniform intervals comprising the input value; based on determining that the input value is within the quadratic interpolation range: retrieving, by the hardware accelerator, from a first look-up table (LUT) associated with a type of the activation function, values of one or more quadratic interpolation parameters associated with the selected interval; performing a quadratic interpolation on the input value to approximate the input value using the values of the one or more quadratic interpolation parameters; and determining a first approximated output of the activation function based on a result of the quadratic interpolation performed on the input value, and based on determining that a second input value is outside the quadratic interpolation range, the hardware accelerator determines to perform a linear extrapolation in one or more linear extrapolation ranges comprising a range from an upper limit value of the quadratic interpolation range to positive infinity and a range from a lower limit value of the quadratic interpolation ranges to negative infinity by: retrieving, by the hardware accelerator, from a second LUT, offset values and slopes of the activation function for each of the linear extrapolation ranges; performing, by the hardware accelerator, the linear extrapolation on the second input value to approximate the activation function at the second input value using an offset value and a slope corresponding to the linear extrapolation range; and determining, by the hardware accelerator, a second approximated output of the activation function based on a result of the linear extrapolation performed on the second input value.
- 2 . The method of claim 1 , wherein a Remez minimax algorithm is used to determine a non-uniform distribution of the set of non-uniform intervals.
- 3 . The method of claim 2 , wherein the determining the selected interval from the set of non-uniform intervals is based on one or more of most significant bits (MSB) of the input value and two's complement encoded input value.
- 4 . The method of claim 1 , wherein the first LUT associated with the activation function comprises the values of the one or more quadratic interpolation parameters for each interval from among the set of non-uniform intervals.
- 5 . The method of claim 4 , wherein the hardware accelerator stores a plurality of LUTs comprising parameters associated with quadratic interpolation ranges of a plurality of activation functions.
- 6 . The method of claim 5 , wherein the plurality of LUTs are implemented as flip-flops in the hardware accelerator.
- 7 . The method of claim 1 , wherein: the one or more linear extrapolation ranges do not overlap with the quadratic interpolation range.
- 8 . The method of claim 7 , wherein the input value is shifted by power of two factors within a range of representable values in a fixed point representation of the second approximated output of the activation function.
- 9 . A system for approximating an activation function, the system comprising: a bit shifter configured to receive an input value of the activation function and perform a shift operation on the input value to compute most significant bits (MSBs) of the input value, wherein the system is configured to determine a range of input values that falls in a quadratic interpolation range of the activation function; a first device configured to determine if the input value is within the quadratic interpolation range comprising a set of non-uniform intervals, and determine a selected interval from among the set of non-uniform intervals comprising the input value based on the MSBs of the input value; based on determining that the input value is within the quadratic interpolation range, a second device configured to retrieve from a look-up table (LUT) associated with the activation function, values of one or more quadratic interpolation parameters associated with the selected interval; and one or more multiply-accumulate-scale circuits configured to: perform a quadratic interpolation on the input value to approximate the input value using the values of the one or more quadratic interpolation parameters, and to determine a first approximated output of the activation function based on a result of the quadratic interpolation performed on the input value; and based on the first device determining that a second input value is outside the quadratic interpolation range, the one or more multiply-accumulate-scale circuits are further configured to perform a linear extrapolation in one or more linear extrapolation ranges comprising a range from an upper limit value of the quadratic interpolation range to positive infinity and in a range from a lower limit value of the quadratic interpolation ranges to negative infinity by: performing the linear extrapolation on the second input value to approximate the activation function at the second input value using an offset value and a slope corresponding to a linear extrapolation range containing the second input value from among a plurality of offset values and slopes of the activation function for each of the linear extrapolation ranges, and determining a second approximated output of the activation function based on a result of the linear extrapolation performed on the second input value.
- 10 . The system of claim 9 , wherein the system comprises a hardware accelerator and/or an electronic circuit configured by a hard macro intellectual property (IP).
- 11 . The system of claim 10 , wherein the hardware accelerator stores a plurality of LUTs comprising parameters associated with quadratic interpolation ranges of a plurality of activation functions.
- 12 . The system of claim 11 , wherein the plurality of LUTs are implemented as flip-flops in the hardware accelerator.
- 13 . The system of claim 9 , wherein Remez minimax algorithm is used to determine non-uniform distribution of the set of non-uniform intervals.
- 14 . The system of claim 13 , wherein the determining the selected interval from among set of non-uniform intervals is based on one or more of most significant bits (MSB) of the input value and two's complement encoded input value.
- 15 . The system of claim 9 , wherein the LUT associated with the activation function comprises the values of the one or more quadratic interpolation parameters for each interval from among the set of non-uniform intervals.
- 16 . The system of claim 9 , further comprising: a third device configured to determine the offset values and the slopes of the activation function for each of the one or more linear extrapolation ranges based on the first device determining that the second input value is within the one or more linear extrapolation ranges, the one or more linear extrapolation ranges do not overlap with the quadratic interpolation range.
- 17 . The system of claim 16 , wherein the input value is shifted by power of two factors within a range of representable values in a fixed point representation of the second approximated output of the activation function.
- 18 . A non-transitory computer readable medium comprising stored representation of an accelerator circuit, which when synthesized by a processor, cause the processor to synthesize a structure of an electronic circuit configured to: receive an input value of an activation function; determining, by a hardware accelerator, a range of input values that falls in a quadratic interpolation range of the activation function; determine, by the hardware accelerator, whether the input value is within the quadratic interpolation range, wherein the quadratic interpolation range comprises a set of non-uniform intervals; determine, by the hardware accelerator, a selected interval from among the set of non-uniform intervals comprising the input value; based on determining that the input value is within the quadratic interpolation range: retrieve, by the hardware accelerator, from a look-up table (LUT) associated with the activation function, values of one or more quadratic interpolation parameters associated with the selected interval; perform a quadratic interpolation on the input value to approximate the input value using the values of the one or more quadratic interpolation parameters; and determine a first approximated output of the activation function based on a result of the quadratic interpolation performed on the input value, and based on determining that a second input value is outside the quadratic interpolation range, the hardware accelerator determines to perform a linear extrapolation in one or more linear extrapolation ranges comprising a range from an upper limit value of the quadratic interpolation range to positive infinity and a range from a lower limit value of the quadratic interpolation ranges to negative infinity by: determining, offset values and slopes of the activation function for each of the linear extrapolation ranges; performing the linear extrapolation on the second input value to approximate the activation function at the second input value using an offset value and a slope corresponding to the linear extrapolation range; and determining a second approximated output of the activation function based on a result of the linear extrapolation performed on the second input value.
- 19 . The non-transitory computer readable medium of claim 18 , wherein the input value is shifted by power of two factors within a range of representable values in a fixed point representation of the first approximated output of the activation function.
- 20 . The non-transitory computer readable medium of claim 18 , wherein: the one or more linear extrapolation ranges do not overlap with the quadratic interpolation range.
Description
TECHNICAL FIELD The present application generally relates to an artificial intelligence (AI) accelerator system. More particularly, the present disclosure relates to a system and method for providing an efficient look-up table based function for an AI accelerator. BACKGROUND An artificial intelligence (AI) and/or machine learning (ML) accelerator is a high-performance parallel computation machine that is specifically designed for the efficient processing of AI workloads like neural networks. Given that processing speed and scalability are two key demands from AI applications, AI accelerators (e.g., a neural network processing unit) play a critical role in delivering the near-instantaneous results that make these applications valuable. The above information in the Background section is only for enhancement of understanding of the background of the technology and therefore it should not be construed as admission of existence or relevancy of the prior art. SUMMARY This summary is provided to introduce a selection of features and concepts of embodiments of the present disclosure that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in limiting the scope of the claimed subject matter. One or more of the described features may be combined with one or more other described features to provide a workable device. In one or more embodiments, a method for approximating an activation function includes: receiving an input value of the activation function; determining that the input value is within a range, the range includes a set of non-uniform intervals; determining a selected interval from among the set of non-uniform intervals including the input value; retrieving, by a hardware accelerator, from a look-up table (LUT) associated with a type of the activation function, values of one or more quadratic interpolation parameters associated with the selected interval; performing a quadratic interpolation on the input value to approximate the input value using the values of the one or more quadratic interpolation parameters; and determining a first approximated output of the activation function based on a result of the quadratic interpolation performed on the input value. In one or more embodiments, a Remez minimax algorithm is used to determine a non-uniform distribution of the set of non-uniform intervals. In one or more embodiments, the determining the selected interval from among the set of non-uniform intervals is based on one or more of most significant bits (MSB) of the input value and/or two's complement encoded input value. In one or more embodiments, the LUT associated with the activation function includes the values of the one or more quadratic interpolation parameters for each interval from among the set of non-uniform intervals. In one or more embodiments, the hardware accelerator stores a plurality of LUTs including parameters associated with quadratic interpolation ranges of a plurality of activation functions. In one or more embodiments, the plurality of LUTs are implemented as flip-flops in the hardware accelerator. In one or more embodiments, method further includes: determining, by the hardware accelerator, that the input value is within one or more linear extrapolation ranges, the one or more linear extrapolation ranges do not overlap with the range; determining, by the hardware accelerator, offset values and slopes of the activation function for each of the one or more linear extrapolation ranges; performing, by the hardware accelerator, a linear extrapolation on the input value to approximate the input value using the offset values and the slopes of the activation function for each of the one or more linear extrapolation ranges; and determining, by the hardware accelerator, a second approximated output of the activation function based on a result of the linear extrapolation performed on the input value. In one or more embodiments, the input value is shifted by power of two factors within a range of representable values in a fixed point representation of the second approximated output of the activation function. In one or more embodiments, a system for approximating an activation function includes: a bit shifter configured to receive an input value of the activation function and perform a shift operation on the input value to compute most significant bits (MSBs) of the input value; a first device configured to determine if the input value is within a range including a set of non-uniform intervals, and determine a selected interval from among the set of non-uniform intervals including the input value based on the MSBs of the input value; a second device configured to retrieve from a look-up table (LUT) associated with the activation function, values of one or more quadratic interpolation parameters associated with the selected interval; one or more multiply-accumulate-sc