US-12619859-B2 - Method and apparatus with quantized look up table for neural network operation


Abstract

A neural network operation apparatus may include a receiver configured to receive input data to perform the neural network operation and a quantized Look Up Table (LUT) corresponding to a non-linear function comprised in the neural network operation, and a processor configured to perform scale-up on the input data based on a scale factor, to extract a quantized LUT parameter from the quantized LUT based on scaled-up input data, and to generate an operation result by performing a neural network operation based on the quantized LUT parameter.

Inventors

  • Donghyun Lee
  • Joonsang YU
  • Junki PARK
  • Jungwook CHOI

Assignees

  • SAMSUNG ELECTRONICS CO., LTD.
  • IUCF-HYU (INDUSTRY-UNIVERSITY COOPERATION FOUNDATION HANYANG UNIVERSITY)

Dates

Publication Date
2026-05-05
Application Date
2022-08-12
Priority Date
2021-10-18

Claims (18)

  1. A neural network operation apparatus comprising: a receiver configured to receive input data to perform the neural network operation and a quantized Look Up Table (LUT) corresponding to a non-linear function comprised in the neural network operation; and a processor configured to perform scale-up on the input data based on a scale factor, to extract a quantized LUT parameter from the quantized LUT based on scaled-up input data, and to generate an operation result by performing a neural network operation based on the quantized LUT parameter, wherein the processor is further configured to determine whether to perform the scale-up on the input data by comparing the input data and a threshold value.
  2. The neural network operation apparatus of claim 1, wherein the quantized LUT is generated by quantizing an LUT based on a slope of a linear function that approximates the non-linear function, an offset of the linear function, and a decision boundary based on a range of the input data.
  3. The neural network operation apparatus of claim 2, wherein the processor is further configured to perform scaling on the slope and the offset based on the quantized LUT parameter, in response to the scaled-up input data being within the decision boundary.
  4. The neural network operation apparatus of claim 1, wherein the processor is further configured to obtain a bias and a scale factor of an address corresponding to the input data in the quantized LUT parameter, to generate converted data by converting the scaled-up input data based on the scale factor and the bias, and to generate the operation result based on the converted data.
  5. The neural network operation apparatus of claim 1, wherein the processor is further configured to perform scale back on the operation result.
  6. The neural network operation apparatus of claim 2, wherein the processor is further configured to convert the decision boundary based on a scale parameter and to generate the quantized LUT parameter by scaling an LUT parameter based on the scale parameter and a square of the scale parameter.
  7. The neural network operation apparatus of claim 6, wherein the scale parameter is a power of 2.
  8. A neural network operation apparatus comprising: a receiver configured to receive input data to perform the neural network operation and a Look Up Table (LUT) corresponding to a non-linear function comprised in the neural network operation; and a processor configured to perform scale-up on the input data based on a scale factor, extract an LUT parameter from the LUT based on scaled-up input data, and generate an operation result by performing a neural network operation based on the LUT parameter, wherein the processor is further configured to determine whether to perform the scale-up on the input data by comparing the input data and a threshold value.
  9. The neural network operation apparatus of claim 8, wherein the processor is further configured to perform scaling on a slope of a linear function that approximates the non-linear function and an offset of the linear function based on the LUT parameter, in response to the scaled-up input data being within a decision boundary calculated using a range of input data.
  10. The neural network operation apparatus of claim 8, wherein the processor is further configured to obtain a bias and a scale factor of an address corresponding to the input data in the LUT parameter, to generate converted data by converting the scaled-up input data based on the scale factor and the bias, and to generate the operation result based on the converted data.
  11. The neural network operation apparatus of claim 8, wherein the processor is further configured to perform scale back on the operation result.
  12. A neural network operation method comprising: receiving input data to perform the neural network operation and a quantized Look Up Table (LUT) corresponding to a non-linear function comprised in the neural network operation; performing scale-up on the input data based on a scale factor; extracting a quantized LUT parameter from the quantized LUT based on scaled-up input data; and generating an operation result by performing a neural network operation based on the quantized LUT parameter, wherein the performing of the scale-up comprises determining whether to perform the scale-up on the input data by comparing the input data and a threshold value.
  13. The neural network operation method of claim 12, wherein the quantized LUT is generated by quantizing an LUT based on a slope of a linear function that approximates the non-linear function, an offset of the linear function, and a decision boundary based on a range of input data.
  14. The neural network operation method of claim 13, wherein the performing of the scale-up comprises performing scaling on the slope and the offset based on the quantized LUT parameter, in response to the scaled-up input data being within the decision boundary.
  15. The neural network operation method of claim 12, wherein the generating of the operation result comprises: obtaining a bias and a scale factor of an address corresponding to the input data in the quantized LUT parameter; generating converted data by converting the scaled-up input data based on the scale factor and the bias; and generating the operation result based on the converted data.
  16. The neural network operation method of claim 12, further comprising: performing scale back on the operation result.
  17. The neural network operation method of claim 13, further comprising converting the decision boundary based on a scale parameter and generating the quantized LUT parameter by scaling an LUT parameter based on the scale parameter and a square of the scale parameter.
  18. The neural network operation method of claim 17, wherein the scale parameter is a power of 2.
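
The independent claims describe evaluating a non-linear function through an LUT of per-segment slopes, offsets, and decision boundaries. As an illustration only, a float-domain version of such a piecewise-linear LUT might look like the sketch below; the choice of sigmoid as the non-linear function, the particular boundary values, and the helper names are assumptions for demonstration, not details taken from the patent.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def build_lut(f, boundaries):
    """Fit each segment [b0, b1) with the chord of f: y = slope*x + offset."""
    lut = []
    for b0, b1 in zip(boundaries[:-1], boundaries[1:]):
        slope = (f(b1) - f(b0)) / (b1 - b0)
        offset = f(b0) - slope * b0
        lut.append((b0, b1, slope, offset))
    return lut

# Assumed decision boundaries; outside them sigmoid is treated as saturated.
BOUNDARIES = [-6.0, -2.0, 0.0, 2.0, 6.0]
LUT = build_lut(sigmoid, BOUNDARIES)

def lut_eval(x):
    """Approximate sigmoid(x) by looking up the segment containing x."""
    if x < BOUNDARIES[0]:
        return 0.0
    if x >= BOUNDARIES[-1]:
        return 1.0
    for b0, b1, slope, offset in LUT:
        if b0 <= x < b1:
            return slope * x + offset
```

With only four segments the approximation is coarse in the steep region around zero, which is exactly the accuracy-versus-index-count tension the Background section describes.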

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit under 35 USC § 119(a) of Korean Patent Application No. 10-2021-0138434, filed on Oct. 18, 2021, in the Korean Intellectual Property Office, the entire disclosure of which is incorporated herein by reference for all purposes.

BACKGROUND

1. Field

The following description relates to a neural network operation method and apparatus.

2. Description of Related Art

In a conventional neural network operation, an error related to a non-linear function may occur when a Look Up Table (LUT) is used. For non-linear functions, the range of output values can change rapidly relative to the variation in the input values. Due to this error, the prediction accuracy of the neural network model is reduced. In addition, a large number of LUT indexes is required to approximate a section with a high fluctuation rate, which increases hardware cost. Thus, there is a demand for a neural network operation method that maintains accuracy while keeping the number of LUT indexes manageable.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In one general aspect, there is provided a neural network operation apparatus comprising a receiver configured to receive input data to perform the neural network operation and a quantized Look Up Table (LUT) corresponding to a non-linear function comprised in the neural network operation, and a processor configured to perform scale-up on the input data based on a scale factor, to extract a quantized LUT parameter from the quantized LUT based on scaled-up input data, and to generate an operation result by performing a neural network operation based on the quantized LUT parameter.

The quantized LUT may be generated by quantizing an LUT based on a slope of a linear function that approximates the non-linear function, an offset of the linear function, and a decision boundary based on a range of the input data. The processor may be configured to determine whether to perform scale-up on the input data by comparing the input data and a threshold value. The processor may be configured to perform scaling on the slope and the offset based on the quantized LUT parameter, in response to the scaled-up input data being within the decision boundary. The processor may be configured to obtain a bias and a scale factor of an address corresponding to the input data in the quantized LUT parameter, to generate converted data by converting the scaled-up input data based on the scale factor and the bias, and to generate the operation result based on the converted data. The processor may be configured to perform scale back on the operation result.

The decision boundary may be converted based on a scale parameter and the quantized LUT parameter may be generated by scaling an LUT parameter based on the scale parameter and a square of the scale parameter. The scale parameter may be a power of 2.
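
The summary states that the decision boundary is converted by a scale parameter while an LUT parameter is scaled by the scale parameter and its square. A minimal sketch of why both factors arise (the function, variable names, and example values here are my own assumptions, not taken from the patent): with a power-of-two scale parameter s = 2**k and scaled-up input X = s*x, a linear segment y = slope*x + offset satisfies s*s*y = (s*slope)*X + (s*s)*offset, so boundaries and slopes scale by s while offsets scale by s squared.

```python
def quantize_lut(lut, k):
    """Quantize (b0, b1, slope, offset) segments for integer arithmetic.

    With scale parameter s = 2**k and scaled-up input X = s * x, the identity
        s*s*(slope*x + offset) = (s*slope)*X + (s*s)*offset
    means boundaries and slopes are scaled by s, offsets by s squared.
    """
    s = 1 << k  # power-of-two scale parameter
    return [(round(b0 * s), round(b1 * s),
             round(slope * s), round(offset * s * s))
            for b0, b1, slope, offset in lut]

# One illustrative segment: y = 0.25*x + 0.5 on [-2.0, 2.0), with s = 256.
qlut = quantize_lut([(-2.0, 2.0, 0.25, 0.5)], 8)
```

Choosing s as a power of 2 lets the final scale back be a cheap bit shift rather than a division, which is presumably the motivation behind the power-of-2 limitation in the claims.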
In another general aspect, there is provided a neural network operation apparatus including a receiver configured to receive input data to perform the neural network operation and a Look Up Table (LUT) corresponding to a non-linear function comprised in the neural network operation, and a processor configured to perform scale-up on the input data based on a scale factor, extract an LUT parameter from the LUT based on scaled-up input data, and generate an operation result by performing a neural network operation based on the LUT parameter.

The processor may be configured to perform scaling on a slope of a linear function that approximates the non-linear function and an offset of the linear function based on the LUT parameter, in response to the scaled-up input data being within a decision boundary calculated using a range of input data. The processor may be configured to determine whether to perform scale-up on the input data by comparing the input data and a threshold value. The processor may be configured to obtain a bias and a scale factor of an address corresponding to the input data in the LUT parameter, to generate converted data by converting the scaled-up input data based on the scale factor and the bias, and to generate the operation result based on the converted data. The processor may be configured to perform scale back on the operation result.

In another general aspect, there is provided a neural network operation method including receiving input data to perform the neural network operation and a quantized Look Up Table (LUT) corresponding to a non-linear function comprised in the neural network operation, performing scale-up on the input data based on a scale factor, extracting a quantized LUT parameter from the quantized LUT based on scaled-up input data, and generating an operation result by performing a neural network operation based on the quantized LUT parameter.
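
Putting the pieces together, the method aspect can be sketched end to end: scale up the input, compare it with a threshold to decide whether a lookup is needed, find the segment of a quantized LUT, perform an integer multiply-add, and scale the result back. This is a self-contained illustration under assumed choices (sigmoid as the non-linear function, k = 8, a saturation threshold of 6.0, and specific boundary values), not the patent's implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

K = 8
S = 1 << K                 # assumed power-of-two scale parameter, s = 256
THRESHOLD = 6.0            # assumed saturation threshold for the scale-up test

# Float LUT: chord fits of sigmoid over assumed decision boundaries.
BOUNDS = [-6.0, -2.0, 0.0, 2.0, 6.0]
_flut = []
for _b0, _b1 in zip(BOUNDS[:-1], BOUNDS[1:]):
    _a = (sigmoid(_b1) - sigmoid(_b0)) / (_b1 - _b0)
    _flut.append((_b0, _b1, _a, sigmoid(_b0) - _a * _b0))

# Quantized LUT: boundaries and slopes scaled by s, offsets by s**2.
QLUT = [(round(b0 * S), round(b1 * S), round(a * S), round(c * S * S))
        for b0, b1, a, c in _flut]

def evaluate(x):
    """Scale up, look up the quantized segment, integer multiply-add, scale back."""
    # Compare the input with a threshold to decide whether a lookup is needed;
    # beyond the threshold the (assumed) sigmoid is treated as saturated.
    if x >= THRESHOLD:
        return 1.0
    if x < -THRESHOLD:
        return 0.0
    X = round(x * S)                              # scale-up of the input
    X = min(max(X, QLUT[0][0]), QLUT[-1][1] - 1)  # clamp into the table range
    for qb0, qb1, qa, qc in QLUT:
        if qb0 <= X < qb1:
            return (qa * X + qc) / (S * S)        # integer MAC, then scale back
```

The body of the loop uses only integer arithmetic until the final scale back, which is the point of quantizing the LUT in the first place; in hardware the division by S*S would be a right shift by 2k bits.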