US-12619870-B2 - Programmable non-linear activation engine for neural network acceleration

US 12619870 B2

Abstract

A programmable, non-linear (PNL) activation engine for a neural network is capable of receiving input data within a circuit. In response to receiving an instruction corresponding to the input data, the PNL activation engine is capable of selecting a first non-linear activation function from a plurality of non-linear activation functions by decoding the instruction. The PNL activation engine is capable of fetching a first set of coefficients corresponding to the first non-linear activation function from a memory. The PNL activation engine is capable of performing a polynomial approximation of the first non-linear activation function on the input data using the first set of coefficients. The PNL activation engine is capable of outputting a result from the polynomial approximation of the first non-linear activation function.
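The flow described in the abstract — decode an instruction, select an activation function, fetch its coefficients from memory, then evaluate a polynomial approximation — can be sketched in software. All names, opcodes, and coefficient values below are hypothetical illustrations, not taken from the patent.

```python
# Hypothetical sketch of the PNL activation engine flow from the abstract.
# Opcodes, function names, and coefficient values are illustrative only.

# Coefficients table: one entry per supported activation function,
# listed highest degree first.
COEFF_TABLE = {
    "tanh_approx": [-1/3, 0.0, 1.0, 0.0],  # tanh(x) ~ x - x^3/3 near 0
    "identity":    [1.0, 0.0],
}

def decode_instruction(instr):
    """Decode an instruction opcode into an activation-function name."""
    return {0: "tanh_approx", 1: "identity"}[instr]

def pnl_activation(instr, x):
    func = decode_instruction(instr)   # select the activation function
    coeffs = COEFF_TABLE[func]         # fetch its coefficients from memory
    # Evaluate the polynomial approximation using Horner's method.
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result                      # output the approximation result
```

Horner's method is used here only as one common way to evaluate the polynomial; the patent does not mandate a particular evaluation scheme.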

Inventors

  • Rajeev Patwari
  • Chaithanya Dudha
  • Jorn Tuyls
  • Kaushik Barman
  • Aaron Ng

Assignees

  • XILINX, INC.

Dates

Publication Date
2026-05-05
Application Date
2022-03-18

Claims (14)

  1. A method, comprising: including a programmable non-linear (PNL) activation engine circuit within an integrated circuit implementing a neural network to implement a plurality of different activation function nodes of the neural network; in response to receiving an electronic instruction, selecting, using a processing element configuration circuit of the PNL activation engine circuit, a selected non-linear activation function from a plurality of non-linear activation functions by decoding the electronic instruction; at runtime of the PNL activation engine circuit, fetching, by the processing element configuration circuit, different subsets of coefficients corresponding to the selected non-linear activation function from a memory based on the electronic instruction as decoded, wherein the memory stores a table specifying coefficients for each of a plurality of different non-linear activation functions and a plurality of subsets of coefficients for the selected non-linear activation function, and wherein each subset of coefficients corresponds to a different range of input data to be processed at runtime; programming one or more of a plurality of processing circuits of the PNL activation engine circuit with the different subsets of coefficients to process different ranges of input data at runtime; performing, by the one or more of the plurality of processing circuits, polynomial approximations of the selected non-linear activation function on the different ranges of input data provided to the one or more of the plurality of processing circuits using the different subsets of coefficients; and outputting results from the polynomial approximations of the selected non-linear activation function from each of the one or more of the plurality of processing circuits to other circuits of the neural network in the integrated circuit.
  2. The method of claim 1, further comprising: for a subsequent input data received by the PNL activation engine circuit, selecting a different non-linear activation function from the plurality of non-linear activation functions; fetching a set of coefficients corresponding to the different non-linear activation function from the memory; performing a polynomial approximation of the different non-linear activation function on the subsequent input data using the set of coefficients corresponding to the different activation function; and outputting a result from the polynomial approximation of the different non-linear activation function.
  3. The method of claim 1, wherein the PNL activation engine circuit implements polynomial approximations for different ones of the plurality of non-linear activation functions for different input data during runtime by fetching different sets of coefficients from the memory for the different ones of the plurality of non-linear activation functions.
  4. The method of claim 1, further comprising: performing at least one of, pre-scaling the input data based on whether the polynomial approximation of the selected non-linear activation function requires pre-scaling; or post-scaling a result based on whether the polynomial approximation of the selected non-linear activation function requires post-scaling.
  5. The method of claim 1, wherein different ones of the plurality of processing circuits are configured to process different ranges of input data, and wherein each different one of the plurality of processing circuits is provided with a particular subset of coefficients corresponding to the range of input data handled by the processing circuit.
  6. The method of claim 1, wherein each of the plurality of processing circuits is configured to compute a result of the selected activation function for each subset of coefficients for each range of input data, compute the range of the input data, and select the result of the selected activation function corresponding to the range of the input data.
  7. The method of claim 1, wherein the electronic instruction specifies a range of each item of input data provided to each of the plurality of processing circuits and each processing circuit is provided with a particular subset of coefficients for the range of the item of input data provided to that processing circuit based on the electronic instruction.
  8. The method of claim 1, wherein different subsets of coefficients for the plurality of different ranges are programmable at runtime.
  9. A system, comprising: a plurality of processing circuits; a coefficients table stored in a memory, wherein the coefficients table stores different subsets of coefficients for each non-linear activation function of a plurality of non-linear activation functions, wherein each subset of coefficients for a non-linear activation function corresponds to a different range of input data items to be processed at runtime; an instruction decode table stored in the memory, wherein the instruction decode table stores a pointer to each of the plurality of non-linear activation functions in the coefficients table; and a processing element configuration circuit configured to decode a received electronic instruction to select a selected non-linear activation function from the plurality of non-linear activation functions using the instruction decode table, fetch different subsets of coefficients for the selected non-linear activation function from the coefficients table, and provide the different subsets of coefficients fetched to one or more of the plurality of processing circuits; wherein the one or more processing circuits compute a result using a polynomial approximation of the selected non-linear activation function for one or more input data items using the subset of coefficients fetched from the coefficients table for the input data items corresponding to the different ranges of input data.
  10. The system of claim 9, wherein, for the selected non-linear activation function, the one or more processing circuits are configured to perform at least one of pre-scaling the one or more input data items or post-scaling the result based on whether the polynomial approximation of the selected non-linear activation function requires pre-scaling or post-scaling, respectively.
  11. The system of claim 9, wherein: for a subsequent input data item received, the processing element configuration circuit selects a different non-linear activation function from the plurality of non-linear activation functions based on a further electronic instruction and fetches a set of coefficients corresponding to the different non-linear activation function from the memory; and one or more of the plurality of processing circuits performs a polynomial approximation of the different non-linear activation function on the subsequent input data using the set of coefficients corresponding to the different activation function and outputs one or more results from the polynomial approximation of the different non-linear activation function.
  12. The system of claim 9, wherein different ones of the plurality of processing circuits are configured to process different ranges of input data items and wherein each different one of the plurality of processing circuits is provided with a particular subset of coefficients corresponding to the range of input data item handled by the processing circuit.
  13. The system of claim 9, wherein each of the plurality of processing circuits is configured to compute a result of the selected activation function for each subset of the coefficients for each range of a selected data input item, compute the range of the selected data input item, and select the result of the selected activation function corresponding to the range of the selected data input item.
  14. The system of claim 9, wherein the electronic instruction specifies a range of each input data item provided to each of the plurality of processing circuits and each processing circuit is provided with a particular subset of coefficients for the range of the input data item provided to that processing circuit based on the electronic instruction.
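The range-based coefficient subsets recited in claims 5 through 7 and 12 through 14 amount to a piecewise polynomial: the engine computes the range an input falls into and evaluates with the coefficient subset for that range. A minimal sketch, with hypothetical ranges and coefficients (here approximating a hard-tanh-style saturation, purely for illustration):

```python
# Illustrative piecewise approximation using per-range coefficient subsets.
# Range boundaries and coefficients are invented, not from the patent.

# Each entry: (lower bound, upper bound, coefficients highest-degree first).
SUBSETS = [
    (float("-inf"), -1.0, [0.0, -1.0]),  # saturate low:  f(x) = -1
    (-1.0,           1.0, [1.0,  0.0]),  # linear middle: f(x) = x
    (1.0,  float("inf"), [0.0,  1.0]),   # saturate high: f(x) = 1
]

def select_subset(x):
    """Compute the range of the input and return that range's coefficients."""
    for lo, hi, coeffs in SUBSETS:
        if lo <= x < hi:
            return coeffs
    return SUBSETS[-1][2]

def piecewise_eval(x):
    coeffs = select_subset(x)
    # Horner evaluation of the selected subset's polynomial.
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result
```

Claim 6 describes a different selection order — compute the result for every subset, then pick the one matching the input's range — which a hardware implementation may prefer because all candidate results can be computed in parallel.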

Description

TECHNICAL FIELD

This disclosure relates to integrated circuits (ICs) and, more particularly, to a programmable non-linear activation engine for neural network acceleration.

BACKGROUND

Deep learning refers to a subset of machine learning. To accomplish a given task, deep learning utilizes neural networks, also called "artificial neural networks" or "simulated neural networks." The structure of a neural network mimics the way that biological neurons of human brains communicate with one another. A neural network includes layers of interconnected nodes that are operable to categorize input data into categories of interest. Natural Language Processing (NLP) is an area of significant interest within deep learning. In general, NLP refers to a branch of computer science that endows computers with the ability to understand text and spoken words. NLP combines computational linguistics (e.g., rule-based modeling of human language) with statistical, machine learning, and deep learning models. Through NLP, a computer is able to determine meaning, intent, and/or sentiment from text or voice data. Examples of neural networks adapted to perform NLP include Transformer and BERT. Similar to Convolutional Neural Networks (CNNs), NLP networks are often cascaded and include several linear and non-linear functions. In general, a "linear" layer computes multiplication or correlation of input data with model-dependent parameters, and subsequently adds a "bias" to the output. A "non-linear" layer enables the network to learn complex, non-linear features pertaining to the specific layer, thereby enabling complex feature detection in subsequent layers. The non-linear layers facilitate learning of parameters during training and higher accuracy during inference.

SUMMARY

In one or more example implementations, a method includes receiving input data within a circuit.
The method includes, in response to receiving an instruction corresponding to the input data, selecting, using the circuit, a first non-linear activation function from a plurality of non-linear activation functions by decoding the instruction. The method includes fetching a first set of coefficients corresponding to the first non-linear activation function from a memory. The method includes performing, using the circuit, a polynomial approximation of the first non-linear activation function on the input data using the first set of coefficients. The method includes outputting a result from the polynomial approximation of the first non-linear activation function.

In one or more example implementations, a system includes one or more processing circuits. The system includes a coefficients table stored in a memory. The coefficients table stores a set of coefficients for each non-linear activation function of a plurality of non-linear activation functions. The system includes an instruction decode table stored in the memory. The instruction decode table stores a pointer (e.g., a base address) to each of the plurality of non-linear activation functions in the coefficients table. The system includes a processing element configuration circuit configured to decode a received instruction to determine a non-linear activation function from the plurality of non-linear activation functions, fetch the set of coefficients for the determined non-linear activation function, and provide the set of coefficients fetched to the one or more processing circuits. The one or more processing circuits compute a result using a polynomial approximation of the determined non-linear activation function for one or more input data items using the set of coefficients fetched from the coefficients table.

In one or more example implementations, a system includes a processor configured to initiate operations. The operations include receiving input data. The operations include, in response to receiving an instruction corresponding to the input data, selecting a first non-linear activation function from a plurality of non-linear activation functions by decoding the instruction. The operations include fetching a first set of coefficients corresponding to the first non-linear activation function from a memory. The operations include performing a polynomial approximation of the first non-linear activation function on the input data using the first set of coefficients. The operations include outputting a result from the polynomial approximation of the first non-linear activation function.

This Summary section is provided merely to introduce certain concepts and not to identify any key or essential features of the claimed subject matter. Other features of the inventive arrangements will be apparent from the accompanying drawings and from the following detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

The inventive arrangements are illustrated by way of example in the accompanying drawings. The drawings, however, should not be construed to be limiting of the inventive arrangements to only the particular implemen
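The summary's coefficients table and instruction decode table — the latter storing a pointer (e.g., a base address) per function — can be modeled as a flat coefficient memory plus a lookup table. The addresses, counts, and values below are invented solely to show the pointer-then-fetch pattern; they do not appear in the patent.

```python
# Hypothetical memory model for the instruction decode table and
# coefficients table described in the summary.

# Flat coefficients memory holding all functions' coefficients back to back.
COEFF_MEMORY = [
    1.0, 0.0,             # function 0: identity, 2 coefficients
    -1/3, 0.0, 1.0, 0.0,  # function 1: tanh-like cubic, 4 coefficients
]

# Instruction decode table: opcode -> (base address, coefficient count).
DECODE_TABLE = {
    0: (0, 2),
    1: (2, 4),
}

def fetch_coefficients(opcode):
    """Decode the opcode, then fetch that function's coefficients."""
    base, count = DECODE_TABLE[opcode]      # pointer lookup via decode table
    return COEFF_MEMORY[base:base + count]  # read from coefficients table
```

Keeping the decode table separate from the coefficient storage, as the claims recite, lets new functions or retuned coefficients be loaded at runtime without changing the decode logic.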