
US-12625842-B2 - Device and method for on-the-fly processing chain reconfiguration in a streaming based neural processing unit

US 12625842 B2

Abstract

A neural network is able to reconfigure hardware accelerators on-the-fly without stopping downstream hardware accelerators. The neural network inserts a reconfiguration tag into the stream of feature data. If the reconfiguration tag matches an identification of a hardware accelerator, a reconfiguration process is initiated. Upstream hardware accelerators are paused while downstream hardware accelerators continue to operate. An epoch controller reconfigures the hardware accelerator via a bus. Normal operation of the neural network then resumes.

Inventors

  • Carmine CAPPETTA
  • Paolo Sergio ZAMBOTTI
  • Thomas Boesch
  • Giuseppe Desoli

Assignees

  • STMICROELECTRONICS INTERNATIONAL N.V.

Dates

Publication Date
2026-05-12
Application Date
2023-03-29

Claims (20)

  1. A method, comprising: passing a stream of feature data to a processing chain of a neural network, the processing chain including a first hardware accelerator; storing first identification data in the first hardware accelerator; inserting, into the stream of feature data, a reconfiguration tag; receiving the reconfiguration tag with the first hardware accelerator; checking if the reconfiguration tag matches the first identification data; and detecting that the reconfiguration tag matches the first identification data, and upon the detecting, initiating a reconfiguration process of the first hardware accelerator.
  2. The method of claim 1, wherein the reconfiguration process includes stopping reception of the stream of feature data at an input of the first hardware accelerator.
  3. The method of claim 2, wherein the reconfiguration process includes, after stopping reception of the stream of feature data, continuing to output from the first hardware accelerator feature data received by the hardware accelerator prior to initiating the reconfiguration process.
  4. The method of claim 3, wherein the reconfiguration process includes raising a flag indicating that all previously received feature data has been output from the first hardware accelerator.
  5. The method of claim 4, wherein the reconfiguration process includes reconfiguring the first hardware accelerator in response to the flag.
  6. The method of claim 5, wherein reconfiguring the first hardware accelerator includes writing reconfiguration data to a configuration register of the first hardware accelerator.
  7. The method of claim 6, comprising writing the reconfiguration data via a bus separate from the processing chain.
  8. The method of claim 7, comprising writing reconfiguration data with an epoch controller of the neural network.
  9. The method of claim 5, comprising: resuming reception of the stream of feature data at the first hardware accelerator after the reconfiguration process; and passing, from the first hardware accelerator, the reconfiguration tag to a second hardware accelerator of the processing chain.
  10. The method of claim 9, comprising: storing second identification data in the second hardware accelerator; receiving the reconfiguration tag in the stream of feature data from the first hardware accelerator with the second hardware accelerator; checking if the reconfiguration tag matches the second identification data; and if the reconfiguration tag matches the second identification data, initiating a reconfiguration process of the second hardware accelerator.
  11. The method of claim 10, wherein the reconfiguration process includes: stopping reception of the stream of feature data at an input of the first hardware accelerator; after stopping reception of the stream of feature data, continuing to output from the second hardware accelerator feature data received by the second hardware accelerator prior to initiating the reconfiguration process of the second hardware accelerator; raising a second flag indicating that all previously received feature data has been output from the second hardware accelerator; and reconfiguring the second hardware accelerator in response to the second flag.
  12. The method of claim 1, comprising storing the first identification data in a read-only register of the first hardware accelerator.
  13. The method of claim 1, comprising storing the first identification data in a rewritable register of the first hardware accelerator.
  14. A method, comprising: configuring a processing chain of a neural network, the processing chain including a first hardware accelerator, a second hardware accelerator downstream from the first hardware accelerator, and a third hardware accelerator downstream from the second hardware accelerator; passing a stream of feature data to the processing chain; inserting a reconfiguration tag into the stream of feature data; receiving the reconfiguration tag at the second hardware accelerator; comparing the reconfiguration tag to identification data stored in the second hardware accelerator; and detecting that the reconfiguration tag matches the identification data, and upon the detecting: stopping the first hardware accelerator from passing the stream of feature data to the second hardware accelerator; continuing to process the stream of feature data with the third hardware accelerator after stopping the first hardware accelerator; and reconfiguring the second hardware accelerator after stopping the first hardware accelerator.
  15. The method of claim 14, comprising resuming passing the stream of feature data from the first hardware accelerator to the second hardware accelerator after reconfiguring the second hardware accelerator.
  16. The method of claim 15, wherein inserting the reconfiguration tag into the stream of feature data includes inserting the reconfiguration tag directly between an end of frame signal and a start of frame signal.
  17. A device, comprising a neural network, the neural network including: a stream switch; a plurality of hardware accelerators configured as a processing chain of the neural network in conjunction with the stream switch; and a stream engine configured to pass a stream of feature data to the plurality of hardware accelerators and to insert a reconfiguration tag into the stream of feature data, wherein at least one of the hardware accelerators includes a register configured to store identification data, to compare the reconfiguration tag to the identification data, and to initiate a reconfiguration process of the at least one hardware accelerator if the reconfiguration tag matches the identification data.
  18. The device of claim 17, wherein the neural network includes an epoch controller configured to reconfigure the at least one register in response to the reconfiguration tag matching the identification data.
  19. The device of claim 17, wherein the at least one hardware accelerator is a convolutional accelerator.
  20. The device of claim 17, wherein the at least one hardware accelerator is a pooling unit.
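
The claims above recite a step-by-step reconfiguration flow: store identification data in an accelerator, insert a tag into the feature stream, compare the tag to the identification data, stop input, drain previously received data, raise a flag, write new configuration over a separate bus with the epoch controller, and resume. Purely as a rough illustration, the following Python sketch models that per-accelerator behavior; the class name, stream-item format, and the `reconfigure` method standing in for the epoch controller's bus writes are hypothetical assumptions, not details taken from the patent.

```python
# Minimal behavioral sketch (not the patented hardware) of the per-accelerator
# reconfiguration flow recited in claims 1-13. The class, method, and marker
# names below are illustrative assumptions, not taken from the patent.
from collections import deque

RECONFIG = "RECONFIG"  # assumed marker distinguishing a reconfiguration tag from feature data


class Accelerator:
    def __init__(self, ident, pipeline_depth=2):
        self.ident = ident            # stored identification data (claim 1)
        self.config = {}              # configuration registers (claim 6)
        self.pipe = deque()           # feature data received but not yet output
        self.depth = pipeline_depth
        self.accepting = True         # input gating (claim 2)
        self.drained = False          # "all previously received data output" flag (claim 4)

    def receive(self, item):
        """Consume one stream item and return the items forwarded downstream."""
        kind, value = item
        if kind == RECONFIG and value == self.ident:
            # The tag matches the stored identification data: start reconfiguration.
            self.accepting = False        # stop receiving feature data (claim 2)
            flushed = list(self.pipe)     # keep outputting earlier data (claim 3)
            self.pipe.clear()
            self.drained = True           # raise the drained flag (claim 4)
            return flushed
        if not self.accepting:
            return []                     # backpressure: upstream stays paused
        self.pipe.append(item)
        # Model pipeline latency: items are emitted only once the pipe is full.
        return [self.pipe.popleft()] if len(self.pipe) > self.depth else []

    def reconfigure(self, new_config):
        """Stand-in for the epoch controller writing configuration registers over
        a bus separate from the processing chain (claims 5-8); reception then
        resumes (claim 9)."""
        assert self.drained, "reconfigure only after the drained flag is raised"
        self.config.update(new_config)
        self.drained = False
        self.accepting = True
```

Forwarding of the reconfiguration tag to the next accelerator after resumption (claim 9) is omitted from the sketch for brevity.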

Description

BACKGROUND

Technical Field

The present disclosure generally relates to neural networks, and more particularly to the configuration of neural networks.

Description of the Related Art

Deep learning algorithms achieve very high performance in numerous applications involving recognition, identification, and/or classification tasks; however, such advancements may come at the price of significant processing power. Their adoption can therefore be hindered by a lack of low-cost and energy-efficient solutions. Accordingly, severe performance specifications may coexist with tight constraints on power and energy consumption when deploying deep learning applications on embedded devices. Convolutional Neural Networks (CNNs) are a type of Deep Neural Network (DNN). Their architecture is characterized by convolutional layers and fully connected layers. The convolutional layers carry out convolution operations between a layer's inputs and convolutional kernels, non-linear activation functions (such as rectifiers), and max pooling operations, and are usually the most demanding in terms of computational effort. Furthermore, reconfiguring components of a neural network can be costly in terms of time and resources.

BRIEF SUMMARY

Embodiments of the present disclosure provide a method and device that enable on-the-fly reconfiguration of hardware accelerators of a neural network. In one embodiment, while a processing chain of the neural network is operating, a reconfiguration tag is inserted into the stream of feature data being processed. Each of the hardware accelerators of the processing chain receives the reconfiguration tag in turn. If the reconfiguration tag matches an identifier stored in a hardware accelerator, then a reconfiguration process is initiated. Processing of feature data upstream from the matched hardware accelerator is paused while hardware accelerators downstream continue to process feature data already in the chain. An epoch controller then reconfigures the matched hardware accelerator via a bus separate from the chain. Flow of the feature data is then resumed. The result is that one or more of the hardware accelerators can be reconfigured on-the-fly without entirely emptying the processing chain of feature data.

In one embodiment, a method includes passing a stream of feature data to a processing chain of a neural network. The processing chain includes a first hardware accelerator. The method includes storing first identification data in the first hardware accelerator, inserting, into the stream of feature data, a reconfiguration tag, and receiving the reconfiguration tag with the first hardware accelerator. The method includes checking if the reconfiguration tag matches the first identification data and, if the reconfiguration tag matches the first identification data, initiating a reconfiguration process of the first hardware accelerator.

In one embodiment, a method includes configuring a processing chain of a neural network. The processing chain includes a first hardware accelerator, a second hardware accelerator downstream from the first hardware accelerator, and a third hardware accelerator downstream from the second hardware accelerator. The method includes passing a stream of feature data to the processing chain, inserting a reconfiguration tag into the stream of feature data, receiving the reconfiguration tag at the second hardware accelerator, and comparing the reconfiguration tag to identification data stored in the second hardware accelerator.
The method includes, if the reconfiguration tag matches the identification data, stopping the first hardware accelerator from passing the stream of feature data to the second hardware accelerator, continuing to process the stream of feature data with the third hardware accelerator after stopping the first hardware accelerator, and reconfiguring the second hardware accelerator after stopping the first hardware accelerator.

In one embodiment, a device includes a neural network. The neural network includes a stream switch, a plurality of hardware accelerators configured as a processing chain of the neural network in conjunction with the stream switch, and a stream engine. The stream engine is configured to pass a stream of feature data to the plurality of hardware accelerators and to insert a reconfiguration tag into the stream of feature data. At least one of the hardware accelerators includes a register configured to store identification data, to compare the reconfiguration tag to the identification data, and to initiate a reconfiguration process of the at least one hardware accelerator if the reconfiguration tag matches the identification data.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 is a block diagram of an electronic device, according to some embodiments.

FIG. 2A is a block diagram of a neural network illustrating a process chain, according to some embodiments.

FIG. 2B is a representation of a feature tensor, according to some embodiments.
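
The summary above and claim 16 describe inserting the reconfiguration tag directly between an end-of-frame signal and the next start-of-frame signal. The following is a small, hypothetical Python sketch of that stream-engine behavior; the marker names (SOF, EOF, RECONFIG) and the frame layout are assumptions, not the patented stream format.

```python
# Hypothetical sketch of the stream-engine side: placing a reconfiguration tag
# directly between an end-of-frame signal and the next start-of-frame signal,
# as described in claim 16. Marker names and frame layout are assumptions.
SOF, EOF, DATA, RECONFIG = "SOF", "EOF", "DATA", "RECONFIG"

def frame(frame_id, values):
    """Wrap one frame of feature values with start/end-of-frame signals."""
    yield (SOF, frame_id)
    for v in values:
        yield (DATA, v)
    yield (EOF, frame_id)

def stream_with_reconfig(frames, tag_after_frame, target_id):
    """Emit frames back to back, inserting a reconfiguration tag addressed to
    the accelerator whose identification data equals target_id, right after
    the end-of-frame of tag_after_frame and before the next start-of-frame."""
    for frame_id, values in frames:
        yield from frame(frame_id, values)
        if frame_id == tag_after_frame:
            yield (RECONFIG, target_id)

# Example: request reconfiguration of the accelerator with ID 2 between
# frame 0 and frame 1 of the feature stream.
stream = list(stream_with_reconfig([(0, [1, 2, 3]), (1, [4, 5, 6])],
                                   tag_after_frame=0, target_id=2))
```

A hardware accelerator whose stored identification data equals the tag's target (2 in this example) would then begin its drain-and-reconfigure sequence at that frame boundary, while accelerators further downstream continue processing feature data already in the chain.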