EP-4736071-A1 - NEURAL NETWORK PROCESSOR AND NEURAL NETWORK PROCESSING METHOD
Abstract
A neural network processor is designed to process sequential windows (W1, W2, ... Wn) of a time-dependent signal. Each window contains multiple samples (ns) of the signal over a time-interval (T), with each window shifted relative to the previous one by a time-step (ΔT) smaller than T. This time-step corresponds to a base shift amount (h) defined by h = ⌊(ΔT/T)·ns⌋. The processor executes a neural network with multiple layers (L), each containing a plurality of neurons (N(...,Y,...,L)) addressable by a time-domain related index (Y). It performs operations for each sample window, including computing a differential result signal for a neuron by referencing a neuron with a reference index value (Y') determined by the base shift amount and an accumulated up/down-sampling factor S.
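As an informal illustration (not part of the claims), the relation between the time-step and the base shift amount stated in the abstract, h = ⌊(ΔT/T)·ns⌋, can be sketched as follows; the function name and the example numbers are hypothetical:

```python
import math

def base_shift(dT: float, T: float, ns: int) -> int:
    """Base shift amount h for windows of ns samples over a time-interval T,
    each window shifted by a time-step dT (with dT < T)."""
    assert 0 < dT < T, "the time-step must be smaller than the time-interval"
    # h = floor((dT / T) * ns), as defined in the abstract
    return math.floor((dT / T) * ns)

# Example: 256-sample windows over a 1 s interval, shifted by 62.5 ms.
print(base_shift(0.0625, 1.0, 256))  # 16
```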
Inventors
- POURTAHERIAN, Arash
- Waeijen, Luc Johannes Wilhelmus
- PIRES DOS REIS MOREIRA, ORLANDO MIGUEL
Assignees
- Snap Inc.
Dates
- Publication Date
- 20260506
- Application Date
- 20240627
Claims (20)
- 1. A neural network processor for processing subsequent windows (W1, W2, ... Wn) of a time-dependent signal, each window comprising a plurality (ns) of samples of the time-dependent signal for subsequent points in time in a time-interval (T), and each window of samples being shifted in time relative to its immediately preceding window with a time-step (ΔT), which time-step is smaller than the time-interval (T), the time-step corresponding to a base shift amount (h) defined by h = ⌊(ΔT/T)·ns⌋, the neural network processor being configured to execute a neural network with a plurality of neural network layers (L), each with a respective plurality of neurons (N(...,Y,...,L)) addressable with at least a time-domain related index (Y), the neural network processor being configured to perform respective operations for respective windows of samples, a respective operation comprising: computing respective result signal values (R(...,Y,...,L)) of neurons (N(...,Y,...,L)) of a layer L; determining differential result signal values (D(...,Y,...,L)) of the neurons (N(...,Y,...,L)) of the layer (L); and providing respective output signal values (O(...,Y,...,L)) for respective ones of the neurons (N(...,Y,...,L)) of the layer (L) of which the differential result signal values (D(...,Y,...,L)) exceed a threshold value; wherein the differential result signal value (D(...,Y,...,L)) of a particular neuron (N(...,Y,...,L)) is computed dependent on a value of a reference neuron in the same layer with a reference time-domain related index (Y') as specified below, the differential result signal value (D(...,Y,...,L)) being equal to the difference between the result signal value (R(...,Y,...,L)) of the particular neuron (N(...,Y,...,L)) obtained in the respective operation and the result signal value (R(...,Y',...,L)) obtained in the respective operation for the preceding window of samples with a reference neuron (N(...,Y',...,L)) in the layer having a reference time-domain related index Y' equal to the time-domain related index incremented with a shift amount equal to the base shift amount h of the window times the accumulated up/down-sampling factor S at the output of the layer L, which is the ratio sY/ns, wherein sY is the size of the layer L in the direction of the time-domain related axis Y.
- 2. The neural network processor according to claim 1, wherein the time-dependent signal is a video signal or an audio signal.
- 3. The neural network processor according to claim 1, wherein the reference time-domain related index Y' is computed from the time-domain related index Y of the particular neuron as: Y' = (Y + ⌊h·S⌋) mod sY, where mod is the modulo operation, h is the base shift amount of the window, being h = ⌊(ΔT/T)·ns⌋, and S is the accumulated up/down-sampling factor at the output of the layer L, being the ratio sY/ns, wherein sY is the size of the layer L in the direction of the time-domain related axis Y.
- 4. The neural network processor according to claim 2, wherein the reference time-domain related index Y' is computed from the time-domain related index Y of the particular neuron as: Y' = (Y + ⌊h·S⌋) mod sY, where mod is the modulo operation, h is the base shift amount of the window, being h = ⌊(ΔT/T)·ns⌋, and S is the accumulated up/down-sampling factor at the output of the layer L, being the ratio sY/ns, wherein sY is the size of the layer L in the direction of the time-domain related axis Y.
- 5. The neural network processor according to claim 1, wherein each window (W1, W2, ... Wn) is a spectrogram of an input signal.
- 6. The neural network processor according to claim 2, wherein each window (W1, W2, ... Wn) is a spectrogram of an input signal.
- 7. The neural network processor according to claim 1, wherein the base shift amount is an integer power of 2.
- 8. The neural network processor according to claim 2, wherein the base shift amount is an integer power of 2.
- 9. The neural network processor according to claim 7, wherein the base shift amount is the integer power of 2 closest to the square root of the size of the first neural network layer in the direction of the time domain related index.
- 10. The neural network processor according to claim 8, wherein the base shift amount is the integer power of 2 closest to the square root of the size of the first neural network layer in the direction of the time domain related index.
- 11. The neural network processor according to claim 1, wherein the duration of the time-interval (T) is at least five times the time-step (ΔT).
- 12. The neural network processor according to claim 2, wherein the duration of the time-interval (T) is at least five times the time-step (ΔT).
- 13. A data processing system, comprising a Short-Time Fourier Transform (STFT) unit (2), a windowing unit (3) and a neural network processor (1) according to claim 1, wherein the STFT unit (2) receives at its input an input signal M(t) which is a function of time and provides at its output a spectrogram (S(x,y)) as a function of time, and wherein the windowing unit (3) repeatedly selects a window to be processed by the neural network processor (1), each window comprising a plurality of spectrograms obtained in a time-interval (T) and being shifted in time relative to its immediately preceding window with a time-step (ΔT), which time-step is smaller than the time-interval (T).
- 14. A data processing system, comprising a Short-Time Fourier Transform (STFT) unit (2), a windowing unit (3) and a neural network processor (1) according to claim 2, wherein the STFT unit (2) receives at its input an input signal M(t) which is a function of time and provides at its output a spectrogram (S(x,y)) as a function of time, and wherein the windowing unit (3) repeatedly selects a window to be processed by the neural network processor (1), each window comprising a plurality of spectrograms obtained in a time-interval (T) and being shifted in time relative to its immediately preceding window with a time-step (ΔT), which time-step is smaller than the time-interval (T).
- 15. A neural network processing method for processing subsequent windows (W1, W2, ... Wn) of a time-dependent signal, each window comprising a plurality (ns) of samples of the time-dependent signal for subsequent points in time in a time-interval (T), and each window of samples being shifted in time relative to its immediately preceding window with a time-step (ΔT), which time-step is smaller than the time-interval (T), the time-step corresponding to a base shift amount (h) defined by h = ⌊(ΔT/T)·ns⌋, the neural network processing method comprising executing a neural network with a plurality of neural network layers (L), each with a respective plurality of neurons (N(...,Y,...,L)) addressable with at least a time-domain related index (Y), the neural network processor performing respective operations for respective windows of samples, a respective operation comprising: computing respective result signal values (R(...,Y,...,L)) of neurons (N(...,Y,...,L)) of a layer L; determining differential result signal values (D(...,Y,...,L)) of the neurons (N(...,Y,...,L)) of the layer (L); and providing respective output signal values (O(...,Y,...,L)) for respective ones of the neurons (N(...,Y,...,L)) of the layer (L) of which the differential result signal values (D(...,Y,...,L)) exceed a threshold value; wherein the differential result signal value (D(...,Y,...,L)) of a particular neuron (N(...,Y,...,L)) is computed dependent on a value of a reference neuron in the same layer with a reference time-domain related index (Y') as specified below, the differential result signal value (D(...,Y,...,L)) being equal to the difference between the result signal value (R(...,Y,...,L)) of the particular neuron (N(...,Y,...,L)) obtained in the respective operation and the result signal value (R(...,Y',...,L)) obtained in the respective operation for the preceding window of samples with a reference neuron (N(...,Y',...,L)) in the layer having a reference time-domain related index Y' equal to the time-domain related index incremented with a shift amount equal to the base shift amount h of the window times the accumulated up/down-sampling factor S at the output of the layer L, which is the ratio sY/ns, wherein sY is the size of the layer L in the direction of the time-domain related axis Y.
- 16. The neural network processing method according to claim 15, wherein the time-dependent signal is a video signal or an audio signal.
- 17. The neural network processing method according to claim 15, further comprising receiving an input signal (M(t)) which is a function of time, generating a spectrogram (S(x,y)) of the input signal as a function of time, repeatedly selecting a window comprising a plurality of spectrograms obtained in a time-interval (T) and being shifted in time relative to its immediately preceding window with a time-step (ΔT), which time-step is smaller than the time-interval (T), and subsequently processing the selected windows with the neural network processing method.
- 18. The neural network processing method according to claim 16, further comprising receiving an input signal (M(t)) which is a function of time, generating a spectrogram (S(x,y)) of the input signal as a function of time, repeatedly selecting a window comprising a plurality of spectrograms obtained in a time-interval (T) and being shifted in time relative to its immediately preceding window with a time-step (ΔT), which time-step is smaller than the time-interval (T), and subsequently processing the selected windows with the neural network processing method.
- 19. A tangible computer-readable medium having computer-executable instructions stored thereon that, when executed by a processor, perform a neural network processing method for processing subsequent windows (W1, W2, ... Wn) of a time-dependent signal, each window comprising a plurality (ns) of samples of the time-dependent signal for subsequent points in time in a time-interval (T), and each window of samples being shifted in time relative to its immediately preceding window with a time-step (ΔT), which time-step is smaller than the time-interval (T), the time-step corresponding to a base shift amount (h) defined by h = ⌊(ΔT/T)·ns⌋, the neural network processing method comprising executing a neural network with a plurality of neural network layers (L), each with a respective plurality of neurons (N(...,Y,...,L)) addressable with at least a time-domain related index (Y), the neural network processor performing respective operations for respective windows of samples, a respective operation comprising: computing respective result signal values (R(...,Y,...,L)) of neurons (N(...,Y,...,L)) of a layer L; determining differential result signal values (D(...,Y,...,L)) of the neurons (N(...,Y,...,L)) of the layer (L); and providing respective output signal values (O(...,Y,...,L)) for respective ones of the neurons (N(...,Y,...,L)) of the layer (L) of which the differential result signal values (D(...,Y,...,L)) exceed a threshold value; wherein the differential result signal value (D(...,Y,...,L)) of a particular neuron (N(...,Y,...,L)) is computed dependent on a value of a reference neuron in the same layer with a reference time-domain related index (Y') as specified below, the differential result signal value (D(...,Y,...,L)) being equal to the difference between the result signal value (R(...,Y,...,L)) of the particular neuron (N(...,Y,...,L)) obtained in the respective operation and the result signal value (R(...,Y',...,L)) obtained in the respective operation for the preceding window of samples with a reference neuron (N(...,Y',...,L)) in the layer having a reference time-domain related index Y' equal to the time-domain related index incremented with a shift amount equal to the base shift amount h of the window times the accumulated up/down-sampling factor S at the output of the layer L, which is the ratio sY/ns, wherein sY is the size of the layer L in the direction of the time-domain related axis Y.
- 20. The tangible computer-readable medium according to claim 19, wherein the time-dependent signal to be processed is a video signal or an audio signal.
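The reference-index and thresholded differencing scheme of claims 1 and 3 above can be sketched informally as follows. This is a hedged illustration only: the function names, the list-based layer representation, and the use of a magnitude comparison for the threshold are assumptions, not part of the claims.

```python
import math

def reference_index(Y: int, h: int, sY: int, ns: int) -> int:
    """Reference time-domain index per claim 3: Y' = (Y + floor(h*S)) mod sY,
    with S = sY/ns the accumulated up/down-sampling factor of the layer."""
    S = sY / ns
    return (Y + math.floor(h * S)) % sY

def differential_outputs(R_curr, R_prev, h, ns, threshold):
    """Differential result D(Y) = R_curr(Y) - R_prev(Y'); emit an output only
    where D exceeds the threshold (here compared by magnitude, an assumption).
    R_curr/R_prev are the layer's result values for the current and preceding
    window, represented as lists of length sY."""
    sY = len(R_curr)
    out = {}
    for Y in range(sY):
        Yp = reference_index(Y, h, sY, ns)
        D = R_curr[Y] - R_prev[Yp]
        if abs(D) > threshold:
            out[Y] = D
    return out
```

With h = 1, ns = sY = 4 (so S = 1) and threshold 2, only neurons whose result changed by more than 2 relative to the shifted position in the previous window produce an output, which is the sparsity the differential scheme exploits.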
Description
NEURAL NETWORK PROCESSOR AND NEURAL NETWORK PROCESSING METHOD

CLAIM OF PRIORITY

This application claims the benefit of priority to European Patent Application Serial No. 23306052.4, filed on June 28, 2023, which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The present disclosure pertains to neural network processors. The present disclosure further pertains to a neural network processing method.

BACKGROUND

Neural networks are becoming increasingly advanced, and one of their applications is in the field of processing time-dependent signals, e.g. signals representing a video or a sound. In operation, the neural network processes subsequent windows of the time-dependent signal. Each window comprises a plurality of samples of the time-dependent signal for subsequent points in time in a time-interval. Each window of samples is shifted in time relative to its immediately preceding window with a time-step that is smaller than the time-interval over which the window extends. In processing time-dependent signals it is necessary that the throughput of the neural network processor is sufficient to match the rate at which the time-dependent signal is received. It is, however, also desired that this can be achieved without undue computational effort.

SUMMARY

According to a first aspect of the present disclosure, a neural network processor is provided for efficiently processing subsequent windows of a time-dependent signal. According to a second aspect of the present disclosure, a neural network processing method is provided for efficiently processing subsequent windows of a time-dependent signal. The present disclosure further pertains to a tangible or non-transitory computer-readable medium having computer-executable instructions stored thereon that, when executed by a processor, perform the method. Embodiments of the neural network processor of the present disclosure are configured to execute a neural network with a plurality of neural network layers.
Each layer has a respective plurality of neurons of which the output is computed as a function of a set of one or more inputs. Typically, the set of inputs corresponds to a kernel in a previous layer or in the window being processed, wherein the kernel is centered around a position with coordinates corresponding to the coordinates of the neuron of which the output is being computed. E.g., if the window comprises a vector of audio signal values a(0), a(1), ..., a(n−1), the input layer of the neural network may perform a convolution with a convolution kernel k such that the output values o(j) of the neurons n(j) are determined as: o(j) = Σ_{i=−w}^{+w} k(i)·a(j+i). Therein 2w+1 is the size of the kernel. It may be avoided that the value j+i is outside the boundary of the window by performing the addition modulo the size of the window or by skipping the contributions of non-existing input elements. Due to the fact that the window comprises a series of data elements, the audio signal values, ordered in time, the neurons o(j) as specified above also have a time-related ordering. I.e., the index according to which they are addressable is a time-domain related index. This also applies if the layer applies a scaling. For example, if a down-scaling is applied with a factor s, e.g. s=2, then the output values o(j) of the neurons n(j) are determined as: o(j) = Σ_{i=−w}^{+w} k(i)·a(sj+i). This further applies if the function with which the output values are computed is not a linear function as in the previous examples. More generally, in the one-dimensional case, the output value of a neuron o(j) may be written as o(j) = f(a(sj−w), a(sj−w+1), ..., a(sj+w)). Typically, the neural network has a set of successive layers, and the size of the layers in the direction of the time-domain related index reduces. For example, the set of successive layers comprises convolutional layers that perform a convolution in the time-domain related index (TDRI) or pooling layers that perform a pooling.
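The one-dimensional case described above can be sketched minimally as follows, with modulo addressing at the window borders and an optional down-scaling factor s; the function name and the plain-list representation are illustrative assumptions, not part of the disclosure:

```python
def conv1d_layer(a, k, s=1):
    """Compute o(j) = sum_{i=-w..+w} k(i) * a(s*j + i) for a kernel of size
    2w+1, with the input index taken modulo the window size n so that values
    outside the window boundary wrap around, as described in the text."""
    n = len(a)
    w = (len(k) - 1) // 2  # kernel half-width; k has length 2w+1
    return [sum(k[w + i] * a[(s * j + i) % n] for i in range((-w), w + 1))
            for j in range(n // s)]

# Identity kernel leaves the window unchanged; with s=2 the layer down-scales.
print(conv1d_layer([1, 2, 3, 4], [0, 1, 0]))       # [1, 2, 3, 4]
print(conv1d_layer([1, 2, 3, 4], [0, 1, 0], s=2))  # [1, 3]
```

Note that with s > 1 the output has n/s elements, which is how the size of successive layers along the time-domain related index reduces.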
Due to the fact that the output of a layer is a series of data elements having an ordering along a time-domain related index, also the neurons of the subsequent layer have an ordering along a time-domain related index. The same principle is applicable to the processing of higher-dimensional time-based signals. For example, in case the input signal is a multi-dimensional signal b(t,x,y,z), the output o(t',x',y',z') of a neuron n(t',x',y',z') of the input layer may be computed with a convolution like: o(t',x',y',z') = Σ_{i=−wt}^{+wt} Σ_{j=−wx}^{+wx} Σ_{k=−wy}^{+wy} Σ_{l=−wz}^{+wz} K(i,j,k,l)·b(t'+i, x'+j, y'+k, z'+l). Therein 2wt+1, 2wx+1, 2wy+1 and 2wz+1 are the sizes of the kernel along the time-axis t and the axes x, y, z for the further dimensions of the signal. Likewise, in this case, the neurons of the input layer are addressable with their time-domain related index t', and additionally with their three other index axes x', y', z'. This is equally applicable if the layer performs a scale-down operation or a scale-up operation as specified in the example below. Also this is applicable if the function with which the output of the neuron is computed is not a linear operation, but a