CN-122021925-A - Execution path control method and device for spiking neural network inference
Abstract
The invention discloses an execution path control method and device for spiking neural network (SNN) inference. In an offline stage, a set of execution path control data structures that can be read and executed directly by a processor is generated and persisted. In the online stage, the processor reads these control data structures and, based on low-cost statistical features obtained in real time within a prefix time window of an input sample, generates a compute-unit enable mask or an active index list. Accordingly, only the arithmetic instruction sequences and state-update paths corresponding to the shared subspace and the selected expert subspaces are scheduled for execution; the parameter-block loading, arithmetic instruction issue, register or cache writes, and state write-backs corresponding to unselected subspaces are truly skipped, and a Top-k or full-activation rollback mechanism is triggered when routing confidence is insufficient. With output accuracy kept controllable, the invention reduces the number of multiply-add instructions, state write-backs, memory accesses, and data-movement overhead, improving resource-utilization efficiency and the controllability of inference deployment.
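As an illustration of the routing step summarized above, the sketch below shows how a prefix-window statistical feature vector (B-view) and linear routing control parameters might be combined into a compute-unit enable mask, with a Top-k rollback when confidence is insufficient. The linear score form, the `route` function, the `partition` layout, and all thresholds are hypothetical assumptions for illustration, not the patent's actual parameterization.

```python
import numpy as np

def route(b_view, W, b, partition, k=2, conf_threshold=0.5):
    """Generate a channel-level enable mask from prefix-window statistics.

    b_view    -- online statistical feature vector from the prefix time window
    W, b      -- routing control parameters (assumed linear weights and biases)
    partition -- structural partition map: "shared" and expert ids -> channel lists
    """
    scores = W @ b_view + b                  # per-expert routing scores
    selected = np.flatnonzero(scores > 0.0)  # hard threshold gating
    # Rollback: if no expert clears the gate or confidence is low, lift to Top-k
    if selected.size == 0 or scores.max() < conf_threshold:
        selected = np.argsort(scores)[-k:]
    n_channels = max(max(ch) for ch in partition.values()) + 1
    mask = np.zeros(n_channels, dtype=bool)
    mask[partition["shared"]] = True         # shared subspace is always enabled
    for e in selected:
        mask[partition[e]] = True            # enable selected expert subspaces
    return mask
```

An execution engine can read such a mask directly to decide which instruction sequences and state-update paths to schedule.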
Inventors
- Xie Zaipeng
- Wu Yijia
- Zhou Yuantong
Assignees
- Hohai University (河海大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-02-14
Claims (10)
- 1. An execution path control method for spiking neural network inference, characterized by comprising the following steps: (1) Acquire and preprocess the input data to be inferred: acquire the input data, preprocess it, and encode it into a pulse sequence or event representation suitable for a spiking neural network (SNN); (2) Perform structural analysis on at least one target layer of a pre-trained dense SNN: based on the target-layer weight parameters and a calibration data set, divide the target-layer parameters into a shared-subspace parameter section and a plurality of expert-subspace parameter sections, construct a structural partition mapping table Partition, evaluate the output stability and resource consumption of different expert-subspace enabling configurations, and generate a sample-level execution path control label for each sample; (3) Obtain an offline statistical feature vector B-view corresponding to a preset prefix time window; train a router on the sample-level execution path control labels and the feature vector B-view to obtain routing control parameters that predict the expert-subspace enabling result from the B-view; persist the structural partition mapping table Partition, the sample-level execution path control labels, and the routing control parameters in memory in a data form that the processor can read directly in the online inference stage; (4) Acquire low-cost statistical features of the target layer within the preset prefix time window to form an online statistical feature vector B-view, read the routing control parameters from memory, and generate the expert-subspace enabling result based on the B-view and the routing control parameters; (5) Generate an enabling control signal: combine the expert-subspace enabling result with the structural partition mapping table Partition to generate an enabling control signal indicating whether each computing unit in the target layer participates in the inference computation; (6) True execution skipping: according to the enabling control signal, computing units in the non-enabled state skip the corresponding arithmetic instruction issue, neuron state update and write-back, and the load and store accesses of parameter blocks or intermediate activation data; (7) Output the inference result: determine and output the inference task result according to the inference computation.
- 2. The method of claim 1, wherein the input data in step (1) comprises images to be classified, event-stream data, or other sensor time-series data.
- 3. The execution path control method for spiking neural network inference according to claim 1, wherein the offline control-data-structure generation in step (2) comprises parameter reorganization, structure mining, and teacher-signal generation. The parameter reorganization is executed by a processor in the offline stage and reorganizes the parameters of at least one target layer into a shared parameter section and a plurality of expert parameter sections, without changing the inference semantics of the pre-trained dense SNN under the full-expert enabling configuration, generating an index mapping relation that can be referenced by subsequent execution path control logic and providing an addressable parameter layout and structural partition basis for subsequent execution path control. The structure mining uses the calibration data to collect statistics on the activation-distribution behavior of the computing units inside the target layer, obtains the partition into a shared subspace and a plurality of expert subspaces, writes the partition result into memory, and provides a structural prior for the subsequent teacher signal and the routing control parameters. The teacher-signal generation, in the offline stage and without changing backbone network parameters, evaluates the output stability and resource consumption of candidate expert-subspace enabling configurations for each given sample, thereby determining the sample-level enabling configuration that is converted into control signals for specific computing units in the inference stage.
- 4. The execution path control method for spiking neural network inference according to claim 1, wherein the structural partition mapping table Partition in step (2) takes the form of an array, a bitmap, or an index list, and records the correspondence between channels or neuron groups and subspace identifiers.
- 5. The execution path control method for spiking neural network inference according to claim 1, wherein the generation of the sample-level execution path control label in step (2) is based on a preset resource budget rule, the resource budget rule including at least three cost-balance strategies: saving, balanced, and conservative.
- 6. The method according to claim 1, wherein the routing control parameters in step (3) include at least a threshold parameter and a linear weight parameter, and are stored in memory as an array, a threshold table, a linear-weight parameter table, an index list, a bitmap, or a combination thereof.
- 7. The execution path control method for spiking neural network inference according to claim 1, wherein in step (4) a rollback mechanism is provided in the process of generating the expert-subspace enabling result based on the B-view and the routing control parameters; the triggering conditions include the confidence of the routing output falling below a threshold, unstable expert-selection ordering, an overly flat routing-output distribution, or abnormal fluctuation of runtime budget statistics, and the rollback action includes raising the Top-k value or enabling the full expert subspace.
- 8. The execution path control method for spiking neural network inference according to claim 1, wherein the enabling control signal in step (5) is a channel-level or expert-level enable mask or an active index list; the computing units of the shared subspace and the selected expert subspaces are in the enabled state, the computing units of the unselected expert subspaces are in the non-enabled state, and the enabling control signal can be read and parsed directly by the execution engine.
- 9. The execution path control method for spiking neural network inference according to claim 1, wherein the true execution skipping of step (6) is implemented by: ① on a general-purpose processor, according to the enable mask or active index list, using branch or loop control so that only enabled channels or experts enter the corresponding instruction sequences, whereby the multiply-add, compare, and write-back instructions of disabled computing units are never issued; ② on a graphics processor or other accelerator, through operator- or Kernel-level scheduling, creating and launching operators or Kernels only for the shared subspace and the selected expert subspaces, so that the parameter blocks corresponding to unselected experts do not participate in execution; ③ generating an active-computing-unit index list from the enable mask and performing operations and state updates only on the computing units indicated by the list, so that effective computation is aggregated into a smaller working set, reducing fine-grained task fragments and synchronization overhead.
- 10. An electronic device, characterized by comprising a memory, a processor, and an accelerator; the memory is used to store a computer program runnable on the processor, and to store pre-trained dense SNN model parameters, image-classification data-set samples for calibration and evaluation, A-view and B-view statistical features, the structural partition mapping table Partition, the sample-level execution path control labels, and the routing control parameters; the processor is configured, when running the computer program, to execute the steps of the execution path control method for spiking neural network inference according to any one of claims 1 to 9; the accelerator is used to accelerate forward inference and offline evaluation of the spiking neural network.
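As one possible rendering of the "true execution skipping" in the claims (the active-index-list variant ③), the sketch below gathers only the parameter rows of enabled channels, so that disabled computing units issue no multiply-adds, load no parameter blocks, and receive no state write-backs. The LIF-style update, the function name, and the reset-to-zero behavior are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def layer_forward_skipped(x, weights, mask, v, threshold=1.0):
    """One SNN layer time step in which disabled channels are truly skipped.

    x       -- input activation vector for this time step
    weights -- per-channel synaptic weight rows
    mask    -- boolean enable mask from the routing stage
    v       -- membrane-potential state, updated in place for active units only
    """
    active = np.flatnonzero(mask)      # active index list derived from the mask
    # Gather only active parameter rows: skipped rows are never loaded or used
    i = weights[active] @ x            # synaptic weighted sum, active units only
    v[active] += i                     # membrane-potential update, active only
    spikes = np.zeros(mask.size, dtype=np.uint8)
    fired = active[v[active] >= threshold]
    spikes[fired] = 1                  # spike output for firing units
    v[fired] = 0.0                     # reset written back only where fired
    return spikes, v
```

Because the disabled rows of `weights` and entries of `v` are never touched, the effective computation is aggregated into the smaller working set that the claim describes.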
Description
Execution path control method and device for spiking neural network inference
Technical Field
The invention relates to artificial intelligence and deep learning technology in the computer technical field, and in particular to a method and device for controlling the execution path of spiking neural network inference.
Background
A spiking neural network (Spiking Neural Network, SNN) typically processes inputs in discrete time steps, performing synaptic weighted summation, membrane-potential accumulation, threshold comparison, and spike output at each time step, accompanied by updating and writing back of neuron state (e.g., membrane potential, reset state, refractory-period flags). When SNN inference is performed on a general-purpose processor (CPU), graphics processor (GPU), or other general-purpose accelerator, this process manifests at the implementation level as the processor repeatedly reading model parameters and intermediate activation data from memory across multiple time-step loops, executing a large number of arithmetic and comparison instructions, and writing updated intermediate states back to memory or the cache hierarchy; the associated arithmetic instruction issue, state write-back, and memory-access overhead is often difficult to eliminate automatically through existing hardware microarchitecture or conventional compiler optimization. In recent years, to improve the representation capability and accuracy of SNNs on complex tasks such as image classification, the prior art has often expanded model capacity by increasing network depth, widening channel counts, introducing attention mechanisms, and the like.
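The per-time-step processing described above (synaptic weighted summation, membrane-potential accumulation, threshold comparison, spike output, and state write-back) can be sketched as a minimal leaky integrate-and-fire loop. The LIF dynamics, decay constant, and reset-to-zero rule here are common textbook assumptions used only to make the repeated state write-backs concrete; they are not taken from the patent.

```python
import numpy as np

def lif_inference(inputs, W, threshold=1.0, decay=0.9):
    """Dense SNN forward pass over T time steps (minimal LIF sketch).

    inputs -- array of shape (T, n_in), one input vector per time step
    W      -- synaptic weight matrix of shape (n_out, n_in)
    """
    T = inputs.shape[0]
    n_out = W.shape[0]
    v = np.zeros(n_out)                  # membrane-potential state
    out = np.zeros((T, n_out), dtype=np.uint8)
    for t in range(T):
        v = decay * v + W @ inputs[t]    # leak + synaptic weighted summation
        fired = v >= threshold           # threshold comparison
        out[t] = fired                   # spike output
        v[fired] = 0.0                   # reset: per-step state write-back
    return out
```

Every iteration reads parameters and writes the state vector back, which is exactly the per-time-step memory traffic the background section identifies as hard to eliminate automatically.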
However, as model size and the number of time steps increase, the number of synaptic events and emitted spikes in the network grows, leading to more state updates associated with those synaptic events, a higher frequency of register or cache writes, and a greater volume of accesses to intermediate activations and parameters. For dense SNNs deployed as fixed operators or Kernels on general-purpose processors or accelerators, these extensions ultimately manifest as significant increases in computational load, memory-access load, and data-movement overhead in the inference phase, weakening the deployment advantage in low-power or low-latency inference scenarios. To control single-inference overhead while increasing model capacity, the prior art has proposed various attempts to reduce inference cost, such as parameter pruning, quantization compression, and conditional computation. One representative class of solutions introduces a structure with sub-network selection capability, so that each input sample is computed through only part of the sub-networks or expert paths at inference time. Taking the Mixture of Experts (MoE) structure as an example, a gating or routing module typically selects a subset of private sub-networks to participate in the computation according to the input features, so that the overall capacity of the model is theoretically decoupled from the single-pass inference computation. Some studies have also attempted to apply similar ideas to SNNs, selecting different spiking sub-network paths for different time steps or samples through a gating mechanism based on spiking activity or thresholds.
However, from the perspective of computer execution path control, the above sub-network selection schemes still share the following problems when deployed on a general-purpose processor or accelerator. First, gating mechanisms often take the form of soft gating or probabilistic selection, lacking hard routing constraints that interface directly with the underlying execution engine. Even if a sparse activation pattern is formed at the model-representation level, operators or Kernels may still need to be allocated for multiple candidate experts during actual execution, with the corresponding intermediate-state buffers prepared, so that expert paths not intended to be selected still incur arithmetic operations and state write-backs; the result is sparsity on the surface without a clear reduction in actual execution load, making the instruction count and memory overhead difficult to reduce promptly and stably. Second, an SNN has pronounced temporal and state-accumulation characteristics: neuron states such as membrane potential and adaptive thresholds evolve continuously over multiple time steps. When different expert paths or sub-network paths are switched frequently along the time dimension, the intermediate states of the different paths must be frequently read, written, and swapped, easily causing cache misses, increased data movement, and extra context-switch overhead; meanwhile, the assumption of continuous