
CN-122021887-A - Module-level interpretability optimization method for a time-series Transformer model

CN 122021887 A

Abstract

The invention discloses a module-level interpretability optimization method for a time-series Transformer model, comprising a mask generation and perturbation phase and an optimization-model application phase. In the mask generation and perturbation phase, a time-series dataset is acquired and a training set is divided, and the pre-trained time-series Transformer base model to be optimized is loaded. By constructing a time-module fusion network, executing frequency-domain input perturbation based on the wavelet transform, executing module perturbation, and executing joint optimization, the method addresses the severe redundancy, insufficient noise robustness, and limited interpretability of conventional X-formers architectures. It thereby realizes fine-grained feature screening in the time-frequency domain and model pruning, significantly reducing computational cost and memory footprint without degrading the model's prediction performance.

Inventors

  • WANG YONG
  • TAN SHAOQI
  • YANG MINGJIAN
  • XU CHANGJIAN

Assignees

  • University of Electronic Science and Technology of China (电子科技大学)

Dates

Publication Date
2026-05-12
Application Date
2026-01-19

Claims (6)

  1. A module-level interpretability optimization method for a time-series Transformer model, comprising a mask generation and perturbation phase and an optimization-model application phase, the mask generation and perturbation phase comprising the steps of: step A1, acquiring a time-series dataset, dividing a training set, and loading a pre-trained time-series Transformer base model to be optimized; step A2, constructing a time-module fusion network and generating a highly interpretable multi-channel input mask by utilizing the different frequency-component characteristics of the input data together with model structure information; step A3, executing frequency-domain input perturbation based on the wavelet transform, applying fine-grained perturbation to the time series in the wavelet domain using the generated multi-channel input mask to produce a perturbed input in which key features are prominent; step A4, executing module perturbation, constructing a learnable module mask, and deciding according to the state of the module mask whether to keep each original module or replace it with an identity mapping, thereby generating a perturbed model; and step A5, executing joint optimization, feeding the perturbed input into the perturbed model, computing the difference between its output and the output of the original model, constructing a total loss function combined with mask sparsity regularization terms, and updating the parameters of the input mask and the module mask in parallel.
  2. The module-level interpretability optimization method for a time-series Transformer model of claim 1, wherein step A2 comprises the steps of: step A21, constructing a feature-fusion intermediate layer: the input time-series data X, where T is the number of time steps, is mapped by a linear transformation to the dimension of the module mask M and multiplied element-wise with the module mask to generate the mixed feature H fusing model structure information, expressed as H = Linear(X) ⊙ M; step A22, using the mixed feature H to generate a query matrix Q and a key matrix K through two independent linear transformations, and applying a linear transformation directly to the original input data X to generate a value matrix V, expressed as Q = H·W_Q, K = H·W_K, V = X·W_V; step A23, computing scaled dot-product attention and generating the input mask: based on the query matrix Q, key matrix K, and value matrix V, the attention distribution is computed from the association between Q and K, the attention weights are applied to the value matrix V, and the attention output is processed by a feed-forward neural network (FFN) to obtain the intermediate feature F used to generate the input mask, expressed as F = FFN(softmax(Q·Kᵀ/√d_k)·V); step A24, based on the intermediate feature F, generating the corresponding set of frequency-domain masks {m_0, m_1, …, m_J} in parallel through J+1 independent projection heads, where m_0 corresponds to the low-frequency approximation component and m_1, …, m_J correspond to the high-frequency detail components at each level; for any i-th frequency component, the corresponding mask m_i is computed as m_i = σ(W_i·F + b_i), where σ is the sigmoid function.
  3. The module-level interpretability optimization method for a time-series Transformer model of claim 2, wherein step A3 comprises the steps of: step A31, applying a J-level discrete wavelet transform to the time-series data X to obtain J+1 sets of frequency components, expressed as {c_0, c_1, …, c_J} = DWT_J(X), where c_0 is the low-frequency approximation component and c_1, …, c_J are the high-frequency detail components at each level; step A32, using the masks {m_i} generated in step A24 together with the band-specific reference perturbation values {r_i} generated by a bidirectional GRU, perturbing each frequency component c_i independently; for any i-th component, the perturbed component is computed as c̃_i = m_i ⊙ c_i + (1 − m_i) ⊙ r_i; step A33, reconstructing all perturbed frequency components {c̃_i} back to the time domain through the inverse discrete wavelet transform to obtain the final perturbed input data X̃, expressed as X̃ = IDWT({c̃_0, c̃_1, …, c̃_J}).
  4. The module-level interpretability optimization method for a time-series Transformer model of claim 3, wherein step A4 comprises the steps of: step A41, defining an updatable module parameter matrix Θ and computing the binarized module mask M, expressed as M_j = 1 if σ(θ_j) > 0.5 and M_j = 0 otherwise; step A42, for the j-th module of the model, replacing it with an identity mapping if M_j = 0, and retaining the original module if M_j = 1; step A43, constructing the perturbed model f̃, whose output is ỹ = f̃(X̃).
  5. The module-level interpretability optimization method for a time-series Transformer model of claim 4, wherein the loss function for the joint optimization in step A5 is expressed as: L = L_diff(f(X), f̃(X̃)) + λ₁·L_sparse + λ₂·L_ref; where L_diff measures the difference between the output f(X) of the original model and the output f̃(X̃) of the perturbed model; L_sparse is the sparsity regularization term, promoting mask sparsification so as to identify the most critical portions; and L_ref ensures the validity of the band-specific reference perturbation values generated by the bidirectional GRU.
  6. The module-level interpretability optimization method for a time-series Transformer model of claim 5, wherein the optimization-model application phase comprises the steps of: step B1, obtaining the trained module mask and identifying the redundant modules whose mask value is 0; step B2, performing pruning reconstruction on the original time-series Transformer model, permanently replacing the identified redundant modules with identity mappings or other more efficient modules to obtain a lightweight optimized model; and step B3, inputting the time-series data to be predicted into the optimized model to obtain the final prediction result.
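The frequency-domain perturbation of steps A31–A33 can be sketched in plain Python under simplifying assumptions: a single-level Haar wavelet in place of the J-level DWT, and fixed mask/reference values in place of the learned masks and bidirectional-GRU references. The helper names (`haar_dwt`, `haar_idwt`, `perturb`) are illustrative, not from the patent.

```python
# Sketch of steps A31-A33: one-level Haar DWT, mask-based perturbation of
# each frequency component (step A32), and inverse reconstruction.
# The patent uses a multi-level DWT with learned masks and GRU-generated
# reference values; here masks and references are plain fixed inputs.
import math

def haar_dwt(x):
    """Step A31 (one level): returns (approximation, detail) coefficients."""
    s = math.sqrt(2.0)
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Step A33: inverse of haar_dwt, reconstructing the time-domain signal."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x

def perturb(component, mask, reference):
    """Step A32: c~ = m * c + (1 - m) * r, element-wise per coefficient."""
    return [m * c + (1.0 - m) * r for c, m, r in zip(component, mask, reference)]

# Example: fully keep the low-frequency approximation (mask = 1), fully
# replace the high-frequency detail with a zero reference (mask = 0).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
approx, detail = haar_dwt(x)
approx_p = perturb(approx, [1.0] * len(approx), [0.0] * len(approx))
detail_p = perturb(detail, [0.0] * len(detail), [0.0] * len(detail))
x_tilde = haar_idwt(approx_p, detail_p)  # smoothed signal, trend preserved
```

With the detail band suppressed, each reconstructed pair collapses to its local average, which matches the claim's intent of preserving long-term trend while removing high-frequency content.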

Description

Module-level interpretability optimization method for a time-series Transformer model

Technical Field

The invention relates to the technical field of time-series analysis and deep-learning model optimization, and in particular to a module-level interpretability optimization method for a time-series Transformer model.

Background

Time-series prediction has wide application in fields such as energy management, traffic-flow prediction, and meteorological analysis. In recent years, many variant models based on the Transformer architecture (collectively referred to as X-formers) have been proposed and exhibit excellent predictive performance. These performance improvements, however, are often accompanied by significant architectural redundancy: studies have shown that many of the modules in X-formers not only add computational overhead but may also impair the model's ability to capture key features. Although some studies attempt to simplify the attention mechanism using CNNs or structured matrices, or propose non-Transformer models (e.g., DLinear, TimesNet), removing redundant components while retaining the advantages of the Transformer architecture remains a challenge. Current research on model interpretability falls mainly into two types: ante-hoc and post-hoc methods. Ante-hoc methods are represented by interpretability-driven neural architecture search (INAS), but they struggle to balance architectural interpretability against model performance and have limited generalization ability. Post-hoc, perturbation-based methods (e.g., GateMask, Dynamask) typically treat the model as a black box and apply mask perturbation to the input data mostly along a single time dimension.
Such single-scale perturbation cannot effectively distinguish the low-frequency components carrying long-term trends in a time series from the high-frequency noise containing random disturbances, so the interpretation results easily confuse key signals with noise interference, and the ability to analyze the data's frequency-domain characteristics at fine granularity is lacking. The prior art therefore cannot effectively reveal the deep interaction between the internal modules of X-formers and the multi-scale time-frequency characteristics of the input data, and model optimization that is both noise-robust and fine-grained is difficult to achieve. A method that jointly optimizes multi-scale frequency-domain feature selection and the model's module structure is urgently needed, one that reduces model complexity and improves interpretability while guaranteeing prediction performance and suppressing high-frequency noise interference.

Disclosure of Invention

In view of these technical problems, the invention provides a module-level interpretability optimization method for a time-series Transformer model. Redundant modules in the model are identified through joint time-frequency-domain analysis, and a multi-channel mask is used to precisely locate the key input features (such as long-term trends and short-term fluctuations) under different frequency components, thereby improving the computational efficiency, robustness, and interpretability of the model in time-series prediction tasks, and overcoming the shortcomings of the prior art, in which X-formers models have redundant architectures and high computational cost, and module importance cannot be finely evaluated.
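The joint objective described above (prediction fidelity plus mask sparsification) can be illustrated with a minimal sketch; the function name `total_loss` and the single L1 sparsity penalty are illustrative assumptions, standing in for the patent's full loss with its separate sparsity and GRU-reference terms.

```python
# Hypothetical sketch of the joint objective from step A5: a fidelity term
# (mean squared difference between the original and perturbed model outputs)
# plus an L1 sparsity penalty that pushes mask values toward 0, so the
# surviving nonzero mask entries expose the critical features and modules.
def total_loss(y_orig, y_pert, masks, lam=0.01):
    fidelity = sum((a - b) ** 2 for a, b in zip(y_orig, y_pert)) / len(y_orig)
    sparsity = sum(abs(m) for mask in masks for m in mask)
    return fidelity + lam * sparsity
```

Minimizing this trades off how much the perturbed model's output may drift against how many mask entries stay active, which is the mechanism by which redundant components are flagged.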
The invention is realized by the following technical scheme. A module-level interpretability optimization method for a time-series Transformer model comprises a mask generation and perturbation phase and an optimization-model application phase, the mask generation and perturbation phase comprising the steps of: step A1, acquiring a time-series dataset, dividing a training set, and loading a pre-trained time-series Transformer base model to be optimized; step A2, constructing a time-module fusion network and generating a highly interpretable multi-channel input mask by utilizing the different frequency-component characteristics of the input data together with model structure information; step A3, executing frequency-domain input perturbation based on the wavelet transform, applying fine-grained perturbation to the time series in the wavelet domain using the generated multi-channel input mask to produce a perturbed input in which key features are prominent; step A4, executing module perturbation, constructing a learnable module mask, and deciding according to the state of the module mask whether to keep each original module or replace it with an identity mapping, thereby generating a perturbed model; and step A5, executing joint optimization, feeding the perturbed input into the perturbed model, computing the difference between its output and the output of the original model, constructing a total loss function combined with mask sparsity regularization terms, and updating the parameters of the input mask and the module mask in parallel.
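The module perturbation of step A4 can be sketched as follows, under stated assumptions: each "module" is modeled as a plain callable, the learnable parameters are binarized through a sigmoid threshold, and masked-out modules are swapped for an identity mapping. The helper names (`binarize_mask`, `build_perturbed_model`) are illustrative, not from the patent.

```python
# Sketch of steps A41-A43: binarize a learnable parameter vector into a 0/1
# module mask, then build a perturbed model in which mask-0 modules are
# replaced by the identity mapping and mask-1 modules are retained.
import math

def binarize_mask(theta, threshold=0.5):
    """Step A41: sigmoid then hard threshold -> binary module mask."""
    return [1 if 1.0 / (1.0 + math.exp(-t)) > threshold else 0 for t in theta]

def build_perturbed_model(modules, mask):
    """Step A42: keep module j if mask[j] == 1, else substitute identity."""
    identity = lambda x: x
    kept = [mod if m == 1 else identity for mod, m in zip(modules, mask)]

    def perturbed(x):  # Step A43: sequential composition of retained modules
        for mod in kept:
            x = mod(x)
        return x

    return perturbed

# Example: three toy "modules"; the middle one is pruned by its mask state.
modules = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: x - 3.0]
mask = binarize_mask([2.0, -2.0, 2.0])   # sigmoid(-2) < 0.5 -> prune module 1
model = build_perturbed_model(modules, mask)
```

In the application phase (steps B1–B2), the same mask would drive a permanent pruning reconstruction rather than a temporary substitution, but the identity-replacement mechanics are the same.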