CN-121981310-A - Thermal power plant pollutant concentration prediction method based on frequency domain residual error and multi-expert fusion
Abstract
The invention discloses a thermal power plant pollutant concentration prediction method based on frequency domain residual and multi-expert fusion, which aims to solve the insufficient use of frequency domain features, weak model generalization, and low prediction accuracy of existing methods. A frequency domain information residual extraction module transforms the time domain features into the frequency domain, enhances them, and fuses the enhanced frequency domain information back into the time domain features through a residual connection, yielding frequency domain information residual features rich in periodicity and fluctuation information. Furthermore, the feed-forward network in the Transformer model is replaced by a multi-expert fusion module based on frequency domain residual gating, which adaptively selects and fuses expert knowledge according to the data characteristics. On the task of predicting the concentration of nitrogen oxide pollutants emitted by thermal power plants, the method achieves significantly higher prediction accuracy and stronger generalization. The method is suitable for a variety of time series prediction tasks, in particular industrial data prediction with complex periodicity and fluctuation characteristics.
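The frequency domain residual idea summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the helper names `dft`, `idft`, and `frequency_residual` are hypothetical, a naive discrete Fourier transform stands in for the fast Fourier transform, and a fixed vector of per-frequency gains stands in for the learned frequency domain attention.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform along the time axis."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(f):
    """Naive inverse discrete Fourier transform."""
    n = len(f)
    return [
        sum(f[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def frequency_residual(x, gains):
    """Return R = X + X' for a single feature channel.

    gains: one real gain per frequency bin. ASSUMPTION: these fixed gains
    replace the learned attention weights described in the source.
    """
    spectrum = dft(x)                                   # time -> frequency
    enhanced = [g * c for g, c in zip(gains, spectrum)]  # re-weight bins
    # Keep the real part; it is exact only when gains[k] == gains[n - k].
    restored = [v.real for v in idft(enhanced)]          # frequency -> time
    return [a + b for a, b in zip(x, restored)]          # residual connection


n = 8
x = [math.sin(2 * math.pi * t / n) for t in range(n)]
r_identity = frequency_residual(x, [1.0] * n)  # all bins kept unchanged
r_muted = frequency_residual(x, [0.0] * n)     # all bins suppressed
```

With every gain set to 1 the restored signal equals the input, so the residual output is twice the input; with every gain set to 0 the output reduces to the input. A learned weighting would fall between these extremes, emphasizing the bins that carry periodic structure.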
Inventors
- YOU WEIJIE
- YING CHENHAO
- CHEN PING
- XU WEIQIANG
- CHEN JINSHUI
- LU JIANGANG
Assignees
- Zhejiang University (浙江大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-12-08
Claims (9)
- 1. A thermal power plant pollutant concentration prediction method based on frequency domain residual and multi-expert fusion, characterized by adopting a time series prediction model FOMA, wherein the time series prediction model FOMA comprises a feature encoder, a frequency domain information residual extraction module, a multi-expert fusion module based on frequency domain residual gating, and a prediction output module; the frequency domain information residual extraction module comprises time-frequency transformation, frequency domain feature enhancement, inverse time-frequency transformation, and residual connection; the multi-expert fusion module based on frequency domain residual gating comprises a plurality of expert networks and a gating network based on frequency domain residual information; and the prediction method comprises the following steps:
  Step (1): Preprocessing time series data of the concentration of nitrogen oxide pollutants emitted by a thermal power plant, wherein the preprocessing comprises filling missing values by linear interpolation or moving average, identifying and correcting outliers, and applying min-max normalization or standardization so that the data distribution range meets the model training requirements; and combining the synchronously acquired process data set to construct a multidimensional input feature data set;
  Step (2): Loading publicly released pre-trained weights of the feature encoder, obtained by training on related time series prediction data sets, and initializing the network weights, so as to accelerate model convergence and improve prediction performance by exploiting learned general time series patterns;
  Step (3): Reading the multidimensional input feature data processed in step (1) and, after the feature encoder has loaded the pre-trained weights in step (2), extracting multi-scale time domain feature representations from the input features, wherein the feature output of the i-th layer is $X_i \in \mathbb{R}^{B \times L_i \times D_i}$, where $B$ is the batch size, $L_i$ is the length of the i-th layer feature sequence, and $D_i$ is the dimension of the i-th layer features; the deepest multi-scale feature $X_{deep}$ contains more abstract and global time series semantic information and is used for the subsequent frequency domain information residual extraction;
  Step (4): Performing frequency domain information residual extraction on the deep multi-scale time domain features extracted in step (3), specifically:
  Step (4.1), time domain to frequency domain transformation: performing a one-dimensional fast Fourier transform on each feature dimension $D$ of $X_{deep}$ along the time series dimension $L$ to obtain complex-valued frequency domain features $F_{deep} \in \mathbb{C}^{B \times L \times D}$;
  Step (4.2), frequency domain feature enhancement: enhancing $F_{deep}$ through a frequency domain attention mechanism that learns the importance of different frequency components, wherein $F_{deep}$ generates query, key, and value representations through three independent linear layers and an activation function; attention scores between the query and the key are computed and normalized by a softmax function to obtain the frequency domain attention weights $A_{freq} \in \mathbb{R}^{B \times L \times D}$; and $A_{freq}$ is multiplied by the value representation to obtain the frequency-domain-enhanced features $F'_{deep}$, so as to highlight frequency components strongly related to the periodic and fluctuating changes of the pollutant concentration and to suppress unimportant frequency noise;
  Step (4.3), inverse transformation to the time domain and residual connection: performing a one-dimensional inverse fast Fourier transform on each feature dimension $D$ of $F'_{deep}$ along the frequency dimension $L$ to convert the enhanced frequency domain features back into time domain features $X'_{deep}$; and adding the restored time domain features and the deep multi-scale time domain features obtained in step (3) element by element to obtain the frequency domain information residual features $R_{deep} = X_{deep} + X'_{deep}$, wherein the residual features serve both as the input of the gating network in the Transformer module, guiding expert selection, and as the input of the expert networks;
  Step (5): Sending the frequency domain information residual features $R_{deep}$ generated in step (4) to the multi-expert fusion module, wherein the multi-expert fusion module is constructed as follows:
  Step (5.1), constructing a plurality of expert networks: constructing $M$ independent expert networks $E_1, E_2, \ldots, E_j, \ldots, E_M$, wherein the j-th expert network $E_j$ is an independent feed-forward network with its own parameters, receives the frequency domain information residual features as input, and outputs the processed feature representation $F_j = E_j(R_{deep})$;
  Step (5.2), constructing a gating network based on frequency domain residual information: constructing a gating network $G$ that receives the frequency domain information residual features $R_{deep}$ as input and outputs an $M$-dimensional weight vector $W_{gate}$, wherein the gating network $G$ consists of a fully connected layer and a softmax activation function, which ensures that the weights sum to 1;
  Step (5.3), gated weighted fusion: computing the weighted sum of the feature representations $F_j$ output by the expert networks, using the weight vector $W_{gate}$ output by the gating network, to obtain the final output features of the multi-expert fusion module, $F_{MoE} = \sum_{j=1}^{M} W_{gate,j} F_j$; the output features $F_{MoE}$ are used as the input of a linear output layer, and the final prediction result is obtained through the linear network;
  Step (6), model training: dividing the thermal power plant nitrogen oxide concentration data set and the process data set into a training set, a validation set, and a test set according to a preset ratio, wherein the loss function uses the mean squared error and the mean absolute error and aims to minimize the difference between the predicted values and the true values; applying steps (1) to (5) to the training set data and inputting it into the model to obtain an output prediction sequence; computing the mean squared error and the mean absolute error between the target sequence in the data set and the prediction sequence output by the model, and updating the network parameters; and repeating this process until the computed error no longer decreases, at which point the model performance is optimal and the optimal FOMA model is obtained;
  Step (7), model inference: applying the data processing of step (1) to each batch of input pollutant concentration and auxiliary feature data, uniformly adjusting the data to the input feature size of the model, and generating the pollutant concentration prediction sequence with the model obtained in step (6).
- 2. The thermal power plant pollutant concentration prediction method based on frequency domain residual and multi-expert fusion according to claim 1, wherein in step (4), the time domain to frequency domain transformation applies a one-dimensional fast Fourier transform to the entire time series feature $X_{deep}$ to capture the global frequency information of the sequence; the frequency domain feature $F_{deep}$ is complex-valued, its magnitude represents the strength of the corresponding frequency component, and its phase represents the relative position of that frequency component.
- 3. The thermal power plant pollutant concentration prediction method based on frequency domain residual and multi-expert fusion according to claim 1, wherein in step (4), the query, key, and value of the frequency domain attention mechanism are generated as $Q = F_{deep} W_Q$, $K = F_{deep} W_K$, $V = F_{deep} W_V$, and the attention score is computed as $QK^{\top} / \sqrt{D_k}$, where $D_k$ is the key dimension.
- 4. The thermal power plant pollutant concentration prediction method based on frequency domain residual and multi-expert fusion according to claim 1, wherein in step (5), the $M$ expert networks are independent fully connected feed-forward networks with the same or different numbers of layers and hidden units, and the input and output dimensions of each expert network match those of the feed-forward neural network in the Transformer architecture.
- 5. The thermal power plant pollutant concentration prediction method based on frequency domain residual and multi-expert fusion according to claim 1, wherein in step (5), the gating network $G$ consists of one or more fully connected layers, and its final output layer is followed by a softmax activation function to generate the weight vector $W_{gate}$; during training, the gating network can be combined with a load balancing loss that encourages it to activate the different experts evenly, thereby avoiding the situation in which some experts are overused while others are underused.
- 6. The thermal power plant pollutant concentration prediction method based on frequency domain residual and multi-expert fusion according to claim 1, wherein the frequency domain information residual extraction module can be embedded at different levels of a Transformer encoder, that is, in each Transformer block, frequency domain information residual extraction is first performed on the output features of the current block, and the resulting frequency domain information residual features are then used as the inputs of the gating network and of all expert networks of the multi-expert fusion module in that block.
- 7. The thermal power plant pollutant concentration prediction method based on frequency domain residual and multi-expert fusion according to claim 1, wherein the thermal power plant is a coal-fired power plant, the thermal power plant pollutant is nitrogen oxide, and the process data set comprises a plurality of physical quantities measured in the process, including temperature, humidity, wind speed, air pressure, boiler load, power generation load, oxygen content, wind pneumatic valve position, and water supply quantity.
- 8. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements a thermal power plant pollutant concentration prediction method based on frequency domain residual and multi-expert fusion as claimed in any one of claims 1 to 7.
- 9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the thermal power plant pollutant concentration prediction method based on frequency domain residual and multi-expert fusion of any one of claims 1 to 7.
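The gated weighted fusion of step (5) in claim 1 can be sketched for a single feature vector as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the names `softmax`, `linear`, and `gated_fusion` are hypothetical, the gate is a single fully connected layer, the experts are passed in as plain callables, and the residual feature `r` is treated as the feature vector of one time step.

```python
import math


def softmax(z):
    """Numerically stable softmax; the outputs sum to 1."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def linear(x, weight, bias):
    """Fully connected layer: one output per row of `weight`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def gated_fusion(r, experts, gate_weight, gate_bias):
    """Compute F_MoE = sum_j W_gate[j] * E_j(r) for one feature vector r.

    experts: list of M callables, each mapping a feature vector to a vector.
    gate_weight, gate_bias: parameters of the (assumed single-layer) gate.
    """
    w_gate = softmax(linear(r, gate_weight, gate_bias))  # M gate weights
    outputs = [expert(r) for expert in experts]          # F_1 ... F_M
    dim = len(outputs[0])
    fused = [sum(w * out[d] for w, out in zip(w_gate, outputs))
             for d in range(dim)]
    return fused, w_gate


# Two toy experts; a zero-initialized gate weights them equally.
experts = [lambda v: [2.0 * a for a in v], lambda v: [-a for a in v]]
fused, w_gate = gated_fusion(
    [1.0, 2.0], experts,
    gate_weight=[[0.0, 0.0], [0.0, 0.0]], gate_bias=[0.0, 0.0],
)
```

Because the same residual feature feeds both the gate and the experts, the routing decision depends on the frequency-enhanced representation rather than on the raw time domain features. The load balancing loss mentioned in claim 5 is omitted here.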
Description
Thermal power plant pollutant concentration prediction method based on frequency domain residual and multi-expert fusion
Technical Field
The invention belongs to the technical field of artificial intelligence and environmental monitoring, and particularly relates to a deep-learning-based time series prediction method applied to high-precision prediction of the concentration of pollutants emitted by thermal power plants.
Background
Thermal power plants are a major source of energy supply, and their nitrogen oxide emissions are one of the main sources of atmospheric pollution. Accurate prediction of the pollutant emission concentration of a thermal power plant is important for environmental protection, emission control, optimized operation, and the formulation of effective environmental policies. Traditional pollutant concentration prediction methods mainly comprise statistical models and shallow machine learning models. These methods are reasonably effective at handling linear relationships and short-term prediction, but their prediction accuracy and robustness struggle to meet practical demands when faced with the inherent nonlinearity, multimodality, high noise, and complex temporal dependence of thermal power plant emission data. In recent years, with the development of deep learning, time series prediction methods based on models such as recurrent neural networks, convolutional neural networks, and the Transformer have achieved significant breakthroughs in many fields. In particular, the Transformer model captures long-range dependencies effectively through its self-attention mechanism and performs well on long-sequence data.
However, existing deep-learning-based pollutant prediction methods still face the following challenges:
- Insufficient use of frequency domain features: the pollutant emission concentration of a thermal power plant is influenced by factors such as the operating cycle, seasonal variation, and equipment maintenance, and therefore shows clear periodicity and fluctuation. Traditional models usually focus only on time domain features and underuse the rich frequency domain information contained in the data, which makes it difficult for them to capture these regular changes accurately.
- Poor generalization and robustness: the operating conditions of a thermal power plant are complex and variable, and the nitrogen oxide emission pattern can take many states. A single model struggles to handle prediction under all operating conditions and tends to overfit specific patterns while generalizing poorly to others. When faced with new or unusual operating scenarios, the prediction accuracy of the model may degrade significantly.
- Trade-off between prediction accuracy and model complexity: to improve prediction accuracy, models tend to become more complex, which leads to high training cost and slow inference and makes the model prone to overfitting on small samples or noisy data. How to improve the adaptability of the model to different data patterns and its processing efficiency while guaranteeing prediction accuracy is a focus of current research.
In view of the above problems, the prior art has attempted various improvements, such as capturing nonlinear relationships with more complex network structures or improving model robustness through ensemble learning.
However, these methods often fail to adequately fuse time domain and frequency domain information, or lack a fine-grained expert allocation mechanism when processing multi-modal data. In the Transformer architecture in particular, the feed-forward neural network is usually fixed and cannot adapt to the diversity of the input data. Therefore, how to design an intelligent prediction method that effectively fuses frequency domain information and dynamically adjusts model behavior according to the data characteristics is a technical problem to be solved.
Disclosure of Invention
In order to solve the problems described in the background, the invention aims to provide a method for predicting the concentration of pollutants emitted by a thermal power plant that combines frequency domain information residual extraction with a multi-expert fusion mechanism, so as to overcome the shortcomings of the prior art in frequency domain feature utilization, model generalization, and prediction accuracy. To this end, the invention adopts the following technical scheme: The thermal power plant pollutant concentration prediction method based on frequency domain residual and multi-expert fusion adopts a time sequence prediction mo