CN-121997980-A - Reinforcement learning time series analysis system and method based on deep mixture of experts

Abstract

The invention provides a reinforcement learning time series analysis system and method based on a deep mixture of experts. The system comprises a preprocessing module, a policy network, and a weight generator. The preprocessing module denoises each time series and divides the denoised series by a fixed window size to obtain the preprocessed time series. The policy network, built on mixture-of-experts layers, encodes and decodes the preprocessed time series to obtain scoring information for each series. The weight generator obtains the weight of each time series from the scoring information output by the policy network. Through the preprocessing module, the policy network, and the weight generator, the invention achieves better time series feature extraction and better time series scoring.

Inventors

Wu Zukai
Tu Shikui
XU LEI

Assignees

Shanghai Jiao Tong University (上海交通大学)

Dates

Publication Date: 20260508
Application Date: 20241104

Claims (10)

1. A reinforcement learning time series analysis system based on a deep mixture of experts, comprising: a preprocessing module for denoising a time series and dividing the denoised time series by a fixed window size to obtain a preprocessed time series; a policy network, which is a reinforcement learning network based on mixture-of-experts layers and which encodes and decodes the time series processed by the preprocessing module to obtain scoring information for the time series; and a weight generator for obtaining the weight of each time series based on the scoring information from the policy network.
2. The reinforcement learning time series analysis system based on a deep mixture of experts of claim 1, wherein the preprocessing module comprises: a decomposition unit for receiving the time series and decomposing it with a wavelet basis function to obtain wavelet coefficients; a filtering unit for threshold-filtering the wavelet coefficients; a reconstruction unit for generating a denoised time series from the threshold-filtered wavelet coefficients based on the wavelet basis function; and a dividing unit for dividing the denoised time series by a fixed window size and taking the equal-window segments as the preprocessing result.
3. The reinforcement learning time series analysis system based on a deep mixture of experts according to claim 1, wherein the policy network is a U-shaped encoder-decoder structure comprising: an encoder for extracting encoded temporal features from the preprocessed time series; a decoder for decoding based on the encoded temporal features to obtain decoded temporal features of the input time series; and a fully connected layer for obtaining scoring information for the input time series based on the encoded temporal features and the decoded temporal features.
4. The reinforcement learning time series analysis system based on a deep mixture of experts according to claim 3, wherein the encoder comprises a segmentation layer, N mixture-of-experts layers, and N-1 fusion layers, each fusion layer being paired with one mixture-of-experts layer and the segmentation layer being followed by one mixture-of-experts layer; wherein: the segmentation layer divides an input time series into a plurality of subsequences and concatenates the subsequences; each fusion layer fuses, in pairs, the subsequences output by the segmentation layer or a mixture-of-experts layer, thereby reducing the number of subsequences; and each mixture-of-experts layer extracts encoded temporal features from the subsequences output by the segmentation layer or a fusion layer.
5. The reinforcement learning time series analysis system based on a deep mixture of experts of claim 4, wherein each mixture-of-experts layer comprises 4 expert networks, a routing network, a normalization network, a selection network, and a summation network, wherein: the normalization network normalizes the input subsequence; the routing network computes 4 estimated values from the normalized result, the 4 estimated values corresponding respectively to the 4 expert networks; the selection network selects the corresponding expert network based on the maximum of the estimated values; the selected expert network performs feature extraction on the normalized result; and the summation network fuses the input subsequence, the maximum value, and the features extracted by the expert network to obtain the encoded temporal features of the input subsequence.
6. The reinforcement learning time series analysis system based on a deep mixture of experts according to claim 3, wherein the decoder comprises N gated multi-layer perceptron layers and N-1 separation layers, each separation layer being paired with one gated multi-layer perceptron layer, the gated multi-layer perceptron layers being arranged symmetrically to the mixture-of-experts layers and the separation layers being arranged symmetrically to the fusion layers; wherein: each gated multi-layer perceptron layer extracts decoded temporal features from the input time series; and each separation layer separates and decouples the decoded temporal features output by a gated multi-layer perceptron layer, thereby increasing the number of subsequences.
7. The reinforcement learning time series analysis system based on a deep mixture of experts according to claim 3, wherein the expert networks in the encoder and the gated multi-layer perceptron layers of the decoder are identical in structure.
8. The reinforcement learning time series analysis system based on a deep mixture of experts according to claim 7, wherein the policy network comprises a plurality of layers forming a U-shaped structure, specifically: layer 0 has a segmentation layer, which divides the input time series into a plurality of subsequences; layer 1 has a mixture-of-experts layer and, on the opposite side, a gated multi-layer perceptron layer; layers 2 to N each have a fusion layer paired with a mixture-of-experts layer and, on the opposite side, a gated multi-layer perceptron layer paired with a separation layer; the input of the layer-1 mixture-of-experts layer is the output of the layer-0 segmentation layer, and its output is the extracted temporal feature, recorded as the layer-1 encoded temporal feature; from layer 2 to layer N, the input of the layer-i fusion layer is the output of the layer-(i-1) mixture-of-experts layer, this input is fused in the fusion layer, and the fused result is passed to the layer-i mixture-of-experts layer for feature extraction to obtain the corresponding layer-i encoded temporal feature; the input of the layer-N gated multi-layer perceptron layer is the output of the layer-N mixture-of-experts layer, namely the layer-N encoded temporal feature; after feature extraction by the gated multi-layer perceptron layer, the result is passed to the layer-N separation layer for separation and decoupling, and the output of the layer-N separation layer is called the layer-N decoded temporal feature; from layer N-1 down to layer 1, the input of the layer-i gated multi-layer perceptron layer is the sum of the output of the layer-(i+1) separation layer and the output of the layer-i mixture-of-experts layer, the input of the layer-i separation layer is the feature extracted by the layer-i gated multi-layer perceptron layer, and its output is the layer-i decoded temporal feature after separation and decoupling; the N fully connected layers are distributed from layer 0 to layer N-1, wherein the inputs of the layer-0 connection layer are the output of the layer-0 segmentation layer and the output of the layer-1 gated multi-layer perceptron layer, and its output is the sum of its inputs; the inputs of the layer-i connection layer, for layers 1 to N-1, are the output of the layer-i mixture-of-experts layer and the output of the layer-(i+1) separation layer, and its output is the sum of all its inputs.
9. The reinforcement learning time series analysis system based on a deep mixture of experts according to claim 7, wherein the policy network is a reinforcement learning network whose input is the preprocessed time series, namely the state information, and whose output is a score for each time series, namely the action information; an optimization target is defined based on the state and the action of the policy network: $r_t = y_t \cdot \pi(a_t \mid s_t, \theta)$, where $\pi(a_t \mid s_t, \theta)$ represents the action $a_t$ taken by the policy network given state $s_t$, $y_t$ represents the rate of return between states $s_t$ and $s_{t+1}$, $r_t$ represents the single-step return obtained after taking the action, $R(\tau)$ represents the return after executing the reinforcement learning trajectory $\tau$, and $\mathbb{E}_{\tau \sim \pi_\theta}$ represents the mean (expectation) over trajectories $\tau$ sampled from the policy $\pi_\theta$.
10. A reinforcement learning time series analysis method based on a deep mixture of experts, comprising the following steps: denoising a time series and dividing the denoised time series by a fixed window size to obtain a preprocessed time series; encoding and decoding the preprocessed time series based on a mixture of experts to obtain scoring information for the time series; and obtaining the weight of each time series based on the scoring information.
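
The first and last steps of the method in claim 10, together with the wavelet thresholding of claim 2, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the claimed implementation: the claims do not name a wavelet basis, a thresholding rule, a decomposition depth, or the form of the weight generator. The one-level Haar transform, soft thresholding, dropping of a trailing partial window, and softmax weighting below are therefore all assumptions, as are the function names.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)

def haar_decompose(signal):
    """One-level Haar transform of an even-length list (assumed wavelet basis)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * SQRT_HALF for a, b in pairs]
    detail = [(a - b) * SQRT_HALF for a, b in pairs]
    return approx, detail

def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr (assumed filtering rule)."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def haar_reconstruct(approx, detail):
    """Invert haar_decompose to recover the time-domain signal."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SQRT_HALF, (a - d) * SQRT_HALF])
    return out

def denoise(signal, thr):
    """Decompose, threshold the detail coefficients, then reconstruct."""
    approx, detail = haar_decompose(signal)
    return haar_reconstruct(approx, soft_threshold(detail, thr))

def split_windows(signal, window):
    """Cut into equal, non-overlapping windows; a partial tail is dropped (assumption)."""
    count = len(signal) // window
    return [signal[i * window:(i + 1) * window] for i in range(count)]

def softmax_weights(scores):
    """Turn per-series scores into positive weights summing to 1 (assumed generator)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    series = [1.0, 3.0, 2.0, 2.0, 5.0, 1.0, 0.0, 4.0]
    clean = denoise(series, thr=0.5)
    print(split_windows(clean, window=4))
    print(softmax_weights([0.2, 1.0, -0.3]))  # hypothetical scores for three series
```

With a threshold of zero the reconstruction returns the input unchanged, and a very large threshold replaces each pair of samples with its average, which gives a quick sanity check on the decomposition and reconstruction steps.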

Description

Reinforcement learning time series analysis system and method based on deep mixture of experts

Technical Field

The invention relates to the technical field of data processing and analysis, and in particular to a reinforcement learning time series analysis system and method based on a deep mixture of experts.

Background

Time series are ubiquitous in the real world: weather trends, energy consumption trends, and fluctuations in financial stock markets are all presented as time series data. Assessing the importance of time series is also a key topic: researchers need to analyze the importance of each time series according to the historical trends of multiple time series in order to make subsequent professional decisions. For example, if energy consumption is predicted to increase sharply, energy-related planning can be carried out in advance. In recent years, deep learning has been used to predict future trends in time series to guide decisions in related downstream tasks. Accurate prediction is extremely difficult because time series are noisy and subject to a variety of factors. Deep reinforcement learning (DRL), which can interact with the environment to adapt to it better, has attracted attention from researchers in various disciplines and has the potential to address complex decision-making challenges. In the mixed traffic field, DRL with a DQN method is used to manage signalized intersections shared by connected and autonomous vehicles (CAVs) and human-driven vehicles (HVs). In the financial field, DeepTrader introduces a causal graph convolution network to capture the relationships between stocks, uses a TCN to extract temporal features, and finally obtains investment weights through a portfolio generator. Deep reinforcement learning has further improved the ability to balance investment return and risk in the financial domain.
Although time series analysis models based on deep reinforcement learning have achieved satisfactory results, they have limitations. Existing reinforcement learning methods often employ a single unified model to analyze different time series, which may result in suboptimal performance because each time series has different characteristics. Different time series are subject to different influencing factors, so time series analysis is also very important for predicting weather trends, energy consumption trends, financial stock market fluctuations, and the like.

Disclosure of Invention

In view of the defects in the prior art, the invention aims to provide a reinforcement learning time series analysis system and method based on a deep mixture of experts.

According to one aspect of the present invention, there is provided a reinforcement learning time series analysis system based on a deep mixture of experts, including: a preprocessing module for denoising a time series and dividing the denoised time series by a fixed window size to obtain a preprocessed time series; a policy network, which is a reinforcement learning network based on mixture-of-experts layers and which encodes and decodes the time series processed by the preprocessing module to obtain scoring information for the time series; and a weight generator for obtaining the weight of each time series based on the scoring information from the policy network.
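
The mixture-of-experts layer on which the policy network is based (four expert networks, a routing network, and normalization, selection, and summation steps, as set out in claim 5) can be illustrated with a small plain-Python sketch. It is an assumption-laden illustration rather than the patented design: the source does not specify the normalization formula, whether the router outputs pass through a softmax, the internal structure of an expert, or exactly how the summation network combines its three inputs. The class name `MixtureOfExpertsLayer`, the single linear-plus-tanh expert, the random initialization, and the residual form `x + gate * features` are all hypothetical choices.

```python
import math
import random

def normalize(x, eps=1e-5):
    """Zero-mean, unit-variance normalization of one subsequence (assumed form)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(weights, x):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def softmax(z):
    peak = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

class MixtureOfExpertsLayer:
    """Top-1 routed layer with 4 experts; weights are random placeholders."""

    def __init__(self, dim, num_experts=4, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.router = init(num_experts, dim)            # routing network
        self.experts = [init(dim, dim) for _ in range(num_experts)]

    def forward(self, x):
        h = normalize(x)                                # normalization network
        gates = softmax(linear(self.router, h))         # one estimate per expert
        k = max(range(len(gates)), key=gates.__getitem__)  # selection: arg max
        # Only the selected expert runs; a linear map plus tanh stands in for it.
        features = [math.tanh(v) for v in linear(self.experts[k], h)]
        # Summation: input subsequence + (maximum estimate * expert features).
        out = [xi + gates[k] * fi for xi, fi in zip(x, features)]
        return out, k

if __name__ == "__main__":
    layer = MixtureOfExpertsLayer(dim=6)
    encoded, chosen = layer.forward([0.5, -1.0, 2.0, 0.0, 1.5, -0.5])
    print("selected expert:", chosen)
    print("encoded features:", encoded)
```

Because only the expert with the largest estimate is evaluated, the cost per subsequence stays close to that of a single expert while different subsequences can still be handled by different experts, which matches the stated motivation of not forcing one unified model onto time series with different characteristics.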
Preferably, the preprocessing module includes: a decomposition unit for receiving the time series and decomposing it with a wavelet basis function to obtain wavelet coefficients; a filtering unit for threshold-filtering the wavelet coefficients; a reconstruction unit for generating a denoised time series from the threshold-filtered wavelet coefficients based on the wavelet basis function; and a dividing unit for dividing the denoised time series by a fixed window size and taking the equal-window segments as the preprocessing result.

Preferably, the policy network is a U-shaped encoder-decoder structure, including: an encoder for extracting encoded temporal features from the preprocessed time series; a decoder for decoding based on the encoded temporal features to obtain decoded temporal features of the input time series; and a fully connected layer for obtaining scoring information for the input time series based on the encoded temporal features and the decoded temporal features.

Preferably, the encoder comprises a segmentation layer, N mixture-of-experts layers, and N-1 fusion layers, each fusion layer being paired with one mixture-of-experts layer and the segmentation layer being followed by one mixture-of-experts layer; wherein: the segmentation layer divides an input time series into a plurality of subsequences and concatenates the subsequences; The fusion l