CN-121000559-B - Large-scale MIMO channel estimation method integrating dual attention mechanism and TCN-BiLSTM network
Abstract
The invention provides a large-scale MIMO channel estimation method that integrates a dual-attention mechanism with a TCN-BiLSTM network: a temporal convolutional network (TCN) is combined with a bidirectional long short-term memory network (BiLSTM), and a dual-attention mechanism is introduced to further enhance the model's feature extraction and focusing capability. Compared with other advanced deep learning models, the method greatly reduces prediction error across a variety of channel environments, and shows excellent performance and stability in non-line-of-sight scenes with severe multipath effects and interference. By providing more accurate CSI, it enables a massive MIMO base station to form more accurate beams, deliver signal energy to target users more effectively while minimizing interference to other users, achieve a higher per-user data rate, and serve more users simultaneously on the same spectrum resources, thereby markedly improving the throughput and spectral efficiency of the whole network.
Inventors
- YAO JIANGUO
- LU WEN
Assignees
- Nanjing University of Posts and Telecommunications (南京邮电大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-08-29
Claims (8)
- 1. A large-scale MIMO channel estimation method integrating a dual-attention mechanism and a TCN-BiLSTM network, characterized by comprising the following steps: Step 1, data preprocessing and feature engineering, including complex-to-real conversion, data normalization, sliding-window construction and feature extraction, to obtain feature vectors as model input. Step 2, building the model and data flow: a TCN-BiLSTM hybrid model combining a temporal convolutional network (TCN) and a bidirectional long short-term memory network (BiLSTM) is built, and a dual-attention mechanism comprising a temporal attention mechanism and a spatial attention mechanism is introduced into the model; the data processing flow is as follows. The preprocessed input sequence is first fed to the TCN module; this module is formed by stacking several dilated causal convolution layers, each layer governed by causality and a dilation factor. The TCN outputs local temporal features; an attention layer is added to weight the TCN output sequence, highlighting key time segments while suppressing irrelevant features. The feature sequence output by the TCN module is then fed to the BiLSTM module, which processes the sequence in both the forward and backward directions to capture long-term, global context dependencies. The hidden-state output of the BiLSTM is fed into a dual-attention module, which computes attention weights along the temporal and spatial dimensions in parallel and sums the features weighted by those weights to generate a more information-dense context vector. For each time step $t$, the temporal attention weight is computed as $e_t = v^{\top}\tanh(W h_t + b)$ and $\alpha_t = \exp(e_t)\big/\sum_{k=1}^{T}\exp(e_k)$, where $e_t$ is the attention score of time step $t$, $v$ is the attention-scoring weight vector, $W$ is the linear transformation matrix used to compute the temporal attention score, $h_t$ is the hidden-state input of time step $t$, $b$ is the bias term, and $T$ is the total number of time steps. The final temporal weighting is $H_{\mathrm{time}} = [\alpha_1 h_1;\,\alpha_2 h_2;\,\ldots;\,\alpha_T h_T]$, where $H_{\mathrm{time}}$ represents the whole sequence weighted by the time weights, $\alpha_t$ is the attention weight of time step $t$, $h_t$ is the hidden-state input of that time step, and $T$ is the total number of time steps. The spatial attention focuses on different components along the feature dimension. For each feature dimension $j$, its attention weight is computed as $e_j = w^{\top}\tanh(W_s H_j + b_s)$ and $\beta_j = \exp(e_j)\big/\sum_{k=1}^{m}\exp(e_k)$, where $\beta_j$ is the attention weight of feature dimension $j$, $e_j$ is the attention score of feature dimension $j$, $w$ is the learned spatial-attention scoring weight vector, $W_s$ is a trainable linear transformation matrix, $H_j$ is the $j$-th column of the input matrix $H$ (a vector of dimension $T$), $b_s$ is the bias term, and $m$ is the number of feature dimensions. The final spatial weighting is $H_{\mathrm{space}} = H\,\mathrm{diag}(\beta)$, where $H_{\mathrm{space}}$ is the whole sequence weighted by the spatial weights, $H$ is the concatenated BiLSTM output matrix, and $\beta$ is the attention-weight vector over all feature dimensions. The output after fusing temporal and spatial attention is expressed as $F = H_{\mathrm{time}} + H_{\mathrm{space}}$, where the three quantities are, in order, the final feature after fusing temporal and spatial attention, the time-weighted sequence, and the space-weighted sequence. The final output feature vector is sent to a fully connected layer for regression prediction: the context vector is fed to the fully connected layer, which performs the final regression calculation and outputs the predicted value of the future channel state. Step 3, model training and parameter configuration: network parameters are optimized by minimizing the error between predicted values and true values. Step 4, the trained and tuned TCN-BiLSTM hybrid model from step 3 is put into use: the feature vectors processed in step 1 are input, the predicted value of the future channel state is output, and channel estimation is completed.
- 2. The method for large-scale MIMO channel estimation with dual attention mechanism and TCN-BiLSTM network as recited in claim 1, wherein in step 1 the complex-to-real conversion is specifically: the original CSI data is a complex matrix with real and imaginary components; each complex value is first split into its real part and imaginary part, and these are concatenated along the feature dimension to form a purely real matrix.
- 3. The method for large-scale MIMO channel estimation with dual attention mechanism and TCN-BiLSTM network as recited in claim 1, wherein in step 1 data normalization adopts Z-score normalization to accelerate model convergence and improve training stability: for each feature dimension, the formula $x' = (x - \mu)/\sigma$ is applied, where $\mu$ and $\sigma$ are respectively the mean and standard deviation of the original feature $x$ over the training set.
- 4. The method for large-scale MIMO channel estimation with integrated dual-attention mechanism and TCN-BiLSTM network as set forth in claim 1, wherein in step 1 the sliding-window construction specifically uses the sliding-window technique to convert time-series data into supervised learning samples: a fixed window length is set, the window is slid along the time axis, the sequence within the window serves as the model input, and the data of the time step immediately following the window is used as the prediction target.
- 5. The method for large-scale MIMO channel estimation with dual-attention mechanism integrated with TCN-BiLSTM network as recited in claim 1, wherein in step 1 the feature extraction specifically extracts, from each sliding window, statistical features including mean and variance and frequency-domain features obtained by the fast Fourier transform (FFT), and concatenates these new features with the original sequence features to form the final model input vector.
- 6. The method for massive MIMO channel estimation combining dual attention mechanisms with TCN-BiLSTM according to claim 1, wherein in step 2 the BiLSTM consists of a forward long short-term memory network (LSTM) and a backward LSTM: the forward LSTM processes information from the beginning to the end of the sequence ($t = 1 \to T$), capturing historical context, while the backward LSTM processes information from the end to the start of the sequence ($t = T \to 1$), capturing future context. At time $t$, the output of the BiLSTM, $h_t$, is the concatenation of the forward hidden state $\overrightarrow{h_t}$ and the backward hidden state $\overleftarrow{h_t}$: $h_t = [\overrightarrow{h_t};\, \overleftarrow{h_t}]$.
- 7. The method for estimating a large-scale MIMO channel by combining a dual attention mechanism and a TCN-BiLSTM network according to claim 1, wherein in step 3 an Adam optimizer is adopted to adaptively adjust the learning rate of each parameter, the mean square error (MSE) is adopted as the loss function, and the dataset is divided in time order into a training set (90%) and a test set (10%). The key hyperparameters of the model are set as follows: 3 convolution layers with 64 filters per layer, convolution kernel size 5, dilation factors 1, 2 and 4, ReLU activation function, 128 hidden units in the BiLSTM module, initial learning rate 0.001, and training batch size 64.
- 8. The method for large-scale MIMO channel estimation combining dual attention mechanism and TCN-BiLSTM network according to claim 1, wherein the specific steps of outputting the predicted value of the future channel state in step 4 are: step 4-1, the preprocessed model input vector is first fed to the TCN module, which is formed by stacking several dilated causal convolution layers and extracts local, multi-scale temporal features from the channel data; step 4-2, the feature sequence output by the TCN module is then fed to the BiLSTM module, which processes the sequence in both the forward and backward directions to capture long-term, global context dependencies; step 4-3, the hidden-state output of the BiLSTM module is fed to the dual-attention module, which computes attention weights along the temporal and spatial dimensions in parallel and performs a weighted summation of the features according to these weights to generate a more information-dense context vector; step 4-4, finally, the context vector is fed to a fully connected (dense) layer, which performs the final regression calculation and outputs the predicted value of the future channel state.
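The Step-1 pipeline described in claims 2 through 5 (complex-to-real splitting, Z-score normalization, sliding windows, and statistical/FFT features) can be sketched as below. This is an illustrative numpy sketch, not the patent's implementation; the function name `preprocess` and all shapes are our assumptions.

```python
import numpy as np

def preprocess(csi, window_len):
    """Illustrative sketch of the patent's Step-1 pipeline (claims 2-5).

    csi: complex array of shape (T, n_feat) -- raw CSI samples over time.
    Returns (X, y): flattened sliding-window inputs and next-step targets.
    """
    # Claim 2: split each complex value into real/imag parts and
    # concatenate them along the feature dimension -> purely real matrix.
    real = np.concatenate([csi.real, csi.imag], axis=-1)          # (T, 2*n_feat)

    # Claim 3: Z-score normalization per feature dimension. (Here the
    # statistics come from the whole array; in training they would come
    # from the training split only.)
    mu, sigma = real.mean(axis=0), real.std(axis=0) + 1e-12
    z = (real - mu) / sigma

    # Claims 4-5: slide a fixed-length window along time; augment each
    # window with per-window statistics (mean, variance) and FFT magnitudes.
    X, y = [], []
    for start in range(len(z) - window_len):
        win = z[start:start + window_len]                         # (window_len, 2*n_feat)
        stats = np.concatenate([win.mean(axis=0), win.var(axis=0)])
        fft_mag = np.abs(np.fft.rfft(win, axis=0)).ravel()
        X.append(np.concatenate([win.ravel(), stats, fft_mag]))
        y.append(z[start + window_len])                           # next time step = target
    return np.array(X), np.array(y)
```

For a toy input of 20 time steps with 2 complex features and a window of 5, this yields 15 supervised samples whose targets are the 4 normalized real features of the next step.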
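The dual-attention computation of claim 1 (temporal softmax over time steps, spatial softmax over feature dimensions, then fusion) can be sketched in numpy as follows. This is a minimal sketch of the reconstructed equations, assuming element-wise sum as the fusion; the parameter names mirror the claim's symbols and would be learned in the real model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dual_attention(H, v, W, b, w, Ws, bs):
    """Sketch of the claim-1 dual attention.

    H: (T, m) BiLSTM output matrix (T time steps, m feature dims).
    v, W, b: temporal-attention parameters; w, Ws, bs: spatial-attention parameters.
    """
    T, m = H.shape
    # Temporal attention: e_t = v^T tanh(W h_t + b), softmax over t,
    # then scale each row (time step) of H by its weight.
    e_t = np.tanh(H @ W.T + b) @ v            # (T,)
    alpha = softmax(e_t)
    H_time = alpha[:, None] * H               # sequence weighted by time weights

    # Spatial attention: e_j = w^T tanh(Ws H_j + bs) for each column H_j,
    # softmax over j, then scale each column (feature) of H.
    e_j = np.array([w @ np.tanh(Ws @ H[:, j] + bs) for j in range(m)])
    beta = softmax(e_j)
    H_space = H * beta[None, :]               # sequence weighted by feature weights

    # Fusion (assumed here to be element-wise sum) -> context for the dense layer.
    return H_time + H_space, alpha, beta
```

Both weight vectors are proper distributions (they sum to 1), so the fused output keeps the shape of `H` and can be flattened into the fully connected regression layer.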
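The hyperparameters in claim 7 (3 layers, kernel size 5, dilations 1, 2, 4) determine how far back in time the TCN can see. For a stack of dilated causal convolutions with one convolution per level, the receptive field is 1 + (k - 1) * sum(dilations); the helper below (our name, not the patent's) computes it.

```python
def tcn_receptive_field(kernel_size, dilations):
    """Receptive field (in time steps) of stacked dilated causal conv layers,
    assuming one convolution per level: 1 + (k - 1) * sum(d_i)."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With the claim-7 configuration this gives 1 + 4 * (1 + 2 + 4) = 29 time steps, so the sliding-window length in Step 1 should be chosen with this span in mind.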
Description
Large-scale MIMO channel estimation method integrating dual attention mechanism and TCN-BiLSTM network

Technical Field

The invention belongs to the field of mobile communication systems, and particularly relates to a large-scale MIMO channel estimation method integrating a dual attention mechanism and a TCN-BiLSTM network.

Background

As mobile communication systems trend toward high speed, low latency and high reliability, massive MIMO (Multiple-Input Multiple-Output) technology is receiving a great deal of attention as one of the key enabling technologies for fifth-generation (5G) and future sixth-generation (6G) communication networks. To realize the performance potential of massive MIMO, accurate estimation of Channel State Information (CSI) is a core task in system design. Traditional channel estimation methods, such as least squares (LS) and minimum mean square error (MMSE), are theoretically mature and easy to implement, but depend heavily on ideal statistical priors and linear channel assumptions, and exhibit insufficient accuracy and poor robustness under real channels that are complex, dynamic and non-stationary. Particularly in non-stationary environments involving high-speed movement, frequency-selective fading and abrupt noise changes, these methods struggle to adapt to the rapid evolution and uncertainty of the channel. To overcome this bottleneck, researchers have introduced deep learning methods for channel modeling. Long short-term memory networks (LSTM) are one of the mainstream schemes because of their strong sequence modeling capability. However, the standard LSTM structure suffers from high parameter coupling and a single memory mechanism, among other problems, making it difficult to combine modeling depth with training efficiency.
The extended long short-term memory network (XLSTM) was therefore proposed: it introduces multi-channel gating and a deeper layer structure, improving the fitting capability for steady-state, high-dimensional channels, and is particularly suitable for channel estimation in stable periodic scenarios. Although XLSTM achieves good performance in steady-state scenarios, it still lacks sufficient sensitivity to rapidly changing, strongly nonlinear non-stationary channels, with particular limitations in modeling short-term channel disturbances and bursty dynamics. How to improve the response to local changes and sudden disturbances has therefore become the research focus of a new generation of non-stationary channel modeling methods. In this context, some researchers have attempted to introduce convolutional neural networks (CNNs) into communication modeling, but CNNs lack global temporal memory. In comparison, a temporal convolutional network (TCN) introduces a causal convolution structure while maintaining the parallelism and training stability of convolution, giving it stronger modeling capability for short-term dependencies. Combining a TCN with a bidirectional LSTM (BiLSTM) further strengthens the modeling of context in both directions, providing a technological base for handling non-stationary channel variations.

Disclosure of Invention

The invention aims to solve the problem of constructing an efficient deep neural network with both local and global modeling capability and a key-feature focusing mechanism under fast-changing, non-stationary MIMO channels, so as to achieve accurate estimation of CSI. The invention provides a TCN-BiLSTM large-scale MIMO channel estimation method based on a dual-attention mechanism.
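The causal convolution structure mentioned above can be illustrated with a minimal numpy sketch: the output at time t depends only on the current and past inputs, spaced by the dilation factor, so no future information leaks into the estimate. The function name and shapes are illustrative, not from the patent.

```python
import numpy as np

def dilated_causal_conv1d(x, kernel, dilation):
    """Minimal dilated causal 1-D convolution.

    Output y[t] = sum_i kernel[i] * x[t - i*dilation], with zeros assumed
    before the start of the sequence (left padding), so y[t] never sees x[t+1:].
    """
    k = len(kernel)
    pad = (k - 1) * dilation                      # left-pad only -> causality
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(kernel[i] * xp[t + pad - i * dilation] for i in range(k))
                     for t in range(len(x))])
```

For example, with kernel [1, 1] and dilation 2, each output is x[t] + x[t-2]; stacking layers with growing dilations (1, 2, 4, ...) is what gives the TCN its multi-scale, short-term modeling capability.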
The method aims to solve the low accuracy and poor robustness caused by over-simplified model assumptions in traditional channel estimation algorithms, as well as the limitations of single deep learning models in capturing the space-time characteristics of complex channels, thereby providing a channel estimation solution with higher estimation accuracy, stronger generalization capability and better robustness in complex, dynamic wireless environments. To achieve the above objective, the invention proposes a hybrid network architecture based on deep learning. The core of the architecture is a novel combined model that organically combines a temporal convolutional network (Temporal Convolutional Network, TCN) with a bidirectional long short-term memory network (Bidirectional Long Short-Term Memory, BiLSTM), and further enhances the model's feature extraction and focusing capabilities by introducing a dual attention mechanism (including temporal and spatial attention). A large-scale MIMO channel estimation method integrating a dual-attention mechanism and a TCN-BiLSTM network comprises the following steps: Step 1, data preprocessing and feature engineering, including complex-to-real conversion, data normalization, sliding-window construction and feature extraction, to obtain feature vectors as input of the model; Step 2, building a model and