CN-122026974-A - Channel state feedback method, electronic device, storage medium and program product
Abstract
The application provides a channel state feedback method, an electronic device, a storage medium and a program product, and relates to the technical field of communications. In the method, the current channel measurement state, the feedback action at the previous moment and the immediate reward are fused to generate a current state vector; a pre-trained reinforcement learning model then outputs an adaptive feedback action policy, according to which the channel state information (CSI) to be fed back is processed. This constructs a closed-loop feedback mechanism in which the terminal can dynamically adjust its feedback mode according to real-time channel conditions, intelligently balancing feedback overhead against system performance in a complex time-varying environment and markedly improving system robustness in complex time-varying wireless environments.
Inventors
- JIANG QUNJIE
Assignees
- 上海星思半导体股份有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260213
Claims (12)
- 1. A channel state feedback method, applied to a terminal, the method comprising: acquiring a current state vector at the current moment, wherein the current state vector is generated based on a current channel measurement state, a feedback action at the previous moment and an immediate reward; inputting the current state vector into a pre-trained reinforcement learning model to obtain a feedback action policy for the current moment, wherein the feedback action policy is used for defining a feedback mode of channel state information at the current moment; and processing the obtained current channel state information according to the feedback action policy and sending it to the base station.
- 2. The method of claim 1, wherein the feedback action policy comprises at least one of: a compression mode of the channel state information, wherein the compression modes comprise compressing and encoding the current channel state information with different types of encoders; a triggering time for feeding back the channel state information; a feedback granularity of the channel state information, wherein the feedback granularity comprises wideband CSI feedback and subband CSI feedback; and an information type included in the channel state information.
- 3. The method of claim 1, wherein the current channel measurement state comprises a channel state determined from a channel quality indication and a channel state determined from an interference signal strength, and wherein acquiring the current state vector at the current moment comprises: acquiring a current channel quality indication and a current interference signal strength; determining the corresponding current channel measurement states according to the current channel quality indication and the current interference signal strength, respectively; and jointly encoding the current channel measurement state, the feedback action at the previous moment and the immediate reward to generate the current state vector.
- 4. The method of claim 1, wherein after the obtained current channel state information is processed according to the feedback action policy and sent to the base station, the method further comprises: receiving a current immediate reward sent by the base station, wherein the current immediate reward is determined by the base station according to current communication quality and current feedback overhead after performing downlink scheduling according to the current channel state information.
- 5. The method of claim 4, wherein the current immediate reward is calculated from a reward function, the reward function being a weighted sum of the current communication quality and the current feedback overhead, the current communication quality comprising a current signal-to-noise ratio and/or a current block error rate, and the current feedback overhead comprising a compression ratio of the current channel state information.
- 6. The method of claim 5, wherein the reward function is: r_t = w1 · SNR − w2 · |BLER − BLER_target| − w3 · (L_c / L_o), wherein r_t denotes the current immediate reward, SNR denotes the current signal-to-noise ratio, BLER denotes the current block error rate, BLER_target denotes a set target block error rate, L_c denotes the length of the current channel state information after compression, L_o denotes the length of the current channel state information before compression, L_c/L_o is the compression ratio, and w1, w2 and w3 denote weight factors.
- 7. The method according to claim 1, wherein the method further comprises: acquiring training data from a buffer at a set period, wherein the training data comprise the channel measurement state corresponding to each moment, the executed action, the immediate reward and the channel measurement state at the next moment; and updating model parameters of the reinforcement learning model using the training data.
- 8. The method of claim 7, wherein the reinforcement learning model is a PPO (proximal policy optimization) model, and wherein updating the model parameters of the reinforcement learning model using the training data comprises: initializing the model parameters, wherein the model parameters comprise policy network parameters and value network parameters; calculating an advantage function for each training sample based on generalized advantage estimation (GAE), wherein the advantage function measures how good an action is relative to the average level; calculating a target value function according to a discounted return formula; calculating a policy loss and a value loss according to the advantage function and the target value function to obtain a total loss; and updating the policy network parameters and the value network parameters according to the total loss using stochastic gradient descent or an Adam optimizer.
- 9. The method according to any one of claims 1-8, wherein inputting the current state vector into the pre-trained reinforcement learning model to obtain the feedback action policy for the current moment comprises: inputting the current state vector into the pre-trained reinforcement learning model to obtain a probability distribution over feedback action policies output by the reinforcement learning model; and determining, from the probability distribution, the feedback action policy with the maximum probability.
- 10. An electronic device comprising a processor and a memory storing computer readable instructions that, when executed by the processor, perform the method of any of claims 1-9.
- 11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, performs the method according to any of claims 1-9.
- 12. A computer program product comprising computer program instructions which, when read and executed by a processor, perform the method of any of claims 1-9.
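As a concrete illustration of the reward in claims 5 and 6, the sketch below computes the weighted sum described there. It is a minimal sketch rather than the patent's implementation: the functional form follows the reconstruction given in claim 6, and the weight values and function name are illustrative assumptions.

```python
def immediate_reward(snr_db: float, bler: float, bler_target: float,
                     compressed_len: int, original_len: int,
                     w1: float = 1.0, w2: float = 10.0, w3: float = 5.0) -> float:
    """Weighted sum of communication quality and feedback overhead.

    Hypothetical sketch of the reward in claims 5-6: higher SNR is rewarded,
    while deviation from the target BLER and a large compressed payload are
    penalized. The weights w1-w3 are illustrative, not from the patent.
    """
    compression_ratio = compressed_len / original_len  # L_c / L_o in claim 6
    return w1 * snr_db - w2 * abs(bler - bler_target) - w3 * compression_ratio
```

For instance, immediate_reward(snr_db=20.0, bler=0.09, bler_target=0.1, compressed_len=64, original_len=2048) yields roughly 19.74: good link quality dominates, with small penalties for the BLER deviation and the feedback payload.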
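Claims 8 and 9 describe a standard PPO update with GAE advantages and greedy action selection at inference time. The following sketch, assuming PyTorch and illustrative hyperparameters (gamma, lambda, the clipping epsilon and loss weights are not specified by the patent), shows how those steps fit together; the network objects and a single stored trajectory (states, actions, old log-probabilities, rewards) are taken as given.

```python
import torch
import torch.nn as nn


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation (GAE) over one stored trajectory."""
    advantages, gae = [], 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        gae = delta + gamma * lam * gae
        advantages.insert(0, gae)
    return torch.tensor(advantages)


def ppo_update(policy_net, value_net, optimizer, states, actions,
               old_log_probs, rewards, clip_eps=0.2, value_coef=0.5):
    """One PPO step: clipped policy loss plus value loss, as in claim 8.

    `rewards` is a list of floats; the other arguments are tensors. The
    optimizer (e.g. Adam) is assumed to hold both networks' parameters.
    """
    values = value_net(states).squeeze(-1)
    advantages = gae_advantages(rewards, values.detach().tolist())
    returns = advantages + values.detach()  # discounted-return targets
    dist = torch.distributions.Categorical(logits=policy_net(states))
    ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = nn.functional.mse_loss(values, returns)
    loss = policy_loss + value_coef * value_loss  # total loss of claim 8
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def select_action(policy_net, state_vector):
    """Claim 9: choose the feedback action policy with maximum probability."""
    with torch.no_grad():
        logits = policy_net(state_vector.unsqueeze(0))
    return int(torch.argmax(logits, dim=-1).item())
```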
Description
Channel state feedback method, electronic device, storage medium and program product

Technical Field

The present application relates to the field of communications technologies, and in particular to a channel state feedback method, an electronic device, a storage medium, and a program product.

Background

In fifth-generation New Radio (5G NR) and future massive MIMO systems, especially in frequency division duplex (FDD) mode, accurate and efficient feedback of downlink channel state information (CSI) by the terminal is the basis on which a base station implements high-precision beamforming, multi-user scheduling, and efficient utilization of spectrum resources. As antenna arrays keep growing and communication frequency bands rise, the data dimension of the channel state information expands sharply, so that the signaling overhead generated in the feedback process becomes a key bottleneck restricting system spectral efficiency and overall performance.

Currently, mainstream solutions in the industry rely mainly on two types of techniques: feedback based on predefined codebooks, and static compression models based on deep learning. Codebook-based feedback requires the terminal to select the best-matching entry from a set of fixed beam patterns (codewords) predefined by the standard protocol and to feed back only the corresponding index to the base station. Deep-learning-based schemes typically adopt an offline-trained neural network model, such as an autoencoder, which compresses and encodes the channel matrix on the terminal side and reconstructs it on the base station side.

However, these prior solutions have several fundamental limitations. First, they are static and mismatched to the environment: whether the codebook is fixed, or the neural network model is trained offline and left unchanged after deployment, neither can effectively adapt to the inherently fast time-varying nature of wireless channels and to diverse deployment scenarios. For example, a static feedback mechanism cannot be adjusted in real time, so the feedback information becomes severely disjointed from the actual channel and performance deteriorates sharply. Second, the trade-off between feedback overhead and recovery accuracy is fixed: the compression rate or codebook size of a traditional scheme is fixed at design and deployment time, so it may waste uplink resources on unnecessarily high-precision feedback when the channel is stable, and may degrade scheduling performance through insufficient feedback precision when the channel changes abruptly or service requirements increase.

Disclosure of Invention

An embodiment of the application aims to provide a channel state feedback method, an electronic device, a storage medium and a program product, intended to address the inability of existing static feedback schemes to adapt to channel variation.
In a first aspect, an embodiment of the present application provides a channel state feedback method, applied to a terminal, where the method includes: acquiring a current state vector at the current moment, wherein the current state vector is generated based on the current channel measurement state, the feedback action at the previous moment and the immediate reward; inputting the current state vector into a pre-trained reinforcement learning model to obtain a feedback action policy for the current moment, wherein the feedback action policy defines a feedback mode of channel state information at the current moment; and processing the obtained current channel state information according to the feedback action policy and sending it to the base station.

In the implementation process, the current state vector is generated by fusing the current channel measurement state, the feedback action at the previous moment and the immediate reward; an adaptive feedback action policy is then output by the pre-trained reinforcement learning model, and the CSI to be fed back is processed accordingly. A closed-loop feedback mechanism is thereby constructed, allowing the terminal to dynamically adjust its feedback mode according to real-time channel conditions, so that feedback overhead and system performance are intelligently balanced in a complex time-varying environment and system robustness in complex time-varying wireless environments is markedly improved.

Optionally, the feedback action policy includes at least one of: a compression mode of the channel state information, wherein the compression modes include compressing and encoding the current channel state information with different types of encoders; a triggering time for feeding back the channel state information; a feedback granularity of the channel state information, wherein the feedback granularity includes wideband CSI feedback and subband CSI feedback
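To make the closed-loop mechanism of the first aspect concrete, a minimal terminal-side sketch follows. The joint encoding of claim 3 is rendered here as simple normalization, one-hot encoding and concatenation, which is one plausible realization rather than the patent's; all helper names (measure_cqi and the like) and the normalization constants are hypothetical placeholders.

```python
import numpy as np


def build_state_vector(cqi: float, interference_dbm: float,
                       prev_action: int, prev_reward: float,
                       num_actions: int) -> np.ndarray:
    """Jointly encode the channel measurement state, the previous feedback
    action and the immediate reward into one state vector (claim 3).
    The normalization constants below are illustrative assumptions."""
    channel_state = np.array([cqi / 15.0,                  # CQI index scaled to [0, 1]
                              interference_dbm / -100.0])  # rough dBm scaling
    action_one_hot = np.eye(num_actions)[prev_action]      # previous action, one-hot
    return np.concatenate([channel_state, action_one_hot, [prev_reward]])


# Hypothetical closed loop at the terminal (all helpers are placeholders):
# state = build_state_vector(measure_cqi(), measure_interference(),
#                            prev_action, prev_reward, num_actions=8)
# action = select_action(policy_net, torch.from_numpy(state).float())
# csi = compress_csi(measure_csi(), mode=action)  # apply the chosen policy
# send_to_base_station(csi)
# prev_reward = receive_immediate_reward()        # per claim 4
```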