CN-121985373-A - Self-adaptive congestion control method and device based on time sequence prediction and reinforcement learning


Abstract

The invention discloses an adaptive congestion control method and device based on time-series prediction and reinforcement learning, relating to the technical field of communications. The method comprises: collecting network state data from a transport layer protocol stack in a network environment, constructing a sliding-window state sequence containing a plurality of time steps, and obtaining a current network state snapshot; processing the sliding-window state sequence with a trained gated recurrent unit (GRU) neural network to obtain a bandwidth gradient trend factor; concatenating the bandwidth gradient trend factor with the current network state snapshot to obtain an enhanced state vector; processing the enhanced state vector with a trained deep reinforcement learning policy network to obtain a continuous gain scaling factor; obtaining a reference pacing gain; calculating a final pacing gain from the reference pacing gain and the continuous gain scaling factor; and dynamically adjusting the sending interval of data packets in the network environment according to the final pacing gain. The invention thereby avoids both the limitations of prediction used alone and the blindness of pure reinforcement learning.
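The last three steps of the abstract (scaling factor, final pacing gain, sending interval) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Snapshot` field names, the `toy_policy` stand-in (a linear map, not the trained policy network), and the multiplicative combination rule in `final_pacing_gain`, which the text does not state explicitly. The interval formula follows the usual BBR pacing relation, pacing rate = pacing gain x bandwidth estimate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """Current network state snapshot (field names are illustrative)."""
    rtt_s: float
    throughput_bps: float
    loss_rate: float
    cwnd_packets: float


def enhanced_state(trend: float, snap: Snapshot) -> List[float]:
    """Concatenate the trend factor with the snapshot features."""
    return [trend, snap.rtt_s, snap.throughput_bps, snap.loss_rate, snap.cwnd_packets]


def toy_policy(state: List[float]) -> float:
    """Hypothetical stand-in for the trained policy network.

    Maps the trend factor in state[0] linearly to a factor in [0.75, 1.25].
    """
    trend = max(-1.0, min(1.0, state[0]))
    return 1.0 + 0.25 * trend


def final_pacing_gain(reference_gain: float, scale: float) -> float:
    """Assumed combination rule: reference gain times scaling factor."""
    return reference_gain * scale


def send_interval_s(packet_bytes: int, bw_bytes_per_s: float, pacing_gain: float) -> float:
    """Interval between packets: packet size / (pacing gain * bandwidth estimate)."""
    return packet_bytes / (pacing_gain * bw_bytes_per_s)


snap = Snapshot(rtt_s=0.04, throughput_bps=12e6, loss_rate=0.0, cwnd_packets=80.0)
for trend in (-0.8, 0.0, 0.8):
    scale = toy_policy(enhanced_state(trend, snap))
    gain = final_pacing_gain(1.0, scale)
    # A rising trend yields a larger gain and therefore a shorter interval.
    print(trend, round(gain, 3), send_interval_s(1500, 1.5e6, gain))
```

Under these assumptions, a positive trend factor shortens the sending interval and a negative one lengthens it, which is the behaviour the abstract attributes to the final pacing gain.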

Inventors

XU BAOYI
LIU ZHE
LI CHENCHEN
XU HENG
WANG FEN
LIU YUAN
XIANG ZHENG

Assignees

Xidian University (西安电子科技大学)
Hangzhou Institute of Technology, Xidian University (西安电子科技大学杭州研究院)

Dates

Publication Date: 2026-05-05
Application Date: 2025-12-24

Claims (9)

1. An adaptive congestion control method based on time-series prediction and reinforcement learning, characterized by comprising the following steps: acquiring network state data of a transport layer protocol stack in a network environment, constructing a sliding-window state sequence containing a plurality of time steps, and acquiring a current network state snapshot, wherein the network state data comprises round-trip delay, transmission throughput, current packet loss rate, and current congestion window size; processing the sliding-window state sequence with a trained gated recurrent unit (GRU) neural network to obtain a bandwidth gradient trend factor characterizing the rate and direction of future bandwidth change, wherein a reset gate module of the trained GRU neural network filters short-term random noise and an update gate module of the trained GRU neural network extracts long-term dependency features; concatenating the bandwidth gradient trend factor with the current network state snapshot to obtain an enhanced state vector containing prior trend information; processing the enhanced state vector with a trained deep reinforcement learning policy network to obtain a continuous gain scaling factor; and obtaining a reference pacing gain for the current state machine phase using a BBR congestion control algorithm, calculating a final pacing gain from the reference pacing gain and the continuous gain scaling factor, and dynamically adjusting the sending interval of data packets in the network environment according to the final pacing gain, so as to achieve preemptive probing or defensive avoidance of network bandwidth fluctuation.
2. The adaptive congestion control method based on time-series prediction and reinforcement learning according to claim 1, wherein processing the sliding-window state sequence with the trained gated recurrent unit (GRU) neural network to obtain the bandwidth gradient trend factor characterizing the rate and direction of future bandwidth change comprises: normalizing the feature vector $x_t$ of the $t$-th time step in the sliding-window state sequence and inputting it to the trained GRU neural network; computing an update gate $z_t$, expressed as $z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$, wherein $\sigma$ denotes the Sigmoid function, $W_z$ denotes the update gate weight, and $h_{t-1}$ denotes the hidden state corresponding to the $(t-1)$-th time step; computing a reset gate $r_t$, expressed as $r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$, wherein $W_r$ denotes the reset gate weight; computing a candidate hidden state $\tilde{h}_t$, expressed as $\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t])$, wherein $\tanh$ denotes the Tanh activation function, $W$ denotes the weight matrix, and $\odot$ denotes element-wise multiplication; updating the hidden state $h_t$ by combining $h_{t-1}$ and $\tilde{h}_t$ element-wise under the weighting of the update gate $z_t$; and processing the hidden states corresponding to all time steps with a fully connected layer, and applying a Tanh activation function to the output of the fully connected layer to obtain the bandwidth gradient trend factor.
3. The adaptive congestion control method based on time-series prediction and reinforcement learning according to claim 2, wherein the normalized feature vector $x_t$ of the $t$-th time step comprises: a normalized smoothed round-trip delay, a normalized bottleneck bandwidth sample value, the current packet loss rate, the current congestion window size, the first-order difference of the round-trip delay, and the first-order difference of the bandwidth.
4. The adaptive congestion control method based on time-series prediction and reinforcement learning according to claim 1, wherein processing the enhanced state vector with the trained deep reinforcement learning policy network to obtain the continuous gain scaling factor comprises: according to the magnitude of the bandwidth gradient trend factor, processing the enhanced state vector with the policy network of the trained deep reinforcement learning policy network to obtain an action, the action being the output of the policy network function evaluated on the enhanced state vector, wherein the policy network function is determined by the structure of the deep reinforcement learning policy network and by its set of trainable weight parameters; and mapping the action to the continuous gain scaling factor; wherein the policy network comprises a plurality of fully connected modules, a Tanh activation function, and a scaling module, each fully connected module comprising a fully connected layer and a ReLU activation function.
5. The adaptive congestion control method based on time-series prediction and reinforcement learning according to claim 4, wherein processing the enhanced state vector with the policy network of the trained deep reinforcement learning policy network according to the magnitude of the bandwidth gradient trend factor comprises: when the bandwidth gradient trend factor is greater than a positive threshold, triggering the BBR congestion control algorithm to execute active probing logic, wherein the continuous gain scaling factor output by the trained deep reinforcement learning policy network is greater than a first threshold; when the bandwidth gradient trend factor is less than a negative threshold, triggering the BBR congestion control algorithm to execute active avoidance logic, wherein the continuous gain scaling factor output by the trained deep reinforcement learning policy network is less than the first threshold; and when the absolute value of the bandwidth gradient trend factor is less than or equal to the positive threshold, the continuous gain scaling factor output by the trained deep reinforcement learning policy network is equal to the first threshold, and the BBR congestion control algorithm is triggered to execute reference control logic.
6. The adaptive congestion control method based on time-series prediction and reinforcement learning according to claim 5, wherein dynamically adjusting the sending interval of data packets in the network environment according to the final pacing gain comprises: when the BBR congestion control algorithm executes the active probing logic, shortening the sending interval of data packets in the network environment; and when the BBR congestion control algorithm executes the active avoidance logic, lengthening the sending interval of data packets in the network environment.
7. The adaptive congestion control method based on time-series prediction and reinforcement learning according to claim 4, wherein the reward function of the trained deep reinforcement learning policy network comprises a smoothing penalty term, the reward function being composed of: a throughput reward, formed by a throughput reward weight applied to the normalized transmission throughput; a delay penalty, formed by a delay penalty weight applied to the normalized round-trip delay; a packet loss penalty, formed by a packet loss penalty weight applied to the packet loss rate penalty; and the smoothing penalty term, formed by an action smoothness constraint weight applied to the change between the continuous gain scaling factor at the current time and the continuous gain scaling factor at the previous time.
8. The adaptive congestion control method based on time-series prediction and reinforcement learning according to claim 1, wherein calculating the final pacing gain from the reference pacing gain and the continuous gain scaling factor comprises: determining the reference pacing gain at the current time as the element of the gain cycle array, preset by the BBR congestion control algorithm for the bandwidth probing phase, that is selected by the current state machine phase index; and calculating the final pacing gain from the reference pacing gain so determined and the continuous gain scaling factor.
9. An adaptive congestion control apparatus based on time-series prediction and reinforcement learning, comprising: a data acquisition module, configured to acquire network state data of a transport layer protocol stack in a network environment, construct a sliding-window state sequence containing a plurality of time steps, and acquire a current network state snapshot, wherein the network state data comprises round-trip delay, transmission throughput, current packet loss rate, and current congestion window size; a first data processing module, configured to process the sliding-window state sequence with a trained gated recurrent unit (GRU) neural network to obtain a bandwidth gradient trend factor characterizing the rate and direction of future bandwidth change; a feature concatenation module, configured to concatenate the bandwidth gradient trend factor with the current network state snapshot to obtain an enhanced state vector containing prior trend information; a second data processing module, configured to process the enhanced state vector with a trained deep reinforcement learning policy network to obtain a continuous gain scaling factor; and a result acquisition module, configured to obtain a reference pacing gain for the current state machine phase using a BBR congestion control algorithm, calculate a final pacing gain from the reference pacing gain and the continuous gain scaling factor, and dynamically adjust the sending interval of data packets in the network environment according to the final pacing gain, so as to achieve preemptive probing or defensive avoidance of network bandwidth fluctuation.
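The trend-prediction step recited in the claims (a GRU over the sliding window, a fully connected layer over all hidden states, a Tanh output, then a threshold test on the result) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented model: it uses the common bias-free GRU formulation with the update convention h_t = (1 - z_t) * h_{t-1} + z_t * candidate, random weights in place of trained ones, a hypothetical threshold of 0.2, and hypothetical regime labels `probe`, `avoid`, and `baseline`.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W: Matrix, v: List[float]) -> List[float]:
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(h_prev: List[float], x: List[float],
             Wz: Matrix, Wr: Matrix, Wh: Matrix) -> List[float]:
    """One bias-free GRU step on the concatenation [h_prev, x] (assumed form)."""
    hx = h_prev + x
    z = [sigmoid(a) for a in matvec(Wz, hx)]           # update gate
    r = [sigmoid(a) for a in matvec(Wr, hx)]           # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x
    cand = [math.tanh(a) for a in matvec(Wh, gated)]   # candidate hidden state
    # Assumed update convention; the opposite weighting is also in common use.
    return [(1.0 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, cand)]


def trend_factor(window: List[List[float]], Wz: Matrix, Wr: Matrix,
                 Wh: Matrix, w_out: List[float]) -> float:
    """Run the GRU over the window, then FC layer + tanh over all hidden states."""
    h = [0.0] * len(Wz)
    all_h: List[float] = []
    for x in window:
        h = gru_step(h, x, Wz, Wr, Wh)
        all_h.extend(h)
    return math.tanh(sum(w * v for w, v in zip(w_out, all_h)))


def regime(trend: float, threshold: float = 0.2) -> str:
    """Three-way split by trend factor; the threshold value is hypothetical."""
    if trend > threshold:
        return "probe"
    if trend < -threshold:
        return "avoid"
    return "baseline"


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
HIDDEN, FEATURES, STEPS = 4, 6, 8   # six features per time step, as in claim 3
Wz = random_matrix(HIDDEN, HIDDEN + FEATURES, rng)
Wr = random_matrix(HIDDEN, HIDDEN + FEATURES, rng)
Wh = random_matrix(HIDDEN, HIDDEN + FEATURES, rng)
w_out = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN * STEPS)]
window = [[rng.uniform(0.0, 1.0) for _ in range(FEATURES)] for _ in range(STEPS)]
trend = trend_factor(window, Wz, Wr, Wh, w_out)
print(trend, regime(trend))
```

The Tanh output bounds the trend factor to the interval from -1 to 1, so a single symmetric threshold is enough to separate the probing, avoidance, and reference regimes described in claim 5.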

Description

Self-adaptive congestion control method and device based on time sequence prediction and reinforcement learning

Technical Field

The invention belongs to the technical field of communications, and in particular relates to an adaptive congestion control method and device based on time-series prediction and reinforcement learning.

Background

In modern wireless network environments such as 5G and Wi-Fi 6, link bandwidth is highly dynamic and strongly time-varying. This instability poses serious challenges for transport-layer congestion control algorithms and degrades the transmission performance of high-throughput, low-latency services.

In the prior art, traditional congestion control algorithms (such as CUBIC and BBR) build their transmission model mainly from measurements of the network state over a past period (such as maximum bandwidth and minimum round-trip delay). Because they cannot anticipate future states, they exhibit obvious feedback lag when the bandwidth changes abruptly due to wireless channel fading. Most existing deep-learning-based optimization methods focus on using a model to predict network delay or packet loss probability in order to optimize the retransmission timeout (RTO) threshold or to trigger fast retransmission. These methods are essentially fault recovery mechanisms: they remediate passively after congestion or packet loss has already occurred, so they do not address the core problem of actively planning the sending rate according to the bandwidth trend, and cannot prevent congestion at its source.

Existing end-to-end reinforcement learning schemes generally have the agent directly output a specific value of the congestion window (CWND). This black-box control mode has an excessively large action space and abandons the protection mechanisms of the underlying protocol stack. In complex network environments it easily causes violent throughput oscillation, the model is difficult to converge, and the scheme lacks stability and robustness in dynamic environments.

Therefore, there is a need for an adaptive congestion control method and device that overcome these drawbacks of the prior art.

Disclosure of Invention

In order to solve the above problems in the prior art, the invention provides an adaptive congestion control method and device based on time-series prediction and reinforcement learning. The technical problem to be solved by the invention is addressed by the following technical solutions:

In a first aspect, the invention provides an adaptive congestion control method based on time-series prediction and reinforcement learning, comprising: acquiring network state data of a transport layer protocol stack in a network environment, constructing a sliding-window state sequence containing a plurality of time steps, and acquiring a current network state snapshot, wherein the network state data comprises round-trip delay, transmission throughput, current packet loss rate, and current congestion window size; processing the sliding-window state sequence with a trained gated recurrent unit (GRU) neural network to obtain a bandwidth gradient trend factor characterizing the rate and direction of future bandwidth change, wherein a reset gate module of the trained GRU neural network filters short-term random noise and an update gate module of the trained GRU neural network extracts long-term dependency features; concatenating the bandwidth gradient trend factor with the current network state snapshot to obtain an enhanced state vector containing prior trend information; processing the enhanced state vector with a trained deep reinforcement learning policy network to obtain a continuous gain scaling factor; and obtaining a reference pacing gain for the current state machine phase using a BBR congestion control algorithm, calculating a final pacing gain from the reference pacing gain and the continuous gain scaling factor, and dynamically adjusting the sending interval of data packets in the network environment according to the final pacing gain, so as to achieve preemptive probing or defensive avoidance of network bandwidth fluctuation.

In a second aspect, the invention further provides an adaptive congestion control device based on time-series prediction and reinforcement learning, comprising: a data acquisition module, configured to acquire network state data of a transport layer protocol stack in a network environment, construct a sliding-window state sequence containing a plurality of time steps, and acquire a current network state snapshot, wherein the network state data comprises round-trip delay, transmission throughput, current packet loss rate and cur