CN-121984861-A - RoCEv2 network waterline optimization method and system based on digital twin
Abstract
The invention discloses a digital-twin-based RoCEv2 network waterline optimization method and system, belonging to the technical field of waterline optimization. The method comprises: acquiring network flow state information of the physical domain by in-band network telemetry, based on RoCEv2 network waterline optimization requirements; judging, in the digital domain, the traffic state according to the network flow state information, and performing biased random sampling in the search space by a Monte Carlo method to generate ECN waterline candidate solutions; solving the ECN waterline candidate solutions in parallel based on a Fluid model to generate indexes of different network convergence conditions; performing normalized evaluation of these indexes based on a multi-objective optimization function and selecting an ECN waterline global optimal solution; and completing the waterline update based on the ECN waterline global optimal solution. The invention solves the problems of parameter competition and network oscillation caused by traditional single-point tuning, and significantly improves the throughput and stability of the RoCEv2 network.
Inventors
- Tan Lizhuang
- CHU FUMING
- SHI HUILING
- ZHANG WEI
- ZHANG ZHIYUAN
Assignees
- 山东省计算中心(国家超级计算济南中心)
- 齐鲁工业大学(山东省科学院)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-02-28
Claims (10)
- 1. A RoCEv2 network waterline optimization method based on digital twin, characterized by comprising the following steps: acquiring physical domain network flow state information by in-band network telemetry, based on RoCEv2 network waterline optimization requirements; judging, in the digital domain, the traffic state according to the network flow state information, and performing biased random sampling in the search space by a Monte Carlo method to generate ECN waterline candidate solutions; solving the ECN waterline candidate solutions in parallel based on a Fluid model to generate indexes of different network convergence conditions; performing normalized evaluation of the indexes of different network convergence conditions based on a multi-objective optimization function, and selecting an ECN waterline global optimal solution; and completing the waterline update based on the ECN waterline global optimal solution.
- 2. The digital-twin-based RoCEv2 network waterline optimization method according to claim 1, wherein the physical domain network flow state information includes at least one of flow identification information, timestamp information, queue information, and traffic statistics information; and throughput is approximated as the ratio of the product of the number of telemetry packets reported per unit time and the MTU to the telemetry sampling rate.
- 3. The digital-twin-based RoCEv2 network waterline optimization method according to claim 1, wherein judging the traffic state according to the network flow state information specifically comprises: accumulating the number of bytes transmitted by each flow within a sliding window of set length; and judging a flow to be a large flow when its accumulated transmitted bytes are not smaller than a preset large-flow byte threshold, judging it to be a potential large flow when its accumulated transmitted bytes are smaller than the preset large-flow byte threshold but the flow is continuously active within the window, and judging it to be a small flow otherwise.
- 4. The digital-twin-based RoCEv2 network waterline optimization method according to claim 3, wherein performing biased random sampling in the search space by the Monte Carlo method to generate ECN waterline candidate solutions specifically comprises: setting a differentiated bias factor based on the judged flow type of the dominant load in the current network; calculating a sampling target mean from the reference waterline and the bias factor; and randomly sampling based on the sampling target mean to generate multiple groups of ECN waterline candidate solutions.
- 5. The digital-twin-based RoCEv2 network waterline optimization method according to claim 1, wherein the indexes of different network convergence conditions include the ECN trigger probability, the real-time queue length of the bottleneck switch, the rate reduction factor of each flow, the target rate of each flow, and the actual rate of each flow.
- 6. The digital-twin-based RoCEv2 network waterline optimization method according to claim 1, wherein, in a single-bottleneck multi-flow competition scenario, solving the ECN waterline candidate solutions in parallel based on the Fluid model specifically comprises: grouping all active flows destined for the same receiving end into a logical flow group; determining the rate of change of the total length of the shared bottleneck queue jointly from the actual sending rates of the logical flow groups; and calculating a unified ECN marking probability from the total queue length and substituting it into the Fluid model differential equation sets corresponding to all logical flow groups for simulation solving; and wherein, in a multi-bottleneck multi-flow competition scenario, solving the ECN waterline candidate solutions in parallel based on the Fluid model specifically comprises: establishing a queue state and a marking probability for each bottleneck port; for any logical flow group, determining the equivalent marking probability perceived by that group jointly from the marking probabilities of all bottleneck ports on its path; and substituting the equivalent marking probability into the Fluid model differential equation set corresponding to that logical flow group for simulation solving.
- 7. The digital-twin-based RoCEv2 network waterline optimization method according to claim 1, wherein the multi-objective optimization function is specifically expressed by a formula whose terms respectively represent: the ECN waterline global optimal solution in the current network state; a given group of candidate ECN waterline configurations; the throughput index calculated through the Fluid model under that parameter configuration; the corresponding bottleneck port queue length index under that parameter configuration; and the corresponding accumulated packet loss index under that parameter configuration; and the candidate with the highest value of the multi-objective optimization function is taken as the ECN waterline global optimal solution in the current network state.
- 8. A RoCEv2 network waterline optimization system based on digital twin, characterized by comprising: a physical domain state sensing module configured to acquire physical domain network flow state information by in-band network telemetry, based on RoCEv2 network waterline optimization requirements; a digital domain waterline generation module configured to judge the traffic state according to the network flow state information and to perform biased random sampling in the search space by a Monte Carlo method to generate ECN waterline candidate solutions; a digital domain simulation module configured to solve the ECN waterline candidate solutions in parallel based on a Fluid model to generate indexes of different network convergence conditions; a digital domain optimal solution selection module configured to perform normalized evaluation of the indexes of different network convergence conditions based on a multi-objective optimization function and to select an ECN waterline global optimal solution; and a waterline configuration module configured to complete the waterline update based on the ECN waterline global optimal solution.
- 9. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the digital-twin-based RoCEv2 network waterline optimization method according to any one of claims 1-7.
- 10. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the digital-twin-based RoCEv2 network waterline optimization method according to any one of claims 1-7.
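The digital-domain steps of claims 2, 3, 4 and 7 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `Waterline`, `approx_throughput`, `classify_flow`, `sample_candidates` and `select_best` are hypothetical, as are the bias factor values, the Gaussian sampling distribution, the interpretation of the sampling rate as a fraction of packets, and the weighted-sum form of the objective with its weights. The claims fix none of these details.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Waterline:
    """One ECN waterline candidate (k_min, k_max, p_max as named in the Description)."""
    k_min: float  # queue length at which ECN marking starts
    k_max: float  # queue length at which marking probability reaches p_max
    p_max: float  # marking probability at k_max


def approx_throughput(telemetry_pkts_per_s, mtu_bytes, sampling_rate):
    """Claim 2: (telemetry packets per unit time * MTU) / telemetry sampling rate.

    ASSUMPTION: sampling_rate is the fraction of packets that are reported.
    """
    return telemetry_pkts_per_s * mtu_bytes / sampling_rate


def classify_flow(window_bytes, continuously_active, large_flow_threshold):
    """Claim 3: classify a flow from its byte count inside the sliding window."""
    if window_bytes >= large_flow_threshold:
        return "large"
    if continuously_active:
        return "potential_large"
    return "small"


# ASSUMPTION: illustrative bias factors only; the claims give no values or signs.
BIAS_FACTORS = {"large": 0.8, "potential_large": 1.0, "small": 1.2}


def sample_candidates(reference, dominant_type, n, rel_sigma=0.15, rng=None):
    """Claim 4: biased Monte Carlo sampling around reference * bias factor."""
    rng = rng or random.Random()
    bias = BIAS_FACTORS[dominant_type]
    candidates = []
    while len(candidates) < n:
        # ASSUMPTION: Gaussian draws; p_max is sampled around its unbiased reference.
        k_min = rng.gauss(reference.k_min * bias, rel_sigma * reference.k_min)
        k_max = rng.gauss(reference.k_max * bias, rel_sigma * reference.k_max)
        p_max = rng.gauss(reference.p_max, rel_sigma * reference.p_max)
        # Reject draws that are not a valid waterline configuration.
        if 0 < k_min < k_max and 0 < p_max <= 1:
            candidates.append(Waterline(k_min, k_max, p_max))
    return candidates


def _min_max(values):
    """Min-max normalize to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0] * len(values) if hi == lo else [(v - lo) / (hi - lo) for v in values]


def select_best(candidates, metrics, weights=(1.0, 0.5, 0.5)):
    """Claim 7 (sketch): metrics[i] = (throughput, queue_length, cumulative_loss).

    ASSUMPTION: a weighted sum that rewards throughput and penalizes queue
    length and packet loss; the weights are placeholders.
    """
    t = _min_max([m[0] for m in metrics])
    q = _min_max([m[1] for m in metrics])
    loss = _min_max([m[2] for m in metrics])
    w_t, w_q, w_l = weights
    scores = [w_t * a - w_q * b - w_l * c for a, b, c in zip(t, q, loss)]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]
```

In this sketch the `metrics` passed to `select_best` stand in for the indexes that the Fluid model simulation would produce for each candidate; the simulation itself is not modeled here.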
Description
RoCEv2 network waterline optimization method and system based on digital twin
Technical Field
The invention belongs to the technical field of waterline tuning, and particularly relates to a RoCEv2 network waterline optimization method and system based on digital twin.
Background
The statements herein merely provide background information related to the present disclosure and may not necessarily constitute prior art.
RoCEv2 (RDMA over Converged Ethernet version 2) is a technique for implementing remote direct memory access over Ethernet that relies on the ECN (Explicit Congestion Notification) mechanism to achieve lossless network transmission. ECN marks congestion according to waterline thresholds (parameters such as k_min, k_max, and p_max), which triggers the congestion control algorithm (e.g., DCQCN) at the source to reduce its sending rate. The setting of the waterline parameters directly affects the throughput, delay, and packet loss rate of the network. If the waterline is set too high, congestion control is delayed and packet loss results; if it is set too low, ECN marking is triggered too early, the source slows down too quickly, and bandwidth cannot be fully utilized.
Currently, ECN waterline configuration mainly depends on experience or on static configurations recommended by equipment vendors, or on manual tuning performed by analyzing a service model and building a test environment. The former is often conservative and cannot reach optimal performance; the latter has high tuning cost and low efficiency.
The prior art has the following defects:
(1) Some existing schemes must modify ECN parameters on the physical network group by group and wait for the network to converge dynamically before evaluating the effect, which is a sequential trial-and-error process. Each trial requires a long observation and stabilization time, is inefficient, and disturbs online traffic.
(2) Local tuning at individual switches easily prevents the whole network from converging. Some existing schemes adjust parameters based on locally observed data from a single switch. The locally optimal configurations of different switches may conflict and easily cause harmful competition among switches, so that the ECN parameters of the whole network oscillate continuously and fail to converge to a globally optimal configuration.
(3) Interpretability and controllability are poor. Some existing schemes rely on black-box models such as deep reinforcement learning for parameter search; the search process is difficult to interpret and verify, and is prone to blind search or unexpected results.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a RoCEv2 network waterline optimization method and system based on digital twin. By introducing in-band network telemetry, a digital twin architecture, model-driven parallel simulation, and a multi-objective cooperative global optimization decision mechanism, the invention effectively solves the problems of parameter competition and network oscillation caused by traditional single-point tuning and significantly improves the throughput and stability of the RoCEv2 network.
In order to achieve the above object, the present invention is realized by the following technical scheme:
In a first aspect, the present invention provides a RoCEv2 network waterline optimization method based on digital twin, including: acquiring physical domain network flow state information by in-band network telemetry, based on RoCEv2 network waterline optimization requirements; judging, in the digital domain, the traffic state according to the network flow state information, and performing biased random sampling in the search space by a Monte Carlo method to generate ECN waterline candidate solutions; solving the ECN waterline candidate solutions in parallel based on a Fluid model to generate indexes of different network convergence conditions; performing normalized evaluation of the indexes of different network convergence conditions based on a multi-objective optimization function, and selecting an ECN waterline global optimal solution; and completing the waterline update based on the ECN waterline global optimal solution.
In at least one embodiment, the physical domain network flow state information includes at least one of flow identification information, timestamp information, queue information, and traffic statistics information; throughput is approximated as the ratio of the product of the number of telemetry packets reported per unit time and the MTU to the telemetry sampling rate.
In at least one embodiment, the traffic state is judged according to the network flow state information, which is specifically expressed as: