CN-122027541-A - Computing power network routing and on-network data processing joint optimization method and system based on deep reinforcement learning
Abstract
The invention discloses a method and system for the joint optimization of computing power network routing and in-network data processing based on deep reinforcement learning. The method comprises: S1, collecting whole-network state information and constructing a network state representation; S2, inputting the network state representation into a preconfigured decision model, which extracts network features and outputs policy distribution parameters; S3, sampling a joint action vector from the policy distribution defined by those parameters; S4, calculating a transmission path for a data stream based on the link weight parameters; S5, screening the nodes on the transmission path based on the processing probability parameters to determine the computing nodes that execute in-network data processing tasks; and S6, issuing routing table entries and computing instructions, then executing data-stream forwarding and in-network data processing operations. The invention establishes a unified decision framework that can decide, according to the real-time network state, whether to spend additional computing resources to save communication resources or to transmit directly over the communication resources.
Inventors
- CUI XIAOLONG
- TANG XUEBIN
- WANG ZIXU
- HUANGFU WEI
- XU ZHENGHAO
Assignees
- University of Science and Technology Beijing (北京科技大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260212
Claims (10)
- 1. A joint optimization method for computing power network routing and in-network data processing based on deep reinforcement learning, characterized by comprising the following steps: S1, acquiring whole-network state information and constructing a network state representation, wherein the network state representation comprises computing-resource utilization data of network nodes and service-demand data of the data stream to be scheduled; S2, inputting the network state representation into a preconfigured decision model, wherein the decision model is based on a neural network architecture and is used for extracting network features and outputting policy distribution parameters; S3, sampling from the policy distribution defined by those parameters to generate a joint action vector, wherein the joint action vector comprises link weight parameters used for determining the forwarding path and processing probability parameters used for determining whether to execute an in-network data processing task; S4, calculating a transmission path for the data stream with a shortest-path algorithm based on the link weight parameters; S5, screening the processing probability parameters on the transmission path, combined with the real-time resource constraints of each node, to determine the computing nodes that execute the in-network data processing task; and S6, issuing routing table entries and computing instructions, and executing data-stream forwarding and in-network data processing operations.
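Steps S3-S5 of claim 1 can be illustrated with a minimal, self-contained sketch. The topology, the sampled link weights and processing probabilities, the residual-compute values, and the 0.5 selection cutoff are all illustrative assumptions, not values from the patent; the shortest-path step uses a plain Dijkstra implementation as one possible "shortest path calculation algorithm".

```python
import heapq

def dijkstra(n, edges, weights, src, dst):
    """Shortest path over action-derived link weights (step S4).
    edges: undirected (u, v) pairs; weights[i] belongs to edges[i]."""
    adj = {u: [] for u in range(n)}
    for (u, v), w in zip(edges, weights):
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {u: float("inf") for u in range(n)}
    prev = {}
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(pq, (d + w, v))
    path, u = [dst], dst
    while u != src:
        u = prev[u]
        path.append(u)
    return path[::-1]

# Toy joint action from step S3: one weight per link, one processing
# probability per node (all values are illustrative).
edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
link_weights = [0.9, 0.9, 0.2, 0.3]   # lower weight = preferred link
proc_probs = [0.1, 0.2, 0.8, 0.4]     # per-node processing probability

path = dijkstra(4, edges, link_weights, src=0, dst=3)  # S4
# S5 (simplified): keep on-path nodes whose probability exceeds an
# assumed 0.5 cutoff and whose remaining compute covers the task cost.
remaining = {0: 4.0, 1: 0.5, 2: 3.0, 3: 2.0}
est_cost = 1.0
compute_nodes = [n for n in path
                 if proc_probs[n] > 0.5 and remaining[n] > est_cost]
print(path, compute_nodes)  # → [0, 2, 3] [2]
```

Only node 2 is selected: it lies on the cheapest path, its processing probability clears the cutoff, and its residual compute covers the estimated cost.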
- 2. The method of claim 1, wherein the decision model is configured as a multi-layer perceptron architecture or a graph neural network architecture.
- 3. The method of claim 2, wherein the multi-layer perceptron architecture is employed when the application scenario is a network with a fixed topology, in which case the network state representation is constructed as a one-dimensional vector; and the graph neural network architecture is employed when the application scenario is a network whose topology changes dynamically, in which case the state representation is constructed as graph-structured data comprising node features, edge indexes, edge features, and global traffic-flow features.
- 4. The method of claim 3, wherein, when the graph neural network architecture is employed, the decision model uses a dual-stream feature extraction architecture comprising: a topological feature extraction stream, in which a graph convolution layer processes the node feature matrix and the edge index matrix to generate node embeddings that encode the topological structure; and a service feature extraction stream, in which a fully connected network processes the global traffic-flow feature vector to extract a global traffic-flow embedding; the decision model then combines the global traffic-flow embedding with the node embeddings through a feature fusion mechanism to generate node features fused with the global service context.
- 5. The method of claim 4, wherein the feature fusion mechanism performs a dimension-expansion operation on the global traffic-flow embedding in a broadcast-and-concatenate manner: it expands the global traffic-flow embedding to match the number of current network nodes and concatenates it with the node embeddings along the feature-channel dimension.
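The dual-stream extraction of claim 4 and the broadcast-and-concatenate fusion of claim 5 can be sketched with plain Python. Everything here is a toy stand-in: a 3-node graph, one mean-aggregation graph-convolution layer for the topological stream, one fully connected layer for the service stream, and random weights; the patent does not prescribe these exact layer shapes.

```python
import random

random.seed(0)

def linear(x, w):
    """y = x @ w for a single vector x and weight matrix w (rows x cols)."""
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]

def relu(v):
    return [max(0.0, x) for x in v]

# Toy graph: 3 nodes with 2 features each; edge index as (src, dst) pairs.
node_feats = [[0.2, 0.9], [0.5, 0.1], [0.8, 0.4]]
edge_index = [(0, 1), (1, 0), (1, 2), (2, 1)]
global_flow = [0.3, 0.7, 0.5]  # global traffic-flow feature vector

dim = 4  # embedding width (illustrative)
w_gcn = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(2)]
w_fc = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]

def gcn_layer(feats, edges):
    """Topological stream: mean aggregation over neighbors + self-loop."""
    n = len(feats)
    agg = [list(f) for f in feats]
    deg = [1] * n
    for s, d in edges:
        for k in range(len(feats[s])):
            agg[d][k] += feats[s][k]
        deg[d] += 1
    return [relu(linear([x / deg[i] for x in agg[i]], w_gcn))
            for i in range(n)]

node_emb = gcn_layer(node_feats, edge_index)  # per-node embeddings
flow_emb = relu(linear(global_flow, w_fc))    # service stream (FC network)

# Claim 5 fusion: broadcast the global embedding to every node, then
# concatenate along the feature-channel dimension.
fused = [ne + flow_emb for ne in node_emb]
print(len(fused), len(fused[0]))  # → 3 8
```

The fused representation has one row per node (so it survives topology changes in node count) and carries the global service context in every row.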
- 6. The method of claim 1, wherein the link weight parameters and the processing probability parameters in the joint action vector follow a predetermined continuous probability distribution.
- 7. The method of claim 6, wherein the policy distribution parameters output by the decision model define the shape of the continuous probability distribution, and continuous action values are obtained by sampling from the distribution defined by those parameters.
- 8. The method of claim 7, wherein the continuous probability distribution is a Beta distribution whose policy distribution parameters are α and β; the output layer of the decision model applies a smooth activation function and adds a constant bias to ensure that the output parameters α and β are both greater than 1, thereby ensuring that the sampled action value lies within the (0, 1) interval.
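A minimal sketch of the Beta policy head in claim 8, assuming Softplus as the "smooth activation function" and a constant bias of 1 (the patent names neither explicitly); the raw network outputs are made-up numbers.

```python
import math
import random

def softplus(x):
    """Smooth activation log(1 + e^x), written to avoid overflow."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))

def beta_head(raw_alpha, raw_beta):
    """Map raw outputs to Beta parameters: smooth activation + bias of 1,
    so that alpha > 1 and beta > 1 as required by claim 8."""
    return softplus(raw_alpha) + 1.0, softplus(raw_beta) + 1.0

random.seed(0)
alpha, beta = beta_head(-2.3, 0.7)        # raw outputs are illustrative
action = random.betavariate(alpha, beta)  # sampled link weight / probability
print(round(alpha, 3), round(beta, 3), round(action, 3))
```

Because the Beta distribution is supported on (0, 1), every sampled action is a valid link weight or processing probability without clipping; forcing α, β > 1 additionally keeps the distribution unimodal, avoiding mass piling up at the interval boundaries.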
- 9. The method according to claim 1, wherein in step S5, screening the processing probability parameters on the transmission path in combination with the real-time resource constraints of each node comprises an action-masking operation: obtaining the remaining computing resources of each candidate computing node on the transmission path, judging whether the product of the remaining computing resources and a preset safety threshold is greater than the estimated computing cost of processing the current data stream, and, if it is not, forcibly masking the node so that it cannot be selected to execute the in-network data processing task.
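A minimal sketch of the action-masking check in claim 9, assuming the natural reading that a node is masked out when its safety-scaled residual compute does not cover the estimated cost; all numeric values are illustrative.

```python
def mask_infeasible(candidates, remaining, safety_threshold, est_cost):
    """Claim 9 action mask: a node on the transmission path stays
    selectable only if remaining * safety_threshold > est_cost;
    otherwise it is forcibly masked (returns False)."""
    return {n: remaining[n] * safety_threshold > est_cost
            for n in candidates}

# Toy values: node 1 has almost no compute left and is masked out.
feasible = mask_infeasible([0, 1, 2], {0: 4.0, 1: 0.4, 2: 2.0},
                           safety_threshold=0.8, est_cost=1.0)
print(feasible)  # → {0: True, 1: False, 2: True}
```

The safety threshold (here 0.8) reserves headroom so that a node is never loaded to the edge of its remaining capacity by a single scheduling decision.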
- 10. A joint optimization system for computing power network routing and in-network data processing based on deep reinforcement learning, characterized in that the system is adapted to implement the method of any one of claims 1-9 and comprises: a state sensing module for collecting whole-network state information and constructing a network state representation, wherein the network state representation comprises computing-resource utilization data of network nodes and service-demand data of the data stream to be scheduled; an intelligent decision module for inputting the network state representation into a preconfigured decision model, the decision model being based on a neural network architecture and used for extracting network features and outputting policy distribution parameters; a vector generation module for sampling from the policy distribution defined by those parameters to generate a joint action vector, wherein the joint action vector comprises link weight parameters for determining the forwarding path and processing probability parameters for determining whether to execute an in-network data processing task; a path calculation module for calculating a transmission path for the data stream with a shortest-path algorithm based on the link weight parameters; a node determination module for screening the processing probability parameters on the transmission path, combined with the real-time resource constraints of each node, to determine the computing nodes that execute the in-network data processing task; and an execution control module for issuing routing table entries and computing instructions, and executing data-stream forwarding and in-network data processing operations.
Description
Computing power network routing and on-network data processing joint optimization method and system based on deep reinforcement learning

Technical Field

The invention relates to the technical field of computer networks, in particular to a method and system for the joint optimization of computing power network routing and in-network data processing based on deep reinforcement learning.

Background

With the continued convergence of edge computing and cloud networks, node devices in the network (such as edge servers and intelligent routers) no longer merely forward data but increasingly integrate powerful computing capabilities. This trend has produced a hybrid network ecosystem in which computing and communication resources coexist. In practical scenarios such as the industrial internet, satellite communication, and data center interconnection, however, existing routing and scheduling technologies face serious challenges, mainly in the following three aspects:

1. Traditional routing protocols are blind to computing resources. Existing mainstream interior gateway protocols (e.g., OSPF, IS-IS) and multipath transmission techniques (e.g., ECMP) base their routing logic mainly on link cost, hop count, or link bandwidth utilization. These protocols were never designed to consider the computational load of nodes. In a computing power network scenario, data flows are accompanied by computing tasks (e.g., video transcoding, data cleansing, federated learning aggregation).
The pain point is that when the network contains a path with high bandwidth but depleted computing resources (for example, the CPU utilization of a server on the path reaches 99%), a conventional protocol still schedules the computing service onto that path because of its bandwidth advantage, causing tasks to queue and back up at the nodes and producing extremely high processing delay or even packet loss. This mechanism of managing computing and communication separately leads to an extreme mismatch of network resources: idle computing resources cannot be reached on one side while congested nodes are overwhelmed on the other, severely limiting end-to-end quality of service (QoS).

2. Lack of an in-network compression scheduling mechanism with spatial flexibility. Bandwidth is often an expensive bottleneck resource when transmitting large amounts of data (e.g., high-frequency factory sensor logs, high-definition surveillance video) across a Wide Area Network (WAN). Existing network devices (such as gateways with SmartNIC intelligent network cards and P4 programmable switches) have a degree of hardware-accelerated compression capability, but this computing power usually sits idle during transmission. The pain point is that existing data transmission schemes rely mainly on end-to-end compression (compression at the source, decompression at the sink) and lack a flexible intermediate-hop scheduling mechanism. The network cannot dynamically decide, based on the current link congestion level, whether to spend part of an intermediate node's computing resources on data compression so that the data traverses subsequent congested links in a smaller volume (i.e., trading computation for bandwidth).
Current routing algorithms cannot simultaneously output the two strongly coupled decisions of "where to forward next" and "whether to compress at this node", so the network cannot achieve the optimal balance between computational overhead and transmission delay.

3. Architectural rigidity and the generalization dilemma of existing intelligent routing models (Model Rigidity). While Deep Reinforcement Learning (DRL) has been explored for route optimization, existing single-model architectures struggle to adapt to diverse network environments. A multi-layer perceptron (MLP) based scheme (e.g., DeepRouting) requires a fixed input-layer dimension: once the actual network topology changes even slightly (for example, a base station expands capacity and adds nodes, or links are disconnected by faults), the dimension of the input vector changes, the whole model is immediately invalidated, and data must be re-collected and the model re-trained for many hours, which cannot meet the real-time requirements of emergency communication. Traditional graph convolutional network (GCN) based schemes (such as RouteNet) do generalize across topologies, but existing GCN routing algorithms often focus excessively on the topological features of local neighbor nodes and ignore the nonlinear influence of the whole-network Traffic Matrix on local link congestion, so decision performance degrades under traffic bursts. The pain point is that the industry lacks a unified architecture which can realize extremely fast reasoning (using MLP advantages) unde