CN-122027503-A - Data center network service fault pre-evaluation method and system

Abstract

A data center network service fault pre-evaluation method and system, relating to the technical field of communication. The method comprises: constructing a target Bayesian network based on a preset data center network structure; for each service path in the data center network structure, controlling the target Bayesian network to infer the fault posterior probability of each node through a sampling-based approximate inference algorithm and a preset conditional probability table; determining the joint probability of each service path according to the fault prior probability and the fault posterior probability of each node; and taking the target service path with the highest joint probability as the predicted fault path. A target polynomial energy function, which comprises network operating state parameters, is used as the acceptance-rejection decision criterion of the sampling-based approximate inference algorithm. The application enables active prediction of the network service fault path, providing a data basis for protecting against faults in advance.

Inventors

  • LUO TING

Assignees

  • 烽火通信科技股份有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-03-13

Claims (10)

  1. A data center network service fault pre-evaluation method, characterized by comprising the following steps: constructing a target Bayesian network based on a preset data center network structure; for each service path in the data center network structure, controlling the target Bayesian network to infer the fault posterior probability of each node through a sampling-based approximate inference algorithm and a preset conditional probability table, and determining the joint probability of each service path according to the fault prior probability and the fault posterior probability of each node, wherein the conditional probability table comprises the fault prior probability of each node; and taking the target service path with the highest joint probability as a predicted fault path; wherein a target polynomial energy function is taken as the acceptance-rejection decision criterion of the sampling-based approximate inference algorithm, the target polynomial energy function comprising network operating state parameters.
  2. The data center network service fault pre-evaluation method according to claim 1, further comprising, after the step of taking the target service path with the highest joint probability as the predicted fault path: calculating the target service convergence time corresponding to the predicted fault path based on the fault posterior probability of each target node on the predicted fault path and a preset average fault time.
  3. The data center network service fault pre-evaluation method according to claim 1, wherein, while the target Bayesian network infers the node posterior probabilities, the method further comprises: for each candidate state generated by the sampling-based approximate inference algorithm, calculating a target energy function value according to the target polynomial energy function; and determining whether to accept or reject the candidate state according to the magnitude relation between the target energy function value and the acceptance probability corresponding to the candidate state.
  4. The data center network service fault pre-evaluation method according to claim 3, wherein the method further comprises: in each iteration of the sampling-based approximate inference algorithm, counting the number of times candidate states are accepted and the number of times candidate states are rejected to obtain a target ratio between the number of acceptances and the number of rejections; calculating the difference between the target ratios of consecutive iterations to obtain a target difference; and taking the target difference as the exit criterion of the sampling-based approximate inference algorithm.
  5. The data center network service fault pre-evaluation method according to claim 1, wherein the network operating state parameters include port state, spanning tree protocol state, link aggregation control protocol state, hardware spanning tree protocol state, hardware link aggregation control protocol state, medium access control state, and address resolution protocol state, and the target polynomial energy function is: E = k0×P + k1×S + k2×L + k3×HS + k4×HL + k5×M + k6×A, where E represents the target polynomial energy function, k0 represents the port state influence coefficient, P represents the port state value, k1 represents the spanning tree protocol state influence coefficient, S represents the spanning tree protocol state value, k2 represents the link aggregation control protocol state influence coefficient, L represents the link aggregation control protocol state value, k3 represents the hardware spanning tree protocol state influence coefficient, HS represents the hardware spanning tree protocol state value, k4 represents the hardware link aggregation control protocol state influence coefficient, HL represents the hardware link aggregation control protocol state value, k5 represents the medium access control state influence coefficient, M represents the medium access control state value, k6 represents the address resolution protocol state influence coefficient, and A represents the address resolution protocol state value.
  6. The data center network service fault pre-evaluation method according to claim 1, further comprising, after the step of taking the target service path with the highest joint probability as the predicted fault path: taking the node with the highest fault posterior probability on the predicted fault path as a key fault node; and determining a target fault factor based on the network operating state information of the key fault node.
  7. A data center network service fault pre-evaluation system, comprising: a network construction module, configured to construct a target Bayesian network based on a preset data center network structure; a probability prediction module, configured to, for each service path in the data center network structure, control the target Bayesian network to infer the fault posterior probability of each node through a sampling-based approximate inference algorithm and a preset conditional probability table, and to determine the joint probability of each service path according to the fault prior probability and the fault posterior probability of each node, wherein the conditional probability table comprises the fault prior probability of each node; and a fault prediction module, configured to take the target service path with the highest joint probability as a predicted fault path; wherein a target polynomial energy function is taken as the acceptance-rejection decision criterion of the sampling-based approximate inference algorithm, the target polynomial energy function comprising network operating state parameters.
  8. The data center network service fault pre-evaluation system according to claim 7, further comprising a time prediction module configured to: calculate the target service convergence time corresponding to the predicted fault path based on the fault posterior probability of each target node on the predicted fault path and a preset average fault time.
  9. The data center network service fault pre-evaluation system according to claim 7, wherein, while the target Bayesian network infers the node posterior probabilities, the probability prediction module is further configured to: for a target candidate state generated by the sampling-based approximate inference algorithm, calculate a target energy function value according to the target polynomial energy function; and determine whether to accept or reject the target candidate state according to the magnitude relation between the target energy function value and the acceptance probability corresponding to the candidate state.
  10. The data center network service fault pre-evaluation system according to claim 9, wherein the probability prediction module is further configured to: in each iteration of the sampling-based approximate inference algorithm, count the number of times candidate states are accepted and the number of times candidate states are rejected to obtain a target ratio between the number of acceptances and the number of rejections; calculate the difference between the target ratios of consecutive iterations to obtain a target difference; and take the target difference as the exit criterion of the sampling-based approximate inference algorithm.
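
The energy function of claim 5, the accept-or-reject decision of claim 3, and the exit criterion of claim 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the claims specify neither the numeric encoding of the state values, nor the direction of the comparison between the energy value and the acceptance probability, nor the exit threshold. All names here (`OperatingState`, `energy`, `accept_candidate`, `acceptance_ratio`, `should_exit`, `tolerance`) are hypothetical.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class OperatingState:
    """The seven state values named in claim 5 (numeric encoding is assumed)."""
    port: float      # P
    stp: float       # S
    lacp: float      # L
    hw_stp: float    # HS
    hw_lacp: float   # HL
    mac: float       # M
    arp: float       # A


def energy(state, coefficients):
    """E = k0*P + k1*S + k2*L + k3*HS + k4*HL + k5*M + k6*A."""
    values = astuple(state)  # field order matches k0..k6
    if len(coefficients) != len(values):
        raise ValueError("expected one coefficient per state value (k0..k6)")
    return sum(k * v for k, v in zip(coefficients, values))


def accept_candidate(energy_value, acceptance_probability):
    # ASSUMPTION: accept when E >= acceptance probability. The claims only say
    # the decision depends on the magnitude relation between the two values.
    return energy_value >= acceptance_probability


def acceptance_ratio(accepted, rejected):
    # ASSUMPTION: treat zero rejections as one to avoid division by zero.
    return accepted / max(rejected, 1)


def should_exit(previous_ratio, current_ratio, tolerance=1e-3):
    # ASSUMPTION: stop once the ratio changes by less than `tolerance`
    # between consecutive iterations; the threshold value is illustrative.
    return abs(current_ratio - previous_ratio) < tolerance
```

In a sampling loop, `energy` would be evaluated for every candidate state, `accept_candidate` would decide whether to keep it, and `should_exit` would compare the accept/reject ratios of consecutive iterations to decide when to stop sampling.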

Description

Data center network service fault pre-evaluation method and system

Technical Field

The application relates to the technical field of communication, and in particular to a data center network service fault pre-evaluation method and system.

Background

A data center network, as shown in fig. 1, generally carries services such as management, storage, big data, computing, and video, and packet loss affects each type of service to a different degree. For example: management service traffic is small, and the service is basically unaffected when packet loss occurs; back-end storage services mainly use an FC (Fibre Channel) network and fall outside the optimization scope of the data center network; front-end storage services access computing nodes through Ethernet with large traffic volumes, so packet loss has a relatively large impact on the service; traffic within clusters such as big data Hadoop clusters is highly bursty, and although the data service tolerates some packet loss, losing cluster heartbeats causes the cluster to split, which has a relatively large impact on the service; and video service traffic exhibits a certain amount of jitter, so packet loss is liable to cause service complaints.

In addition, different traffic has different convergence time requirements. For example, for north-south traffic, the required service convergence time from the uplink to the downlink server is less than 2 s; for east-west traffic, the required service convergence time for storage, computing clusters, Hadoop clusters, and inter-server access is less than 500 ms. Predicting and locating network service faults in advance is therefore important for the normal operation of the data center network.

In the related art, a data center often performs fault detection and fast service switching by introducing technologies such as link protection, ECMP (equal-cost multi-path) routing protection, and fast protocol convergence. However, these layer-2 link redundancy technologies and layer-3 routing protection technologies only switch services passively, after a link or a network node has already failed. This inevitably introduces some delay and may cause service interruption or data loss, thereby affecting user experience.

Disclosure of Invention

The application provides a data center network service fault pre-evaluation method and system that enable active prediction of the network service fault path, providing a data basis for protecting against faults in advance.

In a first aspect, an embodiment of the present application provides a data center network service fault pre-evaluation method, including the following steps: constructing a target Bayesian network based on a preset data center network structure; for each service path in the data center network structure, controlling the target Bayesian network to infer the fault posterior probability of each node through a sampling-based approximate inference algorithm and a preset conditional probability table, and determining the joint probability of each service path according to the fault prior probability and the fault posterior probability of each node, wherein the conditional probability table comprises the fault prior probability of each node; and taking the target service path with the highest joint probability as a predicted fault path; wherein a target polynomial energy function is taken as the acceptance-rejection decision criterion of the sampling-based approximate inference algorithm, the target polynomial energy function comprising network operating state parameters.
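
The path-selection step of the first aspect can be illustrated with a short Python sketch. It assumes the posterior probabilities have already been inferred, and it assumes one particular way of combining priors and posteriors into a path score (the product of prior × posterior over the nodes on the path); this section does not state the combination formula, so that choice is purely illustrative. The function names and the dictionary layout are hypothetical. The last helper follows the key-fault-node step of claim 6.

```python
from math import prod


def path_joint_probability(path, prior, posterior):
    # ASSUMPTION: the joint probability of a path is the product, over its
    # nodes, of fault prior * fault posterior. The source only says the joint
    # probability is determined from both quantities.
    return prod(prior[node] * posterior[node] for node in path)


def predict_fault_path(paths, prior, posterior):
    """Return the name of the service path with the highest joint probability."""
    return max(
        paths,
        key=lambda name: path_joint_probability(paths[name], prior, posterior),
    )


def key_fault_node(path, posterior):
    """Node with the highest fault posterior probability on a path (claim 6)."""
    return max(path, key=lambda node: posterior[node])


# Illustrative data only: node names and probabilities are made up.
example_paths = {"p1": ["a", "b"], "p2": ["a", "c"]}
example_prior = {"a": 0.1, "b": 0.2, "c": 0.05}
example_posterior = {"a": 0.5, "b": 0.4, "c": 0.9}
predicted = predict_fault_path(example_paths, example_prior, example_posterior)
```

Keeping the scoring rule in its own function means a different combination of prior and posterior can be substituted without touching the selection logic.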

With reference to the first aspect, in an implementation, after the step of taking the target service path with the highest joint probability as the predicted fault path, the method further includes: calculating the target service convergence time corresponding to the predicted fault path based on the fault posterior probability of each target node on the predicted fault path and a preset average fault time.

With reference to the first aspect, in an implementation, while the target Bayesian network infers the node posterior probabilities, the method further includes: for each candidate state generated by the sampling-based approximate inference algorithm, calculating a target energy function value according to the target polynomial energy function; and determining whether to accept or reject the candidate state according to the magnitude relation between the target energy function value and the acceptance probability corresponding to the candidate state.

With reference to the first aspect, in an embodiment, the method further includes: in each iteration process of the sampling-based approxim