CN-121984935-A - Service traffic switching method, device and storage medium for a dual-active data center

Abstract

The application provides a service traffic switching method, device and storage medium for a dual-active data center, and relates to the technical field of computers. The method comprises: in response to a service traffic switching instruction, obtaining link data associated with a switching identifier, wherein the switching identifier is used to screen the service traffic to be migrated, and the link data is call link data recording the call relations and execution states between services; identifying, based on the link data, asynchronous services whose state is abnormal during the service traffic switching, and triggering a hierarchical compensation operation for those asynchronous services; and determining, based on the link data, the predicted resource requirement for the disaster recovery center to carry the total service traffic after switching, and performing resource scheduling on the disaster recovery center according to the predicted resource requirement.

Inventors

ZHANG MINGQI
SUN FENGWEN
WANG MIN

Assignees

China United Network Communications Group Co., Ltd. (中国联合网络通信集团有限公司)

Dates

Publication Date: 20260505
Application Date: 20260112

Claims (11)

1. A service traffic switching method for a dual-active data center, characterized by comprising: in response to a service traffic switching instruction, acquiring link data associated with a switching identifier, wherein the switching identifier is used to screen the service traffic to be migrated, and the link data is call link data recording the call relations and execution states between services; identifying, based on the link data, asynchronous services whose state is abnormal during the service traffic switching, and triggering a hierarchical compensation operation for the asynchronous services whose state is abnormal; and determining, based on the link data, the predicted resource requirement for the disaster recovery center to carry the total service traffic after switching, and performing resource scheduling on the disaster recovery center according to the predicted resource requirement.
2. The method of claim 1, wherein identifying the asynchronous services whose state is abnormal during the service traffic switching comprises: screening, from the link data, span data marked with an asynchronous service identifier, wherein the span data is the basic unit of the call link data and records the start and end times and the execution state of a single call; and when the execution state of the span data meets an exception condition, determining that the state of the corresponding asynchronous service is abnormal.
3. The method of claim 2, wherein the exception condition comprises at least one of: the call state is interrupted; the call duration exceeds a preset threshold; the return code is a preset error code.
4. The method of claim 1, wherein triggering the hierarchical compensation operation for the asynchronous services whose state is abnormal comprises: parsing the service endpoint information and service parameters corresponding to the asynchronous service whose state is abnormal, and initiating an automatic retry request to the corresponding service endpoint of the disaster recovery center; and when the number of failures of the automatic retry request reaches a preset threshold, generating alarm information and sending it to an operation and maintenance system, wherein the alarm information comprises the call chain identifier and error information of the asynchronous service.
5. The method of claim 1, wherein determining, based on the link data, the predicted resource requirement for the disaster recovery center to carry the total service traffic after switching comprises: acquiring the current service traffic of the disaster recovery center and historical data of the service traffic to be migrated, and calculating the total predicted service traffic to be carried by the disaster recovery center; inputting the total predicted service traffic into a prediction model to obtain the predicted utilization rate of the key resources of the disaster recovery center; comparing the predicted utilization rate with the utilization rate thresholds preset for the respective resource types; and when the predicted utilization rate exceeds the corresponding utilization rate threshold, calculating the number of instances to be added according to the single-instance resource capacity and the resource demand gap, so as to obtain the predicted resource requirement; wherein the prediction model is a regression model trained by a machine learning algorithm on historical service traffic data and the corresponding historical resource utilization rate data.
6. The method of claim 5, wherein the prediction model is trained by: acquiring a sample data set for a historical time period, wherein the sample data set comprises a plurality of samples, each sample comprises a feature vector used as model input and a target vector used as the prediction target, the feature vector is a service call volume feature representing service pressure, and the target vector is the resource utilization rate corresponding to the service call volume feature and represents the resource consumption level; inputting the feature vector into the regression model to obtain a prediction result; calculating the prediction error between the prediction result and the target vector; and when the prediction error is lower than a preset threshold, taking the regression model as the prediction model.
7. The method of claim 1, further comprising: screening, from the link data, span data carrying the switching identifier; generating a service call topology graph according to the service call relations recorded in the span data; determining the call frequencies of a plurality of service nodes in the topology graph within a set time window; and rendering a color depth for the corresponding nodes in the service call topology graph according to the call frequencies of the plurality of service nodes, wherein the color depth is positively correlated with the call frequency.
8. A service traffic switching device for a dual-active data center, the device comprising: an acquisition module, a switching identification module and a storage module, wherein the acquisition module is configured to respond to a service traffic switching instruction and acquire link data associated with a switching identifier, the switching identifier is used to screen the service traffic to be migrated, and the link data is call link data recording the call relations and execution states between services; and a processing module configured to identify, based on the link data, asynchronous services whose state is abnormal during the service traffic switching; the processing module being further configured to trigger a hierarchical compensation operation for the asynchronous services whose state is abnormal; and the processing module being further configured to determine, based on the link data, the predicted resource requirement for the disaster recovery center to carry the total service traffic after switching, and to perform resource scheduling on the disaster recovery center according to the predicted resource requirement.
9. A computer device comprising a processor and a memory, wherein the memory has stored therein at least one computer program, the at least one computer program being loaded and executed by the processor to implement the method of any one of claims 1 to 7.
10. A computer readable storage medium having stored therein at least one computer program, the at least one computer program being loaded and executed by a processor to implement the method of any one of claims 1 to 7.
11. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the method of any one of claims 1 to 7.
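
The scale-out step recited in claim 5 can be illustrated with a short worked sketch. It is not taken from the application: the function names, the assumption that utilization is expressed as a fraction of current capacity, and the specific gap formula are all hypothetical choices made for illustration. The predicted utilization is taken as given here (in the method it would come from the trained regression model).

```python
import math

def instances_to_add(predicted_util, threshold, current_instances, capacity_per_instance):
    """Instances to add so that predicted demand stays at or below the threshold.

    Assumptions (not from the application): utilization is a fraction of the
    current total capacity, and every instance has the same capacity.
    """
    if predicted_util <= threshold:
        return 0  # no scale-out needed for this resource
    total_capacity = current_instances * capacity_per_instance
    demand = predicted_util * total_capacity      # absolute predicted demand
    gap = demand - threshold * total_capacity     # resource demand gap
    # Each new instance contributes only threshold * capacity of usable headroom.
    return math.ceil(gap / (threshold * capacity_per_instance))

def plan_scale_out(predicted, thresholds, current_instances, capacity_per_instance):
    """Take the largest requirement across resource types (e.g. CPU, memory)."""
    return max(
        instances_to_add(predicted[r], thresholds[r], current_instances,
                         capacity_per_instance[r])
        for r in predicted
    )

# Example: 4 instances of 8 CPU units each, predicted 75% utilization, 50% threshold.
# Demand is 24 units, the threshold allows 16, so the gap is 8 units and each new
# instance absorbs 4 units at the threshold.
print(instances_to_add(0.75, 0.5, 4, 8.0))  # prints 2
```

Taking the maximum across resource types is one possible policy; a deployment with heterogeneous instance sizes would need a different calculation.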

Description

Service traffic switching method, device and storage medium for a dual-active data center

Technical Field

The present application relates to the field of computer technologies, and in particular to a service traffic switching method, device, and storage medium for a dual-active data center.

Background

As enterprises place ever higher continuity requirements on key services, the dual-active data center architecture has become a core scheme for ensuring high service availability. In this architecture, applications and services are deployed symmetrically in two geographically separated data centers (a main center and a disaster recovery center), and state consistency is maintained by data synchronization, so that service traffic can be switched quickly in a fault scenario and service continuity is preserved as far as possible.

In the related art, schemes generally combine symmetric dual-active deployment, disaster recovery resource allocation based on static pre-planning, and monitoring of macroscopic indicators. Such schemes have clear limitations during an actual traffic cutover. First, resource allocation is imprecise: it depends on static reservation and manual judgment and lacks a dynamic prediction model based on traffic load, so capacity is prone to being either insufficient or excessive. Second, the compensation mechanism is weak: asynchronous services usually rely on simple retries and lack a hierarchical compensation strategy that uses call-chain state recognition, so reliable execution and state consistency of asynchronous tasks after the cutover are hard to guarantee.

Disclosure of Invention

The application provides a service traffic switching method, device and storage medium for a dual-active data center, which improve the accuracy, reliability and resource utilization efficiency of dual-active traffic switching.

To achieve the above purpose, the application adopts the following technical scheme. The application provides a service traffic switching method for a dual-active data center, comprising: in response to a service traffic switching instruction, obtaining link data associated with a switching identifier, wherein the switching identifier is used to screen the service traffic to be migrated, and the link data is call link data recording the call relations and execution states between services; identifying, based on the link data, asynchronous services whose state is abnormal during the service traffic switching, and triggering a hierarchical compensation operation for those asynchronous services; and determining, based on the link data, the predicted resource requirement for the disaster recovery center to carry the total service traffic after switching, and performing resource scheduling on the disaster recovery center according to the predicted resource requirement.

In one possible implementation, identifying the asynchronous services whose state is abnormal during the service traffic switching comprises: screening, from the link data, span data marked with an asynchronous service identifier, wherein the span data is the basic unit of the call link data and records the start and end times and the execution state of a single call; and when the execution state of the span data meets an exception condition, determining that the state of the corresponding asynchronous service is abnormal.

In one possible implementation, the exception condition comprises at least one of: the call state is interrupted; the call duration exceeds a preset threshold; the return code is a preset error code.
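
The span screening and the three exception conditions described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the `Span` record and its field names, the status string "interrupted", the 5-second duration threshold, and the error code set are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Span:
    # Hypothetical span record; the field names are illustrative only.
    trace_id: str       # call chain identifier
    service: str
    is_async: bool      # stands in for the asynchronous service identifier
    status: str         # execution state, e.g. "ok" or "interrupted"
    start_ms: int
    end_ms: int
    return_code: int

TIMEOUT_MS = 5_000          # assumed preset duration threshold
ERROR_CODES = {500, 503}    # assumed preset error codes

def is_abnormal(span: Span) -> bool:
    """True if the span meets at least one of the three exception conditions."""
    return (
        span.status == "interrupted"
        or (span.end_ms - span.start_ms) > TIMEOUT_MS
        or span.return_code in ERROR_CODES
    )

def abnormal_async_spans(spans):
    """Keep only spans marked asynchronous whose execution state is abnormal."""
    return [s for s in spans if s.is_async and is_abnormal(s)]

spans = [
    Span("t1", "order-notify", True, "interrupted", 0, 100, 0),
    Span("t2", "billing-sync", True, "ok", 0, 6_000, 0),
    Span("t3", "audit-log", True, "ok", 0, 100, 503),
    Span("t4", "report", True, "ok", 0, 100, 0),
    Span("t5", "checkout", False, "interrupted", 0, 100, 0),
]
flagged = [s.trace_id for s in abnormal_async_spans(spans)]
```

Here `flagged` holds the call chain identifiers of the asynchronous calls that would be handed to the compensation step; the synchronous span `t5` is skipped even though it was interrupted, because only spans carrying the asynchronous marker are screened.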
In one possible implementation, triggering the hierarchical compensation operation for the asynchronous services whose state is abnormal comprises first-stage compensation and second-stage compensation. The first-stage compensation comprises parsing the service endpoint information and service parameters corresponding to the asynchronous service whose state is abnormal, and initiating an automatic retry request to the corresponding service endpoint of the disaster recovery center. The second-stage compensation comprises, when the number of failures of the automatic retry request reaches a preset threshold, generating alarm information and sending it to an operation and maintenance system, wherein the alarm information comprises the call chain identifier and error information of the asynchronous service.

In one possible implementation, determining, based on the link data, the predicted resource requirement for the disaster recovery center to carry the total service traffic after switching comprises: obtaining the current service traffic of the disaster recovery center and the historical data of the service traffic to be migrated, calculating the