CN-122027456-A - Flexible disaster recovery switching method and system for hospital core business system based on microservice and traffic arrangement


Abstract

The application relates to a flexible disaster recovery switching method and system for a hospital core business system based on micro-service and traffic orchestration. The method comprises: deploying the hospital core business system as micro-services; establishing a micro-service health monitoring mechanism that collects service response delay, interface success rate, system load and service transaction success rate data; calculating a health score for each micro-service and judging its running state; constructing a service-micro-service-computing node mapping relation and forming a node resource pool; dividing services into priorities according to their importance and configuring disaster recovery strategies; when a node abnormality is detected, generating a service traffic migration scheme through a traffic orchestration algorithm and executing flexible switching; degrading low-priority services when resources are insufficient; and returning traffic progressively after the node recovers. Rapid disaster recovery switching and continuous operation of the hospital core business system are thereby achieved.

Inventors

Yu Lizhang

Assignees

绍兴第二医院医共体总院(绍兴第二医院)

Dates

Publication Date: 20260512
Application Date: 20260312

Claims (10)

1. A flexible disaster recovery switching method for a hospital core business system based on micro-service and traffic orchestration, characterized by comprising the following steps:
S1, deploying the hospital core business system as micro-services, establishing a health state monitoring mechanism for each micro-service node, and collecting service response delay, interface success rate, system load and service transaction success rate data in real time;
S2, calculating a health score for each micro-service node based on the collected data, and judging the running state of the micro-service node against a preset health threshold;
S3, constructing a service-micro-service-computing node mapping relation, and acquiring resource load information of each computing node in real time to form a node resource pool;
S4, dividing hospital services into priorities according to service importance, and configuring a corresponding disaster recovery strategy for services of each priority;
S5, when a micro-service node is detected to be abnormal, generating a service traffic migration scheme through a traffic orchestration algorithm based on the micro-service health score, the node resource load and the service priority;
S6, migrating the service traffic of the faulty node to an available node according to the service traffic migration scheme, wherein flexible service switching is realized during migration by first diverting the traffic and then disconnecting;
S7, when available node resources are insufficient, performing function degradation on low-priority services according to the service priority policy, so as to ensure continuous operation of high-priority services;
S8, after the faulty node recovers, returning service traffic to the recovered node under a progressive strategy according to the system load state, so as to realize self-healing recovery of the system.
2. The method of claim 1, wherein the micro-service health score is obtained by a weighted calculation model:
HealthScore = w1×R + w2×S + w3×L
where R is the service response delay index, S is the interface call success rate index, L is the system resource load index, and w1, w2, w3 are the corresponding weight parameters.
3. The method of claim 2, wherein the weight parameters are set as follows: the response delay weight is 30%, the interface success rate weight is 40%, and the system load weight is 30%.
4. The method of claim 1, wherein the health state monitoring is performed by combining heartbeat detection and service probe detection, wherein heartbeat detection is used to detect whether a micro-service is alive, and the service probe is used to simulate real service transactions to verify service availability.
5. The method of claim 1, wherein the service priority comprises: P0-level core guaranteed services; P1-level services that remain available; and P2-level degradable services.
6. The method of claim 1, wherein the traffic orchestration algorithm generates a traffic scheduling scheme based on a greedy optimization strategy, the optimization objectives comprising: maximizing service priority, minimizing node load, and minimizing traffic migration cost.
7. The method of claim 1, wherein the traffic orchestration supports the following scheduling modes: a fault migration scheduling mode; a load balancing scheduling mode; and a priority preemption scheduling mode.
8. The method of claim 1, wherein the flexible switching keeps the switching delay within 100 ms by pre-establishing a node connection pool and caching service context information to reduce connection establishment overhead; and the degradation processing includes closing non-core query interfaces, limiting data statistics tasks, or delaying execution of low-priority service requests.
9. A flexible disaster recovery switching system for a hospital core business system based on micro-service and traffic orchestration, characterized by comprising the following components:
a micro-service health perception module, used for collecting micro-service running state data and calculating health scores;
a service priority management module, used for dividing hospital services into priorities and generating service disaster recovery strategies;
a resource management module, used for maintaining the service-micro-service-node mapping relation and acquiring node resource load information in real time;
an intelligent traffic scheduling module, used for generating a traffic scheduling scheme based on the micro-service health state, the node resource load and the service priority;
a flexible switching control module, used for executing service traffic migration when a node fault is detected;
a service degradation module, used for performing function degradation on low-priority services when system resources are insufficient; and
a self-healing recovery module, used for executing progressive service traffic return after the faulty node recovers.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that the computer program, when run on a processor, causes the processor to perform the method of any one of claims 1 to 9.
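
The weighted health score of claims 2 and 3 can be illustrated with a short Python sketch. The weights (30%, 40%, 30%) come from claim 3; everything else is an assumption made for illustration only: the claims do not say how R, S and L are normalized, so the sketch maps each to the range 0..1 with higher meaning healthier, and the names `Metrics`, `MAX_DELAY_MS` and `HEALTH_THRESHOLD` and their values are hypothetical.

```python
from dataclasses import dataclass

# Weights from claim 3: response delay 30%, interface success rate 40%, system load 30%.
W_DELAY, W_SUCCESS, W_LOAD = 0.30, 0.40, 0.30

# Hypothetical values: the claims only mention a "preset health threshold"
# and give no normalization rule for the delay index.
HEALTH_THRESHOLD = 0.6
MAX_DELAY_MS = 1000.0


@dataclass
class Metrics:
    delay_ms: float      # observed service response delay in milliseconds
    success_rate: float  # interface call success rate, 0..1
    load: float          # system resource load, 0..1


def clamp01(x: float) -> float:
    """Restrict a value to the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))


def health_score(m: Metrics) -> float:
    """HealthScore = w1*R + w2*S + w3*L with assumed 0..1 indices (higher is healthier)."""
    r = 1.0 - clamp01(m.delay_ms / MAX_DELAY_MS)  # assumed: lower delay scores higher
    s = clamp01(m.success_rate)
    l = 1.0 - clamp01(m.load)                     # assumed: lower load scores higher
    return W_DELAY * r + W_SUCCESS * s + W_LOAD * l


def is_healthy(m: Metrics) -> bool:
    """Judge the running state against the (hypothetical) preset threshold."""
    return health_score(m) >= HEALTH_THRESHOLD
```

Under these assumptions, a node with a 200 ms delay, a 99% success rate and 50% load scores about 0.79 and is judged healthy, while a node with a 900 ms delay, a 50% success rate and 95% load scores about 0.25 and is judged abnormal.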
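
The greedy traffic orchestration of claims 5 to 7 is not specified in detail in the claims, so the following Python sketch is only one possible reading. It places displaced services in priority order (P0 first), picks for each the healthy node that would end up with the lowest load ratio, and marks any service that no longer fits for degradation. The names `Node`, `Service` and `plan_migration`, the capacity units and the `min_health` value are hypothetical, and migration cost is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    health: float    # health score, 0..1
    capacity: float  # total capacity in arbitrary request units
    used: float      # capacity already in use


@dataclass
class Service:
    name: str
    priority: int    # 0 = P0, 1 = P1, 2 = P2
    demand: float    # capacity needed to carry the service's traffic


def plan_migration(displaced, nodes, min_health=0.6):
    """Greedy sketch: return (placements, degraded).

    placements maps service name -> target node name;
    degraded lists services for which no healthy node had room.
    """
    used = {n.name: n.used for n in nodes}  # working copy, inputs stay untouched
    candidates = [n for n in nodes if n.health >= min_health]
    placements, degraded = {}, []

    # Higher priority first; within a priority, larger demands first.
    for svc in sorted(displaced, key=lambda s: (s.priority, -s.demand)):
        fitting = [n for n in candidates if n.capacity - used[n.name] >= svc.demand]
        if not fitting:
            # Assumption: anything that cannot be placed is degraded. Because of the
            # ordering above, capacity has already gone to higher priorities.
            degraded.append(svc.name)
            continue
        # Node load minimization: choose the lowest resulting load ratio.
        target = min(fitting, key=lambda n: (used[n.name] + svc.demand) / n.capacity)
        used[target.name] += svc.demand
        placements[svc.name] = target.name
    return placements, degraded
```

For example, with healthy nodes A (50 of 100 units used) and B (20 of 100 used), an unhealthy node C, and displaced services billing (P0, 40 units), orders (P1, 30 units) and stats (P2, 50 units), the sketch places billing on B and orders on A and marks stats for degradation.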

Description

Flexible disaster recovery switching method and system for hospital core business system based on microservice and traffic arrangement

Technical Field

The application relates to the field of disaster recovery and cloud computing for medical information systems, and in particular to a flexible disaster recovery switching method and system for a hospital core business system based on micro-service and traffic orchestration.

Background

With the continuous development of information technology in the medical industry, hospital information systems have become important infrastructure supporting the daily medical activities and management operations of hospitals. Large hospitals commonly deploy multiple core business platforms, such as a Hospital Information System (HIS), an electronic medical record system (EMR), a medical order management system, a charging system and a medicine management system. These systems carry key business processes such as outpatient registration, diagnosis and treatment records, medical order execution, cost settlement and medicine dispensing. Once such a system fails or operates abnormally, the normal diagnosis and treatment order of the hospital is disrupted and the treatment of patients may be adversely affected, so guaranteeing the stable operation and business continuity of the hospital core business system is of great importance. To improve system reliability and disaster recovery capability, existing hospital core business systems are generally deployed with a traditional disaster recovery architecture, for example dual-machine hot standby, dual-active data centers or multi-active clusters.
Under such an architecture, redundant servers, storage devices and network resources are deployed so that the primary and standby systems run synchronously or the standby system stays ready, and when the primary system fails the standby system takes over service operation to keep the system continuously available. However, this disaster recovery mode depends heavily on redundant hardware and requires a complete standby environment to be built, which significantly increases the hospital's informatization construction cost. In large hospitals in particular, the core business system is large and involves many servers, databases and storage devices, so building a dual-active or multi-active system often requires substantial hardware investment and raises the overall construction cost by 60 to 80 percent. Meanwhile, under normal operation the redundant nodes are usually lightly loaded or idle, and part of the standby resources sit unused for long periods, so computing resources are wasted and are difficult to use efficiently.

On the other hand, existing disaster recovery switching mechanisms usually take a whole machine or a whole cluster as the basic switching unit. When a single node or service module becomes abnormal, a whole-machine or whole-cluster switch is often required, that is, all service traffic is switched to the standby system together. This switching granularity is coarse: a local node fault still triggers a switch of the entire system, which not only lengthens recovery time but may also cause brief service interruptions during the switch.
For a hospital core business system, services such as outpatient charging, inpatient medical order execution and medicine dispensing have strong real-time requirements, and once the system is interrupted or responds slowly, doctors' diagnosis and treatment operations and patients' care processes are affected. In actual operation, a traditional disaster recovery system usually takes 3 to 10 minutes from fault detection to completion of the switch, and during this period transaction failures, service queuing or slow system response may occur, affecting the continuity of medical services.

With the development of cloud computing and micro-service architecture, some hospitals have begun to gradually transform their core business systems into a micro-service architecture, splitting the original monolithic system into multiple independent service modules to improve scalability and deployment flexibility. However, the disaster recovery mechanisms of existing micro-service architectures still have shortcomings. Most systems rely on basic service health detection or load balancing strategies to realize service switching, and when a certain service node fails, traffic is