CN-121984866-A - Rollback method, rollback device, rollback equipment and rollback medium
Abstract
The application relates to the technical field of computers, and in particular to a rollback method, apparatus, device and medium. The rollback method comprises: obtaining metadata of a new deployment event; obtaining a communication link topology graph among services; determining abnormal characteristics and abnormal causes of an abnormal link according to information of services related to the new deployment event in the metadata, the communication link topology graph and real-time link performance data; determining a rollback strategy according to the abnormal characteristics and the abnormal causes; generating, according to the rollback strategy and by using the link topology graph, a rollback operation sequence corresponding to a target influence link; and executing the corresponding rollback operation according to the component corresponding to each rollback operation in the rollback operation sequence. Application rollback can thus be carried out quickly and efficiently to minimize downtime and the scope of impact.
Inventors
- MA BO
- ZHANG LU
- TAO MING
Assignees
- 上海任意门科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260317
Claims (10)
- 1. A rollback method, comprising: acquiring metadata of a new deployment event; acquiring a communication link topology graph among services; determining abnormal characteristics and abnormal causes of an abnormal link according to information of services related to the new deployment event in the metadata, the communication link topology graph, and real-time link performance data; determining a rollback strategy according to the abnormal characteristics and the abnormal causes, and generating a rollback operation sequence according to the rollback strategy by using the link topology graph; and executing the corresponding rollback operation according to the component corresponding to each rollback operation in the rollback operation sequence.
- 2. The method of claim 1, wherein determining the abnormal characteristics and abnormal causes of the abnormal link according to the information of services related to the new deployment event in the metadata, the communication link topology graph, and the real-time link performance data comprises: determining at least one influence link according to the information of services related to the new deployment event in the metadata and the communication link topology graph; acquiring the real-time link performance data, wherein the real-time link performance data comprises key performance indicators of the at least one influence link before and after deployment; evaluating a target influence link according to the real-time link performance data to obtain link abnormality information of the target influence link, wherein the link abnormality information comprises abnormal characteristics, whether the target influence link is an abnormal link, and, when the target influence link is an abnormal link, the abnormal cause; and determining the abnormal characteristics and abnormal causes of the abnormal link according to the link abnormality information.
- 3. The method of claim 2, wherein evaluating the target influence link according to the real-time link performance data to obtain the link abnormality information of the target influence link comprises: determining respective index evaluation values of a plurality of key performance indicators of the target influence link according to the performance data of the target influence link in the real-time link performance data; determining whether a target key performance indicator is an abnormal characteristic according to the index evaluation value of the target key performance indicator of the target influence link, wherein the target key performance indicator is any one of the plurality of key performance indicators; determining a link evaluation value of the target influence link according to the respective index evaluation values of the plurality of key performance indicators of the target influence link; determining whether the target influence link is an abnormal link according to the link evaluation value of the target influence link and a corresponding preset link threshold; and if the target influence link is an abnormal link, determining the abnormality information of the target influence link.
- 4. The method of claim 1, wherein acquiring the communication link topology graph among services comprises: continuously acquiring service instance information of all registered services; analyzing network flow logs in the distributed system in real time to obtain an actual communication pattern among the services, wherein the actual communication pattern represents call information and communication information among the services; acquiring complete call chain data between the services; and constructing the communication link topology graph among the services according to the service instance information of the services, the actual communication pattern among the services, and the complete call chain data.
- 5. The method of claim 1, wherein generating the rollback operation sequence according to the rollback strategy by using the link topology graph comprises: generating an initial rollback operation sequence corresponding to the target influence link according to the rollback strategy by using the link topology graph; detecting whether an operation conflict exists in the initial rollback operation sequence; and if an operation conflict exists, optimizing the initial rollback operation sequence to obtain the rollback operation sequence.
- 6. The method of claim 1, wherein executing the corresponding rollback operation according to the component corresponding to each rollback operation in the rollback operation sequence comprises: executing, in a distributed-transaction-based and idempotent manner, the corresponding rollback operation according to the component corresponding to each rollback operation in the rollback operation sequence.
- 7. The method according to any one of claims 1 to 6, further comprising: monitoring the execution state and progress of the rollback operations in real time; and/or, if a fault occurs, performing fault response based on a preset strategy; and/or, after each rollback operation in the rollback operation sequence is completed, monitoring key performance indicators of the rolled-back link, comparing the key performance indicators with baseline performance indicators to determine a first rollback result, acquiring the health state of the rolled-back link, determining a second rollback result based on the health state, and determining whether the rollback is successful according to the first rollback result and the second rollback result; and/or, after the rollback is successful, continuously monitoring the key performance indicators and the health state of the rolled-back link so as to monitor long-term trends.
- 8. A rollback apparatus, comprising: a deployment event sensing unit, configured to acquire metadata of a new deployment event; a link topology modeling unit, configured to acquire a communication link topology graph among services; an anomaly impact analysis unit, configured to determine abnormal characteristics and abnormal causes of an abnormal link according to information of services related to the new deployment event in the metadata, the communication link topology graph, and real-time link performance data; a rollback strategy generation unit, configured to determine a rollback strategy according to the abnormal characteristics and the abnormal causes, and to generate a rollback operation sequence according to the rollback strategy by using the link topology graph; and a rollback execution coordination unit, configured to execute the corresponding rollback operation according to the component corresponding to each rollback operation in the rollback operation sequence.
- 9. An electronic device comprising a memory in which a computer program is stored and a processor which, when running the computer program, performs the method of any one of claims 1 to 7.
- 10. A computer readable storage medium having stored therein at least one program code loaded and executed by a processor to implement the method of any one of claims 1 to 7.
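The sequence generation, conflict detection and idempotent execution recited in claims 5 and 6 can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the application: the `RollbackOp` fields, the rule that a conflict is two operations on the same component with different target versions, the keep-the-first-operation resolution policy, and the caller-before-callee ordering. Distributed transactions are not modelled; only idempotency is shown.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class RollbackOp:
    component: str       # service or component to roll back
    target_version: str  # version to restore (hypothetical field)


def find_conflicts(ops):
    """Return components that are asked to roll back to more than one version."""
    versions = {}
    for op in ops:
        versions.setdefault(op.component, set()).add(op.target_version)
    return {c: v for c, v in versions.items() if len(v) > 1}


def resolve_conflicts(ops):
    """Keep the first operation per component (a deliberately simple policy)."""
    seen, kept = set(), []
    for op in ops:
        if op.component not in seen:
            seen.add(op.component)
            kept.append(op)
    return kept


def order_ops(ops, calls):
    """Order conflict-free ops so callers roll back before the services they call.

    `calls` maps a service to the set of services it calls. The caller-first
    ordering is an assumption of this sketch, not a rule from the application.
    """
    by_component = {op.component: op for op in ops}
    # TopologicalSorter expects node -> predecessors, so invert caller -> callee.
    preds = {name: set() for name in by_component}
    for caller, callees in calls.items():
        for callee in callees:
            if caller in by_component and callee in by_component:
                preds[callee].add(caller)
    return [by_component[n] for n in TopologicalSorter(preds).static_order()]


def execute(ops, deployed):
    """Apply ops idempotently: an op whose target is already deployed is skipped."""
    applied = []
    for op in ops:
        if deployed.get(op.component) != op.target_version:
            deployed[op.component] = op.target_version
            applied.append(op.component)
    return applied


calls = {"gateway": {"orders"}, "orders": {"payments"}}
ops = [RollbackOp("payments", "v1"), RollbackOp("orders", "v3"), RollbackOp("orders", "v2")]
conflicts = find_conflicts(ops)
sequence = order_ops(resolve_conflicts(ops), calls)
deployed = {"orders": "v4", "payments": "v2"}
first_run = execute(sequence, deployed)   # performs both rollbacks
second_run = execute(sequence, deployed)  # repeats safely and changes nothing
```

Running `execute` twice with the same sequence leaves the deployed versions unchanged on the second pass, which is the property that makes a retry after a partial failure safe.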
Description
Rollback method, rollback device, rollback equipment and rollback medium
Technical Field
The present application relates to the field of computer technologies, and in particular to a rollback method, apparatus, device, and medium.
Background
With the spread of cloud computing, containerization, and microservice architecture, modern enterprise-level applications are commonly built as distributed microservice systems. This architecture improves the scalability, resilience and deployment efficiency of the system by splitting a monolithic application into multiple independent microservices, each dedicated to a specific business function. However, it also presents new challenges: deployment, maintenance, and failure recovery of the application become more complex. In highly interconnected and rapidly changing production environments, stable operation of applications and rapid recovery from failures are critical to maintaining business continuity.
Application rollback is a necessary mechanism for coping with deployment failures or software bugs; it aims to restore the system to a previous version known to be stable and working. In a link-level distributed application scenario, an application is typically composed of multiple interdependent services or modules that form complex call links, so a local failure may spread rapidly along a link and severely affect the overall business process. Mean time to recovery (MTTR) is an important indicator of system reliability. In modern software development with fast iterations, how to implement fast and efficient application rollback so as to minimize downtime and the scope of impact has become a core technical challenge in distributed system management.
Disclosure of Invention
The object of the present application is to provide a rollback method, apparatus, device and medium that enable fast and efficient application rollback so as to minimize downtime and the scope of impact.
According to a first aspect, a rollback method is provided, comprising: obtaining metadata of a new deployment event; obtaining a communication link topology graph among services; determining abnormal characteristics and abnormal causes of an abnormal link according to information of services related to the new deployment event in the metadata, the communication link topology graph and real-time link performance data; determining a rollback strategy according to the abnormal characteristics and the abnormal causes; generating a rollback operation sequence according to the rollback strategy by using the link topology graph; and executing the corresponding rollback operation according to the component corresponding to each rollback operation in the rollback operation sequence.
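As a rough illustration of how the communication link topology graph used in the first aspect might be assembled from the three sources named in claim 4 (registered service instances, observed network flows, and call chain data), the following Python sketch merges them into a caller-to-callee adjacency map. The input shapes, the function name `build_topology` and the sample addresses are assumptions made for the example; real registries, flow logs and tracing systems expose richer data.

```python
def build_topology(instances, flow_edges, traces):
    """Merge three views of the system into a caller -> callees adjacency map.

    instances:  {service: [instance addresses]}, e.g. from a service registry
    flow_edges: iterable of (src_addr, dst_addr) pairs parsed from flow logs
    traces:     iterable of call chains, each a list of services in call order
    (All three input shapes are hypothetical simplifications.)
    """
    addr_to_service = {
        addr: service for service, addrs in instances.items() for addr in addrs
    }
    # Start with every registered service so isolated services still appear.
    graph = {service: set() for service in instances}

    # Edges actually observed on the network; unknown addresses are ignored.
    for src, dst in flow_edges:
        caller = addr_to_service.get(src)
        callee = addr_to_service.get(dst)
        if caller and callee and caller != callee:
            graph[caller].add(callee)

    # Edges reported by distributed tracing: consecutive hops in a call chain.
    for chain in traces:
        for caller, callee in zip(chain, chain[1:]):
            graph.setdefault(caller, set()).add(callee)
            graph.setdefault(callee, set())
    return graph


instances = {
    "gateway": ["10.0.0.1"],
    "orders": ["10.0.0.2", "10.0.0.3"],
    "payments": ["10.0.0.4"],
}
flow_edges = [("10.0.0.1", "10.0.0.3"), ("10.0.0.9", "10.0.0.2")]
traces = [["gateway", "orders", "payments"]]
topology = build_topology(instances, flow_edges, traces)
```

In this example the flow from the unregistered address `10.0.0.9` is dropped, and the `orders` to `payments` edge comes only from the trace, which shows why combining sources gives a more complete graph than any one of them alone.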
The method can be further configured such that determining the abnormal characteristics and abnormal causes of the abnormal link according to the information of services related to the new deployment event in the metadata, the communication link topology graph and the real-time link performance data comprises: determining at least one influence link according to the information of services related to the new deployment event in the metadata and the communication link topology graph; acquiring real-time link performance data, wherein the real-time link performance data comprises key performance indicators of the at least one influence link before and after deployment; evaluating a target influence link according to the real-time link performance data to obtain link abnormality information of the target influence link, wherein the link abnormality information comprises abnormal characteristics, whether the target influence link is an abnormal link, and, when the target influence link is an abnormal link, the abnormal cause; and determining the abnormal characteristics and abnormal causes of the abnormal link according to the link abnormality information.
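The selection of influence links and the before/after comparison of key performance indicators can be sketched as follows. This is only one possible reading: the application does not give a scoring formula here, so the relative-increase score, the unweighted mean used as the link evaluation value, the threshold values and all names (`influence_links`, `evaluate_link`, the KPI keys) are assumptions. Determining the abnormal cause is not attempted.

```python
def influence_links(links, changed_services):
    """Return the call links that contain at least one newly deployed service."""
    changed = set(changed_services)
    return [link for link in links if changed & set(link)]


def evaluate_link(before, after, kpi_thresholds, link_threshold):
    """Compare one link's KPIs before and after deployment.

    Each index evaluation value is the relative increase of a KPI, assuming
    that a higher value is worse (latency, error rate). The link evaluation
    value is the plain mean of those values; both choices are assumptions.
    """
    scores = {}
    for kpi, base in before.items():
        scores[kpi] = (after[kpi] - base) / base if base else 0.0

    abnormal_features = sorted(
        kpi for kpi, score in scores.items() if score > kpi_thresholds[kpi]
    )
    link_score = sum(scores.values()) / len(scores)
    return {
        "abnormal_features": abnormal_features,
        "link_score": link_score,
        "is_abnormal": link_score > link_threshold,
    }


links = [["gateway", "orders", "payments"], ["gateway", "search"]]
affected = influence_links(links, ["orders"])

before = {"p99_latency_ms": 200.0, "error_rate": 0.01}
after = {"p99_latency_ms": 300.0, "error_rate": 0.04}
result = evaluate_link(
    before,
    after,
    kpi_thresholds={"p99_latency_ms": 0.2, "error_rate": 1.0},
    link_threshold=0.5,
)
```

Here only the link through `orders` is evaluated, and both indicators exceed their per-indicator thresholds, so the link is flagged as abnormal with two abnormal characteristics.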
In a preferred example, the method can be further configured such that evaluating the target influencing link according to the real-time link performance data to obtain the link abnormality information of the target influencing link comprises: determining respective index evaluation values of a plurality of key performance indicators of the target influencing link according to the performance data of the target influencing link in the real-time link performance data; determining whether a target key performance indicator is an abnormal characteristic according to the index evaluation value of the target key performance indicator of the target influencing link; determining a link evaluation value of the target influencing link according to the respective index evaluation values of the plurality of key performance indicators of the target influencing link; determining whether the target influencing link is an abnormal link according to the link evaluation value of the target influ