CN-114416284-B - Distributed operating system control method, device, equipment, medium and program product

Abstract

The disclosure provides a distributed operating system control method, apparatus, device, medium, and program product, relating to the field of computer application technologies and in particular to the field of distributed operation technologies. The method includes: for a first container carrying a first process, in response to detecting that the first process has been terminated because the first container failed, determining the current fault type of the first container; and, if the current fault type is consistent with a target fault type, rebuilding the first container and restarting the first process based on the rebuilt first container. Containers are therefore rebuilt only for fault types from which a rebuild can succeed, and are not rebuilt for fault types from which a rebuild cannot succeed, which saves system operating cost while still meeting operational requirements.

Inventors

  • WANG SHUAIJIAN
  • LI SHIYONG
  • ZHANG HENGHUA
  • LI PANPAN
  • HU ZAIBIN
  • LUO BAOTONG

Assignees

  • 北京百度网讯科技有限公司

Dates

Publication Date
20260508
Application Date
20211224

Claims (9)

  1. A distributed operating system control method, the method comprising:
     for a first container carrying a first process, in response to detecting that the first process has been terminated because the first container failed, determining the current fault type of the first container's failure;
     if the current fault type is consistent with a target fault type, rebuilding the first container and restarting the first process based on the rebuilt first container, and if the current fault type is inconsistent with the target fault type, not rebuilding the first container, wherein the target fault type is a fault type suitable for rebuilding each container in the distributed operating system to which the first container belongs;
     wherein, before rebuilding the first container, the method further comprises: obtaining container rebuild information, the container rebuild information indicating the containers to be rebuilt when a container fails; and determining, based on the container rebuild information, that the containers to be rebuilt include the first container;
     if it is determined, based on the container rebuild information, that the containers to be rebuilt include a second container, rebuilding the second container and restarting a second process carried by the second container based on the rebuilt second container, wherein the second container is at least one of: a container in the replica set of the first container, a container having an association relation with the replica set of the first container, and a container in the job of the first container;
     wherein the target fault type is predetermined as follows: evaluating the rebuild result obtained when a container fails with each of the different fault types; determining, from the rebuild results, the fault types suitable for rebuilding each container in the distributed operating system; and setting those fault types as the target fault type.
  2. The distributed operating system control method of claim 1, wherein the target fault type is characterized by a first identifier, and the current fault type is determined to be consistent with the target fault type by: acquiring a second identifier characterizing the current fault type; and determining that the first identifier matches the second identifier.
  3. The method of claim 2, wherein the first identifier comprises a first exit code generated by the first container when the first container fails with the target fault type; acquiring the second identifier characterizing the current fault type comprises: acquiring a second exit code generated by the first container based on the current fault type; and determining that the first identifier matches the second identifier comprises: determining that the first exit code matches the second exit code.
  4. A distributed operating system control apparatus, the apparatus comprising:
     a detection module, configured to detect, for a first container carrying a first process, that the first process has been terminated because the first container failed;
     a determining module, configured to determine the current fault type of the first container's failure in response to detecting that the first process has been terminated because the first container failed; and
     a processing module, configured to rebuild the first container and restart the first process based on the rebuilt first container if the current fault type is consistent with a target fault type, and not to rebuild the first container if the current fault type is inconsistent with the target fault type, wherein the target fault type is a fault type suitable for rebuilding each container in the distributed operating system to which the first container belongs;
     wherein the determining module is further configured to: before the first container is rebuilt, obtain container rebuild information, the container rebuild information indicating the containers to be rebuilt when a container fails, and determine, based on the container rebuild information, that the containers to be rebuilt include the first container;
     if it is determined, based on the container rebuild information, that the containers to be rebuilt include a second container, the second container is rebuilt and a second process carried by the second container is restarted based on the rebuilt second container, wherein the second container is at least one of: a container in the replica set of the first container, a container having an association relation with the replica set of the first container, and a container in the job of the first container;
     wherein the target fault type is predetermined as follows: evaluating the rebuild result obtained when a container fails with each of the different fault types; determining, from the rebuild results, the fault types suitable for rebuilding each container in the distributed operating system; and setting those fault types as the target fault type.
  5. The apparatus of claim 4, wherein the target fault type is characterized by a first identifier, and the determining module determines that the current fault type is consistent with the target fault type by: acquiring a second identifier characterizing the current fault type; and determining that the first identifier matches the second identifier.
  6. The apparatus of claim 5, wherein the first identifier comprises a first exit code generated by the first container when the first container fails with the target fault type; the determining module acquires the second identifier characterizing the current fault type by: acquiring a second exit code generated by the first container based on the current fault type; and the determining module determines that the first identifier matches the second identifier by: determining that the first exit code matches the second exit code.
  7. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-3.
  8. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-3.
  9. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-3.
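
The control flow of claims 1-3 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Container` class, the `handle_failure` function, the shape of `rebuild_info` (a mapping from a failed container's name to the names of the containers to rebuild), and the exit codes in `TARGET_EXIT_CODES` are all hypothetical, chosen only for illustration. The claims do not name any concrete exit codes.

```python
from dataclasses import dataclass

# Hypothetical target fault types, identified by exit code (the "first
# identifier" of claims 2-3). The patent does not list concrete values.
TARGET_EXIT_CODES = frozenset({137, 143})


@dataclass
class Container:
    """Illustrative stand-in for a container carrying one process."""
    name: str
    process: str
    running: bool = True
    rebuild_count: int = 0

    def rebuild(self) -> None:
        # Stand-in for destroying and recreating the container.
        self.rebuild_count += 1
        self.running = False

    def restart_process(self) -> None:
        # Stand-in for relaunching the carried process in the rebuilt container.
        self.running = True


def handle_failure(failed_name, exit_code, containers, rebuild_info,
                   target_exit_codes=TARGET_EXIT_CODES):
    """Rebuild containers after a failure; return the names that were rebuilt.

    exit_code plays the role of the "second identifier": the rebuild happens
    only if it matches one of the target exit codes.
    """
    containers[failed_name].running = False
    if exit_code not in target_exit_codes:
        # Fault type is not a target fault type: do not rebuild anything.
        return []
    rebuilt = []
    # The rebuild information decides which containers are rebuilt. It may
    # list the failed (first) container and related (second) containers,
    # for example members of the same replica set or job.
    for name in rebuild_info.get(failed_name, []):
        container = containers[name]
        container.rebuild()
        container.restart_process()
        rebuilt.append(name)
    return rebuilt


containers = {n: Container(n, f"proc-{n}") for n in ("worker-0", "worker-1", "ps-0")}
rebuild_info = {"worker-0": ["worker-0", "worker-1"]}
print(handle_failure("worker-0", 137, containers, rebuild_info))
print(handle_failure("worker-0", 1, containers, rebuild_info))
```

In this sketch a failure with a non-target exit code leaves the container stopped and untouched, which corresponds to the "not rebuilding the first container" branch of claim 1.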

Description

Distributed operating system control method, device, equipment, medium and program product

Technical Field

The present disclosure relates to the field of computer application technologies, and in particular to the field of distributed operation technologies.

Background

A distributed operating system executes distributed jobs, with the processes of each distributed job carried by one or more containers. The result of a distributed job is obtained from the execution results of its processes.

Disclosure of Invention

The present disclosure provides a distributed operating system control method, apparatus, device, medium, and program product.

According to an aspect of the present disclosure, there is provided a distributed operating system control method, the method including: for a first container carrying a first process, in response to detecting that the first process has been terminated because the first container failed, determining the current fault type of the first container's failure; and, if the current fault type is consistent with a target fault type, rebuilding the first container and restarting the first process based on the rebuilt first container, wherein the target fault type is a fault type suitable for rebuilding each container in the distributed operating system to which the first container belongs.
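
Claim 1 states that the target fault type is predetermined by evaluating rebuild results under different fault types. One way this could be done is sketched below in Python. The function name `derive_target_exit_codes`, the log format (pairs of exit code and rebuild outcome), and the `min_success_rate` threshold are assumptions made for illustration; the source only says that suitable fault types are determined from the rebuild results.

```python
from collections import defaultdict


def derive_target_exit_codes(rebuild_log, min_success_rate=1.0):
    """Pick the exit codes for which rebuilding has been observed to work.

    rebuild_log: iterable of (exit_code, rebuild_succeeded) pairs, one per
    observed rebuild attempt (a hypothetical record format).
    min_success_rate: hypothetical threshold; 1.0 keeps only exit codes
    whose every recorded rebuild succeeded.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for exit_code, succeeded in rebuild_log:
        totals[exit_code] += 1
        if succeeded:
            successes[exit_code] += 1
    return {
        code
        for code, attempts in totals.items()
        if successes[code] / attempts >= min_success_rate
    }


log = [(137, True), (137, True), (1, False), (1, True), (139, False)]
print(derive_target_exit_codes(log))
```

The resulting set could then serve as the target fault types against which a failed container's exit code is compared. Lowering `min_success_rate` trades fewer missed recoveries against more wasted rebuilds.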
According to another aspect of the present disclosure, there is provided a distributed operating system control apparatus, the apparatus including: a detection module, configured to detect, for a first container carrying a first process, that the first process has been terminated because the first container failed; a determining module, configured to determine the current fault type of the first container's failure in response to detecting that the first process has been terminated because the first container failed; and a processing module, configured to rebuild the first container if the current fault type is consistent with a target fault type and to restart the first process based on the rebuilt first container, wherein the target fault type is a fault type suitable for rebuilding each container in the distributed operating system to which the first container belongs.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the distributed operating system control method described above.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the distributed operating system control method described above.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, performs the distributed operating system control method described above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are provided for a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:

FIG. 1 is a flow diagram of a distributed operating system control method according to the present disclosure;
FIG. 2 is a flow diagram of determining that a current fault type is consistent with a target fault type according to the present disclosure;
FIG. 3 is another flow diagram of determining that a current fault type is consistent with a target fault type according to the present disclosure;
FIG. 4 is a flow diagram of another distributed operating system control method according to the present disclosure;
FIG. 5 is a flow diagram of yet another distributed operating system control method according to the present disclosure;
FIG. 6 is a block diagram of a distributed operating system control device according to the present disclosure;
FIG. 7 is a block diagram of an electronic device used to implement a method of distributed operating system control in accordance with an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the presen