CN-115756783-B - Space task dependency scheduling method and system across subsystems

Abstract

The invention provides a cross-subsystem space task dependency scheduling method and system. The method comprises: configuring event triggers on workflows and attributing a plurality of event triggers to a scheduling topology template; after the topology schedule is triggered, acquiring all triggers belonging to it, parsing them into a cross-space flow dependency topology, and finding the starting workflows to schedule; using a message bus mechanism to deliver upstream and downstream workflow dependency notifications across the cross-workspace flow dependency topology; and establishing a distributed memory bus mechanism, abstracting the message bus, and using an SPI mechanism to decouple the TSS platform from the message queue service and the corresponding consumption logic, so that the message queue service is pluggable. As business volume grows, the invention makes full use of horizontal scaling of distributed computing power, supports dynamically adding cluster scheduling nodes to cope with business pressure, and, by computing topology dependencies automatically with an algorithm, solves the problem of complex scheduling among task flows.
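
The pluggable message bus described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the patent names an SPI mechanism (a Java-style service provider interface), and the decorator-based registry below is only a stand-in for it. All names (`MessageBus`, `register_backend`, `InMemoryBus`, `create_bus`, `on_workflow_finished`, `consume_and_trigger`, the topic `workflow-events`) are hypothetical, and the rule format is an assumption made for the example.

```python
import queue
from abc import ABC, abstractmethod


class MessageBus(ABC):
    """Abstract bus: the scheduler depends only on this interface."""

    @abstractmethod
    def publish(self, topic, event):
        """Send one event to a topic."""

    @abstractmethod
    def poll(self, topic):
        """Return the next event on a topic, or None if there is none."""


# Registry standing in for SPI-style provider discovery (assumption).
_BACKENDS = {}


def register_backend(name):
    """Class decorator that registers a MessageBus implementation by name."""
    def decorator(cls):
        _BACKENDS[name] = cls
        return cls
    return decorator


@register_backend("memory")
class InMemoryBus(MessageBus):
    """In-process backend; a Kafka-backed class could be registered the same way."""

    def __init__(self):
        self._queues = {}

    def publish(self, topic, event):
        self._queues.setdefault(topic, queue.Queue()).put(event)

    def poll(self, topic):
        q = self._queues.get(topic)
        if q is None:
            return None
        try:
            return q.get_nowait()
        except queue.Empty:
            return None


def create_bus(name):
    """Instantiate the backend selected by configuration."""
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError(f"no message bus backend registered as {name!r}") from None


def on_workflow_finished(bus, workflow, state):
    """Upstream side: publish the final state of a finished workflow."""
    bus.publish("workflow-events", {"workflow": workflow, "state": state})


def consume_and_trigger(bus, rules):
    """Downstream side: drain events and return the workflows to trigger.

    rules maps (upstream workflow, required state) to a list of downstream workflows.
    """
    triggered = []
    while (event := bus.poll("workflow-events")) is not None:
        triggered.extend(rules.get((event["workflow"], event["state"]), []))
    return triggered
```

Because the scheduling code only calls `publish` and `poll`, switching the queue service means registering another backend and changing the name passed to `create_bus`; the trigger and consumption logic stay untouched.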

Inventors

  • XIA XUESONG
  • SONG JIANHAI
  • ZHANG YUNLONG
  • HU BING
  • HUANG MING
  • DING JIANGWEI
  • ZHOU MING
  • QING LINXIN
  • CAI LIMING

Assignees

  • 上海宝信软件股份有限公司

Dates

Publication Date
20260508
Application Date
20221116

Claims (10)

  1. A cross-subsystem space task dependency scheduling method, characterized by comprising the following steps: Step 1, based on a TSS distributed workflow scheduling platform, configuring event triggers on workflows in a workspace, configuring in each trigger the corresponding upstream dependent workflows and the workflow running-state results that form the final trigger rule, and attributing the event triggers of a plurality of workflows to a scheduling topology template; Step 2, after the topology schedule is triggered, acquiring all triggers belonging to it, parsing the upstream and downstream workflow dependencies of the plurality of unidirectional triggers into a complete cross-space flow dependency topology through a preset algorithm, and finding the starting-point workflows to schedule; Step 3, in the scheduling service logic, using a message bus mechanism to deliver upstream and downstream workflow dependency notifications across the cross-workspace flow dependency topology, integrating the flow dependency message triggering and message consumption mechanisms into the scheduling service, and implementing flow control based on the node backpressure fault-tolerance mechanism of the TSS workflow scheduling platform; and Step 4, based on the business volumes and resource allocations of different production environments, establishing a distributed memory bus mechanism, abstracting the message bus, and using an SPI mechanism to decouple the TSS platform from the message queue service and the corresponding consumption logic, thereby making the message queue service pluggable.
  2. The cross-subsystem space task dependency scheduling method according to claim 1, wherein, based on a workflow event message triggering mechanism, after an upstream task completes, the scheduling service evaluates the trigger logic and sends an event message to a message queue, and a downstream message consumption mechanism consumes the event messages in the message queue to implement cross-workflow event-dependent push triggering.
  3. The cross-subsystem space task dependency scheduling method according to claim 1, wherein downtime exception handling logic is provided for nodes that trigger asynchronous flow dependency events, and the state of the memory bus before the downtime is calculated and recovered from the topology instance, the corresponding message instance records, and the states of completed workflow instances.
  4. The cross-subsystem space task dependency scheduling method according to claim 1, wherein, when a trigger of a topology instance encounters a triggering exception: if the exception is judged to be a non-resource-overload exception or a logic-rule-level exception in which a rule is not satisfied, the trigger automatically performs the corresponding exception handling according to a plurality of selectable exception policies configured by the user; and if the exception is caused by temporary overload of scheduling platform resources, the message bus notifies an asynchronous trigger message compensation mechanism, which handles it asynchronously by polling, ensuring that the topology instance resumes normal scheduling after the cluster load drops.
  5. The cross-subsystem space task dependency scheduling method according to claim 1, wherein all initial and final workflow nodes in the whole topology template are calculated by a preset algorithm from the upstream and downstream workflow dependencies of the unidirectional triggers; a complete directed graph is obtained by taking each unidirectional upstream-to-downstream workflow dependency in a trigger as an edge direction; loop detection on the topology is then performed using a DFS backtracking algorithm based on Trémaux search, so that after a user configures a workflow event trigger, the algorithm automatically detects whether adding the current trigger to the topology template introduces a workflow dependency trigger loop; and if a loop is detected, all workflows on the loop and their corresponding triggers are automatically reported back to the user in dependency order.
  6. A cross-subsystem space task dependency scheduling system, comprising the following modules: module M1, which, based on a TSS distributed workflow scheduling platform, configures event triggers on workflows in a workspace, configures in each trigger the corresponding upstream dependent workflows and the workflow running-state results that form the final trigger rule, and attributes the event triggers of a plurality of workflows to a scheduling topology template; module M2, which, after the topology schedule is triggered, acquires all triggers belonging to it, parses the upstream and downstream workflow dependencies of the plurality of unidirectional triggers into a complete cross-space flow dependency topology through a preset algorithm, and finds the starting-point workflows to schedule; module M3, which, in the scheduling service logic, uses a message bus mechanism to deliver upstream and downstream workflow dependency notifications across the cross-workspace flow dependency topology, integrates the flow dependency message triggering and message consumption mechanisms into the scheduling service, and implements flow control based on the node backpressure fault-tolerance mechanism of the TSS workflow scheduling platform; and module M4, which, based on the business volumes and resource allocations of different production environments, establishes a distributed memory bus mechanism, abstracts the message bus, and uses an SPI mechanism to decouple the TSS platform from the message queue service and the corresponding consumption logic, thereby making the message queue service pluggable.
  7. The cross-subsystem space task dependency scheduling system according to claim 6, wherein, based on a workflow event message triggering mechanism, after an upstream task completes, the scheduling service evaluates the trigger logic and sends an event message to a message queue, and a downstream message consumption mechanism consumes the event messages in the message queue to implement cross-workflow event push triggering.
  8. The cross-subsystem space task dependency scheduling system according to claim 6, wherein downtime exception handling logic is provided for nodes that trigger asynchronous flow dependency events, and the state of the memory bus before the downtime is calculated and recovered from the topology instance, the corresponding message instance records, and the states of completed workflow instances.
  9. The cross-subsystem space task dependency scheduling system according to claim 6, wherein, when a trigger of a topology instance encounters a triggering exception: if the exception is judged to be a non-resource-overload exception or a logic-rule-level exception in which a rule is not satisfied, the trigger automatically performs the corresponding exception handling according to a plurality of selectable exception policies configured by the user; and if the exception is caused by temporary overload of scheduling platform resources, the message bus notifies the asynchronous trigger message compensation mechanism, which handles it asynchronously by polling, ensuring that the topology instance resumes normal scheduling after the cluster load drops.
  10. The cross-subsystem space task dependency scheduling system according to claim 6, wherein all initial and final workflow nodes in the whole topology template are calculated by a preset algorithm from the upstream and downstream workflow dependencies of the unidirectional triggers; a complete directed graph is obtained by taking each unidirectional upstream-to-downstream workflow dependency in a trigger as an edge direction; loop detection on the topology is then performed using a DFS backtracking algorithm based on Trémaux search, so that after a user configures a workflow event trigger, the algorithm automatically detects whether adding the current trigger to the topology template introduces a workflow dependency trigger loop; and if a loop is detected, all workflows on the loop and their corresponding triggers are automatically reported back to the user in dependency order.
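
As an illustration of the start-node calculation and loop detection recited in claims 5 and 10, the following is a minimal Python sketch rather than the patented algorithm. It assumes each unidirectional trigger reduces to an `(upstream, downstream)` pair of workflow names, and uses a plain depth-first search with backtracking in which a node still on the current search path marks a loop. The function names `build_graph`, `start_and_end_nodes`, and `find_cycle` are hypothetical.

```python
from collections import defaultdict


def build_graph(triggers):
    """Turn (upstream, downstream) trigger pairs into adjacency lists."""
    graph = defaultdict(list)
    nodes = set()
    for upstream, downstream in triggers:
        graph[upstream].append(downstream)
        nodes.update((upstream, downstream))
    return graph, nodes


def start_and_end_nodes(triggers):
    """Workflows with no upstream are starts; those with no downstream are ends."""
    _, nodes = build_graph(triggers)
    has_upstream = {downstream for _, downstream in triggers}
    has_downstream = {upstream for upstream, _ in triggers}
    return sorted(nodes - has_upstream), sorted(nodes - has_downstream)


def find_cycle(triggers):
    """Return the workflows on one dependency loop in dependency order, or None."""
    graph, nodes = build_graph(triggers)
    unvisited, on_path, done = 0, 1, 2
    state = {node: unvisited for node in nodes}
    path = []  # current DFS path, used to report the loop in order

    def dfs(node):
        state[node] = on_path
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == on_path:
                # Back edge: nxt is already on the current path, so a loop exists.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == unvisited:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        path.pop()  # backtrack
        state[node] = done
        return None

    for node in sorted(nodes):
        if state[node] == unvisited:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None
```

A scheduler could call `find_cycle` on the existing triggers plus the candidate trigger before saving it, and reject the configuration when a list is returned. For the triggers A→B, B→C, and C→A, the function returns `["A", "B", "C", "A"]`, which gives the user the loop in dependency order.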

Description

Space task dependency scheduling method and system across subsystems

Technical Field

The invention relates to the technical field of task scheduling, and in particular to a cross-subsystem space task dependency scheduling method and system.

Background

The Huawei Cloud DGC data development module supports running jobs in an event-triggered mode, so that jobs can be scheduled across workspaces by using DIS or MRS Kafka as the link between dependent jobs. As shown in FIG. 1, after Job1 in workspace 1 completes, it uses DIS Client or Kafka Client to send a message that triggers Job2; Job2 is configured with event-triggered scheduling and runs according to the message sent by DIS Client or Kafka Client. Compared with the Huawei Cloud solution, the cross-space workflow dependency scheduling scheme of the invention is transparent to the original workflows with respect to sending and consuming dependency event messages, and requires no extra steps or workflows. The topology and the workflow triggers of the workspaces are relatively independent and therefore loosely coupled.

Patent document CN114510235A (application number: CN 202210114410.3) discloses a full life cycle management system and method for a scientific computing program. The system includes a build environment subsystem, which provides computer resources for the build process during the life cycle of the scientific computing program, and a production environment subsystem, which provides computer resources for the test and deployment processes during the full life cycle of the scientific computing program.

Alibaba Cloud DataWorks supports cross-workspace dependencies between workspaces in the same region. Following the scheduling dependency principle, the output of an upstream node is used as the input of a downstream node to form a node dependency, thereby realizing cross-workspace scheduling dependencies. For example, adding the output of node A in workspace A as the input of node B in workspace B implements a cross-workspace dependency. The configuration method is the same as for scheduling dependencies in the general scenario; for detailed operations, see the configuration of same-cycle scheduling dependencies. Compared with the Alibaba Cloud solution, the cross-space workflow dependency scheduling scheme of the invention supports a template multi-instance mode, which abstracts actual business scenarios to a higher degree and is more flexible for users. Existing topology templates can be reused and, combined with dynamic parameter passing, this avoids creating large numbers of similar topologies or tasks. Because topology trigger instances are isolated from one another at the business level, a topology template can be scheduled with high concurrency to start multiple corresponding topology instances.

Disclosure of Invention

In view of the defects in the prior art, the invention aims to provide a cross-subsystem space task dependency scheduling method and system.
The cross-subsystem space task dependency scheduling method provided by the invention comprises the following steps: Step 1, based on a TSS distributed workflow scheduling platform, configuring event triggers on workflows in a workspace, configuring in each trigger the corresponding upstream dependent workflows and the workflow running-state results that form the final trigger rule, and attributing the event triggers of a plurality of workflows to a scheduling topology template; Step 2, after the topology schedule is triggered, acquiring all triggers belonging to it, parsing the upstream and downstream workflow dependencies of the plurality of unidirectional triggers into a complete cross-space flow dependency topology through a preset algorithm, and finding the starting-point workflows to schedule; Step 3, in the scheduling service logic, using a message bus mechanism to deliver upstream and downstream workflow dependency notifications across the cross-workspace flow dependency topology, integrating the flow dependency message triggering and message consumption mechanisms into the scheduling service, and implementing flow control based on the node backpressure fault-tolerance mechanism of the TSS workflow scheduling platform; and Step 4, based on the business volumes and resource allocations of different production environments, establishing a distributed memory bus mechanism, abstracting the message bus, and using an SPI mechanism to decouple the TSS platform from the message queue service and the corresponding consumption logic, thereby making the message queue service pluggable.

Preferably, based on a workflow event message triggering mechanism, after an upstream task completes, the scheduling service evaluates the trigger logic and sends event messages to a message queue, and a downstream message consumption mechanism consumes the event messages in