US-12625734-B2 - High availability scheduler event tracking
Abstract
Aspects include monitoring, by a controller, an operational status of a tracker system that is configured to track and record a current status of a job being executed and to report completion of the job to the controller. The recording includes storing two copies of the current status, where a first copy is stored in a shared memory location accessible by the controller. In response to determining, based on the monitoring, that the tracker system is operational, waiting to receive a job completion message for the job from the tracker system and performing a job completion action based on receiving the job completion message. In response to determining that the tracker system is not operational, obtaining the current status of the job from the shared memory location and performing the job completion action based on the current status indicating that the job has completed.
Inventors
- Xin Xin Dong
- Ming Qiao Shang Guan
- Mai Zeng
- Wei Song
Assignees
- INTERNATIONAL BUSINESS MACHINES CORPORATION
Dates
- Publication Date: 2026-05-12
- Application Date: 2022-01-05
Claims (14)
- 1. A method comprising: monitoring, by a controller, an operational status of a job tracking module that is configured to track and record a current status of a job being executed and to report completion of the job to the controller, the recording comprising storing two copies of the current status, a first copy stored in a coupling facility (CF) list structure that is globally addressable by a plurality of logical partitions (LPARs) of a sysplex and accessible by the controller, and a second copy stored in an extendable common service area (ECSA) queue resident in a same LPAR of the plurality of LPARs as the job tracking module and not directly accessible by the controller; in response to determining, based on the monitoring, that the job tracking module is operational, waiting to receive a job completion message for the job from the job tracking module and performing a job completion action based on receiving the job completion message, wherein the job completion action is selected from a plurality of job completion actions based on how the job completed, wherein how the job completed is one of in error, canceled, or without error; in response to determining, based on the monitoring, that the job tracking module is not operational: obtaining the current status of the job from the CF list structure; and performing the job completion action based on the current status indicating that the job has completed, wherein performing the job completion action enables the controller to continue execution of other jobs or actions that depend on the job being executed, thereby alleviating processing delays and improving response times responsive to the job tracking module not being operational; and in response to determining that the job tracking module becomes available after not being operational for a period of time, resynchronizing events between the ECSA queue, the CF list structure, and an event dataset.
- 2. The method of claim 1, wherein current status records corresponding to the job are removed from the CF list structure upon completion of the job.
- 3. The method of claim 1, wherein it is determined that the job tracking module is not operational in response to the controller not being able to communicate with the job tracking module.
- 4. The method of claim 1, wherein the job completion action comprises initiating execution of a second job.
- 5. The method of claim 1, wherein the current status of the job is obtained from one or both of a system management facility (SMF) record and a job entry system (JES) record.
- 6. A system comprising: one or more processors for executing computer-readable instructions, the computer-readable instructions controlling the one or more processors to perform operations comprising: monitoring, by a controller, an operational status of a job tracking module that is configured to track and record a current status of a job being executed and to report completion of the job to the controller, the recording comprising storing two copies of the current status, a first copy stored in a coupling facility (CF) list structure that is globally addressable by a plurality of logical partitions (LPARs) of a sysplex and accessible by the controller, and a second copy stored in an extendable common service area (ECSA) queue resident in a same LPAR of the plurality of LPARs as the job tracking module and not directly accessible by the controller; in response to determining, based on the monitoring, that the job tracking module is operational, waiting to receive a job completion message for the job from the job tracking module and performing a job completion action based on receiving the job completion message, wherein the job completion action is selected from a plurality of job completion actions based on how the job completed, wherein how the job completed is one of in error, canceled, or without error; in response to determining, based on the monitoring, that the job tracking module is not operational: obtaining the current status of the job from the CF list structure; and performing the job completion action based on the current status indicating that the job has completed, wherein performing the job completion action enables the controller to continue execution of other jobs or actions that depend on the job being executed, thereby alleviating processing delays and improving response times responsive to the job tracking module not being operational; and in response to determining that the job tracking module becomes available after not being operational for a period of time, resynchronizing events between the ECSA queue, the CF list structure, and an event dataset.
- 7. The system of claim 6, wherein current status records corresponding to the job are removed from the CF list structure upon completion of the job.
- 8. The system of claim 6, wherein it is determined that the job tracking module is not operational in response to the controller not being able to communicate with the job tracking module.
- 9. The system of claim 6, wherein the job completion action comprises initiating execution of a second job.
- 10. The system of claim 6, wherein the current status of the job is obtained from one or both of a system management facility (SMF) record and a job entry system (JES) record.
- 11. A computer program product comprising a computer-readable storage medium having program instructions embodied therewith, the program instructions executable by one or more processors to cause the one or more processors to perform operations comprising: monitoring, by a controller, an operational status of a job tracking module that is configured to track and record a current status of a job being executed and to report completion of the job to the controller, the recording comprising storing two copies of the current status, a first copy stored in a coupling facility (CF) list structure that is globally addressable by a plurality of logical partitions (LPARs) of a sysplex and accessible by the controller, and a second copy stored in an extendable common service area (ECSA) queue resident in a same LPAR of the plurality of LPARs as the job tracking module and not directly accessible by the controller; in response to determining, based on the monitoring, that the job tracking module is operational, waiting to receive a job completion message for the job from the job tracking module and performing a job completion action based on receiving the job completion message, wherein the job completion action is selected from a plurality of job completion actions based on how the job completed, wherein how the job completed is one of in error, canceled, or without error; in response to determining, based on the monitoring, that the job tracking module is not operational: obtaining the current status of the job from the CF list structure; and performing the job completion action based on the current status indicating that the job has completed, wherein performing the job completion action enables the controller to continue execution of other jobs or actions that depend on the job being executed, thereby alleviating processing delays and improving response times responsive to the job tracking module not being operational; and in response to determining that the job tracking module becomes available after not being operational for a period of time, resynchronizing events between the ECSA queue, the CF list structure, and an event dataset.
- 12. The computer program product of claim 11, wherein current status records corresponding to the job are removed from the CF list structure upon completion of the job.
- 13. The computer program product of claim 11, wherein it is determined that the job tracking module is not operational in response to the controller not being able to communicate with the job tracking module.
- 14. The computer program product of claim 11, wherein the job completion action comprises initiating execution of a second job.
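The controller-side flow recited in the independent claims can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the claimed implementation: a plain dictionary stands in for the CF list structure, the two callables stand in for the monitoring and messaging interfaces, and the action names in `ACTIONS` are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, Optional


class Outcome(Enum):
    """How a job completed (the three cases named in the claims)."""
    ERROR = "in error"
    CANCELED = "canceled"
    OK = "without error"


# Hypothetical completion actions; the names are illustrative only.
ACTIONS: Dict[Outcome, str] = {
    Outcome.OK: "start_successor_job",
    Outcome.ERROR: "run_recovery_action",
    Outcome.CANCELED: "hold_dependent_jobs",
}


def resolve_completion(
    job_id: str,
    tracker_is_operational: Callable[[], bool],
    poll_tracker_message: Callable[[str], Optional[Outcome]],
    cf_list: Dict[str, Optional[Outcome]],
) -> Optional[str]:
    """Return the completion action for job_id, or None if it is not yet known.

    cf_list is a dict standing in for the shared CF list structure (assumption).
    """
    if tracker_is_operational():
        # Normal path: rely on the completion message from the tracker.
        outcome = poll_tracker_message(job_id)
    else:
        # Fallback path: read the shared copy of the current status.
        outcome = cf_list.get(job_id)
    if outcome is None:
        return None  # job still running, or no status recorded yet
    return ACTIONS[outcome]
```

In this model the controller calls `resolve_completion` periodically; a `None` result means it keeps waiting, while any other result lets it proceed with dependent work even when the tracker cannot be reached.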
Description
BACKGROUND
The present invention relates generally to computer processing, and more specifically, to high availability scheduler event tracking.
IBM® Z® Workload Scheduler is an example of a workload automation solution that enables organizations to automate, plan, and control the processing of complex systems' workloads. It allows workflows to be managed from a single point of control across multiple platforms and business applications.
The controller is the focal point of the IBM Z Workload Scheduler configuration. It contains the controlling functions, Interactive System Productivity Facility (ISPF) dialogs, databases, and plans. The system that the controller is started on is referred to as the IBM Z Workload Scheduler controlling system. IBM Z Workload Scheduler systems that communicate with the controlling system are called controlled or tracker systems. The controller provides a single, consistent, control point for submitting and tracking the workload on any operating environment. IBM Z Workload Scheduler provides distributed agents and open interfaces that can be used to integrate the planning, scheduling, and control of work units such as online transactions, file transfers, or batch processing in any operating environment that can communicate with z/OS®.
An execution tracker (or “tracker”) is required for every z/OS system in an IBM Z Workload Scheduler configuration. The tracker handles the submission of jobs and tasks on the system, and keeps track of the status of the workload. In conjunction with standard interfaces to a Job Entry Subsystem (JES) and System Management Facilities (SMF), IBM Z Workload Scheduler records the relevant information about the workload by generating event records. The event records are captured and stored by the tracker. The tracker then communicates event information to the controller for further processing. The log where events are written by the tracker is called the event data set.
The IBM Z Workload Scheduler address spaces are defined as z/OS subsystems. The routines that run during subsystem initialization establish services that enable event information to be generated and stored in an extended common service area (ECSA).
In IBM Z Workload Scheduler, the tracker handles the submission of jobs on the system, keeps track of the status of the workload, and sends the event records to the controller. The events are stored in an ECSA queue, then copied into the event dataset, and finally sent to the controller. When a failure of the tracker occurs, any events in the ECSA queue that were not yet copied to the event dataset are unavailable until the tracker becomes operational again. For one or more jobs that have already completed on the target system, a delay in processing can occur because the scheduler may be blocked from continuing execution of subsequent jobs until the controller gets the results indicating that the job has completed.
SUMMARY
Embodiments of the present invention are directed to methods for high availability scheduler event tracking.
A non-limiting example method includes monitoring, by a controller, an operational status of a tracker system that is configured to track and record a current status of a job being executed and to report completion of the job to the controller. The recording includes storing two copies of the current status, including a first copy that is stored in a shared memory location accessible by the controller. In response to determining, based on the monitoring, that the tracker system is operational, the controller waits to receive a job completion message for the job from the tracker system and performs a job completion action based on receiving the job completion message.
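The event path described in the background (ECSA queue, then event dataset, then controller) and the effect of storing a second copy in a shared location can be illustrated with a small Python model. This is a sketch under stated assumptions: `TrackerSketch`, its attributes, and the dictionary used as the shared location are hypothetical stand-ins, not IBM Z Workload Scheduler interfaces.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Event = Tuple[str, str]  # (job_id, status)


class TrackerSketch:
    """Toy model of a tracker that stores two copies of each status."""

    def __init__(self, shared_store: Dict[str, str]) -> None:
        self.local_queue: Deque[Event] = deque()  # stands in for the ECSA queue
        self.event_dataset: List[Event] = []      # stands in for the event dataset
        self.shared_store = shared_store          # stands in for the shared memory location
        self.operational = True

    def record(self, job_id: str, status: str) -> None:
        """Store two copies of the current status: one local, one shared."""
        self.local_queue.append((job_id, status))
        self.shared_store[job_id] = status

    def drain(self) -> List[Event]:
        """Copy queued events to the event dataset and return them for sending.

        While the tracker is down nothing is drained, so queued events
        stay out of the controller's reach through this path.
        """
        if not self.operational:
            return []
        sent: List[Event] = []
        while self.local_queue:
            event = self.local_queue.popleft()
            self.event_dataset.append(event)
            sent.append(event)
        return sent
```

If a status is recorded and the tracker then fails before draining, `drain()` yields nothing, yet the controller can still read the status from `shared_store`. Once `operational` is set back to `True`, the queued events flow to the event dataset as before.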
In response to determining that the tracker system is not operational, the controller obtains the current status of the job from the shared memory location and performs the job completion action based on the current status indicating that the job has completed. This can provide an improvement over known methods of tracking by alleviating the delay in processing that can occur in contemporary systems when a tracker system becomes unavailable. By quickly identifying jobs that have completed even when the tracker is not operational, one or more embodiments of the present invention allow the controller or scheduler to continue with the execution of other jobs or other actions that depend on the job being completed. This can lead to improved response times and faster processing.
In addition to one or more of the features described above or below, or as an alternative, in further embodiments of the present invention a second copy of the current status is stored in an extended common service area (ECSA) that is not accessible by the controller system. One or more embodiments advantageously provide the tracking module with local access to the current status.
In addition to one or more of the features described above or below, or as an alternative, in further