The mainframe remains one of the most effective platforms for batch processing, and a large number of organizations rely on it. However, maintenance is a key consideration for any mainframe batch application. Large batch applications fail for various reasons, and because they commonly run overnight under strict Service-Level Agreements (SLAs), quick failure resolution is imperative. This article presents an effective solution that automates most of the manual activities required to restart a failed batch stream.
A self-restartable application consists of intelligent jobs capable of restarting from the point of failure without any manual intervention. These jobs know whether their previous run was successful and decide their course of action accordingly. If the previous run was successful, the job follows the normal processing path. If the previous run was unsuccessful, the job restarts itself from the point of failure in that run. The restart automatically includes all housekeeping required for the job to resume, such as undoing certain updates, deleting certain files, or handling other tasks.
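The decision described above can be sketched in a few lines. This is only an illustration in Python rather than JCL, and the function name `plan_run`, the step names, and the `HOUSEKEEP(...)` label are hypothetical. It also assumes that a failed step is recorded somewhere and that the record is absent after a successful run.

```python
from typing import List, Optional


def plan_run(steps: List[str], failed_step: Optional[str] = None) -> List[str]:
    """Return the ordered actions for this run of a job.

    steps       -- all step names of the job, in execution order
    failed_step -- step recorded as the point of failure in the previous run,
                   or None if the previous run succeeded (an assumption of
                   this sketch, not something the article prescribes)
    """
    if failed_step is None:
        # Previous run was successful: follow the normal processing path.
        return list(steps)

    # Previous run failed: locate the restart point (ValueError if unknown).
    restart_index = steps.index(failed_step)

    # Housekeeping runs first, then the failed step and everything after it.
    return [f"HOUSEKEEP({failed_step})"] + list(steps[restart_index:])


if __name__ == "__main__":
    job_steps = ["STEP010", "STEP020", "STEP030"]
    print(plan_run(job_steps))             # normal run
    print(plan_run(job_steps, "STEP020"))  # restart after a failure in STEP020
```

The point of the sketch is that the same job definition serves both paths: the only input that changes between a normal run and a restart is the recorded point of failure.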
Restart of any job, especially with multiple steps, is difficult and requires that various activities be performed before resubmission:
- Fixing the problem that caused the failure, which may include removing corrupt data, changing data feeds, etc.
- Finding the correct restart point of the job. (We might want to restart the job from some previous step due to a dependency.)
- Housekeeping, or reverting the updates made during the failed run. Sometimes, due to complexity, we might need a separate restart procedure to revert these updates.
- Editing the Job Control Language (JCL) with the restart step.
These activities can take hours to complete. With the self-restartable application, we have an automated mechanism that can take care of most of these activities. The concept requires little investment, can save both effort and money, and can be deployed gradually (a few jobs at a time) without causing any dependency on the execution of other jobs.
The following sections explain how to design a self-restartable application. Based on this design, the required components can be coded and customized for use in any mainframe batch application.
Self-Restartable Application Components
The basic components of a self-restartable application are:
- A program or procedure to record the point (step) of failure. This will keep track of the current job step and save it for subsequent use. Throughout the execution of the job, it will write the name of each step to a file as that step executes.
- A file to hold all the jobs and their current steps. This will be a Key-Sequenced Data Set (KSDS) consisting of a job name (key) and a step name. This is used to record the point of failure.
- A program or procedure to govern restart of any job. This program will run at the beginning of a job to check if the last run was unsuccessful, and if so, it will identify which job step needs to be restarted. It will also dynamically change the JCL of the failed job to meet the restart needs.
- A restart procedure for each restart point (step). This will be required to run just before the failed step in case of restart and will perform the housekeeping required to rerun the failed step; the housekeeping may include certain updates, reverting updates, or deleting certain files.
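To make the interplay of these four components concrete, here is a small simulation. It is a sketch under stated assumptions, not the actual mainframe implementation: a Python dictionary stands in for the KSDS, plain functions stand in for job steps and restart procedures, and choosing a start index stands in for dynamically changing the JCL. All names (`checkpoint_file`, `record_step`, `run_job`, `NIGHTLY1`, the step names) are hypothetical, and clearing the record on successful completion is an assumption about how a job recognizes that its last run succeeded.

```python
from typing import Callable, Dict, List


class StepFailed(Exception):
    """Raised by a step to simulate an abend."""


# Stand-in for the KSDS: job name (key) -> name of the step currently executing.
checkpoint_file: Dict[str, str] = {}


def record_step(job_name: str, step_name: str) -> None:
    """Component 1: save the current step so a failure can be located later."""
    checkpoint_file[job_name] = step_name


def run_job(job_name: str,
            steps: Dict[str, Callable[[], None]],
            restart_procs: Dict[str, Callable[[], None]]) -> List[str]:
    """Run a job, restarting from the recorded step if the last run failed."""
    names = list(steps)

    # Component 3: the restart governor runs first and inspects the record.
    failed_step = checkpoint_file.get(job_name)
    start = 0
    if failed_step in names:
        start = names.index(failed_step)
        # Component 4: housekeeping for this restart point, if one is defined.
        restart_procs.get(failed_step, lambda: None)()

    for name in names[start:]:
        record_step(job_name, name)
        steps[name]()  # an exception here leaves the record pointing at this step

    # Assumption: a successful run clears its record.
    checkpoint_file.pop(job_name, None)
    return names[start:]


# Demonstration: LOAD fails on its first attempt and succeeds on the second.
log: List[str] = []
attempts = {"count": 0}


def load() -> None:
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise StepFailed("bad input feed")
    log.append("load")


job_steps: Dict[str, Callable[[], None]] = {
    "EXTRACT": lambda: log.append("extract"),
    "LOAD": load,
    "REPORT": lambda: log.append("report"),
}
job_restart_procs: Dict[str, Callable[[], None]] = {
    "LOAD": lambda: log.append("undo partial load"),
}

try:
    run_job("NIGHTLY1", job_steps, job_restart_procs)   # first run fails in LOAD
except StepFailed:
    pass

executed = run_job("NIGHTLY1", job_steps, job_restart_procs)  # resubmission
print(executed)
print(log)
```

On resubmission, the job skips EXTRACT, runs the housekeeping for LOAD, and then continues with LOAD and REPORT, with no manual edit of the job definition.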
Let’s examine each component of the self-restartable application in detail.
Record the Point of Failure