One of the most gut-wrenching experiences a mainframe technician can have is discovering, in the middle of a crisis, that data thought to be safely backed up to tape can’t be restored. Beyond the immediate panic, it can be a career-halting fiasco. Restores fail for many reasons, including tape media failure, tape drive failure, and simply not backing up the appropriate disk drives.
This article presents a homegrown DASD backup system built from the EMC Business Continuance Volume (BCV) mechanism, Innovation Data Processing’s FDR, a small Rexx script, and standard MVS utilities. It not only backs up data but also automatically verifies that the data being backed up can actually be restored. When my shop was recently audited for Sarbanes-Oxley (SOX), both the internal and external auditors loved this setup.
The following major steps are performed within the backup system:
- A control data set is updated with the current date and time
- The BCV volumes are re-established (synced)
- The BCV volumes are split
- The BCV is backed up to tape
- The control data set is restored to another volume and compared with the original
When we back up DB2 data volumes, there are additional steps required to suspend DB2 activity prior to the split and resume DB2 activity after the split. These steps won’t be covered here.
Multiple job streams are used to back up our entire DASD farm. Each job stream consists of five jobs that perform the five functions previously listed. Each job stream backs up approximately 30 3390-3 volumes, which are stacked on a single tape volume. We simultaneously create two backup tapes using FDR’s capabilities, which means each job stream uses two tape drives. During our backup window, we have 10 tape drives available for our use, so we set up our scheduling system to keep five job streams running concurrently to minimize the time required to complete the backup cycle.
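The drive arithmetic above can be written out as a quick sanity check. This is only a sketch in Python, not part of the backup system itself; the function names are made up for illustration, and the 300-volume farm size in the usage lines is a hypothetical figure, since the article doesn't state the total volume count.

```python
def concurrent_streams(tape_drives: int, drives_per_stream: int) -> int:
    """How many job streams can run at once with the drives available."""
    return tape_drives // drives_per_stream


def streams_needed(total_volumes: int, volumes_per_stream: int) -> int:
    """How many job streams it takes to cover the farm (rounded up)."""
    return -(-total_volumes // volumes_per_stream)


# Figures from the text: 10 drives, 2 drives per stream (dual tape copies).
slots = concurrent_streams(10, 2)

# Hypothetical farm of 300 volumes at roughly 30 volumes per stream.
streams = streams_needed(300, 30)
print(slots, streams)
```

With 10 drives and two tapes written per stream, five streams fit in the window at a time, which is the figure we give our scheduler.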
The Control Data Set
The control data set is the heart of the verification function. This data set is updated at the beginning of each backup job stream. After the backup has been performed, the control data set is restored under a new name, and the restored copy is compared with the version still on disk. If the compare fails, the backup wasn’t successful.
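To make the logic concrete, here is a minimal simulation of the stamp, copy, restore, and compare cycle. It is a Python sketch rather than the actual jobs: plain dictionaries stand in for the DASD volume, the BCV, and the tape, and the function name, the ".RESTORE" suffix, the volume serial PROD01, and the tape_fault switch are all assumptions made for illustration.

```python
def backup_and_verify(volume: dict, control_dsn: str, stamp: str,
                      tape_fault: bool = False) -> bool:
    """Simulate one job stream; return True if the restore check passes."""
    # Step 1: update the control data set on the source volume.
    volume[control_dsn] = stamp

    # Steps 2-3: establish, then split, the BCV (a point-in-time copy).
    bcv = dict(volume)

    # Step 4: back up the BCV to tape.
    tape = dict(bcv)
    if tape_fault:
        # Simulate a bad tape: the control record does not come back intact.
        tape[control_dsn] = "unreadable"

    # Step 5: restore the control data set under a new (hypothetical) name
    # and compare it with the copy still on disk.
    restored_dsn = control_dsn + ".RESTORE"
    volume[restored_dsn] = tape.get(control_dsn)
    return volume[restored_dsn] == volume[control_dsn]


ok = backup_and_verify({}, "OS.LSTBKUP.VPROD01", "20 Jun 2006 10:33:41")
bad = backup_and_verify({}, "OS.LSTBKUP.VPROD01", "20 Jun 2006 10:33:41",
                        tape_fault=True)
print(ok, bad)
```

Because the stamp changes on every run, a record restored from an earlier cycle's tape would presumably fail the compare as well, not just one from a damaged tape.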
Every DASD volume holds its own control data set, named OS.LSTBKUP.Vxxxxxx, where xxxxxx is the volume serial number of that volume. The control data set is initially allocated using the JCL shown in Figure 1.
The first job of each backup job stream updates this control data set with the current date and time. The record placed in the control data set looks like this: “20 Jun 2006 10:33:41”.
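The naming convention and record layout are simple enough to sketch. The following Python is an illustration only, not the Rexx script the system uses; the helper names are hypothetical, the volume serial check is a simplification (alphanumeric, one to six characters), and leaving single-digit days unpadded is an assumption, since the sample record doesn't show that case.

```python
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def control_dsn(volser: str) -> str:
    """Build the control data set name OS.LSTBKUP.Vxxxxxx for a volume."""
    volser = volser.upper()
    # Simplified check: 1 to 6 ASCII letters or digits.
    if not (1 <= len(volser) <= 6 and volser.isascii() and volser.isalnum()):
        raise ValueError(f"invalid volume serial: {volser!r}")
    return f"OS.LSTBKUP.V{volser}"


def stamp_record(now: datetime) -> str:
    """Format a date/time record like the sample in the text."""
    # Month names are spelled out here so the result is locale-independent.
    return f"{now.day} {MONTHS[now.month - 1]} {now.year} {now:%H:%M:%S}"


print(control_dsn("PROD01"))
print(stamp_record(datetime(2006, 6, 20, 10, 33, 41)))
```

Prefixing the serial with "V" keeps the last qualifier valid even when the volume serial starts with a digit, and at seven characters it stays inside the eight-character qualifier limit.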