Sounds good, but what isn’t important? Or, a better question might be, what is important? The simple answer is: only the information necessary to achieve an efficacious recovery. As cavalier as that answer may seem, it really is that simple. And backing up only important data can have enormous benefits in terms of reducing costs and resource consumption, improving recoverability, and achieving a greener data center.
How can you determine what is important when there are thousands or hundreds of thousands of data sets to consider?
An important data set is one that must be available before application recovery can begin. Conversely, a data set that isn’t important for recovery is one that’s created during the recovery process. As a practical matter, identifying important data sets can be an impossible task when approached manually, so data centers often choose to err on the side of caution and back up everything, just to be safe. As a result, they’ve been backing up information that isn’t important to the recovery process.
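The definition above can be read as a first-access rule: if a recovery run reads a data set before anything creates it, that data set has to come from a backup. The following is a minimal Python sketch of that rule, not a description of any product. The access log format, the function name `classify_data_sets`, and the data set names are all hypothetical and exist only for illustration.

```python
from typing import Iterable

def classify_data_sets(access_log: Iterable[tuple[str, str]]) -> dict[str, bool]:
    """Map each data set name to True if it is important for recovery.

    access_log is a hypothetical, chronologically ordered sequence of
    (data_set_name, access) pairs, where access is "read" or "create".
    """
    important: dict[str, bool] = {}
    for name, access in access_log:
        if name not in important:
            # Only the first access decides: read-before-create means the
            # data set must already exist when recovery begins.
            important[name] = (access == "read")
    return important

# Hypothetical access log for one application run.
log = [
    ("PAYROLL.MASTER", "read"),         # read first: needed from backup
    ("PAYROLL.WORK.SORTED", "create"),  # created during the run
    ("PAYROLL.WORK.SORTED", "read"),    # later read does not change the verdict
    ("PAYROLL.REPORT", "create"),       # output only
]
result = classify_data_sets(log)
```

A rule this simple would miss data sets that are never opened but must still be present, so it is only a starting point for the kind of analysis described here.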
In all fairness, many data centers have had the necessary expertise to do a pretty good job of determining what is and what isn’t important. However, more and more data centers are losing that expertise as the relentless march of the baby boomer generation continues to reduce the mainframe skills pool. The very people who have the expertise to manually manage data classification and backup are moving on to well-deserved retirements, and often they aren’t being replaced. To compensate for that loss of expertise, many data centers are choosing to back up their entire system using various remote and local replication technologies. Many of those same organizations also are choosing DASD over tape as their preferred backup media. This is fine until you consider the cost of the resources consumed. More DASD means more power is consumed, which generates more heat, which requires more cooling, all of which requires more floor space. By contrast, backing up only the important data sets means you’re backing up less data, so less equipment, power, cooling, and floor space are needed to ensure recoverability, and the result is a greener data center.
What should you look for when selecting backup tools or developing a process to help you determine what’s important? First and foremost is the ability to accurately and automatically identify important data. In a dynamic data center environment, the importance of data sets can change from one day to the next, and new data sets that are necessary for a successful recovery can be created at any time. To keep up with the changing environment, the tools you select must be capable of automatically identifying and tracking important data sets. Look for products that don’t depend on additional third-party products but still support the popular utilities, such as DSS, FDR, CA-Disk, and ABARS. Ensure the tool or process you choose is able to address the more subtle aspects of identifying important data sets, including concatenated data sets, VSAM clusters, migrated data sets, and data sets that may never be referenced but must be present for application recovery. Failure to recognize these data set components as being important can and probably will result in a failed recovery.
Look for features that automatically identify important data sets, both mirrored and non-mirrored. Even though the cost to store data on DASD continues to fall, the infrastructure costs to mirror data can be prohibitive. Ensuring that only important data is mirrored reduces the overall cost. In addition, look for tools that avoid the data duplication that usually results from a backup selection process that makes new copies of data sets day after day, regardless of whether the data set has changed since the last backup. Backing up unchanged data sets only once can significantly reduce the resources used by the backup process.
Auditing, or the ability to report exactly which files are required for recovery, why they’re required, and why other files aren’t required, is a critical component of any backup process.
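The point about backing up unchanged data sets only once can be illustrated with a small selection routine. This is a hedged Python sketch under simple assumptions: the `catalog` structure, the function name `select_for_backup`, and the timestamps are hypothetical, and a real tool would take change information from the system catalog or its own tracking records.

```python
from datetime import datetime
from typing import Optional

def select_for_backup(
    catalog: dict[str, tuple[datetime, Optional[datetime]]]
) -> list[str]:
    """Return the names of data sets that need a new backup copy.

    catalog maps a data set name to (last_changed, last_backup), where
    last_backup is None if the data set has never been backed up.
    """
    selected = []
    for name, (last_changed, last_backup) in catalog.items():
        # Copy only data sets that are new or changed since the last backup.
        if last_backup is None or last_changed > last_backup:
            selected.append(name)
    return sorted(selected)

# Hypothetical catalog entries.
catalog = {
    "PAYROLL.MASTER": (datetime(2024, 3, 2, 1, 0), datetime(2024, 3, 1, 23, 0)),
    "PAYROLL.TABLES": (datetime(2024, 1, 15, 9, 0), datetime(2024, 3, 1, 23, 0)),
    "PAYROLL.NEWFILE": (datetime(2024, 3, 2, 8, 0), None),
}
to_back_up = select_for_backup(catalog)
```

In this example, `PAYROLL.TABLES` is skipped because its existing backup is newer than its last change, which is the saving the paragraph describes.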
Be sure the tool enables you to monitor the backup and recovery processes. Last but not least, there should be a convenient method to include or exclude data sets based on individual data center policies, so the entire process can be tailored to suit your own practices.
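Policy-based inclusion and exclusion of data sets can be pictured as a pair of pattern lists applied to the candidate names. The Python sketch below is illustrative only: the function `apply_policy` and the patterns are hypothetical, and it uses shell-style wildcards from Python’s standard `fnmatch` module rather than the filtering syntax of any particular backup utility.

```python
from fnmatch import fnmatchcase

def apply_policy(names: list[str], include: list[str], exclude: list[str]) -> list[str]:
    """Keep names that match an include pattern and no exclude pattern."""
    selected = []
    for name in names:
        included = any(fnmatchcase(name, pattern) for pattern in include)
        excluded = any(fnmatchcase(name, pattern) for pattern in exclude)
        # In this sketch, an exclude match always overrides an include match.
        if included and not excluded:
            selected.append(name)
    return selected

# Hypothetical data set names and site policy.
names = [
    "PROD.PAYROLL.MASTER",
    "PROD.PAYROLL.TEMP.SORT1",
    "TEST.PAYROLL.MASTER",
]
kept = apply_policy(names, include=["PROD.*"], exclude=["*.TEMP.*"])
```

Letting exclusions win over inclusions is one possible design choice; a site could just as reasonably give explicit inclusions the last word.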