Deduplication has been a buzzword in the storage industry for many years now. All storage vendors agree that it’s valuable, and each offers its own version of the technology, with roadmaps for future products that leverage deduplication capabilities. Mainframe operators have learned to be wary of industry hype and have the discipline to stick with what truly works, is manageable, and solves real business issues.
Deduplication for mainframe data centers has generated a mixed bag of information and perspectives from the leading mainframe vendors. Some of what has been said and heard can be summed up in these statements:
• “Mainframe data is different from other data, and it isn’t structured for deduplication to work well.”
• “Deduplication is an open systems technology—don’t trust it for mainframe storage management.”
• “We offer the primary benefits of deduplication already as part of our software and data management processes. Once again, the mainframe was first and other environments are still trying to copy and catch up.”
• “We will offer mainframe deduplication to our customers when we understand how the technology will help them.”
Each of these statements contains elements of truth, but none tells the full story of what we know today. Mainframe deduplication is available and has been proven by many enterprises and operators to solve important business issues while being reliable, non-disruptive, cost-effective, and easy to use.
Let’s consider the preceding statements more closely. Is mainframe data really different? It’s certainly true that mainframe tape data can be different from most open systems tape data. Open systems tape is primarily used for backup and disaster recovery operations. Mainframe environments use tape for backup, but also for strategic archives, and as primary sources of data in daily and batch operations. Figure 1 illustrates this point, showing the use of tape in each of four distinct areas of mainframe data storage.
Each data type will yield different results when using deduplication. Backups are well-suited to 20 times, 30 times, or greater deduplication. Other types of mainframe data, such as the primary data managed by DFSMShsm ML2, VSAM, Queued Sequential Access Method (QSAM), or other common applications, are the original unique copies of each data set and aren’t the same data repeatedly being written. There’s inherently less redundancy in this type of data, and depending on the application and type of data being stored, the deduplication ratios can vary widely. But as the amount of data stored grows, the process of eliminating redundant data can usually provide three times or greater reduction. When this is coupled with compression of the remaining data as part of the storage process, the total data reduction can still be significant. Based on a history of customer experience, total deduplication rates for primary types of data will typically range from six to 20 times, with 10 to 12 times as a reasonable upfront expectation.
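The arithmetic behind these figures can be sketched as follows. This is an illustrative calculation only: the total reduction factor is modeled as the product of the deduplication ratio and the compression ratio applied to the data that remains, and the specific ratios used here are assumptions chosen to land in the range the text describes, not measured values from any particular system.

```python
def total_reduction(dedup_ratio: float, compression_ratio: float) -> float:
    """Effective data-reduction factor: original bytes / stored bytes.

    Deduplication removes redundant data first; compression then shrinks
    the unique data that remains, so the two factors multiply.
    """
    return dedup_ratio * compression_ratio


# Illustrative example: ~3x dedup on primary data (the "three times or
# greater" figure above) combined with ~3.5x compression of the remainder.
factor = total_reduction(3.0, 3.5)
print(f"{factor:.1f}x total reduction")  # 10.5x, within the 10-12x expectation

# 100 TB of primary data would then occupy roughly:
stored_tb = 100 / factor
print(f"{stored_tb:.1f} TB stored")
```

The multiplication explains why even modest deduplication on primary data can still yield double-digit total reduction once compression is factored in.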
Is deduplication an open systems technology that isn’t reliable for mainframe data? The products and technologies themselves have been around long enough, and are so widely used, that any unreliability would be well understood. Mainframe data centers of all sizes, in every industry, and across at least four continents are using deduplication in production today, and have for years. When combined with remote replication for disaster recovery, mainframe data centers have found it to be significantly more reliable than traditional tape-based approaches.