DB2 & IMS

Many people in the computer field are surprised when they learn that tape was available before disk. Think about your favorite old TV shows or movies; when the director wanted to show data being processed, you would see flickering lights on part of the CPU or console area. Often, you would see the old-fashioned tape reel being written to or read. Tape was one of the earliest means of storing and conveying large amounts of data and information. IBM introduced the 726 tape unit in 1952, while the first IBM disk, the 350, was introduced in 1956.

When it comes to data, DB2 professionals generally think about disk, but most sites store four to 15 times more data on tape than on disk. Today, a variety of tape types are available, and some tapes aren’t tapes at all but rather disk. IBM groups its tape offerings into three broad categories: manual, automated and virtual. Cartridges and disks have generally replaced the old tape reels, bringing better reliability, better performance and a smaller floor-space footprint. Gone are the days when a computer operator had to frequently clean the tape drives, splice broken tape or try to unkink a reel when it bunched up.

Manual tapes still exist in the form of cartridges. Backing up a 3390 mod 3 no longer consumes five tape reels. Modern cartridges can hold 4 TB of information, and it isn’t uncommon to place more than 10 full-volume disk backups on one cartridge. An Automated Tape Library (ATL) uses robotic components to mount cartridges. The beauty of the ATL is that tape management is fast, as it doesn’t rely on an operator to mount or move tapes around; it’s all handled by robotics. Neither manual nor ATL tapes use disk, so there’s no virtualized layer. One key drawback to consider with cartridges that hold up to 4 TB is what happens when a tape snaps; we’ll examine that later.

Some tape data sets never make it to tape; rather, they’re redirected to disk with products such as IBM Tape Mount Management (TMM) and Virtual Tape Facility for Mainframe (VTFM). Tape virtualization is more commonly done with the IBM TS7720 (tapeless tape) or the TS7740 (Hydra, which combines front-end disk with back-end tape). There are other tape technologies, such as deduplication, which won’t be discussed here. Tapeless tape is fast becoming a favorite at many sites: you see tape operations such as mount, unload and keep, but no tapes exist; it’s all RAID 6 disk. The TS7740, by contrast, is front-ended by RAID 5 disk but back-ended by physical tape.

Virtualization

Tape virtualization has many similarities to disk virtualization. One commonality is that both kinds of I/O device contain several powerful processors (generally Power7) to handle the virtualization and other requirements.

From an MVS perspective, virtualized tape operates as real tape; for example, tape mount commands are real. MVS issues these commands, but no tapes are involved. Mounting a scratch tape for a DB2 archive log or image copy data set shows you the total tape flow: Mount the tape, display what tapes are being used and unwind the tape. Although MVS shows you all these operations, none of them physically occurs.

One key drawback to these virtualized tapes is that MVS tells DB2 the data sets that reside on them are on real tape and therefore can’t be shared; they must be used serially. No parallel operations are possible, even when the data really resides on the disk portion of the device. Consider, for example, a DB2 archive log data set that resides on the disk portion of virtual tape and that 10 recovery jobs want to use in parallel. Unfortunately, the 10 jobs must execute serially; no parallelism is possible unless the archive log data set is copied back to real disk, so that DB2 knows it resides on disk. An alternative is to have Hierarchical Storage Management (HSM) migrate the archive log data sets from disk to tape, in which case they will be recalled to disk when required. The one exception to the parallelism issue is VTFM, which has the look of tape and resides on disk, but allows parallel requests by using Parallel Access Tape (PAT).
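To get a feel for what serialization costs, here is a rough back-of-the-envelope sketch in Python. It assumes each recovery job needs the archive log data set for a known number of minutes and ignores drive contention, mount time and any other overhead; the function names and the six-minute figure are illustrative only, not measurements.

```python
def serial_elapsed(job_minutes):
    """Elapsed time when jobs must use the data set one after another."""
    return sum(job_minutes)


def parallel_elapsed(job_minutes):
    """Idealized elapsed time when every job can read the data set at once."""
    return max(job_minutes, default=0)


# Hypothetical workload: 10 recovery jobs, each needing the archive log
# data set for about 6 minutes.
jobs = [6] * 10
print("serial:  ", serial_elapsed(jobs), "minutes")
print("parallel:", parallel_elapsed(jobs), "minutes")
```

Under these simplified assumptions, the serial case takes as long as all the jobs added together, while the parallel case takes only as long as the longest job.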

As a point of interest for DB2 professionals, the repository that manages tapes in IBM Virtual Tape Systems (VTSes) resides on DB2 for LUW (in this case, UNIX in the virtualized hardware). You can’t connect to that DB2 to read the repository, as it’s totally segregated.

For virtualized tape, the TS7720 is a disk-only system, while the TS7740 is front-ended by disk and back-ended by real tape. The disk portion of both the TS7720 and TS7740 is called Tape Volume Cache (TVC). Tape allocations go to the TVC, meaning that when DB2 allocates your archive log or image copy tape data sets, although the allocation looks like it went to tape, the data resides on disk; in this case, the TVC. For the TS7740, DB2 doesn’t deal with the back-end tape; we only know and care about the logical tape volume written to the TVC. One logical or physical tape can hold many data sets. If the archive log data set was written to tape volser 123456, that’s what’s important to us and DB2, not the real physical tape on which it resides.

Not all allocations to the TVC of the TS7740 should be treated the same, because although the TVC has a large amount of space, it’s limited relative to the total space required. DB2 professionals should tell the storage administrator how long to keep data on the TVC vs. the physical tape. What are the chances you will need your archive log or image copy data sets in the near future? If the chances are low, the storage administrator can send your specific data sets off to tape more quickly and leave the TVC more space for data sets that require faster access. TVC residency time is influenced by the SMS Storage Class value for Initial Access Response Time (IART), referring to the Preference level.
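The idea of preferring some data sets for cache residency can be illustrated with a toy model. The Python sketch below is not how the TS7740 actually manages its TVC; it simply assumes two preference levels (represented by a hypothetical keep_in_cache flag), assumes every volume already has a copy on physical tape, and removes low-preference, least recently used volumes first. All names and sizes are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogicalVolume:
    volser: str
    size_gb: int
    keep_in_cache: bool  # hypothetical stand-in for the preference level
    last_used: int       # larger value = used more recently


def choose_evictions(volumes, space_needed_gb):
    """Pick volsers to drop from cache until enough space is freed.

    Low-preference volumes go first; within a preference level,
    the least recently used volume goes first.
    """
    candidates = sorted(volumes, key=lambda v: (v.keep_in_cache, v.last_used))
    freed = 0
    evicted = []
    for vol in candidates:
        if freed >= space_needed_gb:
            break
        evicted.append(vol.volser)
        freed += vol.size_gb
    return evicted


cache = [
    LogicalVolume("ARCH01", 10, keep_in_cache=True, last_used=1),
    LogicalVolume("IMGC01", 10, keep_in_cache=False, last_used=5),
    LogicalVolume("IMGC02", 10, keep_in_cache=False, last_used=2),
]
print(choose_evictions(cache, 15))
```

In this example the two image copy volumes leave the cache before the archive log volume, even though the archive log volume was used least recently, because it was flagged as preferred for cache residency.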

Logical tapes that are moved to physical tapes in the TS7740 are stacked, similar to the way HSM stacks data sets on HSM-owned tapes when using the MOD approach. This avoids tape waste. If you wrote 20 MB to a 4 TB manual tape and never appended (MOD) further data to it, then that 20 MB is all that’s used of the 4 TB, which is an enormous waste. If five data sets reside on one logical tape and only data set three is required, the entire logical tape is brought back into the TVC. Reading directly from a manual, ATL or TS7720 tape may be much faster than reading from the TS7740 when a data set resides on physical tape, because the logical volume must first be recalled from physical tape into the TVC and only then read. This is similar to the way we use disk: data goes through the cache on reads and writes.
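The waste in the unstacked case is easy to quantify. The following Python sketch works through the 20 MB on 4 TB example and shows a simplified stacking routine; it assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), and the in-order packing rule is an illustration rather than the algorithm the TS7740 or HSM actually uses.

```python
MB = 10**6   # assumption: decimal megabytes
TB = 10**12  # assumption: decimal terabytes


def utilization(used_bytes, capacity_bytes):
    """Fraction of a cartridge that actually holds data."""
    return used_bytes / capacity_bytes


def stack(logical_sizes, capacity_bytes):
    """Append logical volumes to cartridges in order.

    A new cartridge is started only when the next volume does not fit.
    Returns a list of cartridges, each a list of volume sizes.
    """
    cartridges = [[]]
    free = capacity_bytes
    for size in logical_sizes:
        if size > capacity_bytes:
            raise ValueError("logical volume larger than a cartridge")
        if size > free:
            cartridges.append([])
            free = capacity_bytes
        cartridges[-1].append(size)
        free -= size
    return cartridges


# One 20 MB data set alone on a 4 TB cartridge.
print(f"unstacked utilization: {utilization(20 * MB, 4 * TB):.4%}")

# Hypothetical case: 1,000 logical volumes of 20 MB each.
stacked = stack([20 * MB] * 1000, 4 * TB)
print("cartridges needed when stacked:", len(stacked))
```

Without stacking, those 1,000 volumes would occupy 1,000 cartridges, each almost entirely empty; stacked, they fit on a single cartridge with room to spare.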
