Storage

My March/April column started a discussion of file archive, which I suggested might well be the real killer app this year. The simple fact is that in most businesses today, the volume of data stored as files exceeds the volume stored as transactional, or “block,” data. IBM, through its zEnterprise kit and strategy, is working to have the mainframe shoulder more of the workload currently hosted on x86 server platforms, where much of today’s file-based data is generated. So it seems likely that mainframers will soon confront the problem of what to do with all those pesky files.

File archive is, in a sense, a simplified version of the grander data archiving strategies that have been advanced (but rarely implemented) many times over the past decade. It involves taking older files that are rarely re-referenced and migrating them, file system and all, onto more economical media. The file archive repository needn’t be as nimble as primary storage, of course; the re-reference rates of the data don’t merit a high-cost, high-speed platform. To be honest, most of us are used to waiting when we access a 10-year-old document from a Web-based repository. What matters is that we can find the file (a process greatly simplified by using a file system and a straightforward search method) and retrieve it reliably within a timeframe that meets our needs.

This rationale appears to be behind the current fascination, in some quarters of the industry, with tape as an archive medium, but with a twist. Front-ending a tape library with a file system and using the latest tape technologies, including Linear Tape-Open generation 5 (LTO-5) and those enterprise tape products that support partitioning, lets the resulting kit serve as a sort of “Network Attached Storage (NAS) on steroids.”

This is quite different from the common, but inefficient, practice of using tape backups as archives. A tape backup is a collection of bits in an organized structure, but it’s unwieldy for active archiving: retrieving a single file from a backup requires multiple time-consuming unpacking steps. An active archive, by contrast, aims to deliver a file within, say, two minutes at most, a target that NAS on steroids meets far more readily.

Moreover, an active archive (emphasis on active) is designed to be accessed by users, not by backup experts. Convenient file retrieval is enabled both by the familiar file system mentioned earlier and by providing access to the repository across a network using a popular file-sharing protocol such as the Network File System (NFS) or Common Internet File System (CIFS)/Server Message Block (SMB). Users already know how to retrieve a file from a networked file server or a specialty NAS appliance, which is basically the same thing for more money. Exposing a tape NAS through a network file system mount therefore has the added advantage of making the archive convenient and flexible.

In addition to partitioned tape, which lets the file system keep its index in a partition of its own so the drive can quickly find the starting location of a file recorded on the medium, we need a tape-friendly file system to make NAS on steroids work. IBM and others have provided file systems for tape for years, but 2010 arguably saw the introduction of the one that most closely mimics the file systems commonly used on PCs: the Linear Tape File System (LTFS). LTFS can be downloaded for free from IBM and deployed on just about any generic x86 server running a mainstream operating system. With this tape file system, you could cobble together a pretty good NAS on steroids from a server box running an NFS server, a couple of hard disks, a bit of inexpensive DRAM, a back-end tape system, and a LAN connection.
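Why an index makes such a difference can be sketched with a toy model that contrasts reading through a small index with scanning a backup-style stream. The class and function names below are hypothetical, and the model is not the LTFS on-tape format; it assumes only the general idea described above, that an index kept apart from the file data records where each file starts. "Bytes read" stands in for tape motion.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyPartitionedTape:
    """Hypothetical two-area tape: a small index plus a data area.

    The index maps each path to (offset, length) in the data area, so a read
    can position directly at the start of a file. Illustration only; this is
    not the real LTFS format.
    """
    index: dict[str, tuple[int, int]] = field(default_factory=dict)  # path -> (offset, length)
    data: bytearray = field(default_factory=bytearray)  # file contents, appended in order

    def write(self, path: str, content: bytes) -> None:
        # Record where the file starts, then append it to the data area.
        self.index[path] = (len(self.data), len(content))
        self.data.extend(content)

    def read(self, path: str) -> tuple[bytes, int]:
        """Return (content, bytes_read); only the requested file is read."""
        offset, length = self.index[path]
        return bytes(self.data[offset:offset + length]), length


def read_by_scanning(records: list[tuple[str, bytes]], path: str) -> tuple[bytes, int]:
    """Backup-style restore: pass over every record until the target appears.

    Returns (content, bytes_read), where bytes_read counts everything streamed
    past, including the files that precede the target.
    """
    bytes_read = 0
    for name, content in records:
        bytes_read += len(content)
        if name == path:
            return content, bytes_read
    raise FileNotFoundError(path)


# Hypothetical sample: three files written to tape in order.
files = [("a.doc", b"x" * 1000), ("b.doc", b"y" * 2000), ("c.mov", b"z" * 500)]

tape = ToyPartitionedTape()
for name, content in files:
    tape.write(name, content)

_, indexed_cost = tape.read("c.mov")
_, scan_cost = read_by_scanning(files, "c.mov")
print(f"indexed read: {indexed_cost} bytes, backup-style scan: {scan_cost} bytes")
```

In this toy case the indexed read touches only the 500 bytes of the requested file, while the scan streams past all 3,500 bytes. On real media the gap shows up as tape motion, and therefore time.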

To reduce the cobbling, IBM offers a NAS technology, Scale Out Network Attached Storage (SONAS), which can be used with Tivoli Storage Manager to support an expansive tape NAS scenario. You can distribute your NAS-on-steroids environment across several locations, say, across your branch office network. Alternatively, you can provide an “archive in the cloud” solution, using pre-integrated front-end systems and your preferred partitioning-capable tape library.

This approach makes enormous sense for “long file” or “big file” storage applications, such as archiving broadcast or surveillance video or human genome data sets. Tape NAS shines in these scenarios because, once the requested file has been located on tape, delivering it is more efficient than retrieving the same file from a disk repository: for long files, tape streams data faster than most disk repositories do.
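The streaming argument can be made concrete with simple arithmetic: retrieval time is the latency to the first byte plus file size divided by streaming rate, so a faster-streaming medium overtakes a lower-latency one once files are large enough. The sketch below assumes LTO-5's native (uncompressed) transfer rate of about 140 MB/s; the 60-second tape load-and-locate time and both disk figures are hypothetical placeholders, not measurements.

```python
def retrieval_seconds(size_mb: float, first_byte_s: float, stream_mb_s: float) -> float:
    """Time to deliver a file: latency to the first byte plus streaming time."""
    return first_byte_s + size_mb / stream_mb_s


def break_even_mb(tape_latency_s: float, tape_mb_s: float,
                  disk_latency_s: float, disk_mb_s: float) -> float:
    """File size (MB) above which the tape path finishes first.

    Solves tape_latency + s/tape_rate == disk_latency + s/disk_rate for s.
    Only meaningful when tape streams faster than disk.
    """
    if tape_mb_s <= disk_mb_s:
        raise ValueError("tape never catches up unless it streams faster than disk")
    return (tape_latency_s - disk_latency_s) / (1 / disk_mb_s - 1 / tape_mb_s)


# LTO-5 native transfer rate; every other figure is hypothetical.
TAPE_MB_S = 140.0
TAPE_LATENCY_S = 60.0   # hypothetical mount + locate time
DISK_MB_S = 100.0       # hypothetical sustained rate of a disk repository
DISK_LATENCY_S = 0.01   # hypothetical disk access latency

crossover = break_even_mb(TAPE_LATENCY_S, TAPE_MB_S, DISK_LATENCY_S, DISK_MB_S)
video_mb = 500_000  # a hypothetical 500 GB video file
tape_s = retrieval_seconds(video_mb, TAPE_LATENCY_S, TAPE_MB_S)
disk_s = retrieval_seconds(video_mb, DISK_LATENCY_S, DISK_MB_S)
print(f"crossover ~{crossover / 1000:.1f} GB; 500 GB file: tape {tape_s:.0f} s, disk {disk_s:.0f} s")
```

Under these assumed figures the crossover falls at roughly 21 GB, and a 500 GB file arrives in about 3,631 seconds from tape versus about 5,000 from disk. A faster disk array or a slower tape mount moves the crossover, which is why the argument applies to long files rather than small ones.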

There are other reasons to consider tape NAS as a file repository, and I will explore them in my next column.