Once upon a time, we had a simple archive strategy in the mainframe world. Software-based Hierarchical Storage Management (HSM) leveraged mainframe hardware architectural concepts such as storage classes to provide an easy-to-understand method for moving data across storage tiers over time until at last it found its way to magnetic tape. Along the way, we flirted with using additional software components to better manage what data was being stored where and for how long. We classified data based on factors beyond simplistic criteria such as date last used or modified, and we created policies that mapped data migration to business use cases. In short, we evolved the concept of archive into a concept of information lifecycle management.
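To make the idea concrete, here is a minimal sketch of that kind of policy in Python. It is not how any particular HSM product works. The tier names, the age thresholds, the `business_class` label, and the "pinned tier" override are all assumptions invented for illustration. It shows only the two ideas described above: data drifts toward tape as it ages, and a business classification can override the simple date-based rule.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical tiers, ordered from fastest to cheapest, each with the minimum
# idle age at which data becomes eligible for it. Thresholds are illustrative.
TIERS = [
    ("primary_disk", timedelta(days=0)),
    ("secondary_disk", timedelta(days=30)),
    ("tape", timedelta(days=180)),
]

# Hypothetical business-driven override: data in these classes stays on the
# named tier no matter how long it has been idle.
PINNED_TIER_BY_CLASS = {"legal_hold": "primary_disk"}


@dataclass
class DataSet:
    name: str
    last_used: date
    business_class: str = "general"  # illustrative classification label


def target_tier(ds: DataSet, today: date) -> str:
    """Return the tier a data set should live on as of `today`."""
    # The business classification wins over the age-based rule.
    pinned = PINNED_TIER_BY_CLASS.get(ds.business_class)
    if pinned is not None:
        return pinned

    # Otherwise pick the deepest tier whose age threshold has been reached.
    age = today - ds.last_used
    chosen = TIERS[0][0]
    for tier, min_age in TIERS:
        if age >= min_age:
            chosen = tier
    return chosen


if __name__ == "__main__":
    today = date(2012, 1, 1)
    for ds in (
        DataSet("PAYROLL.CURRENT", date(2011, 12, 25)),
        DataSet("PAYROLL.Q3", date(2011, 11, 1)),
        DataSet("PAYROLL.Y2010", date(2011, 1, 1)),
        DataSet("CONTRACTS.Y2010", date(2011, 1, 1), "legal_hold"),
    ):
        print(ds.name, "->", target_tier(ds, today))
```

The date-based rule alone would send both year-old data sets to tape. The classification override is what turns a plain archive schedule into something closer to lifecycle management.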
That worked pretty well for a while, at least until transactional data gave way to file-based data. In the traditional mainframe shop, which wasn’t exactly a user computing platform, user files weren’t the problem that they ultimately became for server and storage administrators in the distributed computing world. Still, by IDC’s accounting, files surpassed database output as the predominant form of data being created by organizations sometime in 2005. Now, most companies are drowning in them.
Files have proved to be a more challenging archive target than their more atomic peers in the transaction data world. While files share some things in common with database output (they occupy space, are stored as bits, etc.), they’re much harder to manage properly. In part, it’s because of the nature of a file: a self-contained entity that isn’t clearly part of a larger entity such as a database. Another problem is the sheer number of files. Let’s look at the second issue first.
For those old enough to remember the iconic Star Trek episode, “The Trouble With Tribbles,” files are a lot like those little furry beasties that are good for giving affection, but tend to get into the machinery, occupy every nook and cranny of available space, and make work generally more difficult for everyone. The more files you have, the harder it is to find just the one you’re seeking. That slows down operational performance during search and retrieval, and creates huge costs when the right files must be found quickly in response to a subpoena or summons.
While it’s a misnomer to call files unstructured data (file systems are highly structured, after all), they do tend to be non-conformist. Users don’t name their files in any sort of intelligent or consistent manner. They follow no rules that can be leveraged readily to group files into categories that reflect their purpose, relevance, or fit to external business reality (a project, a marketing initiative, etc.). Hence, in most organizations, files are anonymous and mysterious data that consume precious resources and drive up both CAPEX (storage equipment) and OPEX (labor, backup, and energy) costs at the exact time management wants to reduce both.
Files are thought to be important, since any file might possess a unique business insight that makes it a “crown jewel” or a “smoking gun” in the future. So, we can’t just delete them. But it’s clearly above the pay grade of most IT folk to take on the task of getting to know each file intimately or to come up with some clever way to sort them into a system of orderly categories (a signature attribute of a database, by the way).
This is the long way of introducing what I suspect will become an arc of columns concerning archive. Why do I want to pursue this focus? Simply put, I believe file archive (whether we call it that or not) is quite likely to become the “killer app” of 2012—not only for distributed IT mavens, but for mainframers, too. For the latter, the issue could mean redemption after years of mainframes being cast as dinosaur technology.
While file archiving has traditionally fallen outside the domain of mainframers, if IBM is successful with its zEnterprise initiative (blending the distributed and mainframe worlds into some sort of coherent whole), responsibility for solving the file archive quagmire promises to fall into the laps of mainframers. We will need to apply the discipline and insight cultivated over the past half century of mainframe operations to come up with some answers.
In the next column, we will look more deeply at IBM SONAS and other technologies for file archiving, including tape file systems, that may provide some building blocks for an effective archive file storage repository. If anyone can solve this file archive conundrum, I will just bet it’s the mainframer.