Storage

Being a parent of several teenagers (and a former teen myself), I’ve seen too many times how bad choices, reinforced by other bad choices, can lead to extraordinarily bad outcomes—even when intentions are pure and noble. The same seems to hold true in the realm of technology.

Years ago, the industry sought to solve the problem of growing capacity demand (and its associated costs) by moving from internal storage to external arrays. Nothing wrong with that, since direct-attached storage, like DASD in mainframes, tended to be managed mostly via the host operating system. Over time, however, again to address capacity and cost challenges, we figured out how to deploy a lot of storage arrays: on networks in the case of file server appliances or network attached storage (NAS) arrays, and on fabric interconnects in the case of storage area networks (SANs). Unfortunately, we didn’t anticipate the impact of scaling storage in the absence of any coherent way to manage all the plumbing.

Along came the storage virtualization vendors, who sought to economize on the expense of storage (mainly by breaking hardware vendor lock-in and divorcing expensive on-box, value-add software functionality from the hardware controller) and to make capacity allocation more agile.

They accomplished what they set out to do, including centralizing the management of the “services” of storage, but they did nothing to solve the problem of how to manage the hardware and plumbing of the real-world storage infrastructure.

Storage virtualization technology begat “cloud storage” architecture that, in its “private cloud” implementation model, enabled users to “self-allocate” their storage. Non-technical business managers saw this as a good thing because it reduced the labor cost of storage, namely the expense of skilled storage administrators. But again, cloud architecture provided little in the way of physical-layer management.

Today’s “software-defined storage” crowd is moving the ball even further, insisting it abstracts the very concept of a disk away from the hardware infrastructure altogether. Rather than tethering application software to specific “hard-wired” storage targets, software-defined storage presents a virtual volume to the application for storing its data. This virtual construct, being only software, can move with the workload as it slides from one virtualized server host to another, so changes to the hardware infrastructure need not disrupt application work. That too sounds marvelous, but, again, these enhancements to the storage narrative aren’t accompanied by any discussion of improved ways and means to manage the physical infrastructure.
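
To make that indirection concrete, here is a minimal sketch (in Python, with purely hypothetical names; no particular software-defined storage product’s API is implied) of an application holding a stable logical volume handle while the physical backing target is swapped out underneath it.

```python
"""Minimal sketch of the indirection software-defined storage relies on:
the application keeps a stable logical volume handle, while the mapping
to a physical backing target can be changed underneath it. All names
here are illustrative assumptions."""

class VirtualVolume:
    """A logical volume handle the application holds for its lifetime."""

    def __init__(self, name, backing_target):
        self.name = name
        self._backing = backing_target   # e.g., "array-A:lun-7" (hypothetical)

    def write(self, block, data):
        # The application addresses only the logical name; the physical
        # target is resolved at I/O time.
        print(f"{self.name}: writing block {block} via {self._backing}")

    def remap(self, new_target):
        # An infrastructure change (migration, host move) swaps the backing
        # target without disturbing the application's handle.
        self._backing = new_target


vol = VirtualVolume("app-data", "array-A:lun-7")
vol.write(0, b"hello")
vol.remap("array-B:lun-3")   # hardware change, no application disruption
vol.write(1, b"world")
```

The point of the sketch is the remap step: the volume the application sees never changes, which is exactly why the physical plumbing behind it tends to fall out of the conversation.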

Like teenagers seeking to repaint a car they never wash, everyone is aiming to dress up the presentation layer. We’re dead set on buying and implementing new and innovative technology for abstracting what we want from what we already have, patting ourselves on the back for being so cost-conscious. Missing, however, is any interest in attacking the much less exciting but much more fundamental and important problem of managing what we have more intelligently.

Total cost of ownership analyses suggest that storage costs far more to manage and administer over its useful life than it does to acquire and deploy. Some statistics suggest that as much as $6 may be spent annually on administration and maintenance for every dollar spent on hardware acquisition. Most of that administration work is performed manually, using the rudimentary toolkit provided with each storage array. Few firms have instrumented their storage infrastructure with the software needed to manage it holistically and to detect emerging problems so they can be rectified before they cause downtime. In fact, very few of us include “manageability” among the top 10 discriminators we use when buying hardware; this fact hasn’t been lost on vendors and is often cited to explain why they don’t work together to deliver a universal, standards-based approach for managing all of their gear in common: Consumers don’t reward the effort!

Meanwhile, we’re getting good information from root cause analyses of tens of thousands of storage array failures regarding the vulnerability of the physical infrastructure. It turns out that disks fail up to 1,500 times more frequently than vendors claim. Disk failures account for about half of storage system downtime, followed closely by interconnect failures (cabling, fans, power supplies, etc.), and further back by protocol issues (the software processes that enable a hosted application to read and write data to a specific location on disk). 

We could predict and resolve most disk failures simply by listening to the Self-Monitoring, Analysis and Reporting Technology (SMART) messages generated by virtually every disk to signal vibration, heat and other conditions that typically precede drive failure. Unfortunately, most of us haven’t deployed anything to listen with.
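
For illustration, here is a minimal sketch of what such a “listener” could look like, assuming the open source smartmontools package (specifically its smartctl utility) is installed and an ATA-style attribute table is returned; the device path and the watched attribute names are illustrative assumptions, not recommendations.

```python
#!/usr/bin/env python3
"""Minimal SMART 'listener' sketch: reads smartctl attribute output and
prints the attributes most often associated with impending drive failure.
Assumes smartmontools is installed, the script has privileges to query the
device, and the drive reports an ATA-style attribute table."""

import subprocess

DEVICE = "/dev/sda"  # hypothetical device path; adjust for your environment

# Attributes frequently watched as failure precursors (illustrative set)
WATCHED = {
    "Raw_Read_Error_Rate",
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "Temperature_Celsius",
}


def read_smart_attributes(device):
    """Return {attribute_name: raw_value} parsed from `smartctl -A`."""
    out = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True, text=True, check=False,
    ).stdout
    attrs = {}
    for line in out.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10
        # columns; the raw value is the tenth field.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9]
    return attrs


if __name__ == "__main__":
    for name, raw in sorted(read_smart_attributes(DEVICE).items()):
        if name in WATCHED:
            print(f"{name}: {raw}")
```

A real deployment would poll on a schedule, compare values against vendor thresholds and feed alerts into whatever monitoring system is already in place; the point is simply that the signals are already there waiting to be read.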

Bottom line: Just as I’m getting tired of my teens telling me they want some cool, new wall posters to cover up the holes they’ve made in their walls (rather than wallboard, joint compound and paint to actually fix the walls), I’m getting tired of all the new memes of storage architecture that ignore the basic fact that we aren’t managing the physical infrastructure well at all.