Back in the day, storage management at a large, multibillion-dollar global enterprise was comparatively easy. Storage was directly attached to servers, networks were “hard-wired” in place and virtualization was a concept rather than a reality. You need more capacity? You buy more storage. You need a business continuity solution? You buy more storage. You need to back up data? You buy more storage. You need storage at a remote branch office? You buy more storage. Simple, right?
Fast-forward to today. Cost pressures have forced IT managers to get more out of existing storage and servers. No longer is the solution to buy more. Instead, storage managers are looking to use more of the storage capacity they already have and to use that capacity more efficiently—all with the goal of driving down storage media costs, storage management costs and energy costs. New technologies, such as storage virtualization, storage networking, storage tiering, “flash” Solid-State Disk (SSD) storage, thin provisioning, deduplication and data compression have made this possible.
Previous articles have discussed storage management, storage virtualization/storage hypervisors, storage networking and storage tiering, which relies heavily on SSD flash. (For more information, please see “Storage Tiering: Optimizing Storage Price/Performance in the ‘Big Data’ Era” available at http://entsys.me/ja1vv and “Multi-Platform Storage Management: Reducing Complexity in Virtualized Computing Environments” available at http://entsys.me/wshj7.)
Here we will focus more closely on three specific storage efficiency enablers—thin provisioning, deduplication and data compression.
The idea behind thin provisioning is that applications consume only the capacity they’re actually using rather than the total capacity that has been allocated. In this way, disk storage can be shared among multiple users, improving efficiency and providing savings on hardware, energy and space. One way to think about this is that a storage volume has both a real capacity and a virtual capacity. In the absence of thin provisioning—that is, in a fully allocated volume—real capacity and virtual capacity are the same. But in a thin-provisioned volume, virtual capacity will be much larger than the real capacity. This enables a scenario where future growth of an application can be accommodated without assigning storage capacity before it’s actually needed—providing scalability in a “pay-as-you-grow” model. For example, a storage administrator might expect an application to grow to require 100 TB (virtual capacity), but that application only needs 20 TB (real capacity) today. By using thin provisioning, physical capacity is minimized while still providing for future growth.
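The real-versus-virtual capacity distinction can be sketched in a few lines of code. This is a minimal, illustrative model (the class and names are hypothetical, not any vendor's API): the host sees the full virtual capacity, but real blocks are consumed only when data is actually written.

```python
# Minimal sketch of thin-provisioning accounting (illustrative only).
# Real capacity is consumed block by block, on first write, while the
# host sees the full virtual capacity from day one.

class ThinVolume:
    def __init__(self, virtual_capacity_blocks):
        self.virtual_capacity = virtual_capacity_blocks  # capacity the host sees
        self.allocated = {}  # virtual block -> data; grows only on demand

    @property
    def real_capacity_used(self):
        return len(self.allocated)  # real blocks actually consumed

    def write(self, block, data):
        if block >= self.virtual_capacity:
            raise ValueError("write beyond virtual capacity")
        self.allocated[block] = data  # real block allocated only now

    def read(self, block):
        # blocks never written read back as zeros
        return self.allocated.get(block, b"\x00")

# A volume presented as 100 blocks, of which only 2 are really used:
vol = ThinVolume(virtual_capacity_blocks=100)
vol.write(0, b"app data")
vol.write(1, b"more data")
print(vol.virtual_capacity)    # 100
print(vol.real_capacity_used)  # 2
```

In the article's example, the same logic applies at TB scale: a 100 TB virtual volume backed by only 20 TB of physical disk until the application actually grows.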
IBM Thin Provisioning
IBM supports thin provisioning to improve storage utilization across its range of storage systems. It works like this: The real capacity is used to store data that’s written to the volume as well as the metadata that describes the thin-provisioned configuration of the volume. In general, thin provisioning improves efficiency by optimizing the utilization of available storage, thereby reducing capital costs and postponing the need to purchase new storage devices. Thin provisioning also simplifies server administration, since volumes can be configured with a large virtual capacity, and as application needs change, real capacity is transparently and dynamically allocated without administrator intervention and without any disruption to the application. IBM XIV includes thin provisioning with space reclamation, which means zeroing out deleted space and returning that capacity to a global pool of storage, allowing unused space to be shared across the entire storage infrastructure.
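The space-reclamation idea described above can be illustrated with a toy model of volumes drawing from one shared pool. This is a hedged sketch with made-up names, not the XIV implementation: the point is only that zeroing out deleted blocks returns real capacity to a global pool, where any other volume can reuse it.

```python
# Illustrative sketch (not an IBM XIV API): several thin volumes draw
# real capacity from one shared pool, and zeroed-out blocks are
# reclaimed back into that pool for any volume to reuse.

class SharedPool:
    def __init__(self, real_blocks):
        self.free = real_blocks

    def take(self):
        if self.free == 0:
            raise RuntimeError("pool exhausted")
        self.free -= 1

    def give_back(self):
        self.free += 1


class PooledThinVolume:
    def __init__(self, pool):
        self.pool = pool
        self.blocks = set()  # virtual blocks that hold real capacity

    def write(self, block):
        if block not in self.blocks:
            self.pool.take()       # real capacity allocated on first write
            self.blocks.add(block)

    def zero(self, block):
        if block in self.blocks:   # space reclamation: block zeroed out
            self.blocks.remove(block)
            self.pool.give_back()  # capacity returns to the global pool


pool = SharedPool(real_blocks=10)
a, b = PooledThinVolume(pool), PooledThinVolume(pool)
a.write(0); a.write(1); b.write(0)
print(pool.free)  # 7
a.zero(1)         # deleted space is reclaimed...
b.write(5)        # ...and immediately reusable by another volume
print(pool.free)  # 7
```

The design choice worth noting is that reclamation happens at the pool level, which is why unused space can be shared across the entire storage infrastructure rather than stranded inside one volume.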
Let’s move on to other storage efficiency features, deduplication and compression. Bear in mind that thin provisioning doesn’t reduce the actual amount of data being stored; it just makes better use of available capacity. Deduplication and compression actually reduce the volume of data being stored, which also improves storage efficiency. Deduplication analyzes stored data and looks for files or large sections of files that are the same, so that only one copy of that file or data is stored. Compression looks for redundancy within a file (i.e., short, repeated substrings) and eliminates it, typically using Lempel-Ziv (LZ) compression algorithms. Block storage deduplication works in the same way (see the following section). Storage administrators can take advantage of both features to dramatically reduce the amount of required storage.
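The contrast between the two techniques can be shown concretely. In this sketch (illustrative helper names, not a product implementation), deduplication keeps one copy of each unique chunk, indexed by a hash of its contents, while compression shrinks a single stream by eliminating its internal repetition using Python's standard `zlib` module, an LZ77-family (DEFLATE) implementation.

```python
import hashlib
import zlib

# Deduplication sketch: store each unique chunk once, keyed by its
# content hash, plus an ordered "recipe" of hashes for reconstruction.
def dedup_store(chunks):
    store = {}   # hash -> unique chunk data
    recipe = []  # ordered hashes to rebuild the original stream
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only one copy of each chunk kept
        recipe.append(digest)
    return store, recipe

chunks = [b"block-A", b"block-B", b"block-A", b"block-A"]  # redundant data
store, recipe = dedup_store(chunks)
print(len(chunks), len(store))  # 4 logical chunks, only 2 stored

# Compression sketch: zlib removes repeated substrings within one stream.
data = b"abcabcabc" * 100  # highly repetitive payload
compressed = zlib.compress(data)
print(len(data), len(compressed))  # compressed form is far smaller
assert zlib.decompress(compressed) == data  # lossless round trip
```

Note the different granularity: deduplication wins when identical chunks recur across files or backups, while compression wins on redundancy inside a single file, which is why the two are complementary rather than interchangeable.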
Deduplication technology originated in the backup space where many copies of the same data were being backed up in each backup cycle. Today, deduplication is still typically used with highly redundant data sets found in backup applications and sequential access workloads. Figure 1 illustrates the deduplication process.