Feb 1 ’06

zStorage Industry Update and Trends

by the z/Journal Editor

Some storage trends can be fairly predictable and consistent. For example, more data is being stored, with more copies being made and retained for longer periods. Likewise, it’s safe to say storage capacity, performance, and features will continue to advance. The remainder of this column covers highlights from 2005 and early 2006, along with recent developments that will continue in 2006.

Compared to the open systems environment, the number of new vendors in the mainframe storage market is rather small. However, from a storage perspective, new development is still taking place for mainframes, as last year’s announcements demonstrated.

For Storage Resource Management (SRM), Microsoft Excel remains a popular tool in non-mainframe environments. Excel doesn’t do the data collection, modeling, and other advanced functions found in traditional performance and capacity planning or SRM-type tools, but it’s good enough for many environments. Some vendors have figured out that by linking SRM to traditional server and storage performance and capacity planning, they spend less time on missionary work (i.e., educating customers about the importance of resource monitoring) and more time discussing solutions.

Data classification is another storage topic to keep an eye on. Solutions for early adopters of data classification technology are emerging, but deep data classification remains in its infancy, especially for large, mission-critical production environments. Learn the differences between content-based and context-based classification and how each applies to your environment.
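As a rough illustration of that distinction, the following Python sketch (hypothetical, not drawn from any particular product) classifies a file two ways: context-based classification uses only metadata such as location and file type, while content-based classification opens the file and inspects what it contains. The paths, extensions, and pattern below are made-up examples.

import re

# Purely illustrative pattern for "sensitive" content (SSN-like strings)
SENSITIVE_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def classify_by_context(path: str) -> str:
    """Context-based: decide from metadata only (location, file type)."""
    if path.startswith("/finance/") or path.endswith((".xls", ".xlsx")):
        return "business-record"
    return "general"

def classify_by_content(path: str) -> str:
    """Content-based: open the file and look at what it actually contains."""
    with open(path, errors="ignore") as f:
        if SENSITIVE_PATTERN.search(f.read()):
            return "sensitive"
    return "general"

A real product would combine both approaches and scale them far beyond this sketch, but the split between "what the file is" and "what the file says" is the same.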

Data differencing, also known as data de-duplication, is a technique for eliminating duplicate data; it shouldn’t be confused with compression. Examples of technologies using data differencing include Wide Area File Services (WAFS) and WAN bandwidth optimization solutions, as well as disk-based backup libraries used for backup, replication, and mirroring. By identifying duplicate files, a solution needs to keep only one copy of each file, which reduces the capacity required. Another example is reducing the number of duplicate data blocks that must be sent over a network link, which reduces latency and optimizes bandwidth.
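As a rough sketch of the file-level case (a minimal example, not how any particular product works), the following Python code hashes file contents and keeps only the first copy of each unique hash; later duplicates could be replaced with references to the stored copy.

import hashlib

def dedupe_files(paths):
    """Keep one stored copy per unique file content.

    Returns a mapping of content hash -> canonical path, plus a list of
    (duplicate_path, canonical_path) pairs that need not be stored again.
    """
    store = {}        # content hash -> first path seen with that content
    duplicates = []   # files whose content is already stored

    for path in paths:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        digest = h.hexdigest()

        if digest in store:
            duplicates.append((path, store[digest]))  # duplicate: reference it
        else:
            store[digest] = path                      # first copy: keep it

    return store, duplicates

The block-level variant used by WAN optimization and disk libraries applies the same idea to fixed- or variable-size chunks rather than whole files.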

Confusion lingers as to what is and isn’t a grid, as well as whether a grid is a service, a product, or an architectural approach. Will 2006 be the year of the grid? That depends on your interpretation of what a grid is. It’s similar to ILM; both terms are used liberally to refer to different things. The term grid is being used to refer to on-demand services, including remote storage capacity for sale or rent, managed services, compute clusters and server farms, storage systems composed of clustered general-purpose processors, and many other things.

Ask yourself a few questions about what a grid would actually mean for your environment before buying in.

Data protection, security, and business continuance should remain popular issues for the foreseeable future, given increased media coverage and awareness of information and data loss as well as regulatory and data privacy pressures.

To address security requirements, the focus has been on increased use of encryption and reducing the physical movement and handling of data. Key management is an important aspect of encryption. If you encrypt your archives, will you be able to access and unlock your encrypted data in several years? How will you manage the keys and tools to encrypt and decrypt data? Where to encrypt is another question; should it occur while the data is at rest on storage media, while in transit, or via a combination of approaches?
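To make the key-management concern concrete, here is a minimal Python sketch, assuming the third-party "cryptography" package; the function names and key store are illustrative, not from any product. The point is that the ciphertext is only as durable as your ability to locate the matching key years later.

from cryptography.fernet import Fernet

def encrypt_archive(plaintext: bytes, key_store: dict, key_id: str) -> dict:
    """Encrypt an archive and record which key was used."""
    key = key_store.setdefault(key_id, Fernet.generate_key())
    token = Fernet(key).encrypt(plaintext)
    # Store the key identifier (not the key itself) with the archive so the
    # right key can be located when the data must be unlocked later.
    return {"key_id": key_id, "ciphertext": token}

def decrypt_archive(record: dict, key_store: dict) -> bytes:
    """Decryption depends entirely on the key still being retrievable."""
    key = key_store[record["key_id"]]           # fails if the key was lost
    return Fernet(key).decrypt(record["ciphertext"])

key_store = {}                                  # stand-in for a real key manager
record = encrypt_archive(b"2005 year-end data", key_store, key_id="archive-2005")
print(decrypt_archive(record, key_store))       # works only while the key survives

A production deployment would replace the in-memory dictionary with a managed, backed-up key management system, and would also decide whether encryption happens at rest, in transit, or both.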

Disk-based data protection, including backup, mirroring or replication, Point-in-Time (PIT) copy and snapshots, along with long-term retention (archive), remains popular. High-capacity disk drive deployments, including ATA and Serial ATA (SATA), remain popular as secondary storage. High-capacity Fibre Channel disk drives are also finding their way into enterprise-class storage systems as an alternative to SATA disk drives for storage-centric and tiered storage applications. SATA disk drives are available in capacities up to 500GB, while high-capacity Fibre Channel disk drives are at 400GB.

While not new in 2005, Continuous Data Protection (CDP) received more coverage and debate due to the arrival of products from established vendors such as Microsoft, IBM and EMC, among others. One way to look at CDP is to consider your Recovery Time Objective (RTO), or how long you can afford your data to be unavailable, along with Recovery Point Objectives (RPOs), or how much data you can afford to lose. The traditional technique has been to perform scheduled full and regular incremental or differential backups perhaps combined with some journaling and replication of data locally or remotely. Your RTO and RPO may require that no data be lost and little to no downtime incurred. Another variation would be that you can afford some downtime with some data loss. For some people, CDP means all data is constantly protected and can be recovered to a particular state (RPO) in a short time if not instantaneously (RTO) with a fine granularity.

Near-CDP refers to coarser RPO and RTO granularity than pure CDP provides, though still finer than traditional backup. An example of near-CDP is Microsoft Data Protection Manager (DPM), which has a default granularity of an hour. Compared to pure CDP, which enables an RPO and RTO of zero (no data loss or disruption), near-CDP in the Microsoft example would have a default RPO of one hour or less.
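To make the RPO comparison concrete, here is a small, purely illustrative Python calculation; the scheme names and intervals are assumptions for the example, not product specifications. The worst-case data loss is roughly the interval between protection points, because data written just after one protection point is exposed until the next.

from datetime import timedelta

def worst_case_rpo(protection_interval: timedelta) -> timedelta:
    """Worst-case data loss window equals the gap between protection points."""
    return protection_interval

schemes = {
    "nightly incremental backup": timedelta(hours=24),
    "near-CDP (e.g., hourly snapshots)": timedelta(hours=1),
    "pure CDP (every write captured)": timedelta(seconds=0),
}

for name, interval in schemes.items():
    print(f"{name}: worst-case data loss ~ {worst_case_rpo(interval)}")

Working through the same arithmetic with your own backup schedule is a quick way to see whether your actual exposure matches your stated RPO.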

While pure CDP continues to mature and gain traction, the real growth area for CDP will be with near-CDP for Small and Medium-Size Businesses (SMBs) and Small Office/Home Office (SOHO) environments using cost-effective solutions such as those based on Microsoft DPM for Windows. Some solutions, such as Microsoft DPM, are block- or volume-based, while others, such as IBM Tivoli Continuous Data Protection for Files, are file-based. There are advantages to both; depending on your needs, you may need a combination of technologies.

Additional storage and networking trends and improvements include:

If you’re running or considering multiple Linux images on zSeries processors, you’ll want to be aware of N_Port ID Virtualization (NPIV). NPIV virtualizes physical Fibre Channel adapter ports, presenting a virtual N_Port to each image sharing an adapter. Without NPIV, each Linux image would need its own physical adapter to have a unique N_Port ID and World Wide Port Name (WWPN). That’s important because NPIV enables each virtual N_Port to have its own unique WWPN that can be used by storage-based Logical Unit (LUN) and volume masking and mapping features while sharing a physical adapter.

Figure 1 shows a zSeries mainframe with four Logical Partitions (LPARs), one supporting z/OS and three supporting Linux. The z/OS image has two physical channel adapters configured for FICON for redundancy, while Linux Images A and B each have a single adapter (no redundancy) and Linux Image C has two adapters for Fibre Channel FCP. The dashed lines indicate the primary data path with the solid line (from the switch to mainframe and storage devices) indicating the redundant path. A disadvantage of this configuration is that channel adapters need to be dedicated to the Linux LPARs unless a shared adapter is configured. The downside of using a shared adapter across the LPARs for Linux without using NPIV would be the inability to guarantee unique access from a specific Linux image to a specific LUN mapped to the shared physical port.

 

The solution is to use NPIV, as seen in Figure 2, where a shared adapter presents a unique virtual N_Port for each image, enabling a LUN to be mapped to a specific Linux image for security and data integrity.
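As a toy illustration (not any vendor’s interface), the following Python sketch models LUN masking keyed by WWPN; the WWPNs and LUN names are made up. It shows why NPIV matters on a shared adapter: the storage array can only grant or deny access per WWPN, so images that all present the same physical WWPN cannot be told apart.

# Masking table as a storage array might conceptually hold it:
# initiator WWPN -> set of LUNs that initiator may access.
lun_masking = {
    "50:05:07:6a:00:00:00:01": {"LUN 0"},   # Linux Image A's virtual N_Port
    "50:05:07:6a:00:00:00:02": {"LUN 1"},   # Linux Image B's virtual N_Port
    "50:05:07:6a:00:00:00:03": {"LUN 2"},   # Linux Image C's virtual N_Port
}

def can_access(wwpn: str, lun: str) -> bool:
    """The array checks the initiator's WWPN against its masking table."""
    return lun in lun_masking.get(wwpn, set())

# With NPIV, each image logs in with its own virtual WWPN:
print(can_access("50:05:07:6a:00:00:00:01", "LUN 1"))  # False: Image A cannot see B's LUN

# Without NPIV, every image on the shared adapter would present the single
# physical WWPN, so the array could not restrict a LUN to one image.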

InfiniBand, which had been written off as a dead technology, started regaining attention in 2005. InfiniBand has found its niche as a high-performance, low-latency, server-to-server interconnect for server and compute clusters, also known as compute grids. InfiniBand is also being used for high-bandwidth access to storage from vendors such as Engenio. As an interconnect interface, InfiniBand supports multiple upper-level protocols and application interfaces, including iSER, Remote Direct Memory Access (RDMA), TCP/IP, xDAPL, and the SCSI RDMA Protocol (SRP), among others.

So does this signal it’s time to abandon Fibre Channel as a storage interface in favor of InfiniBand?  For most environments, probably not. However, for environments and applications that need or want to leverage InfiniBand, it provides an interesting alternative to Fibre Channel and Ethernet-based iSCSI. It’s still unclear what, if any, advantage InfiniBand may have for a pure IBM mainframe environment, but for those with mixed server environments, InfiniBand is a technology to watch.

The storage market can still be characterized as a buyer’s market if you’re a prudent buyer. You can expect another active year in the storage industry with continued development and adoption of previously announced products and technologies, along with new ones. Understanding these different technologies and techniques, along with where they might fit in your environment, enables you to make more effective decisions. Strive to develop a strategy for storage that implements the goals and objectives of your overall IT strategy and complements your server and networking strategies.