Feb 1 ’06
zStorage Industry Update and Trends
Some storage trends can be fairly predictable and consistent. For example, more data is being stored with more copies being made and retained for longer periods. Likewise, it’s safe to say storage capacity, performance, and features will continue to advance. Recent events that will continue in 2006 include:
- Continued vendor consolidation
- Larger capacity, higher performance, more reliable, and physically smaller devices
- Focus on data protection, retention, and regulatory compliance
- Data and information security (e.g., encryption, asset tracking, and media disposal)
- Interest in data classification and policy-based automated storage management
- Pursuit of interoperability across hardware and software
- Automated conversion from smaller mainframe volumes to larger capacity volumes
- Growth of distributed and remote replicated data as well as network bandwidth
- Continued interest in storage virtualization, Information Lifecycle Management (ILM), grids
- Continued interest in file-based access using Network-Attached Storage (NAS)
- Growing awareness of Wide Area Network (WAN) services and Wide Area File Services (WAFS)
Highlights from 2005 and early 2006 include:
- Sun acquired StorageTek (STK), Seagate is buying Maxtor, and Imation is buying Memorex
- Hewlett-Packard (HP) acquired storage and management software vendors AppIQ and Peregrine Systems
- EMC’s acquisitions, including storage virtualization vendor Rainfinity and pieces of defunct switch start-up Maranti Networks
- IBM refocused on NAS with a partnership with Network Appliance, which bought storage security startup Decru
- Symantec bought open systems storage software vendor VERITAS
- Cisco Systems made several acquisitions, including InfiniBand vendor Topspin Communications
- Brocade invested in WAFS and application accelerator vendor Tacit Networks
- Legacy storage networking and mainframe connectivity vendor McData acquired CNT, which had just digested its 2003 acquisition of INRANGE.
Compared to the open systems environment, the number of new vendors in the mainframe storage market is rather small. However, from a storage perspective, new development is still taking place for mainframes. Specifically, last year:
- EMC introduced the latest Symmetrix DMX-3 that now supports 2,400 disk drives
- Softek added its Logical Data Migration Facility (LDMF)
- Sun Microsystems (incorporating the former STK) offered enhanced disk and tape solutions
- Hitachi Data Systems (HDS) provided TagmaStore USP and new NSC models
- IBM made processor, disk, tape, and software enhancements; released FICON Express2 for z990 and z890 systems; and issued a statement of direction that the SAN Volume Controller (SVC) will gain zSeries support in the future, but only for FCP access by Linux images running on zSeries.
For Storage Resource Management (SRM), Microsoft Excel remains a popular tool in non-mainframe environments. Excel doesn’t do the data collection, modeling, and other advanced functions found in traditional performance and capacity planning or SRM-type tools, but it’s good enough for many environments. Some vendors have figured out that by linking SRM to traditional server and storage performance and capacity planning, they spend less time on missionary work (i.e., educating customers about the importance of resource monitoring) and more time discussing solutions.
Data classification is another storage topic to keep an eye on. Solutions for early adopters of data classification technology are emerging. Deep data classification technology remains in its infancy, especially for large, mission-critical production environments. Learn and understand the differences between content and context-based classification and how it applies to you.
Data differencing, also known as data de-duplication, is a technique for eliminating duplicate data; it shouldn’t be confused with compression. Examples of technologies using data differencing include WAFS and WAN bandwidth optimization solutions as well as disk-based backup libraries supporting backup, replication, and mirroring. By identifying duplicate files, only one copy of each file must be saved, which reduces the capacity needed. Another example is reducing the number of duplicate data blocks that must be sent over a network link, which improves latency and optimizes bandwidth.
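The file-level case described above can be sketched in a few lines. This is an illustrative toy, not any vendor's implementation: it hashes each file's content and stores only one real copy per unique hash, with every duplicate filename becoming a pointer to that copy.

```python
import hashlib

def deduplicate(files):
    """File-level de-duplication sketch.

    `files` maps filename -> bytes. Returns (store, index) where
    `store` holds exactly one copy of each unique content blob and
    `index` maps every filename to the hash of its content.
    """
    store = {}   # content hash -> the single stored copy
    index = {}   # filename -> content hash (a pointer, not a copy)
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest not in store:       # first time this content is seen
            store[digest] = data      # keep the one real copy
        index[name] = digest          # duplicates just point to it
    return store, index

# Three files, two with identical content: only two blobs get stored.
files = {"a.doc": b"quarterly report",
         "b.doc": b"quarterly report",
         "c.doc": b"budget"}
store, index = deduplicate(files)
print(len(files), "files,", len(store), "unique blobs stored")  # 3 files, 2 unique blobs stored
```

The same hash-and-point idea applies at the block level for the network case: blocks whose hashes the remote side already holds need not be resent.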
Confusion lingers as to what is and isn’t grid as well as what is a service, product, or architecture approach. Will 2006 be the year of the grid? That depends on your interpretation of what a grid is. It’s similar to ILM; both terms are used liberally to refer to different things. Grids are being used to refer to on-demand services, including remote storage capacity for sale or rent, managed services, compute clusters and server farms, storage systems composed of clustered general-purpose processors and many others.
Ask yourself a few questions:
- What is the real benefit of a grid solution? What is your storage access profile?
- Do you really need a grid or want one only because it’s the latest technology?
- Which of your applications need a grid? Is the grid for compute or storage purposes?
- How would delivery of storage services to your environment improve with a grid offering compared to traditional approaches?
Data protection, security, and business continuance should remain popular issues for the foreseeable future, given increased media coverage and awareness of information and data loss as well as regulatory and data privacy pressures.
To address security requirements, the focus has been on increased use of encryption and reducing the physical movement and handling of data. Key management is an important aspect of encryption. If you encrypt your archives, will you be able to access and unlock your encrypted data in several years? How will you manage the keys and tools to encrypt and decrypt data? Where to encrypt is another question; should it occur while the data is at rest on storage media, while in transit, or via a combination of approaches?
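The key-management concern above can be made concrete with a toy sketch. Everything here is hypothetical: the XOR "cipher" is merely a stand-in for real encryption such as AES, and the class and function names are invented. The point it illustrates is that each archive must carry the ID of the key that encrypted it, and that key must survive in an escrow/key store for as long as the archive does.

```python
import os

def xor_bytes(data, key):
    """Toy XOR 'cipher' standing in for real encryption; NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

class KeyStore:
    """Minimal key-escrow sketch: keys are retained by ID so an archive
    written today can still be unlocked years from now."""
    def __init__(self):
        self._keys = {}
    def create_key(self, key_id):
        self._keys[key_id] = os.urandom(16)
        return key_id
    def get(self, key_id):
        return self._keys[key_id]

def encrypt_archive(keystore, key_id, plaintext):
    # Tag the ciphertext with the key ID so that decryption, years
    # later, knows which key to request from the key store.
    return {"key_id": key_id,
            "ciphertext": xor_bytes(plaintext, keystore.get(key_id))}

def decrypt_archive(keystore, archive):
    return xor_bytes(archive["ciphertext"],
                     keystore.get(archive["key_id"]))

ks = KeyStore()
kid = ks.create_key("archive-2006-key")
arc = encrypt_archive(ks, kid, b"payroll records")
assert decrypt_archive(ks, arc) == b"payroll records"
```

Lose the key store (or rotate keys without re-encrypting or escrowing), and the archive above becomes unreadable even though the media is intact; that is the long-term access question posed above.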
Disk-based data protection, including backup, mirroring or replication, Point-in-Time (PIT) copy and snapshots along with long-term retention (archive), remains popular. High-capacity disk drive deployments, including ATA and Serial ATA (SATA), remain popular as secondary storage. High-capacity Fibre Channel disk drives are also finding their way into enterprise-class storage systems as an alternative to SATA disk drives for storage-centric and tiered storage applications. SATA disk drives are available in capacities up to 500GB, while high-capacity Fibre Channel disk drives are at 400GB.
While not new in 2005, Continuous Data Protection (CDP) received more coverage and debate due to the arrival of products from established vendors such as Microsoft, IBM and EMC, among others. One way to look at CDP is to consider your Recovery Time Objective (RTO), or how long you can afford your data to be unavailable, along with Recovery Point Objectives (RPOs), or how much data you can afford to lose. The traditional technique has been to perform scheduled full and regular incremental or differential backups perhaps combined with some journaling and replication of data locally or remotely. Your RTO and RPO may require that no data be lost and little to no downtime incurred. Another variation would be that you can afford some downtime with some data loss. For some people, CDP means all data is constantly protected and can be recovered to a particular state (RPO) in a short time if not instantaneously (RTO) with a fine granularity.
Near-CDP refers to a larger granularity for RPO and RTO, though one still finer than traditional backup provides. An example of near-CDP is Microsoft Data Protection Manager (DPM), which has a default granularity of an hour. Compared to pure CDP, which enables an RPO and RTO of zero (no data loss or disruption), near-CDP, as in the Microsoft example, has a default RPO of one hour or less.
While pure CDP continues to mature and gain traction, the real growth area for CDP will be near-CDP for Small and Medium-Size Businesses (SMBs) and Small Office/Home Office (SOHO) environments using cost-effective solutions such as those based on Microsoft DPM for Windows. Some solutions, such as Microsoft DPM, are block- or volume-based, while others, such as IBM Tivoli Continuous Data Protection for Files, are file-based. There are advantages to both; depending on your needs, you may need a combination of technologies.
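The RPO arithmetic behind the CDP spectrum above can be sketched simply: the worst-case data loss after a failure is the time elapsed since the most recent usable protection point. The helper below is illustrative only (the name and the one-minute approximation of "continuous" are assumptions), comparing an hourly near-CDP schedule, as in the DPM default above, against pure CDP.

```python
def data_loss_window(protection_points, failure_time):
    """Achieved RPO: minutes of data lost, i.e., time since the most
    recent protection point at or before the failure."""
    usable = [t for t in protection_points if t <= failure_time]
    if not usable:
        raise ValueError("no recovery point exists before the failure")
    return failure_time - max(usable)

# Near-CDP: hourly snapshots (minutes since midnight), the DPM-style default.
hourly = list(range(0, 24 * 60, 60))
print(data_loss_window(hourly, failure_time=150))  # 30 minutes of data lost

# Pure CDP: effectively a recovery point for every write; approximated
# here as one point per minute, so the loss window shrinks toward zero.
continuous = list(range(0, 24 * 60))
print(data_loss_window(continuous, failure_time=150))  # 0
```

RTO works the same way in the other direction: the finer and more accessible the recovery points, the less time restoration takes.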
Additional storage and networking trends and improvements include:
- 4Gb Fibre Channel continues to evolve with the availability of switches, host adapters, and storage systems. Not all environments benefit from the increased bandwidth compared to 1Gb or 2Gb, but many environments can benefit from the lower latency and consolidation capabilities enabled by 4Gb Fibre Channel and, eventually, 4Gb FICON.
- 10Gb Ethernet continues its evolution as a backbone network for high-bandwidth and consolidation applications. Development of copper-based 10Gb technology and lower-cost 10Gb chipsets continues, though widespread adoption, especially at the desktop, is still distant.
- Serial-Attached SCSI (SAS), not to be confused with Statistical Analysis System (SAS), is a relatively new storage interface intended to replace parallel SCSI (also known as UltraSCSI). Initial SAS deployments in 2005 were as embedded storage on servers from vendors including HP, IBM, and Sun, and in entry-level storage arrays. It may be a few years before SAS-based disk drives appear in high-end, enterprise-class storage systems, but entry-level and midrange disk arrays are prime candidates for SAS disk drives, as are disk-based backup and virtual tape libraries. One advantage of SAS is the ability for SATA disk drives to coexist on the same backplane interconnect, along with general connectivity improvements over parallel SCSI.
- iSCSI, like InfiniBand, went through a massive hype cycle and then a relative quiet period and is increasingly being adopted. iSCSI is being deployed in primary and secondary storage environments, particularly in cost-sensitive environments where good performance is good enough. While debate about iSCSI vs. Fibre Channel continues, the real debate should be iSCSI vs. NAS and which, if not both, applies to your environment.
If you’re running or considering multiple Linux images on zSeries processors, you’ll want to be aware of N_Port ID Virtualization (NPIV). NPIV virtualizes physical Fibre Channel adapter ports, presenting a virtual N_Port to each image sharing an adapter. Without NPIV, each Linux image would need its own physical adapter to have a unique N_Port ID and World Wide Port Name (WWPN). That’s important because NPIV enables each virtual N_Port to have its own unique WWPN that can be used by storage-based Logical Unit (LUN) masking and volume mapping features while sharing a physical adapter.
Figure 1 shows a zSeries mainframe with four Logical Partitions (LPARs), one supporting z/OS and three supporting Linux. The z/OS image has two physical channel adapters configured for FICON for redundancy, while Linux Images A and B each have a single adapter (no redundancy) and Linux Image C has two adapters for Fibre Channel FCP. The dashed lines indicate the primary data path, with the solid line (from the switch to mainframe and storage devices) indicating the redundant path. A disadvantage of this configuration is that channel adapters must be dedicated to the Linux LPARs unless a shared adapter is configured. The downside of sharing an adapter across the Linux LPARs without NPIV is the inability to guarantee unique access from a specific Linux image to a specific LUN mapped to the shared physical port.
The solution is to use NPIV as seen in Figure 2, where a shared adapter presents a unique virtual N_Port for each image, enabling LUN mapping of a LUN to a specific Linux image for security and data integrity.
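The Figure 2 arrangement can be sketched as a small model. Everything here is illustrative (the class names, the WWPN string, and the virtual-WWPN derivation are all invented for the sketch): each image logging in through the shared adapter receives its own virtual WWPN, and the array's LUN masking keys on that WWPN, so one image cannot see another's LUN even though both ride the same physical port.

```python
class SharedAdapter:
    """Sketch of a physical FCP adapter port with NPIV: each image
    gets its own virtual N_Port with a unique WWPN."""
    def __init__(self, physical_wwpn):
        self.physical_wwpn = physical_wwpn
        self.virtual_ports = {}   # image name -> virtual WWPN
    def npiv_login(self, image):
        # Hypothetical WWPN derivation, purely for illustration.
        vwwpn = f"{self.physical_wwpn}:v{len(self.virtual_ports) + 1}"
        self.virtual_ports[image] = vwwpn
        return vwwpn

class StorageArray:
    """LUN masking keyed by WWPN: only the initiator owning the
    masked WWPN may access the LUN."""
    def __init__(self):
        self.masking = {}   # LUN id -> allowed WWPN
    def mask_lun(self, lun, wwpn):
        self.masking[lun] = wwpn
    def can_access(self, lun, wwpn):
        return self.masking.get(lun) == wwpn

adapter = SharedAdapter("50:05:07:64:01:20:b1:32")  # made-up WWPN
array = StorageArray()

wwpn_a = adapter.npiv_login("Linux Image A")
wwpn_b = adapter.npiv_login("Linux Image B")

array.mask_lun("LUN0", wwpn_a)                 # LUN0 belongs to Image A
assert array.can_access("LUN0", wwpn_a)        # A sees its LUN
assert not array.can_access("LUN0", wwpn_b)    # B is fenced off, same adapter
```

Without the per-image virtual WWPNs, both images would present the adapter's single physical WWPN and the array could not tell them apart, which is exactly the integrity gap described above.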
InfiniBand, which had been written off as dead technology, started regaining attention in 2005. InfiniBand has found its niche as a high-performance, low-latency, server-to-server interconnect for server and compute clusters, also known as compute grids. InfiniBand is also being used for high-bandwidth access to storage from vendors such as Engenio. As an interconnect interface, InfiniBand supports multiple upper-level protocols and application interfaces, including iSER, Remote Direct Memory Access (RDMA), TCP/IP, xDAPL, and the SCSI RDMA Protocol (SRP), among others.
So does this signal it’s time to abandon Fibre Channel as a storage interface in favor of InfiniBand? For most environments, probably not. However, for environments and applications that need or want to leverage InfiniBand, it provides an interesting alternative to Fibre Channel and Ethernet-based iSCSI. It’s still unclear what, if any, advantage InfiniBand may have for a pure IBM mainframe environment, but for those with mixed server environments, InfiniBand is a technology to watch.
The storage market can still be characterized as a buyer’s market if you’re a prudent buyer. You can expect another active year in the storage industry with continued development and adoption of previously announced products and technologies, along with new ones. Understanding these different technologies and techniques, along with where they might fit in your environment, enables you to make more effective decisions. Strive to develop a strategy for storage that implements the goals and objectives of your overall IT strategy and complements your server and networking strategies.