Jan 1 ’07

Consolidation Drives Virtualization in Storage Networks

by Editor in z/Journal

While great power brings great responsibilities, great computing power brings virtualization. Virtualization brings finer control to powerful systems that can be logically divided into multiple processes running simultaneously. With the System z9 running z/VM and up to 60 Logical Partitions (LPARs), each supporting many Linux on System z servers, the mainframe needs a storage network that can effectively handle this power. The System z9 uses the new Fibre Channel techniques of N_Port_ID Virtualization (NPIV) and Fibre Channel virtual switches in virtual fabrics to support Linux on System z under changing conditions. This article explores these standardized storage networking virtualization techniques in a practical application.

Storage networks are expanding into virtual realms that have been widely accepted in mainframe environments for decades. Mainframes have housed many virtual processes to increase performance, utilization, and manageability. The same transformation is occurring in storage networks at two levels. NPIV lets individual Linux on System z servers access open systems storage that’s usually much lower cost than the DASD associated with mainframes. Virtual fabrics let the storage network create multiple Fibre Channel virtual switches using the same physical hardware. A good way to get to know these techniques is to walk through a typical implementation whose problems they relieve.

The Distributed Storage Network

Figure 1 shows a typical data center that’s spread across two sites. Each site has a mix of open systems applications running on open systems servers and mainframe applications running on Linux on System z servers. The mainframe applications are mirrored across both sites, but only a small percentage of the open systems applications are located on both sites. This mixed environment has two fabrics with two 64-port directors that connect to open systems storage and DASD, and a third fabric that connects backup applications to tape on two 24-port switches. A redundant set of fabrics isn’t shown in the drawing for simplicity. These fabrics have grown organically, and the multiple types of open systems servers have become difficult to manage and continue to grow faster than the mainframe applications.  

Many open systems servers have sporadically popped up in the data center with random business application requirements. Every time a department finds a new application, servers sprout like weeds in spring and require new cabling, switching, storage, and management. The open systems servers also bring administrative baggage such as procurement, maintenance, and inventory. A few servers a week, month or quarter turn into racks of servers over the years. The System z9 has targeted these racks of servers with Linux on System z virtual servers. Instead of acquiring new hardware for new or overloaded applications, IBM would like to solve the problem of multiplying open systems servers with virtual Linux on System z servers.

Linux on System z servers can replace open systems servers and be up and running in a matter of minutes or hours instead of the days or weeks it takes to acquire and install physical open systems servers. The Linux on System z servers offer cost savings by using low-cost open systems storage instead of the expensive DASD, but many deployments may still use their existing DASD with NPIV.

The FICON adapter is built on Fibre Channel, which supports multiple upper-level protocols such as FCP and FICON, so the same adapter can address FCP devices with a different code load. A new adapter would probably be needed for the new application, but the cost of an FCP host bus adapter (HBA) shouldn’t be more than the cost of the FICON adapter.

World-class computing resources can now use low-cost open systems storage with Linux on System z and NPIV. The Linux on System z servers scale predictably and quickly and offer many benefits such as consistent, homogeneous resource management.

N_Port_ID Virtualization

While each LPAR has traditionally consumed one or more Fibre Channel ports, multiple Linux on System z servers can share a single Fibre Channel port. The sharing is possible because most open systems applications require little I/O. While the actual I/O data rate varies considerably among open systems servers, a rule of thumb is that they consume about 10MB/s each. Fibre Channel has kept pace by scaling link speeds from 1 Gigabit/second Fibre Channel (1GFC) to 2GFC and 4GFC, with 8GFC expected in 2008, and each gigabit/second of link speed supplies roughly 100MB/s of throughput. A 4GFC link should therefore be able to support about 40 Linux on System z servers from a bandwidth perspective.
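
As a quick sanity check, the sketch below restates that arithmetic in Python; the 10MB/s-per-server and 100MB/s-per-gigabit figures are the rule-of-thumb values from the paragraph above, not measurements.

```python
# Back-of-the-envelope check of the aggregation arithmetic above.
MB_PER_GBIT = 100          # usable MB/s per 1 Gbit/s of Fibre Channel link speed
AVG_SERVER_LOAD_MBS = 10   # rule-of-thumb I/O load of one open systems workload

def servers_per_link(link_speed_gfc: int) -> int:
    """Approximate number of Linux on System z servers one link can carry."""
    link_throughput_mbs = link_speed_gfc * MB_PER_GBIT
    return link_throughput_mbs // AVG_SERVER_LOAD_MBS

for speed in (1, 2, 4, 8):
    print(f"{speed}GFC link: about {servers_per_link(speed)} servers")
# A 4GFC link works out to about 40 servers, matching the estimate above.
```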

To aggregate many Linux on System z servers on a single Fibre Channel port, the Fibre Channel industry has standardized NPIV, which lets each Linux on System z server have its own 3-byte Fibre Channel address, or N_Port_ID. After an N_Port (a server or storage port) has acquired an N_Port_ID by logging into the switch, NPIV lets that port request additional N_Port_IDs, one for each Linux on System z server running under z/VM Version 5.1 or later. With a simple request, the switch grants a new N_Port_ID and associates it with the Linux on System z image. Each of these logins carries its own Worldwide_Name, which uniquely identifies the Linux on System z image and lets the mainframe Linux servers be zoned to particular storage ports.
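
To picture the exchange, the sketch below models the login flow under simplified assumptions: the physical port performs an initial fabric login (FLOGI in Fibre Channel terms), and each additional N_Port_ID is requested with a fabric discovery request (FDISC). The switch class, method names, and Worldwide_Names here are invented for illustration and are not a real Fibre Channel API.

```python
# Minimal conceptual model of NPIV address assignment, assuming a simplified
# switch that hands out 3-byte N_Port_IDs (Domain/Area/Port).

class FabricSwitch:
    def __init__(self, domain: int, area: int):
        self.domain, self.area = domain, area
        self.next_port = 0
        self.name_server = {}          # N_Port_ID -> Worldwide_Name

    def _assign_id(self, wwpn: str) -> str:
        n_port_id = f"{self.domain:02X}{self.area:02X}{self.next_port:02X}"
        self.next_port += 1
        self.name_server[n_port_id] = wwpn
        return n_port_id

    def flogi(self, wwpn: str) -> str:
        """Initial fabric login of the physical N_Port."""
        return self._assign_id(wwpn)

    def fdisc(self, wwpn: str) -> str:
        """NPIV request for an additional N_Port_ID on the same physical port."""
        return self._assign_id(wwpn)

switch = FabricSwitch(domain=0x10, area=0x01)
physical_id = switch.flogi("50:05:07:64:01:00:00:00")        # the FCP channel itself
guests = {f"linux{g:02d}": switch.fdisc(f"c0:50:76:00:00:00:00:{g:02x}")
          for g in range(1, 9)}                               # one N_Port_ID per Linux guest
print(physical_id, guests)
```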

Virtual Fabrics

Considering that a System z9 can have up to 336 4GFC ports, thousands of N_Port_IDs can quickly be assigned to a mainframe. Managing that many ports in a single fabric can become cumbersome and cause interference. To isolate applications running behind different N_Ports attached to the same switch or director, the T11 Fibre Channel Interfaces technical committee (www.t11.org) has standardized virtual fabrics. Just as the System z9 hosts multiple virtual servers, a physical switch chassis may support up to 4,095 Fibre Channel virtual switches.
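
As a rough illustration of how a chassis might be carved up, the Python sketch below assigns ports to virtual switches keyed by a virtual fabric ID. The 4,095 limit matches the figure above, while the port ranges and the one-port-per-virtual-switch simplification are assumptions made for the example.

```python
from dataclasses import dataclass, field

MAX_VF_ID = 4095   # 12-bit virtual fabric identifier space

@dataclass
class PhysicalChassis:
    total_ports: int
    virtual_switches: dict = field(default_factory=dict)   # vf_id -> set of port numbers

    def create_virtual_switch(self, vf_id: int, ports: set) -> None:
        if not 1 <= vf_id <= MAX_VF_ID:
            raise ValueError("virtual fabric ID out of range")
        if any(p >= self.total_ports for p in ports):
            raise ValueError("port number beyond the chassis")
        already_assigned = set().union(*self.virtual_switches.values(), set())
        if ports & already_assigned:
            # Simplification: each F_Port belongs to exactly one virtual switch here.
            raise ValueError("port already assigned to another virtual switch")
        self.virtual_switches[vf_id] = set(ports)

director = PhysicalChassis(total_ports=256)
director.create_virtual_switch(vf_id=1, ports=set(range(0, 48)))    # open systems
director.create_virtual_switch(vf_id=2, ports=set(range(48, 96)))   # DASD / FICON
director.create_virtual_switch(vf_id=3, ports=set(range(96, 112)))  # tape backup
print({vf: len(p) for vf, p in director.virtual_switches.items()})
```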

Fibre Channel virtual switches relieve another difficulty shown in Figure 1: managing several distributed fabrics. The storage network has grown organically with different applications and the physical limitations of the switches. The table in Figure 2 shows the port counts for each fabric on each site. Fabrics 1 and 2 each have 64-port directors with 28 to 48 of the ports in use. The backup fabric has only 24-port switches, though, and only a single port is available on each of these switches. While Fabrics 1 and 2 have a considerable number of unused ports, the switches can’t lend those ports to the backup fabric. The usage rates of the fabrics are also relatively low, and the mixture of products, firmware releases, and management applications makes the distributed fabric rather complex.
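
A quick tally, sketched below, makes the stranded-capacity problem concrete; the per-switch counts are illustrative values drawn from the ranges in the text rather than the exact numbers in Figure 2.

```python
# (total ports, ports in use) per switch; counts are illustrative approximations.
fabric_switches = {
    "Fabric 1 director, Site 1": (64, 48),
    "Fabric 1 director, Site 2": (64, 36),
    "Fabric 2 director, Site 1": (64, 44),
    "Fabric 2 director, Site 2": (64, 28),
    "Backup switch, Site 1":     (24, 23),
    "Backup switch, Site 2":     (24, 23),
}

idle_director_ports = sum(total - used
                          for name, (total, used) in fabric_switches.items()
                          if "director" in name)
free_backup_ports = sum(total - used
                        for name, (total, used) in fabric_switches.items()
                        if "Backup" in name)

print(f"Idle director ports: {idle_director_ports}")   # capacity stranded in Fabrics 1 and 2
print(f"Free backup ports:   {free_backup_ports}")     # the fabric that actually needs ports
```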

A better solution that meets all the needs of the data center is available. Figure 3 depicts the consolidated storage network where the physical configuration of the data center can be consolidated on two 256-port directors that can be divided into multiple virtual storage networks. These networks can be large or small and offer independent, intelligible management.

When large corporations require more than a thousand servers in their data center, the manageability and accountability for controlling storage networks can become unruly. Breaking the fabrics into small, virtual fabrics increases the manageability of the storage network. Instead of coordinating a team of administrators to manage one large storage network, individuals can manage a comprehensible piece of the solution after it’s broken down. The virtual nature of the new data center creates a management hierarchy for the storage network and enables administration to be accomplished in parallel.

A further explanation of the differences between the distributed storage network and the consolidated storage network illuminates the benefits of the new approach. In Figure 1, the distributed storage network has become a management nightmare where 58 open systems servers are consuming 58 Fibre Channel ports and the corresponding cabling and rack space. Each open systems server is using only a fraction of the link’s bandwidth as the speed of the link increases to 4GFC. The reliability of the variety of servers that have accumulated over the years is significantly less than the tested reliability of the System z9. The administrative costs of adding and repairing servers that change every quarter leads to complexity and inefficiency. The organic growth of the open systems servers has led to a population problem.

The consolidated data center in Figure 3 has become a unified architecture that’s adaptable and efficient. The System z9 supports mixed applications and is managed from a single screen or multiple screens. The director has been carved up into Fibre Channel virtual switches that suit the needs of each application today and quickly scale to about any level. The links in Figure 3 are black because they could be attached to any of the virtual fabrics. The combination of more powerful hardware and virtual processors and switches has led to a simpler architecture than the distributed storage network.

Figure 3 shows details of the virtualization techniques between the System z9 and the director. The mainframe applications have traditional direct links to the Fibre Channel virtual switches that connect the mainframe to the DASD. These links typically run at 4.25 Gbits/second and can sustain high rates of I/Os Per Second (IOPS). The open systems applications use NPIV to access the open systems storage. Figure 4 shows how one physical link to Virtual Fabric 1 supports eight open systems applications. Each open systems application has its own Linux on System z server, Worldwide_Name, and N_Port_ID for management purposes. With multiple open systems applications sharing a single link, the usage rates have increased to the levels shown in Figure 5. This table shows how fewer ports were used more efficiently in the consolidated storage network than in the distributed one. Even after several applications were added to the data center, the number of ports in the fabrics decreased from 199 to 127.
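
The bookkeeping behind Figure 4 can be pictured with a short sketch; every identifier below (guest names, Worldwide_Names, N_Port_IDs, storage port names) is invented for illustration, and the zoning shown is a simplified single-initiator, single-target scheme.

```python
# Hypothetical per-application table for eight open systems applications
# sharing one physical link into Virtual Fabric 1.
from collections import namedtuple

GuestEntry = namedtuple("GuestEntry", "application guest wwpn n_port_id storage_port")

table = [
    GuestEntry(f"app{i}", f"linux{i:02d}",
               f"c0:50:76:00:00:00:00:{i:02x}",   # virtual WWPN granted via NPIV
               f"1001{i:02X}",                    # N_Port_ID on the shared link
               "array1_port3" if i <= 4 else "array2_port7")
    for i in range(1, 9)
]

# A zone pairs each guest's WWPN with the storage port it may reach, so a
# shared physical link does not mean shared access rights.
zones = {f"zone_{e.guest}": {e.wwpn, e.storage_port} for e in table}
for name, members in zones.items():
    print(name, members)
```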

Even with the link speed increasing to 4GFC, the average utilization rate more than tripled, from 20 percent to 61 percent. With virtual fabrics, the number of ports in each fabric is flexible, and switch ports can be added to a fabric without having to buy another switch. Additional Fibre Channel virtual switches could also be carved out of the director, which has higher reliability than multiple small switches. The benefits of the consolidated approach include efficiency, cost, and manageability.
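
The arithmetic behind those figures is simple enough to restate in a few lines; the port counts and utilization averages below are the ones cited above.

```python
# Quick check of the consolidation figures: before/after port counts and
# average utilization, with the ratios computed from them.
distributed  = {"ports": 199, "avg_utilization": 0.20}
consolidated = {"ports": 127, "avg_utilization": 0.61}

port_reduction   = 1 - consolidated["ports"] / distributed["ports"]
utilization_gain = consolidated["avg_utilization"] / distributed["avg_utilization"]

print(f"Ports removed from the fabrics: {port_reduction:.0%}")    # about 36% fewer ports
print(f"Utilization improvement: {utilization_gain:.2f}x")        # just over 3x, i.e. "more than three times"
```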

Virtual Reality

IBM has worked with several companies to develop the standards to support these virtualization techniques for storage networking. These virtualization techniques were standardized in T11 so the System z9 could replace multiple open systems servers with Linux on System z servers. NPIV lets these servers use open systems storage to yield a low-cost solution with better properties. The Fibre Channel virtual switches that create virtual fabrics are another technique to increase manageability and let the physical switches be more adaptable.

NPIV and virtual fabrics will play into near-term solutions such as grid computing and computing on demand. To support automated computing power in these environments, the processors, storage, and storage networking must be driven by policy-based applications. The administrator establishes the performance policies and enables the soft or virtual layers on top of the hardware to automatically manipulate the resources to meet data center demands.

An aspect of virtual fabrics not covered here is virtual fabric tagging, which lets multiple Fibre Channel virtual switches use the same physical link. Mixing traffic from several fabrics virtualizes the link and increases the utilization of an Inter-Switch Link (ISL). Virtual fabric tagging is highly effective for optimizing expensive, long-distance links.
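
Conceptually, each frame crossing the shared ISL carries a virtual fabric identifier that the receiving switch uses to pick the right forwarding table. The sketch below illustrates that idea with a simplified tag structure; it is not the actual Fibre Channel header layout, and the IDs and port names are invented.

```python
from dataclasses import dataclass

@dataclass
class TaggedFrame:
    vf_id: int        # virtual fabric identifier carried with the frame (1-4095)
    s_id: str         # source N_Port_ID
    d_id: str         # destination N_Port_ID
    payload: bytes

def route_from_isl(frame: TaggedFrame, virtual_fabric_tables: dict) -> str:
    """Pick the forwarding table of the virtual fabric named in the tag."""
    table = virtual_fabric_tables[frame.vf_id]   # one forwarding table per virtual fabric
    return table[frame.d_id]                     # egress port within that fabric

# The same destination ID routes differently depending on the virtual fabric tag.
tables = {1: {"100108": "port_12"}, 3: {"100108": "port_40"}}
print(route_from_isl(TaggedFrame(1, "200100", "100108", b""), tables))  # -> port_12
print(route_from_isl(TaggedFrame(3, "200100", "100108", b""), tables))  # -> port_40
```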

A follow-up article will explore virtual fabric tagging and inter-fabric routing over large geographical areas. Storage networks offer much more than physical pipes. The intelligence being incorporated into virtualization is making the storage network an integral aspect of on-demand computing infrastructure. Z