Storage Area Network (SAN) is an efficient solution for many storage needs, especially where hundreds of gigabytes or even many terabytes are required. Linux running on z/VM can use SAN nicely, and System z provides Fibre Channel Protocol (FCP) adapters to connect to the SAN fabric. But is it that easy? Connect it and it’s done? Not even close. From a z/VM perspective, if you have more than a handful of servers using SAN, things can get daunting quickly. This article describes some of the pain points associated with managing direct-attached SAN (not EDEVs, otherwise known as VM-emulated devices) from a z/VM perspective and how to overcome them.
Why even bother with SAN on the mainframe? When you have large file system needs, SAN is usually much better than using Extended Count Key Data (ECKD) disks. For example, assume you needed a modest 500GB file system for a Linux on System z server. You could find 74 Mod 9 volumes and use Logical Volume Manager (LVM) to logically pool them together, or you could have your favorite SAN administrator define one or more Logical Unit Numbers (LUNs) to fit your needs.
There’s also less space overhead with SAN. ECKD-formatted DASD has a 4K block overhead that’s unavoidable. For example, a raw Mod 9 has roughly 8119.6MB (7.9GB) of space, but formatted at 4K, it will only have 7043.2MB (6.9GB) of usable space. SAN devices don’t have any formatting overhead.
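The arithmetic behind these figures can be sketched in a few lines of Python. This is only a back-of-the-envelope check using the Mod 9 capacities quoted above; the function names are illustrative, and the calculation ignores partitioning and LVM metadata.

```python
import math

RAW_MOD9_MB = 8119.6        # raw Mod 9 capacity quoted in the text
FORMATTED_MOD9_MB = 7043.2  # usable capacity after 4K formatting, per the text

def formatting_overhead_pct(raw_mb, formatted_mb):
    """Percentage of raw capacity lost to formatting."""
    return (raw_mb - formatted_mb) / raw_mb * 100

def volumes_needed(target_gb, usable_mb_per_volume):
    """Smallest whole number of volumes that covers target_gb (1GB = 1024MB)."""
    return math.ceil(target_gb * 1024 / usable_mb_per_volume)

overhead = formatting_overhead_pct(RAW_MOD9_MB, FORMATTED_MOD9_MB)
count = volumes_needed(500, FORMATTED_MOD9_MB)
print(f"overhead: {overhead:.1f}%  volumes for 500GB: {count}")
```

With these inputs the overhead works out to roughly 13 percent, and 73 volumes is the bare minimum for 500GB. The 74 volumes mentioned earlier presumably leave a little room for partition and LVM overhead, which this sketch does not model.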
Another good reason to use a SAN is that your Linux administrators are likely already familiar with it since it’s used on many other Linux/UNIX platforms.
Great! SAN is the way to go, right? Well, that isn’t always the case.
Your Disaster Recovery (DR) resource requirements can become pretty complex. There’s currently no really good way to manage LUNs on z/VM, and no built-in or vendor solutions are available, so your only choice is to build one yourself. While SAN is familiar to a Linux admin, it isn’t familiar to a z/VM systems programmer who lives and breathes ECKD devices. Let’s further consider the management tool issue.
At our installation, we chose N_Port ID Virtualization (NPIV) for production and non-NPIV for test/development servers. With NPIV, each FCP subchannel/device has a unique World Wide Port Name (WWPN). LUNs are masked in the SAN to the WWPN of a specific subchannel/device. This protects those LUNs from being accessed from anywhere other than the mapped devices. Non-NPIV LUNs are masked to the FCP Channel Path Identifier (CHPID), so any subchannel/device on that CHPID could access the LUNs, and unintended sharing of LUNs can be a real problem. Generally, NPIV is a pain to manage; non-NPIV is easier, but not as secure.
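The difference between the two masking models can be illustrated with a small Python sketch. It is a simplified model, not how the fabric is actually implemented: the class, the function, the device numbers, and the WWPN values are all made-up placeholders.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FcpSubchannel:
    device: str                      # FCP device number (placeholder)
    chpid_wwpn: str                  # WWPN of the physical FCP CHPID port
    npiv_wwpn: Optional[str] = None  # unique per subchannel when NPIV is on

    @property
    def initiator_wwpn(self) -> str:
        # With NPIV the SAN sees the subchannel's own WWPN;
        # without it, every subchannel presents the CHPID's WWPN.
        return self.npiv_wwpn or self.chpid_wwpn

def can_access(sub: FcpSubchannel, masked_to: set) -> bool:
    """True if the LUN's masking list contains the WWPN this subchannel presents."""
    return sub.initiator_wwpn in masked_to

CHPID_WWPN = "c05076ffe5000000"  # placeholder, not a real WWPN

# Non-NPIV: both subchannels present the same CHPID WWPN.
a = FcpSubchannel("B100", CHPID_WWPN)
b = FcpSubchannel("B101", CHPID_WWPN)

# NPIV: each subchannel presents its own WWPN.
c = FcpSubchannel("B200", CHPID_WWPN, npiv_wwpn="c05076ffe5000a10")
d = FcpSubchannel("B201", CHPID_WWPN, npiv_wwpn="c05076ffe5000a14")

print(can_access(a, {CHPID_WWPN}), can_access(b, {CHPID_WWPN}))    # both reach the LUN
print(can_access(c, {c.npiv_wwpn}), can_access(d, {c.npiv_wwpn}))  # only c reaches it
```

In the non-NPIV case, masking a LUN to the CHPID lets every subchannel on that CHPID through, which is the sharing problem described above. In the NPIV case, only the subchannel whose WWPN is in the mask gets access.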
Our production SAN environment (NPIV) is replicated for DR. The targets are on “fast spindles” and clones (think FlashCopy) are made to “slower spindles” (read: lower cost) for DR testing. This introduced three pain points. NPIV-specific information must be mapped for both DR (targets) and DR test (clones), so you know which set of LUNs serves which purpose. FCPs in the DR environment must be pre-mapped for your servers. Those FCPs must be allocated on your recovery machine but left unused until DR. If you do use them for other guests, those guests then have access to the production data on those LUNs. With this scenario, it’s also necessary to map either DR targets or DR test clones in the Linux guest, depending on what mode it’s running in. All of your Linux servers using SAN must be set up to “know” they’re in DR-test mode so that only the cloned LUNs are available. If your server comes up in DR or DR test mode and the LUNs it expects aren’t available, your Linux admins likely won’t be happy about having to log on to the console of each server that dropped into maintenance mode to fix it.
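One way to picture the mode-dependent mapping is a lookup keyed by server and mode. The sketch below is purely illustrative and is not the tooling used at our installation: the server name, the `Mode` values, the inventory layout, and every WWPN and LUN are hypothetical placeholders.

```python
from enum import Enum

class Mode(Enum):
    PRODUCTION = "production"
    DR = "dr"            # replicated targets on the fast spindles
    DR_TEST = "dr-test"  # clones on the slower spindles

# Hypothetical inventory: (server, mode) -> list of (target WWPN, LUN) pairs.
# All identifiers below are made-up placeholders.
LUN_SETS = {
    ("linux01", Mode.PRODUCTION): [("0x5005076300c10001", "0x4010400000000000")],
    ("linux01", Mode.DR):         [("0x5005076300c20001", "0x4010400100000000")],
    ("linux01", Mode.DR_TEST):    [("0x5005076300c30001", "0x4010400200000000")],
}

def luns_for(server, mode):
    """Return the LUN set a server should configure in the given mode."""
    try:
        return LUN_SETS[(server, mode)]
    except KeyError:
        # Fail loudly here rather than discovering the gap at boot time.
        raise LookupError(
            f"no LUNs mapped for {server} in mode {mode.value}"
        ) from None

for target_wwpn, lun in luns_for("linux01", Mode.DR_TEST):
    print(target_wwpn, lun)
```

Checking every server against every mode ahead of time with a lookup like this is one way to catch a missing mapping before a guest drops into maintenance mode.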
The lack of management tools in the z/VM space is the biggest challenge, and keeping DR straight makes it that much worse. Mapping all the WWPN information to match up with FCP adapters and Linux servers is critical. Yet tools to track allocations (what’s in use, what’s free, how many, how big, which FCPs, NPIV vs. non-NPIV) and to query or look up all that information generally aren’t available. The underlying issue is the amount of data and maintaining the relationships within that data. WWPNs and LUNs are 16 hexadecimal characters each. For NPIV, add the virtual FCP WWPNs mapped to two target WWPNs and two LUNs. Then multiply by three because of DR and DR test. For one simple server, that’s a lot of data. Now multiply by the number of servers you have with all their LUNs. This is the point where your eyes begin to cross and you reach for the aspirin.
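To put a rough number on it, here is a small Python sketch. It assumes one reading of the counts above: per environment, a simple server has two virtual FCP WWPNs, two target WWPNs, and two LUNs. The function names and the 50-server example are hypothetical.

```python
import re

# A WWPN or LUN is 16 hexadecimal digits, optionally written with a 0x prefix.
HEX16 = re.compile(r"(0x)?[0-9a-fA-F]{16}")

def is_valid_id(value):
    """True if value looks like a 16-hex-digit WWPN or LUN."""
    return HEX16.fullmatch(value) is not None

def identifiers_per_server(fcp_wwpns=2, target_wwpns=2, luns=2, environments=3):
    """Identifiers to record for one server across production, DR, and DR test."""
    return (fcp_wwpns + target_wwpns + luns) * environments

def characters_to_track(servers, **per_server):
    """Total hex characters across all servers, at 16 characters per identifier."""
    return servers * identifiers_per_server(**per_server) * 16

print(identifiers_per_server())   # identifiers for one simple server
print(characters_to_track(50))    # hex characters for 50 such servers
```

Under these assumptions a single simple server already carries 18 identifiers, and 50 such servers come to 14,400 hexadecimal characters that all have to be typed, stored, and cross-referenced correctly. A format check like `is_valid_id` is a cheap first guard against typos.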
So, what’s needed on the Linux guest itself for SAN? Actually, not that much! Usually you will have two FCP devices attached to each guest for redundancy. Then you will need the WWPN/LUN information of the target storage. For example, Figure 1 identifies one LUN, reached through FCPs at virtual addresses 100 and 200.