Aug 8 ’11

Tips for a Successful Linux on System z Proof of Concept

by S. Michael Benson, Bryan Tanoue in z/Journal

Too many Linux on System z projects fail for preventable reasons. This article discusses some of those reasons and provides tips on actual Proof of Concept (POC) implementation. We will focus on POC, since most businesses prefer to minimize risk by testing a new technology before completely funding projects. We will explore executive sponsorship, application selection, project scope, cross-organization communication, and crisp success criteria; many of these are general best practices for executing any POC. 

A key ingredient for a successful POC is identifying and engaging an executive sponsor. Since a Linux on System z project touches so many different organizations, you need a strong sponsor who can help navigate potential political pitfalls. Cultural ideology from both the mainframe and distributed camps can easily derail a project, so it should be driven from the top down. Many projects suffer from a lack of strong executive support.

Which Application?

Selecting a target application is another critical component of a successful POC, and there are many factors to consider. Picking the wrong application could give Linux on System z a bad reputation and overshadow any benefits. IBM and several business partners have experienced application assessment teams that can help you evaluate candidates and choose one that minimizes the risk of failure. Considerations include:

• Select an application that already has strong mainframe content. Many successful candidate applications have their data on z/OS. There are obvious benefits in reducing data access latency by replacing physical network hops with virtual network technology such as HiperSockets. Obviously, there are situations, such as in server consolidation, where there’s no existing mainframe data. Server consolidation is an excellent reason to consider deploying Linux on System z, but it can be hard to demonstrate that value in a first project.
• Keep it simple. With higher complexity comes a higher risk of failure. Fewer components in the workload will make it easier to install and configure. Don’t select an unfamiliar application that isn’t well-supported; familiarity and support will be needed if problems arise. Typically, problem resolution is faster in a simple environment where the only new technology is Linux on System z.
• Play it safe. If something goes wrong with this project, you want a second chance. However, if the application is business-critical, you may not get one. Avoid your core business applications; focus on some ancillary process that has lower volumes and lower resource requirements. Avoid brand new applications or new versions of existing applications; you don’t want to be debugging the application at the same time you’re trying to validate the platform.
• Ensure the target application uses technology supported on Linux on System z. Be aware of all potential vendor software, including IBM software, and ensure the correct levels are supported.

Processor and Memory Sizing

There may be more than one application that seems to be a good fit for demonstrating the viability and value of Linux on System z. A good next step is to size the resource requirements for each candidate application: determine how many Integrated Facility for Linux (IFL) processors and how much memory each one needs. Even though many larger companies have capacity planning organizations, this environment is new, and you may need help from IBM or a business partner to assess the requirements.

You should understand exactly how you plan to test the application to know what system resources are needed. If you plan on doing any stress or performance testing, the environment must be the appropriate size. You can always remove resources to see if fewer are needed, but it can be tough to scramble for additional IFLs or memory once the application is running poorly in the middle of testing. It’s better to demonstrate the feasibility of Linux on System z with smaller environments; if you can select an application that needs fewer system resources, you can improve your chances of success.

For IFL sizing, IBM Techline is a great resource; its tools take the exact server model and configuration, along with CPU utilization from the current production environment, and indicate how many IFLs you need to run the distributed workload. IBM Techline can also graph multiple servers together so you can view the consolidated data. This view is often insightful because the total number of IFLs can frequently be decreased: the more workloads you stack, the lower the peak-to-average ratio becomes. With less variability, you can run at a higher average utilization and therefore with fewer processors.
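
A simple illustration of the effect (the numbers here are invented purely to show the arithmetic, not taken from any Techline study): suppose ten distributed servers each average 10 percent busy but individually peak at 60 percent. Sized separately, each server must be built for its own peak. Consolidated, the combined average is still the sum of the averages, roughly one engine's worth of work, but the combined peak is far below ten times 60 percent because the individual peaks rarely coincide; if the consolidated workload peaks at, say, 180 percent of one engine, two IFLs cover it with headroom. Techline's consolidated graphs let you make this judgment with measured data rather than invented numbers like these.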

You must also address memory utilization. Distributed platforms often lack a robust I/O subsystem, and the easy answer is to compensate by throwing additional memory at the performance problem. As a result, a team running a distributed application will often request a guest with 8GB of memory simply because that’s what it has in the distributed environment. Without analyzing the actual memory footprint, the guest will be oversized. Linux likes to hold on to memory: if it needs only 1GB but gets an 8GB allocation, it will fill the other 7GB with I/O cache. That’s a waste of memory from a mainframe perspective. It may also skew a Total Cost of Ownership (TCO) analysis, because the cost of running the application on the mainframe increases with the additional resources; multiply that across several guests and the numbers become even higher.

Figure 1 shows a common command you can run from a UNIX/Linux server to see memory usage. It’s quick and easy, but it isn’t sophisticated, so middleware such as Oracle and WebSphere may need additional memory-sizing analysis. In this example, the server is basically idle, no middleware is installed, and it has 1GB of dedicated memory. The Mem: line shows the server has used 233MB and has 763MB of free memory.

The -/+ buffers/cache line subtracts the amount of memory considered buffered I/O. The mainframe, with an excellent I/O subsystem, doesn’t need to cache I/O. Under the -/+ buffers/cache line, only 37MB is used. This could be a guest sized at 128MB instead of 1GB. Memory profiling is different for each application. No “rule of thumb” applies because each application must be understood on an individual basis.
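
Figure 1 itself isn’t reproduced here; the command it describes is almost certainly the Linux free command (shown below with -m to report in megabytes). On an idle 1GB guest the output would look roughly like this; the buffers/cached split is reconstructed to be consistent with the 233MB and 37MB values quoted above, not copied from the figure:

  # free -m
               total       used       free     shared    buffers     cached
  Mem:           996        233        763          0         32        164
  -/+ buffers/cache:         37        959
  Swap:          511          0        511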

Project Planning

Once an application has been selected, create a project plan to scope the project, outline steps and responsibilities, and identify success criteria. Assigning a formal project manager will help ensure these details are addressed. Typically, a project plan should include hardware and software planning, installation and configuration, and identifying specific test cases to be assessed. While it may be important to demonstrate scalability and performance, these tests can add significant cost and risk. Defining test cases and their associated success criteria must be done carefully.

Conduct team meetings regularly to communicate project status; they’re an important way of monitoring progress and solving problems before they need to be escalated to the executive level. The most successful project teams meet at least weekly—sometimes daily during the critical phases. These meetings help solve problems and foster teamwork. Relationships developed in the early validation phase can carry over into future projects.

To prevent scope creep, adopt and stick to a clearly documented project plan. Some projects start out as a simple test of a specific application in the Linux on System z environment, but then grow as new applications or new test cases are added. That isn’t always bad, but you should know and understand the effect on the project plan and required resources.

Consider the case of a client that defined a POC to test the IBM WebSphere Application Server in the Linux on System z environment. The application used DB2 data from their z/OS system, so it was a good candidate, given the proximity of the data. The test was successfully concluded ahead of schedule, so the client decided to add two more environments that were much more complex. This scope creep caused a problem because they didn’t think through the impact to the overall schedule or define the additional success criteria needed. The POC budget was exceeded and the project had a poor reputation even though the original test environment worked well.

Terminology is another key element. The mainframe organization uses different terminology and acronyms than distributed systems organizations. For example, “storage” can mean different things to different teams. Operational diagrams of the POC environment that are clearly labeled can help alleviate this problem. A common glossary can be distributed to all team members.

Success Criteria Definition

Identifying and documenting success criteria is critical. Too many projects fail because they have vague or non-existent success criteria or they didn’t determine how to measure test results. With documented, clearly measurable success criteria, you can avoid inconclusive results. Possible examples of success criteria include:

• Cost savings of software by using fewer IFLs
• Response time reduction
• Throughput measurement improvement
• Memory over-commit ratio greater than 1:1
• Network latency reduction due to virtual networking
• CPU utilization reduction.

Environment Setup

Once the project plan is completed, you can set up the POC environment. IBM, or other business partners, can help with this step and with skills transfer. There are many IBM Redbooks, cookbooks, and other documents that provide step-by-step instructions. Getting a skilled resource to help you the first time will boost your chance of success. If you go it alone, ensure you have the latest versions of these publications; there have been many changes over the last 10 years. Sending your staff to a few Linux on System z training sessions is another way to help minimize risk.

Configuration of z/VM for a Linux on System z Environment
 
Whether you’ve chosen Novell SLES or Red Hat RHEL as your Linux distribution, there are common z/VM installation and configuration steps to follow. Download the latest version of the Guide for Automated Installation and Service for z/VM 6.1 (see http://publibz.boulder.ibm.com/epubs/pdf/hcsk2c00.pdf) and read this excellent reference thoroughly before you begin.

For z/VM on z10 and later hardware, the best installation approach is to use a DVD. Chapter 4 of the guide contains helpful planning worksheets with answers to common installation questions. Completing these worksheets in advance will save time. Typically, separate teams handle disk space allocation, security, networking, and other areas that need planning. For example, z/VM will need its own IP address if you want to log on via a 3270 session from a workstation, so the network team will need to be involved.

Chapter 5 of the guide addresses z/VM installation. Each z/VM environment is different; this article assumes you want to install a simple z/VM system per the referenced guide, that your network is running, and that you can navigate a 3270 session and basic z/VM commands.

Tips and Tricks

Here are some recommended changes to the SYSTEM CONFIG file:

Adding paging volume(s) to z/VM. Adequate paging space is just as important as central and expanded storage; z/VM makes intelligent decisions about which memory pages it can move to expanded storage and then to disk. Keep the paging devices uniform in size: if you allocated 3390-3s, continue to add paging devices of that size. Mixing device sizes, such as 3390-3s and 3390-9s, can create performance issues because z/VM pages to disk in a round-robin fashion, and once the smaller 3390-3s fill up they drop out of the rotation, concentrating paging I/O on the remaining volumes. Paging volumes should also be spread across different ranks, drives, and spindles to maximize parallelism.
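
As a sketch (the volume labels and slot numbers below are placeholders, and each volume must first be formatted and allocated as PAGE space, for example with CPFMTXA), new paging volumes are added to the CP_Owned list in SYSTEM CONFIG:

  /* Paging volumes -- illustrative labels and slot numbers only */
  CP_Owned  Slot 010  LX1PG1
  CP_Owned  Slot 011  LX1PG2
  CP_Owned  Slot 012  LX1PG3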

Updating the user_volume_include statement. z/VM uses the user_volume_include statement to attach disk devices to the system at Initial Program Load (IPL) based on the volume identifier (partition name in Linux terms). The volume identifier is a volume serial number that’s up to six characters. As the Linux on System z environment grows, adding disk incrementally can be tedious. By using this statement with a wildcard, new disks will be automatically attached and available for the Linux guests at IPL.
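
For example (the LXU prefix is just an illustration of a site naming convention), a single wildcarded statement in SYSTEM CONFIG brings every matching user volume online automatically at IPL:

  /* Any volume labeled LXU... is attached as a user volume at IPL */
  User_Volume_Include  LXU*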

Editing the features statement. Here are recommended FEATURES statement changes (see Figure 2; a representative stanza is sketched after this list):

• Disconnect_timeout off keeps the Linux guest running after you disconnect from its console. Without it, a disconnected guest that drops into a console read can be forced off the system when the timeout expires; a guest that stays continuously active will never be forced off.
• VDISK is recommended for a Linux swap device. You should provide Linux a tiered SWAP subsystem where the first device is a small (less than 512MB) VDISK in memory. If that isn’t enough, back it up with a dedicated disk volume or minidisk of appropriate size.
• Syslim and userlim are set to infinite to allow flexibility of VDISK size for each guest. Some Linux guests might only need 512MB of SWAP; others might need more.
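
Figure 2 isn’t reproduced here; a representative FEATURES stanza reflecting these recommendations might look like this (a sketch based on standard SYSTEM CONFIG syntax, not the exact figure):

  Features ,
    Disconnect_timeout off ,   /* don't force off disconnected guests */
    Vdisk ,
      Syslim infinite ,        /* no system-wide cap on VDISK space   */
      Userlim infinite         /* no per-user cap on VDISK size       */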

Common Linux Default Profile

A common default profile in the USER DIRECT file will greatly simplify administration and is less error-prone than individually modifying the basic parameters for each Linux guest (see Figure 3). The MACHINE ESA statement limits the number of virtual CPUs you can define for a particular virtual machine. To install Linux, you will also need the spooled reader, punch, and printer devices, along with the other basic definitions in the profile.
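
Figure 3 isn’t reproduced here; a typical default profile, loosely following the IBM virtualization cookbooks listed at the end of this article, might look like the following sketch (the profile name LNXDFLT, the VSWITCH name VSW1, and the device numbers are illustrative):

  PROFILE LNXDFLT
    IPL CMS
    MACHINE ESA 4
    CPU 00 BASE
    NICDEF 0600 TYPE QDIO LAN SYSTEM VSW1
    SPOOL 000C 2540 READER *
    SPOOL 000D 2540 PUNCH A
    SPOOL 000E 1403 A
    CONSOLE 0009 3215 T
    LINK MAINT 0190 0190 RR
    LINK MAINT 019D 019D RR
    LINK MAINT 019E 019E RR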

Shared Linux Files

You should have all the kernels, initrd, and parm files on one guest that can be shared in read-only mode with the other guests you need to create. That way, you only have to use File Transfer Protocol (FTP) once to send the files to the shared guest instead of every time you need to create a new guest. Once that guest is created, the files are shared with the other Linux guests via link statements in the guest’s PROFILE EXEC. LNXADMIN is a good name to choose for this shared guest; LNXADM is a good name for the volume identifier in the MDISK statement.
 
Linux Profile Exec

After the LNXADMIN guest is set up and its 191 minidisk is formatted, create a PROFILE EXEC as shown in Figure 4. The CP LINK statement links the LNXADMIN 191 minidisk as virtual address 391. Once this PROFILE EXEC is copied to a new Linux guest’s 191 disk, that guest will have the kernel, parm, and initrd files available at 391, accessed as filemode B.
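
Figure 4 isn’t reproduced here; a minimal sketch of such a PROFILE EXEC, assuming the 191/391 addresses and the LNXADMIN guest name used in this article, might look like this:

  /* PROFILE EXEC for a Linux guest -- sketch only                 */
  'CP SET RUN ON'                    /* keep running when the console reads */
  'CP SET PF11 RETRIEVE FORWARD'
  'CP SET PF12 RETRIEVE'
  'ACCESS 191 A'
  'CP LINK LNXADMIN 191 391 RR'      /* read-only link to the shared files  */
  'ACCESS 391 B'                     /* kernel, initrd, and parm at mode B  */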

Disk Numbering Schemes

The Linux Guest definition in Figure 5 provides a template you can use to manage guest creation. The 191 disk will contain a copy of the kernel, initrd, and parm files from the LNXADMIN guest as shown in Figure 6. The parm and other files can then be modified on the 191 disk to customize this guest.

Standardizing on a minidisk numbering scheme makes it easier to pick out the contents of each minidisk when something needs to change. Consider an example (a matching directory entry is sketched after the list):

• 700s for the Linux base installation
• 800s for any type of middleware
• 900s for SWAPs and VDISK.
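
Figures 5 and 6 aren’t reproduced here, but a guest directory entry following the numbering scheme above might look like this sketch (the user name, password field, volume labels, extents, and sizes are placeholders, not values from the figures):

  USER LINUX01 XXXXXXXX 1G 1G G
    INCLUDE LNXDFLT
  * 191: CMS disk holding copies of the kernel, initrd, and parm files
    MDISK 0191 3390 0001 0050 LNX001 MR
  * 700s: Linux base installation
    MDISK 0700 3390 0051 3288 LNX001 MR
  * 900s: VDISK swap (512-byte blocks; 524288 blocks = 256MB)
    MDISK 0900 FB-512 V-DISK 524288 MR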

Running USER DISKMAP

Whenever the USER DIRECT file changes, run the DISKMAP command against it and check the resulting USER DISKMAP file for overlaps before you run DIRECTXA to put the new directory online.
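
A typical sequence from a CMS session might look like this (a sketch; DISKMAP and DIRECTXA are the standard utilities, and reviewing the map with XEDIT is just one way to check it):

  diskmap user
  xedit user diskmap
  directxa user

The first command writes a USER DISKMAP file summarizing every minidisk extent; review it for gaps and overlapping extents, and only run DIRECTXA once the map is clean.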

Validate Success Criteria

Once the z/VM and Linux on System z test Logical Partition (LPAR) is set up and the application with any required data is installed, you can begin to validate the success criteria, which is how you prove the success or failure of the POC. Depending on the criteria chosen, you may have to gather several different types of key metric data. Don’t try to present all the minute details of the data that you gather, as it can quickly become cumbersome. Include a few charts that net out the results and provide them in an executive overview.

Besides gathering run-time data, a TCO analysis is usually required. This should include hardware and software acquisition costs, ongoing maintenance costs, network equipment costs, environmentals such as power and cooling, and staffing costs to engineer and operate the environments. Don’t forget to include environments other than production in your analysis, since you will have costs for development, testing, and quality assurance servers.

IBM can help with TCO analysis; it has done these with many customers that were contemplating a migration to Linux on System z. IBM has developed several tools to help quantify and assess a possible migration; some of the assessments are available free.

More Information

To learn more, see these resources:

• z/VM: Guide for Automated Installation and Service,  http://publibz.boulder.ibm.com/epubs/pdf/hcsk2c00.pdf
• RHEL 6.0 (February 2011) z/VM and Linux on IBM System z: The Virtualization Cookbook for Red Hat Enterprise Linux 6.0,  www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg247932.html?Open
• SLES 11 SP1 (February 2011) z/VM and Linux on IBM System z: The Virtualization Cookbook for SLES 11 SP1, www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg247931.html?Open
• Experiences With Oracle Solutions on Linux for IBM System z,
www.redbooks.ibm.com/abstracts/sg247634.html?Open
• Architecting z/VM and Linux for WebSphere V7 on System z, http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101803.

Acknowledgement: Special thanks to Gary Loll (z/VM and Linux on System z Implementation Specialist, IBM Jamestown, NY) and Jon F. von Wolfersdorf (Certified Advanced Technical Support [ATS], Americas, IBM Endicott, NY) for their contributions to this article.