May 4 ’12

The Path to Cost-Effective, High-Quality Service: Automating Linux Management on System z

by Brian Jagos in z/Journal

Over the past 11 years, the Linux operating system has gradually taken the business world by storm. Now, many corporations use Linux on mainframes to save money and improve quality of service. To maximize these important benefits, the IT staff is being tasked with ensuring that Linux management is set up in the most efficient, cost-effective way. This article addresses how IT can help accomplish this goal by automating and simplifying common z/VM and Linux on System z tasks.

Automate Critical Management Tasks

Managing console logs is a critical task, one that can be complicated when multiple systems are supported. Specific tasks that need to be addressed include managing daily log content and messages, organizing stored information, moving older logs to tape, avoiding system problems and downtime, moving workloads from one system to another, reducing maintenance of Linux guests, and producing valuable management reports. Automation can help with all these and even facilitate more efficient security.

Managing Daily Logs

The IT team faces several issues in keeping daily logs for Linux and z/VM. They need to keep a daily log of all z/VM system messages, commands entered, response messages, and return codes. They must also track all Linux console and system log messages for the Linux servers. They must keep logs organized, yet prevent them from filling up spool or disk space. They also must be able to easily review the log online for whatever day and time they choose, and store older logs on tape or disk for as long as required for audit and regulatory requirements.

Automating Console Management

The best way to handle all the console management requirements is to automate them. Letting automation maintain a log on disk of all the messages it processes, as well as all commands entered with their response messages and return codes, helps eliminate audit worries. Automation can also capture all the Linux console and system log messages in the same log.

How can you make it easy to identify whether a problem on one Linux guest under z/VM was caused by something that happened on another? By reviewing the combined log through your company’s standard review process, you can help solve the problem quickly and reduce system downtime. By automating closure of the daily log and using the date as the filename, you can set up intuitive, easy access to any day’s log.

Keep as many days of online logs as you wish for fast, easy access to historical data. Automation can delete the oldest file when more space is needed for the current log. It can also back up the z/VM and Linux logs for limited retention, which will help satisfy some government or industry rules. There’s no need for operator intervention!
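The date-named log and the delete-the-oldest rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `daily_log_name` and `prune_oldest`, the `.log` suffix, and the byte limit are hypothetical, and the logs are assumed to sit in an ordinary directory rather than in any particular automation product.

```python
from datetime import date
from pathlib import Path
from typing import List


def daily_log_name(day: date) -> str:
    """Name a closed daily log after its date, e.g. 2012-05-04.log."""
    return f"{day.isoformat()}.log"


def prune_oldest(log_dir: Path, max_bytes: int) -> List[str]:
    """Delete the oldest daily logs until the directory fits in max_bytes.

    ISO dates sort chronologically as plain strings, so sorting the
    file names is enough to find the oldest log.
    """
    logs = sorted(log_dir.glob("*.log"), key=lambda p: p.name)
    total = sum(p.stat().st_size for p in logs)
    removed = []
    # Always keep the newest log (the current day), even if it is large.
    while total > max_bytes and len(logs) > 1:
        oldest = logs.pop(0)
        total -= oldest.stat().st_size
        oldest.unlink()
        removed.append(oldest.name)
    return removed
```

Because the filename is the date, finding the log for any given day is a simple lookup, and the pruning step never needs an operator to decide which file to drop.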

Automation in the Data Center

Another issue IT must tackle is the need to automate more of their data center operation. Users can no longer wait on an operator to respond to a system message to fix a problem. By the time the message gets to the operator screen, it’s already too late. Business is moving too fast to wait for slow response, so just like on the z/OS side of the house, many organizations are relying on automation across their data centers.

Today, IT staff needs to automate all repetitive tasks. Also, companies can’t risk the possibility that a task won’t run; the consequences could cost millions of dollars. Customers will go to the competition for the products they want instead of waiting for your company to handle their requests because of problems with your systems. Your company might even face legal or compliance issues, depending on the business you’re in.

Downtime Isn’t an Option

Today, no one can afford too much downtime from their z/VM or Linux on System z guests. This is why an important part of automation is the ability to check the system to see if it’s operating properly. Through automation, one of your z/VM systems can quickly see if there’s a problem with another z/VM system or Linux on System z guest. If there’s an issue, automation can be set to bring up another system as long as one is available. It can also automatically generate a notification to a person on your team, while sending a message to a service desk-type product. While automation is performing that task, it can also be working on fixing the issue with the failing system or guest.

Or, in a failing situation where the z/VM system or Linux on System z guest is coming down, automation can take the workload from the failing z/VM system or Linux on System z guest and move it to another system. This should help you experience as little downtime as possible. You might have heard this process referred to as failover. In the old days, the message would go to the operator console, then an operator would try to determine the problem. If the operator couldn’t solve the issue, they would contact a systems programmer for help.

While all this occurred, the z/VM system or Linux on System z guests would be failing. Ultimately, the systems programmer might have to bring up another z/VM system or Linux on System z guest. This would take time and could involve a lot of people. The downtime cost companies time and money, and in some cases, damaged their reputation with customers.
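The automated alternative described above (check health, bring up or use another system if one is available, notify a person, inform the service desk, and attempt a fix) can be sketched as a small decision routine. Everything here is a hypothetical illustration: the `System` record, the `plan_failover` function, and the wording of the actions are assumptions, and a real product would issue commands rather than return text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class System:
    name: str
    healthy: bool     # result of the latest health check
    available: bool   # True if this system can accept extra workload


def plan_failover(systems: List[System]) -> List[str]:
    """Return the actions automation would take, as plain-text steps."""
    actions = []
    for failing in (s for s in systems if not s.healthy):
        # Pick the first healthy system that is still available as a standby.
        standby = next((s for s in systems if s.healthy and s.available), None)
        if standby is not None:
            standby.available = False  # one standby takes one workload
            actions.append(f"move workload {failing.name} -> {standby.name}")
        # Notification and recovery proceed whether or not a standby exists.
        actions.append(f"notify on-call about {failing.name}")
        actions.append(f"open service desk ticket for {failing.name}")
        actions.append(f"attempt recovery of {failing.name}")
    return actions
```

With a failing guest `LNX1` and an available guest `LNX2`, the plan starts with moving the workload and then lists the notification, ticket, and recovery steps, with no operator in the loop.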

More Efficient Backup and Restore

One of the most important repetitive tasks that can be automated is backup and restore of your company’s z/VM system or Linux on System z guests. How many companies today have a group or groups of people who spend hours every day, week, or month administering backups? This wastes the company’s most valuable resource, its people, who could be used elsewhere, and it can cost an exorbitant amount of time and money. By automating your company’s process, you can help prevent costly mistakes.

Also, when backup and restore processes complete, automation can create reports and send them to the appropriate people. This will give you an audit record of the process, which can be stored for future use. If the process runs into a problem, automation can notify the correct people. While that’s happening, automation might be able to fix the problem and restart the process. This will save time and effort and help ensure the company is safe if there’s ever a major problem with z/VM or the Linux on System z guests.
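The restart-and-report idea can be illustrated with a short Python sketch. It assumes the backup step is exposed as a callable; the name `run_with_report`, the report fields, and the retry count are all hypothetical choices made for the example.

```python
from typing import Callable, Dict, List


def run_with_report(job: Callable[[], None], name: str,
                    max_attempts: int = 2) -> Dict[str, object]:
    """Run a job, restart it on failure, and return an audit record."""
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            job()
            return {"job": name, "status": "ok",
                    "attempts": attempt, "errors": errors}
        except Exception as exc:
            # Keep every error so the report shows what went wrong and when.
            errors.append(str(exc))
    return {"job": name, "status": "failed",
            "attempts": max_attempts, "errors": errors}
```

The returned record is what would be stored for audit and sent to the appropriate people; a `failed` status is the trigger for notifying someone rather than silently retrying forever.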

Keeping Performance Levels High

Everyone knows that if your company’s z/VM system or Linux on System z guests aren’t performing at their best, it could create problems such as customers going to the competition or missing Service-Level Agreements (SLAs) with internal customers.

How can automation help? Even before IBM’s Single System Image (SSI), companies could easily move workload from one z/VM system to another. So if your company’s performance monitoring tool detects that one system is bogged down while another has spare cycles, automation can automatically move the workload from the busy system to the one with spare cycles. This resource provisioning helps improve performance without wasting an operator’s or systems programmer’s time. It accelerates operations, making the company more responsive and competitive.
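The decision behind such a move can be sketched as follows. This is a simplified illustration: the function `pick_move`, the CPU-busy percentages, and both thresholds are assumptions, and the input is presumed to come from whatever performance monitor is in use.

```python
from typing import Dict, Optional, Tuple


def pick_move(cpu_busy: Dict[str, float],
              busy_threshold: float = 90.0,
              spare_threshold: float = 50.0) -> Optional[Tuple[str, str]]:
    """Return (source, target) if a workload move is worthwhile, else None."""
    if not cpu_busy:
        return None
    busiest = max(cpu_busy, key=cpu_busy.get)
    idlest = min(cpu_busy, key=cpu_busy.get)
    # Move only when one system is overloaded AND another has real headroom.
    if (cpu_busy[busiest] >= busy_threshold
            and cpu_busy[idlest] <= spare_threshold):
        return busiest, idlest
    return None
```

Requiring both conditions avoids shuffling workload between two systems that are each nearly full, which would add overhead without improving response time.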

Improving Security

Automation can also help improve security. What if an unauthorized user tries to sign on to the z/VM system or a Linux on System z guest and fails? If you have security in place, this could trigger a message to the system log, and automation can pick up on this potential intrusion and act on it by sending an email to the correct people. It can also create a report and, in a worst-case scenario, even shut down the guest that was compromised. You don’t have to waste resources monitoring for such breaches.
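A rule of this kind might look like the sketch below. The log line format (`LOGON FAILED user=... guest=...`) is invented for the example and is not a real z/VM or Linux message; the function name `respond_to_failures` and the shutdown threshold are likewise assumptions.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Hypothetical message layout; real security products use their own formats.
FAILED = re.compile(r"LOGON FAILED user=(?P<user>\S+) guest=(?P<guest>\S+)")


def respond_to_failures(lines: Iterable[str],
                        shutdown_after: int = 5) -> Dict[str, str]:
    """Map each affected guest to the response automation would take."""
    counts: Counter = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group("guest")] += 1
    # A few failures warrant an email; repeated failures escalate to shutdown.
    return {guest: ("shutdown" if n >= shutdown_after else "email")
            for guest, n in counts.items()}
```

Escalating by count keeps a single mistyped password from taking a guest offline while still reacting firmly to a sustained attempt.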

Summary

Automation of key Linux management tasks helps you deliver high-quality service and optimize the use of system resources with minimal time and effort. This frees staff up to work on projects that will help the company keep on the leading edge and gain a competitive advantage.