Aug 1 ’06

The Development of z/OS System Integrity

by Editor in z/Journal

When was the last time you heard of a virus, worm, or other malware infecting z/OS? Chances are you’ve experienced far fewer security problems with System z machines than with any other platform you’ve worked with. But you’ve probably also heard it said that if z/OS were as popular as Windows and as exposed to the Internet, it would have just as many security issues. Those who make such statements would have you believe “Patch Tuesday” is just the price one pays for success.

While it’s true the Internet is a hostile environment that poses special challenges for system designers, the superior stability and security of z/OS isn’t marketing fluff; it’s real. Ethical hacking experiments have confirmed that z/OS is highly resistant to malicious attacks. Consider the unique technology provided by mainframes and how IBM’s developers used it to avoid the problems that plague other platforms. The secret lies in how z/OS interacts with System z hardware to provide system integrity.

Integrity vs. Security

System integrity differs from system security. Security is concerned with who gets a key to the lock. Integrity is concerned with the fact there’s no way of bypassing the lock. For IBM mainframes, system integrity became a major design goal in the early ’70s, long before the Internet was developed. In those days, hacking was confined to local terminals and a few dial-up connections, and viruses were being spread by floppy disks. Although mainframes were largely unaffected by all this, storm clouds were gathering on the horizon. Anticipating the hostile world to come, IBM’s developers abandoned their old approach of “let’s prevent honest mistakes and accidents” and adopted a new, far more ambitious goal of “let’s protect the system from hostile users and malicious attacks.” This represented an astonishing, unprecedented change in corporate culture.

Research was conducted to determine how operating system integrity exposures occurred. As W. S. McPhee reported in his landmark 1974 IBM Systems Journal article, “Operating System Integrity in OS/VS2,” the developers found they could classify exposures into a small number of major categories.

The design practices and procedures that might result in one of these exposures were identified and eliminated. Then a massive effort was launched to find and fix all known problems. Can you appreciate the difficulties of that little word “all”? Some exposures were timing-dependent; others were so obscure that the probability of their ever being exploited was tiny. The decision to spend time and money correcting something that had almost no chance of ever happening had to be difficult to make. Yet the developers successfully argued that unless all exposures were corrected, no one could be truly confident that today’s obscure exposure wouldn’t become tomorrow’s gaping hole.

An historic decision was made to fix everything IBM knew about, without regard to the probability of its occurrence. The word came down,

“Just do it … ”

The IBM Integrity Statement

Then the other shoe dropped: IBM also agreed to try to fix anything anybody else could find! In Software Announcement P73-17, dated Feb. 1, 1973, for VS2 Release 2, IBM formally defined system integrity and committed the company to accepting Authorized Program Analysis Reports (APARs) for integrity exposures found by customers. The famous “IBM integrity statement” was, and to my knowledge still is, unprecedented in the industry. It also speaks volumes about the confidence IBM has in its people and products. What that means in 2006 is that just about every conceivable problem in core operating system functionality has been found and fixed. Of course, z/OS and System z continue to evolve and change as new features are added, but more than 30 years of a “zero defects” system integrity policy have given developers an unusually robust foundation upon which to build new things.

We’ll take a detailed look at the IBM integrity statement. Since z/OS traces its lineage back through earlier incarnations as OS/VS2, MVS/XA, MVS/ESA, and OS/390, we’ll use MVS as a generic term for functionality shared by all members of this family of operating systems. Finally, system integrity depends on System z hardware facilities such as storage protection keys and machine states, which have a heritage dating back to the earliest IBM mainframes. As we shall see, system integrity depends on the interaction of both hardware and software mechanisms.

The 1973 IBM integrity statement was updated by Software Announcement P81-174, dated Oct. 21, 1981: “System integrity is defined for MVS as the inability of any program not authorized by a mechanism under the customer’s control to:

- Circumvent or disable store or fetch protection
- Access a password-protected or RACF-protected resource
- Obtain control in an authorized state; that is, in supervisor state, with a protection key of less than eight (8), or Authorized Program Facility (APF) authorized”

The integrity statement is fairly straightforward.


Supervisor state has been available since the earliest IBM mainframes. To do its job, the operating system needs to execute the entire machine instruction set supported by the hardware; that’s “supervisor state.” But what about a COBOL application program? Does it need to execute every machine instruction in the set, including those able to shut down the machine? For its own safety and the safety of other programs, the answer is obviously “no.” So application programs run in “problem state.”

Problem state programs can execute most of the machine’s instructions. Supervisor state programs can use all of them. System z hardware switches back and forth between problem state and supervisor state, such as when application programs make Supervisor Calls (SVCs) for operating system services. All IBM mainframes have this multi-state architecture. Each Central Processing Unit (CPU) knows which state a program is in by testing bit 15 of a special hardware register called the Program Status Word (PSW). The PSW maintains the active program’s control information during execution. In a multi-programming environment, the PSW is saved when a program is interrupted and restored when it regains the CPU. The privileged machine instructions MVS uses to set and modify critical parts of the PSW require supervisor state (see Figure 1).

 

This explains the language of the integrity statement. One of the powers authorized programs have is the ability to switch into and out of supervisor state. If an unauthorized program (i.e., one running in problem state) somehow figures out how to gain supervisor state, outside normal controls, that’s an integrity exposure. Only supervisor state programs can use all machine instructions, including those that manage control mechanisms and system data storage.
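If it helps to see that logic spelled out, here’s a minimal sketch in C of how a check on the PSW problem-state bit might behave. The structure and names are invented for illustration; the real mechanism is implemented in hardware, not in C.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the two PSW fields discussed above. */
typedef struct {
    bool problem_state;   /* PSW bit 15: true = problem state       */
    unsigned key;         /* PSW bits 8-11: storage protection key  */
} Psw;

/* A privileged instruction attempted in problem state triggers a
   privileged-operation program exception instead of executing. */
bool try_execute(const Psw *psw, bool instruction_is_privileged) {
    if (instruction_is_privileged && psw->problem_state) {
        printf("privileged-operation exception\n");
        return false;              /* the hardware blocks the instruction */
    }
    return true;                   /* the instruction executes */
}

int main(void) {
    Psw app = { .problem_state = true,  .key = 8 };  /* typical application */
    Psw os  = { .problem_state = false, .key = 0 };  /* operating system    */
    try_execute(&app, true);   /* blocked  */
    try_execute(&os,  true);   /* executes */
    return 0;
}
```

The point of the model is the asymmetry: the same instruction is legal or illegal depending solely on the state recorded in the PSW.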

Storage Protection

Let’s focus on that last point. Think of system data storage as either internal (main memory) or external (disks, tapes, etc.). You can further describe main memory as being either real storage or virtual storage. IBM’s mainframes have had to deal with sharing main memory among multiple programs since the early days of the S/360—the era of the Multiprogramming with a Fixed number of Tasks (MFT) and Multiprogramming with a Variable number of Tasks (MVT) operating systems—long before the invention of virtual storage. So mainframe hardware was designed with storage protection keys to divide real storage in a secure fashion among multiple programs. Each 2KB (now 4KB) block of storage was assigned one of 16 storage protection keys, numbered 0 through 15. These keys, along with other control information about the storage blocks, are accessible only through special supervisor state instructions (see Figure 2).

Programs also have a storage protection key, which is stored in bits 8 through 11 of the PSW. During execution, the hardware checks every storage access a program makes. It compares the PSW storage protection key to the physical storage block’s key. If they match, both read and write operations are allowed. If they don’t match, write access is automatically blocked and further checking is performed for read. Since some storage locations contain sensitive data, the hardware checks the storage block’s fetch protection bit, to determine if read access is allowed. The result is that programs can freely modify their own storage, but can only look at other programs’ storage. Even that’s prohibited if the storage has been fetch protected.

The PSW storage protection key zero is considered the master key. So, in addition to supervisor state, another capability that an authorized program might have is access to storage protection key zero, allowing complete access to all real storage blocks. These powers are related because only programs authorized to execute in supervisor state may use the special machine instructions that set storage keys and control fetch protection.
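Here’s the same kind of conceptual sketch for the key-matching rules just described (key zero as master key, writes blocked on mismatch, reads subject to fetch protection). It’s a toy model of the hardware check, with hypothetical types, not an actual implementation.

```c
#include <stdbool.h>
#include <stdio.h>

/* Conceptual model of a real storage block's protection attributes. */
typedef struct {
    unsigned key;          /* storage key 0-15 assigned to the block */
    bool fetch_protected;  /* if set, reads also require a key match */
} Block;

/* Models the check the hardware makes on every storage access. */
bool access_allowed(unsigned psw_key, const Block *b, bool is_write) {
    if (psw_key == 0)            /* key zero is the master key          */
        return true;
    if (psw_key == b->key)       /* matching keys: read and write OK    */
        return true;
    if (is_write)                /* mismatched keys: writes are blocked */
        return false;
    return !b->fetch_protected;  /* reads OK unless fetch protected     */
}

int main(void) {
    Block system = { .key = 0, .fetch_protected = true  };
    Block mine   = { .key = 8, .fetch_protected = false };
    printf("%d\n", access_allowed(8, &mine,   true));  /* 1: own storage   */
    printf("%d\n", access_allowed(8, &system, false)); /* 0: fetch protect */
    printf("%d\n", access_allowed(0, &system, true));  /* 1: master key    */
    return 0;
}
```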

These controls are possible because the hardware design embraces the critical “separation of function” principle. A program in supervisor state still can’t touch storage whose key doesn’t match until it changes its protection key, using instructions that themselves require supervisor state; conversely, a program running with storage protection key zero can’t execute privileged instructions unless it’s also in supervisor state. This explains why the integrity statement declares that non-authorized programs may not “disable or circumvent store or fetch protection.” If a non-authorized program somehow figured out how to do this or obtain key zero, it would be able to access and modify anything in real storage, a major violation of system integrity.

Virtual Storage

With the development of virtual storage, each program running in the computer was given its own unique address space. From the program’s perspective, it’s the only thing running in the computer besides the operating system. The MVS operating system (Multiple Virtual Storage) is well-named. Under the covers, virtual storage hardware such as Dynamic Address Translation (DAT) worked with MVS software to make a single mainframe’s real storage appear to be a collection of separate 16MB virtual address spaces (an enormous amount of storage at a time when programs were only a few KB in size). Today, each virtual address space can approach 16 exabytes (EB) in size when 64-bit addressing is in effect (see Figure 3).

 

Virtual storage works because only a small part of an executing program actually needs to be in memory at any one time. Other parts of a program can be brought into storage as they’re needed. Conceptually, a virtual storage address space is simply memory mapped into a disk file. The virtual storage address space is divided into pages and real storage is divided into page frames. Pages are loaded into frames as required and updated pages are written back out to the page data sets until needed again. Because each program runs in its own virtual address space, it’s no longer possible for application programs to address each other’s storage (see Figure 4).
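As a rough illustration of pages and frames, consider this toy demand-paging translator in C. The table sizes, the round-robin frame stealing, and the page_in stand-in are all simplifications invented for this sketch; real DAT uses multi-level translation tables maintained by MVS.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define NPAGES    64           /* toy address space: 64 pages */
#define NFRAMES   16           /* toy real storage: 16 frames */

static int     page_to_frame[NPAGES];   /* -1 = page not resident */
static int     frame_to_page[NFRAMES];  /* -1 = frame free        */
static uint8_t real_storage[NFRAMES][PAGE_SIZE];
static unsigned next_victim;            /* trivial round-robin steal */

/* Stand-in for I/O to the page data sets. */
static void page_in(int page, uint8_t *frame) {
    (void)page;
    memset(frame, 0, PAGE_SIZE);  /* a real system would read from disk */
}

/* Translate a virtual address, faulting the page in if it isn't
   resident (toy model: no bounds checking, no dirty-page writeback). */
static uint8_t *translate(uint32_t vaddr) {
    int page   = (int)(vaddr / PAGE_SIZE);
    int offset = (int)(vaddr % PAGE_SIZE);
    if (page_to_frame[page] < 0) {                 /* page fault      */
        int frame = (int)(next_victim++ % NFRAMES);
        if (frame_to_page[frame] >= 0)             /* evict old owner */
            page_to_frame[frame_to_page[frame]] = -1;
        page_in(page, real_storage[frame]);
        page_to_frame[page]  = frame;
        frame_to_page[frame] = page;
    }
    return &real_storage[page_to_frame[page]][offset];
}

int main(void) {
    memset(page_to_frame, -1, sizeof page_to_frame);
    memset(frame_to_page, -1, sizeof frame_to_page);
    *translate(0x1234) = 42;    /* first touch faults the page in */
    return 0;
}
```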

 

The security and integrity implications of being able to isolate application programs from each other using virtual storage are tremendous. Virtual storage became the primary storage protection tool for application programs, allowing new flexibility in how the legacy storage key facilities were used to protect real storage. Storage keys 0 through 7 were reserved for authorized system-level programs. All application programs using virtual storage were assigned key 8. Storage keys 9 through 15 were reserved for the rare applications that were unable to use virtual storage. Over time, other refinements were made to MVS, including additional protection for low memory and the Link Pack Area (LPA), special “semi-privileged” instructions, and unique treatment for storage key 9.

This gave MVS mainframes a double layer of storage protection, with control mechanisms for both virtual and real storage. The language of the integrity statement reflects this, declaring that a non-authorized program can’t “obtain control in an authorized state … with a protection key of less than eight (8).” If an application program was able to do this, it would be a reportable integrity exposure.

Understanding APF

Of course there are legitimate situations where you need a program to be authorized. Since MVS is already authorized, it can extend authorization to any program it chooses. Remember the “authorized by a mechanism under the customer’s control … ” language in the integrity statement? The Authorized Program Facility (APF) is the primary mechanism provided for this purpose. It allows authorization of system-level programs that need to modify or extend the basic functions of the operating system. Typical applications might include access control software, database managers, online transaction processors, scheduling systems, tape and DASD storage management, etc. APF serves as the gatekeeper. To gain job step APF authorization, a program must meet two criteria:

- It must be link-edited with an authorization code of one, AC (1)
- It must reside in an APF-authorized library

Only when a program meets both criteria will it be allowed to execute the MODESET SVC to obtain supervisor state and storage key zero from MVS. But there’s another way to get a system storage key without having to use MODESET. The MVS Program Properties Table (PPT) can be used to specify operational characteristics of specific APF programs, including their storage protection key. The PPT can make a program non-cancelable, and can even exempt it from RACF data set control. The PPT must be carefully controlled to preserve overall system integrity.

APF also determines which services authorized programs may perform on behalf of non-authorized programs. Some powerful services may be requested only by authorized programs. This is important because system integrity is lost if non-authorized programs can manipulate authorized programs into performing malicious actions. MVS can use the TESTAUTH SVC to identify APF-authorized callers, so it can readily distinguish between system-level and application programs. The critical role of APF is confirmed by the integrity statement, which says non-authorized programs cannot make themselves APF authorized. As with the other control mechanisms we’ve discussed, if a program found a way to do this, we’d have a system integrity exposure that would require an APAR.
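The screening idea is simple enough to sketch. In this hypothetical C model, a powerful service refuses to proceed unless its caller’s job step is authorized, much as an SVC routine might use TESTAUTH before honoring a request; none of these names correspond to real MVS control blocks.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-job-step authorization flag that
   APF maintains (in real MVS this lives in protected control blocks). */
typedef struct {
    bool apf_authorized;   /* true only if both APF criteria were met */
} Caller;

/* Model of TESTAUTH-style screening inside a powerful service. */
int sensitive_service(const Caller *caller) {
    if (!caller->apf_authorized) {
        fprintf(stderr, "request rejected: caller is not authorized\n");
        return -1;                    /* integrity preserved */
    }
    /* ... perform the privileged work on the caller's behalf ... */
    return 0;
}

int main(void) {
    Caller app = { .apf_authorized = false };
    Caller sys = { .apf_authorized = true  };
    sensitive_service(&app);   /* rejected  */
    sensitive_service(&sys);   /* permitted */
    return 0;
}
```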

Managing APF

The management of APF is central to maintaining z/OS integrity. Let’s start with the linkage editor SETCODE control statement. SETCODE has an important role to play. Suppose APF program “A” performs checks to determine why it’s being run. Satisfied, it calls program “B,” which calls program “C.” How can we prevent a malicious user from bypassing the checks in “A” by directly calling “B” or “C”? MVS lets us enforce our calling sequence by marking “A” with SETCODE AC (1) and “B” and “C” with SETCODE AC (0).

To ensure we actually follow through on this, MVS requires the initial program of an APF authorized job step be marked AC (1) and then monitors all subsequent program loads to make certain they come only from APF libraries. Any attempt to load programs from a non-APF library results in a system 306 ABEND.
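Here’s a minimal C model of those two rules. The library names are made up, and the real checking is done by MVS contents supervision rather than by anything an application could write, but the logic follows the description above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical APF library list; in MVS this comes from Parmlib
   members such as PROGxx or IEAAPFxx. */
static const char *apf_libs[] = { "SYS1.LINKLIB", "MY.APF.LOADLIB" };

static bool is_apf_library(const char *dsn) {
    for (size_t i = 0; i < sizeof apf_libs / sizeof *apf_libs; i++)
        if (strcmp(dsn, apf_libs[i]) == 0)
            return true;
    return false;
}

/* Rule 1: the job step runs authorized only if its initial program is
   marked AC(1) and is fetched from an APF-authorized library. */
static bool start_job_step(const char *dsn, int ac) {
    return ac == 1 && is_apf_library(dsn);
}

/* Rule 2: once the step is authorized, every subsequent load must also
   come from an APF library, or the step is abended (modeled S306). */
static void load_module(bool step_authorized, const char *dsn) {
    if (step_authorized && !is_apf_library(dsn)) {
        printf("ABEND S306: %s is not an APF library\n", dsn);
        return;
    }
    printf("loaded from %s\n", dsn);
}

int main(void) {
    bool auth = start_job_step("MY.APF.LOADLIB", 1); /* authorized step */
    load_module(auth, "MY.APF.LOADLIB");             /* OK              */
    load_module(auth, "USER.PRIVATE.LOADLIB");       /* S306 in model   */
    return 0;
}
```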

The most important step in managing APF is protecting its program libraries from unauthorized use or modification. Once we’ve identified all the libraries, it’s a straightforward exercise to protect them using our security software. But determining the APF libraries can be complicated. The place to start is the System Parameter Library (SYS1.PARMLIB) concatenation. Although IBM supplies extensive documentation for Parmlib, there are specialized publications that can help us sort out the security implications of Parmlib’s myriad options and parameters. Two of my favorites are OS/390-z/OS Security, Audit and Control Features by Peter Thingsted and A Guide to SYS1.PARMLIB by Mark Hahn. Both are available through ISACA (see www.isaca.org for details).

The APF environment may be defined as static or dynamic, and if console operator commands are allowed to change it, a careful review of SYSLOG will be necessary to assess their impact. Fortunately, MVS provides several methods, such as System Authorization Facility (SAF) facility-class profiles, to control operators. Determining exact APF library usage can be challenging because libraries can be both explicitly and implicitly defined. For example, certain libraries, such as SYS1.SVCLIB, are always authorized, while others, such as the SYS1.LPALIB concatenation, are authorized only under certain conditions. Still others, such as the SYS1.LINKLIB concatenation, may or may not be authorized, depending on option choices.

Authorization also may be specified via LNKLSTxx, PROGxx, IEAAPFxx, LPALSTxx, IEALPAxx, and IEAFIXxx. The SCHEDxx member of Parmlib lets an installation add its own APF programs to the PPT.

All these choices provide important tools and valuable flexibility for securely integrating system-level software into the MVS base. But it’s probably obvious why people have written books about this subject, and why we must defer a deeper look at Parmlib to another time. Although APF is the system’s primary authorization mechanism, there are others. The system protects the Pageable Link Pack Area (PLPA) from update and controls access to the Prefixed Save Area (PSA) in low memory. For the technical details of Service Request Blocks (SRB) and Dispatchable Unit Access Lists (DUAL), as well as a contemporary discussion of integrity exposures, refer to the “Protecting the System” chapter in IBM’s MVS Programming: Authorized Assembler Services Guide (SA22-7608).

Supporting Infrastructure

Looking beyond the mechanisms singled out by the integrity statement, we can see how legacy design decisions have both directly and indirectly contributed to the robustness of the mainframe environment. For example, MVS used a modular design, with independent subsystems in separate address spaces communicating with each other through formally defined interfaces. This kind of architecture is reminiscent of the “microkernel” architectures that are currently all the rage; the mainframe had it decades ago. An important factor in the success of MVS is System Modification Program Extended (SMP/E). It has provided automated program maintenance for MVS system software for decades and has undoubtedly contributed to system integrity. SMP/E makes certain that Program Temporary Fix (PTF) prerequisites and other dependencies are met, and allows the testing and rollback of fixes. Non-mainframe vendors are only now realizing the value of change control to system stability.

Other important contributions come from the System Management Facility (SMF) and console operator (SYSLOG) journals; tools for performance monitoring and tuning; and access control software such as IBM’s RACF, CA’s eTrust CA-ACF2 and eTrust CA-Top Secret, or JME Software’s DEADBOLT.

In keeping with the modular design of MVS, these security products all use the SAF to interface with the operating system. They add extensive authentication and authorization services to control users’ access to system resources (see Figure 5).
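To illustrate that pluggable design, here’s a toy SAF-style router in C. The function names and signatures are invented for this sketch and don’t correspond to the real SAF (RACROUTE) interface; the point is that system components call one routing interface while the installed security product supplies the decision.

```c
#include <stdio.h>
#include <string.h>

/* Toy model of a pluggable External Security Manager (ESM): system
   components call one routing interface, and whichever security
   product is installed supplies the decision logic. */
typedef enum { PERMIT, DENY, NO_DECISION } Verdict;
typedef Verdict (*EsmCheck)(const char *user, const char *resource,
                            const char *access);

static EsmCheck installed_esm;   /* set when a security product installs */

/* The single interface system components call (the SAF-router role). */
static Verdict auth_check(const char *user, const char *resource,
                          const char *access) {
    if (!installed_esm)
        return NO_DECISION;      /* no ESM installed: defaults apply */
    return installed_esm(user, resource, access);
}

/* A trivial stand-in ESM that only lets IBMUSER read things. */
static Verdict toy_esm(const char *user, const char *resource,
                       const char *access) {
    (void)resource;
    if (strcmp(user, "IBMUSER") == 0 && strcmp(access, "READ") == 0)
        return PERMIT;
    return DENY;
}

int main(void) {
    installed_esm = toy_esm;     /* "install" the security product */
    printf("%d\n", auth_check("IBMUSER", "SYS1.PARMLIB", "READ")); /* PERMIT */
    return 0;
}
```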

 

Security’s Role

The integrity statement also says the protection of assets by OS passwords and RACF is part of MVS integrity. We must use our security software to control access to the operating system’s data files and program libraries. In turn, MVS must provide a solid foundation upon which to build security; external security managers depend on this. For mainframes, the interrelationship of security and system integrity has been recognized by developers such as Barry Schrager (ACF2) and Eldon Worley (RACF) since the early ’70s. After decades of refinement, the System z security environment is feature-rich and second to none, with numerous options and parameters. While the complexity can be intimidating, all those choices provide enough flexibility and granularity to address almost any situation. Users aren’t pushed into the “one size fits all” environment of some platforms, where too many things wind up needing powerful “root” access. Many resources are available to help us achieve effective MVS security. Also, unlike other platforms, today’s mainframe user has a choice of multiple access control software systems.

Summary

While security is concerned with controlling who gets the key to the lock, integrity is concerned with the correct functioning of the lock. Because MVS was designed and optimized for a specific hardware platform, it can do far more than so-called “platform-agnostic” operating systems. In a mainframe environment, the operating system works with the hardware to ensure that data is protected from both accidental and malicious modification and that only authorized programs are able to manipulate system resources. The mainframe’s careful attention to system integrity provides a solid foundation for building effective system security. It’s reassuring to note that today’s System z mainframe and its z/OS operating system have had so much protection built into them from their very beginnings, 42 years ago. Now if we could only figure out how to transfer some of the hard lessons learned to the people living in the “Patch Tuesday” world …