As our systems have evolved, so, too, has our management focus. It has moved from managing a single system resource, to managing tightly integrated systems and their resources, to today’s focus on service management. This evolution over the past 50 years has built on fundamentals refined over the same period. Many of us have been working with mainframes for so long that we take a lot for granted. One area we often overlook when transferring knowledge to new System z systems programmers is the fundamentals of z/OS TCP/IP network management. As a resource, the network is critical to the overall ability of the z/OS operating environment to provide services to the applications that run the business. This article will help fill this gap.
Resource Management Background
There are many ways to describe the fundamentals of managing networking systems, and the acronym FCAPS is one of the most often used. It was developed back in the ’80s by the International Organization for Standardization (ISO), as part of its Open Systems Interconnection (OSI) work, as a way to discuss network management. Other frameworks, such as the enhanced Telecom Operations Map (eTOM) and IT Service Management (ITSM), describe these functions as well. FCAPS is old, but it is a simple, easy-to-understand way to describe the basic elements:
• Fault: Is the system or some of its components broken?
• Configuration: How is the system configured?
• Accounting: Who is using the system, and how much of its resources are they consuming?
• Performance: How are the system and its components responding to requests?
• Security: Is the system immune to breaches, loss of data, prying eyes and illegal use?
The overall process is cyclical with five key stages:
• The Monitor step collects the metrics.
• The Analyze step is performed on the collected metrics to determine baseline details, identify problems and trends and determine the overall health.
• The Diagnose step takes the analyzed data along with other details and determines the root cause of any issues that are occurring.
• The Remediate step takes action to eliminate any issues, which can include configuration changes, hardware changes and software updates.
• The Report step provides graphical or tabular reports on the gathered metrics to provide business insight to a wide range of business functions.
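The five stages above can be sketched as a small loop. The following is a minimal illustration in Python, not a description of any particular monitoring product: the sample response times, the threshold factor of 2.0 and every function name are hypothetical, chosen only to show how the output of one stage feeds the next.

```python
from statistics import mean

# Hypothetical data: response times in milliseconds for one connection.
SAMPLES_MS = [12.0, 11.0, 13.0, 12.0, 48.0]


def monitor():
    """Monitor: collect the metrics (here, a fixed hypothetical list)."""
    return list(SAMPLES_MS)


def analyze(samples, factor=2.0):
    """Analyze: build a baseline from the earlier samples and flag the
    latest sample if it exceeds factor times that baseline."""
    baseline = mean(samples[:-1])
    latest = samples[-1]
    return {
        "baseline": baseline,
        "latest": latest,
        "problem": latest > factor * baseline,
    }


def diagnose(analysis):
    """Diagnose: turn a flagged result into a finding. A real diagnosis
    would combine this with other details to find the root cause."""
    if not analysis["problem"]:
        return None
    return "response time above baseline; root cause needs investigation"


def remediate(diagnosis):
    """Remediate: choose an action (here, only a description of one)."""
    if diagnosis is None:
        return "no action"
    return "open incident: " + diagnosis


def report(analysis, action):
    """Report: summarize the gathered metrics and the action in one line."""
    return "baseline={:.1f}ms latest={:.1f}ms action={}".format(
        analysis["baseline"], analysis["latest"], action
    )


def run_cycle():
    """Run one pass through Monitor, Analyze, Diagnose, Remediate, Report."""
    analysis = analyze(monitor())
    action = remediate(diagnose(analysis))
    return report(analysis, action)


if __name__ == "__main__":
    print(run_cycle())
```

In practice the cycle repeats continuously, and each pass refines the baseline used by the next Analyze step.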
The goal is to optimize these elements, leading to a system that achieves the business-defined service levels, optimizes the utilization of resources, is highly available and performs at peak levels. Many elements within the system need to be monitored to get a complete picture; this commonly includes the z/OS operating system, TCP/IP protocol, CICS and DB2.
Within the z/OS operating system, the z/OS Communications Server provides network communication for all z/OS subsystems. It provides full support for TCP/IP networking as well as legacy SNA networking utilizing Enterprise Extender technology. It provides the link between the network hardware attachment and the applications running under subsystems such as CICS, IMS, DB2 and WebSphere.
When looking at the root cause of problems with computer systems, the majority trace back to network and system issues at some level, ranging from the application to the subsystem to the protocol. Because they can arise at so many levels, these problems are difficult to identify and remediate.
The broad list of problems necessitates having monitoring software covering many elements within your system. This is the reason your business has so many monitors, each looking at various components and reporting on both the availability and performance of a myriad of metrics. Before we look at how these monitors work, let’s briefly review the TCP/IP protocol, since this is the focus of our discussion.
The TCP/IP Protocol
The work that led to the Internet Protocol (IP) began in the late ’60s. The U.S. government realized that the computer networks of the day were hierarchical in nature, with a central point of control. A great deal of configuration was required at these central points for the network to work. If any problems existed at the central control point, or if it was down, the entire network was down. Another major problem occurred when trying to interconnect two networks run by different organizations. The coordination was so complex that few organizations would attempt it. Since all aspects of our organizational lives were moving toward a reliance on computers, the single point of failure in the networks of the day was a serious problem. The government, concerned about these problems in a wartime situation, funded a project to develop a computer network that wasn’t reliant on a given operating system, hardware structure or single vendor and that could run all manner of applications: terminal emulation, peer-to-peer, client/server, etc. It also needed to work in both the wide area network (WAN) and the local area network (LAN), and configuration needed to be minimal. Out of this work emerged IP.
IP is very simple in structure, which allows it to adapt to changes in computing needs. The basic structure is a four-layer architecture: