IT Management

Mainframe and distributed developers can learn a thing or two about communication and teamwork from dolphins.

Yes. Dolphins.

Whistles, clicks, yelps and trills are just random “animal” sounds to us, but for dolphins they have specific meanings such as, “Do you want to get some squid later?” or “Help, I’m in trouble!” Dolphins are team-oriented animals that communicate continually while hunting, socializing and traveling hundreds of miles in murky water.

The application delivery chain can be murky too, and because it’s so complex, it requires effective communication among its architects and caretakers. Each time consumers use a mobile device or Web browser to transfer money from one account to another, for example, they initiate a transaction that travels a complex path composed of servers, middle-tier systems and messaging systems. The transaction must travel that path to reach the mainframe and complete.

Mainframe and distributed developers are largely responsible for creating and managing the performance of these applications. The distributed team develops, tests and maintains the software the customer sees every day, while the mainframe team delivers the data that the distributed team displays to the customer. They can be required to work closely together to identify and resolve application performance issues, but because the teams are still largely siloed and don’t speak the same language, they’re sometimes at odds. Part of the reason is that the two groups had distinctly different cultures that, until about 10 years ago, rarely intersected. Today, the teams contend with business-critical applications that often span distributed system components as well as the mainframe, yet the divide persists.

Of course, there are also fundamental differences between the two sides’ operating systems and hardware platforms. Mainframers develop and administer a sophisticated platform that’s highly secure, scalable and reliable, and that performs well under extreme conditions. Adding capacity or upgrading the mainframe can be very expensive, so many mainframe groups adopt a “conserve” mindset, spending time optimizing the applications that run on the z/OS platform and focusing primarily on reducing CPU usage.

On the other hand, the distributed world is relatively new and wide open. Distributed developers have access to a myriad of inexpensive resources and aren’t afraid to use them. This “consume” approach to application development often produces applications that don’t use the mainframe efficiently, which drives up CPU usage unnecessarily. For example, developers may make multiple calls to DB2 on z/OS to get data when one SQL statement, or just a few, could gather the same data. More SQL statements generally mean more CPU usage as well as longer response times. In addition, multiple stakeholders are involved with and in charge of different domains. Solutions for security, recovery and reliability often come from multiple vendors and aren’t engineered to work well together. The chief concern of the performance teams is response time. Long response times for multitiered transactions are sometimes solved by adding another server to the configuration. This isn’t nearly as expensive as upgrading the mainframe, but over time, the costs can add up.
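The DB2 example can be made concrete with a small sketch. It is illustrative only: it uses Python’s built-in sqlite3 module as a stand-in for DB2 on z/OS, and the tables (customer, account), the sample rows and the function names are all hypothetical. The “chatty” version issues one statement per customer, while the consolidated version gathers the same data with a single statement.

```python
import sqlite3


def build_demo_db():
    # In-memory SQLite database standing in for DB2 on z/OS (assumption).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE account (id INTEGER PRIMARY KEY,
                              customer_id INTEGER, balance REAL);
        INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
        INSERT INTO account VALUES (10, 1, 100.0), (11, 1, 50.0),
                                   (12, 2, 75.0), (13, 3, 20.0);
    """)
    return conn


def balances_chatty(conn):
    """One statement to list customers, then one more per customer."""
    statements = 0
    totals = {}
    customers = conn.execute(
        "SELECT id, name FROM customer ORDER BY id").fetchall()
    statements += 1
    for customer_id, name in customers:
        row = conn.execute(
            "SELECT SUM(balance) FROM account WHERE customer_id = ?",
            (customer_id,)).fetchone()
        statements += 1
        totals[name] = row[0]
    return totals, statements


def balances_single(conn):
    """The same totals gathered with a single joined statement."""
    rows = conn.execute("""
        SELECT c.name, SUM(a.balance)
        FROM customer c JOIN account a ON a.customer_id = c.id
        GROUP BY c.id, c.name
    """).fetchall()
    return dict(rows), 1


if __name__ == "__main__":
    conn = build_demo_db()
    print(balances_chatty(conn))
    print(balances_single(conn))
```

With three customers, the chatty version issues four statements against one for the joined version, and the gap widens as the number of customers grows. How much CPU that saves on a real mainframe depends on the workload, so treat the statement count as a rough proxy rather than a measurement.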

Here’s the rub. When these teams are required to work together to solve critical performance issues affecting multitiered applications, each team brings its own tools. Those tools don’t provide true fault domain isolation, let alone root cause analysis; they merely show that the problem isn’t in that team’s portion of the application. The best and brightest of the organization can spend days on fruitless conference calls or in windowless conference rooms, each trying only to prove that another person or team is at fault.

To bridge this chasm, organizations need a new breed of Application Performance Management (APM) capabilities that provide a single version of the truth, with 24x7, end-to-end transaction monitoring that traces all transactions from all users, from the edge of the Internet to the mainframe and back. With a shared view of application performance across the enterprise, employees can work quickly and productively to determine not only where the fault domain lies, but also what the root cause is.
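To give a rough sense of what a shared, end-to-end view makes possible, the sketch below models one traced transaction as a chain of tiers, each reporting its elapsed time including everything downstream. Subtracting the downstream time leaves the time spent in each tier itself, which points to the likely fault domain. The Span structure, tier names and timings are hypothetical and do not represent any particular APM product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    tier: str
    elapsed_ms: float                 # inclusive of all downstream tiers
    child: Optional["Span"] = None    # next tier in the transaction path


def self_times(root):
    """Time spent inside each tier, excluding the tiers it calls."""
    result = {}
    span = root
    while span is not None:
        downstream = span.child.elapsed_ms if span.child else 0.0
        result[span.tier] = span.elapsed_ms - downstream
        span = span.child
    return result


def slowest_tier(root):
    """Tier with the largest self time: the likely fault domain."""
    times = self_times(root)
    return max(times, key=times.get)


if __name__ == "__main__":
    # Hypothetical trace of one funds-transfer transaction.
    trace = Span("web server", 900.0,
                 Span("message queue", 820.0,
                      Span("mainframe", 120.0)))
    print(self_times(trace))
    print(slowest_tier(trace))
```

In this made-up trace, most of the 900 ms is spent in the messaging tier rather than on the mainframe or the Web server, which is the kind of answer neither team’s separate tools would show. Identifying the fault domain this way is only the first step; root cause analysis within that tier still requires deeper diagnostics.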

Borrowing a page from the dolphin playbook, communication and teamwork—coupled with the right tooling—are key to identifying and resolving performance problems quickly and painlessly.