IT Management

Imagine you’re the manager of a factory production line. The items being manufactured travel through a complex web of stations and processes. What happens when the line slows? If you don’t correct the problem quickly, the resulting delays in product deliveries could have serious consequences.

The first step in fixing a problem is finding the root cause. When you have a fragmented view of the production line, you can’t trace and monitor the progress of the items as they pass through each phase of the manufacturing process, and you can’t quickly determine where the delay lies. To find the problem, you call an emergency meeting of line supervisors. This “all hands on deck” approach lengthens the time required to determine what caused the problem, takes key personnel from their “real” jobs, and exposes the business to risk.

Fortunately, the production managers in modern factories usually have full visibility across their entire manufacturing lines. They can quickly find problems and correct them before they affect the business.

Complex, multi-tier business applications operate like factory production lines. Applications pass transactions through multiple components that run on multiple IT infrastructure resources such as mainframes, distributed servers, and middleware. Consequently, IT managers who are responsible for application performance are like factory production managers. In many IT organizations, these managers are operating with only siloed views of the “production lines” of their applications; they don’t have the ability to trace transactions as they traverse the application production line end-to-end.

When performance problems occur, IT managers can’t quickly determine which component is at fault. They may call all-hands meetings, which lengthen the time needed to pinpoint the root cause of the problem, thereby increasing Mean Time to Repair (MTTR). Because no one knows where the problem lies, these meetings pull highly skilled technicians from many different areas away from managing their own processes. Is it a database problem or a transaction logic error? Is it on the mainframe or another platform? Once you detect a problem, you may not be able to quickly see which of the many application components is causing it. This determination becomes even more difficult if a mainframe is involved because the transaction may pass through multiple subsystems, such as IMS, CICS, and DB2. In addition to isolating the problem to the mainframe, you must also determine which subsystem on the mainframe is causing the delay.

IT organizations report that identifying which application component is at fault consumes far more than 50 percent of the total time to repair, and some report as much as 90 percent. The longer it takes to resolve the problem, the more serious the consequences can be. In financial institutions, for example, each hour of downtime can cost hundreds of thousands, if not millions, of dollars.

Find It, Fix It, Forget It

The key to reducing the exposure to business risk caused by application performance degradation is to reduce the time it takes to identify the component causing the slowdown. Identifying the faulty component quickly speeds triage, so you can get the right people on the problem sooner. You need to be able to trace transactions end-to-end, through every hop, as they move through the application components, including application servers, Web servers, mainframes, and middleware, on all the platforms where the application runs. This requires a transaction management framework that monitors and tracks transactions in real time across all application components and correlates the information to provide a complete, end-to-end view of each transaction. An ideal framework provides early warning of problems so they can be addressed before they affect service quality. It should also retain historical data so you can prevent problems from recurring.
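To make the idea of correlation concrete, here is a minimal sketch in Python. It assumes each component's monitor reports a record carrying a shared transaction ID, which is an assumption for illustration; the record layout, component names, and timings are hypothetical and do not describe any particular product.

```python
from collections import defaultdict

# Hypothetical hop records, as separate siloed monitors might report them:
# (transaction_id, component, start_ms, end_ms). They arrive in no useful order.
HOPS = [
    ("tx-1", "web_server", 0, 20),
    ("tx-2", "web_server", 5, 22),
    ("tx-1", "mainframe_db", 75, 400),
    ("tx-1", "app_server", 20, 60),
    ("tx-2", "app_server", 22, 70),
    ("tx-1", "middleware", 60, 75),
]


def correlate(hops):
    """Group hop records by transaction ID and order each group by start time."""
    grouped = defaultdict(list)
    for tx_id, component, start_ms, end_ms in hops:
        # Put start_ms first so that sorting orders hops chronologically.
        grouped[tx_id].append((start_ms, component, end_ms))
    return {
        tx_id: [(component, start, end) for start, component, end in sorted(records)]
        for tx_id, records in grouped.items()
    }


traces = correlate(HOPS)
for tx_id, path in traces.items():
    print(tx_id, " -> ".join(component for component, _, _ in path))
```

Each entry in `traces` is the end-to-end path of one transaction, rebuilt from records that no single monitor could have assembled on its own.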

Keep your transaction production line running with effective transaction management solutions that enable you to see transactions at an aggregate level as well as at an individual component level. You can see the aggregate transaction processing time as well as the processing time of each component. This approach lets you quickly determine where transactions are being held up so that you can fix them and maintain service level agreements.
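The aggregate and per-component views can be illustrated with a short Python sketch. The trace below is invented for illustration, and the component names and timings are assumptions rather than measurements from a real system.

```python
# Hypothetical end-to-end trace of one transaction:
# (component, start_ms, end_ms). Values are illustrative only.
TRACE = [
    ("web_server", 0, 20),
    ("app_server", 20, 60),
    ("middleware", 60, 75),
    ("mainframe_db", 75, 400),
]


def summarize(trace):
    """Return total processing time, time per component, and the slowest component."""
    per_component = {}
    for component, start_ms, end_ms in trace:
        # Accumulate, in case a transaction visits the same component twice.
        per_component[component] = per_component.get(component, 0) + (end_ms - start_ms)
    total = sum(per_component.values())
    slowest = max(per_component, key=per_component.get)
    return total, per_component, slowest


total, per_component, slowest = summarize(TRACE)
print(f"aggregate processing time: {total} ms")
for component, elapsed in per_component.items():
    print(f"  {component}: {elapsed} ms ({elapsed / total:.0%} of total)")
print(f"component to investigate first: {slowest}")
```

In this made-up trace, the mainframe database accounts for 325 of the 400 ms, so triage would start there rather than with an all-hands meeting.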

April Hickel is the product manager for middleware and transaction management solutions at BMC Software. She was formerly the vice president of Product Management for MQSoftware, Inc. She is an active member in the Minneapolis community and holds positions in several civic organizations.