Aug 16 ’10
Performance Management Essentials to IT Success
The Airline Tariff Publishing Company (ATPCO)—the Washington, DC-based central clearinghouse and publisher of airline fares and fare-related data—has been focusing on controlling costs to ensure customer satisfaction. The technology company’s IT division has focused on improving system performance, which reduces costs and frees up resources for new products and services.
ATPCO processes fare data from more than 500 airlines and sends that data to global distribution systems, computer reservations systems, and other travel-related companies around the world. ATPCO processes three times as many transactions as it did 10 years ago. In recent years, the company also has expanded the depth of its data, grew its customer base and broadened its distribution. However, many of ATPCO’s product applications are more than 10 years old and no longer meet all customer requirements.
ATPCO is leveraging its existing z/OS technology and infrastructure while rolling out new Web-based products to meet customer needs. Essential to these efforts is avoiding unnecessary hardware acquisition costs and reducing software maintenance costs. The company determined that implementing effective performance management was one of its best options for improving system performance.
At ATPCO, performance management encompasses more than software; it involves multiple organizational units, including the:
- Production support team comprised of experts on the company’s products and procedures
- DBA and systems support teams responsible for maintaining databases and the system environment
- Business team, which interacts with the worldwide user community
- Performance group, which tracks and communicates performance issues, and identifies and responds to situations that could jeopardize performance objectives.
The two-member performance team includes a system performance specialist and capacity planner, who tracks performance and does forecasting, and an application specialist, who proactively develops solutions. The system performance specialist monitors and manages system resources to meet business needs and maintain Service-Level Agreements (SLAs). Workload Manager (WLM) policies, which distribute resources to high-priority jobs so they can perform at the expected level, support this process. Since WLM doesn’t understand or resolve the reason why some jobs require more resources, the application specialist is responsible for finding ways to reduce resource consumption.
Augmenting the performance of existing architecture is challenging, perhaps even more so than designing new applications, yet challenges create opportunities that can lead to innovative ideas. Here’s how we handled those challenges from an application perspective.
We identified critical processes based on business value and visibility and targeted those that would secure the most cost savings. The nature of our business makes the load on our system highly volatile, especially since we can’t control the daily activity. Our goal is good performance with any reasonable load factor. To achieve this, we identified bottlenecks in the critical paths of strategic and business processes and located specific areas for code optimization.
We looked at scalability, focusing on applications that perform well at low load factors but crawl at high load factors, and vice versa. We optimized code for different load factors and created unique paths and implemented self-tuning applications with intelligence built into control sections to choose the most optimal path.
We profiled performance of processes to understand the flow of logic and behavior of functions. Tools help somewhat in gathering information, but getting actionable intelligence with manual processes (such as traversing through application logic to construct signal flow diagrams) is hard manual work. Understanding business functionality is extremely important, and it’s best not to disturb the business function even if it’s not understood initially. We looked for duplication of business functionality and extraneous function calls. We found ways to simplify the logic flow.
Throughout the process, we had to remember that the compiler or the environment may have inserted “invisible” functions or hidden code into the source; this meant we had to look beyond the source code. For example, while evaluating the high Task Control Block (TCB) switching rate of a CICS task, we learned that a third-party instrumentation facility that was pulled in at execution time was the culprit causing the high rate.
We took advantage of Parallel Sysplex for performance and load balancing and to lower third-party software costs by routing workload to the machine with the required software resource. We made minor changes to our applications to accomplish this, but we achieved a significant financial return.
Additionally, some of our databases don’t participate in database federation, forcing us to conduct multi-phase commits manually. This resulted in data integrity being questioned when we had to retrace many steps to do manual fixes. Even though data integrity wasn’t compromised, we considered this a performance issue because of the loss of productivity it caused. We carefully designed a homegrown agent that would oversee the multiple phases of commit and would trigger an undo process in case of failure.
Our back-end database is DB2 for z/OS, which manages about 10TB of data. Poorly performing SQL statements are the easiest to identify and correct. DB2 is well-equipped with accounting information at the correlation-ID level, package level, or even more granular levels. Only when an SQL solution is insufficient do we resort to other solutions.
Not all performance solutions are software solutions; some are procedural. Examples include stacking up non-critical or time-insensitive updates for non-peak hours or running various audit reports together by sweeping the database just once. It may sound like a clear-cut solution, but it’s difficult to get a consensus when working within the layers of communication and time zone differences typical of a global community.
We also faced some interesting business functionality issues. We collect data from different sources (converging on one key) and distribute the same data to a different set of clients (diverging on another key). There’s an authentication process validating who has clearance to input the data and a filtering process at the distribution end. Both are resource-intensive procedures because of the granularity of authentication, but we proved we could reduce cost significantly by rewriting the algorithms. It was risky to change the decision-making modules that are the backbone of the business, but there was a greater risk in not trying. After careful evaluation and thorough testing, we rolled out the new algorithms and they were successful.
We solved some design issues with historical databases that mirrored operational database design. By changing the design, we could curtail exponential data growth. We keep dated information in our database, and old attributes become outdated when new ones are made effective. The old design involved explicitly applying a discontinue date to the old attribute; the new design assumes an implied discontinue date based on the presence of a new attribute. This yielded a more than 50 percent savings in the load operation.
- IBM’s Tivoli Decision Support for z/OS for general trend analysis
- PLAN_TABLE as a repository of explains of static SQL
- DSN Dynamic Statement Cache for dynamic SQL
- Visual Explain option of IBM’s Data Studio for graphic view of the SQL
- Compuware Corp.’s STROBE for in-depth analysis after we target a process for tuning.
We extracted accounting information from the Tivoli database using native SQL in a format that’s comparable to a DB2PM report. We used thread-level and package-level details to pinpoint likely candidates, then we used STROBE and Data Studio for in-depth analysis.
The keys to good performance are to issue as few I/Os as possible, reduce internal data movement, and minimize table processing. One methodology doesn’t suit all circumstances, so ATPCO used numerous methods:
- Since thread creation is expensive, we took steps to avoid job initiation. When events that warrant a job were infrequent, we made job initiation event-triggered rather than polling on a time interval. When these job-triggering events were too frequent and numerous, we stacked them up and released them at regular intervals.
- Usually, external I/O to and from data files and networks is the slowest component. Our first priority was to tune outside the application by optimizing buffers or cache controllers or accelerating I/O with DFSORT.
- We reduced slack and wait for resources by eliminating resource contention and optimized our batch window by increasing parallelism.
- Avoiding date format conversion reduced internal data movement. Date is a big part of our data attributes; it’s received in different formats from numerous sources. We try to ensure that it’s converted only once to the internal format, and, in most cases, we’ve standardized the date element for minimal conversion.
- We reduced table processing and searches. We learned that in-memory processing is faster than DB2 temp tables, and binary searches are faster than serial searches. We conducted binary search on the one-to-one relationships and serial search only on no-match of the one-to-many relationships. We split tables larger than the maximum allowable sizes and used pointers to map the tables rather than moving data around. We developed a way of pre-calculating the size of the searches so we can use these techniques only when necessary.
- We avoided wide loops by placing unnecessary functions outside the loop and positioning the busiest loop inside while nesting.
- We learned the hard way that run units can be timed out, buffers can be paged out, and inactive files can be quiesced if we don’t optimize the life of threads. We streamlined and minimized external data access by storing data in memory for the duration of the process. We also reduced the size of DBRM and program to avoid EDM pool paging. We identified and converted edits that different modules performed repeatedly.
DB2 tuning efforts that gave us the most performance gains involved:
- Targeting packages that showed high getpage and taking steps to reduce it
- Tracking deadlocks and lock escalations and re-evaluating commit frequency
- Deferring updates until the commit point to minimize the duration of lock
- Evaluating actual usage of indexes and making adjustments accordingly
- Reviewing nested loop joins to confirm they were looping on the smaller table and achieving up to a 90 percent savings by forcing the SQL to loop on the smaller table
- Identifying single object constraints and increasing parallelism by partitioning into smaller objects
- Using the multi-row insert capability introduced in DB2 Version 8 for a 20 percent savings. More savings could have been achieved with the atomic option, but the non-atomic option gave us a unique time stamp (40 to 50 microseconds apart) on a column defined as timestamp with default. Using the multi-row fetch operation yielded an almost 50 percent improvement, but the overall gain was only 25 percent because the multi-row result set had to be adapted to a framework designed to handle one row at a time.
Here are some additional improvements we made and lessons learned in the process:
- A DB2 table indexed on an ever-ascending timestamp column was defined with no free space and no free page. Everything was fine as long as there was only one thread inserting data. When multiple concurrent threads started, the first one put an exclusive lock on the last page for insert. The second thread tried unsuccessfully to get a lock on the target page. Since the target page is the last page, it started browsing from the beginning. We observed the high getpage and corrected the problem by adding some free pages.
- A singleton fetch-only thread bound with cursor stability was providing great performance, but five concurrent threads going after keys that are far apart (no two SQLs targeting the same page) resulted in poor performance. Concurrent threads showed a tremendous increase in synchronous read. We tested on an isolated DB2 system to further research the scenario. At execution of concurrent threads, the buffer pool and other resources were in contention and the optimizer may have decided to do synchronous reads. There must be a maximum limit on the percentage of the buffer pool that can be monopolized by a single object. We corrected the situation by partitioning the table and rendering it into multiple objects.
- We noticed an increased amount of Systems Management Facility (SMF) log offloading when a particular CICS task was running. The CICS-DB2 thread with RELEASE (COMMIT) attribute would cut a thread termination record, SMF-101, at commit time. This particular task was looking for an unused five-digit number. Once the largest five-digit number is used, it looks for unused holes, with every single lookup resulting in a commit to release lock. The application was modified to correct the situation.
- One application that keeps a large set of locks between commits was affected when a monitoring facility capturing IFCIDs, which uses Extended Common Service Area (ECSA) heavily, was turned on. The coincidence was identified and the problem was corrected.
- We observed some SQLs doing table space scans instead of list prefetch. We identified the root cause as a RID LIST exceeded condition. Doubling the RID pool size corrected most of the issues and the remaining problems were handled on a case-by-case basis by tuning SQL and avoiding a large RID LIST.
- A column with possible Y/N value defaulted to cardinality 25 because of insufficient statistics, resulting in poor performance. We corrected this with enhanced distribution statistics.
- An SQL used match columns that weren’t in the index, causing data pages to be searched in the join. Index columns alone would have sufficed.
- A table with a single identity column and no index was used as a next number generator. An application was created to insert a row, get the next number, delete the row, and proceed. The unit of work couldn’t be committed until much later. We noticed that the delete action was taking a table lock rather than a page lock. Table lock for deleting a row with index is justified because it can’t release the index page until commit, but we didn’t understand the reason for the table lock when there’s no index. We modified the application so it didn’t delete the row and added an offline process to delete rows behind the scenes.
With an IT agenda that was dominated by cost cutting, ATPCO management had the foresight to invest in efficiencies. A CPU upgrade was deferred in 2007, major applications were installed with no adverse performance impact in 2008, and the momentum continued in 2009. In January 2010, CTO Steve Daniels explained that adjustments to ATPCO’s mainframe performance improved the entire IT services division and “literally saved the company millions of dollars.”
Figure 1 shows how the 2009 average work units per workday increased by 20 percent, yet mainframe usage declined by nearly 5 percent. All solutions haven’t yet been installed because they’re being completed in stages based on resource availability and business needs.