IT Management

Picking the Best SQL Statement to Tune

You may have read the title and are now thinking to yourself that the best SQL statement to tune is the one the users are complaining about today. I would not argue with that perspective. However, I would like to present a different approach to SQL tuning. This discussion will not take the place of firefighting problem transactions, but it just might remove those situations from your routine.

Proactive performance tuning has been discussed for years. This article examines a methodology that can be applied to any application to tune the portions of that application that will provide the greatest return on the tuning investment. It is based on the Pareto principle, or the “80/20 Rule.” Applied to a DB2 application, this rule presumes that 80 percent of the resource usage (CPU, memory, and I/O) is consumed by 20 percent of the SQL in that application. Tuning that 20 percent of the SQL statements would therefore provide the greatest ROI.

Assuming, therefore, that proactive tuning is desirable, and that the Pareto principle applies, what can be done to achieve the goal of a well-tuned DB2 application? Obviously, the answer is to tune the 20 percent of the application that consumes the most resources. This article proposes a methodology to find that 20 percent of the application. Once you find the right SQL statements to tune, you can achieve tremendous resource savings in your applications. You never know; this type of tuning might just save enough money to keep an American employed in the IT infrastructure as opposed to having the job shipped offshore!

What Data Do I Need?

This methodology is based upon readily available data from z/OS, IMS, CICS, and DB2. z/OS provides broad information about the applications on each LPAR. This information is presented by service class if you are running in Goal mode with Workload Manager (WLM), and by performance group if you are not yet using Goal mode. Figure 1 shows a chart of this data.

While this example does not present a significant amount of detail, the chart does reveal large online and batch workloads for this LPAR. For the ONLINE workload, more detail would be derived from either CICS or IMS data sources. If you have both CICS and IMS combined in the same service class, then separating the data sources would provide a more detailed breakdown of the applications that use each of the online processing subsystems.

CICS provides usage data in the SMF type 110 records. IMS does not provide usage data natively in SMF records, but this information can be obtained from BMC Software’s MainView for IMS. This product produces FA/F9 records that are written to the IMS log and can be extracted for this type of usage study. Figure 2 shows a chart that could be produced with this type of data.

With this level of detail, the 20 percent of the application’s SQL that uses 80 percent of the resources can be found more easily. In this case, three components rise to the top as large resource users. This breakdown, which separates IMS or CICS usage from DB2 usage, is important: If the majority of the resource usage is in the online subsystem itself, then tuning DB2 SQL will not reduce your resource usage much.

Separating CICS usage from DB2 usage can be accomplished by correlating the DB2 usage data in the SMF type 101 records with the CICS usage in the SMF type 110 records through the CICS UOW field. CICS originally created the UOW field to tie its own records together from the TOR to the AOR to the FOR. Now you can instruct CICS to include this field when a transaction invokes a DB2 connection through the accounting record setting on the CEDA DEFINE DB2CONN panel. If you are running a CICS Version 4 system, however, this is set with the RCT TOKENI parameter. The MainView for IMS records provide this breakdown automatically.
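To make the correlation concrete, here is a minimal Python sketch. It assumes the SMF type 110 and type 101 records have already been decoded and extracted to comma-delimited files; the file names and column names are hypothetical placeholders, not actual SMF field names, since the real records are binary and would first be processed by a vendor tool or an in-house extract job.

import csv
from collections import defaultdict

# Hypothetical extracts: the real SMF 110/101 records are binary and must be
# decoded first (e.g., by a vendor tool or an in-house extract job).
CICS_EXTRACT = "smf110_cics.csv"   # columns: uow, cics_cpu_sec
DB2_EXTRACT = "smf101_db2.csv"     # columns: uow, db2_cpu_sec

def load(path, key, field):
    """Accumulate the named CPU field per unit-of-work key."""
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row[key]] += float(row[field])
    return totals

cics = load(CICS_EXTRACT, "uow", "cics_cpu_sec")
db2 = load(DB2_EXTRACT, "uow", "db2_cpu_sec")

# Join the two record types on the CICS unit-of-work token, so each
# transaction's CPU can be split between the CICS region and DB2.
for uow, cics_cpu in sorted(cics.items()):
    db2_cpu = db2.get(uow, 0.0)
    print(f"{uow}: CICS {cics_cpu:.3f}s, DB2 {db2_cpu:.3f}s")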

DB2 provides accounting records in SMF type 101 records. These records contain varying levels of detail, depending upon the accounting trace classes that are active. Classes 2 and 3 provide basic execution and wait time statistics at the plan level; classes 7 and 8 provide the same information at the package level. Assuming the heavy resource consumers are DB2 applications, this data can be analyzed to find the plans or packages that are candidates for tuning. Collecting the package-level data adds a small amount of SMF overhead, but it is worthwhile to gather it for at least a few days to pinpoint the 20 percent of the application that would benefit from tuning.
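As an illustration of this analysis step, the following Python sketch ranks packages by class 7 CPU time and stops at the set that accounts for 80 percent of the total, the Pareto candidates. The input file and its column names are assumptions; real accounting data would first be decoded from the SMF type 101 records.

import csv
from collections import defaultdict

# Hypothetical extract of decoded DB2 accounting (class 7) data,
# with columns: plan, package, class7_cpu_sec
cpu_by_pkg = defaultdict(float)
with open("smf101_packages.csv", newline="") as f:
    for row in csv.DictReader(f):
        cpu_by_pkg[(row["plan"], row["package"])] += float(row["class7_cpu_sec"])

total = sum(cpu_by_pkg.values())
running = 0.0
# Walk the packages from heaviest to lightest; the ones listed before
# the 80 percent line are the Pareto candidates for tuning.
for (plan, pkg), cpu in sorted(cpu_by_pkg.items(), key=lambda kv: -kv[1]):
    running += cpu
    print(f"{plan}.{pkg}: {cpu:.1f}s ({100 * running / total:.0f}% cumulative)")
    if running / total >= 0.80:
        break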

In addition to this data, there are also third-party software products that produce accounting records at the SQL statement level. Two products that I know of are Detector from Computer Associates and APPTUNE from BMC Software. With these products, those responsible for tuning can easily determine the SQL statement that uses the greatest amount of system resources.

The final piece of this puzzle is dynamic SQL. You can look for candidates through the accounting data, as previously described, or through third-party products. However, the difficulty with dynamic SQL is that the applications using it usually have a generic plan and package name structure. Also, the statements do not lend themselves to easy summarization and categorization: Just looking at the SQL text would produce a unique statement for each execution because of the substituted literals in the predicates. Static statements are easier to summarize because the text contains host variables and each statement has a consistent statement number. Please don’t get me wrong; these types of applications can be analyzed and tuned with this process. It’s just that the data accumulation process is more difficult. If you have no third-party software, you might investigate using IFCIDs 22 and 63 to collect SQL text and explain information, and then apply general rules against that data to identify statements with access paths that are usually considered poor.
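One common workaround, sketched below in Python, is to normalize the statement text before summarizing it: Replacing literals with parameter markers collapses the many unique executions of a dynamic statement into a single bucket. The regular expressions here are illustrative only and would need refinement for production SQL.

import re
from collections import Counter

def normalize(sql: str) -> str:
    """Replace literals with '?' so dynamic statements can be grouped."""
    sql = re.sub(r"'(?:[^']|'')*'", "?", sql)      # string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)   # numeric literals
    return re.sub(r"\s+", " ", sql).strip().upper()

# Example: three executions of the "same" dynamic statement collapse to one.
captured = [
    "SELECT NAME FROM EMP WHERE DEPT = 'A01' AND SALARY > 50000",
    "SELECT NAME FROM EMP WHERE DEPT = 'B02' AND SALARY > 60000",
    "SELECT NAME FROM EMP WHERE DEPT = 'C03' AND SALARY > 45000",
]
counts = Counter(normalize(s) for s in captured)
for stmt, n in counts.items():
    print(f"{n}x {stmt}")

With the text normalized this way, the dynamic statements can be ranked by resource usage just like their static counterparts.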

Putting These Pieces Together

Having discussed the “what” in the process, let’s discuss the “how.” Putting all of this information together may not be the work of one person in an organization, so I suggest a collaborative process among the individuals involved. All of the information described here may already exist in your shop; the key ingredient is to correlate the data across groups and produce a plan for performance improvement and resource usage efficiency.

Using Figures 1 and 2, the systems programmer would identify the online service class as the largest resource user. In fact, more detailed service class definitions could pinpoint usage for a particular online region. An analysis of that service class would yield a list of the applications that are the largest resource users. Within that list of applications, a list of the plans or packages that are the largest resource users would be developed. Finally, within that list of plans or packages, a list of SQL statements ranked by resource usage would be produced. This process is analogous to peeling back the layers of an onion: The deeper you go, the more detail is revealed. At the center is the SQL that would benefit most from tuning.
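The drill-down can be pictured as a series of progressively finer summarizations over the same correlated data. Here is a minimal Python sketch, assuming the usage data from the earlier steps has been merged into a single list of records; the field names are hypothetical.

from collections import defaultdict

# Hypothetical merged usage records from the earlier extracts; each carries
# the labels for every layer of the onion plus its CPU cost.
records = [
    {"svclass": "ONLINE", "region": "CICSA", "plan": "PAYROLL",
     "stmt": 120, "cpu": 4.2},
    # ... one record per accounting interval ...
]

def top_consumers(rows, level):
    """Sum CPU by the given label and return the heaviest first."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[level]] += r["cpu"]
    return sorted(totals.items(), key=lambda kv: -kv[1])

# Peel the onion one layer at a time, keeping only the heaviest item
# at each level before drilling into the next.
subset = records
for level in ("svclass", "region", "plan", "stmt"):
    heaviest, cpu = top_consumers(subset, level)[0]
    print(f"top {level}: {heaviest} ({cpu:.1f}s)")
    subset = [r for r in subset if r[level] == heaviest]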

Although an in-depth discussion would require much more detail than I’ve provided here, it is important to keep in mind that the tuning process should focus on a single change at a time. Benchmark and compare the data to determine the change(s) that produce the best results, implement those changes through your normal change management process, and then look for more tuning opportunities.

Conclusion

For those of you who are still fighting fires, perhaps you can find time to collect some of this data and begin to proactively tune your applications. For those of you who are already involved in a proactive process, perhaps this article will give you new ideas on how to improve it. Finally, while I have described this process in terms of tuning SQL statements, I believe it could be applied in many other ways. I leave those applications to your imagination.