You can choose from many different ways to code an SQL statement to answer a given question. While the result set will be the same, performance can vary dramatically, depending on the statement construct and data structure. This article examines the major ways of coding SQL and when you should use the various techniques. We’ll address how to code joins, table expressions, and subqueries effectively for specific situations so you can better understand when to use these coding methods.
Basic Tools Required
Before attempting any SQL tuning endeavor, or when constructing SQL statements for a new application, you must understand the potential performance implications that could occur from how you code the SQL statements. This requires an understanding of the tables you’re accessing and the process you’re attempting to serve. For example:
- How is what you want to accomplish impacted by the size of the tables you access?
- What is the input to the process?
- How often is the statement accessed?
- What indexes are available?
- How well do predicates in the statement match the indexes?
Talk to application developers to see what their intention was when they coded the various SQL statements. Talk to data administrators to understand the concepts behind the database design. If you’re involved in database or application design, keep the number of indexes to a minimum and support the use of SELECT statements; this will keep SQL statements simple and help data modification statement performance. Try to avoid system-generated keys; stick with natural keys when possible. Don’t partition unless there’s a reason such as size, concurrency, ease of management, or availability.
Keep these handy:
- SQL reference guide
- Application programming guide
- Administration guide
- Performance guide
- Table Data Definition Language (DDL) or Entity Relationship diagram
- Catalog statistics.
With this documentation, and an understanding of the conditions under which the statement executes, you’re ready to make correct choices and tune your queries. You do, however, need some tools to do the tuning. We use DB2 EXPLAIN, usually via IBM Optimization Service Center or IBM Data Studio, to check the access path, and REXX to test query performance.
Before tuning SQL, ensure that the catalog statistics accurately reflect the data stored in your tables. Most access path problems can be resolved by collecting catalog statistics or running a REORG and then collecting catalog statistics if your data is disorganized. It’s important that table and index statistics are in agreement; that’s why you should never manually manipulate the catalog statistics or collect index statistics at a separate time from table statistics. When it comes to collecting catalog statistics, either run them once or run them often. If you’re familiar with the data stored in the tables for an application and how the application needs to access that data, you should be familiar with the proper access paths for the queries. In that case, you want to collect statistics that accurately represent the proper data storage and assure that DB2 is picking the appropriate access path. At that point, the statistics shouldn’t be altered so your access paths remain consistent.
If you don’t understand the data organization and how the application uses the data, then you’re better off putting your faith purely in DB2. In that case, you want to run RUNSTATS regularly to give DB2 timely information to pick what it thinks is the best access path.
Collecting RUNSTATS can be expensive, particularly for very large tables. Setting a low sample percentage helps, but you shouldn’t set the sample below 10 percent because you may lose accuracy. Before starting any tuning activity, analyze the statistics for the objects accessed by the SQL you’ll be tuning. You want to know as much as possible about the data.
Subquery Syntax Choices for Performance