Having logic in an SQL statement is a good thing. However, there are times when the processing is excessive. The ability to write an SQL statement in one place, and then run it from anywhere that has a connection to the database is powerful. Complex SQL can be an extreme performance advantage if:
- The program processes written into the SQL statement aggregate or filter data
- The complex SQL statement logic is data-intensive
- The application program can issue one large statement that returns a result rather than many smaller statements that return data. However, if a complex SQL statement processes data, rather than filters it, then it can be a performance detriment. SQL statements that use lots of UDFs (especially external UDFs), CASE expressions, and data conversions in the SELECT clause may impact performance. Nested table expressions that contain expressions in a SELECT clause and then have references to those nested expressions reused in outer expressions also may significantly impact performance. It’s almost always cheaper to put this type of processing in an application program vs. the SQL statement.
So, complex SQL statements can be a performance advantage or disadvantage. Data-intensive logic is almost always a performance advantage; data-processing logic is almost always a disadvantage. The trade-off is being able to balance the performance with the reusability, flexibility, and portability of complex SQL.
7 Overuse of DISTINCT
Is use of the DISTINCT clause justified in your query? Why is it there several times? Many tools that generate SQL automatically put a DISTINCT immediately after a SELECT by default. Also, many users and programmers will use DISTINCT “just to make sure.” DB2 needs a unique index on the columns specified in the SELECT clause to avoid a sort. Also, a DISTINCT used in a nested table expression will force materialization. The question to ask is, “Are duplicates possible?” If not, then remove the DISTINCT from the query.
If DISTINCT is required, then it should be used in a meaningful way. The performance of DISTINCT needs to be considered and the duplicates in a complex query should be eliminated as early as possible in the processing of that query. Also, try to use DISTINCT only once in a query. You can have multiple DISTINCTs in a query as of V8. You also can consider using a GROUP BY on all columns of the query in place of a DISTINCT. DB2 will consider both unique and non-unique indexes to avoid a sort in support of the GROUP BY.
6 Coding inefficient predicates Some questions that need to be addressed when it comes to DB2 SQL performance include:
- Have you coded efficient stage 1 predicates?
- Do you have stage 3 predicates (filtering logic in code)?
- Are all indexes known at the time of coding and were they exploited (stage 1 indexable predicates or index screening predicates)?
- Are rows being filtered as early as possible?
There are critical questions that should be evaluated by someone reviewing the code. Often, these types of reviews don’t happen and enhancement opportunities are missed.
Consider stage 3 predicates. This is where filtering occurs outside DB2. This type of code isn’t efficiently using the DB2 engine and, without allowing DB2 to do the work on the data while it has it, you have no tuning opportunities in DB2.
The most important thing you can do to improve query performance is to run appropriate statistics. Every column used in every WHERE clause should have statistics on it. You also need to be aware of when your predicates are being applied in the database. Transforming as many predicates as possible to stage 1 indexable will reduce CPU, improve table join sequence selection, and improve application performance. Column expressions are common on indexed columns in the WHE RE clause. Coding a column expression automatically makes the predicate stage 2, and not eligible for index access. Figure 4 shows how to convert two-column expressions to equivalent stage 1 predicates.
5 Using incorrect BIND parameters