There’s a simple solution to the generic search if you know a little about how your users enter their search queries. Often, users enter the same search criteria most of the time. By coding two queries and a little programming logic, you can dramatically improve the overall performance of the generic search. A study of user activity showed that users usually received an employee number as input and were rarely given a different search value. With this knowledge, the application can be built so a search against only the employee table happens first (see the early-out query in Figure 4).
If the employee is found, the search is over; the generic query executes only if the early-out query produces no result. If two queries and application logic aren’t desired, you could instead put the early-out logic directly into a single query. The combined query in Figure 5 behaves just like the two queries and application logic of the early-out design. It does this by taking advantage of a DB2 during join predicate: the predicate is evaluated during the join operation, and the join is performed only if the condition is true. Here, if the employee number is found in the first access of the employee table alone, the second table expression never runs (i.e., if A.EMPNO has a value, the join doesn’t happen).
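The two-query early-out design is portable enough to sketch outside DB2. The following is a minimal illustration in Python with SQLite; the table and column names (EMP, EMPNO, LASTNAME) are hypothetical stand-ins for the article’s figures, and the LIKE scan stands in for the real multi-predicate generic search.

```python
import sqlite3

# Hypothetical employee table standing in for the figures' schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP (EMPNO TEXT PRIMARY KEY, LASTNAME TEXT);
INSERT INTO EMP VALUES ('000010', 'HAAS'), ('000020', 'THOMPSON');
""")

def search(search_value):
    # Early-out query: try the cheap, most likely match first --
    # a direct probe of the employee table by employee number.
    row = conn.execute(
        "SELECT EMPNO, LASTNAME FROM EMP WHERE EMPNO = ?",
        (search_value,)).fetchone()
    if row:
        return [row]          # found -- the search is over
    # Only on a miss does the expensive generic search run
    # (a LIKE scan here, standing in for the real generic query).
    return conn.execute(
        "SELECT EMPNO, LASTNAME FROM EMP WHERE LASTNAME LIKE ?",
        (search_value + "%",)).fetchall()

print(search("000010"))   # satisfied by the early-out probe alone
print(search("THOM"))     # falls through to the generic search
```

The point of the pattern is that the common case pays only for the one-row probe; the generic query’s cost is incurred only on the rare miss.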
Figure 6 also has early-out logic built into the query, this time via the COALESCE function, which returns the first non-null value in a list of two or more values. Those values can be expressions, including scalar subqueries. The drawback is that COALESCE can return only a scalar result. Even so, this technique can be effective for complex search operations with variable input, or for search input against several tables, such as active vs. inactive tables.
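The COALESCE early-out can be sketched the same way. This runs in SQLite via Python; ACTIVE_EMP and INACTIVE_EMP are hypothetical names for the active/inactive pair the article mentions, not the tables in Figure 6.

```python
import sqlite3

# Hypothetical active and inactive employee tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ACTIVE_EMP   (EMPNO TEXT PRIMARY KEY, LASTNAME TEXT);
CREATE TABLE INACTIVE_EMP (EMPNO TEXT PRIMARY KEY, LASTNAME TEXT);
INSERT INTO ACTIVE_EMP   VALUES ('000010', 'HAAS');
INSERT INTO INACTIVE_EMP VALUES ('000020', 'THOMPSON');
""")

# COALESCE returns the first non-null value in its list. The scalar
# subquery against the active table is tried first; the inactive-table
# subquery only matters when the first finds no row (i.e., is NULL).
sql = """
SELECT COALESCE(
         (SELECT LASTNAME FROM ACTIVE_EMP   WHERE EMPNO = ?),
         (SELECT LASTNAME FROM INACTIVE_EMP WHERE EMPNO = ?))
"""
print(conn.execute(sql, ("000020", "000020")).fetchone()[0])
```

Note the scalar limitation the article calls out: each subquery may return only a single value, so this form suits lookups of one column, not full result rows.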
Positioning and Restart
Often, when cursors are used for scrolling or restart, the queries are coded with compound predicates that drive the scrolling from multiple columns. Unfortunately, these compound predicates typically lack Boolean term predicates, and DB2 needs Boolean term predicates to use indexes efficiently. If your query lacks them, the best index access you’ll get (as of DB2 9) is multi-index access or non-matching index access. This can cause a lot of anxiety as you wait for what appears to be a simple scrolling cursor to scan an entire index before returning the few rows needed to fill a screen. So, it’s important to code Boolean term predicates. You can do this by maintaining separate search, scroll, and restart queries, or by adding redundant Boolean term predicates to a generic scrolling cursor to improve index access.
A typical scrolling cursor (see query 1 in Figure 7) reads data in a batch program for a range on a combination of leading key column values. Due to its lack of Boolean term predicates, it can achieve non-matching index access at best. Query 2 in Figure 7 is the same query, but the predicates have been rewritten so they’re Boolean term; the result is a one-column match on the first key column.
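The redundant-predicate rewrite can be demonstrated concretely. Below is a sketch in Python with SQLite; T, K1, and K2 are hypothetical stand-ins for the two-column key in Figure 7. SQLite won’t show DB2’s access-path difference, but it does show that the redundant Boolean term predicate leaves the result set unchanged.

```python
import sqlite3

# Hypothetical table with a two-column key, standing in for Figure 7.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (K1 INTEGER, K2 INTEGER, PRIMARY KEY (K1, K2))")
conn.executemany("INSERT INTO T VALUES (?, ?)",
                 [(k1, k2) for k1 in range(1, 4) for k2 in range(1, 4)])

# Typical scroll predicate: no single predicate applies to every
# qualifying row, so there is no Boolean term to match on the index.
scroll = """SELECT K1, K2 FROM T
            WHERE K1 > ? OR (K1 = ? AND K2 > ?)
            ORDER BY K1, K2"""

# Same logic plus a redundant Boolean term (K1 >= ?) that is true for
# every qualifying row, giving DB2 a one-column matching predicate.
boolean_term = """SELECT K1, K2 FROM T
                  WHERE K1 >= ? AND (K1 > ? OR (K1 = ? AND K2 > ?))
                  ORDER BY K1, K2"""

pos = (2, 2)  # restart position: last row fetched was (2, 2)
rows1 = conn.execute(scroll, (pos[0], pos[0], pos[1])).fetchall()
rows2 = conn.execute(boolean_term,
                     (pos[0], pos[0], pos[0], pos[1])).fetchall()
assert rows1 == rows2  # the redundant predicate changes nothing logically
print(rows1)
```

Because `K1 >= :k1` is implied by the compound predicate, adding it is safe; its value is purely in giving the optimizer a single predicate it can match against the leading index column.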
Query 3 in Figure 7 is the same query, but it combines search positioning/restart with the technique of building keys from code tables. Here, a code table containing all values for SEG and SUBSEG is accessed in a non-correlated subquery, where the desired range of values is applied. The result of the non-correlated subquery is then evaluated in a row expression to get matching index access on two columns. This works only if the code table you build is relatively small; the code table in this example contained 200 rows, while the main table held about 500 million.
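The shape of that code-table technique can be sketched as follows. This is an illustrative Python/SQLite version, not the figure’s query: CODES and BIGTBL are hypothetical names, and the row-value syntax requires SQLite 3.15 or later.

```python
import sqlite3

# Hypothetical small code table and large main table sharing SEG/SUBSEG.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CODES  (SEG TEXT, SUBSEG TEXT);
CREATE TABLE BIGTBL (SEG TEXT, SUBSEG TEXT, DATA TEXT);
INSERT INTO CODES  VALUES ('A','1'), ('A','2'), ('B','1');
INSERT INTO BIGTBL VALUES ('A','1','x'), ('A','2','y'),
                          ('B','1','z'), ('C','9','w');
""")

# The range predicate is applied to the small code table inside a
# non-correlated subquery; the row expression (SEG, SUBSEG) IN (...)
# then drives access to the big table's two leading key columns.
sql = """
SELECT SEG, SUBSEG, DATA FROM BIGTBL
WHERE (SEG, SUBSEG) IN
      (SELECT SEG, SUBSEG FROM CODES WHERE SEG BETWEEN ? AND ?)
ORDER BY SEG, SUBSEG
"""
print(conn.execute(sql, ("A", "B")).fetchall())
```

The payoff depends on the size asymmetry the article describes: scanning a 200-row code table to produce key pairs is cheap, and each pair becomes a matching index probe into the 500-million-row table.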
DB2 10 may offer relief for these situations automatically with the introduction of range-list access.
Scrolling Multi-Table Queries
A major issue with search and scroll arises when complex queries access several tables and can return large quantities of data. You want to minimize the amount of work the database does, and often DB2 can help: it will do its best to avoid sorting or materializing the result so that only the data needed to fill a screen is returned. However, if DB2 must materialize the result, the query could process a lot of data only to have the application throw most of it away when it gets control. When this happens, you should try to write a query that eliminates the materialization, but that isn’t always possible. If materialization is unavoidable, you may be better off materializing the smallest amount of data in one query, then returning to the database for the rest of the data later.
Query 1 in Figure 8 is a simple example of what’s actually a problem with extremely large, complex queries used to fill screens with data. If the ORDER BY results in a sort, the query can be quite expensive for search and scroll. If we can’t avoid materialization, perhaps we can minimize it: with an appropriate supporting index, the application reads only the data required, as in query 2 of Figure 8. Subsequent queries can collect related data using an IN-list or a programmatic join. The application can also store the required sort information in an array and fetch screen data only as the user pages through the screens. Remember, in these situations we expect the user to read only a limited amount of the data that qualifies.
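The keys-first-then-IN-list pattern can be sketched like this. It is an illustrative Python/SQLite version, not Figure 8’s queries; ORDERS, ITEMS, and the 20-row screen size are hypothetical.

```python
import sqlite3

# Hypothetical header and detail tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ORDERS (ORDNO INTEGER PRIMARY KEY, CUST TEXT);
CREATE TABLE ITEMS  (ORDNO INTEGER, ITEM TEXT);
""")
conn.executemany("INSERT INTO ORDERS VALUES (?, ?)",
                 [(n, "CUST%d" % (n % 5)) for n in range(1, 101)])
conn.executemany("INSERT INTO ITEMS VALUES (?, ?)",
                 [(n, "ITEM%d" % n) for n in range(1, 101)])

SCREEN = 20  # rows per screen

# Step 1: a small, index-supported query returns just the ordered keys
# for the current screen -- this is the only result that materializes.
keys = [r[0] for r in conn.execute(
    "SELECT ORDNO FROM ORDERS ORDER BY ORDNO LIMIT ?", (SCREEN,))]

# Step 2: a follow-up IN-list query collects related data for exactly
# those keys, instead of joining and sorting the full result set.
marks = ",".join("?" * len(keys))
detail = conn.execute(
    "SELECT ORDNO, ITEM FROM ITEMS WHERE ORDNO IN (%s) ORDER BY ORDNO"
    % marks, keys).fetchall()
print(len(detail))  # one screen's worth of detail rows
```

The key array from step 1 is exactly the “required sort information in an array” the article describes: the application holds it and goes back to the database only for the screens the user actually views.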
The goal of tuning search and scroll processes should be to balance the amount of work involved in coding these processes with the performance of the queries and program process. It’s always good to reduce the number of indexes in the database. Don’t just try throwing an index at a performance issue. Instead, see if there’s an alternative way to code the SQL statement that may yield a performance improvement. Many DB2 features and SQL coding techniques offer solutions. The SQL techniques presented here show alternative paths to the data. Consider testing these, including EXPLAIN and benchmark tests, to determine the best opportunity for your specific performance situation.