Analytics is a hot topic. The amount of data stored in our operational systems is increasing daily, and management is realizing this information can and should be harnessed quickly for the business to make timely decisions on sales directions, talent acquisition, cost containment and more. A big challenge is to formulate answers to these questions that use the most current information, are inexpensive and easy to create, and can deliver the answers quickly. Often, great expense is incurred in moving data, creating data warehouses and using specialized software to produce various reports.
Online Analytical Processing (OLAP) expressions are an extremely powerful tool in SQL that enable advanced reporting features, such as ranking, counting, averaging and adding a set of data processed in an SQL statement. This feature lets you aggregate data based on values in a query in a manner similar to coding control breaks in a program process. This allows for entire programs, or even applications, to be replaced by more flexible, portable SQL statements. Surprisingly, this functionality might also be useful for batch processing and even transaction processing systems.
Having OLAP functionality built into the DB2 engine can help reduce some of the operational and software costs involved in getting answers to complex questions. You can use this functionality in data warehouses as well as against Online Transaction Processing (OLTP) databases with equal results.
DB2 OLAP Capabilities
The OLAP specification was introduced in DB2 9 for z/OS and included counting and ranking of result sets. The specification was expanded with DB2 10 for z/OS to include aggregate functions. All this functionality has existed in DB2 for Linux, UNIX and Windows (LUW) for some time; the enhancement to DB2 10 for z/OS brings the specification nearly in line across the entire DB2 family. This functionality goes by several names, including OLAP expressions, OLAP specification, window functions and OLAP functions; they’re documented in the DB2 SQL Reference in the “Expressions” section of the “Language Elements” chapter.
The DB2 OLAP specification consists of the:
• Ordered-OLAP-Specification, which includes RANK, DENSE_RANK, LAG (DB2 for LUW only) and LEAD (DB2 for LUW only)
• Numbering-Specification, which includes ROW_NUMBER
• Aggregation-Specification, which includes AVG, MIN, MAX, COUNT, SUM and more; FIRST_VALUE (DB2 for LUW only); LAST_VALUE (DB2 for LUW only); and RATIO_TO_REPORT (DB2 for LUW only).
These window functions operate against a specified set of values and return a scalar result as part of the result set returned from a fullselect or subselect. An OLAP expression consists of a function name and an optional window-partition-clause and window-order-clause for all functions, and also an aggregation specification for aggregate functions. The form of the expression is:
<OLAP-function> OVER(window-partition-clause window-order-clause window-aggregation-clause)
Row numbering is easy to understand; it does what its name implies—numbers rows in the output. Since windowing and ordering can be applied to row number and numbering is easy to understand, it’s the perfect function to use to learn about these features.
You enable numbering via the ROW_NUMBER() function, which has no parameters. You can number within windows and apply an order to the numbering. However, the numbering itself occurs arbitrarily. Despite the limitations, this function can be extremely useful for things such as determining the minimum and maximum row according to an order, data sampling and pagination (although there are some performance implications with pagination using numbering).
Figure 1 shows the result of the ROW_NUMBER() function when there’s no ordering or windowing involved. The ROW_NUMBER() function tells DB2 to number the output row according to the ordering applied to the function, starting with the number one (1) and continuing by adding one (1) to the number for each additional row returned. If no ORDER BY is specified, then the numbering is arbitrary with respect to the order of the result table. The ORDER BY clause of the final SELECT (the only SELECT in this example) has no meaning for the numbering, so don’t be fooled by a coincidental numbering in the order of the result.