Jun 11 ’12
How to Leverage DB2’s Automated Time Travel Queries and Temporal Tables
DB2 temporal tables are an outstanding database automation feature that can replace significant amounts of application programming. This makes them a popular feature of DB2 10 for z/OS and DB2 10 for Linux, UNIX and Windows (LUW). There are some things you need to understand and prepare for before using temporal tables. However, once you have a better understanding of how time travel queries work, their uses and implications, you will be able to leverage them for your needs.
The Need for Temporal Data Storage
Data volume is increasing exponentially as business and regulatory demands drive the need to maintain large volumes and multiple versions of data. Flexibility in application design is important, especially in the era of the ever-expanding Internet, where adaptability and flexibility dictate whether a product, business, or strategy will succeed. Temporal designs give you the flexibility to store time-relative information, adding another dimension to your data store. Not only do you know what data is valid, but when it’s valid. This lets you stage data for future use, or go back in time to re-create a data condition when investigating an anomaly or performing analytics. You can also automatically track the history of changes to data for auditing or re-creating data at a point in time. DB2 now automates this.
DB2 Temporal Table Benefits
The great thing about triggers is that you can use them to replace large sections of application logic. You can use triggers to capture database changes and propagate them to audit tables, perform denormalization on the fly, and aggregate data to summary tables. You may be able to replace days or weeks of application programming and testing with just a few hours to set up and test the triggers. Moreover, once a process is no longer needed, you can simply DROP the triggers to remove the process without an application program change.
Temporal tables take this concept of database automation even further; they take an entire portion of application logic and place it in a relatively simple table structure with seamless automation built in. Application developers can incorporate a time dimension into their design without having to code for maintenance of the time factor; they can rely on DB2 to maintain the proper consistency of their time-dependent data. They will still have to consider time as they read their data, but DB2 automates maintenance of the time dimension.
Application-Period Temporal Tables
An application-period temporal table is a table that has been defined with the appropriate time columns along with the specification of a business period. They’re useful for managing data active during a certain time, and allow data to be staged (active in the future), current, and historical. An application-period temporal table can be useful for such things as reference and code tables where descriptions will change over time, or for more involved applications such as the active period of a specific insurance policy. The application-period is created by including a “PERIOD BUSINESS_TIME” clause that identifies the start and end time columns. These columns can be DATE or TIMESTAMP data types.
Figure 1 shows an example of a table defined as an application-period temporal table. When you define the application-period for the table, DB2 will automatically generate an implicit table check constraint that ensures the start time column value for a row is less than the end time column for that row. The application is still responsible for setting the values for the start and end time columns when new data is inserted. The application can also update start and end time columns. However, DB2 will control the data, including start and end times, when you apply time-relevant updates and deletes. Optionally, you can also define a unique constraint or index on the table that includes the application-period. This optional definition will ensure the application-periods don’t overlap across rows in the table. Figure 1 also includes this unique restriction in the definition.
System-Period Temporal Tables
A system-period temporal table is a table that has been defined with the appropriate timestamp columns, along with the specification of a system-period; it can be useful when all changes to a table need to be tracked with a retained history to fulfill auditing or compliance requirements. The system-period is created by including a “PERIOD SYSTEM_TIME” clause that identifies the start and end timestamp columns. When you define a system-period for a table, DB2 is in control of the start and end timestamps; the timestamps reflect the state of the data for the time specified. Within the base table only, data that’s currently effective is represented. Optionally, you can define a history table and associate it to the base table. Any changes to data then result in before images of the changed data being automatically replicated to the history table, with appropriate updates to the timestamps in both the base and history tables. Figure 2 shows an example of the Data Definition Language (DDL) to create two system-period temporal tables and their associated history tables.
You must consider history table, table space, and index design for system-period temporal designs. Unlike its base table counterpart, the history table is an insert-only table. You may want to consider an alternate physical design of the history table that favors a higher level of performance for inserts, depending on the frequency of base table changes. If the base table experiences highly frequent updates or deletes, some history table design considerations would be to:
• Make the history table APPEND ONLY
• Make the table space MEMBER CLUSTER
• Use a larger index page size than 4K to reduce the frequency of index page splits.
Frequent REORGs to the history table or index are important when there’s high insert activity resulting from updates to the base table, regardless of the physical design. If there are also frequent reads from the history table, then the REORGs become even more important; you will want to respect the clustering that’s meaningful for the readers.
An application-period and system-period design can be incorporated into a single table, making what’s known as a bi-temporal table. This design gives you all the flexibility and power to combine both the application-period and system-period controls in a single table. The features of each application-period and system-period can be assumed in a bi-temporal design.
Referential Integrity and Temporal Tables
DB2 doesn’t currently support time-based referential integrity. You can define a primary key on an application-period temporal table, but can’t establish relationships with other tables. Since the whole point of application-period temporal tables is to add a time component to what was otherwise a primary key, this adds some complexity to a database design that uses application-period temporal tables. One solution is to establish separate key tables that support the database-enforced referential integrity, while their application-period temporal children support storage of the detailed elements (see Figure 3).
Setting up database-enforced referential integrity for system-period temporal tables is more traditional and straightforward. Since the system-period columns aren’t part of the primary key, and there’s still only one row per primary key, then traditional parent-child relationships can be enforced for the base table.
Changing Data Relative to Time
With an application-period temporal table, the application and database control changes to the span in time in which a row is active. When making changes to data, you can specify start and end times for your data values since the application ultimately controls these values. For example, you can change the end date of the data for ACTNO 10 in our ACT table to make the value expire at the end of 2012. Section A of Figure 4 shows that update statement and the result. You can execute normal insert and update statements, but if the unique or primary key of the table includes the business time, then your insert can add a row for an existing non-time-based key value and a delete could remove multiple rows representing different time ranges for a value. Our ACT table has a non-time key column of ACTNO and then the time duration as part of its unique key. Inserts, updates, and deletes must consider the time duration.
In addition to application-provided start and end times, the database can also control the active time range for a key value dependent on the updates and deletes performed. DB2 has provided an extension to the SQL syntax to allow a time specification for temporal modifications to data in the form of the FOR PORTION OF clause. This clause lets you specify the time period in which a row for the non-time key columns is active. So, for our data (notice I didn’t say row) represented by the ACTNO 10, you can start a new representation of the data for a future time period. Imagine that, beginning in June 2012, this ACTNO will represent something called “MANAGE/ADVISE2.” The update, as section B of Figure 4 shows, will result in the addition of a row for ACTNO 10 as well as an update so the data for the two new time periods is properly represented.
With system-period temporal tables, the start and end timestamps are out of the control of the application and completely controlled by the data server. Inserts add data to the base table, updates modify data in the base table, and deletes remove data from the base table. Only updates will result in changing start timestamps in the base table, and the end timestamps will always be the maximum value.
It’s only when a history table is associated with the base table that things become more interesting. Inserts into the base table won’t generate any history table activity. However, any update or delete will result in “before” image rows being propagated automatically to the history table. Also, the end timestamp value for the propagated row will be updated to reflect the end period for the data values of the row. For example, if you change the value of the LASTNAME column of employee “000010” as shown in Figure 5, a row will be inserted into the history table to reflect the condition of the data before the update. Likewise, a delete to the DEPT table for DEPTNO “J22” will result in the row for that department being removed from the base table, and a row reflecting the image of the row up to the point of the delete being inserted into the DEPT_HIST table.
Writing Time Travel Queries
Temporal tables can be accessed via normal SQL statements, and don’t require a time specification when accessing them. Queries against temporal tables can also contain a clause called a period-specification. This can be for a specific time or for a range of time values. However, it’s important to know how the queries will behave both with and without the time specification.
Consider a BUSINESS_TIME specification for application-period temporal tables. It’s important to note that the business start and end times for a row in an application-period temporal table are inclusive for the start time and exclusive for the end time. So, if a row has a range of “2012-06-01” to “2012-12-31,” as with our ACT table example, then the data is active inclusively for June 1st through December 30th. Consider the ACT table data in section B of Figure 4. You can assume a query with a period-specification of “2012-12-31” will return no rows while one that specifies “2012-12-30” will return the single row that’s active for that date. Consider Figure 6 for the query and result. The period-specification is transformed by the query transformation component of DB2 into a range predicate.
Besides using the “AS OF” period-specification, you can query based upon a range of time. There are two choices for specifying range values: the “FROM” and “BETWEEN” clauses. They differ slightly. FROM is strictly locating data that was active in the time that intersects the specification, and BETWEEN is locating data that was active in the time that overlaps the specification. FROM will return any rows where a start value for a row is less than the second value specified and the end value for a row is greater than the first value specified. BETWEEN will return any rows where a start value for a row is less than or equal to the second value specified and the end value for the row is greater than the first value specified. For BETWEEN, the end value still must be greater than the first value specified because the end time for a row is exclusive. Figure 6 shows time range period-specification examples for our ACT table data. Make sure developers are trained to accurately code period-specifications. If necessary, you can always code the predicates against the time value columns yourself! This could let you break the rules and treat the end time value as inclusive in your queries.
Query Table Access With System-Period Temporal Tables
System-period temporal base tables and history tables can each be accessed directly via normal SQL. From a performance perspective, this is the preferred way to access the tables. If you need current data, then you read from the base table directly. If you want historical data, then you can read from the history table directly, or from both the base and history tables. You can also include a period-specification in your SQL statement against the base table, in a manner similar to an application-period temporal table. The same rules apply for the various period-specification clauses as with the application-period temporal tables discussed previously. The big difference here is that any time travel query against a system-period temporal table will generate access against both the base and history table.
The query in Figure 7 will return data from our employee history table when the AS OF period-specification includes a time prior to our update. In the transformed query, what was written to appear as single table access was rewritten by DB2 as a union of accesses to both the base and history table. This will happen even for queries in which the AS OF period-specification includes the CURRENT TIMESTAMP value. Your applications that deal with system-period time travel queries should perhaps use them only for truly historical requirements. For current period requirements, only the base table access without a system-period specification will deliver optimal performance. There may be a change in a future release of DB2 that will allow current time queries to prune access to the history table. For now, you will have to code for it yourself.
DB2 query transformation of a query with a system-period specification into a union of the base and history tables can become more complicated when you consider the joining of two tables. No one would think much about a join between the DEPT and EMP tables, and can assume two tables are accessed in the query, but if you include a SYSTEM_TIME period specification for the tables in the query, then the number of tables accessed increases dramatically. The number of tables accessed in a time travel join of system-period temporal tables, if each table reference contains a SYSTEM_TIME period specification, is equal to (2n) x n, where n is the number of tables in the join. The join of DEPT to EMP, each with their own period-specification, results in eight table accesses. A similar three-table join will result in 24 table accesses, and a similar four-table join will result in 64 table accesses. Clearly, joining system-period temporal tables using SYSTEM_TIME period-specification should be done carefully or perhaps avoided when performance is the primary concern. Figure 7 shows the join of two system-period temporal tables that include a period-specification and the transformed query. (Note: Figures 2 and 7 have been edited for length in the print edition of this article but appear unabridged in the digital version.)
Establishing and maintaining application-period, system-period, or bi-temporal tables is straightforward. With DB2 managing the integrity of the temporal data, the application is free from the responsibility of properly updating and reading temporal data. However, it’s important to learn exactly how a period-specification is interpreted by query transformation to be certain these specifications are coded correctly in your application. In addition, performance considerations are critical when using queries containing a period-specification against system-period temporal tables, especially for joins. Using database-enforced referential integrity with temporal tables can be a little more involved than with a normal database design. It is critical for both DBAs and application developers to understand how the tables work.