Jan 1 ’07

z/Data Perspectives: Database Archiving for Long-Term Data Retention

by Editor in z/Journal

The growing number of regulations, and the mandate for companies to be in compliance with them, is driving the need to retain data for longer durations. There are many regulations (such as the Sarbanes-Oxley Act and HIPAA) that govern how long data must be retained. Indeed, one analyst group estimates there are more than 150 federal, state, and local laws that dictate how long data must be retained. As such, businesses of every size have begun to realize they must formulate plans for archiving data from their operational databases.

Database archiving is a component of the larger discipline of data archiving. Data exists in many formats and for many purposes, and only a fraction of it is actually stored in databases. Yet, even this small amount comprises quite a lot of data.

Physical documents, electronic documents, computer files and data sets, email, and multi-media files are all examples of data that may need to be archived at some point. Each of these “things” needs to be archived to fulfill regulatory, legal, and business requirements. But each type of data requires different archival processing requirements due to its form and nature. What works to archive email isn’t sufficient for archiving database data, and so on. In other words, each type of data may need to command its own technology. This is most certainly true for database data. Why?

Well, data stored in a database is different in many ways from other types of data. The main advantage of using a DBMS is to impose a logical, structured organization on the data. A DBMS provides a layer of independence between the data and the applications that use the data. In other words, applications are insulated from how data is structured and stored. The interface to the data is through the DBMS data language; whether it’s SQL for relational databases, DL/1 for IMS, or even XQuery for XML databases. So, the archival of data from a database requires knowledge of and operation in conjunction with the mechanisms and interfaces of the DBMS.

OK, if we accept that database archiving is a subset of data archiving, we need to define exactly what we mean by the term. Database archiving is the process of removing selected data records from operational databases that aren’t expected to be referenced again and storing them in an archive data store where they can be retrieved, if needed.

Let’s examine each of the major components of that last sentence. We say removing because the data is deleted from the operational database when it’s moved to the data archive. Consider the lifecycle of a “piece” of data: Upon creation, the data is in an operational state. After a period of time the data is still needed for reference, but not to drive operational transactions. After another period of time it’s no longer needed for reference, but is needed for legal purpose: This is the archive state. So when data is archived it can be removed from the operational database.

Next, we say selected records. This is important because we don’t want to archive database data at the file or table level. We need only those specific pieces of data that are no longer needed by the business for operational and reference purposes. This means we must be able to selectively choose particular pieces of related data for archival … not the whole database, not an entire table or segment, and not even a specific row. Instead, all the data that represents a business object is archived at the same time. For example, if we choose to archive ORDER data, we also would want to archive the specifics about each ITEM on that order, as well as CUSTOMER and PRODUCT data. This data likely spans multiple constructs within the database.

The next interesting piece of the definition is this: and storing them (the data) in an archive data store. This implies that the data is stored separately from the operational database and doesn’t require either the DBMS or the operational applications to be useful. Archived data is separate and independent from any and all production systems from which it was moved.

The final component of the definition that warrants clarification is . . . where they can be retrieved, if needed. The whole purpose of archiving is to maintain the data in case it’s required for some purpose. The purpose may be external, in the form of a lawsuit, or to support a governmental regulation, or internal, in the form of a new business practice or requirement. At any rate, the data needs to be readily accessible in a reasonable timeframe without requiring a lot of manual manipulation. Let’s face it, anyone can archive data if they don’t have to worry about how to query it later, right?

So, do you have the technology and resources at your disposal to archive your database data in accordance with legal requirements? Z