Database archiving has been defined as the process of removing selected data records that aren’t expected to be referenced again from operational databases and storing them in an archive data store, where they can be securely retained, retrieved as needed, and discarded at the end of their legal life. This article examines why database archiving is important and highlights the facilities and features organizations should expect from their database archive solution.
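The remove-and-store step in that definition can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not a production archiver: the `orders` and `orders_archive` tables are hypothetical, the archive is a table in the same database for brevity (a real archive data store would normally be separate), and "not expected to be referenced again" is approximated as "placed before a cutoff date." The point it shows is that the copy to the archive and the delete from the operational table run in one transaction, so a record is never removed without being stored.

```python
import sqlite3
from datetime import date

# Hypothetical schema used only for this sketch: an operational table and an
# identically shaped archive table, with dates stored as ISO-8601 text.
SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, placed_on TEXT, amount REAL);
CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, placed_on TEXT, amount REAL);
"""


def archive_old_orders(conn: sqlite3.Connection, cutoff: date) -> int:
    """Move orders placed before `cutoff` into orders_archive; return rows moved."""
    cutoff_iso = cutoff.isoformat()
    # `with conn` commits if both statements succeed and rolls back otherwise,
    # so a row is never deleted without its archive copy being written.
    with conn:
        conn.execute(
            "INSERT INTO orders_archive (id, placed_on, amount) "
            "SELECT id, placed_on, amount FROM orders WHERE placed_on < ?",
            (cutoff_iso,),
        )
        cur = conn.execute("DELETE FROM orders WHERE placed_on < ?", (cutoff_iso,))
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    with conn:
        conn.executemany(
            "INSERT INTO orders VALUES (?, ?, ?)",
            [(1, "2001-03-15", 10.0), (2, "2023-06-01", 20.0)],
        )
    moved = archive_old_orders(conn, date(2010, 1, 1))
    print("rows archived:", moved)
```

The sketch covers only the move; secure retention, retrieval, and discard at the end of the record's legal life would be separate functions of the archive store.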
All organizations are storing more data in their databases. Gartner assesses the rate of growth at roughly 125 percent. In addition, the range of data types a database can store is widening. Originally, databases stored structured data such as characters, numbers, dates, and times, but newer versions of mainframe databases now store unstructured data, such as images and videos, just as easily. DB2 9.1, for example, includes pureXML, IBM’s name for its XML facility, which stores XML documents so that their hierarchical structure is retained and the documents themselves can be queried.
Many people confuse archives and backups. They consider both to be methods of storing data from a database, with the ability to restore that data when needed. The truth is that they are completely different.
To be fair, both do take copies of data stored in a database, and both can be used to restore data to the database, but that’s where the similarity ends.
Traditionally, backups are written to tape, and the tapes can be stored at the same site or at an offsite backup site. If a catastrophe strikes the main site, the data can be loaded onto the backup mainframe and work can continue. Sometimes the backup site runs in parallel with the original site, allowing hot-swapping between the two. In every case, the backup is a copy of the live data; the original data remains in the database. If tape is used, the backup tapes can be overwritten with new backup data after a week or so.
Archived data is different from backup data because the original data in the database is deleted, so the archive copy becomes the only copy. This makes it important for the data in the archive to remain available and accessible for as long as necessary. What makes archiving so important now is that the length of time archived data must remain accessible is steadily increasing; for some industries it can be 25 to 30 years, and perhaps even longer.
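Because the archive holds the only remaining copy, a record should be discarded only after its full retention period has elapsed. The check below is a minimal sketch under assumptions: retention is expressed in whole calendar years (such as the 25 to 30 years mentioned above), a record archived on 29 February is treated as expiring on 1 March when the target year is not a leap year, and the function names are hypothetical. Actual retention periods and date conventions come from the applicable regulations, not from code.

```python
from datetime import date


def retention_end(archived_on: date, retention_years: int) -> date:
    """Return the date on which an archived record's retention period ends."""
    try:
        return archived_on.replace(year=archived_on.year + retention_years)
    except ValueError:
        # archived_on is 29 February and the target year is not a leap year;
        # roll forward to 1 March so the record is never discarded early.
        return date(archived_on.year + retention_years, 3, 1)


def may_discard(archived_on: date, retention_years: int, today: date) -> bool:
    """True once the full retention period has elapsed."""
    return today >= retention_end(archived_on, retention_years)


def eligible_for_discard(records, retention_years: int, today: date) -> list:
    """Given (record_id, archived_on) pairs, return the ids that may be discarded."""
    return [
        record_id
        for record_id, archived_on in records
        if may_discard(archived_on, retention_years, today)
    ]
```

Rolling a 29 February date forward rather than back is a deliberately conservative choice: it can only keep a record one day longer, never one day too short.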
You may ask, “Why do I need to archive?” There are three answers: for performance, business, and compliance reasons.
One reason to archive data is to remove it from the database so the database performs optimally. Leaving old data in place means that backups and restores take much longer, and that standard database activities, such as updating data, consume more CPU. Because housekeeping activities such as REORGs also take longer to perform, archiving old data can yield immediate performance gains.
An August 2007 survey of IT executives, conducted by eMedia on behalf of BridgeHead Software, found that 59 percent of respondents complained that the volume of data they’re forced to back up is disrupting business operations or will eventually do so, and 93 percent said their routine backup volumes continue to increase. The survey identified several advantages of reducing the volume of data routinely backed up:
• Less IT time devoted to backup and other business class processes (69 percent)
• A reduction in the impact of backup and replication on network utilization and capacity (60 percent)
• A reduction in disk resources devoted to data snapshots, replication, and mirroring (58 percent)
• Reduced disruption to the live application environment (45 percent)