You’ve probably heard a lot about IBM’s pureXML feature of DB2 and seen some examples of data stored and retrieved from XML columns. pureXML is the native XML storage and retrieval feature of DB2 that lets an XML document be stored in a format the database engine can manipulate. The ability to store XML natively in DB2 is vital, and DB2 is feature-rich in managing and manipulating XML. From a business perspective, however, there are several choices that need to be made when managing XML data with DB2. Which choice you make depends on your objectives, the perceived costs, the personnel and skills available, and the amount of time dedicated to establishing and maintaining the database housing the data.
When choosing to store XML in DB2, you have three choices. You can store the:
- Data in normal DB2 tables (rows and columns) and then shred or build the XML on the way in and out of the database
- Entire XML document as a string Variable Character (VARCHAR) or Character Large Object (CLOB)
- XML natively as an XML data type.
Once we understand how the XML data type is stored in DB2, and how with each option we can move the data in and out of DB2 and/or manipulate that data, then we can decide which storage option is best for a particular application or situation.
XML Usage Benefits and Concerns
With XML, you can represent multiple elements as a single object, such as a purchase order; with relational data, this may require several tables and normalization. An insert of an entire purchase order in XML would involve a single column and retrieval is from one column. Inserts with relational data would involve many columns (with consistent format) and queries for retrieval would require joins.
XML also offers a better data model for many new applications. It provides added flexibility, schema versatility, and is stored in a hierarchical nature. You can store both semi-structured or unstructured data. The data is inherently hierarchical, and may be nested or contain complex data. For data that’s changing or evolving, schemas can be supported easily. XML data is self-describing in that the data is mixed together with metadata (tags). You can query data and metadata simultaneously. The XML data is extensible and isn’t regular and homogeneous like tables, so every document can have a different structure. An element can have different data types in different documents.
Not all data should be stored in DB2 as XML, as there can be some overhead to XML storage in DB2 depending on your needs. Storing data in a relational database relationally is still the best and most flexible way to store data. With a proper relational design, you can minimize data anomalies, improve performance, reduce the cost of changing data, and properly store metadata with your user data. Most of this is lost with XML storage. There are additional storage and processing costs associated with XML, and the XML itself is more voluminous than storing the data relationally due partly to all the tags and namespace declarations.
Why Store XML in DB2?
Benefits of managing XML data with DB2 include lower development costs, reduced code and development complexity, improved developer productivity, faster solution development, and lower costs. DB2 XML support also provides for greater business agility because you can easily accommodate changes to data and schemas. You can also update applications rapidly, reduce maintenance costs, respond quickly to dynamic conditions, and realize value earlier.
SQL or XQuery (not covered here) can be used to retrieve data; you can join XML data to relational data. You can quickly and efficiently search XML documents using a proven search optimization engine of a mature database. You can store and manage both relational and XML using an existing technology infrastructure and skills. XML data can be inserted into a column and optionally validated against a schema. User-defined indexes on XML data facilitate searches. The XML data can be extracted and decomposed into relational data. You can construct XML documents from both relational and XML data.
Starting with DB2 9, there’s support for native XML hierarchical storage with no shredding and no CLOBs or Binary Large Objects (BLOBs) required. No parsing of XML data at query run-time is required and it’s fully integrated with relational processing, providing for seamless queries accessing various types of data simultaneously. DB2 XML support provides schema flexibility so changes don’t force unloading or reloading of data.