The XML type modifier is specified in a CREATE TABLE or ALTER TABLE statement and associates an XML column with an XML schema. Figure 8 shows an XML type modifier being applied in the Data Definition Language (DDL) for a table definition that contains an XML column.
Manual XML validation is enabled via the DSN_XMLVALIDATE built-in function. The input to the function would include an XML string expression and a reference to an XML schema. Using this function with a registered schema provides the necessary flexibility to validate the XML document. You don’t necessarily need a type modifier for the column in the table. Figure 9 shows manual validation.
All XML data stored in XML columns must be well-formed. There’s no difference in stored XML data whether you validate or not, unless there are default values defined in the XML schema. You must consider the costs of validation during application design; validation is two to three times as expensive in CPU utilization as well-formed parsing only. However, it’s eligible for processing with the System z Integrated Information Processor (zIIP) and System z Application Assist Processor (zAAP) specialty engines. Validation increases CPU time for inserts and reduces throughput. Use schema validation, if needed, but avoid schema validation for highest possible insert performance.
Setting a Direction
The method you choose for storing data (i.e., rows and columns, character string, XML data type) really depends on several factors. The most significant factors are where you want the XML to be processed and your staff’s skills.
If the data doesn’t have to be shared with any other applications in the enterprise, and it doesn’t need to be manipulated or maintained in any way, then perhaps storing it as a string is sufficient. It requires the least amount of knowledge for your DBAs and will use the lowest amount of mainframe CPU. If the data must be manipulated or shared, then storing it as a string may become too expensive. The data will have to be interpreted and analyzed before being shared in the enterprise. This could result in many more trips across databases and applications, and can seriously impact network traffic because XML strings can become quite large. Moreover, if an XML schema or dictionary must be maintained, it must be maintained outside the database. This means you will need personnel capable of maintaining this information; perhaps keeping that information integrated with the enterprise model and personnel. Backup and recovery plans also must be coordinated between the database and whatever repository contains the dictionary or XML schema. Finally, with no data definitions stored in DB2, the XML data is virtually useless outside the application responsible for it. Not very flexible!
If you’re sharing the data with other applications and users across the enterprise, then it may be beneficial to store the data in rows and columns. This can be especially true for data in transaction environments that change frequently. This provides a high level of flexibility and the potential for high performance. The data definitions are stored in the DB2 system catalog; the database engine performs validation. A myriad of indexes can be built to support all sorts of data access paths. A wealth of built-in functions can generate XML from the DB2 tables. DBAs know how to deal with “normal” DB2 tables, and there would be little or no XML skills required for the DBA beyond knowledge of the built-in functions. This option can, however, be more performance-challenged than the other options when all the data is read all the time.
Storing the data natively as pureXML affords the best of both worlds. It offers the ability to share the data with other applications and databases, indexing capabilities, and the option for built-in element definition and validation. All this can happen inside the database engine so the data and definitions can be backed up and restored together. Powerful functionality can manipulate the data in many ways to afford compatibility across applications and users. To take advantage of all this functionality, your DBA staff must be skilled in the areas of storage, database maintenance, XML, and XML schema construction and management. Performance could be better than rows and columns when entire documents are always required, but could degrade as more manipulation affects performance and data is shared in pieces across many applications.
XML is here to stay and is becoming increasingly popular. The support and capabilities in DB2 for z/OS are providing us with ways to take advantage of XML and provide support for applications requiring it. IBM’s pureXML capabilities offer a secure, reliable, and familiar way to store and retrieve data. Understanding the benefits and costs of storing XML data in various forms in DB2 can help determine the best way to store and use this data within your organization.