The DB2 catalog will now be stored in Unicode tables. Before Version 8, DB2 restricted joins between tables with different Coded Character Set Identifiers (CCSIDs), so Unicode, EBCDIC, and ASCII tables could not be accessed in the same SQL statement. Because the catalog will now be in Unicode while the majority of user data will still be in EBCDIC, this restriction would have caused extreme difficulty for the many vendors and users who have written applications that combine catalog information and user data. Thankfully, Version 8 addresses these restrictions: tables with different CCSIDs can now be included in one SQL statement. The CCSID of the results will be determined by the default CCSID for the DB2 subsystem.
Using Unicode solves the problem of handling characters uniformly worldwide. It also helps address the lack of compatibility between the mainframe EBCDIC CCSIDs and Unix and Web workloads. Because different EBCDIC and ASCII code pages assign different code points to some characters (such as “$” and “|”), the move to Unicode parsing resolves this inconsistency. One restriction on the use of Unicode concerns authorization IDs: those sent to a DB2 Version 8 server must conform to the guidelines of the security package (e.g., RACF). Distributed requesters and servers now exchange information about the various CCSIDs they understand and will use Unicode where possible.
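The code page inconsistency described above can be illustrated with a short Python sketch. It relies on Python's built-in `cp037` and `cp500` codecs, two EBCDIC code pages chosen here purely for illustration (the text does not name specific code pages), and the helper name `encoded_forms` is hypothetical.

```python
# Encode the same character under two EBCDIC code pages and under UTF-8.
# cp037 and cp500 are Python's built-in codecs for two EBCDIC code pages;
# they are an illustrative choice, not code pages named in the article.
def encoded_forms(char: str) -> dict[str, bytes]:
    """Return the encoded bytes of one character per codec."""
    return {codec: char.encode(codec) for codec in ("cp037", "cp500", "utf-8")}


if __name__ == "__main__":
    for ch in ("|", "A"):
        forms = encoded_forms(ch)
        # Print each encoding as hex so differing code points are visible.
        print(repr(ch), {codec: raw.hex() for codec, raw in forms.items()})
```

The vertical bar is assigned a different byte in each of the two EBCDIC code pages, while a letter such as "A" is the same in both. The UTF-8 form does not depend on which EBCDIC code page the text came from.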
DB2 Version 7 included the basic infrastructure for supporting Unicode, including storage and conversion of Unicode-encoded data and coordination with OS/390 Unicode Systems Services. DB2 Version 8 builds on this support by adding several new functions, the most important of which is probably Unicode parsing. Unicode parsing of SQL statements affects many portions of DB2:
- Any SQL statement sent for parsing is converted to Unicode UTF-8 before parsing
- The Precompiler writes SQL statements into DBRMs in Unicode
- There are new Unicode hexadecimal string constants
- Long identifiers have been extended from 18 bytes to a new maximum length of 128 bytes. Note that only some long identifiers will be lengthened to the new 128-byte maximum; not all names will grow. Also note that the maximum column name length is still 30 bytes, as in the rest of the DB2 family of products.
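The first and last items in the list above can be sketched in Python. This is only an illustration of the idea, not DB2's implementation: the function names are hypothetical, `cp037` stands in for whatever EBCDIC CCSID a subsystem actually uses, and the length check assumes the 128-byte limit is measured on the UTF-8 form of the name.

```python
# Illustrative only: convert an EBCDIC-encoded statement to UTF-8 and
# check an identifier against a byte-based length limit.
def to_utf8_for_parsing(statement: bytes, source_codec: str = "cp037") -> bytes:
    """Decode a statement from its source code page and re-encode it as UTF-8."""
    return statement.decode(source_codec).encode("utf-8")


def fits_identifier_limit(name: str, max_bytes: int = 128) -> bool:
    """Check the UTF-8 byte length (not the character count) against the limit."""
    return len(name.encode("utf-8")) <= max_bytes


if __name__ == "__main__":
    ebcdic_sql = "SELECT NAME FROM SYSIBM.SYSTABLES".encode("cp037")
    print(to_utf8_for_parsing(ebcdic_sql))
    # A name made of 2-byte characters reaches a byte limit sooner.
    print(fits_identifier_limit("A" * 128), fits_identifier_limit("\u00c4" * 65))
```

Because the limits are stated in bytes, a name containing characters outside the ASCII range may hold fewer than 128 characters under this assumption.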
DB2 uses two forms of Unicode, UTF-8 and UTF-16. Each form encodes the same single set of code points, which covers many character sets. UTF-8 uses 1 byte (8 bits) for common characters (code points 0 through 127), which makes them compatible with ASCII. Code points 128 and beyond each occupy 2, 3, or 4 bytes. DB2 uses UTF-8 for stored data in the Catalog and for SQL statements that are to be parsed. UTF-16 uses 2 bytes (16 bits) for most characters and 4 bytes for the remainder.
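The byte counts above can be checked with a small Python sketch. The sample characters and the helper name `encoded_lengths` are illustrative choices; `utf-16-be` is used so that no byte order mark is added to the count.

```python
# Compare how many bytes one character occupies in UTF-8 and in UTF-16.
def encoded_lengths(char: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for a single character."""
    return len(char.encode("utf-8")), len(char.encode("utf-16-be"))


if __name__ == "__main__":
    samples = {
        "A": "LATIN CAPITAL LETTER A (code point 65)",
        "\u00e9": "LATIN SMALL LETTER E WITH ACUTE (code point 233)",
        "\u20ac": "EURO SIGN (code point 8364)",
        "\U0001F600": "GRINNING FACE (code point 128512)",
    }
    for char, description in samples.items():
        utf8_len, utf16_len = encoded_lengths(char)
        print(f"{description}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

ASCII-range text is therefore half the size in UTF-8 that it is in UTF-16, while characters above code point 2047 in the 16-bit range take 3 bytes in UTF-8 but only 2 in UTF-16.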
Since the Precompiler executes outside of DB2, there is a new Precompiler option that tells it whether or not to allow new Version 8 syntax; that is, whether SQL statements are to be interpreted as being in EBCDIC or Unicode, whether they must be converted to Unicode, and whether or not the DBRM must be marked as Version 8-dependent. Bind and Rebind are also enhanced to take EBCDIC statements and convert them to Unicode. The additional CPU cost of Unicode conversion is incurred in Bind (both static and dynamic), Precompilation, Execute Immediate, and other facilities. This cost is not expected to be significant.
In Version 8, Unicode parsing extends beyond SQL statements: Utility control statements are now parsed entirely in Unicode. Output to the SYSPRINT data set and the MVS console will continue to be in EBCDIC, with conversion taking place as required. There is a new utility stored procedure, DSNUTILU. It is identical to DSNUTILS except that the inputs are in Unicode and dynamic allocation of data sets is removed.
The DB2 UDB for z/OS Version 8 Open DataBase Connectivity (ODBC) driver has been enhanced to include implicit data conversion of Unicode data bound to non-character or non-graphic columns. It will also now support execution of SQL statements encoded in Unicode. A new ODBC INI keyword, CURRENTAPPENSCH, will allow users to indicate which encoding scheme (Unicode, EBCDIC, or ASCII) the ODBC driver will assume for input/output host variable data, SQL statements, and all character string arguments of the ODBC APIs that are passed by the application. The new keyword will belong to the COMMON stanza. If this keyword is present in the INI file, the ODBC driver will set the Current Application Encoding Scheme special register on behalf of the application to the value specified by flowing the appropriate SET statement to DB2 upon a successful connect. If this keyword is not present, then the driver will assume EBCDIC as the default application encoding scheme. In order to avoid inconsistent data states inside the ODBC driver, users will not be allowed to issue Set Current Application Encoding Scheme as an SQL statement against the ODBC driver. Also, Unicode encoding is not supported for the actual INI file itself. Both keywords and their values must still be specified in EBCDIC.
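The keyword lookup and its EBCDIC default can be modeled with a minimal Python sketch. It is a stand-in for the behavior described above, not the ODBC driver's own code: the function name is hypothetical, Python's `configparser` is used only to mimic reading the COMMON stanza, and the value `UNICODE` in the usage example is assumed to be one of the accepted spellings.

```python
import configparser


# Model of the described behavior: read CURRENTAPPENSCH from the COMMON
# stanza and fall back to EBCDIC when the keyword is absent.
def application_encoding_scheme(ini_text: str) -> str:
    """Return the encoding scheme named in the INI text, or EBCDIC by default."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # configparser lowercases option names, so the lookup is case-insensitive;
    # the fallback is also used when the COMMON stanza itself is missing.
    value = parser.get("COMMON", "CURRENTAPPENSCH", fallback="EBCDIC")
    return value.strip().upper()


if __name__ == "__main__":
    with_keyword = "[COMMON]\nCURRENTAPPENSCH=UNICODE\n"
    without_keyword = "[COMMON]\n"
    print(application_encoding_scheme(with_keyword))
    print(application_encoding_scheme(without_keyword))
```

In the real driver the INI file itself must be in EBCDIC, as noted above; the sketch reads ordinary Python strings only to keep the example self-contained.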
V8 MIGRATION CONSIDERATIONS
A migration to DB2 V8 is only supported from DB2 V7 and is done in two phases: New Release migration and New Function migration.