DB2 for z/OS Version 8: Improved Unicode Conversion

2 Pages

DB2 Universal Database for z/OS Version 8 vastly improves DB2’s support for Unicode and lifts many of the limitations of V7. For example, when DB2 has been migrated to V8 New-Function Mode (NFM), the DB2 catalog is converted to Unicode. In addition, all the SQL statements must be parsed as Unicode, even in CM, while DB2 V8 NFM lets you combine different encoding schemes, Coded Character Set Identifiers (CCSIDs), in the same SQL statement. So DB2 now must perform  before, which can impact performance. This article discusses the major improvements that have occurred in converting to and from Unicode. These improvements come from three major sources:

  • Optimization inside V8 (V8 minor conversions)
  • Improvements in the z/OS conversion services
  • Improvements in the zSeries processor hardware.

Background

V8 supports three types of encoding schemes: EBCDIC, which is used by mainframe environments; ASCII, which is used by Unix and Windows environments; and more recently, Unicode. We identify the character set for a language by a numeric CCSID. A CCSID encoding scheme consists of a Single Byte Character Set (SBCS), and optionally a Double-Byte Character Set (DBCS) along with a mixed character set. Over time, each geography developed its own EBCDIC or ASCII encoding schemes. The variety of CCSIDs made it difficult for multi-national companies to combine data from different sources.

Unicode was developed to solve the problem. By storing data in a single Unicode CCSID, text data from different languages in different countries can be easily stored and managed. DB2 V5 introduced the ability to create ASCII databases, table spaces, and tables via the CCSID ASCII clause. This gave you the choice to store data in either EBCDIC or ASCII, and DB2 would convert data between these encoding schemes where appropriate. DB2 V7 built upon that support, introducing support for Unicode objects.

However, where EBCDIC and ASCII conversions occur within DB2, V7 always relied on OS/390 conversion services to handle Unicode conversion. The performance of Unicode conversion in V7, with older levels of the operating system and zSeries hardware, was less than optimal. V8 enhances support for Unicode, and simultaneously takes advantage of the faster conversion provided by the current versions of z/OS services and zSeries processors.

When storing and retrieving data, DB2 converts the data from one CCSID to another CCSID where necessary. Obviously, it’s preferable that no conversion be carried out because conversion impacts performance. However, IBM has done considerable work to improve conversion performance on zSeries. CCSID conversion occurs whenever there’s a mismatch between the CCSID of a source and target string, such as between a host variable and its associated column. This conversion support in DB2 has existed since DB2 began to support client/server connections in V2.3 between different EBCDIC CCSIDs. DB2 V5 began to support ASCII tables and ASCII host variables, so CCSID conversion was expanded to include conversion between EBCDIC and ASCII.

To perform these conversions (not involving Unicode), DB2 uses a translate table that’s stored in SYSIBM. SYSSTRINGS. We refer to this type of CCSID conversion as SYSSTRINGS conversions. The specific EBCDIC and ASCII CCSIDs used to store data by a DB2 system must be specified in the DSNHDECP data-only module, and they shouldn’t be changed. Starting in V8, they can’t be changed. Within a single DB2 system, all EBCDIC tables have the same EBCDIC CCSIDs and all ASCII tables have the same ASCII CCSIDs. In addition, numeric columns and date and time columns are stored as binary fields, but they may involve CCSID conversion when loading or unloading such fields using the EXTERNAL attribute. However, the utilities always apply the EXTERNAL attribute to date and time fields.

V7 introduced the ability for user tables to be defined as Unicode by specifying “CCSID UNICODE” on the database, table space or table definition. Within a Unicode table, CHAR and VARCHAR columns are stored in UTF-8 format. GRAPHIC and VARGRAHIC columns are stored in a Unicode table as UTF-16.

Any CCSID conversion involving Unicode is called Unicode conversion and is necessary whenever there’s a mismatch between the CCSID of a source and target string. V7 uses the z/OS conversion services exclusively for all Unicode conversions, which can substantially affect the CPU time. In V7, the DB2 Catalog remains in EBCDIC and all SQL statement parsing still occurs in EBCDIC. So, for applications such as the Universal Java Driver that provide an SQL statement in Unicode, DB2 must convert the statement to EBCDIC. Similarly, any incoming metadata, such as package names and authorization IDs, may also require conversion by DB2.

One limitation of V7 is that you can’t join two tables that use different CCSIDs. Another limitation is performance. V7 completely relies on the z/OS Unicode conversion service for Unicode conversions, and prior to z/OS 1.4, the conversion service was quite slow, especially on pre-z900 processors. V8 brings many more enhancements in DB2’s support for Unicode as well as removing many of the limitations that were in V7.

2 Pages