Legacy migration refers to converting existing data to a new database or converting an existing application to a new language. When we convert from a legacy data store or other Database Management System (DBMS) to DB2 z/OS, there are some specific considerations. We must handle DB2 z/OS differently if we want superior performance and availability. There are creative solutions to consider and common mistakes to avoid.
Turning data into information requires getting it in a flexible, easy-to-use data store such as DB2 z/OS. This article looks at several different approaches to migrating legacy data to DB2. It explores:
- What has worked well and what hasn’t
- How to improve performance without completely converting the application
- The disastrous results of failing to properly plan and set expectations
- How to set expectations for successful conversions
- Five major types of conversion efforts
- Common pitfalls of conversions
- Techniques for improving performance in database and application migrations
Let’s begin with a customer experience where the data was stored in a proprietary, highly compressed format.
DB2 and Java Complete Application Re-Write
In our example, the data was stored in a highly compressed, proprietary format. Complex business rules were used to translate the data as stored in the files into a human-readable display format. For this reason, custom conversion modules were written in both Basic Assembler Language (BAL) and COBOL to interface with the legacy Application Program Interface (API) and create DB2 load-ready data sets. There were numerous dramatic changes to database tables and overall database design. Carefully constructed change control practices were required to maintain design flexibility and adaptability of the conversion programs.
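The actual conversion modules were written in BAL and COBOL against the legacy API, but the core idea — decode a fixed-layout record, apply a business rule, and emit a delimited, load-ready line — can be illustrated in miniature. The record layout, field positions, and class name below are invented for the sketch:

```java
// Illustration only: decode an invented fixed-layout legacy record into a
// delimited, DB2 LOAD-ready line. The real modules were BAL/COBOL and used
// the legacy API plus complex business rules.
public class RecordTranslator {

    // Hypothetical layout: cols 0-9 account id, 10-29 name, 30-37 yyyymmdd date
    static String toLoadRecord(String raw) {
        String acct = raw.substring(0, 10).trim();
        String name = raw.substring(10, 30).trim();
        String ymd  = raw.substring(30, 38);
        // Example business rule: render the stored date in ISO format for DB2
        String isoDate = ymd.substring(0, 4) + "-" + ymd.substring(4, 6)
                       + "-" + ymd.substring(6, 8);
        return acct + "," + name + "," + isoDate;
    }

    public static void main(String[] args) {
        String raw = "0000123456JOHN Q PUBLIC       20080115";
        System.out.println(toLoadRecord(raw)); // 0000123456,JOHN Q PUBLIC,2008-01-15
    }
}
```

Keeping the field-extraction and business-rule logic in small, isolated routines like this is what made it practical to rerun the conversion quickly as the table designs changed.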
Having custom-coded programs allowed for making rapid changes to conversion programs, business rules, database tables, and the data. A 400GB source database can be converted into a 5TB DB2 database in less than one day. Making changes to tables, changing the conversion process to match, and rerunning the conversion during development allowed further refinement of the conversion process, enabling faster conversion times. Even though the database design evolved, the conversion programs were written early in the process. Database changes did impact the conversion programs, but having real data, practicing the conversion, and having full-volume copies of the database for performance testing were extremely valuable.
The legacy record format was broken out into approximately 20 DB2 tables, originally in third normal form. A big challenge in this conversion was dealing with some of the large tables and continuing operations after the conversion. UNION in VIEW SQL was used for several of the largest tables. Special considerations were made to deal with the particulars of UNION in VIEW and DB2. All efforts were made to avoid Non-Partitioned Secondary Indexes (NPSIs) and REORGs, due to the requirements for high availability. In some situations, full availability of tables was required, and an availability strategy was developed for situations that involved NPSIs. This high-availability design involved being able to put certain portions of the application in read-only mode, and employ access path and package switching to avoid using the NPSI while it was rebuilt during REORGs (more about this in a future z/Journal article). In almost all situations, we had to code availability features into the application.
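The UNION in VIEW pattern presents several physical tables as one logical table, so each underlying table can be loaded, reorganized, or recovered on its own schedule. As a minimal sketch (the view and table names are invented), the generated DDL follows this shape:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the UNION in VIEW pattern: one logical view over several
// physical tables, so each underlying table can be maintained separately.
// View and table names are hypothetical examples.
public class UnionInView {

    static String buildViewDdl(String view, List<String> tables) {
        String body = tables.stream()
                .map(t -> "SELECT * FROM " + t)
                .collect(Collectors.joining(" UNION ALL "));
        return "CREATE VIEW " + view + " AS " + body;
    }

    public static void main(String[] args) {
        System.out.println(buildViewDdl("ACCT_HIST",
                List.of("ACCT_HIST_1", "ACCT_HIST_2", "ACCT_HIST_3")));
    }
}
```

In practice, each SELECT branch typically carries a predicate restricting it to its table's key range, so that queries against the view can be limited to the relevant branches rather than scanning every underlying table.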
Given the number of tables, dealing with data changes was challenging. In the legacy application, changes were made to the data file with a record replacement technology using match/merge logic. This wouldn't work with a relational database. Imagine the impact of replacing an entire 5TB database every night via an application process. In this case, we spent two years preparing for the new database and its change process. The data and update processes were analyzed. Proof of Concept (POC) databases and COBOL programs were built to simulate how the application would work. During extensive analysis and testing, we were able to minimize change in the database and meet expectations for change and performance. Without significant research, we couldn't have come up with a process of applying changes to data that ultimately saved more than 60 percent of operating expense for an extremely large system. POCs, proper planning, and allowing the time for upfront analysis are critical.
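The key shift was from wholesale record replacement to applying only the deltas. As a simplified illustration (keyed on an invented record id, with invented values), a match/merge pass that compares the old and new snapshots and classifies each record as an insert, update, or delete might look like:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified match/merge: compare old and new keyed snapshots and emit only
// the changes (inserts, updates, deletes) to apply via SQL, instead of
// replacing the entire 5TB database every night.
public class MatchMerge {

    static Map<String, List<String>> diff(Map<String, String> oldRecs,
                                          Map<String, String> newRecs) {
        List<String> inserts = new ArrayList<>();
        List<String> updates = new ArrayList<>();
        List<String> deletes = new ArrayList<>();
        for (Map.Entry<String, String> e : newRecs.entrySet()) {
            String old = oldRecs.get(e.getKey());
            if (old == null) inserts.add(e.getKey());           // new key
            else if (!old.equals(e.getValue())) updates.add(e.getKey()); // changed
        }
        for (String k : oldRecs.keySet()) {
            if (!newRecs.containsKey(k)) deletes.add(k);        // vanished key
        }
        Map<String, List<String>> out = new LinkedHashMap<>();
        out.put("INSERT", inserts);
        out.put("UPDATE", updates);
        out.put("DELETE", deletes);
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> oldRecs = Map.of("A", "v1", "B", "v1", "C", "v1");
        Map<String, String> newRecs = Map.of("A", "v1", "B", "v2", "D", "v1");
        System.out.println(diff(oldRecs, newRecs));
    }
}
```

Only records B (changed), D (added), and C (removed) generate SQL; unchanged record A generates no work at all, which is where the operating-expense savings come from.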
In this migration, the application was also rewritten from BAL and COBOL to Java; this provided flexibility in selecting server platforms and operating systems, since Java is portable across z/OS, z/Linux, Windows, and AIX with little or no coding changes. Programmers could write and unit test programs on their PCs before porting them to the mainframe for integration testing. Despite Java's benefits, we had to do some research to determine the best way for Java to interact with the database. We explored the use of different DB2 data types: fixed-length vs. variable-length character data types (fixed length using RTRIM worked better for Java), small integers vs. single-byte character code fields (small integers are a primitive data type in Java), and use of the NULL value (very compatible with the Java null value). These all work differently for Java than for other languages. There also was the issue of cost. Java is more expensive to run in terms of CPU on the mainframe (about twice the cost of COBOL). Use of the zAAP engines can mitigate some of this cost, and the portability of Java facilitates offloading some or all the cycles to cheaper platforms.
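A few of these data type findings can be shown in miniature, without a live DB2 connection (column values and names below are invented): a fixed-length CHAR column comes back blank-padded unless the SQL applies RTRIM, SMALLINT maps onto a Java primitive, and a nullable column maps naturally onto a Java wrapper type whose null matches SQL NULL:

```java
// Miniature illustration of the data type findings (no live DB2 needed).
// Fixed-length CHAR values are blank-padded, so RTRIM in the SQL (or
// client-side trimming) strips the trailing blanks; SMALLINT maps to a
// Java primitive; a nullable column maps to a wrapper type (Integer).
public class TypeMapping {

    // What a hypothetical CHAR(10) column returns without RTRIM in the SQL
    static String char10(String value) {
        return String.format("%-10s", value);
    }

    // What the application sees after RTRIM (or trimming client-side)
    static String rtrim(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') end--;
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String stored = char10("ACTIVE");   // "ACTIVE    " — blank-padded
        short statusCode = 1;               // SMALLINT -> primitive short
        Integer nullableQty = null;         // nullable column -> wrapper type
        System.out.println("[" + rtrim(stored) + "] code=" + statusCode
                + " qty=" + nullableQty);   // prints [ACTIVE] code=1 qty=null
    }
}
```

With VARCHAR or untrimmed CHAR, the application would have to strip padding itself on every fetch; pushing RTRIM into the SQL kept that logic out of the Java code.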
One issue with Java program portability was dealing with how DB2 behaved differently when using remote vs. local connections. If you design the application with respect to the database and platform, you'll get the best performance. Moving the application to another server means you're remotely connecting to DB2, even if it's on z/Linux. You then have to consider network bandwidth and availability, Distributed Relational Database Architecture (DRDA) settings, and Distributed Data Facility (DDF) settings, such as the CMTSTAT installation parameter, and the effect on memory usage and accounting records. Other issues involved the loss of such efficiencies as sequential detection and index look-aside. When you have an application that must be connected remotely, you need to be ready and willing to make such performance sacrifices and measure them to ensure they don't hurt too badly.
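The local-versus-remote distinction shows up directly in the JDBC connection URL for the IBM driver: a Type 2 connection to a local subsystem names only the location, while a Type 4 connection (DRDA over TCP/IP, through DDF) names a host and port. A small sketch, with the host, port, and location name invented for illustration:

```java
// Sketch: local (Type 2) vs. remote (Type 4, DRDA over TCP/IP) JDBC URLs
// for the IBM driver. Host, port, and location name are hypothetical.
// Only the remote form brings DDF, DRDA, and network bandwidth into play.
public class Db2Urls {

    static String localUrl(String location) {
        return "jdbc:db2:" + location;                             // Type 2
    }

    static String remoteUrl(String host, int port, String location) {
        return "jdbc:db2://" + host + ":" + port + "/" + location; // Type 4
    }

    public static void main(String[] args) {
        System.out.println(localUrl("DSNA"));
        System.out.println(remoteUrl("zlinux01.example.com", 446, "DSNA"));
    }
}
```

Because the same application code can run against either URL, it is worth benchmarking both configurations: the remote path loses local efficiencies such as sequential detection and index look-aside, and that cost should be measured rather than assumed.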