Apr 1 ’06
Legacy Migrations: Experiences of the Industry
Legacy migration refers to converting existing data to a new database or converting an existing application to a new language. When we convert from a legacy data store or other Database Management System (DBMS) to DB2 z/OS, there are some specific considerations. We must handle DB2 z/OS differently if we want superior performance and availability. There are creative solutions to consider and common mistakes to avoid.
Turning data into information requires getting it in a flexible, easy-to-use data store such as DB2 z/OS. This article looks at several different approaches to migrating legacy data to DB2. It explores:
- What has worked well and what hasn’t
- How to improve performance without completely converting the application
- The disastrous results of failing to properly plan and set expectations
- How to set expectations for successful conversions
- Five major types of conversion efforts
- Common pitfalls of conversions
- Techniques for helping performance in database and application migrations.
Let’s begin with a customer experience where the data was stored in a proprietary, highly compressed format.
DB2 and Java Complete Application Re-Write
In our example, the data was stored in a highly compressed, proprietary format. Complex business rules were used to translate the data as stored in the files and into a human-readable display format. For this reason, custom conversion modules were written in both Basic Assembler (BAL) and COBOL to interface with the legacy Application Program Interface (API) and create DB2 load-ready data sets. There were numerous dramatic changes to database tables and overall database design. Carefully constructed change control practices were required to maintain design flexibility and adaptability of the conversion programs.
Having custom-coded programs allowed for making rapid changes to conversion programs, business rules, database tables, and the data. A 400GB source database can be converted into a 5TB DB2 database in less than one day. Making changes to tables, changing the conversion process to match, and rerunning conversion during development has allowed further refinements to the conversion process, enabling a faster conversion time. Even though the database design evolved, the conversion programs were written early in the process. Database changes did impact the conversion programs, but having real data, practicing the conversion, and having full-volume copies of the database for performance testing were extremely valuable.
The legacy record format was broken out into approximately 20 DB2 tables, originally in third normal form. A big challenge in this conversion was dealing with some of the large tables and continuing operations after the conversion. UNION in VIEW SQL was used for several of the largest tables. Special considerations were made to deal with the particulars of UNION in VIEW and DB2. All efforts were made to avoid Non-Partitioned Secondary Indexes (NPSIs) and REORGs, due to the requirements for high availability. In some situations, full availability of tables was required, and an availability strategy was developed for situations that involved NPSIs. This high-availability design involved being able to put certain portions of the application in read-only mode, and employ access path and package switching to avoid using the NPSI while it was re-built during REORGs (more about this in a future z/Journal article). In almost all situations, we had to code availability features into the application.
Given the number of tables, dealing with data changes was challenging. In the legacy application, changes were made to the data file with a record replacement technology using match/ merge logic. This wouldn’t work with a relational database. Imagine the impact of replacing an entire 5TB database every night via an application process. In this case, we spent two years preparing for the new database and change. The data and update processes were analyzed. Proof of Concept (POC) databases and COBOL programs were built to simulate how the application would work. During extensive analysis and testing, we were able to minimize change in the database and meet expectations for change and performance. Without significant research, we couldn’t have come up with a process of applying changes to data that ultimately saved more than 60 percent of operating expense for an extremely large system. POCs, proper planning, and allowing the time for upfront analysis are critical.
In this migration, the application was also rewritten from BAL and COBOL to Java; this provided flexibility in selecting server platforms and operating systems, since Java is portable across z/OS, z/Linux, Windows, and AIX with little or no coding changes. Programmers could write and unit test programs on their PCs before porting them to the mainframe for integration testing. Despite Java’s benefits, we had to do some research to determine the best way for Java to interact with the database. We explored the use of different DB2 data types; fixed length vs. variable length character data types (fixed length using RTRIM worked better for Java), small integers vs. single-byte character code fields (small integers are a primitive data type in Java), and use of NULL value (very compatible with the Java Null value). These all work differently for Java than for other languages. There also was the issue of cost. Java is more expensive to run in terms of CPU on the mainframe (about twice the cost of COBOL). Use of the zAAP engines can mitigate some of this cost, and the portability of Java facilitates offloading some or all the cycles to cheaper platforms.
One issue with Java program portability was dealing with how DB2 behaved differently when using remote vs. local connections. If you design the application with respect to the database and platform, you’ll get the best performance. Moving the application to another server means you’re remotely connecting to DB2, even if it’s on z/Linux. You then have to consider network bandwidth and availability, Distributed Relational Database Architecture (DRDA) settings, and Distributed Data Facility (DDF) settings, such as the CMTSTAT installation parameter, and the effect on memory usage and accounting records. Other issues involved the loss of such efficiencies as sequential detection and index look-aside. When you have an application that must be connected remotely, you need to be ready and willing to make such performance sacrifices and measure them to ensure they don’t hurt too bad.
DB2 Application With “Hooks” to Legacy
In a second customer situation, there were four large BDAM (Basic Direct Access Method) systems. The records were stored in a highly compressed proprietary format. Based on the size and complexity of each source system, and subsequent DB2 database complexities, two conversion strategies were created. The first used the efficient legacy API to extract data into a display format, and then convert that data in a single pass into a set of DB2 load files. The second employed a hand-coded set of metafiles as input to a REXX and COBOL custom process that generated conversion programs. The REXX code generation process used artifacts created during the modeling process as input, as well as the DB2 catalog and legacy copy books. This allowed the generation process to adapt to any changes. This automated process handled most of the simple conversion business rules. For more complex rules, custom COBOL libraries were created, and source code was merged with the generated code. This automated process enabled flexibility of the database design during development, and a reusable process for the most complicated files. Having the development of the conversion in-house gave the programmers intimate knowledge of the conversion programs. Once the database tables reached a point of stability, the conversion generation process was abandoned and the generated programs maintained by hand.
The databases contained sensitive, strategic data. We began the process with a two-year POC that was performed in two phases. Because the legacy data was in a proprietary compressed format, it was difficult to evaluate the data. A sample of each file was converted into a simple DB2 database that matched the legacy data format. All data was stored as character display, and an extensive series of queries were executed to analyze the condition of the data. This information helped reveal undocumented codes, invalid data, and the frequency of relationships in the data. This data evaluation phase was then followed by a POC to identify and simulate application functionality. This included process simulations using both SQL scripts and COBOL programs. Various tests were conducted to measure performance, application compatibility, and data integrity. Although much of this code was thrown away after the POC was completed, the information gathered was used to set physical database design and application API program logic. Measurements taken from SQL scripts and POC program execution were used to obtain initial CPU capacity estimates and application elapsed time estimates. All these measurements were presented to management so they were well aware of the performance impact of moving to DB2 and could plan accordingly and set realistic expectations. A list of issues was developed from the POC as input to the physical database model and application API development.
We weren’t allowed to change any application programs. A data access API already existed, and we were to re-write that API. This impacted DB2 physical database design decisions. Although it was desirable to create a normalized database design for the future, we had to accommodate a “read everything” legacy approach. Through modest de-normalization, indicator columns, and by taking the time to study the legacy API, we were able to meet the future design direction and performance objectives. We carefully studied the behavior of the legacy API. By understanding how it worked, we minimized DB2 access in some situations and dramatically improved performance (four times performance improvement). This was all verified in the POC so there were no implementation surprises.
We’ve seen a similar kind of problem in cases where IMS conversions are done one segment to one table. This creates poor-performing SQL that’s trying to pass the data back the way the application expects it, but since the data isn’t in any sort of normalized format, the SQL gets overly complicated.
During migrations where legacy applications won’t be changing, it’s sometimes beneficial to affirm the migration will occur in phases. The first phase will accommodate the past and prepare for the future. However, a later phase may involve another conversion to a more normalized design when legacy access is abandoned. Planning for this and expecting less than desirable performance initially will set reasonable expectations!
DB2 Application I/O Layer Access to Legacy
There are many conversion efforts under way to convert to DB2 from VSAM, but without any changes to the application. While it’s always preferred to not upset the user community with new application code, we also must set expectations accordingly and be prepared for less than optimal performance. Fortunately, there are options to help achieve decent performance, but we must evaluate all options to ensure we don’t compromise integrity of the new DB2 relational database.
If you’re moving from VSAM to DB2 with no application changes, you must be willing to take a performance hit. Going to DB2 provides a higher level of availability, enhanced backup and recovery, and options for built-in data integrity. However, with these advantages comes a longer code path and reduced performance. In addition, if the move is from a highly denormalized recordbased system to a normalized DB2 database, there’ll be an I/O impact, too. Spending time on a significant amount of data analysis can benefit your database design and API decisions. Otherwise, be prepared to take a performance hit.
In one client situation, a database design based on a third normal form was developed. This database was deployed based on the model and required sophisticated logic to translate from the legacy record format to the DB2 database. This impacted both the conversion process and the API. The decision was made to use an Extract, Transform, and Load (ETL) tool to create the conversion programs. Although ETL tools are extremely useful for moving transaction data to warehouses and such, they’re not always the best choice for a one-time conversion process. The complex conversion rules and B-level conversion code generated by the tool resulted in an extremely poor-performing conversion process. In addition, this complexity made the conversion code so confusing that any database changes tend to code around the conversion programs rather than change them.
Moving to a normalized database with no detailed analysis of the application layer leads to a highly inefficient API. The “read everything” philosophy, to a fully normalized database, was cumbersome. A decision was made to denormalize the database. This impacted the conversion, future applications, and the wisdom of the migration in the first place. Conducting and acting on proper research is better, though it may take longer.
DB2 CICS Migrating to DB2 WebSphere
One client came up with a creative way to convert a CICS-based legacy application on an older Amdahl machine that was using DB2 to a new WebSphere-based application running on an IBM z800. Determining how to convert a legacy application from CICS to WebSphere without a “big switch” was the biggest challenge. What about using DB2 z/OS data sharing in the Parallel Sysplex environment? That environment lets you run DB2 on a variety of different hardware platforms, while still being able to share data among the DB2 subsystems. It does so by using DB2 data sharing and linking the machines together via the coupling facility technology. This way, the same database can be shared by applications running on two different machines.
The database was designed to receive orders from queues that correspond to certain partitions that were key limited in the same way as the queues. Queries, inserts, and updates would all be involved. The idea was to convert a set of related transactions together from COBOL/CICS on the original subsystem on the older Amdahl to Java/WebSphere on the new z/800. By using DB2 data sharing, only one copy of the database was needed because both machines could be in the same data-sharing group and allowed to share the same data via the coupling facility. Transactions could then be moved from the legacy interface to the new interface in a controlled fashion and while the performance is monitored. This allowed for a smooth, controlled transition from old to new without an outage. The transactions were first carefully studied and a plan was developed as to which type of transactions would first be moved and converted. A performance testing plan was also employed as part of this creative solution to bring the application forward.
DB2 z/OS Migration From Other Databases
What may work well on other platforms may not work well in a z/OS environment. In many one-to-one conversions, performance expectations weren’t just lower than expected; work was simply not accomplished in the allowed timeframes. When re-centralizing data onto the z/OS platform or simply migrating an application to the z/OS platform, recall the uniqueness of the DB2 z/OS environment, take advantage of all it has to offer, and account for the differences.
A common mistake people make involves their assumptions about how stored procedures work on other platforms. On database servers such as Sybase, Microsoft SQL Server and Oracle, you often see applications designed with a stored procedure for every SQL statement. We saw one instance where there were 200 tables and four stored procedures for each table (one for SELECT, one for INSERT, one for DELETE, and one for UPDATE) for a total of 800 stored procedures. This is a black box I/O module design and will perform horribly on the DB2 z/OS platform. This type of design needs to be re-evaluated and SQL placed into the application to make use of the SQL language.
In the case of stored procedure execution on z/OS, DB2 has to go cross-memory to call the stored procedure in another address space. Subsequently, the stored procedure goes cross-memory to execute SQL statements from what is another allied address space. In addition to these cross-memory calls, the operating system has to manage the stored procedure address spaces, and there are several z/OS system settings (Workload Manager [WLM] policies, number of TCBs, etc.) that can impact this. If you’re using Java, then you have issues with starting virtual machines within the address spaces. In addition, Java programs consume about twice the CPU as other stored procedure languages so that can be a factor, too. This is all exaggerated by techniques used to write the stored procedures.
Programmers who are used to coding in Sybase or Oracle tend to write highly inefficient DB2 z/OS stored procedures. That’s because the Sybase and Oracle procedures run within the database server. People tend to use the procedures extensively as simple extensions to their applications. However, with DB2, programs work quite differently. If you have only single SQL statements in your DB2 stored procedures, then you can assume a significant performance detriment. If you’re calling stored procedures from other stored procedures, then your performance detriment is amplified. In DB2 for LUW Version 8.2, the SQL stored procedures now run within the database server as run-time structures. The same will hopefully soon be true for a future release of DB2 for z/OS.
We’ve also seen situations in which the DBMS being migrated from offers functions that don’t exist on DB2 for z/OS. In these situations, all is not lost. DB2 for z/OS is extremely flexible with its User-Defined Functions (UDFs). We’ve been able to solve all these incompatibilities by coding our own SQL and external UDFs. It takes more programming effort, but that’s easier than an application re-design (assuming that’s even possible).
Remember, too, that SQL, data types (e.g., VARCHARs) and other items also work quite differently on DB2 z/OS. Best performance will be achieved by converting or migrating with an understanding of the uniqueness and capabilities of the z/OS platform.
With any type of conversion or migration, extensive testing on various designs needs to be performed to ensure the data structures are as optimal as possible for the application, and the integrity and future use of the database aren’t sacrificed. It’s also important that your conversion methodologies don’t limit your ability to make future database changes for performance reasons. There will be some performance degradation during most legacy conversions, but patching the problems without thought as to how the data is going to be used in the future can make the migration a waste of time and resources. There are clever solutions to many problems and you should take time to explore them.