Wachovia Achieves High Availability With CICS


With VSAM RLS, the careful tuning of LSR buffer pools previously done by the CICS team moves into the SMSVSAM arena, so the CICS systems programmers must work closely with the MVS systems programmers to properly configure and manage the SMSVSAM cache and structure sizes.
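For illustration only, those structures are defined in the CFRM policy rather than in any single CICS region. The sketch below assumes the required SMSVSAM lock structure name IGWLOCK00, which is fixed, while the cache structure name, the sizes (in 1K units) and the CF names are hypothetical and must match the CACHE SET defined in the SMS base configuration:

   STRUCTURE NAME(IGWLOCK00)
             SIZE(50000)
             PREFLIST(CF01,CF02)

   STRUCTURE NAME(RLSCACHE1)
             SIZE(100000)
             PREFLIST(CF01,CF02)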

When data moves to a shared environment, whether it’s DB2 or VSAM, syncpoints in online transactions become critical. If a transaction performs too many updates without taking a syncpoint, or takes none at all, locks are held until the transaction ends, which defeats the goal of higher availability. A lesson learned at Wachovia was that application teams need to take a close look at their application maintenance transactions and ensure they’re taking syncpoints. Many maintenance functions, such as online purges or resynchronization transactions, were never designed to delete or update hundreds of thousands of records and may be missing syncpoint logic. In a non-shared environment, these transactions run fine, impacting only their own application while they run. In a parallel sysplex with data sharing, however, the lock structures they use are shared by every member, so a long-running unit of work impacts the entire sysplex, not just the single application.
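A minimal COBOL sketch of the pattern, assuming a hypothetical purge transaction with made-up file and data names (ACCTHIST, WS-PURGE-KEY, GET-NEXT-PURGE-KEY) and an arbitrary commit interval of 500 deletes; the point is simply that EXEC CICS SYNCPOINT is issued periodically so locks are released as the work progresses:

      * Purge loop: commit every 500 deletes so record locks are
      * released instead of being held until end of task.
           MOVE ZERO TO WS-DELETE-COUNT
           PERFORM UNTIL NO-MORE-PURGE-KEYS
               PERFORM GET-NEXT-PURGE-KEY
               EXEC CICS DELETE FILE('ACCTHIST')
                    RIDFLD(WS-PURGE-KEY)
                    RESP(WS-RESP)
               END-EXEC
               ADD 1 TO WS-DELETE-COUNT
               IF WS-DELETE-COUNT >= 500
                   EXEC CICS SYNCPOINT END-EXEC
                   MOVE ZERO TO WS-DELETE-COUNT
               END-IF
           END-PERFORM
           EXEC CICS SYNCPOINT END-EXEC.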

The lack of proper syncpointing can have serious consequences across the entire sysplex. Transactions with missing syncpoints can slip through testing because test data volumes aren’t as large as production, so it’s a good idea to stress their importance to application support teams in the new parallel sysplex world.

Sharing Temporary Storage

Another type of shared data that must be addressed is shared TSQs. Since shared TSQs reside in the coupling facility, the initial size of the coupling facility structure must be carefully determined.

It’s better to move into production with a large temporary storage CF structure and decrease its size through tuning after all applications are running in the cloned environment than to risk filling the structure. When the structure fills, every CICS region in the sysplex using that shared temp storage structure stops until some of the space is freed or the structure size is increased. The CICS shared temp storage server produces statistics that show use of the CF structure; these can be monitored after each application moves to the new parallel sysplex environment to determine the new structure utilization. Figure 4 shows an example of the CICS shared temp storage server statistics.
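As an illustration, the shared TS pool maps to a list structure named DFHXQLS_poolname in the CFRM policy, so starting large and tuning down later might look like the following sketch; the pool name PRODTSQ, the sizes (in 1K units) and the CF names are hypothetical:

   STRUCTURE NAME(DFHXQLS_PRODTSQ)
             SIZE(60000)
             INITSIZE(30000)
             PREFLIST(CF01,CF02)

With INITSIZE set below SIZE, the structure can later be adjusted dynamically with an operator command such as SETXCF START,ALTER,STRNAME=DFHXQLS_PRODTSQ,SIZE=45000 once the server statistics show the real utilization.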

When CICS transactions run in a single region, TSQs reside in memory in that region or in a temporary storage dataset defined to it. With parallel sysplex, all the queues that need to be shared must reside in the coupling facility, which means users of shared TSQs must behave as good citizens and clean up after themselves. A utility program should be written to delete queues that haven’t been accessed in 24 hours so the temp storage coupling facility structure doesn’t fill with abandoned data; a sketch of such a cleanup transaction follows. Since TSQs remain in the CF structure across cold starts, it may occasionally be necessary to delete and redefine the shared temp storage CF structure, or to use the SETXCF FORCE command, to clean up any orphaned queues, but this requires an outage of the entire CICS sysplex environment sharing that TSQ pool.
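A minimal COBOL sketch of such a cleanup transaction, using the INQUIRE TSQUEUE browse and its LASTUSEDINT value (seconds since the queue was last referenced); the WS- names are hypothetical, 86,400 seconds corresponds to the 24-hour threshold, and scoping the browse to the shared pool (for example, via a pool-related option on the browse) is an assumption to verify against the INQUIRE TSQUEUE documentation:

      * Browse temporary storage queues and delete any queue not
      * referenced for 24 hours (86400 seconds).
           EXEC CICS INQUIRE TSQUEUE START END-EXEC
           PERFORM UNTIL BROWSE-DONE
               EXEC CICS INQUIRE TSQUEUE(WS-QNAME) NEXT
                    LASTUSEDINT(WS-LAST-USED)
                    RESP(WS-RESP)
               END-EXEC
               EVALUATE TRUE
                   WHEN WS-RESP = DFHRESP(END)
                       SET BROWSE-DONE TO TRUE
                   WHEN WS-RESP = DFHRESP(NORMAL)
                        AND WS-LAST-USED > 86400
                       EXEC CICS DELETEQ TS QNAME(WS-QNAME)
                            RESP(WS-RESP)
                       END-EXEC
               END-EVALUATE
           END-PERFORM
           EXEC CICS INQUIRE TSQUEUE END END-EXEC.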

Testing and Implementation

The most important part of migrating to a CICS parallel sysplex environment from an application perspective is creating good test scripts. It takes time and testing tool expertise to create solid, reusable scripts, but once created, they’re valuable for this and future projects. The testing tool can drive a reasonable level of transaction volume through the cloned test CICS environment to validate the application. At Wachovia, the scripts were used to run many different testing scenarios while the application groups and various technical groups watched for abnormalities. Different subsystems, such as CICS, MQ and DB2, were brought down to observe the application’s behavior. Even complete Logical Partitions (LPARs) were shut down while the scripts were running to ensure the application continued to run smoothly. No application moved to the new cloned environment until all groups felt comfortable it was ready.

Since the test and production cloned regions were built as brand-new regions at Wachovia, the production CICS application setup could occur ahead of time, with copies of the CICS and CICSPlex SM objects migrated from the test parallel sysplex environment. Each application was switched over to the new cloned CICS regions by shutting down the MQ triggers, bridges and TCP/IP listeners in the old single CICS region and starting them in the new regions. Green-screen transactions were switched over by removing the transaction definitions from the TOR and setting DTRTRAN=CRTX in the TOR’s startup parameters, so that CICSPlex SM would route them to the healthiest cloned region.
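As a hedged illustration of those TOR startup parameters (the actual SIT override members are installation-specific), the dynamic routing entries might look like this, with CRTX being the CICS-supplied dynamic transaction routing definition and EYU9XLOP the CICSPlex SM routing program:

   * TOR SIT overrides (illustrative)
   DTRTRAN=CRTX,
   DTRPGM=EYU9XLOP,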

Benefits

Since the production implementation of parallel sysplex occurred one application at a time and in many stages, the move to CICS high availability was predominantly uneventful from a user perspective. As the applications moved from the old CICS regions to the new cloned parallel sysplex regions, the CICS commands that caused affinities were removed. Many of those same commands would also have to be removed to exploit thread-safe transactions and the Open Transaction Environment (OTE).

The CICS environment is now staged to move to thread-safe transactions and reap the performance benefits. The new configuration is less complex for operations and applications, and troubleshooting and maintaining applications is easier for CICS systems programmers. Most important, an application region can now suffer a slowdown or problem without impacting application availability. The CICS transactions at the heart of daily processing have been cleaned up and given a new, more resilient life, positioning the CICS applications for the future.
