DB2 is an extremely efficient and scalable relational database, capable of processing thousands of transactions per second through a single DB2 subsystem. By taking advantage of data sharing groups, DB2 can scale, almost linearly, across multiple logical partitions (LPARs), enabling applications to exploit the parallelism inherent in a z/OS sysplex. This allows DB2 applications like those designed to run in CICS MRO or a WebSphere Application Server horizontal cluster to realize a significant increase in throughput due to their ability to load balance across multiple LPARs. But what about batch applications? Is it possible for them to exploit the parallelism in a z/OS sysplex and see an increase in throughput as well?
If your IT organization has evolved from pure legacy mainframe operations, you might have a mission-critical DB2 batch application that’s long-running, one that might be pushing you well outside your batch window. You’ve squeezed all the performance you can out of it; you’ve tuned your SQL servers, added more CPU and memory and tried data-in-memory techniques, DASD cache optimization, database partitioning and I/O subsystem tuning, and you’ve reached the point of diminishing return on your efforts. Of course, nothing is wrong with your application; it’s well-written and functionally works well. Nothing is wrong with DB2, either. Chances are, you’ve reached the limits of running in a single DB2 subsystem or the reality of long DB2 I/O wait times relative to your CPU processing. The good news is that, with some creativity, your DB2 batch programs can now address both these challenges by exploiting parallel processing in a sysplex.
In most legacy batch applications, the biggest bottleneck with regard to throughput is I/O operations. In high-volume, I/O-intensive applications, it can account for upward of 70 percent of overall processing time and can be the difference between meeting or missing your batch window. It’s the old “hurry up and wait” adage where your CPU operations complete relatively quickly only to sit for a very long time, waiting for I/O to complete. A good analogy for this behavior is a four-lane highway that suddenly turns into a one-lane road.
Legacy batch applications are usually designed around a serial, monolithic processing model that reads data from and writes data to data stores managed by the local LPAR. The problem with this design is that both processing and I/O operations are limited by the resources available locally on the server. And while we can scale vertically using various techniques, eventually, we reach certain physical limitations that make it impossible for us to optimize I/O beyond a certain point. This is where parallel I/O processing can significantly increase your throughput.
Today’s state-of-the-art applications circumvent both processing and I/O bottlenecks by using modern technologies such as message-oriented middleware (MOM), clustering and shared storage to spread their load across multiple servers. This allows them to scale both vertically and horizontally as demand increases. OK, so you’re saying to yourself, “That’s great, but that doesn’t help me with my long-running batch job that I have to keep throwing resources at.” Well, as it turns out, it does. With a little bit of thinking outside the box, you can use these same modern technologies to distribute the I/O load of your legacy batch application across multiple LPARs and drastically reduce overall processing time in batch applications.
Solution and Implementation
Just before a DB2 insert or update operation is executed in an application, the host variables representing the columns being acted on are populated, usually in a contiguous storage area such as the DCLGEN. This area of storage can then be used as the input buffer to an MQPUT API and drop its contents as a message in a queue. This, in turn, can trigger and drive a process that executes the actual DB2 UPDATE or INSERT statements. Since WebSphere MQ is location-transparent, you can send the DB2 data to be updated to a queue located on any LPAR in the sysplex, including the local one. Once you implement this piece of code in your application, you can use the infrastructure to load balance and scale the DB2 load across any number of LPARs as volume dictates. Figure 1 shows an implementation of this solution.
Of course, it would be unusual for a mission-critical batch application to be updating just one table, so this solution would have to be modified to allow multiple tables to be updated in parallel across multiple LPARs. There are several approaches you can take. Rather than having just one process updating all the tables, you could have multithreaded updates where multiple processes are updating multiple groups of tables in parallel. For each such group of tables, you would combine the data for all tables in the group in one MQ message that’s then sent to a specific cluster queue defined on each LPAR. You would then have multiple queues driving multiple processes on each LPAR. If you take this latter approach, one option is to use stored procedures to perform the actual updates rather than doing them in the message-driven processes. In this case, you would use the “EXEC SQL CALL” statement to pass the data to and execute the stored procedures. For a single LPAR, this updated implementation of the solution would look like Figure 2.
The solution implemented so far provides both redundancy and scalability; however, in the event that one of your queue managers or LPARs should fail, it’s possible you would be left with marooned messages on that LPAR that wouldn’t get processed until the queue manager is restored. Since this is a batch application where you would want all records or transactions to be processed by end-of-job, you would want to ensure that it remains the same with this solution. This added requirement would call for the installation of a WebSphere MQ queue sharing group. You would still load-balance across the selected LPARs; however, messages would be directed to a shared queue in the coupling facility through each LPAR, rather than a local queue on each. Now, even with the failure of an LPAR or queue manager, all messages can be drained and processed from the shared queue.
So why would you use this solution instead of, say, a data-in-memory approach? First, since this solution involved infrastructure changes—a DB2 data sharing group, WebSphere MQ and possibly WebSphere MQ queue sharing group—you wouldn’t choose it unless you already have invested in these products. Second, this solution will yield good ROI only if the application updates a large volume of data in DB2 and has a very high write-to-read ratio; for example, an application that reads transactions from a file and updates DB2. On the other hand, the data-in-memory technique works best in applications that have a very high read-towrite ratio or those with repeatable reads. What makes this solution so effective and efficient is that it directly addresses the two main causes for low throughput in highvolume DB2 batch applications: single DB2 subsystem limitation and long I/O waits. This is assuming, of course, you’ve already taken care of the usual performance suspects mentioned earlier. It does this by using asynchronous messaging to break the application into two logical parts—processing and DB2 updates—and allows them to run separately and independently without impacting each other.
So now you have your existing application running in one LPAR, processing as usual. However, instead of updating DB2 directly, it now ships that task to other processes that can be scaled as needed to meet demand. This solution eliminates the single DB2 subsystem limitation by exploiting multiple DB2 engines across multiple LPARs in the sysplex. And since your application no longer has to throttle and wait for I/O associated with DB2 processing to complete, I/O is also eliminated as a limiting factor.
While this innovative solution requires both application and infrastructure changes, the ROI can be significant, depending on volume and the application’s I/O profile. The larger the volume of DB2 updates and the higher the write-to-read ratio, the more significant the reduction in total elapsed job time. Rather than trying to force additional throughput and performance through one server, this solution simply spreads the load across additional machines, which is an approach used by most modern software products.