Where will the mainframe be in 10 years? If z/OS is your lifeline, your risk assessment should adequately consider this point. Since one of my clients estimates it would take them eight years to migrate from zSeries to another platform, they recently engaged me to analyze mainframe viability through 2015…
There has been a lot of change in the IBM world lately, particularly here in Europe. You will recall that IBM announced a substantial headcount reduction a couple of months back, and intimated the ax would be wielded most forcefully on this side of the Atlantic.
As the Information Revolution evolves, there has been an immense shift of responsibility in the custodianship of the critical and very personal information that resides on the world’s IT platforms. With the onslaught of legislation over just the last five years, including HIPAA and Sarbanes-Oxley, CIOs now find themselves in the position of Chief Security Officer as well—with the potential of personal repercussions if their custody is mishandled…
Introduced in May 2002 on zSeries G7 systems, the Fibre Channel Protocol (FCP) opened the door to the Small Computer System Interface (SCSI) world for Operating Systems (OSes) running on a zSeries machine. While access to classical zSeries storage via ESCON and FICON channels required control units that conform to zSeries-specific I/O and storage access protocols, the FCP channel allows attaching industry-standard storage controllers. These controllers typically use the SCSI command protocol, based on different underlying physical transmission media and protocols. …
The article, “The Good, the Bad and the Really Ugly: DB2’s DSNZPARM Module” (z/Journal April/May 2005), discussed what DB2’s DSNZPARM member is and how it can be maintained, and provided a look at some of the good and bad things you can accomplish when manipulating ZPARMs. In this follow-up article, we explore some specific DSNZPARM keywords, what they mean, how to change them, and the possible effects of those changes. We’ll examine only a subset of the DSNZPARM keywords; for more detailed information, refer to the IBM DB2 Installation Guide.
Since the introduction of Linux for the mainframe in December 1999, IBM has made numerous changes and enhancements to its z/VM operating system to improve the performance and manageability of Linux/390 guests. Some of these changes have been internal to z/VM, but two in particular have made a big difference in the amount of effort needed to set up and manage Linux/390 network communications. Those two enhancements are the Guest LAN and the VSWITCH (Virtual Switch). This article focuses on the VSWITCH, but we’ll also compare and contrast it to Guest LANs. …
When was the last time you thought about your job scheduling software? More than likely, it was when you had a problem with your scheduler or had to sign the check for the annual maintenance bill. Most people don’t view job scheduling software as cutting-edge technology. It’s been around for more than 20 years: a dependable workhorse that continues to automatically submit jobs based on calendar requirements such as time and date. It really hasn’t changed much, right?
If you believe that, then you should ask yourself the following questions: …
DB2 UDB for z/OS Version 8 (V8) introduced many new features. So how could anyone decide which of its many features is the most important? The choice offered here is based on work with numerous customers migrating to V8.
DB2 distinguishes itself from all other relational database systems because of its unique cost-based optimizer. The goal of the optimizer is to choose a method of data retrieval that will result in the fastest response time with the least cost in resources. The key to access path selection lies in the mathematical formulas DB2 uses to estimate filter factors. Based on these estimates, DB2 determines the cost of any particular method of data retrieval. If DB2’s filter factor estimates are accurate, the chosen access path will truly be the least-cost method every time.
So what feature of V8 could possibly make a significant improvement in filter factors that ripples down to the access path and to thread response time?
V8 drastically changed the capabilities of the Runstats utility. With V8 Runstats, the DBA can now collect frequency and correlation statistics on any and all columns in a table, as well as on groups of columns. This change has an enormous impact on the optimizer and access path selection. This article explains the changes in the Runstats utility, including the new mechanics of collecting these statistics, and shows their effect on access path selection.
Collecting These Statistics
The following sample input to Runstats will collect the 10 most frequent and 10 least frequent values for the three named columns as a group and store the statistics in the system catalog tables SYSCOLDIST and SYSCOLDISTSTATS (the table and column names shown are illustrative, taken from IBM’s DB2 sample database; substitute your own):
RUNSTATS TABLESPACE DSN8D81A.DSN8S81E
  TABLE(DSN8810.EMP)
  COLGROUP(WORKDEPT, JOB, EDLEVEL)
  FREQVAL COUNT 10 BOTH
Some Additional Options
By including the phrase, “SHRLEVEL CHANGE,” the DBA can collect the statistics with no object locking. The resulting statistics lose a small amount of accuracy, proportional to the percentage of changes occurring while the statistics are gathered. However, the slight difference is well worth the price of being able to collect statistics in a more frequent and less intrusive manner.
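For example, the option slots into the sample statement above as follows (table and column names remain illustrative):

RUNSTATS TABLESPACE DSN8D81A.DSN8S81E
  SHRLEVEL CHANGE
  TABLE(DSN8810.EMP)
  COLGROUP(WORKDEPT, JOB, EDLEVEL)
  FREQVAL COUNT 10 BOTH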
The table name block can be repeated to specify more than one table in a single scan of the tablespace, but all tables specified in a command must reside in the same tablespace. The column names in the COLGROUP block can be any columns in the table, whether residing in an index or not.
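As a sketch, assuming a segmented tablespace MYDB.MYTS holding two tables (all names here are hypothetical), a single scan can feed statistics for both:

RUNSTATS TABLESPACE MYDB.MYTS
  TABLE(PRODSCHM.ORDERS)
    COLGROUP(REGION, STATUS) FREQVAL COUNT 10 BOTH
  TABLE(PRODSCHM.ORDER_ITEMS)
    COLGROUP(ITEM_TYPE, WAREHOUSE) FREQVAL COUNT 10 MOST

Note that the COLGROUP columns here needn’t appear in any index.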
You can change the “Colgroup” keyword to “Column” and gather statistics for each of the columns individually. However, single-column cardinalities shouldn’t be restricted to a short list; rather, collect statistics on all columns unless you know for certain that only the listed columns will appear in SQL predicates.
The “Index” keyword can be added to specify collection of statistics on any index associated with the named table. You can name particular indexes or simply use the “ALL” keyword to collect statistics on all indexes without having to name each one.
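Combining the two keywords with the earlier sample (names again illustrative), one statement can gather individual statistics on every column of the table and on all its indexes:

RUNSTATS TABLESPACE DSN8D81A.DSN8S81E
  TABLE(DSN8810.EMP) COLUMN(ALL)
  INDEX(ALL)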
Frequencies can be collected with the FREQVAL keyword, but these statistics only provide value to the optimizer when all columns in the COLGROUP appear as equal predicates of an SQL statement.
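For instance, the frequencies gathered on the illustrative COLGROUP above would only help the optimizer with statements shaped like this one (the predicate values are hypothetical):

SELECT LASTNAME
  FROM DSN8810.EMP
  WHERE WORKDEPT = 'D11'
    AND JOB = 'DESIGNER'
    AND EDLEVEL = 16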
Elapsed time and CPU time for the utility will increase as you specify more columns, COLGROUPs, and FREQVALs in the control cards. Runstats will scan the chosen tablespace and read the relevant pages for processing. If you have a segmented tablespace, then only the pages for the named tables will be processed. If a tablespace is partitioned, then the utility will process all partitions, unless it is limited by control statements. You can reduce the impact by specifying “sampling” in your input, but this will be ignored for collection of statistics associated with the COLGROUP and FREQVAL keywords.
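A sketch of sampling against the illustrative sample table follows; as noted, the SAMPLE percentage applies only to non-indexed column statistics and is ignored for COLGROUP and FREQVAL collection:

RUNSTATS TABLESPACE DSN8D81A.DSN8S81E
  TABLE(DSN8810.EMP) SAMPLE 25 COLUMN(ALL)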
How often you should execute this utility is influenced by the choice of SHRLEVEL options, the number of statistics being collected, and the volatility of the data. We’ve talked about the first two, so let’s examine the third factor.
If the values for a particular column don’t change significantly over time, then executing this utility more than once may not be necessary. As a precaution, the utility could be scheduled to run monthly or quarterly just to keep the catalog current, or after any batch process that makes significant changes to the column values.
If the column data changes frequently, and as a result the distribution changes, too, then this utility would need to be run frequently.
For example, assume we have a column of codes that represent work flowing through an application. Typically, a large percentage of the rows carry one status code at the beginning of a month and then move through five normal status codes as the month progresses. Assuming the status codes all change fairly consistently in a two- or three-day window, the DBA might choose to collect statistics on this particular column five times a month, with the objective of capturing the statistics just as the column values change during monthly processing. This may mean setting up a schedule tied to the business-processing day of the month instead of a particular calendar day.
What makes scheduling a particularly difficult task is the varying requirements caused by the nature of the customer data. Let’s assume we have a table with 10 columns that are important for filter factors, but each of the 10 has different volatility characteristics. Would we set up 10 different jobs to run at different times and collect the statistics for each column independently? Since the utility scans all the table’s pages each time it runs, it would probably make more sense to collect all the column statistics at once and schedule the utility based on the column that requires the most frequent collection. Each application will probably be different, so scheduling this utility will involve difficult choices. There will always be a need for some standardization in a shop, but the need for these statistics will force hard trade-offs between standardization and optimal performance.
Why would anyone go to such trouble to change filter factors? Consider how these statistics affect the optimizer and SQL performance. In the following examples, you may recall particular queries that had you scratching your head about what was happening with DB2. In these examples, the DB2 optimizer didn’t disregard the lowest-cost access path and willfully choose a poor one. Rather, DB2 lacked sufficient information to choose the best access path and fell back on the default assumptions in its formulas; it can’t replace those assumptions when the necessary statistics are missing.
Join Example With Equal Predicates
The SQL statement in Figure 1 doesn’t perform as fast as might be expected. Running explain on this statement showed that DB2 chose the “People” table “P” as the first table in the join sequence. This table has a large percentage of rows that qualify, while the “Group” table “G” has a low percentage of rows that qualify. (The optimizer uses filter factor in conjunction with cardinality to estimate the number of rows that will qualify for a particular query.) So the SQL is I/O-intensive and takes longer than expected. What happened to DB2? Where did it go wrong?
The easiest way to determine the qualifying rows for a join query is to decompose the query into selects for each of the tables that participate in the join, then count the rows. For this particular query, decompose the SQL to determine the qualifying rows as shown in Figure 2.
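For table “P,” a sketch of the decomposed count might look like the following (the table name is assumed from the alias used here, the predicate columns are the ones analyzed below, and the placeholder literals stand in for the values in the original query):

SELECT COUNT(*)
  FROM PEOPLE P
  WHERE P.PEOPLE_GROUP = '...'
    AND P.COUNTRY = '...'
    AND P.STATUS = '...'
    AND P.CARE_STATE = '...'

A matching count against the “Group” table, using only its local predicates, gives the other side of the comparison.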
After turning the counts into a percentage of each table’s cardinality, the qualifying rows from “G” are less than 1 percent and the qualifying rows from table “P” are 60 percent. After accumulating some statistics on the relevant columns, the disconnect in the optimizer’s choices becomes apparent. For the “P” table, the PEOPLE_GROUP column has nine possible values, so the optimizer (lacking any other statistics) assumed an equal distribution of values and decided the filter factor for this column was .1111 (1/9). The COUNTRY column had 227 possible values, so the filter factor was .0044 (1/227). The STATUS column had 29 possible values for a filter factor of .0345, and the CARE_STATE column had 12 possible values for a filter factor of .0833. Multiplying the four together (.1111 × .0044 × .0345 × .0833), the optimizer calculated the filter factor on this table as roughly .0000014. This filter factor is the result of optimizer assumptions when no statistics are present to allow a more accurate estimate. In this case, there’s a difference of more than 400,000 times between .0000014 and .60.
What could account for the significant discrepancy in the filter factors? Collecting distribution statistics for the columns in table “P” referenced in the predicate revealed the distributions in Figure 3, which shows the great disparity between the assumed even distribution and the actual one. After distribution statistics were collected for the two tables involved in the join, the optimizer chose “G” as the first table in the join sequence and the run-time went from more than 30 minutes to less than 1 minute. Not every column will be skewed like this example, so don’t collect frequency statistics indiscriminately on all columns.
Here, the SQL contained literals that made the analysis a little easier. The literals also helped the optimizer make a better choice because it was dealing with known values in the predicate. If this statement had contained host variables, then the REOPT option would be needed to make the optimizer re-evaluate the statement at each invocation and determine the filter factors from the actual predicate values…
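A minimal sketch of specifying that option at bind time (the collection and package names are hypothetical):

REBIND PACKAGE(MYCOLL.MYPKG) REOPT(VARS)

With REOPT(VARS), DB2 re-determines the access path at run-time using the host variable values actually supplied, at the cost of a re-prepare on each execution.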
Many people realize that installing and customizing IBM WebSphere Application Server for z/OS is somewhat more complex than inserting a CD and clicking “setup.” This article explores what makes implementing IBM WebSphere Application Server on z/OS different from WebSphere on other operating systems. We discuss the tools available to you to help make customization straightforward and manageable. You’ll learn the six key areas where careful planning and preparation can simplify your installation and customization on z/OS. …
Those of you with a mainframe background know that Rexx provides the power required to perform any scripting task while still being easy to read and maintain. What you may not know is that Rexx comes in a variety of free and open source versions that run on every operating system and platform. This means you can apply your Rexx skills to a wide variety of systems, including desktops, servers, laptops and even hand-held devices such as PocketPCs, Personal Digital Assistants (PDAs), and Palm Pilots. This article reviews free Rexx interpreters and their accompanying tools and interfaces. The goal is to help you use your Rexx knowledge to program any system around, even if your knowledge of that system is limited.