Sep 1 ’06

DFSORT: Taming Wild and Unruly Data

by Editor in z/Journal

In a perfect world, data in columns would be neatly aligned and easy to sort, edit, and display. But here in the real world, things are often messier. Data can get pretty wild and unruly: It can be unaligned, aligned differently in different records, and difficult to sort, edit, and display. This article shows and explains an example that illustrates some of the newer features of IBM’s DFSORT that can whip wild and unruly fixed fields into shape. Future articles will discuss additional features for manipulating data in various ways, including how to parse variable fields such as Comma-Separated Values (CSVs) and delimited fields into fixed fields, and how to deal with records with different layouts.

Input Data

Figure 1 shows the wild, unruly input data we want to tame in our example. The input data set has four fields:

Goals for Our Example

We have several goals we want to accomplish for our example:

Control Statements for our Example

Figure 3 shows the DFSORT control statements we need to accomplish the goals for our example. We’ll discuss what each control statement does and identify the new functions used in that control statement. Then we’ll discuss the details of each new function. Although this particular example uses the new functions while sorting, keep in mind they also can be used while copying or merging.

We use the INREC statement to left-align the name values so we can correctly sort them. Since the INREC statement is processed before the SORT statement, the left-aligned name values will be used by the SORT statement rather than the original unaligned name values. The new functions here are OVERLAY and JFY.

We use the SORT statement to sort the left-aligned name values, and the amount1 values, in ascending order. The new function here is SFF.

We use the OUTREC statement to create the amount2 values from the amount1 and change values, and to right-align and add leading zeros to the id values. The OUTREC statement is processed after the SORT statement. The new functions here are OVERLAY, SFF, and UFF.

We use the first OUTFIL statement to create a report with the four fields nicely displayed in fixed positions, and with a trailer containing the averages for the amount1 and amount2 values. We also suppress the page eject that would normally appear for the trailer records. OUTFIL statements are processed after the OUTREC statement.

The new functions here are BLKCCT1 and SFF. Figure 4 shows the resulting output report.

We use the second OUTFIL statement to remove the plus signs from the amount1 and amount2 values, and to create the CSVs. The new functions here are BUILD and SQZ. Figure 5 shows the resulting CSV output records.

Refer to Figures 4 and 5 to see the output fields created by using the new functions.

Overlay Columns (OVERLAY)

OVERLAY is a new parameter you can use on the INREC, OUTREC and OUTFIL statements to change data in specific columns without affecting the rest of the record. OVERLAY also can be used to add fields at the end of each record. With OVERLAY, you can concentrate on just the fields of interest without rebuilding the entire record.

You can specify OVERLAY items in any order, change the same item multiple times, and overlap fields. Changes to earlier items affect changes to later items. This gives you great flexibility in what you can do with OVERLAY in a single INREC, OUTREC, or OUTFIL statement.

OVERLAY is used in the INREC statement to operate on the name values in columns one through 12; no other fields are affected.

OVERLAY is used in the OUTREC statement to operate on the change values in columns 23 through 34 and on the id values in columns 35 through 44. No other fields are affected. The output column indicates the start of the field to be overlaid; 23: is used for the start of the change field and 35: is used for the start of the id field. If an output column isn’t specified, it defaults to 1:, so 1: isn’t needed for the start of the name field.

Build Records (BUILD)

OVERLAY is one of three ways you can reformat records. The other two are BUILD, which lets you reformat each record by specifying all its items one by one, and IFTHEN clauses, which let you reformat different records in different ways. A future article will discuss IFTHEN clauses.

BUILD is an alias for the old FIELDS parameter of the INREC and OUTREC statement, and for the old OUTREC parameter of the OUTFIL statement. If you get in the habit of using BUILD for the INREC, OUTREC and OUTFIL statements, you won’t have to remember which uses FIELDS and which uses OUTREC.

BUILD is used in the second OUTFIL statement to create the CSV output records.

Left-Justify and Right-Justify (JFY)

JFY is a new parameter you can use with OVERLAY and BUILD to left-justify or right-justify data in specific columns. For a left-justified field, leading blanks are removed and the characters from the first non-blank to the last nonblank are shifted left, with blanks inserted on the right if needed. For a right-justified field, trailing blanks are removed and the characters from the last non-blank to the first non-blank are shifted right, with blanks inserted on the left if needed. Optionally:

JFY is used in the INREC statement to left-justify the name values. As a result, the first non-blank character of each name will start in column one. Each name will be padded on the right with blanks up to column 12, as needed.

Signed Free Form Format (SFF)

SFF is a new format you can use for any type of signed free form data. SFF extracts decimal digits (0-9) from right to left anywhere in the field to form a positive or negative number. If a minus sign or right-parenthesis is found anywhere in the field, the number is treated as negative; otherwise, the number is treated as positive. Any combination of characters is valid, but characters other than 0-9, the minus sign, and the rightparenthesis are ignored.

SFF can be used in the SORT, MERGE, INCLUDE, OMIT, INREC, OUTREC and OUTFIL statements, and in the ON and BREAK parameters for all ICETOOL operators. Here are some of the ways you can use SFF for signed free form fields:

SFF is used in the SORT statement as the format for the amount1 values so they can be sorted correctly.

SFF is used in the OUTREC statement as the format for the amount1 and change values so they can be added together to create the new amount2 value. M4.LENGTH=12 is used to edit the resulting amount2 value with the M4 pattern and give it an output length of 12 bytes. The new edited amount2 value overlays the original change value.

SFF is used in the OUTFIL statement as the format for the amount1 values so their average can be calculated. M4,LENGTH=10 is used to edit the resulting amount1 average with the M4 pattern and give it an output length of 10 bytes. SFF also is used in the OUTFIL statement as the format for the amount2 values so their average can be calculated. M4,LENGTH=12 is used to edit the resulting amount2 average with the M4 pattern and give it an output length of 12 bytes.

Note that SFF treats values with a minus sign (found in amount1 and amount2) or a right-parenthesis (found in change) as negative numbers.

Unsigned Free Form Format (UFF)

UFF is a new format you can use for any type of unsigned free form data. UFF extracts decimal digits (0-9) from right to left anywhere in the field to form a positive number. Any combination of characters is valid, but characters other than 0-9 are ignored.

UFF can be used in the SORT, MERGE, INCLUDE, OMIT, INREC, OUTREC and OUTFIL statements, and in the ON and BREAK parameters for all ICETOOL operators. Here are some of the ways you can use UFF for unsigned free form fields:

UFF is used in the OUTREC statement as the format for the id values so just their digits can be extracted. Note that the dashes in the id values are ignored with UFF. EDIT=(TTT-TTTTT) is used to edit the resulting digits with the TTT-TT-TTT pattern to add leading zeros and dashes. LENGTH=12 is used to expand the output values to 12 bytes and right-align them.

Suppress Page Eject for Trailer (BLKCCT1)

BLKCCT1 is a new parameter you can use on OUTFIL statements to avoid forcing a page eject at the start of the report trailer described by the TRAILER1 parameter; the ANSI carriage control character of “1” (page eject) in the first line of the report trailer is replaced with a blank. BLKCCT1 is used in the first OUTFIL statement. So the entire report appears on one page. Without BLKCCT1, the first line of the report trailer (the line with asterisks) would have a “1” carriage control instead of a blank carriage control and would start on a new page.

You also can use these other OUTFIL parameters to affect the ANSI carriage control characters:

Left-Squeeze and Right-Squeeze (SQZ)

SQZ is a new parameter you can use with OVERLAY and BUILD to leftsqueeze or right-squeeze data in specific columns. For a left-squeezed field, all blanks are removed and the characters from the first non-blank to the last nonblank are shifted left, with blanks inserted on the right if needed. For a right-squeezed field, all blanks are removed and the characters from the last non-blank to the first non-blank are shifted right, with blanks inserted on the left if needed. Optionally:

The syntax for squeeze is: p,m,SQZ=(options) where p and m are the starting position and length, respectively, of the field to be squeezed, and options are specific keyword parameters you can use with SQZ as follows:

For example, PREBLANK=C’+-’ or PREBLANK=X’4E60’ replaces each ‘+’ or ‘-’ character with a blank before squeezing. Pre-blanked characters are treated the same as blanks.

Summary

DFSORT offers many functions for dealing with various types of wild and unruly data. By using the techniques discussed here, you can tap into those features to produce correctly sorted records, readable data in fixed columns, delimited data, and much more. Future articles will discuss additional DFSORT functions you can use. Z