Traditionally, mainframe data set records were arranged as fixed fields, each of which had a specific starting position and length. Although fixed fields are still widely used, they now must coexist with records arranged as variable fields, each of which can have a different starting position and length. Comma Separated Values (CSVs), tab separated values, keyword delimited fields and many other types are now commonly used. A new feature of DFSORT called PARSE, available since April 2006, lets you extract variable fields into fixed fields you can use with all of DFSORT’s reformatting features. This article shows and explains three examples that illustrate what you can do with PARSE.
Reformatting CSV Fields as Fixed Fields
Consider a typical example dealing with fields arranged as CSVs. Figure 1 shows an input file with RECFM=FB and LRECL=50. There are three fields in each record with information about my pet rats: a name field, a type field and a color field. Each of these fields is separated from the others by a comma. If the type field has more than one characteristic, it’s enclosed in quotes with a comma and blank between the characteristics (for example, “Rex, Self, dumbo”). We want to extract the values in each of these fields left-justified in fixed output fields: name in output positions 1 through 8, type in 9 through 33, and color in 34 through 40. Since the input fields can start in different positions and have different lengths in each record, we can’t use the normal fixed p,m (position, length) notation to deal with them. But we can parse each variable field into a fixed parsed field (%nn) and use that fixed parsed field as we would use a regular fixed field (p,m).
Figure 1 shows the DFSORT control statements we need to extract the variable fields into fixed fields and reformat them. You can use the PARSE parameter on the INREC, OUTREC, and OUTFIL statements and define up to 100 fixed parsed fields as %00 through %99 (%1 through %9 are equivalent to %01 through %09). A %nn field can be used in a BUILD or OVERLAY parameter in the same way a p,m field can be used (for example, %nn,TRAN=UTOL).
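As a minimal sketch of using a parsed field the way a p,m field would be used, the statements below parse the first comma-delimited value and overlay a lowercase copy of it onto each record. The field number %00, the FIXLEN of 8 and the target column 41 are illustrative assumptions, not taken from the article's figures; the sketch also assumes positions 41 through 48 of the 50-byte record hold no data worth keeping.

```
* Hypothetical sketch: field number %00 and column 41 are assumptions.
* Parse the value before the first comma into an 8-byte field, then
* overlay a lowercase copy of it at positions 41-48 of each record.
  OPTION COPY
  INREC PARSE=(%00=(ENDBEFR=C',',FIXLEN=8)),
        OVERLAY=(41:%00,TRAN=UTOL)
```

OVERLAY leaves the rest of the record unchanged, so the original comma separated data stays in place.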
Here’s what the PARSE parameter of the OUTREC statement does for each input record:
• The first definition defines the parsed field for the name value. ENDBEFR=C',' (“end before a comma”) tells DFSORT to extract the value from position 1 (the default starting position) up to but not including the next comma it finds, and save it as that field’s value. FIXLEN=8 (“fixed length of 8”) gives the field a fixed length of 8 bytes. So the first value is extracted into the field left-justified. If the value is shorter than 8 bytes, it’s padded on the right with blanks. If the value is longer than 8 bytes, it’s truncated to 8 bytes (characters on the right are lost). For the first record, April followed by three blanks is extracted into this field.
• The second definition defines the parsed field for the type value. ENDBEFR=C',' tells DFSORT to extract the value from the “current” position up to but not including the next comma it finds. The “current” position is the one after the comma that satisfied ENDBEFR=C',' for the name field (that is, after the first comma in the record). For the first record, that would extract only “Rex (the opening quote and three letters), but we actually want the entire string, including the quotes; that is, “Rex, Self, dumbo” (18 characters). So, we need to specify PAIR=QUOTE to tell DFSORT not to stop at a comma inside the paired quotes. That will let us find the comma outside the ending quote and extract the value all the way up to the ending quote. FIXLEN=25 gives the field a fixed length of 25 bytes. For the first record, “Rex, Self, dumbo” followed by seven blanks is extracted into this field.
• The third definition defines the parsed field for the color value. FIXLEN=7 gives the field a fixed length of 7 bytes. Since we haven’t specified ENDBEFR=string, DFSORT just extracts 7 bytes starting from the “current” position. The current position is the one after the comma that satisfied ENDBEFR=C',' for the type field. For the first record, Gray followed by three blanks is extracted into this field.
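The three definitions just described, together with a BUILD that lays the fields out in output positions 1 through 40, might look like the sketch below. The figure with the actual statements is not reproduced in this text, so the field numbers %00, %01 and %02 and the use of OPTION COPY are assumptions; the keywords and lengths come from the description above.

```
* Sketch only: field numbers %00-%02 and OPTION COPY are assumptions.
* First input record, as described in the text:
*   April,"Rex, Self, dumbo",Gray
  OPTION COPY
  OUTREC PARSE=(%00=(ENDBEFR=C',',FIXLEN=8),
                %01=(ENDBEFR=C',',PAIR=QUOTE,FIXLEN=25),
                %02=(FIXLEN=7)),
         BUILD=(%00,%01,%02)
```

Because the three fixed lengths are 8, 25 and 7 bytes, listing the fields back to back in BUILD puts name in positions 1 through 8, type in 9 through 33, and color in 34 through 40.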
The BUILD parameter of the OUTREC statement uses the extracted values in the three parsed fields to build each output record:
• The value of the first parsed field (name) is placed in output positions 1 through 8 (remember that FIXLEN=8 gave a fixed length of 8). For the first record, April followed by three blanks is placed in those output positions.