IT Management

“PLATUNE”: Taking Back the Data Center

Jurgenson made extensive use of his ASG-TMON electronic surveillance equipment and discovered that the initial, swift progress of many jobs had been summarily halted. His full-bore drilldown indicated significant data set contention between developer jobs. He made generous use of his RMF Monitor III to confirm his sightings. He also reported that hostilities had spilled over into the production area, causing delays in critical production batch processing and destabilizing the online regions. Jobs in the production area were plainly noncombatants. What did we have to do, rename them all UN* something? Don’t these barbaric developers have any respect for accepted codes of conduct? Our Shop Standards Manual, regulation UR2-L8, clearly states: “No nonproduction job shall allocate any production data set during such time that said production data set may be allocated for exclusive use by aforementioned production job.” There would be no avoiding casualties this day.

WAR PLANS

Once all bogies were identified and isolated, strategic command formed a task force to study the problems and devise plans of attack. Our initial thrust would be to out-flank the actuaries and regain control of the initiators. Once this ground was secured, we would advance on the tape library, scatter the developers, and liberate the local inhabitants. Our final push would be to secure and protect our production data sets. Supreme command issued strict orders that production data sets would be protected at all costs. This ultimate objective was critical to the overall success of our mission.

We recognized that the enemy had deeply infiltrated the company. It could be anyone: the polite fellow guarding the water cooler; the nice lady patrolling the hallways. Friend or foe? It was impossible to tell them apart just by looking at them. We needed intelligence, and we needed it fast. We turned to our CA NeuMICS database, where we kept detailed dossiers on every user and job in the system. We began mining data to separate the decent citizens from the irregular militia from the subversive groups. We were hunting for jobs of mass consumption.

Examining history data, we learned the routines of our users. We categorized their jobs by resource requirements and arrival rate. We classified jobs by CPU time. We grouped jobs by tape requirements. We knew more about our users’ habits than they themselves knew. Given all this information, we plotted our target coordinates. We knew just where to strike and how. We aimed our photonic cannon and began to fire!

COUNTER-ATTACK!

Actually, we don’t have a photonic cannon. But we do have a howitzer in our arsenal. It’s the mother of all batch-tuning tools: a product named ThruPut Manager from MVS Solutions.

We launched our first counter-strike using ThruPut Manager Job Limiting Services (JLS), with a scheme to control the number of concurrent initiators a user could occupy. JLS provides job-limiting agents, which can be used to represent system resources. A JLS agent has a threshold value associated with it. One or more JLS agents can be tagged to a job. When a JLS agent is tagged to a job, a weight is assigned, which represents that job’s use of the agent. JLS allows a job to initiate only when the job’s added weight does not cause the agent’s threshold to be exceeded. For our initiator management strategy, each job was assigned a JLS agent, such as INITS.userid, with a weight of one and a conservative threshold of three. This effectively capped the number of jobs a user could run concurrently at three.
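
The real controls, of course, live in ThruPut Manager’s own definitions, so the following is only a rough Python sketch of the agent/threshold/weight bookkeeping described above. The Agent class, the helper functions, and the SMITH userid are all hypothetical illustrations, not ThruPut Manager syntax.

    # Conceptual sketch only -- NOT ThruPut Manager syntax. Models the idea:
    # tag weighted agents to a job, and let the job initiate only if no
    # tagged agent's threshold would be exceeded.

    class Agent:
        def __init__(self, name, threshold):
            self.name = name
            self.threshold = threshold
            self.in_use = 0                      # total weight of running jobs

        def has_room(self, weight):
            return self.in_use + weight <= self.threshold

    def can_initiate(job_tags):
        """job_tags: list of (agent, weight) pairs tagged to one job."""
        return all(agent.has_room(weight) for agent, weight in job_tags)

    def initiate(job_tags):
        for agent, weight in job_tags:
            agent.in_use += weight

    def complete(job_tags):
        for agent, weight in job_tags:
            agent.in_use -= weight

    # One INITS.userid agent per user, threshold 3, each job weighted 1:
    inits_smith = Agent("INITS.SMITH", threshold=3)
    jobs = [[(inits_smith, 1)] for _ in range(4)]    # user SMITH submits 4 jobs
    for job in jobs:
        if can_initiate(job):
            initiate(job)                            # first three jobs start
        else:
            print("job held: INITS.SMITH threshold reached")   # fourth waits

The point of the model is simply that the fourth job is held until one of the first three completes and releases its weight.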

So far, our operation had proved quite successful. A few minor political skirmishes flared up, but they were quickly extinguished. Most people agreed that three concurrent jobs were fair and generally adequate. We received some complaints that once three long-running jobs initiated, the user was unable to do any other work, not even a quick print job. This seemed a reasonable complaint, so we added another JLS agent named LARGEJOB.userid with a threshold of two. This agent was tagged depending upon the job’s CPU requirements: “fast” jobs (those requiring 13 CPU seconds or less) did not receive it; all other jobs were tagged with it. The LARGEJOB.userid agent ensured that a user’s short-running jobs would always be processed in a timely manner. After this scheme burned in, we were able to increase the INITS.userid and LARGEJOB.userid thresholds to five and four, respectively.
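
In the same hypothetical vein, and reusing the Agent helpers from the sketch above, the tagging rule might be modeled like this. The 13-CPU-second cutoff and the final thresholds of five and four come from the scheme described here; everything else (names, the sample jobs) is illustrative.

    # Continuing the sketch above: tag LARGEJOB.userid only to jobs estimated
    # to need more than 13 CPU seconds, so short work is never locked out.

    FAST_CPU_SECONDS = 13

    def tag_job(estimated_cpu_seconds, inits_agent, largejob_agent):
        tags = [(inits_agent, 1)]                    # every job counts against INITS
        if estimated_cpu_seconds > FAST_CPU_SECONDS:
            tags.append((largejob_agent, 1))         # long jobs also count against LARGEJOB
        return tags

    inits = Agent("INITS.SMITH", threshold=5)        # thresholds after the scheme burned in
    largejob = Agent("LARGEJOB.SMITH", threshold=4)

    for _ in range(4):                               # four long-running jobs initiate
        job = tag_job(1800, inits, largejob)
        if can_initiate(job):
            initiate(job)

    # A fifth long job would now be held by LARGEJOB.SMITH, but a quick
    # 2-CPU-second print job carries only the INITS tag and still gets in:
    quick_print = tag_job(2, inits, largejob)
    print(can_initiate(quick_print))                 # True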

SECURE THE PERIMETER
