Over the past few weeks, the noise in various news outlets about the next big corporate career opportunity has been building to a crescendo. This one is in a technology field, surprisingly, and goes by the name “data scientist.”
From what I can glean, a data scientist is someone who can interpret the results of Big Data analytics programs: models that provide insights into the large collections of structured and unstructured data that most companies have sitting around occupying space on a growing complement of spinning rust. The data scientist plies a set of Byzantine and mysterious wiles to decipher constantly updating information pools, spotting trends and turning points, to advise business decision-makers about what’s going on and where it’s going.
Sounds a bit like day traders to me, or hedge fund managers. To hear the descriptions in publications ranging from Computerworld to Fortune, these data scientist positions will be the ones to land if you want big bucks. Or, if you have kids, you should be pushing them toward data science degree programs that will enable them to capitalize on the boom years ahead.
While I’m in favor of any educational program that delivers an advanced degree (while also helping to refine a young adult’s thinking processes, which was the original goal of advanced education), I have some serious misgivings about this data scientist thing. First, don’t we already have computer science degree programs? And don’t we already have, if not full degree programs, at least excellent coursework on statistical analysis? The database administrators I know, not to mention a lot of systems programmers, application developers, infrastructure managers, and even a few business managers, all seem to have pretty sophisticated data analysis skills. So, why do we need an entirely new discipline to service Big Data analytics?
We haven’t even defined what Big Data is. Ask 10 vendors and you will get 11 definitions. The smartest guys in the field, like IBM’s Jeff Jonas, seem to think it involves smarter algorithms that mimic human reflection to constantly reassess the meaning of data in a growing pool of real-time information.
OK, if Jonas is right, the smarter algorithms will churn out results that are as easy to understand as fortune cookies—even for idiots like me. I shouldn’t need a special degree to interpret the results from Big Data analytics engines. Metaphorically, they aren’t giving me raw material describing high and low pressure areas and wind directions; they should give me a message saying, “Hey, Jon, it’s about to rain, so take an umbrella.”
Where we will need some degrees is in the arts and crafts of traditional IT. We need a lot more folks who can come up with ways to cobble together massive storage infrastructure to hold Big Data while it’s being analyzed. We’re going to need more folks who can rig together lots of processors to deliver megaflops of processing power for real-time analyses. We’re going to need interconnect and network specialists, middleware developers, application developers, systems managers, help desk administrators, database managers, web services mavens, smart client app writers, and a host of other people who may not fit the idea of data scientist, but without whom there will be no infrastructure to hold data, no power to analyze the data, no way to distribute results, and no way to troubleshoot the inevitable glitches.
This is “the cast of thousands” they used to talk about in the old movie trailers; the folks who keep the ship afloat so that some “deciders” can decide their strategies and tactics for the next battle. Without them, all the data scientists in the world are worthless. And, guess what, the supply of IT talent is drying up.
We first heard this concern raised by “self-serving folks” in the software development space: Their complaints were interpreted as an effort to get more H-1B visa holders into the country because they would work more cheaply than U.S. programmers. Then we heard it in the arguments of the mainframe haters: “Who will run Big Iron once grandpa retires?” In fact, the pool of qualified IT talent has been shrinking since the mid-90s, when dotcoms stopped making multi-millionaires out of everyone with a logo idea on a T-shirt. The shortage promises to hit distributed systems a lot harder and a lot faster than mainframe shops, given the difference in the number of folks required to service a server vs. a sysplex.
All the well-trained data scientists won’t mean a thing if there’s no infrastructure on which to host their data or ply their trade. Growing a cadre of data scientists without first addressing the real requirements of IT staffing strikes me as a house-of-cards approach. And, no, it doesn’t require a Big Data analytic model to work out the end of that story.