As data keeps proliferating all around us, there is a great hue and cry about what to do with all those terabytes, petabytes, exabytes…whatever bytes you! Sure, there are ever more powerful number-crunching machines and more capable software, but at the end of the day, you are going to need professionals especially skilled in the science of data analysis, management and insights.
That will be the Data Scientist, a role dubbed by some the sexiest job of the 21st century. Sexy not necessarily in terms of what the work involves, but certainly in terms of high demand and even higher pay packets.
But what exactly would these data
scientists do?
An illuminating blog post on this interesting and still intriguing question was published recently by Bernard Marr, an analytics expert and founder of the Advanced Performance Institute. To demystify what a data scientist's work actually involves, and what sort of person is likely to succeed in the field, Marr spoke to one of the world’s leading data scientists, Dr. Steve Hanks, who holds a doctorate from Yale and has worked with companies like Amazon and Microsoft.
Dr. Hanks, currently Chief Data Scientist at Whitepages.com (whose Contact Graph database contains information on over 200 million people and is searched 2 billion times a month), describes three key attributes of a data scientist: one, they have to understand that data has meaning; two, they have to understand the problem they need to solve and how the data relates to it; and three, they have to understand the engineering behind delivering a solution.
While all three of these capabilities are important, writes Marr, that doesn’t mean there’s no room for specialization. He quotes Hanks as saying that it is “virtually impossible to be an expert in all three of those areas, not to mention all the sub-divisions of each of them.” The important thing is that even someone who specializes in one of these areas should have at least a good appreciation of all three. Further, in Hanks’ words: “Even
if you’re primarily an algorithm person or primarily an engineer—if you don’t
understand the problem you’re solving and what your data is, you’re going to
make bad decisions.”
I can especially identify with this “holistic appreciation” quality of data scientists, since many CIOs and development project heads have often voiced a related complaint about most programmers: they focus too narrowly on the “problem” at hand and miss the big picture of the whole project.
Fortunately, unlike the job of a programmer, the field of data science is attracting, or is likely to attract, people “of different personality types and mindsets.”
Having said that, the main challenge for data scientists is not specializing in a particular machine learning algorithm, sub-field or tool, but keeping up with the overall pace of development in data science, the blog notes.
Do let me know what you think of the fast-emerging field of data science.