Tuesday, January 26, 2016

Leading Data Scientist Talks about, Well, What Data Scientists Do!

Just as data keeps proliferating all around us, there is a great hue and cry about what to do with all those terabytes, petabytes, exabytes…whatever bytes you! Sure, there are ever more powerful number-crunching machines and more capable software, but at the end of the day, you are going to need professionals specially skilled in the science of data analysis, management and insights.

That would be the Data Scientist, a role dubbed by some as the sexiest job of this century. Sexy not necessarily in terms of what the work involves, but certainly in terms of the high demand and even higher pay packets.

But what exactly do these data scientists do?

An illuminating blog entry on this interesting and still intriguing question was posted recently by Bernard Marr, an analytics expert and founder of the Advanced Performance Institute. To demystify what the work of a data scientist actually involves, and what sort of person is likely to be successful in the field, Marr spoke to one of the world’s leading data scientists, Dr. Steve Hanks, who holds a doctorate from Yale and has worked with companies like Amazon and Microsoft.

Currently the Chief Data Scientist at Whitepages.com (whose Contact Graph database contains information on over 200 million people and which is searched 2 billion times a month), Dr. Hanks talks about three key attributes of a data scientist: one, they have to understand that data has meaning; two, they have to understand the problem that they need to solve, and how the data relates to it; and three, they have to understand the engineering behind delivering a solution.

While all three of these capabilities are important, writes Marr, that doesn’t mean there is no room for specialization. He quotes Hanks as saying that it is “virtually impossible to be an expert in all three of those areas, not to mention all the sub-divisions of each of them.” The important thing is that even if one specializes in one of these areas, one should at least have a good appreciation of all of them. Further, in Hanks’ words: “Even if you’re primarily an algorithm person or primarily an engineer, if you don’t understand the problem you’re solving and what your data is, you’re going to make bad decisions.”

I can especially identify with this “holistic appreciation” quality of data scientists, as many CIOs and development project heads have shared similar sentiments with me about most code writers: they are too narrowly focused on the “problem” at hand and usually miss the big picture of the whole project.

Fortunately, unlike the job of a programmer, the field of data science is attracting, or is likely to attract, people “of different personality types and mindsets.”

That said, the main challenge for data scientists lies not in specializing in a particular machine learning algorithm, sub-field or tool, but in keeping up with the general pace of development in data science, the blog notes.

For more interesting details and insights, I would urge you to read the full blog post.

Do let me know what you think of the fast-emerging field of Data Science.


(Note: This blog post first appeared on dynamicCIO.com. Image courtesy: Americanis.net)