TDWI Upside - Where Data Means Business

Profile: So You Want to Be a (Data Science) Rock Star

Nathan Hamilton explains how a strong grasp of statistics and a creative, problem-solving imagination have made him a successful data scientist.

In his day-to-day work as a data scientist, Nathan Hamilton isn't expected to work miracles.

If there's anything miraculous -- or, more precisely, not strictly "scientific" -- about the work Hamilton does, it has to do with the imaginative application of statistical tools and concepts to real-world problems. This isn't miracle working -- although the stuff he comes up with often seems miraculous.

"One problem we've been working on for a while and making a lot of progress on is trying to understand through patient behavior what different providers [i.e., doctors] are like. The problem is that we can only observe what providers are doing through the outcomes of their patients. We have to be able to make inferences from the patients back to their providers," says Hamilton, a quantitative methods manager with a prominent provider of managed healthcare services.

This and other difficult problems have pushed him to some of his most creative and ingenious work. "We're now making enough progress that we feel we'll be able to help people understand who are the providers that are going to be most beneficial for them," he says.

One Path to Becoming a Data Scientist

These days, Hamilton lives in a command line interface, writing Python, R, and SQL code. It wasn't always so. Like many of his peers, he came to data science via a non-traditional route: specifically, graduate work in sociology.

This shouldn't be surprising. As Theodore M. Porter writes in The Rise of Statistical Thinking: 1820-1900, some of the earliest applications of statistics were in the social sciences. Demography in particular was a locus of statistical innovation in the first half of the 19th century. Today, the social sciences still place an enormous emphasis on statistical numeracy.

When he was in graduate school, Hamilton discovered that he could apply his knowledge of statistics and his facility with tools such as R and SAS to questions in healthcare and other verticals.

That's all it took. "I went to graduate school with the intention of being a sociologist -- a demographer, to be more specific. I wanted to understand populations, variations in populations, how things change in a population, stuff like that," he says, explaining that after graduation he took a job with a research firm that specializes in advanced analytics. Since then, he's worked exclusively in healthcare.

Most Important Data Science Skills

The two most important assets for the data scientist are an intuitive grasp of statistics and a profound imagination, Hamilton maintains. "To me, data science really is just [the application of] statistics. I would hesitate trying to define it as more than that. At most, I'd say it's statistics on a much larger scale, using a lot more 'found' data," he says.

"For me, a huge part of doing the job is understanding the assumptions that go into these statistical models. You really have to know what's going on, what the math is assuming. If [those assumptions] aren't true, your inferences are garbage. Once you understand that, you also have the ability to try to do things a different way."
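Hamilton's point about assumptions can be illustrated with a small sketch. This is not his method or his data; the provider names and outcome scores below are invented for illustration only. The ordinary standard error of a mean assumes that every observation is independent. When patients are grouped under providers, that assumption may not hold, and a standard error computed from provider-level means (a deliberately simple alternative, assuming equal-sized groups) tells a more cautious story.

```python
import math
from statistics import mean, stdev

# Hypothetical outcome scores for patients, grouped by provider.
# All names and numbers here are made up for illustration.
outcomes_by_provider = {
    "provider_a": [72, 74, 73, 75],
    "provider_b": [55, 57, 56, 54],
    "provider_c": [88, 90, 89, 91],
}


def naive_standard_error(groups):
    """Standard error of the mean, treating every patient as independent."""
    values = [v for scores in groups.values() for v in scores]
    return stdev(values) / math.sqrt(len(values))


def cluster_standard_error(groups):
    """Standard error based on provider-level means.

    Assumes equal-sized groups, so the mean of the provider means
    equals the overall mean.
    """
    provider_means = [mean(scores) for scores in groups.values()]
    return stdev(provider_means) / math.sqrt(len(provider_means))


naive_se = naive_standard_error(outcomes_by_provider)
cluster_se = cluster_standard_error(outcomes_by_provider)

print(f"naive SE (assumes independence): {naive_se:.2f}")
print(f"cluster SE (provider means):     {cluster_se:.2f}")
```

In this made-up data, patients of the same provider resemble one another far more than patients of different providers, so the naive figure is the smaller of the two and overstates how precisely the overall mean is known. Which estimate is appropriate depends on how the data were actually generated, which is exactly the kind of question Hamilton describes.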

New Training and Tools Improving Data Science

Data science today is qualitatively different from its predecessor of a decade ago, Hamilton argues. In the first place, he says, college programs in the social and hard sciences are producing graduates who are, on balance, more statistically numerate than their predecessors.

In the second place, the tools are much better. He specifically cites the emergence of R, an open source programming environment for statistics, which has made advanced statistical technology (as well as free, user-submitted code) available to a much larger pool of potential users.

"The free-ness of R naturally grows a strong community around it. R is widely available [and has] an enormous group of users who are collaborative in sharing code with one another. Because it is so open and because it is a more flexible programming language, you're able to get things faster and do things faster with R," he comments, noting that the R community is able to tackle emergent problems long before SAS and other vendors are able to address them in their commercial products.

Advice for Industries Getting Started with Data Science

Because Hamilton has worked in an industry (healthcare) that's been in the vanguard of data science, he feels that he's usually had the resources and support he needs to do his job.

He has a suggestion for companies in other verticals that are interested in getting started in data science, however. "I think patience is probably an important virtue for executives who are relying on the work of data scientists -- especially if they're trying to really push beyond what they've already been doing with more straightforward business analytics.

"There is the increased chance of failure [that stems from] trying to look at things in new ways or answer questions in new ways that haven't really been addressed," he says. "If they aren't patient, executives may not capture the value of the data scientist ... because there's uncertainty and the chance of failure in trying to answer these hard questions."

Creativity and Challenge Are the Benefits

Work in data science is hard, Hamilton says, but rewarding. In fact, it's precisely the challenge of what he does that motivates him to go to work every morning. "That push for innovation is really important for keeping me going back to my desk every day," he observes.

"One of the most rewarding aspects of my job is the analytic creativity. Health insurance claims data hasn't changed much in the last 10 years. Now we're bringing in medical records and other kinds of data, too. [We're] merging in this other data to enrich these [health insurance claims]. The creativity is in trying to figure out new ways to look at what in a sense is a very old problem."

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at evets@alwaysbedisrupting.com.

