TDWI Upside - Where Data Means Business

Big Data Engineer: The Elusive Job Description

Big data engineer will be among the hottest jobs in 2016, with starting salary growth of almost 9 percent. Wait, what's a big data engineer?

IT-staffing specialist Robert Half Technology recently highlighted several hot jobs it projects will enjoy rising salary and compensation in 2016.

Not surprisingly, data scientist was near the top of the list, at number three. Big data engineer came in at number two, right behind wireless network engineer.

Wait, you say, what's a big data engineer?

One thing we know -- courtesy of Robert Half Technology -- is that starting salaries for big data engineers span a wide range: from almost $130,000 at the low end to nearly $184,000 at the high end. According to the most recent (2013) data available from the Internal Revenue Service (IRS), that upper figure puts big data engineers in the top five percent of all earners.

In 2013, approximately seven million U.S. citizens earned $184,000 or more per year, according to the IRS. In all likelihood, big data engineers made up only a tiny proportion of this group.

This doesn't tell us what a big data engineer is or does.

Neither does the usually reliable Wikipedia. It contains no entry for "big data engineer," although several articles do make reference to the person or role of big data engineer. Tellingly, Wikipedia's big data entry doesn't contain any such reference.

Urban Dictionary has a "big data" entry, but not one for "big data engineer." Ditto for Gartner Inc.'s IT Glossary, which helpfully defines "big data," but has nothing to say about "big data engineer."

Reddit, a social networking site that has something to say about almost everything, has dozens of posts related to big data engineers -- many of them tangential, some of them job listings. What Reddit is missing, however, is a description of what a big data engineer does.

Speaking of jobs, Stack Overflow has only five explicit listings for a "big data engineer." There is more demand on Indeed, which has 372 job listings. LinkedIn has 236. These include "senior big data engineer," "big data environment engineer," "big data analytics engineer," "Hadoop engineer," "lead big data engineer," and, of course, "data scientist."

Let's put a pin in the connection to data science, at least for now. A Dutch outfit called DataFloq, which bills itself as "the one-stop source for big data," has a comprehensive description of what a big data engineer is -- and what that person does.

The big data engineer, writes DataFloq founder Mark van Rijmenam, "builds what the big data solutions architect has designed." Okay, but what does that mean in practice? "Big data engineers develop, maintain, test, and evaluate big data solutions within organizations," van Rijmenam continues. "Most of the time they are also involved in the design of big data solutions, because of the experience they have with Hadoop[-]based technologies such as MapReduce, Hive, MongoDB or Cassandra."

According to van Rijmenam's explanation, a big data engineer is at least as skilled as a data scientist.

Not only is the role responsible for the construction of "large-scale data[-]processing systems," a big data engineer must also have considerable expertise in data warehousing and NoSQL technologies.

On top of this, the role requires coding expertise, enterprise architecture expertise, and data science knowledge. "A big data engineer should have sufficient experience in software engineering before the move can be made to the field of big data. Experience with object-oriented design, coding, and testing patterns as well as experience in engineering ... software platforms and large-scale data infrastructures should be present," he writes.

"Big data engineers should also have the capability to architect highly scalable distributed systems, using different open source tools," van Rijmenam concludes. "He or she should understand how algorithms work and have experience building high-performance algorithms."

On these terms, Robert Half's projection of a $130,000 starting salary may seem a little low.

As the job listings on Stack Overflow, Indeed, and LinkedIn demonstrate, there is such a thing as a "big data engineer." This makes sense, because "data engineer" is both an established job category and (in the form of either data engineering or information engineering) an established practice area.

Based on both the starting salary range specified by Robert Half Technology and the job description itself, however, the big data engineer might be an even rarer animal than the data scientist.

Remember, data scientists are sometimes described as unicorns: fabulous creatures that don't actually exist. Think of the big data engineer as that most fantastical of creatures: a green unicorn.

At a minimum, the big data engineer must be able to build and implement a combined data management and data processing infrastructure. The person in the role must also possess an analytical skill set that overlaps, to some degree, with that of the data scientist. The engineer will almost certainly need coding expertise of some kind, too -- usually in general-purpose or specialized high-level programming languages, such as Python, R, Scala, or SQL.

In other words, big data engineers are the products of decades of experience, training, and private study.

Maybe they're like data engineers, whose skill set overlaps with that of the data scientist, but who also have strong backgrounds in engineering. By virtue of this background, data engineers are really good at identifying, breaking down, and solving data-oriented problems. Like all engineers, they intuitively understand that almost all solutions entail costs of some kind.

Maybe big data engineers, like data engineers, are neither deeply steeped in enterprise architecture nor polyglot practitioners of object-oriented programming. Knowledge of Java is almost certainly a bonus, though, because most big data technologies are written in Java and expect to consume Java code.

Mark Madsen, a research analyst with information management consultancy Third Nature Inc. and a frequent presenter at big data conferences such as Strata and Strata + Hadoop World, captured this best: "If anything, a 'big data engineer' would need to know Java because all of the tooling is written in Java, and it exposes -- or leaks -- Java in higher-level abstractions."

Like the data engineer, maybe the big data engineer's primary passion is transforming, wrangling, and manipulating -- engineering -- data.

On the other hand, maybe big data engineering is primarily a function of proficiency in one or more big data technology stacks. Maybe it's the kind of thing that requires an imprimatur of some kind -- a certification, even. Maybe someday soon we'll see credits such as "Cloudera Certified Big Data Engineer" or "Hortonworks Big Data Engineering Expert." It's probably just a matter of time.

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at evets@alwaysbedisrupting.com.

