DataOps: An Interview with Tamr CEO Andy Palmer
Why DataOps is gaining traction, how it's evolving, and why it's ever-more important in large enterprises -- an interview with the man who coined the term.
- By James E. Powell
- August 6, 2018
Andy Palmer is the founder and CEO of Tamr, which offers a patented software platform for enterprise-scale data unification that combines machine learning and human expertise. He's generally credited with coining the term DataOps three years ago.
In our conversation, we explore how the term came about, why its popularity is growing, how to get started on a DataOps project, and how new technologies are having an impact on DataOps.
TDWI: You introduced the term DataOps to the marketplace in 2015. Tell us how that came about.
Andy Palmer: My career has been a mashup of experiences that has included data-oriented software start-ups such as Trilogy, pcOrder, Vertica, and Tamr as well as technology leadership roles at companies such as Novartis and Infinity Pharmaceuticals where I used third-party technology to manage and use data. Sitting on both sides of the "vendor/customer" table has given me a unique window into how painful it can be for companies to get the right data into the right hands at the right time to make better, data-driven decisions -- what my friend Tom Davenport calls "competing on analytics."
As agile methods birthed what we now call DevOps, I saw first-hand how DevOps was enabling Internet companies to dramatically increase their feature velocity without compromising on reliability, stability, and quality. I began to see how many of the same principles behind the DevOps movement could be applied to data management and pipelining to enable more agile, accurate, and comprehensive data to power next-gen analytics, resulting in a step change in the analytics velocity that most companies are seeking to accelerate their digital transformation. That's what prompted me to write a blog post in 2015 where I coined the term DataOps.
Why is DataOps gaining traction in the marketplace now?
I think we're at a unique point in time where three big trends have come together to create the opportunity to implement DataOps.
First, there's huge pressure on large enterprises in particular to harness the potential of the data they collect and transform it into a competitive advantage.
Second, these big companies are undergoing a generational change in how they manage their data that includes retooling to cope with the realities of big data as well as sunsetting decades-old technology from traditional vendors (Teradata, Oracle, IBM, etc.).
Third, enterprise migration to the cloud is accelerating rapidly. This is creating a unique opportunity for the behavioral change inside enterprise IT that is necessary for something like DataOps to be widely embraced.
How is DataOps different from the generations of data curation that came before it?
Technology leaders have been trying to get their heads around the data curation problem for years. We've tried rationalizing to reduce the number of systems an enterprise manages, standardizing the data within those systems, and using aggregation to get the same types of data in the same physical data store. The latest attempt has been top-down modeling -- aka master data management.
Each of these approaches can succeed for a given use case at a given point in time. Even taken together, though, they are necessary but not sufficient to match the scale and complexity of the "data debt" that large enterprises have acquired in the past 40+ years of business process automation.
On the process side, DataOps starts by acknowledging the reality that data will change and data sources will proliferate, and it adapts data pipelines to that reality. The opportunities and the challenges presented by DataOps are first and foremost human. How we as people in large enterprises value, manage, and curate our data is at the core of the DataOps opportunity.
Has your vision for DataOps changed since 2015?
I think that DataOps is becoming a reality much faster than I anticipated. On the technical side, aggressive chief data officers are being enabled to deal with their data debt crisis, while on the end-user side, the democratization of analytics has awakened enterprises to the power of their data. On the vendor side, I think technology providers are recognizing that the heterogeneous reality of their customers' environments implies that embracing open, best-of-breed strategies is required to create successful outcomes as their customers shape their DataOps ecosystems.
What is the importance of DataOps to a large enterprise?
Large enterprises are experiencing a foundational shift in how they value their data, structure their data engineering teams, and empower their businesspeople with data on the front lines. Organizations capture and store more data than ever before, but the data that is most meaningful to a person in a large company is usually very small and their expectations of quality are (and should be) very high.
The prospect of competing on analytics is compelling but rife with challenges. For example, using all that data at scale is hampered by the "data debt" that accumulates over time as enterprises struggle to manage the extreme volume and variety of their data. Companies need to start managing their data as an asset, much like they would their own money, if they hope to compete on analytics. DataOps can help businesses do just that and achieve the analytics velocity necessary to create a competitive advantage.
How have newer technologies (cloud, AI, big data) impacted DataOps?
New technologies are both the underlying causes and the enablers of DataOps. The big data tsunami is one of the biggest factors driving the adoption of DataOps. Companies that can harness the potential of all this newly available data can use it as a source of competitive advantage to disrupt industries and defeat much larger competitors. The cloud is simultaneously a cause and an enabler of DataOps. The rapid adoption of cloud computing is causing organizations to rethink their IT strategies from top to bottom, and that massive change of mindset creates fertile ground for rethinking traditional data management strategies.
At the same time, cloud technologies provide a level of operational flexibility that is essential for delivering on one of the core promises of DataOps -- agility.
Finally, the enthusiasm for AI initiatives is just starting to shine a bright light on the underlying data problems that large enterprises struggle with. Without good quality data, most AI projects will struggle or even fail spectacularly. With Tamr, our goal is a focused application of AI to fix underlying data problems and enable DataOps initiatives to succeed.
What are the key components to implementing DataOps? How should companies get started?
DataOps utilizes agile development, DevOps, and statistical process controls to produce a rapid-response, flexible, and robust data analytics capability. Getting started is as simple as identifying a single, hard-to-answer question: "How many customers do we have?" or "How much do we pay each of our suppliers?" By pursuing answers not as a one-and-done project, but as analytics that should be readily available to many users at any point in time, the principles of DataOps necessarily come into focus.
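One way to picture the statistical process control piece of that answer is a check that flags pipeline runs whose record counts drift outside control limits. The Python sketch below is an illustration only, not a method described in the interview; the function names, the sample counts, and the three-sigma threshold are all assumptions.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, upper) control limits from baseline observations.

    Assumption: limits are the sample mean plus or minus `sigmas`
    sample standard deviations, a common control-chart convention.
    """
    center = mean(baseline)
    spread = stdev(baseline)
    return center - sigmas * spread, center + sigmas * spread


def out_of_control(baseline, new_values, sigmas=3.0):
    """Return the new observations that fall outside the control limits."""
    lower, upper = control_limits(baseline, sigmas)
    return [v for v in new_values if v < lower or v > upper]


# Hypothetical daily record counts from a customer-data pipeline.
baseline_counts = [1000, 1010, 990, 1005, 995, 1002, 998]
new_counts = [1003, 700, 1001]

flagged = out_of_control(baseline_counts, new_counts)
print(flagged)
```

With these made-up numbers, only the run that delivered 700 records is flagged. In practice a team would likely track several such metrics (row counts, null rates, duplicate rates) and tune the threshold to its own data.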
What are the main challenges companies face when trying to implement DataOps?
DataOps represents a significant change to the status quo. Any time you try to change behaviors, people become the biggest challenge. Technology is not the limiting factor. As with any change initiative, the organizations that succeed will be those that:
- Paint a clear, compelling vision about why change is essential
- Develop a road map that shows how the vision can be achieved
- Form teams with the right mix of skills and attitudes to execute
- Deliver early, quick wins to build momentum
- Communicate, communicate, communicate.
What are examples of successful DataOps implementations?
GSK, one of the world's largest pharmaceutical companies, has an impressive DataOps implementation. CDO Mark Ramsey has embraced the technologies from 11 different vendors including Cloudera, Tamr, Zoomdata, Trifacta, Kinetica, and others with the goal of massively accelerating drug development cycles. The analytics outcomes derived from a DataOps methodology eliminate age-old problems and delays in the drug release process, and GSK can make better use of drug patents; provide better, more reliable treatment faster; and increase profitability through better handling of data.
What trends do you predict for DataOps over the next 3-5 years?
I think the rate of adoption for DataOps is going to accelerate quickly. We're already seeing this with the rapid embrace of self-service data preparation tools such as Trifacta, Paxata, and Alteryx. These tools are great for empowering business analysts to quickly create data mashups without being totally reliant on overtaxed IT departments.
However, the next wave of innovation will pivot around enterprise-scale agile data pipelines that deliver accurate, up-to-date, unified views of a company's most important data entities, such as customers, suppliers, products, or parts. We'll be surprised at how wide the gulf will be between enterprises that tackle their data debt head on and those that think it's a problem that can be deferred. I also think that the DataOps vendor ecosystem will be composed of a healthy mix of best-of-breed, proprietary systems that focus on a few tasks they do extremely well, and supporting, interoperable open source technologies.