Let the People Who Know the Data Best Do the Wrangling
The next leap forward in improving information agility is about rethinking who does the preparation work. Self-service data prep increases throughput and allows you to leverage the collective wisdom of the organization.
- By Adam Wilson
- September 29, 2016
Roughly 25 years ago, the ETL market was created to automate much of the tedious coding required to integrate, standardize, and cleanse data before it was entered into a data warehouse. The idea was to give developers a layer of abstraction that would speed things up, mask complexity, and improve productivity. ETL promised to minimize the data janitorial work and improve analytics delivery for the business.
Now, more than two decades later, the results are decidedly mixed. Although the productivity gains versus writing code by hand are undeniable, organizations increasingly look at ETL as the bottleneck in their analytics efforts -- much the same way they looked at code 25 years ago.
Despite the best efforts and intentions of vendors, it’s widely acknowledged that data preparation still accounts for up to 80 percent of the effort in any analytics project. The cost and complexity of ETL projects are exploding at exactly the time when an ever-increasing number of knowledge workers are demanding greater access to clean, refined data for analysis.
These analysts understand intrinsically that their enterprise competes using its information; they believe access to data is their inalienable right. They are increasingly frustrated at being locked out of data sources, forced to wait in line to get data cleaned, passing specs back and forth, and iterating endlessly before they can interrogate the data or run the algorithms that will improve their business.
Sadly, analysts often wait months to get their eyes on the data. By the time they do, their questions have changed. They need a self-service way to go from raw to refined data in clicks, not in months.
Putting Users First
Given this situation, it’s no surprise the next leap forward in improving information agility is not just about better automation but rather about rethinking who does this work.
You have to start with the user in mind. Everything flows from there. It’s time to ask why people who know the data best can’t do the preparation. Why aren’t the users with the business context in their heads in a position to take care of the data wrangling? Trying to meet the needs of an exploding number of analysts and data scientists through IT alone, at a time when IT budgets are flat or shrinking, is not efficient.
IT organizations simply can’t scale to meet the data provisioning needs of the business. Enterprises need to shift the burden of the work to end users. It’s the only way to keep up and the only way to stay competitive.
Here’s the secret: you shouldn’t covet this work anyway. Remember, it’s janitorial work -- cleansing, structuring, distilling, enriching, validating, etc. You’re going to give this work to those doing the analysis and they are going to thank you for it. This shift will result in faster cycle time and better insights because the people preparing the data actually know how it’s being used to drive decisions.
Democratizing Data Means Improved ROI
Democratizing data preparation increases throughput and allows you to leverage the collective wisdom of the broader organization to achieve better outcomes faster. Together, these factors can have massive business impact.
If the ROI on your data is directly proportional to the number of people using it, self-service data prep allows IT to become the data hero, streamlining the data supply chain and unleashing more data on the organization than ever before. In turn, shifting this work to the information consumers allows IT organizations to focus increasingly scarce resources on data acquisition and broader governance issues such as reuse, standardization, security, and compliance.
Adam Wilson is CEO of Trifacta. Adam spent 18 years in leadership roles focused on data integration and analytics. Under his leadership, Trifacta has become a global leader in data wrangling. Prior to Trifacta, Adam worked at Informatica for 13 years as a general manager for the ILM business and as SVP of product management and marketing for Informatica’s flagship data integration products. You can reach the author at firstname.lastname@example.org.