How to Cut Data Preparation Time for Visualization Tools
Is it possible to keep data preparation processes from becoming unmanageable?
- By Ian Macdonald
- February 15, 2019
Data preparation is the foundation of impactful analysis. Data scientists reportedly spend 80 percent of their time on data preparation. (Source: Forbes, March 2016) However, as data volumes explode, organizations are under intense pressure to maintain efficient extract, transform, and load (ETL) processes. Now, more than ever, organizations need to simplify these processes without sacrificing data connectivity or data preparation functionality. They must also prepare for exponential increases in data volumes and complexity. Organizations must adopt advanced analysis techniques or fall behind.
Given this challenge, is it possible to keep data preparation processes from becoming unmanageable?
Historically, organizations with centralized BI implementations relied on IT to prepare and deliver data for analysis. That approach was fine when OLAP was the dominant method for self-service analytics. However, the new data environment has only made the wait longer -- and more expensive -- for organizations using these legacy solutions. IT is simply too busy -- and too far removed from the day-to-day business problems analysts and business users are trying to solve -- to be effective data wranglers.
The rise of self-service analytics solutions granted end users newfound autonomy, flexibility, and speed, but not without real costs. Accuracy suffered as end users began to pull data onto their own individual desktops -- using their own specific applications (often Excel) from their own department-specific sources -- to analyze data and build visualizations.
The consequences were predictable: data silos, limited perspectives, and a creeping distrust of data accuracy. In addition, although the obvious benefit was that those preparing and analyzing the data were now the people who knew the business challenges at hand, their ability to incorporate a growing list of complex data sources began to suffer.
In recent years, standalone, specialized data preparation tools have emerged, filling the void between these diametrically opposed application types. Although powerful enough to handle many data types and sources, they're often expensive and difficult to learn. It's also unlikely end users would ever need all the advanced functionality the tools offer, and it's even less likely they would have the time to learn how to use them.
Today, we can no longer afford to rely exclusively on IT to prepare data for analysis and visualization tasks, and we can't rely exclusively on desktop tools to provide the holistic view of the business we seek. Standalone tools let us integrate all of our broad and complex data sources, but they only add complexity to the mix.
Despite these realities, it is possible to implement data preparation processes that keep pace in today's environment. There are three critical aspects of a modern analytics platform that can reduce the time and complexity of data preparation:
It must balance ease of use with data governance. Data preparation tools should deliver ease of use without sacrificing important data prep functionality, and they must increase data transparency without sacrificing data security. End users demand code-free, visual interfaces that let them connect to, prepare, blend, and join data -- from databases, cloud and on-premises sources, structured and unstructured data, and spreadsheets -- using standard queries, formulas, filters, and joins, all without relying on others to do it for them. However, because of the heightened fear of data breaches, it's vital that IT continue to have visibility into how and what data is being used.
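To make the filter-and-join step concrete, here is a minimal sketch of what a visual prep interface performs under the hood. All data, field names, and sources here are hypothetical, invented for illustration only; a real platform would read from live databases and spreadsheets rather than in-memory lists.

```python
# Hypothetical rows pulled from a departmental spreadsheet.
sales = [
    {"region": "EMEA", "rep_id": 1, "amount": 1200},
    {"region": "AMER", "rep_id": 2, "amount": 800},
    {"region": "EMEA", "rep_id": 3, "amount": 450},
]

# Hypothetical rows pulled from a central database.
reps = [
    {"rep_id": 1, "name": "Alice"},
    {"rep_id": 2, "name": "Bob"},
    {"rep_id": 3, "name": "Carol"},
]

# The "filter" step: keep only EMEA sales.
emea = [row for row in sales if row["region"] == "EMEA"]

# The "join" step: enrich each sale with the rep's name via a key lookup.
by_id = {r["rep_id"]: r["name"] for r in reps}
blended = [{**row, "rep_name": by_id[row["rep_id"]]} for row in emea]

for row in blended:
    print(row["rep_name"], row["amount"])
```

A drag-and-drop interface hides these mechanics behind visual widgets, but the operations it executes -- connect, filter, join, blend -- are exactly the ones sketched above.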
It should be embedded directly in the analytics application. Data preparation must be tightly integrated with other components of the analytics workflow to reduce licensing costs and the learning curve. In some modern analytics and BI applications, powerful data preparation functionality is built directly into the platform. Users can connect, prepare, blend, and join data from all their sources with an intuitive, visually based user interface.
It must practically incorporate AI capabilities. We can strive to integrate AI and machine learning into our businesses, but we must ensure that we have a technology ecosystem in place to support them. By incorporating machine learning during the data preparation stage -- and by including the data preparation capabilities within a complete analytics platform -- we can quickly explore all types of data for analysis, leading to more relevant correlations and insights.
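One simple illustration of a learning-style step applied during preparation rather than after it: flagging outliers in a raw column before the data reaches the visualization layer. The z-score test and the sample values below are assumptions chosen for illustration; a real platform would apply richer models to much larger data.

```python
import statistics

# Hypothetical raw metric values arriving from a source system.
raw_values = [102, 98, 101, 97, 350, 99, 103, 100]

mean = statistics.mean(raw_values)
stdev = statistics.stdev(raw_values)

# Mark any value more than 2 standard deviations from the mean
# so downstream dashboards can highlight or exclude it.
prepared = [
    {"value": v, "outlier": abs(v - mean) > 2 * stdev}
    for v in raw_values
]

outliers = [row["value"] for row in prepared if row["outlier"]]
print(outliers)  # → [350]
```

Because the check runs on raw rows during prep, business users see suspect values flagged before aggregation hides them -- the same reason the article argues for exposing machine learning at this stage.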
Conclusion: It's Possible to Cut Data Preparation Time
Data preparation has become more difficult, not easier. The explosion of data sources has forced organizations to fundamentally change how they prepare data for analysis. Organizations can no longer rely on legacy analytics systems, nor can they rely on self-service tools that offer limited functionality. Although standalone tools offer deeper functionality for data scientists and experts, analysis works best when data preparation, analytics, and BI live together on the same platform.
However, organizations can reduce the time it takes to prepare data if they adopt end-user-friendly data preparation processes; integrate these processes on the same platform with other data analysis, visualization, dashboard, and reporting functionality; and expose machine learning during data preparation so business users can apply algorithms to the raw data instead of aggregated data.
Ian Macdonald is the principal technologist for Pyramid Analytics, a SaaS company based in Amsterdam. Its software provides enterprise-level business intelligence and analytics solutions.