Making Peace with the Venerable Old Spreadsheet
For self-service BI, spreadsheets are extremely useful, but there are much better, more user-friendly alternatives to Excel when it comes to self-service data prep.
- By Steve Swoyer
- August 5, 2016
A new report from TDWI Research highlights a stubborn, inconvenient truth: business people don't just continue to use and depend on spreadsheets; in many cases, they actually prefer them.
The spreadsheet is the original self-service analytics tool. As one participant in a recent TDWI discussion on data quality put it, "You'll never, ever get rid of Excel."
In an era of rich data visualization technologies, data exploration and discovery tools, and mobile-first, on-the-go analytics, Excel and other spreadsheets can seem old fashioned, but they are still in widespread use.
"Spreadsheets continue to be ubiquitous as an affordable tool for viewing data, doing calculations, creating graphs, and performing other types of data analysis. Small and midsize firms that have little or no IT function and lack enterprise BI and data warehousing are particularly reliant on spreadsheet applications," writes David Stodder, senior director of research for business intelligence with TDWI, in the recent TDWI Best Practices Report: Improving Data Preparation for Business Analytics.
Spreadsheets Common in Data Preparation
The spreadsheet is by far the most ubiquitous ETL tool too. It used to be (and probably still is) that when business people needed to "integrate" data -- i.e., prepare, blend, transform, and (lightly) cleanse data -- they used a spreadsheet.
Not surprisingly, Stodder writes, spreadsheets continue to loom large in data preparation. "Spreadsheets come up often as a commonly used tool for preparing data and as a source of data that must be prepared for integrated views from within other applications," he writes.
Think of the spreadsheet, then, as the original self-service data prep tool. The spreadsheet also figures as a destination for integrated data. "Spreadsheets are often where users do their data preparation, even in organizations where there is a data warehouse. In interview research, we hear that users tire of waiting for new data to become available in the warehouse and do not want to wait out long IT processes," Stodder writes. "Instead they will copy and paste data on their own into spreadsheets and try to cleanse and prepare it there for personal or departmental use."
Hazards of Spreadsheet-Based Data Prep
The problem is that data prep of this kind is, was (and will continue to be) unsystematic: it's performed haphazardly and doesn't comply with established policies. (As Stodder noted, people often prep their own data precisely to get around established organizational or IT policies.)
More important, spreadsheet-driven data prep isn't repeatable. It's typically one-off, so self-serving data preppers tend to reduplicate their own efforts and/or the efforts of others. "Cleansing and preparing data manually to remove duplicate records, fix errors, and investigate out-of-range values tends to be boring and time-consuming for most business users. Many will clean and prepare data for use in their personal spreadsheets in a haphazard fashion while they are doing other activities, rather than follow clear and repeatable processes," Stodder writes.
"Spreadsheets are also common for performing data transformation, but too often conversions are done to fit a specific need and no set rules are followed. Users can get distracted during preparation steps, lose their place, miss data, and make mistakes. They then have to start over, and in some cases the errors are missed and become internalized in downstream analysis."
Sound familiar? It's a nice capsule description of the challenges involved in using spreadsheets for BI reporting and analytics. The errors Stodder describes correlate with a phenomenon known as "spreadmart hell."
Vendors Now Integrating Spreadsheets
The good news is that the vendors which develop self-service analytics and self-service data prep tools understand this. They're building logic and capabilities into their software that are designed to make it easier to extract, prepare, and integrate data from spreadsheet sources. They're attempting to support spreadsheets -- as both sources and targets -- in other ways, too.
At last year's Tableau Customer Conference (TCC), for example, Tableau announced several new spreadsheet-oriented data prep features, including "data interpreter," a facility that automatically strips headers, footers, and other errata from spreadsheet data sets.
Also at TCC, Tableau trumpeted "Union," a feature that automates the process of integrating data from CSV and spreadsheet files. Union can intelligently integrate a large collection of related spreadsheets, according to Tableau officials. It features a dialog-driven UI so users can quickly define parameters. Self-service data prep was a huge theme at TCC. Elsewhere, Tableau also announced support for cross-database joins via (the equivalent of) federated query.
Tableau isn't the only player to focus on the spreadsheet as both source and target. Some self-service data prep players are arguably trying to accommodate (and therefore co-opt) spreadsheet-based data prep. For example, self-service data prep specialist Paxata -- like competitors Alteryx, DataWatch, and Trifacta -- can automatically parse, profile, and transform data from spreadsheet and CSV files, in addition to nested hierarchical sources such as JSON or XML.
One difference is that Paxata exposes an Excel-like interface that's designed to make spreadsheet users feel at home. According to CEO Prakash Nanduri, Paxata focuses on producing data that can be easily consumed by spreadsheet tools, too. He sought to distinguish between using a spreadsheet as a tool for analysis and discovery -- still a very viable use case, he argued -- and using spreadsheets to explore, profile, and transform data. In working with OLTP data, CSV files, or with smaller data volumes, spreadsheet-based data prep is still a workable, if sub-optimal, approach, Nanduri conceded.
In preparing multistructured data -- and doing so at anything approaching big data scale -- it's much less workable, he told TDWI late last year.
"Basically, 80 percent of the people who do data prep in a company are ... non-technical business-analyst types. These are the folks who know Pivot Tables, they know vlookups. More and more they're having to struggle with the larger and larger amounts of data they're dealing with. They can't do it in Excel, and they have to have a tool for self-service data prep," Nanduri said. "The challenge is to drive [the production of] clean, consumable information that can be blended and analyzed."
Don't Ignore Your Excel Specialists!
The upshot is that spreadsheets aren't going anywhere. It's about time, argues TDWI's Stodder, that organizations start treating -- and managing -- spreadsheets as legitimate tools for BI, decision support, and (increasingly) advanced analysis.
In a sense, this means accepting the spreadsheet on the terms described by Paxata's Nanduri: as a useful tool for both self-service analysis and (to a lesser extent) quick-and-dirty data preparation. Nevertheless, organizations should support a more manageable alternative to using the spreadsheet as a de facto tool for data prep; there's no lack of dedicated data prep tools.
"As a best practice, organizations should not ignore spreadsheet data use and preparation as they develop an overall strategy," Stodder concludes. "They should evaluate technologies that could either reduce the need to rely on spreadsheets for data preparation or improve data preparation involving spreadsheets."
TDWI Best Practices Report: Improving Data Preparation for Business Analytics is available free of charge.
Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at firstname.lastname@example.org.