TDWI Articles

3 Best Practices for Implementing Self-Service Data Preparation

Selecting the right data preparation solution can seem like an overwhelming task, but it doesn't have to be if you keep these three best practices in mind.

Business analysts and departmental information workers are desperately seeking to realize the promise of data-driven decisions and new business insights. However, the challenges that come with self-service analytics -- limited data access, poor data quality, and time-consuming data preparation tasks -- continue to hinder progress.

For most analysts and operations employees, data access is limited to personal data sources, historical reports, or data painstakingly controlled by IT and BI gatekeepers. Requests for access to data warehouses, databases, and BI reports can take days or even weeks to deliver -- and the information often turns out to be the same data requested by other individuals performing similar analysis. Teams and individuals are limited to sharing what little data they have via personal Excel spreadsheets, a practice that raises compliance concerns and reduces trust in analytics processes.

The good news is that an effective data preparation platform and strategy can address these and other challenges. Following are three best practices to keep top of mind when determining the best self-service data preparation solution and strategy for your organization.

Best Practice #1: Evolve Beyond IT and Spreadsheets

Every organization needs to maintain control over data access and governance when it comes to analytics, but relying on IT teams to provide needed data sets for analysis is a failed strategy. With the growth of self-service analytics, IT has struggled to respond to ongoing business requests in a timely fashion, essentially inhibiting the progress of an analytics strategy.

In fact, the TDWI Best Practices Report: Improving Data Preparation for Business Analytics, which examined user experiences with data preparation, notes: "The largest percentage of research participants said it takes IT two to six days to respond to data preparation requests; the second-largest percentage said it takes one to two weeks" (p. 22).

At the same time, analysts far too often rely on spreadsheets for cleaning, blending, and visualizing data. Even the most Excel-savvy analyst needs to evolve beyond spreadsheets as a tool for preparing data. Spreadsheets offer no control, lineage, or consistency and are far too often riddled with errors.

IT dependencies and broad use of spreadsheets are two of the leading factors causing data preparation to be so time-consuming and inefficient. The same TDWI Best Practices Report states: "Research finds that many participants have to devote the majority of their time to preparing data instead of to analysis and data interaction" (p. 21).

As part of an effective self-service data preparation strategy, it is critical to leverage a solution designed to enable business users to be self-sufficient while maintaining data stewardship. Data preparation should enable users to quickly and easily access, manipulate, enrich, and combine disparate data from virtually any source and prepare it for analysis in a fraction of the time it takes using spreadsheets and other manually intensive measures.

Although self-service data preparation equips business users and data analysts with the speed and agility they require to access the right data, it doesn't mean data governance and security fall by the wayside. A data preparation platform also satisfies IT's need for security and compliance, providing governance capabilities such as data masking, data retention, data lineage, and role-based permissions that are necessary to uphold corporate and regulatory compliance and enhance trust in data, analytics processes, and results. Fortunately, using this technology, the needs of business users and those of IT are no longer mutually exclusive.
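To make two of those governance capabilities concrete, here is a minimal Python sketch of data masking combined with role-based permissions. It is an illustration only, not how any particular data preparation product works: the role names, the `PII_FIELDS` set, and the `mask` and `view_for` helpers are all hypothetical.

```python
import hashlib

# Hypothetical role-to-permission mapping; real platforms define their own roles.
ROLE_CAN_SEE_PII = {"steward": True, "analyst": False}

# Assumed set of sensitive column names for this illustration.
PII_FIELDS = {"email", "ssn"}


def mask(value: str) -> str:
    """Replace a sensitive value with a short, stable SHA-256 token.

    An unsalted hash is used only to keep the sketch small; a production
    system would normally use salted or keyed tokenization instead.
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return "masked_" + digest[:8]


def view_for(role: str, record: dict) -> dict:
    """Return the record as the given role is allowed to see it.

    Unknown roles are treated as having no access to sensitive fields.
    """
    if ROLE_CAN_SEE_PII.get(role, False):
        return dict(record)
    return {
        key: mask(str(value)) if key in PII_FIELDS else value
        for key, value in record.items()
    }


if __name__ == "__main__":
    row = {"name": "Ana", "email": "ana@example.com", "region": "EMEA"}
    print(view_for("steward", row))
    print(view_for("analyst", row))
```

Because the mask is deterministic, an analyst can still join or count on the masked column without seeing the underlying value, which is one way self-service access and compliance can coexist.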

Best Practice #2: Don't Settle for What You Already Have

Data types available for analytics are evolving just as fast as the analytics tools themselves, with big data, streaming data, and machine data adding to the ongoing challenge of analyzing enterprise application information, data warehouses/data lakes, BI data, log data, Web data, and historical data locked in reports and documents.

Gaining access to "corporate" data sets may take a while, but typically these data sets are readily available to a broad set of employees and may already be providing the basis for some analysis. However, it's the data that you don't currently have in your analysis that provides the most important opportunity for impact.

For example, reports from enterprise applications are often a gold mine of rich, multistructured data confined to text files or PDFs. Log data, machine data, and streaming data are also valuable sources that most companies aren't taking advantage of in their analytics. Third-party data available right from Web pages can provide robust data about markets and industries that unveils context and insights needed for data-driven analysis.

The Best Practices Report explains that "difficulty accessing and integrating data across system or application silos is the most significant barrier to improving data preparation" (p. 19). The best data preparation solutions expedite data access and integration across a wide set of enterprise applications and other data types, providing data users with all of the right information for analysis -- resulting in a holistic view of the business and more informed decision making.

Best Practice #3: Create a Central Data Community

Until recently, self-service data preparation has been a fragmented task, carried out separately by individuals who have embarked on a self-service analytics initiative. Often, these are business users and departmental information gatherers who work independently in Excel spreadsheets or individual data preparation tools.

The proliferation of social features across enterprise business applications has introduced the idea of getting instant access to data and being able to easily share it with key stakeholders and coworkers -- this improves collaboration and makes individuals and organizations more informed, agile, and productive.

Today, enterprises can foster and enable a culture of self-service analytics by implementing a data preparation platform that creates a centralized data community, providing more users with access to more data and thus more insights.

A data socialization platform combines self-service data preparation, data cataloging, data stewardship, automation, and governance features with key attributes common to social media platforms, such as user ratings, recommendations, discussions, comments, and popularity. This powerful combination enables groups of data scientists, business analysts, and even novice business users across a company to search for, share, and access raw or prepared data to achieve true enterprise collaboration and agility while building a data community.

TDWI notes: "Shared resources such as data catalogs, glossaries, and metadata repositories can help users find quality sources and gain better knowledge about how data in multiple sources may be related" (Best Practices Report, p. 17). With a shared data preparation platform that offers data socialization, users can gain seamless visibility into the work of others; share raw and curated data sets for reuse and consistency; learn from their colleagues; eliminate redundant tasks; be more productive; and stay better connected overall as they source, cleanse, and prepare data for analytics and operational processes.
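The idea of a shared catalog with social signals can be sketched in a few lines of Python. This is a toy, in-memory model under stated assumptions, not the design of any real platform: the `DatasetEntry` and `Catalog` classes, their fields, and the 1-to-5 star scale are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class DatasetEntry:
    """One shared data set, raw or curated, with its user ratings."""
    name: str
    description: str
    owner: str
    ratings: list = field(default_factory=list)

    def average_rating(self) -> float:
        # Unrated data sets sort last by treating them as 0.0.
        return mean(self.ratings) if self.ratings else 0.0


class Catalog:
    """In-memory stand-in for a shared data catalog."""

    def __init__(self):
        self._entries = {}

    def publish(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def rate(self, name: str, stars: int) -> None:
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self._entries[name].ratings.append(stars)

    def search(self, keyword: str) -> list:
        """Return matching entries, highest average rating first."""
        kw = keyword.lower()
        hits = [
            entry for entry in self._entries.values()
            if kw in entry.name.lower() or kw in entry.description.lower()
        ]
        return sorted(hits, key=lambda e: e.average_rating(), reverse=True)


if __name__ == "__main__":
    catalog = Catalog()
    catalog.publish(DatasetEntry("sales_raw", "Raw sales export", "ana"))
    catalog.publish(DatasetEntry("sales_clean", "Curated sales data", "raj"))
    catalog.rate("sales_clean", 5)
    catalog.rate("sales_raw", 2)
    for entry in catalog.search("sales"):
        print(entry.name, entry.average_rating())
```

Ranking search results by colleagues' ratings steers users toward curated data sets that others already trust, which is how reuse and consistency are encouraged without a central gatekeeper.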

Many Options, One Right Answer

There are now dozens of vendors offering standalone data preparation tools, and data preparation features are increasingly being integrated into BI/data science platforms and added to data visualization and data quality offerings. There are solutions available for use onsite or in the cloud. Some have rich scripting and data mining features; others provide automation and modern user experiences intended for nontechnical/business employees.

With a variety of options to choose from, selecting the right data preparation solution can seem like an overwhelming task, but it doesn't have to be. By keeping these three best practices in mind, you'll be well on your way to implementing a self-service data preparation solution that advances analytics, streamlines operational processes, improves decision making, and delivers greater business value.
