
TDWI Checklist Report | Self-Service Big Data Exploration Best Practices

December 22, 2014

Don't just manage big data and other new datasets; leverage all the data. Capturing and storing big data is just the first step. Ultimately, you want to get full organizational value, and that starts with solid capabilities for the following:

  • Exploring big data—to evaluate big data as an asset and determine what its content offers you
  • Discovering new facts—to learn about your organization and its customers, partners, products, and processes
  • Analyzing new correlations—to reveal root causes of business problems such as churn, definitions of new market opportunities, quantifications of evolving customer segments, and more complete and granular views of customers

Explore data to discover new facts, entities, and correlations. The progression from exploration to discovery to analysis is enabled by new, interactive best practices, as seen in tools for data exploration, data discovery, ad hoc queries, data visualization, and SQL-based analytics. As we'll see, working with big data is challenging because of its unorthodox and evolving structures, plus the many new data sources. Yet, new tools are enabling users to overcome the challenges and wring value from new big data.

Consider Hadoop as a preferred platform for big data. In complex data environments, Hadoop probably won't be your only data platform. Furthermore, it probably won't replace older data warehouses, columnar databases, or appliances. Still, Hadoop has proved its abilities by scaling to hundreds of terabytes of multistructured data and the new forms of analytics processing that go with them, all at a fraction of the cost of older platforms. TDWI feels confident that Hadoop will earn a place in modern IT infrastructures and data warehouse environments.

Data exploration is a special link in the chain. If data exploration is hamstrung (because it's limited to certain structures, sources, samples, or platforms), then the subsequent discoveries and analytic insights are likewise hamstrung. Furthermore, SQL continues to be a preferred query language, yet it needs to evolve to support big data platforms and the kinds of data exploration and analytics users want to perform. There's also a need for data exploration technologies that discover schemas and develop them on the fly, so that SQL can work with the evolving or non-existent schemas of modern data types.
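
To make the idea of on-the-fly schema discovery concrete, here is a minimal Python sketch. It is an illustration only, not a method described in the report: the sample records, the `discover_schema` and `load` helpers, and the use of an in-memory SQLite database as a stand-in for a SQL-on-Hadoop engine are all assumptions. It infers column names and types from semi-structured JSON records whose fields vary, then runs an ordinary SQL query over the result.

```python
import json
import sqlite3

# Hypothetical sample records with an evolving structure (not from the report).
RAW_LINES = [
    '{"user": "a", "clicks": 3}',
    '{"user": "b", "clicks": 5, "region": "EU"}',
    '{"user": "c", "region": "US", "score": 1.5}',
]

# Map Python value types to SQL column types; anything else becomes TEXT.
SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}


def discover_schema(records):
    """Return {field: sql_type} as the union of fields seen across records."""
    schema = {}
    for rec in records:
        for key, value in rec.items():
            sql_type = SQL_TYPES.get(type(value), "TEXT")
            # If a field shows conflicting types, fall back to TEXT.
            if schema.get(key, sql_type) != sql_type:
                sql_type = "TEXT"
            schema[key] = sql_type
    return schema


def load(records, schema, table="events"):
    """Create a table from the discovered schema and insert the records."""
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f'"{name}" {typ}' for name, typ in schema.items())
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    names = list(schema)
    placeholders = ", ".join("?" for _ in names)
    # Fields missing from a record are stored as NULL.
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(rec.get(n) for n in names) for rec in records],
    )
    return conn


records = [json.loads(line) for line in RAW_LINES]
schema = discover_schema(records)
conn = load(records, schema)

# Plain SQL now works against data that arrived with no declared schema.
rows = conn.execute(
    "SELECT region, COUNT(*) FROM events "
    "WHERE region IS NOT NULL GROUP BY region ORDER BY region"
).fetchall()
print(schema)
print(rows)
```

The sketch scans every record before creating the table. A production tool would more likely sample the data or revise the schema incrementally as new fields appear, which is the "evolving schema" case the paragraph above refers to.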

This report defines broad and unlimited data exploration best practices, with an emphasis on the roles of SQL and Hadoop.

