TDWI Checklist Report | Self-Service Big Data Exploration Best Practices
December 22, 2014
Don't just manage big data and other new datasets; leverage
all the data. Capturing and storing big data is just the first step.
Ultimately, you want to get full organizational value, and that
starts with solid capabilities for the following:
- Exploring big data—to evaluate big data as an asset and
determine what its content offers you
- Discovering new facts—to learn about your organization and its
customers, partners, products, and processes
- Analyzing new correlations—to reveal root causes of business
problems such as churn, definitions of new market opportunities,
quantifications of evolving customer segments, and more
complete and granular views of customers
Explore data to discover new facts, entities, and correlations.
The progression from exploration to discovery to analysis is
enabled by new, interactive best practices, as seen in tools
for data exploration, data discovery, ad hoc queries, data
visualization, and SQL-based analytics. As we'll see, working with
big data is challenging because of its unorthodox and evolving
structures, plus the many new data sources. Yet, new tools are
enabling users to overcome the challenges and wring value from
new big data.
Consider Hadoop as a preferred platform for big data. In
complex data environments, Hadoop probably won't be your only
data platform. Furthermore, it probably won't replace older data
warehouses, columnar databases, or appliances. Still, Hadoop has
proved its abilities by scaling to hundreds of terabytes of multistructured
data and the new forms of analytics processing that
go with them—all at a fraction of the cost of older platforms.
TDWI feels confident that Hadoop will earn a place in modern IT
infrastructures and data warehouse environments.
Data exploration is a special link in the chain. If data
exploration is hamstrung because it's limited to certain
structures, sources, samples, or platforms, then the subsequent
discoveries and analytic insights are likewise hamstrung.
Furthermore, SQL continues to be a preferred query language, yet
it needs to evolve to support big data platforms and the kinds of
data exploration and analytics users want to perform. There's also
a need for data exploration technologies that discover schema and
develop them on the fly to enable SQL to work with the evolving or
non-existent schema of modern data types.
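
To make the idea of discovering schema on the fly concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular product or of Hadoop itself: the sample records, the table name events, and the helper functions sql_type, discover_schema, and load are all hypothetical, and an in-memory SQLite database stands in for whatever SQL engine is actually in use. The sketch scans records that arrive without a declared schema, infers a column for every field it encounters, and only then exposes the data to ordinary SQL.

```python
import json
import sqlite3

# Hypothetical sample records with no declared schema; fields vary per record.
RAW_RECORDS = [
    '{"customer": "acme", "churned": false, "visits": 12}',
    '{"customer": "globex", "visits": 3, "region": "west"}',
    '{"customer": "initech", "churned": true, "spend": 41.5}',
]


def sql_type(value):
    """Map a Python value to a SQLite column type (booleans become integers)."""
    if isinstance(value, (bool, int)):
        return "INTEGER"
    if isinstance(value, float):
        return "REAL"
    return "TEXT"


def discover_schema(records):
    """Scan every record and collect each field name with an inferred type."""
    schema = {}
    for record in records:
        for name, value in record.items():
            inferred = sql_type(value)
            previous = schema.get(name)
            if previous is None:
                schema[name] = inferred
            elif previous != inferred:
                schema[name] = "TEXT"  # conflicting types: fall back to text
    return schema


def load(records):
    """Create a table from the discovered schema and insert all records."""
    schema = discover_schema(records)
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f'"{name}" {ctype}' for name, ctype in schema.items())
    conn.execute(f"CREATE TABLE events ({columns})")
    names = list(schema)
    placeholders = ", ".join("?" for _ in names)
    conn.executemany(
        f"INSERT INTO events VALUES ({placeholders})",
        # Fields missing from a record are stored as NULL.
        [tuple(record.get(name) for name in names) for record in records],
    )
    return conn, schema


records = [json.loads(line) for line in RAW_RECORDS]
conn, schema = load(records)

# Once the schema exists, plain SQL works against the formerly schema-less data.
rows = conn.execute(
    "SELECT customer, visits FROM events "
    "WHERE visits IS NOT NULL ORDER BY visits DESC"
).fetchall()
print(schema)
print(rows)
```

The sketch reads all records before creating the table, which keeps it short; a tool working at big data scale would more likely infer the schema from a sample or revise it incrementally as new fields appear.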
This report defines best practices for broad and unlimited data
exploration, with an emphasis on the roles of SQL and Hadoop.