Data Exploration in the Age of Big Data
Learn new facts about your business from a wide range of big data and enterprise data.
- By Philip Russom, Ph.D.
- November 19, 2013
For a variety of reasons, data exploration is an important path to gaining business value from all kinds of data, from traditional enterprise data sources to big data and streaming machine data.
Data is the business. Data keeps a record of organizational activity and performance. If you want to know a business, you must get to know its data.
You have to start somewhere. Poking around in data is how you get a sense of what happened, so you can start building a data set or model that represents a root cause, trend, or some other analytic outcome.
Browsing data can be inspirational. This is how you discover new sources or determine which sources of data are appropriate to a specific report or analysis.
Exploring data is a prerequisite to analyzing data. Analysis makes correlations across data of diverse sources, structures, subjects, and vintages. Finding just the right combination for successful analysis depends on data exploration as a first step.
Data exploration is ascending to a new level of importance because of its indispensable role in most analytic tasks and projects. As more organizations step up their analytic activities, they are -- for the most part -- trying to discover new facts about business entities such as customers, partners, products, financials, and employees. For example, a data analyst may be trying to discover the root cause of the latest customer churn, hidden costs that are eroding the bottom line, compliance and security breaches, fraudulent activity, or an emerging customer segment.
This is sometimes called discovery analytics because of its focus on discovering new facts. In most techniques for discovery analytics -- whether based on ad hoc SQL queries, data visualization, data mining, statistics, or something else -- data exploration is key to enabling discovery, whereas other tool functions and techniques are key to polishing the data sets or models that are the final product of analysis.
Data exploration faces a number of challenges, largely due to the rise of big data, which is not only big but also highly diverse in data structure and often comes from new sources. Given the size, diversity, and newness of today's big data, you need data exploration tools built for exploring a wide range of both new big data and enterprise data. The tools should include these features:
Search technology for exploring diverse data types. Data exploration should be as easy as a Google search. It should parse data of many formats and structures, and it should allow any open-ended question, not just one confined to a predefined data model. Search technology satisfies all of these requirements.
High ease of use for user productivity. This is critical because some users are business people who need to see the data for themselves but through a business-friendly view. Ease of use accelerates technical developers' productivity, too.
Short time to use, short time to business value. A data exploration capability with high ease of use enables a wide range of opportunities to get acquainted with data quickly, yet keep digging deeper over time for new business facts and the opportunities they expose.
Query capabilities in support of data exploration. Technical users, in particular, depend on query capabilities to find just the right data and structure the result set of a query so it is ready for immediate use in analytic applications.
Support for all major data platforms, ranging from relational databases to Hadoop. A modern data exploration tool needs to go where the data lives. Given the expanded range of data types now in play in analytics, it takes multiple platforms to manage diverse data in its original form.
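To make the search-oriented feature above more concrete, here is a minimal Python sketch of keyword search across records of different structures without a predefined data model. It is an illustration only, not a description of any particular product: the function names (flatten, load_records, search) and the sample data are hypothetical, and a real exploration tool would add indexing, ranking, and connectors to platforms such as relational databases and Hadoop.

```python
import csv
import io
import json


def flatten(value, prefix=""):
    """Yield (path, text) pairs for every leaf in nested dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}{key}.")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}{index}.")
    else:
        yield prefix.rstrip("."), str(value)


def load_records(csv_text, json_lines_text):
    """Parse two differently structured sources into (source, record) pairs."""
    records = [("csv", row) for row in csv.DictReader(io.StringIO(csv_text))]
    for line in json_lines_text.splitlines():
        if line.strip():
            records.append(("json", json.loads(line)))
    return records


def search(records, term):
    """Return (source, field path, value) for every leaf containing the term."""
    needle = term.lower()
    hits = []
    for source, record in records:
        for path, text in flatten(record):
            if needle in text.lower():
                hits.append((source, path, text))
    return hits


# Hypothetical sample data, invented for illustration only.
CSV_TEXT = "customer_id,status\n101,active\n102,churned\n"
JSON_TEXT = (
    '{"event": {"type": "cancel", '
    '"note": "Customer churned after price change"}}\n'
)

records = load_records(CSV_TEXT, JSON_TEXT)
hits = search(records, "churned")
for hit in hits:
    print(hit)
```

The point of the sketch is that the question ("where does churn show up?") is asked of the data as it is, whether tabular or nested, and each hit reports where it was found so the user can decide which sources deserve a closer look.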
For more information on data exploration, please register for the upcoming TDWI Webinar: Data Exploration and Analysis in the Age of Big Data, December 5, 2013 at noon ET.