Big Data: Helping to Predict the Unpredictable

Recent advances in analytic technology have provided us with the horsepower necessary to harvest much more big data.

"Big data," the buzzword of the year, is a powerful analytic force, yet its exact meaning is often a topic of debate. Many people describe it in terms of the three Vs: variety (e.g., structured, semi-structured, unstructured), velocity (how quickly it changes; e.g., batch, real-time), and volume (i.e., vast quantities). However, others (especially vendors highlighting the functionality of their products) may extend the V-list to include veracity (i.e., accuracy and timeliness), vulnerability (i.e., data security and access control), and even value (i.e., the potential benefits).

In many cases, these vendors are merely trying to ride the bandwagon by including "big data" in the headline of their press releases; in previous years, many of these same vendors strove to secure their place in the worlds of "customer relationship management" (CRM) or "master data management" (MDM) by including those phrases as well. Speaking of master data management, I consider it a way of providing a single view of a subject such as customer or product, while I view big data as a way of providing a more complete view. The two technologies are definitely complementary (both also complement customer relationship management).

Although it may be an oversimplification, I like to think of big data as analyzing more data than an organization analyzed in the past. The ability to do so allows organizations to gain new insights by facilitating their discovery of relationships and by making more accurate predictions than they could have done just a few years earlier. Among the factors that have made this possible are advances in hardware and software such as multi-core processors, in-memory analytics, and enhancements to database and data mining technology. This has been accompanied by greatly reduced costs for memory, storage, and access to computing resources (e.g., cloud computing). It has also been aided by colleges offering courses in advanced analytic techniques and the rise of the occupation of "data scientist."

In the past it may have been too expensive to store and process huge quantities of data, but today, organizations are finding that the insights gained by processing this data may easily pay for the cost of doing so. Among the many areas enhanced by the analysis of big data are drug interactions, fraud detection and prevention, call center interactions, social media comments and sentiment analysis, identification of market influencers, click stream analysis, homeland security, investment analysis, worker productivity and employee turnover, telecommunications churn, infectious disease control, the formulation of patient-specific drugs, customer segmentation and targeted marketing, and product planning and demand forecasting.

For example, the Mayo Clinic and United Health have teamed up to form the Strategic Health IT Advanced Research Projects (SHARP) program; one of its goals is to share their vast medical and claims data in an effort to provide better and more cost-effective health-care solutions. President Obama recently announced an initiative to map the human brain; wouldn't it be ironic if neural network data mining techniques were applied to big data to map the human neural network!

Many years ago I formulated a series of data laws or axioms. Among these was "data in a warehouse is like clothes in a closet; even if we haven't accessed it in several years, we still tend not to throw it away." At that time I encouraged the purging of stale data; today, I would advise organizations to identify and archive data with any potential for future analysis.

Big data has always been with us, and its growth is certainly accelerating, in part due to social media and machine-generated output (e.g., RFID and website tracking). However, recent advances in analytic technology have provided us with the necessary horsepower to harvest much more of it. Future technological advances will enhance our analysis capabilities and provide more insights into now-hidden cause-and-effect relationships. This should enable us to better predict what was formerly considered to be unpredictable.
