Driving Business Value from Big Data
Interview with Jean-Pierre Dijcks, Senior Principal Product Manager, Oracle
Q: What is "big data"?
A: Today, the term big data draws a lot of attention, but behind the hype there's a simple story. For decades, companies have been making business decisions based on transactional data stored in relational databases. Beyond that critical data, however, is a potential treasure trove of nontraditional, less structured data—Web logs, social media, e-mail, sensors, and photographs—that can be mined for useful information. Decreases in the cost of both storage and computing power have made it feasible to collect this data, which would have been thrown away only a few years ago. As a result, more and more companies are looking to include nontraditional (yet potentially valuable) data with traditional enterprise data in their business intelligence analysis.
“Big data” typically refers to nontraditional data that can be characterized by four parameters: volume, variety, velocity, and value. Compared with traditional enterprise data, big data is generated in much greater quantities (volume), arrives in a wider and constantly changing range of formats (variety), is produced more frequently (velocity), and varies greatly in its economic value (value).
Q: What is the business value of big data?
A: When big data is distilled and analyzed in combination with traditional enterprise data, companies can develop a more thorough and insightful understanding of their business, which can lead to enhanced productivity, a stronger competitive position, and greater innovation—all of which can have a significant impact on a company’s bottom line.
For example, according to a report from the McKinsey Global Institute, a retailer that uses big data to the full could increase its operating margin by more than 60 percent. Today, retailers usually know who buys their products. By also analyzing the Web log files from their e-commerce sites, they can learn who looked at which products and chose not to buy them. Sentiment analysis of social media adds insight into why customers passed on certain products and what they bought instead, information that traditional data sources do not provide. Combined, these sources can enable more effective customer micro-segmentation and targeted marketing campaigns, and can also improve supply-chain efficiency.
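To make the retail example concrete, here is a minimal sketch that combines Web log view events with order records to find, for each customer, the products they looked at but never bought. The record layouts (pairs of customer ID and product ID) and the function name `viewed_not_bought` are hypothetical, chosen for illustration; they are not part of any Oracle product.

```python
from collections import defaultdict


def viewed_not_bought(view_events, orders):
    """Return {customer_id: set of product_ids viewed but never ordered}.

    view_events: iterable of (customer_id, product_id) pairs parsed from
                 Web logs (hypothetical format).
    orders:      iterable of (customer_id, product_id) pairs from the
                 transactional system (hypothetical format).
    """
    viewed = defaultdict(set)
    for customer, product in view_events:
        viewed[customer].add(product)

    bought = defaultdict(set)
    for customer, product in orders:
        bought[customer].add(product)

    # Per-customer set difference: products looked at but never purchased.
    result = {}
    for customer, products in viewed.items():
        missed = products - bought[customer]
        if missed:  # skip customers who bought everything they viewed
            result[customer] = missed
    return result


if __name__ == "__main__":
    views = [("c1", "tent"), ("c1", "stove"), ("c2", "tent"), ("c2", "lamp")]
    purchases = [("c1", "tent"), ("c2", "tent"), ("c2", "lamp")]
    print(viewed_not_bought(views, purchases))  # {'c1': {'stove'}}
```

The Web log alone shows interest and the order table alone shows purchases; only the combination reveals the "looked but did not buy" gap the interview describes.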
Q: What are the infrastructure requirements for a big data platform?
A: Like data warehousing, Web stores, or any other IT platform, a big data infrastructure has its own requirements. When weighing the components of a big data platform, remember that the end goal is to integrate big data easily with enterprise data so that you can run deep analytics on the combined data set. The requirements for a big data infrastructure span three phases: data acquisition, data organization, and data analysis.
Data acquisition. Acquisition is one of the areas where infrastructure has changed most since the days before big data. Because big data streams arrive at higher velocity and in greater variety, the infrastructure that acquires them must deliver low, predictable latency both when capturing data and when executing short, simple queries; handle very high transaction volumes, often in a distributed environment; and support flexible, dynamic data structures.
NoSQL databases are frequently used to acquire and store big data. They are well suited to dynamic data structures and are highly scalable. The data stored in a NoSQL database is typically highly varied, because these systems are designed to capture all incoming data without first categorizing or parsing it.
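The idea of capturing data without categorizing or parsing it can be illustrated with a toy in-memory store. This is a hypothetical sketch, not the API of Oracle NoSQL Database or any other product: each raw record is stored verbatim under a generated key, records of different shapes sit side by side, and reads are simple key lookups or per-source scans, the kind of short, simple query the acquisition phase calls for.

```python
import itertools
import time


class RawEventStore:
    """Toy append-only key-value store (hypothetical, for illustration only).

    Records are stored exactly as received; no schema is enforced, so a
    Web log line, a sensor reading, and a social media post can coexist.
    """

    def __init__(self):
        self._data = {}
        self._seq = itertools.count()

    def put(self, source, payload):
        # Key = (source name, capture time in ms, sequence number); the
        # sequence number keeps keys unique within the same millisecond.
        key = (source, int(time.time() * 1000), next(self._seq))
        self._data[key] = payload  # stored verbatim, no parsing
        return key

    def get(self, key):
        return self._data[key]

    def scan(self, source):
        # All raw payloads from one source, in insertion (capture) order.
        return [v for k, v in self._data.items() if k[0] == source]


if __name__ == "__main__":
    store = RawEventStore()
    store.put("weblog", '10.0.0.7 - - "GET /tent HTTP/1.1" 200')
    store.put("sensor", {"device": "t-17", "temp_c": 21.4})
    store.put("social", {"user": "u42", "text": "Loved the tent", "tags": ["camping"]})
    print(store.scan("sensor"))
```

Deferring interpretation to read time is what lets such a store accept any format cheaply; the cost is that later phases must parse and organize the data.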
Data organization. In classical data warehousing terms, organizing data is called data integration. Because big data volumes are so large, there is a tendency to organize data where it is originally stored, saving both time and money by not moving it. The infrastructure required for organizing big data must be able to process and manipulate data in its original storage location; support very high throughput (often in batch) for large data processing steps; and handle a wide variety of data formats, from unstructured to structured.
Hadoop is a new technology that allows large data volumes to be organized and processed while the data stays on the cluster where it was originally stored. For example, the Hadoop Distributed File System (HDFS) can serve as the long-term storage system for Web logs. MapReduce programs running on the cluster turn these Web logs into browsing behavior (sessions) and write the aggregated results to the same cluster. The aggregated results are then loaded into a relational database management system (RDBMS).
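The sessionization step can be sketched in plain Python as separate map, shuffle, and reduce phases. This is a single-process illustration of the MapReduce pattern, not Hadoop code; the log format (`user_id epoch_seconds url`) and the 30-minute inactivity timeout are assumptions made for the example. On a real cluster, Hadoop schedules map tasks on the nodes that hold the relevant HDFS blocks where it can, so the raw logs need not leave the cluster.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # assumed inactivity gap (seconds) that ends a session


def map_phase(log_line):
    """Parse 'user_id epoch_seconds url' and emit (user_id, (ts, url))."""
    user_id, ts, url = log_line.split()
    yield user_id, (int(ts), url)


def shuffle(mapped_pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(user_id, hits):
    """Split one user's hits into sessions and emit one summary per session."""
    hits.sort()  # order by timestamp
    sessions, current = [], [hits[0]]
    for prev, hit in zip(hits, hits[1:]):
        if hit[0] - prev[0] > SESSION_TIMEOUT:  # gap too long: close session
            sessions.append(current)
            current = []
        current.append(hit)
    sessions.append(current)
    for s in sessions:
        yield {"user_id": user_id, "start": s[0][0], "end": s[-1][0], "pages": len(s)}


def sessionize(log_lines):
    """Run map, shuffle, and reduce in sequence over the log lines."""
    mapped = (pair for line in log_lines for pair in map_phase(line))
    results = []
    for user_id, hits in shuffle(mapped).items():
        results.extend(reduce_phase(user_id, hits))
    return results


if __name__ == "__main__":
    logs = ["u1 1000 /home", "u1 1300 /tent", "u1 5000 /stove", "u2 2000 /home"]
    for row in sessionize(logs):
        print(row)
```

Here `u1` produces two sessions because the 3,700-second gap exceeds the assumed timeout. The list of session summaries is the kind of compact, aggregated result that would then be loaded into the relational database.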
Data analysis. Because data is not always moved during the organization phase, analysis may also take place in a distributed environment, where some data stays in its original location and is accessed transparently from a data warehouse. The infrastructure required for analyzing big data must support deeper analytics, such as statistical analysis and data mining, on a wider variety of data types stored in diverse systems; scale to extreme data volumes; respond faster to changes in behavior; and automate decisions based on analytical models. Most important, it must be able to analyze big data and traditional enterprise data together. New insight comes not just from analyzing new data, but from analyzing it in the context of existing data, which offers new perspectives on old problems.
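Analyzing new data in the context of the old can be sketched as a SQL join between aggregated session summaries (the browsing behavior described above) and traditional order data. This minimal sketch uses Python's built-in `sqlite3` module purely as a stand-in for the data warehouse; the table names, column names, and the function `browsers_vs_buyers` are hypothetical, and nothing here reflects how Oracle Exadata or Oracle Database accesses data in place.

```python
import sqlite3


def browsers_vs_buyers(sessions, orders):
    """Join session summaries (big data) with orders (enterprise data).

    sessions: iterable of (user_id, pages_viewed), one row per session (hypothetical).
    orders:   iterable of (user_id, amount) from the order system (hypothetical).
    Returns (user_id, n_sessions, total_pages, total_spent) per browsing user.
    """
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    conn.execute("CREATE TABLE web_sessions (user_id TEXT, pages INTEGER)")
    conn.execute("CREATE TABLE orders (user_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO web_sessions VALUES (?, ?)", sessions)
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    # Aggregate each side before joining so session rows and order rows do not
    # multiply each other; LEFT JOIN keeps users who browsed but never bought.
    rows = conn.execute(
        """
        SELECT s.user_id, s.n_sessions, s.total_pages, COALESCE(o.total_spent, 0)
        FROM (SELECT user_id, COUNT(*) AS n_sessions, SUM(pages) AS total_pages
              FROM web_sessions GROUP BY user_id) AS s
        LEFT JOIN (SELECT user_id, SUM(amount) AS total_spent
                   FROM orders GROUP BY user_id) AS o
          ON s.user_id = o.user_id
        ORDER BY s.user_id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sessions = [("u1", 2), ("u1", 1), ("u2", 1)]
    orders = [("u1", 49.5)]
    for row in browsers_vs_buyers(sessions, orders):
        print(row)
```

Users such as `u2`, who appear in the browsing data with zero spend, are exactly the ones that neither data set would reveal on its own.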
Q: How does Oracle address these requirements?
A: With the recent introduction of Oracle Big Data Appliance, Oracle is the first vendor to offer a complete and integrated solution to address the full spectrum of enterprise big data requirements. Oracle Big Data Appliance is an engineered system that combines optimized hardware with the most comprehensive software stack, featuring specialized solutions developed by Oracle, to deliver a complete solution for capturing and organizing big data that is easy to deploy and support. It is also tightly integrated with Oracle Exadata and Oracle Database, so you can analyze all your data together with extreme performance.
Analyzing new and diverse digital data streams can reveal new sources of economic value, provide fresh insights into customer behavior, and identify market trends early. But this influx of new data creates challenges for IT departments. To derive real business value from big data, you need the right tools to capture and organize a wide variety of data types from different sources, and to analyze that data easily in the context of all your enterprise data. By using Oracle Big Data Appliance in conjunction with Oracle Exadata, enterprises can acquire, organize, and analyze all their enterprise data, both structured and unstructured, to make the most informed decisions.
To learn more about Oracle’s solutions for big data, go to www.oracle.com/bigdata.
This article originally appeared in the issue of .