
Realize the True Potential of Big Data Analytics Using R

A platform once reserved for academia is quietly taking root in the business world, enabling a new generation of advanced analytics that helps enterprises better predict and understand the needs of their customers.

By Raj Devarajan, VP and Global Practice Leader, Symphony Analytics

Market leaders in every vertical industry -- from retail and financial services to insurance and manufacturing -- are realizing the benefits of analytics and are turning to open source frameworks such as R to fuel their projects.

As executives looking to increase data-driven decision making in their organizations learn more about open source innovations driving big data analytics, they'll certainly encounter solutions such as Hadoop that receive considerable attention because of their ability to provide low-cost storage for big data. However, the open source statistics package R is quietly creating a buzz among analytics practitioners.

Safe or Savvy?

This open source software could help reverse the shortage of data science expertise that has stifled the potential power of big data analytics in business. Even so, why should enterprises risk adopting an open source software package when they could play it safe with established brands? The answer is that big data is best analyzed using high-density computing, cloud computing, and virtualization, and the solutions in these areas are being released by Web companies such as Google, Amazon, and Yahoo and by the Apache and Linux communities, not by the enterprise application and infrastructure vendors.

When it comes to predictive analytics, open source R represents a best-in-class platform accepted by statisticians and academia around the world as a standard platform for analytics, with more than 4,000 purpose-built packages available today. With millions of active users and the influx of new college graduates trained in R joining the industry workforce every year, this platform jumps to the forefront as the analytics platform of choice. R's widespread use in academia is adding to its potential to be one of the most-used analytical software packages.

The Way to Informed, Data-Driven Decisions

Several BI vendors have already integrated support for R in their databases. As high-performance, commercially supported computing versions of R become available, CIOs and business decision-makers are choosing R as a cost-effective performance platform for production-grade implementations. Manual, static, or spreadsheet-based data reporting will become a thing of the past, giving way to comprehensive, accurate predictive analysis that paves the way for thoroughly informed data-driven decisions.

Consider the types of models being developed for business users:

  • A predictive model for forecasting revenue and product demand that analyzes a variety of market, industry, macro-economic, and customer data as well as historical trends in product sales, for use by supply-chain planners in managing procurement and inventory

  • A metrics-based segmentation model that separates customers with significant upside opportunity based on an analysis of their current and previous purchasing behavior

  • An analysis of streaming social media and Web data to detect trends in customer sentiment and enable real-time personalization on an e-commerce portal

  • A model that evaluates sensor data stored in Hadoop to predict machine failure events that can be prevented by scheduled maintenance
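To make one of these model types concrete, here is a minimal sketch of a metrics-based segmentation in the spirit of the second example above. It is written in Python purely for illustration and is not taken from any production model; the `Customer` fields, the `segment` function, and both thresholds are hypothetical assumptions. The rule flags as "upside" any customer who once spent a meaningful amount and has since dropped off sharply.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    previous_spend: float  # spend in the earlier period (hypothetical metric)
    current_spend: float   # spend in the latest period (hypothetical metric)


def segment(customers, drop_threshold=0.25, min_previous=1000.0):
    """Split customers into 'upside' and 'steady' segments.

    A customer counts as 'upside' when earlier spend was at least
    min_previous and current spend has fallen by drop_threshold or more.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    segments = {"upside": [], "steady": []}
    for c in customers:
        if c.previous_spend >= min_previous:
            drop = (c.previous_spend - c.current_spend) / c.previous_spend
        else:
            drop = 0.0  # too little purchase history to call it an opportunity
        key = "upside" if drop >= drop_threshold else "steady"
        segments[key].append(c.customer_id)
    return segments


customers = [
    Customer("A", 5000.0, 2000.0),  # large account, spend down 60 percent
    Customer("B", 1200.0, 1150.0),  # roughly flat
    Customer("C", 300.0, 50.0),     # small account, below min_previous
]
result = segment(customers)
print(result)
```

A real segmentation would draw on many more behavioral metrics and would typically be validated by a statistician before use; the point here is only the shape of the logic.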

One of the major challenges in designing predictive models attuned to specific business issues is the need for vital statistical and mathematical skills. Data scientists whose training is limited to SAS or SPSS know only how to use tools designed by others and lack an understanding of the mathematical and statistical principles underpinning them. A failure to grasp these principles can lead to poor model design and a huge financial investment in return for unhelpful results. Thanks to the integration of R into statistics education, experts trained in R have a deep understanding of the mathematical principles of both the tool and the algorithm at its core. This leads to robust model design and intelligent data harvesting, enabling smarter data-driven decision-making.

Analyzing entire sets of live data (rather than samples) early in the process greatly improves the quality of the analysis. Thanks to recent enhancements in core R, large-scale analysis of all data sources is now a reality.
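One general way to analyze every record rather than a sample, without holding the full data set in memory, is a single-pass (streaming) computation. The sketch below uses Welford's online algorithm to compute a mean and sample variance one value at a time. It is written in Python for illustration only and makes no claim about how R handles large data internally; the function name `running_stats` and the sample values are assumptions.

```python
def running_stats(values):
    """Return (count, mean, sample variance) in one pass over `values`.

    Uses Welford's online algorithm, so `values` can be any iterable,
    including a generator that reads records one at a time.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


# A generator stands in for a large source that is read record by record.
stream = (float(v) for v in [2, 4, 4, 4, 5, 5, 7, 9])
count, mean, variance = running_stats(stream)
print(count, mean, variance)
```

Because only three running numbers are kept, memory use stays constant no matter how many records flow through.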

R's Not Perfect

There are drawbacks. For example, native R is command-line driven. In its native state, it raises a barrier to entry because only people well versed in command-line programming can work with it. R is also memory intensive, which can be a restriction when doing data mining.

Additionally, vendors such as SAS have worked with financial services and retail to develop industry-specific solutions for fraud detection and direct marketing. It may not make sense to move these to R -- that would be similar to reinventing the wheel for certain financial services companies.

Accelerating the Value of Big Data

Summarized reports, such as those created and supported by traditional BI, miss key data patterns and deliver "overview solutions." Analyzing every interaction of every customer makes it possible to accurately identify live patterns that point to profitable opportunities and real risks. With real-time data and astute insights, businesses have the power to make informed decisions about the next product to launch, market threats that need to be addressed, and how to provide customers with the solutions that best fit their needs.

By following the best practices outlined below, companies can accelerate the value of their big data analytics using R.

  • BI and data architects should design data repositories that contain all raw data essential for analysis in centrally available repositories

  • The IT department should implement virtualized R sandboxes that are connected to data repositories and are not constrained by memory the way desktops and laptops are

  • Business leaders should hire and embed R-trained business analysts into business functions so they can develop use cases and prototype models that address business problems and develop insights

  • The CMO and CIO should jointly create a centralized analytics team with trained statisticians who can test and validate these models, provide expert advice, and recommend techniques to scale models and promote them to production

  • The CIO should sponsor projects that leverage a standardized and structured process to develop models, bring them into production, and integrate them into business processes

Speed Is of the Essence

With product speed-to-market and flexibility becoming increasingly significant, R's ability to fit quickly into real-time systems and existing databases can increase the agility of data-mining systems and analysis tools. Analytical models can be used in embedded programs that operate directly on the data, bypassing data mining packages and saving valuable time. Analyses that take hours to run in statistical packages can deliver insights in minutes when run as an optimized program. New enhancements currently in development will enable R models to be moved directly to production as embedded algorithms in databases, enabling fast development of analytics solutions.
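The idea of a model that operates directly on the data can be sketched as follows. This is an illustrative assumption, not a description of any vendor's R integration: the coefficients are made up as if exported from a fitted linear model, the `products` table and its columns are hypothetical, and Python's built-in `sqlite3` module stands in for a production database. The model is translated into a SQL expression so that scoring happens inside the database rather than in a separate statistical package.

```python
import sqlite3

# Hypothetical coefficients, as if exported from a fitted linear model.
INTERCEPT = 10.0
COEFFICIENTS = {"price": -2.0, "promo": 5.0}


def scoring_sql(table):
    """Build a SELECT statement that applies the linear model to each row."""
    terms = " + ".join(f"({w}) * {col}" for col, w in COEFFICIENTS.items())
    return (
        f"SELECT id, {INTERCEPT} + {terms} AS predicted_demand "
        f"FROM {table} ORDER BY id"
    )


# An in-memory SQLite database stands in for an existing data store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL, promo REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, 3.0, 0.0), (2, 2.0, 1.0)],
)

# The database engine evaluates the model; no rows are exported for scoring.
scores = conn.execute(scoring_sql("products")).fetchall()
conn.close()
print(scores)
```

The table and column names here are fixed in the code; a real deployment would validate any identifiers before building SQL from them.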

As big data predictive analytics transitions from a strategy employed by innovative retailers or manufacturers to a business necessity, the use of R-based solutions will rise steadily as well. A platform once reserved for academia is quietly taking root in the business world, enabling a new generation of advanced analytics that helps enterprises better predict and understand the needs of their customers.

Raj Devarajan heads the enterprise practice for the analytics division at Symphony Teleca Corporation. Raj has been leading supply chain initiatives for the last 15 years and implementing data-driven solutions to real-world business problems. The Symphony Analytics team specializes in embedding analytics into demand forecasting, sourcing, pricing management, and sales and operations planning. Raj holds an MBA from the University of Denver and a master's degree in chemical engineering from Tulane University.

Symphony Analytics is a proud partner in decision sciences and modeling services to major enterprises. For more information about how you can use R in your analytics, or general advice on how to integrate predictive analytics directly into your key business processes, reach us at analytics-info@symphonyteleca.com.
