TDWI Blog: Data 360

Big Data Analytics: The View from SAP

Blog by Philip Russom
Research Director for Data Management, TDWI

A few weeks ago, I talked with Mike Eacrett, the vice president of product management for SAP HANA at SAP Labs. Among other things, Mike explained the “secret sauce” that gives SAP HANA flexibility and performance for big data analytics. Give me a moment to recount Mike’s explanation.

Philip Russom: What forms of analytics are you seeing on the rise with SAP customers?

Mike Eacrett: SAP customers continue to expand their investments in online analytic processing (OLAP). But the explosive growth is with exploratory analytics. That’s where business users need to learn things they didn’t know to ask about before, or to see patterns (or the absence of them) in the data, typically in response to a change in the business or in customer behavior. This kind of exploration requires big data, typically in its original source schema with all its details intact. Instead of transforming and cleansing the data prior to analysis (which can lose desirable data details), the user iteratively develops queries that manipulate data at the analytic tool level, not at the physical storage level, as you would when, say, modeling a data warehouse.

Philip Russom: I’m familiar with this analytic method, so I know that it requires a hefty platform for big data analytics. What is SAP offering in this regard?

Mike Eacrett: We offer the SAP In-Memory Computing Appliance, otherwise known as SAP HANA. It’s an enterprise software architecture that enables analytic queries to run against detailed source data, and to run fast in real time, without the need to transform the data into data models optimized for a specific type of analysis. To achieve this, SAP HANA implements its own massively parallel distributed processing method (similar to some of the concepts of MapReduce), based on HANA’s in-memory database, running code that utilizes the instruction set and vector processing capabilities of Intel chipsets. That means the SAP user needn’t define analytic queries months in advance and then wait for IT to model data for them. All the data is available at their fingertips in memory. HANA gives logical data modeling a new twist, so that the analyst can run queries as fast as he or she thinks them up, without being limited by data models, data movement, or pre-aggregation constraints.
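
To make the MapReduce comparison concrete, here is a minimal Python sketch of the general scatter/gather shape: an in-memory column is split into partitions, each partition is aggregated independently, and the partial results are combined. It is not SAP HANA code. The names `partial_sum` and `parallel_total` are hypothetical, and Python threads are used only to show the structure, since they do not provide the hardware-level vector processing Mike describes.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """'Map' step: aggregate one partition of the column on its own."""
    return sum(chunk)


def parallel_total(column, partitions=4):
    """Split an in-memory column into partitions, aggregate each one
    independently, then combine the partial results ('reduce' step)."""
    # Ceiling division so every value lands in exactly one partition.
    size = max(1, -(-len(column) // partitions))
    chunks = [column[i:i + size] for i in range(0, len(column), size)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)


if __name__ == "__main__":
    amounts = list(range(1, 101))  # stand-in for a detailed source column
    print(parallel_total(amounts))
```

The point of the sketch is that the detailed values are aggregated at query time, rather than being pre-aggregated into a purpose-built model beforehand.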

Philip Russom: You mentioned that SAP HANA gives logical data modeling a new twist. What do you mean?

Mike Eacrett: The term for this new technique is “logical data marting.” It assumes that all the operational source data in SAP modules that is needed for analytics is also available in SAP HANA. A logical data model of a data mart is constructed in server memory, based on the analytic query that’s being executed. In SAP HANA-based applications, the same data model is used for both online transaction processing (OLTP) and analytics. In other words, the data marts are a logical view of one persistence layer. The logical model draws data from the modules’ underlying memory-persisted tables, as needed by queries. As an analyst or HANA-based application iteratively redefines a query, the model automatically redraws itself, using analytic and calculation views. The logical model (based on queries against the pre-built SAP business content) liberates analysts from cumbersome data modeling, and the in-memory processing gives it true real-time speed.
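
The idea of a data mart as a logical view over one persistence layer can be illustrated with a small, generic sketch. The example below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in. It is not SAP HANA, and the table `sales_orders` and view `mart_revenue_by_region` are hypothetical names chosen for illustration. The "mart" is only a view definition, so a newly written transactional row is visible to the analytic query immediately, with no separate load step.

```python
import sqlite3


def build_demo():
    """Create one in-memory store that holds the transactional table
    and a view that plays the role of a logical data mart."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_orders ("
        "order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales_orders (region, amount) VALUES (?, ?)",
        [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 50.0)],
    )
    # The 'mart' stores no data of its own; it is computed at query time.
    conn.execute(
        "CREATE VIEW mart_revenue_by_region AS "
        "SELECT region, SUM(amount) AS revenue "
        "FROM sales_orders GROUP BY region"
    )
    return conn


def revenue_by_region(conn):
    """Query the logical mart and return {region: revenue}."""
    rows = conn.execute(
        "SELECT region, revenue FROM mart_revenue_by_region"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_demo()
    print(revenue_by_region(conn))
    # A new transactional write shows up in the analytic view at once.
    conn.execute(
        "INSERT INTO sales_orders (region, amount) VALUES (?, ?)",
        ("APAC", 25.0),
    )
    print(revenue_by_region(conn))
```

Redefining the analysis here means redefining the view or the query against it; the underlying table is never copied or remodeled.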

So, what do you think, folks? Let me know. Thanks!