SAS Solution Gives Boost to Statistically Fluent Users
SAS Visual Statistics is designed to make the statistically fluent even more productive. Crucially, this is a class that includes business analysts, power users, and other non-traditional "statisticians."
- By Stephen Swoyer
- October 14, 2014
Data visualization helped to make business intelligence (BI) more usable, interactive, self-serviceable, and intelligible. Could it do the same for statistics?
This would seem to be the rationale behind SAS Visual Statistics, a recent solution from SAS Institute Inc., one of the biggest names in statistics, data mining, and analytics. Two years ago, SAS shipped its inaugural BI discovery product, SAS Visual Analytics, which (like other BI discovery offerings) combined an interactive, data visualization-driven experience with self-service capabilities to address a new class of users and to support visual, exploratory analysis. With Visual Statistics, an add-on for Visual Analytics, is SAS trying to pull off something similar?
Not exactly. The democratization of statistics isn't what SAS has in mind. Instead, officials say, Visual Statistics will help to make the statistically fluent even more productive. Crucially, this is a class that includes business analysts, power users, and other non-traditional producers of statistical analysis.
Think of Visual Statistics as a kind of statistical workbench for stats geeks and non-geeks alike, i.e., for statistically proficient business analysts, power users, or other kinds of knowledge workers, says Tapan Patel, principal product marketing manager with SAS. If you squeaked by in your college-era Stat 101 class with a "D," SAS Visual Statistics might not be of much help to you. However, Patel argues, if you have a solid grasp of fundamental statistical concepts, methods, and functions, Visual Statistics can help to boost productivity and (over time) to build statistical competency.
Like SAS Visual Analytics and similar visual discovery tools, Visual Statistics exposes an exploratory user experience in which analysts use familiar point-and-click conventions to interact with and explore data sets. Visual discovery tools typically incorporate self-service capabilities of some kind, and Visual Statistics is no different. It purports to automate common tasks, such as identifying and refining variables and selecting statistical algorithms, Patel explains. Analysts can drag variables from a data set into a model and observe (more or less immediately, depending on the size of the data set) what effect, if any, this has.
"Rather than creating a new model and running separate analyses for each [of these variables], I can explore and test them out in real-time. I can immediately see what works best," he says.
How -- or for which applications -- might stats geeks use Visual Statistics? According to Patel, stats experts can use it to accelerate the work of developing predictive models for data mining and predictive analytics as well as for other kinds of analysis. Visual Statistics, he claims, can simplify complex or time-consuming tasks (e.g., the identification or selection of variables and/or functions) and permit analysts to test changes or tweaks in or close to real time. For similar reasons, he argues, less geeky (but stats-savvy) analysts can use Visual Statistics, too. It helps stats geeks work faster, and it includes features (wizards, guided variable selection and refinement, and guided algorithm selection) that permit non-geeks to bootstrap their own statistical analyses.
"If I have a set of independent variables or dependent variables and I drag some independent variables [into] my palette and I drag in a dependent variable, once I do that, I have a model in front of me. Let's say I'm not happy with that model. The concept of interactivity is [that it promotes] exploration, which is the ability to just poke around [in the data set] and to experiment with different [variables, functions, and so on]," he explains. "I can add a categorical variable by simply dropping it [onto a model] and, boom!, my model gets refreshed, my model gets refined, I have my new set of model comparison charts, I have my new set of model evaluation charts. [Visual Statistics] tells me that adding these new variables improved the accuracy of my model from X to Y."
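To make the "from X to Y" idea concrete, here is a minimal Python sketch of what refreshing a model after adding a variable amounts to mechanically: refit an ordinary least squares regression with and without an extra predictor and compare R². It is an illustration under stated assumptions, not how SAS Visual Statistics works internally; the data, the variable names x1 and x2, and the choice of R² as the accuracy measure are all hypothetical.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_r2(columns: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Fit y ~ intercept + columns by least squares (normal equations) and return R^2."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    preds = [sum(bj * v for bj, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: the outcome depends on both x1 and x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 * a + 3 * b for a, b in zip(x1, x2)]

r2_before = fit_r2([x1], y)      # model with x1 only
r2_after = fit_r2([x1, x2], y)   # "drop in" x2 and refit
print(f"R^2 with x1 only: {r2_before:.3f}; after adding x2: {r2_after:.3f}")
```

One caveat a stats-savvy analyst would flag: on training data, a least squares model's R² can never decrease when a variable is added, so a fair "improved from X to Y" comparison would rely on holdout data or a penalized measure such as adjusted R².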
Patel's and SAS' claim that Visual Statistics permits analysts to explore (and to tweak models against) a data set in real time needs to be unpacked. How closely the product approaches "real time" is basically a function of the performance of its underlying in-memory analytics engine. Instead of having to read data from and write data to physical storage on the fly (or, no less frustrating, to pre-generate analytic artifacts or structures and write them to and read them from disk), Visual Statistics runs against an in-memory engine (viz., SAS LASR) that loads a working data set entirely into physical memory, or RAM.
Analytical workloads, as distinct from OLTP workloads, tend to be read-intensive; as a result, I/O performance -- not CPU performance -- becomes the preeminent gating factor. RAM I/O performance is orders of magnitude faster than disk-based I/O. This is no less true of Flash or solid state disk I/O, which, although much faster than its older, rotational kin, is nonetheless orders of magnitude slower than physical RAM. This is one reason in-memory analytic engines enjoy so much cachet. (Applicability is another story. Few BI or decision support workloads, which account for the vast majority of analytics or light-analytics workloads, actually require in-memory I/O performance.)
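The "orders of magnitude" gap can be checked with back-of-the-envelope arithmetic. The sketch below assumes commonly cited rule-of-thumb latencies (roughly 100 ns for a DRAM access, 100 µs for an SSD random read, and 10 ms for a rotational disk seek); these figures are assumptions, not measurements of LASR or of any particular hardware.

```python
import math

# Assumed, order-of-magnitude per-access latencies in nanoseconds.
# Rules of thumb only; real figures vary widely by device and access pattern.
LATENCY_NS = {
    "DRAM": 100,
    "SSD random read": 100_000,
    "HDD seek": 10_000_000,
}


def orders_of_magnitude(slow_ns: float, fast_ns: float) -> float:
    """Return log10 of the latency ratio between a slower and a faster medium."""
    return math.log10(slow_ns / fast_ns)


for medium in ("SSD random read", "HDD seek"):
    gap = orders_of_magnitude(LATENCY_NS[medium], LATENCY_NS["DRAM"])
    print(f"{medium} vs. DRAM: about {gap:.0f} orders of magnitude slower")
```

Under these assumptions, SSD reads come out roughly three orders of magnitude slower than RAM and disk seeks roughly five, which is consistent with the article's point that Flash narrows, but does not close, the gap.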
"Visual Analytics sits on top of LASR. Now Visual Statistics is an add-on for Visual Analytics, which takes this [exploration and discovery experience] a step further," Patel explains. "There are some [use cases] where [Visual Statistics] extends what you can do with Visual Analytics, so I can look at the relationships between my customers. For example, if I'm trying to predict should I extend a credit line increase to a certain customer, I want to see how many times this customer has defaulted in the past, what kind of spend he has with me, what kinds of withdrawals he has with me," he continues. "I can see, these are the top 20 kinds of variables for [predicting default by] this individual or for a particular set of [similar] individuals. This is something I can quickly explore in Visual Statistics."
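One simple way to produce a ranked "top N variables" list like the one Patel describes is to score each candidate predictor by the absolute value of its correlation with the outcome (with a 0/1 default flag, Pearson correlation is the point-biserial correlation). The Python sketch below does exactly that. It is a swapped-in filter technique chosen for illustration, not SAS's method, which the article does not describe; the customer columns and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def top_k_predictors(
    features: Dict[str, Sequence[float]], target: Sequence[float], k: int = 20
) -> List[Tuple[str, float]]:
    """Rank candidate predictors by |correlation| with the target; keep the top k."""
    scored = [(name, abs(pearson(vals, target))) for name, vals in features.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical customer data: 1 = defaulted, 0 = did not.
defaulted = [1, 0, 0, 1, 0, 1, 0, 0]
features = {
    "prior_defaults": [3, 0, 0, 2, 1, 4, 0, 0],
    "monthly_spend": [5, 6, 4, 5, 6, 4, 5, 6],
    "withdrawals": [2, 3, 2, 3, 2, 3, 2, 3],
}

for name, score in top_k_predictors(features, defaulted, k=2):
    print(f"{name}: |r| = {score:.2f}")
```

Correlation-based filtering looks at one variable at a time and only captures linear association; model-based importance measures can account for interactions and redundancy among predictors, at a higher computational cost.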