Microsoft Improves Power BI’s Usefulness as Advanced Analysis Platform
Slowly but surely, Microsoft is making Power BI into an extremely useful platform for advanced analysis. New improvements promise to make analysts even more productive.
- By Steve Swoyer
- August 31, 2016
Slowly but surely, Microsoft is making Power BI into an extremely useful platform for advanced analysis. Consider the new features Redmond delivered as part of its July Power BI refresh.
Chief among these is the ability to run R scripts in Power BI's Power Query data discovery and transformation facility. That’s useful because it makes analysts and data scientists more productive, wrote Power BI senior product manager Sharon Laivand in a post on Microsoft's Power BI blog.
Analysts spend a disproportionate amount of their time preparing (or cleansing) data for analysis, Laivand explained. Technology that can accelerate or partially automate this process can make analysts more productive. Instead of having to switch out of Power Query and fire up a separate R environment, analysts can now apply R functions and algorithms in the context of Power Query itself.
“The R language ... includes thousands of libraries that can help with data cleansing,” he writes. “Now that R is supported in Power Query, it also can be used to make general advanced analytics tasks in the data cleansing stage. For example, R can be used to find clusters in the dataset.”
Laivand and Microsoft are onto something. Engineering data so that it's suitable for analysis is becoming a hugely complex problem. This is in part because the dominant tools -- especially the big ETL-centric data integration platforms -- haven't kept pace with the ways analysts use data.
It isn't just that analysts are pulling data from more and different sources. It's that advanced analytics techniques involve putting different pieces of data together into combinations that are time- and frequency-dependent.
One goal of advanced analysis is to establish correlations between seemingly unconnected events. In order to do that, analysts use mathematical and statistical techniques (e.g., curve fitting, function approximation, and regression analysis) to transform the data. Transformation of this kind is outside the scope of conventional ETL tools.
For example, a remote sensor might transmit several events per second or several hundred in the space of a minute. For this information to be useful to an analyst, it must be precisely correlated with information originating at (or about) the same time in other contexts, such as events from other sensors or transactions in an on-premises database.
The easiest way to do this is to embed algorithms and functions into the data flow itself. This is precisely the use case that self-service data preparation tools target. It's likewise a use case Microsoft now supports in Power Query.
In his blog post, Laivand used a relatively prosaic example -- using R in tandem with Power Query to fill in missing random values in a data set -- as a proof of concept. However, there's nothing that prevents Power Query and Power BI being used for more sophisticated analyses.
As Microsoft continues to enhance Power BI (and the rest of its Azure stack) on a monthly basis, its strengths as a cloud-based platform for advanced analytics will continue to improve.
Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at firstname.lastname@example.org.