Power BI Gets an Analytics Infusion
Microsoft's Power BI cloud platform got an analytics refresh to close out the year. In addition, Power BI now supports DirectQuery for both Spark and Azure Analysis Services.
- By Steve Swoyer
- January 17, 2017
Microsoft's Power BI cloud platform got an analytics refresh to close out the year, with preview support for clustering along with official support for forecasting. In addition, Power BI now supports DirectQuery for both Spark and Redmond's own Azure Analysis Services.
Support for clustering brings powerful new analytics capabilities to Power BI. Clustering is shorthand for cluster analysis, a kind of statistical analysis that involves grouping similar objects into different clusters. (Similarity is determined on the basis of one or more predefined measures.) Clustering is extremely useful in sales and marketing analytics, among other domains.
According to Microsoft program manager Amanda Cofsky, writing on the Power BI blog, the initial implementation of clustering for Power BI is a preview-only feature that users must manually enable. From there, she explains, the process is relatively simple. "A dialog opens where you can decide how many clusters you want us to find. If you leave it blank, we will automatically find the number of clusters we think makes the most sense with your data.
"After the clustering algorithm runs, we will create a new categorical field with the different cluster groups in it. This new field will be added to your scatter chart's legend-field well bucket, which you can now use as a source of cross-highlighting like any other legend field. You can also find it in your field list and use it in new visuals just like any other field."
Users can also do clustering with tables in addition to scatterplots, Cofsky says.
Microsoft has yet to provide any details about what Power BI is doing under the covers. There is a slew of different clustering algorithms, after all. Which algorithms is Power BI using -- and how are they implemented? Some, such as k-means, are at once powerful and difficult to implement properly. This is true of any machine learning algorithm (including k-means) that involves an element of randomness: unless its random starting point is fixed, it can give a different result each time it runs over the same data. In some cases, these differences will be minor; in others, they can be significant.
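To make the point about randomness concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not Power BI's implementation, which Microsoft has not described; the function names and the four-point data set are hypothetical and chosen only for illustration. The only random element is the choice of starting centroids, and that alone is enough to change the outcome.

```python
import random

# Four points at the corners of a wide, flat rectangle (illustrative data).
POINTS = [(0, 0), (0, 1), (10, 0), (10, 1)]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Run Lloyd's algorithm from the given starting centroids.

    Returns (labels, inertia), where inertia is the sum of squared
    distances from each point to its assigned centroid.
    """
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def kmeans_random(points, k, seed):
    """k-means whose starting centroids are k points drawn at random."""
    rng = random.Random(seed)
    return kmeans(points, rng.sample(points, k))


if __name__ == "__main__":
    # Starting from the two left-hand corners yields a top/bottom split.
    print(kmeans(POINTS, [(0, 0), (0, 1)]))
    # Starting from one left and one right corner yields a left/right split.
    print(kmeans(POINTS, [(0, 0), (10, 0)]))
    # With random starts, which split you get depends on the seed.
    for seed in range(5):
        print(seed, kmeans_random(POINTS, 2, seed))
```

Both runs converge, but the top/bottom split has an inertia of 100 while the left/right split has an inertia of 1. Libraries typically soften this by running several random starts and keeping the best result, or by letting the caller fix the seed.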
Official Availability of Forecasting
Power BI's new support for forecasting is no less important. Microsoft now officially supports forecasting in its Power BI Desktop component: November's Power BI refresh makes generally available the forecasting feature that Microsoft first announced as a preview in September.
"We've now made [forecasting] also available in Power BI on the Web," Cofsky reveals in a promotional video. "Now whenever you're looking at your reports on the Web ... you can go into the same analytics pane that you have on the desktop and choose to add a forecast to your line chart."
What does Microsoft's support for forecasting actually look like? Microsoft senior program manager Kim Manis walked us through this on the Power BI blog back in September. "You can ... use our new forecasting feature on your line chart to do predictive analytics on your data. The forecasting feature utilizes built-in predictive forecasting models to automatically detect the step [i.e., monthly/weekly/annually] and seasonality in your data to provide forecasting results."
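Manis does not say which models Power BI uses, so the following is only a rough sketch of what "detecting seasonality" can mean, not a description of Microsoft's method. It estimates the season length as the lag with the highest autocorrelation, then produces a seasonal-naive forecast by repeating the last observed season. The function names and the sample series are hypothetical.

```python
def detect_season_length(series):
    """Guess the season length as the lag (>= 2) with the highest autocorrelation."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 1  # a constant series has no seasonality to detect

    def autocorr(lag):
        # Correlation of the series with a copy of itself shifted by `lag`.
        return sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom

    # Only consider lags that leave at least two full seasons of data.
    return max(range(2, n // 2 + 1), key=autocorr)


def seasonal_naive_forecast(series, horizon, season_length=None):
    """Forecast `horizon` steps ahead by repeating the last observed season."""
    m = season_length or detect_season_length(series)
    last_season = series[-m:]
    return [last_season[h % m] for h in range(horizon)]


if __name__ == "__main__":
    # Six repetitions of a four-step pattern, e.g. quarterly figures.
    quarterly = [10, 20, 30, 20] * 6
    print(detect_season_length(quarterly))
    print(seasonal_naive_forecast(quarterly, 6))
```

A production forecaster would also model trend and noise and would report confidence bounds; this sketch isolates only the seasonality step.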
DirectQuery Support for Spark, Azure Analysis Services
Power BI users have two options to get at -- or query against -- data. The first and most obvious is Power BI's Import feature; the second is Microsoft's DirectQuery feature.
Neither option requires much unpacking. Import can get at virtually any data source/data set as long as the user is willing to do a little (or a lot of) data prep. DirectQuery, by contrast, has the potential to offer seamless query access to live data sources -- provided Microsoft supplies connector support.
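The practical difference between the two modes can be illustrated with a small analogy in Python, using an in-memory SQLite database as a stand-in for the data source. This is not how Power BI is implemented; the class names and the sales table are hypothetical. An imported data set is a snapshot taken once, whereas a live data set sends its query to the source every time it is asked.

```python
import sqlite3


def make_source():
    """Create an in-memory database standing in for a remote data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 100.0), ("West", 250.0)],
    )
    conn.commit()
    return conn


class ImportedDataset:
    """Import-style access: copy the rows once, then work on the local copy."""

    def __init__(self, source):
        self.rows = source.execute("SELECT region, amount FROM sales").fetchall()

    def total(self):
        return sum(amount for _, amount in self.rows)


class LiveDataset:
    """DirectQuery-style access: push each query down to the source."""

    def __init__(self, source):
        self.source = source

    def total(self):
        (value,) = self.source.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales"
        ).fetchone()
        return value


if __name__ == "__main__":
    source = make_source()
    imported = ImportedDataset(source)
    live = LiveDataset(source)

    # A new sale arrives at the source after the import was taken.
    source.execute("INSERT INTO sales VALUES (?, ?)", ("North", 50.0))
    source.commit()

    print(imported.total())  # still reflects the snapshot
    print(live.total())      # reflects the new row
```

The imported copy stays at its original total until it is refreshed, while the live data set picks up the new row immediately, at the cost of a round trip to the source on every query.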
As part of its November Power BI refresh, Redmond introduced preview support for DirectQuery access to Spark. Users can query against Spark running in both Microsoft's own Azure HDInsight service and in other contexts and distributions. Presumably, this permits a Power BI user to use Spark (and Spark SQL) to parallelize queries. Spark itself can consume and process data from Hadoop (both file system objects and programmatic access via Hive, HBase, Parquet files, and other Hadoop data sources), other NoSQL databases (such as MongoDB), and relational database systems.
"When you're in a report ... if you use any of our Spark connectors ... [you] have the ability to choose DirectQuery as your import method. Then you can use the DirectQuery option for Spark just like any of our other DirectQuery options that we have on the desktop," Cofsky says in another video clip.
Finally, Microsoft announced in October a public preview of its Azure Analysis Services -- the in-Azure counterpart to Microsoft's on-premises SQL Server Analysis Services. As of November's Power BI refresh, users can now connect to Azure Analysis Services via the on-premises Power BI Desktop. Users of Microsoft's Power BI cloud service can already access Azure Analysis Services.
Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at firstname.lastname@example.org.