The Path to Pervasive Intelligence: 2021 Predictions
These three predictive-analytics milestones are worth watching out for in the coming year.
- By Luke Han
- December 2, 2020
We are an industry of unintended consequences. A local area network protocol from the '80s now delivers an average of eight hours of high-definition video every day to quarantined Americans. The smartphone has diminished camera sales and the photographic film industry.
We are witnessing now, in real time, the birth of the mother of all trends that will unleash a vast new order of unintended consequences, both positive and negative. What will happen when machine learning and other advanced analytics become pervasive?
Unintended consequences are hard to predict, but the path to pervasive intelligence in 2021 is much clearer. The past decade in technology has set the stage for embedded intelligence in virtually every sector of society: highly automated cloud infrastructure, a rich, open source software portfolio, rapidly maturing data engineering and data science disciplines, and the general adoption of distributed computing techniques and technologies.
The question is: what are the milestones of 2021 that will push us towards pervasive intelligence?
Milestone #1: Embedded Analytics in SaaS
An ever-growing proportion of enterprise business functions use cloud services to operate. SaaS companies sit on tons of data with tremendous potential value. It's a huge boulder of potential energy sitting at the top of the hill. Until now, excessive friction -- in the form of cost, time, and effort -- has kept this boulder from tumbling down the hill, realizing its kinetic energy in the process.
For enterprise customers, the benefits of analytics services are becoming more immediate and profitable. They can quantify, analyze, and predict business performance across a wide scope of functional areas and business processes. They can identify potential upsell and cross-sell opportunities and make informed decisions at a granular level about opportunities and threats.
SaaS vendors may be the ultimate beneficiaries of pervasive analytics as more companies turn to SaaS solutions to streamline and modernize their operations while de-emphasizing spending capital on data centers. The ability to offer useful insights at every turn gives SaaS vendors a potential new revenue stream they can offer their existing customer base. New analytics also offers a competitive advantage and extends the life and value of their core product. This will, in turn, increase the stickiness of their solution as customers become accustomed -- and then addicted -- to the new wealth of easy-to-consume insights.
These insights come at a cost. Pervasive analytics capabilities -- whether dashboards for executives, self-service analysis for business users, or reports for line-of-business managers -- create some familiar requirements for SaaS vendors:
Greater concurrency: SaaS vendors depend on economies of scale to efficiently deliver their subscription services with attractive subscription pricing. There are usually hundreds or thousands of enterprise customers each having tens to hundreds of concurrent users. This type of concurrency can bring an analytics platform to its knees.
Data freshness and accuracy: SaaS users expect the data in their dashboards to be fresh enough to reflect the world as they see it. With many SaaS vendors servicing customers in multiple time zones, daily refreshes of analytics data sets may not be frequent enough to be useful. Rapid refresh of these data sets also enables data teams to identify anomalies, duplications, or errors early.
Data privacy: SaaS vendors must be vigilant about who can see what data, and at what level, and may face privacy issues specific to multitenancy.
We can expect more SaaS vendors to offer analytics capabilities as add-on products. The challenges of user concurrency, data freshness and accuracy, and data privacy are all addressed in part by Apache Kylin's distributed multidimensional aggregate indexes (aka cubes).
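To make the idea of a precomputed aggregate index concrete, here is a minimal sketch of a toy cube in Python. It is not Apache Kylin's API or storage format; the rows, dimension names, and the `build_cube` and `query` helpers are all hypothetical. It only illustrates the general technique: pay the aggregation cost once at build time so that each concurrent dashboard query becomes a cheap lookup.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (tenant, region, product, revenue).
ROWS = [
    ("acme", "EU", "basic", 100.0),
    ("acme", "EU", "pro", 250.0),
    ("acme", "US", "pro", 300.0),
    ("globex", "US", "basic", 80.0),
]
DIMENSIONS = ("tenant", "region", "product")


def build_cube(rows):
    """Precompute SUM(revenue) for every combination of dimensions."""
    cube = defaultdict(float)
    for *values, revenue in rows:
        record = dict(zip(DIMENSIONS, values))
        for size in range(len(DIMENSIONS) + 1):
            # combinations() keeps DIMENSIONS order, so keys are canonical.
            for group in combinations(DIMENSIONS, size):
                key = tuple((dim, record[dim]) for dim in group)
                cube[key] += revenue
    return dict(cube)


def query(cube, **filters):
    """Answer an aggregate query with one dictionary lookup, no row scan."""
    key = tuple((dim, filters[dim]) for dim in DIMENSIONS if dim in filters)
    return cube.get(key, 0.0)


CUBE = build_cube(ROWS)
print(query(CUBE, tenant="acme"))               # total for one tenant
print(query(CUBE, tenant="acme", region="EU"))  # drill down by region
```

In this sketch the tenant is just another dimension, so a per-tenant query never touches another tenant's rows at read time. A real system would also need incremental refresh and access control, which this toy omits.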
Milestone #2: Unified Data and Semantics
The line between business analysts and data scientists will continue to blur. Enterprise-grade machine learning software enables business analysts to conduct more advanced data science research. These data-science-savvy business analysts demand a more powerful data service layer that requires a consolidated view of data across the organization.
Business analysts need to analyze at the "speed of thought" and to interact with their data sets without breaking for coffee every time they submit a query. For today's information worker, the speed of thought is not fast enough. With the general availability of machine learning algorithms, we need a data service layer that can deliver insight at machine speed.
This means we need to look beyond today's typical data service architecture, which usually includes data warehouses and/or data lakes with some query engines, plus data science frameworks that read and process large amounts of data. A consolidated data-service layer should be able to serve both human analytics and machine learning workloads, with unified semantics across the enterprise, at speeds 10x or 100x faster than today's most commonly used data technologies. This takes the form of a Unified Semantic Layer.
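One way to picture unified semantics is a single registry of metric definitions that both a dashboard and a machine learning pipeline read from. The sketch below assumes a hypothetical in-memory registry; the `Metric` class, the metric names, and the sample orders are illustrative and do not come from any particular product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass(frozen=True)
class Metric:
    """A business metric defined once and shared by every consumer."""
    name: str
    description: str
    compute: Callable[[List[Row]], float]


# Hypothetical registry: the single place where each metric is defined.
SEMANTIC_LAYER = {
    "revenue": Metric(
        "revenue",
        "Total order value",
        lambda rows: sum(r["amount"] for r in rows),
    ),
    "avg_order_value": Metric(
        "avg_order_value",
        "Mean order value",
        lambda rows: sum(r["amount"] for r in rows) / len(rows) if rows else 0.0,
    ),
}


def evaluate(metric_name: str, rows: List[Row]) -> float:
    return SEMANTIC_LAYER[metric_name].compute(rows)


def dashboard_tile(metric_name: str, rows: List[Row]) -> str:
    """Human-facing consumer: a formatted number for a dashboard."""
    return f"{metric_name}: {evaluate(metric_name, rows):.2f}"


def feature_vector(metric_names: List[str], rows: List[Row]) -> List[float]:
    """Machine-facing consumer: the same definitions as model features."""
    return [evaluate(name, rows) for name in metric_names]


ORDERS = [{"amount": 20.0}, {"amount": 30.0}, {"amount": 50.0}]
print(dashboard_tile("revenue", ORDERS))
print(feature_vector(["revenue", "avg_order_value"], ORDERS))
```

Because both consumers call `evaluate`, a change to a metric's definition reaches the dashboard and the model features together, which is the property the unified layer is meant to guarantee.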
Milestone #3: Analytics Everywhere and Anywhere
We are seeing multicloud and multiplatform strategies becoming more popular in the enterprise. This is the logical result of data gravity, where a created data set becomes so essential to a company, application, or industry that analytics solutions must grow around the data rather than the other way around.
The most common multicloud approach is to choose different clouds for different applications. This approach allows enterprises to choose the best-of-breed services from different cloud vendors. In a broader context, multicloud can refer to a mix of public cloud, private cloud, and on-premises architectures.
The challenge of this approach is that you have a new set of data silos on a grand scale with different supporting cloud infrastructure. New regulatory requirements (such as the GDPR) make it even harder to connect those silos. An enterprise data service layer that is cloud neutral as well as multicloud friendly becomes essential to enable pervasive analytics without undue operational friction.
Expect more customers to be interested in analyzing data in a multicloud environment that spans multiple data platforms (data warehouses, data lakes, cloud storage), with the ability to merge, unify, compare, and summarize across data platforms and cloud boundaries.
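As a rough illustration of summarizing across platform boundaries, the sketch below merges per-region totals that were computed separately on three hypothetical platforms. The platform names and figures are invented, and the approach assumes each platform can return partial sums under the same region keys.

```python
# Hypothetical partial results, each computed where the data lives.
WAREHOUSE = {"EU": 350.0, "US": 300.0}
DATA_LAKE = {"US": 80.0, "APAC": 40.0}
CLOUD_STORAGE = {"EU": 10.0}


def merge_totals(*sources):
    """Combine per-key partial sums from several platforms into one view."""
    merged = {}
    for source in sources:
        for key, value in source.items():
            merged[key] = merged.get(key, 0.0) + value
    return merged


combined = merge_totals(WAREHOUSE, DATA_LAKE, CLOUD_STORAGE)
print(combined)
```

Only the small aggregated results cross cloud boundaries here, not the raw rows. This works directly for sums and counts; a metric such as an average would need each platform to return both a sum and a count so the merge stays correct.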
This type of multicloud, multiplatform analytics is foundational to pervasive analytics and therefore must be highly automated and business user friendly. We can't repeat the same mistakes with the technologies that require heavy data engineering efforts. We should expect tighter integration with enterprise data catalogs, a unified semantic layer across cloud boundaries, and the advancement of data governance that is also multicloud ready.
Walking the Path to Pervasive Analytics
The story of pervasive analytics -- or insights at your fingertips -- requires breaking down the logical barriers that exist around every cloud and data platform. This means a logical data layer must exist outside these platforms while still integrating with them.
This is part of what the authors of Apache Kylin set out to do with a precomputation layer that sits outside the well-known SQL data platforms, cloud data warehouses, and cloud storage service offerings. This is a distributed architecture that many organizations will have to turn to in 2021 to enable intelligence at all levels and layers of their applications and analytics stacks.
Luke Han is co-founder and CEO at Kyligence, a leading metrics store provider. He is also the co-founder and Project Management Committee member for Apache Kylin. Prior to Kyligence, Han was the big data product lead at eBay and chief consultant at Actuate China. You can contact the author on Twitter or LinkedIn.