Kubernetes, Multicloud, and Low-Code Data Science: 2020's Hottest Data Trends
Three trends data professionals should pay attention to this year.
- By Steven Mih
- January 3, 2020
Today's data technologies are paving the way for the next step in our data journey. We've seen Kubernetes lead the way in application automation, more companies bet on the cloud, and enterprises grow ever more reliant on data science. Couple that with the amount of advanced analytics and artificial intelligence data being generated, and we see the door open for even more opportunity in the data management space. Here are some of the biggest data trends I look forward to seeing in 2020.
2020 Trend #1: Kubernetes will drive more operational AI
When it comes to advanced analytics and AI, 2020 will bring the "Kubernetifying" of the analytics stack. "Kubernetifying" the analytics stack solves data sharing and elasticity challenges by moving data from remote data silos into K8s clusters for tighter data locality. Although containers work well for stateless applications such as web servers and self-contained databases, there's still room for growth in Kubernetes when it comes to advanced analytics and AI. In 2020, Kubernetes will become a critical piece in driving operational AI workloads.
As the analytics stack has shifted from SQL to the tightly coupled relational database to Hadoop to the cloud, it has become more disaggregated. Each of the original database's core elements can now be its own standalone system or layer. Technologies such as Kubernetes allow these different pieces to be put together in a way that simplifies running applications in any environment and transforms how software and applications are deployed and scaled, regardless of environment.
Now as we look at where today's data trends have taken us -- advanced analytics and AI in particular -- we see a greater need for distributing model training and processing. This requires orchestrating data in and out of your Kubernetes deployment. This is a hard problem to solve because of how the modern analytics stack is split apart. Data lakes (S3, HDFS, GCS, etc.), computational frameworks (Apache Spark, Presto, Hive, TensorFlow, etc.), and other dependencies such as catalog services (Hive Metastore, AWS Glue, KMS, etc.) all live and are managed on their own. As Kubernetes drives more operational AI, data orchestration technologies will become a critical piece of this trend.
Kubernetes reduces the complexity of deploying so many distributed systems together, and with disaggregation becoming more commonplace, we'll see more advanced and operational AI running on K8s clusters. The next set of challenges to solve will be data access, data locality, and data elasticity.
To prepare for operational AI in Kubernetes, look at technologies that enable data access in Kubernetes for remote data. Bringing data locality back into the environment is critical for tomorrow's AI workload requirements.
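To make the data locality idea concrete, here is a minimal Python sketch of a read-through cache: the first read of a remote object goes to the data lake, and repeat reads are served from a copy kept next to the compute. It is an illustration under stated assumptions, not how any particular data orchestration product works. The class name, the `fake_object_store` stand-in, and the example path are all hypothetical.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Keeps a local copy of remote objects so repeat reads stay in-cluster."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        # fetch_remote is a hypothetical reader for the remote data lake.
        self._fetch_remote = fetch_remote
        self._local: Dict[str, bytes] = {}
        self.remote_reads = 0  # counts how often we had to leave the cluster

    def read(self, path: str) -> bytes:
        # Only go to remote storage on a cache miss.
        if path not in self._local:
            self.remote_reads += 1
            self._local[path] = self._fetch_remote(path)
        return self._local[path]


def fake_object_store(path: str) -> bytes:
    """Stand-in for a remote data lake; a real reader would call S3, HDFS, etc."""
    return f"contents of {path}".encode()


cache = ReadThroughCache(fake_object_store)
first = cache.read("s3://bucket/train.parquet")   # hypothetical path, remote read
second = cache.read("s3://bucket/train.parquet")  # served from the local copy
```

A real deployment would also need eviction, invalidation when the remote data changes, and a cache shared across pods, all of which this sketch leaves out.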
2020 Trend #2: Data science will be simplified thanks to no-code/low-code technologies
In 2020, simplified data science will advance thanks to no-code/low-code technologies. We're seeing more advanced analytics and AI use in the enterprise than ever before; companies are betting their businesses on data-driven insights derived from AI and ML. Today, to get such in-depth analysis and insights on vast amounts of data, you need a data scientist or engineer -- someone with extensive programming skills and a very deep knowledge of mathematics. As you can imagine, these types of people are in very high demand and short supply.
What will 2020 bring?
AI in minutes, not weeks. This is what companies want to achieve, despite a limited supply of professionals who can deliver that. As a result, we'll see more technologies that make it possible for the end user (in most cases the business or data analyst) to glean deep insights from data on their own. These no-code or low-code technologies will bring machine learning to the forefront and make services smarter so the business isn't reliant on individuals with specific expertise. Instead of building and deploying models, for instance, we will see "bring your own model and we'll run the training on it for you" autonomous technology.
Technologies such as Google's Cloud AutoML (a "no-coding AI trainer") and Teachable Machine 2.0 (an on-ramp for new ML practitioners) are among the projects we've seen pop up lately. Such technologies will empower nontechnical end users to implement and run models while avoiding mistakes (which happen quite frequently when building AI models).
Over the past year we saw several new technologies pop up in the low-code AI space from C3.ai, Mendix, and Appian, all of which tout low-code platforms that require little to no coding experience while increasing developer productivity.
If this approach is right for you, make sure your no-code/low-code technology has built-in application logic, a managed or a declarative layer that sits above it, and a framework underneath it that works with the data sets and modules.
2020 Trend #3: Cloud powerhouses will focus on multicloud
We have been hearing people talk about the hybrid cloud for the past three years and multicloud has been coming up more over the past year. For the most part, it's only been talk, but in 2020 that changes. We'll see a marked increase when it comes to deploying and leveraging multicloud environments.
We're in the beginning phases of major cloud providers (AWS, GCP, Azure) bringing technologies to market that enable and power multicloud deployments. Not only are they encouraging the use of multicloud deployments even with a competitor, but they've built tools to enable them.
Microsoft Azure Stack Hub allows users to leverage Azure cloud services in their own data centers, and Microsoft just recently announced Azure Arc, a multicloud management layer that extends Azure to other public cloud platforms such as AWS and GCP.
AWS Outposts allows users to run AWS infrastructure on premises for a multicloud/hybrid architecture. Users can leverage any AWS service, infrastructure, or operating model in any data center, colocation space, or on-premises facility.
Google Anthos "brings the cloud to you" and enables applications to run in Google Cloud, in a private data center (which it already does), and/or in other public clouds (Azure and AWS). It enables users to be truly cloud agnostic and multicloud.
Cloud providers are acknowledging that different use cases demand different environments and are building offerings that allow their users to flexibly move from one data center (cloud) to another. This is a far cry from pushing users to go all-in on one cloud for everything.
The benefits of technologies like these are immense. Users get the freedom to deploy, run, and manage their applications with ease anywhere they want while meeting business and technical requirements. Gone are the days of having to learn different environments and different APIs. These technologies help enterprises avoid vendor lock-in, get better performance (lower latency) by using data centers closer to customers, comply with data governance requirements (e.g., the GDPR), and stay resilient when outages happen.
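As a rough illustration of how those benefits combine, the Python sketch below picks where to run a workload across several clouds. It drops regions that are unhealthy (resilience) or outside the permitted jurisdictions (governance), then takes the lowest-latency region that remains. The provider names, regions, and latency figures are invented for the example, and `pick_region` is a hypothetical helper rather than part of any vendor's tooling.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Region:
    provider: str      # hypothetical cloud provider label
    name: str          # hypothetical region name
    jurisdiction: str  # where the data legally resides
    latency_ms: float  # illustrative latency to the customer
    healthy: bool      # False while the region has an outage


def pick_region(regions: List[Region], allowed_jurisdictions: Set[str]) -> Optional[Region]:
    """Return the lowest-latency healthy region in an allowed jurisdiction, if any."""
    candidates = [
        r for r in regions
        if r.healthy and r.jurisdiction in allowed_jurisdictions
    ]
    return min(candidates, key=lambda r: r.latency_ms, default=None)


# Made-up inventory spanning three providers.
regions = [
    Region("cloud-a", "us-east", "US", 95.0, True),
    Region("cloud-b", "eu-west", "EU", 20.0, False),   # fastest, but down
    Region("cloud-c", "eu-central", "EU", 28.0, True),
]

# Data must stay in the EU, so the workload lands on cloud-c.
choice = pick_region(regions, {"EU"})
```

Because the selection does not care which provider a region belongs to, an outage at one vendor simply shifts the workload to the next best option.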
In 2020 we'll see enterprises doubling down on multicloud environments. Getting started is easy with these types of services becoming more mainstream.
Steven Mih is the CEO of Alluxio. He has over 20 years of experience in sales, business development, and marketing of enterprise technology solutions. His multifaceted go-to-market experience spans leading organizations including Aviatrix, Couchbase, Transitive, Cadence Design Systems, and AMD.