Why 2022 Will Be About Databases, Data Mesh, and Open Source Communities
From database operations to data lake integrations to data mesh, data and analytics tooling and practices will see significant advances in 2022.
- By Ben Bromhead
- December 15, 2021
The coming year will bring to a head a series of transformative trends around how enterprises harness the power of their data. Organizations should anticipate the arrival of powerful AI/ML predictive capabilities within the database itself, but many will end up taking the long view on the potential of AIOps. Data lake usage will mature significantly, with a sharper focus on data integration.
These two snowballing trends will, in turn, advance the concept of a decentralized "data mesh" as a term of art for vendors and adopters alike to describe a microservices-like distribution of data management responsibilities. Finally, open source communities across the data management and analytics space are poised to flex growing muscles in the coming year amid recent vendor shenanigans over open source licensing.
Trend #1: Machine learning enables predictive database indexing, analytics, and more
The same machine learning and predictive analytics capabilities driving new approaches across the information technology space will bear fruit when applied to the database itself in 2022. Why is this such an essential development for data and analytics professionals? Because ML will finally address complexity in database management that has been overlooked for far too long. The parameters of this huge challenge: data design is quite flexible, data usage patterns are difficult to predict, and storage management is out of the database's hands. Traditionally, expert database administrators analyze traffic patterns, measure storage growth, and then apply their knowledge in the pursuit of performant queries.
In contrast, ML-powered solutions can create indexes, perform reindexing, and manage storage using predictive models that estimate where a given piece of data is stored. Existing tools already point toward this capability: HypoPG, a PostgreSQL extension, lets an index advisor test hypothetical indexes without actually building them, and GitHub Copilot shows how ML can assist with writing code, including database queries. These are still early steps, but such is the nature of nascent technologies. It's only a matter of time until ML training sets and iterative improvements result in predictive indicators and outputs that are acceptable, then impressive, and then essential to database operation as we know it. Much progress down that road will occur during 2022.
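A model that predicts where data sits is what database research calls a learned index. The toy sketch below is only an illustration of that idea, not how any product named here works. It assumes a sorted, non-empty array of numeric keys, fits a simple linear regression from key to array position, and records the model's worst-case error so that each lookup only binary-searches a small window around the prediction. The class name `LearnedIndex` is hypothetical.

```python
from bisect import bisect_left


class LearnedIndex:
    """Toy learned index: a linear model predicts where a key sits in a sorted array."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ slope * key + intercept with ordinary least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case distance between predicted and actual position over all stored keys.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key):
        # Round the model's output and clamp it to a valid array position.
        guess = round(self.slope * key + self.intercept)
        return min(max(guess, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return a position of key in the sorted array, or None if it is absent."""
        guess = self._predict(key)
        # Every stored key lies within max_err of its prediction, so search only that window.
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        i = bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


if __name__ == "__main__":
    # Squared keys fit a straight line poorly, so max_err grows, but lookups stay exact.
    idx = LearnedIndex([i * i for i in range(200)])
    print(idx.max_err, idx.lookup(144), idx.lookup(145))
```

The recorded error bound is the design choice that matters: it keeps lookups correct even when the data is skewed and the model fits badly, at the cost of a wider search window. Published learned-index designs typically use hierarchies of models and also handle inserts, which this sketch does not.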
However, although AI/ML-powered indexing, workload management, and capacity management are quite promising, I expect the excitement around AIOps (operations and predictive remediation powered by ML-based decisions) to cool off and pass through the "trough of disillusionment" of its hype cycle in 2022. AIOps products have reached the market, but reality has thus far fallen short of their promise. Again, AI/ML is only as good as its training set, and AIOps as envisioned may well arrive eventually, just likely not in the coming year.
Trend #2: The data lake integration ecosystem flourishes, ushering in the age of data mesh
Data lakes will remain essential for analytics and data visibility, and 2022 will see rapid expansion of a thriving ecosystem around them, driven by enterprises seeking greater data integration. As organizations work out how to bring data from third-party systems and real-time transactional production workloads into their data lakes, technologies such as Apache Kafka and Apache Pulsar will take on those workloads and grow in adoption.
Beyond feeding data to BI reporting and analytics, technologies such as Debezium and Kafka Connect will also connect data lakes to services that need to react to data changes as they happen. Expect approaches built on an enterprise message bus to become increasingly common as well. Organizations positioned to benefit from the rise of integration solutions should move on these opportunities in 2022.
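Debezium works through change data capture (CDC): it reads a database's transaction log and publishes each row change as an event whose envelope includes `op`, `before`, and `after` fields. The sketch below shows, under simplifying assumptions, how a downstream consumer might fold those events into a current-state copy of a table, which is the core of keeping a lake table in sync with a production database. It assumes unwrapped JSON payloads already read off a topic, a single primary-key column named `id`, and in-order delivery. The function name `apply_change_event` and the sample events are hypothetical, and no Kafka client is involved.

```python
import json


def apply_change_event(table, event, key_field="id"):
    """Apply one simplified Debezium-style change event to an in-memory table.

    `table` maps primary-key values to row dicts. `event` is assumed to be an
    unwrapped payload with "op", "before", and "after" fields.
    """
    if event is None:
        # Tombstones (null record values) that can follow deletes are skipped.
        return table
    op = event["op"]
    if op in ("c", "r", "u"):
        # Creates, snapshot reads, and updates carry the new row state in "after".
        before, after = event.get("before"), event["after"]
        if before and before[key_field] != after[key_field]:
            # Defensive: drop the old key if an update changed the primary key.
            table.pop(before[key_field], None)
        table[after[key_field]] = after
    elif op == "d":
        # Deletes carry the old row state in "before"; "after" is null.
        table.pop(event["before"][key_field], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")
    return table


if __name__ == "__main__":
    # Hypothetical change stream for an "orders" table, as JSON strings read from a topic.
    raw_events = [
        '{"op": "c", "before": null, "after": {"id": 1, "status": "new"}}',
        '{"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "shipped"}}',
        '{"op": "c", "before": null, "after": {"id": 2, "status": "new"}}',
        '{"op": "d", "before": {"id": 2, "status": "new"}, "after": null}',
    ]
    orders = {}
    for raw in raw_events:
        apply_change_event(orders, json.loads(raw))
    print(orders)
```

A production pipeline would also need to handle schema changes, duplicate or out-of-order delivery, and writes to durable lake storage rather than a Python dict.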
Related to this trend (and to Trend #1 as well): the emerging concept of a data mesh will really come into its own in 2022. As an approach, a data mesh applies the principles of modern distributed architecture to the management of analytics data. It's founded on the ideas of decentralization and the distribution of responsibility for data within a given domain.
Organizations have different uses for data, whether it's BI or core business drivers or ML predictions. With a data mesh, the appropriate team can take responsibility for the data that means the most to them. Already we're seeing vendors trading on the data mesh as a concept in customer-facing communications, discussing how their products interact with customer organizational structures and how they provide teams with appropriate self-service data access. Organizations should expect to see much more formalization around the data mesh as a concept in the coming year, as well as emerging opportunities to implement its advantages.
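A data mesh is an organizational pattern more than a product, but the two mechanics described here, domain ownership and self-service access, can be made concrete. The following is a hypothetical, in-memory sketch: `DataProduct` and `MeshCatalog` are illustrative names, not any vendor's API, and a real implementation would add authentication, storage, and governance policies.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    domain: str       # business domain the data belongs to, e.g. "sales"
    owner_team: str   # team accountable for the product's quality and schema
    schema: dict      # column name -> type; the contract consumers rely on
    consumers: set = field(default_factory=set)


class MeshCatalog:
    """Hypothetical self-service catalog: domain teams publish, any team subscribes."""

    def __init__(self):
        self._products = {}

    def publish(self, product, team):
        # Decentralized responsibility: only the owning team may publish or replace a product.
        existing = self._products.get(product.name)
        if team != product.owner_team or (
            existing is not None and existing.owner_team != team
        ):
            raise PermissionError(f"{team} does not own {product.name}")
        if existing is not None:
            # Keep current subscribers when a domain republishes its product.
            product.consumers |= existing.consumers
        self._products[product.name] = product

    def subscribe(self, name, consumer_team):
        # Self-service access: consumers subscribe without going through a central data team.
        product = self._products[name]
        product.consumers.add(consumer_team)
        return product.schema

    def by_domain(self, domain):
        # Discovery: list the products a given domain currently publishes.
        return sorted(p.name for p in self._products.values() if p.domain == domain)


if __name__ == "__main__":
    catalog = MeshCatalog()
    orders = DataProduct("orders_daily", "sales", "sales-eng",
                         {"order_id": "int", "total": "decimal"})
    catalog.publish(orders, team="sales-eng")
    print(catalog.subscribe("orders_daily", consumer_team="ml-platform"))
```

The ownership check in `publish` is where the "appropriate team takes responsibility" principle shows up; `subscribe` stands in for the self-service access vendors describe.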
Trend #3: Where data and analytics vendors steer solutions away from open source licensing, communities will steer them back
Unfortunately, a recurring theme in recent years has been open source vendors playing shenanigans by moving their data-layer technologies to more proprietary licensing. At the same time, the open source communities around those technologies now have the will and the wherewithal to support robust forked versions and move projects forward without the original vendors.
Arguably the clearest recent example is Elasticsearch: Elastic.co's shift away from pure open source licensing prompted the community, with AWS at the forefront, to quickly demonstrate its power by releasing OpenSearch, a distributed open source search and analytics suite forked from Elasticsearch and Kibana.
In 2022, we'll see the tension of these scenarios come to a head. With the Elasticsearch example, the next year or so will tell whether users stick with Elastic.co as the dominant vendor and original creator or move to support the fully open source version. As of now, we're seeing organizations increasingly doing the latter. The strength of the OpenSearch community is already driving innovative new enterprise functionality as well.
It's crucial for organizations to remain aware of any licensing changes affecting solutions they rely upon for data technologies, as well as any open source options that become available to them due to this trend. If communities do prove their power to bring solutions back into the open source fold, it will discourage future licensing shenanigans and further ensure that valuable features remain available to everyone.
A Final Word
From database operations to data lake integrations to data meshes, data and analytics tooling and practices will see significant advances in 2022. Organizations would be wise to foresee how to best harness these trends and prepare for what's just around the corner.
About the Author
Ben Bromhead is the chief technology officer at Instaclustr, which provides a managed platform around open source data technologies. Prior to Instaclustr (which he co-founded in 2013), Ben had been working as an independent consultant developing NoSQL solutions for enterprises, and he ran a high-tech cryptographic and cybersecurity formal testing laboratory at BAE Systems and Stratsec. You can contact the author via LinkedIn or Twitter.