
The Outlook for Data Warehouses in 2023: Hyperscale Data Analysis at a Cost Advantage

As the push to become more data-driven intensifies, enterprises will be turning to hyperscale analytics.

The challenge for business leaders as they look to build on digital transformations is not that they need more data for decision-making. Most businesses already have enough data -- and it just keeps growing.


What organizations really need are better ways to manage the terabytes, petabytes, and, in some cases, exabytes of data being generated by their users, customers, applications, and systems. They are looking to turn raw data into actionable insight without the escalating costs of consumption-based cloud pricing, where expenses rise sharply with use.

In 2022, CIOs were already navigating a tough global economy. Businesses of all sizes are looking for deployment options and licensing terms that let them do more with more data, without runaway costs.

Heading into 2023, the way many organizations will become more data-driven is through modernization of their data warehouses, pipelines, and tools. They will adopt new, cloud-native platforms that are not only faster and more scalable but also engineered for increasingly complex data sets that are integral to digital business. Here are the most important trends worth noting.

Trend #1: Hyperscale will become mainstream

Big data keeps getting bigger. For the past 20 years, enterprise databases have been measured in terabytes. These days, a growing number of organizations are dealing with petabytes of data, a thousand times more. A select few are wrangling exabytes -- a million terabytes.

In other words, data-intensive businesses are moving beyond big data into the realm of hyperscale data, which is orders of magnitude greater. That shift requires a reevaluation of data infrastructure.

What is driving this kind of data at super scale? More data is being created by more sources -- autonomous vehicles and telematics, sensor-enabled IoT networks, billions of mobile devices, healthcare monitoring, smart homes and factories, 5G networking, and edge computing, to name just a few.

The technology teams responsible for growing data volumes can see the writing on the wall -- even if their databases are not petabyte-scale today, it’s only a matter of time before they are. For this reason, scalability and elasticity -- the ability to add CPU and storage resources on demand -- have become top priorities.

There are many ways to scale up and scale out, from adding server and storage capacity on premises to auto-scaling “serverless” cloud database services to manually provisioning cloud resources. In 2023, data warehouse vendors are sure to develop new ways to build and expand these systems and services.
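
To make the idea concrete, here is a minimal sketch of the kind of elasticity policy a “serverless” service might automate. All thresholds, bounds, and names are hypothetical, chosen purely for illustration rather than taken from any particular vendor.

```python
# Minimal sketch of an auto-scaling (elasticity) policy, similar in spirit
# to what "serverless" cloud database services automate. All thresholds,
# node counts, and names here are hypothetical.

def target_node_count(current_nodes: int, cpu_utilization: float,
                      min_nodes: int = 2, max_nodes: int = 64) -> int:
    """Decide how many compute nodes the cluster should run.

    Scale out when sustained CPU utilization is high, scale in when low,
    and always stay within the configured bounds.
    """
    if cpu_utilization > 0.80:        # sustained pressure: add capacity
        desired = current_nodes * 2
    elif cpu_utilization < 0.30:      # mostly idle: shed capacity
        desired = max(current_nodes // 2, 1)
    else:                             # within the comfort band: hold steady
        desired = current_nodes
    return max(min_nodes, min(desired, max_nodes))

# Example: a 4-node cluster under 85% sustained load doubles to 8 nodes;
# an 8-node cluster at 20% utilization shrinks back to 4.
print(target_node_count(4, 0.85))   # -> 8
print(target_node_count(8, 0.20))   # -> 4
```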

It’s not just the overall volume of data that technologists must plan for, but also the burgeoning data sets and workloads to be processed. Some leading-edge IT organizations are now working with data sets comprising billions or even trillions of records. In 2023, we could see data sets of a quadrillion rows in data-intensive fields such as adtech, telecommunications, and geospatial analysis.
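
To put a quadrillion rows in perspective, a quick back-of-envelope calculation helps. The 50-bytes-per-row figure below is an assumption for illustration, not a measured value.

```python
# Back-of-envelope sizing for a quadrillion-row data set.
# The bytes-per-row figure is an assumption for illustration only.
rows = 10**15               # one quadrillion rows
bytes_per_row = 50          # assumed average stored row width
total_bytes = rows * bytes_per_row

petabytes = total_bytes / 10**15
print(f"{petabytes:.0f} PB")   # -> 50 PB, i.e., tens of petabytes
```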

Hyperscale data sets will become more common as organizations leverage increasing data volumes in near real time from operations, customers, and on-the-move devices and objects.

Trend #2: Data complexity will increase

The nature of data is changing. There are both more data types and more complex data types, with the lines between structured and semistructured data continuing to blur.

At the same time, the software and platforms used to manage and analyze data are evolving. New purpose-built databases specialize in different data types -- graphs, vectors, spatial data, documents, lists, video, and many others.

Next-generation cloud data warehouses must be versatile -- able to support multimodal data natively to ensure performance and flexibility in the workloads they handle.

The need to analyze new and more complex data types, including semistructured data, will gain strength in the years ahead, driven by digital transformation and global business requirements. For example, a telecommunications network operator may look to analyze network metadata for visibility into the health of its switches and routers, or a shipping company may want to run geospatial analysis for logistics and route optimization.
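
As a small illustration of the telecom example, here is a sketch that flattens semistructured, JSON-formatted network metadata into tabular rows for analysis. The record layout and field names are invented for this example; real telemetry schemas vary widely.

```python
import json

# Minimal sketch: flattening semistructured network metadata (JSON) into
# tabular rows. The record layout and field names are hypothetical.
raw_records = [
    '{"device": "router-17", "status": "up", "metrics": {"cpu": 0.42, "temp_c": 61}}',
    '{"device": "switch-03", "status": "degraded", "metrics": {"cpu": 0.91}}',
]

rows = []
for line in raw_records:
    rec = json.loads(line)
    metrics = rec.get("metrics", {})          # tolerate missing substructure
    rows.append({
        "device": rec["device"],
        "status": rec["status"],
        "cpu": metrics.get("cpu"),            # None when a field is absent
        "temp_c": metrics.get("temp_c"),
    })

# Flag devices under heavy load for a network-health view.
hot = [r["device"] for r in rows if r["cpu"] and r["cpu"] > 0.8]
print(hot)   # -> ['switch-03']
```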

Trend #3: Data analysis will be continuous

Data warehouses are becoming “always on” analytics environments. In the years ahead, the flow of data into and out of data warehouses will be not just faster but continuous.

Technology strategists have long sought to use real-time data for business decision-making, but architectural and system limitations have made that challenging, if not impossible. Consumption-based pricing can also make continuous data feeds cost-prohibitive.

Increasingly, however, data warehouses and other infrastructure are offering new ways to stream data for real-time applications and use cases.

Popular examples of real-time data in action include stock-ticker feeds, ATM transactions, and interactive games. Now, emerging use cases such as IoT sensor networks, robotic automation, and self-driving vehicles are generating more real-time data that needs to be monitored, analyzed, and utilized.
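
Below is a minimal sketch of what continuous analysis looks like in code: a rolling aggregate updated as events arrive, rather than computed in periodic batch loads. The sensor feed is simulated here; in practice the events would come from a message queue or change-data feed.

```python
import random

# Minimal sketch of continuous analysis: a rolling aggregate maintained
# as events arrive. The event source is simulated for illustration.

def sensor_stream(n_events: int):
    """Simulate an IoT sensor feed of temperature readings."""
    for i in range(n_events):
        yield {"sensor": f"s{i % 3}", "temp_c": random.uniform(15, 35)}

count, running_sum = 0, 0.0
for event in sensor_stream(1000):
    count += 1
    running_sum += event["temp_c"]
    if count % 250 == 0:                      # report as data flows in
        print(f"events={count} avg_temp={running_sum / count:.1f}C")
```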

The Year Ahead: Both Strategic and Cost Advantages

In 2023, the data warehouse market will continue to evolve, as businesses seek new and better ways to manage expanding data stores that, for a growing number of organizations, will reach hyperscale.

It’s not just more data but the changing nature of data -- increasingly complex and continuous -- that will compel data leaders to reassess their strategies and modernize their platforms.

Even so, there are limits to what businesses will spend on petabyte- and exabyte-size data warehouses. These platforms must deliver both strategic and cost advantages. In 2023, the data warehouse platforms that can do both are most likely to win in the market.

About the Author

Chris Gladwin is the CEO and co-founder of Ocient, whose mission is to provide the leading platform the world uses to transform, store, and analyze its largest data sets. In 2004, Chris founded Cleversafe, which became the largest object storage vendor in the world, according to IDC. The technology Cleversafe created is used by most people in the U.S. every day and generated more than 1,000 patents granted or filed. Chris was the founding CEO of the startups MusicNow and Cruise Technologies and led product strategy for Zenith Data Systems. He started his career at Lockheed Martin as a database programmer and holds an engineering degree from MIT. You can reach Chris via email, Twitter, or LinkedIn.

