Choosing the Right Data Architecture
Picking the right data platform for your enterprise can be a challenge. Evaluation at scale may be required.
- By Upside Staff
- June 26, 2023
In a recent "Speaking of Data" podcast, host Andrew Miller talked with Richard Winter about modern data warehouse platforms and architectures for the cloud. Winter is an independent consultant who has been working on data warehouse platforms for the last 20 years. His focus is the data platform at enterprise scale, and he helps customers with strategic decisions around data platforms, including data warehouses and data lakehouses.
Ten years ago, moving to the cloud entailed a large financial commitment. Today, that financial focus has faded. Winter says that in the minds of many executives, the big decision is whether they’re going to the cloud, followed by which one.
"Once they've made those two decisions, they feel like the big decisions are behind them." However, "if you're going to build a big data warehouse or data lakehouse in the cloud for a major enterprise, that's a multibillion-dollar decision. Over the next several years, you'll spend tens of millions, perhaps hundreds of millions, of dollars building databases, building solutions on top of that platform." Choosing a platform "is a very costly and disruptive decision to change once you've gotten downstream. You're casting your fate at that moment."
True, the cloud offers seemingly unlimited resources, but different platforms are engineered and optimized for different problems. "Many customers perceive all cloud data platforms as essentially the same in this way, and they're making their choices based on factors that are important in some situations." Winter warns that "distinctions between cloud platforms tend to get lost, and they get lost partly because everything's invisible -- how big the database is, how complex the database structure is, how many concurrent queries will be running." However, even routine business intelligence reporting may require machine learning or advanced analytics or involve special analytical problems that are extremely challenging at scale.
Choosing the Right Platform
For many enterprises, their data warehouses are small, departmental ones used for a single line of business, performing routine BI and reporting. In these cases, an organization "could choose any platform they wanted. They'd be right to use these more common approaches, where you say, ‘Oh, I'm going to do what my cloud vendor recommends, or I play golf with Joe, and Joe is very happy with such-and-such a data warehouse, so I'm going to use the same one.’ That strategy is fine. If you're building a data warehouse which isn't too big, it isn't too complicated, and doesn't have demanding requirements, any one of the popular platforms will be okay. Then you can do a traditional software evaluation, you can base it on customer satisfaction, price, the services they offer, or whatever it might be.
"However, if you have a strategic requirement, where it may make or break your business, whether the platform delivers the cost-efficiency or the performance or the availability that you really need, then you have to make an engineering-based evaluation."
Some enterprises end up using different platforms for different requirements. Winter explained that there are eight popular platforms he covers in a course he teaches, and "it's not like I would recommend choosing a different one on every data warehouse. I think it's a good idea to have, for your typical data warehouse or data lake requirements, a standard choice and use it as much as possible, just like business intelligence tools. Most companies really don't want to have 50 different tools. They'd like to have two or three and have the customer organization choose the one that fits them best. But for data platform requirements that are demanding, that's when you should do this engineering, architecture-based evaluation."
Sometimes an enterprise can predict what platform satisfies requirements with just an analysis or a modeling exercise. However, when the outcome is not so predictable, Winter recommends performing a benchmark or instigating a pilot program. His rule of thumb: interact with the vendors. Ask about their experience with scale and complexity. Ask for references from customers that appear to have a data warehouse, data lake, or lakehouse in production that is doing similar things at a similar scale.
"If it turns out that none of those references is close to your scale, doing what you want to do, then you know you're well beyond the frontier of the vendor’s product." If that’s the case, then you need to conduct tests to help control and manage your risk. "The best kind of test is a full-scale, realistic benchmark, and the best case is where you have more than one credible vendor."
Winter recommends testing two or three solutions and comparing the results. You can see if any vendor can demonstrate they have the capability to meet your most critical requirements. If multiple vendors pass this test, then examine differences in cost, complexity, and the agility of the solution. "These differences can be very revealing. Once you've illuminated what's going on via testing, you can get into much deeper conversations with the vendor about what you're seeing in the behavior of the system. We've had remarkable experiences doing this, even with the most modern systems in the cloud."
He cited one example in which, while the cloud platform was updating transactions, it slowed or stopped processing queries from online customers, and sometimes query processing didn’t restart. "It took seeing that actually happening to trigger the conversation with the vendor about why and to get to the bottom of the problem." When the vendor couldn’t explain the problem, it was clear the solution wasn’t right for the enterprise’s application. Winter provided other examples related to query processing that supported his conclusion: "It’s usually only by testing that you can sort out which platform is a good fit for you."
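As a rough illustration only, the two-stage comparison Winter describes (first screen vendors against the most critical requirements, then compare the survivors on cost) could be sketched in Python as follows. The record fields, the latency threshold, the vendor names, and every number below are hypothetical placeholders, not figures from the conversation, and a real evaluation would also weigh complexity and agility.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    """Hypothetical summary of one vendor's full-scale benchmark run."""
    vendor: str
    p95_query_seconds: float   # measured 95th-percentile query latency
    queries_stalled: bool      # did queries stall while updates were running?
    monthly_cost_usd: float    # projected cost at production scale


def shortlist(results, max_p95_seconds):
    """Stage 1: keep vendors that met the critical requirements.
    Stage 2: order the survivors by projected cost, cheapest first."""
    passed = [
        r for r in results
        if r.p95_query_seconds <= max_p95_seconds and not r.queries_stalled
    ]
    return sorted(passed, key=lambda r: r.monthly_cost_usd)


# Made-up example data; real values would come from the benchmark itself.
results = [
    BenchmarkResult("vendor_a", 4.2, False, 180_000.0),
    BenchmarkResult("vendor_b", 2.9, True, 120_000.0),
    BenchmarkResult("vendor_c", 3.5, False, 150_000.0),
]

for r in shortlist(results, max_p95_seconds=5.0):
    print(r.vendor, r.monthly_cost_usd)
```

In this sketch the fastest and cheapest candidate is excluded because its queries stalled during updates, which mirrors the point that a platform must first pass the critical-requirement test before cost differences matter.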
His best advice: it’s all about finding the fit between the platform and the customer’s requirements. "It's a very individual kind of requirements evaluation."
[Editor's notes: You can listen to the full conversation on demand.
Richard Winter is an industry expert in analytics data management at scale. He advises decision makers on the data strategy and the data architecture of the modern data warehouse and the data lake. Winter has been retained to make architecture and platform recommendations or perform engineering tests for over 50 leading enterprises, government agencies, and technology vendors. He is a recognized thought leader and an expert in platform evaluation and benchmarking, having published more than 100 technical reports and articles.
Mr. Winter will be conducting courses about modern data warehouse platforms and about architectures for the cloud at the TDWI Conference in San Diego (August 6-11, 2023).]