How Data Dreams Will Finally Come True in 2021
If you've ever dreamt about smarter data, data democratization, or easier data modeling, 2021 could be the year your dreams become reality.
- By Dan Bruns
- December 18, 2020
When it comes to predictions for 2021, I'm sure many of my colleagues will talk about "Private Cloud dbPaaS," "Data Hub Strategies," "Event Stream Processing," or any other combination of emerging buzzwords. That reminds me of a recent Dilbert I came across. It went something like this:
Pointy-haired boss: Do we have any actionable analytics from our big data in the cloud?
Dilbert: Yes, the data shows that my productivity plunges whenever you learn new jargon.
Pointy-haired boss: Hmm, maybe in-memory computing will accelerate your applications…
The trends we've listed below, by contrast, come from our real-world feedback and interactions with our customers and community. Overall, customers are becoming much more data savvy, and many companies are ready to take the plunge into more complex areas of data management that were traditionally handled by specialized resources within IT.
Smarter Data
Remember the dawn of the Big Data era, when the goal of every data group in an organization became capturing every little bit of data that existed, regardless of whether there was a valid use case for it? I was head of technical marketing at a large web business back when the big data idea took off. Our IT group saved all the web server logs, which ostensibly gave us insight into all of our customers' browsing habits -- the classic "let's save it and figure out what to do with it later" situation.
They were asking us in marketing what we could do with the data. We already had the number-of-widgets-sold data stored within the reporting systems; they were looking for something more advanced. "OK," I said, "using this data, we should be able to write a model that can tell us, based on back-and-forth clickstream data without purchasing, who our frustrated customers are, and then we can send specialized emails to them to bring them back." Silence. "Well, the data we're capturing won't help with that," they replied.
Last I heard they're still collecting the log files and are still waiting for a use case that they can solve for ... seven years later!
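To make the use case from this story concrete, here is a minimal sketch of the kind of rule it describes: flag sessions that keep returning to the same pages without ever purchasing. The event format (session ID, page, purchase flag), the function name, and the revisit threshold are all assumptions made for illustration, not the model or data from the project described above.

```python
from collections import Counter


def frustrated_sessions(events, min_revisits=3):
    """Return session IDs with repeated page revisits and no purchase.

    `events` is an iterable of (session_id, page, is_purchase) tuples.
    This layout is hypothetical; real web server logs would need parsing first.
    """
    visits = {}        # session_id -> Counter of page views
    purchased = set()  # sessions that ended in at least one purchase
    for session_id, page, is_purchase in events:
        visits.setdefault(session_id, Counter())[page] += 1
        if is_purchase:
            purchased.add(session_id)

    flagged = []
    for session_id, pages in visits.items():
        # Every view of a page beyond the first counts as one revisit.
        revisits = sum(count - 1 for count in pages.values())
        if session_id not in purchased and revisits >= min_revisits:
            flagged.append(session_id)
    return sorted(flagged)


# Made-up sample data: s1 bounces between two pages, s2 buys, s3 looks once.
events = [
    ("s1", "/widgets", False),
    ("s1", "/cart", False),
    ("s1", "/widgets", False),
    ("s1", "/cart", False),
    ("s1", "/widgets", False),
    ("s2", "/widgets", False),
    ("s2", "/cart", False),
    ("s2", "/checkout", True),
    ("s3", "/widgets", False),
]
print(frustrated_sessions(events))
```

Even a rule this simple needs a session identifier, page-level events, and a purchase signal, which is why working backward from the use case to the data matters.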
In contrast, we're beginning to see more companies starting where they should: with the use case. They ask, "What are you trying to measure, sell, or improve?" and then create a data sourcing strategy from the answer, with the business taking much more of a lead in that space. Enterprises are learning that they don't need to collect every little bit of data to be successful. Accessibility, expediency, and timeliness of data are much more important now -- the classic "quality versus quantity" pendulum has swung back to "quality."
True Democratization of Data
Along with the idea of smarter data, business groups are much more technically sophisticated today. They want access to their data as well as data from other groups within their organization (or even outside of it) very quickly. Instead of collaborating on the next dashboard or visualization, we're seeing groups collaborate a few levels deeper -- into the data warehouse itself.
Although this idea may have been just a pipe dream in the past, today's data warehouse automation tools enable these technical people, who might not have traditional data warehousing experience, to build and manage their own data warehouses automatically, and most important, very quickly. Time to market has been drastically reduced by putting the data in the hands of people who'll use it from the outset.
I was part of another data warehouse project at a different company a few years ago where the IT group was going to build the "one data warehouse to rule them all." It was an ambitious project. However, the time horizon until any data was ready for use was two years! Businesses need to move much faster than that today. By the way, that project ultimately failed 18 months in. One of the main reasons was how difficult the data was to use: the data model was so complex that getting the name, address, and phone number of our customers took 13 joins!
Easier Data Modeling
Remember the days of third-normal-form (3NF), with its arcane definitions and hundreds of tables spread across the warehouse? It's beautiful from an academic perspective, but it is usually very difficult to use. The days of slower disks and processors necessitated that table arrangement in the name of efficiency.
However, processor power and disk speed have grown by orders of magnitude since these architectures were designed. As such, some efficiency can be sacrificed in the name of usability. In fact, we're seeing a strong resurgence of the Kimball dimensional model because it strikes a good balance between usability and efficiency. Our community wants the data arranged using data warehousing best practices, of course, but they also want to be able to understand everything that has been loaded and how it all fits together.
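As a rough illustration of why a dimensional layout is easier to work with, the sketch below builds a tiny star schema in an in-memory SQLite database: one customer dimension and one sales fact table. The table names, column names, and rows are invented for this example and do not come from any real warehouse or product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One denormalized dimension holds every descriptive customer attribute;
# the fact table holds only keys and measures. All names are hypothetical.
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name         TEXT,
    address      TEXT,
    phone        TEXT
);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    widgets_sold INTEGER
);
""")
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?, ?)",
    [(1, "Ada Example", "1 Main St", "555-0100"),
     (2, "Bob Sample", "2 Oak Ave", "555-0101")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [(1, 3), (1, 2), (2, 7)],
)

# Contact details come straight from the dimension: no joins at all.
contact_rows = conn.execute(
    "SELECT name, address, phone FROM dim_customer ORDER BY customer_key"
).fetchall()

# Tying a measure to a customer takes a single join from fact to dimension.
sales_rows = conn.execute("""
    SELECT c.name, SUM(f.widgets_sold)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    GROUP BY c.customer_key, c.name
    ORDER BY c.customer_key
""").fetchall()

print(contact_rows)
print(sales_rows)
```

The trade-off is some redundancy inside the dimension table in exchange for queries that a business user can read and write without tracing a long chain of lookup tables.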
A Closing Word
We're finally seeing business groups take a much more proactive role in data management and use. The shift has been about 10 years in the making, but it is definitely welcome. Everything old is new again!
Dan Bruns is the founder and president of Pyramart, where he is responsible for driving the strategy of automated data warehousing. You can reach the author via email or on his website at Pyramart.com.