By using tdwi.org website you agree to our use of cookies as described in our cookie policy. Learn More

TDWI Upside - Where Data Means Business

Trends Data Analytics Professionals Should Pay Attention To In 2021

Predictions for AI and ML trends in data access, understanding new data, and executing information based on the data.

It’s that time of year again for prognosticating trends and making annual technology predictions. As we move into 2021, there are three trends data analytics professionals should keep their eyes on: OpenAI, optimized big data storage layers, and data exchanges. What ties these three technologies together is the maturation of the data, AI, and ML landscapes.

For Further Reading:

AI at Long Last?

Why Enterprises Should Take a Layered Approach to Their Data Strategy

Managing the Data Lake Monster

Because there already is a lot of conversation surrounding these topics, it is easy to forget that these technologies and capabilities are fairly recent evolutions. Each technology is moving in the same direction -- going from the concept (is something possible?) to putting it into practice in a way that is effective and scalable, offering value to the organization.

I predict that in 2021 we will see these technologies fulfilling the promise they set out to deliver when they were first conceived.

#1: OpenAI and AI’s Ability to Write

OpenAI is a research and deployment company that last year released what they call GPT3 -- artificial intelligence that generates text that mimics text produced by humans. This AI offering can write prose for blog posts, answer questions as a chatbot, or write software code. It’s risen to a level of sophistication where it is getting more difficult to discern if what it generated was written by a human or a robot. Where this type of AI is familiar to people is in writing email messages; Gmail anticipates what the user will write next and offers words or sentence prompts. GPT3 goes further: the user can create a title or designate a topic and GPT3 will write a thousand-word blog post.

This is an inflection point for AI, which, frankly, hasn’t been all that intelligent up to now. Right now, GPT3 is on a slow rollout and is being used primarily by game developers enabling video gamers to play, for example, Dungeons and Dragons without other humans.

Who would benefit from this technology? Anyone who needs content. It will write code. It can design websites. It can produce articles and content. Will it totally replace humans who currently handle these duties? Not yet, but it can offer production value when an organization is short-staffed. As this technology advances, it will cease to feel artificial and will eventually be truly intelligent. It will be everywhere and we’ll be oblivious to it.

#2: Optimized Big Data Storage Layers

Historically, massive amounts of data have been stored in the cloud, on hard drives, or wherever your company holds information for future use. The problem with these systems has been finding the right data when needed. It hasn’t been well optimized, and the adage “like looking for a needle in the haystack” has been an accurate portrayal of the associated difficulties. The bigger the data got, the bigger the haystack got, and the harder it became to find the needle.

In the past year, a number of technologies have emerged, including Iceberg, Hudi, and Delta Lake, that are optimizing the storage of large analytics data sets and making it easier to find that needle. They organize the hay in such a way that you only have to look at a small, segmented area, not the entire data haystack, making the search much more precise.

This is valuable not only because you can access the right data more efficiently, but because it makes the data retrieval process more approachable, allowing for widespread adoption in companies. Traditionally, you had to be a data scientist or engineer and had to know a lot about underlying systems, but these optimized big data storage layers make it more accessible for the average person. This should decrease the time and cost of accessing and using the data.

For example, Iceberg came out of an R&D project at Netflix and is now open source. Netflix generates a lot of data, and if an executive wanted to use that data to predict what the next big hit will be in its programming, it could take three engineers upwards of four weeks to come up with an answer. With these optimized storage layers, you can now get answers faster, and that leads to more specific questions with more efficient answers.

#3: Data Exchanges

Traditionally, data has stayed siloed within an organization and never leaves. It has become clear that another company may have valuable data in their silo that can help your organization offer a better service to your customers. That’s where data exchanges come in. However, to be effective, a data exchange needs a platform that offers transparency, quality, security, and high-level integration.

Going into 2021 data exchanges are emerging as an important component of the data economy, according to research from Eckerson Group. According to this recent report, “A host of companies are launching data marketplaces to facilitate data sharing among data suppliers and consumers. Some are global in nature, hosting a diverse range of data sets, suppliers, and consumers. Others focus on a single industry, functional area (e.g., sales and marketing), or type of data. Still, others sell data exchange platforms to people or companies who want to run their own data marketplace. Cloud data platform providers have the upper hand since they’ve already captured the lion’s share of data consumers who might be interested in sharing data.”

Data exchanges are very much related to the first two focal points we already mentioned, so much so that data exchanges are emerging as a must-have component of any data strategy. Once you can store data more efficiently, you don’t have to worry about adding greater amounts of data, and when you have AI that works intelligently, you want to be able to use the data you have on hand to fill your needs.

We might reach a point where Netflix isn’t just asking the technology what kind of content to produce but the technology starts producing the content. It uses the data it collects through the data exchanges to find out what kind of shows will be in demand in 2022, and then the AI takes care of the rest. It’s the type of data flow that today might seem far-fetched, but that’s the direction we’re headed.

A Final Thought

One technology is about getting access, one is understanding new data, and one is executing information based on the data. As these three technologies begin to mature, we can expect to see a linear growth pattern and see them all intersect at just the right time.

About the Author

Nick Jordan founded Narrative in 2016 after spending nearly a decade in data-related product management roles; he saw an opportunity to create a platform that eliminates inefficiencies in data transactions. Prior to Narrative, he led product and strategy at Tapad, where he helped evolve the company from a media business into a data and technology licensing business. (Tapad was acquired by Telenor in 2016.) Before joining Tapad, Jordan ran product management at Demdex, a data management platform, prior to its acquisition by Adobe in 2011. He also held roles at Yahoo! running pricing and yield management for newly acquired assets such as Right Media. You can contact the author via LinkedIn or Twitter.


TDWI Membership

Accelerate Your Projects,
and Your Career

TDWI Members have access to exclusive research reports, publications, communities and training.

Individual, Student, and Team memberships available.