Conducting Post-COVID-19 Analytics with Limited Data
The past year left companies with a dearth of reliable information on their customers, but there are ways to make the most of what you do have.
- By Kaycee Lai
- September 10, 2021
In these post-COVID-19 times, your analytics program may be running into a serious issue -- highly limited data. Think about it. Customer behavior during the lockdown was markedly different from what it looked like pre-2020. As we emerge from the lockdown, can we model customer behavior on that data? Probably not -- but we can't simply go back to using pre-lockdown data, either. Customer behavior has changed significantly in the last 18 months, and the older data may no longer reflect current behavior.
That leaves companies with a very short span of data that might be relevant. How do we conduct analytics with such limitations? For companies -- many of which are already struggling due to lost revenue during the pandemic -- the stakes are high for getting it right. Analytics-driven insights based on irrelevant data could have serious repercussions, resulting in poor marketing, customer service, or product development.
To be clear, there are no easy answers to this problem. However, there are ways to maximize the resources at hand and, even in what you might call a "data dearth," still derive keen insights. Here are a few things your organization can consider.
Shorten the Data Analytics Life Cycle
When your data is limited, your ability to iterate quickly is key. You may not have insight on long-term trends, so look for short-term trends instead. Get smaller answers quickly based on what you know now, and continually adjust as new insights become available.
To do this, you can't afford to wait the standard three months for analytics insight. You need answers now. Here are some tactics that can help in shortening the life cycle.
Provide full visibility into the process. When a business user poses a question to the analytics team, make sure that they're not waiting in the dark for the answer. Give them full visibility into the project's progress. This not only holds the analytics team accountable, but also allows a stronger partnership between analytics and business that allows everyone to exchange knowledge. You will inevitably find that business users have information that helps guide the analytics process beyond what they may have offered upfront.
Additionally, automate as much data prep as possible. Currently data scientists and analysts spend an inordinate amount of time preparing data for analytics. This might include removing bad data, making sure syntax is consistent, and identifying outliers, among other tasks. Leverage your available data-prep automation tools to shorten this process.
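As a minimal sketch of what such automation can look like, the snippet below uses pandas to bundle a few routine prep steps into one reusable function. The column names (`region`, `sale_amount`) and the cleaning rules are hypothetical; the right rules depend entirely on your own data:

```python
import pandas as pd

def prep_sales_data(df: pd.DataFrame) -> pd.DataFrame:
    """Automate routine prep: drop bad rows, normalize syntax, flag outliers."""
    df = df.copy()
    # Remove bad data: rows with missing or non-positive sales amounts
    df = df.dropna(subset=["region", "sale_amount"])
    df = df[df["sale_amount"] > 0]
    # Make syntax consistent: strip whitespace, unify case on region codes
    df["region"] = df["region"].str.strip().str.upper()
    # Identify outliers: flag values more than 3 standard deviations from the mean
    mean, std = df["sale_amount"].mean(), df["sale_amount"].std()
    df["is_outlier"] = (df["sale_amount"] - mean).abs() > 3 * std
    return df

# Toy example with messy input
raw = pd.DataFrame({
    "region": [" east", "WEST ", "east", None, "west"],
    "sale_amount": [120.0, 95.0, 5000.0, 80.0, -10.0],
})
clean = prep_sales_data(raw)
```

Wrapping these steps in one function means every new batch of data gets the same treatment automatically, rather than being cleaned by hand each time.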
Make Sure You Have Access to All Relevant Data
When you're already dealing with limited data, the last thing you want to do is let any potentially valuable information go to waste. This is often the case when an organization's data is highly fragmented and siloed. Here are two ways to remedy the problem.
First, combine data sets from federated sources. Data sets by themselves may not have the value you need, but when they're combined with attributes from other data sources -- internal or external -- the answers you need may be found. For example, a company that sells ice cream through grocery outlets and restaurants might have data on their internal sales for the past two months. Simply analyzing these sales might produce deceptive results, however, because the data may lack context. How do we know, for instance, what really caused a dip in sales in mid-June?
Such context could be provided by combining the data with historical weather data to look for correlations. Additionally, sales might be correlated with news cycle data. For instance, you might discover that on days when there was an unusually high amount of news about COVID-19 variants, sales from restaurants dipped.
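As a minimal sketch of this idea, the snippet below joins hypothetical daily sales figures with made-up weather data on a shared date key, then checks for a correlation. All values and column names are illustrative, not real data:

```python
import pandas as pd

# Hypothetical internal data: daily ice cream sales
sales = pd.DataFrame({
    "date": pd.to_datetime(["2021-06-14", "2021-06-15", "2021-06-16", "2021-06-17"]),
    "units_sold": [310, 280, 150, 320],
})

# Hypothetical external data: daily high temperatures
weather = pd.DataFrame({
    "date": pd.to_datetime(["2021-06-14", "2021-06-15", "2021-06-16", "2021-06-17"]),
    "high_temp_f": [88, 84, 62, 90],
})

# Join the two sources on their shared date key, then look for a correlation
combined = sales.merge(weather, on="date", how="inner")
corr = combined["units_sold"].corr(combined["high_temp_f"])
```

In this toy data, the mid-June sales dip lines up with a cold day, and the correlation makes that context visible; the same join pattern extends to news-cycle data or any other source with a common key.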
Combining data from such varied sources, however, can be tricky. Software that allows you to run SQL queries across multiple databases stored in different locations can enable your analytics team to accelerate combining data from different tables for analysis.
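The specific tooling will vary, but the general pattern can be illustrated with SQLite's `ATTACH DATABASE`, which lets a single SQL query join tables stored in separate database files. The schemas and data here are hypothetical stand-ins for siloed sources:

```python
import sqlite3
import tempfile
import os

tmpdir = tempfile.mkdtemp()
sales_path = os.path.join(tmpdir, "sales.db")
weather_path = os.path.join(tmpdir, "weather.db")

# Populate two separate database files, standing in for siloed data sources
con = sqlite3.connect(sales_path)
con.execute("CREATE TABLE daily_sales (day TEXT, units INTEGER)")
con.executemany("INSERT INTO daily_sales VALUES (?, ?)",
                [("2021-06-15", 280), ("2021-06-16", 150)])
con.commit()
con.close()

con = sqlite3.connect(weather_path)
con.execute("CREATE TABLE daily_weather (day TEXT, high_temp REAL)")
con.executemany("INSERT INTO daily_weather VALUES (?, ?)",
                [("2021-06-15", 84.0), ("2021-06-16", 62.0)])
con.commit()
con.close()

# A single SQL query spanning both databases via ATTACH
con = sqlite3.connect(sales_path)
con.execute("ATTACH DATABASE ? AS weather", (weather_path,))
rows = con.execute("""
    SELECT s.day, s.units, w.high_temp
    FROM daily_sales AS s
    JOIN weather.daily_weather AS w ON s.day = w.day
    ORDER BY s.day
""").fetchall()
con.close()
```

Production-grade federated query engines work across different database products and locations, but the payoff is the same: one query, joined results, no manual export-and-merge step.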
Second, you can integrate external data to supplement your internal sources. If data is lacking, you may be able to bridge some of the gap by integrating external data sources with your internal sources. This could include various paid subscription sources such as Nielsen, and it could include public sources such as government or other open data sets. It also may involve data from surveys already conducted or which you commission through vendors.
Additionally, social media platforms provide a treasure trove of external data insights. Although you can collect some of this social data yourself, you can also work with services that specialize in collecting and making this kind of data available.
Here the challenge is less about acquiring the data than about integrating it with current data sources. To do this efficiently, chief data officers may need to rethink what it means to make data ready for analysis. Do we take on the time, resources, and risk required to pull external data into internal repositories in a massive warehousing project? Or do we use more forward-looking techniques that leverage virtualization to create a data-fabric-style abstraction layer, allowing data analytics teams to draw from both internal and external sources as if they were a single source?
According to Gartner's Ashutosh Gupta, "A data fabric utilizes continuous analytics over existing, discoverable and inferenced metadata assets to support the design, deployment, and utilization of integrated and reusable data across all environments, including hybrid and multicloud platforms."
Such an approach would greatly speed up and simplify the data mining process by, among other things, removing the need for labor-intensive ETL (extracting, transforming, and loading data into a warehouse or other repository).
Bootstrapping for Better Insights from Smaller Data Sets
Generally in data science, more data yields more accurate results. However, you don't always have large data sets available to analyze. If you're dealing with only a month or two of data, your data sets will inevitably be smaller than you'd like. The problem of small data sets isn't unique to the COVID-19 era, and over the years some fairly innovative techniques have been developed to deal with it.
Bootstrapping involves repeatedly drawing random samples from a data set for analysis. Rather than just using the whole data set once and running the analysis, you run a series of analyses (using the same technique) multiple times on randomly sampled portions of the data set. Essentially you're treating the samples of the data set as data sets in and of themselves. The final analysis is based on the consensus of the combined analyses from the random samples.
Techniques for bootstrapping can be easily implemented in statistical and developer languages such as R and Python through freely available code libraries such as NumPy and scikit-learn.
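A minimal bootstrap of a mean, using NumPy and assuming a hypothetical 60 days of post-lockdown daily sales, might look like this:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical small data set: only ~60 days of post-lockdown daily sales
daily_sales = rng.normal(loc=200, scale=40, size=60)

# Bootstrap: repeatedly resample with replacement, run the same analysis
# (here, the mean) on each sample, and combine the results
n_boot = 5000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    sample = rng.choice(daily_sales, size=daily_sales.size, replace=True)
    boot_means[i] = sample.mean()

# Final estimate is the consensus of the resampled analyses,
# with a 95% percentile interval to quantify the uncertainty
estimate = boot_means.mean()
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
```

scikit-learn's `sklearn.utils.resample` offers the same resampling step as a one-liner, and the percentile interval above is the simplest of several bootstrap confidence-interval methods; the key benefit for a small data set is the uncertainty estimate you get alongside the point estimate.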
Although there may be natural limits to the accuracy of the insights on consumer behavior that your enterprise can get during these rather unusual times, there are things you can do to ensure that you obtain the best possible results. Ensure you have access to all of your data as well as any external data sources that might be useful, shorten the time it takes to analyze it, and use techniques that allow you to get more accurate insights from smaller data sets. Companies that take this approach will gain a significant advantage over the competition as we all race to adjust to the new normal.