Data Management During and After Coronavirus
Coronavirus will challenge business strategists and data architects to juggle a range of unique circumstances over the coming year and beyond.
- By Barry Devlin
- March 25, 2020
We're in the first few yards of a miles-long journey with the rapidly unfolding COVID-19 pandemic. Although its actual duration remains a subject of intense speculation, there is little doubt that, as a society, we are embarking on a marathon rather than a sprint. Indeed, as sociologist and political economist William Davies gently points out, "a crisis of this scale will never be truly resolved until many of the fundamentals of our social and economic life have been remade."
The impacts of the crisis fall broadly into three categories: medical, social, and economic. Although the medical impact is both the most severe and the most distressing, I will focus here on the social and economic areas and on a number of emerging trends in data strategy and architecture -- indeed, in all aspects of enterprise architecture -- with implications across a broad swathe of industries, service organizations, and governmental bodies.
Social Distancing to Redefine Distributed Data Strategy
Despite the emerging medical crisis, businesses still need to crunch numbers, analyze trends, gain insights, and make decisions. In fact, in a time of increasing uncertainty, they need to make better decisions with greater urgency and wider import at higher levels of risk. This demands -- more than ever -- the highest-quality, well-understood data, delivered at speed and in considerable volumes.
Over the past decade, many organizations have developed data delivery and analysis pipelines where the early stages are performed by data scientists and analysts working on premises with data from highly structured warehouses and more fluid lakes. They depend on a physical infrastructure consisting of a mix of powerful local or remote storage and processing connected through high-bandwidth, secure networking. The work often demands powerful workstations with multiple, large visual displays. Cleansed data, prepared information, and pre-baked insights are made available to decision makers who may rework them lightly or more intensively on their laptops or smartphones in downstream self-service work.
Ongoing social distancing and home working by data scientists and analysts disrupt this pipeline, shifting the heavy-lifting processes to widely distributed, largely unsecured home offices on public networks. This presents obvious challenges to IT departments in hardware, software, network capacity, and security infrastructure.
Of more significance are the data strategy and architecture questions it raises. Where should the data be stored? Who is responsible for its governance and how will that be enforced? How will these new multilocational pipelines be built and managed? How will problems of duplicate data and change synchronization be addressed?
A simplistic analysis might suggest that businesses with cloud-centric data strategies will be in a stronger position than those that have been slower to move from on-premises approaches. After all, the argument is that cloud data is highly distributed and widely accessible whether from office cubicles or kitchen tables.
However, this leap to the physical solution masks significant governance challenges when sensitive data (in terms of privacy, commercial confidentiality, timeliness, consistency, and other factors) is distributed across on-premises, cloud, and now remote workstation stores and processors. A revamped distributed data architecture -- extending existing data warehousing and data lake concepts and embracing extensive data virtualization and data cataloging -- will be needed in short order to address these challenges.
Economic Impacts to Drive New Priorities for Development
Early estimates of the economic impact of the pandemic suggest a harder hit than that of the Great Depression of the 1930s. With this shock compounded by the recent rise in extreme nationalist and anti-globalization thinking, business strategies across a wide range of industries will likely turn to self-preservation and cost reduction.
The focus for digital transformation in the coming months will likely shift from market domination to basic survival. Analytics will turn its models away from predicting customer satisfaction or upselling and toward optimizing production and overcoming logistics and distribution issues as global supply chains become brittle or break completely.
Some implications for data strategy and data architecture seem clear already. Do more with less. Wring more value out of existing information investments and legacy systems. Stop that expensive migration of a largely functioning data warehouse to the cloud. Data lakes with suspect return on investment may finally be drained, while successful lakes may have to stay on Hadoop for a while longer. Limitless streaming of entertainment videos will be curtailed to prioritize transport of data of real business or societal value.
Other implications are still emerging. With business focus shifted from marketing magic to logistics and operations, data quality requirements may finally come to the fore, especially in data lakes. Real-time data may finally be limited to areas with real-world impact and value, rather than instant gratification of consumer desires. Privacy concerns may have to be set aside in favor of disease tracking and patient management needs.
Time to Reassess Your Data Strategies
It's early days, and these ideas are highly speculative at this stage. However, there's little doubt that all our current business and IT certainties are in flux as a result of the coronavirus pandemic. This is no time to panic, but it certainly is a good time to reassess your data strategies as you set up your home office.
Dr. Barry Devlin defined the first data warehouse architecture in 1985 and is among the world’s foremost authorities on BI, big data, and beyond. His 2013 book, Business unIntelligence, offers a new architecture for modern information use and management.