TDWI Articles

Q&A: Modernization Strategies for Unstructured Data

As advanced analytics and machine learning drive an increase in the collection of unstructured data, data modernization projects need to be adjusted. In this Q&A, Kumar Goswami, CEO of Komprise, discusses data modernization strategies and trends.

For Further Reading:

3 Use Cases for Unstructured Data

Executive Q&A: Getting the Most from Unstructured Data 

What You Need to Know About Data Modernization and the Cloud

Upside: Let’s start with a big-picture question. Why is unstructured data management a top priority and a big business opportunity for IT leaders in large enterprises?

Kumar Goswami: We’ve reached a point at which the volume of unstructured data has begun to place an unreasonable strain on IT budgets, and organizations need a better way to manage it. Managing costs is only part of the problem, though. Corralling the data so that it can be more easily incorporated into cloud data lakes, machine learning, and other new analytics tools is ultimately an even bigger opportunity.

Unstructured data has traditionally been left out of the business intelligence landscape, but that’s changing. AI and machine learning (ML) rely upon these massive troves of file and object data to feed the models that drive innovation and research and ultimately improve outcomes.

We know the amount of unstructured data enterprises are gathering is growing exponentially. What’s holding enterprises back from pursuing an aggressive data management strategy?

The old adage -- if it ain’t broke, don’t fix it -- may apply here, but to be honest, data management is actually broken and people don’t know it. The trouble is that organizations often don’t have the full picture of their unstructured data. They don’t know which data is the most valuable in terms of access frequency or ownership, or where there are hidden silos of unused data eating up expensive storage.

For example, a large percentage of data could be moved to cheaper storage based on usage. Organizations typically actively use only 20 percent of the data they have in storage. That means the rest of it can go to deep archives or to warm tiers in the cloud that are less frequently accessed and less expensive. Of course, some of it can be deleted altogether. Complicating matters are all the options in the cloud: it is hard to keep up with the changes and to understand where you can get the greatest bang for the buck. So to answer your question, I think it’s both a lack of knowledge about data assets and a lack of urgency.

With an analytics approach to data management, IT leaders can start implementing a nuanced strategy that takes into account current and future data value. The analytics approach entails understanding your data profile before you take any actions on it, such as migrating or tiering it to the cloud. The key factors to know include age of data, access frequency, file types and owners, where it is stored, and costs. After compiling the analytics view of the data, storage IT people can meet with the data owners to understand the priority of their data and other requirements to inform data management plans. In this way, you can preserve space on your expensive data center storage, move more data to the cloud, and meanwhile consider those cloud data buckets for new uses (such as analytics) rather than just “migrate and forget.”
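As a rough illustration of what compiling such a data profile might look like, the following Python sketch walks a directory tree and totals bytes by file type, owner, and time since last access. It is a minimal sketch under stated assumptions, not any vendor's method: the function names (`profile_tree`, `age_bucket`) and the bucket boundaries are hypothetical, cost data is left out because it depends on your own storage pricing, and last-access times may be unreliable on file systems that do not update them.

```python
import os
import time
from collections import defaultdict

SECONDS_PER_DAY = 86400


def age_bucket(days_idle):
    """Map days since last access to a coarse bucket.

    The boundaries are hypothetical; adjust them to your own policy.
    """
    if days_idle < 365:
        return "under 1 year"
    if days_idle < 3 * 365:
        return "1-3 years"
    return "over 3 years"


def profile_tree(root, now=None):
    """Walk `root` and total bytes by file type, owner, and last access."""
    now = time.time() if now is None else now
    profile = {
        "total_bytes": 0,
        "by_type": defaultdict(int),
        "by_owner": defaultdict(int),
        "by_last_access": defaultdict(int),
    }
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # unreadable or vanished file: skip it
            ext = os.path.splitext(name)[1].lower() or "(none)"
            days_idle = max(0.0, (now - st.st_atime) / SECONDS_PER_DAY)
            profile["total_bytes"] += st.st_size
            profile["by_type"][ext] += st.st_size
            profile["by_owner"][st.st_uid] += st.st_size  # numeric owner ID
            profile["by_last_access"][age_bucket(days_idle)] += st.st_size
    return profile
```

Calling `profile_tree("/some/share")` on a path of your choosing returns a dictionary of totals that can serve as a starting point for the conversation with data owners.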

This brings us to the future of data management -- or as it could be called, data modernization. How do you define data modernization and what are the primary components of a comprehensive data modernization strategy?

A data modernization strategy must consider both data curation and data consumption. The first step is to recognize your current situation and find ways to move from a storage-centric to a data-centric approach. The journey begins in a somewhat disorganized data environment where organizations have minimal visibility and few, if any, insights across the entire data storage ecosystem. At this phase, costs are high and visibility across the data infrastructure is low.

As organizations’ strategies mature and move away from a siloed approach toward managing data rather than storage, they begin to treat data differently based on its characteristics, placing it into optimal storage targets through cloud tiering and embracing intelligent life cycle management strategies. Automated policy management follows, along with automated tagging to further refine data for search and action.
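To make the idea of policy-driven tiering and tagging concrete, here is a small Python sketch that assigns each file record to a storage tier based on how long it has been idle and tags it accordingly. Everything in it is an assumption for illustration: the `FileRecord` type, the tier names, and the 90-day and 365-day thresholds are hypothetical, and a real policy would likely weigh more characteristics than access recency alone.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical record describing one file's characteristics."""
    path: str
    size_bytes: int
    days_since_access: int
    tags: set = field(default_factory=set)


# Hypothetical policy: (maximum idle days, tier name), checked in order.
POLICY = [
    (90, "primary"),
    (365, "cloud-warm"),
]
DEFAULT_TIER = "cloud-archive"  # anything idle longer than the last rule


def assign_tier(record, policy=POLICY, default=DEFAULT_TIER):
    """Return the first tier whose idle-day limit the record satisfies."""
    for max_idle_days, tier in policy:
        if record.days_since_access <= max_idle_days:
            return tier
    return default


def apply_policy(records):
    """Tag each record with its tier and return a tier -> paths plan."""
    plan = {}
    for rec in records:
        tier = assign_tier(rec)
        rec.tags.add(f"tier:{tier}")
        plan.setdefault(tier, []).append(rec.path)
    return plan
```

Keeping the policy as plain data rather than hard-coded branches means the thresholds can change without touching the code that applies them.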

Ultimately, modernization fuels data monetization by curating data so that analytics applications, on premises or in the cloud, can use it easily and cost-effectively. IT then has the freedom to move data wherever it needs to go, whenever needed, without incurring unreasonable costs or hassle.

Should all big companies be thinking about data modernization or just companies in select, data-heavy industries? What are some signs that a company might need to pursue data modernization?

It has almost become a cliché to say that every company and public sector organization should become data-driven. So-called “data-heavy industries” have the particular burdens of managing legacy IT infrastructure, such as expensive storage hardware, many disparate silos, and internal politics. Smaller companies don’t have the technical debt but they do often require a top-down mandate to be data-driven, which then becomes core to the culture.

Signs that you need to pursue a data modernization strategy include:

  • Disagreements about who has the right data
  • Finger pointing and anecdote-driven decision-making
  • Skyrocketing storage costs and complexity
  • Expensive, long, and slow backups and restores
  • Spiraling cloud costs
  • Frustrated data consumers who cannot access and analyze the right data at the right time in the right format

Does a data modernization strategy automatically assume that all of an organization’s data should be moved to the cloud?

No, not at all. For the foreseeable future there’s going to be a continual push and pull between on premises and the cloud. For example, if an organization suddenly becomes more concerned about security, it might want to pull the more sensitive cloud shares and workloads back into its data center so it can have ultimate control.

How should an organization be thinking about the use of the cloud for moving, managing, and storing data? Which cloud technologies should make up a company’s data modernization stack?

IT organizations at large enterprises really need the flexibility of the cloud because they have so many different requirements for data based on geographical and departmental needs. As for specific technologies, it depends on the business. It’s important to take advantage of both cloud file storage and cloud object storage, because each has distinct advantages based on data requirements. Plus, there are significant cost, expertise, and time advantages to using the AI/ML tools in the cloud.

Once an organization has committed to data modernization, what are the first steps toward implementing the plan? What are some best practices and potential pitfalls?

Not surprisingly, a data modernization strategy starts with knowing what data you have and building a plan for how to turn it into insight and value. However, this needs to include unstructured data, which has not typically been part of the broader data modernization strategy. Instead, the focus for unstructured data has been on managing costs and ensuring reliable access and data protection.

Although these continue to be priorities, with the rise of cloud analytics and data lakes, as well as AI and ML engines, accessing and analyzing unstructured data has grown in importance. I recommend first segmenting your unstructured data needs into two groups:

-- IT insights: Percentage of hot versus cold data, owners, costs, and data placement savings via data mobility (migration, tiering, replication, etc.).

-- Data consumer insights: Who wants to filter and find data? What is the use case? What tools are in use today? How is this changing? What permissions or access controls will need to be established to deliver self-service capabilities to users?

One note, though: the processes and tools you select today should not lock you into a specific vendor, technology, or approach. They should allow you to make changes without having to redo or undo what you’ve done today. For example, say you use a proprietary approach to move cold data to the cloud today, thereby saving millions of dollars. Tomorrow, when you want to use that data for ML or analytics, you realize it’s in a proprietary format and can’t be accessed directly by the ML tools and services in the cloud. What do you do now? Quick near-term gains are great, but you want to avoid choices that impede future progress or make such progress prohibitively expensive.

How much should an organization expect to spend on data modernization? How do you expect those costs to change, grow, or decrease over time? How should an organization be thinking about ROI for its data modernization efforts?

How much you spend will depend on your starting point. Is this an evolution of your current data infrastructure or a reset? How has data been treated historically? What functional roles (and skills) are already in place? A data modernization strategy is as much about people and process as it is about the underlying technology. I always recommend you think big but start small and get quick wins to build credibility and trust. We’ve seen enough failures from the big-bang approach in the world of data warehousing to know that getting departmental buy-in is essential, as is ensuring the strategy is both bottom-up and top-down.

What comes after data modernization? What issues/factors should an organization consider when preparing its long-term data management strategy?

Data modernization is a journey. The cloud is evolving, as are data analytics and our use of AI/ML, which means how we store, manage, and use data is also evolving. It’s not a question of what comes after data modernization, but how to modernize so that the transformation you are now making doesn’t become antiquated by the time you are done. Data management should maintain flexibility and agility without creating lock-in to any platform or technology. Data modernization is not only about modernizing the infrastructure but also about liberating data so that it’s easier for users to search, find, and use it for new applications.

 
