Executive Q&A: Getting the Most from Unstructured Data
Enterprises are still struggling to mine the wealth of information contained in unstructured data. What are they doing right and wrong? Edward Cui, founder of Graviti, shares his perspective.
- By James E. Powell
- January 5, 2022
If you're analyzing only structured data, you're missing a wealth of insights. Edward Cui, founder of Graviti, explains how to access the value in phone calls, emails, and even social media posts.
Upside: What are some of the most common types of unstructured data enterprises aren't using in their analytics?
Edward Cui: Enterprises have been using structured data for analysis including transactional data, master data, and analytical data. However, over 80 percent of enterprise data is now unstructured. This includes emails, images, recordings, videos, text files, PowerPoint presentations, and social media data. It is obvious that this large volume of unstructured data contains business insights and value that haven't been mined yet.
To improve customer experience, phone calls, online chats, emails, and even comments on social media accounts can provide key signals for understanding customer sentiment. By analyzing this data, you can learn what customers like or don't like about your brand, product, or service, and eventually increase marketing effectiveness.
For enterprise management, unstructured data can be analyzed to improve employee productivity, deliver business intelligence, and promote innovation.
After so many years of recognizing the value of unstructured data, and with so much unstructured data being collected, enterprises still find it a challenge to incorporate it into their analytics. What are the challenges they face?
Unstructured data cannot be easily stored in a traditional column-row database like a spreadsheet. Because unstructured data comes in different formats such as videos, images, and phone calls, it does not have a unified standard to analyze. Even when enterprises transform unstructured data to structured data, they must rely on artificial intelligence to analyze it and to achieve their goals. Most enterprises don't have the budget or time to develop such a tool.
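As a rough illustration of what transforming unstructured data into structured data can look like, the sketch below uses Python's standard `email` module to flatten a raw email into a row that would fit a column-row table. The sample message, the column names, and the `email_to_row` helper are hypothetical and chosen only for illustration. A real pipeline would also have to handle attachments, encodings, and the AI-based analysis mentioned above.

```python
from email import message_from_string

# Hypothetical raw message, standing in for one item of unstructured data.
RAW_EMAIL = """\
From: customer@example.com
To: support@example.com
Subject: Late delivery
Date: Wed, 05 Jan 2022 09:30:00 +0000

My order arrived a week late and the box was damaged.
"""


def email_to_row(raw: str) -> dict:
    """Flatten one plain-text email into a row with fixed columns."""
    msg = message_from_string(raw)
    body = msg.get_payload()  # a str for simple, non-multipart messages
    return {
        "sender": msg["From"],
        "recipient": msg["To"],
        "subject": msg["Subject"],
        "sent_at": msg["Date"],
        "body": body.strip(),
        "word_count": len(body.split()),
    }


row = email_to_row(RAW_EMAIL)
print(row)
```

The headers map cleanly to columns, but the body remains free text. Understanding what the customer is actually saying is the part that still calls for further analysis.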
Another challenge is that the volume of unstructured data grows very quickly; one estimate puts the increase at about 55-65 percent per year. Whether enterprises choose to manage the data online (uploaded to the cloud) or offline (on an enterprise's local server), the cost of AI can be challenging.
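To see how quickly growth at that rate compounds, the short sketch below projects data volume over several years. The 100 TB starting volume and the `projected_volume` helper are assumptions made for illustration, not figures from the interview. Only the 55-65 percent range comes from the estimate cited above.

```python
def projected_volume(start_tb: float, annual_growth: float, years: int) -> float:
    """Volume after compounding a fixed annual growth rate."""
    return start_tb * (1 + annual_growth) ** years


START_TB = 100.0  # hypothetical starting volume, not a figure from the interview

for rate in (0.55, 0.65):  # the 55-65 percent range cited above
    for years in (1, 3, 5):
        volume = projected_volume(START_TB, rate, years)
        print(f"{rate:.0%} growth, {years} yr: {volume:,.0f} TB")
```

Under these assumptions, the volume grows roughly 9 to 12 times over five years, which is why storage and processing costs need to be planned for early.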
What progress has been made in the last 10 years in using unstructured data? Why hasn't more progress been made?
The world of 10 years ago was dominated by structured data. After 2012, though, sensors became cheaper, cell phones gradually became smartphones, and cameras were installed to make shooting easier. As a result, a large amount of unstructured data was generated, and enterprises entered uncharted territory, which made progress slow. Some of the inhibitors to progress in this area include:
- Complexity: Unlike structured data, which can be analyzed intuitively, unstructured data needs to be processed further before it can be analyzed, usually through artificial intelligence, with machine learning algorithms classifying and labeling its content. However, it is not easy to identify high-quality data in a data set because of the volume and complexity of unstructured data -- this has been painful for developer teams and a key challenge to data architectures that are already complex.
- Cost: Although enterprises recognize the value of unstructured data, cost can be an obstacle to making use of it. The cost of infrastructure, human resources, and time can hinder the development and implementation of AI and the data it analyzes.
What progress has been made in analyzing unstructured data in the last decade -- or has there been no progress at all? Are enterprises still stuck not being able to analyze it because they don't have the tools?
As noted, unstructured data didn't play a significant role in the last decade because of its complexity and rapid growth. Enterprises could not extract high-quality data from existing data sets without a suitably trained AI model. However, a great example of progress is Data Version Control (DVC), an open-source version control system for machine learning projects, launched on GitHub in 2017. DVC is built to make machine learning models shareable and reproducible. It is designed to handle large files, data sets, machine learning models, and metrics as well as code.
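To give a feel for the general idea behind versioning large files alongside code, the sketch below hashes a data file and writes a small pointer record that a source-control system could track in place of the file itself. It illustrates the concept only and is not DVC's implementation or API. The `write_pointer` helper, the `.ptr.json` suffix, and the sample file are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path: Path) -> Path:
    """Write a small JSON record describing the data file (hypothetical format)."""
    pointer = {
        "path": data_path.name,
        "size": data_path.stat().st_size,
        "sha256": file_digest(data_path),
    }
    pointer_path = data_path.with_name(data_path.name + ".ptr.json")
    pointer_path.write_text(json.dumps(pointer, indent=2))
    return pointer_path


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "images.bin"  # stand-in for a large data set
    data_file.write_bytes(b"version 1 of the data")
    first = json.loads(write_pointer(data_file).read_text())

    data_file.write_bytes(b"version 2 of the data")
    second = json.loads(write_pointer(data_file).read_text())

    # The pointer changes whenever the data changes, so each data version
    # can be tied to the commit that recorded its pointer.
    print(first["sha256"] != second["sha256"])
```

Because the pointer is tiny and changes whenever the data changes, it can be committed next to the code. That is one way a model can be tied to the exact data it was trained on.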
What recommendations or advice can you give enterprises that want to get started analyzing unstructured data? How should they begin? What best practices will make their job easier?
We recommend that enterprises have a fully prepared plan in place before they start analyzing unstructured data. Because the amount of unstructured data grows rapidly, enterprises must consider these questions carefully before collecting data: where to store the data (in the cloud or locally), how to identify high-quality data, how to develop the training model and iterate on it with newly collected data, and so on. Additionally, artificial intelligence professionals can help enterprises figure out the questions (and answers) before they begin collecting unstructured data for analysis.
What kinds of insights can unstructured data reveal?
Rich information can be extracted from unstructured data, and its value varies across industries. For example, we've recently partnered with a provider of intelligent logistics in the supply chain industry. The company provides AI monitoring services for streamlining warehousing operations. Based on the data collected, the client then leverages computer vision to identify and help organize the inventory. This data also helps predict and plan logistics well in advance, and improves the productivity, accuracy, and efficiency of production.
Another example comes from the automotive industry. Connected cars could receive direct product feedback from user interactions. Such unstructured data can be processed and analyzed for product planning, automobile development, quality improvement, manufacturing, and, of course, customer satisfaction.
How is Graviti working to make unstructured data easier to use?
Graviti aims to launch the first data platform that enables organizations to work with large volumes of unstructured data to power innovative AI applications. This platform eliminates the hassle and helps developers manage large amounts of unstructured data as a team.
Because most of the available information in AI development is low quality and unstructured, development teams usually spend over half of their time not on building models but rather on identifying, augmenting, or cleansing unstructured data, and that's just the beginning of their work. Graviti offers a more expert approach to data management that frees developers and gives them more time to analyze unstructured data and train artificial intelligence models. We help developers in three dimensions: data discovery, data iteration, and workflow automation.
James E. Powell is the editorial director of TDWI, including research reports, the Business Intelligence Journal, and Upside newsletter. You can contact him via email here.