TDWI Articles

3 Use Cases for Unstructured Data

Enterprises ignore unstructured data at their peril. Here are three examples of where unstructured data is used to great advantage.

I was looking back through some questions raised at a recent webinar about modern analytics and came across this one, "What are some examples where unstructured or semistructured data is used for modern analytics?"

First, I define modern analytics as the analysis of often large and disparate data sources, an analysis that may use advanced algorithms and techniques such as geospatial analysis, text analysis, or machine learning. These are the analytics we've been hearing a lot about over the past five years. They often run in real time because organizations want real-time answers.

The disparate data part is important here; TDWI research reveals that organizations that utilize disparate data for analytics are more likely to measure a top- or bottom-line impact from their analytics efforts than those that do not.

Real-World Use Cases

Here are a few examples where unstructured data is being used in analytics today.

Classifying images and sounds. Using deep learning, a system can be trained to recognize images and sounds. The system learns from labeled examples so that it can accurately classify new images or sounds. For instance, a computer can be trained to identify certain sounds that indicate that a motor is failing. This kind of application is being used in automobiles and aviation.

Such technology is also being employed to classify business photos for online auto sales or for identifying other products. A photo of an object to be sold in an online auction can be automatically labeled, for example. Image recognition is being put to work in medicine to classify mammograms as potentially cancerous and in genomics to understand disease markers.
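
To make "learning from labeled examples" concrete, here is a minimal sketch. It deliberately swaps deep learning for a much simpler nearest-centroid classifier, and it assumes each motor recording has already been reduced to two numeric features. The feature names, values, and labels are all made up for illustration.

```python
from math import dist
from statistics import fmean

# Hypothetical labeled examples: (vibration level, dominant frequency in kHz)
# summarizing short motor recordings. All values are invented for illustration.
LABELED = [
    ((0.20, 1.0), "healthy"),
    ((0.30, 1.2), "healthy"),
    ((0.25, 0.9), "healthy"),
    ((0.90, 3.1), "failing"),
    ((0.80, 2.8), "failing"),
    ((1.00, 3.3), "failing"),
]


def train(examples):
    """Return one centroid (mean feature vector) per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def classify(centroids, features):
    """Assign the label whose centroid is closest to the new example."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train(LABELED)
prediction = classify(centroids, (0.85, 3.0))  # a new, unlabeled recording
```

A deep learning system would learn its own features from raw audio or pixels rather than rely on hand-picked ones, but the workflow is the same: fit on labeled data, then classify new inputs.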

As input to predictive models. Text analytics -- using natural language processing (NLP) or machine learning -- is being used to structure unstructured text. For example, organizations can extract entities (people, places, or things), themes, or sentiment from call center notes. That information can then be combined with other information about customers to build predictive models. The extracted entities, concepts, and themes can also be clustered using statistical techniques.

Additionally, companies can take verbatim survey responses, extract entities, concepts, and themes from them as data, and use that data for prediction even without structured data. Some organizations I've spoken with say that models built with text-derived data can outperform models that use only traditional structured data.
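
As a rough illustration of structuring text for a predictive model, the sketch below turns a call center note into numeric columns and merges them with structured customer fields. The keyword lexicons, field names, and sample note are hypothetical; a real project would more likely use an NLP library or a trained model than simple keyword counts.

```python
import re
from collections import Counter

# Hypothetical lexicons, invented for illustration only.
THEMES = {
    "billing": {"bill", "charge", "invoice", "refund"},
    "outage": {"outage", "down", "offline", "disconnected"},
}
POSITIVE = {"thanks", "great", "resolved", "happy"}
NEGATIVE = {"angry", "cancel", "frustrated", "unacceptable"}


def text_features(note):
    """Turn one free-text note into numeric columns."""
    tokens = Counter(re.findall(r"[a-z']+", note.lower()))
    features = {
        f"theme_{name}": sum(tokens[word] for word in words)
        for name, words in THEMES.items()
    }
    # Crude sentiment score: positive word count minus negative word count.
    features["sentiment"] = (
        sum(tokens[word] for word in POSITIVE)
        - sum(tokens[word] for word in NEGATIVE)
    )
    return features


def model_row(customer, note):
    """Combine structured customer fields with the text-derived columns."""
    return {**customer, **text_features(note)}


row = model_row(
    {"customer_id": 1017, "tenure_months": 26},
    "Customer is frustrated about a double charge on the bill and may cancel.",
)
# row == {"customer_id": 1017, "tenure_months": 26,
#         "theme_billing": 2, "theme_outage": 0, "sentiment": -2}
```

Rows like this one can be fed to whatever modeling tool the organization already uses, alongside the traditional structured columns.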

Chatbots in customer experience. Chatbots have been in the market for a number of years, but the newer ones have a better understanding of language and are more interactive. Here, based on who you are (e.g., whether you have status with the company) and what you asked for (using NLP for text analysis), you will be routed to the right customer representative to answer your specific questions. Other companies use chatbots for personalized shopping that involves understanding what you and people similar to you bought, in addition to what you are searching for. These use cases require smart NLP-based search as well as machine learning.
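
The routing idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: intent detection is reduced to keyword overlap rather than real NLP, and the intents, keywords, and queue names are hypothetical.

```python
import re

# Hypothetical intents and trigger words, invented for illustration.
INTENT_KEYWORDS = {
    "billing": {"bill", "charge", "refund", "payment"},
    "technical": {"error", "broken", "crash", "login"},
}


def detect_intent(message):
    """Pick the intent whose keywords overlap most with the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    scores = {
        intent: len(words & keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"


def route(message, has_status):
    """Choose a queue from who the customer is and what they asked for."""
    tier = "priority" if has_status else "standard"
    return f"{tier}-{detect_intent(message)}"


queue = route("I was charged twice and need a refund", has_status=True)
# queue == "priority-billing"
```

Note that "charged" does not match the keyword "charge" here; only "refund" does. Handling word variants like that is one reason production chatbots rely on proper NLP rather than exact keyword matching.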

Accessing and Using Unstructured Data

In our research we've found that the use of unstructured data (primarily text) is still in the early stages of maturity; in our surveys, the share of respondents analyzing text is typically at early mainstream levels. That share is much lower for images or other unstructured data. However, this area is set to grow as more organizations see the value of using text and other unstructured data for insight.

Vendors, too, are providing solutions in the space. For instance, established analytics vendors such as SAS, IBM, and OpenText already provide tools for structuring unstructured text data for use in analytics. Companies such as Datawatch provide tools to extract semistructured data (e.g., from reports) in PDFs and text files into rows and columns for analysis. Open source is another avenue for unstructured data analysis.
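
For a sense of what extracting semistructured report data into rows and columns involves, here is a small sketch using only Python's standard library. It does not represent any vendor's product. The report layout, column names, and regular expression are assumptions made for the example, and the text is assumed to have already been pulled out of the PDF or text file.

```python
import csv
import io
import re

# Hypothetical report text, as it might look after extraction from a file.
REPORT = """\
Monthly Sales Report        Page 1
Region      Units    Revenue
North         120   $4,800.00
South          95   $3,610.50
--- end of report ---
"""

# A data line is: a region name, a unit count, and a dollar amount.
ROW = re.compile(
    r"^(?P<region>[A-Za-z]+)\s+(?P<units>\d+)\s+\$(?P<revenue>[\d,]+\.\d{2})$"
)


def parse_report(text):
    """Keep only the lines that look like data and convert them to typed rows."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append({
                "region": match["region"],
                "units": int(match["units"]),
                "revenue": float(match["revenue"].replace(",", "")),
            })
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text ready for an analytics tool."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["region", "units", "revenue"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_report(REPORT)
table = to_csv(rows)
```

The title, header, and footer lines are skipped because they do not match the pattern, which leaves two clean rows for analysis.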

Other vendors are providing ways to access unstructured data. Companies such as Cambridge Semantics add a semantic layer to the data lake to help catalog both structured and unstructured data. A newer group of companies (such as Cloudtenna) provides a way to search unstructured files that are scattered across the company, which can help with unstructured data access. Both kinds of vendors use more advanced analytics such as NLP or machine learning as part of the solution.

The Bottom Line

If your organization hasn't started to mine its text and other unstructured data, consider doing so. There's value to be had in them thar hills!

[Editor's note: Image and text analysis will be among the topics discussed at the TDWI Orlando Leadership Summit, November 12 and 13, 2018.]

About the Author

Fern Halper, Ph.D., is well known in the analytics community, having published hundreds of articles, research reports, speeches, webinars, and more on data mining and information technology over the past 20 years. Halper is also co-author of several “Dummies” books on cloud computing, hybrid cloud, and big data. She is VP and senior research director, advanced analytics at TDWI Research, focusing on predictive analytics, social media analysis, text analytics, cloud computing, and “big data” analytics approaches. She has been a partner at industry analyst firm Hurwitz & Associates and a lead analyst for Bell Labs. Her Ph.D. is from Texas A&M University. You can reach her at [email protected], on Twitter @fhalper, and on LinkedIn at linkedin.com/in/fbhalper.

