5 Text Analytics Fundamentals You Should Know
Text analytics is becoming more mainstream and interest in it is growing. Here are five basic things about text analytics you need to know now.
- By Fern Halper, Ph.D.
- June 23, 2015
Data extracted from text can be extremely helpful in answering questions involving why and what. For example: Why are my customers unhappy? What is causing a specific problem in my operations? What are predictors of certain risk? Why is my brand reputation declining?
This text data comes from internal sources such as call center notes, e-mail messages, customer records, and claims. It also comes from external sources such as social media. TDWI research (Best Practices Report: Next-Generation Analytics and Platforms) indicates that text analytics is becoming more mainstream and that interest in it is growing. In a recent TDWI best practices report, for instance, 22 percent of respondents were already using text analytics and 36 percent were planning to use it in the next three years.
If you're considering text analytics for your organization, here are five fundamentals you should know.
1. Text analytics is different from search. Text analytics is the process of analyzing unstructured text, extracting relevant information, and transforming it into structured information that can be leveraged in various ways. The analysis and extraction process takes advantage of techniques that originate in computational linguistics/natural language processing (NLP), statistics, and machine learning. Text analytics is about extracting information from text; search is about retrieving a document, typically when end users already know what they are looking for. Text analytics can be used to augment search (as is often done in commercial search engines).
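To make the distinction concrete, here is a minimal Python sketch. The notes, the function names (search, to_structured), and the topic list are all hypothetical, and simple substring matching stands in for the NLP and statistical techniques a real tool would use.

```python
# Hypothetical call center notes, keyed by a made-up document id.
documents = {
    "note-1": "Customer unhappy about billing error",
    "note-2": "Billing question resolved quickly",
    "note-3": "Shipping delay reported",
}


def search(query, docs):
    """Search: return the ids of documents that contain the query."""
    return [doc_id for doc_id, text in docs.items() if query.lower() in text.lower()]


def to_structured(docs, topics):
    """Text analytics (toy version): turn every document into a structured row.

    Each row records, per topic, whether the topic word appears in the text.
    """
    rows = []
    for doc_id, text in docs.items():
        lowered = text.lower()
        row = {"doc_id": doc_id}
        for topic in topics:
            row[topic] = topic in lowered
        rows.append(row)
    return rows


matching_ids = search("billing", documents)
rows = to_structured(documents, ["billing", "shipping"])
```

Search hands back documents for a person to read; the structured rows can be counted, filtered, or joined like any other table.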
2. Text analytics can be used to extract various kinds of information. The typical kinds of information extracted from text include:
- Terms: Another term for keywords.
- Entities (often called named entities): Examples include names of persons, companies, products, geographical locations, dates, and times. Entities are generally about who, what, and where.
- Concepts: Sets of words and phrases that indicate a particular idea or meaning in which the user is interested. A concept might be "cost of living increase" or "healthcare benefits." A particular piece of content generally is only "about" a few concepts.
- Sentiment: Sentiment reflects the tonality or point of view of the text. The concept "unhappy customer" would lead to a negative sentiment.
Different vendors often use different terms to describe this kind of information. Some vendors talk about facts, themes, topics, and events. It is important to understand what each vendor offers. For instance, perhaps entity extraction alone will not be useful to your organization or maybe the text analytics vendor does not offer sentiment capabilities out of the box.
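As a rough illustration of the four kinds of information listed above, the following Python sketch pulls terms, entities, concepts, and sentiment out of a single note. Everything in it is an assumption made for the example: the word lists, the company name, and the extract function are hypothetical, and commercial tools rely on much richer linguistic and statistical models than these lookups.

```python
import re
from collections import Counter

# Hypothetical resources for illustration only.
STOPWORDS = {"i", "am", "with", "my", "on", "was", "the", "a", "an", "and", "of", "to"}
KNOWN_COMPANIES = {"Acme Corp"}  # toy gazetteer used for entity lookup
CONCEPT_PHRASES = {"healthcare benefits": ["healthcare benefits", "health plan"]}
POSITIVE_WORDS = {"great", "helpful", "fast"}
NEGATIVE_WORDS = {"unhappy", "broken", "late"}

# Matches dates written like "June 23, 2015".
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)


def extract(text):
    """Return terms, entities, concepts, and sentiment for one piece of text."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)

    # Terms: keyword counts after removing stopwords.
    terms = Counter(word for word in words if word not in STOPWORDS)

    # Entities: known company names plus dates found by pattern.
    entities = [name for name in sorted(KNOWN_COMPANIES) if name in text]
    entities += DATE_PATTERN.findall(text)

    # Concepts: a concept is present if any of its phrases appears.
    concepts = [
        concept
        for concept, phrases in CONCEPT_PHRASES.items()
        if any(phrase in lowered for phrase in phrases)
    ]

    # Sentiment: positive word count minus negative word count.
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"

    return {"terms": terms, "entities": entities, "concepts": concepts, "sentiment": sentiment}


note = "I am unhappy with Acme Corp. My health plan claim filed on June 23, 2015 was late."
result = extract(note)
```

The result is a structured record for the note, which is the form in which extracted text information is usually analyzed further.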
3. You may need to consider a taxonomy. In common usage, a taxonomy is a method for organizing information into hierarchical relationships. This is important in text analytics, especially when you're dealing with specific vocabularies in certain industries. For instance, you may create a taxonomy about products and services or about certain kinds of diseases.
The taxonomy can also use synonyms and alternate expressions. For instance, "yearly increase" might refer to "raises." Some vendors will provide baseline taxonomies out of the box, but don't expect them to work without tuning. Some vendors will tell you that you don't need a taxonomy -- that they work off of already created semantic networks that represent the world or that they have developed techniques that can get around this. For certain subjects, you may get away without building a taxonomy, but be prepared to iterate on what comes out of the tool in order to create your own categories.
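One simple way to picture a taxonomy with synonyms is as a nested mapping from categories to subcategories to the expressions that signal them. The Python sketch below assumes a small, hypothetical taxonomy and a categorize function invented for this example; it is not how any particular vendor stores or applies taxonomies.

```python
# Hypothetical taxonomy: category -> subcategory -> synonyms and alternate expressions.
TAXONOMY = {
    "compensation": {
        "raises": ["raise", "yearly increase", "cost of living increase"],
        "bonuses": ["bonus", "incentive pay"],
    },
    "benefits": {
        "healthcare benefits": ["healthcare benefits", "health plan"],
    },
}


def categorize(text):
    """Return (category, subcategory) pairs whose expressions appear in the text.

    Matching is plain substring lookup, so "raise" would also match "praise";
    real tools handle word boundaries and word forms more carefully.
    """
    lowered = text.lower()
    matches = []
    for category, subcategories in TAXONOMY.items():
        for subcategory, expressions in subcategories.items():
            if any(expression in lowered for expression in expressions):
                matches.append((category, subcategory))
    return matches


labels = categorize("Employees asked about the yearly increase and the health plan.")
```

Iterating on a taxonomy then amounts to adding, removing, or regrouping expressions as you review what the tool matches and misses.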
4. You can analyze the data separately or marry it with structured data. Organizations that use text data will often integrate it with traditional data sources to analyze it. They view it as simply another form of data. Analyzing text data without merging it with other data in your systems can also be quite informative. For instance, analyzing social media data is often done this way. Some organizations are even creating predictive models with text data that are just as good as or better than those that use both text and traditional structured data. It really depends on the kind of data you want to analyze and what business problems you're trying to solve.
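As a sketch of what marrying text with structured data can look like, the Python below attaches a text-derived feature to customer records. The customer ids, fields, notes, word list, and function names are all hypothetical, and the negative-word count is only a stand-in for whatever a text analytics tool would actually extract.

```python
# Hypothetical structured records keyed by customer id.
customers = {
    "C001": {"plan": "premium", "monthly_spend": 120.0},
    "C002": {"plan": "basic", "monthly_spend": 35.0},
}

# Hypothetical unstructured call center notes as (customer id, text) pairs.
call_notes = [
    ("C001", "Customer is unhappy, router is broken again"),
    ("C002", "Quick call, agent was helpful"),
    ("C001", "Still unhappy about late technician"),
]

NEGATIVE_WORDS = {"unhappy", "broken", "late"}


def negative_mentions(text):
    """Count words in the text that appear in the negative word list."""
    return sum(word.strip(".,!?") in NEGATIVE_WORDS for word in text.lower().split())


def merge(customer_records, notes):
    """Add per-customer text features to copies of the structured records."""
    merged = {
        customer_id: dict(record, negative_mentions=0, note_count=0)
        for customer_id, record in customer_records.items()
    }
    for customer_id, text in notes:
        if customer_id in merged:  # skip notes with no matching record
            merged[customer_id]["negative_mentions"] += negative_mentions(text)
            merged[customer_id]["note_count"] += 1
    return merged


merged = merge(customers, call_notes)
```

The merged records can feed the same reports or predictive models as any other structured data, with the text-derived columns sitting alongside fields such as plan and spend.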
5. A different mindset is required for analyzing text data. Text analytics does not have the same level of accuracy as some statistical techniques. It is best to think of the results as directionally correct and to go into the analysis with that perspective. The level of analytical skill required will depend on the problem you're trying to solve. Generally, understanding natural language processing is not a prerequisite for text analytics, although some training on the text analytics tool will be necessary.
Interested in text analytics? Want to try it out for yourself? Consider attending some of the hands-on workshops at the TDWI Analytics Experience July 26-31, 2015 in Boston or read the TDWI Checklist Reports Eight Steps for Using Analytics to Gain Value from Text and Unstructured Content and How to Gain Insight from Text.