Text Analytics: Working Smarter

Help ensure success in text analytics by understanding its value and by starting small.

Interest in text analytics is growing as companies begin to understand its value in providing insights into unstructured text. As the technology matures, it’s now possible to analyze very large amounts of data. In this interview, Seth Redmore of Lexalytics discusses dealing with text as data, getting started in text analytics, and how companies are gaining a competitive advantage with text analytics.

Redmore, VP of marketing and product management at Lexalytics, has over 20 years of experience in technology (half of it spent in text analytics), as both a user and a vendor. He is the co-founder of Netiverse, which built a high-speed server load-balancing system purchased by Cisco in 2000. At Cisco, Redmore built Cisco's internal text analytics solution for reputation management, which uses Lexalytics' software engine at its core.

BI This Week: Let’s start with a question about the ways in which text analytics is being used in business.

Seth Redmore: Text analysis is a very horizontal technology. It’s much like search -- I really can’t think of a single person who doesn’t use search in their daily job. Text analysis is a smarter search -- that’s one way to look at it. ... After all, if you’re dealing with information, chances are high that some of that information is text.

If I asked you what kinds of companies are using text analysis these days, it sounds like the answer would be: everybody.

Yes, although some people are using it more explicitly than others. Classic examples of companies using text analysis include media-monitoring companies; anybody dealing with customer service; and manufacturing and pharmaceutical companies, which might use it to determine root causes or to look at side effects. It’s really everybody, everywhere.

For a company interested in getting into text analytics, where’s a good place to start? Is there a typical area to begin?

Customer feedback is always a good place. Looking at what people are saying about you is interesting, because you know that people are going to complain. That’s really what it comes down to -- you already know what you should be seeing.

There will be some surprises; there will be some methodology you’ll have to figure out. However, the best place to start with this kind of technology is something you know. Then, when you look at it, there’s some recognition [of the patterns]: "Oh, OK. Yes, that’s right. I see. What’s going on with that?" You can understand what you’re seeing; you’re not looking at something completely new. Customer feedback is often a really good place to start because you’ll see patterns develop that you’re expecting.

What are some of the issues to watch out for as you get started in text analytics?

Remember that meaning can be ambiguous. After all, we’re dealing with language. There’s ambiguity all over the place. Put two people in a room and they’re almost certain to find some way to confuse each other, right? Put a human and a machine in a room and the odds of confusion go up. The machine is consistent. That’s the great thing about machines. However, a machine is going to interpret the text, or the world, in a particular way, so it’s important to understand that the machine has a view of things. You can tune and tweak things, but it really does come down to understanding the ambiguity of language.

Some areas are easier to deal with than others. For example, security is hard because there’s so much negativity -- viruses, worms, and zero-day attacks. That’s just how it’s discussed. Healthcare is another area where you tend to see a lot of negative discussions [that aren’t necessarily product-related].
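To make the security example concrete, here is a toy lexicon-based sentiment scorer in Python. It is a minimal sketch, assuming a simple word-weight lexicon; the word weights, the `SECURITY_OVERRIDES` table, and the `score` function are hypothetical and are not how Lexalytics' engine works. It shows how a generic lexicon reads a favorable security review as negative, and how a domain-tuned lexicon changes that view.

```python
# Hypothetical generic sentiment lexicon (illustrative weights only).
GENERIC_LEXICON = {
    "virus": -1.0, "worm": -1.0, "attack": -1.0, "vulnerability": -1.0,
    "blocked": -0.5, "great": 1.0, "reliable": 1.0, "fast": 1.0,
}

# Hypothetical security-domain overrides: in this domain these words describe
# the subject matter, not the writer's opinion of the product.
SECURITY_OVERRIDES = {
    "virus": 0.0, "worm": 0.0, "attack": 0.0, "vulnerability": 0.0,
    "blocked": 0.5,
}


def score(text: str, lexicon: dict) -> float:
    """Sum lexicon weights over whitespace tokens; unknown words count as 0."""
    tokens = [t.strip(".,!?:;").lower() for t in text.split()]
    return sum(lexicon.get(t, 0.0) for t in tokens)


review = "Great scanner: it blocked every virus and worm attack, fast and reliable."
generic = score(review, GENERIC_LEXICON)
tuned = score(review, {**GENERIC_LEXICON, **SECURITY_OVERRIDES})
print(generic, tuned)  # the generic lexicon scores this praise as negative
```

The only thing that changes between the two scores is the lexicon, which is one way to picture the machine "having a view of things" that you tune for a domain.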

What are some big mistakes you see people making around text analytics?

My big piece of advice is always: Try starting small and start with content you know. Start with a small system and look to see whether you can get an interesting result out of just a few thousand pieces of content. Can you find something that’s interesting? More will end up being better, but start small and start with [a relatively simple] question you’re trying to answer.

So many people start with, "I have all these stats -- there’s got to be something good in here." Actually, it’s better to start from the standpoint of, "I have all these questions. When we try to answer those questions, what data do I need?"

So, start with a question and work back from there?

Yes, and again, start small. Don’t drop tons of money on a product until you’ve actually seen that you can answer a question using the data you have. It might be subpart B of Question 2, but try to answer something.

Where’s the return on investment in text analytics?

ROI is a tricky one; there are lots of possible ways to look at it. ... Oftentimes, the ROI comes from making a difference to the bottom line, and that can be hard to calculate. We’re working on surveys with different customers and different analysis companies to try to sort out ROI. A lot of it really comes down to: OK, if I’m not finding the information that I need through search, maybe I need text analysis. If I have text and I’m trying to figure out what people are saying about my company in it, I need text analysis, right? Then the question becomes: OK, what’s the most effective way to provide it?

Lots of times, it isn’t an ROI calculation. Once companies have the application, they may try to figure out the most effective place to apply it. Is it uplift models with respect to future purchase probability? Is it e-commerce? Where can we use it?

Would you say that text analytics might make a company more efficient in various areas but it’s difficult to measure and quantify?

Exactly. I’ve seen companies that said, "Hey, we just want to understand where we were being discussed and where we weren’t, so that we can make a conscious decision about whether we put resources here or there." Maybe the decision was the right decision, maybe the decision was the wrong decision, but they were able to make that decision consciously.

Let’s talk about gathering the data that you want to analyze. How challenging is it to collect enough data for meaningful text analytics and have the sample be clean enough and large enough to be useful?

It depends on the question you’re trying to answer. If you want to know how people perceive your company in social media, for example, that’s easy. If you want to do root cause analysis of a particular manufacturing problem, you’re going to have to work with the manufacturing team to actually get all their case sets. That can sometimes be a real political battle; maybe that team doesn’t want its little defect talked about, right?

The political side of an enterprise is one of the places where we’re seeing real change. This used to be a big problem -- getting access across companies. The value is really there and I think people are getting snagged around enough that they’re like, "OK, fine. I’ll share my data, but I want access to the whole system."

It sounds like people are beginning to realize the value of this technology and that’s making them willing to share their data in exchange for the big picture.

Yes, absolutely. Companies are saying, "Hey, we have to answer these questions and we need the information from all the silos. We can’t just do it from one viewpoint."

Who typically handles text analytics within a company? What kinds of skills are needed, and how complex is it to set up initially?

I think it’s better to look at where, organizationally speaking, rather than who, in terms of personnel.

The places in organizations that are pushing text analysis the hardest are PR and customer relations, especially customer support. We’re also starting to see a lot more action in central business planning and forecasting -- people are trying to use discussions to understand and influence where marketing dollars are spent and, consequently, which product lines are going to have the most uptake.

Customer support is obvious. It’s asking: What are people unhappy about?

PR is the exact opposite. It’s saying, "OK, I’ve got all these products. Which ones are people talking about?"

Text analytics sounds like a technology that takes off once people see how it works and start getting some answers, and start seeing how another department is getting answers.

Yes; you often end up with a lot of [small, separate] efforts within companies. It’s rare to see a central text analysis [effort] -- we’re starting to, but it’s often with someone in the data warehousing group. People start with a question -- that’s why you see text analytics mostly as point applications right now, but as you said, once people see it, they start to think, "Aha, I have a question. I want to ask my question."

Where would you say text analytics is in terms of maturity? You’ve been in the field quite a while -- where are we now, and where do you see the technology going?

I think we’ve crossed the proverbial chasm -- the long-term prognosis for text analysis is that it disappears as a separate thing. It’s analysis, right? It’s either analysis or information retrieval. It will be buried in with other analysis tools or with information retrieval tools, you know? It’s just another form of information. It just happens that it’s been complicated enough, and the technology has been evolving quickly enough, that it’s treated separately, but there is no reason for it to be treated separately. It’s just data.

So it will be like search -- almost transparent to what we do? We just assume that it’s going to work and we use it?

Exactly like search -- a completely horizontal technology that you just use.

Turning specifically to your company, how does Lexalytics fit into this discussion? Can you talk about your text analytics engine?

We are an analytics engine company, one of the very few pure-play text analysis companies around. We’ve been doing it for 10 years. We came from a media-monitoring, voice-of-the-customer background. The language that’s common in media monitoring and voice of the customer, the way people talk about business, is the language we’ve been focused on.

We have choosy customers in pharmaceuticals and fields such as that, but our background is dealing with media and media content in general. Our customers take our product and our software, and wrap it into an application, so we’re really a software company, not an application company. We build software that other people use to build applications.
