TDWI Upside - Where Data Means Business

Analytics-Powered Social Listening

Social listening uses technology to facilitate communication over social media channels. It allows companies to scale up their customer care programs using data integration and natural language processing capabilities.

For many industries, social media has been a great vehicle for outbound marketing to customers. Companies are now finding it to be an effective communication platform for customer care programs. Achieving full impact in customer service will require companies to both speak and listen on social media.

Social listening is becoming a hot topic as businesses learn to leverage this new medium. When companies start developing a social listening capability, their first step is generally to assign analysts to manually search for keywords related to the business and respond to customers one by one. This phase, often called social monitoring, can be very labor-intensive: with so many conversations happening, it is difficult for employees to monitor them effectively and respond in a timely manner.

This challenge grows more vexing as the volume of traffic increases. As customers learn that they are being heard and can use social media to interact with the company, they naturally increase the amount of conversation on social channels. At this point, a company can benefit greatly from applying analytics to manage the influx of data.

At its core, social listening is like other real-time big data integration challenges. The social media ecosystem is essentially a public, unstructured data lake that houses an ongoing conversation in multiple languages among billions of participants. Because social listening is fundamentally a data integration and data processing problem, implementing it effectively requires three key processes: ingesting the data, parsing the data, and responding to the data.
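The three processes can be pictured as a simple pipeline. The following Python sketch is illustrative only: the `Post` and `Action` types and the `run_pipeline`, `simple_parse`, and `simple_respond` functions are hypothetical names rather than part of any product or API, and the parse and respond steps are deliberately trivial placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class Post:
    """Hypothetical minimal record for one social media message."""
    post_id: str
    author: str
    text: str


@dataclass
class Action:
    """Something the program decided to do about a post (hypothetical)."""
    post_id: str
    kind: str    # e.g. "alert" for a human, or "reply" for an automated response
    detail: str


def run_pipeline(
    source: Iterable[Post],
    parse: Callable[[Post], List[str]],
    respond: Callable[[Post, List[str]], Optional[Action]],
) -> Iterator[Action]:
    """Ingest posts from `source`, parse each one, and yield any resulting actions."""
    for post in source:                    # 1. ingest
        tokens = parse(post)               # 2. parse
        action = respond(post, tokens)     # 3. respond
        if action is not None:
            yield action


# Usage with deliberately simple parse/respond steps:
def simple_parse(post: Post) -> List[str]:
    return post.text.lower().split()


def simple_respond(post: Post, tokens: List[str]) -> Optional[Action]:
    if "order" in tokens:
        return Action(post.post_id, "alert", "possible order issue")
    return None


posts = [Post("1", "@ana", "My order never arrived"), Post("2", "@bo", "Great weather today")]
actions = list(run_pipeline(posts, simple_parse, simple_respond))
print(actions)  # one alert, for post "1"
```

Passing the parse and respond steps in as functions keeps each stage replaceable, so a richer parser or a different response policy can be swapped in without touching the ingest loop.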


The first step is to ingest the data. You don't need to ingest all the data available on every social media channel, or even every conversation on a single channel. You do need to take in enough of the conversation on the channels where customers are trying to communicate. Many social media platforms provide publicly available APIs that give developers access to selected data. These APIs can be called as part of a batch process or, in some cases, consumed as a real-time stream.
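Taking in "enough" conversation usually means filtering at ingest time by channel and by terms that matter to the business. The Python sketch below assumes a hypothetical message shape (a dict with `channel` and `text` keys); real platform APIs return their own, richer formats, and the channel names and tracked terms are placeholders.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Set

# Hypothetical raw message shape; real platform APIs return richer, platform-specific JSON.
Message = Dict[str, str]


def filter_relevant(stream: Iterable[Message], channels: Set[str], terms: Set[str]) -> Iterator[Message]:
    """Keep only messages from tracked channels that mention at least one tracked term."""
    lowered = {t.lower() for t in terms}
    for msg in stream:
        if msg["channel"] not in channels:
            continue
        text = msg["text"].lower()
        if any(term in text for term in lowered):  # simple substring match
            yield msg


def batches(stream: Iterable[Message], size: int) -> Iterator[List[Message]]:
    """Group a stream into lists of up to `size` messages for batch processing."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Because both functions consume their input lazily, the same code works whether the stream is a finite result set pulled in a batch job or a long-running real-time feed.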

Because their data is their competitive advantage, social media platforms often control what data is available and throttle (rate-limit) access to it. This can prevent or hinder access to the data you need for an effective social listening program.
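Each platform sets and documents its own limits, so an ingest job typically paces its own calls to stay under them rather than being cut off. One common client-side technique is a token bucket, sketched below in Python; the `TokenBucket` class and the rate and capacity values are hypothetical, and the injectable `clock` parameter exists only to make the sketch easy to test.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow on average `rate` calls per second, with bursts of up to `capacity` calls."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False if the caller should wait."""
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical pacing: about 2 API calls per second, bursts of up to 5.
bucket = TokenBucket(rate=2.0, capacity=5)
# Before each API call: if bucket.try_acquire() is False, sleep briefly and retry.
```

A bucket lets short bursts through while holding the long-run average to the configured rate, which suits bursty social traffic better than a fixed delay between every call.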

To address data access limitations, you might partner with a data provider. Data providers such as DataSift and Gnip offer more complete data streams for a fee. They have established agreements with the social media platforms that remove those throttles.

Further, these companies often enhance the data with information that is not available in the public data stream, such as the sentiment associated with a message and real or projected demographics for the content creator.


Once you have access, you need to bring order to this unstructured source. There are multiple tools in natural language processing that can help you extract meaning from unstructured text data.

The simplest method for parsing posts, tweets, or other social media messages is to break them down into individual words or phrases. This requires tokenization: determining how to split the content into smaller pieces, or tokens. Whitespace and punctuation are common delimiters for splitting content into sentences and words. Many non-Latin-script languages, especially those written without spaces between words such as Chinese and Japanese, require more complex tokenization rules. When the resulting tokens are collected and counted without regard to their order, the result is referred to as a bag of words.
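A minimal Python sketch of whitespace-and-punctuation tokenization and a bag of words follows. The regular expression is an assumption chosen for illustration: it keeps @mentions, #hashtags, and simple contractions as single tokens. Python's `\w` is Unicode-aware, so accented and Cyrillic letters are kept, but text written without spaces (such as Chinese) would come out as one long token, which is why such languages need dedicated segmenters.

```python
import re
from collections import Counter
from typing import List

# Optional @ or # prefix, a run of word characters, and an optional 'suffix (e.g. hasn't).
TOKEN_RE = re.compile(r"[@#]?\w+(?:'\w+)?")


def tokenize(text: str) -> List[str]:
    """Split on whitespace and punctuation, lowercasing each token."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def bag_of_words(text: str) -> Counter:
    """Count tokens, ignoring their order."""
    return Counter(tokenize(text))


print(tokenize("My order #123 hasn't arrived! @AcmeSupport"))
# ['my', 'order', '#123', "hasn't", 'arrived', '@acmesupport']
```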

Beyond simply breaking the content into words, another important step is to strip out stop words: words so common that they add no context or meaning to a specific message. Some stop words are common across all the content in a language; others are specific to a domain of knowledge.
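Stop word removal can be sketched in Python as filtering tokens against two sets: a general-language list and a domain-specific one. Both lists below are tiny illustrations; the company name "Acme" and its support handle are hypothetical. For a company listening for mentions of itself, its own brand name appears in nearly every captured message, so it carries little information about what a given message is about.

```python
from typing import Iterable, List, Set

# Tiny illustrative lists; real general-language stop word lists are much longer.
GENERAL_STOP_WORDS: Set[str] = {"a", "an", "and", "the", "is", "to", "of", "my", "i", "it"}
# Hypothetical domain-specific stop words for a company called "Acme".
DOMAIN_STOP_WORDS: Set[str] = {"acme", "@acmesupport"}


def remove_stop_words(
    tokens: Iterable[str],
    stop_words: Set[str] = GENERAL_STOP_WORDS | DOMAIN_STOP_WORDS,
) -> List[str]:
    """Drop tokens that appear in the stop word set (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stop_words]


print(remove_stop_words(["my", "acme", "order", "is", "late"]))  # ['order', 'late']
```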

Because words take multiple forms, it is often important to group those forms together. During the parse step, stemming and/or lemmatization reduce words to a base or root form so that related words can be grouped together. Stemming does this by stripping affixes, while lemmatization maps each word to its dictionary form.
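The Python sketch below shows the idea with a deliberately crude suffix-stripping stemmer plus a small lookup table for irregular forms, which stands in for lemmatization. The suffix list and lemma table are hypothetical illustrations, not the Porter algorithm or a real lemmatizer; production work would more likely use an established implementation such as NLTK's `PorterStemmer` or `WordNetLemmatizer`.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List

# Hypothetical tiny lemma table for irregular forms that suffix rules cannot handle.
LEMMAS: Dict[str, str] = {"bought": "buy", "went": "go", "better": "good"}

# Crude suffix rules, checked in this order; real stemmers use far more careful rules.
SUFFIXES = ("ing", "ed", "es", "s", "e")


def normalize(word: str) -> str:
    """Map a word to a rough base form for grouping."""
    w = word.lower()
    if w in LEMMAS:
        return LEMMAS[w]
    for suffix in SUFFIXES:
        # Only strip when at least three characters of stem remain.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def group_forms(words: Iterable[str]) -> Dict[str, List[str]]:
    """Group surface forms that share a normalized base form."""
    groups: DefaultDict[str, List[str]] = defaultdict(list)
    for w in words:
        groups[normalize(w)].append(w)
    return dict(groups)


print(group_forms(["refund", "refunds", "refunded", "refunding"]))
# {'refund': ['refund', 'refunds', 'refunded', 'refunding']}
```

Note that the stem need not be a real word ("charge", "charged", and "charges" all map to "charg"); for grouping, it only needs to be consistent.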

Beyond manipulating the words in the message, other information can be ascertained from the message. One example is topic identification. Identifying a topic is more complex than merely searching for keywords; it requires inference from the words and phrases in the message.

A common method for topic identification is a supervised learning model. The model is trained on a set of messages that have previously been tagged with topics. When it receives a new message, it uses the patterns learned from that data to assign the message to a topic.
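As one concrete instance of a supervised approach, the following Python sketch implements a multinomial naive Bayes classifier with add-one smoothing. Naive Bayes is a choice made here for illustration; the source does not name a specific algorithm. The `NaiveBayesTopics` class, the topics, and the training messages are hypothetical, and the messages are assumed to be already tokenized with stop words removed. Libraries such as scikit-learn provide production-grade equivalents (for example, `MultinomialNB`).

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Optional, Set, Tuple


class NaiveBayesTopics:
    """Multinomial naive Bayes topic classifier with add-one (Laplace) smoothing."""

    def fit(self, examples: List[Tuple[List[str], str]]) -> "NaiveBayesTopics":
        """Learn from (tokens, topic) pairs that were tagged by people beforehand."""
        self.topic_counts: Counter = Counter()
        self.word_counts: DefaultDict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()
        for tokens, topic in examples:
            self.topic_counts[topic] += 1
            self.word_counts[topic].update(tokens)
            self.vocab.update(tokens)
        self.total_words: Dict[str, int] = {t: sum(c.values()) for t, c in self.word_counts.items()}
        return self

    def predict(self, tokens: List[str]) -> Optional[str]:
        """Return the topic with the highest log-probability for a new message."""
        n_docs = sum(self.topic_counts.values())
        v = len(self.vocab)
        best_topic: Optional[str] = None
        best_score = -math.inf
        for topic, count in self.topic_counts.items():
            score = math.log(count / n_docs)  # log prior P(topic)
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # words never seen in training carry no evidence here
                # log P(tok | topic), smoothed so a word unseen for this topic doesn't zero the score
                score += math.log((self.word_counts[topic][tok] + 1) / (self.total_words[topic] + v))
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


# Hypothetical hand-tagged training messages (already tokenized, stop words removed).
train = [
    (["package", "late", "delivery"], "shipping"),
    (["tracking", "number", "delivery"], "shipping"),
    (["charged", "twice", "refund"], "billing"),
    (["refund", "card", "charged"], "billing"),
]
model = NaiveBayesTopics().fit(train)
print(model.predict(["where", "delivery"]))  # shipping
print(model.predict(["refund", "please"]))   # billing
```

Unlike a plain keyword search, the model weighs every known word by how often it appeared under each topic in the tagged data, so a message can be assigned a topic even when it contains none of a predefined keyword list.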


To be successful with social listening, you must determine the optimal point at which the machine's work ends and the human's begins. A response can be as simple as an alert when certain topics or keywords are identified; a human then takes that alert and acts on it. This still requires personnel to take action, but you have automated the harder task of determining which content needs a response.
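Alerting can be sketched in Python as a small set of routing rules that turn a parsed message into alerts for human teams. The sketch assumes the topic label has already been assigned (for example, by a supervised model as described above); the `Alert` type, queue names, topics, and escalation keywords are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Alert:
    """A hypothetical work item for a human team."""
    post_id: str
    queue: str
    reason: str


# Hypothetical routing rules: topic -> team queue, plus keywords that always escalate.
TOPIC_QUEUES: Dict[str, str] = {"billing": "billing-team", "shipping": "logistics-team"}
ESCALATION_KEYWORDS: Set[str] = {"lawsuit", "fraud", "unsafe"}


def route(post_id: str, topic: str, tokens: List[str]) -> List[Alert]:
    """Return the alerts a parsed message should raise (possibly none)."""
    alerts: List[Alert] = []
    hits = ESCALATION_KEYWORDS.intersection(tokens)
    if hits:
        alerts.append(Alert(post_id, "escalations", "keywords: " + ", ".join(sorted(hits))))
    if topic in TOPIC_QUEUES:
        alerts.append(Alert(post_id, TOPIC_QUEUES[topic], f"topic: {topic}"))
    return alerts


print(route("42", "billing", ["charged", "twice", "fraud"]))
```

Keeping the rules as plain data makes it easy to move the machine/human boundary later: a topic can be added to or removed from the alert list without changing the routing code.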

Some companies go a step further and implement programmatic responses that are pushed back out through the social media channel once parsing is complete and the necessary action has been identified. This can provide a bridge so that the customer receives instant feedback while the request is being completed (either by a human or by a downstream automated process). It can also function like an interactive voice response (IVR) menu on a phone system, directing the customer to the right person to address their inquiry.
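Both kinds of programmatic response can be sketched in Python with reply templates: an acknowledgment when the topic is known, and an IVR-style menu when it is not. The template wording, topics, and menu choices below are hypothetical, and actually posting the reply would go through each platform's own API, which is not shown.

```python
from typing import Dict

# Hypothetical acknowledgment templates keyed by topic.
ACK_TEMPLATES: Dict[str, str] = {
    "billing": "Hi {author}, thanks for reaching out. Our billing team has your message and will follow up shortly.",
    "shipping": "Hi {author}, sorry about the delay. We're checking on your delivery now.",
}

# Fallback menu, similar to a phone system's IVR prompt.
MENU = ("Hi {author}, thanks for contacting us. Reply with 1 for billing, "
        "2 for shipping, or 3 to talk to a person.")
MENU_CHOICES: Dict[str, str] = {"1": "billing", "2": "shipping", "3": "human"}


def auto_reply(author: str, topic: str) -> str:
    """Acknowledge a known topic immediately, or offer the menu when the topic is unclear."""
    template = ACK_TEMPLATES.get(topic, MENU)
    return template.format(author=author)


def handle_menu_choice(reply_text: str) -> str:
    """Map the customer's menu reply to a destination; anything unrecognized goes to a person."""
    return MENU_CHOICES.get(reply_text.strip(), "human")


print(auto_reply("@ana", "billing"))
```

Sending unrecognized menu replies to a person is a deliberately conservative default: an automated response should never leave a customer stuck in a loop.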

A Final Word

As social media grows, so does the need for companies to interact with it. As more customers use social media to communicate with companies, the need for social listening programs grows as well. Social listening is essentially a data integration and data mining problem. Implemented correctly, it provides a whole new way to interact with current and potential customers.

About the Author

Troy Hiltbrand is the senior vice president of digital product management and analytics at where he is responsible for its enterprise analytics and digital product strategy. You can reach the author via email.
