TDWI Articles

Finding Value in Email Streams

Getting important information from a simple email involves more potential obstacles than you might imagine.

Consider the common email stream. Everybody has email. Everybody uses email. What could be more mundane than the ordinary email inbox?

Suppose you need to find a piece of information in your corporate email stream. How difficult could that be?

It turns out that there are a lot of obstacles to finding anything in your email stream, let alone finding value in that stream. With the increased emphasis on unstructured data and customer sentiment analysis, getting the greatest value from your email is increasingly important in the enterprise.

Let’s look at four obstacles you must overcome to maximize the information your email offers.

1. Privacy

Your first obstacle is determining whether you have the right to look at the email at all. If you are a corporation and the email was sent or received by an employee on your premises, using your corporation’s hardware and software, then (at least in the U.S.) you generally have the right to look at it.

Looking at email outside those circumstances is questionable.

2. Volume

Second, you must deal with the sheer volume of data that accumulates in your corporate email. Take a look at the messages that pass through your email server. A typical collection of data might look like this.

  • 40% Blather
  • 20% Spam
  • 25% Current business information
  • 10% Outdated business information
  • 5% Up-to-date, sensitive information

Blather is internally generated information that is not relevant to your business: jokes, bets on the weekend’s football game, personal notes, and the like. You would be amazed at how much blather there is in an email stream.

Spam is similar to blather except that spam is generated externally rather than internally and includes messages that are primarily advertisements or newsletters.

When you look at an email stream this way, you see that there really is not much important business information that passes through your email server.

There is another way to look at these messages, however. You could also say email is made up of the following.

  • 20% System-generated information
  • 5% Structured message control information
  • 75% Actual message

System-generated information is the control information in every email that determines how the system will handle the message. There is no message content inside system-generated information.

Structured message control information includes who is sending the message, to whom the message is being sent, the date and time of the message, and the subject description. Every email has this information.
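As a rough illustration, the structured control fields described above can be pulled out of a raw message with Python's standard `email` package. This is a minimal sketch, not a description of any particular product: the sample message, the addresses, and the `control_fields` helper are all hypothetical.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical raw message, used only for illustration.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Date: Mon, 06 Mar 2017 09:30:00 -0500
Subject: Q1 contract renewal

Bob, the renewal terms are attached.
"""


def control_fields(raw):
    """Split a raw email into its structured control fields and its body."""
    msg = message_from_string(raw)
    return {
        "sender": msg["From"],
        "recipient": msg["To"],
        "sent_at": parsedate_to_datetime(msg["Date"]),
        "subject": msg["Subject"],
        # get_payload() returns the text for a simple, non-multipart message.
        "body": msg.get_payload().strip(),
    }


fields = control_fields(RAW_EMAIL)
print(fields["sender"], "->", fields["recipient"], "|", fields["subject"])
```

Real corporate mail is usually multipart and carries many more headers, so a production parser would need to handle attachments and encodings that this sketch ignores.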

Finally, there is the message itself, where the meat of the email -- the content -- is found.

When you factor in both the useless system information and the messages that are not relevant, no more than three or four percent of the email stream is actually worth looking at.
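One reading of the two breakdowns that reproduces this estimate: assume only the 5% of messages carrying up-to-date, sensitive information are worth reading, and that the 75% share of actual message content applies evenly to them. Both assumptions are ours, made only to show the arithmetic.

```python
# Figures taken from the two breakdowns above.
SENSITIVE_SHARE = 0.05  # up-to-date, sensitive information
MESSAGE_SHARE = 0.75    # actual message, after 20% system + 5% control data

# Assumption: the content share applies uniformly to every message category.
worth_reading = SENSITIVE_SHARE * MESSAGE_SHARE
print(f"{worth_reading:.2%}")  # prints 3.75%
```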

Because there is so much irrelevant data in an email stream, important business information is often hidden. You must identify and bring that important information to the front.

Think of this like an orca’s communication. In a day’s time, the orca may receive 100,000 sounds in the ocean. (The ocean is a noisy place.) In that time, the orca will be interested in only 10 or 20 of those sounds. The orca is interested in danger, food, and sex. If a sound does not relate to danger, food, or sex, it is filtered out.

The same filtering needs to take place when examining an email stream. The orca has a brain that does the filtering automatically, but when you process an email stream, you have to invent or invest in automatic filtering to identify important emails.
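A very simple version of such a filter might be sketched as follows. Everything here is hypothetical: the company domain, the keyword lists, and the `classify` rules are stand-ins, and a real filter would need far more than keyword matching.

```python
from dataclasses import dataclass

# Hypothetical settings, chosen only for illustration.
COMPANY_DOMAIN = "example.com"
BLATHER_TERMS = {"joke", "football", "lunch"}
SPAM_TERMS = {"unsubscribe", "newsletter", "special offer"}


@dataclass
class Email:
    sender: str
    subject: str
    body: str


def classify(email):
    """Label an email as 'spam', 'blather', or 'keep' using keyword rules."""
    text = f"{email.subject} {email.body}".lower()
    internal = email.sender.lower().endswith("@" + COMPANY_DOMAIN)
    # Spam is generated externally; blather is generated internally.
    if not internal and any(term in text for term in SPAM_TERMS):
        return "spam"
    if internal and any(term in text for term in BLATHER_TERMS):
        return "blather"
    return "keep"


def filter_stream(emails):
    """Return only the emails that survive the filter."""
    return [e for e in emails if classify(e) == "keep"]


stream = [
    Email("carol@example.com", "Friday football pool", "Who is in?"),
    Email("deals@vendor.test", "Special offer", "Click to unsubscribe."),
    Email("dave@example.com", "Contract renewal",
          "The customer wants revised terms by Monday."),
]
kept = filter_stream(stream)
print([e.subject for e in kept])
```

The rules mirror the distinction drawn earlier: the same kind of noise is treated as spam when it comes from outside and as blather when it comes from inside.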

Are you finished once you have filtered your emails? Hardly. There are more obstacles awaiting you.

3. Language, Meaning, and Interpretation

Your third obstacle in getting the greatest value from your email stream is interpreting your email.

Be aware of the language that the sender is using. You need to be alert to the vernacular, idioms, and language nuances. You need to be able to capture and understand the language that is being spoken.

There are many complexities to reading and understanding language, including which language is being used. One email may be in English, the next in Spanish, and the next in Portuguese.
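Even the first step, deciding which language a message is written in, takes some machinery. The sketch below guesses among the three languages mentioned by counting common function words. The word lists and the `guess_language` helper are hypothetical and far too small for real use; they only illustrate the idea.

```python
import re

# Tiny, hypothetical stopword lists; real systems use much larger models.
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to"},
    "spanish": {"el", "los", "y", "es", "una"},
    "portuguese": {"o", "os", "e", "é", "uma", "não"},
}


def guess_language(text):
    """Guess the language by counting how many known stopwords appear."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    scores = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


print(guess_language("The report is due and the budget is approved"))
print(guess_language("El informe es para los clientes y una copia"))
print(guess_language("O relatório é para os clientes e não uma cópia"))
```

Identifying the language is only the start. Idiom, vernacular, and nuance are much harder and are not addressed by a word count like this one.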

Then you need to consider context. One person may work in engineering, but the next email may be from someone in finance, and a third from sales. Each person writes with a different context.

All of these aspects must be considered to properly interpret the meaning of an email.

4. Storage and Access

Finally, you must be able to store and recall important and relevant email messages. It is one thing to read an email today but another thing to be able to relate today’s email to yesterday’s email. In order to relate emails over time, you need a database where the text is intelligible to the computer. Only after you have organized your email in this manner can you start to look for patterns and trends.
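To make the idea concrete, here is a minimal sketch that stores a few messages in an in-memory SQLite database and counts mentions of a term per day. The table layout, the sample rows, and the `mentions_per_day` helper are all hypothetical, and plain substring search is only a stand-in for genuinely making text intelligible to the computer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE email (
           id INTEGER PRIMARY KEY,
           sent_on TEXT,
           sender TEXT,
           subject TEXT,
           body TEXT
       )"""
)

# Hypothetical sample messages.
rows = [
    ("2017-03-06", "dave@example.com", "Contract renewal",
     "Customer wants revised terms."),
    ("2017-03-07", "erin@example.com", "Re: Contract renewal",
     "Revised terms sent to customer."),
    ("2017-03-07", "dave@example.com", "Server outage",
     "Mail server was down for an hour."),
]
conn.executemany(
    "INSERT INTO email (sent_on, sender, subject, body) VALUES (?, ?, ?, ?)",
    rows,
)


def mentions_per_day(conn, term):
    """Count, for each day, the stored emails whose subject or body mentions term."""
    pattern = f"%{term}%"
    cursor = conn.execute(
        "SELECT sent_on, COUNT(*) FROM email "
        "WHERE subject LIKE ? OR body LIKE ? "
        "GROUP BY sent_on ORDER BY sent_on",
        (pattern, pattern),
    )
    return cursor.fetchall()


trend = mentions_per_day(conn, "renewal")
print(trend)
```

Once messages sit in a table like this, relating today’s email to yesterday’s becomes an ordinary query rather than a manual search.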

It is in finding these patterns and trends that the value of your analysis of email becomes apparent.

Getting Value

The next time you send an email, consider all that has to be done in order for the secrets that lie within your email to be uncovered. When you look at all the obstacles between you and the information your email messages contain, it is a remarkable achievement that it can be done at all.

About the Author

Bill Inmon has written 54 books published in 9 languages. Bill’s company -- Forest Rim Technology -- reads textual narrative, disambiguates the text, and places the output in a standard database. Once in the standard database, the text can be analyzed using standard analytical tools such as Tableau, QlikView, Concurrent Technologies, SAS, and many more analytical technologies. His latest book is Data Lake Architecture.

