TDWI Upside - Where Data Means Business

Getting Started with Natural Language Processing in Your Enterprise

NLP is a necessary enterprise discipline today. Understanding the challenges and possibilities will help you harness this discipline for your language applications.

Natural language processing (NLP) is poised to make major advances in enterprise applications. No longer is this an arcane art continually pushed off due to its difficulty. Today, NLP is beginning to be woven into larger projects as a component for receiving input as language or generating output as language. Enterprise NLP applications include customer service, reputation management, personalized advertising, and market/product intelligence.

NLP is the study of the computational treatment of natural language. It comprises text analysis, the process of deriving meaningful information from natural language, and artificial intelligence-based methods for communicating insights back in natural language.

The Challenge Facing NLP

Computers are easily confused by human language, and English is one of the most difficult languages to deal with. It may not be noticeable to a native speaker, but English is full of contradictions, exceptions to rules, meanings that depend non-intuitively on word order, homophones (words that sound the same but have different meanings or spellings), non-interchangeable synonyms, and idioms.

Our language is also full of "garden path" sentences that lead the reader to a dead end and invite misunderstanding, such as these:

  • The old man the boat
  • The prime number few
  • The man whistling tunes pianos
  • The complex houses married and single soldiers and their families
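To make the ambiguity concrete, here is a minimal sketch of why the first sentence is hard for a program. The `LEXICON` dictionary and its tag names are hypothetical and hand-built for this illustration; a real system would draw on a full lexicon and a statistical or neural tagger rather than enumerating every combination.

```python
from itertools import product

# Hypothetical mini-lexicon: each word maps to the parts of speech it can take.
LEXICON = {
    "the": ["DET"],
    "old": ["ADJ", "NOUN"],
    "man": ["NOUN", "VERB"],
    "boat": ["NOUN", "VERB"],
}

def candidate_taggings(sentence):
    """Return every tag sequence the lexicon allows for the sentence."""
    words = sentence.lower().split()
    options = [LEXICON[w] for w in words]
    return [list(zip(words, tags)) for tags in product(*options)]

taggings = candidate_taggings("The old man the boat")
print(len(taggings))  # 1 * 2 * 2 * 1 * 2 = 8 candidate readings

# The reading a human eventually settles on: "old" is the noun, "man" the verb.
intended = [("the", "DET"), ("old", "NOUN"), ("man", "VERB"),
            ("the", "DET"), ("boat", "NOUN")]
print(intended in taggings)  # True
```

Only one of the eight candidates is the intended reading, and it is not the one most readers try first. Picking it requires looking at the whole sentence, not just word-by-word lookups.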

Read any news article thinking about the NLP perspective and you'll find numerous items a computer will have difficulty with. Making things even more challenging, the language for NLP may come from a wide variety of sources such as Internet chat, blogs, reviews, wikis, scientific papers, medical records, and books.

To understand language correctly, NLP must draw on linguistics, theoretical computer science, math, statistics, artificial intelligence, and psychology. That is a demanding combination, but it is now both possible and worth the effort.

Recent Progress

NLP has become more effective in the last few years, as evidenced by increasing scores for the major models (BERT, ALICE, RoBERTa) on the widely used NLP benchmarks GLUE, RACE, and SuperGLUE. Several NLP models have already surpassed human baseline performance on these benchmarks.

Today, speech recognition is at human-like accuracy for major languages, and machine translation is at human-like accuracy for content in tier 1 languages. Translation between languages with different structures is several years away from human-like accuracy, and natural language understanding will likewise take a few years to reach a human level. Sentiment analysis is good enough for many applications but still a ways off from a human-level understanding of nuance. Of course, none of this is pertinent to you if you don't take advantage of NLP.

First Steps Toward Applying NLP

There are many steps involved in NLP, such as tokenization, stemming, lemmatization, part-of-speech tagging, named entity recognition, and chunking. If you are coding your own NLP, you will need to master all of these. Fortunately, there are many abbreviation lists, lexicons, treebank projects, and parsers available, as well as open source libraries such as spaCy, textacy, and NeuralCoref.
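As a rough illustration of the first two steps, the sketch below tokenizes a sentence and applies a deliberately naive suffix-stripping stemmer. It uses only the Python standard library. The `tokenize` and `naive_stem` functions and the `SUFFIXES` list are assumptions made up for this example, not the algorithm of any particular library.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

# Illustrative suffix list only; real stemmers use far more careful rules.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

tokens = tokenize("The man whistling tunes pianos.")
stems = [naive_stem(t) for t in tokens]

print(tokens)  # ['the', 'man', 'whistling', 'tunes', 'pianos']
print(stems)   # ['the', 'man', 'whistl', 'tun', 'piano']
```

Note that stems such as "whistl" and "tun" are not dictionary words. That is the practical difference from lemmatization, which maps each token to a real base form ("whistle", "tune") and usually needs a lexicon and part-of-speech information to do so.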

You could work from open source and buy a pre-tagged sentiment library; this approach can work if you don't need some of the more complex text analytics functions. In fact, open source can be a solid route to choose if your needs are few and you have the technical knowledge required. If you need to address a specific challenge for your company or you need detailed insights from complex text data, you should work with an experienced NLP vendor.
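To show what working from a pre-tagged sentiment library can look like at its simplest, here is a hedged sketch of lexicon-based scoring. The `SENTIMENT_LEXICON` entries and their scores are invented for this example; a purchased or open source lexicon would be far larger and would come with its own format.

```python
import re

# Hypothetical pre-tagged lexicon: word -> polarity score in [-1, 1].
SENTIMENT_LEXICON = {
    "great": 1.0,
    "love": 1.0,
    "helpful": 0.5,
    "slow": -0.5,
    "terrible": -1.0,
    "broken": -1.0,
}

def sentiment_score(text):
    """Average the polarity of known words; 0.0 if none are found."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [SENTIMENT_LEXICON[w] for w in words if w in SENTIMENT_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(sentiment_score("Great product, but support was slow."))  # 0.25
print(sentiment_score("Not great."))  # 1.0 (the negation is missed)
```

The second call shows the limit of this approach: a plain word lookup ignores negation, sarcasm, and context. That kind of nuance is where more complex text analytics functions become necessary.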

NLP is a necessary enterprise discipline today. It reduces the gap between human and machine communication, automates processes and creates operational efficiency, and extends the capability of existing business intelligence assets in the enterprise. Understanding the challenges, the possibilities, and where the capabilities are today will help you harness this discipline for your enterprise language applications.

About the Author

McKnight Consulting Group is led by William McKnight. He serves as strategist, lead enterprise information architect, and program manager for sites worldwide utilizing the disciplines of data warehousing, master data management, business intelligence, and big data. Many of his clients have gone public with their success stories. McKnight has published hundreds of articles and white papers and given hundreds of international keynotes and public seminars. His teams’ implementations from both IT and consultant positions have won awards for best practices. William is a former IT VP of a Fortune 50 company and a former engineer of DB2 at IBM, and holds an MBA. He is author of the book Information Management: Strategies for Gaining a Competitive Advantage with Data.
