
NLP Budgets Soar, Accuracy Remains Challenging According to Survey

Gradient Flow’s annual NLP industry survey sheds light on the practices, technologies, and challenges defining natural language processing this year.

Note: TDWI’s editors carefully choose press releases related to the data and analytics industry. We have edited and/or condensed this release to highlight key features but make no claims as to the accuracy of the statements it contains.

John Snow Labs, developer of the Spark NLP library, revealed the results of the 2021 Natural Language Processing (NLP) Industry Survey, exploring how companies are currently using NLP. The results include budgets, trends, a detailed analysis of NLP technologies being implemented by businesses, widely used tools and cloud platforms, and use cases. The survey was conducted by Gradient Flow, an independent data science analysis and insights provider.

Respondents spanned a variety of industries, company sizes, stages of NLP adoption, and geographic locations, yet the global survey showed NLP budgets increasing across the board. In fact, 60 percent of tech leaders indicated that their NLP budgets grew by at least 10 percent, while one-third reported a 30 percent increase, and 15 percent of respondents said their budget more than doubled. This is a steady increase compared to 2020, which suggests pandemic-related financial constraints may be stabilizing.

Although investments in NLP have been healthy, practitioners face some significant barriers to progress. As in last year’s results, accuracy was the most important requirement when evaluating an NLP solution. However, when asked about the key challenges of using cloud NLP services, tech leaders cited difficulty in tuning (39 percent) and cost (36 percent) as the top two. This matters because models often need to be tuned and customized for specific domains and applications. As more difficult use cases (such as Q&A and natural language generation) proliferate, accuracy will remain paramount for success.

Other key findings include:

  • For the second consecutive year, Spark NLP was named the most popular NLP library, with 31 percent of respondents indicating they use it.
  • Most practitioners use multiple libraries. In fact, 53 percent of respondents stated they used at least one of the following NLP libraries popular within the Python ecosystem: Hugging Face, spaCy, Natural Language Toolkit (NLTK), Gensim, or Flair.
  • Among tech leaders, accuracy (40 percent) was the most important requirement when evaluating an NLP solution, followed by production readiness (24 percent) and scalability (16 percent).
  • Fifty-four percent of tech leaders singled out named entity recognition (NER) and 46 percent cited document classification as the primary use cases for NLP.
  • For healthcare industry respondents, entity linking/knowledge graphs (41 percent) and deidentification (39 percent) were among the top use cases.
  • Eighty-three percent of all survey respondents indicated they use at least one of the four NLP cloud services listed (Google, AWS, Azure, IBM) in addition to NLP libraries.
  • The top three data sources for NLP projects are text fields in databases, files (PDFs, .docx, etc.), and online content.
  • The four industries best represented among survey respondents were healthcare (17 percent), technology (16 percent), education (15 percent), and financial services (7 percent), which reflects overall industry adoption.
  • The Spark NLP library is particularly dominant in the healthcare industry, in which 60 percent of respondents reported having adopted it.

“As we move into the next phase of NLP growth, it’s encouraging to see investments and use cases expanding, with mature organizations leading the way,” said Dr. Ben Lorica, survey co-author and external program chair, NLP Summit. “Coming off the political and pandemic-driven uncertainty of last year, it’s exciting to see such progress and potential in a field that is still very much in its infancy.”

The full 2021 NLP Survey results can be downloaded at https://gradientflow.com/2021nlpsurvey/ (registration is required).
