SPSS Refreshes Data, Text Mining Toolset

With text analytics poised for growth -- especially as a means to parse and index unstructured data -- it's a propitious time for SPSS

By Stephen Swoyer
February 6, 2008

Clementine may be best known as the lost love of an old-time song, but it's also the brand of a data mining offering, the estimable Clementine from SPSS Inc.

Late last month, the company announced Clementine 12.0, the latest revision of its data mining platform, along with a 12.0 release of its text mining counterpart (the aptly titled Text Mining for Clementine).

In the years since SPSS acquired both products (Clementine, via its acquisition of the former Integral Solutions Ltd., in 1999; Text Mining, via the acquisition of the former LexiQuest, in 2002), it's worked hard to make them its own.

To a degree, that involves making them more user-friendly, too. User friendliness -- and user self-service -- have played a big role in SPSS' push to become an important vendor: the last three revisions of its statistical analysis product, for example, have all touted usability and user self-service enhancements (see http://esj.com/business_intelligence/article.aspx?EditorialsID=8614).

The same can be said for Clementine. Like SPSS' flagship statistical analysis platform, the product is updated about once a year. This time around, SPSS officials say, the revamped Clementine delivers increased analyst productivity, yields better insights, and -- a must in today's data visualization-obsessed marketplace -- boasts enhanced data visualization capabilities.

"With the new version, our customers can extract even deeper insight from data and text, delivering greater return on all of their data assets," said Bob Dutcher, vice-president of product marketing with SPSS.

"Self-service," in a sense, describes a usability problem: it's about making BI and PM tools easier to use and easier to configure so that users can take much of the mundane slack (which is usually pushed off onto IT) themselves.

It's also about empowering users to perform tasks that, although nominally associated with their business activities, might otherwise require the intervention of IT personnel. One such task is data modeling: while it makes sense for business users to have a stake (playing, perhaps, a formative role) in the modeling process, the intricacies of data modeling can overwhelm many business users.

As a result, data modeling tools vendors have recently begun pushing business-friendly spins on data modeling. Both Sybase Inc. (whose PowerDesigner tool is one of the leading best-of-breed offerings) and Kalido (which introduced a new business-friendly data modeling tool, Business Information Manager, last month) market such offerings, which typically feature drag-and-drop GUI displays. Neither vendor claims to be making data modeling a business-user-safe practice. Instead, they argue, their tools help more meaningfully involve business users in the data modeling practice.

Clementine 12.0's built-in data modeling capabilities might not be in the same league as tools such as PowerDesigner or Kalido's new offering, but the product does boast what SPSS officials claim is a user self-service spin on data modeling: a new automated data modeling facility.

This tool isn't a panacea, SPSS officials concede. It doesn't automatically generate viable models for any and every conceivable application (no tool can, for that matter), but it should help business users build some of their own data models. More to the point, SPSS officials claim, it involves business users in the modeling process, either on their own or (ideally) in collaboration with IT.

On the data mining front, Clementine 12.0 boasts many new or enhanced amenities, including a numeric predictor (to better support continuous dependent variables), new customer analytics techniques (e.g., recency, frequency, and monetary, or RFM, value-scoring), and a "survival analysis" feature that lets users model customer attrition.
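For readers unfamiliar with RFM value-scoring, the idea can be illustrated with a minimal Python sketch. This is not SPSS' implementation: the function names, the three-bin ranking scheme, and the sample data layout are all assumptions made for illustration. Each customer is ranked on recency (days since last purchase), frequency (number of purchases), and monetary value (total spend), and each rank is mapped to a score from 1 (worst) to 3 (best).

```python
from datetime import date


def bin_scores(values, bins=3, higher_is_better=True):
    """Rank customers on one measure and map ranks to scores 1..bins.

    Hypothetical helper: ties are broken by input order, which a real
    tool would likely handle more carefully.
    """
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {cust: 1 + (i * bins) // n for i, cust in enumerate(ordered)}


def rfm_scores(transactions, today, bins=3):
    """transactions: iterable of (customer_id, purchase_date, amount).

    Returns {customer_id: (recency_score, frequency_score, monetary_score)}.
    """
    last, count, total = {}, {}, {}
    for cust, when, amount in transactions:
        last[cust] = max(last.get(cust, when), when)
        count[cust] = count.get(cust, 0) + 1
        total[cust] = total.get(cust, 0.0) + amount

    # Recency is measured in days since the last purchase: fewer is better.
    days_since = {cust: (today - when).days for cust, when in last.items()}
    r = bin_scores(days_since, bins, higher_is_better=False)
    f = bin_scores(count, bins)
    m = bin_scores(total, bins)
    return {cust: (r[cust], f[cust], m[cust]) for cust in last}
```

A customer who bought recently, often, and in large amounts ends up with high scores on all three measures, which is the kind of segment a marketer would typically target first.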

Elsewhere, Clementine 12.0 ships with a new facility called Ensemble that should make it easier for users to combine multiple models. SPSS also claims to have improved Clementine's real-time scoring performance, enhanced integration between Clementine and SPSS reports/tables, and delivered clustering/load-balancing improvements.

The revamped Text Mining for Clementine 12.0 delivers enhanced multi-lingual capabilities, including support for Dutch, English, French, German, Italian, Portuguese, and Spanish text. Multi-lingual support is a key differentiator that separates North-American-only text analytic products from their global competitors.

Thanks to a deal SPSS notched with translation specialist Language Weaver, Text Mining can also process text that's been translated into English from a number of other languages, including Arabic, Chinese, and Russian. Finally, Text Mining 12.0 boasts improved support for verticalization (including a new template editor and updated libraries for sentiment analysis, CRM, security intelligence, competitive intelligence, life sciences, and IT).

The Text Mining Tip?

It's a propitious time for SPSS' Text Mining refresh.

All indications are that text mining (along with its cousin, text analytics) as a means to parse and index unstructured or semi-structured data is only going to grow in importance. Many industry watchers see it as analogous to enterprise or BI search technologies.

"The world is a relatively unstructured place, and the combination of Clementine and Text Mining for Clementine allows organizations to analyze a wide range of both structured and unstructured data including text, surveys, call center interactions, e-mails, RSS feeds and blogs," says Andrew Braunberg, a researcher with consultancy Current Analysis.

In this respect, Braunberg points to SPSS' existing partnership with CallMiner, which lets it offer voice mining capabilities.

Data mining, BI search, and text analytic solutions certainly aren't a magic draught for the growing panoply of enterprise data management ills, but when they're combined with a mature BI and DW stack, they are a good Rx for what ails most shops.

"BI search and text analytics certainly won't replace the traditional BI/DW technology stack. And it's unlikely that they will replace any components of the stack," wrote Philip Russom, senior manager of research with TDWI, in BI Search and Text Analytics, which TDWI published last year. "Instead, BI search and text analytics are being added to BI/DW infrastructure to accommodate unstructured data -- via text analytics -- and related techniques [such as search]."

Not everyone can credibly claim to offer text analytic capabilities. Attensity and ClearForest are the two leading players, focusing exclusively on text analytics, which Russom distinguishes from text mining. The latter category is the long-time domain of analytic powerhouses SPSS and SAS Institute Inc., as well as that of newer players, such as the former Business Objects SA (which acquired text mining specialist Inxight Inc. last year, before itself being acquired by SAP AG), and IBM Corp.

Unlike a garden-variety search tool, a good text analytic tool doesn't just return any and all relevant results, Russom says.

"While documents containing unstructured data can contribute to the decision making of BI, they cannot participate directly in its data-driven reports and analyses -- unless facts discovered in unstructured data are extracted and transformed into structured data that's conducive to reporting and analysis," he points out. Text analytic capabilities speak to two of today's most active data management problems: customer and product data management.

In both cases, Russom says, customers are using text analytics to glean more insight from customer and product data.
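Russom's point about turning facts found in unstructured text into structured data can be made concrete with a toy Python sketch. It is only an illustration: commercial text analytic tools rely on linguistic and statistical techniques, whereas this example substitutes simple keyword matching, and the product list, the negative-term list, and the field names are all hypothetical.

```python
import re

# Hypothetical vocabularies; a real tool would use much richer libraries.
PRODUCT_PATTERN = re.compile(r"\b(router|modem|phone)\b", re.IGNORECASE)
NEGATIVE_TERMS = {"broken", "slow", "refund", "cancel"}


def extract_facts(note_id, text):
    """Turn one free-text note into a structured, report-ready record."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    products = sorted({match.lower() for match in PRODUCT_PATTERN.findall(text)})
    negatives = sorted(words & NEGATIVE_TERMS)
    return {
        "note_id": note_id,
        "products": products,
        "negative_terms": negatives,
        "is_complaint": bool(negatives),
    }


notes = [
    (1, "Customer says the Router is slow and wants a refund."),
    (2, "Asked about phone upgrade options."),
]
rows = [extract_facts(note_id, text) for note_id, text in notes]
```

Each resulting row has fixed fields, so it can be loaded into a warehouse table and counted, filtered, or joined against customer and product records like any other structured data.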
