Basis Technology Unveils Next-Generation Text Analysis Platform

Rosette 7 enables fast, accurate, multilingual search-based applications

Note: TDWI’s editors carefully choose vendor-issued press releases about new or upgraded products and services. We have edited and/or condensed this release to highlight key features but make no claims as to the accuracy of the vendor's statements.

Basis Technology Corporation has updated its linguistics platform, Rosette 7. With expanded language coverage, improved entity extraction accuracy, new name matching and translation modules, and native integration with Apache Lucene/Solr, Rosette 7 enables global enterprises, financial institutions, content providers, and intelligence agencies to quickly and accurately identify, analyze, search, and extract meaning from unstructured text in over twenty major languages.

Rosette supports a wide range of applications:

  • Global enterprises are deploying document management systems and XML databases capable of smart retrieval and navigation in multiple languages
  • Legal teams are quickly locating relevant documents buried in multilingual repositories for e-discovery
  • Financial institutions are increasing accuracy and reducing false positives for anti-money laundering and counter-terrorism financing regulatory compliance
  • Intelligence officials are improving watch list implementation by screening documents in their language of origin rather than in translated form

Highlights of Rosette 7 include:

Improved Entity Extraction: Rosette Entity Extractor rapidly locates named entities in large volumes of unstructured text by employing three complementary detection algorithms: rule-based, list-based, and statistical. Rosette 7's improved extractor delivers breakthrough gains in speed and accuracy and dramatically shortens the time needed to train its statistical algorithms on new languages or entity types. Search-based applications use entity extraction to automatically generate metadata that filters search results, enables faceted navigation, delivers alerts, and feeds downstream processes.
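
To illustrate how complementary detectors can be combined, here is a minimal Python sketch of a rule-based detector and a list-based detector whose results are merged into one set of non-overlapping entities. It is not Rosette's API or implementation: the gazetteer entries, the date pattern, and every function name are hypothetical, and the statistical detector is left out because it would require a trained model.

```python
import re

# Hypothetical gazetteer for the list-based detector (illustrative entries only).
GAZETTEER = {
    "Acme Corp": "ORGANIZATION",
    "Kabul": "LOCATION",
}

# Hypothetical pattern for the rule-based detector: ISO-style dates.
DATE_RULE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def list_based(text):
    """Return (start, end, label) for every gazetteer entry found verbatim."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.end(), label))
    return found


def rule_based(text):
    """Return (start, end, label) for every span matching the date rule."""
    return [(m.start(), m.end(), "DATE") for m in DATE_RULE.finditer(text)]


def extract_entities(text):
    """Merge both detectors, preferring earlier and then longer spans."""
    candidates = sorted(
        list_based(text) + rule_based(text),
        key=lambda span: (span[0], -(span[1] - span[0])),
    )
    merged = []
    last_end = 0
    for start, end, label in candidates:
        if start >= last_end:  # skip spans that overlap an accepted one
            merged.append((text[start:end], label))
            last_end = end
    return merged


if __name__ == "__main__":
    sample = "Acme Corp filed the report in Kabul on 2010-01-15."
    for entity, label in extract_entities(sample):
        print(label, entity)
```

The extracted labels are the kind of metadata a search application could index as facets or use to trigger alerts.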

Integration with Lucene-based Applications: Agile businesses deploying the popular Apache Lucene/Solr open-source search toolkits can now benefit from the same advanced linguistic processing used by high-end Web and enterprise search engines. Rosette integrates with Lucene to index and search text in English, French, Italian, German, and Spanish as well as such complex languages as Arabic, Chinese, Farsi, Japanese, Korean, and Russian.

Name Matching and Indexing: Rosette Name Indexer matches names of people, places, or organizations, regardless of the language in which they are written, against entries in multilingual databases, while processing many types of intentional and unintentional name variants: script (Arabic vs. Hanzi vs. Latin); phonetic; orthographic; missing or disordered name components; formal and informal titles; initials; nicknames and aliases.
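
A few of the variant types listed above (orthographic differences, titles, initials, and missing or disordered components) can be handled with simple normalization and order-independent token comparison. The Python sketch below shows one such approach under stated assumptions: the title list, function names, and scoring rule are hypothetical and do not represent Rosette Name Indexer, and cross-script, phonetic, and nickname matching are not covered.

```python
import unicodedata

# Hypothetical list of titles to ignore when comparing names.
TITLES = {"mr", "mrs", "ms", "dr", "prof", "sir"}


def normalize(name):
    """Return lowercase name tokens without diacritics, punctuation, or titles."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in stripped.lower())
    return [tok for tok in cleaned.split() if tok not in TITLES]


def tokens_match(a, b):
    """Tokens match if equal, or if one is a single-letter initial of the other."""
    if a == b:
        return True
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return False


def name_score(query, candidate):
    """Fraction of name components matched, ignoring component order."""
    q, c = normalize(query), normalize(candidate)
    if not q or not c:
        return 0.0
    remaining = list(c)
    hits = 0
    for tok in q:
        for other in remaining:
            if tokens_match(tok, other):
                remaining.remove(other)  # each candidate token is used once
                hits += 1
                break
    return hits / max(len(q), len(c))


if __name__ == "__main__":
    print(name_score("Dr. José García", "Garcia, Jose"))
    print(name_score("J. Smith", "John Smith"))
    print(name_score("John Smith", "John"))
```

Dividing by the longer token list means a missing component lowers the score rather than ruling out a match, so a screening application can set its own threshold.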

High-Accuracy Name Translation: Rosette Name Translator analyzes the fundamental linguistic structure of foreign personal names in Arabic, Chinese, Dari, Farsi, Korean, Pushto, or Urdu to produce highly accurate translations into English in compliance with applicable institutional or government standards.

Expanded Language Coverage: Rosette 7 adds support for Pushto and Dari to aid peacekeeping and reconstruction efforts in Afghanistan. The release also improves disambiguation between languages written in Cyrillic script (such as Russian and Bulgarian) and increases name indexing accuracy for Arabic, Chinese, Farsi, and Urdu.

Rosette 7 is available now. For more information, visit www.basistech.com.
