Intelligent Data and Analytics Software: The Latest from Five Vendors
A quick look at how five vendors have embedded machine learning, NLP, or other advanced analytics technologies into their platforms.
- By Fern Halper
- November 15, 2017
Strata Data (formerly Strata Hadoop World) is always a good place to speak to lots of vendors. This year was no exception. I met with vendors that offered solutions across the data spectrum -- from data security to data management and to advanced analytics -- and found it particularly interesting how many companies are embedding advanced analytics into their software.
Vendors are implementing advanced analytics in everything from data management and data preparation to analysis and data security. It's permeating the entire analytics life cycle, including data governance. TDWI Research points to intelligent data and analytics software as a major market trend. Here are five vendors I met with that embed machine learning, NLP, or other advanced analytics technologies into their platforms.
Arcadia Data's vision is to enable customers who are moving to new big data platforms (such as data lakes on Hadoop) or using cloud platforms to perform visual analytics. Arcadia Data believes the future is in machine-assisted insights for self-service analytics. To that end, Arcadia Enterprise provides machine-assisted data discovery.
Prior to Strata, the company announced enhancements to Arcadia Enterprise, including a built-in recommendation engine that provides side-by-side comparisons of ideal charts and/or graphic options for live big data. Business analysts can handle larger volumes of data with advanced capabilities to develop real-time visualizations and navigate the complex data types in many previously untapped data sources.
Ten-year-old Dataguise detects sensitive data (e.g., PII, PCI) and offers remediation services. Its flagship product, DGSecure, uses machine learning, NLP, and context-sensitive search to detect this data. Once the data is found, the company shows it to users on a dashboard. It can then provide remediation such as masking and encryption.
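To make the detect-then-mask idea concrete, here is a minimal Python sketch. It is not how DGSecure works (the product is described as using machine learning, NLP, and context-sensitive search); it swaps in two simple regular expressions, and the pattern set, function names, and sample record are all hypothetical.

```python
import re

# Hypothetical patterns for illustration only. A real detector would combine
# many more signals (context, checksums, trained models) than two regexes.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive(text):
    """Return (label, matched_text) pairs for every pattern hit, in pattern order."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


def mask(text):
    """Replace each match with a placeholder that names the type of data removed."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} MASKED]", text)
    return text


record = "Contact jane@example.com, SSN 123-45-6789."
print(find_sensitive(record))
print(mask(record))
```

Masking here is a one-way substitution; encryption, the other remediation mentioned above, would instead keep the value recoverable by holders of a key.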
DGSecure also provides tooling to understand what other sensitive data is connected to or commingled with it, and who is accessing that data. This kind of service is important for the European Union's General Data Protection Regulation (GDPR), which dictates how and where customer data must be stored. If you haven't already heard about GDPR, expect to hear more soon. (For an introduction, read "Enterprises Facing New, Stringent Privacy Regulations.")
One major use case for Dataguise is moving data to the cloud for analytics. The company partners with cloud providers such as Google and Microsoft to surface sensitive data. At Strata, the company announced several enhancements. DGSecure 6.2 now includes user and entity behavioral analytics and monitoring using machine learning algorithms. DGSecure Monitor can send alerts about atypical user behavior based on profile analytics and machine learning.
The core H2O.ai platform is H2O -- an in-memory distributed machine learning platform that provides numerous algorithms out of the box, refactored to support large-scale distributed environments. H2O.ai also offers Deep Water, which integrates open source deep learning frameworks such as MXNet, TensorFlow, and Caffe. Sparkling Water is its integration with Spark. Steam is used to deploy models into production. These are integrated into the H2O user interface.
At Strata, the company announced its Driverless AI product -- an automated machine learning tool that engineers features, helps with model interpretation to provide clear explanations of results, and suggests data visualizations. The company describes it as "the intelligence of a Kaggle Grandmaster in a box." The product can use GPU technology for improved performance. Whereas much of the company's past revenue came from enterprise support licenses, this is a commercial product.
Io-Tahoe, originally known as Rokitt Astra, uses machine learning, heuristics, and other algorithms to discover data relationships not necessarily found in metadata. Organizations are particularly interested in Io-Tahoe when they are migrating to the cloud, concerned about data lineage, or dealing with unstructured data in a data lake. The product consists of six main features:
- Relationship discovery uses machine learning algorithms to automatically learn relationships and dependencies within a database environment
- Data flow discovery utilizes machine learning to automatically discover data flows across multiple data sources (including data lakes)
- Sensitive data discovery automatically detects sensitive data (important for GDPR)
- Impact analysis examines how data has changed over time
- Redundant data cleanup detects duplicated data across multiple platforms (such as data lakes or relational databases) and can delete that data
- Data cataloging automatically classifies and catalogs data
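As a rough illustration of the redundant data cleanup idea in the list above, the following Python sketch fingerprints rows from two sources and reports rows that appear more than once. It is an assumption-laden stand-in, not Io-Tahoe's method: it uses plain content hashing rather than machine learning, and the function names, source names, and sample rows are hypothetical.

```python
import hashlib
from collections import defaultdict


def row_fingerprint(row):
    """Hash a row after trimming whitespace and lowercasing each value,
    so trivial formatting differences do not hide a duplicate."""
    normalized = "|".join(str(value).strip().lower() for value in row)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(sources):
    """Map each fingerprint seen more than once to its (source_name, row_index) locations."""
    seen = defaultdict(list)
    for name, rows in sources.items():
        for index, row in enumerate(rows):
            seen[row_fingerprint(row)].append((name, index))
    return {fp: locations for fp, locations in seen.items() if len(locations) > 1}


# Hypothetical sample data: the first row of each source is the same record
# with different spacing and capitalization.
sources = {
    "warehouse": [("Ada", "ada@example.com"), ("Bob", "bob@example.com")],
    "data_lake": [(" ada ", "ADA@example.com"), ("Cy", "cy@example.com")],
}

dupes = find_duplicates(sources)
for locations in dupes.values():
    print(locations)
```

The sketch only reports where duplicates live; deleting them, as the product is said to do, would be a separate and riskier step that depends on which copy is authoritative.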
Tellius' vision is to build the most intelligent yet simple-to-use interface for business analytics. The company provides AI-powered analytics in three products: SearchQL, Smart Insights Engine, and its advanced analytics library.
SearchQL uses natural language search for self-service analytics, even against big data. The product uses data, metadata, and usage patterns to suggest searches as the user types. Tellius also offers its Smart Insights Engine, which employs machine learning to deliver automated insights and rank them by relevance. The advanced analytics library includes predictive analytics tools. Models can be deployed into production using REST APIs.
Tellius runs on Apache Spark to take advantage of its speed for big data analytics.
A Final Word
TDWI expects to hear much more from data and analytics software companies embedding advanced analytics -- i.e., "smarts" -- into their products to make them easier to use and reduce time to value. For more on this topic, see the recently released Best Practices Report on AI, Machine Learning, and NLP from TDWI Research.
Fern Halper, Ph.D., is well known in the analytics community, having published hundreds of articles, research reports, speeches, webinars, and more on data mining and information technology over the past 20 years. Halper is also co-author of several “Dummies” books on cloud computing, hybrid cloud, and big data. She is the director of TDWI Research for advanced analytics, focusing on predictive analytics, social media analysis, text analytics, cloud computing, and “big data” analytics approaches. She has been a partner at industry analyst firm Hurwitz & Associates and a lead analyst for Bell Labs. Her Ph.D. is from Texas A&M University. You can reach her at firstname.lastname@example.org, on Twitter @fhalper, and on LinkedIn at linkedin.com/in/fbhalper.