Six Data and Analytics Trends Heard at Strata Hadoop World 2016
Details of six popular analytics trends discussed at Strata Hadoop World that employ advanced analytics and/or make data consumable.
- By Fern Halper
- October 3, 2016
Now that Strata Hadoop World in NYC is over, I’m reflecting on what I heard there. It was a busy conference, packed with vendors and track talks on a range of topics related to big data. The overall theme centered on making data work. We heard keynotes about innovating with data and serendipity, how machine learning can save lives, and the responsibility of data and analytics to benefit all Americans.
More than anything else, the conference focused on the tools and technologies for putting big data to work to gain insight, build new applications, and create new opportunities. Yes, there was talk about architectures and the like. However, much of the discussion revolved around what it takes to gain value from data and how analytics and data science enable this.
This evolution highlights a shift in the market that we’re also seeing at TDWI. It is no longer enough simply to capture and manage data for reporting; the value lies in analyzing increasing amounts of disparate data, both old and new types.
To that end, the market is moving to address how to analyze data and what to do with data. It is also progressing toward developing applications with big data using open source and microservices that employ advanced analytics such as machine learning and natural language processing. It is pushing intelligence out beyond the dashboard and making it consumable. It is an exciting time.
Here are six trends that I heard a lot about at Strata Hadoop World that fit this theme:
Trend #1: Machine Learning
This was one of the hottest topics at the event. Machine learning involves building systems that can learn from data to identify patterns and predict future results with minimal human intervention. The computer learns from examples using either supervised or unsupervised approaches.
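To make the supervised/unsupervised distinction concrete, here is a minimal sketch in plain Python. It is not taken from any talk or vendor product at the event; the measurements, labels, and function names are all hypothetical, and the two techniques shown (a nearest-centroid classifier and a naive k-means) are simply common textbook stand-ins for each approach.

```python
from statistics import mean

# Hypothetical toy data: (length_cm, diameter_cm) with made-up size labels.
labeled = [
    ((10.0, 2.0), "small"),
    ((11.0, 2.2), "small"),
    ((20.0, 4.0), "large"),
    ((21.0, 4.1), "large"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(mean(coords) for coords in zip(*points))


def fit_nearest_centroid(examples):
    """Supervised: learn one centroid per label from labeled examples."""
    by_label = {}
    for point, label in examples:
        by_label.setdefault(label, []).append(point)
    return {label: centroid(points) for label, points in by_label.items()}


def predict(model, point):
    """Return the label whose centroid is closest to the point."""
    return min(model, key=lambda label: squared_distance(model[label], point))


def k_means(points, k, iterations=10):
    """Unsupervised: group unlabeled points into k clusters."""
    centers = list(points[:k])  # naive initialization, fine for a sketch
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(centers[i], p))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


model = fit_nearest_centroid(labeled)
print(predict(model, (19.0, 3.8)))  # prints "large"

unlabeled = [point for point, _ in labeled]  # same points, labels discarded
print(k_means(unlabeled, k=2))
```

In the supervised half the labels do the teaching; in the unsupervised half the algorithm only sees the measurements and has to find the two groups on its own.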
Almost every vendor I spoke with was utilizing machine learning in some way. Analytics vendors are offering machine-learning algorithms as part of their analytics arsenal. Vendors are incorporating it into their software -- for instance, to help in data preparation.
Others are offering machine learning as open source. Google TensorFlow is an open source machine-learning library that is supposed to be so easy to use that a cucumber farmer (who, in all fairness, was an engineer) used it to build an app that used deep learning to sort cucumbers by size, shape, and other parameters. Spark also has its own machine learning libraries, which vendors were hyping.
Others are offering machine learning as a service. For instance, Microsoft offers machine learning in this way as part of Cortana Intelligence Suite.
Trend #2: Data Lakes
A data lake is a large collection of disparate kinds of raw data, and yes, everyone at Strata was talking about it. However, the conversation went further than how to architect a data lake, how the lake fits into an enterprise architecture for big data, how to manage the data, or even how the data lake can be a landing area for data of all kinds.
The conversation at Strata was about analyzing data in that lake -- from smart data lakes to what it takes to analyze the raw data in the data lake. Vendors and users alike were talking about using advanced algorithms to make sense of the raw data and to experiment with it.
Trend #3: Natural Language Processing (NLP)
I was excited to hear vendors and speakers talking about NLP because understanding text is an area that I’ve been following for a long time. NLP involves analyzing, understanding, and generating language to ultimately interface with systems using human language rather than computer languages.
For text, NLP often uses semantics to parse sentences to understand entities (people, places, things), concepts (words and phrases that indicate a particular idea), themes (groups of co-occurring concepts), or sentiments (positive, negative, neutral). Now, vendors including IBM, SiSense, and others are talking about using NLP as part of their discovery products so users can ask a question of the data in a human way.
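As a rough illustration of the sentiment piece only, here is a toy sketch in plain Python. It assumes a tiny hand-made word list (hypothetical, invented for this example) and simply counts matches; commercial NLP products rely on far richer linguistic and statistical models, so treat this as a way to see the idea rather than how any vendor does it.

```python
import re

# Hypothetical word lists, made up for illustration only.
POSITIVE = {"great", "love", "excellent", "fast"}
NEGATIVE = {"slow", "broken", "hate", "poor"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text):
    """Classify text as positive, negative, or neutral by counting lexicon hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sentence in [
    "I love the new dashboard, it is fast",
    "The export is slow and broken",
    "The report ran on Tuesday",
]:
    print(sentence, "->", sentiment(sentence))
```

Even this crude version shows the point made above: once text is scored, it has attributes (here, a sentiment label) that can be counted, filtered, and analyzed like any other field.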
They are also talking about using NLP with data in a data lake to extract meaning from the data. In other words, the data is not simply a string of text -- it has attributes that can be analyzed. Vendors are talking about using NLP in cognitive systems that combine components of NLP, machine learning, and other advanced techniques.
Trend #4: Open Source and Spark
There was a lot of talk about open source at Strata. Vendors such as SAS and IBM (at an offsite event) were touting that they are open. Speakers were pushing the advantages of open source for driving innovation and change.
The loudest open source discussion had to do with Spark. Spark is an open source, scalable, massively parallel, in-memory compute environment used for analytics applications. We heard about SQL on Spark, and there were tutorials on Spark structured streaming for machine learning. There was talk about making Spark interfaces easier for analysts and about advances in SparkR for advanced analytics. TDWI is also seeing increasing interest in Spark.
Trend #5: The Cloud
Yes, we’ve been hearing about the cloud for some time now. TDWI research indicates that resistance to the cloud is definitely diminishing. We believe that the majority of organizations will ultimately use the cloud in a hybrid model.
This was driven home at one keynote that included Nielsen. Although the company might like to move 100 percent of its data to the public cloud, it realizes that it will probably be closer to 50 percent.
Most of the vendors I spoke with at Strata offer their analytics in the public cloud, including SAP, which announced its purchase of Hadoop cloud vendor Altiscale.
Trend #6: Persona-Driven Solutions
We’ve been hearing a lot about how different users across the organization are and will be using various data and analytics solutions, and this was discussed at Strata as well. The same user interface and tool set that might be right for a data scientist isn’t necessarily what might be right for an app developer or a business analyst.
Vendors are realizing this and are offering up persona-driven solutions. For instance, Informatica offers its data management platform geared toward different personas, including the data steward, data engineer, and business analyst. Each persona has a different interface. Ditto for IBM; it offers its new DataWorks platform for different personas, including the data scientist, business analyst, and app developer.
Of course there was more -- much more -- including graph databases and graph analytics to explore relationships between entities, self-service data preparation and advanced analytics (including machine learning), and streaming analytics. Strata is a big conference and it is hard to get to everything. I’ll be writing more about the vendors I met with in future articles.
Fern Halper, Ph.D., is well known in the analytics community, having published hundreds of articles, research reports, speeches, webinars, and more on data mining and information technology over the past 20 years. Halper is also co-author of several “Dummies” books on cloud computing, hybrid cloud, and big data. She is the director of TDWI Research for advanced analytics, focusing on predictive analytics, social media analysis, text analytics, cloud computing, and “big data” analytics approaches. She has been a partner at industry analyst firm Hurwitz & Associates and a lead analyst for Bell Labs. Her Ph.D. is from Texas A&M University. You can reach her at firstname.lastname@example.org, on Twitter @fhalper, and on LinkedIn at linkedin.com/in/fbhalper.