Unifying Big Data Workloads in Apache Spark
TDWI Speaker: Fern Halper, TDWI VP and Senior Research Director
Guest Speaker: Jules S. Damji, Apache Spark Community Evangelist, Databricks
Date: Wednesday, December 13, 2017
Time: 9:00 a.m. PT, 12:00 p.m. ET
Big data can provide a significant path to value for organizations. As their analytics efforts mature, organizations are increasingly applying more advanced analytics to big data. This includes using machine learning for predictive analytics to better understand and predict customer behavior, and analyzing more data in real time in order to act on the results. The use cases are wide and varied.
Apache Spark is gaining attention for advanced analytics because of its in-memory processing engine, which is known for its speed, as well as its sophisticated analytics library and support for streaming. Apache Spark was designed to offer a unified engine across diverse workloads, such as SQL, streaming, and batch analytics. Although this approach may seem counterintuitive, it has some key benefits. Most importantly, applications can combine workloads in ways that are not possible with specialized engines, and users benefit from a uniform management environment.
Join Fern Halper and Databricks to learn more about Spark, including its newest unified API, Structured Streaming, which lets the engine run batch SQL or DataFrame computations incrementally over a stream of data.
Topics include:
- Big data, real-time use cases for advanced analytics
- Open source and Spark as an environment for big data and advanced analytics
- A unified Spark engine
- Spark Structured Streaming
Fern Halper, Ph.D.