Distributed Computing using Apache Spark™ (NEW)
Duration: One Day
This one-day course is for data engineers, analysts, and architects; software engineers; IT operations staff; and technical managers interested in an overview of Apache Spark. Why take this course? In-memory computing has become a practical reality in the enterprise, and executives are looking to leverage it to improve analytics and deliver business insights in near real time. How do we get there? What needs to be implemented? How can you learn these concepts? This course addresses those questions.
This course covers in-memory processing, the basic internals of the Apache Spark framework, resilient distributed datasets (RDDs), Spark SQL and DataFrames, Spark’s streaming capabilities, and programming with Spark. Each topic includes slide and lecture content along with examples that demonstrate working with a Spark cluster.
You Will Learn How To
- Experiment with use cases for Spark, including extract-transform-load operations, data analytics, data visualization, batch analysis, machine learning, graph processing, and stream processing
- Identify Spark capabilities appropriate to your business needs
- Communicate with team members and engineers using appropriate terminology
- Build data pipelines and query large data sets using Spark SQL and DataFrames
- Execute and modify extract-transform-load (ETL) jobs to process big data using the Spark API, DataFrames, and resilient distributed datasets (RDDs)
- Analyze Spark jobs using the administration UIs