Apache Spark Fundamentals Training Course – Bluechip AI Asia, AI Development Company

Duration

21 hours (usually 3 days including breaks)

Requirements

Experience with the Linux command line
A general understanding of data processing
Programming experience with Java, Scala, Python, or R

Audience

Developers

Overview

Apache Spark is an analytics engine designed to distribute data across a cluster in order to process it in parallel. It contains modules for streaming, SQL, machine learning and graph processing.
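
To make the idea of splitting data up and processing it in parallel concrete, here is a minimal sketch in plain Python. It does not use Spark's API. Threads stand in for cluster nodes, and the function names (split_into_partitions, count_words, parallel_word_count) are hypothetical, chosen only for illustration. It shows the general pattern Spark is built around: partition the data, do the same work on each partition independently, then combine the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Deal the records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Per-partition work: count the words in one partition's lines."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, num_partitions=4):
    """Count words by processing partitions concurrently, then merging."""
    partitions = split_into_partitions(lines, num_partitions)

    # Each worker thread plays the role of one cluster node (an assumption
    # made for illustration; a real cluster runs separate machines).
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))

    # Combine the partial results into a single answer.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


if __name__ == "__main__":
    sample = [
        "spark processes data in parallel",
        "data is split across a cluster",
        "spark combines partial results",
    ]
    print(parallel_word_count(sample, num_partitions=2))
```

In a real deployment the partitions would live on different machines and the engine would handle scheduling and failure recovery; this sketch only mirrors the partition, process, and combine steps on a single machine.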

This instructor-led, live training (online or onsite) is aimed at engineers who wish to deploy an Apache Spark system for processing very large amounts of data.

By the end of this training, participants will be able to:

Install and configure Apache Spark.
Understand the difference between Apache Spark and Hadoop MapReduce and when to use which.
Quickly read in and analyze very large data sets.
Integrate Apache Spark with other machine learning tools.

Format of the Course

Interactive lecture and discussion.
Lots of exercises and practice.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request customized training for this course, please contact us.

Course Outline

Introduction

Apache Spark vs Hadoop MapReduce

Overview of Apache Spark Features and Architecture

Choosing a Programming Language

Setting up Apache Spark

Creating a Sample Application

Choosing the Data Set

Running Data Analysis on the Data

Processing of Structured Data with Spark SQL

Processing Streaming Data with Spark Streaming

Integrating Apache Spark with Third-Party Machine Learning Tools

Using Apache Spark for Graph Processing

Optimizing Apache Spark

Troubleshooting

Summary and Conclusion
