Duration
21 hours (usually 3 days including breaks)
Requirements
- .NET programming experience using C# or F#
Audience
- Developers
Overview
Apache Spark is a distributed processing engine for analyzing very large data sets. It can process data both in batches and in real time, and it also supports machine learning, ad-hoc queries, and graph processing. .NET for Apache Spark is a free, open-source, and cross-platform big data analytics framework that supports applications written in C# or F#.
This instructor-led, live training (online or onsite) is aimed at developers who wish to carry out big data analysis using Apache Spark in their .NET applications.
By the end of this training, participants will be able to:
- Install and configure Apache Spark.
- Understand how .NET for Apache Spark exposes the Spark APIs so that they can be called from a .NET application.
- Develop data processing applications in C# or F# that can handle data sets measured in terabytes or petabytes.
- Develop machine learning features for a .NET application using Apache Spark capabilities.
- Carry out exploratory analysis using SQL queries on big data sets.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us to make arrangements.
Course Outline
Introduction
Overview of Apache Spark Features and Architecture
- Apache Spark modules: Spark SQL, Spark Streaming, MLlib, GraphX
- RDDs, DataFrames, driver and workers, DAG, etc.
Setting up Apache Spark on .NET
- Preparing the Java VM
- Running .NET for Apache Spark using .NET Core
Getting Started
- Creating a sample .NET console application
- Adding the Spark driver
- Initializing a SparkSession
- Executing the application
Preparing Data
- Building a data preparation pipeline
- Performing ETL (Extract, Transform, and Load)
Machine Learning
- Building a machine learning model
- Preparing the data
- Training a model
Real-time Processing
- Processing streaming data in real time
- Case study: monitoring sensor data
Interactive Query
- Working with Spark SQL
- Analyzing structured data
Visualizing Results
- Plotting results
- Using third-party tools to visualize results
Troubleshooting
Summary and Conclusion