Apache Spark in the Cloud Training Course

Duration

21 hours (usually 3 days including breaks)

Requirements

Programming skills (preferably Python or Scala)

SQL basics

Overview

Apache Spark has a steep learning curve at the beginning: it takes a lot of effort to get the first results. This course aims to get participants past that difficult first stage. After taking this course, participants will understand the basics of Apache Spark, clearly differentiate an RDD from a DataFrame, know the Python and Scala APIs, and understand executors and tasks. Following best practices, the course focuses strongly on cloud deployment with Databricks and AWS. Participants will also understand the differences between AWS EMR and AWS Glue, one of the latest Spark services from AWS.

Audience

Data Engineers, DevOps Engineers, Data Scientists

Course Outline

Introduction:

  • Apache Spark in Hadoop Ecosystem
  • Short introduction to Python and Scala

Basics (theory):

  • Architecture
  • RDD
  • Transformations and Actions
  • Stages, Tasks, Dependencies
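
As a rough illustration of the transformations and actions topic: in Spark, transformations such as map and filter are lazy and only describe a computation, while actions such as collect and count trigger it. The sketch below is a plain-Python analogy, not the Spark API; the LazyDataset class and the square helper are hypothetical names invented for this illustration.

```python
# Plain-Python analogy only: this is NOT the Spark API.
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them only on an action."""

    def __init__(self, source, steps=()):
        self._source = source        # any re-iterable collection
        self._steps = tuple(steps)   # recorded transformations (a simple "lineage")

    # Transformations: return a new dataset and compute nothing.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    # Helper used by actions: chain the recorded steps over the source.
    def _run(self):
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: evaluate the chain and return a concrete result.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []

def square(x):
    calls.append(x)  # record each evaluation so laziness is observable
    return x * x

squares = LazyDataset(range(1, 6)).map(square).filter(lambda v: v % 2 == 1)
evaluated_before_action = len(calls)  # nothing has been computed yet
result = squares.collect()            # the action triggers the whole chain
```

Here `square` is not called while the chain is being built; it runs only when `collect()` is invoked. Real Spark additionally splits the work into stages and tasks across executors, which this sketch does not attempt to model.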

Understanding the basics using the Databricks environment (hands-on workshop):

  • Exercises using RDD API
  • Basic action and transformation functions
  • PairRDD
  • Join
  • Caching strategies
  • Exercises using DataFrame API
  • SparkSQL
  • DataFrame: select, filter, group, sort
  • UDF (User Defined Function)
  • A look at the Dataset API
  • Streaming

Understanding deployment using the AWS environment (hands-on workshop):

  • Basics of AWS Glue
  • Understand the differences between AWS EMR and AWS Glue
  • Example jobs in both environments
  • Understand the pros and cons of each

Extra:

  • Introduction to Apache Airflow orchestration

Python and Computer Neurals

Control machines, neural networks between networks, process and process development, cloud

Requirements

  • PyCharm (Python editor)

Description

This course looks at how to move ahead with constructing a system that acts as a chip for human intelligence. The first part covers what data is and how data can be represented in neural formats. The second part covers how data can be coded into electronic impulses or electronic circuits, and how these circuits may help in building a circuit. The third part examines internal process handling: the transfer of processes, the structure of a process, and how processes are managed. The last part presents the virtualization of these processes.

Who this course is for:

  • Beginners

Course content