Scala – Bluechip AI Asia, AI Development Company

Duration

21 hours (usually 3 days including breaks)

Requirements

Programming and scripting experience

Audience

Software Engineers

Overview

Scala is a condensed version of Java for large scale functional and object-oriented programming. Apache Spark Streaming is an extended component of the Spark API for processing big data sets as real-time streams. Together, Spark Streaming and Scala enable the streaming of big data.

This instructor-led, live training (online or onsite) is aimed at software engineers who wish to stream big data with Spark Streaming and Scala.

By the end of this training, participants will be able to:

Create Spark applications with the Scala programming language.
Use Spark Streaming to process continuous streams of data.
Process streams of real-time data with Spark Streaming.

Format of the Course

Interactive lecture and discussion.
Lots of exercises and practice.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request a customized training for this course, please contact us to arrange.

Course Outline

Introduction

Scala Programming in Depth Review

Syntax and structure
Flow control and functions

Spark Internals

Resilient Distributed Datasets (RDD)
Spark script to graph to cluster

Overview of Spark Streaming

Streaming architecture
Intervals in streaming
Fault tolerance

Preparing the Development Environment

Installing and configuring Apache Spark
Installing and configuring the Scala IDE
Installing and configuring JDK

Spark Streaming Beginner to Advanced

Working with key/value RDD’s
Filtering RDD’s
Improving Spark scripts with regular expressions
Sharing data on a cluster
Working with network data sets
Implementing BFS algorithms
Creating Spark driver scripts
Tracking in real time with scripts
Writing continuous applications
Streaming linear regression
Using Spark Machine Learning Library

Spark and Clusters

Bundling dependencies and Spark scripts using the SBT tool
Using EMR for illustrating clusters
Optimizing by partitioning RDD’s
Using Spark logs

Integration in Spark Streaming

Integrating Apache Kafka and working with Kafka topics
Integrating Apache Fume and working with pull-based/push-based Flume configurations
Writing a custom receiver class
Integrating Cassandra and exposing data as real-time services

In Production

Packaging an application and running it with Spark-Submit
Troubleshooting, tuning, and debugging Spark Jobs and clusters

Summary and Conclusion

Duration

14 hours (usually 2 days including breaks)

Requirements

Knowledge of Java/Scala programming language. Basic familiarity with statistics and linear algebra is recommended.

Overview

The aim of this course is to provide a basic proficiency in applying Machine Learning methods in practice. Through the use of the Scala programming language and its various libraries, and based on a multitude of practical examples this course teaches how to use the most important building blocks of Machine Learning, how to make data modeling decisions, interpret the outputs of the algorithms and validate the results.

Our goal is to give you the skills to understand and use the most fundamental tools from the Machine Learning toolbox confidently and avoid the common pitfalls of Data Sciences applications.

Course Outline

Introduction to Applied Machine Learning

Statistical learning vs. Machine learning
Iteration and evaluation
Bias-Variance trade-off

Machine Learning with Scala

Choice of libraries
Add-on tools

Regression

Linear regression
Generalizations and Nonlinearity
Exercises

Classification

Bayesian refresher
Naive Bayes
Logistic regression
K-Nearest neighbors
Exercises

Cross-validation and Resampling

Cross-validation approaches
Bootstrap
Exercises

Unsupervised Learning

K-means clustering
Examples
Challenges of unsupervised learning and beyond K-means

Tag: Scala

Apache Spark Streaming with Scala Training Course

Duration

Requirements

Overview

Course Outline

Machine Learning Fundamentals with Scala and Apache Spark Training Course

Duration

Requirements

Overview

Course Outline

Introduction to Applied Machine Learning

Machine Learning with Scala

Regression

Classification

Cross-validation and Resampling

Unsupervised Learning