Spark Streaming with Python and Kafka Training Course

Duration

7 hours (usually 1 day including breaks)

Requirements

  • Experience with Python and Apache Kafka
  • Familiarity with stream-processing platforms

Audience

  • Data engineers
  • Data scientists
  • Programmers

Overview

Apache Spark Streaming is a scalable, open-source stream processing engine that enables fault-tolerant, high-throughput processing of live data streams from sources such as Kafka, Kinesis, and TCP sockets.

This instructor-led, live training (online or onsite) is aimed at data engineers, data scientists, and programmers who wish to use Spark Streaming to process and analyze real-time data.

By the end of this training, participants will be able to use Spark Streaming to process live data streams for use in databases, filesystems, and live dashboards.
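
As a taste of the API, here is a minimal sketch of the classic DStream word count, using the pyspark.streaming (DStream) API that this course works with; the host and port are placeholders.

    # Minimal DStream example: count words arriving on a TCP socket in
    # 5-second micro-batches.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "NetworkWordCount")
    ssc = StreamingContext(sc, batchDuration=5)

    lines = ssc.socketTextStream("localhost", 9999)
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()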

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange it.

Course Outline

Introduction

Overview of Spark Streaming Features and Architecture

  • Supported data sources
  • Core APIs

Preparing the Environment

  • Dependencies
  • Spark and streaming context
  • Connecting to Kafka
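
A sketch of the wiring covered in this unit, assuming Spark 2.4, where the Python Kafka DStream connector ships as the spark-streaming-kafka-0-8 package (it was removed in Spark 3.0); the topic name and broker address are placeholders.

    # Submit with the Kafka dependency on the classpath, e.g.:
    #   spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:<spark-version> app.py
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="KafkaStreamingApp")
    ssc = StreamingContext(sc, batchDuration=10)  # 10-second batch interval

    # Direct stream: Spark tracks Kafka partition offsets itself, no receiver.
    stream = KafkaUtils.createDirectStream(
        ssc,
        topics=["events"],
        kafkaParams={"metadata.broker.list": "localhost:9092"},
    )
    # Each element of the stream is a (key, value) pair of decoded strings.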

Processing Messages

  • Parsing inbound messages as JSON
  • ETL processes
  • Starting the streaming context
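
Continuing the environment sketch above: each message value is parsed as JSON, a small ETL step keeps only the needed fields, and only then is the context started. The field names are hypothetical.

    import json

    def parse_json(value):
        # Return a one-element list, or an empty list for malformed input,
        # so flatMap silently drops records that fail to parse.
        try:
            return [json.loads(value)]
        except ValueError:
            return []

    records = stream.flatMap(lambda kv: parse_json(kv[1]))

    # ETL: keep only the fields downstream consumers need.
    cleaned = records.map(lambda r: (r.get("user_id"), r.get("amount", 0)))
    cleaned.pprint()

    # Nothing is processed until the streaming context is started.
    ssc.start()
    ssc.awaitTermination()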

Performing Windowed Stream Processing

  • Slide interval
  • Checkpoint delivery configuration
  • Launching the environment
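
A sketch of a windowed aggregation over the cleaned stream from the previous sketch: a 60-second window re-evaluated at a 10-second slide interval. Using an inverse reduce function requires checkpointing, configured here with a placeholder directory.

    ssc.checkpoint("/tmp/spark-checkpoints")  # must be set before ssc.start()

    windowed = cleaned.reduceByKeyAndWindow(
        lambda a, b: a + b,   # fold in values entering the window
        lambda a, b: a - b,   # inverse function: remove values leaving it
        windowDuration=60,    # window length; a multiple of the batch interval
        slideDuration=10,     # how often the windowed result is recomputed
    )
    windowed.pprint()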

Prototyping the Processing Code

  • Connecting to a Kafka topic
  • Retrieving JSON from a data source using Paw
  • Variations and additional processing
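
One way to prototype: run the parsing and matching logic against a static sample first (for example, a response retrieved with an HTTP client such as Paw and saved to disk) before attaching it to the live stream. The file name, field, and threshold below are hypothetical.

    import json

    # sample.json: one JSON record per line, captured from the data source.
    with open("sample.json") as f:
        samples = [json.loads(line) for line in f]

    for record in samples:
        # Try out variations of the processing logic on static data first.
        if record.get("amount", 0) > 100:
            print("would match:", record)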

Streaming the Code

  • Job control variables
  • Defining values to match
  • Functions and conditions
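
A sketch of the pattern this unit describes, applied to the parsed records stream from the earlier sketch: job control variables and match values collected in one place, with a single condition function applied to the stream. All names and thresholds are hypothetical.

    # Job control variables: change the job's behavior without touching
    # the processing functions below.
    MATCH_FIELD = "status"
    MATCH_VALUES = {"ERROR", "CRITICAL"}

    def is_match(record):
        # Condition applied to every parsed record.
        return record.get(MATCH_FIELD) in MATCH_VALUES

    matched = records.filter(is_match)
    unmatched = records.filter(lambda r: not is_match(r))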

Acquiring Stream Output

  • Counters
  • Kafka output (matched and unmatched)
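
A sketch of counting each batch and routing the matched and unmatched streams from the previous sketch to separate Kafka topics. It assumes the kafka-python client library for the producer; the topic names are placeholders.

    import json
    from kafka import KafkaProducer  # kafka-python client library

    def send_partition(records, topic):
        # One producer per partition, created on the executor rather than
        # the driver, since producers are not serializable.
        producer = KafkaProducer(
            bootstrap_servers="localhost:9092",
            value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        )
        for record in records:
            producer.send(topic, record)
        producer.flush()

    matched.count().pprint()  # per-batch counter
    matched.foreachRDD(lambda rdd: rdd.foreachPartition(
        lambda part: send_partition(part, "events.matched")))
    unmatched.foreachRDD(lambda rdd: rdd.foreachPartition(
        lambda part: send_partition(part, "events.unmatched")))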

Troubleshooting

Summary and Conclusion

Stream Processing with Kafka Streams Training Course

Duration

7 hours (usually 1 day including breaks)

Requirements

  • An understanding of Apache Kafka
  • Java programming experience

Overview

Kafka Streams is a client library for building applications and microservices whose data is passed to and from a Kafka messaging system. Traditionally, Apache Kafka has relied on Apache Spark or Apache Storm to process data between message producers and consumers. By calling the Kafka Streams API from within an application, data can be processed directly within Kafka, eliminating the need to send it to a separate cluster for processing.
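
As a concrete illustration of that model, here is a minimal Kafka Streams application in Java that reads one topic, transforms each value inside the application's own JVM, and writes to another topic. The application ID and topic names are placeholders.

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class UppercaseApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> input = builder.stream("input-topic");
            // The transformation runs inside this application's JVM; no
            // separate processing cluster is involved.
            input.mapValues(value -> value.toUpperCase()).to("output-topic");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }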

In this instructor-led, live training, participants will learn how to integrate Kafka Streams into a set of sample Java applications that pass data to and from Apache Kafka for stream processing.

By the end of this training, participants will be able to:

  • Understand Kafka Streams features and advantages over other stream processing frameworks
  • Process stream data directly within a Kafka cluster
  • Write a Java or Scala application or microservice that integrates with Kafka and Kafka Streams
  • Write concise code that transforms input Kafka topics into output Kafka topics
  • Build, package and deploy the application

Audience

  • Developers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Notes

  • To request a customized training for this course, please contact us to arrange it

Course Outline

Introduction

  • Kafka Streams vs Spark, Flink, and Storm

Overview of Kafka Streams Features

  • Stateful and stateless processing, event-time processing and windowing operations, the high-level DSL, etc.
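
A sketch of the stateful, event-time windowing mentioned above, written with the DSL: counting events per key in tumbling five-minute windows. It assumes a recent (3.x) Kafka Streams release, where tumbling windows are declared with TimeWindows.ofSizeWithNoGrace (older releases use TimeWindows.of); the topic names are placeholders.

    import java.time.Duration;

    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.TimeWindows;

    public class WindowedCounts {
        static Topology build() {
            StreamsBuilder builder = new StreamsBuilder();
            builder.<String, String>stream("events")
                   .groupByKey()
                   // Stateful, event-time based: tumbling 5-minute windows.
                   .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
                   .count()
                   .toStream()
                   // Flatten the windowed key back to a plain string key.
                   .map((windowedKey, count) ->
                           KeyValue.pair(windowedKey.key(), count.toString()))
                   .to("event-counts");
            return builder.build();
        }
    }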

Case Study: Kafka Streams API for Predictive Budgeting

Setting up the Development Environment

Creating a Streams Application

Starting the Kafka Cluster

Preparing the Topics and Input Data

Options for Processing Stream Data

  • High-level Kafka Streams DSL
  • Low-level Processor API
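
For contrast, here is the transformation from the earlier DSL example expressed against the low-level Processor API, which exposes one record at a time. This assumes the org.apache.kafka.streams.processor.api interfaces from recent Kafka releases; the class name is hypothetical.

    import org.apache.kafka.streams.processor.api.Processor;
    import org.apache.kafka.streams.processor.api.ProcessorContext;
    import org.apache.kafka.streams.processor.api.Record;

    public class UppercaseProcessor implements Processor<String, String, String, String> {
        private ProcessorContext<String, String> context;

        @Override
        public void init(ProcessorContext<String, String> context) {
            this.context = context;
        }

        @Override
        public void process(Record<String, String> record) {
            // Forward a transformed record downstream; the DSL's mapValues
            // achieves the same in one line.
            context.forward(record.withValue(record.value().toUpperCase()));
        }
    }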

Transforming the Input Data

Inspecting the Output Data

Stopping the Kafka Cluster

Options for Deploying the Application

  • Classic ops tools (Puppet, Chef, and Salt)
  • Docker
  • WAR file

Troubleshooting

Summary and Conclusion