Duration
14 hours (usually 2 days including breaks)
Requirements
- Experience with Python programming.
- Experience with the Linux command line.
Overview
Apache Beam is an open source, unified programming model for defining and executing parallel data processing pipelines. Its power lies in its ability to run both batch and streaming pipelines, with execution carried out by one of Beam's supported distributed processing back-ends: Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Apache Beam is useful for ETL (Extract, Transform, and Load) tasks such as moving data between different storage media and data sources, transforming data into a more desirable format, and loading data onto a new system.
In this instructor-led, live training (onsite or remote), participants will learn how to use the Apache Beam SDKs in a Java or Python application that defines a data processing pipeline for decomposing a big data set into smaller chunks for independent, parallel processing.
By the end of this training, participants will be able to:
- Install and configure Apache Beam.
- Use a single programming model to carry out both batch and stream processing from within their Java or Python application.
- Execute pipelines across multiple environments.
Format of the Course
- Part lecture, part discussion, exercises and heavy hands-on practice
Note
- This course will be available in Scala in the future. Please contact us to arrange.
Course Outline
Introduction
- Apache Beam vs MapReduce, Spark Streaming, Kafka Streams, Storm and Flink
Installing and Configuring Apache Beam
Overview of Apache Beam Features and Architecture
- Beam Model, SDKs, Beam Pipeline Runners
- Distributed processing back-ends
Understanding the Apache Beam Programming Model
- How a pipeline is executed
Running a sample pipeline
- Preparing a WordCount pipeline
- Executing the Pipeline locally
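For reference, the WordCount pipeline prepared in this module can look roughly like the following sketch using the Beam Java SDK (the input and output paths are placeholders):

```java
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;

public class MinimalWordCount {
    public static void main(String[] args) {
        // With no --runner flag, Beam falls back to the local DirectRunner.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        p.apply("ReadLines", TextIO.read().from("input.txt"))              // placeholder input path
         .apply("SplitIntoWords", FlatMapElements
                 .into(TypeDescriptors.strings())
                 .via((String line) -> Arrays.asList(line.toLowerCase().split("[^\\p{L}]+"))))
         .apply("RemoveEmptyWords", Filter.by((String word) -> !word.isEmpty()))
         .apply("CountWords", Count.perElement())
         .apply("FormatResults", MapElements
                 .into(TypeDescriptors.strings())
                 .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
         .apply("WriteCounts", TextIO.write().to("word_counts"));          // placeholder output prefix

        p.run().waitUntilFinish();
    }
}
```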
Designing a Pipeline
- Planning the structure, choosing the transforms, and determining the input and output methods
Creating the Pipeline
- Writing the driver program and defining the pipeline
- Using Apache Beam classes
- Data sets, transforms, I/O, data encoding, etc.
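As a taste of the classes involved, a sketch of a custom DoFn applied with ParDo inside the driver program (the comma-separated record layout is illustrative only):

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// A DoFn is the unit of per-element processing; ParDo applies it in parallel
// across the elements of a PCollection.
class ParseRecordFn extends DoFn<String, KV<String, Double>> {
    @ProcessElement
    public void processElement(@Element String line, OutputReceiver<KV<String, Double>> out) {
        String[] fields = line.split(",");          // illustrative "userId,amount" layout
        if (fields.length == 2) {
            out.output(KV.of(fields[0], Double.parseDouble(fields[1])));
        }
        // Malformed lines are silently dropped in this sketch.
    }
}

// In the driver program (data encoding is handled by Beam's coder registry,
// which infers a KvCoder of StringUtf8Coder and DoubleCoder here):
//   PCollection<String> lines = ...;
//   PCollection<KV<String, Double>> amounts =
//           lines.apply("ParseRecords", ParDo.of(new ParseRecordFn()));
```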
Executing the Pipeline
- Executing the pipeline locally, on remote machines, and on a public cloud
- Choosing a runner
- Runner-specific configurations
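To illustrate runner selection, a sketch of a driver program whose execution back-end is chosen purely through pipeline options; the runner names below are the standard ones, but each requires its own dependency on the classpath plus extra, runner-specific options:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class RunnerSelection {
    public static void main(String[] args) {
        // Typical invocations (illustrative; consult each runner's documentation):
        //   --runner=DirectRunner                                     local testing (default)
        //   --runner=FlinkRunner    --flinkMaster=<host:port>         Apache Flink
        //   --runner=SparkRunner    --sparkMaster=<master-url>        Apache Spark
        //   --runner=DataflowRunner --project=<id> --tempLocation=gs://<bucket>/tmp
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
        Pipeline p = Pipeline.create(options);

        // ... apply the same transforms as in the WordCount sketch; only the options change ...

        p.run().waitUntilFinish();
    }
}
```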
Testing and Debugging Apache Beam
- Using type hints to emulate static typing
- Managing Python Pipeline Dependencies
Processing Bounded and Unbounded Datasets
Making Your Pipelines Reusable and Maintainable
Creating New Data Sources and Sinks
- Apache Beam Source and Sink API
Integrating Apache Beam with other Big Data Systems
- Apache Hadoop, Apache Spark, Apache Kafka
Troubleshooting
Summary and Conclusion
Duration
7 hours (usually 1 day including breaks)
Requirements
- An understanding of Apache Kafka
- Java programming experience
Overview
Kafka Streams is a client-side library for building applications and microservices whose data is passed to and from a Kafka messaging system. Traditionally, Apache Kafka has relied on Apache Spark or Apache Storm to process data between message producers and consumers. By calling the Kafka Streams API from within an application, data can be processed directly within Kafka, eliminating the need to send the data to a separate cluster for processing.
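As a rough illustration of that idea, a minimal Kafka Streams application that reads one topic, transforms the values, and writes another topic, all from inside an ordinary Java process (the application id, broker address, and topic names are placeholders):

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");        // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // placeholder broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");          // placeholder topic
        input.mapValues(value -> value.toUpperCase())
             .to("output-topic");                                               // placeholder topic

        // Processing runs in this application's own threads; no separate cluster is involved.
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```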
In this instructor-led, live training, participants will learn how to integrate Kafka Streams into a set of sample Java applications that pass data to and from Apache Kafka for stream processing.
By the end of this training, participants will be able to:
- Understand Kafka Streams features and advantages over other stream processing frameworks
- Process stream data directly within a Kafka cluster
- Write a Java or Scala application or microservice that integrates with Kafka and Kafka Streams
- Write concise code that transforms input Kafka topics into output Kafka topics
- Build, package and deploy the application
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Notes
- To request a customized training for this course, please contact us to arrange
Course Outline
Introduction
- Kafka vs Spark, Flink, and Storm
Overview of Kafka Streams Features
- Stateful and stateless processing, event-time processing, DSL, event-time based windowing operations, etc.
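For example, a sketch contrasting a stateless filter with a stateful, event-time windowed count in the DSL (the windowing API shown is the Kafka 2.x form and differs slightly between versions; the topic name is a placeholder):

```java
import java.time.Duration;

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class WindowedCounts {
    public static StreamsBuilder buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> clicks = builder.stream("page-clicks");   // placeholder topic

        // Stateless: nothing is remembered between records.
        KStream<String, String> valid = clicks.filter((user, page) -> page != null);

        // Stateful: counts are kept in a local, fault-tolerant state store,
        // grouped into 5-minute event-time windows.
        KTable<Windowed<String>, Long> countsPerWindow = valid
                .groupByKey()
                .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
                .count();

        return builder;
    }
}
```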
Case Study: Kafka Streams API for Predictive Budgeting
Setting up the Development Environment
Creating a Streams Application
Starting the Kafka Cluster
Preparing the Topics and Input Data
Options for Processing Stream Data
- High-level Kafka Streams DSL
- Lower-level Processor API
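For contrast with the DSL, a sketch of the same kind of topic-to-topic transformation written against the lower-level Processor API (this uses the pre-2.7 Processor interface and assumes String serdes are configured as the defaults; newer Kafka releases offer a typed replacement, and topic names are placeholders):

```java
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.AbstractProcessor;

public class ProcessorApiExample {

    // With the Processor API you implement per-record logic yourself and wire
    // sources, processors, and sinks by hand instead of chaining DSL operators.
    static class UppercaseProcessor extends AbstractProcessor<String, String> {
        @Override
        public void process(String key, String value) {
            context().forward(key, value.toUpperCase());
        }
    }

    public static Topology buildTopology() {
        Topology topology = new Topology();
        topology.addSource("Source", "input-topic");                           // placeholder topic
        topology.addProcessor("Uppercase", UppercaseProcessor::new, "Source");
        topology.addSink("Sink", "output-topic", "Uppercase");                 // placeholder topic
        return topology;
    }
}
```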
Transforming the Input Data
Inspecting the Output Data
Stopping the Kafka Cluster
Options for Deploying the Application
- Classic ops tools (Puppet, Chef and Salt)
- Docker
- WAR file
Troubleshooting
Summary and Conclusion
Duration
7 hours (usually 1 day including breaks)
Requirements
- An understanding of Big Data concepts
- An understanding of Hadoop concepts
- Java programming experience
- Comfortable using a Linux command line
Overview
In this instructor-led, live training, participants will learn the core concepts behind the MapR Streams architecture as they develop a real-time streaming application.
By the end of this training, participants will be able to build producer and consumer applications for real-time stream data processing.
Audience
- Developers
- Administrators
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Note
- To request a customized training for this course, please contact us to arrange.
Course Outline
Introduction
Overview of MapR Streams Architecture
MapR Streams Core Components
Understanding How Messages Are Managed in MapR Streams
Understanding Producers and Consumers
Developing a MapR Streams Application
- Streams, Producer, Consumer
- Using the Kafka Java API
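As a rough sketch, a producer and a consumer written against the standard Kafka Java API but pointed at a MapR stream, where the topic name carries the stream path (the stream path, topic, group id, and record values are placeholders, and the stream is assumed to exist already, e.g. created with maprcli stream create):

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MapRStreamsExample {
    // In MapR Streams the topic name includes the stream path, e.g. "/sample-stream:sensor-readings".
    // No broker list is needed: the MapR client library resolves the cluster from the stream path.
    private static final String TOPIC = "/sample-stream:sensor-readings";      // placeholder

    public static void produce() {
        Properties props = new Properties();
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>(TOPIC, "sensor-42", "23.7"));   // placeholder record
        }
    }

    public static void consume() {
        Properties props = new Properties();
        props.put("group.id", "sensor-readers");                               // placeholder group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList(TOPIC));
            ConsumerRecords<String, String> records = consumer.poll(1000);     // older poll(long) form
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("%s = %s%n", record.key(), record.value());
            }
        }
    }
}
```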
Working with Properties and Options
Summary and Conclusion