Apache Druid for Real-Time Data Analysis Training Course

Duration

21 hours (usually 3 days including breaks)

Requirements

  • A basic understanding of data infrastructure.
  • A general knowledge of distributed systems.
  • Basic Linux command line familiarity.

Audience

  • Application developers
  • Software engineers
  • Technical consultants
  • DevOps professionals
  • Architecture engineers

Overview

Apache Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data. Druid is commonly used in business intelligence applications to analyze high volumes of real-time and historical data. It is also well suited for powering fast, interactive, analytic dashboards for end-users. Druid is used by companies such as Alibaba, Airbnb, Cisco, eBay, Netflix, PayPal, and Yahoo.
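
To give a taste of the low-latency OLAP queries covered in the course: a Druid SQL query is simply a JSON document POSTed to the Broker's /druid/v2/sql/ endpoint. Below is a minimal Python sketch; the datasource name "web_events" and the Broker address are placeholders for illustration, and actually running the query requires a live Druid cluster.

```python
import json
import urllib.request

# Druid accepts SQL as a JSON payload POSTed to the Broker's SQL endpoint.
# The datasource name "web_events" is a placeholder for illustration.
def build_sql_query(datasource: str, hours: int) -> dict:
    return {
        "query": (
            f'SELECT channel, COUNT(*) AS events '
            f'FROM "{datasource}" '
            f"WHERE __time > CURRENT_TIMESTAMP - INTERVAL '{hours}' HOUR "
            f"GROUP BY channel ORDER BY events DESC LIMIT 10"
        ),
        "resultFormat": "object",
    }

def post_query(broker_url: str, payload: dict) -> bytes:
    # Requires a running Druid cluster; shown for completeness only.
    req = urllib.request.Request(
        f"{broker_url}/druid/v2/sql/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

if __name__ == "__main__":
    payload = build_sql_query("web_events", 1)
    print(payload["query"])
```

In the lab environment, the same payload can be issued against the cluster's Broker to power the interactive dashboards discussed above.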

In this instructor-led, live course, we explore some of the limitations of data warehouse solutions and discuss how Druid can complement those technologies to form a flexible and scalable streaming analytics stack. We walk through many examples, offering participants the chance to implement and test Druid-based solutions in a lab environment.

Format of the Course

  • Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding

Course Outline

Introduction

Installing and Starting Apache Druid

Druid Architecture and Design

Real-Time Ingestion of Event Data

Sharding and Indexing

Loading Data

Querying Data

Visualizing Data

Running a Distributed Cluster

Druid + Apache Hive

Druid + Apache Kafka

Druid + Others

Troubleshooting

Administrative Tasks

Summary and Conclusion

Real-Time Stream Processing with MapR Training Course

Duration

7 hours (usually 1 day including breaks)

Requirements

  • An understanding of Big Data concepts
  • An understanding of Hadoop concepts
  • Java programming experience
  • Comfortable using a Linux command line

Overview

In this instructor-led, live training, participants will learn the core concepts behind MapR Stream Architecture as they develop a real-time streaming application.

By the end of this training, participants will be able to build producer and consumer applications for real-time stream data processing.

Audience

  • Developers
  • Administrators

Format of the Course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Note

  • To request a customized training for this course, please contact us to arrange it.

Course Outline

Introduction

Overview of MapR Streams Architecture

MapR Stream Core Components

Understanding How Messages Are Managed in MapR Streams

Understanding Producers and Consumers

Developing a MapR Streams Application

  • Streams, Producer, Consumer
  • Using the Kafka Java API
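
The producer/consumer pattern above can be sketched before the lab. MapR Streams is consumed through the Kafka API, with topics addressed by full path as /<stream>:<topic>. The course itself uses the Kafka Java API; the stdlib-only Python sketch below illustrates the same topic-naming and configuration ideas, with the stream path /apps/clickstream as an assumed placeholder (the actual produce/consume calls need a MapR client and cluster).

```python
# MapR Streams topics are addressed by full path: "/<stream>:<topic>".
# The stream path "/apps/clickstream" is a placeholder for illustration.
def mapr_topic(stream_path: str, topic: str) -> str:
    if not stream_path.startswith("/"):
        raise ValueError("MapR stream paths are absolute, e.g. /apps/clickstream")
    return f"{stream_path}:{topic}"

def producer_config() -> dict:
    # With a Kafka-compatible client (such as the Kafka Java API used in
    # the course), properties like these are passed to the Producer.
    return {
        "key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
        "value.serializer": "org.apache.kafka.common.serialization.StringSerializer",
    }

if __name__ == "__main__":
    print(mapr_topic("/apps/clickstream", "page_views"))
```

In the Java exercises, the equivalent step is building a Properties object with these serializer settings and passing it to a KafkaProducer along with the full stream:topic name.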

Working with Properties and Options

Summary and Conclusion

Vespa: Serving Large-Scale Data in Real-Time Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • An understanding of big data concepts
  • An understanding of big data systems such as Hadoop and Storm
  • Experience working with the command line

Overview

Vespa is an open-source big data processing and serving engine created by Yahoo. It is used to respond to user queries, make recommendations, and provide personalized content and advertisements in real time.

This instructor-led, live training introduces the challenges of serving large-scale data and walks participants through the creation of an application that can compute responses to user requests, over large datasets in real-time.

By the end of this training, participants will be able to:

  • Use Vespa to quickly compute over data (store, search, rank, organize) at serving time, while a user waits
  • Incorporate Vespa into existing applications involving feature search, recommendations, and personalization
  • Integrate and deploy Vespa with existing big data systems such as Hadoop and Storm
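
To make serving-time computation concrete: Vespa answers queries over HTTP through its /search/ endpoint, with the query expressed in YQL. The minimal Python sketch below builds such a request; the document source "product" and the endpoint address are assumptions for illustration, and executing the request requires a running Vespa container.

```python
from urllib.parse import urlencode

# Vespa queries are expressed in YQL and sent to the container's /search/
# endpoint. The source name "product" is a placeholder for illustration.
def build_search_url(endpoint: str, terms: str, hits: int = 10) -> str:
    yql = f'select * from sources product where default contains "{terms}"'
    params = urlencode({"yql": yql, "hits": hits})
    return f"{endpoint}/search/?{params}"

if __name__ == "__main__":
    print(build_search_url("http://localhost:8080", "phone"))
```

Ranking, grouping, and personalization profiles are layered onto the same request as additional query parameters, which is how the "compute while a user waits" capability above is exercised.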

Audience

  • Developers

Format of the Course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

To request a customized course outline for this training, please contact us.

Apache Kylin: From Classic OLAP to Real-Time Data Warehouse Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • An understanding of Hadoop

Overview

Apache Kylin is an open-source, distributed analytics engine that provides a SQL interface and multi-dimensional analysis (OLAP) on extremely large datasets.

In this instructor-led live training, participants will learn how to use Apache Kylin to set up a real-time data warehouse.

By the end of this training, participants will be able to:

  • Consume real-time streaming data using Kylin
  • Utilize Apache Kylin’s powerful features, including its rich SQL interface, Spark cubing, and subsecond query latency
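
Kylin's rich SQL interface is exposed over REST: a query is a JSON body POSTed to /kylin/api/query with HTTP Basic authentication. The Python sketch below shows the shape of that payload; the project name "sales" and the table kylin_sales are placeholders for illustration, and running the query requires a live Kylin instance.

```python
import base64

# A Kylin SQL query is POSTed as JSON to /kylin/api/query.
# The project and table names below are placeholders for illustration.
def build_query_payload(project: str, sql: str, limit: int = 100) -> dict:
    return {"sql": sql, "project": project, "limit": limit, "offset": 0}

def basic_auth_header(user: str, password: str) -> dict:
    # Kylin's REST API uses HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}",
            "Content-Type": "application/json"}

if __name__ == "__main__":
    payload = build_query_payload(
        "sales",
        "SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt")
    print(payload["project"])
```

Because the cube is precomputed, queries of this shape return with the subsecond latency mentioned above rather than scanning raw data.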

Note

  • We use the latest version of Kylin (as of this writing, Apache Kylin v2.0)

Audience

  • Big data engineers
  • Big Data analysts

Format of the Course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

To request a customized course outline for this training, please contact us.