Apache Druid for Real-Time Data Analysis Training Course

Duration

21 hours (usually 3 days including breaks)

Requirements

  • A basic understanding of data infrastructure.
  • A general knowledge of distributed systems.
  • Basic Linux command line familiarity.

Audience

  • Application developers
  • Software engineers
  • Technical consultants
  • DevOps professionals
  • Architecture engineers

Overview

Apache Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data. Druid is commonly used in business intelligence applications to analyze high volumes of real-time and historical data. It is also well suited for powering fast, interactive, analytic dashboards for end-users. Druid is used by companies such as Alibaba, Airbnb, Cisco, eBay, Netflix, PayPal, and Yahoo.
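
To give a taste of the low-latency OLAP queries covered in the course: a Druid SQL query is simply a JSON document POSTed to the Broker's /druid/v2/sql/ endpoint. Below is a minimal Python sketch; the datasource name "web_events" and the Broker address are placeholders for illustration, and actually running the query requires a live Druid cluster.

```python
import json
import urllib.request

# Druid accepts SQL as a JSON payload POSTed to the Broker's SQL endpoint.
# The datasource name "web_events" is a placeholder for illustration.
def build_sql_query(datasource: str, hours: int) -> dict:
    return {
        "query": (
            f'SELECT channel, COUNT(*) AS events '
            f'FROM "{datasource}" '
            f"WHERE __time > CURRENT_TIMESTAMP - INTERVAL '{hours}' HOUR "
            f"GROUP BY channel ORDER BY events DESC LIMIT 10"
        ),
        "resultFormat": "object",
    }

def post_query(broker_url: str, payload: dict) -> bytes:
    # Requires a running Druid cluster; shown for completeness only.
    req = urllib.request.Request(
        f"{broker_url}/druid/v2/sql/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

if __name__ == "__main__":
    payload = build_sql_query("web_events", 1)
    print(payload["query"])
```

In the lab environment, the same payload can be issued against the cluster's Broker to power the interactive dashboards discussed above.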

In this instructor-led, live course, we explore some of the limitations of data warehouse solutions and discuss how Druid can complement those technologies to form a flexible and scalable streaming analytics stack. We walk through many examples, offering participants the chance to implement and test Druid-based solutions in a lab environment.

Format of the Course

  • Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding

Course Outline

Introduction

Installing and Starting Apache Druid

Druid Architecture and Design

Real-Time Ingestion of Event Data

Sharding and Indexing

Loading Data

Querying Data

Visualizing Data

Running a Distributed Cluster

Druid + Apache Hive

Druid + Apache Kafka

Druid + Others

Troubleshooting

Administrative Tasks

Summary and Conclusion

Real-Time Stream Processing with MapR Training Course

Duration

7 hours (usually 1 day including breaks)

Requirements

  • An understanding of Big Data concepts
  • An understanding of Hadoop concepts
  • Java programming experience
  • Comfortable using a Linux command line

Overview

In this instructor-led, live training, participants will learn the core concepts behind MapR Stream Architecture as they develop a real-time streaming application.

By the end of this training, participants will be able to build producer and consumer applications for real-time stream data processing.

Audience

  • Developers
  • Administrators

Format of the Course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Note

  • To request a customized training for this course, please contact us to arrange it.

Course Outline

Introduction

Overview of MapR Streams Architecture

MapR Stream Core Components

Understanding How Messages Are Managed in MapR Streams

Understanding Producers and Consumers

Developing a MapR Streams Application

  • Streams, Producer, Consumer
  • Using the Kafka Java API
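
The producer/consumer pattern above can be sketched before the lab. MapR Streams is consumed through the Kafka API, with topics addressed by full path as /<stream>:<topic>. The course itself uses the Kafka Java API; the stdlib-only Python sketch below illustrates the same topic-naming and configuration ideas, with the stream path /apps/clickstream as an assumed placeholder (the actual produce/consume calls need a MapR client and cluster).

```python
# MapR Streams topics are addressed by full path: "/<stream>:<topic>".
# The stream path "/apps/clickstream" is a placeholder for illustration.
def mapr_topic(stream_path: str, topic: str) -> str:
    if not stream_path.startswith("/"):
        raise ValueError("MapR stream paths are absolute, e.g. /apps/clickstream")
    return f"{stream_path}:{topic}"

def producer_config() -> dict:
    # With a Kafka-compatible client (such as the Kafka Java API used in
    # the course), properties like these are passed to the Producer.
    return {
        "key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
        "value.serializer": "org.apache.kafka.common.serialization.StringSerializer",
    }

if __name__ == "__main__":
    print(mapr_topic("/apps/clickstream", "page_views"))
```

In the Java exercises, the equivalent step is building a Properties object with these serializer settings and passing it to a KafkaProducer along with the full stream:topic name.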

Working with Properties and Options

Summary and Conclusion

Vespa: Serving Large-Scale Data in Real-Time Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • An understanding of big data concepts
  • An understanding of big data systems such as Hadoop and Storm
  • Experience working with the command line

Overview

Vespa is an open-source big data processing and serving engine created by Yahoo. It is used to respond to user queries, make recommendations, and provide personalized content and advertisements in real time.

This instructor-led, live training introduces the challenges of serving large-scale data and walks participants through the creation of an application that can compute responses to user requests, over large datasets in real-time.

By the end of this training, participants will be able to:

  • Use Vespa to quickly compute over data (store, search, rank, organize) at serving time, while a user waits
  • Incorporate Vespa into existing applications involving feature search, recommendations, and personalization
  • Integrate and deploy Vespa with existing big data systems such as Hadoop and Storm
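
To make serving-time computation concrete: Vespa answers queries over HTTP through its /search/ endpoint, with the query expressed in YQL. The minimal Python sketch below builds such a request; the document source "product" and the endpoint address are assumptions for illustration, and executing the request requires a running Vespa container.

```python
from urllib.parse import urlencode

# Vespa queries are expressed in YQL and sent to the container's /search/
# endpoint. The source name "product" is a placeholder for illustration.
def build_search_url(endpoint: str, terms: str, hits: int = 10) -> str:
    yql = f'select * from sources product where default contains "{terms}"'
    params = urlencode({"yql": yql, "hits": hits})
    return f"{endpoint}/search/?{params}"

if __name__ == "__main__":
    print(build_search_url("http://localhost:8080", "phone"))
```

Ranking, grouping, and personalization profiles are layered onto the same request as additional query parameters, which is how the "compute while a user waits" capability above is exercised.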

Audience

  • Developers

Format of the Course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

To request a customized course outline for this training, please contact us.

Apache Kylin: From Classic OLAP to Real-Time Data Warehouse Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • An understanding of Hadoop

Overview

Apache Kylin is an open-source, distributed analytics engine that provides a SQL interface and multi-dimensional analysis (OLAP) on extremely large datasets.

In this instructor-led live training, participants will learn how to use Apache Kylin to set up a real-time data warehouse.

By the end of this training, participants will be able to:

  • Consume real-time streaming data using Kylin
  • Utilize Apache Kylin’s powerful features, including its rich SQL interface, Spark cubing, and subsecond query latency
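
Kylin's rich SQL interface is exposed over REST: a query is a JSON body POSTed to /kylin/api/query with HTTP Basic authentication. The Python sketch below shows the shape of that payload; the project name "sales" and the table kylin_sales are placeholders for illustration, and running the query requires a live Kylin instance.

```python
import base64

# A Kylin SQL query is POSTed as JSON to /kylin/api/query.
# The project and table names below are placeholders for illustration.
def build_query_payload(project: str, sql: str, limit: int = 100) -> dict:
    return {"sql": sql, "project": project, "limit": limit, "offset": 0}

def basic_auth_header(user: str, password: str) -> dict:
    # Kylin's REST API uses HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}",
            "Content-Type": "application/json"}

if __name__ == "__main__":
    payload = build_query_payload(
        "sales",
        "SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt")
    print(payload["project"])
```

Because the cube is precomputed, queries of this shape return with the subsecond latency mentioned above rather than scanning raw data.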

Note

  • We use the latest version of Kylin (as of this writing, Apache Kylin v2.0)

Audience

  • Big data engineers
  • Big Data analysts

Format of the Course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

To request a customized course outline for this training, please contact us.