Duration
21 hours (usually 3 days including breaks)
Requirements
- A basic understanding of data infrastructure.
- A general knowledge of distributed systems.
- Basic Linux command line familiarity.
Audience
- Application developers
- Software engineers
- Technical consultants
- DevOps professionals
- Architecture engineers
Overview
Apache Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data. Druid is commonly used in business intelligence applications to analyze high volumes of real-time and historical data. It is also well suited for powering fast, interactive analytic dashboards for end users. Druid is used by companies such as Alibaba, Airbnb, Cisco, eBay, Netflix, PayPal, and Yahoo.
In this instructor-led, live course, we explore some of the limitations of data warehouse solutions and discuss how Druid can complement those technologies to form a flexible and scalable streaming analytics stack. We walk through many examples, offering participants the chance to implement and test Druid-based solutions in a lab environment.
Format of the Course
- Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding
Course Outline
Introduction
Installing and Starting Apache Druid
Druid Architecture and Design
Real-Time Ingestion of Event Data
Sharding and Indexing
Loading Data
Querying Data
Visualizing Data
Running a Distributed Cluster
Druid + Apache Hive
Druid + Apache Kafka
Druid + Others
Troubleshooting
Administrative Tasks
Summary and Conclusion
Duration
7 hours (usually 1 day including breaks)
Requirements
- An understanding of Big Data concepts
- An understanding of Hadoop concepts
- Java programming experience
- Comfortable using a Linux command line
Overview
In this instructor-led, live training, participants will learn the core concepts behind the MapR Streams architecture as they develop a real-time streaming application.
By the end of this training, participants will be able to build producer and consumer applications for real-time stream data processing.
Audience
- Developers
- Administrators
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Note
- To request a customized training for this course, please contact us to arrange.
Course Outline
Introduction
Overview of MapR Streams Architecture
MapR Streams Core Components
Understanding How Messages Are Managed in MapR Streams
Understanding Producers and Consumers
Developing a MapR Streams Application
- Streams, Producer, Consumer
- Using the Kafka Java API
Working with Properties and Options
Summary and Conclusion
Duration
14 hours (usually 2 days including breaks)
Requirements
- An understanding of big data concepts
- An understanding of big data systems such as Hadoop and Storm
- Experience working with the command line
Overview
Vespa is an open-source big data processing and serving engine created by Yahoo. It is used to respond to user queries, make recommendations, and provide personalized content and advertisements in real time.
This instructor-led, live training introduces the challenges of serving large-scale data and walks participants through the creation of an application that can compute responses to user requests over large datasets in real time.
By the end of this training, participants will be able to:
- Use Vespa to quickly compute data (store, search, rank, organize) at serving time while a user waits
- Integrate Vespa into existing applications involving feature search, recommendations, and personalization
- Integrate and deploy Vespa with existing big data systems such as Hadoop and Storm
Audience
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Course Outline
To request a customized course outline for this training, please contact us.
Duration
14 hours (usually 2 days including breaks)
Requirements
- An understanding of Hadoop
Overview
Apache Kylin is an open-source, distributed analytics engine that provides a SQL interface and OLAP queries over big data.
In this instructor-led, live training, participants will learn how to use Apache Kylin to set up a real-time data warehouse.
By the end of this training, participants will be able to:
- Consume real-time streaming data using Kylin
- Use Apache Kylin's features, including its rich SQL interface, Spark cubing, and sub-second query latency
Note
- We use the latest version of Kylin (as of this writing, Apache Kylin v2.0)
Audience
- Big data engineers
- Big data analysts
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Course Outline
To request a customized course outline for this training, please contact us.