Duration
21 hours (usually 3 days including breaks)
Requirements
- A basic understanding of data infrastructure.
- A general knowledge of distributed systems.
- Basic Linux command line familiarity.
Audience
- Application developers
- Software engineers
- Technical consultants
- DevOps professionals
- Architecture engineers
Overview
Apache Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data. Druid is commonly used in business intelligence applications to analyze high volumes of real-time and historical data. It is also well suited for powering fast, interactive analytic dashboards for end users. Druid is used by companies such as Alibaba, Airbnb, Cisco, eBay, Netflix, PayPal, and Yahoo.
In this instructor-led, live course, we explore some of the limitations of data warehouse solutions and discuss how Druid can complement those technologies to form a flexible and scalable streaming analytics stack. We walk through many examples, giving participants the chance to implement and test Druid-based solutions in a lab environment.
Format of the Course
- Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding
Course Outline
Introduction
Installing and Starting Apache Druid
Druid Architecture and Design
Real-Time Ingestion of Event Data
Sharding and Indexing
Loading Data
Querying Data
Visualizing Data
Running a Distributed Cluster
Druid + Apache Hive
Druid + Apache Kafka
Druid + Others
Troubleshooting
Administrative Tasks
Summary and Conclusion