Duration
21 hours (usually 3 days including breaks)
Requirements
- An understanding of Hadoop, NoSQL, and other data storage concepts
- Experience writing SQL queries
- Experience with the Linux command line
Overview
Apache Drill is a schema-free, distributed, in-memory columnar SQL query engine for Hadoop, NoSQL and other cloud and file storage systems. The power of Apache Drill lies in its ability to join data from multiple data stores using a single query. Apache Drill supports numerous NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. Apache Drill is an open source engine modeled on Google’s Dremel system, which Google offers as an infrastructure service called Google BigQuery.
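To make the single-query join concrete, here is a minimal Python sketch that builds the JSON body for a query submitted through Drill's REST interface. It assumes a Drillbit listening on the default web port 8047 and storage plugins registered under the names `mongo` and `dfs`. The database, collection, file path and column names are hypothetical and chosen only for illustration.

```python
import json

# Assumed endpoint: a local Drillbit on Drill's default web/REST port.
DRILL_URL = "http://localhost:8047/query.json"


def build_join_payload(orders_table: str, customers_file: str) -> dict:
    """Build the JSON body for a Drill REST query that joins two data stores.

    orders_table   -- a table exposed by a storage plugin (hypothetical name)
    customers_file -- a file path under the dfs plugin (hypothetical path)
    """
    query = (
        "SELECT c.name, SUM(o.amount) AS total "
        f"FROM {orders_table} AS o "
        f"JOIN dfs.`{customers_file}` AS c ON o.customer_id = c.id "
        "GROUP BY c.name"
    )
    return {"queryType": "SQL", "query": query}


# One query spanning a MongoDB collection and a JSON file on the file system.
payload = build_join_payload("mongo.shop.orders", "/data/customers.json")
body = json.dumps(payload)
print(body)
```

Sending `body` as an HTTP POST to `DRILL_URL` with the `Content-Type: application/json` header would run the query. The sketch stops before the network call so that it runs without a cluster.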
In this instructor-led, live training, participants will learn the fundamentals of Apache Drill, then leverage the power and convenience of SQL to interactively query big data across multiple data sources, without writing code. Participants will also learn how to optimize their Drill queries for distributed SQL execution.
By the end of this training, participants will be able to:
- Perform “self-service” exploration of structured and semi-structured data on Hadoop
- Query data with known or unknown schemas using SQL
- Understand how Apache Drill receives and executes queries
- Write SQL queries to analyze different types of data, including structured data in Hive, semi-structured data in HBase or MapR-DB tables, and data saved in files such as Parquet and JSON
- Use Apache Drill to perform on-the-fly schema discovery, bypassing the need for complex ETL and schema operations
- Integrate Apache Drill with BI (Business Intelligence) tools such as Tableau, QlikView, MicroStrategy and Excel
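As an illustration of querying a raw file without defining a schema first, the following Python sketch assembles a Drill SQL statement that reads a JSON file directly and unnests one array field with Drill's `FLATTEN` function. The file path and the field names (`id`, `clicks`) are hypothetical, and the `dfs` storage plugin is assumed to be enabled.

```python
def explore_json_query(path: str, id_field: str, array_field: str, limit: int = 10) -> str:
    """Return a Drill SQL query that reads a JSON file in place.

    No table definition or ETL step is involved: the file path is used
    directly in the FROM clause, and FLATTEN emits one row per array element.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        f"SELECT t.`{id_field}`, FLATTEN(t.`{array_field}`) AS item "
        f"FROM dfs.`{path}` AS t "
        f"LIMIT {limit}"
    )


# Hypothetical campaign event file with a nested "clicks" array per record.
query = explore_json_query("/data/campaign/events.json", "id", "clicks", limit=5)
print(query)
```

The resulting statement can be pasted into the Drill shell or web console, or issued from a BI tool over ODBC/JDBC.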
Audience
- Data analysts
- Data scientists
- SQL programmers
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Course Outline
Introduction to Apache Drill
How does Apache Drill compare to Spark SQL, Hive and Impala?
Overview of Apache Drill Features and Architecture
- Apache Drill Components
Performing SQL Queries in Apache Drill
Understanding Data Types and Formats
Working with Schemas
Case Study and Exercise: Querying Sales Data for the Year
Performing Queries on JSON Data
Combining Data Types in SQL Queries
Creating and Dropping Tables and Views
Using Nested Data and Window Functions
Performing Data Analysis with Apache Drill
Case Study and Exercise: Analyzing the Results of a Marketing Campaign
Designing a Query Plan in Apache Drill
Optimizing Queries in Apache Drill
Integrating Apache Drill with MS Excel
Using Apache Drill ODBC/JDBC Drivers to Connect to Tableau, MicroStrategy, QlikView and Other BI Tools
Case Study and Exercise: Visualizing the Data and the Power of a Good Story
Understanding Apache Drill’s Decentralized Security Model
Apache Drill Performance and Debugging
Summary and Conclusion