Impala for Business Intelligence Training Course

Duration

21 hours (usually 3 days including breaks)

Requirements

  • Knowledge of SQL

Overview

Cloudera Impala is an open source massively parallel processing (MPP) SQL query engine for Apache Hadoop clusters.

Impala enables users to issue low-latency SQL queries against data stored in the Hadoop Distributed File System (HDFS) and Apache HBase without requiring data movement or transformation.

Audience

This course is aimed at analysts and data scientists performing analysis on data stored in Hadoop via Business Intelligence or SQL tools.

After this course, delegates will be able to:

  • Extract meaningful information from Hadoop clusters with Impala.
  • Write programs in the Impala SQL dialect to support Business Intelligence tasks.
  • Troubleshoot Impala.

Course Outline

Introduction to Impala

  • What is Impala?
  • How Impala Differs from Relational Databases
  • Limitations and Future Directions
  • Using the Impala Shell
  • The Impala Daemon, Statestore, and Catalog Service

Loading Data into Impala

  • Explore a New Impala Instance
  • Load CSV Data from Local Files
  • Point an Impala Table at Existing Data Files

Analyzing Data with Impala

  • Describe the Impala Table
  • Basic Syntax and Querying
  • Data Types
  • Filtering, Sorting, and Limiting Results
  • Joining and Grouping Data
  • Data Loading and Querying Examples
  • Improving Impala Performance
  • How Impala Works with Hadoop File Formats
  • Hands-On Exercise: Interactive Analysis with Impala

Programming Impala Applications

  • Overview of the Impala SQL Dialect
  • Overview of Impala Programming Interfaces

Troubleshooting Impala

  • Troubleshooting Impala SQL Syntax Issues
  • Troubleshooting I/O Capacity Problems
  • Impala Web User Interface for Debugging
