
Duration
14 hours (usually 2 days including breaks)
Requirements
- An understanding of databases.
- Experience with SQL is an asset.
Audience
- Business analysts
- Software developers
- Database developers
Overview
This instructor-led, live training (online or onsite) is aimed at software developers, managers, and business analysts who wish to use big data systems to store and retrieve large amounts of data.
By the end of this training, participants will be able to:
- Query large amounts of data efficiently.
- Understand how big data systems store and retrieve data.
- Use the latest big data systems available.
- Wrangle data from data systems into reporting systems.
- Write SQL queries in:
  - MySQL
  - Postgres
  - Hive Query Language (HiveQL/HQL)
  - Redshift
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Course Outline
Lesson 1 – SQL basics:
- Select statements
- Join types
- Indexes
- Views
- Subqueries
- Union
- Creating tables
- Loading data
- Dumping data
- NoSQL
Lesson 2 – Data Modeling:
- Transaction-based ER systems
- Data warehousing
- Data warehouse models
- Star schema
- Snowflake schema
- Slowly changing dimensions (SCD)
- Structured and non-structured data
- Different table types and storage engines:
  - Column-based
  - Document-based
  - In-memory
Lesson 3 – Indexes in the NoSQL/data science world
- Constraints (primary keys)
- Index-based scanning
- Performance tuning
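To give a feel for index-based scanning, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the index name are hypothetical, and SQLite stands in for whichever engine the course uses; the exact wording of the query plan differs between engines and versions.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"customer_{i % 100}", float(i)) for i in range(1000)],
)


def plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM orders WHERE customer = 'customer_7'"

# Without an index on customer, the engine has to scan the whole table.
plan_before = plan(query)

# After the index is created, the same query can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
plan_after = plan(query)

print(plan_before)
print(plan_after)
```

Comparing the two printed plans shows the switch from a full table scan to an index search, which is usually the first thing to check when tuning a slow query.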
Lesson 4 – NoSQL and non-structured data
- When to use NoSQL
- Eventually consistent data
- Schema on read vs. Schema on write
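The difference between schema on write and schema on read can be sketched in a few lines of Python. This is an illustration under assumptions, not course material: the events_typed table, the raw JSON records, and the read_event helper are all hypothetical, and SQLite plus a list of JSON strings stand in for a relational store and a NoSQL or data-lake store.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema on write: the structure is enforced when a row is inserted.
conn.execute(
    "CREATE TABLE events_typed (user_id INTEGER NOT NULL, action TEXT NOT NULL)"
)
conn.execute("INSERT INTO events_typed VALUES (1, 'login')")
try:
    # A row with a missing action violates the NOT NULL constraint.
    conn.execute("INSERT INTO events_typed VALUES (2, NULL)")
    rejected_on_write = False
except sqlite3.IntegrityError:
    rejected_on_write = True

# Schema on read: raw records are stored as-is, whatever their shape.
raw_events = [
    '{"user_id": 1, "action": "login"}',
    '{"user_id": 2}',
    '{"user_id": 3, "action": "logout", "device": "mobile"}',
]


def read_event(line):
    """Apply a schema at read time: pick the needed fields, default the rest."""
    record = json.loads(line)
    return {
        "user_id": int(record["user_id"]),
        "action": record.get("action", "unknown"),
    }


parsed = [read_event(line) for line in raw_events]
print(rejected_on_write)
print(parsed)
```

In the first half the incomplete record is rejected at insert time; in the second half every record is accepted and the reader decides how to interpret missing or extra fields.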
Lesson 5 – SQL for data analytics
- Window functions
- Lateral joins
- LEAD and LAG
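As a small taste of the LEAD and LAG window functions, the sketch below runs them through Python's built-in sqlite3 module. The daily_sales table and its values are hypothetical, and the example assumes the bundled SQLite is version 3.25 or later, which is when window functions were added. Lateral joins are not shown because SQLite does not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical daily sales figures (illustration only).
conn.execute("CREATE TABLE daily_sales (day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [("2024-01-01", 100), ("2024-01-02", 130), ("2024-01-03", 90)],
)

# LAG looks at the previous row in the window, LEAD at the next one.
rows = conn.execute(
    """
    SELECT day,
           amount,
           LAG(amount)  OVER (ORDER BY day) AS previous_amount,
           LEAD(amount) OVER (ORDER BY day) AS next_amount,
           amount - LAG(amount) OVER (ORDER BY day) AS day_over_day
    FROM daily_sales
    ORDER BY day
    """
).fetchall()

for row in rows:
    print(row)
```

The first row has no previous day and the last row has no next day, so those columns come back as NULL (None in Python).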
Lesson 6 – HiveQL
- SQL support
- External and internal tables
- Joins
- Partitions
- Correlated subqueries
- Nested queries
- When to use Hive
Lesson 7 – Redshift
- Design and structure
- Locks and shared resources
- Postgres differences
- When to use Redshift