SQL for Data Science and Data Analysis Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • An understanding of databases
  • Experience with SQL is an asset

Audience

  • Business analysts
  • Software developers
  • Database developers

Overview

This instructor-led, live training (online or onsite) is aimed at software developers, managers, and business analysts who wish to use big data systems to store and retrieve large amounts of data.

By the end of this training, participants will be able to:

  • Query large amounts of data efficiently.
  • Understand how big data systems store and retrieve data.
  • Use the latest big data systems available.
  • Wrangle data from data systems into reporting systems.
  • Write SQL queries in:
    • MySQL
    • Postgres
    • Hive Query Language (HiveQL/HQL)
    • Redshift

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request customized training for this course, please contact us.

Course Outline

Lesson 1 – SQL basics

  • Select statements
  • Join types
  • Indexes
  • Views
  • Subqueries
  • Union
  • Creating tables
  • Loading data
  • Dumping data
  • NoSQL
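
As a rough illustration of several Lesson 1 topics (creating tables, loading data, indexes, joins and select statements), here is a minimal sketch. It uses Python's built-in `sqlite3` module with an in-memory database purely for convenience; the course itself targets MySQL, Postgres, HiveQL and Redshift. The `customers` and `orders` tables, their columns and the sample rows are hypothetical.

```python
import sqlite3


def run_basics_demo():
    """Create two tables, load sample rows, and run a LEFT JOIN with aggregation."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()

    # Creating tables (hypothetical schema for illustration only)
    cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    cur.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER REFERENCES customers(id),"
        " amount REAL)"
    )

    # Loading data
    cur.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace"), (3, "Linus")],
    )
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5)],
    )

    # An index on the join column
    cur.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

    # LEFT JOIN keeps customers that have no orders
    rows = cur.execute(
        """
        SELECT c.name,
               COUNT(o.id)                AS order_count,
               COALESCE(SUM(o.amount), 0) AS total
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_basics_demo():
        print(row)
```

The `LEFT JOIN` is chosen here so that a customer without orders still appears in the result with a count of zero; an `INNER JOIN` would drop that row.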

Lesson 2 – Data modeling

  • Transaction-based ER systems
  • Data warehousing
  • Data warehouse models
    • Star schema
    • Snowflake schema
  • Slowly changing dimensions (SCD)
  • Structured and unstructured data
  • Table storage engine types:
    • Column-based
    • Document-based
    • In-memory

Lesson 3 – Indexes in the NoSQL/data science world

  • Constraints (primary key)
  • Index-based scanning
  • Performance tuning

Lesson 4 – NoSQL and unstructured data

  • When to use NoSQL
  • Eventually consistent data
  • Schema on read vs. schema on write

Lesson 5 – SQL for data analytics

  • Window functions
  • Lateral joins
  • LEAD and LAG
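
A short sketch of what window functions with `LAG` and `LEAD` look like in practice is given below. It assumes Python's `sqlite3` module backed by SQLite 3.25 or newer, the first release with window function support. The `daily_sales` table and its values are hypothetical. Lateral joins are not shown because SQLite does not support `LATERAL`.

```python
import sqlite3


def daily_sales_with_neighbors():
    """Return each day's amount with the previous and next day's amount and a running total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_sales (day TEXT PRIMARY KEY, amount INTEGER)")
    conn.executemany(
        "INSERT INTO daily_sales VALUES (?, ?)",
        [("2024-01-01", 100), ("2024-01-02", 150), ("2024-01-03", 120)],
    )
    rows = conn.execute(
        """
        SELECT day,
               amount,
               LAG(amount)  OVER (ORDER BY day) AS prev_amount,   -- value from the previous row
               LEAD(amount) OVER (ORDER BY day) AS next_amount,   -- value from the following row
               SUM(amount)  OVER (ORDER BY day) AS running_total  -- cumulative sum up to this row
        FROM daily_sales
        ORDER BY day
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in daily_sales_with_neighbors():
        print(row)
```

Unlike `GROUP BY`, a window function keeps every input row; `LAG` and `LEAD` return `NULL` (Python `None`) where no previous or following row exists.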

Lesson 6 – HiveQL

  • SQL support
  • External and internal tables
  • Joins
  • Partitions
  • Correlated subqueries
  • Nested queries
  • When to use Hive

Lesson 7 – Redshift

  • Design and structure
  • Locks and shared resources
  • Postgres differences
  • When to use Redshift
