Data Vault: Building a Scalable Data Warehouse Training Course

Duration

28 hours (usually 4 days including breaks)

Requirements

  • An understanding of data warehousing concepts
  • An understanding of database and data modeling concepts

Audience

  • Data modelers
  • Data warehousing specialists
  • Business Intelligence specialists
  • Data engineers
  • Database administrators

Overview

Data Vault modeling is a database modeling technique that provides long-term historical storage of data originating from multiple source systems. A data vault stores a single version of the facts, or “all the data, all the time”. Its flexible, scalable, consistent, and adaptable design combines the best aspects of third normal form (3NF) and the star schema.

In this instructor-led, live training, participants will learn how to build a Data Vault.

By the end of this training, participants will be able to:

  • Understand the architecture and design concepts behind Data Vault 2.0 and how it interacts with Big Data, NoSQL, and AI.
  • Use data vaulting techniques to enable auditing, tracing, and inspection of historical data in a data warehouse.
  • Develop a consistent and repeatable ETL (Extract, Transform, Load) process.
  • Build and deploy highly scalable and repeatable warehouses.

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

Introduction

  • The shortcomings of existing data warehouse data modeling architectures
  • Benefits of Data Vault modeling

Overview of Data Vault architecture and design principles

  • SEI / CMM / Compliance

Data Vault applications

  • Dynamic Data Warehousing
  • Exploration Warehousing
  • In-Database Data Mining
  • Rapid Linking of External Information

Data Vault components

  • Hubs, Links, Satellites
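The following sketch shows how the three component types relate, modeled as Python dataclasses. A hub holds unique business keys. A link records relationships between hubs. A satellite holds descriptive attributes that change over time. The sketch assumes Data Vault 2.0-style hash keys, computed here as an MD5 digest of trimmed, upper-cased business keys joined with "|". The helper make_hash_key, the entity names, and the field layout are hypothetical choices for illustration, not the course's reference design.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime


def make_hash_key(*business_keys: str) -> str:
    """Derive a hash key from one or more business keys.

    Assumption for this sketch: each key is trimmed and upper-cased, the keys
    are joined with '|', and the result is MD5-hashed to 32 hex characters.
    """
    normalized = "|".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Hub:
    """One row per unique business key (e.g. a customer number)."""
    hash_key: str
    business_key: str
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class Link:
    """One row per unique relationship between hubs."""
    hash_key: str                   # hash of the combined business keys
    hub_hash_keys: tuple[str, ...]  # hash keys of the related hubs
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class Satellite:
    """Descriptive attributes of a parent hub or link, versioned by load date."""
    parent_hash_key: str
    load_date: datetime
    record_source: str
    attributes: dict[str, str] = field(hash=False)  # dicts are unhashable, so exclude from __hash__


# Usage: a customer hub, an order hub, the link between them, and one satellite row.
loaded = datetime(2024, 1, 1)
customer = Hub(make_hash_key("C-001"), "C-001", loaded, "crm")
order = Hub(make_hash_key("O-42"), "O-42", loaded, "erp")
placed = Link(make_hash_key("C-001", "O-42"), (customer.hash_key, order.hash_key), loaded, "erp")
profile = Satellite(customer.hash_key, loaded, "crm", {"name": "Ada", "city": "London"})
```

Because the hash key depends only on the normalized business key, every source system that carries "C-001" maps to the same hub row without a lookup against a sequence generator.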

Building a Data Vault

Modeling Hubs, Links and Satellites

Data Vault reference rules

How components interact with each other

Modeling and populating a Data Vault

Converting 3NF OLTP to a Data Vault Enterprise Data Warehouse (EDW)

Understanding load dates, end-dates, and join operations
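Load dates and end-dates let a satellite keep its full history. Each new version of a record is inserted with its load date. A version's end-date is the load date of the next version for the same parent key, and the current version stays open. The following is a minimal sketch of deriving end-dates this way. It assumes rows are (parent_key, load_date) pairs and uses datetime.max as a hypothetical sentinel for "still current". Some Data Vault 2.0 implementations derive the end-date at query time rather than storing it.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter

OPEN_END = datetime.max  # hypothetical sentinel marking the current (open) version


def add_end_dates(rows):
    """Return (parent_key, load_date, end_date) for each satellite row.

    A row's end-date is the load date of the next row for the same parent key;
    the most recent row for each key stays open (OPEN_END).
    """
    result = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # by parent key, then load date
    for key, group in groupby(ordered, key=itemgetter(0)):
        starts = [load_date for _, load_date in group]
        ends = starts[1:] + [OPEN_END]  # each version ends when the next one loads
        result.extend((key, start, end) for start, end in zip(starts, ends))
    return result


rows = [
    ("C1", datetime(2024, 3, 1)),
    ("C1", datetime(2024, 1, 1)),
    ("C2", datetime(2024, 2, 1)),
]
history = add_end_dates(rows)
```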

Business keys, relationships, link tables and join techniques

Query techniques
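A common query over satellite history is a point-in-time lookup. For a given key, it finds the attribute version that was current on a particular date, which is the latest version whose load date is on or before that date. The following is a small in-memory sketch using the standard bisect module. The history structure and the as_of function are hypothetical. In a warehouse, the same logic would typically be a SQL query against the satellite table.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical satellite history: for each hub key, versions sorted by load date.
history = {
    "C1": [
        (datetime(2024, 1, 1), {"city": "London"}),
        (datetime(2024, 3, 1), {"city": "Paris"}),
    ],
}


def as_of(key, when):
    """Return the attributes current for `key` at `when`, or None if none existed yet."""
    versions = history.get(key, [])
    load_dates = [load_date for load_date, _ in versions]
    idx = bisect_right(load_dates, when)  # number of versions loaded on or before `when`
    return versions[idx - 1][1] if idx else None


print(as_of("C1", datetime(2024, 2, 15)))  # version loaded on 2024-01-01
```

Using bisect_right means a version loaded exactly at the requested moment counts as current.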

Load processing and query processing

Overview of Matrix Methodology

Getting data into data entities

Loading Hub Entities

Loading Link Entities

Loading Satellites
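Data Vault loads are typically insert-only:

  • A hub load adds only business keys it has not seen before.
  • A link load adds only new combinations of hub keys.
  • A satellite load adds a row only when the descriptive attributes have changed.

The following is a minimal in-memory sketch of all three cases. It assumes MD5 hash keys and detects satellite changes by comparing an MD5 "hash diff" of the attributes with the latest stored version. It also assumes loads arrive in load-date order. The VaultStore class and its methods are hypothetical stand-ins for real hub, link, and satellite tables.

```python
import hashlib
import json
from datetime import datetime


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class VaultStore:
    """Hypothetical in-memory stand-in for hub, link, and satellite tables."""

    def __init__(self):
        self.hub = {}        # hub key -> (business key, load date, record source)
        self.link = {}       # link key -> (hub keys, load date, record source)
        self.satellite = {}  # hub key -> [(load date, hash diff, attributes), ...]

    def load_hub(self, business_key, load_date, source):
        """Insert the business key only if the hub does not already contain it."""
        hk = md5_hex(business_key.strip().upper())
        if hk not in self.hub:
            self.hub[hk] = (business_key, load_date, source)
        return hk

    def load_link(self, hub_keys, load_date, source):
        """Insert the relationship only if this combination of hub keys is new."""
        lk = md5_hex("|".join(hub_keys))
        if lk not in self.link:
            self.link[lk] = (tuple(hub_keys), load_date, source)
        return lk

    def load_satellite(self, hub_key, attributes, load_date):
        """Append a new version only if the attributes changed since the latest one.

        Assumes calls arrive in load-date order, so the last entry is the latest.
        """
        diff = md5_hex(json.dumps(attributes, sort_keys=True))
        versions = self.satellite.setdefault(hub_key, [])
        if versions and versions[-1][1] == diff:
            return False  # unchanged: nothing to insert
        versions.append((load_date, diff, attributes))
        return True


store = VaultStore()
c = store.load_hub("C-001", datetime(2024, 1, 1), "crm")
store.load_hub("c-001 ", datetime(2024, 1, 2), "crm")  # same key after normalizing: ignored
o = store.load_hub("O-42", datetime(2024, 1, 1), "erp")
store.load_link([c, o], datetime(2024, 1, 1), "erp")
store.load_satellite(c, {"city": "London"}, datetime(2024, 1, 1))
store.load_satellite(c, {"city": "London"}, datetime(2024, 1, 2))  # unchanged: skipped
store.load_satellite(c, {"city": "Paris"}, datetime(2024, 3, 1))   # changed: new version
```

Because every step only ever adds rows and re-running a load with the same input changes nothing, the process stays repeatable.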

Using SEI/CMM Level 5 templates to obtain repeatable, reliable, and quantifiable results

Developing a consistent and repeatable ETL (Extract, Transform, Load) process

Building and deploying highly scalable and repeatable warehouses

Closing remarks
