Duration
14 hours (usually 2 days including breaks)
Requirements
- A general understanding of programming
- An understanding of databases
- Basic knowledge of Linux
Overview
As more software and IT projects move from local processing and data management to distributed processing and big data storage, Project Managers need to upgrade their knowledge and skills to grasp the concepts and practices relevant to Big Data projects and opportunities.
This course introduces Project Managers to the most popular Big Data processing framework: Hadoop.
In this instructor-led training, participants will learn the core components of the Hadoop ecosystem and how these technologies can be used to solve large-scale problems. By learning these foundations, participants will improve their ability to communicate with the developers and implementers of these systems, as well as with the data scientists and analysts involved in many IT projects.
Audience
- Project Managers wishing to integrate Hadoop into their existing development or IT infrastructure
- Project Managers needing to communicate with cross-functional teams that include big data engineers, data scientists and business analysts
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Course Outline
Introduction
- Why and how project teams adopt Hadoop
- How it all started
- The Project Manager’s role in Hadoop projects
Understanding Hadoop’s Architecture and Key Concepts
- HDFS
- MapReduce
- Other pieces of the Hadoop ecosystem
What Constitutes Big Data?
Different Approaches to Storing Big Data
HDFS (Hadoop Distributed File System) as the Foundation
How Big Data Is Processed
- The power of distributed processing
Processing Data with MapReduce
- How data is picked apart step by step
The Role of Clustering in Large-Scale Distributed Processing
- Architectural overview
- Clustering approaches
Clustering Your Data and Processes with YARN
The Role of Non-Relational Databases in Big Data Storage
Working with Hadoop’s Non-Relational Database: HBase
Data Warehousing Architectural Overview
Managing Your Data Warehouse with Hive
Running Hadoop from Shell Scripts
Working with Hadoop Streaming
Other Hadoop Tools and Utilities
Getting Started on a Hadoop Project
- Demystifying complexity
Migrating an Existing Project to Hadoop
- Infrastructure considerations
- Scaling beyond your allocated resources
Hadoop Project Stakeholders and Their Toolkits
- Developers, data scientists, business analysts and project managers
Hadoop as a Foundation for New Technologies and Approaches
Closing Remarks