Big Data & Database Systems Fundamentals Training Course

Duration

14 hours (usually 2 days including breaks)

Overview

The course is part of the Data Scientist skill set (Domain: Data and Technology).

Course Outline

Data Warehousing Concepts

  • What is a Data Warehouse?
  • Difference between OLTP and Data Warehousing
  • Data Acquisition
  • Data Extraction
  • Data Transformation
  • Data Loading
  • Data Marts
  • Dependent vs. Independent Data Marts
  • Database Design
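
The extraction, transformation, and loading steps listed above can be illustrated with a minimal Python sketch that uses the standard-library sqlite3 module. The source table orders, the data-mart table sales_by_region, and all function names are hypothetical. A production warehouse would normally use a dedicated ETL tool and separate source and target databases rather than one in-memory database.

```python
import sqlite3

def extract(conn: sqlite3.Connection) -> list[tuple]:
    # Extract: read raw rows from the hypothetical OLTP source table "orders"
    return conn.execute("SELECT order_id, region, amount FROM orders").fetchall()

def transform(rows: list[tuple]) -> list[tuple[str, float]]:
    # Transform: standardize region names, then aggregate total sales per region
    totals: dict[str, float] = {}
    for _order_id, region, amount in rows:
        key = region.strip().upper()
        totals[key] = totals.get(key, 0.0) + amount
    return sorted(totals.items())

def load(conn: sqlite3.Connection, facts: list[tuple[str, float]]) -> None:
    # Load: write the aggregates into the hypothetical data-mart table "sales_by_region"
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)", facts)
    conn.commit()

def run_demo() -> list[tuple]:
    # Build an in-memory source table with inconsistent region spellings
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "east ", 10.0), (2, "East", 5.5), (3, "west", 7.0)],
    )
    load(conn, transform(extract(conn)))
    return conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()

if __name__ == "__main__":
    print(run_demo())  # [('EAST', 15.5), ('WEST', 7.0)]
```

The transformation step is where data from inconsistent operational sources is cleaned and reshaped for analysis, which is why "east " and "East" end up in a single EAST row in the data mart.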

ETL Testing Concepts

  • Introduction
  • Software development life cycle
  • Testing methodologies
  • ETL Testing Workflow Process
  • ETL Testing Responsibilities in DataStage
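
A common ETL test compares the loaded target against its source. The Python sketch below, again using the standard-library sqlite3 module, checks row counts, column totals, keys dropped during the load, and NULLs in a mandatory column. The tables stg_orders and dw_orders and the function names are hypothetical examples, and real test suites usually add many more rule-specific checks.

```python
import sqlite3

def reconcile(conn: sqlite3.Connection, source: str, target: str,
              key: str, measure: str) -> dict:
    """Compare a source table with its loaded target table.

    Table and column names are interpolated into SQL, so they must come from
    trusted test configuration, never from user input.
    """
    def profile(table: str) -> tuple:
        # Row count and column total: typical completeness and correctness checks
        return conn.execute(
            f"SELECT COUNT(*), COALESCE(SUM({measure}), 0) FROM {table}"
        ).fetchone()

    src_count, src_sum = profile(source)
    tgt_count, tgt_sum = profile(target)
    # Keys present in the source but missing from the target (dropped rows)
    missing = [
        row[0]
        for row in conn.execute(
            f"SELECT {key} FROM {source} EXCEPT SELECT {key} FROM {target} ORDER BY 1"
        )
    ]
    # Constraint check: target rows where the mandatory measure is NULL
    nulls = conn.execute(
        f"SELECT COUNT(*) FROM {target} WHERE {measure} IS NULL"
    ).fetchone()[0]
    return {
        "count_match": src_count == tgt_count,
        "sum_match": abs(src_sum - tgt_sum) < 1e-9,
        "missing_keys": missing,
        "null_measures": nulls,
    }

def run_demo() -> dict:
    # Hypothetical staging (source) and warehouse (target) tables with load defects
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stg_orders (order_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE dw_orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO stg_orders VALUES (?, ?)",
                     [(1, 10.0), (2, 5.5), (3, 7.0)])
    conn.executemany("INSERT INTO dw_orders VALUES (?, ?)",
                     [(1, 10.0), (2, None)])
    return reconcile(conn, "stg_orders", "dw_orders", "order_id", "amount")

if __name__ == "__main__":
    print(run_demo())
```

In the demo, one row was dropped and one amount was lost during the load, so every check reports a problem.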

Big Data Fundamentals

  • Big Data and its role in the corporate world
  • The phases of development of a Big Data strategy within a corporation
  • The rationale underlying a holistic approach to Big Data
  • Components needed in a Big Data platform
  • Big Data storage solutions
  • Limits of traditional technologies
  • Overview of database types

NoSQL Databases

Hadoop

MapReduce
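
The MapReduce model splits a job into a map phase, a shuffle-and-sort phase, and a reduce phase. The single-process Python sketch below of the classic word-count job only mimics that data flow. It is not the Hadoop API, and the function names are hypothetical. On a cluster, Hadoop runs many mapper and reducer tasks in parallel and performs the shuffle across the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def mapper(line: str) -> Iterator[tuple[str, int]]:
    # Map phase: emit a (word, 1) pair for every word in one input record
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle and sort phase: group all values by key and order the keys,
    # which the framework does between the map and reduce phases
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reducer(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce phase: combine all values that share one key
    return key, sum(values)

def word_count(lines: Iterable[str]) -> list[tuple[str, int]]:
    # Chain the three phases on a single machine
    mapped = (pair for line in lines for pair in mapper(line))
    return [reducer(key, values) for key, values in shuffle(mapped).items()]

if __name__ == "__main__":
    print(word_count(["the cat sat", "The dog sat"]))
    # [('cat', 1), ('dog', 1), ('sat', 2), ('the', 2)]
```

Because each mapper call sees only one record and each reducer call sees only one key, both functions can be distributed across machines without sharing state.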

Apache Spark

Python, Spark, and Hadoop for Big Data Training Course

Duration

21 hours (usually 3 days including breaks)

Requirements

  • Experience with Spark and Hadoop
  • Python programming experience

Audience

  • Data scientists
  • Developers

Overview

Python is a flexible and widely used programming language for data science and machine learning. Spark is a distributed data processing engine used for querying, analyzing, and transforming big data, while Hadoop is a software framework for the distributed storage and processing of large data sets.

This instructor-led, live training (online or onsite) is aimed at data scientists and developers who wish to use and integrate Spark, Hadoop, and Python to process, analyze, and transform large and complex data sets.

By the end of this training, participants will be able to:

  • Set up the necessary environment to start processing big data with Spark, Hadoop, and Python.
  • Understand the features, core components, and architecture of Spark and Hadoop.
  • Integrate Spark, Hadoop, and Python for big data processing.
  • Explore tools in the Spark and big data ecosystem (Spark MLlib, Spark Streaming, Kafka, Sqoop, and Flume).
  • Build collaborative filtering recommendation systems similar to those used by Netflix, YouTube, Amazon, Spotify, and Google.
  • Use Apache Mahout to scale machine learning algorithms.
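
User-based collaborative filtering recommends items that similar users rated highly. The pure-Python sketch below illustrates the idea on a tiny in-memory rating table. The data, the function names, and the choice of cosine similarity over co-rated items are illustrative assumptions, not the method used by any of the companies named above.

```python
from math import sqrt

# user -> {item: rating}; a hypothetical example data shape
Ratings = dict[str, dict[str, float]]

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity computed only over the items both users rated
    common = a.keys() & b.keys()
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no overlap (or all-zero ratings): treat as unrelated
    return dot / (norm_a * norm_b)

def recommend(ratings: Ratings, user: str, k: int = 2) -> list[tuple[str, float]]:
    # Predict a score for each unrated item as the similarity-weighted
    # average of other users' ratings, then return the top k items
    scores: dict[str, float] = {}
    weights: dict[str, float] = {}
    mine = ratings[user]
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # ignore users with no positive similarity
        for item, rating in theirs.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    predicted = [(item, scores[item] / weights[item]) for item in scores]
    # Highest predicted score first; ties broken alphabetically
    return sorted(predicted, key=lambda p: (-p[1], p[0]))[:k]

if __name__ == "__main__":
    ratings = {
        "alice": {"A": 5, "B": 3},
        "bob": {"A": 4, "B": 3, "C": 5},
        "carol": {"A": 1, "D": 2},
    }
    for item, score in recommend(ratings, "alice"):
        print(item, round(score, 2))
```

With only one co-rated item, cosine similarity is always 1.0, which is one reason large-scale systems favor other approaches. Spark MLlib, for example, provides matrix factorization for collaborative filtering via alternating least squares (ALS).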

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange.

Course Outline

Introduction

  • Overview of Spark and Hadoop features and architecture
  • Understanding big data
  • Python programming basics

Getting Started

  • Setting up Python, Spark, and Hadoop
  • Understanding data structures in Python
  • Understanding PySpark API
  • Understanding HDFS and MapReduce

Integrating Spark and Hadoop with Python

  • Implementing Spark RDD in Python
  • Processing data using MapReduce
  • Creating distributed datasets in HDFS

Machine Learning with Spark MLlib

Processing Big Data with Spark Streaming

Working with Recommender Systems

Working with Kafka, Sqoop, and Flume

Apache Mahout with Spark and Hadoop

Troubleshooting

Summary and Next Steps

Test Architect Master’s Course

Our Test Architect master’s course prepares you for the role of Test Architect. You will work on real-world projects in Hadoop testing, Selenium testing, software/manual testing, ETL testing, and more. The program covers 7 courses and 20 industry-based projects.

Key Highlights

  • 101 Hrs Instructor-Led Training
  • 96 Hrs Self-Paced Videos
  • 154 Hrs Projects & Exercises
  • Certification
  • Job Assistance
  • Flexible Schedule
  • Lifetime Free Upgrade
  • Mentor Support

Overview

Intellipaat’s Test Architect master’s course will give you in-depth knowledge of testing areas such as Hadoop testing, Selenium testing, and software/manual testing, along with detailed knowledge of ETL testing. The program was designed by industry experts and includes 7 courses with 20 industry-based projects.

List of Courses and Tools Included

Online Instructor-led Courses:

  • Selenium Testing
  • DevOps
  • MS-SQL

Self-paced Courses

  • Software/Manual Testing
  • ETL Testing
  • Hadoop Testing
  • Unit Testing

Tools Covered

  • Appium
  • Selenium
  • DevOps
  • Testing Fundamentals

What will you learn in this master’s course?

  • HDFS architecture, data flow, data replication, NameNode and DataNode
  • MapReduce concepts, Mapper and Reducer functions, concurrency, shuffle and ordering, and deploying Pig and Hive
  • Unit testing of a Hadoop Mapper in a MapReduce application
  • Introduction to Selenium WebDriver and Selenium RC, and handling elements such as text boxes, checkboxes, and multiple windows
  • Using Selenium Grid for software testing and working with Selenium IDE functions and commands
  • Advanced study of Sikuli, JUnit, and the TestNG plugin in Eclipse
  • ETL basics, the ETL testing process, error handling, dependency testing, constraint testing, and ETL data validation
  • Designing various test cases and understanding the techniques involved
  • Gaining expertise in the Bugzilla bug-tracking tool

What are the prerequisites for taking up this training program?

There are no prerequisites for taking up this training program.

Why should you take up this training program?

  • Hadoop Testing Professionals in the US can get a salary of $132,000 – indeed.com
  • Global Software Testing market to reach $50 billion by 2020 – NASSCOM
  • A Selenium Tester in the United States can earn $87,000 – indeed.com
  • A Software Tester in the United States can earn $76,000 – indeed.com

This Intellipaat master’s course has been specifically created to help you master the testing domain and gain proficiency in ETL testing. Upon completing the training, you will be well-versed in extracting valuable insights and well prepared to apply for top jobs in the software testing ecosystem.

Selenium Testing

Module 01 – Core Java Concepts
Module 02 – Writing Java Programs Using Java Principles
Module 03 – Getting Started with Selenium
Module 04 – Selenium Features
Module 05 – Deep Dive into Selenium IDE
Module 06 – Selenium WebDriver Automation
Module 07 – FirePath Installation
Module 08 – Searching Elements
Module 09 – Advanced User Interactions and Cross Browser Testing
Module 10 – Introduction to TestNG Plugin
Module 11 – TestNG Terminology
Module 12 – TestNG Data Providers
Module 13 – Maven Integration
Module 14 – WebDriver Sample Programs
Module 15 – JUnit Operations and the Test Framework
Module 16 – Object Repository
Module 17 – Test Data Management
Module 18 – Selenium Grid Concept
Module 19 – Mobile App Testing Using Appium (Self-paced)
Module 20 – Implementing the BDD Framework Using Cucumber (Self-paced)

DevOps

Module 01 – Infrastructure Setup
Module 02 – Introduction to DevOps
Module 03 – Continuous Testing
Module 04 – Continuous Integration using Jenkins
Module 05 – Software Version Control
Module 06 – Continuous Deployment: Containerization with Docker
Module 07 – Containerization with Docker: Ecosystem and Networking
Module 08 – Configuration Management using Puppet
Module 09 – Configuration Management using Ansible
Module 10 – Continuous Orchestration using Kubernetes
Module 11 – Continuous Monitoring using Nagios
Module 12 – Terraform Modules & Workspaces
Module 13 – Maven
Module 14 – SonarQube
Module 15 – XL Deploy
Module 16 – TeamCity
Module 17 – JFrog
Module 18 – MSBuild
Module 19 – Nexus
Module 20 – npm
Module 21 – ELK

MS-SQL

Module 01 – Introduction to SQL
Module 02 – Database Normalization and Entity Relationship Model
Module 03 – SQL Operators
Module 04 – Working with SQL: Join, Tables, and Variables
Module 05 – Deep Dive into SQL Functions
Module 06 – Working with Subqueries
Module 07 – SQL Views, Functions, and Stored Procedures
Module 08 – Deep Dive into User-defined Functions
Module 09 – SQL Optimization and Performance
Module 10 – Advanced Topics
Module 11 – Managing Database Concurrency
Module 12 – Programming Databases Using Transact-SQL
Module 13 – Microsoft Courses: Study Material

Software/Manual Testing

Module 01 – Introduction to Software Testing
Module 02 – Test Planning
Module 03 – Test Design
Module 04 – Testing Techniques
Module 05 – Levels & Types of Testing
Module 06 – Test Execution
Module 07 – Defect Management
Module 08 – Team Collaboration & Reporting
Module 09 – Measurement & Metrics
Module 10 – Testing Tools & FAQs

ETL Testing

Module 01 – ETL Testing Overview
Module 02 – Database Testing and Data Warehousing Testing
Module 03 – ETL Testing Scenarios
Module 04 – Correctness, Completeness, Quality, Data Validation
Module 05 – Data Checks with SQL
Module 06 – Report & Cube Testing

Hadoop Testing

Module 01 – Introduction to Hadoop and Its Ecosystem, MapReduce and HDFS
Module 02 – MapReduce
Module 03 – Introduction to Pig and Its Features
Module 04 – Introduction to Hive
Module 05 – Hadoop Stack Integration Testing
Module 06 – Roles and Responsibilities of Hadoop Testing
Module 07 – MRUnit: A Framework for Testing MapReduce Programs
Module 08 – Unit Testing
Module 09 – Customized Test Execution for Hadoop
Module 10 – Test Plan Strategy and Test Cases for Hadoop Testing
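
MRUnit itself is a Java library for exercising Hadoop Mapper and Reducer classes in isolation. As a language-neutral illustration of the same idea, the Python sketch below unit-tests a hypothetical word-count mapper function with the standard-library unittest module: given a fixed input record, each test asserts the exact key/value pairs emitted, without starting a cluster.

```python
import unittest

def mapper(line: str) -> list[tuple[str, int]]:
    # Hypothetical mapper under test: emit (word, 1) for each lowercased word
    return [(word, 1) for word in line.lower().split()]

class MapperTest(unittest.TestCase):
    def test_emits_one_pair_per_word(self):
        # Given one input record, expect these pairs in this order
        self.assertEqual(
            mapper("Hadoop hadoop HDFS"),
            [("hadoop", 1), ("hadoop", 1), ("hdfs", 1)],
        )

    def test_blank_line_emits_nothing(self):
        # Edge case: whitespace-only records produce no output
        self.assertEqual(mapper("   "), [])

if __name__ == "__main__":
    unittest.main()
```

Keeping the mapper logic in a plain function makes it testable without any Hadoop dependencies; the same separation is what makes MRUnit-style tests fast in Java.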

Unit Testing

Module 01 – Why Unit Test?
Module 02 – What is Unit Testing?
Module 03 – Proving Correctness
Module 04 – Strategies for Implementing Unit Tests
Module 05 – Look before You Leap: The Cost of Unit Testing
Module 06 – How Does Unit Testing Work?
Module 07 – Common Unit Test Tools
Module 08 – Testing Basics
Module 09 – Unit Testing with Visual Studio
Module 10 – Unit Testing with NUnit