Moving Data from MySQL to Hadoop with Sqoop Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • An understanding of big data concepts (HDFS, Hive, etc.)
  • An understanding of relational databases (MySQL, etc.)
  • Experience with the Linux command line

Overview

Sqoop is an open source software tool for transferring data between Hadoop and relational databases or mainframes. It can be used to import data from a relational database management system (RDBMS) such as MySQL or Oracle, or from a mainframe, into the Hadoop Distributed File System (HDFS). The data can then be transformed in Hadoop MapReduce and exported back into an RDBMS.

In this instructor-led, live training, participants will learn how to use Sqoop to import data from a traditional relational database into Hadoop storage such as HDFS or Hive, and vice versa.

By the end of this training, participants will be able to:

  • Install and configure Sqoop
  • Import data from MySQL to HDFS and Hive
  • Export data from HDFS and Hive to MySQL

Audience

  • System administrators
  • Data engineers

Format of the Course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Note

  • To request a customized training for this course, please contact us to arrange.

Course Outline

Introduction

  • Moving data from legacy data stores to Hadoop

Installing and Configuring Sqoop
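
As a rough installation sketch (version numbers and paths below are illustrative assumptions, not values from the course), Sqoop needs an existing Hadoop client and the MySQL JDBC driver on its classpath:

    # Unpack Sqoop and put it on the PATH (paths are examples)
    tar -xzf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt
    export SQOOP_HOME=/opt/sqoop-1.4.7.bin__hadoop-2.6.0
    export PATH=$PATH:$SQOOP_HOME/bin

    # Point Sqoop at the local Hadoop installation
    export HADOOP_COMMON_HOME=/opt/hadoop
    export HADOOP_MAPRED_HOME=/opt/hadoop

    # Make the MySQL JDBC driver available to Sqoop
    cp mysql-connector-java-8.0.x.jar $SQOOP_HOME/lib/

    # Sanity check
    sqoop version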

Overview of Sqoop Features and Architecture

Importing Data from MySQL to HDFS
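
A minimal import command looks like the following; the host name, database, table, and credentials are placeholders:

    sqoop import \
      --connect jdbc:mysql://dbserver/sales \
      --username sqoop_user -P \
      --table customers \
      --target-dir /user/hadoop/customers \
      --num-mappers 4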

Importing Data from MySQL to Hive
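
Adding the Hive options makes Sqoop load the imported data into a Hive table after the HDFS step; table and database names below are illustrative:

    sqoop import \
      --connect jdbc:mysql://dbserver/sales \
      --username sqoop_user -P \
      --table orders \
      --hive-import \
      --hive-table sales.orders \
      --create-hive-table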

Transforming Data in Hadoop

Exporting Data from HDFS to MySQL
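
Exports run in the opposite direction: the target MySQL table must already exist, and --export-dir points at the HDFS data. Connection details and the field delimiter below are assumptions:

    sqoop export \
      --connect jdbc:mysql://dbserver/sales \
      --username sqoop_user -P \
      --table customers_processed \
      --export-dir /user/hadoop/customers_processed \
      --input-fields-terminated-by ','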

Exporting Data from Hive to MySQL
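
Exporting a Hive table usually means exporting the files under its warehouse directory; Hive's default field delimiter is the non-printing \001 character. The warehouse path below is the common default and may differ in your cluster:

    sqoop export \
      --connect jdbc:mysql://dbserver/sales \
      --username sqoop_user -P \
      --table orders_summary \
      --export-dir /user/hive/warehouse/sales.db/orders_summary \
      --input-fields-terminated-by '\001'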

Importing Incrementally with Sqoop Jobs
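
A saved Sqoop job remembers the last imported value, so repeated runs pick up only new rows. The check column and starting value below are assumptions:

    sqoop job --create daily_orders -- import \
      --connect jdbc:mysql://dbserver/sales \
      --username sqoop_user -P \
      --table orders \
      --target-dir /user/hadoop/orders \
      --incremental append \
      --check-column order_id \
      --last-value 0

    # Run the job; Sqoop updates the stored last-value after each run
    sqoop job --exec daily_orders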

Troubleshooting

Summary and Conclusion

Sqoop and Flume for Big Data Training Course

Duration

7 hours (usually 1 day including breaks)

Requirements

  • Experience with SQL

Audience

  • Software Engineers

Overview

Apache Sqoop is a command-line tool for moving data between relational databases and Hadoop. Apache Flume is distributed software for collecting, aggregating, and moving large amounts of streaming data. Using Sqoop and Flume, users can transfer data between systems and ingest big data into storage architectures such as Hadoop.

This instructor-led, live training (online or onsite) is aimed at software engineers who wish to use Sqoop and Flume for transferring data between systems.

By the end of this training, participants will be able to:

  • Ingest big data with Sqoop and Flume.
  • Ingest data from multiple data sources.
  • Move data from relational databases to HDFS and Hive.
  • Export data from HDFS to a relational database.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange.

Course Outline

Introduction

Sqoop and Flume Overview

  • What is Sqoop?
  • What is Flume?
  • Sqoop and Flume features

Preparing the Development Environment

  • Installing and configuring Apache Sqoop
  • Installing and configuring Apache Flume

Apache Flume

  • Creating an agent
  • Using spool sources, file channels, and logger sinks (a sample agent configuration follows this list)
  • Working with events
  • Accessing data sources
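
A minimal agent definition tying these pieces together might look like the properties file below (agent, file, and directory names are illustrative): it watches a spool directory, buffers events in a file channel, and writes them to the log.

    # flume-spool-logger.properties (illustrative)
    agent1.sources  = src1
    agent1.channels = ch1
    agent1.sinks    = sink1

    agent1.sources.src1.type = spooldir
    agent1.sources.src1.spoolDir = /var/spool/flume-in
    agent1.sources.src1.channels = ch1

    agent1.channels.ch1.type = file

    agent1.sinks.sink1.type = logger
    agent1.sinks.sink1.channel = ch1

The agent can then be started with, for example:

    flume-ng agent --conf conf --conf-file flume-spool-logger.properties \
      --name agent1 -Dflume.root.logger=INFO,console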

Apache Sqoop

  • Importing MySQL to HDFS and Hive (see the sample commands after this list)
  • Using Sqoop jobs
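
Besides single-table imports (shown earlier in this catalogue), Sqoop can inspect the source database and import every table in one go; connection details below are placeholders:

    sqoop list-tables \
      --connect jdbc:mysql://dbserver/sales \
      --username sqoop_user -P

    sqoop import-all-tables \
      --connect jdbc:mysql://dbserver/sales \
      --username sqoop_user -P \
      --warehouse-dir /user/hadoop/sales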

Data Ingestion Pipelines

  • Building pipelines
  • Fetching data
  • Ingesting data to HDFS (see the sample pipeline configuration below)
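
A simple ingestion pipeline replaces the logger sink shown earlier with an HDFS sink, so files dropped into the spool directory end up in HDFS. Names and paths below are illustrative assumptions:

    agent1.sources  = src1
    agent1.channels = ch1
    agent1.sinks    = hdfs1

    agent1.sources.src1.type = spooldir
    agent1.sources.src1.spoolDir = /var/spool/flume-in
    agent1.sources.src1.channels = ch1

    agent1.channels.ch1.type = file

    agent1.sinks.hdfs1.type = hdfs
    agent1.sinks.hdfs1.hdfs.path = /user/flume/ingest
    agent1.sinks.hdfs1.hdfs.fileType = DataStream
    agent1.sinks.hdfs1.channel = ch1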

Summary and Conclusion