Sqoop and Flume for Big Data Training Course

Duration

7 hours (usually 1 day including breaks)

Requirements

  • Experience with SQL

Audience

  • Software Engineers

Overview

Apache Sqoop is a command-line tool for transferring bulk data between relational databases and Hadoop. Apache Flume is a distributed service for collecting, aggregating, and moving large amounts of streaming data into centralized stores. Using Sqoop and Flume, users can transfer data between systems and ingest big data into storage architectures such as Hadoop.

This instructor-led, live training (online or onsite) is aimed at software engineers who wish to use Sqoop and Flume for transferring data between systems.

By the end of this training, participants will be able to:

  • Ingest big data with Sqoop and Flume.
  • Ingest data from multiple data sources.
  • Move data from relational databases to HDFS and Hive.
  • Export data from HDFS to a relational database.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange it.

Course Outline

Introduction

Sqoop and Flume Overview

  • What is Sqoop?
  • What is Flume?
  • Sqoop and Flume features

Preparing the Development Environment

  • Installing and configuring Apache Sqoop
  • Installing and configuring Apache Flume
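
As a preview of this first lab, here is a minimal setup sketch. The release numbers, download URLs, and install paths are illustrative assumptions; in class, use the builds that match your Hadoop distribution.

# Download and unpack Sqoop and Flume (versions and paths are assumptions)
wget https://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
wget https://archive.apache.org/dist/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
tar -xzf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt
tar -xzf apache-flume-1.9.0-bin.tar.gz -C /opt

# Both tools locate Hadoop through environment variables
export HADOOP_HOME=/opt/hadoop                        # assumed Hadoop location
export SQOOP_HOME=/opt/sqoop-1.4.7.bin__hadoop-2.6.0
export FLUME_HOME=/opt/apache-flume-1.9.0-bin
export PATH=$PATH:$SQOOP_HOME/bin:$FLUME_HOME/bin

# Smoke test: each command should print a version banner
sqoop version
flume-ng version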

Apache Flume

  • Creating an agent
  • Using spool sources, file channels, and logger sinks
  • Working with events
  • Accessing data sources
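
The lab for this module wires the three components named above into a single agent. The sketch below is a minimal, assumed configuration: the agent name a1 and all directory paths are illustrative, and the config file is written from the shell purely for convenience.

mkdir -p /tmp/flume/spool /tmp/flume/checkpoint /tmp/flume/data

cat > spool-agent.conf <<'EOF'
# Name the agent's components
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Spooling-directory source: each file dropped into spoolDir becomes events
a1.sources.r1.type     = spooldir
a1.sources.r1.spoolDir = /tmp/flume/spool
a1.sources.r1.channels = c1

# File channel: buffers events on disk so they survive an agent restart
a1.channels.c1.type          = file
a1.channels.c1.checkpointDir = /tmp/flume/checkpoint
a1.channels.c1.dataDirs      = /tmp/flume/data

# Logger sink: prints events to the agent's log, handy while learning
a1.sinks.k1.type    = logger
a1.sinks.k1.channel = c1
EOF

# Start the agent and watch events arrive on the console
flume-ng agent --name a1 --conf "$FLUME_HOME/conf" \
  --conf-file spool-agent.conf -Dflume.root.logger=INFO,console

Dropping a text file into /tmp/flume/spool should make its lines appear as events on the console; Flume marks processed files with a .COMPLETED suffix.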

Apache Sqoop

  • Importing MySQL tables into HDFS and Hive
  • Using Sqoop jobs
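
The commands below sketch what this module covers, plus the export direction listed in the course objectives. The MySQL host, database, credentials, and table names are illustrative assumptions.

# Import a MySQL table into HDFS as delimited files
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username trainee \
  --password-file /user/trainee/.db-password \
  --table orders \
  --target-dir /data/orders \
  --num-mappers 4

# Import the same table straight into Hive, creating the table if needed
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username trainee \
  --password-file /user/trainee/.db-password \
  --table orders \
  --hive-import --create-hive-table --hive-table shop.orders

# Save an import as a named job so it can be re-run or scheduled
sqoop job --create daily-orders -- import \
  --connect jdbc:mysql://dbhost/shop \
  --username trainee \
  --password-file /user/trainee/.db-password \
  --table orders --target-dir /data/orders
sqoop job --exec daily-orders

# The reverse direction: push processed HDFS data back into a relational table
sqoop export \
  --connect jdbc:mysql://dbhost/shop \
  --username trainee \
  --password-file /user/trainee/.db-password \
  --table order_totals \
  --export-dir /data/order_totals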

Data Ingestion Pipelines

  • Building pipelines
  • Fetching data
  • Ingesting data to HDFS
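
A small end-to-end pipeline ties the earlier modules together: the same spooling-directory source and file channel as before, but now draining into HDFS instead of the console. As before, the agent name and all paths are illustrative assumptions.

cat > hdfs-pipeline.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

a1.sources.r1.type     = spooldir
a1.sources.r1.spoolDir = /tmp/flume/spool
a1.sources.r1.channels = c1

a1.channels.c1.type          = file
a1.channels.c1.checkpointDir = /tmp/flume/checkpoint
a1.channels.c1.dataDirs      = /tmp/flume/data

# HDFS sink: writes events into date-partitioned directories as plain text
a1.sinks.k1.type                   = hdfs
a1.sinks.k1.channel                = c1
a1.sinks.k1.hdfs.path              = /flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.fileType          = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
EOF

flume-ng agent --name a1 --conf "$FLUME_HOME/conf" --conf-file hdfs-pipeline.conf

Here hdfs.useLocalTimeStamp lets the %Y-%m-%d escapes resolve without adding a timestamp interceptor; production pipelines usually stamp events at the source instead.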

Summary and Conclusion