Duration
21 hours (usually 3 days including breaks)
Requirements
- An understanding of Hadoop and big data.
- An understanding of Spark.
- Familiarity with the command line.
- System administration experience.
Audience
- Hadoop administrators
Overview
Hortonworks Data Platform (HDP) is an open-source Apache Hadoop support platform that provides a stable foundation for developing big data solutions on the Apache Hadoop ecosystem.
This instructor-led, live training (online or onsite) introduces Hortonworks Data Platform (HDP) and walks participants through the deployment of a Spark + Hadoop solution.
By the end of this training, participants will be able to:
- Use Hortonworks to run Hadoop reliably at large scale.
- Unify Hadoop’s security, governance, and operations capabilities with Spark’s agile analytic workflows.
- Use Hortonworks to investigate, validate, certify and support each of the components in a Spark project.
- Process different types of data, including structured, unstructured, in-motion, and at-rest.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us.
Course Outline
Introduction to Hortonworks Data Platform (HDP)
Overview of Big Data and Apache Hadoop
Installing and Configuring HDP
Setting up, Deploying, and Managing a Hadoop Cluster
Understanding and Configuring YARN and MapReduce
Overview of Job Scheduling
Ensuring Data Integrity
Understanding Enterprise Data Movement
Using HDFS Commands & Services
Transferring Data Using Flume
Working with Hive
Scheduling Workflow Using Oozie
Exploring Hadoop 2.x
Understanding HBase Architecture
Monitoring HDP2 Services Using Ambari
New Features in HDP
Troubleshooting
Summary and Conclusion