Hadoop with Python Training Course

Duration

28 hours (usually 4 days including breaks)

Requirements

  • Experience with Python programming
  • Basic familiarity with Hadoop

Overview

Hadoop is a popular Big Data processing framework. Python is a high-level programming language famous for its clear syntax and code readability.

In this instructor-led, live training, participants will learn how to work with Hadoop, MapReduce, Pig, and Spark using Python as they step through multiple examples and use cases.

By the end of this training, participants will be able to:

  • Understand the basic concepts behind Hadoop, MapReduce, Pig, and Spark
  • Use Python with Hadoop Distributed File System (HDFS), MapReduce, Pig, and Spark
  • Use Snakebite to programmatically access HDFS within Python
  • Use mrjob to write MapReduce jobs in Python
  • Write Spark programs with Python
  • Extend the functionality of Pig using Python UDFs
  • Manage MapReduce jobs and Pig scripts using Luigi

Audience

  • Developers
  • IT Professionals

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

Introduction

Understanding Hadoop’s Architecture and Key Concepts

Understanding the Hadoop Distributed File System (HDFS)

  • Overview of HDFS and its Architectural Design
  • Interacting with HDFS
  • Performing Basic File Operations on HDFS
  • Overview of HDFS Command Reference
  • Overview of Snakebite
  • Installing Snakebite
  • Using the Snakebite Client Library
  • Using the CLI Client
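
As an illustration of the Snakebite client library covered in this module, the minimal sketch below connects to an HDFS NameNode and performs a few metadata operations. The host, port, and paths are placeholders for your own cluster, and note that the original Snakebite release targets Python 2 (a Python 3 fork exists).

    # Minimal Snakebite sketch: talk to the NameNode's RPC interface directly
    # from Python (no Java client needed). Host, port, and paths are placeholders.
    from snakebite.client import Client

    client = Client("namenode.example.com", 8020, use_trash=False)

    # List a directory; ls() returns a generator of dictionaries.
    for entry in client.ls(["/user/hduser"]):
        print(entry["path"], entry["length"])

    # Create a directory, then verify that it exists.
    list(client.mkdir(["/user/hduser/demo"], create_parent=True))
    print(client.test("/user/hduser/demo", exists=True))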

Learning the MapReduce Programming Model with Python

  • Overview of the MapReduce Programming Model
  • Understanding Data Flow in the MapReduce Framework
    • Map
    • Shuffle and Sort
    • Reduce
  • Using the Hadoop Streaming Utility
    • Understanding How the Hadoop Streaming Utility Works
    • Demo: Implementing the WordCount Application in Python
  • Using the mrjob Library
    • Overview of mrjob
    • Installing mrjob
    • Demo: Implementing the WordCount Algorithm Using mrjob
    • Understanding How a MapReduce Job Written with the mrjob Library Works
    • Executing a MapReduce Application with mrjob
    • Hands-on: Computing Top Salaries Using mrjob
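
To show how a MapReduce job written with the mrjob library is structured, here is a minimal WordCount sketch; the file name and input paths are placeholders.

    # word_count.py -- a minimal mrjob WordCount job.
    # Run locally:    python word_count.py input.txt
    # Run on Hadoop:  python word_count.py -r hadoop hdfs:///path/to/input
    import re

    from mrjob.job import MRJob

    WORD_RE = re.compile(r"[\w']+")

    class MRWordCount(MRJob):
        def mapper(self, _, line):
            # Emit (word, 1) for every word in the input line.
            for word in WORD_RE.findall(line):
                yield word.lower(), 1

        def reducer(self, word, counts):
            # Sum the counts emitted for each word by all mappers.
            yield word, sum(counts)

    if __name__ == "__main__":
        MRWordCount.run()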

Learning Pig with Python

  • Overview of Pig
  • Demo: Implementing the WordCount Algorithm in Pig
  • Configuring and Running Pig Scripts and Pig Statements
    • Using the Pig Execution Modes
    • Using the Pig Interactive Mode
    • Using the Pig Batch Mode
  • Understanding the Basic Concepts of the Pig Latin Language
    • Using Statements
    • Loading Data
    • Transforming Data
    • Storing Data
  • Extending Pig’s Functionality with Python UDFs
    • Registering a Python UDF File
    • Demo: A Simple Python UDF
    • Demo: String Manipulation Using Python UDF
    • Hands-on: Calculating the 10 Most Recent Movies Using Python UDF
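
As a sketch of extending Pig with Python UDFs, the file below defines two illustrative functions. The outputSchema decorator declares the Pig schema of each return value; depending on the setup it is injected by Pig's Jython engine or imported from pig_util.

    # my_udfs.py -- illustrative Python (Jython) UDFs for Pig.
    # outputSchema declares the Pig schema of each function's return value.
    from pig_util import outputSchema

    @outputSchema("word:chararray")
    def to_upper(word):
        # Simple string manipulation UDF.
        return word.upper() if word else None

    @outputSchema("length:int")
    def str_length(word):
        return len(word) if word else 0

On the Pig side, the file is made available to a script with a statement such as REGISTER 'my_udfs.py' USING jython AS my_udfs; after which the functions can be called from FOREACH ... GENERATE expressions.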

Using Spark and PySpark

  • Overview of Spark
  • Demo: Implementing the WordCount Algorithm in PySpark
  • Overview of PySpark
    • Using an Interactive Shell
    • Implementing Self-Contained Applications
  • Working with Resilient Distributed Datasets (RDDs)
    • Creating RDDs from a Python Collection
    • Creating RDDs from Files
    • Implementing RDD Transformations
    • Implementing RDD Actions
  • Hands-on: Implementing a Text Search Program for Movie Titles with PySpark
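
As a preview of the RDD topics above, the sketch below creates RDDs from a file and from a Python collection and runs a simple text search; the input path and search term are placeholders.

    # A minimal PySpark RDD sketch in the spirit of the text-search exercise.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("movie-title-search").setMaster("local[*]")
    sc = SparkContext(conf=conf)

    titles = sc.textFile("movies.txt")                  # RDD from a file (one title per line)
    sample = sc.parallelize(["Alien", "Aliens", "Up"])  # RDD from a Python collection

    # Transformations are lazy; the actions below trigger execution.
    matches = titles.filter(lambda line: "star" in line.lower())
    print(matches.count())            # action: number of matching titles
    for title in matches.take(10):    # action: first ten matches
        print(title)

    sc.stop()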

Managing Workflow with Python

  • Overview of Apache Oozie and Luigi
  • Installing Luigi
  • Understanding Luigi Workflow Concepts
    • Tasks
    • Targets
    • Parameters
  • Demo: Examining a Workflow that Implements the WordCount Algorithm
  • Working with Hadoop Workflows that Control MapReduce and Pig Jobs
    • Using Luigi’s Configuration Files
    • Working with MapReduce in Luigi
    • Working with Pig in Luigi
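
To make the task, target, and parameter concepts concrete, here is a minimal Luigi sketch that uses local files as placeholders; MapReduce and Pig jobs are wired up in the same require/output/run pattern through Luigi's contrib modules.

    # wordcount_task.py -- a minimal Luigi task illustrating tasks, targets, and parameters.
    # Run with:  python wordcount_task.py CountWords --input-path input.txt --local-scheduler
    import luigi

    class CountWords(luigi.Task):
        # Parameters make a task configurable from the command line.
        input_path = luigi.Parameter(default="input.txt")

        def output(self):
            # The target describes the artifact this task produces.
            return luigi.LocalTarget("word_count.txt")

        def run(self):
            counts = {}
            with open(self.input_path) as src:
                for line in src:
                    for word in line.split():
                        counts[word] = counts.get(word, 0) + 1
            with self.output().open("w") as out:
                for word, n in sorted(counts.items()):
                    out.write("%s\t%d\n" % (word, n))

    if __name__ == "__main__":
        luigi.run()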

Summary and Conclusion

Data Mining with Python Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • An understanding of Python programming.
  • An understanding of Python libraries in general.

Audience

  • Data analysts
  • Data scientists

Overview

This instructor-led, live training (online or onsite) is aimed at data analysts and data scientists who wish to implement more advanced data analytics techniques for data mining using Python.

By the end of this training, participants will be able to:

  • Understand important areas of data mining, including association rule mining, text sentiment analysis, automatic text summarization, and data anomaly detection.
  • Compare and implement various strategies for solving real-world data mining problems.
  • Understand and interpret the results. 

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange.

Course Outline

Introduction

Overview of Data Mining Concepts

Data Mining Techniques

Finding Association Rules

Matching Entities

Analyzing Networks

Analyzing the Sentiment of Text

Recognizing Named Entities

Implementing Text Summarization

Generating Topic Models

Detecting Data Anomalies

Best Practices

Summary and Conclusion

Python and Spark for Big Data (PySpark) Training Course

Duration

21 hours (usually 3 days including breaks)

Requirements

  • General programming skills

Audience

  • Developers
  • IT Professionals
  • Data Scientists

Overview

Python is a high-level programming language famous for its clear syntax and code readability. Spark is a data processing engine used in querying, analyzing, and transforming big data. PySpark allows users to interface Spark with Python.

In this instructor-led, live training, participants will learn how to use Python and Spark together to analyze big data as they work on hands-on exercises.

By the end of this training, participants will be able to:

  • Learn how to use Spark with Python to analyze Big Data.
  • Work on exercises that mimic real world cases.
  • Use different tools and techniques for big data analysis using PySpark.

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

Introduction

Understanding Big Data

Overview of Spark

Overview of Python

Overview of PySpark

  • Distributing Data Using the Resilient Distributed Datasets (RDD) Framework
  • Distributing Computation Using Spark API Operators

Setting Up Python with Spark

Setting Up PySpark

Using Amazon Web Services (AWS) EC2 Instances for Spark

Setting Up Databricks

Setting Up the AWS EMR Cluster

Learning the Basics of Python Programming

  • Getting Started with Python
  • Using the Jupyter Notebook
  • Using Variables and Simple Data Types
  • Working with Lists
  • Using if Statements
  • Using User Inputs
  • Working with while Loops
  • Implementing Functions
  • Working with Classes
  • Working with Files and Exceptions
  • Working with Projects, Data, and APIs

Learning the Basics of Spark DataFrame

  • Getting Started with Spark DataFrames
  • Implementing Basic Operations with Spark
  • Using Groupby and Aggregate Operations
  • Working with Timestamps and Dates
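
The sketch below previews the basic DataFrame, groupBy/aggregate, and date operations listed above; the CSV path and column names are placeholders.

    # Minimal Spark DataFrame sketch: load, inspect, group, aggregate, and work with dates.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("df-basics").getOrCreate()

    df = spark.read.csv("sales.csv", header=True, inferSchema=True)
    df.printSchema()
    df.select("company", "amount").show(5)

    # GroupBy and aggregate operations.
    df.groupBy("company").agg(
        F.avg("amount").alias("avg_amount"),
        F.max("amount").alias("max_amount"),
    ).show()

    # Working with timestamps and dates.
    df.withColumn("year", F.year(F.to_date(F.col("date")))).groupBy("year").count().show()

    spark.stop()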

Working on a Spark DataFrame Project Exercise

Understanding Machine Learning with MLlib

Working with MLlib, Spark, and Python for Machine Learning

Understanding Regressions

  • Learning Linear Regression Theory
  • Implementing a Regression Evaluation Code
  • Working on a Sample Linear Regression Exercise
  • Learning Logistic Regression Theory
  • Implementing a Logistic Regression Code
  • Working on a Sample Logistic Regression Exercise
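
As a sketch of the regression workflow above using Spark MLlib's DataFrame API, the example below fits and evaluates a linear regression; the CSV path and column names are placeholders.

    # Minimal pyspark.ml linear regression: assemble features, fit, evaluate.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    spark = SparkSession.builder.appName("lr-demo").getOrCreate()
    data = spark.read.csv("housing.csv", header=True, inferSchema=True)

    # MLlib expects a single vector column of features.
    assembler = VectorAssembler(inputCols=["rooms", "age", "income"], outputCol="features")
    prepared = assembler.transform(data).select("features", "price")

    train, test = prepared.randomSplit([0.7, 0.3], seed=42)

    model = LinearRegression(featuresCol="features", labelCol="price").fit(train)

    # Regression evaluation on held-out data.
    results = model.evaluate(test)
    print("RMSE:", results.rootMeanSquaredError)
    print("R2:  ", results.r2)

    spark.stop()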

Understanding Random Forests and Decision Trees

  • Learning Tree Methods Theory
  • Implementing Decision Trees and Random Forest Codes
  • Working on a Sample Random Forest Classification Exercise

Working with K-means Clustering

  • Understanding K-means Clustering Theory
  • Implementing a K-means Clustering Code
  • Working on a Sample Clustering Exercise
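
A minimal K-means sketch with pyspark.ml, matching the clustering exercise above; the CSV path and feature columns are placeholders.

    # Minimal pyspark.ml K-means clustering sketch.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.clustering import KMeans

    spark = SparkSession.builder.appName("kmeans-demo").getOrCreate()
    data = spark.read.csv("customers.csv", header=True, inferSchema=True)

    features = VectorAssembler(
        inputCols=["annual_spend", "visits"], outputCol="features"
    ).transform(data)

    model = KMeans(k=3, seed=1, featuresCol="features").fit(features)

    print("Cluster centers:", model.clusterCenters())
    model.transform(features).select("features", "prediction").show(5)

    spark.stop()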

Working with Recommender Systems

Implementing Natural Language Processing

  • Understanding Natural Language Processing (NLP)
  • Overview of NLP Tools
  • Working on a Sample NLP Exercise

Streaming with Spark on Python

  • Overview of Streaming with Spark
  • Sample Spark Streaming Exercise

Closing Remarks

Python, Spark, and Hadoop for Big Data Training Course

Duration

21 hours (usually 3 days including breaks)

Requirements

  • Experience with Spark and Hadoop
  • Python programming experience

Audience

  • Data scientists
  • Developers

Overview

Python is a scalable, flexible, and widely used programming language for data science and machine learning. Spark is a data processing engine used in querying, analyzing, and transforming big data, while Hadoop is a software library framework for large-scale data storage and processing.

This instructor-led, live training (online or onsite) is aimed at developers who wish to use and integrate Spark, Hadoop, and Python to process, analyze, and transform large and complex data sets.

By the end of this training, participants will be able to:

  • Set up the necessary environment to start processing big data with Spark, Hadoop, and Python.
  • Understand the features, core components, and architecture of Spark and Hadoop.
  • Learn how to integrate Spark, Hadoop, and Python for big data processing.
  • Explore the tools in the Spark ecosystem (Spark MLlib, Spark Streaming, Kafka, Sqoop, and Flume).
  • Build collaborative filtering recommendation systems similar to those used by Netflix, YouTube, Amazon, Spotify, and Google.
  • Use Apache Mahout to scale machine learning algorithms.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange.

Course Outline

Introduction

  • Overview of Spark and Hadoop features and architecture
  • Understanding big data
  • Python programming basics

Getting Started

  • Setting up Python, Spark, and Hadoop
  • Understanding data structures in Python
  • Understanding PySpark API
  • Understanding HDFS and MapReduce

Integrating Spark and Hadoop with Python

  • Implementing Spark RDD in Python
  • Processing data using MapReduce
  • Creating distributed datasets in HDFS
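
The sketch below ties the three items above together: a PySpark RDD performing a MapReduce-style word count over data stored in HDFS. The HDFS URIs are placeholders for your cluster.

    # Word count over HDFS data with a PySpark RDD.
    from pyspark import SparkContext

    sc = SparkContext(appName="hdfs-wordcount")

    lines = sc.textFile("hdfs://namenode:8020/user/hduser/input.txt")
    counts = (lines.flatMap(lambda line: line.split())    # map phase
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))      # reduce phase

    # Write the distributed result back to HDFS.
    counts.saveAsTextFile("hdfs://namenode:8020/user/hduser/output")

    sc.stop()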

Machine Learning with Spark MLlib

Processing Big Data with Spark Streaming

Working with Recommender Systems

Working with Kafka, Sqoop, and Flume

Apache Mahout with Spark and Hadoop

Troubleshooting

Summary and Next Steps

Python and Deep Learning with OpenCV 4 Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • Basic programming experience

Audience

  • Software Engineers

Overview

OpenCV is a library of programming functions for analyzing images and video with computer vision algorithms. OpenCV 4 is the latest OpenCV release, providing optimized modularity, updated algorithms, and more. With OpenCV 4 and Python, users can view, load, and classify images and videos for advanced image recognition.

This instructor-led, live training (online or onsite) is aimed at software engineers who wish to program in Python with OpenCV 4 for deep learning.

By the end of this training, participants will be able to:

  • View, load, and classify images and videos using OpenCV 4.
  • Implement deep learning in OpenCV 4 with TensorFlow and Keras.
  • Run deep learning models and generate impactful reports from images and videos.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange.

Course Outline

Introduction

What is AI

  • Computational Psychology
  • Computational Philosophy

Deep Learning

  • Artificial neural networks
  • Deep learning vs. machine learning

Preparing the Development Environment

  • Installing and configuring OpenCV

OpenCV 4 Quickstart

  • Viewing images
  • Using color channels
  • Viewing videos

Deep Learning Computer Vision

  • Using the DNN module
  • Working with deep learning models
  • Using SSDs
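
As a hedged sketch of OpenCV's DNN module driving a pre-trained SSD detector, the example below loads a Caffe model and draws the detections; the model, prototxt, and image file names are placeholders for whichever supported model you use.

    # Run a pre-trained SSD detector through cv2.dnn. File names are placeholders.
    import cv2

    net = cv2.dnn.readNetFromCaffe("ssd_deploy.prototxt", "ssd_weights.caffemodel")

    image = cv2.imread("street.jpg")
    h, w = image.shape[:2]

    # Convert the image into the 4-D blob the network expects.
    blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0, (300, 300),
                                 (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()

    # Each detection holds [_, class_id, confidence, x1, y1, x2, y2] (coordinates normalized).
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > 0.5:
            box = detections[0, 0, i, 3:7] * [w, h, w, h]
            x1, y1, x2, y2 = box.astype(int)
            cv2.rectangle(image, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)

    cv2.imwrite("detections.jpg", image)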

Neural Networks

  • Using different training methods
  • Measuring performance

Convolutional Neural Networks

  • Training and designing CNNs
  • Building a CNN in Keras
  • Importing data
  • Saving, loading, and displaying a model
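
To preview building, training, saving, loading, and displaying a CNN in Keras, here is a minimal sketch using the MNIST digits that ship with Keras so it stays self-contained; the layer sizes are illustrative.

    # Build, train, save, reload, and summarize a small CNN with Keras.
    import numpy as np
    from tensorflow.keras import layers, models
    from tensorflow.keras.datasets import mnist
    from tensorflow.keras.utils import to_categorical

    # Importing and normalizing data.
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train[..., np.newaxis] / 255.0
    x_test = x_test[..., np.newaxis] / 255.0

    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(x_train, to_categorical(y_train), epochs=1, batch_size=128, validation_split=0.1)

    # Saving, loading, and displaying the model.
    model.save("digits_cnn.h5")
    reloaded = models.load_model("digits_cnn.h5")
    reloaded.summary()
    print("Test accuracy:", reloaded.evaluate(x_test, to_categorical(y_test))[1])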

Classifiers

  • Building and training a classifier
  • Splitting data
  • Boosting accuracy of results and values

Summary and Conclusion

Computer Vision with Python Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • Programming experience with Python

Overview

Computer Vision is a field that involves automatically extracting, analyzing, and understanding useful information from digital media. Python is a high-level programming language famous for its clear syntax and code readability.

In this instructor-led, live training, participants will learn the basics of Computer Vision as they step through the creation of a set of simple Computer Vision applications using Python.

By the end of this training, participants will be able to:

  • Understand the basics of Computer Vision
  • Use Python to implement Computer Vision tasks
  • Build their own face, object, and motion detection systems

Audience

  • Python programmers interested in Computer Vision

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

Introduction

Understanding Computer Vision Basics

Installing OpenCV with Python Wrappers

Introduction to Using OpenCV

Using Media with Python

  • Loading Images
  • Converting Color to Grayscale
  • Using Metadata

Applying Image Theory with Python

  • Understanding Images as Multidimensional Arrays
  • Understanding the Color Space
  • Overview of Pixels and Coordinates
  • Accessing Pixels
  • Changing Pixels in Images
  • Drawing Lines and Shapes
  • Applying Text on Images
  • Resizing Images
  • Cropping Images
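
The minimal sketch below walks through the image-theory topics above: images as arrays, pixel access, drawing, text, resizing, and cropping. The file names are placeholders.

    # Images are NumPy arrays in OpenCV; the file names are placeholders.
    import cv2

    img = cv2.imread("photo.jpg")                 # BGR channel order
    print(img.shape, img.dtype)                   # (height, width, channels), uint8

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # color space conversion

    print(img[50, 100])                           # access a pixel (row, column)
    img[50, 100] = (0, 0, 255)                    # change it to red (B, G, R)

    cv2.line(img, (0, 0), (200, 200), (255, 0, 0), 2)
    cv2.rectangle(img, (20, 20), (120, 120), (0, 255, 0), 2)
    cv2.putText(img, "Hello", (30, 160), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

    small = cv2.resize(img, (320, 240))           # resizing
    crop = img[20:120, 20:120]                    # cropping is array slicing

    cv2.imwrite("annotated.jpg", img)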

Exploring Common Computer Vision Algorithms and Methods

  • Thresholding
  • Finding Contours
  • Background Subtraction
  • Using Detectors
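
A short sketch of thresholding, contour finding, and background subtraction follows (OpenCV 4 API, where findContours returns two values); the input file is a placeholder.

    import cv2

    gray = cv2.imread("shapes.png", cv2.IMREAD_GRAYSCALE)

    # Thresholding: pixels above 127 become white (255).
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

    # Finding contours on the binary image.
    contours, hierarchy = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)
    print("Contours found:", len(contours))

    # Background subtraction for video frames is available as a ready-made detector.
    subtractor = cv2.createBackgroundSubtractorMOG2()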

Implementing Feature Extraction with Python

  • Using Feature Vectors
  • Understanding the Color-mean Features Theory
  • Extracting Histogram Features
  • Extracting Grayscale Histogram Features
  • Extracting Texture Features

Implementing an App to Detect Image Similarity

Implementing a Reverse Image Search Engine

Creating an Object Detection App Using Template Matching

Creating a Face Detection App Using Haar Cascade
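
As a hedged sketch of the Haar-cascade approach used in this exercise, the example below relies on the frontal-face cascade bundled with opencv-python; the image path is a placeholder.

    import cv2

    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    img = cv2.imread("people.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # detectMultiScale returns one (x, y, w, h) rectangle per detected face.
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)

    cv2.imwrite("faces.jpg", img)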

Implementing an Object Detection App Using Keypoints

Capturing and Processing Video through a WebCam

Creating a Motion Detection System

Troubleshooting

Summary and Conclusion

NLP with Python and TextBlob Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • An understanding of NLP concepts
  • Python programming experience

Audience

  • Data scientists
  • Developers

Overview

TextBlob is a Python NLP library for processing textual data. It provides a simple API that makes it easy to perform NLP tasks, such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, etc.

This instructor-led, live training (online or onsite) is aimed at data scientists and developers who wish to use TextBlob to implement and simplify NLP tasks, such as sentiment analysis, spelling corrections, text classification modeling, etc.

By the end of this training, participants will be able to:

  • Set up the necessary development environment to start implementing NLP tasks with TextBlob.
  • Understand the features, architecture, and advantages of TextBlob.
  • Learn how to build text classification systems using TextBlob.
  • Perform common NLP tasks (Tokenization, WordNet, Sentiment analysis, Spelling correction, etc.)
  • Execute advanced implementations with simple APIs and a few lines of code.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange.

Course Outline

Introduction

  • Overview of TextBlob features and architecture
  • NLP fundamentals

Getting Started

  • Installing TextBlob
  • Importing libraries and data

Building Text Classification Models

  • Loading data and creating classifiers
  • Evaluating classifiers
  • Updating classifiers with new data
  • Using feature extractors
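
A minimal sketch of the classifier workflow above: training, classifying, evaluating, and updating a NaiveBayesClassifier. It assumes TextBlob's corpora have been downloaded (python -m textblob.download_corpora); the tiny training set is illustrative.

    from textblob.classifiers import NaiveBayesClassifier

    train = [
        ("I love this sandwich.", "pos"),
        ("This is an amazing place!", "pos"),
        ("I do not like this restaurant", "neg"),
        ("I am tired of this stuff.", "neg"),
    ]
    test = [
        ("The beer was good.", "pos"),
        ("I do not enjoy my job", "neg"),
    ]

    cl = NaiveBayesClassifier(train)          # loading data and creating a classifier
    print(cl.classify("This is an amazing library!"))
    print(cl.accuracy(test))                  # evaluating the classifier

    cl.update([("The food was terrible.", "neg")])  # updating with new data
    cl.show_informative_features(3)           # inspecting the learned features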

Performing NLP Tasks using TextBlob

  • Tokenization  
  • WordNet integration  
  • Noun phrase extraction  
  • Part-of-speech tagging  
  • Sentiment analysis  
  • Spelling correction
  • Translation and language detection
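
The sketch below touches most of the tasks listed above through a single TextBlob object. It assumes the TextBlob corpora have been downloaded; translation and language detection are omitted here because they call an external service and are deprecated in recent releases.

    from textblob import TextBlob, Word

    blob = TextBlob("TextBlob makes natural langage processing simple and fun.")

    print(blob.words)          # tokenization (words)
    print(blob.sentences)      # tokenization (sentences)
    print(blob.tags)           # part-of-speech tagging
    print(blob.noun_phrases)   # noun phrase extraction
    print(blob.sentiment)      # polarity and subjectivity
    print(blob.correct())      # spelling correction ("langage" -> "language")

    # WordNet integration via the Word wrapper.
    print(Word("bank").synsets[:3])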

APIs and Advanced Implementations

  • Sentiment analyzers  
  • Tokenizers
  • Noun phrase chunkers  
  • POS taggers  
  • Parsers  
  • Blobber

Troubleshooting

Summary and Next Steps

Natural Language Processing (NLP) with Python spaCy Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • Python programming experience.
  • A basic understanding of statistics
  • Experience with the command line

Audience

  • Developers
  • Data scientists

Overview

This instructor-led, live training (online or onsite) is aimed at developers and data scientists who wish to use spaCy to process very large volumes of text to find patterns and gain insights.

By the end of this training, participants will be able to:

  • Install and configure spaCy.
  • Understand spaCy’s approach to Natural Language Processing (NLP).
  • Extract patterns and obtain business insights from large-scale data sources.
  • Integrate the spaCy library with existing web and legacy applications.
  • Deploy spaCy to live production environments to predict human behavior.
  • Use spaCy to pre-process text for Deep Learning

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange.
  • To learn more about spaCy, please visit: https://spacy.io/

Course Outline

Introduction

  • Defining “Industrial-Strength Natural Language Processing”

Installing spaCy

spaCy Components

  • Part-of-speech tagger
  • Named entity recognizer
  • Dependency parser
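
For reference, the minimal sketch below runs all three components in one pipeline; it assumes the small English model has been installed with python -m spacy download en_core_web_sm.

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

    # Part-of-speech tagger and dependency parser output, token by token.
    for token in doc:
        print(token.text, token.pos_, token.dep_, token.head.text)

    # Named entity recognizer output.
    for ent in doc.ents:
        print(ent.text, ent.label_)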

Overview of spaCy Features and Syntax

Understanding spaCy Modeling

  • Statistical modeling and prediction

Using the spaCy Command Line Interface (CLI)

  • Basic commands

Creating a Simple Application to Predict Behavior 

Training a New Statistical Model

  • Data (for training)
  • Labels (tags, named entities, etc.)

Loading the Model

  • Shuffling and looping 

Saving the Model

Providing Feedback to the Model

  • Error gradient

Updating the Model

  • Updating the entity recognizer
  • Extracting tokens with rule-based matcher
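
As a hedged sketch of the rule-based matcher (spaCy 3.x signature; spaCy 2.x passes patterns as extra positional arguments), the example below extracts a simple product-like token pattern.

    import spacy
    from spacy.matcher import Matcher

    nlp = spacy.load("en_core_web_sm")
    matcher = Matcher(nlp.vocab)

    # Match "iphone" followed by a number, e.g. "iPhone 11".
    pattern = [{"LOWER": "iphone"}, {"IS_DIGIT": True}]
    matcher.add("PRODUCT", [pattern])

    doc = nlp("The iPhone 11 was announced after the iPhone 8.")
    for match_id, start, end in matcher(doc):
        print(doc[start:end].text)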

Developing a Generalized Theory for Expected Outcomes

Case Study

  • Distinguishing Product Names from Company Names

Refining the Training Data

  • Selecting representative data
  • Setting the dropout rate

Other Training Styles

  • Passing raw texts
  • Passing dictionaries of annotations

Using spaCy to Pre-process Text for Deep Learning

Integrating spaCy with Legacy Applications

Testing and Debugging the spaCy Model

  • The importance of iteration

Deploying the Model to Production

Monitoring and Adjusting the Model

Troubleshooting

Summary and Conclusion

Building Chatbots in Python Training Course

Duration

21 hours (usually 3 days including breaks)

Requirements

  • Python programming experience

Overview

ChatBots are computer programs that automatically simulate human responses via chat interfaces. ChatBots help organizations maximize operational efficiency by providing easier and faster options for user interactions.

In this instructor-led, live training, participants will learn how to build chatbots in Python.

By the end of this training, participants will be able to:

  • Understand the fundamentals of building chatbots
  • Build, test, deploy, and troubleshoot various chatbots using Python

Audience

  • Developers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Note

  • To request a customized training for this course, please contact us to arrange.

Course Outline

Introduction to ChatBots

Overview of Conversational Software

Building Your First Basic ChatBot

  • Setting Up Your ChatBot to Receive Text and Respond to Users
  • Adding the Basic Elements of Personality
  • Teaching Your ChatBot to Answer Basic Questions
  • Adding Variety to Your ChatBot’s Responses
  • Making Your ChatBot Ask Questions
  • Building Rule-Based Systems for Parsing Text

Using Machine Learning to Turn Natural Language into Structured Data for Your ChatBot

  • Overview of spaCy, Scikit-learn, and Rasa NLU
  • Installing and Configuring spaCy, Scikit-learn, and Rasa NLU
  • Intents and Entities and their Classifications
  • Natural Language Processing Fundamentals Theory Refresher
  • Building Models from Real-World Sentences Using the ATIS Dataset

Building Your Virtual Assistant ChatBot

  • Overview of a Virtual Assistant
  • Working with SQL in Python
  • Teaching Your ChatBot to Access Data from a Database
  • Writing Queries from Parameters
  • Building a Database from Natural Language
  • Implementing Custom Virtual Assistant Features on Your ChatBot
    • Answering Specific Queries through Database Access
    • Refining Search, Performing Basic Negation, and Filtering Data
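
To make the database-access pattern above concrete, here is a minimal sketch that builds a parameterized SQL query from slots extracted from a user message; the table, columns, and data are illustrative.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hotels (name TEXT, area TEXT, price TEXT)")
    conn.executemany("INSERT INTO hotels VALUES (?, ?, ?)", [
        ("Grand Hotel", "center", "expensive"),
        ("Comfy Inn", "north", "mid"),
        ("Budget Stay", "center", "cheap"),
    ])

    def find_hotels(params):
        # Build WHERE clauses from whatever slots the NLU step extracted,
        # using ? placeholders so user input is never interpolated into SQL.
        filters = ["{}=?".format(k) for k in params]
        query = "SELECT name FROM hotels"
        if filters:
            query += " WHERE " + " AND ".join(filters)
        return [row[0] for row in conn.execute(query, tuple(params.values()))]

    print(find_hotels({"area": "center", "price": "cheap"}))  # ['Budget Stay']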

Making Your ChatBot Stateful: Keeping Track of States of Interaction for Better ChatBot Dialogs

  • Performing Basic Actions
  • Asking Contextual Questions and Queuing Answers
  • Dealing with Rejection

Testing and Deploying Your ChatBot

Troubleshooting

Summary and Conclusion

Text Summarization with Python Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • An understanding of Python programming (Python 2.7/3.3+)
  • An understanding of Python libraries in general

Overview

Text summarization in Python uses machine learning libraries to read input text and produce a concise summary. This capability is available from the command line or as a Python API/library. One exciting application is the rapid creation of executive summaries, which is particularly useful for organizations that need to review large bodies of text data before generating reports and presentations.

In this instructor-led, live training, participants will learn to use Python to create a simple application that auto-generates a summary of input text.

By the end of this training, participants will be able to:

  • Use a command-line tool that summarizes text.
  • Design and create Text Summarization code using Python libraries.
  • Evaluate three Python summarization libraries: sumy 0.7.0, pysummarization 1.0.4, readless 1.0.17

Audience

  • Developers
  • Data Scientists

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

Introduction to Text Summarization with Python

  • Comparing sample text with auto-generated summaries
  • Installing sumy (a Python Command-Line Executable for Text Summarization)
  • Using sumy as a Command-Line Text Summarization Utility (Hands-On Exercise)

Evaluating three Python summarization libraries (sumy 0.7.0, pysummarization 1.0.4, readless 1.0.17) based on their documented features

Choosing a library: sumy, pysummarization or readless

Creating a Python application using sumy library on Python 2.7/3.3+

  • Installing the sumy library for Text Summarization
  • Using the Edmundson (Extraction) method in the sumy Python Library for Text Summarization
  • Creating simple Python test code that uses the sumy library to generate a text summary
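
A hedged sketch of the Edmundson (extraction) method in sumy follows; sumy's Tokenizer relies on NLTK data (punkt) being available, and the text and word lists are illustrative.

    from sumy.parsers.plaintext import PlaintextParser
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.summarizers.edmundson import EdmundsonSummarizer

    TEXT = """Automatic summarization reduces a text document to a short summary.
    Extraction-based methods select existing sentences from the source text.
    The Edmundson method weights sentences using cue words, key words, title
    words, and sentence location."""

    parser = PlaintextParser.from_string(TEXT, Tokenizer("english"))

    summarizer = EdmundsonSummarizer()
    # The Edmundson method requires bonus, stigma, and null word lists.
    summarizer.bonus_words = ("summarization", "extraction")
    summarizer.stigma_words = ("another",)
    summarizer.null_words = ("the", "a", "of", "and", "to")

    for sentence in summarizer(parser.document, sentences_count=2):
        print(sentence)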

Creating a Python application using pysummarization library on Python 2.7/3.3+

  • Installing the pysummarization library for Text Summarization
  • Using the pysummarization library for Text Summarization
  • Creating simple Python test code that uses the pysummarization library to generate a text summary

Creating a Python application using readless library on Python 2.7/3.3+

  • Installing the readless library for Text Summarization
  • Using the readless library for Text Summarization
  • Creating simple Python test code that uses the readless library to generate a text summary

Troubleshooting and debugging

Closing Remarks