NLP with Python and TextBlob Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • An understanding of NLP concepts
  • Python programming experience

Audience

  • Data scientists
  • Developers

Overview

TextBlob is a Python NLP library for processing textual data. It provides a simple API that makes it easy to perform NLP tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, and translation.
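
For orientation, a minimal sketch of that API (assuming TextBlob is installed and its bundled corpora have been downloaded with python -m textblob.download_corpora):

    from textblob import TextBlob

    blob = TextBlob("TextBlob makes natural language processing simple. I love it!")

    print(blob.tags)          # part-of-speech tags, e.g. ('TextBlob', 'NNP')
    print(blob.noun_phrases)  # noun phrase extraction
    print(blob.sentiment)     # Sentiment(polarity=..., subjectivity=...)
    print(blob.sentences)     # sentence tokenization

    # spelling correction
    print(TextBlob("I havv goood speling").correct())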

This instructor-led, live training (online or onsite) is aimed at data scientists and developers who wish to use TextBlob to implement and simplify NLP tasks such as sentiment analysis, spelling correction, and text classification modeling.

By the end of this training, participants will be able to:

  • Set up the necessary development environment to start implementing NLP tasks with TextBlob.
  • Understand the features, architecture, and advantages of TextBlob.
  • Learn how to build text classification systems using TextBlob.
  • Perform common NLP tasks (tokenization, WordNet integration, sentiment analysis, spelling correction, etc.).
  • Execute advanced implementations with simple APIs and a few lines of code.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange.

Course Outline

Introduction

  • Overview of TextBlob features and architecture
  • NLP fundamentals

Getting Started

  • Installing TextBlob
  • Importing libraries and data

Building Text Classification Models

  • Loading data and creating classifiers (see the sketch after this list)
  • Evaluating classifiers
  • Updating classifiers with new data
  • Using feature extractors
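
A minimal sketch of this workflow, using TextBlob's built-in NaiveBayesClassifier on a tiny made-up training set:

    from textblob.classifiers import NaiveBayesClassifier

    train = [("I love this sandwich.", "pos"),
             ("This is an amazing place!", "pos"),
             ("I do not like this restaurant.", "neg"),
             ("This is my worst day ever.", "neg")]
    test = [("The beer was good.", "pos"),
            ("I do not enjoy my job.", "neg")]

    cl = NaiveBayesClassifier(train)            # loading data and creating a classifier
    print(cl.classify("What an amazing day!"))  # -> 'pos'
    print(cl.accuracy(test))                    # evaluating the classifier
    cl.update([("The food was delicious.", "pos")])  # updating with new data

A custom feature extractor can be supplied via the feature_extractor argument when constructing the classifier.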

Performing NLP Tasks using TextBlob

  • Tokenization  
  • WordNet integration  
  • Noun phrase extraction  
  • Part-of-speech tagging  
  • Sentiment analysis  
  • Spelling correction
  • Translation and language detection

APIs and Advanced Implementations

  • Sentiment analyzers  
  • Tokenizers
  • Noun phrase chunkers  
  • POS taggers  
  • Parsers  
  • Blobber

Troubleshooting

Summary and Next Steps

Scaling Data Pipelines with Spark NLP Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • Familiarity with Apache Spark
  • Python programming experience

Audience

  • Data scientists
  • Developers

Overview

Spark NLP is an open-source library, built on Apache Spark, for natural language processing with Python, Java, and Scala. It is widely used in enterprise settings and industry verticals such as healthcare, finance, life sciences, and recruiting.

This instructor-led, live training (online or onsite) is aimed at data scientists and developers who wish to use Spark NLP, built on top of Apache Spark, to develop, implement, and scale natural language text processing models and pipelines.

By the end of this training, participants will be able to:

  • Set up the necessary development environment to start building NLP pipelines with Spark NLP.
  • Understand the features, architecture, and benefits of using Spark NLP.
  • Use the pre-trained models available in Spark NLP to implement text processing.
  • Learn how to build, train, and scale Spark NLP models for production-grade projects.
  • Apply classification, inference, and sentiment analysis on real-world use cases (clinical data, customer behavior insights, etc.).

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange.

Course Outline

Introduction

  • Spark NLP vs NLTK vs spaCy
  • Overview of Spark NLP features and architecture

Getting Started

  • Setup requirements
  • Installing Spark NLP
  • General concepts

Using Pre-trained Pipelines

  • Importing required modules
  • Default annotators
  • Loading a pipeline model (see the sketch after this list)
  • Transforming texts
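
A minimal sketch of loading and applying a pretrained pipeline (the pipeline name below is one of Spark NLP's published English pipelines; the output keys depend on which pipeline is downloaded):

    import sparknlp
    from sparknlp.pretrained import PretrainedPipeline

    spark = sparknlp.start()   # starts a Spark session with Spark NLP on the classpath

    # download a pretrained pipeline with default annotators
    pipeline = PretrainedPipeline("explain_document_dl", lang="en")

    result = pipeline.annotate("Spark NLP ships pretrained pipelines for many languages.")
    print(result["token"])     # tokens produced by the default annotators
    print(result["pos"])       # part-of-speech tags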

Building NLP Pipelines

  • Understanding the pipeline API (sketched after this list)
  • Implementing NER models
  • Choosing embeddings
  • Using word, sentence, and universal embeddings
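
A minimal sketch of assembling a pipeline by hand with the pipeline API (the model names are common Spark NLP pretrained defaults; df is assumed to be a Spark DataFrame with a 'text' column):

    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel, NerDLModel

    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    sentence = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")
    token    = Tokenizer().setInputCols(["sentence"]).setOutputCol("token")
    embeddings = WordEmbeddingsModel.pretrained("glove_100d") \
        .setInputCols(["sentence", "token"]).setOutputCol("embeddings")
    ner = NerDLModel.pretrained("ner_dl") \
        .setInputCols(["sentence", "token", "embeddings"]).setOutputCol("ner")

    pipeline = Pipeline(stages=[document, sentence, token, embeddings, ner])
    model = pipeline.fit(df)          # fit is cheap here; the stages are pretrained
    annotated = model.transform(df)   # adds token, embedding, and NER columns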

Classification and Inference

  • Document classification use cases
  • Sentiment analysis models
  • Training a document classifier
  • Using other machine learning frameworks
  • Managing NLP models
  • Optimizing models for low-latency inference

Troubleshooting

Summary and Next Steps

Natural Language Processing (NLP) with Python spaCy Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • Python programming experience
  • A basic understanding of statistics
  • Experience with the command line

Audience

  • Developers
  • Data scientists

Overview

This instructor-led, live training (online or onsite) is aimed at developers and data scientists who wish to use spaCy to process very large volumes of text to find patterns and gain insights.

By the end of this training, participants will be able to:

  • Install and configure spaCy.
  • Understand spaCy’s approach to Natural Language Processing (NLP).
  • Extract patterns and obtain business insights from large-scale data sources.
  • Integrate the spaCy library with existing web and legacy applications.
  • Deploy spaCy to live production environments to predict human behavior.
  • Use spaCy to pre-process text for Deep Learning.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange.
  • To learn more about spaCy, please visit: https://spacy.io/

Course Outline

Introduction

  • Defining “Industrial-Strength Natural Language Processing”

Installing spaCy

spaCy Components

  • Part-of-speech tagger
  • Named entity recognizer
  • Dependency parser (all three components are illustrated in the sketch below)
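
A minimal sketch showing all three components on one document (assuming the small English model has been installed with python -m spacy download en_core_web_sm):

    import spacy

    nlp = spacy.load("en_core_web_sm")   # tagger, parser, and NER in one pipeline
    doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

    for token in doc:
        # part-of-speech tag and dependency relation for each token
        print(token.text, token.pos_, token.dep_, token.head.text)

    for ent in doc.ents:                 # named entities
        print(ent.text, ent.label_)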

Overview of spaCy Features and Syntax

Understanding spaCy Modeling

  • Statistical modeling and prediction

Using the spaCy Command Line Interface (CLI)

  • Basic commands

Creating a Simple Application to Predict Behavior 

Training a New Statistical Model

  • Data (for training)
  • Labels (tags, named entities, etc.)

Loading the Model

  • Shuffling and looping 

Saving the Model

Providing Feedback to the Model

  • Error gradient

Updating the Model

  • Updating the entity recognizer
  • Extracting tokens with the rule-based Matcher (see the sketch below)
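
A minimal sketch of the rule-based Matcher (the pattern and text are made up for illustration):

    import spacy
    from spacy.matcher import Matcher

    nlp = spacy.load("en_core_web_sm")
    matcher = Matcher(nlp.vocab)          # shares the pipeline's vocabulary

    pattern = [{"LOWER": "machine"}, {"LOWER": "learning"}]
    matcher.add("ML_PHRASE", [pattern])   # a rule name plus a list of token patterns

    doc = nlp("Machine learning powers modern NLP.")
    for match_id, start, end in matcher(doc):
        print(doc[start:end].text)        # -> "Machine learning"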

Developing a Generalized Theory for Expected Outcomes

Case Study

  • Distinguishing Product Names from Company Names

Refining the Training Data

  • Selecting representative data
  • Setting the dropout rate

Other Training Styles

  • Passing raw texts
  • Passing dictionaries of annotations

Using spaCy to Pre-process Text for Deep Learning

Integrating spaCy with Legacy Applications

Testing and Debugging the spaCy Model

  • The importance of iteration

Deploying the Model to Production

Monitoring and Adjusting the Model

Troubleshooting

Summary and Conclusion

Natural Language Processing (NLP) – AI/Robotics Training Course

Duration

21 hours (usually 3 days including breaks)

Requirements

Knowledge and awareness of NLP principles and an appreciation of AI applications in business.

Overview

This classroom-based training session will explore NLP techniques in conjunction with the application of AI and robotics in business. Delegates will undertake computer-based examples and case-study exercises using Python.

Course Outline

Detailed training outline

  1. Introduction to NLP
    • Understanding NLP
    • NLP Frameworks
    • Commercial applications of NLP
    • Scraping data from the web
    • Working with various APIs to retrieve text data
    • Working with and storing text corpora: saving content and relevant metadata
    • Advantages of using Python; NLTK crash course
  2. Practical Understanding of a Corpus and Dataset
    • Why do we need a corpus?
    • Corpus Analysis
    • Types of data attributes
    • Different file formats for corpora
    • Preparing a dataset for NLP applications
  3. Understanding the Structure of a Sentence
    • Components of NLP
    • Natural language understanding
    • Morphological analysis – stems, words, tokens, part-of-speech tags
    • Syntactic analysis
    • Semantic analysis
    • Handling ambiguity
  4. Text data preprocessing
    • Corpus: raw text
      • Sentence tokenization
      • Stemming for raw text
      • Lemmatization of raw text
      • Stop word removal
    • Corpus: raw sentences
      • Word tokenization
      • Word lemmatization
    • Working with Term-Document/Document-Term matrices
    • Text tokenization into n-grams and sentences
    • Practical and customized preprocessing (see the sketch below)
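
A minimal preprocessing sketch with NLTK (each resource needs a one-time nltk.download(...) call, e.g. for 'punkt', 'wordnet', and 'stopwords'):

    from nltk.tokenize import sent_tokenize, word_tokenize
    from nltk.stem import PorterStemmer, WordNetLemmatizer
    from nltk.corpus import stopwords

    raw = "The cats were chasing mice. Stemming and lemmatization normalize words."

    sentences = sent_tokenize(raw)        # sentence tokenization
    words = word_tokenize(sentences[0])   # word tokenization

    stems  = [PorterStemmer().stem(w) for w in words]            # stemming
    lemmas = [WordNetLemmatizer().lemmatize(w) for w in words]   # lemmatization
    content = [w for w in words
               if w.lower() not in stopwords.words("english")]   # stop word removal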
  5. Analyzing Text data
    • Basic features of NLP
      • Parsers and parsing
      • POS tagging and taggers
      • Named entity recognition
      • N-grams
      • Bag of words
    • Statistical features of NLP
      • Linear algebra concepts for NLP
      • Probability theory for NLP
      • TF-IDF
      • Vectorization
      • Encoders and Decoders
      • Normalization
      • Probabilistic Models
    • Advanced feature engineering and NLP
      • Basics of word2vec
      • Components of word2vec model
      • Logic of the word2vec model
      • Extension of the word2vec concept
      • Applications of the word2vec model (a short sketch follows the case study below)
    • Case study: Application of bag of words – automatic text summarization using simplified and original Luhn algorithms
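
A minimal word2vec sketch using gensim's 4.x API on a toy corpus (the corpus and parameters are illustrative only):

    from gensim.models import Word2Vec

    sentences = [["natural", "language", "processing"],
                 ["word", "embeddings", "capture", "meaning"],
                 ["processing", "language", "with", "vectors"]]

    model = Word2Vec(sentences, vector_size=50, window=3,
                     min_count=1, sg=1)        # sg=1 selects the skip-gram variant

    vector = model.wv["language"]              # the learned vector for one word
    print(model.wv.most_similar("language"))   # nearest neighbours in vector space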
  6. Document Clustering, Classification and Topic Modeling
    • Document clustering and pattern mining (hierarchical clustering, k-means clustering, etc.)
    • Comparing and classifying documents using TF-IDF, Jaccard, and cosine distance measures
    • Document classification using Naïve Bayes and Maximum Entropy (see the sketch below)
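
A minimal sketch of TF-IDF features feeding a Naïve Bayes document classifier, using scikit-learn on a made-up corpus:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB

    docs = ["great product, works well", "terrible support, broken on arrival",
            "excellent value for money", "awful experience, would not recommend"]
    labels = ["pos", "neg", "pos", "neg"]

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(docs)           # documents -> TF-IDF matrix
    clf = MultinomialNB().fit(X, labels)         # Naïve Bayes classifier

    print(clf.predict(vectorizer.transform(["works really well"])))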
  7. Identifying Important Text Elements
    • Reducing dimensionality: Principal Component Analysis, Singular Value Decomposition, and non-negative matrix factorization
    • Topic modeling and information retrieval using Latent Semantic Analysis
  8. Entity Extraction, Sentiment Analysis and Advanced Topic Modeling
    • Positive vs. negative: degree of sentiment
    • Item Response Theory
    • Part of speech tagging and its application: finding people, places and organizations mentioned in text
    • Advanced topic modeling: Latent Dirichlet Allocation (sketched below)
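
A minimal LDA sketch with scikit-learn (the corpus and topic count are illustrative):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = ["stocks fall as markets react", "team wins the championship game",
            "investors buy bonds and stocks", "players score in the final game"]

    vectorizer = CountVectorizer(stop_words="english")
    X = vectorizer.fit_transform(docs)           # bag-of-words counts
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

    terms = vectorizer.get_feature_names_out()
    for topic in lda.components_:                # top words per topic
        print([terms[i] for i in topic.argsort()[-4:]])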
  9. Case studies
    • Mining unstructured user reviews
    • Sentiment classification and visualization of Product Review Data
    • Mining search logs for usage patterns
    • Text classification
    • Topic modelling

Natural Language Processing (NLP) with Deep Dive in Python and NLTK Training Course

Duration

35 hours (usually 5 days including breaks)

Requirements

There are no specific requirements needed to attend this course.

Overview

By the end of the training, delegates are expected to be equipped with the essential Python concepts and able to use NLTK to implement most NLP and ML based operations. The training aims to impart not just executional knowledge but also a logical and operational understanding of the technology involved.

Course Outline

Introduction to Python

Introduction

1 – Installing Python

2 – Numbers

3 – Strings

4 – Slicing up Strings

5 – Lists

6 – Installing PyCharm

Conditional Statements

7 – if elif else

Iterations

8 – for

9 – Range and While

10 – Comments and Break

11 – Continue

Functions

12 – Functions

13 – Return Values

14 – Default Values for Arguments

15 – Variable Scope

16 – Keyword Arguments

17 – Flexible Number of Arguments

18 – Unpacking Arguments

19 – My trip to Walmart and Sets

20 – Dictionary

21 – Modules

Playing with Requests and Files

22 – Download an Image from the Web

23 – How to Read and Write Files

24 – Downloading Files from the Web

Exceptions

28 – Exceptions

Object Oriented Programs

29 – Classes and Objects

30 – __init__

31 – Class vs Instance Variables

32 – Inheritance

33 – Multiple Inheritance

34 – threading

Playing around with Python

35 – Unpack List or Tuples

36 – Zip (and yeast infection story)

37 – Lambda

38 – Min, Max, and Sorting Dictionaries

39 – Pillow

40 – Cropping Images

41 – Combine Images Together

42 – Getting Individual Channels

43 – Awesome Merge Effect

44 – Basic Transformations

45 – Modes and Filters

46 – struct

47 – map

48 – Bitwise Operators

49 – Finding Largest or Smallest Items

50 – Dictionary Calculations

51 – Finding Most Frequent Items

52 – Dictionary Multiple Key Sort

53 – Sorting Custom Objects

Add-Ons:

54 – Database Connectivity and Querying for MySQL

55 – Quick look into Regular Expressions

56 – Playing around with REST API

Writing a Web Crawler

Natural Language Processing and NLTK

Introduction to NLP (examples in Python of course)

  1. Simple Text Manipulation
    1. Searching Text
    2. Counting Words
    3. Splitting Texts into Words
    4. Lexical dispersion
  2. Processing complex structures
    1. Representing text in Lists
    2. Indexing Lists
    3. Collocations
    4. Bigrams
    5. Frequency Distributions
    6. Conditionals with Words
    7. Comparing Words (startswith, endswith, islower, isalpha, etc.)
  3. Natural Language Understanding
    1. Word Sense Disambiguation
    2. Pronoun Resolution
  4. Machine translations (statistical, rule-based, literal, etc.)
  5. Exercises

NLP in Python in examples

  1. Accessing Text Corpora and Lexical Resources
    1. Common sources for corpora
    2. Conditional Frequency Distributions
    3. Counting Words by Genre (see the sketch after this list)
    4. Creating own corpus
    5. Pronouncing Dictionary
    6. Shoebox and Toolbox Lexicons
    7. Senses and Synonyms
    8. Hierarchies
    9. Lexical Relations: Meronyms, Holonyms
    10. Semantic Similarity
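
A minimal conditional-frequency sketch over the Brown corpus (requires a one-time nltk.download('brown')):

    import nltk
    from nltk.corpus import brown

    # word frequencies conditioned on genre
    cfd = nltk.ConditionalFreqDist(
        (genre, word.lower())
        for genre in ["news", "romance"]
        for word in brown.words(categories=genre))

    print(cfd["news"].most_common(10))   # most frequent words in the news genre
    cfd.tabulate(conditions=["news", "romance"],
                 samples=["the", "love", "could"])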
  2. Processing Raw Text
    1. Printing
    2. Truncating
    3. Extracting parts of a string
    4. Accessing individual characters
    5. Searching, replacing, splitting, joining, indexing, etc.
    6. Using regular expressions
    7. Detecting word patterns
    8. Stemming
    9. Tokenization
    10. Normalization of text
    11. Word Segmentation (especially in Chinese)
  3. Categorizing and Tagging Words
    1. Tagged Corpora
    2. Tagged Tokens
    3. Part-of-Speech Tagset
    4. Python Dictionaries
    5. Words to Properties mapping
    6. Automatic Tagging
    7. Determining the Category of a Word (Morphological, Syntactic, Semantic)
  4. Text Classification (Machine Learning)
    1. Supervised Classification (see the sketch after this list)
    2. Sentence Segmentation
    3. Cross Validation
    4. Decision Trees
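
A minimal supervised-classification sketch with NLTK's Naive Bayes classifier, using the classic names/gender example from the NLTK book (requires nltk.download('names')):

    import random
    import nltk
    from nltk.corpus import names

    def gender_features(name):
        return {"last_letter": name[-1]}    # a deliberately simple feature set

    labeled = ([(n, "male") for n in names.words("male.txt")] +
               [(n, "female") for n in names.words("female.txt")])
    random.shuffle(labeled)

    featuresets = [(gender_features(n), g) for n, g in labeled]
    train_set, test_set = featuresets[500:], featuresets[:500]

    classifier = nltk.NaiveBayesClassifier.train(train_set)
    print(nltk.classify.accuracy(classifier, test_set))   # held-out accuracy
    print(classifier.classify(gender_features("Neo")))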
  5. Extracting Information from Text
    1. Chunking
    2. Chinking
    3. Tags vs Trees
  6. Analyzing Sentence Structure
    1. Context-Free Grammar
    2. Parsers
  7. Building Feature Based Grammars
    1. Grammatical Features
    2. Processing Feature Structures
  8. Analyzing the Meaning of Sentences
    1. Semantics and Logic
    2. Propositional Logic
    3. First-Order Logic
    4. Discourse Semantics
  9. Managing Linguistic Data
    1. Data Formats (Lexicon vs Text)
    2. Metadata

Artificial Intelligence – the most applied stuff – Data Analysis + Distributed AI + NLP Training Course

Duration

21 hours (usually 3 days including breaks)

Overview

This course is aimed at developers and data scientists who wish to understand and implement AI within their applications. Special focus is given to Data Analysis, Distributed AI and NLP.

Course Outline

  1. Distributed big data
    1. Data mining methods (training single systems + distributed prediction: traditional machine learning algorithms + MapReduce distributed prediction)
    2. Apache Spark MLlib
  2. Recommendations and Advertising:
    1. Natural language
    2. Text clustering, text categorization (labeling), synonyms
    3. User profile restoration, labeling systems
    4. Recommendation algorithms
    5. Ensuring the accuracy of “lift” between and within categories
    6. How to create closed loops for recommendation algorithms
  3. Logistic regression, RankingSVM
  4. Feature recognition (deep learning and automatic feature recognition for graphics)
  5. Natural language
    1. Chinese word segmentation
    2. Topic models (text clustering)
    3. Text classification
    4. Keyword extraction
    5. Semantic analysis, semantic parsers, word2vec (word to vector)
    6. RNN long short-term memory (LSTM) architecture

Natural Language Processing (NLP) Training Course

Duration

21 hours (usually 3 days including breaks)

Requirements

No background in NLP is required.

Required: Familiarity with any programming language (Java, Python, PHP, VBA, etc.).

Expected: Reasonable maths skills (A-level standard), especially in probability, statistics and calculus.

Beneficial: Familiarity with regular expressions.

Overview

This course has been designed for people interested in extracting meaning from written English text, though the knowledge can be applied to other human languages as well.

The course will cover how to make use of text written by humans, such as blog posts, tweets, etc.

For example, an analyst can set up an algorithm which will reach a conclusion automatically based on an extensive data source.

Course Outline

Short Introduction to NLP methods

  • word and sentence tokenization
  • text classification
  • sentiment analysis
  • spelling correction
  • information extraction
  • parsing
  • meaning extraction
  • question answering

Overview of NLP theory

  • probability
  • statistics
  • machine learning
  • n-gram language modeling (see the sketch after this list)
  • naive Bayes
  • maxent classifiers
  • sequence models (Hidden Markov Models)
  • probabilistic dependency parsing
  • constituent parsing
  • vector-space models of meaning
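
A minimal illustration of n-gram language modeling, estimating a bigram probability by counting (a toy corpus and a maximum likelihood estimate only):

    from collections import Counter

    tokens = "the cat sat on the mat the cat slept".split()

    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))

    # P(cat | the) = count(the, cat) / count(the)
    p = bigram_counts[("the", "cat")] / unigram_counts["the"]
    print(p)   # 2/3: "the" occurs 3 times, followed by "cat" twice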

NLP with Deeplearning4j Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

Knowledge of Deep Learning, and one of the following languages:

  • Java
  • Scala

and the following software:

  • Java (developer version) 1.7 or later (only 64-bit versions supported)
  • Apache Maven
  • IntelliJ IDEA or Eclipse
  • Git

Overview

Deeplearning4j is an open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments on distributed GPUs and CPUs.

Word2Vec is a method of computing vector representations of words introduced by a team of researchers at Google led by Tomas Mikolov.

Audience

This course is directed at researchers, engineers, and developers seeking to utilize Deeplearning4j to construct Word2Vec models.

Course Outline

Getting Started

  • DL4J Examples in a Few Easy Steps
  • Using DL4J in Your Own Projects: Configuring the pom.xml File

Word2Vec

  • Introduction
  • Neural Word Embeddings
  • Amusing Word2vec Results
  • The Code
  • Anatomy of Word2Vec
  • Setup, Load and Train
  • A Code Example
  • Troubleshooting & Tuning Word2Vec
  • Word2vec Use Cases
  • Foreign Languages
  • GloVe (Global Vectors) & Doc2Vec

Deep Learning for NLP (Natural Language Processing) Training Course

Duration

28 hours (usually 4 days including breaks)

Requirements

  • An understanding of Python programming
  • An understanding of Python libraries in general

Audience

  • Programmers with an interest in linguistics
  • Programmers who seek an understanding of NLP (Natural Language Processing) 

Overview

DL (Deep Learning) is a subset of ML (Machine Learning).

Python is a popular programming language that contains libraries for Deep Learning for NLP.

Deep Learning for NLP (Natural Language Processing) allows a machine to learn simple to complex language processing. Among the tasks currently possible are language translation and caption generation for photos.

In this instructor-led, live training, participants will learn to use Python libraries for NLP as they create an application that processes a set of pictures and generates captions. 

By the end of this training, participants will be able to:

  • Design and code DL for NLP using Python libraries.
  • Create Python code that reads a substantially large collection of pictures and generates keywords.
  • Create Python code that generates captions from the detected keywords.

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

Introduction to Deep Learning for NLP

Differentiating between the various types of DL models

Using pre-trained vs trained models

Using word embeddings and sentiment analysis to extract meaning from text 

How Unsupervised Deep Learning works

Installing and Setting Up Python Deep Learning libraries

Using the Keras DL library on top of TensorFlow to create captions in Python

Working with Theano (a numerical computation library) and TensorFlow (a general-purpose machine learning library) as extended DL libraries for creating captions

Using Keras on top of TensorFlow or Theano to experiment quickly with Deep Learning

Creating a simple Deep Learning application in TensorFlow to add captions to a collection of pictures
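
A minimal sketch of the language-model half of such an application, built with Keras on top of TensorFlow (the vocabulary and layer sizes are illustrative; a real caption generator also conditions on image features from a CNN):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    vocab_size = 5000   # hypothetical caption vocabulary

    model = Sequential([
        Embedding(vocab_size, 128),                 # word embeddings
        LSTM(256),                                  # sequence encoder
        Dense(vocab_size, activation="softmax"),    # next-word distribution
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")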

Troubleshooting

A word on other (specialized) DL frameworks

Deploying your DL application

Using GPUs to accelerate DL

Closing remarks

Natural Language Processing (NLP) with TensorFlow Training Course

Duration

35 hours (usually 5 days including breaks)

Requirements

Working knowledge of Python

Overview

TensorFlow™ is an open source software library for numerical computation using data flow graphs.

SyntaxNet is a neural-network Natural Language Processing framework for TensorFlow.

Word2Vec is used for learning vector representations of words, called “word embeddings”. Word2vec is a particularly computationally-efficient predictive model for learning word embeddings from raw text. It comes in two flavors, the Continuous Bag-of-Words model (CBOW) and the Skip-Gram model (Sections 3.1 and 3.2 in Mikolov et al.).

Used in tandem, SyntaxNet and Word2Vec allow users to generate learned embedding models from natural language input.

Audience

This course is targeted at developers and engineers who intend to work with SyntaxNet and Word2Vec models in their TensorFlow graphs.

After completing this course, delegates will:

  • understand TensorFlow’s structure and deployment mechanisms
  • be able to carry out installation, production-environment, and architecture tasks and configuration
  • be able to assess code quality, and perform debugging and monitoring
  • be able to implement advanced production techniques such as training models, embedding terms, building graphs, and logging

Course Outline

Getting Started

  • Setup and Installation

TensorFlow Basics

  • Creating, Initializing, Saving, and Restoring TensorFlow Variables (see the sketch after this list)
  • Feeding, Reading and Preloading TensorFlow Data
  • How to use TensorFlow infrastructure to train models at scale
  • Visualizing and Evaluating models with TensorBoard
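
A minimal sketch of creating, saving, and restoring a variable in the TF1-style graph mode this outline assumes (runs under tensorflow.compat.v1 in modern releases):

    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()

    weights = tf.get_variable("weights", shape=[3],
                              initializer=tf.zeros_initializer())
    saver = tf.train.Saver()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())       # initializing
        save_path = saver.save(sess, "/tmp/model.ckpt")   # saving

    with tf.Session() as sess:
        saver.restore(sess, save_path)                    # restoring
        print(sess.run(weights))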

TensorFlow Mechanics 101

  • Prepare the Data
    • Download
    • Inputs and Placeholders
  • Build the Graph
    • Inference
    • Loss
    • Training
  • Train the Model
    • The Graph
    • The Session
    • Train Loop
  • Evaluate the Model
    • Build the Eval Graph
    • Eval Output

Advanced Usage

  • Threading and Queues
  • Distributed TensorFlow
  • Writing Documentation and Sharing your Model
  • Customizing Data Readers
  • Using GPUs
  • Manipulating TensorFlow Model Files

TensorFlow Serving

  • Introduction
  • Basic Serving Tutorial
  • Advanced Serving Tutorial
  • Serving Inception Model Tutorial

Getting Started with SyntaxNet

  • Parsing from Standard Input
  • Annotating a Corpus
  • Configuring the Python Scripts

Building an NLP Pipeline with SyntaxNet

  • Obtaining Data
  • Part-of-Speech Tagging
  • Training the SyntaxNet POS Tagger
  • Preprocessing with the Tagger
  • Dependency Parsing: Transition-Based Parsing
  • Training a Parser Step 1: Local Pretraining
  • Training a Parser Step 2: Global Training

Vector Representations of Words

  • Motivation: Why Learn Word Embeddings?
  • Scaling up with Noise-Contrastive Training
  • The Skip-gram Model (see the sketch at the end of this outline)
  • Building the Graph
  • Training the Model
  • Visualizing the Learned Embeddings
  • Evaluating Embeddings: Analogical Reasoning
  • Optimizing the Implementation
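
As a closing illustration of the skip-gram model, a minimal sketch that generates the (center, context) training pairs the model learns from (pure Python, toy sentence):

    def skipgram_pairs(tokens, window=2):
        """Generate (center, context) pairs for skip-gram training."""
        pairs = []
        for i, center in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pairs.append((center, tokens[j]))
        return pairs

    print(skipgram_pairs("the quick brown fox".split(), window=1))
    # [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ...]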