Duration
21 hours (usually 3 days including breaks)
Requirements
- Experience with Python
- An understanding of machine learning
- Experience with scikit-learn and pandas
Overview
In this instructor-led, live training, participants will learn how to choose and apply machine learning and NLP (Natural Language Processing) techniques to extract value from text-based data.
By the end of this training, participants will be able to:
- Solve text-based data science problems with high-quality, reusable code
- Apply scikit-learn's main capabilities (classification, clustering, regression, dimensionality reduction) to solve problems
- Build effective machine learning models using text-based data
- Create a dataset and extract features from unstructured text
- Visualize data with Matplotlib
- Build and evaluate models to gain insight
- Troubleshoot text encoding errors
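As an illustration of the feature-extraction and model-building objectives above, the following is a minimal sketch using scikit-learn. It is not taken from the course materials: the sample texts, the labels, and the choice of TF-IDF features with logistic regression are assumptions made purely for demonstration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy dataset: short reviews labeled 1 (positive) or 0 (negative).
texts = [
    "great product, works well",
    "excellent quality and fast delivery",
    "really happy with this purchase",
    "works great, highly recommended",
    "terrible quality, broke quickly",
    "very disappointed, do not buy",
    "awful experience and slow delivery",
    "broke after one day, waste of money",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hold out a quarter of the data for evaluation, keeping classes balanced.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

# Turn raw text into TF-IDF features, then fit a linear classifier on them.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)

# Mean accuracy on the held-out texts (not meaningful on a dataset this small).
accuracy = model.score(X_test, y_test)
predictions = model.predict(X_test)
```

Wrapping the vectorizer and the classifier in one pipeline keeps the vocabulary fitted on the training split only, which avoids leaking information from the test set into the features.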
Audience
- Developers
- Data Scientists
Format of the Course
- Part lecture, part discussion, exercises and heavy hands-on practice
Course Outline
Introduction
- The value of text-based data
Workflow for a Text-Based Data Science Problem
Choosing the Right Machine Learning Libraries
Overview of NLP Techniques
Preparing a Dataset
Visualizing the Data
Working with Text Data with scikit-learn
Building a Machine Learning Model
Splitting into Train and Test Sets
Applying Linear and Non-Linear Regression
Applying NLP Techniques
Parsing Text Data Using Regular Expressions
Exploring Other Machine Learning Approaches
Troubleshooting Text Encoding Issues
Closing Remarks