
Duration
14 hours (usually 2 days including breaks)
Requirements
- Python programming experience.
- A basic understanding of statistics.
- Experience with the command line.
Audience
- Developers
- Data scientists
Overview
This instructor-led, live training (online or onsite) is aimed at developers and data scientists who wish to use spaCy to process very large volumes of text to find patterns and gain insights.
By the end of this training, participants will be able to:
- Install and configure spaCy.
- Understand spaCy’s approach to Natural Language Processing (NLP).
- Extract patterns and obtain business insights from large-scale data sources.
- Integrate the spaCy library with existing web and legacy applications.
- Deploy spaCy to live production environments to predict human behavior.
- Use spaCy to pre-process text for deep learning.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us.
- To learn more about spaCy, please visit: https://spacy.io/
Course Outline
Introduction
- Defining “Industrial-Strength Natural Language Processing”
Installing spaCy
spaCy Components
- Part-of-speech tagger
- Named entity recognizer
- Dependency parser
Overview of spaCy Features and Syntax
Understanding spaCy Modeling
- Statistical modeling and prediction
Using the spaCy Command Line Interface (CLI)
- Basic commands
Creating a Simple Application to Predict Behavior
Training a New Statistical Model
- Data (for training)
- Labels (tags, named entities, etc.)
Loading the Model
- Shuffling and looping
Saving the Model
Providing Feedback to the Model
- Error gradient
Updating the Model
- Updating the entity recognizer
- Extracting tokens with the rule-based matcher
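As a rough illustration of the rule-based matcher topic, the sketch below assumes spaCy v3 (where `Matcher.add` takes a list of patterns) and uses a blank English pipeline so that no trained model is needed. The pattern name `HELLO_WORLD` and the sample sentence are made up for the example.

```python
import spacy
from spacy.matcher import Matcher

# Blank English pipeline: tokenizer only, no trained model download required.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical pattern: "hello", then any punctuation token, then "world".
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
matcher.add("HELLO_WORLD", [pattern])  # spaCy v3 signature (assumption)

doc = nlp("Hello, world! Hello world!")

# Each match is (match_id, start_token, end_token); resolve the id to its name.
matches = [
    (nlp.vocab.strings[match_id], doc[start:end].text)
    for match_id, start, end in matcher(doc)
]
print(matches)
```

Only the first phrase matches, because the second has no punctuation token between the two words.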
Developing a Generalized Theory for Expected Outcomes
Case Study
- Distinguishing Product Names from Company Names
Refining the Training Data
- Selecting representative data
- Setting the dropout rate
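The shuffling, looping, and dropout topics in this outline could look roughly like the following. This is a minimal sketch assuming spaCy v3; the two training sentences, the labels, the epoch count, and the dropout value of 0.2 are illustrative assumptions, not course material.

```python
import random

import spacy
from spacy.training import Example

random.seed(0)

# Hypothetical toy data: (text, annotations) with character-offset entities.
TRAIN_DATA = [
    ("Apple released the iPhone", {"entities": [(0, 5, "ORG"), (19, 25, "PRODUCT")]}),
    ("Samsung announced the Galaxy", {"entities": [(0, 7, "ORG"), (22, 28, "PRODUCT")]}),
]

nlp = spacy.blank("en")
nlp.add_pipe("ner")  # untrained entity recognizer

examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]

# initialize() infers the labels from the examples and returns an optimizer.
optimizer = nlp.initialize(lambda: examples)

for epoch in range(5):
    random.shuffle(examples)  # new order on every pass
    losses = {}
    # drop is the dropout rate applied during this update.
    nlp.update(examples, drop=0.2, sgd=optimizer, losses=losses)
    print(epoch, losses)
```

Two sentences are far too few to train a useful model; the point is only to show where shuffling and the dropout rate enter the loop.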
Other Training Styles
- Passing raw texts
- Passing dictionaries of annotations
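For reference, one way a raw text and a dictionary of annotations fit together is sketched below. It assumes spaCy v3, where both are wrapped in an `Example` object before training; older versions accepted them differently. The sentence and entity offsets are invented for illustration.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")

# Raw text plus a dictionary of annotations (character offsets and labels).
text = "Apple released the iPhone"
annotations = {"entities": [(0, 5, "ORG"), (19, 25, "PRODUCT")]}

# make_doc() only tokenizes; from_dict() attaches the gold annotations.
example = Example.from_dict(nlp.make_doc(text), annotations)

# The reference Doc holds the annotated entities.
entities = [(ent.text, ent.label_) for ent in example.reference.ents]
print(entities)
```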
Using spaCy to Pre-process Text for Deep Learning
Integrating spaCy with Legacy Applications
Testing and Debugging the spaCy Model
- The importance of iteration
Deploying the Model to Production
Monitoring and Adjusting the Model
Troubleshooting
Summary and Conclusion