Duration
28 hours (usually 4 days including breaks)
Requirements
Basic Knowledge of Python
Overview
This course introduces linguists or programmers to natural language processing (NLP) in Python. We will mostly use NLTK (the Natural Language Toolkit, nltk.org), along with other libraries that are relevant and useful for NLP. The course can currently be delivered in Python 2.x or Python 3.x. Examples are in English or Mandarin (普通话). Other languages can be made available if agreed before booking.
Course Outline
Overview of Python packages related to NLP
Introduction to NLP (examples in Python of course)
- Simple Text Manipulation
- Searching Text
- Counting Words
- Splitting Texts into Words
- Lexical dispersion
- Processing complex structures
- Representing text in Lists
- Indexing Lists
- Collocations
- Bigrams
- Frequency Distributions
- Conditionals with Words
- Comparing Words (startswith, endswith, islower, isalpha, etc…)
- Natural Language Understanding
- Word Sense Disambiguation
- Pronoun Resolution
- Machine translation (statistical, rule based, literal, etc…)
- Exercises
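As a rough illustration of a few items in the list above (splitting text into words, counting words, bigrams, frequency distributions and string comparison methods), here is a minimal sketch that uses only the Python standard library. The course itself covers these topics mostly with NLTK; the sample sentence and all variable names below are invented for illustration.

```python
from collections import Counter

# Hypothetical sample text, not taken from any course corpus.
text = "the cat sat on the mat and the dog sat on the log"

# Splitting text into words (naive whitespace split).
words = text.split()

# Frequency distribution: how often each word occurs.
freq = Counter(words)

# Bigrams: pairs of adjacent words, and how often each pair occurs.
bigrams = list(zip(words, words[1:]))
bigram_freq = Counter(bigrams)

# Conditionals with words: keep alphabetic words that start with "s".
s_words = sorted({w for w in words if w.startswith("s") and w.isalpha()})

print(freq.most_common(1))
print(bigram_freq[("sat", "on")])
print(s_words)
```

A plain `Counter` is used here only as a stand-in for a frequency distribution; it keeps the sketch free of external dependencies.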
NLP in Python in examples
- Accessing Text Corpora and Lexical Resources
- Common sources for corpora
- Conditional Frequency Distributions
- Counting Words by Genre
- Creating your own corpus
- Pronouncing Dictionary
- Shoebox and Toolbox Lexicons
- Senses and Synonyms
- Hierarchies
- Lexical Relations: Meronyms, Holonyms
- Semantic Similarity
- Processing Raw Text
- Printing
- Truncating
- Extracting parts of string
- Accessing individual characters
- Searching, replacing, splitting, joining, indexing, etc…
- Using regular expressions
- Detecting word patterns
- Stemming
- Tokenization
- Normalization of text
- Word Segmentation (especially in Chinese)
- Categorizing and Tagging Words
- Tagged Corpora
- Tagged Tokens
- Part-of-Speech Tagset
- Python Dictionaries
- Words to Properties mapping
- Automatic Tagging
- Determining the Category of a Word (Morphological, Syntactic, Semantic)
- Text Classification (Machine Learning)
- Supervised Classification
- Sentence Segmentation
- Cross Validation
- Decision Trees
- Extracting Information from Text
- Chunking
- Chinking
- Tags vs Trees
- Analyzing Sentence Structure
- Context Free Grammar
- Parsers
- Building Feature Based Grammars
- Grammatical Features
- Processing Feature Structures
- Analyzing the Meaning of Sentences
- Semantics and Logic
- Propositional Logic
- First-Order Logic
- Discourse Semantics
- Managing Linguistic Data
- Data Formats (Lexicon vs Text)
- Metadata
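To give a flavour of the topics listed under "NLP in Python in examples" (regular expressions, tokenization, normalization, counting words by genre and conditional frequency distributions), the following is a minimal sketch using only the Python standard library. It is not the course material: the two-genre mini-corpus, the `tokenize` helper and the `cfd` name are assumptions made up for this example, and the ASCII-only pattern would not handle Mandarin text, where word segmentation needs a dedicated approach.

```python
import re
from collections import Counter, defaultdict

# Hypothetical mini-corpus: genre label -> raw text (invented for illustration).
corpus = {
    "news": "The market fell. Analysts said the market could recover.",
    "fiction": "She could not sleep. The night was long, and she was afraid.",
}


def tokenize(raw):
    """Normalize to lowercase and extract runs of ASCII letters (a crude tokenizer)."""
    return re.findall(r"[a-z]+", raw.lower())


# Conditional frequency distribution: one word counter per genre.
cfd = defaultdict(Counter)
for genre, raw in corpus.items():
    cfd[genre].update(tokenize(raw))

# Compare how often selected words occur in each genre.
for genre in corpus:
    print(genre, cfd[genre]["could"], cfd[genre]["the"])
```

Keeping one counter per condition (here, per genre) is what makes the distribution "conditional": the same word can be looked up under each genre and compared.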