Duration
28 hours (usually 4 days including breaks)
Overview
Objective:
Delegates will be able to analyse big data sets, extract patterns, and choose the variables that most influence the results, so that they can build a new model with predictive power.
Course Outline
- Data preprocessing
- Data Cleaning
- Data integration and transformation
- Data reduction
- Discretization and concept hierarchy generation
- Statistical inference
- Probability distributions, Random variables, Central limit theorem
- Sampling
- Confidence intervals
- Statistical Inference
- Hypothesis testing
- Multivariate linear regression
- Specification
- Subset selection
- Estimation
- Validation
- Prediction
- Classification methods
- Logistic regression
- Linear discriminant analysis
- K-nearest neighbours
- Naive Bayes
- Comparison of Classification methods
- Neural Networks
- Fitting neural networks
- Issues in training neural networks
- Decision trees
- Regression trees
- Classification trees
- Trees Versus Linear Models
- Bagging, Random Forests, Boosting
- Bagging
- Random Forests
- Boosting
- Support Vector Machines and Flexible Discriminants
- Maximal Margin classifier
- Support vector classifiers
- Support vector machines
- SVMs for two or more classes
- Relationship to logistic regression
- Principal Components Analysis
- Clustering
- K-means clustering
- K-medoids clustering
- Hierarchical clustering
- Density-based clustering
- Model Assessment and Selection
- Bias, Variance and Model complexity
- In-sample prediction error
- The Bayesian approach
- Cross-validation
- Bootstrap methods