Duration
14 hours (usually 2 days including breaks)
Requirements
Good working knowledge of R.
Overview
R is a free, open-source programming language for statistical computing, data analysis, and graphics. It is used by a growing number of managers and data analysts in industry and academia, and it offers a wide variety of packages for data mining.
Course Outline
Sources of methods
- Artificial intelligence
- Machine learning
- Statistics
Sources of data
Preprocessing of data
- Data Import/Export
- Data Exploration and Visualization
- Dimensionality Reduction
- Dealing with missing values
- R Packages
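As a rough illustration of the preprocessing topics above, the following minimal sketch uses only base R and the built-in `airquality` data set. The choice of data set, the median imputation strategy, and the variable names (`aq`, `na_counts`, `pca`, `aq2`) are assumptions made for this example, not part of the course material.

```r
# Illustrative only: the built-in airquality data set contains missing values.
data(airquality)
aq <- airquality

# Data exploration: count missing values per column.
na_counts <- colSums(is.na(aq))
print(na_counts)

# Dealing with missing values: replace each NA in Ozone and Solar.R
# with that column's median (one simple strategy among many).
for (col in c("Ozone", "Solar.R")) {
  aq[[col]][is.na(aq[[col]])] <- median(aq[[col]], na.rm = TRUE)
}

# Dimensionality reduction: PCA on the scaled measurement columns.
pca <- prcomp(aq[, c("Ozone", "Solar.R", "Wind", "Temp")], scale. = TRUE)
print(summary(pca))

# Data export and import: round-trip through a temporary CSV file.
path <- tempfile(fileext = ".csv")
write.csv(aq, path, row.names = FALSE)
aq2 <- read.csv(path)
```

Median imputation is used here only because it is short; dropping incomplete rows with `na.omit()` would be an equally simple alternative.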
Data mining main tasks
- Automatic or semi-automatic analysis of large quantities of data
- Extracting previously unknown interesting patterns
  - Groups of data records (cluster analysis)
  - Unusual records (anomaly detection)
  - Dependencies (association rule mining)
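Two of the tasks listed above, cluster analysis and anomaly detection, can be sketched in base R as follows. This is a minimal example under stated assumptions: the built-in `iris` data set, k-means with `k = 3`, and a cut-off of three standard deviations are all illustrative choices rather than course content. Association rule mining is left out because it usually relies on an add-on package such as `arules`.

```r
# Illustrative only: the four numeric columns of the built-in iris data set.
x <- scale(iris[, 1:4])

# Cluster analysis: k-means with k = 3 (k is an assumption for this example).
set.seed(42)
km <- kmeans(x, centers = 3, nstart = 20)
print(table(km$cluster))

# Anomaly detection: flag records unusually far from their own cluster centre.
dists <- sqrt(rowSums((x - km$centers[km$cluster, ])^2))
threshold <- mean(dists) + 3 * sd(dists)  # arbitrary cut-off for illustration
unusual <- which(dists > threshold)
print(unusual)
```

The distance-to-centre rule is only one simple way to surface unusual records; it inherits every weakness of the clustering it is built on.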
Data mining
- Anomaly detection (Outlier/change/deviation detection)
- Association rule learning (Dependency modeling)
- Clustering
- Classification
- Regression
- Summarization
- Frequent Pattern Mining
- Text Mining
- Decision Trees
- Neural Networks
- Sequence Mining
Data dredging, data fishing, data snooping