Duration
21 hours (usually 3 days including breaks)
Overview
R is a free, open-source programming language for statistical computing, data analysis, and graphics. It is used by a growing number of managers and data analysts in corporations and academia, and it offers a wide variety of packages for data mining.
Course Outline
Introduction and preliminaries
- Making R more friendly, R and available GUIs
- RStudio
- Related software and documentation
- R and statistics
- Using R interactively
- An introductory session
- Getting help with functions and features
- R commands, case sensitivity, etc.
- Recall and correction of previous commands
- Executing commands from or diverting output to a file
- Data permanency and removing objects
Simple manipulations; numbers and vectors
- Vectors and assignment
- Vector arithmetic
- Generating regular sequences
- Logical vectors
- Missing values
- Character vectors
- Index vectors; selecting and modifying subsets of a data set
- Other types of objects
Objects, their modes and attributes
- Intrinsic attributes: mode and length
- Changing the length of an object
- Getting and setting attributes
- The class of an object
Arrays and matrices
- Arrays
- Array indexing. Subsections of an array
- Index matrices
- The array() function
- The outer product of two arrays
- Generalized transpose of an array
- Matrix facilities
- Matrix multiplication
- Linear equations and inversion
- Eigenvalues and eigenvectors
- Singular value decomposition and determinants
- Least squares fitting and the QR decomposition
- Forming partitioned matrices, cbind() and rbind()
- The concatenation function, c(), with arrays
- Frequency tables from factors
Lists and data frames
- Lists
- Constructing and modifying lists
- Concatenating lists
- Data frames
- Making data frames
- attach() and detach()
- Working with data frames
- Attaching arbitrary lists
- Managing the search path
Data manipulation
- Selecting and subsetting observations and variables
- Filtering, grouping
- Recoding, transformations
- Aggregation, combining data sets
- Character manipulation, stringr package
Reading data
- TXT files
- CSV files
- XLS, XLSX files
- Data in SPSS, SAS, Stata and other formats
- Exporting data to TXT, CSV and other formats
- Accessing data from databases using SQL
Probability distributions
- R as a set of statistical tables
- Examining the distribution of a set of data
- One- and two-sample tests
Grouping, loops and conditional execution
- Grouped expressions
- Control statements
- Conditional execution: if statements
- Repetitive execution: for loops, repeat and while
Writing your own functions
- Simple examples
- Defining new binary operators
- Named arguments and defaults
- The ‘…’ argument
- Assignments within functions
- More advanced examples
- Efficiency factors in block designs
- Dropping all names in a printed array
- Recursive numerical integration
- Scope
- Customizing the environment
- Classes, generic functions and object orientation
Graphical procedures
- High-level plotting commands
- The plot() function
- Displaying multivariate data
- Display graphics
- Arguments to high-level plotting functions
- Basic visualisation graphs
- Multivariate relations with the lattice and ggplot2 packages
- Using graphics parameters
- Graphics parameters list
Time series forecasting
- Seasonal adjustment
- Moving average
- Exponential smoothing
- Extrapolation
- Linear prediction
- Trend estimation
- Stationarity and ARIMA modelling
Econometric methods (causal methods)
- Regression analysis
- Multiple linear regression
- Multiple non-linear regression
- Regression validation
- Forecasting from regression