Cluster Analysis with R and SAS Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • Experience with R programming
  • SAS experience

Audience

  • Data Analysts

Overview

R is a programming language and software environment for statistical computing. SAS is a statistical software platform for predictive analysis, data management, advanced analytics, and more. With R and SAS, users can perform cluster analysis to find natural groupings in data, a technique essential to data mining.

This instructor-led, live training (online or onsite) is aimed at data analysts who wish to use R and SAS for cluster analysis.

By the end of this training, participants will be able to:

  • Use cluster analysis for data mining.
  • Master R syntax for clustering solutions.
  • Implement hierarchical and non-hierarchical clustering.
  • Make data-driven decisions that help improve business operations.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange.

Course Outline

Introduction

Cluster Analysis

  • What is cluster analysis?
  • Types of clustering

Cluster Analysis Continued

  • Cluster analysis vs object segmentation
  • Hierarchical vs non-hierarchical clustering

Preparing the Development Environment

  • Installing and configuring SAS
  • Installing and configuring R

Cluster Analysis with SAS

  • Importing data
  • Standardizing data
  • Implementing hierarchical clustering
  • Interpreting output
  • Working with k-means clustering (non-hierarchical)
  • Interpreting output

Cluster Analysis with R

  • Using hierarchical clustering functions
  • Working with non-hierarchical clustering functions

Summary and Conclusion

Foundation R Training Course

Duration

7 hours (usually 1 day including breaks)

Requirements

There are no specific requirements needed to attend this course.

Overview

The objective of this course is to help participants master the fundamentals of R and learn how to work with data.

Course Outline

Basic overview of R and RStudio

  • R overview
  • RStudio Environment Windows
    • Script Editor Window
    • Data Environment
    • Console
    • Plots/Help/Packages

Working with Data

  • Introduction to vectors, matrices and data frames (data.frame)
  • Different types of variables
    • Numeric, integer, factor, etc.
    • Changing variable types
    • Importing data using RStudio menu functions
    • Listing and removing variables (ls() and rm() commands)
  • Creating variables at the console prompt – single, vector, data frame
  • Naming vectors and matrices
  • Head and tail commands
  • Introduction to dim, length and class
  • Command line import (reading .csv and tab delimited .txt files)
  • Attaching and detaching data (advantages vs data.frame$)
  • Combining data using cbind() and rbind()

Exploratory Data Analysis

  • Summarising data
  • Summary command on both vectors and data frames
  • Subsetting data using square brackets
    • Summarising and creating new variables
  • Table and summary commands
  • Summary statistic commands
    • Mean
    • Median
    • Standard Deviation
    • Variance
    • Count & frequencies
    • Min & Max
    • Quartiles
    • Percentiles
    • Correlation

Exporting data

  • Writing a table to a .txt file
  • Writing to a .csv file

R Workspace

  • Concept of working directories and projects (menu-driven and in code with setwd())

Introduction to R scripts

  • Creating R Scripts
  • Saving scripts
  • Workspace images

Concepts of packages

  • Installing packages
  • Loading packages into memory

Plotting data (using the standard R plot() command and the ggplot2 package)

  • Bar Charts and Histograms
  • Boxplots
  • Line charts / time series
  • Scatter plots
  • Stem and leaf
  • Mosaic
  • Modifying plots
    • Titles
    • Legends
    • Axes
    • Plot Area
  • Exporting a plot to a third party application

Introductory R for Biologists Training Course

Duration

28 hours (usually 4 days including breaks)

Overview

R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts in corporations and academia. R has also found followers among statisticians, engineers and scientists without computer programming skills who find it easy to use. Its popularity is due to the increasing use of data mining for goals such as setting ad prices, finding new drugs more quickly, or fine-tuning financial models. R has a wide variety of packages for data mining.

Course Outline

I. Introduction and preliminaries

1. Overview

  • Making R more friendly, R and available GUIs
  • RStudio
  • Related software and documentation
  • R and statistics
  • Using R interactively
  • An introductory session
  • Getting help with functions and features
  • R commands, case sensitivity, etc.
  • Recall and correction of previous commands
  • Executing commands from or diverting output to a file
  • Data permanency and removing objects
  • Good programming practice: self-contained scripts, good readability (e.g. structured scripts), documentation, Markdown
  • Installing packages: CRAN and Bioconductor

2. Reading data

  • Txt files (read.delim())
  • CSV files

3. Simple manipulations: numbers, vectors and arrays

  • Vectors and assignment
  • Vector arithmetic
  • Generating regular sequences
  • Logical vectors
  • Missing values
  • Character vectors
  • Index vectors; selecting and modifying subsets of a data set
  • Arrays
  • Array indexing. Subsections of an array
  • Index matrices
  • The array() function and simple operations on arrays (e.g. multiplication, transposition)
  • Other types of objects

4. Lists and data frames

  • Lists
  • Constructing and modifying lists
    • Concatenating lists
  • Data frames
    • Making data frames
    • Working with data frames
    • Attaching arbitrary lists
    • Managing the search path

5. Data manipulation

  • Selecting and subsetting observations and variables
  • Filtering, grouping
  • Recoding, transformations
  • Aggregation, combining data sets
  • Forming partitioned matrices, cbind() and rbind()
  • The concatenation function, c(), with arrays
  • Character manipulation, stringr package
  • Short introduction to grep() and regexpr()

6. More on reading data

  • XLS, XLSX files
  • readr and readxl packages
  • SPSS, SAS, Stata and other data formats
  • Exporting data to txt, csv and other formats

7. Grouping, loops and conditional execution

  • Grouped expressions
  • Control statements
  • Conditional execution: if statements
  • Repetitive execution: for loops, repeat and while
  • Introduction to apply(), lapply(), sapply() and tapply()

8. Functions

  • Creating functions
  • Optional arguments and default values
  • Variable number of arguments
  • Scope and its consequences

9. Simple graphics in R

  • Creating a Graph
  • Density Plots
  • Dot Plots
  • Bar Plots
  • Line Charts
  • Pie Charts
  • Boxplots
  • Scatter Plots
  • Combining Plots

II. Statistical analysis in R 

1. Probability distributions

  • R as a set of statistical tables
  • Examining the distribution of a set of data

2. Testing of Hypotheses

  • Tests about a Population Mean
  • Likelihood Ratio Test
  • One- and two-sample tests
  • Chi-Square Goodness-of-Fit Test
  • Kolmogorov-Smirnov One-Sample Statistic 
  • Wilcoxon Signed-Rank Test
  • Two-Sample Test
  • Wilcoxon Rank Sum Test
  • Mann-Whitney Test
  • Kolmogorov-Smirnov Test

3. Multiple Testing of Hypotheses

  • Type I Error and FDR
  • ROC curves and AUC
  • Multiple Testing Procedures (BH, Bonferroni, etc.)

4. Linear regression models

  • Generic functions for extracting model information
  • Updating fitted models
  • Generalized linear models
    • Families
    • The glm() function
  • Classification
    • Logistic Regression
    • Linear Discriminant Analysis
  • Unsupervised learning
    • Principal Components Analysis
    • Clustering Methods (k-means, hierarchical clustering, k-medoids)

5. Survival analysis (survival package)

  • Survival objects in R
  • Kaplan-Meier estimate, log-rank test, parametric regression
  • Confidence bands
  • Censored (interval-censored) data analysis
  • Cox PH models, constant covariates
  • Cox PH models, time-dependent covariates
  • Simulation: Model comparison (Comparing regression models)

6. Analysis of Variance

  • One-Way ANOVA
  • Two-Way ANOVA
  • MANOVA

III. Worked problems in bioinformatics

  • Short introduction to limma package
  • Microarray data analysis workflow
  • Data download from GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1397
  • Data processing (QC, normalisation, differential expression)
  • Volcano plot
  • Clustering examples and heatmaps

Data Mining with R Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

Good R knowledge.

Overview

R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts in corporations and academia. R has a wide variety of packages for data mining.

Course Outline

Sources of methods

  • Artificial intelligence
  • Machine learning
  • Statistics
  • Sources of data

Preprocessing of data

  • Data Import/Export
  • Data Exploration and Visualization
  • Dimensionality Reduction
  • Dealing with missing values
  • R Packages

Main data mining tasks

  • Automatic or semi-automatic analysis of large quantities of data
  • Extracting previously unknown interesting patterns
    • groups of data records (cluster analysis)
    • unusual records (anomaly detection)
    • dependencies (association rule mining)

Data mining

  • Anomaly detection (Outlier/change/deviation detection)
  • Association rule learning (Dependency modeling)
  • Clustering
  • Classification
  • Regression
  • Summarization
  • Frequent Pattern Mining
  • Text Mining
  • Decision Trees
  • Neural Networks
  • Sequence Mining

Data dredging, data fishing, data snooping

Programming with Big Data in R Training Course

Duration

21 hours (usually 3 days including breaks)

Overview

Big Data refers to solutions designed for storing and processing large data sets. Initially developed by Google, these Big Data solutions have evolved and inspired other similar projects, many of which are available as open source. R is a popular programming language in the financial industry.

Course Outline

Introduction to Programming Big Data with R (pbdR)

  • Setting up your environment to use pbdR
  • Scope and tools available in pbdR
  • Packages commonly used with Big Data alongside pbdR

Message Passing Interface (MPI)

  • Using pbdR MPI
  • Parallel processing
  • Point-to-point communication
  • Sending Matrices
  • Summing Matrices
  • Collective communication
  • Summing Matrices with Reduce
  • Scatter / Gather
  • Other MPI communications

Distributed Matrices

  • Creating a distributed diagonal matrix
  • SVD of a distributed matrix
  • Building a distributed matrix in parallel

Statistics Applications

  • Monte Carlo Integration
  • Reading Datasets
    • Reading on all processes
    • Broadcasting from one process
    • Reading partitioned data
  • Distributed Regression
  • Distributed Bootstrap

Introduction to R with Time Series Analysis Training Course

Duration

21 hours (usually 3 days including breaks)

Overview

R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts in corporations and academia. R has a wide variety of packages for data mining.

Course Outline

Introduction and preliminaries

  • Making R more friendly, R and available GUIs
  • RStudio
  • Related software and documentation
  • R and statistics
  • Using R interactively
  • An introductory session
  • Getting help with functions and features
  • R commands, case sensitivity, etc.
  • Recall and correction of previous commands
  • Executing commands from or diverting output to a file
  • Data permanency and removing objects

Simple manipulations: numbers and vectors

  • Vectors and assignment
  • Vector arithmetic
  • Generating regular sequences
  • Logical vectors
  • Missing values
  • Character vectors
  • Index vectors; selecting and modifying subsets of a data set
  • Other types of objects

Objects, their modes and attributes

  • Intrinsic attributes: mode and length
  • Changing the length of an object
  • Getting and setting attributes
  • The class of an object

Arrays and matrices

  • Arrays
  • Array indexing. Subsections of an array
  • Index matrices
  • The array() function
  • The outer product of two arrays
  • Generalized transpose of an array
  • Matrix facilities
    • Matrix multiplication
    • Linear equations and inversion
    • Eigenvalues and eigenvectors
    • Singular value decomposition and determinants
    • Least squares fitting and the QR decomposition
  • Forming partitioned matrices, cbind() and rbind()
  • The concatenation function, c(), with arrays
  • Frequency tables from factors

Lists and data frames

  • Lists
  • Constructing and modifying lists
    • Concatenating lists
  • Data frames
    • Making data frames
    • attach() and detach()
    • Working with data frames
    • Attaching arbitrary lists
    • Managing the search path

Data manipulation

  • Selecting and subsetting observations and variables
  • Filtering, grouping
  • Recoding, transformations
  • Aggregation, combining data sets
  • Character manipulation, stringr package

Reading data

  • Txt files
  • CSV files
  • XLS, XLSX files
  • SPSS, SAS, Stata and other data formats
  • Exporting data to txt, csv and other formats
  • Accessing data from databases using SQL

Probability distributions

  • R as a set of statistical tables
  • Examining the distribution of a set of data
  • One- and two-sample tests

Grouping, loops and conditional execution

  • Grouped expressions
  • Control statements
    • Conditional execution: if statements
    • Repetitive execution: for loops, repeat and while

Writing your own functions

  • Simple examples
  • Defining new binary operators
  • Named arguments and defaults
  • The ‘…’ argument
  • Assignments within functions
  • More advanced examples
    • Efficiency factors in block designs
    • Dropping all names in a printed array
    • Recursive numerical integration
  • Scope
  • Customizing the environment
  • Classes, generic functions and object orientation

Graphical procedures

  • High-level plotting commands
    • The plot() function
    • Displaying multivariate data
    • Display graphics
    • Arguments to high-level plotting functions
  • Basic visualisation graphs
  • Multivariate relations with the lattice and ggplot2 packages
  • Using graphics parameters
  • Graphics parameters list

Time series Forecasting

  • Seasonal adjustment
  • Moving average
  • Exponential smoothing
  • Extrapolation
  • Linear prediction
  • Trend estimation
  • Stationarity and ARIMA modelling

Econometric methods (causal methods)

  • Regression analysis
  • Multiple linear regression
  • Multiple non-linear regression
  • Regression validation
  • Forecasting from regression

Predictive Modelling with R Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

This course is part of the Data Scientist skill set (Domain: Analytical Techniques and Methods).

Overview

R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts in corporations and academia. R has a wide variety of packages for data mining.

Course Outline

Problems facing forecasters

  • Customer demand planning
  • Investor uncertainty
  • Economic planning
  • Seasonal changes in demand/utilization
  • Roles of risk and uncertainty

Time series Forecasting

  • Seasonal adjustment
  • Moving average
  • Exponential smoothing
  • Extrapolation
  • Linear prediction
  • Trend estimation
  • Stationarity and ARIMA modelling

Econometric methods (causal methods)

  • Regression analysis
  • Multiple linear regression
  • Multiple non-linear regression
  • Regression validation
  • Forecasting from regression

Judgemental methods

  • Surveys
  • Delphi method
  • Scenario building
  • Technology forecasting
  • Forecast by analogy

Simulation and other methods

  • Simulation
  • Prediction markets
  • Probabilistic forecasting and ensemble forecasting

Deep Learning for Banking (with R) Training Course

Duration

28 hours (usually 4 days including breaks)

Requirements

  • Basic experience with R programming
  • General familiarity with financial and banking concepts
  • Basic familiarity with statistics and mathematical concepts

Overview

Machine learning is a branch of artificial intelligence in which computers learn without being explicitly programmed. Deep learning is a subfield of machine learning that uses methods based on learning data representations and structures such as neural networks. R is a popular programming language in the financial industry. It is used in financial applications ranging from core trading programs to risk management systems.

In this instructor-led, live training, participants will learn how to implement deep learning models for banking using R as they step through the creation of a deep learning credit risk model.

By the end of this training, participants will be able to:

  • Understand the fundamental concepts of deep learning
  • Learn the applications and uses of deep learning in banking
  • Use R to create deep learning models for banking
  • Build their own deep learning credit risk model using R

Audience

  • Developers
  • Data scientists

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

Introduction

Understanding the Fundamentals of Artificial Intelligence and Machine Learning

Understanding Deep Learning

  • Overview of the Basic Concepts of Deep Learning
  • Differentiating Between Machine Learning and Deep Learning
  • Overview of Applications for Deep Learning

Overview of Neural Networks

  • What Are Neural Networks?
  • Neural Networks vs Regression Models
  • Understanding Mathematical Foundations and Learning Mechanisms
  • Constructing an Artificial Neural Network
  • Understanding Neural Nodes and Connections
  • Working with Neurons, Layers, and Input and Output Data
  • Understanding Single Layer Perceptrons
  • Differences Between Supervised and Unsupervised Learning
  • Learning Feedforward and Feedback Neural Networks
  • Understanding Forward Propagation and Back Propagation
  • Understanding Long Short-Term Memory (LSTM)
  • Exploring Recurrent Neural Networks in Practice
  • Exploring Convolutional Neural Networks in Practice
  • Improving the Way Neural Networks Learn

Overview of Deep Learning Techniques Used in Banking

  • Neural Networks
  • Natural Language Processing
  • Image Recognition
  • Speech Recognition
  • Sentiment Analysis

Exploring Deep Learning Case Studies for Banking

  • Anti-Money Laundering Programs
  • Know-Your-Customer (KYC) Checks
  • Sanctions List Monitoring
  • Billing Fraud Oversight
  • Risk Management
  • Fraud Detection
  • Product and Customer Segmentation
  • Performance Evaluation
  • General Compliance Functions

Understanding the Benefits of Deep Learning for Banking

Exploring the Different Deep Learning Packages for R
    
Deep Learning in R with Keras and RStudio

  • Overview of the Keras Package for R
  • Installing the Keras Package for R
  • Loading the Data
    • Using Built-in Datasets
    • Using Data from Files
    • Using Dummy Data
  • Exploring the Data
  • Preprocessing the Data
    • Cleaning the Data
    • Normalizing the Data
    • Splitting the Data into Training and Test Sets
  • Implementing One-Hot Encoding (OHE)
  • Defining the Architecture of Your Model
  • Compiling and Fitting Your Model to the Data
  • Training Your Model
  • Visualizing the Model Training History
  • Using Your Model to Predict Labels of New Data
  • Evaluating Your Model
  • Fine-Tuning Your Model
  • Saving and Exporting Your Model

Hands-on: Building a Deep Learning Credit Risk Model Using R

Extending your Company’s Capabilities

  • Developing Models in the Cloud
  • Using GPUs to Accelerate Deep Learning
  • Applying Deep Learning Neural Networks for Computer Vision, Voice Recognition, and Text Analysis

Summary and Conclusion

Deep Learning for Finance (with R) Training Course

Duration

28 hours (usually 4 days including breaks)

Requirements

  • Experience with R programming
  • General familiarity with finance concepts
  • Basic familiarity with statistics and mathematical concepts

Overview

Machine learning is a branch of artificial intelligence in which computers learn without being explicitly programmed. Deep learning is a subfield of machine learning that uses methods based on learning data representations and structures such as neural networks. R is a popular programming language in the financial industry. It is used in financial applications ranging from core trading programs to risk management systems.

In this instructor-led, live training, participants will learn how to implement deep learning models for finance using R as they step through the creation of a deep learning stock price prediction model.

By the end of this training, participants will be able to:

  • Understand the fundamental concepts of deep learning
  • Learn the applications and uses of deep learning in finance
  • Use R to create deep learning models for finance
  • Build their own deep learning stock price prediction model using R

Audience

  • Developers
  • Data scientists

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

Introduction

Understanding the Fundamentals of Artificial Intelligence and Machine Learning

Understanding Deep Learning

  • Overview of the Basic Concepts of Deep Learning
  • Differentiating Between Machine Learning and Deep Learning
  • Overview of Applications for Deep Learning

Overview of Neural Networks

  • What Are Neural Networks?
  • Neural Networks vs Regression Models
  • Understanding Mathematical Foundations and Learning Mechanisms
  • Constructing an Artificial Neural Network
  • Understanding Neural Nodes and Connections
  • Working with Neurons, Layers, and Input and Output Data
  • Understanding Single Layer Perceptrons
  • Differences Between Supervised and Unsupervised Learning
  • Learning Feedforward and Feedback Neural Networks
  • Understanding Forward Propagation and Back Propagation
  • Understanding Long Short-Term Memory (LSTM)
  • Exploring Recurrent Neural Networks in Practice
  • Exploring Convolutional Neural Networks in Practice
  • Improving the Way Neural Networks Learn

Overview of Deep Learning Techniques Used in Finance

  • Neural Networks
  • Natural Language Processing
  • Image Recognition
  • Speech Recognition
  • Sentiment Analysis

Exploring Deep Learning Case Studies for Finance

  • Pricing
  • Portfolio Construction
  • Risk Management
  • High Frequency Trading
  • Return Prediction

Understanding the Benefits of Deep Learning for Finance

Exploring the Different Deep Learning Packages for R

Deep Learning in R with Keras and RStudio

  • Overview of the Keras Package for R
  • Installing the Keras Package for R
  • Loading the Data
    • Using Built-in Datasets
    • Using Data from Files
    • Using Dummy Data
  • Exploring the Data
  • Preprocessing the Data
    • Cleaning the Data
    • Normalizing the Data
    • Splitting the Data into Training and Test Sets
  • Implementing One-Hot Encoding (OHE)
  • Defining the Architecture of Your Model
  • Compiling and Fitting Your Model to the Data
  • Training Your Model
  • Visualizing the Model Training History
  • Using Your Model to Predict Labels of New Data
  • Evaluating Your Model
  • Fine-Tuning Your Model
  • Saving and Exporting Your Model

Hands-on: Building a Deep Learning Model for Stock Price Prediction Using R

Extending your Company’s Capabilities

  • Developing Models in the Cloud
  • Using GPUs to Accelerate Deep Learning
  • Applying Deep Learning Neural Networks for Computer Vision, Voice Recognition, and Text Analysis

Summary and Conclusion

Advanced Machine Learning with R Training Course

Duration

21 hours (usually 3 days including breaks)

Requirements

  • R programming experience
  • An understanding of machine learning concepts

Overview

In this instructor-led, live training, participants will learn advanced techniques for Machine Learning with R as they step through the creation of a real-world application.

By the end of this training, participants will be able to:

  • Understand and implement unsupervised learning techniques
  • Apply clustering and classification to make predictions based on real-world data.
  • Visualize data to quickly gain insights, make decisions and further refine analysis.
  • Improve the performance of a machine learning model using hyper-parameter tuning.
  • Put a model into production for use in a larger application.
  • Apply advanced machine learning techniques to answer questions involving social network data, big data, and more.

Audience

  • Developers
  • Analysts
  • Data scientists

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

Introduction

Setting up the R Development Environment

Deep Learning vs Neural Networks vs Machine Learning

Building an Unsupervised Learning Model

Case Study: Predicting an Outcome Using Existing Data

Preparing Test and Training Data Sets for Analysis

Clustering Data

Classifying Data

Visualizing Data

Evaluating the Performance of a Model

Iterating Through Model Parameters

Hyper-parameter Tuning 

Integrating a Model with a Real-World Application

Deploying a Machine Learning Application

Troubleshooting

Summary and Conclusion