Duration
14 hours (usually 2 days including breaks)
Requirements
- Experience with R programming
- SAS experience
Audience
Overview
R is a programming language and software environment for statistical computing. SAS is a statistical software platform for predictive analysis, data management, advanced analytics, and more. Using R with SAS, users can apply cluster analysis to find natural groupings in data, a core task in data mining.
This instructor-led, live training (online or onsite) is aimed at data analysts who wish to program with R in SAS for cluster analysis.
By the end of this training, participants will be able to:
- Use cluster analysis for data mining
- Master R syntax for clustering solutions.
- Implement hierarchical and non-hierarchical clustering.
- Make data-driven decisions that help improve business operations.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Course Outline
Introduction
Cluster Analysis
- What is cluster analysis?
- Types of clusters
Cluster Analysis Continued
- Cluster analysis vs object segmentation
- Hierarchical vs non-hierarchical clustering
Preparing the Development Environment
- Installing and configuring SAS
- Installing and configuring R
Cluster Analysis with SAS
- Importing data
- Standardizing data
- Implementing hierarchical clustering
- Interpreting output
- Working with k-means clustering (non-hierarchical)
- Interpreting output
Cluster Analysis with R
- Using hierarchical clustering functions
- Working with non-hierarchical clustering functions
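As a rough illustration of the two families of functions named above, the following sketch uses only base R. The built-in iris dataset and the choice of three clusters are assumptions made for this example and are not part of the course material.

```r
# Illustrative sketch: hierarchical and k-means clustering on a built-in dataset.
# The iris data and k = 3 are assumptions chosen for this example.
features <- scale(iris[, 1:4])          # standardize the four numeric columns

# Hierarchical clustering: distance matrix, then agglomerative linkage
distances <- dist(features, method = "euclidean")
tree <- hclust(distances, method = "ward.D2")
hier_groups <- cutree(tree, k = 3)      # cut the dendrogram into 3 clusters

# Non-hierarchical clustering: k-means with several random starts
set.seed(42)
km <- kmeans(features, centers = 3, nstart = 25)

# Cross-tabulate the two solutions to see how far they agree
print(table(hierarchical = hier_groups, kmeans = km$cluster))
```

Calling plot(tree) would draw the dendrogram, which can help when deciding where to cut it.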
Summary and Conclusion
Duration
7 hours (usually 1 day including breaks)
Requirements
There are no specific requirements needed to attend this course.
Overview
The objective of the course is to enable participants to gain a mastery of the fundamentals of R and how to work with data.
Course Outline
Basic overview of R and RStudio
- R overview
- RStudio Environment Windows
- Script Editor Window
- Data Environment
- Console
- Plots/Help/Packages
Working with Data
- Introduction to vectors, matrices and data frames (data.frame)
- Different types of variables
- Numeric, integer, factor, etc.
- Changing variable types
- Importing data using RStudio menu functions
- Listing and removing variables (ls() and rm() commands)
- Creating variables at the console prompt – single, vector, data frame
- Naming vectors and matrices
- Head and tail commands
- Introduction to dim, length and class
- Command line import (reading .csv and tab delimited .txt files)
- Attaching and detaching data (advantages vs data.frame$)
- Merging data using cbind and rbind
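Several of the items above can be tried in a few lines at the console. The sketch below is only an illustration; the variable names and values are invented for the example.

```r
# Illustrative sketch: the names and values below are made up for this example.
ids <- c(1, 2, 3)                        # numeric vector
scores <- c(90.5, 78, 85)
df <- data.frame(id = ids, score = scores)

dim(df)                                  # rows and columns
class(df)                                # "data.frame"
length(ids)                              # number of elements in the vector

df$id <- as.integer(df$id)               # changing a variable's type
df$grade <- factor(c("A", "C", "B"))     # adding a factor variable

# rbind adds rows, cbind adds columns
more <- data.frame(id = 4L, score = 60, grade = factor("D"))
all_rows <- rbind(df, more)
with_flag <- cbind(all_rows, passed = all_rows$score >= 70)

head(with_flag, 2)                       # first two rows
tail(with_flag, 1)                       # last row
```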
Exploratory Data Analysis
- Summarising data
- Summary command on both vectors and data frames
- Sub-setting data using square brackets
- Summarising and creating new variables
- Table and summary commands
- Summary statistic commands
- Mean
- Median
- Standard Deviation
- Variance
- Count & frequencies
- Min & Max
- Quartiles
- Percentiles
- Correlation
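The summary commands listed above map onto base R functions roughly as follows. This is a minimal sketch; the built-in mtcars data frame is an assumed example dataset, not one prescribed by the course.

```r
# Illustrative sketch using the built-in mtcars data frame (an assumed example).
mpg <- mtcars$mpg

mean(mpg)
median(mpg)
sd(mpg)                                   # standard deviation
var(mpg)                                  # variance
range(mpg)                                # minimum and maximum
quantile(mpg)                             # quartiles
quantile(mpg, probs = c(0.10, 0.90))      # 10th and 90th percentiles
table(mtcars$cyl)                         # counts per number of cylinders
cor(mtcars$mpg, mtcars$wt)                # Pearson correlation

# Sub-setting with square brackets, then summarising the subset
four_cyl <- mtcars[mtcars$cyl == 4, c("mpg", "wt")]
summary(four_cyl)
```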
Exporting data
- Write table .txt
- Write to a .csv file
R Workspace
- Concept of Working Directories and Projects (menu driven and code – setwd())
Introduction to R scripts
- Creating R Scripts
- Saving scripts
- Workspace images
Concepts of packages
- Installing packages
- Loading packages into memory
Plotting data (using standard default R plot command and ggplot2 package)
- Bar Charts and Histograms
- Boxplots
- Line charts / time series
- Scatter plots
- Stem and leaf
- Mosaic
- Modifying plots
- Titles
- Legends
- Axes
- Plot Area
- Exporting a plot to a third party application
Duration
28 hours (usually 4 days including breaks)
Overview
R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has also found followers among statisticians, engineers and scientists without computer programming skills who find it easy to use. Its popularity is due to the increasing use of data mining for goals such as setting ad prices, finding new drugs more quickly or fine-tuning financial models. R has a wide variety of packages for data mining.
Course Outline
I. Introduction and preliminaries
1. Overview
- Making R more friendly, R and available GUIs
- RStudio
- Related software and documentation
- R and statistics
- Using R interactively
- An introductory session
- Getting help with functions and features
- R commands, case sensitivity, etc.
- Recall and correction of previous commands
- Executing commands from or diverting output to a file
- Data permanency and removing objects
- Good programming practice: Self-contained scripts, good readability e.g. structured scripts, documentation, markdown
- installing packages; CRAN and Bioconductor
2. Reading data
- Txt files (read.delim)
- CSV files
3. Simple manipulations; numbers and vectors + arrays
- Vectors and assignment
- Vector arithmetic
- Generating regular sequences
- Logical vectors
- Missing values
- Character vectors
- Index vectors; selecting and modifying subsets of a data set
- Array indexing. Subsections of an array
- Index matrices
- The array() function + simple operations on arrays e.g. multiplication, transposition
- Other types of objects
4. Lists and data frames
- Lists
- Constructing and modifying lists
- Data frames
- Making data frames
- Working with data frames
- Attaching arbitrary lists
- Managing the search path
5. Data manipulation
- Selecting, subsetting observations and variables
- Filtering, grouping
- Recoding, transformations
- Aggregation, combining data sets
- Forming partitioned matrices, cbind() and rbind()
- The concatenation function, c(), with arrays
- Character manipulation, stringr package
- Short introduction to grep and regexpr
6. More on Reading data
- XLS, XLSX files
- readr and readxl packages
- Data in SPSS, SAS, Stata and other formats
- Exporting data to txt, csv and other formats
6. Grouping, loops and conditional execution
- Grouped expressions
- Control statements
- Conditional execution: if statements
- Repetitive execution: for loops, repeat and while
- Introduction to apply, lapply, sapply, tapply
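To make the contrast between explicit loops and the apply family concrete, here is a small sketch in base R. The data and variable names are assumptions chosen for illustration.

```r
# Illustrative sketch: the same kind of work done with a loop and with apply-style functions.
values <- c(1, 2, 3)

# for loop with conditional execution
squares_loop <- numeric(length(values))
for (i in seq_along(values)) {
  if (values[i] > 0) {
    squares_loop[i] <- values[i]^2
  }
}

squares_list <- lapply(values, function(x) x^2)   # returns a list
squares_vec  <- sapply(values, function(x) x^2)   # simplifies to a vector

m <- matrix(1:6, nrow = 2)                        # 2 x 3 matrix
row_sums <- apply(m, 1, sum)                      # 1 = apply over rows

# tapply: a summary of one variable within groups of another
mean_mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(mean_mpg_by_cyl)
```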
7. Functions
- Creating functions
- Optional arguments and default values
- Variable number of arguments
- Scope and its consequences
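The function topics above can be sketched briefly as follows. The function names rescale and make_counter are hypothetical, invented for this example.

```r
# Illustrative sketch: rescale() and make_counter() are hypothetical example functions.

# Default values (lower, upper) and a variable number of arguments (...)
rescale <- function(x, lower = 0, upper = 1, ...) {
  rng <- range(x, ...)                  # extras such as na.rm = TRUE pass through
  lower + (x - rng[1]) / (rng[2] - rng[1]) * (upper - lower)
}

rescale(c(2, 4, 6))                     # uses the defaults
rescale(c(2, 4, 6), upper = 100)        # overrides one default by name
rescale(c(2, NA, 6), na.rm = TRUE)      # na.rm is forwarded to range()

# Scope: the inner function keeps access to 'count' in its enclosing environment
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1                 # modifies 'count' in the enclosing scope
    count
  }
}

counter <- make_counter()
counter()
counter()
```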
8. Simple graphics in R
- Creating a Graph
- Density Plots
- Dot Plots
- Bar Plots
- Line Charts
- Pie Charts
- Boxplots
- Scatter Plots
- Combining Plots
II. Statistical analysis in R
1. Probability distributions
- R as a set of statistical tables
- Examining the distribution of a set of data
2. Testing of Hypotheses
- Tests about a Population Mean
- Likelihood Ratio Test
- One- and two-sample tests
- Chi-Square Goodness-of-Fit Test
- Kolmogorov-Smirnov One-Sample Statistic
- Wilcoxon Signed-Rank Test
- Two-Sample Test
- Wilcoxon Rank Sum Test
- Mann-Whitney Test
- Kolmogorov-Smirnov Test
3. Multiple Testing of Hypotheses
- Type I Error and FDR
- ROC curves and AUC
- Multiple Testing Procedures (BH, Bonferroni etc.)
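As a small illustration of the multiple testing procedures named above, base R's p.adjust can apply both corrections. The p-values below are made up for the example.

```r
# Illustrative sketch: the p-values are invented for this example.
p_values <- c(0.001, 0.008, 0.039, 0.041, 0.20)

bonferroni <- p.adjust(p_values, method = "bonferroni")  # controls family-wise error
bh         <- p.adjust(p_values, method = "BH")          # controls false discovery rate

print(data.frame(raw = p_values, bonferroni = bonferroni, BH = bh))

# Number of hypotheses rejected at the 0.05 level under each approach
sum(bonferroni < 0.05)
sum(bh < 0.05)
```

Bonferroni is never less conservative than BH, so its adjusted values are at least as large.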
4. Linear regression models
- Generic functions for extracting model information
- Updating fitted models
- Generalized linear models
- Families
- The glm() function
- Classification
- Logistic Regression
- Linear Discriminant Analysis
- Unsupervised learning
- Principal Components Analysis
- Clustering Methods (k-means, hierarchical clustering, k-medoids)
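A brief sketch of the modelling functions mentioned above, using base R only. The mtcars dataset and the chosen predictors are assumptions for illustration.

```r
# Illustrative sketch: mtcars and the predictors used here are assumed examples.

# Linear model and generic functions for extracting information
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)                       # estimated coefficients
head(residuals(fit))            # first few residuals
summary(fit)$r.squared          # proportion of variance explained

# Updating a fitted model: add horsepower as a second predictor
fit2 <- update(fit, . ~ . + hp)
anova(fit, fit2)                # compare the nested models

# Generalized linear model: logistic regression via the binomial family
logit <- glm(am ~ wt, data = mtcars, family = binomial)
predict(logit, newdata = data.frame(wt = 3), type = "response")
```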
5. Survival analysis (survival package)
- Survival objects in R
- Kaplan-Meier estimate, log-rank test, parametric regression
- Confidence bands
- Censored (interval censored) data analysis
- Cox PH models, constant covariates
- Cox PH models, time-dependent covariates
- Simulation: Model comparison (Comparing regression models)
6. Analysis of Variance
- One-Way ANOVA
- Two-Way Classification of ANOVA
- MANOVA
III. Worked problems in bioinformatics
- Short introduction to limma package
- Microarray data analysis workflow
- Data download from GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1397
- Data processing (QC, normalisation, differential expression)
- Volcano plot
- Clustering examples + heatmaps
Duration
14 hours (usually 2 days including breaks)
Requirements
Good R knowledge.
Overview
R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has a wide variety of packages for data mining.
Course Outline
Sources of methods
- Artificial intelligence
- Machine learning
- Statistics
- Sources of data
Preprocessing of data
- Data Import/Export
- Data Exploration and Visualization
- Dimensionality Reduction
- Dealing with missing values
- R Packages
Data mining main tasks
- Automatic or semi-automatic analysis of large quantities of data
- Extracting previously unknown interesting patterns
- groups of data records (cluster analysis)
- unusual records (anomaly detection)
- dependencies (association rule mining)
Data mining
- Anomaly detection (Outlier/change/deviation detection)
- Association rule learning (Dependency modeling)
- Clustering
- Classification
- Regression
- Summarization
- Frequent Pattern Mining
- Text Mining
- Decision Trees
- Neural Networks
- Sequence Mining
Data dredging, data fishing, data snooping
Duration
21 hours (usually 3 days including breaks)
Overview
Big Data is a term that refers to solutions designed for storing and processing large data sets. Initially developed by Google, these Big Data solutions have evolved and inspired other similar projects, many of which are available as open source. R is a popular programming language in the financial industry.
Course Outline
Introduction to Programming Big Data with R (pbdR)
- Setting up your environment to use pbdR
- Scope and tools available in pbdR
- Packages commonly used with Big Data alongside pbdR
Message Passing Interface (MPI)
- Using pbdR MPI
- Parallel processing
- Point-to-point communication
- Send Matrices
- Summing Matrices
- Collective communication
- Summing Matrices with Reduce
- Scatter / Gather
- Other MPI communications
Distributed Matrices
- Creating a distributed diagonal matrix
- SVD of a distributed matrix
- Building a distributed matrix in parallel
Statistics Applications
- Monte Carlo Integration
- Reading Datasets
- Reading on all processes
- Broadcasting from one process
- Reading partitioned data
- Distributed Regression
- Distributed Bootstrap
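To give a feel for the Monte Carlo integration topic listed above, here is a serial sketch in base R. It deliberately does not use pbdR; a distributed version would split the random draws across processes and combine the counts, which is not shown here. The sample size and seed are arbitrary choices for the example.

```r
# Illustrative sketch: serial Monte Carlo estimate of pi (no pbdR or MPI involved).
# Sample size and seed are arbitrary assumptions for this example.
set.seed(1)
n <- 100000

# Draw points uniformly in the unit square
x <- runif(n)
y <- runif(n)

# The fraction falling inside the quarter circle approximates pi / 4
inside <- x^2 + y^2 <= 1
pi_hat <- 4 * mean(inside)

print(pi_hat)
print(abs(pi_hat - pi))       # absolute error of the estimate
```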
Duration
21 hours (usually 3 days including breaks)
Overview
R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has a wide variety of packages for data mining.
Course Outline
Introduction and preliminaries
- Making R more friendly, R and available GUIs
- RStudio
- Related software and documentation
- R and statistics
- Using R interactively
- An introductory session
- Getting help with functions and features
- R commands, case sensitivity, etc.
- Recall and correction of previous commands
- Executing commands from or diverting output to a file
- Data permanency and removing objects
Simple manipulations; numbers and vectors
- Vectors and assignment
- Vector arithmetic
- Generating regular sequences
- Logical vectors
- Missing values
- Character vectors
- Index vectors; selecting and modifying subsets of a data set
- Other types of objects
Objects, their modes and attributes
- Intrinsic attributes: mode and length
- Changing the length of an object
- Getting and setting attributes
- The class of an object
Arrays and matrices
- Arrays
- Array indexing. Subsections of an array
- Index matrices
- The array() function
- The outer product of two arrays
- Generalized transpose of an array
- Matrix facilities
- Matrix multiplication
- Linear equations and inversion
- Eigenvalues and eigenvectors
- Singular value decomposition and determinants
- Least squares fitting and the QR decomposition
- Forming partitioned matrices, cbind() and rbind()
- The concatenation function, c(), with arrays
- Frequency tables from factors
Lists and data frames
- Lists
- Constructing and modifying lists
- Data frames
- Making data frames
- attach() and detach()
- Working with data frames
- Attaching arbitrary lists
- Managing the search path
Data manipulation
- Selecting, subsetting observations and variables
- Filtering, grouping
- Recoding, transformations
- Aggregation, combining data sets
- Character manipulation, stringr package
Reading data
- Txt files
- CSV files
- XLS, XLSX files
- Data in SPSS, SAS, Stata and other formats
- Exporting data to txt, csv and other formats
- Accessing data from databases using SQL language
Probability distributions
- R as a set of statistical tables
- Examining the distribution of a set of data
- One- and two-sample tests
Grouping, loops and conditional execution
- Grouped expressions
- Control statements
- Conditional execution: if statements
- Repetitive execution: for loops, repeat and while
Writing your own functions
- Simple examples
- Defining new binary operators
- Named arguments and defaults
- The ‘…’ argument
- Assignments within functions
- More advanced examples
- Efficiency factors in block designs
- Dropping all names in a printed array
- Recursive numerical integration
- Scope
- Customizing the environment
- Classes, generic functions and object orientation
Graphical procedures
- High-level plotting commands
- The plot() function
- Displaying multivariate data
- Display graphics
- Arguments to high-level plotting functions
- Basic visualisation graphs
- Multivariate relations with the lattice and ggplot2 packages
- Using graphics parameters
- Graphics parameters list
Time series Forecasting
- Seasonal adjustment
- Moving average
- Exponential smoothing
- Extrapolation
- Linear prediction
- Trend estimation
- Stationarity and ARIMA modelling
Econometric methods (causal methods)
- Regression analysis
- Multiple linear regression
- Multiple non-linear regression
- Regression validation
- Forecasting from regression
Duration
14 hours (usually 2 days including breaks)
Requirements
This course is part of the Data Scientist skill set (Domain: Analytical Techniques and Methods).
Overview
R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has a wide variety of packages for data mining.
Course Outline
Problems facing forecasters
- Customer demand planning
- Investor uncertainty
- Economic planning
- Seasonal changes in demand/utilization
- Roles of risk and uncertainty
Time series Forecasting
- Seasonal adjustment
- Moving average
- Exponential smoothing
- Extrapolation
- Linear prediction
- Trend estimation
- Stationarity and ARIMA modelling
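Several of the time series techniques above are available in base R's stats package. The sketch below is a minimal illustration; the built-in AirPassengers series and the particular seasonal ARIMA orders are assumptions chosen for the example, not a recommended model.

```r
# Illustrative sketch: AirPassengers and the model orders are assumed for this example.
y <- AirPassengers                                   # monthly series, 1949 to 1960

# Moving average: 12-term equal-weight filter to smooth out seasonality
ma12 <- stats::filter(y, rep(1 / 12, 12), sides = 2)

# Exponential smoothing: Holt-Winters with trend and seasonal components
hw <- HoltWinters(y)
hw_forecast <- predict(hw, n.ahead = 12)

# Stationarity and ARIMA: difference the log series, then fit a seasonal ARIMA
fit <- arima(log(y), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
arima_forecast <- predict(fit, n.ahead = 12)

# Back-transform the ARIMA forecast from the log scale
print(exp(arima_forecast$pred))
```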
Econometric methods (causal methods)
- Regression analysis
- Multiple linear regression
- Multiple non-linear regression
- Regression validation
- Forecasting from regression
Judgemental methods
- Surveys
- Delphi method
- Scenario building
- Technology forecasting
- Forecast by analogy
Simulation and other methods
- Simulation
- Prediction market
- Probabilistic forecasting and Ensemble forecasting
Duration
28 hours (usually 4 days including breaks)
Requirements
- Basic experience with R programming
- General familiarity with financial and banking concepts
- Basic familiarity with statistics and mathematical concepts
Overview
Machine learning is a branch of Artificial Intelligence wherein computers have the ability to learn without being explicitly programmed. Deep learning is a subfield of machine learning which uses methods based on learning data representations and structures such as neural networks. R is a popular programming language in the financial industry. It is used in financial applications ranging from core trading programs to risk management systems.
In this instructor-led, live training, participants will learn how to implement deep learning models for banking using R as they step through the creation of a deep learning credit risk model.
By the end of this training, participants will be able to:
- Understand the fundamental concepts of deep learning
- Learn the applications and uses of deep learning in banking
- Use R to create deep learning models for banking
- Build their own deep learning credit risk model using R
Audience
- Developers
- Data scientists
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Course Outline
Introduction
Understanding the Fundamentals of Artificial Intelligence and Machine Learning
Understanding Deep Learning
- Overview of the Basic Concepts of Deep Learning
- Differentiating Between Machine Learning and Deep Learning
- Overview of Applications for Deep Learning
Overview of Neural Networks
- What are Neural Networks
- Neural Networks vs Regression Models
- Understanding Mathematical Foundations and Learning Mechanisms
- Constructing an Artificial Neural Network
- Understanding Neural Nodes and Connections
- Working with Neurons, Layers, and Input and Output Data
- Understanding Single Layer Perceptrons
- Differences Between Supervised and Unsupervised Learning
- Learning Feedforward and Feedback Neural Networks
- Understanding Forward Propagation and Back Propagation
- Understanding Long Short-Term Memory (LSTM)
- Exploring Recurrent Neural Networks in Practice
- Exploring Convolutional Neural Networks in Practice
- Improving the Way Neural Networks Learn
Overview of Deep Learning Techniques Used in Banking
- Neural Networks
- Natural Language Processing
- Image Recognition
- Speech Recognition
- Sentiment Analysis
Exploring Deep Learning Case Studies for Banking
- Anti-Money Laundering Programs
- Know-Your-Customer (KYC) Checks
- Sanctions List Monitoring
- Billing Fraud Oversight
- Risk Management
- Fraud Detection
- Product and Customer Segmentation
- Performance Evaluation
- General Compliance Functions
Understanding the Benefits of Deep Learning for Banking
Exploring the Different Deep Learning Packages for R
Deep Learning in R with Keras and RStudio
- Overview of the Keras Package for R
- Installing the Keras Package for R
- Loading the Data
- Using Built-in Datasets
- Using Data from Files
- Using Dummy Data
- Exploring the Data
- Preprocessing the Data
- Cleaning the Data
- Normalizing the Data
- Splitting the Data into Training and Test Sets
- Implementing One Hot Encoding (OHE)
- Defining the Architecture of Your Model
- Compiling and Fitting Your Model to the Data
- Training Your Model
- Visualizing the Model Training History
- Using Your Model to Predict Labels of New Data
- Evaluating Your Model
- Fine-Tuning Your Model
- Saving and Exporting Your Model
Hands-on: Building a Deep Learning Credit Risk Model Using R
Extending your Company’s Capabilities
- Developing Models in the Cloud
- Using GPUs to Accelerate Deep Learning
- Applying Deep Learning Neural Networks for Computer Vision, Voice Recognition, and Text Analysis
Summary and Conclusion
Duration
28 hours (usually 4 days including breaks)
Requirements
- Experience with R programming
- General familiarity with finance concepts
- Basic familiarity with statistics and mathematical concepts
Overview
Machine learning is a branch of Artificial Intelligence wherein computers have the ability to learn without being explicitly programmed. Deep learning is a subfield of machine learning which uses methods based on learning data representations and structures such as neural networks. R is a popular programming language in the financial industry. It is used in financial applications ranging from core trading programs to risk management systems.
In this instructor-led, live training, participants will learn how to implement deep learning models for finance using R as they step through the creation of a deep learning stock price prediction model.
By the end of this training, participants will be able to:
- Understand the fundamental concepts of deep learning
- Learn the applications and uses of deep learning in finance
- Use R to create deep learning models for finance
- Build their own deep learning stock price prediction model using R
Audience
- Developers
- Data scientists
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Course Outline
Introduction
Understanding the Fundamentals of Artificial Intelligence and Machine Learning
Understanding Deep Learning
- Overview of the Basic Concepts of Deep Learning
- Differentiating Between Machine Learning and Deep Learning
- Overview of Applications for Deep Learning
Overview of Neural Networks
- What are Neural Networks
- Neural Networks vs Regression Models
- Understanding Mathematical Foundations and Learning Mechanisms
- Constructing an Artificial Neural Network
- Understanding Neural Nodes and Connections
- Working with Neurons, Layers, and Input and Output Data
- Understanding Single Layer Perceptrons
- Differences Between Supervised and Unsupervised Learning
- Learning Feedforward and Feedback Neural Networks
- Understanding Forward Propagation and Back Propagation
- Understanding Long Short-Term Memory (LSTM)
- Exploring Recurrent Neural Networks in Practice
- Exploring Convolutional Neural Networks in Practice
- Improving the Way Neural Networks Learn
Overview of Deep Learning Techniques Used in Finance
- Neural Networks
- Natural Language Processing
- Image Recognition
- Speech Recognition
- Sentiment Analysis
Exploring Deep Learning Case Studies for Finance
- Pricing
- Portfolio Construction
- Risk Management
- High Frequency Trading
- Return Prediction
Understanding the Benefits of Deep Learning for Finance
Exploring the Different Deep Learning Packages for R
Deep Learning in R with Keras and RStudio
- Overview of the Keras Package for R
- Installing the Keras Package for R
- Loading the Data
- Using Built-in Datasets
- Using Data from Files
- Using Dummy Data
- Exploring the Data
- Preprocessing the Data
- Cleaning the Data
- Normalizing the Data
- Splitting the Data into Training and Test Sets
- Implementing One Hot Encoding (OHE)
- Defining the Architecture of Your Model
- Compiling and Fitting Your Model to the Data
- Training Your Model
- Visualizing the Model Training History
- Using Your Model to Predict Labels of New Data
- Evaluating Your Model
- Fine-Tuning Your Model
- Saving and Exporting Your Model
Hands-on: Building a Deep Learning Model for Stock Price Prediction Using R
Extending your Company’s Capabilities
- Developing Models in the Cloud
- Using GPUs to Accelerate Deep Learning
- Applying Deep Learning Neural Networks for Computer Vision, Voice Recognition, and Text Analysis
Summary and Conclusion
Duration
21 hours (usually 3 days including breaks)
Requirements
- R programming experience
- An understanding of machine learning concepts
Overview
In this instructor-led, live training, participants will learn advanced techniques for Machine Learning with R as they step through the creation of a real-world application.
By the end of this training, participants will be able to:
- Understand and implement unsupervised learning techniques
- Apply clustering and classification to make predictions based on real world data.
- Visualize data to quickly gain insights, make decisions and further refine analysis.
- Improve the performance of a machine learning model using hyper-parameter tuning.
- Put a model into production for use in a larger application.
- Apply advanced machine learning techniques to answer questions involving social network data, big data, and more.
Audience
- Developers
- Analysts
- Data scientists
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Course Outline
Introduction
Setting up the R Development Environment
Deep Learning vs Neural Network vs Machine Learning
Building an Unsupervised Learning Model
Case Study: Predicting an Outcome Using Existing Data
Preparing Test and Training Data Sets For Analysis
Clustering Data
Classifying Data
Visualizing Data
Evaluating the Performance of a Model
Iterating Through Model Parameters
Hyper-parameter Tuning
Integrating a Model with a Real-World Application
Deploying a Machine Learning Application
Troubleshooting
Summary and Conclusion