Machine Learning for Banking (with R) Training Course

Duration

28 hours (usually 4 days including breaks)

Requirements

  • Programming experience with any language
  • Basic familiarity with statistics and linear algebra

Overview

In this instructor-led, live training, participants will learn how to apply machine learning techniques and tools for solving real-world problems in the banking industry. R will be used as the programming language.

Participants first learn the key principles, then put their knowledge into practice by building their own machine learning models and using them to complete a number of live projects.

Audience

  • Developers
  • Data scientists
  • Banking professionals with a technical background

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

Introduction

  • Difference between statistical learning (statistical analysis) and machine learning
  • Adoption of machine learning technology by finance and banking companies

Different Types of Machine Learning

  • Supervised learning vs unsupervised learning
  • Iteration and evaluation
  • Bias-variance trade-off
  • Combining supervised and unsupervised learning (semi-supervised learning)

Machine Learning Languages and Toolsets

  • Open source vs proprietary systems and software
  • R vs Python vs MATLAB
  • Libraries and frameworks

Machine Learning Case Studies

  • Consumer data and big data
  • Assessing risk in consumer and business lending
  • Improving customer service through sentiment analysis
  • Detecting identity fraud, billing fraud and money laundering

Introduction to R

  • Installing the RStudio IDE
  • Loading R packages
  • Data structures
  • Vectors
  • Factors
  • Lists
  • Data Frames
  • Matrices and Arrays

How to Load Machine Learning Data

  • Databases, data warehouses and streaming data
  • Distributed storage and processing with Hadoop and Spark
  • Importing data from a database
  • Importing data from Excel and CSV

Modeling Business Decisions with Supervised Learning

  • Classifying your data (classification)
  • Using regression analysis to predict outcomes
  • Choosing from available machine learning algorithms
  • Understanding decision tree algorithms
  • Understanding random forest algorithms
  • Model evaluation
  • Exercise

Regression Analysis

  • Linear regression
  • Generalizations and Nonlinearity
  • Exercise

Classification

  • Bayesian refresher
  • Naive Bayes
  • Logistic regression
  • K-Nearest neighbors
  • Exercise

Hands-on: Building an Estimation Model

  • Assessing lending risk based on customer type and history

Evaluating the Performance of Machine Learning Algorithms

  • Cross-validation and resampling
  • Bootstrap aggregation (bagging)
  • Exercise

Modeling Business Decisions with Unsupervised Learning

  • When sample data sets are not available
  • K-means clustering
  • Challenges of unsupervised learning
  • Beyond K-means
  • Bayesian networks and Hidden Markov Models
  • Exercise

Hands-on: Building a Recommendation System

  • Analyzing past customer behavior to improve new service offerings

Extending Your Company’s Capabilities

  • Developing models in the cloud
  • Accelerating machine learning with additional GPUs
  • Applying Deep Learning neural networks for computer vision, voice recognition and text analysis

Closing Remarks

Data Mining & Machine Learning with R Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

This course is part of the Data Scientist skill set (Domain: Analytical Techniques and Methods)

Overview

R is a free, open-source programming language for statistical computing, data analysis, and graphics. It is used by a growing number of managers and data analysts in corporations and academia, and it offers a wide variety of packages for data mining.

Course Outline

Introduction to Data Mining and Machine Learning

  • Statistical learning vs. Machine learning
  • Iteration and evaluation
  • Bias-Variance trade-off

Regression

  • Linear regression
  • Generalizations and Nonlinearity
  • Exercises

Classification

  • Bayesian refresher
  • Naive Bayes
  • Discriminant analysis
  • Logistic regression
  • K-Nearest neighbors
  • Support Vector Machines
  • Neural networks
  • Decision trees
  • Exercises

Cross-validation and Resampling

  • Cross-validation approaches
  • Bootstrap
  • Exercises

Unsupervised Learning

  • K-means clustering
  • Examples
  • Challenges of unsupervised learning and beyond K-means

Advanced topics

  • Ensemble models
  • Mixed models
  • Boosting
  • Examples

Dimensionality reduction

  • Factor Analysis
  • Principal Component Analysis
  • Examples

Machine Learning Fundamentals with R Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

Knowledge of the R programming language. Basic familiarity with statistics and linear algebra is recommended.

Overview

The aim of this course is to provide basic proficiency in applying Machine Learning methods in practice. Using the R programming platform and its various libraries, and drawing on a multitude of practical examples, the course teaches how to use the most important building blocks of Machine Learning, how to make data modeling decisions, how to interpret the outputs of the algorithms, and how to validate the results.

Our goal is to give you the skills to understand and confidently use the most fundamental tools in the Machine Learning toolbox, and to avoid the common pitfalls of data science applications.

Course Outline

Introduction to Applied Machine Learning

  • Statistical learning vs. Machine learning
  • Iteration and evaluation
  • Bias-Variance trade-off

Regression

  • Linear regression
  • Generalizations and Nonlinearity
  • Exercises

Classification

  • Bayesian refresher
  • Naive Bayes
  • Logistic regression
  • K-Nearest neighbors
  • Exercises

Cross-validation and Resampling

  • Cross-validation approaches
  • Bootstrap
  • Exercises

Unsupervised Learning

  • K-means clustering
  • Examples
  • Challenges of unsupervised learning and beyond K-means

NLP: Natural Language Processing with R Training Course

Duration

21 hours (usually 3 days including breaks)

Requirements

  • Some familiarity with programming.

Audience

  • Linguists and programmers

Overview

It is estimated that unstructured data accounts for more than 90 percent of all data, much of it in the form of text. Blog posts, tweets, social media, and other digital publications continuously add to this growing body of data.

This instructor-led, live course centers on extracting insights and meaning from this data. Using the R language and Natural Language Processing (NLP) libraries, we combine concepts and techniques from computer science, artificial intelligence, and computational linguistics to algorithmically understand the meaning behind text data. Data samples are available in various languages per customer requirements.

By the end of this training, participants will be able to prepare data sets (large and small) from disparate sources, then apply the right algorithms to analyze them and report on their significance.

Format of the Course

  • Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding

Course Outline

Introduction

  • NLP and R vs Python

Installing and Configuring RStudio

Installing R Packages Related to Natural Language Processing (NLP)

An Overview of R’s Text Manipulation Capabilities

Getting Started with an NLP Project in R

Reading and Importing Data Files into R

Text Manipulation with R

Document Clustering in R

Parts of Speech Tagging in R

Sentence Parsing in R

Working with Regular Expressions in R

Named-Entity Recognition in R

Topic Modeling in R

Text Classification in R

Working with Very Large Data Sets

Visualizing Your Results

Optimization

Integrating R with Other Languages (Java, Python, etc.)

Summary and Conclusion