Duration
21 hours (usually 3 days including breaks)
Overview
Big Data refers to solutions for storing and processing large data sets. Initially developed by Google, these solutions have evolved and inspired similar projects, many of which are available as open source. R is a popular programming language in the financial industry.
Course Outline
Introduction to Programming Big Data with R (pbdR)
- Setting up your environment to use pbdR
- Scope and tools available in pbdR
- Packages commonly used with Big Data alongside pbdR
Message Passing Interface (MPI)
- Using pbdR MPI
- Parallel processing
- Point-to-point communication
- Sending Matrices
- Summing Matrices
- Collective communication
- Summing Matrices with Reduce
- Scatter / Gather
- Other MPI communications
Distributed Matrices
- Creating a distributed diagonal matrix
- SVD of a distributed matrix
- Building a distributed matrix in parallel
Statistics Applications
- Monte Carlo Integration
- Reading Datasets
- Reading on all processes
- Broadcasting from one process
- Reading partitioned data
- Distributed Regression
- Distributed Bootstrap