From Data to Decision with Big Data and Predictive Analytics Training Course

Duration

21 hours (usually 3 days including breaks)

Requirements

Understanding of traditional data management and analysis methods such as SQL, data warehouses, business intelligence and OLAP. Understanding of basic statistics and probability (mean, variance, probability, conditional probability, etc.).

Overview

Audience

If you are trying to make sense of the data you have access to, or want to analyse unstructured data available on the net (such as Twitter or LinkedIn), this course is for you.

It is mostly aimed at decision makers and people who need to choose what data is worth collecting and what is worth analysing.

It is not aimed at people configuring the solution, although they will benefit from the big picture.

Delivery Mode

During the course, delegates will be presented with working examples of mostly open source technologies.

Short lectures will be followed by presentations and simple exercises by the participants.

Content and Software used

All software used is updated each time the course is run, so we use the newest versions possible.

The course covers the process of obtaining, formatting, processing and analysing data, and explains how to automate the decision-making process with machine learning.

Course Outline

Quick Overview

  • Data Sources
  • Mining Data
  • Recommender systems
  • Target Marketing

Datatypes

  • Structured vs unstructured
  • Static vs streamed
  • Attitudinal, behavioural and demographic data
  • Data-driven vs user-driven analytics
  • Data validity
  • Volume, velocity and variety of data

Models

  • Building models
  • Statistical Models
  • Machine learning

Data Classification

  • Clustering
  • kGroups, k-means, nearest neighbours
  • Ant colonies, birds flocking
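
As an illustration of the clustering topics above, here is a minimal sketch of k-means in plain Python. It is not course material: the function names `assign` and `k_means`, the sample points and the starting centroids are all assumptions made for this example. The sketch alternates two steps (assign each point to its nearest centroid, then move each centroid to the mean of its points) until the centroids stop changing.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, centroids, max_iterations=100):
    """Illustrative k-means; starting centroids are supplied by the caller."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its members; keep it if it has none.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for old, members in zip(centroids, clusters)
        ]
        if updated == centroids:  # no centroid moved, so the result is stable
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

In practice the number of clusters and the starting centroids have to be chosen, and different choices can give different groupings.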

Predictive Models

  • Decision trees
  • Support vector machines
  • Naive Bayes classification
  • Neural networks
  • Markov models
  • Regression
  • Ensemble methods

ROI

  • Benefit/Cost ratio
  • Cost of software
  • Cost of development
  • Potential benefits
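
The ROI items above can be tied together with a small worked calculation. The sketch below assumes the common definitions (benefit/cost ratio = total benefits divided by total costs; ROI = net benefit divided by total costs). The function names and all figures are hypothetical and only illustrate the arithmetic.

```python
def benefit_cost_ratio(benefits, costs):
    """Total benefits divided by total costs; above 1 means benefits exceed costs."""
    total_costs = sum(costs)
    if total_costs <= 0:
        raise ValueError("total costs must be positive")
    return sum(benefits) / total_costs


def return_on_investment(benefits, costs):
    """Net benefit (benefits minus costs) as a fraction of total costs."""
    total_costs = sum(costs)
    if total_costs <= 0:
        raise ValueError("total costs must be positive")
    return (sum(benefits) - total_costs) / total_costs


# Hypothetical figures, in any single currency.
cost_of_software = 20_000
cost_of_development = 60_000
potential_benefits = [120_000]

costs = [cost_of_software, cost_of_development]
print(benefit_cost_ratio(potential_benefits, costs))    # 1.5
print(return_on_investment(potential_benefits, costs))  # 0.5
```

With these invented numbers, every unit spent returns 1.5 units of benefit, which is a 50% return on the investment.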

Building Models

  • Data Preparation (MapReduce)
  • Data cleansing
  • Choosing methods
  • Developing the model
  • Testing the model
  • Model evaluation
  • Model deployment and integration

Overview of Open Source and commercial software

  • Selected R project packages
  • Python libraries
  • Hadoop and Mahout
  • Selected Apache projects related to Big Data and Analytics
  • Selected commercial solutions
  • Integration with existing software and data sources
