Big Data Analytics – Bluechip AI Asia, AI Development Company

Pentaho Open Source BI Suite Community Edition (CE) Training Course

Posted on December 1, 2023 by admin

Introduction to Pentaho Open Source BI Suite Community Edition (CE)

Overview of CE Features and Architecture

Pentaho Community Edition vs. Enterprise Edition
Pentaho CE Tools

Installing and Configuring Pentaho CE

Using the Pentaho CE Business Analytics User Console

Creating Reports with the Pentaho CE Business Analytics Report Designer

Performing Data Integration in Pentaho CE

Working with Databases in Pentaho CE

Relational Databases
NoSQL Sources
Analytic Databases

Working with the Analysis View in Pentaho CE

Predictive Analytics

Working with Big Data in Pentaho CE

Graphical Designer for Big Data

Making the Most of the Pentaho CE Community Online Forums

Deploying or Embedding Your Pentaho CE Project

Licensing

Troubleshooting

Data Analytics With R Training Course

Posted on November 30, 2023 by admin

Day One: Language Basics

Course Introduction
About Data Science
- Data Science Definition
- Process of Doing Data Science
Introducing R Language
Variables and Types
Control Structures (Loops / Conditionals)
R Scalars, Vectors, and Matrices
- Defining R Vectors
- Matrices
String and Text Manipulation
- Character data type
- File I/O
Lists
Functions
- Introducing Functions
- Closures
- lapply/sapply functions
DataFrames
Labs for all sections

Day Two: Intermediate R Programming

DataFrames and File I/O
Reading data from files
Data Preparation
Built-in Datasets
Visualization
- Graphics Package
- plot() / barplot() / hist() / boxplot() / scatter plot
- Heat Map
- ggplot2 package (qplot(), ggplot())
Exploration With Dplyr
Labs for all sections

Day Three: Advanced Programming With R

Statistical Modeling With R
- Statistical Functions
- Dealing With NA
- Distributions (Binomial, Poisson, Normal)
Regression
- Introducing Linear Regressions
Recommendations
Text Processing (tm package / Wordclouds)
Clustering
- Introduction to Clustering
- K-Means
Classification
- Introduction to Classification
- Naive Bayes
- Decision Trees
- Training using caret package
- Evaluating Algorithms
R and Big Data
- Connecting R to databases
- Big Data Ecosystem
Labs for all sections

Databricks Training Course

Posted on November 30, 2023 by admin

Introduction

Overview of Databricks and Apache Spark
Understanding the Databricks architecture

Getting Started

Setting up the Environment
Setting up and configuring Databricks
Navigating the Databricks user interface
Creating a Databricks workspace

Working with Data in Databricks

Connecting to an Apache Spark data source
Understanding the basics of columns and data types
Managing the file system from notebooks

Managing Jobs and Clusters

Creating and configuring clusters
Creating jobs using notebooks
Running jobs
Viewing jobs and job details

Using Delta Lake in Databricks

Loading data into Delta Lake
Managing data in Delta Lake

Securing Databricks

Managing Databricks security
Managing backup and recovery

Troubleshooting

Big Data Analytics in Health Training Course

Posted on September 22, 2023 by admin

Duration

21 hours (usually 3 days including breaks)

Requirements

An understanding of machine learning and data mining concepts
Advanced programming experience (Python, Java, Scala)
Proficiency in data and ETL processes

Overview

Big data analytics is the process of examining large, varied data sets to uncover correlations, hidden patterns, and other useful insights.

The health industry holds massive amounts of complex, heterogeneous medical and clinical data. Applying big data analytics to this data has great potential to yield insights that improve healthcare delivery. However, the sheer size of these data sets makes them difficult to analyze and to apply in a clinical environment.

In this instructor-led, live training (remote), participants will learn how to perform big data analytics in health as they step through a series of hands-on live-lab exercises.

By the end of this training, participants will be able to:

Install and configure big data analytics tools such as Hadoop MapReduce and Spark
Understand the characteristics of medical data
Apply big data techniques to deal with medical data
Study big data systems and algorithms in the context of health applications

Audience

Developers
Data Scientists

Format of the Course

Part lecture, part discussion, exercises and heavy hands-on practice.

Note

To request customized training for this course, please contact us.

Course Outline

Introduction to Big Data Analytics in Health

Overview of Big Data Analytics Technologies

Apache Hadoop MapReduce
Apache Spark

Installing and Configuring Apache Hadoop MapReduce

Installing and Configuring Apache Spark

Using Predictive Modeling for Health Data

Using Apache Hadoop MapReduce for Health Data

Performing Phenotyping & Clustering on Health Data

Classification Evaluation Metrics
Classification Ensemble Methods

Using Apache Spark for Health Data

Working with Medical Ontology

Using Graph Analysis on Health Data

Dimensionality Reduction on Health Data

Working with Patient Similarity Metrics

Troubleshooting

Summary and Conclusion

Data Science for Big Data Analytics Training Course

Posted on September 22, 2023 by admin

Duration

35 hours (usually 5 days including breaks)

Overview

Big data refers to data sets so voluminous and complex that traditional data processing software is inadequate to deal with them. Big data challenges include data capture, storage, analysis, search, sharing, transfer, visualization, querying, updating, and information privacy.

Course Outline

Introduction to Data Science for Big Data Analytics

Data Science Overview
Big Data Overview
Data Structures
Drivers and complexities of Big Data
Big Data ecosystem and a new approach to analytics
Key technologies in Big Data
Data Mining process and problems
- Association Pattern Mining
- Data Clustering
- Outlier Detection
- Data Classification

Introduction to Data Analytics lifecycle

Discovery
Data preparation
Model planning
Model building
Presentation/Communication of results
Operationalization
Exercise: Case study

From this point on, most of the training time (80%) is spent on examples and exercises in R and related big data technologies.

Getting started with R

Installing R and RStudio
Features of R language
Objects in R
Data in R
Data manipulation
Big data issues
Exercises

Getting started with Hadoop

Installing Hadoop
Understanding Hadoop modes
HDFS
MapReduce architecture
Hadoop related projects overview
Writing programs in Hadoop MapReduce
Exercises

Integrating R and Hadoop with RHadoop

Components of RHadoop
Installing RHadoop and connecting with Hadoop
The architecture of RHadoop
Hadoop streaming with R
Data analytics problem solving with RHadoop
Exercises

Pre-processing and preparing data

Data preparation steps
Feature extraction
Data cleaning
Data integration and transformation
Data reduction – sampling, feature subset selection, dimensionality reduction
Discretization and binning
Exercises and Case study

Exploratory data analytic methods in R

Descriptive statistics
Exploratory data analysis
Visualization – preliminary steps
Visualizing a single variable
Examining multiple variables
Statistical methods for evaluation
Hypothesis testing
Exercises and Case study

Data Visualizations

Basic visualizations in R
Packages for data visualization: ggplot2, lattice, plotly
Formatting plots in R
Advanced graphs
Exercises

Regression (Estimating future values)

Linear regression
Use cases
Model description
Diagnostics
Problems with linear regression
Shrinkage methods, ridge regression, the lasso
Generalizations and nonlinearity
Regression splines
Local polynomial regression
Generalized additive models
Regression with RHadoop
Exercises and Case study

Classification

Classification-related problems
Bayesian refresher
Naïve Bayes
Logistic regression
K-nearest neighbors
Decision tree algorithms
Neural networks
Support vector machines
Diagnostics of classifiers
Comparison of classification methods
Scalable classification algorithms
Exercises and Case study

Assessing model performance and selection

Bias, Variance and model complexity
Accuracy vs Interpretability
Evaluating classifiers
Measures of model/algorithm performance
Hold-out method of validation
Cross-validation
Tuning machine learning algorithms with caret package
Visualizing model performance with Profit, ROC, and Lift curves

Ensemble Methods

Bagging
Random Forests
Boosting
Gradient boosting
Exercises and Case study

Support vector machines for classification and regression

Maximal Margin classifiers
- Support vector classifiers
- Support vector machines
- SVMs for classification problems
- SVMs for regression problems
Exercises and Case study

Identifying unknown groupings within a data set

Feature Selection for Clustering
Representative-based algorithms: k-means, k-medoids
Hierarchical algorithms: agglomerative and divisive methods
Probabilistic model-based algorithms: EM
Density-based algorithms: DBSCAN, DENCLUE
Cluster validation
Advanced clustering concepts
Clustering with RHadoop
Exercises and Case study

Discovering connections with Link Analysis

Link analysis concepts
Metrics for analyzing networks
The PageRank algorithm
Hyperlink-Induced Topic Search
Link Prediction
Exercises and Case study

Association Pattern Mining

Frequent Pattern Mining Model
Scalability issues in frequent pattern mining
Brute Force algorithms
Apriori algorithm
The FP growth approach
Evaluation of Candidate Rules
Applications of Association Rules
Validation and Testing
Diagnostics
Association rules with R and Hadoop
Exercises and Case study

Constructing recommendation engines

Understanding recommender systems
Data mining techniques used in recommender systems
Recommender systems with recommenderlab package
Evaluating the recommender systems
Recommendations with RHadoop
Exercise: Building recommendation engine

Text analysis

Text analysis steps
Collecting raw text
Bag of words
Term Frequency-Inverse Document Frequency (TF-IDF)
Determining Sentiments
Exercises and Case study
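
As an illustration of the bag-of-words and TF-IDF topics listed above, here is a minimal sketch in Python. It is not taken from the course material, which uses R. It assumes one common weighting (term count divided by document length, multiplied by log(N / document frequency)); other variants exist. The function name tf_idf and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed weighting (one of several common variants):
      tf  = term count / number of tokens in the document
      idf = log(number of documents / documents containing the term)
    """
    # Bag of words: lowercase each document and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample documents, for illustration only.
docs = [
    "patient reports chest pain",
    "patient reports mild headache",
    "chest scan shows no fracture",
]
scores = tf_idf(docs)
```

With this weighting, a term that occurs in every document receives a weight of zero, while a term unique to one document (such as "pain" in the first sample) receives the highest weight within that document.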