Data Science – Page 2 – Bluechip AI Asia, AI Development Company

SMACK Stack for Data Science Training Course

Posted on September 22, 2023 by admin

Duration

14 hours (usually 2 days including breaks)

Requirements

An understanding of data processing systems

Audience

Data Scientists

Overview

SMACK is a collection of data platform software, namely Apache Spark, Apache Mesos, Apache Akka, Apache Cassandra, and Apache Kafka. Using the SMACK stack, users can create and scale data processing platforms.

This instructor-led, live training (online or onsite) is aimed at data scientists who wish to use the SMACK stack to build data processing platforms for big data solutions.

By the end of this training, participants will be able to:

Implement a data pipeline architecture for processing big data.
Develop a cluster infrastructure with Apache Mesos and Docker.
Analyze data with Spark and Scala.
Manage unstructured data with Apache Cassandra.

Format of the Course

Interactive lecture and discussion.
Lots of exercises and practice.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request a customized training for this course, please contact us to arrange.

Course Outline

Introduction

SMACK Stack Overview

What is Apache Spark? Apache Spark features
What is Apache Mesos? Apache Mesos features
What is Apache Akka? Apache Akka features
What is Apache Cassandra? Apache Cassandra features
What is Apache Kafka? Apache Kafka features

Scala Language

Scala syntax and structure
Scala control flow

Preparing the Development Environment

Installing and configuring the SMACK stack
Installing and configuring Docker

Apache Akka

Using actors

Apache Cassandra

Creating a database for read operations
Working with backups and recovery

Connectors

Creating a stream
Building an Akka application
Storing data with Cassandra
Reviewing connectors

Apache Kafka

Working with clusters
Creating, publishing, and consuming messages

Apache Mesos

Allocating resources
Running clusters
Working with Apache Aurora and Docker
Running services and jobs
Deploying Spark, Cassandra, and Kafka on Mesos

Apache Spark

Managing data flows
Working with RDDs and dataframes
Performing data analysis

Troubleshooting

Handling failure of services and errors

Summary and Conclusion

Big Data – Data Science Training Course

Posted on September 22, 2023 by admin

Duration

14 hours (usually 2 days including breaks)

Requirements

Delegates should have an awareness of, and some experience with, storage tools, and an awareness of handling large data sets.

Overview

This classroom-based training session will explore Big Data. Delegates will work through computer-based examples and case study exercises using relevant big data tools.

Course Outline

Big data fundamentals
- Big Data and its role in the corporate world
- The phases of development of a Big Data strategy within a corporation
- Explain the rationale underlying a holistic approach to Big Data
- Components needed in a Big Data Platform
- Big data storage solution
- Limits of Traditional Technologies
- Overview of database types
- The four dimensions of Big Data
Big data impact on business
- Business importance of Big Data
- Challenges of extracting useful data
- Integrating Big data with traditional data
Big data storage technologies
- Overview of big data technologies
  - Data storage models
  - Hadoop
  - Hive
  - Cassandra
  - MongoDB
- Choosing the right big data technology
Processing big data
- Connecting to and extracting data from databases
- Transforming and preparing data for processing
- Using Hadoop MapReduce for processing distributed data
- Monitoring and executing Hadoop MapReduce jobs
- Hadoop distributed file system building blocks
- MapReduce and YARN
- Handling streaming data with Spark
Big data analysis tools and technologies
- Programming Hadoop with Pig Latin language
- Querying big data with Hive
- Mining data with Mahout
- Visualizing and reporting tools
Big data in business
- Managing and establishing Big Data needs
- Business importance of Big Data
- Selecting the right big data tools for the problem
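
The outline item on using Hadoop MapReduce for processing distributed data can be illustrated with a small, single-process sketch of the map, shuffle, and reduce phases. This is plain Python rather than Hadoop, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical names chosen for illustration only.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence on an in-memory list of documents."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big tools", "data tools for big data"])
print(counts)
```

In a real cluster the map and reduce functions run on different machines and the shuffle moves data across the network; the sketch only shows how the three phases fit together.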

Data Warehousing Concepts

What is a Data Warehouse?
Differences between OLTP and data warehousing
Data acquisition
Data extraction
Data transformation
Data loading
Data marts
Dependent vs. independent data marts
Database design

ETL Testing Concepts

Introduction
Software development life cycle
Testing methodologies
ETL testing workflow process
ETL testing responsibilities in DataStage

Big data Fundamentals

Big Data and its role in the corporate world
The phases of development of a Big Data strategy within a corporation
Explain the rationale underlying a holistic approach to Big Data
Components needed in a Big Data Platform
Big data storage solution
Limits of Traditional Technologies
Overview of database types

NoSQL Databases

Hadoop

Map Reduce

Apache Spark

Data Science for Big Data Analytics Training Course

Posted on September 22, 2023 by admin

Duration

35 hours (usually 5 days including breaks)

Overview

Big data refers to data sets so voluminous and complex that traditional data processing software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, and information privacy.

Course Outline

Introduction to Data Science for Big Data Analytics

Data Science Overview
Big Data Overview
Data Structures
Drivers and complexities of Big Data
Big Data ecosystem and a new approach to analytics
Key technologies in Big Data
Data Mining process and problems
- Association Pattern Mining
- Data Clustering
- Outlier Detection
- Data Classification

Introduction to Data Analytics lifecycle

Discovery
Data preparation
Model planning
Model building
Presentation/Communication of results
Operationalization
Exercise: Case study

From this point most of the training time (80%) will be spent on examples and exercises in R and related big data technology.

Getting started with R

Installing R and RStudio
Features of R language
Objects in R
Data in R
Data manipulation
Big data issues
Exercises

Getting started with Hadoop

Installing Hadoop
Understanding Hadoop modes
HDFS
MapReduce architecture
Hadoop related projects overview
Writing programs in Hadoop MapReduce
Exercises

Integrating R and Hadoop with RHadoop

Components of RHadoop
Installing RHadoop and connecting with Hadoop
The architecture of RHadoop
Hadoop streaming with R
Data analytics problem solving with RHadoop
Exercises

Pre-processing and preparing data

Data preparation steps
Feature extraction
Data cleaning
Data integration and transformation
Data reduction – sampling, feature subset selection
Dimensionality reduction
Discretization and binning
Exercises and Case study

Exploratory data analytic methods in R

Descriptive statistics
Exploratory data analysis
Visualization – preliminary steps
Visualizing single variable
Examining multiple variables
Statistical methods for evaluation
Hypothesis testing
Exercises and Case study

Data Visualizations

Basic visualizations in R
Packages for data visualization: ggplot2, lattice, plotly
Formatting plots in R
Advanced graphs
Exercises

Regression (Estimating future values)

Linear regression
Use cases
Model description
Diagnostics
Problems with linear regression
Shrinkage methods, ridge regression, the lasso
Generalizations and nonlinearity
Regression splines
Local polynomial regression
Generalized additive models
Regression with RHadoop
Exercises and Case study

Classification

The classification related problems
Bayesian refresher
Naïve Bayes
Logistic regression
K-nearest neighbors
Decision trees algorithm
Neural networks
Support vector machines
Diagnostics of classifiers
Comparison of classification methods
Scalable classification algorithms
Exercises and Case study

Assessing model performance and selection

Bias, Variance and model complexity
Accuracy vs Interpretability
Evaluating classifiers
Measures of model/algorithm performance
Hold-out method of validation
Cross-validation
Tuning machine learning algorithms with caret package
Visualizing model performance with Profit, ROC, and Lift curves
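
To make the cross-validation item above concrete, here is a minimal sketch of how k-fold splitting partitions sample indices into training and test sets. The course exercises themselves use R and the caret package; Python is used here purely for illustration, and the function name k_fold_indices is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    The first (n_samples % k) folds receive one extra sample so that
    every sample appears in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

A model would be fitted on each training set and scored on the matching test set, with the k scores averaged. In practice the indices are usually shuffled first; that step is omitted here to keep the sketch deterministic.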

Ensemble Methods

Bagging
Random Forests
Boosting
Gradient boosting
Exercises and Case study

Support vector machines for classification and regression

Maximal Margin classifiers
- Support vector classifiers
- Support vector machines
- SVMs for classification problems
- SVMs for regression problems
Exercises and Case study

Identifying unknown groupings within a data set

Feature Selection for Clustering
Representative based algorithms: k-means, k-medoids
Hierarchical algorithms: agglomerative and divisive methods
Probabilistic model-based algorithms: EM
Density based algorithms: DBSCAN, DENCLUE
Cluster validation
Advanced clustering concepts
Clustering with RHadoop
Exercises and Case study

Discovering connections with Link Analysis

Link analysis concepts
Metrics for analyzing networks
The PageRank algorithm
Hyperlink-Induced Topic Search
Link Prediction
Exercises and Case study
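
As a rough illustration of the PageRank topic listed above, the following sketch runs a fixed number of power-iteration steps on a tiny hand-made graph. It is written in Python for illustration only (the course exercises use R); the damping factor of 0.85, the iteration count, and the example graph are assumptions, and dangling nodes are handled by spreading their rank evenly, which is one common convention among several.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each node to the list of nodes it links to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

Node C collects links from three other nodes and ends up with the highest score, while D, which nothing links to, keeps only the random-jump share.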

Association Pattern Mining

Frequent Pattern Mining Model
Scalability issues in frequent pattern mining
Brute Force algorithms
Apriori algorithm
The FP growth approach
Evaluation of Candidate Rules
Applications of Association Rules
Validation and Testing
Diagnostics
Association rules with R and Hadoop
Exercises and Case study

Constructing recommendation engines

Understanding recommender systems
Data mining techniques used in recommender systems
Recommender systems with recommenderlab package
Evaluating the recommender systems
Recommendations with RHadoop
Exercise: Building recommendation engine

Text analysis

Text analysis steps
Collecting raw text
Bag of words
Term Frequency–Inverse Document Frequency (TF-IDF)
Determining Sentiments
Exercises and Case study
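
The bag-of-words and TF-IDF items above can be sketched in a few lines. This is an illustrative Python version (the course exercises use R) with whitespace tokenization and one particular weighting, tf = count / document length and idf = log(N / document frequency); other variants exist, and the function name tf_idf and the sample sentences are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per input document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)  # bag of words for this document
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the cat sat", "the dog sat", "the cat ran"]
scores = tf_idf(docs)
for doc, weights in zip(docs, scores):
    print(doc, {term: round(w, 3) for term, w in weights.items()})
```

A term that appears in every document, such as "the" here, gets a weight of zero, while a term unique to one document gets the highest weight in that document.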

SQL For Data Science and Data Analysis Training Course

Posted on September 15, 2023 by admin

Duration

14 hours (usually 2 days including breaks)

Requirements

An understanding of databases
Experience with SQL is an asset

Audience

Business analysts
Software developers
Database developers

Overview

This instructor-led, live training (online or onsite) is aimed at software developers, managers, and business analysts who wish to use big data systems to store and retrieve large amounts of data.

By the end of this training, participants will be able to:

Query large amounts of data efficiently.
Understand how big data systems store and retrieve data.
Use the latest big data systems available
Wrangle data from data systems into reporting systems
Learn to write SQL queries in:
- MySQL
- Postgres
- Hive Query Language (HiveQL/HQL)
- Redshift

Format of the Course

Interactive lecture and discussion.
Lots of exercises and practice.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request a customized training for this course, please contact us to arrange.

Course Outline

Lesson 1 – SQL basics:

Select statements
Join types
Indexes
Views
Subqueries
Union
Creating tables
Loading data
Dumping data
NoSQL

Lesson 2 – Data Modeling:

Transaction based ER systems
Data warehousing
Data warehouse models
- Star schema
- Snowflake schema
Slowly changing dimensions (SCD)
Structured and non-structured data
Different table type storage engines:
- Column based
- Document-based
- In Memory

Lesson 3 – Index in the NoSQL/Data science world

Constraints (Primary)
Index-based scanning
Performance tuning

Lesson 4 – NoSQL and non-structured data

When to use NoSQL
Eventually consistent data
Schema on read vs. Schema on write

Lesson 5 – SQL for data analytics

Window functions
Lateral Joins
Lead & Lag
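
As a small illustration of the window-function and Lead & Lag topics above, the sketch below runs a query against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is not one of the systems covered by the course; it is used here only so the example is self-contained, and it needs SQLite 3.25 or later for window-function support. The table name daily_sales and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day TEXT PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [("2023-09-01", 100), ("2023-09-02", 120), ("2023-09-03", 90)],
)

# LAG and LEAD look one row back or ahead within the ordered window;
# SUM with ORDER BY in the window produces a running total.
query = """
SELECT day,
       amount,
       LAG(amount)  OVER (ORDER BY day) AS previous_amount,
       LEAD(amount) OVER (ORDER BY day) AS next_amount,
       SUM(amount)  OVER (ORDER BY day) AS running_total
FROM daily_sales
ORDER BY day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The first row has no previous value and the last row has no next value, so LAG and LEAD return NULL (None in Python) at those positions. The same SELECT should carry over to Postgres and Redshift with little or no change, though that is worth checking against each system's documentation.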

Lesson 6 – HiveQL

SQL Support
External and Internal Tables
Joins
Partitions
Correlated subqueries
Nested queries
When to use Hive

Lesson 7 – Redshift

Design and structure
Locks and shared resources
Postgres differences
When to use Redshift

Visual Analytics – Data science Training Course

Posted on September 12, 2023 by admin

Duration

14 hours (usually 2 days including breaks)

Requirements

Experience with analysis, statistics, and producing data is an advantage

Overview

This classroom-based training session consists of presentations, computer-based examples, and case study exercises.

Course Outline

Introduction to Visual Analytics
- 5 Principles of Data Visualisation
- Tables vs charts
- What makes visualisations effective
- Gestalt Principles of Visual Perception
Types of charts and how to choose the right one
- Common types of charts
- Choosing the right chart for your data
- Understanding your audience
- Handling missing data
Advanced charts
- Sankey
- Radar
- Treemap
- Heatmap
- Boxplot, violin plot
- Choosing the right chart for your data
- Choosing the right chart for your audience
- Eliminating clutter from charts
Storytelling with data
- The importance of storytelling
- Building a narrative structure
- Drawing attention
- Including call to action
Creating dashboards and infographics
- Exploratory vs explanatory analysis
- How to convey your message
- Live presentation vs report
- Visualisations that are simple, informative and engaging
- The characteristics of a good dashboard
- The characteristics of a good infographic
Common mistakes and misleading charts
- Charts that should be avoided
- How we are being deceived by colour, scale and size
Visual analytics case studies

Neural computing – Data science Training Course

Posted on September 7, 2023 by admin

Duration

14 hours (usually 2 days including breaks)

Requirements

Knowledge or appreciation of machine learning, systems architecture, and programming languages is desirable

Overview

This classroom-based training session consists of presentations, computer-based examples, and case study exercises using relevant neural and deep network libraries.

Course Outline

Overview of neural networks and deep learning
- The concept of Machine Learning (ML)
- Why do we need neural networks and deep learning?
- Selecting networks for different problems and data types
- Learning and validating neural networks
- Comparing logistic regression to neural networks
Neural network
- Biological inspiration for neural networks
- Neural networks – neuron, perceptron, and MLP (multilayer perceptron model)
- Learning MLP – backpropagation algorithm
- Activation functions – linear, sigmoid, Tanh, Softmax
- Loss functions appropriate to forecasting and classification
- Parameters – learning rate, regularization, momentum
- Building Neural Networks in Python
- Evaluating performance of neural networks in Python
Basics of Deep Networks
- What is deep learning?
- Architecture of Deep Networks – Parameters, Layers, Activation Functions, Loss functions, Solvers
- Restricted Boltzmann Machines (RBMs)
- Autoencoders
Deep Networks Architectures
- Deep Belief Networks (DBN) – architecture, application
- Autoencoders
- Restricted Boltzmann Machines
- Convolutional Neural Network
- Recursive Neural Network
- Recurrent Neural Network
Overview of libraries and interfaces available in Python
- Caffe
- Theano
- TensorFlow
- Keras
- MXNet
- Choosing the appropriate library for the problem
Building deep networks in Python
- Choosing the appropriate architecture for a given problem
- Hybrid deep networks
- Learning network – appropriate library, architecture definition
- Tuning network – initialization, activation functions, loss functions, optimization method
- Avoiding overfitting – detecting overfitting problems in deep networks, regularization
- Evaluating deep networks
Case studies in Python
- Image recognition – CNN
- Detecting anomalies with Autoencoders
- Forecasting time series with RNN
- Dimensionality reduction with Autoencoder
- Classification with RBM
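
The activation functions named in the outline above (sigmoid, Tanh, Softmax) and the forward pass of a single neuron can be sketched with the Python standard library alone. This is a minimal illustration rather than how any of the listed libraries implement them; the function names and the example weights are assumptions.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x):
    """Hyperbolic tangent, squashing x into the range (-1, 1)."""
    return math.tanh(x)


def softmax(values):
    """Turn a list of scores into probabilities that sum to 1."""
    largest = max(values)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def neuron(inputs, weights, bias, activation=sigmoid):
    """Forward pass of one neuron: weighted sum plus bias, then activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


output = neuron([1.0, 2.0], [0.5, -0.25], 0.0)
probabilities = softmax([1.0, 2.0, 3.0])
print(output)
print(probabilities)
```

A multilayer perceptron stacks many such neurons into layers, and backpropagation adjusts the weights and biases from the gradient of a loss function; those steps are left to the libraries covered in the course.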

Snorkel: Rapidly Process Training Data Training Course

Posted on August 30, 2023 by admin

Duration

7 hours (usually 1 day including breaks)

Requirements

An understanding of machine learning

Overview

Snorkel is a system for rapidly creating, modeling, and managing training data. It focuses on accelerating the development of structured or “dark” data extraction applications for domains in which large labeled training sets are not available or easy to obtain.

In this instructor-led, live training, participants will learn techniques for extracting value from unstructured data such as text, tables, figures, and images through modeling of training data with Snorkel.

By the end of this training, participants will be able to:

Programmatically create training sets to enable the labeling of massive training sets
Train high-quality end models by first modeling noisy training sets
Use Snorkel to implement weak supervision techniques and apply data programming to weakly-supervised machine learning systems

Audience

Developers
Data scientists

Format of the course

Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

To request a customized course outline for this training, please contact us.

Embedding Projector: Visualizing Your Training Data Training Course

Posted on August 30, 2023 by admin

Duration

14 hours (usually 2 days including breaks)

Requirements

Working experience with data visualization tools
Knowledge of machine learning is helpful, but not required
Knowledge of TensorFlow is helpful, but not required

Overview

Embedding Projector is an open-source web application for visualizing the data used to train machine learning systems. Created by Google, it is part of TensorFlow.

This instructor-led, live training introduces the concepts behind Embedding Projector and walks participants through the setup of a demo project.

By the end of this training, participants will be able to:

Explore how data is being interpreted by machine learning models
Navigate through 3D and 2D views of data to understand how a machine learning algorithm interprets it
Understand the concepts behind Embeddings and their role in representing mathematical vectors for images, words and numerals.
Explore the properties of a specific embedding to understand the behavior of a model
Apply Embedding Projector to real-world use cases such as building a song recommendation system for music lovers

Audience

Developers
Data scientists

Format of the course

Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

To request a customized course outline for this training, please contact us.

Introduction to Data Science and AI using Python Training Course

Posted on August 28, 2023 by admin

Duration

35 hours (usually 5 days including breaks)

Requirements

None

Overview

This is a 5-day introduction to Data Science and Artificial Intelligence (AI).

The course is delivered with examples and exercises using Python.

Course Outline

Introduction to Data Science/AI

Knowledge acquisition through data
Knowledge representation
Value creation
Data Science overview
AI ecosystem and new approach to analytics
Key technologies

Data Science workflow

CRISP-DM
Data preparation
Model planning
Model building
Communication
Deployment

Data Science technologies

Languages used for prototyping
Big Data technologies
End to end solutions to common problems
Introduction to Python language
Integrating Python with Spark

AI in Business

AI ecosystem
Ethics of AI
How to drive AI in business

Data sources

Types of data
SQL vs NoSQL
Data Storage
Data preparation

Data Analysis – Statistical approach

Probability
Statistics
Statistical modeling
Applications in business using Python

Machine learning in business

Supervised vs unsupervised
Forecasting problems
Classification problems
Clustering problems
Anomaly detection
Recommendation engines
Association pattern mining
Solving ML problems with Python language

Deep learning

Problems where traditional ML algorithms fail
Solving complicated problems with Deep Learning
Introduction to TensorFlow

Natural Language processing

Data visualization

Visual reporting outcomes from modeling
Common pitfalls in visualization
Data visualization with Python

From Data to Decision – communication

Making an impact: data-driven storytelling
Influence effectiveness
Managing Data Science projects

Mastering Python Programming (April 2023)

Posted on June 4, 2023 by admin

You will learn the basics of Python programming and its features

You will explore the Cloud Client Libraries in GCP

You will learn how Python is used in data science

You will learn to work with ML applications using Python

Requirements

A basic understanding of Python programming
Working knowledge of GCP cloud services

Description

If you want to build skills in Python programming along with machine learning, data science, and the use of Python on cloud platforms, then this is the course for you!

This course takes a hands-on approach to Python programming using IDLE (Python 3.11, 64-bit).

Python is an interpreted, high-level, general-purpose programming language. It is easy to learn yet powerful, and its syntax often lets developers write programs in fewer lines than in many other programming languages.

In this course you will learn about Python and its features, data types, and data structures, as well as looping and conditional statements, functions, and modules.

You will also learn Python's object-oriented programming (OOP) concepts, decorators, generators, exception handling, and file handling.
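
As a brief illustration of three of the features mentioned above (decorators, generators, and exception handling), here is a minimal sketch. The names log_calls, safe_divide, and countdown are hypothetical and are not taken from the course material.

```python
import functools


def log_calls(func):
    """Decorator that records the arguments of every call in wrapper.calls."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)
    wrapper.calls = []
    return wrapper


@log_calls
def safe_divide(a, b):
    """Divide a by b, returning None instead of raising on division by zero."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


def countdown(n):
    """Generator that yields n, n - 1, ..., 1 one value at a time."""
    while n > 0:
        yield n
        n -= 1


print(safe_divide(6, 3))
print(safe_divide(1, 0))
print(safe_divide.calls)
print(list(countdown(3)))
```

The decorator wraps safe_divide without changing its body, the try/except block turns an error into a return value, and the generator produces its values lazily instead of building a list up front.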

You will learn to use the Python client libraries in GCP, and how to use Python in machine learning and data science.

Our focus is to teach topics in a sequence that flows smoothly. The course covers the fundamentals of Python programming with hands-on examples.

This course gives a quick introduction to Python programming with an emphasis on its activity lessons.

What are you waiting for?

Every day is a missed opportunity.

Hurry up!

Who this course is for:

Developers interested in Mastering Python Programming
Python Developers
Data Scientists
Data Analysts
Software Developers and Cloud Developers
