Moving Data from MySQL to Hadoop with Sqoop Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • An understanding of big data concepts (HDFS, Hive, etc.)
  • An understanding of relational databases (MySQL, etc.)
  • Experience with the Linux command line

Overview

Sqoop is an open source software tool for transferring data between Hadoop and relational databases or mainframes. It can be used to import data from a relational database management system (RDBMS) such as MySQL or Oracle, or from a mainframe, into the Hadoop Distributed File System (HDFS). The data can then be transformed in Hadoop MapReduce and exported back into an RDBMS.
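The import, transform, and export cycle described above maps directly onto Sqoop's command-line tools. As a minimal sketch (the connection string, credentials, table names, and HDFS paths below are hypothetical placeholders, not values from this course):

```shell
# Import a MySQL table into HDFS (hypothetical connection details)
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl --password-file /user/etl/.pw \
  --table orders \
  --target-dir /data/orders

# After transforming the data in Hadoop, export the results back to MySQL
sqoop export \
  --connect jdbc:mysql://dbhost/sales \
  --username etl --password-file /user/etl/.pw \
  --table orders_summary \
  --export-dir /data/orders_summary
```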

In this instructor-led, live training, participants will learn how to use Sqoop to import data from a traditional relational database into Hadoop storage such as HDFS or Hive, and vice versa.

By the end of this training, participants will be able to:

  • Install and configure Sqoop
  • Import data from MySQL to HDFS and Hive
  • Export data from HDFS and Hive to MySQL

Audience

  • System administrators
  • Data engineers

Format of the Course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Note

  • To request a customized training for this course, please contact us to arrange.

Course Outline

Introduction

  • Moving data from legacy data stores to Hadoop

Installing and Configuring Sqoop

Overview of Sqoop Features and Architecture

Importing Data from MySQL to HDFS

Importing Data from MySQL to Hive

Transforming Data in Hadoop

Exporting Data from HDFS to MySQL

Exporting Data from Hive to MySQL

Importing Incrementally with Sqoop Jobs

Troubleshooting

Summary and Conclusion

Snorkel: Rapidly Process Training Data Training Course

Duration

7 hours (usually 1 day including breaks)

Requirements

  • An understanding of machine learning

Overview

Snorkel is a system for rapidly creating, modeling, and managing training data. It focuses on accelerating the development of structured or “dark” data extraction applications for domains in which large labeled training sets are not available or easy to obtain.

In this instructor-led, live training, participants will learn techniques for extracting value from unstructured data such as text, tables, figures, and images through modeling of training data with Snorkel.

By the end of this training, participants will be able to:

  • Programmatically create and label massive training sets
  • Train high-quality end models by first modeling noisy training sets
  • Use Snorkel to implement weak supervision techniques and apply data programming to weakly-supervised machine learning systems
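The weak supervision idea above can be illustrated in a few lines of plain Python: several noisy "labeling functions" each vote on an example, and their votes are combined into a single training label. This is a pure-Python sketch of the concept only; the actual Snorkel library provides decorators and a learned label model rather than the hypothetical majority vote shown here.

```python
# Weak supervision sketch: labeling functions vote, votes are combined.
# Pure Python illustration of the idea Snorkel generalizes; the real
# Snorkel API (labeling_function, LabelModel) differs.

ABSTAIN, HAM, SPAM = -1, 0, 1

def lf_contains_link(text):
    """Messages with links are likely spam."""
    return SPAM if "http://" in text else ABSTAIN

def lf_short_message(text):
    """Very short messages are likely legitimate."""
    return HAM if len(text.split()) < 4 else ABSTAIN

def lf_money_words(text):
    """Prize-related vocabulary suggests spam."""
    return SPAM if "winner" in text.lower() or "free" in text.lower() else ABSTAIN

LFS = [lf_contains_link, lf_short_message, lf_money_words]

def majority_vote(text):
    """Combine noisy labeling-function votes into one training label.
    Snorkel instead learns each function's accuracy with a label model."""
    votes = [lf(text) for lf in LFS if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

labels = [majority_vote(t) for t in [
    "Congratulations, you are a WINNER! Claim at http://spam.example",
    "see you soon",
]]
print(labels)  # → [1, 0]: first message labeled SPAM, second HAM
```

The resulting (noisy) labels can then be used to train an end model, which is the workflow the course develops with Snorkel itself.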

Audience

  • Developers
  • Data scientists

Format of the Course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

To request a customized course outline for this training, please contact us.

IBM Cloud Pak for Data Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • Experience with data processing and AI concepts

Audience

  • Data Scientists
  • Business Analysts
  • Data Engineers
  • Developers
  • System Administrators

Overview

IBM Cloud Pak for Data is a multi-cloud software platform for collecting, organizing and analyzing data for use in AI.

This instructor-led, live training (online or onsite) is aimed at data scientists who wish to use IBM Cloud Pak to prepare data for use in AI solutions.

By the end of this training, participants will be able to:

  • Install and configure Cloud Pak for Data.
  • Unify the collection, organization and analysis of data.
  • Integrate Cloud Pak for Data with a variety of services to solve common business problems.
  • Implement workflows for collaborating with team members on the development of an AI solution.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange.

Course Outline

Introduction

Overview of Cloud Pak for Data Features and Architecture

  • Red Hat OpenShift Container Platform
  • Containers, Kubernetes, and Helm
  • Red Hat OpenShift security

Setting up Cloud Pak for Data

  • Pre-installation tasks
  • Installation
  • Post-installation tasks

Setting up Workflows

  • Setting up roles and permissions for collaboration
  • Creating a workflow
  • Searching and requesting data

Collecting Data

  • Connecting to a data source
  • Adding data to a project

Organizing Data

  • Working with catalogs
  • Curating catalog data
  • Governing data to comply with regulations
  • Automating the discovery process

Preparing Data

  • Transforming data
  • Refining data
  • Virtualizing data

Analyzing Data

  • Analyzing data using notebooks
  • Analyzing data using other tools
  • Analyzing data automatically using AutoAI

Implementing an AI Solution

  • Building a machine learning model
  • Deploying the model
  • Validating the model
  • Monitoring the model

Integrating Cloud Pak for Data with Other Services

  • Finding services in a catalog
  • Finding services outside a catalog
  • Integrating IBM Cloud Pak for Data with other applications

Administering Cloud Pak for Data

  • Managing an IBM Cloud Pak for Data cluster
  • Managing an IBM Cloud Pak for Data web client
  • Uninstalling Cloud Pak for Data

Troubleshooting

Summary and Conclusion

Introduction to Data Science and AI using Python Training Course

Duration

35 hours (usually 5 days including breaks)

Requirements

None

Overview

This is a 5-day introduction to Data Science and Artificial Intelligence (AI).

The course is delivered with examples and exercises using Python.

Course Outline

Introduction to Data Science/AI

  • Knowledge acquisition through data
  • Knowledge representation
  • Value creation
  • Data Science overview
  • AI ecosystem and new approach to analytics
  • Key technologies

Data Science workflow

  • CRISP-DM (Cross-Industry Standard Process for Data Mining)
  • Data preparation
  • Model planning
  • Model building
  • Communication
  • Deployment

Data Science technologies

  • Languages used for prototyping
  • Big Data technologies
  • End-to-end solutions to common problems
  • Introduction to Python language
  • Integrating Python with Spark

AI in Business

  • AI ecosystem
  • Ethics of AI
  • How to drive AI in business

Data sources

  • Types of data
  • SQL vs NoSQL
  • Data Storage
  • Data preparation

Data Analysis – Statistical approach

  • Probability
  • Statistics
  • Statistical modeling
  • Applications in business using Python
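The statistical summaries listed above can be computed with nothing more than Python's standard library. A small illustration using the `statistics` module (the sales figures are made-up example data):

```python
# Basic descriptive statistics with Python's stdlib `statistics` module.
import statistics

daily_sales = [120, 135, 128, 150, 110, 142, 138]  # hypothetical data

mean = statistics.mean(daily_sales)
stdev = statistics.stdev(daily_sales)  # sample standard deviation

# z-score: how many standard deviations a value sits from the mean
z = (150 - mean) / stdev

print(round(mean, 2), round(stdev, 2), round(z, 2))
```

In practice the course would build on this with libraries such as NumPy and pandas, but the underlying statistical reasoning is the same.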

Machine learning in business

  • Supervised vs unsupervised
  • Forecasting problems
  • Classification problems
  • Clustering problems
  • Anomaly detection
  • Recommendation engines
  • Association pattern mining
  • Solving ML problems with Python language
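To make the supervised-classification idea above concrete, here is a stdlib-only sketch of a 1-nearest-neighbour classifier: a new point receives the label of the closest training example. Course exercises would typically use scikit-learn; the data below is invented for illustration.

```python
# Minimal supervised classification: 1-nearest-neighbour in pure Python.
import math

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_1nn(train, point):
    """Label a point with the class of its nearest training example."""
    nearest = min(train, key=lambda ex: euclidean(ex[0], point))
    return nearest[1]

# (features, label) pairs: hypothetical two-feature examples
train = [((1.0, 1.0), "low"), ((1.2, 0.8), "low"),
         ((8.0, 9.0), "high"), ((9.0, 8.5), "high")]

print(predict_1nn(train, (1.1, 0.9)))  # → low
print(predict_1nn(train, (8.5, 9.2)))  # → high
```

The same fit-then-predict pattern carries over to the forecasting, clustering, and recommendation problems listed above, with the distance function and model swapped out.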

Deep learning

  • Problems where traditional ML algorithms fail
  • Solving complicated problems with Deep Learning
  • Introduction to TensorFlow

Natural Language Processing

Data visualization

  • Visual reporting of outcomes from modeling
  • Common pitfalls in visualization
  • Data visualization with Python

From Data to Decision – communication

  • Making an impact: data-driven storytelling
  • Influence effectiveness
  • Managing Data Science projects