data storage – Bluechip AI Asia, AI Development Company

A Practical Introduction to Data Science Training Course

Posted on December 1, 2023 by admin

Introduction

The Data Science Process
Roles and responsibilities of a Data Scientist

Preparing the Development Environment

Libraries, frameworks, languages and tools
Local development
Collaborative web-based development

Data Collection

Different Types of Data
- Structured
  - Local databases
  - Database connectors
  - Common formats: XLSX, XML, JSON, CSV, …
- Unstructured
  - Clicks, sensors, smartphones
  - APIs
  - Internet of Things (IoT)
  - Documents, pictures, videos, sounds
Case study: Collecting large amounts of unstructured data continuously
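
As a minimal sketch of working with two of the common structured formats listed above, the snippet below parses CSV and JSON using only Python's standard library and brings both into one record shape. The sample data and the field names (`id`, `city`) are invented for illustration and are not part of the course material.

```python
import csv
import io
import json

# Hypothetical sample data; in practice this would come from files or an API.
CSV_TEXT = "id,city\n1,Hanoi\n2,Manila\n"
JSON_TEXT = '[{"id": 3, "city": "Jakarta"}]'


def load_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def normalize(record):
    """Coerce the id to int so records from both sources share one shape."""
    return {"id": int(record["id"]), "city": record["city"]}


records = [normalize(r) for r in load_csv(CSV_TEXT) + load_json(JSON_TEXT)]
print(records)
```

CSV values always arrive as strings, which is why the `normalize` step converts `id` before the two sources are combined.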

Data Storage

Relational databases
Non-relational databases
Hadoop: Distributed File System (HDFS)
Spark: Resilient Distributed Dataset (RDD)
Cloud storage

Data Preparation

Ingestion, selection, cleansing, and transformation
Ensuring data quality – correctness, meaningfulness, and security
Exception reports
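
One possible shape for the cleansing step and its exception report is sketched below. The validation rules (a non-empty `name`, an `age` between 0 and 120) and the sample rows are assumptions chosen for illustration. A real project would take its rules from its own data quality requirements.

```python
def validate(row):
    """Return a list of problems found in one row (empty if the row is clean)."""
    problems = []
    if not (row.get("name") or "").strip():
        problems.append("missing name")
    try:
        age = int(row.get("age"))
        if not 0 <= age <= 120:
            problems.append("age out of range")
    except (TypeError, ValueError):
        problems.append("age is not a number")
    return problems


def prepare(rows):
    """Split rows into cleaned records and an exception report."""
    clean, exceptions = [], []
    for index, row in enumerate(rows):
        problems = validate(row)
        if problems:
            # Rejected rows are reported, not silently dropped.
            exceptions.append({"row": index, "problems": problems})
        else:
            clean.append({"name": row["name"].strip(), "age": int(row["age"])})
    return clean, exceptions


# Hypothetical raw input with three deliberately bad rows.
raw = [
    {"name": " Ana ", "age": "34"},
    {"name": "", "age": "29"},
    {"name": "Bao", "age": "abc"},
    {"name": "Chen", "age": "150"},
]
clean, report = prepare(raw)
print(clean)
print(report)
```

Keeping the rejected rows in a separate report, together with the reason for each rejection, lets data quality problems be traced back to their source.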

Languages used for Preparation, Processing and Analysis

R language
- Introduction to R
- Data manipulation, calculation and graphical display
Python
- Introduction to Python
- Manipulating, processing, cleaning, and crunching data

Data Analytics

Exploratory analysis
- Basic statistics
- Draft visualizations
- Understand data
Causality
Features and transformations
Machine Learning
- Supervised vs unsupervised
- When to use what model
Natural Language Processing (NLP)
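
The "basic statistics" step of exploratory analysis can be illustrated with Python's built-in `statistics` module. This is only a small sketch. The sample values are made up and include one deliberate outlier to show why the mean and the median are worth comparing.

```python
import statistics

# Hypothetical measurements; the last value is a deliberate outlier.
values = [12.0, 15.0, 11.0, 14.0, 95.0]

summary = {
    "count": len(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "stdev": statistics.stdev(values),  # sample standard deviation
}

# A large gap between mean and median hints at skew or outliers.
skew_hint = summary["mean"] - summary["median"]
print(summary, skew_hint)
```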

Data Visualization

Best Practices
Selecting the right chart for the right data
Color palettes
Taking it to the next level
- Dashboards
- Interactive Visualizations
Storytelling with data

Which data storage to choose – from flat files, through SQL, NoSQL to massive distributed systems Training Course

Posted on October 6, 2023 by admin

Duration

7 hours (usually 1 day including breaks)

Requirements

Though no technical background is required, understanding the examples requires some knowledge of database theory (e.g. SQL).

Overview

This course helps customers choose the right data storage for their needs. It covers almost all modern approaches.

Course Outline

File Document Storage (Cloud Storage)
1. Features (OCR, scalability, search, etc.)
2. Open source examples (e.g. Nextcloud)
3. Some commercial examples
Flat file storage
1. XML databases
2. CSV databases
Relational databases
1. Normalization
2. Dependencies and constraints
3. Scalability – replication, clusters
4. Open source and commercial software (MySQL, PostgreSQL, DM7, Oracle, etc.)
NoSQL Storage
1. Document-oriented databases (MongoDB, CouchDB, etc.)
2. Column-oriented databases (Cassandra, Scylla, etc.)
3. Search-oriented databases (Elasticsearch, …)
NewSQL
1. CAP Theorem
2. Open source software (SequoiaDB, etc.)
Search Engines
1. Features (text processing, relevancy, etc…)
2. Open Source examples
3. Scalability, high availability, load balancing, etc.
Traditional Data Warehouses
1. Business intelligence, OLTP and data warehouses
2. Open source and commercial solutions
MapReduce and Distributed Parallel Processing
1. Hadoop-like (Hive, HDFS, Impala)
Distributed filesystem
1. Overview of open source options (Ceph, etc.)
In-memory Databases
1. Open source solutions (e.g. Apache Ignite)
Others
1. Hypertable (Google Bigtable)
2. BigQuery
3. AWS solutions (S3, etc.)
Beyond the present – future trends
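
To make the relational database items in the outline more concrete (normalization, dependencies and constraints), here is a minimal sketch using SQLite through Python's standard `sqlite3` module. The two-table schema (`customer`, `purchase`) and the sample rows are hypothetical. SQLite is used only because it needs no server, and it is not one of the products named in the outline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Normalized layout: customer details live in one table and purchases
# refer to them by key instead of repeating the customer name.
conn.executescript("""
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE purchase (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    amount      REAL NOT NULL CHECK (amount > 0)
);
""")

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ana')")
conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (1, 25.0)")
conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (1, 40.0)")

# A purchase for a customer that does not exist violates the constraint.
try:
    conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (99, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

totals = conn.execute(
    "SELECT c.name, SUM(p.amount) FROM customer c "
    "JOIN purchase p ON p.customer_id = c.id GROUP BY c.id"
).fetchall()
print(rejected, totals)
```

Flat files and many NoSQL stores leave this kind of integrity check to the application, which is one of the trade-offs to weigh when choosing a storage type.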