Introduction
- The Data Science Process
- Roles and responsibilities of a Data Scientist
Preparing the Development Environment
- Libraries, frameworks, languages and tools
- Local development
- Collaborative web-based development
Data Collection
- Different Types of Data
- Structured
- Local databases
- Database connectors
- Common formats: XLSX, XML, JSON, CSV, …
- Unstructured
- Clicks, sensors, smartphones
- APIs
- Internet of Things (IoT)
- Documents, pictures, videos, sounds
- Case study: Collecting large amounts of unstructured data continuously
Data Storage
- Relational databases
- Non-relational databases
- Hadoop: Distributed File System (HDFS)
- Spark: Resilient Distributed Dataset (RDD)
- Cloud storage
Data Preparation
- Ingestion, selection, cleansing, and transformation
- Ensuring data quality – correctness, meaningfulness, and security
- Exception reports
Languages used for Preparation, Processing and Analysis
- R language
- Introduction to R
- Data manipulation, calculation and graphical display
- Python
- Introduction to Python
- Manipulating, processing, cleaning, and crunching data
Data Analytics
- Exploratory analysis
- Basic statistics
- Draft visualizations
- Understand data
- Causality
- Features and transformations
- Machine Learning
- Supervised vs. unsupervised
- When to use what model
- Natural Language Processing (NLP)
Data Visualization
- Best Practices
- Selecting the right chart for the right data
- Color palettes
- Taking it to the next level
- Dashboards
- Interactive Visualizations
- Storytelling with data