Introduction
Data Science in Depth
- What is Plotly? What is Dash?
- Pandas overview
- NumPy overview
Plotly Basics
Preparing the Development Environment
- Installing and configuring Plotly
- Installing and configuring Dash
Dash Core Components
- Using dropdown and slider components
- Uploading CSV, XLS, and images
- Working with Dash layouts
- Converting Plotly plots to dashboards
- Using callbacks (see the sketch after this list)
- Working with inputs and outputs
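As an informal illustration of the components and callbacks listed above, here is a minimal sketch that wires a dropdown to a graph through a single callback; the bundled Plotly iris sample data and the component IDs are chosen only for this example.

    # Minimal sketch: a dropdown driving a graph through a callback.
    # The sample dataset and component IDs are chosen only for illustration.
    import plotly.express as px
    from dash import Dash, dcc, html, Input, Output

    df = px.data.iris()  # small sample dataset bundled with Plotly

    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Dropdown(
            id="species-dropdown",
            options=[{"label": s, "value": s} for s in df["species"].unique()],
            value="setosa",
        ),
        dcc.Graph(id="scatter-plot"),
    ])

    @app.callback(Output("scatter-plot", "figure"), Input("species-dropdown", "value"))
    def update_figure(species):
        # Filter to the selected species and redraw the scatter plot.
        filtered = df[df["species"] == species]
        return px.scatter(filtered, x="sepal_width", y="sepal_length")

    if __name__ == "__main__":
        app.run(debug=True)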
Dash Dashboards
- Pulling API data (see the sketch after this list)
- Building a Binance dashboard
- Connecting Dash components
- Using Alpha Vantage
- Cleaning data
- Controlling callbacks
- Updating graphs
- Working with layout updating
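The dashboard topics above center on pulling external market data and turning it into Plotly figures. The sketch below covers only the data-pulling and charting step, assuming the public Binance klines REST endpoint; the symbol, interval, and column handling are illustrative, and an Alpha Vantage feed would follow the same pattern with its own endpoint and API key.

    # Sketch: pull recent candlestick data from a public API and chart it with Plotly.
    # The endpoint, symbol, and column layout are assumptions for illustration.
    import pandas as pd
    import plotly.graph_objects as go
    import requests

    url = "https://api.binance.com/api/v3/klines"  # assumed public Binance endpoint
    params = {"symbol": "BTCUSDT", "interval": "1h", "limit": 100}
    rows = requests.get(url, params=params, timeout=10).json()

    # Keep only the open time and OHLC columns returned for each candle.
    df = pd.DataFrame(rows).iloc[:, :5]
    df.columns = ["open_time", "open", "high", "low", "close"]
    df["open_time"] = pd.to_datetime(df["open_time"], unit="ms")
    df[["open", "high", "low", "close"]] = df[["open", "high", "low", "close"]].astype(float)

    fig = go.Figure(go.Candlestick(
        x=df["open_time"], open=df["open"], high=df["high"],
        low=df["low"], close=df["close"],
    ))
    fig.show()

In a dashboard, the same fetching code would sit inside a Dash callback so the graph is refreshed whenever an input (or an interval timer) changes.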
Deployment
- Working with app authorization
- Deploying with Heroku
Summary and Conclusion
Introduction
Core Programming and Syntax in R
- Variables
- Loops
- Conditional statements
Fundamentals of R
- What are vectors?
- Functions and packages in R
Preparing the Development Environment
- Installing and configuring R and RStudio
- Setting up Rserve
Classifying Data
- Moving data between R and Tableau
- Preparing and cleaning data
- Modeling and scripting in R
Regressions in R and Tableau
- Creating a regression model
- Visualizing regressions
- Predicting and comparing values
Clustering and Models
- Working with clustering algorithms
- Creating clusters
- Visualizing clustered data
Advanced Analytics with R and Tableau
- Using CRISP-DM
- Working with TDSP models
- Summarizing data
Duration
35 hours (usually 5 days including breaks)
Requirements
- An understanding of data structures.
- Experience with programming.
Audience
- Programmers
- Data scientists
- Engineers
Overview
This training course helps participants prepare for web application development using Python programming with data analytics. The resulting data visualizations are a valuable decision-making tool for top management.
Course Outline
Day 1
- Data Science
- Data Science Team Composition (Data Scientist, Data Engineer, Data Visualizer, Process Owner)
- Business Intelligence
- Types of Business Intelligence
- Developing Business Intelligence Tools
- Business Intelligence and Data Visualization
- Data Visualization
- Importance of Data Visualization
- The Visual Data Presentation
- The Data Visualization Tools (infographics, dials and gauges, geographic maps, sparklines, heat maps, and detailed bar, pie and fever charts)
- Painting by Numbers and Playing with Colors in Making Visual Stories
- Activity
Day 2
- Data Visualization in Python Programming
- Data Science with Python
- Review on Python Fundamentals
- Variables and Data Types (str, numeric, sequence, mapping, set types, Boolean, binary, casting)
- Operators, Lists, Tuples, Sets, Dictionaries
- Conditional Statements
- Functions, Lambda, Arrays, Classes, Objects, Inheritance, Iterators
- Scope, Modules, Dates, JSON, RegEx, PIP
- Try / Except, Command Input, String Formatting (illustrated in the sketch below)
- File Handling
- Activity
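A short sketch tying together a few of the Day 2 topics above (try/except, string formatting, and file handling); the file name is only an example.

    # Sketch combining try/except, string formatting, and file handling.
    # "scores.txt" is an example file name; any text file of numbers would do.
    def average_from_file(path):
        try:
            with open(path) as handle:                 # file handling
                values = [float(line) for line in handle if line.strip()]
        except FileNotFoundError:
            print(f"{path} not found")                 # string formatting
            return None
        except ValueError:
            print("File contains a non-numeric line")
            return None
        return sum(values) / len(values) if values else 0.0

    print(f"Average: {average_from_file('scores.txt')}")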
Day 3
- Python and MySQL
- Creating Database and Table
- Manipulating Database (Insert, Select, Update, Delete, Where Statement, Order by)
- Drop Table
- Limit
- Joining Tables
- Removing List Duplicates
- Reverse a String
- Data Visualization with Python and MySQL
- Using Matplotlib (Basic Plotting; see the sketch below)
- Dictionaries and Pandas
- Logic, Control Flow and Filtering
- Manipulating Graphs Properties (Font, Size, Color Scheme)
- Activity
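A minimal sketch of the Day 3 flow: query MySQL from Python and plot the result with Matplotlib. The connection details, table, and column names are placeholders, and the font size and color lines illustrate the graph-property topics above.

    # Sketch: read rows from MySQL and plot them with Matplotlib.
    # Connection details and the "sales" table/columns are placeholders.
    import matplotlib.pyplot as plt
    import mysql.connector

    conn = mysql.connector.connect(
        host="localhost", user="demo_user", password="demo_pass", database="demo_db"
    )
    cursor = conn.cursor()
    cursor.execute("SELECT month, total FROM sales ORDER BY month")
    rows = cursor.fetchall()
    conn.close()

    months = [row[0] for row in rows]
    totals = [row[1] for row in rows]

    plt.bar(months, totals, color="steelblue")   # color scheme
    plt.title("Monthly Sales", fontsize=14)      # font and size
    plt.xlabel("Month")
    plt.ylabel("Total")
    plt.show()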
Day 4
- Plotting Data in Different Graph Formats (see the sketch after this list)
- Histogram
- Line
- Bar
- Box Plot
- Pie Chart
- Donut
- Scatter Plot
- Radar
- Area
- 2D / 3D Density Plot
- Dendrogram
- Map (Bubble, Heat)
- Stacked Chart
- Venn Diagram
- Seaborn
- Activity
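A brief sketch of three of the chart types listed above (histogram, box plot, and scatter plot), drawn with Seaborn on its bundled "tips" sample dataset; the other formats follow the same pattern with their respective plotting functions.

    # Sketch: histogram, box plot, and scatter plot side by side with Seaborn.
    # Uses the "tips" sample dataset that ships with Seaborn.
    import matplotlib.pyplot as plt
    import seaborn as sns

    tips = sns.load_dataset("tips")
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    sns.histplot(data=tips, x="total_bill", ax=axes[0])              # histogram
    sns.boxplot(data=tips, x="day", y="total_bill", ax=axes[1])      # box plot
    sns.scatterplot(data=tips, x="total_bill", y="tip", ax=axes[2])  # scatter plot

    plt.tight_layout()
    plt.show()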
Day 5
- Data Visualization with Python and MySQL
- Group Work: Create a Top Management Data Visualization Presentation Using ITDI Local ULIMS Data
- Presentation of Output
Duration
14 hours (usually 2 days including breaks)
Requirements
- An understanding of big data concepts (HDFS, Hive, etc.)
- An understanding of relational databases (MySQL, etc.)
- Experience with the Linux command line
Overview
Sqoop is an open source software tool for transferring data between Hadoop and relational databases or mainframes. It can be used to import data from a relational database management system (RDBMS), such as MySQL or Oracle, or from a mainframe into the Hadoop Distributed File System (HDFS). The data can then be transformed in Hadoop MapReduce and re-exported back into an RDBMS.
In this instructor-led, live training, participants will learn how to use Sqoop to import data from a traditional relational database into Hadoop storage such as HDFS or Hive, and vice versa.
By the end of this training, participants will be able to:
- Install and configure Sqoop
- Import data from MySQL to HDFS and Hive (see the sketch after this list)
- Import data from HDFS and Hive to MySQL
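As an informal illustration of the MySQL-to-HDFS import objective, the sketch below drives the Sqoop command line from Python via subprocess; the JDBC URL, credentials, table, and target directory are placeholders, and the same arguments can be run directly as a sqoop import shell command.

    # Sketch: invoke a Sqoop import from Python. All connection details are placeholders;
    # the same arguments can be run directly as a "sqoop import ..." shell command.
    import subprocess

    subprocess.run(
        [
            "sqoop", "import",
            "--connect", "jdbc:mysql://db-host/sales_db",  # placeholder JDBC URL
            "--username", "etl_user",
            "--password", "etl_pass",
            "--table", "orders",                           # placeholder source table
            "--target-dir", "/user/hadoop/orders",         # HDFS output directory
            "--num-mappers", "1",
        ],
        check=True,
    )

Adding --hive-import to the same command loads the table into Hive rather than into plain HDFS files.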
Audience
- System administrators
- Data engineers
Format of the Course
- Part lecture, part discussion, exercises and heavy hands-on practice
Note
- To request a customized training for this course, please contact us to arrange it.
Course Outline
Introduction
- Moving data from legacy data stores to Hadoop
Installing and Configuring Sqoop
Overview of Sqoop Features and Architecture
Importing Data from MySQL to HDFS
Importing Data from MySQL to Hive
Transforming Data in Hadoop
Importing Data from HDFS to MySQL
Importing Data from Hive to MySQL
Importing Incrementally with Sqoop Jobs
Troubleshooting
Summary and Conclusion
Duration
14 hours (usually 2 days including breaks)
Requirements
Good R knowledge.
Overview
R is a free, open-source programming language for statistical computing, data analysis, and graphics. It is used by a growing number of managers and data analysts in corporations and academia, and it offers a wide variety of packages for data mining.
Course Outline
Sources of methods
- Artificial intelligence
- Machine learning
- Statistics
- Sources of data
Preprocessing of data
- Data Import/Export
- Data Exploration and Visualization
- Dimensionality Reduction
- Dealing with missing values
- R Packages
Data mining main tasks
- Automatic or semi-automatic analysis of large quantities of data
- Extracting previously unknown, interesting patterns, such as:
- Groups of data records (cluster analysis)
- Unusual records (anomaly detection)
- Dependencies (association rule mining)
Data mining
- Anomaly detection (Outlier/change/deviation detection)
- Association rule learning (Dependency modeling)
- Clustering
- Classification
- Regression
- Summarization
- Frequent Pattern Mining
- Text Mining
- Decision Trees
- Neural Networks
- Sequence Mining
Data dredging, data fishing, data snooping