Power BI Dashboards Training Course

Power BI Desktop & Introducing Key Terms

  • Download and Install Power BI Desktop
  • Connect to Oracle Database Server
  • Understanding Power BI Screen → Report | Data | Relationships
  • Add, Rename, Duplicate, Hide and Delete Pages
  • Get Data from Excel Files
  • Get Data from Text Files
  • Load Data from Multiple Data Sources
  • Remove Unwanted Columns from Tables

Power BI Charts

  • Column Chart
  • Bar Chart
  • Card
  • Clustered Column Chart
  • Introducing natural-language queries
  • Importing custom visuals

Power BI Filters

  • Slicer
  • Basic Filters
  • Advanced Filters
  • Top N Filters
  • Filters on Measures
  • Page Level Filters
  • Report Level Filters
  • Drill through Filters

Working with Power BI Dashboards & Best Practices

  • Dashboard best practices
  • Understanding Relationships
  • Dashboard Actions
  • Add Reports to a Dashboard
  • Add Title to Dashboard
  • Add Image to Dashboard
  • Add Video to Dashboard
  • Add Web Content to Dashboard
  • Dashboard Settings
  • Delete a Dashboard
  • Pin Report to a Dashboard

Sharing Power BI Work

  • Inviting a user to see a dashboard (Share Dashboard)
    • Internal
    • External (Inviting users outside your organization)
  • Share a Report
  • Sharing Workspace
  • Understanding data refresh
    • Configuring automatic refresh

Power BI Administration (behind the scenes)

  • Understanding the Power BI admin portal
    • The admin portal provides five features:
      • Usage metrics
      • Users
      • Audit logs
      • Tenant settings
      • Premium settings
  • Assign users to the admin role in Office 365 (Office 365 admin center)
  • Three actors in play for administration

Power BI Security Access Control

  • Giving access to Apps and Content Packs
  • Row Level Security
  • Managing Users and Licenses
    • Enabling / disabling users
    • Audit Power BI Activity

Pentaho Open Source BI Suite Community Edition (CE) Training Course

Introduction to Pentaho Open Source BI Suite Community Edition (CE)

Overview of CE Features and Architecture

  • Pentaho Community Edition vs. Enterprise Edition
  • Pentaho CE Tools

Installing and Configuring Pentaho CE

Using the Pentaho CE Business Analytics User Console

Creating Reports with the Pentaho CE Business Analytics Report Designer

Performing Data Integration in Pentaho CE

Working with Databases in Pentaho CE

  • Relational Databases
  • NoSQL Sources
  • Analytic Databases

Working with the Analysis View in Pentaho CE

  • Predictive Analytics

Working with Big Data in Pentaho CE

  • Graphical Designer for Big Data

Making the Most of the Pentaho CE Community Online Forums

Deploying or Embedding Your Pentaho CE Project

  • Licensing

Troubleshooting

AWS QuickSight Training Course

Introduction

  • Overview of AWS QuickSight
  • What are AWS and QuickSight?

Getting Started with AWS QuickSight

  • Creating an AWS and QuickSight account
  • Understanding the QuickSight workflow
  • Navigating the QuickSight UI

Preparing Data in QuickSight

  • Understanding data preparation in QuickSight
  • SPICE vs. direct query
  • Uploading and importing data to QuickSight
  • Working with columns and fields
  • Understanding calculated fields, functions, and operators
  • Adding calculated fields using strings to our project
  • Extracting information out of strings
  • Using conditional functions (see the sketch after this list)
  • Creating calculated fields with numeric values
  • Adding different filters to a project
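
The conditional and string functions above are QuickSight calculated-field expressions (for example, its ifelse() function). As a rough illustration of the same logic outside QuickSight, here is a minimal pandas sketch; the column names and threshold are hypothetical:

```python
import pandas as pd

# Hypothetical sales records, standing in for a QuickSight data set.
df = pd.DataFrame({
    "customer": ["Acme", "Globex", "Initech"],
    "revenue": [1200.0, 300.0, 870.0],
})

# Equivalent of a QuickSight conditional calculated field such as
# ifelse(revenue > 1000, 'High', 'Standard').
df["segment"] = df["revenue"].apply(lambda r: "High" if r > 1000 else "Standard")

# Equivalent of a string-extraction calculated field (e.g. left(customer, 3)).
df["prefix"] = df["customer"].str[:3]

print(df)
```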

Analyzing and Visualizing Data

  • Understanding the difference between preparing and analyzing data
  • Creating the data analysis
  • Creating visuals
  • Understanding dimensions and measures
  • Adding additional data sets
  • Field formatting, aggregation, and granularity
  • Formatting visuals
  • Creating a story and treemap
  • Using filters and tables
  • Adding a KPI visual

Exporting and Sharing Project Data

  • Understanding refresh and schedule refresh
  • Exporting project data as .csv files
  • Adding users to an account
  • Sharing data set and analysis
  • Creating and sharing dashboards

Using Databases as Data Sources

  • Setting up a database
  • Preparing dummy data
  • Connecting QuickSight to a database
  • Importing data into SPICE
  • Importing data as a query
  • Importing calculated fields and queries
  • Using NoSQL databases

Business Intelligence with SSAS Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

  • An understanding of data analysis
  • Experience with Microsoft SQL Server

Overview

SSAS (SQL Server Analysis Services) is a Microsoft SQL Server online analytical processing (OLAP) and data mining tool for analyzing data across multiple databases, tables, or files. The semantic data models provided by SSAS are used by client applications such as Power BI, Excel, Reporting Services, and other data visualization tools.

In this instructor-led, live training (onsite or remote), participants will learn how to use SSAS to analyze large volumes of data in databases and data warehouses.

By the end of this training, participants will be able to:

  • Install and configure SSAS
  • Understand the relationship between SSAS, SSIS, and SSRS
  • Apply multidimensional data modeling to extract business insights from data
  • Design OLAP (Online Analytical Processing) cubes 
  • Query and manipulate multidimensional data using the MDX (Multidimensional Expressions) query language (a sample query follows this list)
  • Deploy real-world BI solutions using SSAS
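
The MDX objective above is easiest to picture with a concrete query. A minimal sketch, assuming the Adventure Works sample cube; the Python wrapper only holds the query text, since executing MDX requires a live SSAS connection (e.g. an XMLA endpoint) that is not shown here:

```python
# Hedged example: the cube, measure, and hierarchy names come from the
# Adventure Works sample database and are illustrative only.
mdx = """
SELECT [Measures].[Sales Amount] ON COLUMNS,
       NON EMPTY [Date].[Calendar Year].MEMBERS ON ROWS
FROM [Adventure Works]
"""
print(mdx)
```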

Audience

  • BI (Business Intelligence) professionals
  • Data Analysts
  • Database and data warehousing professionals

 
Format of the Course

  • Interactive lecture and discussion
  • Lots of exercises and practice
  • Hands-on implementation in a live-lab environment

Course Customization Options

  • This training is based on the latest version of Microsoft SQL Server and SSAS.
  • To request a customized training for this course, please contact us to arrange it.

Course Outline

Introduction

Installing and Configuring SSAS

Overview of SSAS Features and Architecture

Data Insight and Business Intelligence

Operational Analytics

New Functionality: Columnstore Indexes

Querying Data in SSAS Tabular

Querying Multidimensional Data

Enhanced SSIS

Enhanced MDS

Troubleshooting

Summary and Conclusion

Developing in SQL Server Business Intelligence Training Course

Duration

35 hours (usually 5 days including breaks)

Requirements

Before attending this course, students must have:

  • Understanding of relational database concepts
  • Experience of querying a relational database
  • Fundamental understanding of reporting and analysis

Overview

Developers, analysts and business users need to quickly analyse large amounts of data, get insight into that data, retrieve hidden knowledge inside it, and report against various data sources professionally and effectively. This course will enable them to do this with SQL Server Business Intelligence. The 5-day course will cover the SQL Server components and tools used for BI projects.

The course will also cover the new features in SQL Server BI 2016. Reporting Services, Analysis Services and Integration Services will be explored and worked with. The Business Intelligence components of SQL Server are completely independent of the SQL Server databases, so they can be used with any data source; prior knowledge of SQL Server itself is therefore not required.

4 Top Takeaways from the Course

  1. Make your existing data work harder for you
  2. Integrate diverse data stores to a single repository
  3. Transform raw data to business intelligence
  4. Create eloquent reports from raw data

Course Outline

Module 1: Introduction to SQL Server Reporting Services – SSRS

  • Overview of SSRS
  • Installing Reporting Services
  • The Reporting Lifecycle
  • Highlights of Reporting Services
  • Reporting Services Scenarios
  • Reporting Services Developer Tools

Module 2: Authoring Basic Reports

  • Creating a Basic Table Report
  • Report Definition Language
  • Accessing Data
  • Formatting Report Pages
  • Headers and Footers
  • Calculating Values
  • Common Aggregate Functions

Module 3: Enhancing Reports

  • Interactive Navigation
  • Use Show/Hide to Provide Drill-Down Interactivity
  • Navigate From Report To Report Using Links
  • Working with Data Regions
  • Using Report Manager
  • Distribute and Manage Published Reports

Module 4: Introduction to SQL Server Integration Services – SSIS

  • What Is SSIS?
  • When to Use SSIS?
  • SSIS Architecture
  • Integration Services Scenarios
  • Integration Services Developer Tools
  • Control Flow and Data Flow Design Surfaces
  • Migrating data across Excel, Flat Files, XML and database providers
  • Adding Tasks to a Package

Module 5: SSIS Going Further

  • Building a Package
  • Troubleshooting a Package
  • Deploying a Package to the SSIS Server
  • Scheduling a Package with SQL Server Agent
  • Securing a Package in Management Studio
  • Using Variables, Event Handlers and Configurations

Module 6: Introduction to SQL Server Analysis Services – SSAS

  • What Is SSAS?
  • When to Use SSAS
  • SSAS Architecture
  • Analysis Services Tools
  • Understanding Fact and Dimension tables

Module 7: SSAS Going Further

  • Creating an Analysis Services Project in Visual Studio
  • Fact and Dimension Tables
  • Creating a Data Source
  • Creating a Data Source View
  • Creating a Cube Object Definition
  • Deploy Definitions to OLAP Server and Load in Data
  • Browse a Cube to Query Data
  • Creating Key Performance Indicators (KPIs) to Give Meaning to Data

Business Intelligence in MS SQL Server 2008 Training Course

Duration

14 hours (usually 2 days including breaks)

Requirements

Knowledge of Windows, basic knowledge of SQL and relational databases.

Overview

This training covers the basics of creating a data warehouse environment based on MS SQL Server 2008.

Participants gain the foundations for designing and building a data warehouse that runs on MS SQL Server 2008.

They learn how to build a simple ETL process with SSIS and then design and implement a data cube using SSAS.

Participants will be able to manage an OLAP database: creating and deleting OLAP databases, processing partitions, and making changes online.

They will also acquire a working knowledge of XMLA scripting and the MDX language.

Course Outline

  • Basics, objectives and applications of data warehouses; types of data warehouse servers
  • Building basic ETL processes with SSIS
  • Basic design of data cubes in Analysis Services: measure groups and measures
  • Dimensions, hierarchies and attributes
  • Developing the cube project: calculated measures, partitions, perspectives, translations, actions and KPIs
  • Build and deploy; processing partitions
  • XMLA basics: full and incremental partition processing, deleting partitions, processing aggregations
  • MDX language basics

Impala for Business Intelligence Training Course

Duration

21 hours (usually 3 days including breaks)

Requirements

  • knowledge of SQL

Overview

Cloudera Impala is an open source massively parallel processing (MPP) SQL query engine for Apache Hadoop clusters.

Impala enables users to issue low-latency SQL queries against data stored in the Hadoop Distributed File System (HDFS) and Apache HBase without requiring data movement or transformation.
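
As a concrete illustration of issuing such low-latency queries, here is a minimal sketch using the third-party impyla package (pip install impyla); the host, table and HDFS path are placeholders, and 21050 is Impala's default port for this protocol:

```python
from impala.dbapi import connect

# Connect to an Impala daemon (hostname is a placeholder).
conn = connect(host="impala-host.example.com", port=21050)
cur = conn.cursor()

# Point an Impala table at data files already sitting in HDFS,
# then query them in place -- no data movement required.
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        ts STRING, url STRING, status INT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/web_logs'
""")
cur.execute("SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status")
for row in cur.fetchall():
    print(row)
```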

Audience

This course is aimed at analysts and data scientists performing analysis on data stored in Hadoop via Business Intelligence or SQL tools.

After this course, delegates will be able to:

  • Extract meaningful information from Hadoop clusters with Impala.
  • Write specific programs to facilitate Business Intelligence in the Impala SQL dialect.
  • Troubleshoot Impala.

Course Outline

Introduction to Impala

  • What is Impala?
  • How Impala Differs from Relational Databases
  • Limitations and Future Directions
  • Using the Impala Shell
  • The Impala Daemon, StateStore, and Catalog Service

Loading Data into Impala

  • Explore a New Impala Instance
  • Load CSV Data from Local Files
  • Point an Impala Table at Existing Data Files

Analyzing Data with Impala

  • Describe the Impala Table
  • Basic Syntax and Querying
  • Data Types
  • Filtering, Sorting, and Limiting Results
  • Joining and Grouping Data
  • Data Loading and Querying Examples
  • Improving Impala Performance
  • How Impala works with Hadoop file formats
  • Hands-On Exercise: Interactive Analysis with Impala

Programming Impala Applications

  • Overview of the Impala SQL Dialect
  • Overview of Impala Programming Interfaces

Troubleshooting Impala

  • Troubleshooting Impala SQL Syntax Issues
  • Troubleshooting I/O Capacity Problems
  • Impala Web User Interface for Debugging

Big Data Business Intelligence for Criminal Intelligence Analysis Training Course

Duration

35 hours (usually 5 days including breaks)

Requirements

  • Knowledge of law enforcement processes and data systems
  • Basic understanding of SQL/Oracle or relational database
  • Basic understanding of statistics (at Spreadsheet level)

Overview

Advances in technologies and the increasing amount of information are transforming how law enforcement is conducted. The challenges that Big Data pose are nearly as daunting as Big Data’s promise. Storing data efficiently is one of these challenges; effectively analyzing it is another.

In this instructor-led, live training, participants will learn the mindset with which to approach Big Data technologies, assess their impact on existing processes and policies, and implement these technologies for the purpose of identifying criminal activity and preventing crime. Case studies from law enforcement organizations around the world will be examined to gain insights on their adoption approaches, challenges and results.

By the end of this training, participants will be able to:

  • Combine Big Data technology with traditional data gathering processes to piece together a story during an investigation
  • Implement industrial big data storage and processing solutions for data analysis
  • Prepare a proposal for the adoption of the most adequate tools and processes for enabling a data-driven approach to criminal investigation

Audience

  • Law Enforcement specialists with a technical background

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

=====
Day 01
=====
Overview of Big Data Business Intelligence for Criminal Intelligence Analysis

  • Case Studies from Law Enforcement – Predictive Policing
  • Big Data adoption rate in Law Enforcement Agencies and how they are aligning their future operation around Big Data Predictive Analytics
  • Emerging technology solutions such as gunshot sensors, surveillance video and social media
  • Using Big Data technology to mitigate information overload
  • Interfacing Big Data with Legacy data
  • Basic understanding of enabling technologies in predictive analytics
  • Data Integration & Dashboard visualization
  • Fraud management
  • Business Rules and Fraud detection
  • Threat detection and profiling
  • Cost benefit analysis for Big Data implementation

Introduction to Big Data

  • Main characteristics of Big Data — Volume, Variety, Velocity and Veracity.
  • MPP (Massively Parallel Processing) architecture
  • Data Warehouses – static schema, slowly evolving dataset
  • MPP Databases: Greenplum, Exadata, Teradata, Netezza, Vertica etc.
  • Hadoop Based Solutions – no conditions on structure of dataset.
  • Typical pattern: HDFS, MapReduce (crunch), retrieve from HDFS
  • Apache Spark for stream processing
  • Batch – suited for analytical/non-interactive workloads
  • Volume: CEP streaming data
  • Typical choices – CEP products (e.g. Infostreams, Apama, MarkLogic etc.)
  • Less production ready – Storm/S4
  • NoSQL Databases (columnar and key-value): best suited as an analytical adjunct to a data warehouse/database

NoSQL solutions

  • KV Store – Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB)
  • KV Store – Dynamo, Voldemort, Dynomite, SubRecord, MotionDb, DovetailDB
  • KV Store (Hierarchical) – GT.M, Caché
  • KV Store (Ordered) – TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord
  • KV Cache – Memcached, Repcached, Coherence, Infinispan, eXtreme Scale, JBossCache, Velocity, Terracotta
  • Tuple Store – Gigaspaces, Coord, Apache River
  • Object Database – ZopeDB, db4o, Shoal
  • Document Store – CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris
  • Wide Columnar Store – BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI

Varieties of Data: Introduction to Data Cleaning issues in Big Data

  • RDBMS – static structure/schema, does not promote agile, exploratory environment.
  • NoSQL – semi structured, enough structure to store data without exact schema before storing data
  • Data cleaning issues

Hadoop

  • When to select Hadoop?
  • STRUCTURED – Enterprise data warehouses/databases can store massive data (at a cost) but impose structure (not good for active exploration)
  • SEMI STRUCTURED data – difficult to carry out using traditional solutions (DW/DB)
  • Warehousing data = HUGE effort and static even after implementation
  • For variety & volume of data, crunched on commodity hardware – HADOOP
  • Commodity H/W needed to create a Hadoop Cluster

Introduction to Map Reduce /HDFS

  • MapReduce – distribute computing over multiple servers
  • HDFS – make data available locally for the computing process (with redundancy)
  • Data – can be unstructured/schema-less (unlike RDBMS)
  • Developer responsibility to make sense of data
  • Programming MapReduce = working with Java (pros/cons), manually loading data into HDFS (see the streaming sketch after this list)
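
One of the pros and cons noted above is that MapReduce is usually programmed in Java; Hadoop Streaming relaxes this by letting any executable act as mapper or reducer. A minimal word-count sketch in Python (the invocation in the comment is schematic, and the paths are placeholders):

```python
#!/usr/bin/env python3
# Run under Hadoop Streaming, e.g. (schematically):
#   hadoop jar hadoop-streaming.jar -files wordcount.py \
#     -input /in -output /out \
#     -mapper "wordcount.py map" -reducer "wordcount.py reduce"
import sys

def map_phase():
    # Emit "word<TAB>1" for every word on stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reduce_phase():
    # Input arrives sorted by key, so counts accumulate per word.
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    map_phase() if sys.argv[1:] == ["map"] else reduce_phase()
```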

=====
Day 02
=====
Big Data Ecosystem — Building Big Data ETL (Extract, Transform, Load) — Which Big Data Tools to use and when?

  • Hadoop vs. Other NoSQL solutions
  • For interactive, random access to data
  • HBase (column-oriented database) on top of Hadoop
  • Random access to data but restrictions imposed (max 1 PB)
  • Not good for ad-hoc analytics, good for logging, counting, time-series
  • Sqoop – Import from databases to Hive or HDFS (JDBC/ODBC access)
  • Flume – Stream data (e.g. log data) into HDFS

Big Data Management System

  • Moving parts, compute nodes start/fail: ZooKeeper – for configuration/coordination/naming services
  • Complex pipeline/workflow: Oozie – manage workflow, dependencies, daisy chain
  • Deploy, configure, cluster management, upgrade etc. (sys admin): Ambari
  • In Cloud: Whirr

Predictive Analytics — Fundamental Techniques and Machine Learning based Business Intelligence

  • Introduction to Machine Learning
  • Learning classification techniques
  • Bayesian Prediction — preparing a training file
  • Support Vector Machine
  • KNN p-Tree Algebra & vertical mining
  • Neural Networks
  • Big Data large variable problem – Random forest (RF; see the sketch after this list)
  • Big Data Automation problem – Multi-model ensemble RF
  • Automation through Soft10-M
  • Text analytics tool: Treeminer
  • Agile learning
  • Agent based learning
  • Distributed learning
  • Introduction to open source tools for predictive analytics: R, Python, RapidMiner, Mahout
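
To make the random forest item above concrete, here is a minimal scikit-learn sketch on synthetic data: each tree sees a random subset of features, which is what lets RF cope with the "large variable" problem:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data with many features, only a few of them informative.
X, y = make_classification(n_samples=1000, n_features=100,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```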

Predictive Analytics Ecosystem and its application in Criminal Intelligence Analysis

  • Technology and the investigative process
  • Insight analytics
  • Visualization analytics
  • Structured predictive analytics
  • Unstructured predictive analytics
  • Threat/fraudster/vendor profiling
  • Recommendation Engine
  • Pattern detection
  • Rule/Scenario discovery – failure, fraud, optimization
  • Root cause discovery
  • Sentiment analysis
  • CRM analytics
  • Network analytics
  • Text analytics for obtaining insights from transcripts, witness statements, internet chatter, etc.
  • Technology assisted review
  • Fraud analytics
  • Real-time analytics

=====
Day 03
=====
Real Time and Scalable Analytics Over Hadoop

  • Why common analytic algorithms fail in Hadoop/HDFS
  • Apache Hama – for Bulk Synchronous Parallel distributed computing
  • Apache Spark – for cluster computing and real-time analytics
  • CMU GraphLab 2 – graph-based asynchronous approach to distributed computing
  • KNN p-Tree Algebra based approach from Treeminer for reduced hardware cost of operation

Tools for eDiscovery and Forensics

  • eDiscovery over Big Data vs. Legacy data – a comparison of cost and performance
  • Predictive coding and Technology Assisted Review (TAR)
  • Live demo of vMiner for understanding how TAR enables faster discovery
  • Faster indexing through HDFS – Velocity of data
  • NLP (Natural Language processing) – open source products and techniques
  • eDiscovery in foreign languages — technology for foreign language processing

Big Data BI for Cyber Security – Getting a 360-degree view, speedy data collection and threat identification

  • Understanding the basics of security analytics — attack surface, security misconfiguration, host defenses
  • Network infrastructure / large data pipe / response ETL for real-time analytics
  • Prescriptive vs. predictive – fixed rule based vs. auto-discovery of threat rules from metadata

Gathering disparate data for Criminal Intelligence Analysis

  • Using IoT (Internet of Things) as sensors for capturing data
  • Using Satellite Imagery for Domestic Surveillance
  • Using surveillance and image data for criminal identification
  • Other data gathering technologies — drones, body cameras, GPS tagging systems and thermal imaging technology
  • Combining automated data retrieval with data obtained from informants, interrogation, and research
  • Forecasting criminal activity

=====
Day 04
=====
Fraud Prevention BI from Big Data – Fraud Analytics

  • Basic classification of Fraud Analytics — rules-based vs predictive analytics
  • Supervised vs. unsupervised machine learning for fraud pattern detection (see the sketch after this list)
  • Business to business fraud, medical claims fraud, insurance fraud, tax evasion and money laundering
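
As a sketch of the unsupervised side of fraud pattern detection, an isolation forest flags records that are easy to isolate, i.e. unusual. The transaction amounts below are invented; real inputs would be features from claims or payment records:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=100, scale=20, size=(500, 1))  # typical amounts
fraud = np.array([[950.0], [1200.0]])                  # planted outliers
X = np.vstack([normal, fraud])

model = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = model.predict(X)  # -1 = anomaly, 1 = normal
print("flagged amounts:", X[labels == -1].ravel())
```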

Social Media Analytics — Intelligence gathering and analysis

  • How Social Media is used by criminals to organize, recruit and plan
  • Big Data ETL API for extracting social media data
  • Text, image, metadata and video
  • Sentiment analysis from social media feed
  • Contextual and non-contextual filtering of social media feed
  • Social Media Dashboard to integrate diverse social media
  • Automated profiling of social media profiles
  • Live demos of each analytic will be given using the Treeminer tool

Big Data Analytics in image processing and video feeds

  • Image Storage techniques in Big Data — Storage solution for data exceeding petabytes
  • LTFS (Linear Tape File System) and LTO (Linear Tape Open)
  • GPFS-LTFS (General Parallel File System – Linear Tape File System) — layered storage solution for big image data
  • Fundamentals of image analytics
  • Object recognition
  • Image segmentation
  • Motion tracking
  • 3-D image reconstruction

Biometrics, DNA and Next Generation Identification Programs

  • Beyond fingerprinting and facial recognition
  • Speech recognition, keystroke dynamics (analyzing a user's typing pattern) and CODIS (Combined DNA Index System)
  • Beyond DNA matching: using forensic DNA phenotyping to construct a face from DNA samples

Big Data Dashboard for quick accessibility of diverse data and display:

  • Integration of existing application platform with Big Data Dashboard
  • Big Data management
  • Case Study of Big Data Dashboard: Tableau and Pentaho
  • Use Big Data apps to push location-based services in government
  • Tracking system and management

=====
Day 05
=====
How to justify Big Data BI implementation within an organization:

  • Defining the ROI (Return on Investment) for implementing Big Data
  • Case studies on saving analyst time in the collection and preparation of data – increasing productivity
  • Revenue gain from lower database licensing costs
  • Revenue gain from location-based services
  • Cost savings from fraud prevention
  • An integrated spreadsheet approach for calculating approximate expenses vs. revenue gains/savings from a Big Data implementation (a minimal calculation sketch follows this list)
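
A toy version of that spreadsheet calculation, with all figures as hypothetical placeholders:

```python
# ROI = (total gain - implementation cost) / implementation cost
implementation_cost = 500_000   # hardware, software, staffing (hypothetical)
annual_savings = 150_000        # analyst time saved
annual_new_revenue = 120_000    # e.g. location-based services
years = 3

total_gain = years * (annual_savings + annual_new_revenue)
roi = (total_gain - implementation_cost) / implementation_cost
print(f"{years}-year ROI: {roi:.0%}")
```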

Step-by-step procedure for replacing a legacy data system with a Big Data system

  • Big Data Migration Roadmap
  • What critical information is needed before architecting a Big Data system?
  • What are the different ways of calculating the volume, velocity, variety and veracity of data?
  • How to estimate data growth
  • Case studies

Review of Big Data vendors and their products

  • Accenture
  • APTEAN (Formerly CDC Software)
  • Cisco Systems
  • Cloudera
  • Dell
  • EMC
  • GoodData Corporation
  • Guavus
  • Hitachi Data Systems
  • Hortonworks
  • HP
  • IBM
  • Informatica
  • Intel
  • Jaspersoft
  • Microsoft
  • MongoDB (Formerly 10Gen)
  • MU Sigma
  • Netapp
  • Opera Solutions
  • Oracle
  • Pentaho
  • Platfora
  • Qliktech
  • Quantum
  • Rackspace
  • Revolution Analytics
  • Salesforce
  • SAP
  • SAS Institute
  • Sisense
  • Software AG/Terracotta
  • Soft10 Automation
  • Splunk
  • Sqrrl
  • Supermicro
  • Tableau Software
  • Teradata
  • Think Big Analytics
  • Tidemark Systems
  • Treeminer
  • VMware (Part of EMC)

Q/A session

Big Data Business Intelligence for Telecom and Communication Service Providers Training Course

Duration

35 hours (usually 5 days including breaks)

Requirements

  • Basic knowledge of business operations and data systems in the Telecom domain
  • Basic understanding of SQL/Oracle or relational databases
  • Basic understanding of statistics (at spreadsheet level)

Overview

Communications service providers (CSPs) are facing pressure to reduce costs and maximize average revenue per user (ARPU), while ensuring an excellent customer experience, but data volumes keep growing. Global mobile data traffic will grow at a compound annual growth rate (CAGR) of 78 percent to 2016, reaching 10.8 exabytes per month.

Meanwhile, CSPs are generating large volumes of data, including call detail records (CDR), network data and customer data. Companies that fully exploit this data gain a competitive edge. According to a recent survey by The Economist Intelligence Unit, companies that use data-directed decision-making enjoy a 5-6% boost in productivity. Yet 53% of companies leverage only half of their valuable data, and one-fourth of respondents noted that vast quantities of useful data go untapped. The data volumes are so high that manual analysis is impossible, and most legacy software systems can’t keep up, resulting in valuable data being discarded or ignored.

With Big Data & Analytics’ high-speed, scalable big data software, CSPs can mine all their data for better decision making in less time. Different Big Data products and techniques provide an end-to-end software platform for collecting, preparing, analyzing and presenting insights from big data. Application areas include network performance monitoring, fraud detection, customer churn detection and credit risk analysis. Big Data & Analytics products scale to handle terabytes of data, but implementing such tools requires new kinds of cloud-based database systems, such as Hadoop, or massively parallel computing processors (KPU etc.).

This course on Big Data BI for Telco covers the emerging areas in which CSPs are investing for productivity gains and new business revenue streams. It provides a complete 360-degree overview of Big Data BI in Telco, so that decision makers and managers can gain a wide, comprehensive view of the possibilities of Big Data BI in Telco for productivity and revenue gain.

Course objectives

The main objective of the course is to introduce new Big Data business intelligence techniques across four sectors of the Telecom business (Marketing/Sales, Network Operations, Financial Operations and Customer Relationship Management). Students will be introduced to the following:

  • Introduction to Big Data-what is 4Vs (volume, velocity, variety and veracity) in Big Data- Generation, extraction and management from Telco perspective
  • How Big Data analytic differs from legacy data analytic
  • In-house justification of Big Data -Telco perspective
  • Introduction to the Hadoop ecosystem – familiarity with all Hadoop tools like Hive, Pig and Spark – when and how they are used to solve Big Data problems
  • How Big Data is extracted for analytics tools – how business analysts can reduce their pain points in the collection and analysis of data through an integrated Hadoop dashboard approach
  • Basic introduction of Insight analytics, visualization analytics and predictive analytics for Telco
  • Customer Churn analytic and Big Data-how Big Data analytic can reduce customer churn and customer dissatisfaction in Telco-case studies
  • Network failure and service failure analytics from Network meta-data and IPDR
  • Financial analysis-fraud, wastage and ROI estimation from sales and operational data
  • Customer acquisition problem-Target marketing, customer segmentation and cross-sale from sales data
  • Introduction and summary of all Big Data analytic products and where they fit into Telco analytic space
  • Conclusion-how to take step-by-step approach to introduce Big Data Business Intelligence in your organization

Target Audience

  • Network operations and financial managers, CRM managers, and top IT managers in the Telco CIO office
  • Business Analysts in Telco
  • CFO office managers/analysts
  • Operational managers
  • QA managers

Course Outline

Breakdown of topics on a daily basis (each session is 2 hours):

Day-1: Session-1: Business overview of why Big Data Business Intelligence in Telco

  • Case Studies from T-Mobile, Verizon etc.
  • Big Data adoption rate among North American Telcos and how they are aligning their future business model and operation around Big Data BI
  • Broad Scale Application Area
  • Network and Service management
  • Customer Churn Management
  • Data Integration & Dashboard visualization
  • Fraud management
  • Business Rule generation
  • Customer profiling
  • Localized Ad pushing

Day-1: Session-2: Introduction to Big Data-1

  • Main characteristics of Big Data – volume, variety, velocity and veracity; MPP architecture for volume
  • Data Warehouses – static schema, slowly evolving dataset
  • MPP Databases like Greenplum, Exadata, Teradata, Netezza, Vertica etc.
  • Hadoop Based Solutions – no conditions on structure of dataset.
  • Typical pattern: HDFS, MapReduce (crunch), retrieve from HDFS
  • Batch – suited for analytical/non-interactive workloads
  • Volume: CEP streaming data
  • Typical choices – CEP products (e.g. Infostreams, Apama, MarkLogic etc.)
  • Less production ready – Storm/S4
  • NoSQL Databases (columnar and key-value): best suited as an analytical adjunct to a data warehouse/database

Day-1: Session-3: Introduction to Big Data-2

NoSQL solutions

  • KV Store – Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB)
  • KV Store – Dynamo, Voldemort, Dynomite, SubRecord, MotionDb, DovetailDB
  • KV Store (Hierarchical) – GT.M, Caché
  • KV Store (Ordered) – TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord
  • KV Cache – Memcached, Repcached, Coherence, Infinispan, eXtreme Scale, JBossCache, Velocity, Terracotta
  • Tuple Store – Gigaspaces, Coord, Apache River
  • Object Database – ZopeDB, db4o, Shoal
  • Document Store – CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris
  • Wide Columnar Store – BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI

Varieties of Data: Introduction to Data Cleaning issues in Big Data

  • RDBMS – static structure/schema, doesn’t promote agile, exploratory environment.
  • NoSQL – semi structured, enough structure to store data without exact schema before storing data
  • Data cleaning issues

Day-1: Session-4: Big Data Introduction-3: Hadoop

  • When to select Hadoop?
  • STRUCTURED – Enterprise data warehouses/databases can store massive data (at a cost) but impose structure (not good for active exploration)
  • SEMI STRUCTURED data – tough to do with traditional solutions (DW/DB)
  • Warehousing data = HUGE effort and static even after implementation
  • For variety & volume of data, crunched on commodity hardware – HADOOP
  • Commodity H/W needed to create a Hadoop Cluster

Introduction to Map Reduce /HDFS

  • MapReduce – distribute computing over multiple servers
  • HDFS – make data available locally for the computing process (with redundancy)
  • Data – can be unstructured/schema-less (unlike RDBMS)
  • Developer responsibility to make sense of data
  • Programming MapReduce = working with Java (pros/cons), manually loading data into HDFS

Day-2: Session-1.1: Spark: In-memory distributed database

  • What is “In memory” processing?
  • Spark SQL
  • Spark SDK
  • Spark API
  • RDD
  • Spark Lib
  • HANA
  • How to migrate an existing Hadoop system to Spark (a minimal PySpark sketch follows this list)
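
A minimal PySpark sketch of the in-memory DataFrame and Spark SQL APIs covered in this session; the HDFS path and column names are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdr-demo").getOrCreate()

# Read call detail records and cache them in memory for repeated queries.
cdr = spark.read.csv("hdfs:///data/cdr.csv", header=True,
                     inferSchema=True).cache()

# Spark SQL and the DataFrame API are two views of the same data.
cdr.createOrReplaceTempView("cdr")
spark.sql("SELECT caller, COUNT(*) AS calls FROM cdr GROUP BY caller").show(5)
cdr.groupBy("caller").agg(F.sum("duration").alias("total_secs")).show(5)

spark.stop()
```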

Day-2: Session-1.2: Storm – real-time processing in Big Data

  • Streams
  • Spouts
  • Bolts
  • Topologies

Day-2: Session-2: Big Data Management System

  • Moving parts, compute nodes start/fail: ZooKeeper – for configuration/coordination/naming services
  • Complex pipeline/workflow: Oozie – manage workflow, dependencies, daisy chain
  • Deploy, configure, cluster management, upgrade etc. (sys admin): Ambari
  • In Cloud: Whirr
  • Evolving Big Data platform tools for tracking
  • ETL layer application issues

Day-2: Session-3: Predictive analytics in Business Intelligence-1: Fundamental Techniques & Machine Learning based BI:

  • Introduction to Machine learning
  • Learning classification techniques
  • Bayesian Prediction – preparing a training file
  • Markov random field
  • Supervised and unsupervised learning
  • Feature extraction
  • Support Vector Machine
  • Neural Network
  • Reinforcement learning
  • Big Data large variable problem – Random forest (RF)
  • Representation learning
  • Deep learning
  • Big Data Automation problem – Multi-model ensemble RF
  • Automation through Soft10-M
  • LDA and topic modeling
  • Agile learning
  • Agent based learning – example from Telco operations
  • Distributed learning – example from Telco operations
  • Introduction to open source tools for predictive analytics: R, RapidMiner, Mahout
  • More scalable analytics – Apache Hama, Spark and CMU GraphLab

Day-2: Session-4: Predictive analytics ecosystem-2: Common predictive analytics problems in Telecom

  • Insight analytics
  • Visualization analytics
  • Structured predictive analytics
  • Unstructured predictive analytics
  • Customer profiling
  • Recommendation Engine
  • Pattern detection
  • Rule/Scenario discovery – failure, fraud, optimization
  • Root cause discovery
  • Sentiment analysis
  • CRM analytics
  • Network analytics
  • Text analytics
  • Technology assisted review
  • Fraud analytics
  • Real-time analytics

Day-3: Session-1: Network operations analytics – root cause analysis of network failures and service interruptions from metadata, IPDR and CRM:

  • CPU Usage
  • Memory Usage
  • QoS Queue Usage
  • Device Temperature
  • Interface Error
  • IOS versions
  • Routing Events
  • Latency variations
  • Syslog analytics
  • Packet Loss
  • Load simulation
  • Topology inference
  • Performance Threshold
  • Device Traps
  • IPDR (IP Detail Record) collection and processing
  • Use of IPDR data for subscriber bandwidth consumption, network interface utilization, modem status and diagnostics
  • HFC information

Day-3: Session-2: Tools for Network service failure analysis:

  • Network Summary Dashboard: monitor overall network deployments and track your organization’s key performance indicators
  • Peak Period Analysis Dashboard: understand the application and subscriber trends driving peak utilization, with location-specific granularity
  • Routing Efficiency Dashboard: control network costs and build business cases for capital projects with a complete understanding of interconnect and transit relationships
  • Real-Time Entertainment Dashboard: access metrics that matter, including video views, duration, and video quality of experience (QoE)
  • IPv6 Transition Dashboard: investigate the ongoing adoption of IPv6 on your network and gain insight into the applications and devices driving trends
  • Case-Study-1: The Alcatel-Lucent Big Network Analytics (BNA) Data Miner
  • Multi-dimensional mobile intelligence (m.IQ6)

Day-3: Session-3: Big Data BI for Marketing/Sales – understanding sales/marketing from sales data (all topics will be shown with a live predictive analytics demo):

  • To identify the highest-velocity clients
  • To identify clients for a given product
  • To identify the right set of products for a client (Recommendation Engine)
  • Market segmentation techniques
  • Cross-sell and upsell techniques
  • Client segmentation techniques
  • Sales revenue forecasting techniques

Day-3: Session-4: BI needed for the Telco CFO office:

  • Overview of the business analytics work needed in a CFO office
  • Risk analysis on new investment
  • Revenue, profit forecasting
  • New client acquisition forecasting
  • Loss forecasting
  • Fraud analytics on finances (details in the next session)

Day-4: Session-1: Fraud prevention BI from Big Data in Telco – fraud analytics:

  • Bandwidth leakage / Bandwidth fraud
  • Vendor fraud/overcharging for projects
  • Customer refund/claims frauds
  • Travel reimbursement frauds

Day-4: Session-2: From Churn Prediction to Churn Prevention:

  • 3 types of churn: active/deliberate, rotational/incidental, passive/involuntary
  • 3 classifications of churned customers: total, hidden, partial
  • Understanding CRM variables for churn
  • Customer behavior data collection
  • Customer perception data collection
  • Customer demographics data collection
  • Cleaning CRM Data
  • Unstructured CRM data (customer calls, tickets, emails) and its conversion to structured data for churn analysis (see the sketch after this list)
  • Social Media CRM-new way to extract customer satisfaction index
  • Case Study-1: T-Mobile USA: Churn Reduction by 50%
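
A sketch of converting unstructured CRM text into structured features, as described in the bullet above; the ticket texts are invented examples:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

tickets = [
    "billing error again, second month in a row",
    "dropped calls in city center, very frustrated",
    "thanks, issue resolved quickly",
]

# Each ticket becomes a row of term weights that can be joined with
# structured CRM variables as input to a churn model.
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(tickets)
print(X.shape, vec.get_feature_names_out()[:5])
```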

Day-4: Session-3: How to use predictive analysis for root cause analysis of customer dissatisfaction:

  • Case Study-1: Linking dissatisfaction to issues – accounting problems, engineering failures like service interruptions, poor bandwidth service
  • Case Study-2: Big Data QA dashboard to track customer satisfaction index from various parameters such as call escalations, criticality of issues, pending service interruption events etc.

Day-4: Session-4: Big Data Dashboard for quick accessibility of diverse data and display:

  • Integration of existing application platform with Big Data Dashboard
  • Big Data management
  • Case Study of Big Data Dashboard: Tableau and Pentaho
  • Use Big Data apps to push location-based advertisements
  • Tracking system and management

Day-5: Session-1: How to justify Big Data BI implementation within an organization:

  • Defining ROI for Big Data implementation
  • Case studies on saving analyst time in the collection and preparation of data – increasing productivity
  • Case studies of revenue gain from reduced customer churn
  • Revenue gain from location-based and other targeted ads
  • An integrated spreadsheet approach for calculating approximate expenses vs. revenue gains/savings from a Big Data implementation

Day-5: Session-2: Step-by-step procedure for replacing a legacy data system with a Big Data system:

  • Understanding a practical Big Data migration roadmap
  • What critical information is needed before architecting a Big Data implementation?
  • What are the different ways of calculating the volume, velocity, variety and veracity of data?
  • How to estimate data growth
  • Case studies in 2 Telco

Day-5: Sessions 3 & 4: Review of Big Data vendors and their products; Q/A session:

  • Accenture
  • Alcatel-Lucent
  • Amazon – A9
  • APTEAN (Formerly CDC Software)
  • Cisco Systems
  • Cloudera
  • Dell
  • EMC
  • GoodData Corporation
  • Guavus
  • Hitachi Data Systems
  • Hortonworks
  • Huawei
  • HP
  • IBM
  • Informatica
  • Intel
  • Jaspersoft
  • Microsoft
  • MongoDB (Formerly 10Gen)
  • MU Sigma
  • Netapp
  • Opera Solutions
  • Oracle
  • Pentaho
  • Platfora
  • Qliktech
  • Quantum
  • Rackspace
  • Revolution Analytics
  • Salesforce
  • SAP
  • SAS Institute
  • Sisense
  • Software AG/Terracotta
  • Soft10 Automation
  • Splunk
  • Sqrrl
  • Supermicro
  • Tableau Software
  • Teradata
  • Think Big Analytics
  • Tidemark Systems
  • VMware (Part of EMC)