Introduction to Graph Computing Training Course

Duration

28 hours (usually 4 days including breaks)

Requirements

  • An undersanding of Java programming and frameworks
  • A general understanding of Python is helpful but not required
  • A general understanding of database concepts

Audience

  • Developers

Overview

Many real world problems can be described in terms of graphs. For example, the Web graph, the social network graph, the train network graph and the language graph. These graphs tend to be extremely large; processing them requires a specialized set of tools and processes — these tools and processes can be referred to as Graph Computing (also known as Graph Analytics).

In this instructor-led, live training, participants will learn about the technology offerings and implementation approaches for processing graph data. The aim is to identify real-world objects, their characteristics and relationships, then model these relationships and process them as data using a Graph Computing (also known as Graph Analytics and Distributed Graph Processing) approach. We start with a broad overview and narrow in on specific tools as we step through a series of case studies, hands-on exercises and live deployments.

By the end of this training, participants will be able to:

  • Understand how graph data is persisted and traversed.
  • Select the best framework for a given task (from graph databases to batch processing frameworks.)
  • Implement Hadoop, Spark, GraphX and Pregel to carry out graph computing across many machines in parallel.
  • View real-world big data problems in terms of graphs, processes and traversals.

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

Introduction

  • Graph databases and libraries

Understanding Graph Data

  • The graph as a data structure
  • Using vertices (dots) and edges (lines) to model real-world scenarios

Using Graph Databases to Model, Persist and Process Graph Data

  • Local graph algorithms/traversals
  • neo4j, OrientDB and Titan

Exercise: Modeling Graph Data with neo4j

  • Whiteboard data modeling

Beyond Graph Databases: Graph Computing

  • Understanding the property graph
  • Graph modeling different scenarios (software graph, discussion graph, concept graph)

Solving Real-World Problems with Traversals

  • Algorithmic/directed walk over the graph
  • Determining circular cependencies

Case Study: Ranking Discussion Contributors

  • Ranking by number and depth of contributed discussions
  • A note on sentiment and concept analysis

Graph Computing: Local, In-Memory Graph toolkits

  • Graph analysis and visualization
  • JUNG, NetworkX, and iGraph

Exercise: Modeling Graph Data with NetworkX

  • Using NetworkX to model a complex system

Graph Computing: Batch Processing Graph Frameworks

  • Leveraging Hadoop for storage (HDFS) and processing (MapReduce)
  • Overview of iterative algorithms
  • Hama, Giraph, and GraphLab

Graph Computing: Graph-Parallel Computation

  • Unifying ETL, exploratory analysis, and iterative graph computation within a single system
  • GraphX

Setup and Installation

  • Hadoop and Spark

GraphX Operators

  • Property, structural, join, neighborhood aggregation, caching and uncaching

Iterating with Pregel API

  • Passing arguments for sending, receiving and computing

Building a Graph

  • Using vertices and edges in an RDD or on disk

Designing Scalable Algorithms

  • GraphX Optimization

Accessing Additional Algorithms

  • PageRank, Connected Components, Triangle Counting

Exercis: Page Rank and Top Users

  • Building and processing graph data using text files as input

Deploying to Production

Closing Remarks

Leave a Reply

Your email address will not be published. Required fields are marked *