Kubeflow on GCP Training Course

Duration

28 hours (usually 4 days including breaks)

Requirements

  • An understanding of machine learning concepts.
  • Knowledge of cloud computing concepts.
  • A general understanding of containers (Docker) and orchestration (Kubernetes).
  • Some Python programming experience is helpful.
  • Experience working with a command line.

Audience

  • Data science engineers.
  • DevOps engineers interested in machine learning model deployment.
  • Infrastructure engineers interested in machine learning model deployment.
  • Software engineers wishing to automate the integration and deployment of machine learning features with their application.

Overview

Kubeflow is a framework for running machine learning workloads on Kubernetes. TensorFlow is one of the most popular machine learning libraries. Kubernetes is an orchestration platform for managing containerized applications.

This instructor-led, live training (online or onsite) is aimed at engineers who wish to deploy machine learning workloads to Google Cloud Platform (GCP).

By the end of this training, participants will be able to:

  • Install and configure Kubernetes, Kubeflow and other needed software on GCP and GKE.
  • Use GKE (Google Kubernetes Engine) to simplify the work of initializing a Kubernetes cluster on GCP.
  • Create and deploy a Kubeflow pipeline for automating and managing ML models in production.
  • Train and deploy TensorFlow ML models across multiple GPUs and machines running in parallel.
  • Leverage other GCP services to extend an ML application.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request customized training for this course, please contact us.

Course Outline

Introduction

  • Kubeflow on GCP vs on-premises vs on other public cloud providers

Overview of Kubeflow Features on GCP

  • Declarative management of resources
  • GKE autoscaling for machine learning (ML) workloads
  • Secure connections to Jupyter
  • Persistent logs for debugging and troubleshooting
  • GPUs and TPUs to accelerate workloads

Overview of Environment Setup

  • Virtual machine preparation
  • Kubernetes cluster setup
  • Kubeflow installation

Deploying Kubeflow

  • Deploying Kubeflow on GCP
  • Deploying Kubeflow across on-premises and cloud environments
  • Deploying Kubeflow on GKE
  • Setting up a custom domain on GKE

Pipelines on GCP

  • Setting up an end-to-end Kubeflow pipeline
  • Customizing Kubeflow Pipelines

Securing a Kubeflow Cluster

  • Setting up authentication and authorization
  • Using VPC service controls and private GKE

Storing, Accessing, and Managing Data

  • Understanding shared filesystems and Network Attached Storage (NAS)
  • Using managed file storage services on GCP

Running an ML Training Job

  • Training an MNIST model

Administering Kubeflow

  • Logging and monitoring

Troubleshooting

Summary and Conclusion
