Kubeflow Training Course

Duration

35 hours (usually 5 days including breaks)

Requirements

  • Familiarity with Python syntax 
  • Experience with TensorFlow, PyTorch, or another machine learning framework
  • An AWS account with necessary resources

Audience

  • Developers
  • Data scientists

Overview

Kubeflow is a toolkit for making machine learning (ML) on Kubernetes easy, portable, and scalable. AWS EKS (Elastic Kubernetes Service) is an Amazon-managed service for running Kubernetes on AWS.

This instructor-led, live training (online or onsite) is aimed at developers and data scientists who wish to build, deploy, and manage machine learning workflows on Kubernetes.

By the end of this training, participants will be able to:

  • Install and configure Kubeflow on premises and in the cloud using AWS EKS (Elastic Kubernetes Service).
  • Build, deploy, and manage ML workflows based on Docker containers and Kubernetes.
  • Run entire machine learning pipelines on diverse architectures and cloud environments.
  • Use Kubeflow to spawn and manage Jupyter notebooks.
  • Build ML training, hyperparameter tuning, and serving workloads across multiple platforms.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request customized training for this course, please contact us to arrange it.

Course Outline

Introduction

  • Introduction to Kubernetes
  • Overview of Kubeflow Features and Architecture
  • Kubeflow on AWS vs. on premises vs. on other public cloud providers

Setting up a Cluster using AWS EKS

Setting up an On-Premises Cluster using MicroK8s

Deploying Kubernetes using a GitOps Approach

Data Storage Approaches

Creating a Kubeflow Pipeline

Triggering a Pipeline

Defining Output Artifacts

Storing Metadata for Datasets and Models

Hyperparameter Tuning with TensorFlow

Visualizing and Analyzing the Results

Multi-GPU Training

Creating an Inference Server for Deploying ML Models

Working with JupyterHub

Networking and Load Balancing

Auto Scaling a Kubernetes Cluster

Troubleshooting

Summary and Conclusion
