Kubeflow on AWS Training Course

Duration

28 hours (usually 4 days including breaks)

Requirements

  • An understanding of machine learning concepts.
  • Knowledge of cloud computing concepts.
  • A general understanding of containers (Docker) and orchestration (Kubernetes).
  • Some Python programming experience is helpful.
  • Experience working with a command line.

Audience

  • Data science engineers.
  • DevOps engineers interested in machine learning model deployment.
  • Infrastructure engineers interested in machine learning model deployment.
  • Software engineers wishing to integrate and deploy machine learning features with their application.

Overview

Kubeflow is a framework for running machine learning workloads on Kubernetes. TensorFlow is a machine learning library, and Kubernetes is an orchestration platform for managing containerized applications.

This instructor-led, live training (online or onsite) is aimed at engineers who wish to deploy machine learning workloads to an AWS EC2 server.

By the end of this training, participants will be able to:

  • Install and configure Kubernetes, Kubeflow and other needed software on AWS.
  • Use EKS (Elastic Kubernetes Service) to simplify the work of initializing a Kubernetes cluster on AWS.
  • Create and deploy a Kubeflow pipeline for automating and managing ML models in production.
  • Train and deploy TensorFlow ML models across multiple GPUs and machines running in parallel.
  • Leverage other AWS managed services to extend an ML application.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request customized training for this course, please contact us.

Course Outline

Introduction

  • Kubeflow on AWS vs on-premises vs other public cloud providers

Overview of Kubeflow Features and Architecture

Activating an AWS Account

Preparing and Launching GPU-enabled AWS Instances

Setting up User Roles and Permissions

Preparing the Build Environment

Selecting a TensorFlow Model and Dataset

Packaging Code and Frameworks into a Docker Image

Setting up a Kubernetes Cluster Using EKS

Staging the Training and Validation Data

Configuring Kubeflow Pipelines

Launching a Training Job using Kubeflow in EKS

Visualizing the Training Job at Runtime

Cleaning up After the Job Completes

Troubleshooting

Summary and Conclusion
