Duration
28 hours (usually 4 days including breaks)
Requirements
- An understanding of machine learning concepts.
- Knowledge of cloud computing concepts.
- A general understanding of containers (Docker) and orchestration (Kubernetes).
- Some Python programming experience is helpful.
- Experience working with a command line.
Audience
- Data science engineers.
- DevOps engineers interested in machine learning model deployment.
- Infrastructure engineers interested in machine learning model deployment.
- Software engineers wishing to integrate and deploy machine learning features with their application.
Overview
Kubeflow is a framework for running machine learning workloads on Kubernetes. TensorFlow is a machine learning library, and Kubernetes is an orchestration platform for managing containerized applications.
This instructor-led, live training (online or onsite) is aimed at engineers who wish to deploy machine learning workloads to an AWS EC2 server.
By the end of this training, participants will be able to:
- Install and configure Kubernetes, Kubeflow, and other required software on AWS.
- Use EKS (Elastic Kubernetes Service) to simplify the work of initializing a Kubernetes cluster on AWS.
- Create and deploy a Kubeflow pipeline for automating and managing ML models in production.
- Train and deploy TensorFlow ML models across multiple GPUs and machines running in parallel.
- Leverage other AWS managed services to extend an ML application.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us.
Course Outline
Introduction
- Kubeflow on AWS vs. on-premises vs. other public cloud providers
Overview of Kubeflow Features and Architecture
Activating an AWS Account
Preparing and Launching GPU-enabled AWS Instances
Setting up User Roles and Permissions
Preparing the Build Environment
Selecting a TensorFlow Model and Dataset
Packaging Code and Frameworks into a Docker Image
Setting up a Kubernetes Cluster Using EKS
Staging the Training and Validation Data
Configuring Kubeflow Pipelines
Launching a Training Job Using Kubeflow on EKS
Visualizing the Training Job at Runtime
Cleaning up After the Job Completes
Troubleshooting
Summary and Conclusion