Duration
28 hours (usually 4 days including breaks)
Requirements
- An understanding of machine learning concepts.
- Knowledge of cloud computing concepts.
- A general understanding of containers (Docker) and orchestration (Kubernetes).
- Some Python programming experience is helpful.
- Experience working with a command line.
Audience
- Data science engineers.
- DevOps engineers interesting in machine learning model deployment.
- Infrastructure engineers interested in machine learning model deployment.
- Software engineers wishing to automate the integration and deployment of machine learning features with their application.
Overview
Kubeflow is a framework for running Machine Learning workloads on Kubernetes. TensorFlow is one of the most popular machine learning libraries. Kubernetes is an orchestration platform for managing containerized applications.
This instructor-led, live training (online or onsite) is aimed at engineers who wish to deploy Machine Learning workloads to Azure cloud.
By the end of this training, participants will be able to:
- Install and configure Kubernetes, Kubeflow and other needed software on Azure.
- Use Azure Kubernetes Service (AKS) to simplify the work of initializing a Kubernetes cluster on Azure.
- Create and deploy a Kubernetes pipeline for automating and managing ML models in production.
- Train and deploy TensorFlow ML models across multiple GPUs and machines running in parallel.
- Leverage other AWS managed services to extend an ML application.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Course Outline
Introduction
- Kubeflow on Azure vs on-premise vs on other public cloud providers
Overview of Kubeflow Features and Architecture
Overview of the Deployment Process
Activating an Azure Account
Preparing and Launching GPU-enabled Virtual Machines
Setting up User Roles and Permissions
Preparing the Build Environment
Selecting a TensorFlow Model and Dataset
Packaging Code and Frameworks into a Docker Image
Setting up a Kubernetes Cluster Using AKS
Staging the Training and Validation Data
Configuring Kubeflow Pipelines
Launching a Training Job.
Visualizing the Training Job in Runtime
Cleaning up After the Job Completes
Troubleshooting
Summary and Conclusion