Duration
28 hours (usually 4 days including breaks)
Requirements
- An understanding of machine learning concepts.
- Knowledge of cloud computing concepts.
- A general understanding of containers (Docker) and orchestration (Kubernetes).
- Some Python programming experience is helpful.
- Experience working with a command line.
Audience
- Data science engineers.
- DevOps engineers interested in machine learning model deployment.
- Infrastructure engineers interested in machine learning model deployment.
- Software engineers who want to automate the integration and deployment of machine learning features into their applications.
Overview
Kubeflow is a framework for running machine learning workloads on Kubernetes. TensorFlow is one of the most popular machine learning libraries. Kubernetes is an orchestration platform for managing containerized applications.
This instructor-led, live training (online or onsite) is aimed at engineers who wish to deploy machine learning workloads to Google Cloud Platform (GCP).
By the end of this training, participants will be able to:
- Install and configure Kubernetes, Kubeflow and other needed software on GCP and GKE.
- Use GKE (Google Kubernetes Engine) to simplify the work of initializing a Kubernetes cluster on GCP.
- Create and deploy a Kubeflow pipeline for automating and managing ML models in production.
- Train and deploy TensorFlow ML models across multiple GPUs and machines running in parallel.
- Leverage other GCP services to extend an ML application.
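Training one model across multiple GPUs and machines in parallel commonly relies on synchronous data parallelism, which TensorFlow implements in strategies such as `tf.distribute.MirroredStrategy` and `MultiWorkerMirroredStrategy`. Each worker computes gradients on its own shard of the batch, the gradients are averaged (an all-reduce), and every replica applies the same update. The stdlib-only Python sketch below simulates that averaging for a hypothetical one-parameter linear model with mean-squared-error loss. It illustrates the idea only and does not use the TensorFlow API.

```python
# Toy simulation of synchronous data-parallel training (NOT the TensorFlow API).
# Assumptions: model y = w * x, mean-squared-error loss, equal-size shards.

def gradient(w, xs, ys):
    """d/dw of mean((w*x - y)^2) over the given examples."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def shard(data, num_workers):
    """Split data into num_workers equal-size contiguous shards.
    Assumes len(data) is divisible by num_workers; any remainder is dropped."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]

def data_parallel_step(w, xs, ys, num_workers, lr):
    """One synchronous step: each simulated worker computes a local gradient,
    the local gradients are averaged (the all-reduce), and the shared weight
    receives a single identical update."""
    x_shards = shard(xs, num_workers)
    y_shards = shard(ys, num_workers)
    local_grads = [gradient(w, xb, yb) for xb, yb in zip(x_shards, y_shards)]
    avg_grad = sum(local_grads) / num_workers
    return w - lr * avg_grad

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    ys = [3.0 * x for x in xs]  # data generated with a true weight of 3.0
    w = 0.0
    for _ in range(50):
        w = data_parallel_step(w, xs, ys, num_workers=4, lr=0.01)
    print(round(w, 4))  # the weight converges toward 3.0
```

With equal-size shards, the averaged per-worker gradient equals the full-batch gradient, which is why adding workers lets each one process a smaller slice without changing the update the model receives.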
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us.
Course Outline
Introduction
- Kubeflow on GCP vs. on-premises vs. on other public cloud providers
Overview of Kubeflow Features on GCP
- Declarative management of resources
- GKE autoscaling for machine learning (ML) workloads
- Secure connections to Jupyter
- Persistent logs for debugging and troubleshooting
- GPUs and TPUs to accelerate workloads
Overview of Environment Setup
- Virtual machine preparation
- Kubernetes cluster setup
- Kubeflow installation
Deploying Kubeflow
- Deploying Kubeflow on GCP
- Deploying Kubeflow across on-premises and cloud environments
- Deploying Kubeflow on GKE
- Setting up a custom domain on GKE
Pipelines on GCP
- Setting up an end-to-end Kubeflow pipeline
- Customizing Kubeflow Pipelines
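Kubeflow Pipelines describes a workflow as a directed acyclic graph (DAG) of containerized steps: a step runs once all of its upstream steps have finished, and steps that do not depend on each other can run concurrently. The stdlib-only Python sketch below illustrates just that scheduling idea using `graphlib` (Python 3.9+). The step names are hypothetical, and this is not the Kubeflow Pipelines SDK (`kfp`) itself.

```python
# Toy model of an ML pipeline as a DAG of steps (NOT the Kubeflow Pipelines SDK).
# Hypothetical step names; each step maps to the set of steps it depends on.
from graphlib import TopologicalSorter

PIPELINE = {
    "ingest_data": set(),
    "preprocess": {"ingest_data"},
    "train_model": {"preprocess"},
    "train_baseline": {"preprocess"},
    "evaluate_model": {"train_model", "train_baseline"},
    "deploy_model": {"evaluate_model"},
}

def execution_order(dag):
    """Return one valid sequential order in which every step follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())

def parallel_stages(dag):
    """Group steps into stages; steps within a stage have no dependencies
    on each other, so a scheduler could run them concurrently."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # also raises CycleError if the graph is not a DAG
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages

if __name__ == "__main__":
    for i, stage in enumerate(parallel_stages(PIPELINE), start=1):
        print(f"stage {i}: {', '.join(stage)}")
```

In this sketch `train_model` and `train_baseline` land in the same stage because both depend only on `preprocess`, mirroring how independent pipeline steps can run side by side on the cluster.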
Securing a Kubeflow Cluster
- Setting up authentication and authorization
- Using VPC service controls and private GKE
Storing, Accessing, and Managing Data
- Understanding shared filesystems and Network Attached Storage (NAS)
- Using managed file storage services on GCP
Running an ML Training Job
- Training an MNIST model
Administering Kubeflow
- Logging and monitoring
Troubleshooting
Summary and Conclusion