Deploying innovative AI models in different production environments has become a common challenge as AI applications grow more ubiquitous in our daily lives. Deploying both training and inference workloads brings great challenges, as we must support a combinatorial choice of models and environments. Additionally, real-world applications bring a multitude of goals, such as minimizing dependencies, broadening model coverage, leveraging emerging hardware primitives for performance, reducing memory footprint, and scaling to larger environments.
Solving these problems for training and inference involves a combination of ML programming abstractions, learning-driven search, compilation, and optimized library runtimes. These themes form an emerging topic, machine learning compilation, with active ongoing development. In this tutorial sequence, we offer the first comprehensive treatment of its kind to systematically study the key elements of this emerging field. We will learn the key abstractions used to represent machine learning programs, automatic optimization techniques, and approaches to optimize dependencies, memory, and performance in end-to-end machine learning deployment.
This material serves as the reference for the MLC course; we will populate notes and tutorials here as the course progresses.
- 1. Introduction
- 1.1. What is ML Compilation
- 1.2. Why Study ML Compilation
- 1.3. Key Elements of ML Compilation
- 1.4. Summary
- 2. Tensor Program Abstraction
- 2.1. Primitive Tensor Function
- 2.2. Tensor Program Abstraction
- 2.3. Summary
- 2.4. TensorIR: Tensor Program Abstraction Case Study
- 2.5. Exercises for TensorIR
- 3. End to End Model Execution
- 3.1. Prelude
- 3.2. Preparations
- 3.3. End to End Model Integration
- 3.4. Constructing an End to End IRModule in TVMScript
- 3.5. Build and Run the Model
- 3.6. Integrate Existing Libraries in the Environment
- 3.7. Mixing TensorIR Code and Libraries
- 3.8. Bind Parameters to IRModule
- 3.9. Discussions
- 3.10. Summary
- 4. Automatic Program Optimization
- 4.1. Prelude
- 4.2. Preparations
- 4.3. Recap: Transform a Primitive Tensor Function
- 4.4. Stochastic Schedule Transformation
- 4.5. Search Over Stochastic Transformations
- 4.6. Putting Things Back to End to End Model Execution
- 4.7. Discussions
- 4.8. Summary
- 5. Integration with Machine Learning Frameworks
- 5.1. Prelude
- 5.2. Preparations
- 5.3. Build an IRModule Through a Builder
- 5.4. Import Model From PyTorch
- 5.5. Coming back to FashionMNIST Example
- 5.6. Remark: Translating into High-level Operators
- 5.7. Discussions
- 5.8. Summary
- 6. GPU and Hardware Acceleration
- 6.1. Part 1
- 6.2. Part 2
- 7. Computational Graph Optimization
- 7.1. Prelude
- 7.2. Preparations
- 7.3. Pattern Match and Rewriting
- 7.4. Fuse Linear and ReLU
- 7.5. Map to TensorIR Calls
- 7.6. Build and Run
- 7.7. Discussion
- 7.8. Summary