Deep Learning Model Compression

Practice model compression using TensorFlow, PyTorch, ONNX, and TensorRT

Serve compressed models on AWS SageMaker

Understand model compression algorithms: pruning, quantization, distillation, and factorization

Conduct a literature survey of the most recent compression techniques

Requirements:

  • Python programming
  • Familiarity with deep learning model components


This course provides an in-depth understanding of the techniques used to compress deep learning models: pruning, quantization, knowledge distillation, and factorization. These techniques are essential for anyone working in deep learning, particularly in computer vision and natural language processing, and they apply broadly across model architectures.

A primary objective of this course is to provide advanced content that keeps pace with the latest algorithms, including product quantization and its variants, tensor factorization, and other rapidly evolving techniques. The course summarizes these techniques from academic papers without dwelling on experimental results, because leaderboard numbers change frequently and each new model may need to be compressed anew. Instead, the course focuses on the technical mechanics of each technique so that learners understand what happens behind the scenes.

By the end of the course, you will be able to read news, blogs, and academic papers on model compression with confidence. You are encouraged to apply these techniques to your own work and to share the knowledge with others.

Who this course is for:

  • Deep learning model developers
  • Beginners in model compression research

Course content