Computer Vision with EfficientNet

Describe the innovations and novelties that the EfficientNet paper introduced

Describe what it means to measure the size of a convolutional neural network

Describe the various ways to scale a convolutional neural network architecture

Perform image classification using a pretrained EfficientNetB0 model architecture

Requirements

  • The target learners are students with a strong foundation in machine learning and a basic understanding of deep learning. These students need to learn about the history and current state of computer vision, as well as gain practical skills for developing and training deep neural networks for image classification tasks.

Description

How do you measure how big a convolutional neural network is?

You can’t weigh it or use a ruler to measure it. And if you can’t measure it… then how can you scale it? Until 2019, the process of measuring a convolutional neural network was not well understood. That is, until researchers set out to answer an important question:

Is there a principled method to scale up ConvNets, so they achieve better accuracy and efficiency?

In the process, they accomplished two feats that changed the direction of deep learning:

1) Discovered a novel scaling method called compound scaling.

2) Created a new family of state-of-the-art (SOTA) architectures called EfficientNet.

Now, back to the original question: how do we measure the size of a ConvNet?

By looking at three factors:

1) Resolution (dimensions of its inputs)

2) Width (number of feature maps)

3) Depth (number of layers in the network)
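The three factors do not cost the same amount of compute. The EfficientNet paper observes that the FLOPs of a regular convolution grow in proportion to d, w², and r². Depth adds layers. Width multiplies both the input and output channels of each convolution. Resolution multiplies both the height and the width of each feature map.

Here is a small sketch of that rule of thumb. The helper name `relative_flops` is hypothetical.

```python
def relative_flops(depth: float, width: float, resolution: float) -> float:
    """Hypothetical helper: FLOPs of a scaled ConvNet relative to its baseline.

    Assumes FLOPs grow linearly with depth and quadratically with both
    width (channels) and resolution (input height x width).
    """
    return depth * width ** 2 * resolution ** 2


for name, multipliers in {
    "baseline": (1.0, 1.0, 1.0),
    "2x depth": (2.0, 1.0, 1.0),
    "2x width": (1.0, 2.0, 1.0),
    "2x resolution": (1.0, 1.0, 2.0),
}.items():
    print(f"{name:>14s}: {relative_flops(*multipliers):.0f}x FLOPs")
```

Doubling depth roughly doubles the cost. Doubling width or resolution roughly quadruples it. This asymmetry is why the three dimensions have to be traded off against each other.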

Depth, width, and resolution all affect the accuracy and efficiency of your network. Ideally, you want to choose the combination that maximizes accuracy while still accomplishing the following:

• Retain the baseline model architecture, i.e. keep the operations in each layer fixed.

• Keep the memory footprint of your model within the limits of some target hardware.

• Keep the number of FLOPs below some predefined threshold.
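The three goals above can be written as one constrained optimization problem. The sketch below follows the way the EfficientNet paper states it, with this notation:

- \(\hat{\mathcal{F}}_i\) are the fixed layer operations of each stage \(i\) in the baseline network.
- \(\hat{L}_i\) is the number of layers in stage \(i\).
- \(\langle \hat{H}_i, \hat{W}_i, \hat{C}_i \rangle\) is the input shape of stage \(i\).
- \(d\), \(w\), and \(r\) are the only quantities being searched.

```latex
\begin{aligned}
\max_{d,\,w,\,r}\quad & \mathrm{Accuracy}\big(\mathcal{N}(d, w, r)\big) \\
\text{s.t.}\quad & \mathcal{N}(d, w, r) = \bigodot_{i=1 \ldots s} \hat{\mathcal{F}}_i^{\,d \cdot \hat{L}_i}\Big(X_{\langle r \cdot \hat{H}_i,\; r \cdot \hat{W}_i,\; w \cdot \hat{C}_i \rangle}\Big) \\
& \mathrm{Memory}(\mathcal{N}) \le \text{target\_memory} \\
& \mathrm{FLOPS}(\mathcal{N}) \le \text{target\_flops}
\end{aligned}
```

The second line expresses "keep the operations in each layer fixed". Only the number of repeats and the tensor shapes change.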

But there’s a catch…

Scaling up any single network dimension (width, depth, or resolution) improves accuracy, but the gain quickly diminishes as the model gets bigger. For better accuracy and efficiency, you must balance width, depth, and resolution against one another during ConvNet scaling.
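Compound scaling is the paper's way of balancing the three dimensions. It ties them all to one user-chosen coefficient φ:

- depth d = α^φ
- width w = β^φ
- resolution r = γ^φ

The constraints are α·β²·γ² ≈ 2 and α, β, γ ≥ 1, so each unit increase of φ roughly doubles FLOPs. The constants below, α = 1.2, β = 1.1, and γ = 1.15, are the values the paper reports from a small grid search on EfficientNet-B0. The function name `compound_scale` is hypothetical. The released B1 to B7 configurations may use rounded values, so this sketch shows the rule, not the exact published settings.

```python
# Constants reported in the EfficientNet paper for the B0 baseline.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi: float, alpha: float = ALPHA, beta: float = BETA,
                   gamma: float = GAMMA):
    """Hypothetical helper: depth/width/resolution multipliers for a given phi."""
    depth = alpha ** phi
    width = beta ** phi
    resolution = gamma ** phi
    # FLOPs scale with d * w^2 * r^2, i.e. (alpha * beta^2 * gamma^2) ** phi
    flops = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, flops


for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f} -> about {f:.2f}x FLOPs")
```

Because α·β²·γ² is about 1.92, the FLOPs grow by roughly that factor per step of φ. This lets you pick φ directly from a compute budget.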

Who this course is for:

  • To complete this course, learners should have a strong foundation in machine learning and a basic understanding of computer vision. This includes knowledge of supervised learning, neural networks, and image processing. In terms of skill level, learners should be advanced beginners to intermediate. They have a solid understanding of the fundamental concepts and techniques of machine learning but may still be learning about more advanced topics such as computer vision. They should have experience with Python, Pandas, scikit-learn, and PyTorch.

Course content