Machine learning, deep learning, and artificial intelligence (AI) are buzzwords that everyone is talking about. The terms are often used interchangeably, which creates many misconceptions. It is therefore important to dispel the myth that these concepts are synonymous and to understand the differences among the three.
Both machine learning and deep learning help discover latent patterns in data, but they differ considerably in technique and scope. Both are subsets of AI; more precisely, deep learning is a specific kind of machine learning. Both start by splitting the input data into training and test sets, fitting a model, and running an optimization process to find the set of parameters that gives the best fit. Both can handle numeric (regression) and non-numeric (classification) problems, although in several specialized application areas, such as computer vision and natural language processing (NLP), deep learning models perform significantly better.
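The shared workflow described above (split, fit, optimize) can be sketched in a few lines of plain Python. This is only a minimal illustration, not a production recipe: the synthetic data, the function names, the learning rate, and the number of epochs are all assumptions chosen for the example.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(train, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(train)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def mean_squared_error(rows, w, b):
    """Average squared prediction error on a set of (x, y) rows."""
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)

# Synthetic data (an assumption for this sketch): y = 3x + 1 plus small noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.random()
    data.append((x, 3 * x + 1 + rng.gauss(0, 0.05)))

train, test = train_test_split(data)          # 1. split
w, b = fit_line(train)                        # 2. fit and 3. optimize
test_error = mean_squared_error(test, w, b)   # evaluate on held-out data
print(f"w={w:.2f}, b={b:.2f}, test MSE={test_error:.4f}")
```

The test set is never seen during fitting, so the error measured on it is an estimate of how the fitted parameters generalize.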
Figure 1
Image source: bluehexagon.ai
In Figure 1 above, with a machine learning technique, an engineer or domain expert must first define features that can separate Target 1s from 0s; only then can the data be fitted to a model that gives verdicts based on those features. The inherent challenge engineers face is handcrafting these features. Deep learning, by contrast, does not necessarily need structured data, so an engineer does not need to define features for the model manually. Instead, each layer in the network hierarchy learns the specific features or characteristics that define the Target. The resulting model can then be used to detect a variety of potential threats, including ones that may be very novel.
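To make the idea of handcrafted features concrete, here is a small sketch of what an expert-defined feature step might look like for raw byte payloads. Everything in it is hypothetical: the three features, the thresholds, and the verdict rule are illustrative assumptions, not the method behind Figure 1.

```python
import math
from collections import Counter

def extract_features(payload: bytes) -> dict:
    """Hand-crafted features an expert might define for a raw byte payload."""
    total = len(payload)
    if total == 0:
        return {"length": 0, "entropy": 0.0, "printable_ratio": 0.0}
    counts = Counter(payload)
    # Shannon entropy in bits per byte: 0 for constant data, 8 for uniform bytes.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    printable = sum(1 for byte in payload if 32 <= byte < 127)
    return {
        "length": total,
        "entropy": entropy,
        "printable_ratio": printable / total,
    }

def verdict(features: dict) -> int:
    """Toy rule on top of the features: 1 = suspicious, 0 = benign.

    The thresholds are hypothetical and chosen only for illustration.
    """
    return int(features["entropy"] > 4.0 and features["printable_ratio"] < 0.5)

plain_text = extract_features(b"aaaa")
random_looking = extract_features(bytes(range(256)))
print(verdict(plain_text), verdict(random_looking))
```

The model only ever sees the three numbers the expert chose. A deep network would instead consume the raw bytes and learn its own intermediate representations.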
1. Data Dependencies
Another important difference between machine learning and deep learning lies in how performance changes with the amount of data. When the training data is small, machine learning algorithms often outperform deep learning, because deep learning needs large amounts of data to learn useful representations, and its generalization error bound shrinks as the training set grows. Machine learning algorithms built on handcrafted features, on the other hand, tend to stop improving as the size of the data increases.
Figure 2
Image source: bluehexagon.ai
This means that while deep learning continues to improve in performance and efficacy as data grows, machine learning systems level off or degrade at some point, no matter how much more training data they are exposed to (see Figure 2). Deep learning models are also very useful for non-linear separation problems: they can transform inputs that are not linearly separable into representations in which the classes can be separated. In addition, no feature engineering is required, structured data is not a necessity, and human intervention is minimal, which reduces the likelihood of introducing human bias into the model.
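The classic XOR problem gives a small picture of the non-linear separation point above. No single linear threshold unit can reproduce XOR, but one hidden layer turns the inputs into a representation that a linear unit can separate. In the sketch below the weights are set by hand purely for illustration (a real network would learn them from data), and the grid search over linear units is a spot check over an assumed range of weights, not a proof.

```python
def step(z):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def linear_unit_fits_xor(w1, w2, b):
    """Check whether one linear threshold unit reproduces XOR on all inputs."""
    return all(step(w1 * x1 + w2 * x2 + b) == (x1 ^ x2) for x1, x2 in INPUTS)

# Spot check over a coarse grid of weights: no single linear unit succeeds.
grid = [i / 2 for i in range(-8, 9)]
any_linear_fit = any(
    linear_unit_fits_xor(w1, w2, b) for w1 in grid for w2 in grid for b in grid
)

def hidden_layer(x1, x2):
    """Two hidden units with hand-set weights: an OR unit and an AND unit."""
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return h_or, h_and

def xor_network(x1, x2):
    """In the hidden representation, XOR is simply 'OR and not AND'."""
    h_or, h_and = hidden_layer(x1, x2)
    return step(h_or - h_and - 0.5)

outputs = [xor_network(x1, x2) for x1, x2 in INPUTS]
print(any_linear_fit, outputs)
```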
2. Hardware Dependencies
Machine learning is essentially mathematical and probabilistic modeling that requires far more computation than a human could carry out by hand. Consumer hardware can support machine learning on limited data, but it may not be adequate for the extensive computation required to train deep learning networks iteratively.
There are four major steps in preparing a deep learning model:
- Pre-process the input data
- Train the deep learning model
- Store the trained model
- Deploy the model
Of these four steps, training the model is computationally the most expensive, and it calls for specialized hardware. The graphics processing unit (GPU) suits this purpose: it has thousands of cores designed to run many computations in parallel. GPUs meet the performance needs of deep networks, and thanks to their rapid advances, deep learning models can be trained and optimized more efficiently than before.
Because not every device can supply the power a GPU requires, there are alternatives to GPUs, such as FPGAs and ASICs.
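The four steps listed earlier in this section (pre-process, train, store, deploy) can be walked through end to end on a toy problem. In this sketch a single logistic unit stands in for a deep network so that it runs in plain Python on any machine. The data, the file name toy_model.json, and the function names are all assumptions made for the example.

```python
import json
import math
import os
import tempfile

def preprocess(values, lo, hi):
    """Step 1: scale raw inputs into the range [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, learning_rate=1.0, epochs=5000):
    """Step 2: fit a single logistic unit by gradient descent (the costly step)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return {"w": w, "b": b}

def store(model, path):
    """Step 3: persist the trained parameters to disk."""
    with open(path, "w") as handle:
        json.dump(model, handle)

def deploy(path):
    """Step 4: load the stored model and return a prediction function."""
    with open(path) as handle:
        model = json.load(handle)

    def predict(raw_value):
        x = (raw_value - model["lo"]) / (model["hi"] - model["lo"])
        return int(sigmoid(model["w"] * x + model["b"]) > 0.5)

    return predict

# Toy data (an assumption): small raw values are class 0, large ones class 1.
raw_inputs = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
lo, hi = min(raw_inputs), max(raw_inputs)

model = train(preprocess(raw_inputs, lo, hi), labels)
model.update({"lo": lo, "hi": hi})  # keep the scaling constants with the model

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "toy_model.json")
    store(model, path)
    predict = deploy(path)

predictions = [predict(2.0), predict(8.5)]
print(predictions)
```

Storing the scaling constants together with the weights matters in practice: the deployed model must apply exactly the same pre-processing that was used during training.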
3. Execution Time
Typically, machine learning algorithms require structured data and are not well suited to complex problems involving large amounts of data. Deep learning, on the other hand, is used to solve complex problems at massive scale. In fact, given the number of layers, hierarchies, and characteristics that these networks handle, it is wise to reserve deep learning for complex tasks.
A deep learning algorithm takes a long time to train because it has so many parameters. Even state-of-the-art deep learning architectures such as GoogLeNet and ResNet can take a couple of days to train completely from scratch. Machine learning algorithms take much less time by comparison, ranging from a few minutes to a few hours.
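A quick count shows how fast the number of parameters grows once hidden layers are added. The sketch below counts weights and biases for fully connected layers only. The layer sizes are hypothetical, and real architectures such as GoogLeNet or ResNet use convolutional layers whose parameter counts are computed differently.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    Each layer with n_in inputs and n_out outputs has n_in * n_out weights
    plus n_out biases.
    """
    return sum(
        (n_in + 1) * n_out
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )

# Hypothetical sizes: 784 inputs (e.g. a 28 x 28 image) and 10 output classes.
linear_model = count_dense_parameters([784, 10])
small_network = count_dense_parameters([784, 512, 256, 10])
print(linear_model, small_network)  # 7850 535818
```

Every one of these parameters is updated on each optimization step, which is a large part of why training cost rises with model size.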
When it comes to testing, the picture is reversed relative to training. At test time, a trained deep learning model often runs much faster than some machine learning algorithms. With k-nearest neighbors (a type of machine learning algorithm), for example, test time increases as the size of the training data increases. This does not apply to all machine learning algorithms, however; many of them have short test times too.
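The k-nearest neighbors case can be made concrete by counting the work done per prediction. The sketch below is a naive brute-force implementation on one-dimensional data (an assumption; optimized libraries use index structures that reduce this cost), compared with a fixed-size parametric model whose prediction cost does not depend on the training set at all.

```python
def knn_predict(train_points, train_labels, query, k=3):
    """Brute-force k-NN: compares the query with every stored training point.

    Returns the predicted label and the number of distance computations.
    """
    distances = sorted(
        (abs(x - query), label) for x, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    prediction = max(set(nearest), key=nearest.count)  # majority vote
    return prediction, len(train_points)

def parametric_predict(w, b, query):
    """A fitted parametric model: one multiply and one add, whatever the data size."""
    return int(w * query + b > 0)

def make_training_set(n):
    """Toy data (an assumption): points in [0, 1), labeled 1 above 0.5."""
    points = [i / n for i in range(n)]
    labels = [int(p > 0.5) for p in points]
    return points, labels

small_points, small_labels = make_training_set(10)
large_points, large_labels = make_training_set(1000)

small_result = knn_predict(small_points, small_labels, query=0.9)
large_result = knn_predict(large_points, large_labels, query=0.9)
print(small_result, large_result, parametric_predict(1.0, -0.5, 0.9))
```

Both calls give the same label, but the second one performs 100 times as many distance computations because the training set is 100 times larger.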
4. Interpretability
Interpretability is one of the key factors in comparing machine learning and deep learning.
Suppose we use a deep learning algorithm for automated scoring. The scoring performance is excellent and close to fully accurate. But there’s an issue: the algorithm does not tell us why it gave a particular score. Mathematically, we can find out which nodes of the network were activated, but we don’t know what these layers of neurons were doing collectively or how they learned it. As a result, we are unable to interpret the results.
On the other hand, white-box machine learning algorithms such as linear regression, logistic regression, and tree-based models (including bagging and boosting ensembles) give us either mathematical equations or feature importance lists. These clearly explain why the algorithm chose what it did, so it is particularly easy to justify the result and build a business story around it.
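As a small illustration of a white-box model, the sketch below fits a one-feature linear regression with the closed-form least-squares solution and prints the resulting equation. The data and the names score and years_of_history are hypothetical, made up only to show how the fitted coefficients can be read directly.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: a score that rises with years of customer history.
years_of_history = [1, 2, 3, 4, 5]
score = [12, 14, 16, 18, 20]

slope, intercept = fit_simple_linear_regression(years_of_history, score)
print(f"score = {intercept:.1f} + {slope:.1f} * years_of_history")
```

The fitted equation is the explanation: each additional year of history adds a fixed amount to the score, which is the kind of statement a business audience can check and challenge.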
5. Applications
There are countless applications in every industry for machine learning and deep learning.
Machine learning techniques are better suited to structured data (cross-sectional or panel data). They work well without special hardware, have a quick turnaround, and are easy to interpret. Typical application areas are media advertising, fraud analytics, forecasting, and marketing and retail analytics. Conversely, deep learning techniques are better suited to complex data in large volumes (images, text). They need special hardware, have a longer turnaround, and are more difficult to interpret. Common application areas are computer vision, search engines, medical diagnosis, and NLP.