Machine Learning Models Explained

What Is a model?

The output from model training may be used for inference, which means making predictions on new data. A model is a distilled representation of what a machine learning system has learned. Machine learning models are akin to mathematical functions — they take a request in the form of input data, make a prediction on that input data, and then serve a response.

In supervised and unsupervised machine learning, the model describes the signal in the noise or the pattern detected from the training data.

In reinforcement learning, the model describes the best possible course of action given a specific situation.

The final set of trainable parameters (the information the model contains) depends on the specific type of model — in deep neural networks, a model is the final state of the trained weights of the network, in regression it contains coefficients, and in decision trees it contains the split locations.


Neural Networks

There are many different types of models such as GANs, LSTMs & RNNs, CNNs, Autoencoders, and Deep Reinforcement Learning models. Deep neural networks are used for object detection, speech recognition and synthesis , image processing, style transfer , and machine translation, and can replace most classical machine learning algorithms (see below) . This modern method can learn extremely complex patterns and is especially successful on unstructured datasets such as images, video, and audio.

Ensemble Methods

Ensemble techniques like Random Forests and Gradient Boosting can achieve superior performance over classical machine learning techniques by aggregating weaker models and learning non-linear relationships.

Classical Machine Learning

Popular ML algorithms include: linear regression, logistic regression, SVMs, nearest neighbor, decision trees, PCA, naive Bayes classifier, and k-means clustering. Classical machine learning algorithms are used for a wide range of applications.

Types of Supervised Learning Models


Deep neural networks, classification trees (ensembles), and logistic regression (classical machine learning) are all used to perform regression tasks.

Popular use cases: Spam filtering , language detection , a search of similar documents , sentiment analysis , recognition of handwritten characters, and fraud detection.

Binary Classification Goal: Predict a binary outcome.


  • “Is this email spam or not spam?”
  • “Is this user fraudulent or not?”
  • “Is this picture a cat or not?”

Multi-class Classification Goal: Predict one out of two or more discrete outcomes.


  • “Which genre does this user prefer?”
  • “Is this mail is spam or important or a promotion?”
  • “Is this picture a cat or a dog or a fox?”


Deep neural networks, regression trees (ensembles), and linear regression (classical machine learning) are all used to perform regression tasks.

Popular use cases: Forecasting stock prices, predicting sales volume, etc .

Goal: Predict a numeric value.


  • “What will the temperature be in NYC tomorrow?”
  • “What is the price of house in this specific neighborhood?”

Types of Unsupervised Learning models

Neural Networks

Deep neural network architectures such as autoencoders and GANs can be applied to a wide variety of unsupervised learning problems.

Clustering (Classical ML)

Popular use cases: For customer segmentation , labeling data , detecting anomalous behavior, etc .

Popular algorithms: K-means, Mean-Shift, DBSCAN

Goal: Group similar things together.


  • “Cluster different news articles into different types of news”

Association Rule (Classical ML)

Popular use cases: Helping stores cross-sell products , uncovering how items are related or complementary, and understanding which symptoms are likely to co-occur in a patient (comorbidity).

Popular algorithms: Apriori, Euclat, FP-growth

Goal: Infer patterns (associations) in data.


  • “If you bought a phone, you are likely to buy a phone case.”

Dimensionality Reduction (Classical ML)

Popular use-cases: Recommender systems, topic modeling, modeling semantics, document search, face recognition, and anomaly detection.

Popular algorithms: Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA, pLSA, GLSA), and t-SNE.

Goal: To generalize data and distill the relevant information.


  • “Intelligently group and combine similar features into higher-level abstractions.”

Types of Reinforcement Learning Models

Popular use-cases: Robotic motion, recommender systems, autonomous transport, text mining, trade execution in finance, and optimization for treatment policies in healthcare.

Popular algorithms: Q-Learning, SARSA, DQN, A3C

Goal: Perform complex tasks without training data.


  • “Robotic motion control learned by trial and error.”

Imitation Learning is an exciting area in Reinforcement Learning, designed to overcome some of the challenges or shortcomings inherent in Reinforcement Learning techniques. These techniques are often used together.

Machine Learning Models

What is a machine learning Model?

A machine learning model is a program that can find patterns or make decisions from a previously unseen dataset. For example, in natural language processing, machine learning models can parse and correctly recognize the intent behind previously unheard sentences or combinations of words. In image recognition, a machine learning model can be taught to recognize objects – such as cars or dogs. A machine learning model can perform such tasks by having it ‘trained’ with a large dataset. During training, the machine learning algorithm is optimized to find certain patterns or outputs from the dataset, depending on the task. The output of this process – often a computer program with specific rules and data structures – is called a machine learning model.

What is a machine learning Algorithm?

A machine learning algorithm is a mathematical method to find patterns in a set of data. Machine Learning algorithms are often drawn from statistics, calculus, and linear algebra. Some popular examples of machine learning algorithms include linear regression, decision trees, random forest, and XGBoost.

What is Model Training in machine learning?

The process of running a machine learning algorithm on a dataset (called training data) and optimizing the algorithm to find certain patterns or outputs is called model training. The resulting function with rules and data structures is called the trained machine learning model.

What are the different types of Machine Learning?

In general, most machine learning techniques can be classified into supervised learning, unsupervised learning, and reinforcement learning.

What is Supervised Machine Learning?

In supervised machine learning, the algorithm is provided an input dataset, and is rewarded or optimized to meet a set of specific outputs. For example, supervised machine learning is widely deployed in image recognition, utilizing a technique called classification. Supervised machine learning is also used in predicting demographics such as population growth or health metrics, utilizing a technique called regression.

What is Unsupervised Machine Learning?

In unsupervised machine learning, the algorithm is provided an input dataset, but not rewarded or optimized to specific outputs, and instead trained to group objects by common characteristics. For example, recommendation engines on online stores rely on unsupervised machine learning, specifically a technique called clustering.

What is Reinforcement Learning?

In reinforcement learning, the algorithm is made to train itself using many trial and error experiments. Reinforcement learning happens when the algorithm interacts continually with the environment, rather than relying on training data. One of the most popular examples of reinforcement learning is autonomous driving.

What are the different machine learning models?

There are many machine learning models, and almost all of them are based on certain machine learning algorithms. Popular classification and regression algorithms fall under supervised machine learning, and clustering algorithms are generally deployed in unsupervised machine learning scenarios.

Supervised Machine Learning

  • Logistic Regression: Logistic Regression is used to determine if an input belongs to a certain group or not
  • SVM: SVM, or Support Vector Machines create coordinates for each object in an n-dimensional space and uses a hyperplane to group objects by common features
  • Naive Bayes: Naive Bayes is an algorithm that assumes independence among variables and uses probability to classify objects based on features
  • Decision Trees: Decision trees are also classifiers that are used to determine what category an input falls into by traversing the leaf’s and nodes of a tree
  • Linear Regression: Linear regression is used to identify relationships between the variable of interest and the inputs, and predict its values based on the values of the input variables.
  • kNN: The k Nearest Neighbors technique involves grouping the closest objects in a dataset and finding the most frequent or average characteristics among the objects.
  • Random Forest: Random forest is a collection of many decision trees from random subsets of the data, resulting in a combination of trees that may be more accurate in prediction than a single decision tree.
  • Boosting algorithms: Boosting algorithms, such as Gradient Boosting Machine, XGBoost, and LightGBM, use ensemble learning. They combine the predictions from multiple algorithms (such as decision trees) while taking into account the error from the previous algorithm.

Unsupervised Machine Learning

  • K-Means: The K-Means algorithm finds similarities between objects and groups them into K different clusters.
  • Hierarchical Clustering: Hierarchical clustering builds a tree of nested clusters without having to specify the number of clusters.

What is a Decision Tree in Machine Learning (ML)?

A Decision Tree is a predictive approach in ML to determine what class an object belongs to. As the name suggests, a decision tree is a tree-like flow chart where the class of an object is determined step-by-step using certain known conditions.

Decision Tree in Machine Learning

A decision tree visualized in the Databricks

What is Regression in Machine Learning?

Regression in data science and machine learning is a statistical method that enables predicting outcomes based on a set of input variables. The outcome is often a variable that depends on a combination of the input variables.

Regression in Machine Learning

A linear regression model performed on the Databricks

What is a Classifier in Machine Learning?

A classifier is a machine learning algorithm that assigns an object as a member of a category or group. For example, classifiers are used to detect if an email is spam, or if a transaction is fraudulent.

How many models are there in machine learning?

Many! Machine learning is an evolving field and there are always more machine learning models being developed.

What is the best model for machine learning?

The machine learning model most suited for a specific situation depends on the desired outcome. For example, to predict the number of vehicle purchases in a city from historical data, a supervised learning technique such as linear regression might be most useful. On the other hand, to identify if a potential customer in that city would purchase a vehicle, given their income and commuting history, a decision tree might work best.

What is model deployment in Machine Learning (ML)?

Model deployment is the process of making a machine learning model available for use on a target environment—for testing or production. The model is usually integrated with other applications in the environment (such as databases and UI) through APIs. Deployment is the stage after which an organization can actually make a return on the heavy investment made in model development.

Model Deployment in Machine Learning

A full machine learning model lifecycle on the Databricks Lakehouse.

What are Deep Learning Models?

Deep learning models are a class of ML models that imitate the way humans process information. The model consists of several layers of processing (hence the term ‘deep’) to extract high-level features from the data provided. Each processing layer passes on a more abstract representation of the data to the next layer, with the final layer providing a more human-like insight. Unlike traditional ML models which require data to be labeled, deep learning models can ingest large amounts of unstructured data. They are used to perform more human-like functions such as facial recognition and natural language processing.

Deep Learning Models

A simplified representation of deep learning.Source:

What is Time Series Machine Learning?

A time-series machine learning model is one in which one of the independent variables is a successive length of time minutes, days, years etc.), and has a bearing on the dependent or predicted variable. Time series machine learning models are used to predict time-bound events, for example – the weather in a future week, expected number of customers in a future month, revenue guidance for a future year, and so on.

Where can I learn more about machine learning?

  • Check out this free eBook to discover the many fascinating machine learning use-cases being deployed by enterprises globally.
  • To get a deeper understanding of machine learning from the experts, check out the Databricks Machine Learning blog.