Machine Learning Cheat Sheet

When working with machine learning, it's easy to try out many models without understanding what each one does and when to use it. In this cheat sheet, you'll find a handy guide describing the most widely used machine learning models, their advantages, disadvantages, and key use cases.

Supervised Learning

Supervised learning models map inputs to outputs and attempt to apply patterns learned from past data to unseen data. Supervised learning models can be either regression models, where we try to predict a continuous variable, like stock prices, or classification models, where we try to predict a binary or multi-class variable, like whether a customer will churn or not. In the sections below, we'll explain two popular types of supervised learning models: linear models and tree-based models.

Linear Models

In a nutshell, linear models create a best-fit line to predict unseen data. They assume the output is a linear combination of the input features. In this section, we'll cover commonly used linear models in machine learning, their advantages, and their disadvantages; a short code sketch follows the table.

| Algorithm | Description | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Linear Regression | A simple algorithm that models a linear relationship between inputs and a continuous numerical output variable | Stock price prediction; predicting housing prices; predicting customer lifetime value | Explainable method; interpretable results via its output coefficients; faster to train than other machine learning models | Assumes linearity between inputs and output; sensitive to outliers; can underfit with small, high-dimensional data |
| Logistic Regression | A simple algorithm that models a linear relationship between inputs and a categorical output (1 or 0) | Predicting credit risk scores; customer churn prediction | Interpretable and explainable; less prone to overfitting when using regularization; applicable for multi-class predictions | Assumes linearity between inputs and outputs; can overfit with small, high-dimensional data |
| Ridge Regression | Part of the regression family; it penalizes features that have low predictive outcomes by shrinking their coefficients closer to zero. Can be used for classification or regression | Predictive maintenance for automobiles; sales revenue prediction | Less prone to overfitting; best suited where data suffer from multicollinearity; explainable and interpretable | All the predictors are kept in the final model; doesn't perform feature selection |
| Lasso Regression | Part of the regression family; it penalizes features that have low predictive outcomes by shrinking their coefficients to zero. Can be used for classification or regression | Predicting housing prices; predicting clinical outcomes based on health data | Less prone to overfitting; can handle high-dimensional data; no need for feature selection | Can lead to poor interpretability as it can keep highly correlated variables |
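To make the table concrete, here is a minimal sketch of these models in Python. The scikit-learn library and the synthetic data are assumptions for illustration; the cheat sheet itself doesn't prescribe a library.

```python
# Minimal sketch of the linear models above (library and data are
# illustrative assumptions, not part of the cheat sheet).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # 200 samples, 5 features
y_reg = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)
y_clf = (y_reg > 0).astype(int)  # binarize the target for classification

# Plain least squares: one coefficient per feature, easy to interpret.
lin = LinearRegression().fit(X, y_reg)

# L2 penalty (ridge) shrinks all coefficients toward zero; alpha controls strength.
ridge = Ridge(alpha=1.0).fit(X, y_reg)

# L1 penalty (lasso) can shrink coefficients exactly to zero,
# effectively performing feature selection.
lasso = Lasso(alpha=0.1).fit(X, y_reg)

# Logistic regression for a binary target; C is inverse regularization strength.
logit = LogisticRegression(C=1.0).fit(X, y_clf)

print(lin.coef_, ridge.coef_, lasso.coef_, sep="\n")
```

With a large enough alpha, the lasso coefficients for the uninformative features shrink to exactly zero, which is the automatic feature selection described in the table.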

Tree-Based Models

In a nutshell, tree-based models make predictions from decision trees, a series of “if-then” rules learned from the data. In this section, we'll cover commonly used tree-based models in machine learning, their advantages, and their disadvantages; a short code sketch follows the table.

| Algorithm | Description | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Decision Tree | Makes decision rules on the features to produce predictions. Can be used for classification or regression | Customer churn prediction; credit score modeling; disease prediction | Explainable and interpretable; can handle missing values | Prone to overfitting; sensitive to outliers |
| Random Forests | An ensemble learning method that combines the output of multiple decision trees | Credit score modeling; predicting housing prices | Reduces overfitting; higher accuracy compared to other models | Training complexity can be high; not very interpretable |
| Gradient Boosting Regression | Employs boosting to make predictive models from an ensemble of weak predictive learners | Predicting car emissions; predicting ride-hailing fare amounts | Better accuracy compared to other regression models; can handle multicollinearity; can handle non-linear relationships | Sensitive to outliers and can therefore cause overfitting; computationally expensive and has high complexity |
| XGBoost | A gradient boosting algorithm that is efficient and flexible. Can be used for both classification and regression tasks | Churn prediction; claims processing in insurance | Provides accurate results; captures non-linear relationships | Hyperparameter tuning can be complex; does not perform well on sparse datasets |
| LightGBM Regressor | A gradient boosting framework that is designed to be more efficient than other implementations | Predicting flight times for airlines; predicting cholesterol levels based on health data | Can handle large amounts of data; computationally efficient with fast training speed; low memory usage | Can overfit due to leaf-wise splitting and high sensitivity; hyperparameter tuning can be complex |
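As an illustration of the tree-based family, the sketch below fits a single decision tree, a random forest, and gradient boosting. The library choice (scikit-learn) and the synthetic data are assumptions; XGBoost and LightGBM ship as separate packages (xgboost, lightgbm) with similar fit/predict interfaces.

```python
# Minimal sketch of tree-based regressors (library and data are
# illustrative assumptions).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    # A single tree: interpretable, but prone to overfitting.
    "decision_tree": DecisionTreeRegressor(max_depth=4),
    # Many trees on bootstrapped samples: averaging reduces overfitting.
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    # Boosting: each shallow tree corrects the errors of the previous ones.
    "gradient_boosting": GradientBoostingRegressor(n_estimators=200, learning_rate=0.05),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))  # R^2 on held-out data
```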

Unsupervised Learning

Unsupervised learning is about discovering general patterns in data. The most popular example is clustering, or segmenting, customers and users. This type of segmentation is generalizable and can be applied broadly, such as to documents, companies, and genes. Unsupervised learning consists of clustering models, which learn how to group similar data points together, and association algorithms, which discover rules describing how groups of data points tend to occur together.

Clustering Models

| Algorithm | Description | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| K-Means | The most widely used clustering approach; it determines K clusters based on Euclidean distances | Customer segmentation; recommendation systems | Scales to large datasets; simple to implement and interpret; results in tight clusters | Requires the expected number of clusters from the beginning; struggles with varying cluster sizes and densities |
| Hierarchical Clustering | A “bottom-up” approach where each data point is treated as its own cluster, and then the closest two clusters are merged together iteratively | Fraud detection; document clustering based on similarity | No need to specify the number of clusters; the resulting dendrogram is informative | Doesn't always result in the best clustering; not suitable for large datasets due to high complexity |
| Gaussian Mixture Models | A probabilistic model for modeling normally distributed clusters within a dataset | Customer segmentation; recommendation systems | Computes a probability for an observation belonging to a cluster; can identify overlapping clusters; more accurate results compared to K-Means | Requires complex tuning; requires setting the number of expected mixture components or clusters |
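A minimal clustering sketch, again assuming scikit-learn and toy blob data (neither is named in the cheat sheet): it contrasts K-Means' hard assignments with the Gaussian mixture's per-cluster probabilities, alongside the hierarchical alternative.

```python
# Minimal sketch of the clustering models above (library and data are
# illustrative assumptions).
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# K-Means: hard assignment of each point to the nearest of K centroids;
# the number of clusters must be chosen up front.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical ("bottom-up") clustering: merges the closest clusters iteratively.
hc_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Gaussian mixture: soft assignments, one probability per component.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
probs = gmm.predict_proba(X)  # shape (300, 3): P(cluster | point)
```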

Association

| Algorithm | Description | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Apriori Algorithm | A rule-based approach that identifies the most frequent itemsets in a dataset, using prior knowledge of frequent itemset properties | Product placements; recommendation engines; promotion optimization | Results are intuitive and interpretable; exhaustive approach, as it finds all rules based on confidence and support | Generates many uninteresting itemsets; computationally and memory intensive; results in many overlapping itemsets |
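For Apriori, here is a small sketch using the mlxtend library. This is one common implementation, not one the cheat sheet prescribes, and the tiny basket data below is invented for illustration.

```python
# Minimal Apriori sketch (mlxtend and the basket data are assumptions).
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
    ["bread", "milk", "beer"],
]

# One-hot encode the baskets into a boolean item matrix.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

# Frequent itemsets appearing in at least 40% of transactions...
frequent = apriori(onehot, min_support=0.4, use_colnames=True)

# ...and rules filtered by confidence ("if X is bought, Y tends to follow").
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```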