Anomaly Detection: Machine Learning, Deep Learning, AutoML

Course content

  • Introduction
  • The Three Types of Anomalies
  • Anomaly Detection – Time Series
  • Anomaly Detection –
  • Unsupervised DBSCAN
  • Anomaly Detection – Unsupervised Isolation Forest
  • Anomaly Detection – Supervised
  • Anomaly Detection – Images
  • Anomaly Detection Using Deep Learning
  • PyOD: A comparison of 10 algorithms
  • Predicting High Impact Low Volume Events: Predictive Maintenance
  • No Code (AutoML) approach to anomaly detection using PowerBl
  • Machine Learning
  • Bonus Lecture

Machine Learning Techniques

Introduction to Machine Learning Techniques

Machine Learning Techniques (like Regression, Classification, Clustering, Anomaly detection, etc.) are used to build the training data or a mathematical model using certain algorithms based upon the computations statistic to make prediction without the need of programming, as these techniques are influential in making the system futuristic, models and promotes automation of things with reduced cost and manpower.

Techniques of Machine Learning

There are a few methods that are influential in promoting the systems to automatically learn and improve as per the experience. But they fall under various categories or types like Supervised Learning, Unsupervised Learning, Reinforcement Learning, Representation Learning, etc. Below are the techniques which fall under Machine Learning:

1. Regression

Regression algorithms are mostly used to make predictions on numbers i.e when the output is a real or continuous value. As it falls under Supervised Learning, it works with trained data to predict new test data. For example, age can be a continuous value as it increases with time. There are some Regression models as shown below:

machine learning tech

Some widely used algorithms in Regression techniques

  • Simple Linear Regression Model: It is a statistical method that analyses the relationship between two quantitative variables. This technique is mostly used in financial fields, real estate, etc.
  • Lasso Regression: Least Absolute Selection Shrinkage Operator or LASSO is used when there is a need for a subset of the predictor to minimize the prediction error in a continuous variable.
  • Logistic Regression: It is carried out in cases of fraud detection, clinical trials, etc. wherever the output is binary.
  • Support Vector Regression: SVR is a bit different from SVM. In simple regression, the aim is to minimize the error, while in SVR, we adjust the error within a threshold.
  • Multivariate Regression Algorithm: This technique is used in the case of multiple predictor variables. It can be operated with matrix operations and Python’s Numpy library.
  • Multiple Regression Algorithm: It works with multiple quantitative variables in both linear and non-linear regression algorithms.

2. Classification

A classification model, a method of Supervised Learning, draws a conclusion from observed values as one or more outcomes in a categorical form. For example, email has filters like inbox, drafts, spam, etc. There is a number of algorithms in the Classification model like Logistic Regression, Decision Tree, Random Forest, Multilayer Perception, etc. In this model, we classify our data specifically and assign labels accordingly to those classes. Classifiers are of two types:

  • Binary Classifiers: Classification with 2 distinct classes and 2 output.
  • Multi-class Classifiers: Classification with more than 2 classes.

3. Clustering

Clustering is a Machine Learning technique that involves classifying data points into specific groups. If we have some objects or data points, then we can apply the clustering algorithm(s) to analyze and group them as per their properties and features. This method of unsupervised technique is used because of its statistical techniques. Cluster algorithms make predictions based on training data and create clusters on the basis of similarity or unfamiliarity.

Clustering methods:

  • Density-based methods: In this method, clusters are considered dense regions depending on their similarity and difference from the lower dense region.
  • Hierarchical methods: The clusters formed in this method are the tree-like structures. This method forms trees or clusters from the previous cluster. There are two types of hierarchical methods: Agglomerative (Bottom-up approach) and Divisive (Top-down approach).
  • Partitioning methods: This method partitions the objects based on k-clusters and each method form a single cluster.
  • Gris based methods: In this method, data are combined into a number of cells that form a grid-like structure.

4. Anomaly detection

Anomaly detection is the process of detecting unexpected items or events in a data set. Some areas where this technique is used are fraud detection, fault detection, system health monitoring, etc. Anomaly detection can be broadly categorized as:

  1. Point anomalies: Point anomalies are defined when a single data is unexpected.
  2. Contextual anomalies: When anomalies are context-specific, then it’s called contextual anomalies.
  3. Collective anomalies: When a collection or group of related data items are anomalous, then it’s called collective anomalous.

There are certain techniques in Anomaly detection as follows:

  • Statistical methods: It helps in identifying anomalies by pointing the data that deviates from statistical methods like mean, median, mode, etc.
  • Density-based anomaly detection: It based on the k-nearest neighbor algorithm.
  • Clustering-based anomaly algorithm: Data points are collected as a cluster when they fall under the same group and are determined from the local centroids.
  • Super Vector Machine: The algorithm trains itself to cluster the normal data instances and identifies the anomalies using the training data.

Working on Machine Learning Techniques

Machine Learning utilizes a lot of algorithms to handle and work with large and complex datasets to make predictions as per need.

For example, we search a bus image on Google. So, Google basically gets a number of examples or datasets labeled as bus and the system finds the patterns of pixels and colors that will help in finding correct images of the bus.

Google’s system will make a random guess of the bus like images with the help of patterns. If any mistake occurs, then it adjusts itself for accuracy. In the end, those patterns will be learned by a large computer system modeled like a human brain or Deep Neural Network to identify the accurate results from the images. This is how ML techniques work to get the best result always.

Conclusion

Machine Learning has various applications in real life to help business houses, individuals, etc. to attain certain results as per need. To get the best results, certain techniques are important which have been discussed above. These techniques are modern, futuristic and promote automation of things with less manpower and cost.