Machine Learning Cheatsheet



This machine learning (ML) cheatsheet is a quick reference for key concepts and commonly used algorithms. It covers supervised learning, unsupervised learning, and reinforcement learning, and summarizes algorithms such as linear regression and decision trees along with their applications, advantages, and disadvantages.

Supervised Machine Learning

Supervised machine learning trains algorithms on labeled datasets so that they can predict outcomes for new inputs.

The main objective of supervised learning is to learn a mapping from input samples to their corresponding outputs by fitting a model to many labeled training examples.
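
As a minimal sketch of learning from labeled examples, the snippet below fits a straight line to input/output pairs using ordinary least squares. The data and the names `fit_line` and `predict` are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y divided by the variation of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new, unlabeled input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labeled data: each input (size) comes with a known output (price).
sizes = [50.0, 60.0, 80.0, 100.0]
prices = [150.0, 180.0, 240.0, 300.0]

model = fit_line(sizes, prices)
estimate = predict(model, 70.0)  # prediction for an input not seen in training
```

The labels (`prices`) are what make this supervised: the model is fitted so that its outputs match the known answers, and is then applied to an unseen input.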

Supervised Machine Learning Algorithms

Supervised learning tasks fall into two types: classification and regression. The table below lists commonly used supervised machine learning algorithms with their applications, advantages, and disadvantages.

| Algorithm | Description | Applications | Advantages | Disadvantages |
|---|---|---|---|---|
| Linear Regression | Predicts a continuous numerical value based on a linear relationship between input and output variables. | Predicting house prices, stock prices, sales figures. | Simple to implement, interpretable, efficient. | Sensitive to outliers, assumes linearity. |
| Logistic Regression | Predicts the probability of a categorical outcome (typically binary) by passing a linear combination of the inputs through a logistic function. | Classifying email as spam or not spam, predicting customer churn. | Interpretable, efficient, outputs class probabilities. | Limited to linear decision boundaries, can overfit with many features unless regularized. |
| Ridge Regression | Regularized linear regression that adds an L2 penalty term to the loss function to prevent overfitting. | Regression tasks with many or correlated features. | Can handle multicollinearity, improves model generalization. | Requires tuning the regularization parameter, does not remove features. |
| Lasso Regression | Regularized linear regression that adds an L1 penalty term to the loss function to encourage sparsity (feature selection). | Regression tasks, feature selection. | Performs feature selection, produces sparse models. | Requires tuning the regularization parameter, may pick one of several correlated features arbitrarily. |
| K-Nearest Neighbors (KNN) | Classifies or predicts the value of a new data point based on the majority class or average value of its k nearest neighbors in the training dataset. | Classification, regression, recommendation systems. | Simple to implement, no explicit training phase, can handle non-linear relationships. | Prediction can be computationally expensive for large datasets, sensitive to the choice of distance metric and the value of k. |
| Support Vector Machines (SVMs) | Finds the optimal hyperplane to separate data points into different classes. | Image classification, text classification, anomaly detection. | Effective for high-dimensional data, handles non-linear relationships with kernels. | Can be computationally expensive for large datasets, sensitive to outliers. |
| Decision Tree | Creates a tree-like model to make decisions based on a series of rules. | Classification, regression, predictive modeling. | Easy to understand and interpret, can handle both numerical and categorical features. | Prone to overfitting, can be sensitive to small changes in data. |
| Random Forests | An ensemble of decision trees, combining multiple models to improve accuracy and reduce overfitting. | Classification, regression, predictive modeling. | More accurate than individual decision trees, robust to noise and outliers. | Can be computationally expensive for large datasets. |
| Naive Bayes | A probabilistic classifier based on Bayes' theorem, assuming the features are independent given the class. | Text classification, spam filtering, sentiment analysis. | Simple to implement, efficient, can handle categorical and numerical features. | Assumes independence of features, which may not always hold true. |
| Gradient Boosting | An ensemble method that trains weak models sequentially, each one correcting the errors of the previous ones. | Regression, classification, predictive modeling. | Highly accurate, can handle complex relationships. | Can be computationally expensive, requires careful tuning of hyperparameters. |
| XGBoost | A scalable and efficient gradient boosting framework. | Regression, classification, ranking. | Highly accurate, efficient, can handle large datasets. | Can be complex to configure. |
| LightGBM | A gradient boosting framework that uses histogram-based splitting for efficient training. | Regression, classification, ranking. | Often faster than XGBoost, efficient for large datasets. | May have slightly lower accuracy than XGBoost in some cases. |
| Neural Networks (Deep Learning) | Complex models with multiple layers, capable of learning complex patterns and relationships. | Image classification, natural language processing, speech recognition. | Highly accurate, can handle complex tasks. | Can be computationally expensive, requires careful tuning of hyperparameters. |
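
To make one row of the table concrete, here is a rough sketch of k-nearest neighbors classification by majority vote. It assumes Euclidean distance and a tiny hypothetical dataset; `knn_predict` is an illustrative name, not a library function.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels = ["a", "a", "a", "b", "b", "b"]

near_first_group = knn_predict(points, labels, (1.2, 1.1))
near_second_group = knn_predict(points, labels, (8.2, 8.4))
```

There is no fitting step here: all the work happens at prediction time, which is why the cost grows with the size of the training set.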

Unsupervised Machine Learning

Unsupervised machine learning discovers patterns and structure in unlabeled datasets, without human-provided labels to guide the training.

Unsupervised Machine Learning Algorithms

Unsupervised learning algorithms fall into three categories: clustering, association, and dimensionality reduction. The table below lists commonly used unsupervised machine learning algorithms with their applications, advantages, and disadvantages.

| Algorithm | Description | Applications | Advantages | Disadvantages |
|---|---|---|---|---|
| K-Means Clustering | Partitions data into K clusters based on similarity. | Customer segmentation, image segmentation, anomaly detection. | Simple to implement, efficient, can handle large datasets. | Requires specifying the number of clusters, sensitive to initialization. |
| Hierarchical Clustering | Creates a hierarchy of clusters, either agglomerative (bottom-up) or divisive (top-down). | Customer segmentation, image segmentation, outlier detection. | Can reveal hierarchical structures, doesn't require specifying the number of clusters in advance. | Can be computationally expensive for large datasets, sensitive to distance metrics. |
| Principal Component Analysis (PCA) | Reduces the dimensionality of data by projecting it onto the directions of greatest variance. | Data visualization, feature engineering, noise reduction. | Efficient, can reveal underlying patterns in data. | May lose some information in the dimensionality reduction process, captures only linear structure. |
| Singular Value Decomposition (SVD) | Decomposes a matrix into its singular values and vectors. | Data analysis, recommendation systems, image compression. | Can be used for dimensionality reduction and feature extraction. | Can be computationally expensive for large matrices. |
| Independent Component Analysis (ICA) | Identifies independent sources of signals from mixed observations. | Blind source separation, signal processing. | Can separate mixed signals, useful in applications like speech recognition. | Can be sensitive to initialization and assumptions about the independence of sources. |
| Gaussian Mixture Model (GMM) | Models data as a mixture of Gaussian distributions, assuming each cluster is generated from a Gaussian distribution. | Clustering, density estimation, anomaly detection. | Can handle complex data distributions, flexible. | Can be computationally expensive, sensitive to initialization. |
| Apriori Algorithm | A frequent itemset mining algorithm used to discover associations between items in a dataset. | Market basket analysis, recommendation systems. | Simple, prunes the search using the fact that every subset of a frequent itemset is also frequent, can be used for association rule mining. | Candidate generation becomes expensive for large datasets with many items. |
| t-SNE | Non-linear dimensionality reduction technique that preserves local structure. | Data visualization, clustering, anomaly detection. | Effective for visualizing high-dimensional data in low-dimensional space. | Can be computationally expensive, sensitive to parameters such as perplexity. |
| UMAP | Non-linear dimensionality reduction technique that preserves local relationships and often more of the global structure than t-SNE. | Data visualization, clustering, anomaly detection. | Often faster and more scalable than t-SNE, often preserves more of the global structure. | May require careful parameter tuning. |
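
As an illustration of the clustering entry, the following is a small sketch of K-Means using Lloyd's algorithm, which alternates between assigning points to the nearest centroid and recomputing centroids. The data and starting centroids are hypothetical, and passing the initial centroids in explicitly is a simplification made here to keep the example deterministic.

```python
import math


def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical 2-D data with two obvious groups.
data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
# Starting centroids are an arbitrary choice for this sketch; results depend on them.
centroids, assignments = k_means(data, centroids=[data[0], data[3]])
```

The dependence on the starting centroids is the "sensitive to initialization" drawback noted in the table; practical implementations usually run several initializations and keep the best result.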

Reinforcement Learning

Reinforcement learning is a type of machine learning where an agent (generally a software entity) learns to act in an environment by performing actions and observing the results. For every good action, the agent receives positive feedback (a reward), and for every bad action it receives negative feedback (a penalty). It is inspired by how animals learn from experience, making decisions based on the consequences of their actions.
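
The feedback loop described above can be sketched with a toy agent that chooses between two actions and keeps a running estimate of the reward each one yields. The environment, the reward values, and the epsilon-greedy exploration rule are all assumptions made for this illustration.

```python
import random


def environment(action):
    """Hypothetical environment: action 1 is 'good' (+1), action 0 is 'bad' (-1)."""
    return 1.0 if action == 1 else -1.0


def run_agent(steps=100, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    values = [0.0, 0.0]  # running estimate of each action's reward
    counts = [0, 0]
    for _ in range(steps):
        # Explore occasionally; otherwise exploit the best-looking action.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: values[a])
        reward = environment(action)  # feedback from the environment
        counts[action] += 1
        # Incremental mean of the rewards observed for this action.
        values[action] += (reward - values[action]) / counts[action]
    return values


values = run_agent()
best_action = max(range(2), key=lambda a: values[a])
```

After a few steps the agent's estimate for the rewarded action exceeds its estimate for the penalized one, so it mostly repeats the rewarded action.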

Reinforcement Learning Algorithms

The table below lists some well-known reinforcement learning algorithms with their applications, advantages, and disadvantages.

| Algorithm | Description | Applications | Advantages | Disadvantages |
|---|---|---|---|---|
| Q-Learning | Off-policy learning algorithm that learns the optimal action-value function. | Game playing, robotics, control systems. | Simple to implement, model-free. | The tabular form does not scale well to large state spaces. |
| SARSA | On-policy learning algorithm that updates the action-value function based on the action the current policy actually takes. | Game playing, robotics, control systems. | Suitable for online learning, accounts for exploration when evaluating actions. | Can be sensitive to the exploration-exploitation trade-off. |
| Deep Q-Networks (DQN) | Combines deep learning with Q-learning, using a neural network to approximate the action-value function. | Atari game playing, robotics, self-driving cars. | Can handle complex environments with large, high-dimensional state spaces and discrete actions. | Requires careful tuning of hyperparameters, can be computationally expensive. |
| Policy Gradients | Directly optimizes the policy function to maximize rewards. | Robotics, game playing, natural language processing. | Can handle continuous action spaces, can learn stochastic policies. | Gradient estimates have high variance, often less sample efficient than value-based methods, training can be unstable. |
| Actor-Critic | Combines policy-based and value-based methods, using both a policy function (the actor) and a value function (the critic). | Robotics, game playing, natural language processing. | Can be more stable and efficient than pure policy-based or value-based methods. | Requires careful balancing of exploration and exploitation. |
| Asynchronous Advantage Actor-Critic (A3C) | An actor-critic variant in which multiple workers interact with copies of the environment in parallel and update shared parameters asynchronously. | Robotics, game playing, natural language processing. | Can be more efficient than traditional actor-critic methods, suitable for distributed training. | Can be complex to implement. |
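
The difference between the off-policy and on-policy entries in the table shows up in a single update step. The sketch below applies the standard tabular Q-learning and SARSA updates to the same hypothetical transition; the states, starting values, learning rate `alpha`, and discount `gamma` are assumed for illustration.

```python
def q_learning_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Off-policy: bootstrap from the best action available in next_state."""
    target = reward + gamma * max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.5, gamma=0.9):
    """On-policy: bootstrap from the action the policy actually takes next."""
    target = reward + gamma * q[next_state][next_action]
    q[state][action] += alpha * (target - q[state][action])


def make_q():
    # Hypothetical two-state action-value table with hand-picked starting values.
    return {
        "s0": {"left": 0.0, "right": 0.0},
        "s1": {"left": 1.0, "right": 2.0},
    }


q_off = make_q()
q_learning_update(q_off, "s0", "right", reward=1.0, next_state="s1")

q_on = make_q()
# Suppose the behaviour policy explores and picks "left" in s1.
sarsa_update(q_on, "s0", "right", reward=1.0, next_state="s1", next_action="left")
```

Q-learning uses the greedy value of the next state (2.0), while SARSA uses the value of the exploratory action that was actually chosen (1.0), so the two methods produce different estimates from the same experience.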