AI models Cheat Sheet with a simple explanation for non-tech audience

AI models Cheat Sheet with a simple explanation for non-tech audience

Intro

I always had this feeling that I never properly understood the various areas of machine learning. There is deep learning and there is machine learning, and then there are different models, each trained differently and designed to work with different types of data. It was a mess in my head! However, everything started to make sense once I began categorizing different models by their areas of application, the type of training they undergo, and their unique characteristics. In this article, I will share an easy-to-comprehend way of organizing this knowledge in your head. My aim is to create a cheat sheet or reference guide to quickly understand the various categories of machine learning models.

Distinguishing Machine Learning from Deep Learning

Understanding the difference between Machine Learning (ML) and Deep Learning (DL) is important for grasping how different models work and are trained.

Machine Learning usually deals with structured data, like tables of numbers. Imagine a spreadsheet where each row is an example and each column is a feature, such as age, height, or income. ML models, like linear regression or decision trees, use these numbers to make predictions or decisions.

Deep Learning works with more complex data like images, audio, video, and text. This data is turned into a form called tensors, which are like 3D grids of numbers. To help visualize this, think of a Rubik's cube. Each small square on the cube has a position in three dimensions: X, Y, and Z. Now, imagine that each little cube inside the Rubik's cube is a piece of data. A tensor is like a multi-dimensional Rubik's cube. In Deep Learning, we manipulate these tensors through various operations. For example, when you send a text to a language model, the model translates the text into tensors (like a bunch of Rubik's cubes). The model processes these tensors, performing complex parallel calculations and therefore requires a GPU. When it generates a reply, it creates new tensors that are then translated back into text.

In contrast, Machine Learning models don't typically involve these tensor operations. They work directly with numerical data without converting it into these complex 3D forms and therefore it doesn't require a GPU.

By understanding this basic difference, you can see why Machine Learning and Deep Learning are used for different types of tasks. ML is great for simpler, structured data, while DL shines with more complex data like images and text.

Types of Learning in Machine Learning

I had another "Aha!" moment when I understood the different methods of training a model. For now, let’s focus just on machine learning. By understanding these types of learning, you can see how different methods are used to train machine learning models based on the kind of data and the problem they aim to solve. This knowledge forms the foundation for deeper insights into the world of machine learning and helps us appreciate the diversity and capabilities of various models.

Supervised Learning
Think of supervised learning like teaching a dog tricks by showing it exactly what to do. You show the dog a trick, then give it a treat when it does the trick correctly. In supervised learning, the computer is like the dog. You give it a bunch of examples (like pictures labeled "cat" or "dog") and tell it what each one is. The computer then learns from these examples. Once it's trained, you can show it new pictures it's never seen, and it should be able to tell you if they're cats or dogs based on what it learned. Common use cases are email filtering, learning to classify emails as spam or not spam, medical diagnosis, using symptoms and test results to diagnose diseases, price prediction, predicting house prices based on features like location, size, and number of rooms.

Unsupervised Learning
Unsupervised learning is like giving a child a box of LEGO bricks of different colors and shapes and letting them organize the bricks however they like. The computer looks at data that hasn’t been labeled or categorized, and tries to organize it into groups based on similarities all by itself. It might start to see patterns or groupings that weren’t obvious before. Common use cases, market segmentation, grouping customers based on purchasing behavior or preferences,
anomaly detection, finding unusual patterns or errors in data, like detecting fraudulent transactions, organizing large databases, sorting through large amounts of data and organizing it into similar groups.

Semi-Supervised Learning
Semi-supervised learning is a mix of the first two methods. Imagine you have a large pile of photos, but you only have time to label a few of them as "cats" or "dogs." You can use these few labeled photos to help guide the computer to learn about the rest of the unlabeled photos. It uses the patterns it sees in the labeled photos to make educated guesses about the unlabeled ones. Common use cases, large scale image classification, when you have lots of images but labeling them all is too time-consuming, web content classification, helping sort web pages into categories when only some pages are labeled, language translation, improving translation models when only some text examples have translations.

Reinforcement Learning
Reinforcement learning is like teaching a cat to navigate a maze by rewarding it with treats at certain points. The cat tries different paths and learns which paths lead to treats. Similarly, the computer tries different actions and learns which actions give the best results through trial and error. It’s often used in situations where making decisions is complex and the environment might keep changing. Common use cases, video games, teaching AI to play games and improve over time based on the score or other in-game rewards, robot navigation, helping robots learn to move through environments in optimal ways, think Roomba hoover, resource management, in industries, managing resources dynamically to maximize efficiency based on rewards like cost savings.

Common Machine Learning Models

Now that we understand the different training techniques, let's explore how various machine learning models are trained using these methods and what they are suitable for. By knowing which models are trained in which ways, we can better appreciate their strengths and weaknesses. We’ll look at some common models, see how they are built using different training approaches, and understand what types of problems they are best suited to solve. Note that we are talking about Machine Learning and all of these models don't require GPU and can perfectly run on a CPU.

1. Logistic Regression

  • Training Method: Supervised Learning
  • Explanation: You provide examples with known labels (e.g., spam or not spam), and the model learns to predict these labels for new data.
  • What It Does: Helps decide between two options, like whether an email is spam or not. Imagine you are deciding if it will rain or not based on clouds. Logistic regression helps in predicting yes/no outcomes using past data.
  • Common Use: Medical diagnoses, email filtering, and predicting whether a customer will buy a product.
  • Strength: Simple and fast for binary choices.
  • Weakness: Not great for more complex problems with lots of variables or categories.

2. Linear Regression

  • Training Method: Supervised Learning
  • Explanation: Trained with labeled data where the continuous outcome is known, and the model learns to predict this outcome from the input variables.
  • What It Does: Predicts a continuous value based on one or more variables. Imagine you're saving allowance money to buy a toy. Linear regression is like guessing how many weeks of saving will get you there, based on the past.
  • Common Use: Real estate pricing, forecasting stock prices, predicting sales.
  • Strength: Provides a clear quantitative output and is easy to understand.
  • Weakness: Assumes a linear relationship and is sensitive to outliers.

3. Decision Trees

  • Training Method: Supervised Learning
  • Explanation: Decision trees are trained on data with known labels, learning to make splits that best differentiate the data into the desired categories.
  • What It Does: Makes decisions by splitting data into branches to reach a conclusion. Think of making a decision, like choosing what to wear based on the weather. A decision tree helps computers make choices by asking a series of "yes or no" questions.
  • Common Use: Loan approval, customer segmentation, diagnosing diseases.
  • Strength: Easy to understand and visualize.
  • Weakness: Can create overly complex trees that do not generalize well (overfitting).

4. Naive Bayes

  • Training Method: Supervised Learning
  • Explanation: This method requires a dataset with known labels to train on probabilities of how each feature relates to a label.
  • What It Does: Predicts the probability of different outcomes based on certain features. If you guess the flavor of a candy by its color, you're making a prediction based on what you know. Naive Bayes does this with math, predicting outcomes based on past information.
  • Common Use: Spam filtering, sentiment analysis, disease prediction.
  • Strength: Performs well with a large number of features and is very efficient.
  • Weakness: Assumes all features are independent, which isn’t always true.

5. k-Nearest Neighbors (k-NN)

  • Training Method: Supervised Learning
  • Explanation: Requires labeled data to function. The model predicts the label of a new point based on the majority label among its closest labeled points.
  • What It Does: Classifies new data points based on the closest existing data points. It's like finding your favorite book by asking friends for recommendations. k-NN finds the closest neighbors to make predictions.
  • Common Use: Recommendation systems, classification on small datasets, image recognition.
  • Strength: Simple and effective without needing a complicated model.
  • Weakness: Slows down as data grows; not great with high dimensional data.

6. Support Vector Machines (SVM)

  • Training Method: Supervised Learning
  • Explanation: SVM requires labeled data to find the decision boundary (hyperplane) that best separates different classes.
  • What It Does: Finds the best boundary that separates data into categories. Think of separating apples and oranges in a basket by drawing the straightest line possible. SVM helps find that perfect line to tell them apart.
  • Common Use: Face detection, text category assignment, image classification.
  • Strength: Effective in high dimensional spaces and with non-linear boundaries.
  • Weakness: Not suitable for large datasets and requires careful parameter tuning.

7. Random Forests

  • Training Method: Supervised Learning
  • Explanation: Uses many decision trees trained on different parts of the same data set with known outcomes to make more accurate predictions.
  • What It Does: Uses many decision trees to make a more accurate prediction than just one tree. Think of asking many friends for advice and combining their answers. Random forests ask multiple decision trees and use the most popular answer.
  • Common Use: Credit scoring, medical diagnosis, stock market analysis.
  • Strength: Very accurate, handles outliers well, and can deal with unbalanced data.
  • Weakness: Can be slow and complex to create and understand.

8. N-Beats

  • Training Method: Supervised Learning
  • Explanation: This model is trained on historical time series data with known outcomes to predict future values.
  • What It Does: A deep learning model designed specifically for time series forecasting.
  • Common Use: Financial forecasting, demand forecasting in retail, energy load prediction.
  • Strength: Highly accurate for predicting patterns over time.
  • Weakness: Requires substantial computational resources and deep learning expertise.

9. Gradient Boosting

  • Training Method: Supervised Learning
  • Explanation: Sequentially adds new models that improve on the errors of the previous models, using labeled data to guide the corrections.
  • What It Does: Improves predictions by correcting errors of prior models, adding them up to make a final model. Imagine getting better at a video game by fixing small mistakes each time you play. Gradient boosting improves predictions by learning from past errors in small steps.
  • Common Use: Search ranking, ecology (species distribution), risk assessment.
  • Strength: Extremely accurate and effective for various applications.
  • Weakness: Can overfit if not tuned properly and is computationally expensive.

10. K-Means Clustering

  • Training Method: Unsupervised Learning
  • Explanation: You don't need to provide any labels. The model organizes the data into groups based on similarity alone.
  • What It Does: Groups similar things together without knowing the groups ahead of time. It's like sorting your toys into groups without being told the categories. The algorithm finds toys that are alike and puts them together.
  • Common Use: Market segmentation, organizing computer clusters, image segmentation.
  • Strength: Easy to implement and works well with large datasets.
  • Weakness: Needs the number of clusters specified beforehand, and can get confused by outliers or uneven cluster sizes.

11. Principal Component Analysis (PCA)

  • Training Method: Unsupervised Learning
  • Explanation: PCA doesn't require labels. It analyzes the data to find the directions (principal components) that capture the most variability in the data.
  • What It Does: Simplifies data by reducing its complexity but keeps essential parts. Think of PCA as packing a suitcase. You have lots of clothes (data) and PCA helps you fit the most important pieces into a small space.
  • Common Use: Image processing, genetics data analysis, market research.
  • Strength: Reduces storage and improves visualization without losing key information.
  • Weakness: Can be misled by outliers and doesn't handle noise well.

12. Bayesian Networks

  • Training Method: Can be both Supervised and Unsupervised
  • Explanation: Can learn from data with known relationships (supervised) or by discovering new relationships (unsupervised).
  • What It Does: Uses probability to predict different outcomes based on known relationships. Think of predicting weather by considering different factors like temperature and humidity. Bayesian networks use probabilities to make predictions considering various factors.
  • Common Use: Disease diagnosis, genetic probability, risk assessment.
  • Strength: Can handle uncertainty and missing data well.
  • Weakness: Complex to implement and requires a lot of domain knowledge.

13. Genetic Algorithms

  • Training Method: Typically used in optimization, not standard learning
  • Explanation: Mimics natural evolutionary processes such as selection and mutation, often used to find optimal solutions to problems by generating, evaluating, and evolving solutions.
  • What It Does: Mimics natural selection to solve optimization and search problems. Think of creating a super pet by mixing the best traits of different pets. Genetic algorithms evolve solutions by combining the best options over time.
  • Common Use: Scheduling, modeling, and designing complex systems like networks or buildings.
  • Strength: Very flexible and good at finding solutions to complex problems where other algorithms fail.
  • Weakness: Requires lots of computations and can be random, not guaranteeing the best solution.

So after looking at a bunch of ML models we are developing a sense of theme in terms of which models do what. Maybe a table below would nicely wrap up this section by categorizing the use case for each model.

Category Description
Basic Sorting & Classification Decision Trees, K-Means Clustering, Logistic Regression
Probabilistic Guessing Naive Bayes, k-Nearest Neighbors (k-NN)
Advanced Classification & Regression Linear Regression, Support Vector Machines (SVM)
Data Reduction & Transformation Principal Component Analysis (PCA)
Ensemble Learning Gradient Boosting, Random Forests
Optimization Genetic Algorithms
Time Series Forecasting N-Beats

Common Deep Learning Models

Many people first learned about Deep Learning through apps like Prisma, which can turn your photo into a painting in the style of Van Gogh, or apps like ChatGPT, which can have conversations with you. These apps made it easy to understand because you interact with them naturally: you give them a photo or a text, and they give you an amazing result. These apps use deep learning models, but there are many more models out there. In this section, we'll explore different types of deep learning models and what they can do. Also note, that since this is deep learning models, all of these models will require GPU for training as well as inference.

1. Neural Networks

  • Training Method: Supervised Learning
  • Explanation: Neural networks are like a web of neurons that learn to make predictions by adjusting connections based on the data provided. They're trained using known data labels. Your brain has lots of cells working together to help you learn. Neural networks are computer versions, helping machines learn from examples.
  • Common Use: Recognizing handwriting, translating languages, making recommendations.
  • Strength: Very flexible and can learn complex patterns.
  • Weakness: Require a lot of data and computing power, and can be hard to interpret.

2. Recurrent Neural Networks (RNN)

  • Training Method: Supervised Learning
  • Explanation: RNNs are a type of neural network designed for sequence data (like text or time series), where each piece of data is dependent on the previous one. They loop information back through the network for context. Think of remembering a story by recalling previous sentences. RNNs help computers understand sequences, like text or time-series data.
  • Common Use: Speech recognition, music composition, language translation.
  • Strength: Excellent at handling sequences of data.
  • Weakness: Can be difficult to train and prone to problems like forgetting earlier inputs.

3. Convolutional Neural Networks (CNN)

  • Training Method: Supervised Learning
  • Explanation: CNNs are specialized neural networks for processing structured arrays of data like images. They use filters to capture spatial hierarchies in data. Imagine your brain recognizing faces in photos or shapes of animals when you look at a cloud. CNNs are specialized neural networks that help computers see and understand images.
  • Common Use: Image recognition, video analysis, medical imaging.
  • Strength: Very effective at picking out spatial patterns.
  • Weakness: Primarily limited to visual data; can be computationally intensive.

4. Autoencoders

  • Training Method: Unsupervised Learning
  • Explanation: Autoencoders learn to compress data (encode) and then reconstruct (decode) it back to the original. They learn the most important features of the data without any labels. Imagine compressing a big picture into a tiny image and then expanding it back. Autoencoders reduce data size and then reconstruct it.
  • Common Use: Data denoising, dimensionality reduction, feature extraction.
  • Strength: Good at learning efficient representations.
  • Weakness: Can struggle to handle complex data or generalize beyond training examples.

5. Reinforcement Learning

  • Training Method: Reinforcement Learning
  • Explanation: In reinforcement learning, an agent learns to make decisions by receiving rewards or penalties for actions, learning from direct interaction with the environment. Imagine training a dog with treats. Reinforcement learning helps computers learn by rewarding them for good actions and correcting mistakes.
  • Common Use: Game AI, robotics, real-time decision making.
  • Strength: Very powerful for decision-making tasks in dynamic environments.
  • Weakness: Requires a lot of data and interactions to perform well; may be unstable or unpredictable.

6. Q-Learning

  • Training Method: Reinforcement Learning
  • Explanation: A specific type of reinforcement learning where an agent learns the best actions to take in given states of the environment based on a reward system, updating its strategy based on the expected utility of its actions. Imagine finding the fastest way through a maze. Q-Learning helps computers find the best path by learning from exploration and rewards.
  • Common Use: Video game AI, navigation systems, robotic controllers.
  • Strength: Simple and effective at finding strategies in defined environments.
  • Weakness: Struggles with large or complex states because it needs to estimate the value of each action in every possible state.

7. Generative Adversarial Networks (GANs)

  • Training Method: Unsupervised Learning
  • Explanation: GANs involve two neural networks contesting with each other in a game. One network generates new data instances, while the other evaluates them. Together, they improve until the generated data is indistinguishable from real data.
  • Common Use: Creating realistic images, videos, and voice audio; data augmentation.
  • Strength: Can generate highly realistic samples.
  • Weakness: Training can be unstable and requires careful balance between the two networks.

8. Generative Pretrained Transformers (GPT)

  • Training Method: Supervised and Unsupervised Learning
  • Explanation: GPT models are a type of transformer that uses large amounts of data to generate text based on the context it's given. It can be fine-tuned with supervised learning on specific tasks after pretraining in an unsupervised manner.
  • Common Use: Chatbots, writing assistance, content generation.
  • Strength: Highly flexible and capable of understanding and generating human-like text.
  • Weakness: Requires extensive computation and data to train effectively; can generate biased or inaccurate outputs if not well-regulated.

9. Long Short-Term Memory Networks (LSTMs)

  • Training Method: Supervised Learning
  • Explanation: LSTMs are an advanced type of RNN designed to avoid the vanishing gradient problem, enabling them to learn long-term dependencies in data. It's like a type of brain that's great at remembering things for a long time, which helps it understand stories or predict what comes next in a video.
  • Common Use: Sequence prediction, speech recognition, time series analysis.
  • Strength: Effective in learning long sequences over time.
  • Weakness: Computationally intensive and slower to train compared to some newer models like Transformers.

10. Capsule Networks (CapsNets)

  • Training Method: Supervised Learning
  • Explanation: CapsNets structure neural networks in "capsules," which are groups of neurons that specialize in recognizing different features of objects, maintaining spatial hierarchies. Imagine a brain that not only sees pictures but understands which way things are turned and how they're placed.
  • Common Use: Image recognition, especially where object orientation and spatial relationships are crucial.
  • Strength: Better at handling spatial relationships and rotations within images.
  • Weakness: More complex and computationally heavier than traditional CNNs; still under active research for scalability.

11. Transformer Networks

  • Training Method: Supervised and Unsupervised Learning (depending on the application, such as training language models)
  • Explanation: Transformers use self-attention mechanisms to process sequences of data in parallel, significantly improving the efficiency of handling long sequences. Imagine this as a very fast brain that listens to stories and pays attention to every word and how it relates to others, helping it understand the whole story better.
  • Common Use: Natural language processing, including tasks like translation, text summarization, and content generation.
  • Strength: Highly efficient at processing sequences in parallel, leading to faster training.
  • Weakness: Requires large amounts of data and computational resources to train effectively.

12. Variational Autoencoders (VAEs)

  • Training Method: Unsupervised Learning
  • Explanation: VAEs are generative models that learn to encode data into a compressed representation and then decode it back to the original form, while learning the data distribution. Imagine a brain of an artist that learns to draw new pictures that look like ones it has seen before, but each drawing is slightly different.
  • Common Use: Image generation, anomaly detection, and more complex generative tasks.
  • Strength: Capable of generating new data instances that are similar to the original data.
  • Weakness: Often produce less sharp results compared to GANs and can be tricky to train.

13. Siamese Networks

  • Training Method: Supervised Learning
  • Explanation: Siamese Networks use twin networks to learn similarity or differences between paired inputs, making them ideal for verification tasks. It's like two brains that look at two things to decide if they are the same or different, like telling apart two similar-looking puppies.
  • Common Use: Face verification, signature recognition, and similar or dissimilar decision-making tasks.
  • Strength: Excellent for learning fine-grained differences between very similar inputs.
  • Weakness: Limited to pairwise comparison tasks and may require substantial training data for each specific task.

14. U-Net

  • Training Method: Supervised Learning
  • Explanation: U-Net is a convolutional network architecture designed for fast and precise segmentation tasks, featuring a symmetric expanding path that captures precise localization. A brain that's an expert at coloring inside the lines, used especially for medical pictures to help doctors see things clearly.
  • Common Use: Medical image segmentation, satellite image processing.
  • Strength: Highly effective in detailed image segmentation tasks.
  • Weakness: Primarily optimized for segmentation, less versatile for other types of tasks.

15. BERT (Bidirectional Encoder Representations from Transformers)

  • Training Method: Supervised and Unsupervised Learning
  • Explanation: BERT revolutionizes NLP by training language models in a bidirectional manner, meaning it learns context from both the left and right side of a token in the data. A very clever brain that reads a page in a book by looking at every word and understanding how each word is connected to the ones before and after it.
  • Common Use: Enhancing search engines, improving text understanding, question answering systems.
  • Strength: Offers state-of-the-art results in many NLP tasks due to deep understanding of context.
  • Weakness: Extremely resource-intensive to train and can be prone to biases present in training data.

If we were to logically categorize these models these is what it would like:

Category Models
General-Purpose Learning Neural Networks, Long Short-Term Memory Networks (LSTMs)
Sequential Data Processing Recurrent Neural Networks (RNN), LSTMs, Transformer Networks, Generative Pretrained Transformers (GPT), BERT (Bidirectional Encoder Representations from Transformers)
Image Processing Convolutional Neural Networks (CNN), Capsule Networks (CapsNets), U-Net
Data Compression & Reconstruction Autoencoders, Variational Autoencoders (VAEs)
Decision Making & Game Theory Reinforcement Learning, Q-Learning
Generative Models Generative Adversarial Networks (GANs), VAEs, GPT
Comparison and Verification Siamese Networks

Conclusion

In this exploration, we've delved into various model learning paradigms—supervised, unsupervised, and reinforcement learning—alongside a diverse range of machine learning and deep learning models. As we saw, each model offers unique strengths, from processing sequential data and images to making decisions based on environmental feedback. We've learned that while general machine learning (ML) models typically handle numerical tabular data effectively, deep learning (DL) models extend their prowess to not only numerical but also audio, image, video, and text data, demonstrating remarkable versatility.

Now, imagine you're faced with the challenge of analyzing the Bitcoin market to predict its price actions. With the knowledge you've acquired about different models, consider which one (or combo) would best suit this task. Would you opt for time series forecasting models like LSTMs or GPT for their ability to understand sequential data, or perhaps a blend of models to capture both the quantitative trends and qualitative market sentiments? What approach would you take to harness the power of these advanced tools in predicting something as volatile as Bitcoin prices?

Subscribe to Vitalij Neverkevic Blog

Don’t miss out on the latest issues. Sign up now to get access to the library of members-only issues.
jamie@example.com
Subscribe