Chapter 2: Types of Machine Learning

1. The Big Picture of Machine Learning

Machine Learning (ML) is a collective term for algorithms that automatically learn patterns from data. Based on how they learn, ML methods fall into three main categories, with deep learning cutting across all of them.

[Figure: machine learning taxonomy. Supervised learning (regression, classification), unsupervised learning (clustering, dimensionality reduction), and reinforcement learning (value-based, policy-based), with deep learning (CNN/ViT, RNN/LSTM, Transformer) and generative models (GAN, VAE, diffusion models, LLM) spanning all categories.]
Figure 1: Machine Learning Taxonomy — The Three Main Categories and Deep Learning

Why Understanding the "Types" Matters

Selecting a learning approach that matches the nature of your problem is the first step toward a successful machine learning project. Does your data come with labels? Do you want to uncover hidden structure? Do you need sequential decision-making? This assessment is the starting point for choosing a method.

2. Supervised Learning

Supervised learning trains on pairs of input $x$ and label $y$ to learn a mapping $f: x \mapsto y$. It is the most widely used learning paradigm in practice.

2.1 Regression

Prediction problems where the output $y$ is a continuous value.

| Method | Overview | Applications |
|---|---|---|
| Linear Regression | Models a linear relationship: $y = w^\top x + b$ | House price prediction, sales forecasting |
| Ridge / Lasso | Regularized linear regression ($L_2$ / $L_1$) | High-dimensional multivariate data |
| Decision Tree Regression | Predicts values via conditional splits | Scenarios requiring interpretability |
| Random Forest | Ensemble of multiple decision trees | Tabular data in general |
| Gradient Boosting (XGBoost, LightGBM) | Sequentially adds weak learners | Top performer in Kaggle competitions |
| Neural Network Regression | Learns nonlinear relationships with multilayer perceptrons | Large-scale, complex data |
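
As a concrete instance of the first row, ordinary least squares can be solved in closed form. A minimal numpy sketch on synthetic data (the true coefficients 2.0 and 1.0 are invented for illustration):

```python
import numpy as np

# Synthetic data: y = 2*x + 1 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.1, size=100)

# Append a bias column and solve the least-squares problem
X_b = np.hstack([X, np.ones((100, 1))])
w, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print(w)  # w[0] ≈ 2.0 (slope), w[1] ≈ 1.0 (intercept)
```

The same pattern scales to multiple features: each column of `X` simply gets its own weight.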

2.2 Classification

Prediction problems where the output $y$ is a discrete class.

| Method | Overview | Applications |
|---|---|---|
| Logistic Regression | Outputs probability via the sigmoid function: $P(y=1\mid x) = \sigma(w^\top x + b)$ | Spam detection, credit scoring |
| Support Vector Machine (SVM) | Maximum-margin classifier with kernel trick | Text classification, small-to-medium datasets |
| k-Nearest Neighbors (kNN) | Majority vote among the $k$ closest neighbors | Recommendation, anomaly detection |
| Naive Bayes | Bayes' theorem with feature independence assumption | Text classification, fast inference |
| Random Forest / GBDT (Gradient Boosted Decision Trees) | Ensemble methods (classification variant) | De facto standard for tabular data |
| CNN (Image Classification) | Extracts spatial features via convolution | Image recognition, medical imaging |
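
To make the kNN row concrete, majority vote over the $k$ nearest neighbors fits in a few lines of numpy (a toy sketch; real systems use optimized nearest-neighbor libraries):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                 # most common label wins

# Two well-separated clusters with labels 0 and 1
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.1, 0.1])))  # -> 0
print(knn_predict(X_train, y_train, np.array([5.0, 5.1])))  # -> 1
```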

3. Unsupervised Learning

Discovers hidden structure and patterns from unlabeled data.

3.1 Clustering

Groups data points based on similarity.

| Method | Overview | Characteristics |
|---|---|---|
| k-means | Assigns each point to the nearest of $k$ centroids | Fast; works well for spherical clusters |
| Hierarchical Clustering | Builds a dendrogram showing hierarchical structure | No need to specify $k$ in advance |
| DBSCAN | Density-based; handles arbitrary cluster shapes | Can detect noise points |
| Gaussian Mixture Model (GMM) | Mixture of multiple Gaussian distributions | Probabilistic cluster membership |
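
The k-means row can be sketched directly as Lloyd's algorithm: alternate between assigning points to the nearest centroid and recomputing centroids (naive initialization from the first $k$ points; production code would use k-means++):

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = X[:k].copy()    # naive init: first k points (k-means++ is better)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Two obvious blobs around (0, 0) and (5, 5)
X = np.vstack([np.random.default_rng(1).normal(0, 0.3, (50, 2)),
               np.random.default_rng(2).normal(5, 0.3, (50, 2))])
labels, centroids = kmeans(X, k=2)
```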

3.2 Dimensionality Reduction

Compresses high-dimensional data into fewer dimensions for visualization and preprocessing.

| Method | Overview | Characteristics |
|---|---|---|
| PCA (Principal Component Analysis) | Projects data onto directions of maximum variance | Linear, fast |
| t-SNE | Preserves local structure in 2D/3D embeddings | Good for visualization; nonlinear |
| UMAP | Faster alternative to t-SNE; preserves global structure | Scales to large datasets |
| Autoencoder | Nonlinear compression via neural networks | Flexible; extensible to generative models |
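
PCA reduces to a singular value decomposition of the centered data; a minimal numpy sketch on synthetic data that varies mostly along one direction:

```python
import numpy as np

def pca(X, n_components):
    """Project X onto the top principal components (directions of max variance)."""
    Xc = X - X.mean(axis=0)                      # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # top right singular vectors
    return Xc @ components.T, components

# 3-D data that actually varies along a single direction, plus tiny noise
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2 * t, 0.5 * t]) + rng.normal(0, 0.01, (200, 3))

Z, components = pca(X, n_components=1)
# Z captures almost all of X's variance in a single dimension
```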

3.3 Anomaly Detection

Identifies data points that deviate from the normal data distribution. Common methods include Isolation Forest, One-Class SVM, and autoencoders.
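
A full Isolation Forest is beyond a few lines, but the underlying idea — flag points that sit far from the bulk of the data — can be illustrated with a simple z-score detector (a toy baseline, not one of the methods named above):

```python
import numpy as np

def zscore_anomalies(x, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 1000)
x[42] = 15.0                       # inject one obvious outlier
flags = zscore_anomalies(x)        # index 42 should be flagged
```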

4. Reinforcement Learning

An agent interacts with an environment and learns a policy that maximizes cumulative reward. This trial-and-error approach has achieved remarkable success in games (AlphaGo) and robotic control.

| Method | Overview | Applications |
|---|---|---|
| Q-Learning | Learns the state-action value function $Q(s, a)$ | Discrete action spaces |
| Deep Q-Network (DQN) | Approximates the Q-function with a neural network | Atari games |
| Policy Gradient | Directly optimizes the policy $\pi(a\mid s)$ | Continuous action spaces |
| Actor-Critic | Combines a policy (Actor) with a value function (Critic) | Robotic control |
| PPO / TRPO | Stable policy updates | RLHF for LLMs, robotics |
| RLHF | Reinforcement Learning from Human Feedback | Fine-tuning LLMs such as ChatGPT |
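
The Q-learning row follows the textbook update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. A minimal sketch on an invented 5-state chain environment (reward only at the right end; hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.2

for episode in range(500):
    s = 0
    while s != n_states - 1:          # episode ends at the right end
        # Epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

greedy = Q.argmax(axis=1)   # should prefer moving right in states 0-3
```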

5. Deep Learning

Deep learning is an umbrella term for methods that use multi-layer neural networks. It applies to supervised, unsupervised, and reinforcement learning alike, and has dramatically outperformed traditional approaches in vision, speech, and natural language processing.

5.1 Convolutional Neural Networks (CNNs)

Hierarchically extract local spatial patterns (edges, textures) from images.

  • Key architectures: LeNet → AlexNet → VGG → ResNet → EfficientNet
  • Applications: Image classification, object detection (YOLO, Faster R-CNN), semantic segmentation
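
The convolution at the heart of a CNN is a sliding dot product; a naive numpy sketch (single channel, "valid" mode, no stride — frameworks implement this far more efficiently):

```python
import numpy as np

def conv2d(image, kernel):
    """Naive valid convolution (cross-correlation, as used in deep learning)."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Dot product of the kernel with the local image patch
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

# A vertical-edge detector applied to an image with a sharp vertical edge
image = np.zeros((5, 5))
image[:, 3:] = 1.0                      # right half bright
sobel_x = np.array([[-1., 0., 1.],
                    [-2., 0., 2.],
                    [-1., 0., 1.]])
response = conv2d(image, sobel_x)       # strong response along the edge
```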

5.2 Recurrent Neural Networks (RNNs)

Process sequential data (time series, text) by propagating hidden states through time.

  • LSTM: Uses gating mechanisms to capture long-range dependencies; mitigates the vanishing gradient problem
  • GRU: A simplified variant of LSTM with fewer parameters
  • Applications: Speech recognition, machine translation (pre-Transformer era), time-series forecasting
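
All RNN variants share the same recurrence: the hidden state at step $t$ is a function of the input $x_t$ and the previous state $h_{t-1}$. A minimal plain-RNN sketch (no LSTM/GRU gating; weights are random for illustration):

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Run a plain RNN over a sequence, returning all hidden states."""
    h = np.zeros(W_h.shape[0])
    hs = []
    for x in xs:                              # one step per time step
        h = np.tanh(W_x @ x + W_h @ h + b)    # new state depends on old state
        hs.append(h)
    return np.array(hs)

rng = np.random.default_rng(0)
d_in, d_hidden, T = 3, 5, 10
xs = rng.normal(size=(T, d_in))
W_x = rng.normal(size=(d_hidden, d_in)) * 0.5
W_h = rng.normal(size=(d_hidden, d_hidden)) * 0.5
b = np.zeros(d_hidden)
hs = rnn_forward(xs, W_x, W_h, b)   # shape (T, d_hidden)
```

The tanh keeps every hidden unit in $(-1, 1)$; LSTM and GRU add gates on top of exactly this loop to control how much of the old state survives each step.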

5.3 Transformer

The Self-Attention mechanism directly computes relationships between any pair of positions in a sequence. Proposed in the 2017 paper "Attention Is All You Need," it has become the dominant deep learning architecture.

Self-Attention Computation

The input sequence is transformed into Query ($Q$), Key ($K$), and Value ($V$), and attention weights are computed as:

$$\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

$d_k$ is the key dimension; scaling by $\sqrt{d_k}$ stabilizes gradients.
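
The formula translates directly into code; a minimal single-head numpy sketch (no masking and no multi-head projections):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_q, seq_k) similarity matrix
    weights = softmax(scores, axis=-1)   # each query's weights sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, d_k = 8
K = rng.normal(size=(6, 8))   # 6 key positions
V = rng.normal(size=(6, 8))
out, weights = attention(Q, K, V)
# out holds one value-weighted vector per query position
```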

  • BERT: Bidirectional encoder; excels at language understanding tasks (QA, classification)
  • GPT series: Autoregressive decoder; the foundation for text generation
  • Vision Transformer (ViT): Splits images into patches and processes them with a Transformer

6. Generative Models

Generative models learn the underlying data distribution and can produce new data. They are the core technology behind the "generative AI" boom that began in 2022.

| Method | Principle | Representative Models |
|---|---|---|
| GAN (Generative Adversarial Network) | Competitive training between a generator and a discriminator | StyleGAN (face generation), Pix2Pix |
| VAE (Variational Autoencoder) | Learns a probability distribution over latent variables; maximizes the ELBO | Image generation, anomaly detection |
| Diffusion Models | Generates data by gradually removing noise | Stable Diffusion, DALL-E 3, Sora |
| Autoregressive Models | Generates tokens one at a time sequentially | GPT-4, Claude, Gemini |
| Flow-Based Models | Computes exact probability densities via invertible transformations | Glow, RealNVP |

Learn More: VAE / GAN / Diffusion Models

7. Self-Supervised Learning

Self-supervised learning designs pretext tasks from unlabeled data, using the data itself as the supervisory signal to learn general-purpose representations. It plays a central role in pretraining large language models and Vision Transformers.

Representative Methods

| Domain | Method | Pretext Task |
|---|---|---|
| NLP | Masked Language Model (BERT) | Predict masked words |
| NLP | Next-Token Prediction (GPT) | Predict the next word |
| Vision | SimCLR / MoCo | Match different augmentations of the same image (contrastive learning) |
| Vision | MAE (Masked Autoencoder) | Reconstruct masked image patches |
| Multimodal | CLIP | Learn image-text correspondences |

8. Latest Trends (2025-2026)

The field of machine learning is evolving rapidly. Below are the most notable recent topics.

8.1 Evolution of Large Language Models (LLMs)

Large language models such as GPT-4, Claude, and Gemini are evolving into multimodal models that handle not only text but also images, audio, and video in a unified framework.

  • Long-context processing: Context windows exceeding 1 million tokens
  • Tool use: Integration with external APIs and code execution (agents)
  • Enhanced reasoning: Chain-of-Thought and test-time compute scaling for improved reasoning capabilities

8.2 High-Quality Image and Video Generation

Diffusion models have surpassed GANs in image generation and are expanding into video generation.

  • Stable Diffusion 3 / FLUX: Transformer-based diffusion models (DiT)
  • Video generation: Long-form video generation models such as Sora
  • 3D generation: Generating 3D objects from text or images

8.3 State Space Models (SSMs)

Transformer Self-Attention has computational cost quadratic in sequence length. Mamba and other SSMs process long sequences with linear complexity, attracting attention as a potential alternative to Transformers.

8.4 Small Models and Distillation

Knowledge distillation transfers the knowledge of large models to smaller ones, enabling practical deployment on smartphones and edge devices.

  • Quantization: Compressing model weights to 4-bit / 8-bit
  • LoRA / QLoRA: Efficient fine-tuning with a small number of parameters
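
The quantization bullet can be illustrated with symmetric 8-bit quantization: approximate float weights as an int8 tensor times a single scale factor (a simplified scheme; deployed systems typically use per-channel scales and calibration):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric 8-bit quantization: w ≈ scale * q with q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()   # round-to-nearest error is at most scale / 2
```

The storage cost drops from 4 bytes to 1 byte per weight, at the price of a bounded rounding error per element.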

8.5 AI Agents

Autonomous systems built around LLMs are evolving, combining planning, tool use, and memory. They are being applied to coding assistance, data analysis, research support, and other complex task automation.

9. Method Comparison Table

| Learning Paradigm | Data Requirements | Representative Methods | Main Applications |
|---|---|---|---|
| Supervised Learning | Labeled data | Linear Regression, SVM, GBDT, CNN | Classification, regression, object detection |
| Unsupervised Learning | Unlabeled data | k-means, PCA, DBSCAN | Clustering, dimensionality reduction, anomaly detection |
| Reinforcement Learning | Interaction with an environment | DQN, PPO, Actor-Critic | Games, robotic control, LLM alignment |
| Self-Supervised Learning | Large amounts of unlabeled data | BERT, GPT, MAE, CLIP | Pretraining, foundation models |
| Generative Models | Domain-specific data | GAN, VAE, Diffusion Models, LLM | Image, text, and video generation |

10. How to Choose a Method

The following flowchart provides guidance on selecting a method based on the nature of your problem.

[Figure: method selection flowchart. Have labeled data? Yes → continuous output → regression; discrete output → classification. No → goal is discovery → unsupervised learning; generation → generative models; decision-making → reinforcement learning. Then scale by data size and compute: small to medium → classical methods (SVM, GBDT, Random Forest); large → deep learning (CNN, RNN, Transformer); very large → foundation models / LLMs (GPT, Claude, Gemini, ViT). Note: for tabular data, GBDT (XGBoost, LightGBM) often outperforms deep learning.]
Figure 2: Method Selection Flowchart Based on Problem Characteristics

Practical Tips

  • Tabular data (CSV / databases): GBDT (XGBoost, LightGBM) should be the first choice
  • Images: CNN or ViT; transfer learning achieves high accuracy even with limited data
  • Text: Fine-tuning a pretrained LLM is highly effective
  • Time series: Transformer-based models, or a hybrid of classical ARIMA and machine learning
  • Start simple: Begin with a simple baseline (logistic regression, random forest) and progressively move to more complex methods

11. Further Reading

This note series explains each method in progressive detail.

| Level | Content | Link |
|---|---|---|
| Introduction | Fundamental concepts of ML, three main categories, first implementation | Introduction Course |
| Beginner | Linear regression, SVM, decision trees, ensembles, evaluation metrics | Beginner Course |
| Intermediate | Neural networks, CNN, RNN, LSTM, frameworks | Intermediate Course |
| Advanced | Attention, Transformer, GAN, diffusion models, deep RL | Advanced Course |

Summary

  • Three main categories: Supervised learning, unsupervised learning, and reinforcement learning
  • Deep learning spans all three categories; CNNs, RNNs, and Transformers are the key architectures
  • Self-supervised learning underpins pretraining of large-scale models, forming the basis for LLMs and ViT
  • Generative models (GANs, VAEs, diffusion models, autoregressive models) enable image, text, and video generation
  • Latest trends: Multimodal LLMs, SSMs (Mamba), AI agents, on-device AI
  • Method selection depends on problem type × data size × compute resources

Frequently Asked Questions

Q. What are the different types of machine learning?

Machine learning is broadly classified into three categories: supervised learning, which learns from labeled data; unsupervised learning, which discovers structure in unlabeled data; and reinforcement learning, which maximizes rewards through trial and error. Deep learning spans all three categories and includes CNNs, RNNs, Transformers, large language models (LLMs), and diffusion models.

Q. What is the difference between supervised and unsupervised learning?

Supervised learning trains on input-label pairs and predicts labels for new inputs. Unsupervised learning discovers patterns and structure from unlabeled data. Examples of supervised learning include image classification and price prediction; examples of unsupervised learning include clustering and dimensionality reduction.

Q. What are the latest machine learning trends in 2025-2026?

Key trends include multimodal large language models (LLMs), high-quality image and video generation with diffusion models, efficient long-sequence processing with State Space Models (e.g., Mamba), and test-time compute scaling. On-device AI and knowledge distillation for smaller models are also gaining practical adoption.
