Chapter 2: Types of Machine Learning
1. The Big Picture of Machine Learning
Machine Learning (ML) is a collective term for algorithms that automatically learn patterns from data. Based on how they learn, ML methods fall into three main categories, with deep learning cutting across all of them.
Why Understanding the "Types" Matters
Selecting a learning approach that matches the nature of your problem is the first step toward a successful machine learning project. Does your data come with labels? Do you want to uncover hidden structure? Do you need sequential decision-making? This assessment is the starting point for choosing a method.
2. Supervised Learning
Supervised learning trains on pairs of input $x$ and label $y$ to learn a mapping $f: x \mapsto y$. It is the most widely used learning paradigm in practice.
2.1 Regression
Prediction problems where the output $y$ is a continuous value.
| Method | Overview | Applications |
|---|---|---|
| Linear Regression | Models a linear relationship: $y = w^\top x + b$ | House price prediction, sales forecasting |
| Ridge / Lasso | Regularized linear regression ($L_2$ / $L_1$) | High-dimensional multivariate data |
| Decision Tree Regression | Predicts values via conditional splits | Scenarios requiring interpretability |
| Random Forest | Ensemble of multiple decision trees | Tabular data in general |
| Gradient Boosting (XGBoost, LightGBM) | Sequentially adds weak learners | Top performer in Kaggle competitions |
| Neural Network Regression | Learns nonlinear relationships with multilayer perceptrons | Large-scale, complex data |
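As a minimal sketch of the first entry in the table, linear regression $y = w^\top x + b$ can be fit by ordinary least squares on synthetic data (the features and coefficients below are hypothetical, chosen only to make the fit easy to verify):

```python
import numpy as np

# Synthetic data: y = 2*x1 - 1*x2 + 0.5 plus a little noise (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w, true_b = np.array([2.0, -1.0]), 0.5
y = X @ true_w + true_b + rng.normal(scale=0.01, size=100)

# Append a bias column and solve the least-squares problem directly.
Xb = np.hstack([X, np.ones((100, 1))])
coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
w_hat, b_hat = coef[:2], coef[2]
```

With low noise, the recovered `w_hat` and `b_hat` land very close to the generating coefficients; Ridge and Lasso modify the same objective by adding an $L_2$ or $L_1$ penalty on $w$.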
2.2 Classification
Prediction problems where the output $y$ is a discrete class.
| Method | Overview | Applications |
|---|---|---|
| Logistic Regression | Outputs probability via the sigmoid function: $P(y=1|x) = \sigma(w^\top x + b)$ | Spam detection, credit scoring |
| Support Vector Machine (SVM) | Maximum-margin classifier with kernel trick | Text classification, small-to-medium datasets |
| k-Nearest Neighbors (kNN) | Majority vote among the $k$ closest neighbors | Recommendation, anomaly detection |
| Naive Bayes | Bayes' theorem with feature independence assumption | Text classification, fast inference |
| Random Forest / GBDT (Gradient Boosted Decision Trees) | Ensemble methods (classification variant) | De facto standard for tabular data |
| CNN (Image Classification) | Extracts spatial features via convolution | Image recognition, medical imaging |
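The logistic-regression row can likewise be sketched in a few lines: gradient descent on the log-loss for a hypothetical 1-D dataset where the class boundary sits at $x = 0$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D data (hypothetical): class 1 whenever x > 0.
X = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):                 # batch gradient descent on log-loss
    p = sigmoid(w * X + b)            # P(y=1|x) = sigma(w*x + b)
    w -= lr * np.mean((p - y) * X)
    b -= lr * np.mean(p - y)

pred = (sigmoid(w * X + b) >= 0.5).astype(int)
```

After training, thresholding the predicted probability at 0.5 reproduces the labels; real implementations add regularization and handle multiclass outputs via softmax.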
Learn More
Supervised Learning in Detail / Linear Regression / Logistic Regression / SVM
3. Unsupervised Learning
Discovers hidden structure and patterns from unlabeled data.
3.1 Clustering
Groups data points based on similarity.
| Method | Overview | Characteristics |
|---|---|---|
| k-means | Assigns each point to the nearest of $k$ centroids | Fast; works well for spherical clusters |
| Hierarchical Clustering | Builds a dendrogram showing hierarchical structure | No need to specify $k$ in advance |
| DBSCAN | Density-based; handles arbitrary cluster shapes | Can detect noise points |
| Gaussian Mixture Model (GMM) | Mixture of multiple Gaussian distributions | Probabilistic cluster membership |
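The k-means row can be illustrated with Lloyd's algorithm written out by hand, on two hypothetical well-separated blobs:

```python
import numpy as np

rng = np.random.default_rng(42)
# Two well-separated 2-D blobs (hypothetical data).
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

k = 2
centroids = X[rng.choice(len(X), k, replace=False)]  # init from data points
for _ in range(10):
    # Assignment step: each point joins its nearest centroid.
    d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    labels = d.argmin(axis=1)
    # Update step: each centroid moves to the mean of its cluster.
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
```

On data this well separated, the two-step loop converges within a few iterations; the "spherical clusters" caveat in the table comes from the Euclidean distance used in the assignment step.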
3.2 Dimensionality Reduction
Compresses high-dimensional data into fewer dimensions for visualization and preprocessing.
| Method | Overview | Characteristics |
|---|---|---|
| PCA (Principal Component Analysis) | Projects data onto directions of maximum variance | Linear, fast |
| t-SNE | Preserves local structure in 2D/3D embeddings | Good for visualization; nonlinear |
| UMAP | Faster alternative to t-SNE; better preserves global structure | Scales to large datasets |
| Autoencoder | Nonlinear compression via neural networks | Flexible; extensible to generative models |
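PCA from the table above reduces to one SVD on centered data. The sketch below uses hypothetical 3-D data that varies mostly along a single direction, so the first component captures nearly all the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
# 3-D points that vary along one direction, plus small noise (hypothetical).
t = rng.normal(size=200)
X = np.outer(t, [3.0, 2.0, 1.0]) + rng.normal(scale=0.1, size=(200, 3))

Xc = X - X.mean(axis=0)             # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / (S**2).sum()     # variance ratio per component
X2 = Xc @ Vt[:2].T                  # project onto the top-2 components
```

The rows of `Vt` are the principal directions (maximum-variance axes); keeping the top two gives the 2-D projection used for visualization or preprocessing.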
3.3 Anomaly Detection
Identifies data points that deviate from the normal data distribution. Common methods include Isolation Forest, One-Class SVM, and autoencoders.
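The scoring idea behind anomaly detection can be shown without any of the named methods: the sketch below (not Isolation Forest, just a simple robust z-score on hypothetical 1-D data) flags points far from the bulk of the distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(0, 1, size=98)
X = np.concatenate([X, [8.0, -9.0]])        # two injected anomalies

# Robust z-score: distance from the median in units of MAD.
med = np.median(X)
mad = np.median(np.abs(X - med)) * 1.4826   # consistency factor for Gaussians
score = np.abs(X - med) / mad
anomalies = np.where(score > 4)[0]          # indices of flagged points
```

Isolation Forest and One-Class SVM replace this hand-set threshold with a learned notion of "normal", and autoencoders use reconstruction error as the score.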
4. Reinforcement Learning
An agent interacts with an environment and learns a policy that maximizes cumulative reward. This trial-and-error approach has achieved remarkable success in games (AlphaGo) and robotic control.
| Method | Overview | Applications |
|---|---|---|
| Q-Learning | Learns the state-action value function $Q(s, a)$ | Discrete action spaces |
| Deep Q-Network (DQN) | Approximates the Q-function with a neural network | Atari games |
| Policy Gradient | Directly optimizes the policy $\pi(a|s)$ | Continuous action spaces |
| Actor-Critic | Combines a policy (Actor) with a value function (Critic) | Robotic control |
| PPO / TRPO | Stable policy updates | RLHF for LLMs, robotics |
| RLHF | Reinforcement Learning from Human Feedback | Fine-tuning LLMs such as ChatGPT |
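Tabular Q-learning from the first row of the table fits in a few lines. The sketch below uses a hypothetical 5-state chain where the agent starts at state 0 and earns reward 1 only on reaching state 4:

```python
import numpy as np

# Hypothetical 5-state chain MDP; actions: 0 = left, 1 = right.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.2     # step size, discount, exploration rate
rng = np.random.default_rng(0)

for _ in range(500):                  # episodes
    s = 0
    while s != 4:                     # state 4 is terminal
        # epsilon-greedy action selection
        a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
        s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == 4 else 0.0
        # TD update toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

policy = Q.argmax(axis=1)             # greedy policy from learned Q
```

After training, the greedy policy moves right from every non-terminal state; DQN replaces the table `Q` with a neural network so the same update works in large state spaces.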
5. Deep Learning
Deep learning is an umbrella term for methods that use multi-layer neural networks. It applies to supervised, unsupervised, and reinforcement learning alike, and has dramatically outperformed traditional approaches in vision, speech, and natural language processing.
5.1 Convolutional Neural Networks (CNNs)
Hierarchically extract local spatial patterns (edges, textures) from images.
- Key architectures: LeNet → AlexNet → VGG → ResNet → EfficientNet
- Applications: Image classification, object detection (YOLO, Faster R-CNN), semantic segmentation
5.2 Recurrent Neural Networks (RNNs)
Process sequential data (time series, text) by propagating hidden states through time.
- LSTM: Uses gating mechanisms to capture long-range dependencies; mitigates the vanishing gradient problem
- GRU: A simplified variant of LSTM with fewer parameters
- Applications: Speech recognition, machine translation (pre-Transformer era), time-series forecasting
5.3 Transformer
The Self-Attention mechanism directly computes relationships between any pair of positions in a sequence. Proposed in the 2017 paper "Attention Is All You Need," it has become the dominant deep learning architecture.
Self-Attention Computation
The input sequence is transformed into Query ($Q$), Key ($K$), and Value ($V$), and attention weights are computed as:
$$\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

Here $d_k$ is the key dimension; scaling by $\sqrt{d_k}$ stabilizes gradients.
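The formula above translates directly into code. This is a single-head sketch with hypothetical random inputs (real Transformers add learned projections, multiple heads, and masking):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True) # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)           # softmax over key positions
    return w @ V, w

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 positions, d_k = 8 (hypothetical sizes)
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = attention(Q, K, V)
```

Each output row is a weighted average of the value vectors, with weights given by how strongly that query matches every key; the rows of `w` sum to 1.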
- BERT: Bidirectional encoder; excels at language understanding tasks (QA, classification)
- GPT series: Autoregressive decoder; the foundation for text generation
- Vision Transformer (ViT): Splits images into patches and processes them with a Transformer
6. Generative Models
Generative models learn the underlying data distribution and can produce new data. They are the core technology behind the "generative AI" boom that began in 2022.
| Method | Principle | Representative Models |
|---|---|---|
| GAN (Generative Adversarial Network) | Competitive training between a generator and a discriminator | StyleGAN (face generation), Pix2Pix |
| VAE (Variational Autoencoder) | Learns a probability distribution over latent variables; maximizes the ELBO | Image generation, anomaly detection |
| Diffusion Models | Generates data by gradually removing noise | Stable Diffusion, DALL-E 3, Sora |
| Autoregressive Models | Generates tokens one at a time sequentially | GPT-4, Claude, Gemini |
| Flow-Based Models | Computes exact probability densities via invertible transformations | Glow, RealNVP |
Learn More
VAE / GAN / Diffusion Models
7. Self-Supervised Learning
Self-supervised learning designs pretext tasks from unlabeled data, using the data itself as the supervisory signal to learn general-purpose representations. It plays a central role in pretraining large language models and Vision Transformers.
Representative Methods
| Domain | Method | Pretext Task |
|---|---|---|
| NLP | Masked Language Model (BERT) | Predict masked words |
| NLP | Next-Token Prediction (GPT) | Predict the next word |
| Vision | SimCLR / MoCo | Match different augmentations of the same image (contrastive learning) |
| Vision | MAE (Masked Autoencoder) | Reconstruct masked image patches |
| Multimodal | CLIP | Learn image-text correspondences |
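The contrastive pretext task behind SimCLR, MoCo, and CLIP can be sketched as a simplified InfoNCE loss on hypothetical embeddings, where each row of one batch should match the same row of the other (its augmented or paired view):

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """Simplified InfoNCE: row i of z1 should match row i of z2."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)  # unit-normalize
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                       # cosine similarity logits
    sim -= sim.max(axis=1, keepdims=True)       # numerical stability
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(logp))              # positives on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
# Matched pairs (slightly perturbed views) vs. unrelated embeddings.
aligned = info_nce(z, z + 0.01 * rng.normal(size=(8, 16)))
random_ = info_nce(z, rng.normal(size=(8, 16)))
```

The loss is low when matched views are close and mismatched pairs are far apart, which is exactly what training the encoder minimizes; full implementations also use in-batch symmetrization and larger batches.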
8. Latest Trends (2025-2026)
The field of machine learning is evolving rapidly. Below are the most notable recent topics.
8.1 Evolution of Large Language Models (LLMs)
Large language models such as GPT-4, Claude, and Gemini are evolving into multimodal models that handle not only text but also images, audio, and video in a unified framework.
- Long-context processing: Context windows exceeding 1 million tokens
- Tool use: Integration with external APIs and code execution (agents)
- Enhanced reasoning: Chain-of-Thought and test-time compute scaling for improved reasoning capabilities
8.2 High-Quality Image and Video Generation
Diffusion models have surpassed GANs in image generation and are expanding into video generation.
- Stable Diffusion 3 / FLUX: Transformer-based diffusion models (DiT)
- Video generation: Long-form video generation models such as Sora
- 3D generation: Generating 3D objects from text or images
8.3 State Space Models (SSMs)
Transformer Self-Attention has computational cost quadratic in sequence length. Mamba and other SSMs process long sequences with linear complexity, attracting attention as a potential alternative to Transformers.
8.4 Small Models and Distillation
Knowledge distillation transfers the knowledge of large models to smaller ones, enabling practical deployment on smartphones and edge devices.
- Quantization: Compressing model weights to 4-bit / 8-bit
- LoRA / QLoRA: Efficient fine-tuning with a small number of parameters
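The LoRA idea in the last bullet is simply a low-rank additive update: the frozen weight $W$ is augmented as $W + BA$, where only the small factors $A$ and $B$ are trained. A sketch with hypothetical layer sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                            # hypothetical layer width, LoRA rank
W = rng.normal(size=(d, d))              # frozen pretrained weight

A = rng.normal(scale=0.01, size=(r, d))  # trainable down-projection
B = np.zeros((d, r))                     # zero init: adapter starts as a no-op

x = rng.normal(size=d)
y = W @ x + B @ (A @ x)                  # adapted forward pass: (W + BA) x

full_params = d * d                      # parameters in W
lora_params = 2 * d * r                  # parameters in A and B
```

With `B` initialized to zero, the adapted model starts exactly equal to the pretrained one, and the trainable parameter count (`2*d*r`) is a small fraction of the full matrix (`d*d`), which is why fine-tuning becomes cheap enough for edge deployment.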
8.5 AI Agents
Autonomous systems built around LLMs are evolving, combining planning, tool use, and memory. They are being applied to coding assistance, data analysis, research support, and other complex task automation.
9. Method Comparison Table
| Learning Paradigm | Data Requirements | Representative Methods | Main Applications |
|---|---|---|---|
| Supervised Learning | Labeled data | Linear Regression, SVM, GBDT, CNN | Classification, regression, object detection |
| Unsupervised Learning | Unlabeled data | k-means, PCA, DBSCAN | Clustering, dimensionality reduction, anomaly detection |
| Reinforcement Learning | Interaction with an environment | DQN, PPO, Actor-Critic | Games, robotic control, LLM alignment |
| Self-Supervised Learning | Large amounts of unlabeled data | BERT, GPT, MAE, CLIP | Pretraining, foundation models |
| Generative Models | Domain-specific data | GAN, VAE, Diffusion Models, LLM | Image, text, and video generation |
10. How to Choose a Method
Select a method based on the nature of your problem; the practical tips below provide a starting point.
Practical Tips
- Tabular data (CSV / databases): GBDT (XGBoost, LightGBM) should be the first choice
- Images: CNN or ViT; transfer learning achieves high accuracy even with limited data
- Text: Fine-tuning a pretrained LLM is highly effective
- Time series: Transformer-based models, or a hybrid of classical ARIMA and machine learning
- Start simple: Begin with a simple baseline (logistic regression, random forest) and progressively move to more complex methods
11. Further Reading
This note series explains each method in progressive detail.
| Level | Content | Link |
|---|---|---|
| Introduction | Fundamental concepts of ML, three main categories, first implementation | Introduction Course |
| Beginner | Linear regression, SVM, decision trees, ensembles, evaluation metrics | Beginner Course |
| Intermediate | Neural networks, CNN, RNN, LSTM, frameworks | Intermediate Course |
| Advanced | Attention, Transformer, GAN, diffusion models, deep RL | Advanced Course |
Summary
- Three main categories: Supervised learning, unsupervised learning, and reinforcement learning
- Deep learning spans all three categories; CNNs, RNNs, and Transformers are the key architectures
- Self-supervised learning underpins pretraining of large-scale models, forming the basis for LLMs and ViT
- Generative models (GANs, VAEs, diffusion models, autoregressive models) enable image, text, and video generation
- Latest trends: Multimodal LLMs, SSMs (Mamba), AI agents, on-device AI
- Method selection depends on problem type × data size × compute resources
Frequently Asked Questions
Q. What are the different types of machine learning?
Machine learning is broadly classified into three categories: supervised learning, which learns from labeled data; unsupervised learning, which discovers structure in unlabeled data; and reinforcement learning, which maximizes rewards through trial and error. Deep learning spans all three categories and includes CNNs, RNNs, Transformers, large language models (LLMs), and diffusion models.
Q. What is the difference between supervised and unsupervised learning?
Supervised learning trains on input-label pairs and predicts labels for new inputs. Unsupervised learning discovers patterns and structure from unlabeled data. Examples of supervised learning include image classification and price prediction; examples of unsupervised learning include clustering and dimensionality reduction.
Q. What are the latest machine learning trends in 2025-2026?
Key trends include multimodal large language models (LLMs), high-quality image and video generation with diffusion models, efficient long-sequence processing with State Space Models (e.g., Mamba), and test-time compute scaling. On-device AI and knowledge distillation for smaller models are also gaining practical adoption.