Machine Learning Basics
Classical Machine Learning
Basic (Undergraduate Year 1–2)
About This Chapter
The basic level covers classical machine learning algorithms. Starting with linear regression, we move on to classification, decision trees, and ensemble learning. The goal is to understand the mathematical foundation of each method and to learn appropriate evaluation techniques, so that you can explain why a particular method is the right choice.
Prerequisites
- The introductory-level content (types of learning, fundamental concepts)
- Foundations of linear algebra (matrices, vectors)
- Foundations of calculus (partial derivatives, gradients)
- Foundations of probability and statistics (expectation, variance)
Table of Contents
2. Polynomial Regression and Regularization
Preventing overfitting.
- Polynomial features
- Ridge regression (L2 regularization)
- Lasso regression (L1 regularization)
3. Logistic Regression
Extending to classification.
- Sigmoid function
- Maximum likelihood estimation
- Decision boundary
4. Multiclass Classification
Classifying more than two classes.
- One-vs-Rest
- One-vs-One
- Softmax regression
5. Decision Trees
Interpretable models.
- Splitting criteria (entropy, Gini)
- Tree growth and pruning
- Feature importance
8. k-Nearest Neighbors
One of the simplest methods.
- Defining distance
- Choosing k
- The curse of dimensionality
9. Evaluation Metrics
How to measure performance.
- Accuracy, precision, recall, F1
- Confusion matrix
- ROC curve and AUC
10. Cross-Validation and Model Selection
Evaluating generalization performance.
- Holdout method
- k-fold cross-validation
- Hyperparameter tuning
11. Feature Engineering
Data preprocessing and feature design.
- Categorical variable encoding
- Missing-value handling and scaling
- Feature selection and feature generation
Supplementary Reading
Readings that explore the methods from the chapters in more depth, geometrically and intuitively, with diagrams.
The Geometry of Least Squares
Minimizing the residual sum of squares is equivalent to orthogonally projecting $\boldsymbol{y}$ onto the column space of the design matrix, visualized as a regression plane in 3D.
Normal Equation vs. Gradient Descent
Solving the $d \times d$ system exactly in $O(d^3)$ vs. iterating toward the solution, and which approach wins as the number of features $d$ grows.
The Geometry of L1/L2 Regularization
Why the corners of the constraint region make Lasso sparse.
Bagging and Boosting
Reducing variance in parallel vs. reducing bias sequentially.
k-NN Decision Boundaries and k
Small k gives complex decision boundaries; large k gives smooth ones.
Feature Importance
Permutation importance and its pitfalls.
The Geometry of the Kernel Trick
Implicitly mapping data into a higher-dimensional space where it becomes linearly separable.
Handling Imbalanced Data
The accuracy trap, resampling, and SMOTE.
Diagnosing with Learning Curves
Diagnosing a model from the gap between training and validation error.
Probability Calibration
Reliability diagrams and Platt / isotonic / temperature scaling.
Key Concepts and Methods
The Objective of Linear Regression
Find the parameters $\boldsymbol{w}$ that minimize the sum of squared errors: $$\min_{\boldsymbol{w}} \displaystyle\sum_{i=1}^{n} (y_i - \boldsymbol{w}^\top \boldsymbol{x}_i)^2$$
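To make the objective concrete, here is a minimal sketch that minimizes it in two ways: exactly, by solving the normal equation $\boldsymbol{X}^\top \boldsymbol{X}\boldsymbol{w} = \boldsymbol{X}^\top \boldsymbol{y}$ with Gaussian elimination, and iteratively, by gradient descent on the mean squared error (the objective divided by $n$, which has the same minimizer). It assumes the intercept is handled by prepending a constant 1 to each $\boldsymbol{x}_i$, and it uses only the Python standard library (no NumPy) to stay self-contained; the function names are hypothetical.

```python
def fit_normal_equation(X, y):
    """Solve (X^T X) w = X^T y by Gaussian elimination with partial pivoting."""
    d = len(X[0])
    # Gram matrix A = X^T X and right-hand side b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    M = [A[i] + [b[i]] for i in range(d)]  # augmented matrix [A | b]
    for col in range(d):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, d):
            factor = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * d
    for i in reversed(range(d)):
        s = M[i][d] - sum(M[i][j] * w[j] for j in range(i + 1, d))
        w[i] = s / M[i][i]
    return w


def fit_gradient_descent(X, y, lr=0.05, steps=2000):
    """Minimize (1/n) * sum (w^T x_i - y_i)^2 with plain gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - yi for row, yi in zip(X, y)]
        # Gradient of the mean squared error: (2/n) X^T (Xw - y).
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, X)) for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy data lying exactly on y = 1 + 2t; the leading 1.0 is the intercept column.
X = [[1.0, float(t)] for t in range(5)]
y = [1.0 + 2.0 * t for t in range(5)]
print(fit_normal_equation(X, y))   # close to [1.0, 2.0]
print(fit_gradient_descent(X, y))  # also close to [1.0, 2.0]
```

The closed-form route does a fixed amount of work that grows with $d$, while gradient descent only needs matrix-vector products per step but requires a suitable learning rate `lr` and enough `steps`.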
Regularization
Ridge regression: $\min_{\boldsymbol{w}} \displaystyle\sum_i (y_i - \boldsymbol{w}^\top \boldsymbol{x}_i)^2 + \lambda \|\boldsymbol{w}\|_2^2$
Lasso regression: $\min_{\boldsymbol{w}} \displaystyle\sum_i (y_i - \boldsymbol{w}^\top \boldsymbol{x}_i)^2 + \lambda \|\boldsymbol{w}\|_1$
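Regularization suppresses overfitting and improves generalization. The hyperparameter $\lambda \ge 0$ sets the penalty strength: the L2 penalty shrinks all coefficients toward zero, while the L1 penalty can drive some coefficients exactly to zero, yielding a sparse model.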
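The different effects of the two penalties can be seen with coordinate descent, which updates one weight at a time. For weight $w_j$, with partial residual $r$ (the residual computed with feature $j$ left out), $a = \sum_i x_{ij}^2$ and $\rho = \sum_i x_{ij} r_i$, the exact one-dimensional minimizers are $w_j = \rho / (a + \lambda)$ for Ridge and $w_j = S(\rho, \lambda/2) / a$ for Lasso, where $S$ is soft-thresholding. The sketch below is an illustration under these assumptions (no intercept, a fixed number of sweeps, hypothetical function names), not a production solver. The toy design has orthogonal columns so the results can be checked by hand.

```python
import math


def soft_threshold(rho, t):
    """S(rho, t) = sign(rho) * max(|rho| - t, 0)."""
    return math.copysign(max(abs(rho) - t, 0.0), rho)


def fit_penalized(X, y, lam, penalty, sweeps=100):
    """Coordinate descent for sum of squared errors + lam * ||w||_2^2 or lam * ||w||_1."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Partial residual: remove every feature's contribution except feature j.
            r = [yi - sum(w[k] * row[k] for k in range(d) if k != j) for row, yi in zip(X, y)]
            rho = sum(row[j] * ri for row, ri in zip(X, r))
            a = sum(row[j] ** 2 for row in X)
            if penalty == "l2":
                w[j] = rho / (a + lam)                   # Ridge: shrinks but never zeroes
            elif penalty == "l1":
                w[j] = soft_threshold(rho, lam / 2) / a  # Lasso: can be exactly zero
            else:
                raise ValueError("penalty must be 'l1' or 'l2'")
    return w


# y = 3 * x1 + 0.2 * x2, with orthogonal columns x1 and x2.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3.2, 2.8, -2.8, -3.2]
print(fit_penalized(X, y, 0.0, "l2"))  # approx [3.0, 0.2]  (ordinary least squares)
print(fit_penalized(X, y, 4.0, "l2"))  # approx [1.5, 0.1]  (both shrunk)
print(fit_penalized(X, y, 4.0, "l1"))  # approx [2.5, 0.0]  (small weight set to zero)
```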
Logistic Regression
Model the probability $P(y=1 \mid \boldsymbol{x}) = \sigma(\boldsymbol{w}^\top \boldsymbol{x})$, where $\sigma(z) = 1/(1 + e^{-z})$ is the sigmoid function, and minimize the cross-entropy loss (equivalently, maximize the likelihood).
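A minimal sketch of this recipe with plain gradient descent follows. The gradient of the mean cross-entropy is $\frac{1}{n}\sum_i (\sigma(\boldsymbol{w}^\top \boldsymbol{x}_i) - y_i)\,\boldsymbol{x}_i$. It assumes a constant 1 in each $\boldsymbol{x}_i$ for the intercept, a fixed learning rate and step count, and uses hypothetical names and a small toy dataset. The classes deliberately overlap so that the maximum-likelihood weights are finite; on perfectly separable data the weights would keep growing.

```python
import math


def sigmoid(z):
    # Numerically stable form: never calls exp on a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, x):
    """P(y = 1 | x) = sigmoid(w^T x)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


def cross_entropy(w, X, y, eps=1e-12):
    total = 0.0
    for x, yi in zip(X, y):
        p = min(max(predict_proba(w, x), eps), 1 - eps)  # clip so log stays finite
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(y)


def fit_logistic(X, y, lr=0.5, steps=2000):
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        errors = [predict_proba(w, x) - yi for x, yi in zip(X, y)]
        grad = [sum(e * x[j] for e, x in zip(errors, X)) / n for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# One feature t plus an intercept column; the labels overlap around t = 0.
X = [[1.0, t] for t in (-2, -1, -0.5, 0.5, 1, 2)]
y = [0, 0, 1, 0, 1, 1]
w = fit_logistic(X, y)
print(w, cross_entropy(w, X, y))
```

The decision boundary is the set where $\boldsymbol{w}^\top \boldsymbol{x} = 0$, that is, where the predicted probability equals 0.5; for this symmetric toy data it sits at $t = 0$.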
Bias-Variance Decomposition
Expected squared error $= \text{Bias}^2 + \text{Variance} + \text{Noise}$. Increasing model complexity typically decreases bias but increases variance, so a good model balances the two.
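The decomposition follows in one line under standard assumptions: $y = f(\boldsymbol{x}) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, the model $\hat{f}$ is learned from a random training set independent of $\varepsilon$, the input $\boldsymbol{x}$ is held fixed, and $\bar{f} = \mathbb{E}[\hat{f}(\boldsymbol{x})]$. Adding and subtracting $\bar{f}$ and expanding the square, the cross terms vanish because $\mathbb{E}[\hat{f}(\boldsymbol{x}) - \bar{f}] = 0$ and $\varepsilon$ has zero mean and is independent of $\hat{f}$:

$$
\begin{aligned}
\mathbb{E}\big[(y - \hat{f}(\boldsymbol{x}))^2\big]
&= \mathbb{E}\big[(f(\boldsymbol{x}) - \bar{f} + \bar{f} - \hat{f}(\boldsymbol{x}) + \varepsilon)^2\big] \\
&= \underbrace{(f(\boldsymbol{x}) - \bar{f})^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(\boldsymbol{x}) - \bar{f})^2\big]}_{\text{Variance}} + \underbrace{\sigma^2}_{\text{Noise}}
\end{aligned}
$$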
Random Forest
Train many decision trees on bootstrap samples (bagging) and average their predictions (regression) or take a majority vote (classification). At each split, a tree considers only a random subset of the features, which decorrelates the trees.
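The two sources of randomness can be sketched with a toy forest of depth-1 trees (decision stumps) split by weighted Gini impurity. Because a stump makes only one split, drawing one feature subset per tree is the same as drawing one per split. This is a simplified illustration, not a full random forest: real trees grow deeper and are usually not pruned to a single split. The names, `n_trees`, `max_features`, and the seeded RNG are assumptions chosen for the example.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Depth-1 tree: the (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for j in features:
        for thr in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, thr, majority(left), majority(right))
    if best is None:  # no valid split: always predict the majority class
        label = majority(y)
        return (0, float("inf"), label, label)
    return best[1:]


def predict_stump(stump, x):
    j, thr, left_label, right_label = stump
    return left_label if x[j] <= thr else right_label


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]     # bootstrap: sample with replacement
        features = rng.sample(range(d), max_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest, x):
    return majority([predict_stump(s, x) for s in forest])  # majority vote


X = [[0, 1], [1, 0], [1, 2], [2, 1], [8, 9], [9, 8], [9, 10], [10, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [10, 10]))
```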
Applications You Can Understand at This Level
House Price Prediction
Model house prices with linear and Ridge regression. This is also a good exercise in feature engineering.
Spam Filtering
Classify email with naive Bayes or logistic regression. Learn how to handle text features.
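As an illustration of the naive Bayes option, here is a minimal multinomial naive Bayes with Laplace (add-alpha) smoothing on bag-of-words features. It scores each class by its log prior plus the sum of per-word log likelihoods. The whitespace tokenizer, the choice to skip words never seen in training, the function names, and the four-message corpus are all assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, alpha=1.0):
    """docs: list of (text, label). Returns class counts, per-class word counts, vocabulary."""
    class_docs = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab, alpha


def predict_naive_bayes(model, text):
    class_docs, word_counts, vocab, alpha = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        total = sum(word_counts[label].values())
        score = math.log(class_docs[label] / n_docs)  # log prior
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][w] + alpha) / (total + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


train = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting schedule tomorrow", "ham"),
    ("project meeting notes", "ham"),
]
model = train_naive_bayes(train)
print(predict_naive_bayes(model, "money offer"))       # spam
print(predict_naive_bayes(model, "meeting tomorrow"))  # ham
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.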
Credit Card Fraud Detection
Classification with imbalanced data. The precision-recall trade-off is crucial.
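The accuracy trap on imbalanced data shows up immediately when the metrics are computed from confusion-matrix counts. In this sketch (hypothetical names, toy labels with 2 positives out of 10), a model that never predicts the positive class has the same accuracy as one that catches half the positives. Returning 0.0 when a ratio is undefined is a convention assumed here.

```python
def confusion_counts(y_true, y_pred):
    """Counts for binary labels where 1 is the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
model_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
all_negative = [0] * 10
print(classification_metrics(y_true, model_pred))
# {'accuracy': 0.8, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
print(classification_metrics(y_true, all_negative))
# {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```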
Customer Segmentation
Group customers with k-means clustering. A hands-on application of unsupervised learning.
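k-means alternates two steps until the centroids stop moving: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below implements that loop (Lloyd's algorithm) on toy 2D data. The deterministic farthest-first initialization is a simple stand-in assumed here for k-means++, and the function names are hypothetical.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(points, centroids):
    # Assignment step: index of the nearest centroid for every point.
    return [min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points]


def kmeans(points, k, max_iter=100):
    # Farthest-first initialization: start from the first point, then repeatedly add
    # the point farthest from all centroids chosen so far.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        centroids.append(tuple(max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))))
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assign(points, centroids)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)  # roughly (0.33, 0.33) and (10.33, 10.33)
```

Lloyd's algorithm only finds a local optimum, so in practice the result depends on the initialization, and several restarts are common.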
Study Tips
- Don't shy away from the math: understand the loss functions and how they are optimized
- Choose methods deliberately: match the method to the nature of the data
- Evaluate correctly: never let test data influence training, preprocessing, or model selection
- Build a baseline: start with a simple model as a reference point
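The last two tips can be sketched together: split first, compute preprocessing statistics from the training split only, and keep a trivial baseline to compare against. This is a minimal illustration with hypothetical names; standardization stands in for any preprocessing step, and a majority-class predictor stands in for the simple baseline.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=0):
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_standardizer(train_values):
    # Mean and standard deviation come from the training split only,
    # so the test split never influences preprocessing.
    return statistics.mean(train_values), statistics.pstdev(train_values)


def transform(values, params):
    mean, std = params
    return [(v - mean) / std for v in values]


def majority_baseline(train_labels):
    # Simplest reference model: always predict the most frequent training label.
    label = statistics.mode(train_labels)
    return lambda _x: label


data = list(range(20))
train, test = train_test_split(data)
params = fit_standardizer(train)
train_z = transform(train, params)
test_z = transform(test, params)  # reuse the training statistics on the test split
```

A model is worth keeping only if it clearly beats the baseline on held-out data.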