Machine Learning Basics

Classical Machine Learning

Basic (Undergraduate Year 1–2)

About This Chapter

The basic level covers classical machine learning algorithms. Starting with linear regression, it moves on to classification, decision trees, and ensemble learning, and closes with evaluation, model selection, and feature engineering. The goal is to understand the mathematical foundation of each method and to learn sound evaluation techniques, so that you can explain why a given method was chosen.

Prerequisites

  • The introductory-level content (types of learning, fundamental concepts)
  • Foundations of linear algebra (matrices, vectors)
  • Foundations of calculus (partial derivatives, gradients)
  • Foundations of probability and statistics (expectation, variance)

Table of Contents

1. Linear Regression

The most fundamental model.

  • Least squares
  • Normal equation
  • Gradient descent

2. Polynomial Regression and Regularization

Preventing overfitting.

  • Polynomial features
  • Ridge regression (L2 regularization)
  • Lasso regression (L1 regularization)

3. Logistic Regression

Extending to classification.

  • Sigmoid function
  • Maximum likelihood estimation
  • Decision boundary

4. Multiclass Classification

Classifying more than two classes.

  • One-vs-Rest
  • One-vs-One
  • Softmax regression

5. Decision Trees

Interpretable models.

  • Splitting criteria (entropy, Gini)
  • Tree growth and pruning
  • Feature importance

6. Ensemble Learning

Combining multiple models.

  • Bagging
  • Random forests
  • Boosting (AdaBoost, GBDT)

7. Support Vector Machines

Margin maximization.

  • Linear SVM
  • The kernel trick
  • Soft margin

8. k-Nearest Neighbors

The simplest method.

  • Defining distance
  • Choosing k
  • The curse of dimensionality

9. Evaluation Metrics

How to measure performance.

  • Accuracy, precision, recall, F1
  • Confusion matrix
  • ROC curve and AUC

10. Cross-Validation and Model Selection

Evaluating generalization performance.

  • Holdout method
  • k-fold cross-validation
  • Hyperparameter tuning

11. Feature Engineering

Data preprocessing and feature design.

  • Categorical variable encoding
  • Missing-value handling and scaling
  • Feature selection and feature generation

Supplementary Reading

Readings that revisit the methods from these chapters in more depth, using diagrams to build geometric intuition.

Key Concepts and Methods

The Objective of Linear Regression

Find the parameters $\boldsymbol{w}$ that minimize the following squared error: $$\min_{\boldsymbol{w}} \displaystyle\sum_{i=1}^{n} (y_i - \boldsymbol{w}^\top \boldsymbol{x}_i)^2$$
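The objective above can be minimized in closed form. The following is a minimal NumPy sketch, assuming synthetic data and a constant column appended to the inputs so that the intercept is part of $\boldsymbol{w}$; the data-generating coefficients are made up for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2*x1 - 3*x2 + 1 + small noise
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 2))
y = 2.0 * X_raw[:, 0] - 3.0 * X_raw[:, 1] + 1.0 + 0.01 * rng.normal(size=200)

# Append a column of ones so the intercept is learned as the last entry of w
X = np.hstack([X_raw, np.ones((200, 1))])

# Least-squares solution; lstsq avoids explicitly inverting X^T X
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# The same minimizer obtained from the normal equation X^T X w = X^T y
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

print(w)         # close to [2, -3, 1]
print(w_normal)  # agrees with w up to floating-point error
```

Both routes give the same minimizer here. In practice `lstsq` is the more robust choice when the columns of `X` are nearly collinear.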

Regularization

Ridge regression: $\min_{\boldsymbol{w}} \displaystyle\sum_i (y_i - \boldsymbol{w}^\top \boldsymbol{x}_i)^2 + \lambda \|\boldsymbol{w}\|_2^2$
Lasso regression: $\min_{\boldsymbol{w}} \displaystyle\sum_i (y_i - \boldsymbol{w}^\top \boldsymbol{x}_i)^2 + \lambda \|\boldsymbol{w}\|_1$
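Both penalties shrink the weights toward zero, which suppresses overfitting and tends to improve generalization; $\lambda \ge 0$ sets the strength of the penalty. The L1 penalty can drive some weights exactly to zero, so Lasso also acts as a form of feature selection.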
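For the Ridge objective, setting the gradient to zero gives $(X^\top X + \lambda I)\boldsymbol{w} = X^\top \boldsymbol{y}$. The sketch below solves that system with NumPy on made-up data and shows the weight norm shrinking as $\lambda$ grows. The helper name `ridge_fit` is hypothetical, and for simplicity no intercept is fitted.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w_true = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ w_true + 0.1 * rng.normal(size=50)

# A larger lambda means a stronger penalty and a smaller weight norm
lambdas = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge_fit(X, y, lam)) for lam in lambdas]

for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:6.1f}  ||w||_2={norm:.3f}")
```

With $\lambda = 0$ this reduces to ordinary least squares. The Lasso objective has no closed-form solution in general and is usually minimized iteratively, for example by coordinate descent.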

Logistic Regression

Model the probability $P(y=1|\boldsymbol{x}) = \sigma(\boldsymbol{w}^\top \boldsymbol{x})$ (where $\sigma$ is the sigmoid function) and minimize the cross-entropy loss.
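As an illustration, the cross-entropy loss can be minimized with plain gradient descent. This is a minimal NumPy sketch under assumed settings: the synthetic data, the learning rate, the step count, and the helper names (`sigmoid`, `cross_entropy`, `fit_logistic`) are all choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(w, X, y):
    """Mean cross-entropy loss for labels y in {0, 1}."""
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_logistic(X, y, lr=0.1, n_steps=2000):
    """Gradient descent on the mean cross-entropy loss."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)  # gradient of the mean loss
        w -= lr * grad
    return w

# Synthetic two-class data (an assumption): class 1 when x1 + x2 > 0
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(300, 2))
y = (X_raw[:, 0] + X_raw[:, 1] > 0).astype(float)
X = np.hstack([X_raw, np.ones((300, 1))])  # constant column for the intercept

w = fit_logistic(X, y)
predicted = (sigmoid(X @ w) >= 0.5).astype(float)  # decision boundary at P = 0.5
accuracy = np.mean(predicted == y)
print(cross_entropy(w, X, y), accuracy)
```

The decision boundary is the set of points where $\boldsymbol{w}^\top \boldsymbol{x} = 0$, which is where the predicted probability equals 0.5.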

Bias-Variance Decomposition

Expected squared error $= \text{Bias}^2 + \text{Variance} + \text{Noise}$, where the noise term is irreducible. Increasing model complexity typically decreases bias but increases variance, so a good model balances the two.
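The decomposition can be checked numerically by refitting a model on many fresh training sets and looking at its predictions at one fixed point. The sketch below is a simulation under assumed settings: the target function, noise level, sample sizes, and polynomial degrees are all chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)

x0 = 0.3                      # query point at which the error is decomposed
sigma = 0.3                   # noise standard deviation (assumed)
n_train, n_repeats = 30, 500

def predictions_at_x0(degree):
    """Refit a polynomial on fresh training sets; collect its predictions at x0."""
    preds = np.empty(n_repeats)
    for r in range(n_repeats):
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + sigma * rng.normal(size=n_train)
        coeffs = np.polyfit(x, y, degree)
        preds[r] = np.polyval(coeffs, x0)
    return preds

results = {}
for degree in (1, 3, 7):
    preds = predictions_at_x0(degree)
    bias_sq = (preds.mean() - true_f(x0)) ** 2
    variance = preds.var()
    mse = np.mean((preds - true_f(x0)) ** 2)  # error against the noise-free target
    results[degree] = (bias_sq, variance, mse)
    print(f"degree={degree}  bias^2={bias_sq:.4f}  variance={variance:.4f}  mse={mse:.4f}")
```

Here `mse` is measured against the noise-free target, so it equals `bias_sq + variance`; the error against a noisy observation would add the noise term `sigma ** 2` on top.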

Random Forest

Train many decision trees via bagging and average the predictions (regression) or take a majority vote (classification). Each tree is trained on a bootstrap sample of the data, and each split considers only a random subset of the features.
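A minimal scikit-learn sketch of this idea follows, assuming a synthetic dataset from `make_classification`. The hyperparameter values are illustrative, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data (an assumption for illustration)
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees in the ensemble
    max_features="sqrt",   # size of the random feature subset tried at each split
    bootstrap=True,        # each tree is fitted on a bootstrap sample (bagging)
    random_state=0,
)
forest.fit(X_train, y_train)

test_accuracy = forest.score(X_test, y_test)
importances = forest.feature_importances_  # impurity-based, sums to 1
print(test_accuracy)
print(importances.round(3))
```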

Applications You Can Understand at This Level

House Price Prediction

Model house prices with linear and Ridge regression. Also ideal for practicing feature engineering.

Spam Filtering

Classify email with naive Bayes or logistic regression. Learn how to handle text features.
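One common way to handle text features is a bag-of-words representation. The sketch below pairs scikit-learn's `CountVectorizer` with multinomial naive Bayes; the six training messages are invented for illustration, and a real filter would need far more data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical corpus, made up for this example
messages = [
    "win a free prize now",
    "free money click now",
    "claim your free prize",
    "meeting at noon tomorrow",
    "please review the attached report",
    "lunch tomorrow with the team",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# CountVectorizer turns each message into word counts; naive Bayes classifies them
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)

predictions = spam_filter.predict(["free prize inside", "team meeting tomorrow"])
print(list(predictions))
```

Words that were not seen during training, such as "inside" here, are simply ignored by the vectorizer.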

Credit Card Fraud Detection

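Classify heavily imbalanced data, where fraud makes up only a tiny fraction of all transactions. Accuracy is misleading in this setting, so the precision-recall trade-off is crucial.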
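To see why accuracy alone misleads on imbalanced data, consider a hypothetical set of 1000 transactions with 10 frauds. The labels and both sets of predictions below are invented for this example, and `report` is a hypothetical helper.

```python
import numpy as np

# Hypothetical ground truth: 10 fraudulent transactions out of 1000 (1 = fraud)
y_true = np.zeros(1000, dtype=int)
y_true[:10] = 1

# A useless model that never flags fraud
y_never = np.zeros(1000, dtype=int)

# A hypothetical detector: catches 8 of the 10 frauds and raises 12 false alarms
y_model = np.zeros(1000, dtype=int)
y_model[:8] = 1
y_model[10:22] = 1

def report(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    accuracy = float(np.mean(y_pred == y_true))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return accuracy, precision, recall

print(report(y_true, y_never))  # accuracy 0.99, precision 0.0, recall 0.0
print(report(y_true, y_model))  # accuracy 0.986, precision 0.4, recall 0.8
```

The model that never flags fraud has the higher accuracy, yet it catches nothing. Precision and recall expose the difference.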

Customer Segmentation

Group customers with k-means clustering. A hands-on application of unsupervised learning.

Study Tips

  • Don't shy away from the math: understand loss functions and optimization
  • Choose deliberately: pick a method according to the nature of the data
  • Evaluate correctly: keep the test data out of training and tuning so it stays uncontaminated
  • Build a baseline: first set a reference with a simple model

Frequently Asked Questions (FAQ)

What topics are covered in basic machine learning?
The basic level covers the foundational machine learning algorithms and evaluation methods: linear regression, regularization (Lasso/Ridge), logistic regression, multiclass classification, decision trees, ensemble learning (random forests, GBDT), SVM, k-NN, evaluation metrics, cross-validation, and feature engineering.
What prerequisites are needed for basic machine learning?
Linear algebra (matrix and vector operations), calculus (gradients and minimization), basic statistics (mean, variance, probability distributions), and Python fundamentals (NumPy, Pandas, scikit-learn). Completing the introductory level first is recommended.
What is the basic workflow for training a model with scikit-learn?
(1) Load the data and split it into training and test sets → (2) preprocess (imputation, standardization), fitting the preprocessing on the training set only → (3) instantiate a model (e.g., LinearRegression()) → (4) train with fit(X_train, y_train) → (5) predict with predict(X_test) → (6) check performance with evaluation metrics.
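The workflow can be sketched end to end as follows. This is a minimal example on a synthetic dataset from `make_regression`; the choice of imputer, scaler, and metrics is an assumption for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load data (synthetic here) and hold out a test set before any fitting
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Preprocessing and model in one pipeline, so that imputation and scaling
# statistics are learned from the training data only.
# This synthetic data has no missing values; the imputer is shown for completeness.
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    StandardScaler(),
    LinearRegression(),
)

model.fit(X_train, y_train)      # train
y_pred = model.predict(X_test)   # predict on unseen data

# Evaluate with regression metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE={mse:.2f}  R^2={r2:.3f}")
```

Wrapping the preprocessing in a pipeline is one way to avoid leaking test-set statistics into training, and it lets the whole object be passed to cross-validation utilities.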