Statistics

About This Series

Statistics is the discipline of extracting information from data and making decisions under uncertainty. This series provides a step-by-step learning path that runs from the fundamentals of descriptive statistics, through inferential statistics and maximum likelihood estimation, to Bayesian statistics.

Statistics is essential knowledge in every field that deals with data, including scientific research, business analysis, and machine learning.

Learning by Level

Learning Flow

A flow diagram showing the four-stage learning path from introduction to advanced, and the branching into frequentist statistics and Bayesian statistics at the intermediate level and beyond.

  • Introduction (High School): Data organization, central tendency, variance, correlation
  • Basics (University 1-2): Probability distributions, CLT, estimation, testing
  • Intermediate (University 3-4): MLE, regression, ANOVA, multivariate
  • Advanced (Graduate): Bayesian statistics, EM algorithm, model selection

Two Approaches in Statistics

  • Frequentist Statistics: Parameters are fixed values; repeated sampling; confidence intervals, $p$-values
  • Bayesian Statistics: Parameters are random variables; prior → posterior distribution; credible intervals, posterior probability

Key Topics

Descriptive Statistics

Techniques for summarizing data, including data organization, measures of central tendency, measures of dispersion, and correlation.
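
As a rough illustration of these summaries, the sketch below computes a mean, median, sample variance, and Pearson correlation using only Python's standard library. The data values and the helper name `pearson_r` are invented for this example and do not come from the series.

```python
import math
import statistics

# Hypothetical paired observations, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = statistics.mean(x)        # measure of central tendency
median_x = statistics.median(x)    # middle value of the sorted data
var_x = statistics.variance(x)     # sample variance (divides by n - 1)
std_x = statistics.stdev(x)        # sample standard deviation


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


r = pearson_r(x, y)
print(mean_x, median_x, var_x, std_x, r)
```

The correlation is written out by hand here so that each term (the co-deviation sum and the two squared-deviation sums) stays visible.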

Inferential Statistics

Inferring population characteristics from samples through interval estimation and hypothesis testing.
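
One way to picture interval estimation is the sketch below, which builds an approximate confidence interval for a population mean from a sample. It assumes a normal approximation, which is only a simplification: for a sample this small a t-distribution quantile would usually be more appropriate. The sample values and the function name `mean_confidence_interval` are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical measurements, invented for illustration only.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]


def mean_confidence_interval(data, level=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(data)
    m = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)   # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)    # two-sided critical value
    return m - z * se, m + z * se


low, high = mean_confidence_interval(sample)
low99, high99 = mean_confidence_interval(sample, level=0.99)
print(low, high)
print(low99, high99)
```

Raising the confidence level widens the interval, which is the trade-off between confidence and precision that interval estimation makes explicit.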

Regression Analysis

Methods for modeling relationships between variables and applying them to prediction.
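
A minimal sketch of the simplest case, a straight line fitted by ordinary least squares with one explanatory variable, might look like the following. The data and the function name `fit_line` are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical observations, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
predicted = intercept + slope * 6   # prediction at a new point x = 6
print(slope, intercept, predicted)
```

The fitted line always passes through the point of means, which is why the intercept is obtained from the two sample means once the slope is known.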

Bayesian Statistics

The Bayesian approach to inference, combining prior knowledge with data.
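
As a small illustration of combining prior knowledge with data, the sketch below uses the conjugate Beta-Binomial model: a Beta prior on a success probability is updated with observed successes and failures. The choice of model, the prior, the counts, and the function names are all assumptions for this example.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical setup: uniform prior Beta(1, 1), then 7 successes in 10 trials.
prior = (1.0, 1.0)
posterior = beta_binomial_update(*prior, successes=7, failures=3)

print(beta_mean(*prior))       # prior mean of the success probability
print(posterior)               # posterior Beta parameters
print(beta_mean(*posterior))   # posterior mean
```

The posterior mean falls between the prior mean and the observed proportion, and moves toward the observed proportion as more data arrive.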

Individual Topics

What is GMM (Gaussian Mixture Model)? Complete Derivation of the EM Algorithm

Explains the definition, mechanism, and applications of GMM (Gaussian Mixture Model), and provides a complete derivation of the estimation of mean, variance-covariance matrix, and mixing coefficients via the EM algorithm, without omitting any intermediate calculations.
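
To give a feel for the alternation between the E-step and the M-step, here is a deliberately simplified sketch for a one-dimensional mixture, where each component has a scalar variance instead of a variance-covariance matrix. It is not the derivation from the linked article. The data, initial values, variance floor, and function names are all assumptions.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, mus, variances, weights, n_iter=50, var_floor=1e-6):
    """Run EM for a 1-D Gaussian mixture; returns (means, variances, weights)."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    k, n = len(mus), len(data)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, mus[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate parameters from responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)   # effective number of points
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)   # guard against collapse
            weights[j] = nj / n
    return mus, variances, weights


# Hypothetical data with two visible clusters, invented for illustration.
data = [0.8, 1.0, 1.2, 0.9, 1.1, 4.9, 5.0, 5.2, 4.8, 5.1]
means, variances, weights = em_gmm_1d(
    data, mus=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print(means, variances, weights)
```

The variance floor is a pragmatic safeguard added for this sketch; without some such guard, a component can collapse onto a single point and its variance can shrink toward zero.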

Prerequisites

  • A basic understanding of high school mathematics (reading formulas and basic graphs) is enough to start the introductory level
  • At the intermediate level and above, calculus is needed (for understanding probability density functions)
  • At the advanced level, linear algebra is needed (for multivariate analysis)