Kalman Filter

What is the Kalman Filter

The Kalman filter is an algorithm that optimally estimates the internal state of a system from noisy observations. It was introduced in a 1960 paper by Rudolf E. Kalman.

[Original paper] R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems", ASME Journal of Basic Engineering, 82(1): 35–45 (1960).
[Continuous-time extension] R. E. Kalman, R. S. Bucy, "New Results in Linear Filtering and Prediction Theory", ASME Journal of Basic Engineering, 83(1): 95–108 (1961).

The Kalman filter has the following characteristics:

  • Sequential processing: updates estimates in real time as each new observation arrives
  • Optimality: provides the minimum variance estimate for linear Gaussian systems
  • Computational efficiency: does not need to store all past data, making it memory-efficient
  • Prediction capability: can predict future states in addition to estimating the current state

Historical background

The Kalman filter was published by Rudolf E. Kalman in 1960 and became widely known after being adopted for orbit estimation in the Apollo program during the 1960s. Since then, it has been used in virtually every modern control and estimation system, including GPS, autonomous driving, and robotics.

State-Space Model

[Figure 1: State-space model — block diagram relating $\boldsymbol{x}_{k-1}$, the transition matrix $\boldsymbol{F}_k$, the unit delay $z^{-1}$, process noise $\boldsymbol{w}_k$, the state $\boldsymbol{x}_k$, the observation matrix $\boldsymbol{H}_k$, observation noise $\boldsymbol{v}_k$, and the observation $\boldsymbol{y}_k$]

The Kalman filter assumes the following state-space model.

\begin{align} \text{State equation} : \quad \boldsymbol{x}_k &= \boldsymbol{F}_k \boldsymbol{x}_{k-1} + \boldsymbol{w}_k \\ \text{Observation equation} : \quad \boldsymbol{y}_k &= \boldsymbol{H}_k \boldsymbol{x}_k + \boldsymbol{v}_k \end{align}

where:

  • $\boldsymbol{x}_k$: state vector at time $k$ (the unknown quantity to be estimated)
  • $\boldsymbol{y}_k$: observation vector at time $k$ (data obtained from sensors)
  • $\boldsymbol{F}_k$: state transition matrix (system dynamics)
  • $\boldsymbol{H}_k$: observation matrix (mapping from state to observation)
  • $\boldsymbol{w}_k$: process noise (unmodeled disturbances, $\boldsymbol{Q}_k = E[\boldsymbol{w}_k\boldsymbol{w}_k^T]$)
  • $\boldsymbol{v}_k$: observation noise (sensor measurement error, $\boldsymbol{R}_k = E[\boldsymbol{v}_k\boldsymbol{v}_k^T]$)
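
To make these quantities concrete, here is a minimal sketch for a hypothetical 1-D constant-velocity tracking problem (position and velocity in the state, only position observed). The sampling interval `dt`, the process-noise intensity `q`, and the sensor variance in `R` are illustrative values, not prescriptions:

```python
import numpy as np

# Hypothetical 1-D constant-velocity model, sampled every dt seconds.
# State x = [position, velocity]; only position is observed.
dt = 0.1

F = np.array([[1.0, dt],
              [0.0, 1.0]])              # state transition matrix F
H = np.array([[1.0, 0.0]])              # observation matrix H (position only)

q = 0.05                                 # process-noise intensity (tuning parameter)
Q = q * np.array([[dt**3 / 3, dt**2 / 2],
                  [dt**2 / 2, dt]])      # discretized white-noise-acceleration Q

R = np.array([[0.5]])                    # observation noise covariance (sensor variance)
```

Since only position is observed, velocity must be inferred indirectly through the model — exactly the kind of hidden-state estimation the filter is designed for.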

Structure of the Kalman Filter

The Kalman filter alternates between two steps: prediction and update.

  • Predict: forecasts the current state from the previous estimate using the state equation
  • Update: corrects the prediction using the new observation

Role of the Kalman Gain $\boldsymbol{K}_k$

The quantity $\boldsymbol{e}_k = \boldsymbol{y}_k - \hat{\boldsymbol{y}}_k$ output by the subtractor in Figure 2 is called the innovation, representing "how much the prediction was off." The Kalman gain $\boldsymbol{K}_k$ determines how much to trust this innovation when correcting the estimate.

The objective to minimize is not the innovation $\boldsymbol{e}_k$ itself, but rather the trace of the estimation error covariance (the total variance of estimation errors across all state variables):

$$\text{tr}(\boldsymbol{P}_{k|k}) \to \min, \quad \boldsymbol{P}_{k|k} = E\!\left[(\boldsymbol{x}_k - \hat{\boldsymbol{x}}_{k|k})(\boldsymbol{x}_k - \hat{\boldsymbol{x}}_{k|k})^T\right]$$

The optimal gain derived from this minimization is (see Derivation for details):

$$\boldsymbol{K}_k = \boldsymbol{P}_{k|k-1}\boldsymbol{H}_k^T(\boldsymbol{H}_k\boldsymbol{P}_{k|k-1}\boldsymbol{H}_k^T + \boldsymbol{R}_k)^{-1}$$

Intuitively:

  • Large observation noise (large $\boldsymbol{R}_k$) → small $\boldsymbol{K}_k$ → trust the prediction more, rely less on the observation
  • Large prediction uncertainty (large $\boldsymbol{P}_{k|k-1}$) → large $\boldsymbol{K}_k$ → trust the observation more and make a larger correction
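
The predict/update cycle and the gain formula above can be sketched directly in NumPy. The model matrices are passed in as arguments, so this works for any linear model the user supplies; the scalar random-walk at the end is purely illustrative:

```python
import numpy as np

def predict(x, P, F, Q):
    """Prediction step: propagate the estimate and covariance through the model."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def update(x_pred, P_pred, y, H, R):
    """Update step: correct the prediction using the innovation e = y - H x."""
    e = y - H @ x_pred                       # innovation
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ e
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new

# Illustrative scalar random-walk model (F = H = 1):
x, P = np.array([0.0]), np.array([[1.0]])
F1, H1 = np.array([[1.0]]), np.array([[1.0]])
Q1, R1 = np.array([[0.01]]), np.array([[0.25]])
x, P = predict(x, P, F1, Q1)
x, P = update(x, P, np.array([1.0]), H1, R1)
```

The simple covariance update `(I - K H) P` is used here for clarity; production code often prefers the numerically more robust Joseph form.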

Comparison with the Wiener filter

Comparing with the Wiener filter's optimal solution $G(\omega) = H^*P_S(|H|^2P_S + P_N)^{-1}$, the Kalman gain is based on the same principle of weighting by the ratio of "signal uncertainty" to "noise uncertainty." While the Wiener filter uses the signal-to-noise ratio in the frequency domain, the Kalman filter uses the ratio of covariance matrices in the time domain.

For details, see Algorithm and Derivation.

Prerequisites and Limitations

Required Known Parameters

To apply the basic Kalman filter, the following parameters must be known (or set in advance):

| Parameter | Meaning | Practical handling |
|---|---|---|
| $\boldsymbol{F}_k$ | State transition matrix | Derived from physical laws (e.g., equations of motion) |
| $\boldsymbol{H}_k$ | Observation matrix | Determined from sensor characteristics |
| $\boldsymbol{Q}_k$ | Process noise covariance | Magnitude of model error (requires tuning) |
| $\boldsymbol{R}_k$ | Observation noise covariance | Determined from sensor specifications or measurements |
| $\hat{\boldsymbol{x}}_0,\, \boldsymbol{P}_0$ | Initial estimate and covariance | From prior knowledge, or initialized with large $\boldsymbol{P}_0$ |

About the subscript $k$

The subscript $k$ in the table is included for generality, indicating that the filter can handle time-varying systems. In practice, many systems are time-invariant (LTI), where $\boldsymbol{F}, \boldsymbol{H}, \boldsymbol{Q}, \boldsymbol{R}$ are constant matrices determined once and used at all time steps (e.g., GPS, inertial navigation, most sensor fusion applications). However, when the system dynamics change over time — such as during a rocket's flight (mass decreases as fuel is consumed) or an aircraft's turn (equations of motion change) — $\boldsymbol{F}_k$ must be computed from the physical model at each time step $k$.

In practice, it is rare for all of these to be known exactly. In particular, $\boldsymbol{Q}_k$ and $\boldsymbol{R}_k$ are difficult to derive theoretically and often require trial-and-error tuning.

What happens when parameters are inaccurate

  • $\boldsymbol{Q}_k$ too small → the filter overtrusts the model and ignores observations (risk of divergence)
  • $\boldsymbol{Q}_k$ too large → the filter follows observations too closely and fails to remove noise
  • $\boldsymbol{R}_k$ too small → overtrusts noisy observations
  • $\boldsymbol{R}_k$ too large → ignores observations and relies too much on prediction
  • $\boldsymbol{F}_k$ inaccurate → predictions themselves become off target

What happens with non-Gaussian noise

The Kalman filter still works even when the noise is non-Gaussian. However, the optimality guarantee changes:

  • Gaussian noise: the Kalman filter is optimal among all estimators (minimum variance estimate)
  • Non-Gaussian noise: the Kalman filter is optimal among linear estimators (BLUE: Best Linear Unbiased Estimator), but may be outperformed by nonlinear estimators

| Noise type | Kalman filter performance | More suitable methods |
|---|---|---|
| Gaussian | Optimal (cannot be improved) | None |
| Heavy-tailed (many outliers) | Pulled by outliers | Particle filter, robust KF |
| Multimodal (multiple hypotheses) | Breaks down due to single-Gaussian approximation | Particle filter, mixture KF |
| Uniform | Works but inefficient | Particle filter |

In other words, it does not "stop working" but rather "loses optimality." In practice, if the noise is approximately Gaussian, the filter performs well enough. When non-Gaussianity is significant, nonlinear methods such as the particle filter are effective.

The Leap from the Wiener Filter

With the above prerequisites and limitations in mind, let us revisit the historical significance of the Kalman filter. Before the Kalman filter, optimal filtering was dominated by the Wiener filter, developed by Norbert Wiener in the 1940s. The Wiener filter is formulated in the frequency domain and provides the optimal linear filter in the minimum mean square error (MMSE) sense for stationary processes. However, it had several fundamental limitations in practice.

| Characteristic | Wiener filter | Kalman filter |
|---|---|---|
| Processing mode | Typically batch (requires all data) | Sequential (real-time update with each observation) |
| Applicable systems | Stationary processes only | Non-stationary and time-varying systems |
| Formulation | Frequency domain (power spectra) | State space (time domain; extends naturally to multivariable) |
| Memory | Entire observation history (or autocorrelation function) | Only the current estimate and covariance |
| Multivariable | Difficult to extend (MIMO Wiener–Hopf equation) | Handled naturally via matrix operations |
| Uncertainty quantification | Residual error power only | Covariance matrix tracks estimation confidence over time |

The essential innovation of the Kalman filter was reformulating the optimal filtering problem from "spectral decomposition" to "sequential state estimation update." While the Wiener filter "designs the optimal frequency response after seeing all data," the Kalman filter "optimally corrects the estimate each time a new observation arrives." This paradigm shift made it possible to address real-world demands: real-time control, non-stationary systems, and multivariable problems.

Relationship between the two

For stationary processes as $k \to \infty$, the Kalman gain $\boldsymbol{K}_k$ converges to a constant value, and the steady-state Kalman filter becomes equivalent to the Wiener filter. In other words, the Wiener filter can be viewed as the steady-state limit of the Kalman filter.
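
This convergence is easy to observe numerically: iterating the covariance recursion for a fixed (time-invariant) model drives the gain to a constant. A minimal sketch, using an illustrative constant-velocity model:

```python
import numpy as np

# Time-invariant model (illustrative values).
F = np.array([[1.0, 0.1], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.5]])

P = 10.0 * np.eye(2)            # large initial uncertainty
K_prev = np.zeros((2, 1))
for k in range(200):
    # One pass of the covariance recursion (Riccati iteration).
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    P = (np.eye(2) - K @ H) @ P_pred
    if np.max(np.abs(K - K_prev)) < 1e-10:
        break                    # gain has reached its steady-state value
    K_prev = K.copy()

print(K)                         # constant steady-state Kalman gain
```

The steady-state covariance can also be obtained directly by solving the discrete algebraic Riccati equation instead of iterating.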

How one researcher achieved so many advantages at once

What Kalman did in his 1960 paper "A New Approach to Linear Filtering and Prediction Problems" was not to design each advantage one by one, but to fundamentally change the formulation of the problem. Specifically, he reformulated Wiener's problem setting (frequency domain, stationary, infinite time) using the state-space representation from control theory (time domain, finite-dimensional, sequential update).

As a result, all the advantages listed in the table above follow automatically:

| Advantage | Why it follows automatically from the state-space representation |
|---|---|
| Sequential processing | The state equation $\boldsymbol{x}_k = \boldsymbol{F}_k\boldsymbol{x}_{k-1} + \boldsymbol{w}_k$ has a one-step-at-a-time update structure |
| Non-stationary support | $\boldsymbol{F}_k, \boldsymbol{H}_k$ carry the time subscript $k$; time variation is built in from the start, not a later generalization |
| Time-domain formulation | The state-space representation is inherently a time-domain framework |
| Memory efficiency | The state vector $\boldsymbol{x}_k$ is a sufficient statistic that compresses all past observations |
| Multivariable support | A multidimensional state vector is assumed from the outset |
| Covariance tracking | The update equation for the error covariance $\boldsymbol{P}_k$ is naturally derived within the state-space framework |

In essence, Kalman's fundamental contribution was not inventing multiple advantages individually, but rather that "a change of perspective brought everything along." It is a prime example of the power of formulation in mathematics, where the right abstraction solves many problems at once.

Intellectual context

  • The development of control theory in the 1950s, especially R.E. Bellman's dynamic programming (a sequential formulation of optimal control)
  • The limitations of Wiener-Hopf theory (difficulty extending to non-stationary and multivariable cases) were widely recognized
  • Kalman himself had deep expertise in both control theory and probability/optimization

Extensions for Imperfect Models

Many extension methods have been developed to relax the constraints of the basic Kalman filter (linearity, Gaussianity, known model parameters):

| Method | What it estimates/handles | Approach |
|---|---|---|
| Adaptive Kalman filter | $\boldsymbol{Q}_k$, $\boldsymbol{R}_k$ | Monitors innovation $\boldsymbol{e}_k$ statistics to adjust covariances online |
| Extended Kalman filter (EKF) | Nonlinear state transition/observation | Linearizes via the Jacobian (partial derivative matrix) at each time step |
| Unscented Kalman filter (UKF) | Nonlinear state transition/observation | Propagates sigma points through the nonlinear functions without linearization |
| State augmentation | Unknown parameters in $\boldsymbol{F}$, $\boldsymbol{H}$ | Adds the unknown parameters to the state vector for joint estimation |
| EM algorithm | $\boldsymbol{F}$, $\boldsymbol{H}$, $\boldsymbol{Q}$, $\boldsymbol{R}$ | Batch maximum-likelihood estimation of the model parameters |
| Particle filter | Arbitrary nonlinear/non-Gaussian models | Removes the Gaussian assumption via Monte Carlo sampling |

Practical guidelines

  • $\boldsymbol{F}_k$, $\boldsymbol{H}_k$ are linear and known → basic Kalman filter
  • $\boldsymbol{Q}_k$, $\boldsymbol{R}_k$ are uncertain → adaptive Kalman filter for dynamic adjustment
  • System is nonlinear → EKF (mild nonlinearity) or UKF (strong nonlinearity)
  • Model structure itself is unknown → state augmentation or EM
  • Non-Gaussian noise → particle filter
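
As an illustration of the adaptive approach in the table above, here is a minimal sketch of one well-known scheme, innovation-based covariance matching: $\boldsymbol{R}$ is estimated online from a sliding window of innovations using the relation $E[\boldsymbol{e}_k\boldsymbol{e}_k^T] \approx \boldsymbol{H}\boldsymbol{P}_{k|k-1}\boldsymbol{H}^T + \boldsymbol{R}$. The class name and window size are invented for the example:

```python
import numpy as np
from collections import deque

class InnovationRAdapter:
    """Hypothetical helper: estimate R from a sliding window of innovations."""

    def __init__(self, H, window=50):
        self.H = H
        self.window = deque(maxlen=window)

    def update(self, e, P_pred):
        """Store innovation e and return an updated estimate of R."""
        self.window.append(np.outer(e, e))
        C = np.mean(list(self.window), axis=0)    # sample innovation covariance
        R_hat = C - self.H @ P_pred @ self.H.T    # matching: C ≈ H P H^T + R
        # Guard against negative variances from small sample sizes.
        d = np.clip(np.diag(R_hat), 1e-8, None)
        R_hat[np.diag_indices_from(R_hat)] = d
        return R_hat

# Illustration: with H = 1 and zero prediction covariance, R_hat recovers
# the sample variance of the innovations.
adapter = InnovationRAdapter(np.array([[1.0]]), window=10)
for i in range(10):
    e = np.array([1.0 if i % 2 == 0 else -1.0])
    R_hat = adapter.update(e, np.zeros((1, 1)))
```

Short windows react quickly but estimate noisily; long windows are stable but slow to track changes — the window length is itself a tuning knob.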


Applications

  • GPS/GNSS: high-precision estimation of position and velocity from satellite signals
  • Robotics: SLAM (Simultaneous Localization and Mapping)
  • Autonomous driving: vehicle position and attitude estimation via sensor fusion
  • Aerospace: orbit estimation and attitude control for aircraft and spacecraft
  • Finance: trend estimation and forecasting of time-series data
  • Image processing: object tracking and video stabilization