Chapter 2: Time-Domain Derivation
1. Problem Setup
We linearly estimate a target signal $d \in \mathbb{C}$ (scalar) from an observation vector $\boldsymbol{x} \in \mathbb{C}^N$:
\begin{equation} \hat{d} = \boldsymbol{w}^H \boldsymbol{x} \end{equation}
Here $\boldsymbol{w}$ is the coefficient vector and $(\cdot)^H$ denotes the Hermitian transpose (conjugate transpose).
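As a minimal numerical sketch of this estimator (assuming NumPy; the vectors here are illustrative random placeholders, not values from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4

# Illustrative coefficient and observation vectors
w = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# d_hat = w^H x; np.vdot conjugates its first argument,
# so it computes the Hermitian inner product for 1-D arrays
d_hat = np.vdot(w, x)
```

For 1-D arrays, `np.vdot(w, x)` evaluates $\sum_k w_k^* x_k$, which is exactly $\boldsymbol{w}^H\boldsymbol{x}$.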
2. Cost Function
We minimize the mean-square value of the estimation error $e = d - \hat{d} = d - \boldsymbol{w}^H\boldsymbol{x}$:
\begin{equation} J = E[|e|^2] = E[|d - \boldsymbol{w}^H\boldsymbol{x}|^2] \to \min \label{eq:J} \end{equation}
3. Notation
- Target signal power: $\sigma_d^2 = E[|d|^2]$
- Cross-correlation vector: $\boldsymbol{p} = E[\boldsymbol{x}d^*]$ ($N \times 1$)
- Autocorrelation matrix: $\boldsymbol{R} = E[\boldsymbol{x}\boldsymbol{x}^H]$ ($N \times N$, Hermitian)
Using this notation, we expand the cost function \eqref{eq:J}. Since $|e|^2 = e \cdot e^*$:
\begin{align} J &= E[|d - \boldsymbol{w}^H\boldsymbol{x}|^2] \nonumber \\ &= E\bigl[(d - \boldsymbol{w}^H\boldsymbol{x})(d - \boldsymbol{w}^H\boldsymbol{x})^*\bigr] \nonumber \\ &= E\bigl[(d - \boldsymbol{w}^H\boldsymbol{x})(d^* - \boldsymbol{x}^H\boldsymbol{w})\bigr] \nonumber \\ &= E[dd^*] - E[d \cdot \boldsymbol{x}^H\boldsymbol{w}] - E[\boldsymbol{w}^H\boldsymbol{x} \cdot d^*] + E[\boldsymbol{w}^H\boldsymbol{x}\boldsymbol{x}^H\boldsymbol{w}] \end{align}
Since $\boldsymbol{w}$ is a deterministic vector (not a random variable), it can be taken outside the expectation:
\begin{align} J &= E[|d|^2] - E[d\boldsymbol{x}^H]\boldsymbol{w} - \boldsymbol{w}^H E[\boldsymbol{x}d^*] + \boldsymbol{w}^H E[\boldsymbol{x}\boldsymbol{x}^H]\boldsymbol{w} \nonumber \\ &= \sigma_d^2 - \boldsymbol{p}^H\boldsymbol{w} - \boldsymbol{w}^H\boldsymbol{p} + \boldsymbol{w}^H\boldsymbol{R}\boldsymbol{w} \label{eq:J_symbols} \end{align}
Here we used $E[d\boldsymbol{x}^H] = (E[\boldsymbol{x}d^*])^H = \boldsymbol{p}^H$.
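The expansion can be sanity-checked numerically: with sample averages standing in for the expectations, the identity holds exactly. A sketch assuming NumPy, with an arbitrary synthetic signal model (the model itself is illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 3, 100_000  # filter length, number of snapshots

# Illustrative model: d is a noisy linear function of x, so p and R are nontrivial
x = (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))) / np.sqrt(2)
a = np.array([1.0, -0.5j, 0.25])
d = a.conj() @ x + 0.1 * (rng.standard_normal(T) + 1j * rng.standard_normal(T))

# Sample statistics standing in for the expectations
sigma_d2 = np.mean(np.abs(d) ** 2)   # sigma_d^2 = E[|d|^2]
p = (x * d.conj()).mean(axis=1)      # p = E[x d*]
R = x @ x.conj().T / T               # R = E[x x^H]

# Arbitrary fixed (deterministic) coefficient vector
w = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Direct evaluation of J = E[|d - w^H x|^2]
e = d - w.conj() @ x
J_direct = np.mean(np.abs(e) ** 2)

# Expanded form: sigma_d^2 - p^H w - w^H p + w^H R w
J_expanded = (sigma_d2 - np.vdot(p, w) - np.vdot(w, p) + np.vdot(w, R @ w)).real
```

Because the same sample averages appear on both sides, `J_direct` and `J_expanded` agree to machine precision, not just approximately.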
4. Derivation via the Orthogonality Principle
We derive geometrically the condition that $\boldsymbol{w}$ must satisfy to minimize $J$. Let $e = d - \hat{d}$ be the error for the optimal estimate $\hat{d} = \boldsymbol{w}^H\boldsymbol{x}$, and let $\delta = \hat{d}' - \hat{d}$ be the difference from any other estimate $\hat{d}' = \boldsymbol{w}'^H\boldsymbol{x}$. Then:
\begin{align} E[|d - \hat{d}'|^2] &= E[|e - \delta|^2] \nonumber \\ &= E[|e|^2] - 2\operatorname{Re}\,E[e\,\delta^*] + E[|\delta|^2] \end{align}
For $\boldsymbol{w}$ to minimize $J$, we need $E[|d - \hat{d}'|^2] \ge E[|e|^2]$ for every $\hat{d}'$. Since $\delta$ can be any linear combination of $\boldsymbol{x}$, and in particular may be scaled by an arbitrarily small complex constant of any phase (so that the cross-term dominates $E[|\delta|^2]$ and its real part can be made positive), the cross-term must vanish:
$$E[e\,\delta^*] = 0 \quad \text{(for every linear combination $\delta$ of $\boldsymbol{x}$)}$$
This is equivalent to requiring, for each component:
\begin{equation} E[e \cdot x_k^*] = 0 \quad \text{for all } k = 1, 2, \ldots, N \label{eq:orthogonality} \end{equation}
We thus obtain the following principle:
Orthogonality Principle
In linear minimum mean-square estimation, $J = E[|e|^2]$ is minimized if and only if the error $e$ is orthogonal (uncorrelated) to all observation data.
Writing \eqref{eq:orthogonality} in vector form:
\begin{equation} E[\boldsymbol{x} e^*] = \boldsymbol{0} \end{equation}
Substituting $e = d - \boldsymbol{w}^H\boldsymbol{x}$ and noting $e^* = d^* - \boldsymbol{x}^H\boldsymbol{w}$:
\begin{align} E[\boldsymbol{x} e^*] &= E[\boldsymbol{x}(d^* - \boldsymbol{x}^H\boldsymbol{w})] \nonumber \\ &= E[\boldsymbol{x}d^*] - E[\boldsymbol{x}\boldsymbol{x}^H]\boldsymbol{w} \nonumber \\ &= \boldsymbol{p} - \boldsymbol{R}\boldsymbol{w} = \boldsymbol{0} \end{align}
Here $\boldsymbol{w}$ is deterministic and can be taken outside the expectation. Rearranging:
Wiener-Hopf Equation (via the Orthogonality Principle)
\begin{equation} \boldsymbol{R}\boldsymbol{w} = \boldsymbol{p} \end{equation}
If $\boldsymbol{R}$ is nonsingular:
\begin{equation} \boldsymbol{w}_{\rm opt} = \boldsymbol{R}^{-1}\boldsymbol{p} \end{equation}
5. Derivation via Wirtinger Calculus
We differentiate each term of \eqref{eq:J_symbols} with respect to $\boldsymbol{w}^*$. By the Wirtinger differentiation formulas for vectors:
\begin{align} \frac{\partial J}{\partial \boldsymbol{w}^*} &= \frac{\partial}{\partial \boldsymbol{w}^*}\left(\sigma_d^2 - \boldsymbol{p}^H\boldsymbol{w} - \boldsymbol{w}^H\boldsymbol{p} + \boldsymbol{w}^H\boldsymbol{R}\boldsymbol{w}\right) \nonumber \\ &= 0 - 0 - \boldsymbol{p} + \boldsymbol{R}\boldsymbol{w} \nonumber \\ &= \boldsymbol{R}\boldsymbol{w} - \boldsymbol{p} \end{align}
Here, $\boldsymbol{p}^H\boldsymbol{w}$ does not contain $\boldsymbol{w}^*$, so its derivative is $\boldsymbol{0}$; the derivative of $\boldsymbol{w}^H\boldsymbol{p}$ is $\boldsymbol{p}$; and the derivative of $\boldsymbol{w}^H\boldsymbol{R}\boldsymbol{w}$ is $\boldsymbol{R}\boldsymbol{w}$ (since $\boldsymbol{R}$ is Hermitian).
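The Wirtinger gradient $\boldsymbol{R}\boldsymbol{w} - \boldsymbol{p}$ can be verified against finite differences, using the componentwise identity $\partial J/\partial w_k^* = \tfrac{1}{2}\bigl(\partial J/\partial \operatorname{Re} w_k + j\,\partial J/\partial \operatorname{Im} w_k\bigr)$ for real-valued $J$. A sketch with synthetic (illustrative) statistics:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 3

# Hypothetical fixed statistics: R Hermitian positive definite, p arbitrary
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
R = A @ A.conj().T + N * np.eye(N)
p = rng.standard_normal(N) + 1j * rng.standard_normal(N)
sigma_d2 = 5.0  # illustrative constant; it drops out of the gradient anyway

def J(w):
    """Cost sigma_d^2 - p^H w - w^H p + w^H R w (real-valued for Hermitian R)."""
    return (sigma_d2 - np.vdot(p, w) - np.vdot(w, p) + np.vdot(w, R @ w)).real

w0 = rng.standard_normal(N) + 1j * rng.standard_normal(N)
grad = R @ w0 - p  # claimed Wirtinger gradient dJ/dw*

# Central finite differences for each component of dJ/dw*
h = 1e-6
fd = np.zeros(N, dtype=complex)
for k in range(N):
    ek = np.zeros(N)
    ek[k] = h
    dRe = (J(w0 + ek) - J(w0 - ek)) / (2 * h)            # dJ / d Re(w_k)
    dIm = (J(w0 + 1j * ek) - J(w0 - 1j * ek)) / (2 * h)  # dJ / d Im(w_k)
    fd[k] = (dRe + 1j * dIm) / 2
```

Since $J$ is quadratic, the central differences match `grad` up to floating-point roundoff.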
Setting $\dfrac{\partial J}{\partial \boldsymbol{w}^*} = \boldsymbol{0}$:
Wiener-Hopf Equation (Normal Equation)
\begin{equation} \boldsymbol{R}\boldsymbol{w}_{\rm opt} = \boldsymbol{p} \end{equation}
That is:
\begin{equation} \boldsymbol{w}_{\rm opt} = \boldsymbol{R}^{-1}\boldsymbol{p} \label{eq:wiener_time} \end{equation}
6. Intuitive Interpretation
- $\boldsymbol{p} = E[\boldsymbol{x}d^*]$: indicates which components of $\boldsymbol{x}$ are correlated with the target $d$
- $\boldsymbol{R}^{-1}$: cancels mutual correlations among the observation components (whitening)
As a result, only the components correlated with $d$ are extracted without distortion.
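Putting the chapter together: estimate $\boldsymbol{p}$ and $\boldsymbol{R}$ from data, solve the Wiener-Hopf equation, and confirm the orthogonality principle numerically. A sketch assuming NumPy and a synthetic signal model (illustrative only, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 4, 200_000

# Illustrative model: d is correlated with x through a fixed vector a, plus noise
x = (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))) / np.sqrt(2)
a = rng.standard_normal(N) + 1j * rng.standard_normal(N)
noise = (rng.standard_normal(T) + 1j * rng.standard_normal(T)) / np.sqrt(2)
d = a.conj() @ x + 0.2 * noise

# Sample second-order statistics
p = (x * d.conj()).mean(axis=1)   # p = E[x d*]
R = x @ x.conj().T / T            # R = E[x x^H]

# Wiener-Hopf solution; numerically, prefer solve() over forming R^{-1} explicitly
w_opt = np.linalg.solve(R, p)

# Orthogonality principle: E[x e*] = 0 at the optimum
e = d - w_opt.conj() @ x
orth = (x * e.conj()).mean(axis=1)

def J(w):
    """Mean-square error E[|d - w^H x|^2] estimated from the samples."""
    return np.mean(np.abs(d - w.conj() @ x) ** 2)

# Any perturbation of w_opt should increase the cost
w_bad = w_opt + 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
```

With sample statistics, `orth` equals $\boldsymbol{p} - \boldsymbol{R}\boldsymbol{w}_{\rm opt}$ exactly, so it vanishes to solver precision, and `J(w_opt) < J(w_bad)` because $\boldsymbol{R}$ is positive definite.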
The time-domain solution $\boldsymbol{w}_{\rm opt} = \boldsymbol{R}^{-1}\boldsymbol{p}$ derived here is fully equivalent to the frequency-domain solution $G(\omega) = H^* P_S / (|H|^2 P_S + P_N)$ from Chapter 1, related via the Fourier transform. The correspondence and proof of equivalence are given in Chapter 3.