Chapter 2: Time-Domain Derivation
1. Problem Setup
We linearly estimate a target signal $d \in \mathbb{C}$ (scalar) from an observation vector $\boldsymbol{x} \in \mathbb{C}^N$:
\begin{equation} \hat{d} = \boldsymbol{w}^H \boldsymbol{x} \end{equation}
Here $\boldsymbol{w}$ is the coefficient vector and $(\cdot)^H$ denotes the Hermitian transpose (conjugate transpose).
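As a minimal numerical sketch of this estimator (assuming NumPy; the vectors here are illustrative random placeholders, not values from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4

# Illustrative coefficient and observation vectors
w = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# d_hat = w^H x; np.vdot conjugates its first argument,
# so it computes the Hermitian inner product for 1-D arrays
d_hat = np.vdot(w, x)
```

For 1-D arrays, `np.vdot(w, x)` evaluates $\sum_k w_k^* x_k$, which is exactly $\boldsymbol{w}^H\boldsymbol{x}$.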
2. Cost Function
We minimize the mean-square value of the estimation error $e = d - \hat{d} = d - \boldsymbol{w}^H\boldsymbol{x}$:
\begin{equation} J = E[|e|^2] = E[|d - \boldsymbol{w}^H\boldsymbol{x}|^2] \to \min \label{eq:J} \end{equation}
3. Notation
- Target signal power: $\sigma_d^2 = E[|d|^2]$
- Cross-correlation vector: $\boldsymbol{p} = E[\boldsymbol{x}d^*]$ ($N \times 1$)
- Autocorrelation matrix: $\boldsymbol{R} = E[\boldsymbol{x}\boldsymbol{x}^H]$ ($N \times N$, Hermitian)
Using this notation, we expand the cost function \eqref{eq:J}. Since $|e|^2 = e \cdot e^*$:
\begin{align} J &= E[|d - \boldsymbol{w}^H\boldsymbol{x}|^2] \nonumber \\ &= E\bigl[(d - \boldsymbol{w}^H\boldsymbol{x})(d - \boldsymbol{w}^H\boldsymbol{x})^*\bigr] \nonumber \\ &= E\bigl[(d - \boldsymbol{w}^H\boldsymbol{x})(d^* - \boldsymbol{x}^H\boldsymbol{w})\bigr] \nonumber \\ &= E[dd^*] - E[d \cdot \boldsymbol{x}^H\boldsymbol{w}] - E[\boldsymbol{w}^H\boldsymbol{x} \cdot d^*] + E[\boldsymbol{w}^H\boldsymbol{x}\boldsymbol{x}^H\boldsymbol{w}] \end{align}
Since $\boldsymbol{w}$ is a deterministic vector (not a random variable), it can be taken outside the expectation:
\begin{align} J &= E[|d|^2] - E[d\boldsymbol{x}^H]\boldsymbol{w} - \boldsymbol{w}^H E[\boldsymbol{x}d^*] + \boldsymbol{w}^H E[\boldsymbol{x}\boldsymbol{x}^H]\boldsymbol{w} \nonumber \\ &= \sigma_d^2 - \boldsymbol{p}^H\boldsymbol{w} - \boldsymbol{w}^H\boldsymbol{p} + \boldsymbol{w}^H\boldsymbol{R}\boldsymbol{w} \label{eq:J_symbols} \end{align}
Here we used $E[d\boldsymbol{x}^H] = (E[\boldsymbol{x}d^*])^H = \boldsymbol{p}^H$.
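The expansion can be sanity-checked numerically: with sample averages standing in for the expectations, the identity holds exactly. A sketch assuming NumPy, with an arbitrary synthetic signal model (the model itself is illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 3, 100_000  # filter length, number of snapshots

# Illustrative model: d is a noisy linear function of x, so p and R are nontrivial
x = (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))) / np.sqrt(2)
a = np.array([1.0, -0.5j, 0.25])
d = a.conj() @ x + 0.1 * (rng.standard_normal(T) + 1j * rng.standard_normal(T))

# Sample statistics standing in for the expectations
sigma_d2 = np.mean(np.abs(d) ** 2)   # sigma_d^2 = E[|d|^2]
p = (x * d.conj()).mean(axis=1)      # p = E[x d*]
R = x @ x.conj().T / T               # R = E[x x^H]

# Arbitrary fixed (deterministic) coefficient vector
w = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Direct evaluation of J = E[|d - w^H x|^2]
e = d - w.conj() @ x
J_direct = np.mean(np.abs(e) ** 2)

# Expanded form: sigma_d^2 - p^H w - w^H p + w^H R w
J_expanded = (sigma_d2 - np.vdot(p, w) - np.vdot(w, p) + np.vdot(w, R @ w)).real
```

Because the same sample averages appear on both sides, `J_direct` and `J_expanded` agree to machine precision, not just approximately.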
4. Derivation via the Orthogonality Principle
We derive geometrically the condition that $\boldsymbol{w}$ must satisfy to minimize $J$. Let $e = d - \hat{d}$ be the error for the optimal estimate $\hat{d} = \boldsymbol{w}^H\boldsymbol{x}$, and let $\delta = \hat{d}' - \hat{d}$ be the difference from any other estimate $\hat{d}' = \boldsymbol{w}'^H\boldsymbol{x}$. Then:
\begin{align} E[|d - \hat{d}'|^2] &= E[|e - \delta|^2] \nonumber \\ &= E[|e|^2] - 2\operatorname{Re}\,E[e\,\delta^*] + E[|\delta|^2] \end{align}
For $\boldsymbol{w}$ to minimize $J$, we need $E[|d - \hat{d}'|^2] \ge E[|e|^2]$ for every $\hat{d}'$. Since $\delta$ can be any linear combination of $\boldsymbol{x}$, and in particular may be scaled by an arbitrarily small complex constant of any phase (so that the cross-term dominates $E[|\delta|^2]$ and its real part can be made positive), the cross-term must vanish:
$$E[e\,\delta^*] = 0 \quad \text{(for every linear combination $\delta$ of $\boldsymbol{x}$)}$$
This is equivalent to requiring, for each component:
\begin{equation} E[e \cdot x_k^*] = 0 \quad \text{for all } k = 1, 2, \ldots, N \label{eq:orthogonality} \end{equation}
We thus obtain the following principle:
Orthogonality Principle
In linear minimum mean-square estimation, $J = E[|e|^2]$ is minimized if and only if the error $e$ is orthogonal (uncorrelated) to all observation data.
Writing \eqref{eq:orthogonality} in vector form:
\begin{equation} E[\boldsymbol{x} e^*] = \boldsymbol{0} \end{equation}
Substituting $e = d - \boldsymbol{w}^H\boldsymbol{x}$ and noting $e^* = d^* - \boldsymbol{x}^H\boldsymbol{w}$:
\begin{align} E[\boldsymbol{x} e^*] &= E[\boldsymbol{x}(d^* - \boldsymbol{x}^H\boldsymbol{w})] \nonumber \\ &= E[\boldsymbol{x}d^*] - E[\boldsymbol{x}\boldsymbol{x}^H]\boldsymbol{w} \nonumber \\ &= \boldsymbol{p} - \boldsymbol{R}\boldsymbol{w} = \boldsymbol{0} \end{align}
Here $\boldsymbol{w}$ is deterministic and can be taken outside the expectation. Rearranging:
Wiener-Hopf Equation (via the Orthogonality Principle)
\begin{equation} \boldsymbol{R}\boldsymbol{w} = \boldsymbol{p} \end{equation}
If $\boldsymbol{R}$ is nonsingular:
\begin{equation} \boldsymbol{w}_{\rm opt} = \boldsymbol{R}^{-1}\boldsymbol{p} \end{equation}
5. Derivation via Wirtinger Calculus
We differentiate each term of \eqref{eq:J_symbols} with respect to $\boldsymbol{w}^*$. By the Wirtinger differentiation formulas for vectors:
\begin{align} \frac{\partial J}{\partial \boldsymbol{w}^*} &= \frac{\partial}{\partial \boldsymbol{w}^*}\left(\sigma_d^2 - \boldsymbol{p}^H\boldsymbol{w} - \boldsymbol{w}^H\boldsymbol{p} + \boldsymbol{w}^H\boldsymbol{R}\boldsymbol{w}\right) \nonumber \\ &= 0 - 0 - \boldsymbol{p} + \boldsymbol{R}\boldsymbol{w} \nonumber \\ &= \boldsymbol{R}\boldsymbol{w} - \boldsymbol{p} \end{align}
Here, $\boldsymbol{p}^H\boldsymbol{w}$ does not contain $\boldsymbol{w}^*$, so its derivative is $\boldsymbol{0}$; the derivative of $\boldsymbol{w}^H\boldsymbol{p}$ is $\boldsymbol{p}$; and the derivative of $\boldsymbol{w}^H\boldsymbol{R}\boldsymbol{w}$ is $\boldsymbol{R}\boldsymbol{w}$ (since $\boldsymbol{R}$ is Hermitian).
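The Wirtinger gradient $\boldsymbol{R}\boldsymbol{w} - \boldsymbol{p}$ can be verified against finite differences, using the componentwise identity $\partial J/\partial w_k^* = \tfrac{1}{2}\bigl(\partial J/\partial \operatorname{Re} w_k + j\,\partial J/\partial \operatorname{Im} w_k\bigr)$ for real-valued $J$. A sketch with synthetic (illustrative) statistics:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 3

# Hypothetical fixed statistics: R Hermitian positive definite, p arbitrary
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
R = A @ A.conj().T + N * np.eye(N)
p = rng.standard_normal(N) + 1j * rng.standard_normal(N)
sigma_d2 = 5.0  # illustrative constant; it drops out of the gradient anyway

def J(w):
    """Cost sigma_d^2 - p^H w - w^H p + w^H R w (real-valued for Hermitian R)."""
    return (sigma_d2 - np.vdot(p, w) - np.vdot(w, p) + np.vdot(w, R @ w)).real

w0 = rng.standard_normal(N) + 1j * rng.standard_normal(N)
grad = R @ w0 - p  # claimed Wirtinger gradient dJ/dw*

# Central finite differences for each component of dJ/dw*
h = 1e-6
fd = np.zeros(N, dtype=complex)
for k in range(N):
    ek = np.zeros(N)
    ek[k] = h
    dRe = (J(w0 + ek) - J(w0 - ek)) / (2 * h)            # dJ / d Re(w_k)
    dIm = (J(w0 + 1j * ek) - J(w0 - 1j * ek)) / (2 * h)  # dJ / d Im(w_k)
    fd[k] = (dRe + 1j * dIm) / 2
```

Since $J$ is quadratic, the central differences match `grad` up to floating-point roundoff.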
Setting $\dfrac{\partial J}{\partial \boldsymbol{w}^*} = \boldsymbol{0}$:
Wiener-Hopf Equation (Normal Equation)
\begin{equation} \boldsymbol{R}\boldsymbol{w}_{\rm opt} = \boldsymbol{p} \end{equation}
That is:
\begin{equation} \boldsymbol{w}_{\rm opt} = \boldsymbol{R}^{-1}\boldsymbol{p} \label{eq:wiener_time} \end{equation}
6. Intuitive Interpretation
- $\boldsymbol{p} = E[\boldsymbol{x}d^*]$: indicates which components of $\boldsymbol{x}$ are correlated with the target $d$
- $\boldsymbol{R}^{-1}$: cancels mutual correlations among the observation components (whitening)
As a result, only the components correlated with $d$ are extracted without distortion.
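Putting the chapter together: estimate $\boldsymbol{p}$ and $\boldsymbol{R}$ from data, solve the Wiener-Hopf equation, and confirm the orthogonality principle numerically. A sketch assuming NumPy and a synthetic signal model (illustrative only, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 4, 200_000

# Illustrative model: d is correlated with x through a fixed vector a, plus noise
x = (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))) / np.sqrt(2)
a = rng.standard_normal(N) + 1j * rng.standard_normal(N)
noise = (rng.standard_normal(T) + 1j * rng.standard_normal(T)) / np.sqrt(2)
d = a.conj() @ x + 0.2 * noise

# Sample second-order statistics
p = (x * d.conj()).mean(axis=1)   # p = E[x d*]
R = x @ x.conj().T / T            # R = E[x x^H]

# Wiener-Hopf solution; numerically, prefer solve() over forming R^{-1} explicitly
w_opt = np.linalg.solve(R, p)

# Orthogonality principle: E[x e*] = 0 at the optimum
e = d - w_opt.conj() @ x
orth = (x * e.conj()).mean(axis=1)

def J(w):
    """Mean-square error E[|d - w^H x|^2] estimated from the samples."""
    return np.mean(np.abs(d - w.conj() @ x) ** 2)

# Any perturbation of w_opt should increase the cost
w_bad = w_opt + 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
```

With sample statistics, `orth` equals $\boldsymbol{p} - \boldsymbol{R}\boldsymbol{w}_{\rm opt}$ exactly, so it vanishes to solver precision, and `J(w_opt) < J(w_bad)` because $\boldsymbol{R}$ is positive definite.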
The time-domain solution $\boldsymbol{w}_{\rm opt} = \boldsymbol{R}^{-1}\boldsymbol{p}$ derived here is fully equivalent to the frequency-domain solution $G(\omega) = H^* P_S / (|H|^2 P_S + P_N)$ from Chapter 1, related via the Fourier transform. The correspondence and proof of equivalence are given in Chapter 3.