SVD — Golub-Reinsch, Jacobi, Divide-and-Conquer, Randomized
Overview
The singular value decomposition (SVD) of an $m \times n$ matrix $A$ is
$$A = U \Sigma V^\top, \quad U \in \mathbb{R}^{m \times m}, \, V \in \mathbb{R}^{n \times n} \text{ orthogonal}, \, \Sigma = \mathrm{diag}(\sigma_1 \geq \sigma_2 \geq \cdots \geq 0).$$
SVD exposes rank, condition number, low-rank approximations, and the pseudoinverse in a single decomposition. Unlike eigenvalue decomposition, it requires neither symmetry nor squareness and always exists.
The algorithmic landscape:
- Golub-Reinsch — the standard dense SVD, a two-stage method
- Jacobi SVD — high relative accuracy, naturally parallel, more iterations
- Divide-and-conquer (D&C) — asymptotically faster for large $n$
- Randomized SVD — dramatically faster when only a low-rank approximation is needed
This page covers the internal structure of each, Demmel-Kahan zero-shift sweeps for relative accuracy, Eckart-Young low-rank approximation, and applications to PCA, pseudoinverse, and rank-deficient least squares.
For API usage, see LinAlg API — SVD and Randomized SVD.
Golub-Reinsch
The standard dense SVD algorithm (Golub and Reinsch, 1970) is two-staged.
Stage 1: Upper Bidiagonalization by Householder
Alternating left and right Householder reflections reduce $A$ to upper bidiagonal $B$:
$$U_1^\top A V_1 = B = \begin{pmatrix} \alpha_1 & \beta_1 & & \\ & \alpha_2 & \beta_2 & \\ & & \ddots & \beta_{n-1} \\ & & & \alpha_n \end{pmatrix}.$$
$U_1$ is the product of $n$ left reflections, $V_1$ of $n-2$ right reflections. Flop count is $4mn^2 - \tfrac{4}{3}n^3$. After this stage the remaining problem is $n$-dimensional.
Stage 2: Diagonalize via Implicit-Shift QR Sweeps
The singular values of $B$ are the square roots of the eigenvalues of $B^\top B$, a symmetric tridiagonal matrix. Forming $B^\top B$ explicitly would square the condition number, so the practical algorithm applies Givens rotations directly to $B$, carrying out the equivalent QR iteration implicitly. With Wilkinson shifts each sweep costs $O(n)$ and the average number of sweeps is $O(n)$.
Total cost is $O(mn^2 + n^3)$, the canonical dense SVD benchmark. sangi's svd_decomposition follows this scheme.
See also: SVD / Singular Value Decomposition (theory) / Numerical Analysis — SVD
Demmel-Kahan Zero-Shift QR
A naive Wilkinson-shift QR converges quickly but degrades the relative accuracy of small singular values:
$$|\hat{\sigma}_i - \sigma_i| / \sigma_i = O(u \cdot \sigma_1 / \sigma_i).$$
The smallest singular value $\sigma_n$ suffers degradation proportional to the condition number. Demmel and Kahan (1990) discovered that running QR sweeps with shift 0 on a bidiagonal matrix uses only multiplications and divisions — never subtractive cancellation — and therefore preserves all singular values to $O(nu)$ relative accuracy.
Practical implementations are hybrid: Wilkinson shifts during most of the convergence, switching to zero-shift for the final sweeps on small magnitudes.
LAPACK's xBDSQR follows this strategy and sangi applies an equivalent technique internally.
High relative accuracy is decisive when small singular values carry physical meaning — noise removal in signal-processing SVD, scale separation in nanoscale analysis, and regularization parameter selection in ill-conditioned least squares.
Jacobi SVD
Jacobi's method, classical for the symmetric eigenvalue problem, extends naturally to SVD. The one-sided Jacobi SVD of Forsythe, Henrici, and Hestenes is especially attractive for high accuracy and parallelism.
The One-Sided Procedure
For each pair of columns $\mathbf{a}_i, \mathbf{a}_j$, apply a $2 \times 2$ plane rotation $J_{ij}$ from the right such that the resulting columns are mutually orthogonal:
$$A J_{ij} = A', \quad (\mathbf{a}'_i)^\top \mathbf{a}'_j = 0.$$
One pass over all column pairs is a sweep; repeat until every pair is already orthogonal. At convergence, the column norms $\|\mathbf{a}'_i\|_2$ are the singular values $\sigma_i$, the normalized columns form $U$, and the accumulated product of rotations is $V^\top$.
High Relative Accuracy
Each rotation operates directly on the angle between two columns, so relative errors in matrix entries are not amplified. Demmel and Veselić (1992) proved relative accuracy $O(u \cdot \kappa(A_s))$ for all singular values, where $A_s$ is $A$ after column scaling and $\kappa(A_s) \leq \kappa(A)$. For ill-graded problems (singular values spanning many orders of magnitude) Jacobi outperforms Golub-Reinsch.
Parallelism
Independent column pairs can be processed concurrently, giving $\lceil n/2 \rceil$ parallel $2 \times 2$ problems per sweep. Round-robin scheduling cycles through all pairs. Sweep counts are typically 5-20, leading to a constant factor several times larger than Golub-Reinsch.
Divide-and-Conquer SVD
For large bidiagonal $B$, divide-and-conquer (D&C) is asymptotically faster.
LAPACK's xGESDD implements this path.
$B$ is split in the middle into two sub-bidiagonal matrices $B_1, B_2$ plus a connecting term. $B_1$ and $B_2$ are decomposed recursively, then merged via the secular equation
$$1 + \rho \sum_{i=1}^{n} \frac{\zeta_i^2}{d_i - \lambda} = 0,$$
whose roots give the eigenvalues of the merged tridiagonal. The Gragg-Reinsch root finder locates each in $O(1)$ Newton iterations.
Total cost is typically $O(n^2 \log n)$, beating the $O(n^3)$ of Golub-Reinsch.
In practice D&C dominates for $n \gtrsim 1000$ and is 5-10x faster at $n = 10^4$.
sangi currently dispatches to LAPACKE_dgesvd (Golub-Reinsch) through the MKL/OpenBLAS path; switching to LAPACKE_dgesdd (D&C) for large matrices is on the roadmap.
Randomized SVD
When only the top $k$ singular values and vectors of $A$ are needed and $k \ll \min(m, n)$, randomized SVD computes an approximate decomposition in $O(mn k)$ flops. Halko, Martinsson, and Tropp (2011) gave the definitive error analysis.
Three-Stage Algorithm
- Range capture: draw a random Gaussian $\Omega \in \mathbb{R}^{n \times (k+p)}$ ($p$ is oversampling, typically 5-10) and form $Y = A \Omega$. With high probability the columns of $Y$ span the range of $A$.
- Orthonormalization: QR-factor $Y$ to obtain $Q \in \mathbb{R}^{m \times (k+p)}$ with orthonormal columns approximating the range of $A$.
- Small SVD: form $B = Q^\top A \in \mathbb{R}^{(k+p) \times n}$, take its standard SVD $B = \tilde{U} \Sigma V^\top$, and set $U = Q \tilde{U}$, giving $A \approx U \Sigma V^\top$.
Error Bound
The expected error satisfies
$$\mathbb{E} \|A - Q Q^\top A\|_2 \leq \left(1 + \frac{4 \sqrt{k+p}}{p-1} \sqrt{\min(m,n)}\right) \sigma_{k+1}(A).$$
The smaller $\sigma_{k+1}$ (the faster the spectral decay) the better the approximation. Adding $q$ power iterations $Y = (AA^\top)^q A \Omega$ pushes the residual factor to $\sigma_{k+1}^{2q+1}$, dramatically improving low-rank accuracy.
When to Use It
Extracting the top 50-500 singular values of $10^5 \times 10^5$ matrices, low-rank approximation of image and recommendation matrices, large-scale PCA, data compression, and latent semantic analysis. sangi's randomized_svd implements this method.
See also: Randomized SVD
Economy and Truncated SVD
For $m \geq n$, the full SVD has $U \in \mathbb{R}^{m \times m}$, but the last $m - n$ columns correspond to zero singular values and merely span the null space. The economy (thin) SVD discards them:
$$A = U_n \Sigma_n V^\top, \quad U_n \in \mathbb{R}^{m \times n}, \, \Sigma_n \in \mathbb{R}^{n \times n}, \, V \in \mathbb{R}^{n \times n}.$$
The full $U$ costs $O(m^2)$ memory; the thin form needs only $O(mn)$ — essential when $m \gg n$.
Truncated SVD and Eckart-Young
Keeping the top $k$ singular triplets gives the truncated SVD
$$A_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^\top.$$
The Eckart-Young theorem (1936) states that $A_k$ is the best rank-$k$ approximation in both Frobenius and spectral norms:
$$\min_{\mathrm{rank}(B) \leq k} \|A - B\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}, \quad \min_{\mathrm{rank}(B) \leq k} \|A - B\|_2 = \sigma_{k+1}.$$
This is the theoretical basis for PCA, image compression, recommender systems, and latent semantic analysis.
Applications
Rank-Deficient Least Squares
When $A$ is rank-deficient or ill-conditioned, the SVD constructs the Moore-Penrose pseudoinverse:
$$A^+ = V \Sigma^+ U^\top, \quad \Sigma^+_{ii} = \begin{cases} 1/\sigma_i & \sigma_i > \tau \\ 0 & \sigma_i \leq \tau, \end{cases}$$
where $\tau$ is the numerical rank threshold (typically $\tau = \mathrm{eps} \cdot \sigma_1 \cdot \max(m,n)$).
The minimum-norm least-squares solution is $\mathbf{x} = A^+ \mathbf{b}$.
sangi's svd_solve exposes this threshold through its rcond parameter.
Principal Component Analysis (PCA)
For a centered data matrix $X \in \mathbb{R}^{n \times d}$ ($n$ samples, $d$ features), PCA is the eigenvalue decomposition of the covariance $C = X^\top X / n$. Computing the SVD $X = U \Sigma V^\top$ directly gives the principal axes as columns of $V$ and the variance contributions as $\sigma_i^2 / n$, without ever forming $C$. For ill-conditioned $X$, going through $C$ squares the condition number and loses accuracy, so SVD is the standard route.
Condition Number and Rank
The 2-norm condition number is $\kappa_2(A) = \sigma_1 / \sigma_n$ (when $\sigma_n > 0$), available only from SVD. Numerical rank is the count of singular values above a threshold $\tau$, providing a robust effective-rank measure on noisy data.
Low-Rank Approximation
Image compression, recommendation matrix completion, latent semantic analysis, graph embedding, and data visualization all reduce large matrices to $A_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^\top$, justified by Eckart-Young. For large-scale problems, combining with randomized SVD is standard practice.
See also: Moore-Penrose Pseudoinverse / Total Least Squares / Regularization
sangi's Choices
| Function / Scenario | Method | Notes |
|---|---|---|
svd_decomposition dense default | Golub-Reinsch with Wilkinson/zero-shift hybrid | Golub-Reinsch through MKL dgesvd |
| High relative accuracy required | Jacobi SVD | Ill-conditioned or ill-graded matrices |
| Large dense $n$ | Divide-and-conquer (LAPACK dgesdd) | Beats Golub-Reinsch from $n \gtrsim 1000$ (planned dispatch) |
| Low-rank approximation ($k \ll \min(m,n)$) | randomized_svd | Power iteration $q = 1, 2$ for accuracy boost |
svd_solve | Golub-Reinsch with rcond damping | Stable pseudoinverse on ill-conditioned $A$ |
References
- Golub, G. H. and Reinsch, C. (1970). "Singular value decomposition and least squares solutions". Numer. Math., 14, 403–420.
- Eckart, C. and Young, G. (1936). "The approximation of one matrix by another of lower rank". Psychometrika, 1, 211–218.
- Demmel, J. and Kahan, W. (1990). "Accurate singular values of bidiagonal matrices". SIAM J. Sci. Stat. Comput., 11(5), 873–912.
- Demmel, J. and Veselić, K. (1992). "Jacobi's method is more accurate than QR". SIAM J. Matrix Anal. Appl., 13(4), 1204–1245.
- Halko, N., Martinsson, P.-G., and Tropp, J. A. (2011). "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions". SIAM Review, 53(2), 217–288.
- Trefethen, L. N. and Bau, D. (1997). Numerical Linear Algebra, SIAM. Lectures 4–5, 31–32.