How does Golub-Reinsch SVD work?

Two stages. Stage 1 applies two-sided Householder reflections to bring A to upper bidiagonal form B = U_1^T A V_1. Stage 2 runs implicit-shift QR sweeps on B to diagonalize it, equivalent to the symmetric tridiagonal eigenvalue problem for B^T B but applied directly to B to preserve numerical conditioning. With Wilkinson shifts each sweep costs O(n) and convergence is on average O(n) sweeps. Total cost is O(mn^2 + n^3). LAPACK's xGESVD uses this scheme.

Jacobi SVD vs Golub-Reinsch — when to choose which?

Golub-Reinsch is faster but the relative accuracy of small singular values can degrade like the condition number. One-sided or two-sided Jacobi SVD applies plane rotations between column pairs and preserves relative accuracy of all singular values at machine precision. It also has natural parallelism. The trade-off is more iterations — typically several times slower than Golub-Reinsch. sangi uses Golub-Reinsch by default and switches to Jacobi when high relative accuracy is required (analytical computations, ill-graded matrices).

When is randomized SVD appropriate?

When only the top k singular values and vectors are needed and k << min(m, n). Standard SVD costs O(mn min(m,n)); randomized SVD costs O(mn k + (m+n) k^2). The algorithm: capture the range with a random Gaussian Ω so Y = AΩ, orthogonalize Y by QR to get Q, then compute the small SVD of B = Q^T A. Halko-Martinsson-Tropp (2011) provides the error analysis. sangi exposes randomized_svd; typical use is k in 50-500 for 5-10x speedups.

What is the Demmel-Kahan zero-shift QR sweep?

Running implicit QR sweeps on a bidiagonal matrix with shift 0 preserves the relative accuracy of small singular values because the arithmetic involves only multiplications and divisions, never subtractive cancellation. Wilkinson shifts converge faster but lose relative accuracy on tiny singular values. Demmel-Kahan (1990) introduced a hybrid strategy that keeps full relative accuracy O(nu) on every singular value. LAPACK's xBDSQR uses it, as does sangi internally.

What does the Eckart-Young theorem say?

The best rank-k approximation of A in the Frobenius norm (or spectral norm) is the truncated SVD A_k = U_k diag(σ_1, ..., σ_k) V_k^T. Concretely, min_(rank(B)<=k) ||A - B||_F = sqrt(σ_(k+1)^2 + ... + σ_r^2). This underpins PCA, image compression, recommender systems, latent semantic analysis, and any low-rank data approximation. sangi's truncated SVD builds on this theorem.

SVD — Golub-Reinsch, Jacobi, Divide-and-Conquer, Randomized

Q: What is the Demmel-Kahan zero-shift QR sweep?

Running implicit QR sweeps on a bidiagonal matrix with shift 0 preserves the relative accuracy of small singular values because the arithmetic involves only multiplications and divisions, never subtractive cancellation. Wilkinson shifts converge faster but lose relative accuracy on tiny singular values. Demmel-Kahan (1990) introduced a hybrid strategy that keeps full relative accuracy O(nu) on every singular value. LAPACK's xBDSQR uses it, as does sangi internally.

Q: What does the Eckart-Young theorem say?

The best rank-k approximation of A in the Frobenius norm (or spectral norm) is the truncated SVD A_k = U_k diag(σ_1, ..., σ_k) V_k^T. Concretely, min_(rank(B)<=k) ||A - B||_F = sqrt(σ_(k+1)^2 + ... + σ_r^2). This underpins PCA, image compression, recommender systems, latent semantic analysis, and any low-rank data approximation. sangi's truncated SVD builds on this theorem.

Overview

The singular value decomposition (SVD) of an $m \times n$ matrix $A$ is

$$A = U \Sigma V^\top, \quad U \in \mathbb{R}^{m \times m}, \, V \in \mathbb{R}^{n \times n} \text{ orthogonal}, \, \Sigma = \mathrm{diag}(\sigma_1 \geq \sigma_2 \geq \cdots \geq 0).$$

SVD exposes rank, condition number, low-rank approximations, and the pseudoinverse in a single decomposition. Unlike eigenvalue decomposition, it requires neither symmetry nor squareness and always exists.

The algorithmic landscape:

Golub-Reinsch — the standard dense SVD, a two-stage method
Jacobi SVD — high relative accuracy, naturally parallel, more iterations
Divide-and-conquer (D&C) — asymptotically faster for large $n$
Randomized SVD — dramatically faster when only a low-rank approximation is needed

This page covers the internal structure of each, Demmel-Kahan zero-shift sweeps for relative accuracy, Eckart-Young low-rank approximation, and applications to PCA, pseudoinverse, and rank-deficient least squares.

For API usage, see LinAlg API — SVD and Randomized SVD.

Golub-Reinsch

The standard dense SVD algorithm (Golub and Reinsch, 1970) is two-staged.

Stage 1: Upper Bidiagonalization by Householder

Alternating left and right Householder reflections reduce $A$ to upper bidiagonal $B$:

$$U_1^\top A V_1 = B = \begin{pmatrix} \alpha_1 & \beta_1 & & \\ & \alpha_2 & \beta_2 & \\ & & \ddots & \beta_{n-1} \\ & & & \alpha_n \end{pmatrix}.$$

$U_1$ is the product of $n$ left reflections, $V_1$ of $n-2$ right reflections. Flop count is $4mn^2 - \tfrac{4}{3}n^3$. After this stage the remaining problem is $n$-dimensional.

Stage 2: Diagonalize via Implicit-Shift QR Sweeps

The singular values of $B$ are the square roots of the eigenvalues of $B^\top B$, a symmetric tridiagonal matrix. Forming $B^\top B$ explicitly would square the condition number, so the practical algorithm applies Givens rotations directly to $B$, carrying out the equivalent QR iteration implicitly. With Wilkinson shifts each sweep costs $O(n)$ and the average number of sweeps is $O(n)$.

Total cost is $O(mn^2 + n^3)$, the canonical dense SVD benchmark. sangi's svd_decomposition follows this scheme.

Demmel-Kahan Zero-Shift QR

A naive Wilkinson-shift QR converges quickly but degrades the relative accuracy of small singular values:

$$|\hat{\sigma}_i - \sigma_i| / \sigma_i = O(u \cdot \sigma_1 / \sigma_i).$$

The smallest singular value $\sigma_n$ suffers degradation proportional to the condition number. Demmel and Kahan (1990) discovered that running QR sweeps with shift 0 on a bidiagonal matrix uses only multiplications and divisions — never subtractive cancellation — and therefore preserves all singular values to $O(nu)$ relative accuracy.

Practical implementations are hybrid: Wilkinson shifts during most of the convergence, switching to zero-shift for the final sweeps on small magnitudes. LAPACK's xBDSQR follows this strategy and sangi applies an equivalent technique internally.

High relative accuracy is decisive when small singular values carry physical meaning — noise removal in signal-processing SVD, scale separation in nanoscale analysis, and regularization parameter selection in ill-conditioned least squares.

Jacobi SVD

Jacobi's method, classical for the symmetric eigenvalue problem, extends naturally to SVD. The one-sided Jacobi SVD of Forsythe, Henrici, and Hestenes is especially attractive for high accuracy and parallelism.

The One-Sided Procedure

For each pair of columns $\mathbf{a}_i, \mathbf{a}_j$, apply a $2 \times 2$ plane rotation $J_{ij}$ from the right such that the resulting columns are mutually orthogonal:

$$A J_{ij} = A', \quad (\mathbf{a}'_i)^\top \mathbf{a}'_j = 0.$$

One pass over all column pairs is a sweep; repeat until every pair is already orthogonal. At convergence, the column norms $\|\mathbf{a}'_i\|_2$ are the singular values $\sigma_i$, the normalized columns form $U$, and the accumulated product of rotations is $V^\top$.

High Relative Accuracy

Each rotation operates directly on the angle between two columns, so relative errors in matrix entries are not amplified. Demmel and Veselić (1992) proved relative accuracy $O(u \cdot \kappa(A_s))$ for all singular values, where $A_s$ is $A$ after column scaling and $\kappa(A_s) \leq \kappa(A)$. For ill-graded problems (singular values spanning many orders of magnitude) Jacobi outperforms Golub-Reinsch.

Parallelism

Independent column pairs can be processed concurrently, giving $\lceil n/2 \rceil$ parallel $2 \times 2$ problems per sweep. Round-robin scheduling cycles through all pairs. Sweep counts are typically 5-20, leading to a constant factor several times larger than Golub-Reinsch.

Divide-and-Conquer SVD

For large bidiagonal $B$, divide-and-conquer (D&C) is asymptotically faster. LAPACK's xGESDD implements this path.

$B$ is split in the middle into two sub-bidiagonal matrices $B_1, B_2$ plus a connecting term. $B_1$ and $B_2$ are decomposed recursively, then merged via the secular equation

$$1 + \rho \sum_{i=1}^{n} \frac{\zeta_i^2}{d_i - \lambda} = 0,$$

whose roots give the eigenvalues of the merged tridiagonal. The Gragg-Reinsch root finder locates each in $O(1)$ Newton iterations.

Total cost is typically $O(n^2 \log n)$, beating the $O(n^3)$ of Golub-Reinsch. In practice D&C dominates for $n \gtrsim 1000$ and is 5-10x faster at $n = 10^4$. sangi currently dispatches to LAPACKE_dgesvd (Golub-Reinsch) through the MKL/OpenBLAS path; switching to LAPACKE_dgesdd (D&C) for large matrices is on the roadmap.

Randomized SVD

When only the top $k$ singular values and vectors of $A$ are needed and $k \ll \min(m, n)$, randomized SVD computes an approximate decomposition in $O(mn k)$ flops. Halko, Martinsson, and Tropp (2011) gave the definitive error analysis.

Three-Stage Algorithm

Range capture: draw a random Gaussian $\Omega \in \mathbb{R}^{n \times (k+p)}$ ($p$ is oversampling, typically 5-10) and form $Y = A \Omega$. With high probability the columns of $Y$ span the range of $A$.
Orthonormalization: QR-factor $Y$ to obtain $Q \in \mathbb{R}^{m \times (k+p)}$ with orthonormal columns approximating the range of $A$.
Small SVD: form $B = Q^\top A \in \mathbb{R}^{(k+p) \times n}$, take its standard SVD $B = \tilde{U} \Sigma V^\top$, and set $U = Q \tilde{U}$, giving $A \approx U \Sigma V^\top$.

Error Bound

The expected error satisfies

$$\mathbb{E} \|A - Q Q^\top A\|_2 \leq \left(1 + \frac{4 \sqrt{k+p}}{p-1} \sqrt{\min(m,n)}\right) \sigma_{k+1}(A).$$

The smaller $\sigma_{k+1}$ (the faster the spectral decay) the better the approximation. Adding $q$ power iterations $Y = (AA^\top)^q A \Omega$ pushes the residual factor to $\sigma_{k+1}^{2q+1}$, dramatically improving low-rank accuracy.

When to Use It

Extracting the top 50-500 singular values of $10^5 \times 10^5$ matrices, low-rank approximation of image and recommendation matrices, large-scale PCA, data compression, and latent semantic analysis. sangi's randomized_svd implements this method.

Economy and Truncated SVD

For $m \geq n$, the full SVD has $U \in \mathbb{R}^{m \times m}$, but the last $m - n$ columns correspond to zero singular values and merely span the null space. The economy (thin) SVD discards them:

$$A = U_n \Sigma_n V^\top, \quad U_n \in \mathbb{R}^{m \times n}, \, \Sigma_n \in \mathbb{R}^{n \times n}, \, V \in \mathbb{R}^{n \times n}.$$

The full $U$ costs $O(m^2)$ memory; the thin form needs only $O(mn)$ — essential when $m \gg n$.

Truncated SVD and Eckart-Young

Keeping the top $k$ singular triplets gives the truncated SVD

$$A_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^\top.$$

The Eckart-Young theorem (1936) states that $A_k$ is the best rank-$k$ approximation in both Frobenius and spectral norms:

$$\min_{\mathrm{rank}(B) \leq k} \|A - B\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}, \quad \min_{\mathrm{rank}(B) \leq k} \|A - B\|_2 = \sigma_{k+1}.$$

This is the theoretical basis for PCA, image compression, recommender systems, and latent semantic analysis.

Applications

Rank-Deficient Least Squares

When $A$ is rank-deficient or ill-conditioned, the SVD constructs the Moore-Penrose pseudoinverse:

$$A^+ = V \Sigma^+ U^\top, \quad \Sigma^+_{ii} = \begin{cases} 1/\sigma_i & \sigma_i > \tau \\ 0 & \sigma_i \leq \tau, \end{cases}$$

where $\tau$ is the numerical rank threshold (typically $\tau = \mathrm{eps} \cdot \sigma_1 \cdot \max(m,n)$). The minimum-norm least-squares solution is $\mathbf{x} = A^+ \mathbf{b}$. sangi's svd_solve exposes this threshold through its rcond parameter.

Principal Component Analysis (PCA)

For a centered data matrix $X \in \mathbb{R}^{n \times d}$ ($n$ samples, $d$ features), PCA is the eigenvalue decomposition of the covariance $C = X^\top X / n$. Computing the SVD $X = U \Sigma V^\top$ directly gives the principal axes as columns of $V$ and the variance contributions as $\sigma_i^2 / n$, without ever forming $C$. For ill-conditioned $X$, going through $C$ squares the condition number and loses accuracy, so SVD is the standard route.

Condition Number and Rank

The 2-norm condition number is $\kappa_2(A) = \sigma_1 / \sigma_n$ (when $\sigma_n > 0$), available only from SVD. Numerical rank is the count of singular values above a threshold $\tau$, providing a robust effective-rank measure on noisy data.

Low-Rank Approximation

Image compression, recommendation matrix completion, latent semantic analysis, graph embedding, and data visualization all reduce large matrices to $A_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^\top$, justified by Eckart-Young. For large-scale problems, combining with randomized SVD is standard practice.

sangi's Choices

Function / Scenario	Method	Notes
`svd_decomposition` dense default	Golub-Reinsch with Wilkinson/zero-shift hybrid	Golub-Reinsch through MKL `dgesvd`
High relative accuracy required	Jacobi SVD	Ill-conditioned or ill-graded matrices
Large dense $n$	Divide-and-conquer (LAPACK `dgesdd`)	Beats Golub-Reinsch from $n \gtrsim 1000$ (planned dispatch)
Low-rank approximation ($k \ll \min(m,n)$)	`randomized_svd`	Power iteration $q = 1, 2$ for accuracy boost
`svd_solve`	Golub-Reinsch with `rcond` damping	Stable pseudoinverse on ill-conditioned $A$

References

Golub, G. H. and Reinsch, C. (1970). "Singular value decomposition and least squares solutions". Numer. Math., 14, 403–420.
Eckart, C. and Young, G. (1936). "The approximation of one matrix by another of lower rank". Psychometrika, 1, 211–218.
Demmel, J. and Kahan, W. (1990). "Accurate singular values of bidiagonal matrices". SIAM J. Sci. Stat. Comput., 11(5), 873–912.
Demmel, J. and Veselić, K. (1992). "Jacobi's method is more accurate than QR". SIAM J. Matrix Anal. Appl., 13(4), 1204–1245.
Halko, N., Martinsson, P.-G., and Tropp, J. A. (2011). "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions". SIAM Review, 53(2), 217–288.
Trefethen, L. N. and Bau, D. (1997). Numerical Linear Algebra, SIAM. Lectures 4–5, 31–32.