Computer Vision: Introduction
Image and Camera Fundamentals
Overview
The Introduction covers the foundations for studying computer vision: the structure of digital images, mathematical camera models, and the basics of coordinate transforms.
Learning Objectives
- Understand the structure of digital images and color spaces
- Understand the pinhole camera model
- Master homogeneous coordinates and transformation matrices
- Understand projection from 3D to 2D
Table of Contents
- Chapter 1. Digital Image Fundamentals: pixels, resolution, bit depth
- Chapter 2. Color Spaces: RGB, HSV, YUV, color conversion
- Chapter 3. Pinhole Camera Model: focal length, field of view, perspective projection
- Chapter 4. Homogeneous Coordinates and Transformation Matrices: unified representation of translation, rotation, and scaling
- Chapter 5. Camera Matrix: intrinsic parameters, extrinsic parameters
- Chapter 6. Exercises: summary exercises for the Introduction
Prerequisites
- Basics of linear algebra (matrices, vectors)
- High school mathematics (trigonometric functions)
Key Concepts
Pinhole Camera Model
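Projection from a 3D point $(X, Y, Z)$, expressed in the camera coordinate frame, to image coordinates $(u, v)$:
$$\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} \sim K \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}, \quad K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}$$
Here $\sim$ denotes equality up to a nonzero scale factor. Dividing by the third component $Z$ gives $u = f_x X / Z + c_x$ and $v = f_y Y / Z + c_y$.
Homogeneous Coordinates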
A 2D point $(x, y)$ is represented as $(x, y, 1)$, and a 3D point $(X, Y, Z)$ as $(X, Y, Z, 1)$.
This allows translation to be expressed as matrix multiplication.
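
As a small illustration of why the extra coordinate helps, the sketch below translates a 2D point by multiplying with a 3×3 matrix. The helper names (`mat_vec`, `translation_2d`, `to_homogeneous`, `from_homogeneous`) and the sample values are made up for this example and do not come from the course material.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def translation_2d(tx, ty):
    """3x3 homogeneous matrix that shifts a 2D point by (tx, ty)."""
    return [[1, 0, tx],
            [0, 1, ty],
            [0, 0, 1]]


def to_homogeneous(p):
    """Append 1 to a point: (x, y) -> [x, y, 1]."""
    return list(p) + [1]


def from_homogeneous(h):
    """Divide by the last component and drop it: [x, y, w] -> [x/w, y/w]."""
    w = h[-1]
    return [c / w for c in h[:-1]]


# Hypothetical example values: shift the point (2, 3) by (5, -1).
p = (2, 3)
T = translation_2d(5, -1)
moved = from_homogeneous(mat_vec(T, to_homogeneous(p)))
print(moved)  # [7.0, 2.0]
```

Without the trailing 1, a 2×2 matrix can only express linear maps such as rotation and scaling; the third column of `T` is what carries the shift.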
Camera Matrix
$$P = K[R \mid t]$$
$K$: intrinsic parameters, $R$: rotation matrix, $t$: translation vector
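
The following is a minimal sketch of how $P = K[R \mid t]$ could be assembled and applied to a 3D point in homogeneous form. It assumes that $R$ and $t$ map world coordinates into the camera frame. The intrinsic values (focal lengths of 800 pixels, principal point at (320, 240)) and the function names are illustrative assumptions, not values from the course.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def camera_matrix(K, R, t):
    """Build the 3x4 matrix P = K [R | t]."""
    Rt = [R[i] + [t[i]] for i in range(3)]  # append t as a fourth column
    return mat_mul(K, Rt)


def project(P, point_3d):
    """Project a 3D point to pixel coordinates (u, v) using P."""
    X = list(point_3d) + [1.0]  # homogeneous form (X, Y, Z, 1)
    x, y, w = (sum(p * c for p, c in zip(row, X)) for row in P)
    return (x / w, y / w)       # divide out the scale factor


# Hypothetical intrinsics: f_x = f_y = 800, (c_x, c_y) = (320, 240).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]

# Camera aligned with the world frame: R = I, t = 0.
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

P = camera_matrix(K, R, t)
u, v = project(P, (0.5, -0.25, 2.0))
print(u, v)  # 520.0 140.0

# Same scene point, but the world origin now sits 2 units in front of the camera.
P_shifted = camera_matrix(K, R, [0.0, 0.0, 2.0])
print(project(P_shifted, (0.5, -0.25, 0.0)))  # (520.0, 140.0)
```

With $R = I$ and $t = 0$, $P$ reduces to $[K \mid 0]$, so the result matches the pinhole formula $u = f_x X / Z + c_x$, $v = f_y Y / Z + c_y$ directly.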