Computer Vision: Introduction

Image and Camera Fundamentals

Overview

  • Image Fundamentals: pixels and color spaces (RGB, HSV, grayscale)
  • Camera Model: pinhole camera, focal length, field of view
  • Coordinate Transforms: homogeneous coordinates; rotation, translation, scale
  • Projective Transform: 3D-to-2D projection, perspective projection

This Introduction covers the foundations needed to study computer vision: the structure of digital images, the mathematical model of a camera, and the basics of coordinate transforms.

Learning Objectives

  • Understand the structure of digital images and color spaces
  • Understand the pinhole camera model
  • Master homogeneous coordinates and transformation matrices
  • Understand projection from 3D to 2D

Table of Contents

  1. Chapter 1 Digital Image Fundamentals

    Pixels, resolution, bit depth

  2. Chapter 2 Color Spaces

    RGB, HSV, YUV, color conversion

  3. Chapter 3 Pinhole Camera Model

    Focal length, field of view, perspective projection

  4. Chapter 4 Homogeneous Coordinates and Transformation Matrices

    Unified representation of translation, rotation, and scaling

  5. Chapter 5 Camera Matrix

    Intrinsic parameters, extrinsic parameters

  6. Chapter 6 Exercises

    Summary exercises for the Introduction

Prerequisites

  • Basics of linear algebra (matrices, vectors)
  • High school mathematics (trigonometric functions)

Key Concepts

Pinhole Camera Model

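Projection from a 3D point $(X, Y, Z)$, expressed in the camera coordinate frame, to image (pixel) coordinates $(u, v)$:

$$\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} \sim K \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}, \quad K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}$$

Here $\sim$ denotes equality up to a nonzero scale factor. Dividing by the third component gives $u = f_x X / Z + c_x$ and $v = f_y Y / Z + c_y$, where $f_x, f_y$ are the focal lengths in pixels and $(c_x, c_y)$ is the principal point.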
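The projection can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `project_pinhole` and the numeric values for $f_x, f_y, c_x, c_y$ are assumptions chosen for the example, and the point is assumed to be given in camera coordinates with $Z > 0$.

```python
def project_pinhole(point_cam, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates (u, v).

    Hypothetical helper for illustration; fx, fy, cx, cy are the entries of K.
    """
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Multiply by K to get the homogeneous image point.
    u_h = fx * X + cx * Z
    v_h = fy * Y + cy * Z
    w = Z
    # Divide by the third component to resolve the "up to scale" relation.
    return u_h / w, v_h / w


# Example values (assumed): 800 px focal length, principal point (320, 240).
u, v = project_pinhole((0.5, -0.25, 2.0), fx=800.0, fy=800.0, cx=320.0, cy=240.0)
print(u, v)  # 520.0 140.0
```

Doubling the depth $Z$ while keeping $X$ and $Y$ fixed moves the projected point toward the principal point, which is the perspective effect the model captures.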

Homogeneous Coordinates

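A 2D point $(x, y)$ is represented as $(x, y, 1)$, and a 3D point $(X, Y, Z)$ as $(X, Y, Z, 1)$. Any nonzero scalar multiple, such as $(wx, wy, w)$ with $w \neq 0$, represents the same point.

Translation is not a linear map in Cartesian coordinates, but in homogeneous coordinates it can be expressed as a matrix multiplication. Rotation, scaling, and translation can therefore be composed into a single matrix.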
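As a small worked example, a 2D translation by $(t_x, t_y)$ becomes a $3 \times 3$ matrix acting on $(x, y, 1)$. The sketch below uses plain Python lists; the helper names `translation_2d` and `matvec` are illustrative assumptions, not part of any library.

```python
def translation_2d(tx, ty):
    """Return the 3x3 homogeneous matrix that translates a 2D point by (tx, ty)."""
    return [
        [1.0, 0.0, tx],
        [0.0, 1.0, ty],
        [0.0, 0.0, 1.0],
    ]


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


p = [2.0, 3.0, 1.0]                        # the point (2, 3) in homogeneous form
q = matvec(translation_2d(5.0, -1.0), p)   # translate by (5, -1)
print(q)  # [7.0, 2.0, 1.0]
```

The last component stays 1, so the result reads directly as the Cartesian point $(7, 2)$.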

Camera Matrix

$$P = K[R | t]$$

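$K$: $3 \times 3$ intrinsic parameter matrix, $R$: $3 \times 3$ rotation matrix, $t$: $3 \times 1$ translation vector. The $3 \times 4$ block $[R | t]$ holds the extrinsic parameters, which map world coordinates to camera coordinates, so $P$ is a $3 \times 4$ matrix that maps a homogeneous world point $(X, Y, Z, 1)$ to a homogeneous image point.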
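The following sketch assembles $P = K[R | t]$ and projects a world point with it. It is an illustration under stated assumptions: the helper names (`matmul`, `camera_matrix`, `project`) are hypothetical, the intrinsic values are arbitrary example numbers, and the camera is assumed to be unrotated ($R = I$) and placed so that the world origin lies 2 units in front of it.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def camera_matrix(K, R, t):
    """Build the 3x4 matrix P = K [R | t]."""
    Rt = [R[i] + [t[i]] for i in range(3)]  # append t as a fourth column
    return matmul(K, Rt)


def project(P, point_world):
    """Project a 3D world point to pixel coordinates using P."""
    X_h = list(point_world) + [1.0]          # homogeneous world point
    x = [sum(p * c for p, c in zip(row, X_h)) for row in P]
    return x[0] / x[2], x[1] / x[2]          # divide by the third component


# Example values (assumed for illustration).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]

P = camera_matrix(K, R, t)
print(project(P, (0.5, -0.25, 0.0)))  # (520.0, 140.0)
```

Keeping $K$ and $[R | t]$ separate until the final product makes it easy to change the camera pose without touching the intrinsics.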