Chapter 1

Digital Image Fundamentals

First Steps in Computer Vision

What Is a Digital Image?

A digital image is a collection of pixels (picture elements) arranged in a two-dimensional grid. Each pixel has a position $(x, y)$ and a brightness value (or color value).

Figure 1. Pixel coordinates and the image grid structure

Why the y-axis points downward

In mathematical coordinate systems, the origin is at the bottom-left and the y-axis points upward. In image processing, however, the origin is at the top-left and the y-axis points downward. This convention originates from the CRT display era, when the electron beam performed a raster scan across the screen from left to right and top to bottom. In memory, image data is stored starting from the top-left pixel, so increasing row numbers correspond to lower positions on the screen. This convention has been inherited by all modern image formats, displays, and image processing libraries (OpenCV, PIL, etc.).
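The top-left origin is easy to confirm with a tiny NumPy array standing in for an image; a minimal sketch, assuming row index = y and column index = x as described above:

```python
import numpy as np

# A tiny 3x4 "image": row index = y (downward), column index = x (rightward)
img = np.zeros((3, 4), dtype=np.uint8)
img[0, 0] = 255   # top-left pixel (x=0, y=0)
img[2, 3] = 128   # bottom-right pixel (x=3, y=2)

print(img[0])     # row 0 is the topmost scanline
```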

Resolution and Bit Depth

Resolution

Resolution refers to the number of pixels in an image. It is expressed as $W \times H$; for example, 1920×1080 is approximately 2 million pixels. The terms "4K" and "8K" used for TVs and displays derive from the horizontal pixel count: 4K is 3840×2160 (approximately 8.3 million pixels) and 8K is 7680×4320 (approximately 33 million pixels).

Note: Consumer TVs and displays use "4K UHD" (3840×2160), which differs from the cinema industry's "DCI 4K" (4096×2160).
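The pixel counts quoted above follow directly from multiplying width by height, as a quick check:

```python
# Pixel counts for the common resolutions mentioned in the text (W * H)
resolutions = {
    "Full HD": (1920, 1080),
    "4K UHD": (3840, 2160),
    "8K UHD": (7680, 4320),
}
for name, (w, h) in resolutions.items():
    print(f"{name}: {w * h / 1e6:.1f} megapixels")
```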

Bit Depth

The number of bits used to represent the brightness value of each pixel:

  • 1-bit: binary image (black and white)
  • 8-bit: 256 levels (0 to 255). For $n$ bits, values range from $0$ to $2^n - 1$
  • 16-bit: 65,536 levels
  • 24-bit: 8 bits per RGB channel (full color)
Figure 2. Bit depth and number of levels
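The relationship between bit depth and the number of levels is simply $2^n$; a one-line check in Python:

```python
# Number of representable levels for each bit depth: 2 ** n
for bits in (1, 4, 8, 16, 24):
    print(f"{bits}-bit: {2 ** bits:,} levels")
```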

Number of Channels

The number of channels is the number of values each pixel holds.

  • 1 channel (Grayscale): one brightness value. Uses: B&W photos, medical imaging, edge-detection preprocessing
  • 3 channels (RGB): three values for red, green, and blue. Uses: general color images
  • 4 channels (RGBA): four values for red, green, blue, and alpha (transparency). Uses: PNG images, icons, UI design

The more channels a pixel has, the more data it requires. In image processing, a common technique is to convert a color image to grayscale (1 channel) to reduce the computational cost.
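A minimal sketch of such a conversion using only NumPy, assuming the ITU-R BT.601 luminance weights (the same weights OpenCV applies in cv2.cvtColor for RGB-to-gray conversion):

```python
import numpy as np

def to_grayscale(img_rgb):
    """Convert an (H, W, 3) RGB image to an (H, W) grayscale image
    using the ITU-R BT.601 luminance weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return (img_rgb @ weights).astype(np.uint8)

rgb = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
gray = to_grayscale(rgb)
print(gray.shape)   # one channel: a third of the data of the RGB original
```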

Image Memory Size

The data size of an image can be calculated as follows:

$$\text{Size (bytes)} = W \times H \times \frac{\text{bits per channel}}{8} \times \text{number of channels}$$

Example: 1920×1080 Full HD image

  • Grayscale (8-bit): $1920 \times 1080 \times 1 = 2,073,600$ bytes ≈ 2 MB
  • Color (24-bit RGB): $1920 \times 1080 \times 3 = 6,220,800$ bytes ≈ 6 MB
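The formula above can be wrapped in a small helper; a sketch assuming whole-byte bit depths:

```python
def image_bytes(width, height, bits_per_channel=8, channels=1):
    """Uncompressed size in bytes: W * H * (bits per channel / 8) * channels."""
    return width * height * (bits_per_channel // 8) * channels

print(image_bytes(1920, 1080))              # 2073600 (grayscale)
print(image_bytes(1920, 1080, channels=3))  # 6220800 (24-bit RGB)
```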

Mathematical Representation of Images

An image can be represented mathematically as follows:

Grayscale Image

Represented as a matrix $\mathbf{I} \in \mathbb{R}^{H \times W}$, where each element $I(x,y)$ is the brightness value of a pixel.

$$\mathbf{I} = \begin{pmatrix} I(0,0) & I(1,0) & \cdots & I(W-1,0) \\ I(0,1) & I(1,1) & \cdots & I(W-1,1) \\ \vdots & \vdots & \ddots & \vdots \\ I(0,H-1) & I(1,H-1) & \cdots & I(W-1,H-1) \end{pmatrix}$$

Note: Here we use the mathematical convention $I(x, y)$, but in NumPy/OpenCV, pixels are accessed as img[y, x] (row, column).
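The difference between the two conventions is easy to see on a small array: the mathematical coordinates $(x, y)$ map to the NumPy index [y, x].

```python
import numpy as np

# H=3, W=4; element values 0..11 laid out row by row
img = np.arange(12, dtype=np.uint8).reshape(3, 4)

x, y = 2, 1            # mathematical convention: column x, row y
value = img[y, x]      # NumPy/OpenCV: row index first, then column
print(value)           # 6 (row 1 is [4, 5, 6, 7]; column 2 holds 6)
```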

Color Image

A 3D tensor $\mathbf{I} \in \mathbb{R}^{H \times W \times C}$ (C: number of channels, typically 3)

$$I(x, y) = \begin{pmatrix} R(x,y) \\ G(x,y) \\ B(x,y) \end{pmatrix}$$
Figure 3. Data structures of grayscale and color images
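In NumPy, the channel planes $R$, $G$, $B$ of the 3D tensor are slices along the last axis; a minimal sketch (assuming RGB channel order; in OpenCV the same indices would give B, G, R):

```python
import numpy as np

img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)

# Split the (H, W, 3) tensor into three (H, W) channel planes
r, g, b = img[:, :, 0], img[:, :, 1], img[:, :, 2]
print(r.shape)   # each plane is a 2D matrix, like a grayscale image
```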

Why does OpenCV use BGR ordering?

Common image formats (PNG, JPEG, etc.) use RGB ordering, but OpenCV uses BGR ordering. This is not due to technical merit but rather historical circumstance. When Intel began developing OpenCV in 1999, the dominant camera API on Windows (Video for Windows) and the bitmap format (BMP/DIB) returned data in BGR order, so OpenCV followed suit. The reason Windows BMP uses BGR is that storing a 32-bit RGBA value in memory on Intel x86 little-endian systems results in the byte order B, G, R, A. Even after OpenCV became cross-platform, BGR ordering has been retained for backward compatibility with existing code.

When displaying an image loaded with OpenCV in other libraries (PIL, matplotlib, etc.), you need to convert it using cv2.cvtColor(img, cv2.COLOR_BGR2RGB).
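cv2.cvtColor with COLOR_BGR2RGB simply reverses the channel axis, so the same reordering can be sketched in plain NumPy with a slice:

```python
import numpy as np

bgr = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)

# Equivalent to cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB):
# reverse the last (channel) axis
rgb = bgr[:, :, ::-1]

print(np.array_equal(rgb[..., 0], bgr[..., 2]))  # True: new R plane is the old B plane
```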

Code Example (Python)

import numpy as np
import cv2

# Load an image
img_color = cv2.imread('image.jpg')        # Color (H, W, 3); returns None if the file cannot be read
img_gray = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)  # Grayscale (H, W)

# Image properties
print(f"Shape: {img_color.shape}")         # (height, width, channels)
print(f"Dtype: {img_color.dtype}")         # uint8 (0-255)
print(f"Size: {img_color.nbytes} bytes")   # Total bytes (note: .size is the element count, not bytes)

# Accessing a pixel
pixel = img_color[100, 200]                # (B, G, R) values
gray_value = img_gray[100, 200]            # Brightness value

# Creating new images
blank = np.zeros((480, 640, 3), dtype=np.uint8)  # Black image
white = np.ones((480, 640), dtype=np.uint8) * 255  # White image

Summary

  • A digital image is a 2D array of pixels
  • Resolution defines the number of pixels; bit depth determines the number of intensity levels
  • Grayscale images are 2D matrices; color images are 3D tensors
  • Matrix operations form the foundation of image processing