Chapter 1

Digital Image Fundamentals

First Steps in Computer Vision

What Is a Digital Image?

A digital image is a collection of pixels (picture elements) arranged in a two-dimensional grid. Each pixel has a position $(x, y)$ and a brightness value (or color value).

Figure 1. Pixel coordinates and the image grid structure

Why the y-axis points downward

In mathematical coordinate systems, the origin is at the bottom-left and the y-axis points upward. In image processing, however, the origin is at the top-left and the y-axis points downward. This convention originates from the CRT display era, when the electron beam performed a raster scan across the screen from left to right and top to bottom. In memory, image data is stored starting from the top-left pixel, so increasing row numbers correspond to lower positions on the screen. This convention has been inherited by all modern image formats, displays, and image processing libraries (OpenCV, PIL, etc.).
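The top-left origin is easy to confirm with a tiny NumPy array standing in for an image; a minimal sketch, assuming row index = y and column index = x as described above:

```python
import numpy as np

# A tiny 3x4 "image": row index = y (downward), column index = x (rightward)
img = np.zeros((3, 4), dtype=np.uint8)
img[0, 0] = 255   # top-left pixel (x=0, y=0)
img[2, 3] = 128   # bottom-right pixel (x=3, y=2)

print(img[0])     # row 0 is the topmost scanline
```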

Resolution and Bit Depth

Resolution

Resolution refers to the number of pixels in an image. It is expressed as $W \times H$; for example, 1920×1080 is approximately 2 million pixels. The terms "4K" and "8K" used for TVs and displays derive from the horizontal pixel count: 4K is 3840×2160 (approximately 8.3 million pixels) and 8K is 7680×4320 (approximately 33 million pixels).

Note: Consumer TVs and displays use "4K UHD" (3840×2160), which differs from the cinema industry's "DCI 4K" (4096×2160).
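The pixel counts quoted above follow directly from multiplying width by height, as a quick check:

```python
# Pixel counts for the common resolutions mentioned in the text (W * H)
resolutions = {
    "Full HD": (1920, 1080),
    "4K UHD": (3840, 2160),
    "8K UHD": (7680, 4320),
}
for name, (w, h) in resolutions.items():
    print(f"{name}: {w * h / 1e6:.1f} megapixels")
```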

Bit Depth

The number of bits used to represent the brightness value of each pixel:

  • 1-bit: binary image (black and white)
  • 8-bit: 256 levels (0 to 255). For $n$ bits, values range from $0$ to $2^n - 1$
  • 16-bit: 65,536 levels
  • 24-bit: 8 bits per RGB channel (full color)
Figure 2. Bit depth and number of levels
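The relationship between bit depth and the number of levels is simply $2^n$; a one-line check in Python:

```python
# Number of representable levels for each bit depth: 2 ** n
for bits in (1, 4, 8, 16, 24):
    print(f"{bits}-bit: {2 ** bits:,} levels")
```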

Number of Channels

The number of channels is the number of values each pixel holds.

  • 1 channel (Grayscale): one brightness value. Uses: B&W photos, medical imaging, edge-detection preprocessing
  • 3 channels (RGB): three values for red, green, and blue. Uses: general color images
  • 4 channels (RGBA): four values for red, green, blue, and alpha (transparency). Uses: PNG images, icons, UI design

The more channels a pixel has, the more data it requires. In image processing, a common technique is to convert a color image to grayscale (1 channel) to reduce the computational cost.
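A minimal sketch of such a conversion using only NumPy, assuming the ITU-R BT.601 luminance weights (the same weights OpenCV applies in cv2.cvtColor for RGB-to-gray conversion):

```python
import numpy as np

def to_grayscale(img_rgb):
    """Convert an (H, W, 3) RGB image to an (H, W) grayscale image
    using the ITU-R BT.601 luminance weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return (img_rgb @ weights).astype(np.uint8)

rgb = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
gray = to_grayscale(rgb)
print(gray.shape)   # one channel: a third of the data of the RGB original
```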

Image Memory Size

The data size of an image can be calculated as follows:

$$\text{Size (bytes)} = W \times H \times \frac{\text{bits per channel}}{8} \times \text{number of channels}$$

Example: 1920×1080 Full HD image

  • Grayscale (8-bit): $1920 \times 1080 \times 1 = 2,073,600$ bytes ≈ 2 MB
  • Color (24-bit RGB): $1920 \times 1080 \times 3 = 6,220,800$ bytes ≈ 6 MB
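The formula above can be wrapped in a small helper; a sketch assuming whole-byte bit depths:

```python
def image_bytes(width, height, bits_per_channel=8, channels=1):
    """Uncompressed size in bytes: W * H * (bits per channel / 8) * channels."""
    return width * height * (bits_per_channel // 8) * channels

print(image_bytes(1920, 1080))              # 2073600 (grayscale)
print(image_bytes(1920, 1080, channels=3))  # 6220800 (24-bit RGB)
```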

Mathematical Representation of Images

An image can be represented mathematically as follows:

Grayscale Image

Represented as a matrix $\mathbf{I} \in \mathbb{R}^{H \times W}$, where each element $I(x,y)$ is the brightness value of a pixel.

$$\mathbf{I} = \begin{pmatrix} I(0,0) & I(1,0) & \cdots & I(W-1,0) \\ I(0,1) & I(1,1) & \cdots & I(W-1,1) \\ \vdots & \vdots & \ddots & \vdots \\ I(0,H-1) & I(1,H-1) & \cdots & I(W-1,H-1) \end{pmatrix}$$

Note: Here we use the mathematical convention $I(x, y)$, but in NumPy/OpenCV, pixels are accessed as img[y, x] (row, column).
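The difference between the two conventions is easy to see on a small array: the mathematical coordinates $(x, y)$ map to the NumPy index [y, x].

```python
import numpy as np

# H=3, W=4; element values 0..11 laid out row by row
img = np.arange(12, dtype=np.uint8).reshape(3, 4)

x, y = 2, 1            # mathematical convention: column x, row y
value = img[y, x]      # NumPy/OpenCV: row index first, then column
print(value)           # 6 (row 1 is [4, 5, 6, 7]; column 2 holds 6)
```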

Color Image

A 3D tensor $\mathbf{I} \in \mathbb{R}^{H \times W \times C}$ (C: number of channels, typically 3)

$$I(x, y) = \begin{pmatrix} R(x,y) \\ G(x,y) \\ B(x,y) \end{pmatrix}$$
Figure 3. Data structures of grayscale and color images
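In NumPy, the channel planes $R$, $G$, $B$ of the 3D tensor are slices along the last axis; a minimal sketch (assuming RGB channel order; in OpenCV the same indices would give B, G, R):

```python
import numpy as np

img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)

# Split the (H, W, 3) tensor into three (H, W) channel planes
r, g, b = img[:, :, 0], img[:, :, 1], img[:, :, 2]
print(r.shape)   # each plane is a 2D matrix, like a grayscale image
```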

Why does OpenCV use BGR ordering?

Common image formats (PNG, JPEG, etc.) use RGB ordering, but OpenCV uses BGR ordering. This is not due to technical merit but rather historical circumstance. When Intel began developing OpenCV in 1999, the dominant camera API on Windows (Video for Windows) and the bitmap format (BMP/DIB) returned data in BGR order, so OpenCV followed suit. The reason Windows BMP uses BGR is that storing a 32-bit RGBA value in memory on Intel x86 little-endian systems results in the byte order B, G, R, A. Even after OpenCV became cross-platform, BGR ordering has been retained for backward compatibility with existing code.

When displaying an image loaded with OpenCV in other libraries (PIL, matplotlib, etc.), you need to convert it using cv2.cvtColor(img, cv2.COLOR_BGR2RGB).
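cv2.cvtColor with COLOR_BGR2RGB simply reverses the channel axis, so the same reordering can be sketched in plain NumPy with a slice:

```python
import numpy as np

bgr = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)

# Equivalent to cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB):
# reverse the last (channel) axis
rgb = bgr[:, :, ::-1]

print(np.array_equal(rgb[..., 0], bgr[..., 2]))  # True: new R plane is the old B plane
```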

Code Example (Python)

import numpy as np
import cv2

# Load an image
img_color = cv2.imread('image.jpg')        # Color (H, W, 3); returns None if the file cannot be read
img_gray = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)  # Grayscale (H, W)

# Image properties
print(f"Shape: {img_color.shape}")         # (height, width, channels)
print(f"Dtype: {img_color.dtype}")         # uint8 (0-255)
print(f"Size: {img_color.nbytes} bytes")   # Total bytes (note: .size is the element count, not bytes)

# Accessing a pixel
pixel = img_color[100, 200]                # (B, G, R) values
gray_value = img_gray[100, 200]            # Brightness value

# Creating new images
blank = np.zeros((480, 640, 3), dtype=np.uint8)  # Black image
white = np.ones((480, 640), dtype=np.uint8) * 255  # White image

Summary

  • A digital image is a 2D array of pixels
  • Resolution defines the number of pixels; bit depth determines the number of intensity levels
  • Grayscale images are 2D matrices; color images are 3D tensors
  • Matrix operations form the foundation of image processing