Chapter 1 · Part 1
Pixels & the grid
Look at any photo on your screen and it seems smooth and continuous — soft edges, gentle shading. But a computer has no idea what a "face" or a "sky" is. To a machine, an image is something far more boring, and far more useful: a grid of numbers.
Scroll slowly through the image below. We'll zoom in until the smoothness breaks apart into the squares it was made of all along.
This looks like a smooth little picture.
An image is a grid of numbers
That's the whole idea of this first chapter. The picture you saw never really was continuous — it was always a grid of pixels, and each pixel stores a single brightness value:
- 0 is fully black.
- 255 is fully white.
- Numbers in between are greys: 128 is a middle grey, 220 a light grey.
Why 0 to 255? That's one byte of storage per pixel — exactly 256 distinct values. We'll see in Chapter 2 what happens when we want colour instead of grey, but the principle never changes: pixels are numbers.
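To make "a grid of numbers" concrete, here is a minimal sketch that writes a tiny greyscale image out by hand. The array name `tiny` and its values are made up for illustration. The only real assumption is that each pixel is stored as one unsigned byte (`uint8` in NumPy).

```python
import numpy as np

# A hypothetical 3x4 greyscale image, typed out by hand.
# dtype=np.uint8 means one unsigned byte per pixel.
tiny = np.array(
    [
        [0, 0, 128, 255],
        [0, 128, 220, 255],
        [128, 220, 255, 255],
    ],
    dtype=np.uint8,
)

info = np.iinfo(np.uint8)  # the value range this dtype can hold

print(tiny.shape)          # (3, 4) -> 3 rows, 4 columns
print(tiny.nbytes)         # 12 -> one byte per pixel
print(info.min, info.max)  # 0 255
print(2 ** 8)              # 256 -> distinct values in one byte
```

Eight bits give 2^8 = 256 combinations, which is where the 0 to 255 range comes from.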
Reading a pixel by its address
Because the pixels sit in a grid, we can point to any one of them by its row and column, just like a spreadsheet cell. The pixel in row r, column c is written I[r][c], where I is the image.
In code, an image really is a 2D array, and reading a pixel is just indexing into it:
from PIL import Image
import numpy as np
img = np.array(Image.open("face.png").convert("L")) # "L" = grayscale
print(img.shape) # (16, 16) -> height, width
print(img[5, 5]) # 34 -> one pixel's brightness (0-255)
print(img.min(), img.max()) # 28 240 -> darkest and brightest

Notice the order: img[row, column], or img[y, x]. Rows come first because the array is stored row by row, top to bottom, so the first index picks the row and the second picks the column within it. It trips up almost everyone at least once.
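The row-first order is easiest to see on an image that is not square. The sketch below builds a made-up array of 2 rows by 5 columns (not the face image from the example above), so swapping the two indices fails loudly instead of silently reading the wrong pixel.

```python
import numpy as np

# Hypothetical image: 2 rows (height) by 5 columns (width), all black.
img = np.zeros((2, 5), dtype=np.uint8)

# Mark the pixel at x = 4 (fifth column), y = 1 (second row).
x, y = 4, 1
img[y, x] = 255  # row index first, then column index

height, width = img.shape
print(height, width)  # 2 5
print(img[1, 4])      # 255

# Swapping the order asks for row 4, which does not exist here.
swapped_failed = False
try:
    img[x, y]
except IndexError as err:
    swapped_failed = True
    print("IndexError:", err)
```

On a square image the swapped index would not raise an error. It would just return a different pixel, which is why this mistake can go unnoticed.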
Why this matters
Once an image is "just numbers," everything else becomes possible. We can do arithmetic on it, slide windows across it, and eventually feed it to a network that learns. Every later chapter — brightness, edges, convolution, neural networks — is an operation on this grid.
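As a small taste of what "arithmetic on the grid" can look like, here is a hedged sketch using a made-up 4x4 image. The brightening amount (40) and the 2x2 window position are arbitrary choices for illustration, not the method later chapters will necessarily use.

```python
import numpy as np

# Hypothetical 4x4 greyscale image with values 0, 16, 32, ..., 240.
img = (np.arange(16, dtype=np.uint8) * 16).reshape(4, 4)

# Naive addition stays in uint8, so values past 255 wrap around.
wrapped = img + 40
print(wrapped[3, 3])   # 24 -> 240 + 40 = 280, which wraps to 280 - 256

# Safer: widen the type, add, clip to 0..255, then go back to uint8.
brighter = np.clip(img.astype(np.int16) + 40, 0, 255).astype(np.uint8)
print(brighter[0, 0])  # 40
print(brighter[3, 3])  # 255 -> clipped at white

# A "window" is just a slice: rows 1-2 and columns 1-2.
window = img[1:3, 1:3]
print(window.shape)    # (2, 2)
print(window.mean())   # 120.0 -> average brightness inside the window
```

The wrap-around on the first addition is a common surprise with one-byte pixels, which is why the sketch clips before converting back.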
Next up: those numbers only described brightness. To capture colour, we'll need not one grid but three.