Chapter 2 · Part 1
What noise really is
Last chapter we stirred "random noise" into a photo until it became static. But that word random is doing a lot of quiet work. If you asked a diffusion model to add noise and it splattered any old values onto your image, nothing would work. The noise has a precise, well-defined shape — and that shape is the reason the whole process can be reversed.
Almost all of the noise in these models is Gaussian noise: every value is an independent draw from a bell curve centered at zero. Most draws are tiny nudges; big jumps are rare.
Each pixel of static is one random draw from a bell curve — most values land near zero.
Two numbers describe the whole thing
A Gaussian (or normal) distribution is pinned down by just two numbers:
- Mean (μ) — where the bell is centered. For diffusion noise it's 0: the noise is equally likely to brighten or darken a pixel, so on average it adds nothing.
- Standard deviation (σ) — how wide the bell is, i.e. how strong the noise. Small σ is a gentle haze; large σ is full static. About 68% of all draws land within one σ of the mean, and 95% within two — no matter how you scale it.
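The 68% and 95% figures are easy to check empirically. The sketch below is a minimal illustration, not part of the chapter's own code: the helper name `fraction_within`, the sample count, and the seed are all assumptions chosen for the demo. It draws many samples at several values of σ and measures how many land within one or two σ of zero.

```python
import numpy as np

def fraction_within(sigma, k, n=200_000, seed=0):
    """Fraction of N(0, sigma^2) draws that land within k * sigma of the mean.

    Hypothetical helper for illustration; n and seed are arbitrary choices.
    """
    rng = np.random.default_rng(seed)
    draws = rng.normal(loc=0.0, scale=sigma, size=n)
    return np.mean(np.abs(draws) <= k * sigma)

# Try a gentle haze, a standard Gaussian, and heavy static.
for sigma in (0.1, 1.0, 5.0):
    one = fraction_within(sigma, 1)
    two = fraction_within(sigma, 2)
    print(f"sigma={sigma}: {one:.3f} within 1 sigma, {two:.3f} within 2 sigma")
```

The fractions should come out close to 0.68 and 0.95 for every σ, which is the sense in which the shape holds no matter how the noise is scaled.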
That fixed shape is exactly why the forward process from Chapter 1 is so
controllable: when we say "add noise at timestep t," we mean "draw from a
Gaussian with this specific σ." Nothing is left to chance about how random it is.
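To make "draw from a Gaussian with this specific σ" concrete, here is a rough sketch under simplifying assumptions. The function `add_gaussian_noise`, the flat gray test image, and the three σ values are hypothetical and picked only for illustration. The sketch also treats noising as purely additive, which may differ from the actual schedule the chapter goes on to describe.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng):
    """Return image plus zero-mean Gaussian noise with standard deviation sigma.

    Hypothetical helper: a purely additive simplification for illustration.
    """
    eps = rng.standard_normal(image.shape)  # one N(0, 1) draw per value
    return image + sigma * eps

rng = np.random.default_rng(0)
image = np.full((64, 64, 3), 0.5)  # stand-in image: flat mid-gray

# Assumed noise strengths, standing in for three different timesteps.
for sigma in (0.05, 0.5, 1.0):
    noisy = add_gaussian_noise(image, sigma, rng)
    residual = noisy - image  # exactly the noise that was added
    print(f"sigma={sigma}: mean={residual.mean():+.4f}, std={residual.std():.4f}")
```

The measured standard deviation of the residual should track the σ that was requested, while the mean stays near zero.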
Sampling: turning a formula into pixels
"Drawing from a distribution" is called sampling. A computer can sample a standard Gaussian (mean 0, σ 1) directly, then shift and scale it to whatever mean and σ you want. To noise an image, you sample one value per pixel per channel.
```python
import numpy as np

# Example image dimensions (illustrative values).
height, width = 64, 64

# One independent N(0, 1) sample per pixel and channel: a field of static.
eps = np.random.randn(height, width, 3)

# Want mean μ and standard deviation σ? Shift and scale:
#   x = μ + σ * eps  ->  x ~ N(μ, σ²)
sigma = 0.5
noise = 0.0 + sigma * eps
print(noise.mean(), noise.std())  # ≈ 0.0, ≈ 0.5
```

Because every sample is independent, neighboring noise pixels have nothing to do with each other, which is why it looks like grain with no pattern. And because they all come from the same known distribution, we can always reason about how much was added.
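One way to see that independence is to measure the correlation between each noise value and its neighbor. The following is a small sketch, with `neighbor_correlation` as a hypothetical helper and a deliberately blurred field added only for contrast. Independent noise should show a correlation near zero, while the blurred field, where each value shares a draw with its neighbor, should not.

```python
import numpy as np

def neighbor_correlation(field):
    """Pearson correlation between each value and its right-hand neighbor."""
    left = field[:, :-1].ravel()
    right = field[:, 1:].ravel()
    return np.corrcoef(left, right)[0, 1]

rng = np.random.default_rng(0)

# Independent Gaussian noise: one N(0, 1) draw per position.
noise = rng.standard_normal((256, 256))

# For contrast only: average each value with its left neighbor,
# so adjacent values now share one underlying draw.
smooth = (noise + np.roll(noise, 1, axis=1)) / 2

print("independent noise:", neighbor_correlation(noise))
print("smoothed field:   ", neighbor_correlation(smooth))
```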
Why this shape, and why it matters
Gaussians show up everywhere in nature and math (add up many small random effects and you tend to get one), and they have friendly properties: add two independent Gaussians and you get another Gaussian, with the variances adding; scale one and it stays Gaussian. That's what lets the forward process stack noise step after step and still describe the result with a single clean formula, the shortcut you saw in Chapter 1.
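The "stacking" property can be checked numerically. The sketch below assumes purely additive, independent noise steps with made-up strengths in `step_sigmas`, so it illustrates the Gaussian property itself rather than the chapter's exact forward formula. It compares adding noise step by step against a single draw whose σ is the square root of the summed variances.

```python
import numpy as np

rng = np.random.default_rng(0)
step_sigmas = np.array([0.1, 0.2, 0.3, 0.4])  # hypothetical per-step strengths
n = 500_000                                   # number of values to simulate

# Route 1: add independent Gaussian noise one step at a time.
stacked = np.zeros(n)
for s in step_sigmas:
    stacked += s * rng.standard_normal(n)

# Route 2: a single draw. Variances add, so sigma_total = sqrt(sum of sigma_i^2).
combined_sigma = np.sqrt(np.sum(step_sigmas ** 2))
single = combined_sigma * rng.standard_normal(n)

print("predicted sigma:", combined_sigma)
print("step-by-step   :", stacked.std())
print("single draw    :", single.std())
```

Both routes should report nearly the same spread, which is why many small noise steps can be collapsed into one draw.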
Now that we know precisely what we're adding, we can be precise about how we add it over time. Next: the forward process — the fixed schedule that takes an image from crisp to pure static, one controlled step at a time.