Chapter 3 · Part 2
The forward process
We've added noise by hand and we know exactly what noise is. Now let's make it systematic. The forward process is the fixed recipe that takes a clean image all the way to static — the same dial you scrubbed in Chapter 1, but now we'll see the rule behind it.
Two ideas do all the work. First, noise is added on a schedule: a little at the start, more later, following a curve that's decided in advance and never learned. Second — and this is the magic trick — you don't have to walk the schedule step by step. There's a closed-form shortcut that drops you at any timestep in one calculation.
Scroll to move along the schedule and watch the image jump straight to that step.
At t = 0 the schedule value ᾱ is 1: all signal, no noise. The image is untouched.
A schedule, not a free-for-all
Think of the forward process as a chain of T tiny steps (often T = 1000). Each
step takes the previous image, scales it down slightly, and mixes in a small amount
of fresh Gaussian noise. The amount is set by a number βₜ, the variance of the noise
added at step t; the full sequence of βₜ for t = 1, …, T is the variance schedule.
Early steps use a tiny βₜ (barely any noise); later steps use larger ones. Because
each step depends only on the one before it, the whole thing is a Markov chain.
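As a minimal sketch of one link in that chain, assuming the standard update xₜ = √(1 − βₜ) · xₜ₋₁ + √βₜ · ε and a linear βₜ schedule from 1e-4 to 0.02 (a common choice, used here only for illustration). The names `betas` and `forward_step` are hypothetical, and the array index is 0-based, so `betas[0]` belongs to the first step:

```python
import numpy as np

T = 1000
# Assumed linear variance schedule: small beta early, larger beta late.
betas = np.linspace(1e-4, 0.02, T)

def forward_step(x_prev, t, rng):
    """Apply one Markov step: shrink the previous image slightly, add fresh noise."""
    eps = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - betas[t]) * x_prev + np.sqrt(betas[t]) * eps

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a clean image
for t in range(T):  # the slow way: walk every step in order
    x = forward_step(x, t, rng)
```

After the full walk, `x` is statistically close to pure Gaussian noise, which is where the chain is meant to end.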
If you had to run all 1000 steps every time you wanted a noisy image, training would
crawl. So we define a running product over the schedule, written ᾱₜ ("alpha-bar"):
multiply together the factors (1 − βₛ) for every step s up to t. The result
summarizes all the noise added up to step t in a single number. It starts at 1 and
slides down toward 0, which is exactly the teal curve in the visual.
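One way to compute ᾱₜ, sketched under the assumption that each step keeps a fraction αₜ = 1 − βₜ of the signal variance and that βₜ follows an illustrative linear schedule (the text above fixes neither choice):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule, for illustration
alphas = 1.0 - betas                 # fraction of signal variance kept per step
alpha_bar = np.cumprod(alphas)       # running product: signal kept after each step

# Early, middle, and final values show the slide from near 1 toward 0.
print(alpha_bar[0], alpha_bar[T // 2], alpha_bar[-1])
```

Because every factor is below 1, the running product can only decrease, which is why the curve never turns back up.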
The shortcut that makes it practical
With ᾱₜ in hand, the entire chain collapses into a single equation. Any noisy image
xₜ is just a weighted blend of the original image and one patch of noise:

xₜ = √ᾱₜ · x₀ + √(1 − ᾱₜ) · ε,   where ε is standard Gaussian noise.

This shortcut is the reason diffusion training is feasible at all:

    import numpy as np

    # Cosine schedule: ᾱ slides from ~1 down to ~0 across T steps.
    T = 1000
    t_norm = np.linspace(0, 1, T)
    alpha_bar = np.cos(t_norm * np.pi / 2) ** 2

    def q_sample(x0, t):
        """Noise x0 straight to timestep t, with no loop over earlier steps."""
        a = alpha_bar[t]
        eps = np.random.randn(*x0.shape)
        xt = np.sqrt(a) * x0 + np.sqrt(1 - a) * eps
        return xt, eps  # eps is the training target

During training we pick a random timestep for each image, jump straight to it
with q_sample, and ask the network to predict the eps we used. No simulation,
no waiting — just one blend per training example.
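A sketch of how that might look for a whole batch, assuming images are stacked in an array of shape (batch, height, width) and using the same cosine ᾱ as above. `noisy_batch` is a hypothetical helper, and the network itself is left out:

```python
import numpy as np

T = 1000
alpha_bar = np.cos(np.linspace(0, 1, T) * np.pi / 2) ** 2  # same cosine schedule

def noisy_batch(x0, rng):
    """Draw one random timestep per image and jump each image straight to it."""
    batch = x0.shape[0]
    t = rng.integers(0, T, size=batch)       # independent timestep per image
    a = alpha_bar[t].reshape(batch, 1, 1)    # broadcast over height and width
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return xt, t, eps                        # a network would see (xt, t); target is eps

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(4, 8, 8))  # stand-in batch of clean images
xt, t, eps = noisy_batch(x0, rng)
```

Reshaping ᾱ to (batch, 1, 1) lets each image use its own timestep without a Python loop over the batch.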
Where we're headed
The forward process is now fully pinned down: a fixed schedule, a closed-form jump,
and — every single time — a known noise patch ε that produced the result. That ε
is a free, exact answer key.
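To see why ε counts as an exact answer key, here is a small check, assuming the same cosine ᾱ and the blend used in q_sample: given xₜ, ᾱₜ, and the ε that was drawn, the clean image can be recovered by undoing the blend. The timestep and array size are arbitrary choices for illustration.

```python
import numpy as np

T = 1000
alpha_bar = np.cos(np.linspace(0, 1, T) * np.pi / 2) ** 2

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in clean image
t = 400                                    # arbitrary mid-schedule timestep
a = alpha_bar[t]
eps = rng.standard_normal(x0.shape)
xt = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

# Undo the blend using the known noise patch.
x0_recovered = (xt - np.sqrt(1.0 - a) * eps) / np.sqrt(a)
matches = np.allclose(x0_recovered, x0)
```

The inversion is only numerically safe away from the end of the schedule, where ᾱₜ approaches 0 and the division becomes unstable.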
Next we put it to work: training a network to look at a noisy xₜ and predict the
noise — the one piece of learning this whole system needs.