Chapter 3 · Part 2

People like you

Here's the idea that launched modern recommendations, and it's almost suspiciously simple: to guess what you'll like, find people who have liked the same things as you, and look at what else they liked. You don't need to understand anything about the items themselves — only the pattern of who liked what.

It's called collaborative filtering: the crowd filters the catalog for you, collaboratively, just by everyone leaving traces of what they enjoyed.
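To make that concrete, here is a minimal sketch of the idea using nothing but sets. The names (`likes`, `recommend`) and the toy data are made up for illustration, and counting shared likes is only one crude way to measure how alike two people are.

```python
# Hypothetical toy data: each user maps to the set of items they liked.
likes = {
    "you":  {"cooking", "hiking", "jazz"},
    "ana":  {"cooking", "hiking", "travel"},
    "ben":  {"cooking", "jazz", "travel"},
    "cleo": {"gaming", "cars"},
}

def recommend(user, likes):
    """Rank items `user` hasn't liked by the votes of users with shared likes."""
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        overlap = len(likes[user] & items)    # shared likes = crude similarity
        if overlap == 0:
            continue                          # no shared taste, no vote
        for item in items - likes[user]:      # only items `user` hasn't liked
            scores[item] = scores.get(item, 0) + overlap
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("you", likes))  # ['travel']
```

Notice that the code never looks at what "travel" is. It only sees that the two people who overlap with you both liked it.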

Interactive figure: a grid of who liked what. You haven't seen the ✈️ travel post yet (the ?), so your taste-twins' rows are used to predict it.

Two flavors of the same idea

  • User-based: find users similar to you, recommend what they liked. ("People like you also watched…")
  • Item-based: find items similar to ones you liked, where "similar" means liked by the same people. ("Because you watched X…") This is the approach behind Amazon's classic "customers who bought this also bought," and it's often more stable because an item's audience shifts less over time than a person's tastes do.
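The item-based flavor can be sketched the same way by flipping the data around: group users by item, then compare items by how much their fan sets overlap. The toy data and function names below are assumptions for illustration, and Jaccard overlap is just one reasonable similarity to plug in, not necessarily what any production system uses.

```python
from collections import defaultdict

# Hypothetical toy data: user -> set of liked items.
likes = {
    "ana":  {"X", "Y"},
    "ben":  {"X", "Y", "Z"},
    "cleo": {"X"},
    "dev":  {"W"},
}

def item_fans(likes):
    """Invert user -> items into item -> set of users who liked it."""
    fans = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            fans[item].add(user)
    return fans

def similar_items(target, likes):
    """Rank other items by Jaccard overlap of their fan sets with `target`."""
    fans = item_fans(likes)
    scores = {}
    for item, users in fans.items():
        if item == target:
            continue
        shared = len(fans[target] & users)
        if shared:  # skip items with no fans in common
            scores[item] = shared / len(fans[target] | users)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(similar_items("X", likes))  # Y first (2 of 3 fans shared), then Z (1 of 3)
```

Item W never shows up for X because the two have no fans in common.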

Both rest on the same raw material: a giant, mostly-empty user–item matrix of who interacted with what. Recommending is filling in the blanks.
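One common way to hold a mostly-empty matrix is to store only the filled cells. The sketch below assumes interactions arrive as (user, item, rating) triples; that shape, the names, and the numbers are all hypothetical.

```python
# Hypothetical interactions: (user, item, rating). Anything absent is a blank.
interactions = [
    ("you", "cooking", 5), ("you", "jazz", 4),
    ("ana", "cooking", 4), ("ana", "travel", 5),
    ("ben", "jazz", 5),
]

def build_matrix(interactions):
    """Store only the filled cells: user -> {item: rating}."""
    matrix = {}
    for user, item, rating in interactions:
        matrix.setdefault(user, {})[item] = rating
    return matrix

def sparsity(matrix):
    """Fraction of user-item cells that are empty."""
    items = {item for row in matrix.values() for item in row}
    cells = len(matrix) * len(items)
    filled = sum(len(row) for row in matrix.values())
    return 1 - filled / cells if cells else 1.0

matrix = build_matrix(interactions)
print(f"{sparsity(matrix):.0%} of cells are blank")  # 44% of cells are blank
```

Here 4 of 9 cells are blank. In a real catalog the blank share is far higher, which is why storing only the filled cells pays off.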

The magic and the limits

What's striking is that the model never knows a video is "about cooking." It only knows that rows 1 and 3 look like your row. Meaning falls out of the pattern of co-likes alone — which is powerful, but it has two cracks:

  • Sparsity: the matrix is mostly empty; most people have rated almost nothing, so two users often share few or no items to compare.
  • Cold start: a brand-new user or item has no co-likes at all, so there's nothing to compare. (We'll tackle this in Chapter 5.)
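Cold start is easy to see in a few lines. This sketch (hypothetical data and helper name) looks for users who could act as neighbors: they must have rated the target item and share at least one rated item with you.

```python
# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "ana": {"cooking": 5, "travel": 4},
    "ben": {"cooking": 4, "jazz": 5},
    "newcomer": {},  # brand-new user: an empty row
}

def co_raters(user, item, ratings):
    """Users who rated `item` and share at least one rated item with `user`."""
    mine = ratings[user].keys()
    return [v for v, row in ratings.items()
            if v != user and item in row and mine & row.keys()]

print(co_raters("ben", "travel", ratings))       # ['ana'] (both rated cooking)
print(co_raters("newcomer", "travel", ratings))  # [] (new user, nothing to compare)
print(co_raters("ana", "podcast", ratings))      # [] (new item, nobody has rated it)
```

With an empty list of neighbors there is no pattern to lean on, so a pure collaborative filter has nothing to say.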
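
cf.py: user-based collaborative filtering
from math import sqrt

def cosine(a, b):
  # Cosine similarity of two sparse rows (dicts of item -> rating).
  dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
  norm = sqrt(sum(x * x for x in a.values()) * sum(x * x for x in b.values()))
  return dot / norm if norm else 0.0

def predict(user, item, ratings, k=10):  # k: neighbors to use; 10 is an arbitrary default
  # Score every other user who rated `item` by similarity to `user`.
  scored = [(cosine(ratings[user], row), row[item]) for v, row in ratings.items()
            if v != user and row.get(item) is not None]
  top = sorted(scored, key=lambda t: t[0], reverse=True)[:k]
  # Similarity-weighted average of the top-k neighbors' ratings.
  den = sum(sim for sim, _ in top)
  return sum(sim * r for sim, r in top) / den if den else None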
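Written as a formula, predict returns a similarity-weighted average. The notation here is our own, not something cf.py defines: N_K(u, i) stands for the K users most similar to u among those who rated item i, sim is cosine similarity, and r_{v,i} is neighbor v's rating of i. The numbers in the worked line are made up.

```latex
\hat{r}_{u,i} \;=\; \frac{\sum_{v \in N_K(u,i)} \operatorname{sim}(u,v)\, r_{v,i}}
                         {\sum_{v \in N_K(u,i)} \operatorname{sim}(u,v)}

% Worked example (made-up numbers): two neighbors with similarity 0.9 and 0.6
% who rated the item 5 and 2.
\hat{r}_{u,i} \;=\; \frac{0.9 \cdot 5 + 0.6 \cdot 2}{0.9 + 0.6}
             \;=\; \frac{5.7}{1.5} \;=\; 3.8
```

The closer taste-twin pulls the prediction toward their rating: 3.8 sits above the plain average of 3.5.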

Where we're headed

Comparing whole rows works, but it's slow and brittle on huge, sparse data. What if we could boil each user and each item down to a short list of numbers that captures their taste — and just compare those? That's the leap to taste vectors, next.