How instant email-availability checks work
How Gmail knows your email is taken, instantly
You type an email on the Gmail sign-up page, reach for Tab, and — before your finger lifts — the form says already taken. It feels like Gmail already knew.
It didn't. There's no magic; there's a pipeline. Six small things happen between your last keystroke and that red text. Each one is cheap. Each one can answer on its own and exit early. The slowest of them — a real read against the database — almost never runs while you're typing.
We'll walk it stage by stage, with a small interactive at every step.
It's not one request per keystroke.
The first thing to notice is the part you can't see: the silence. Each character you type doesn't fire a check. Instead the page waits about 300 ms after your last keystroke, and only then sends one request — for the entire address, all at once.
The technique has a dull name (debounce) and a load-bearing job. Without it, a ten-character email would mean ten round-trips to a Google data centre, nine of them asking about a string nobody has finished typing. With it, the back-end gets one well-formed question per pause.
Tap "+ keystroke" mid-wait and watch the timer reset. Every new key throws away the previous wait, so the request fires only when you actually stop. Everything in the rest of the article happens after that ~300 ms wait; none of it runs until the timer expires.
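The reset-on-every-key behaviour can be sketched in a few lines. This is a minimal model, not Gmail's client code: the `Debouncer` class, its method names, and the explicit `now_ms` clock argument are assumptions chosen so the example runs deterministically. A real page would rely on a browser timer instead.

```python
from typing import Optional


class Debouncer:
    """Hypothetical debounce model driven by an explicit clock (milliseconds)."""

    def __init__(self, wait_ms: int = 300) -> None:
        self.wait_ms = wait_ms
        self.pending: Optional[str] = None   # latest text typed, not yet sent
        self.deadline: Optional[int] = None  # time at which the request may fire

    def keystroke(self, now_ms: int, value: str) -> None:
        # Every key replaces the pending text and restarts the wait.
        self.pending = value
        self.deadline = now_ms + self.wait_ms

    def tick(self, now_ms: int) -> Optional[str]:
        # Returns the text to send once the wait has elapsed, else None.
        if self.deadline is not None and now_ms >= self.deadline:
            value = self.pending
            self.pending, self.deadline = None, None
            return value
        return None


d = Debouncer(wait_ms=300)
d.keystroke(0, "j")
d.keystroke(100, "jo")   # restarts the wait: the deadline moves to 400 ms
print(d.tick(300))       # still inside the wait, nothing is sent
print(d.tick(400))       # one request, carrying the latest text
```

The key property is that `keystroke` overwrites the deadline rather than queueing a second one, so any number of keys inside the wait still produces a single request.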
Your email is not the email the server checks.
Once the request arrives, the very first thing the server does is rewrite your address into a different one. J.Ohn.Doe+promo@gmail.com becomes johndoe@gmail.com. That second string — the canonical form — is what every layer below the Gaia front-end actually checks. The thing you typed is discarded; only its canonical version exists from this point on.
The rewrite is three deterministic steps applied in order. Step through the widget below to watch each rule fire on the messy address.
One canonical form means many possible spellings of your address all collapse to the same row. It's also why j.o.h.n.d.o.e@gmail.com can't register if johndoe@gmail.com already exists. The thing you typed was discarded.
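The rewrite can be sketched as a small pure function. The article does not name the three rules, so this sketch assumes they are: lowercase the address, drop anything from the first `+` in the local part, and remove dots from the local part. The function name `canonicalize` is hypothetical, and the code assumes the input is a Gmail address.

```python
def canonicalize(raw: str) -> str:
    """Assumed three-step Gmail-style canonicalisation (illustrative only)."""
    # Step 1: lowercase the whole address.
    lowered = raw.strip().lower()
    local, _, domain = lowered.rpartition("@")
    # Step 2: drop the "+tag" suffix, if any.
    local = local.split("+", 1)[0]
    # Step 3: remove dots from the local part.
    local = local.replace(".", "")
    return f"{local}@{domain}"


print(canonicalize("J.Ohn.Doe+promo@gmail.com"))  # johndoe@gmail.com
```

Because the function is deterministic, every spelling that maps to the same output competes for the same row.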
Most of the time, the answer is already in memory.
With a canonical form in hand, the server now has to actually look it up. Before it touches anything as expensive as a database, it asks two cheaper places first.
The first is an in-process near-cache, sitting in RAM inside the Gaia front-end shard that handles your request. If the same canonical email was asked about in the last few seconds on the same shard — popular names get typed constantly — the answer is right there in process memory. No further work.
The second is a distributed cache that spans many front-end shards. A warm answer from one shard can serve another. Pick a locality below and step through the two lookups; you'll see exactly where each kind of request exits.
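The two lookups can be modelled as a pair of time-limited caches checked in order. This is a sketch under stated assumptions: `TtlCache`, `cached_lookup`, and the TTL values are hypothetical, and a real distributed cache would be a network service rather than a local dictionary.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Tiny TTL cache; stands in for both cache tiers in this sketch."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: Dict[str, Tuple[float, bool]] = {}

    def get(self, key: str) -> Optional[bool]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, taken = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return taken

    def put(self, key: str, taken: bool) -> None:
        self._entries[key] = (self.clock() + self.ttl_s, taken)


def cached_lookup(canonical: str, near: TtlCache, shared: TtlCache) -> Tuple[Optional[bool], str]:
    """Return (taken, exit_name); taken is None when both tiers miss."""
    taken = near.get(canonical)
    if taken is not None:
        return taken, "near-cache"
    taken = shared.get(canonical)
    if taken is not None:
        near.put(canonical, taken)  # warm this shard for the next request
        return taken, "distributed-cache"
    return None, "miss"
```

Note the `is not None` checks: a cached answer of `False` (available) is still a hit, so a plain truthiness test would wrongly treat it as a miss.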
The filter can lie about yes, never about no.
When both caches miss, the server still doesn't go to the database. It asks one more cheap thing first: a Bloom filter. A row of bits, all zero. When an account is created, a few hash functions of the canonical email each pick a bit, and those bits are flipped on. To check an email, hash it the same way and look at those bits: if any one of them is 0, the address is definitely not in the set. If they're all 1, it might be — and the server has to actually look.
Step through a handful of inserts and queries below. Two of the queries find a match; one of those matches is real, one is a coincidence — a false positive.
A no from the filter is the last exit before the database. It saves a database round-trip every time it fires, and most checks that get this far trigger it: most of the addresses people type while signing up are not, in fact, taken. A maybe pays the database. The asymmetry is the entire reason the filter is here.
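A Bloom filter along these lines fits in a couple of dozen lines. The sketch below is illustrative only: the class name, the 1024-bit size, the three hashes, and the choice of salted SHA-256 as the hash family are all assumptions, not details of Google's implementation.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter: can return false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # Any 0 bit means "definitely not present"; all 1s means "maybe".
        return all(self.bits[pos] for pos in self._positions(key))
```

A `False` from `might_contain` is safe to act on without a database read. A `True` only means the bits happen to be set, possibly by other addresses, so it still needs confirming.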
Four answers to one question.
Put those layers together and a single check has four possible exits, in increasing cost:
- Near-cache hit: a few microseconds. The fastest path.
- Distributed-cache hit: a millisecond or two. Still fast.
- Bloom filter says no: a couple of milliseconds, no database trip.
- Bloom filter says maybe: point-read on Spanner. Google's published target for Spanner point reads is under 5 ms at the median. The only path that actually talks to the authoritative store.
Four answers to one question. Only one of them ever asks the database.
And here's the part that's easy to miss: every one of those four answers is a hint, not a verdict. They're what the UI uses to colour the text red while you're typing. None of them actually decides whether the account gets created.
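The four exits can be strung together as one function that returns both the hint and the layer that produced it. This is a simplified sketch: plain dictionaries stand in for the two caches, the Bloom filter and the point read are passed in as callables, and the function name, the exit labels, and the decision to cache the result afterwards are all assumptions.

```python
from typing import Callable, Dict, Tuple


def check_hint(
    canonical: str,
    near: Dict[str, bool],
    shared: Dict[str, bool],
    bloom_might_contain: Callable[[str], bool],
    db_point_read: Callable[[str], bool],
) -> Tuple[bool, str]:
    """Return (taken_hint, exit_name), trying the cheapest layer first."""
    if canonical in near:
        return near[canonical], "near-cache"
    if canonical in shared:
        near[canonical] = shared[canonical]
        return shared[canonical], "distributed-cache"
    if not bloom_might_contain(canonical):
        taken, exit_name = False, "bloom-no"  # definite no: skip the database
    else:
        taken, exit_name = db_point_read(canonical), "database"
    # Assumed policy: remember the answer so the next check exits earlier.
    shared[canonical] = taken
    near[canonical] = taken
    return taken, exit_name
```

Whatever this returns is still only a hint for the UI; nothing here reserves the address.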
Submit: the check you didn't see
When you actually click Sign up, a different thing happens. A database transaction tries to INSERT a new row keyed on your canonical email, against a column with a uniqueness constraint enforced by Spanner itself. If another person's transaction committed first, yours fails with a constraint violation and the server returns EMAIL_EXISTS, the official “someone else already owns this canonical email” signal, which is what the UI renders as already taken.
The pre-check (cache, Bloom, point-read) is a UX hint. It is allowed to be wrong, stale, or racing someone else. Drag the two sliders below to set when each user submits; the only thing that decides the winner is which transaction commits first.
Drag fast or slow — the verdict is the same shape. Now watch the canonical race play out at the wire.
The winner is whichever INSERT Spanner committed first — not whichever user clicked Submit first in their browser. Both clients saw available while typing; only the database's serial ordering at commit decides who actually got the address.
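The "constraint decides, not the pre-check" idea can be reproduced with any database that enforces uniqueness. The sketch below uses an in-memory SQLite database purely as a stand-in for Spanner. The table name, column names, and the `open_store` and `sign_up` helpers are hypothetical; only the `EMAIL_EXISTS` label comes from the text above.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The primary key is the uniqueness constraint on the canonical email.
    conn.execute(
        "CREATE TABLE accounts ("
        " canonical_email TEXT PRIMARY KEY,"
        " raw_email TEXT NOT NULL)"
    )
    return conn


def sign_up(conn: sqlite3.Connection, raw: str, canonical: str) -> str:
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO accounts (canonical_email, raw_email) VALUES (?, ?)",
                (canonical, raw),
            )
        return "OK"
    except sqlite3.IntegrityError:
        # Someone else's row committed first under the same canonical email.
        return "EMAIL_EXISTS"


conn = open_store()
print(sign_up(conn, "John.Doe@gmail.com", "johndoe@gmail.com"))   # OK
print(sign_up(conn, "j.ohndoe@gmail.com", "johndoe@gmail.com"))   # EMAIL_EXISTS
```

Both callers could have been shown "available" moments earlier. The second one loses only because its INSERT reached the constraint after the first had committed.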
Other services don't normalise. That's the attack surface.
The reason the canonical rewrite step mattered isn't pedagogical. It's economic. The rest of the web doesn't do it. Most services treat your typed string as the address, period. Gmail stores one canonical row; Netflix stores whatever you typed.
In 2018, an engineer described a scam that exploits exactly this gap. An attacker signs up for Netflix using a dotted variant of your Gmail — say j.ohn.doe@gmail.com — with a stolen card. Netflix treats the dotted version as a brand-new customer; the card fails; Netflix sends you — or what Gmail thinks is you — a polite email about the payment problem. You, confused, helpfully add a real card.
Two parsers, one inbox. Same address, two accounts. Both are technically right by their own rules. The damage happens in the gap between their rules. That gap is the attack surface.
If you're storing emails in something you operate: normalise on write, put your uniqueness constraint on the normalised column, and keep the raw version only for display. Make your service one of the ones that closed the gap.
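A small in-memory sketch shows the difference between keying on the raw string and keying on a normalised one. Everything here is illustrative: `normalise`, `AccountStore`, and the rule set are assumptions. In particular, the dot and plus rules are applied only to `gmail.com`, since other providers may treat dots as significant, and lowercasing every address is itself a policy choice.

```python
from typing import Dict


def normalise(raw: str) -> str:
    """Assumed policy: lowercase everything; Gmail-style rules for gmail.com only."""
    local, _, domain = raw.strip().lower().rpartition("@")
    if domain == "gmail.com":
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


class AccountStore:
    def __init__(self, normalise_on_write: bool) -> None:
        self.normalise_on_write = normalise_on_write
        # unique key -> raw address, kept only for display
        self.accounts: Dict[str, str] = {}

    def register(self, raw: str) -> bool:
        key = normalise(raw) if self.normalise_on_write else raw
        if key in self.accounts:
            return False  # uniqueness is enforced on the key, not the raw text
        self.accounts[key] = raw
        return True


naive = AccountStore(normalise_on_write=False)
strict = AccountStore(normalise_on_write=True)
for store in (naive, strict):
    store.register("johndoe@gmail.com")

print(naive.register("j.ohn.doe@gmail.com"))   # True: a second account, same inbox
print(strict.register("j.ohn.doe@gmail.com"))  # False: the gap is closed
```

In a real system the same rule would live in the database as a uniqueness constraint on the normalised column, so it holds even when two writes race.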
Two questions, two versions of your email.
Everything you saw while typing is a hint. The one thing the database is asked at submit time is the verdict. Two different questions, on two different versions of your email — neither of which is exactly the string you typed.
The fast one is for the UI. The slow one is for the truth.