WebSockets, SSE, and long-polling: how real-time web works
The browser stopped asking
You open a Google Doc link a teammate sent you. In the top-right a tinted avatar appears — Jordan is in. A colored cursor labeled Jordan shows up inside the document. Jordan starts typing. The characters appear on your screen as they're typed — not on refresh, not after a click, not when you tab back in. They're just there.
The moment is so mundane we forget that the web wasn't born able to do this. Rewind: what had to change about the web for Jordan's cursor to appear on your screen?
HTTP's only move is request-and-reply#
The web's original protocol has one shape: the browser asks, the server answers, the connection closes. That's it. A request leaves your machine, a response comes back, and whatever socket they travelled on is recycled or discarded. There is no protocol room for the server to say anything the browser didn't ask for.
This isn't a style choice; it's baked in. HTTP/1.1 (RFC 9112 §9.2) doesn't carry a request ID on the wire. Responses are matched to requests by arrival order. If the server ever spoke out of turn, the browser would have no way to know which request — if any — it was replying to.
GET /doc/42 HTTP/1.1
Host: docs.example
Accept: text/html

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1432

<!doctype html>…

The web's cell wall is that the client always speaks first. Everything that follows — every mechanism that makes Jordan's cursor appear on your screen — is a way to live inside that wall. None of them let the server initiate. They all turn the browser into something else: a listener, not an asker.
Just keep asking.#
The most obvious answer is also the crudest. Fire a request every second and see if anything's new.
setInterval(async () => {
  const res = await fetch('/doc/42/updates?since=' + lastSeen);
  const updates = await res.json();
  if (updates.length) applyUpdates(updates);
}, 1000);
// 60 requests / minute. Each carries ~500–2000 B of HTTP headers.
// At 1s intervals, that's ~120 KB / minute just to say: "anything new?"
// Most of the time the answer is: no.

This polling loop honors HTTP's rule to the letter. It also pays for it. A full request-response round-trip isn't free: RFC 6202 notes that “every long poll request and long poll response is a complete HTTP message and thus contains a full set of HTTP headers” — the same is obviously true of short polling, where the headers show up 60 times a minute whether or not there's anything to say (RFC 6202 §2.2).
Worse, your latency is tied to your polling interval. Jordan hits a key; you don't see it until your next tick, half an interval later on average and a full one at worst. Make the interval shorter? You burn more bytes per minute. Make it longer? You watch letters arrive in clumps.
What if you asked once, and the server waited?#
Here's the clever move. The client still asks — but the server doesn't reply until it has news. The request goes out, the TCP socket stays open, and the response sits there, a promise dangling on both sides of the wire. When something happens, the server writes the response and closes. The client reads it and immediately opens another.
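In code, that shape is a loop whose sleep lives on the server. Here's a sketch of both ends, in the same Express-and-EventEmitter style as the SSE handler later in this post; the route, the timeout, and the lastSeen cursor are this post's stand-ins, not anything a spec requires:

app.get('/doc/42/updates', (req, res) => {
  // A real handler would first check a store for anything newer than
  // req.query.since and answer immediately; this sketch only parks.
  const onUpdate = (ev) => { clearTimeout(timer); res.json([ev]); };
  // Answer empty before intermediaries kill the idle request;
  // an empty reply just tells the client to ask again.
  const timer = setTimeout(() => {
    bus.off('update', onUpdate);
    res.json([]);
  }, 25_000);
  bus.once('update', onUpdate);
  req.on('close', () => { clearTimeout(timer); bus.off('update', onUpdate); });
});

// Client: ask, wait as long as it takes, apply, immediately ask again.
async function poll() {
  while (true) {
    const res = await fetch('/doc/42/updates?since=' + lastSeen);
    const updates = await res.json();
    if (updates.length) {
      applyUpdates(updates);
      lastSeen = updates[updates.length - 1].id; // advance the cursor
    }
  }
}

Note that the client loop has no delay in it: the waiting happens inside fetch, on the far side of the wire.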
Ably calls this shape “bending HTTP slightly out of shape”. The format is preserved — still one ask, one answer — but a long-polling client fires and listens, sometimes for tens of seconds, before its single reply arrives.
Alex Russell coined Comet — long polling as a family name — in March 2006. Google Docs shipped on long polling for years: look inside Google's Closure Library and you'll still find goog.net.BrowserChannel, long polling over XHR with forever-iframe streaming as a fallback. Google never published it as an API, which is its own kind of tell — attribution comes from ex-Googlers and Joseph Gentle's node re-implementation.
Long polling earns the latency back, but every reply is still a complete HTTP message, full headers and all; a WebSocket frame's overhead, by contrast, is a handful of bytes. Refusing to finish the question is the move every later protocol inherits. Long polling did it inside HTTP; WebSocket, up next, does it by ending HTTP mid-socket.
The line that ends HTTP mid-socket#
WebSocket's move is to get HTTP to politely step aside. The client opens a regular HTTP request with four special headers that ask it to stop being HTTP. The server replies with a status code that was, until WebSocket came along, vanishingly rare — 101 Switching Protocols — plus one header that proves it understood.
GET /chat HTTP/1.1
Host: docs.example
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

The proof header is a hash. The client sent a random key in the request; the server has to send back one specific derivation of it: a SHA-1 over that key with one very strange suffix glued on. The suffix is a literal string, written into the spec itself, identical for every server on earth:
258EAFA5-E914-47DA-95CA-C5AB0DC85B11

The whole algorithm fits on one line:

Sec-WebSocket-Accept = base64( sha1( Sec-WebSocket-Key + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" ) )

The GUID is there so an unaware server can't accidentally reply “yes, we're speaking WebSocket” without having read the spec — the only way to produce the right accept header is to know exactly what string to glue on (RFC 6455 §1.3).
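You can check the derivation in a couple of lines of Node; node:crypto ships SHA-1, and the key below is the sample from the handshake above:

// Compute Sec-WebSocket-Accept from a client key (RFC 6455 §1.3).
const { createHash } = require('node:crypto');

const GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';
const accept = (key) =>
  createHash('sha1').update(key + GUID).digest('base64');

console.log(accept('dGhlIHNhbXBsZSBub25jZQ=='));
// → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=, the exact header in the 101 above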
After that reply is written and read, the same TCP socket is no longer speaking HTTP. It speaks WebSocket frames — 2 bytes of header for most messages, up to 14 at the far end. Either side can send, anytime, as long as both ends want the connection open. The client spoke first, exactly once, and then the conversation became something else.
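In the browser, all of that framing hides behind one small API. A minimal client sketch; the /chat path matches the handshake above, and the message shape is made up for illustration:

const ws = new WebSocket('wss://docs.example/chat');
ws.onopen = () => ws.send(JSON.stringify({ type: 'cursor', pos: 7 }));
ws.onmessage = (e) => console.log('server sent, unasked:', e.data);

No polling interval, no re-ask: onmessage fires whenever the server writes a frame.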
One direction, with a safety net#
WebSocket isn't the only way out. Server-Sent Events (SSE) is the one-way cousin: a regular HTTP response with Content-Type: text/event-stream that the server never closes. The browser hands each data: …\n\n chunk to onmessage as it arrives. An SSE handler can be eight lines of Node.
app.get('/stream', (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.flushHeaders(); // send headers now; the body never ends
  // id: is the resume cursor the browser echoes back as Last-Event-ID
  const send = (ev) => res.write(`id: ${ev.id}\ndata: ${JSON.stringify(ev)}\n\n`);
  bus.on('update', send);
  req.on('close', () => bus.off('update', send)); // stop writing when the tab goes away
});

If WebSocket is “HTTP steps aside,” SSE is “HTTP just never stops.” No Upgrade, no magic GUID, no frame opcodes. Just one very patient HTTP response that keeps writing.
Then why not use WebSocket for everything? Because SSE ships the one feature WebSocket doesn't: automatic reconnect, with state recovery. The browser remembers the last event's id: field, and when the stream drops, it reopens the connection with a Last-Event-ID header so the server can resume from that cursor (WHATWG HTML §9.2.3). WebSocket has none of this in the spec. The socket closes, you start over — from whatever ws:// URL, with whatever auth, and whatever resume protocol you decided to build.
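The client side shows how much the browser carries for you. A minimal sketch against the /stream handler above, with applyUpdates the same stand-in as earlier:

const es = new EventSource('/stream');
es.onmessage = (e) => {
  applyUpdates([JSON.parse(e.data)]);
  // The browser stores e.lastEventId and replays it as Last-Event-ID
  // on every automatic reconnect; none of that is our code.
};
es.onerror = () => {
  // Nothing to do here: EventSource retries on its own, at a delay the
  // server can tune with a retry: line in the stream.
};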
And when the connection does drop, the mechanisms diverge. The longer the gap, the more clients pile up waiting, and every long-poll and WebSocket client races back the moment the gateway heals, the same instant. SSE absorbs that storm on the protocol; the others invent it.
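For WebSocket, “written by hand” looks something like this: a sketch of reconnect with exponential backoff, full jitter, and a resume cursor the protocol itself doesn't carry. The ?since= query and lastSeen are this post's stand-ins, not part of RFC 6455:

let attempt = 0;
function connect() {
  const ws = new WebSocket('wss://docs.example/chat?since=' + lastSeen);
  ws.onopen = () => { attempt = 0; };
  ws.onmessage = (e) => {
    const ev = JSON.parse(e.data);
    lastSeen = ev.id; // the cursor EventSource tracks for free
    applyUpdates([ev]);
  };
  ws.onclose = () => {
    // Full jitter spreads the stampede instead of synchronizing it.
    const delay = Math.random() * Math.min(30_000, 1000 * 2 ** attempt++);
    setTimeout(connect, delay);
  };
}
connect();

The jitter is the point: without that Math.random(), every client that died together comes back together.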
The cost moved. It didn't vanish.#
Three failure modes show up the moment you ship any of these to production. None are in the tutorials, and each cuts across the four mechanisms differently; the pattern that falls out is the point of this section.

Each tells a different story. First, proxies and corporate networks silently break the WebSocket Upgrade, which is why Socket.IO still opens every connection on long polling first — the fallback isn't legacy; it's 2026 insurance. Second, when a gateway blips, every long-poll and WebSocket client races back at the same instant: Discord's reconnect stampede took 17.5 seconds on a ring lookup until they cached, and Slack built Flannel for the same shape. SSE alone has automatic resume on the protocol; the others need it written by hand. Third, the hard part of real-time isn't holding the connection — it's the O(N) write on every event. Phoenix held two million idle sockets on one box; Discord's publish to a single 30,000-member guild still took 900 ms – 2.1 s before they parallelized fanout with Manifold.
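Those last numbers are about one loop. In its naive form, publish is a single-threaded O(N) write; sockets here is a hypothetical set of connected clients for one document:

function publish(ev) {
  const frame = JSON.stringify(ev);
  for (const ws of sockets) {
    ws.send(frame); // N sends on one event loop: fine at 100 clients,
  }                 // the bottleneck at 30,000; the shape Discord had to parallelize
}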
None of this kills the idea. It just means real-time is a system, not a primitive. The socket is the easy part.
So when Jordan's cursor shows up on your screen…#
…which of these is actually doing the work? Probably a WebSocket today. Was long polling, via BrowserChannel, for most of the last decade. Google doesn't publish which, and the honest answer is that most production systems have some layer of every mechanism in this post somewhere — polling for a heartbeat, long polling as a Socket.IO fallback, a WebSocket for the hot path, SSE for the log tail, a cache in front of it all so a reconnect storm doesn't take the site down.
What they share is the move at the center of every one of them. The server never learned to speak first. The browser just stopped hanging up. A request goes out; it doesn't come back until it has something to say; another goes out the moment it does; or the socket simply never closes.
Every “real-time” web app on your laptop right now is a variation on a browser that refuses to finish its sentence.
The title lied slightly: the browser didn't stop asking. It stopped ending the question.