lainlog

WebSockets, SSE, and long-polling: how real-time web works

9 min read

The browser stopped asking

You open a Google Doc link a teammate sent you. In the top-right a tinted avatar appears — Jordan is in. A colored cursor labeled Jordan shows up inside the document. Jordan starts typing. The characters appear on your screen as they're typed — not on refresh, not after a click, not when you tab back in. They're just there.

The moment is so mundane we forget that the web wasn't born able to do this. Rewind: what had to change about the web for Jordan's cursor to appear on your screen?

HTTP's only move is request-and-reply

The web's original protocol has one shape: the browser asks, the server answers, the connection closes. That's it. A request leaves your machine, a response comes back, and whatever socket they travelled on is recycled or discarded. There is no protocol room for the server to say anything the browser didn't ask for.

This isn't a style choice; it's baked in. HTTP/1.1 (RFC 9112 §9.2) doesn't carry a request ID on the wire. Responses are matched to requests by arrival order. If the server ever spoke out of turn, the browser would have no way to know which request — if any — it was replying to.

the shape everything else lives inside · http
GET /doc/42 HTTP/1.1
Host: docs.example
Accept: text/html

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1432

<!doctype html>…

The web's cell wall is that the client always speaks first. Everything that follows — every mechanism that makes Jordan's cursor appear on your screen — is a way to live inside that wall. None of them let the server initiate. They all turn the browser into something else: a listener, not an asker.

Just keep asking.

The most obvious answer is also the crudest. Fire a request every second and see if anything's new.

polling.js · javascript
let lastSeen = 0;
setInterval(async () => {
  const res = await fetch('/doc/42/updates?since=' + lastSeen);
  const updates = await res.json();
  if (updates.length) {
    applyUpdates(updates);
    lastSeen = updates[updates.length - 1].ts; // advance the cursor (field name assumed)
  }
}, 1000);

// 60 requests / minute. Each carries ~500–2000 B of HTTP headers.
// At 1s intervals, that's ~30–120 KB / minute just to say: "anything new?"
// Most of the time the answer is: no.

This polling loop honors HTTP's rule to the letter. It also pays for it. A full request-response round-trip isn't free: RFC 6202 notes that “every long poll request and long poll response is a complete HTTP message and thus contains a full set of HTTP headers” — the same is obviously true of short polling, where the headers show up 60 times a minute whether or not there's anything to say (RFC 6202 §2.2).

Worse, your typing latency is bounded below by your polling interval. Jordan hits a key; you don't see it until your next tick. Make the interval shorter? You burn more bytes per minute. Make it longer? You watch letters arrive in clumps.

What if you asked once, and the server waited?

Here's the clever move. The client still asks — but the server doesn't reply until it has news. The request goes out, the TCP socket stays open, and the response sits there, a promise dangling on both sides of the wire. When something happens, the server writes the response and closes. The client reads it and immediately opens another.
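In client code that shape is a loop, not a timer — one request in flight at a time, reopened the moment a reply lands. A sketch, where the endpoint, the `wait` parameter, the `ts` cursor field, and `applyUpdates` are all assumptions, not any real API:

```javascript
// One long-poll cycle: the server holds the request open until it has
// news or gives up. Endpoint, `wait`, `ts`, and applyUpdates are assumed.
async function longPollOnce(lastSeen) {
  const res = await fetch('/doc/42/updates?since=' + lastSeen + '&wait=30');
  if (res.status !== 200) return lastSeen;      // server timed out: nothing new
  const updates = await res.json();
  applyUpdates(updates);
  return updates.length ? updates[updates.length - 1].ts : lastSeen;
}

// The loop: ask, wait (possibly tens of seconds), apply, immediately ask again.
async function longPoll() {
  let lastSeen = 0;
  for (;;) lastSeen = await longPollOnce(lastSeen);
}
```

The only real difference from short polling is that `longPollOnce` may sit awaiting for tens of seconds — the "held span" in the widget below is exactly that pending promise.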

Ably calls this shape “bending HTTP slightly out of shape”. The format is preserved — still one ask, one answer — but a long-polling client fires and listens, sometimes for tens of seconds, before its single reply arrives.

Alex Russell coined Comet — long polling as a family name — in March 2006. Google Docs shipped on long polling for years: look inside Google's Closure Library and you'll still find goog.net.BrowserChannel, long polling over XHR with forever-iframe streaming as a fallback. Google never published it as an API, which is its own kind of tell — attribution comes from ex-Googlers and Joseph Gentle's node re-implementation.

polling · 10s window
Press play and watch sixty requests carry three pieces of news. Most replies carry nothing.

Polling burns most of its bytes saying nothing. The natural next move: don't hang up until there is news.

long polling · 10s window
Press play and watch three requests cover the same three events. The held spans are the server choosing not to reply yet.

Long polling earns the latency back, but every reply still pays a full HTTP round-trip of overhead. The next move skips that.

websocket · 10s window
Press play and watch the handshake finish, then three frames ride the open socket. No more requests after the first.

Three frames cost 562 bytes; polling spent 7.5 KB asking. Refusing to finish the question is the move every later protocol inherits — long polling did it inside HTTP, WebSocket did it by ending HTTP mid-socket.

The line that ends HTTP mid-socket

WebSocket's move is to get HTTP to politely step aside. The client opens a regular HTTP request with three special headers that ask it to stop being HTTP. The server replies with a status code that was, until WebSocket came along, vanishingly rare — 101 Switching Protocols — plus one header that proves it understood.

client · opening request · http
GET /chat HTTP/1.1
Host: docs.example
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
server · the 101 reply · http
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

The proof header is a hash. The client sent a random key in the request; the server has to send one specific derivation of it back.

the 3-step upgrade · RFC 6455 §1.3 sample
Step through the upgrade. First, the browser sends a regular HTTP/1.1 request — three headers ask the server to stop being HTTP: Upgrade, Connection, and a random 16-byte key.

The widget walks the spec sample. Now compute it on your own machine — the next widget runs that derivation in your browser against a fresh random key.

your browser does the SHA-1
Tap regenerate and your machine produces a fresh random key, glues on the GUID, and runs the SHA-1. Nothing in the reveal is precomputed.

The widget runs the computation through your browser's Web Crypto API, so the Sec-WebSocket-Accept you see is the real SHA-1 your machine just computed. Nothing in the reveal is faked.

What you just watched was a SHA-1 over your random key with one very strange suffix glued on. That suffix is a literal string, written into the spec itself, identical for every server on earth:

RFC 6455 §1.3 — the string, verbatim · text
258EAFA5-E914-47DA-95CA-C5AB0DC85B11

The whole algorithm fits on one line:

Sec-WebSocket-Accept = base64( sha1( Sec-WebSocket-Key + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" ) )

The GUID is there so an unaware server can't accidentally reply “yes, we're speaking WebSocket” without having read the spec — the only way to produce the right accept header is to know exactly what string to glue on (RFC 6455 §1.3).

After that reply is written and read, the same TCP socket is no longer speaking HTTP. It speaks WebSocket frames — 2 bytes of header for most messages, up to 14 at the far end. Either side can send, anytime, as long as both ends want the connection open. The client spoke first, exactly once, and then the conversation became something else.
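The "2 bytes, up to 14" comes straight from the frame format (RFC 6455 §5.2): a 2-byte base header, an optional 2- or 8-byte extended length, and a 4-byte masking key that only clients must send. A sketch of the arithmetic (the function is mine, not a library API):

```javascript
// Frame header size per RFC 6455 §5.2. Server→client frames are unmasked;
// client→server frames must carry a 4-byte masking key.
function frameHeaderBytes(payloadLen, { masked }) {
  let n = 2;                          // FIN/opcode byte + mask-bit/length byte
  if (payloadLen > 65535) n += 8;     // 64-bit extended payload length
  else if (payloadLen > 125) n += 2;  // 16-bit extended payload length
  if (masked) n += 4;                 // client masking key
  return n;
}

frameHeaderBytes(42, { masked: false });      // 2  — small server push
frameHeaderBytes(100000, { masked: true });   // 14 — big client upload, worst case
```

Compare that 2-byte floor to the ~500–2000 B of headers every poll carried.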

One direction, with a safety net

WebSocket isn't the only way out. Server-Sent Events (SSE) are the one-way cousin: a regular HTTP response with Content-Type: text/event-stream that the server never closes. The browser hands each data: …\n\n chunk to onmessage as it arrives. An SSE handler can be eight lines of Node.

server.js · the entire SSE handler · javascript
app.get('/stream', (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.flushHeaders();

  const send = (ev) => res.write(`id: ${ev.id}\ndata: ${JSON.stringify(ev)}\n\n`);
  bus.on('update', send);
  req.on('close', () => bus.off('update', send)); // unsubscribe when the client hangs up
});

If WebSocket is “HTTP steps aside,” SSE is “HTTP just never stops.” No Upgrade, no magic GUID, no frame opcodes. Just one very patient HTTP response that keeps writing.

Then why not use WebSocket for everything? Because SSE ships the one feature WebSocket doesn't: automatic reconnect, with state recovery. The browser remembers the last event's id: field, and when the stream drops, it reopens the connection with a Last-Event-ID header so the server can resume from that cursor (WHATWG HTML §9.2.3). WebSocket has none of this in the spec. The socket closes, you start over — from whatever ws:// URL, with whatever auth, and whatever resume protocol you decided to build.
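The server half of that resume is small. A sketch — the Last-Event-ID header is real, but the in-memory `history` array with increasing numeric ids is an assumption made for illustration:

```javascript
// On reconnect the browser resends the last id it saw; the server's only
// job is to replay everything after that cursor. `history` is assumed.
function eventsSince(history, lastEventId) {
  const cursor = Number(lastEventId);
  if (!Number.isFinite(cursor)) return history;   // fresh client: full stream
  return history.filter((ev) => ev.id > cursor);  // replay only what was missed
}

// inside the /stream handler, before subscribing to live events:
//   for (const ev of eventsSince(history, req.headers['last-event-id'])) send(ev);
```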

dropout · which side of the cliff
Drag the cliff where the network drops. WebSocket loses every event past the cliff; SSE replays them all when the connection heals.

The dropout has two dials: when it starts, and how long it lasts. Hold the timing fixed and stretch the gap.

gap length · how much SSE has to replay
Stretch the gap and watch SSE's replay arc grow as more events fall inside. WebSocket's loss count grows too — nothing comes back.

Both dials compose into the shape the next section names: every long-poll and WebSocket client racing back the moment the gateway heals, all at the same instant. SSE absorbs that storm at the protocol layer; the others have to invent it by hand.

The cost moved. It didn't vanish.

Three failure modes show up the moment you ship any of these to production. None are in the tutorials. The matrix below is one row per failure mode, one column per protocol — tap a row to read what each cell means. The pattern that falls out is the point of this section.

where the cost reappears · 3 modes · 4 protocols
Each cell is what the protocol does when this failure mode hits. Darker means costlier. Tap a row to read the detail.

Each row tells a different story. Proxies and corporate networks silently break the WebSocket Upgrade, which is why Socket.IO still opens every connection on long polling first — the fallback isn't legacy; it's 2026 insurance.

When a gateway blips, every long-poll and WebSocket client races back at the same instant: Discord's reconnect stampede took 17.5 seconds on a ring-lookup until they cached, and Slack built Flannel for the same shape. SSE alone has automatic resume on the protocol; the others need it written by hand.

And on the third row, the hard part of real-time isn't holding the connection — it's the O(N) write on every event. Phoenix held two million idle sockets on one box; Discord's publish to a single 30,000-member guild still took 900 ms – 2.1 s before they parallelized fanout with Manifold.
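That O(N) is easy to see in code. A deliberately naive sketch — every event becomes one write per open connection, so publish cost scales with room size no matter how cheap each write is:

```javascript
// Naive fanout: serialize once, then one write per subscriber.
// The loop is O(N) per event — the hot path the fanout work above parallelizes.
function broadcast(sockets, ev) {
  const frame = JSON.stringify(ev);        // serialize once, not once per socket
  let sent = 0;
  for (const ws of sockets) {
    if (ws.readyState === 1 /* OPEN */) {  // skip half-closed connections
      ws.send(frame);
      sent++;
    }
  }
  return sent;
}
```

Holding the sockets is the cheap part; this loop running on every keystroke in a 30,000-member room is where the bill arrives.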

None of this kills the idea. It just means real-time is a system, not a primitive. The socket is the easy part.

So when Jordan's cursor shows up on your screen…

…which of these is actually doing the work? Probably a WebSocket today. Was long polling, via BrowserChannel, for most of the last decade. Google doesn't publish which, and the honest answer is that most production systems have some layer of every mechanism in this post somewhere — polling for a heartbeat, long polling as a Socket.IO fallback, a WebSocket for the hot path, SSE for the log tail, a cache in front of it all so a reconnect storm doesn't take the site down.

What they share is the move at the center of every one of them. The server never learned to speak first. The browser just stopped hanging up. A request goes out; it doesn't come back until it has something to say; it opens another the moment it does; or the socket simply never closes.

Every “real-time” web app on your laptop right now is variations on a browser that refuses to finish its sentence.

The title lied slightly: the browser didn't stop asking. It stopped ending the question.