
Chapter 8 of 9 · Model Context Protocol

When the server asks back

sampling, elicitation, roots — and the human in the loop in each.

Every message we've followed in this course has gone the same way — client opens the session, client lists tools, client calls them. The server has been the one being asked. But MCP isn't one-way. There are three messages that flow the other direction, each with its own checkpoint, and one of them has two. Before we explain anything, try the opener.

A server asks for the host's LLM. How many human-approval gates fire?
Predict before you read on. The chapter exists because most readers arrive thinking sampling is a single forward call. It isn't — and the gate count is what makes the spec safe by construction.

The flip — three reverse messages

Until now the client has been driving. Sometimes the server needs to. Three primitives reverse the direction: sampling (server asks the host's LLM for a completion), elicitation (server asks the user a structured question), and roots (server asks the client what directories it's allowed to operate inside). Each is named in the client's capability declaration from chapter 4 — the server can only ask back for what the client opted into.
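
As a refresher from chapter 4, that opt-in travels in the initialize handshake. A minimal sketch of a client declaring all three, with field names per the spec (the listChanged flag on roots is the client promising to send the change notification covered below):

client capabilities (fragment of the initialize request)
{
  "capabilities": {
    "sampling": {},
    "elicitation": {},
    "roots": { "listChanged": true }
  }
}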

Two of the three involve the user directly; all three pass through the host before anything reaches a model or a filesystem. The host's job here is the human-in-the-loop gate — the spec's claim that the user is the one who decides, not the model and not the server. The diagram below plays the three flows side by side; the gates appear as terracotta diamonds.

Server-asks-back, three views — sampling, elicitation, roots
Pick a scenario, then step through the four ticks; watch for the tick where the server asks for the host's LLM.

Three tabs, three flows. The arrows are the wire; the diamonds are the host. Now zoom in on each one.

Sampling — the server wants your model

Most servers don't ship their own LLM. A flight-search server wants to summarise; a Slack server wants to draft a reply; a SQL server wants to explain a query plan. Building inference into every server would mean every server pays for compute, every server chooses a model, every server has to be updated when a new model ships. The flip is cleaner: the server asks the host to use its model. Same protocol, same session, no extra credentials.

The method is sampling/createMessage. The body carries messages, an optional systemPrompt, a maxTokens ceiling, and modelPreferences — hints at speed/cost/quality the host is free to honour or ignore.

sampling/createMessage (request body)
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      { "role": "user", "content": {
        "type": "text",
        "text": "Summarise these 47 flights in three lines."
      } }
    ],
    "systemPrompt": "You are a concise travel assistant.",
    "modelPreferences": {
      "hints": [{ "name": "claude-3-5-sonnet" }],
      "speedPriority": 0.4,
      "costPriority": 0.3,
      "intelligencePriority": 0.3
    },
    "maxTokens": 512
  }
}

Then the gate. The host pauses, shows the user the prompt, and asks. Three buttons. Three different sessions downstream — the request only reaches the model if the user lets it.

The sampling gate — what the host actually shows the user
The server's prompt arrives at the host. Before it reaches the LLM, the host shows this. Three answers, three different sessions downstream.

Sampling has two human-in-the-loop gates, not one — that's the opener's payload. The first gate fires before the prompt reaches the LLM. The second fires before the completion goes back to the server: the host can show the user what the model said and ask whether it can be sent back. The user controls both ends. A server that wanted to exfiltrate data through a sampling response would still have to pass the second gate — which is the spec's safety claim made architectural.
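
Worth seeing is what crosses the wire after each gate decision. Two sketches: the result shape (role, content, model, stopReason) follows the spec, while the summary text, the model string, and the deny-path error code are illustrative assumptions, since the spec leaves the rejection code to the client.

sampling/createMessage (response, user approved both gates)
{
  "jsonrpc": "2.0",
  "id": 7,
  "result": {
    "role": "assistant",
    "content": {
      "type": "text",
      "text": "Nine nonstops; cheapest lands before noon; everything else connects through ORD."
    },
    "model": "claude-3-5-sonnet-20241022",
    "stopReason": "endTurn"
  }
}

sampling/createMessage (response, user denied at either gate)
{
  "jsonrpc": "2.0",
  "id": 7,
  "error": {
    "code": -32603,
    "message": "User rejected the sampling request"
  }
}

Either way the id pairs with the request above: a deny is a normal JSON-RPC error, not silence.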

Elicitation — the server asks the user

The pattern: a server is mid-flow and needs a piece of information only the user has. A booking server doesn't know your seat preference. A timezone-aware server doesn't know your timezone. Hard-coding a failure path (error: missing information) would force the user to start over. Hard-coding a guess would be worse. The protocol's answer is to ask.

The method is elicitation/create. The server sends a JSON-Schema fragment — a small subset, just enough for primitive types and enums — describing the form it wants. The host renders the form. The user fills it. The values come back as the elicitation response, with an action field that's one of accept, decline, or cancel.
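
Concretely, the seat-preference case might look like this on the wire. A sketch: the requestedSchema field and the action/content response shape follow the spec; the message text, property name, and id are illustrative.

elicitation/create (request body)
{
  "jsonrpc": "2.0",
  "id": 12,
  "method": "elicitation/create",
  "params": {
    "message": "Which seat do you prefer?",
    "requestedSchema": {
      "type": "object",
      "properties": {
        "seat": { "type": "string", "enum": ["window", "aisle"] }
      },
      "required": ["seat"]
    }
  }
}

elicitation response (user filled the form and accepted)
{
  "jsonrpc": "2.0",
  "id": 12,
  "result": {
    "action": "accept",
    "content": { "seat": "window" }
  }
}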

Elicitation — schema in, form out, response payload back
The server sent a JSON-Schema fragment; the host renders it as a form. Fill it (the user is the response) and watch the payload that goes back compose, field by field.

Elicitation has exactly one HITL gate, because the gate is the answer. Sampling routes a model around the user; elicitation routes the user's input around a model. The user is the response.

Roots — the boundaries the client declares

The third flip is quieter. Roots are file:// URIs the client tells the server about: operate inside these directories. A filesystem server told its roots are /Users/ada/projects/bytesize and /Users/ada/Documents/specs knows not to read ~/.ssh. Two methods carry it: roots/list is a request the server can make at any time; the client's notifications/roots/list_changed fires when the user moves the workspace.
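
Both halves are small. A sketch with the paths from the example above (the name field is optional per the spec; the id is illustrative). Note the notification carries no id, because notifications expect no reply; on receiving it, a server issues a fresh roots/list.

roots/list (response from the client)
{
  "jsonrpc": "2.0",
  "id": 15,
  "result": {
    "roots": [
      { "uri": "file:///Users/ada/projects/bytesize", "name": "bytesize" },
      { "uri": "file:///Users/ada/Documents/specs", "name": "specs" }
    ]
  }
}

notifications/roots/list_changed (client → server)
{
  "jsonrpc": "2.0",
  "method": "notifications/roots/list_changed"
}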

Roots — what the client tells the server it can touch
The list on the left is what the client declares; the log on the right is the wire. Add or remove a path and watch the notifications/roots/list_changed envelope fire.

The thing to feel about roots is what they aren't. Roots are coordination, not enforcement. The server SHOULD respect the list — that's the spec's normative word. Nothing on the wire stops a malicious server from reading whatever the host's process can read; the roots message just makes the boundary legible. The enforcement has to come from the host process itself: sandboxing, OS-level permissions, capability tokens. Chapter 9 inherits this directly.

Why these three exist

Without sampling, every server needs its own model. Without elicitation, every missing field is a hard error. Without roots, every server has to guess what it's allowed to touch. Agentic flows — the chains where one tool call leads to another, and the model needs more information halfway through — collapse without these. A single forward call can't carry an agent through a booking with three branching questions; the server has to be able to ask back.

And in all three cases, the trust boundary stays with the client. The server can request; the client can deny. The host is the bottleneck on purpose — a deny on the sampling gate is a deny full stop, and a closed elicitation form is an action: cancel on the wire. The asymmetry from chapter 5 (server primitives are controlled by model / app / user) holds here in mirror: client primitives add a fourth invariant — the human is in the loop on every one that touches data.

Comprehension check

A server requests sampling. The host has no LLM available — maybe the user is offline, maybe the host shipped without a model. What does the client return? Predict before revealing.


The client returns a JSON-RPC error response, paired by id, with a code that signals the LLM is unavailable — -32603 (internal error) is the catch-all the spec falls back to, with a message describing the cause. A well-built server catches the error and either retries with a backoff, falls back to its own best-effort heuristic, or returns its own error to the client that opened the session. Sampling is a request; requests can fail; failure has a code.
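
On the wire, a sketch of that failure; the message string is illustrative, and the code and id pairing are the load-bearing parts:

sampling/createMessage (error response, no model available)
{
  "jsonrpc": "2.0",
  "id": 7,
  "error": {
    "code": -32603,
    "message": "No LLM available in this host"
  }
}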

The protocol's two directions, complete

Now you know both directions. Client → server got us through chapters 3 through 7: the handshake, the primitives, the build-a-server / build-a-client tour. Server → client gets us through sampling, elicitation, and roots — the surface that lets agents do anything beyond a single forward call. Two HITL gates on sampling, one on elicitation, zero on roots (because roots is coordination, not enforcement). Scroll back to the opener: the gate count isn't a riddle anymore.

Every message we've followed has been polite. The server asked nicely, the client answered nicely, the host gated nicely. The wild has teeth. That's chapter 9.