<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>lainlog</title>
    <link>https://lainlog.com</link>
    <atom:link href="https://lainlog.com/rss.xml" rel="self" type="application/rss+xml" />
    <description>Long-form engineering essays paired with interactive widgets — what&apos;s actually happening when you press Run, with the wires showing.</description>
    <language>en-us</language>
    <lastBuildDate>Fri, 26 Jun 2026 13:21:59 GMT</lastBuildDate>
    <item>
      <title>Quantization, explained from scratch: how AI models get smaller without getting dumber</title>
      <link>https://lainlog.com/posts/quantization-from-scratch</link>
      <guid isPermaLink="true">https://lainlog.com/posts/quantization-from-scratch</guid>
      <description>an inference-engineering primer. what quantization is, why it works, which formats matter in 2026 (fp8, fp4, mxfp8, nvfp4, nf4), the five mainstream techniques with working code, kv-cache quantization, the sensitivity hierarchy, and a safe production recipe for h100 / b200.</description>
      <pubDate>Wed, 13 May 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>An eval suite by Friday: LLM evals in CI by Monday standup</title>
      <link>https://lainlog.com/posts/eval-loop-monday</link>
      <guid isPermaLink="true">https://lainlog.com/posts/eval-loop-monday</guid>
      <description>the worked-example tutorial: 30 prompts, three test types, one CI workflow file. get llm evals into your build pipeline by monday standup with about 30 lines of code.</description>
      <pubDate>Mon, 04 May 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>How to find the failure modes your eval set will actually catch — a primer on error analysis</title>
      <link>https://lainlog.com/posts/look-at-the-data</link>
      <guid isPermaLink="true">https://lainlog.com/posts/look-at-the-data</guid>
      <description>most teams write eval sets by guessing what could go wrong. the fix is reading 100 actual outputs first. open coding, axial coding, the saturation rule, and a monday-morning recipe.</description>
      <pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Your LLM-as-judge has a palate too — calibrating the model that grades the model</title>
      <link>https://lainlog.com/posts/the-judge-that-learned</link>
      <guid isPermaLink="true">https://lainlog.com/posts/the-judge-that-learned</guid>
      <description>an llm-as-judge inherits every bias an llm has — position, verbosity, self-preference. calibration is what turns &apos;another llm scored it&apos; into a measurement.</description>
      <pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>How to build an eval set you can actually maintain — a primer on eval-set construction</title>
      <link>https://lainlog.com/posts/building-the-dataset</link>
      <guid isPermaLink="true">https://lainlog.com/posts/building-the-dataset</guid>
      <description>an eval set is a dataset, not a script. coverage, balance, anti-leakage, versioning — the four disciplines that turn a list of prompts into something you can ship a product on.</description>
      <pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>When 84% beats 81%: statistics for eval engineers</title>
      <link>https://lainlog.com/posts/error-bars-on-evals</link>
      <guid isPermaLink="true">https://lainlog.com/posts/error-bars-on-evals</guid>
      <description>error bars on a pass rate, paired comparison, and sample-size planning — the statistics subset that decides whether your eval improvement is real or noise.</description>
      <pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Production traces are your eval set — the LLM eval maintenance flywheel</title>
      <link>https://lainlog.com/posts/the-eval-flywheel</link>
      <guid isPermaLink="true">https://lainlog.com/posts/the-eval-flywheel</guid>
      <description>an eval suite without a feedback loop becomes shelfware in three months. sample real traces, anonymize, label, fold back. the loop is mundane; running it is the moat.</description>
      <pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Evals when your model uses tools — a primer on agent and trajectory evals</title>
      <link>https://lainlog.com/posts/evaluating-agents</link>
      <guid isPermaLink="true">https://lainlog.com/posts/evaluating-agents</guid>
      <description>when your model calls tools and decides what to do next, a single grade on the final reply isn&apos;t an evaluation — it&apos;s a guess. the four checks an agent suite needs, and a python skeleton you can wire in this week.</description>
      <pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>LLM benchmarks honestly read: MMLU, SWE-bench, GPQA &amp; friends</title>
      <link>https://lainlog.com/posts/the-benchmark-museum</link>
      <guid isPermaLink="true">https://lainlog.com/posts/the-benchmark-museum</guid>
      <description>the chart your boss screenshots is the lab&apos;s marketing surface, not your eval suite. what each public benchmark — mmlu, swe-bench, gpqa, humaneval, bfcl, tau-bench, osworld, helm — actually measures, and when its score really does track your product.</description>
      <pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>How to know if your AI is actually any good — a primer on evals for LLM products</title>
      <link>https://lainlog.com/posts/evals-or-vibes</link>
      <guid isPermaLink="true">https://lainlog.com/posts/evals-or-vibes</guid>
      <description>an eval is a test for an LLM feature: a list of inputs, the answers you expect, and a way to score what came back. three kinds, and a monday-morning recipe.</description>
      <pubDate>Sat, 02 May 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>How the JavaScript event loop, microtasks, and the call stack work</title>
      <link>https://lainlog.com/posts/the-line-that-waits-its-turn</link>
      <guid isPermaLink="true">https://lainlog.com/posts/the-line-that-waits-its-turn</guid>
      <description>why setTimeout(0) is never zero, why await feels seamless, and why one runaway Promise can stall a tab.</description>
      <pubDate>Sat, 25 Apr 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>JavaScript Hoisting, the TDZ, and the Call Stack Explained</title>
      <link>https://lainlog.com/posts/how-javascript-reads-its-own-future</link>
      <guid isPermaLink="true">https://lainlog.com/posts/how-javascript-reads-its-own-future</guid>
      <description>before line 1 runs, the engine has already walked your file — and that walk is why hoisting, the TDZ, and the call stack are one mechanism, not three quirks.</description>
      <pubDate>Sat, 25 Apr 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>AI agent traps &amp; prompt injection on the open web</title>
      <link>https://lainlog.com/posts/the-webpage-that-reads-the-agent</link>
      <guid isPermaLink="true">https://lainlog.com/posts/the-webpage-that-reads-the-agent</guid>
      <description>you don&apos;t break the model — you break the page it reads. six ways the open web learned to trap an AI agent.</description>
      <pubDate>Fri, 24 Apr 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Why fetch fails in the browser but works in curl (CORS)</title>
      <link>https://lainlog.com/posts/why-fetch-fails-only-in-browser</link>
      <guid isPermaLink="true">https://lainlog.com/posts/why-fetch-fails-only-in-browser</guid>
      <description>the server did answer — your browser is just holding the response back from your JavaScript.</description>
      <pubDate>Mon, 20 Apr 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>WebSockets, SSE, and long-polling: how real-time web works</title>
      <link>https://lainlog.com/posts/the-browser-stopped-asking</link>
      <guid isPermaLink="true">https://lainlog.com/posts/the-browser-stopped-asking</guid>
      <description>real-time apps didn&apos;t teach the server to speak first — they taught the browser to stop hanging up.</description>
      <pubDate>Mon, 20 Apr 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>How instant email-availability checks work</title>
      <link>https://lainlog.com/posts/how-gmail-knows-your-email-is-taken</link>
      <guid isPermaLink="true">https://lainlog.com/posts/how-gmail-knows-your-email-is-taken</guid>
      <description>the pipeline that tells you &apos;already taken&apos; before your finger lifts.</description>
      <pubDate>Sun, 19 Apr 2026 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>JavaScript closures, var vs let, and the loop bug</title>
      <link>https://lainlog.com/posts/the-function-that-remembered</link>
      <guid isPermaLink="true">https://lainlog.com/posts/the-function-that-remembered</guid>
      <description>how a function outlives the scope it was born in — and why half of your JS bugs start there.</description>
      <pubDate>Sun, 19 Apr 2026 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
