lainlog

https://lainlog.com
Long-form engineering essays paired with interactive widgets — what's actually happening when you press Run, with the wires showing.
en-us
Fri, 26 Jun 2026 13:21:59 GMT

Quantization, explained from scratch: how AI models get smaller without getting dumber
https://lainlog.com/posts/quantization-from-scratch
an inference-engineering primer. what quantization is, why it works, which formats matter in 2026 (fp8, fp4, mxfp8, nvfp4, nf4), the five mainstream techniques with working code, kv-cache quantization, the sensitivity hierarchy, and a safe production recipe for h100 / b200.
Wed, 13 May 2026 00:00:00 GMT

An eval suite by Friday: LLM evals in CI by Monday standup
https://lainlog.com/posts/eval-loop-monday
the worked-example tutorial: 30 prompts, three test types, one CI workflow file. get llm evals into your build pipeline by monday standup with about 30 lines of code.
Mon, 04 May 2026 00:00:00 GMT

How to find the failure modes your eval set will actually catch — a primer on error analysis
https://lainlog.com/posts/look-at-the-data
most teams write eval sets by guessing what could go wrong. the fix is reading 100 actual outputs first. open coding, axial coding, the saturation rule, and a monday-morning recipe.
Sun, 03 May 2026 00:00:00 GMT

Your LLM-as-judge has a palate too — calibrating the model that grades the model
https://lainlog.com/posts/the-judge-that-learned
an llm-as-judge inherits every bias an llm has — position, verbosity, self-preference. calibration is what turns 'another llm scored it' into a measurement.
Sun, 03 May 2026 00:00:00 GMT

How to build an eval set you can actually maintain — a primer on eval-set construction
https://lainlog.com/posts/building-the-dataset
an eval set is a dataset, not a script. coverage, balance, anti-leakage, versioning — the four disciplines that turn a list of prompts into something you can ship a product on.
Sun, 03 May 2026 00:00:00 GMT

When 84% beats 81%: statistics for eval engineers
https://lainlog.com/posts/error-bars-on-evals
error bars on a pass rate, paired comparison, and sample-size planning — the statistics subset that decides whether your eval improvement is real or noise.
Sun, 03 May 2026 00:00:00 GMT

Production traces are your eval set — the LLM eval maintenance flywheel
https://lainlog.com/posts/the-eval-flywheel
an eval suite without a feedback loop becomes shelfware in three months. sample real traces, anonymize, label, fold back. the loop is mundane; running it is the moat.
Sun, 03 May 2026 00:00:00 GMT

Evals when your model uses tools — a primer on agent and trajectory evals
https://lainlog.com/posts/evaluating-agents
when your model calls tools and decides what to do next, a single grade on the final reply isn't an evaluation — it's a guess. the four checks an agent suite needs, and a python skeleton you can wire in this week.
Sun, 03 May 2026 00:00:00 GMT

LLM benchmarks honestly read: MMLU, SWE-bench, GPQA & friends
https://lainlog.com/posts/the-benchmark-museum
the chart your boss screenshots is the lab's marketing surface, not your eval suite. what each public benchmark — mmlu, swe-bench, gpqa, humaneval, bfcl, tau-bench, osworld, helm — actually measures, and when its score really does track your product.
Sun, 03 May 2026 00:00:00 GMT

How to know if your AI is actually any good — a primer on evals for LLM products
https://lainlog.com/posts/evals-or-vibes
an eval is a test for an LLM feature: a list of inputs, the answers you expect, and a way to score what came back. three kinds, and a monday-morning recipe.
Sat, 02 May 2026 00:00:00 GMT

How the JavaScript event loop, microtasks, and the call stack work
https://lainlog.com/posts/the-line-that-waits-its-turn
why setTimeout(0) is never zero, why await feels seamless, and why one runaway Promise can stall a tab.
Sat, 25 Apr 2026 00:00:00 GMT

JavaScript Hoisting, the TDZ, and the Call Stack Explained
https://lainlog.com/posts/how-javascript-reads-its-own-future
before line 1 runs, the engine has already walked your file — and that walk is why hoisting, the TDZ, and the call stack are one mechanism, not three quirks.
Sat, 25 Apr 2026 00:00:00 GMT

AI agent traps & prompt injection on the open web
https://lainlog.com/posts/the-webpage-that-reads-the-agent
you don't break the model — you break the page it reads. six ways the open web learned to trap an AI agent.
Fri, 24 Apr 2026 00:00:00 GMT

Why fetch fails in browser but works in curl (CORS)
https://lainlog.com/posts/why-fetch-fails-only-in-browser
the server did answer — your browser is just holding the response back from your JavaScript.
Mon, 20 Apr 2026 00:00:00 GMT

WebSockets, SSE, and long-polling: how real-time web works
https://lainlog.com/posts/the-browser-stopped-asking
real-time apps didn't teach the server to speak first — they taught the browser to stop hanging up.
Mon, 20 Apr 2026 00:00:00 GMT

How instant email-availability checks work
https://lainlog.com/posts/how-gmail-knows-your-email-is-taken
the pipeline that tells you 'already taken' before your finger lifts.
Sun, 19 Apr 2026 00:00:00 GMT

JavaScript closures, var vs let, and the loop bug
https://lainlog.com/posts/the-function-that-remembered
how a function outlives the scope it was born in — and why half of your JS bugs start there.
Sun, 19 Apr 2026 00:00:00 GMT