Measured, published in full

Reflex finishes the same flows in a fraction of the round trips.

TL;DR: 6 tool calls and ~13K tokens vs 14 calls and ~56K for Playwright MCP across the same three measured flows.

Across the 3 flows verified head-to-head against Playwright MCP, the free default for Claude Desktop and other pure MCP clients, Reflex completed the same tasks in 2 tool calls per flow, 6 in total, where Playwright MCP needed 14. Agent round trips, the model turns where it stops to read output, are what dominate real wall clock, and a batched Reflex flow spends far fewer of them.

Agent round trips, 3 flows (fewer is better)
Playwright MCP
14 turns
Reflex
6 turns

Tool calls match turns exactly: 14 vs 6. One task is 2-3 Reflex calls against 6-14 MCP round trips.

Reflex: 5/6 tasks completedTied best first-try success0-token deterministic replay

A fraction of the tokens into context

Every token a tool puts into the agent's context is a token the agent pays for and a token it can no longer spend on the actual job. Over the 3 verified flows, Reflex sent ~13K tokens into context where Playwright MCP sent ~56K: roughly 4x fewer. Reflex distills each page once, then sends deltas instead of re-dumping.

Tokens into agent context, 3 flows (fewer is better)
Playwright MCP
~56K tok
Reflex
~13K tok

The amber bar is the context Reflex hands back to you to spend on the task.

The context window is the bigger half of the story. One browser_snapshot look at the W3C CSS Grid spec is ~169K tokens, most of a 200K window gone in a single tool call before the agent has done anything. The same look through Reflex is ~7K tokens. Context the agent keeps is context it spends on the work you asked for.

The cheapest look on every page

The cost an agent pays to read a page it does not already know is the exploration price, and Reflex is the cheapest on every page tested. Playwright MCP's browser_snapshot inlines the full accessibility tree; Reflex distills it. Bars are scaled within each chart.

Wikipedia article
Playwright MCP
~31K tok
Reflex
~4K tok
GitHub repo
Playwright MCP
~21K tok
Reflex
~3K tok
Hacker News
Playwright MCP
~15K tok
Reflex
~2K tok
W3C CSS Grid spec
Playwright MCP
~169K tok
Reflex
~7K tok

~26x cheaper. This is the architectural win: distillation with folding, not incremental trimming.

The cheapest look across the wider field

If your agent has a shell, the field is wider than Playwright MCP: Microsoft's token-efficient Playwright CLI and Vercel's agent-browser are strong free alternatives. Reflex is still the cheapest look on every page, and on pathological pages the gap is architectural.

PagePW MCP snapshotPW CLI yml fileagent-browser -iReflex
Wikipedia article~31K tok~31K tok~4K tok~4K tok
GitHub repo~21K tok~21K tok~6K tok~3K tok
Hacker News~15K tok~14K tok~3K tok~2K tok
W3C CSS Grid spec~169K tok~171K tok~37K tok~7K tok

More reasons Reflex wins

Methodology

Measured 2026-06-12 on macOS, same machine, live public sites, with no LLM on either side. Versions pinned: @playwright/mcp 0.0.76, @playwright/cli 0.1.14, agent-browser 0.27.2 (Vercel). Same 6 real-world flows, same end-condition verification, each tool driven per its own agent documentation. Tool-side wall clock is a second or two slower per flow than the fastest tool, while the agent round trips that dominate real time are far fewer. Reproduce it yourself: node bench/headtohead.js (competitor tools install under bench/tools/).

See the install docs or the pricing to get started.

Ready when you are

Same browser tasks. ~13K tokens, not ~56K.

Installed in 2 minutes. Your pages stay yours.