What WebMCP Actually Changes
Most coverage focuses on exposing existing features to agents — here’s your flight search form, now an agent can use it. That’s fine, but it’s the boring version of the question. The interesting version: what would you build differently if you assumed a human and their agent would use it together, from the start?
The browser is becoming an agent runtime. Copilot ships in every copy of Edge. Gemini Nano is in Chrome. The agent doesn’t live in a separate app — it travels with the user. WebMCP (experimental in Chrome 146) is how pages decide whether that agent finds a wall or a door.
Three things actually change.
The interface becomes callable. Without WebMCP, agents screen-scrape: slow, brittle, and part of the automated traffic that already accounts for 51% of the web. With WebMCP, a page exposes callable tools such as filterListings(budget, schoolDistrict) or addConcept(name, position). It's the same shift fintech made when it moved from screen-scraping to Open Banking APIs: agents stop guessing at the UI and start calling functions.
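To make that concrete, here's a rough sketch of what exposing filterListings could look like. The registration surface (navigator.modelContext, registerTool, the tool shape) is an assumed placeholder, not the shipped API, and applyFilters stands in for whatever filtering logic the page already has:

```ts
// Sketch only: registerTool and its shape are placeholders for whatever
// surface WebMCP actually ships; the point is the contract, not the names.
type ListingFilter = { budget: number; schoolDistrict?: string };

// Assumed existing app logic: the same function the on-page filter panel calls.
declare function applyFilters(filter: ListingFilter): { id: string }[];

const modelContext = (navigator as any).modelContext; // hypothetical entry point

modelContext?.registerTool({
  name: "filterListings",
  description: "Filter the visible listings by budget and school district.",
  inputSchema: {
    type: "object",
    properties: {
      budget: { type: "number", description: "Maximum price in USD" },
      schoolDistrict: { type: "string" },
    },
    required: ["budget"],
  },
  // The agent's call goes through the same code path as a human click,
  // so both end up looking at the same filtered state on the page.
  async execute({ budget, schoolDistrict }: ListingFilter) {
    const results = applyFilters({ budget, schoolDistrict });
    return { matches: results.length };
  },
});
```

The contract is the interesting part: a name, a schema an agent can read, and a handler that runs the same code a human-triggered filter would.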
The shared canvas becomes possible. Chat-only AI can’t do spatial things. High-consideration decisions — house, software, venue — require visual comparison, changing your mind, going back. A well-built WebMCP app gives both the human and the agent the same canvas to work on. The human clicks a node; the agent expands it. The agent surfaces three options; the human rejects two. Neither is driving — they’re collaborating on something that exists in the page.
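Here's a sketch of the shared-canvas idea under the same assumed registerTool surface as above: one in-page model, mutated by clicks and by tool calls alike, with render standing in for whatever redraws the nodes:

```ts
// One source of truth that both the human and the agent edit.
type Concept = { name: string; position: { x: number; y: number } };

const canvasModel: Concept[] = [];

// Assumed app logic: redraws the nodes so each party sees the other's edits.
declare function render(): void;

function addConcept(name: string, position: { x: number; y: number }) {
  canvasModel.push({ name, position });
  render();
}

// Human path: a click on the canvas calls the mutation directly.
document.querySelector("#canvas")?.addEventListener("click", (e) => {
  const { offsetX, offsetY } = e as MouseEvent;
  addConcept("untitled", { x: offsetX, y: offsetY });
});

// Agent path: the exposed tool calls the identical mutation, nothing more.
(navigator as any).modelContext?.registerTool({
  name: "addConcept",
  description: "Add a named concept node to the shared canvas.",
  inputSchema: {
    type: "object",
    properties: {
      name: { type: "string" },
      x: { type: "number" },
      y: { type: "number" },
    },
    required: ["name", "x", "y"],
  },
  async execute({ name, x, y }: { name: string; x: number; y: number }) {
    addConcept(name, { x, y });
    return { total: canvasModel.length };
  },
});
```

Neither party gets a privileged path; the agent's tool call and the human's click converge on the same state, which is what makes the collaboration feel like one workspace rather than two.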
The agent carries context to you, not from you. It arrives already knowing the user’s budget, preferences, past choices. You don’t have to ask. That’s a different starting point for a session than the blank form.
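One possible shape for that handoff, again using the assumed surface from the sketches above: the page exposes a setup tool the agent calls once at the start of a session, carrying what it already knows, so the page never renders the blank form. Every name here (applyUserContext, prefillFilters) is illustrative, not a spec:

```ts
// Context the user's agent carries in with it.
type SessionContext = {
  budget?: number;
  preferredDistricts?: string[];
  previouslyRejected?: string[]; // listing ids the user already passed on
};

// Assumed app logic: scopes the initial view instead of showing an empty form.
declare function prefillFilters(ctx: SessionContext): void;

(navigator as any).modelContext?.registerTool({
  name: "applyUserContext",
  description:
    "Seed this session with what the user's agent already knows: budget, preferences, past rejections.",
  inputSchema: {
    type: "object",
    properties: {
      budget: { type: "number" },
      preferredDistricts: { type: "array", items: { type: "string" } },
      previouslyRejected: { type: "array", items: { type: "string" } },
    },
  },
  async execute(ctx: SessionContext) {
    prefillFilters(ctx); // the session opens already scoped
    return { applied: true };
  },
});
```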
What I haven’t worked out yet: when does a shared canvas actually beat just letting the agent handle everything in chat? Not every interaction needs a visual workspace. The question of when to build a full collaborative interface versus just exposing clean tools is genuinely unresolved. I suspect it has something to do with whether the decision requires the human to see something to know what they want — but that’s a hypothesis, not an answer.
There’s also an economics angle worth its own note. Local inference (Gemini Nano running in-browser) means zero marginal cost per user. That unlocks categories of applications that couldn’t exist with server-side AI — accessibility tools, personal therapy apps, private knowledge tools — because the unit economics never worked at the free tier. That’s a separate thread.