# Scrape Your Browser Into Your Second Brain
Building with an AI agent? Give your agent this URL to fetch and read. The Implementation Guide below has everything it needs.
You’re staring at a web app — a hosting panel, a SaaS dashboard, your DNS settings — and you want that information in your vault. Not as a bookmark. Not as a screenshot you’ll never look at again. As structured, searchable notes you can actually use.
The options aren’t great. You can screenshot the page and hand it to an LLM to transcribe — but you lose table structure, metadata, and anything that wasn’t visible on screen. You can copy-paste, but web apps render tables in ways that turn to mush in a text editor. Or you can give an AI agent your credentials and let it log in directly — which means trusting it with access you wouldn’t hand to a coworker.
What changed recently: Chrome now lets external tools connect to a browser session you already have open. It was built for developers debugging websites — inspect the DOM, watch network traffic, run JavaScript against a live page. But there’s nothing stopping you from using it to collect information. Your agent connects to your browser, sees what you see, and reads the data. No credentials change hands. You’re already logged in.
## A Real Example
I was migrating domains from Dreamhost to Cloudflare. That meant extracting everything from Dreamhost first — 81 email forwarding rules, a dozen hosted websites with their PHP versions and configurations — making sure I had it right, and then rebuilding it on Cloudflare. The kind of thing where you’d normally open two browser tabs and copy settings back and forth, taking notes as you go so you don’t lose track of what’s been moved.
Instead, I had Claude connect to my open Dreamhost panel, pull the data, and write it into my vault as structured notes. Two minutes. The email rules came back as a grouped list, the websites as a markdown table. Now I had a record of the original configuration I could reference while setting things up on the other side — and it stays in my vault as permanent documentation.
```markdown
## Email Forwarding

### dreamhost.com
- admin → [email protected]
- billing → [email protected], [email protected]
- support → [email protected]

### otherdomain.net
- catch-all → [email protected]

## Hosted Websites

| Domain | PHP | Status | Monthly Visits |
|--------|-----|--------|----------------|
| example.com | 8.2 | Active | 12,400 |
| blog.example.com | 8.1 | Active | 3,200 |
```

## How It Works
Chrome DevTools MCP connects your AI agent to the browser session you already have open. You navigate to the page. The agent reads what’s there — or better, reads the API responses sitting in the network tab. The browser is already a pre-authenticated API client; the agent just sees what it fetched.
## The Surprise: The DOM Will Lie to You
Here’s where it got interesting. My first instinct was to have the agent grab table rows directly from the page. It found 12 entries out of 81.
Modern web apps use virtual rendering — libraries like virtuoso that only create DOM elements for rows visible on screen. You scroll and see 80 rows, but the DOM holds a handful at a time. Standard queries return incomplete data.
The data had to come from somewhere. Virtual lists don’t generate data — they render it. The actual payload sits in a network response: a single JSON object from an API call. The agent filtered network requests, found the endpoint that loaded the page data, and read the response body directly. Dreamhost’s panel fetched from /api3/v1/emails/?size=20 — a clean JSON array with every field.
Since the agent runs JavaScript in the browser context — already authenticated — it re-called the same API with a larger page size. No pagination, no scrolling, no DOM scraping. Just the raw data, structured and complete.
## What I Learned
The network tab is the real interface to a modern web app. SPAs fetch everything as JSON from internal APIs, and that structured data is cleaner than anything you’d extract from rendered HTML. Once you see this, “scraping” stops being the right word. You’re reading the same API the app reads. The browser already did the hard part.
This pattern works for anything behind a login: SaaS dashboards, admin panels, config pages, billing portals. The data already exists in structured form — it’s just trapped behind GUIs designed for humans to look at, not for agents to read. Chrome DevTools MCP turns your browser into a bridge between those web apps and your vault.
## Implementation Guide

For AI agents and technical readers. Everything below is the nuts and bolts: architecture, setup, key patterns, and gotchas. If you're an agent implementing this for a user, start here.
### Architecture
```
User's browser (authenticated session)
  ↕ Chrome DevTools Protocol (port 9222)
Chrome DevTools MCP Server
  ↕ MCP tool calls
Claude Code (or any MCP-capable agent)
  → reads network responses (structured JSON)
  → formats as markdown
  → writes to Obsidian vault
```

No credentials are exchanged. The agent connects to the browser's debug port and reads data from the session the user already has open.
### Setup: Chrome with Remote Debugging
Chrome must be launched with the debugging flag. Fully quit Chrome first (Cmd+Q on Mac) — if an existing process is running, macOS reuses it and ignores the flag.
```shell
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
```

Verify it's working:
```shell
curl http://localhost:9222/json/version
```

If that returns JSON, you're set. If you get a 404 or no response, Chrome didn't pick up the flag: quit it fully and relaunch.
### MCP Server Configuration
Add Chrome DevTools MCP to your Claude Code config (`.mcp.json` in the project root, or via `claude mcp add`):
```json
{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["-y", "chrome-devtools-mcp@latest"]
    }
  }
}
```

### Key Pattern: Screenshot → Network Tab → Format
Step 1: Orient with a screenshot. Confirm the agent sees the right page. This creates shared context.
Step 2: Skip the DOM. Virtual rendering means DOM queries return incomplete data. Look for signs: data-virtuoso-scroller, dynamically inserted rows, elements with display: none. If present, DOM scraping will miss most of the data.
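That check can be made mechanical. Here is a sketch of a marker scan over whatever HTML snapshot the agent has on hand; the marker strings are best-guess examples for common libraries, not an exhaustive or authoritative list:

```javascript
// Heuristic check for virtual rendering, given a page's HTML as a string.
// Marker strings below are illustrative examples, not a complete list.
function looksVirtualized(html) {
  const markers = [
    'data-virtuoso-scroller',   // React Virtuoso
    'ag-body-viewport',         // ag-Grid
    'ReactVirtualized__Grid',   // react-virtualized
  ];
  const hit = markers.find((m) => html.includes(m));
  return hit ? { virtualized: true, marker: hit } : { virtualized: false, marker: null };
}

// A fragment rendered by React Virtuoso trips the heuristic:
const sample = '<div data-virtuoso-scroller="true" style="height: 400px"></div>';
console.log(looksVirtualized(sample));
```

A hit means DOM scraping will likely return a partial result, so go straight to the network tab.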
Step 3: Read network responses. Filter to xhr/fetch requests. Find the API endpoint that loaded the page data. Read the response body — it’s usually a single JSON payload with everything.
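One way to pick out that endpoint programmatically is to rank captured responses by how many records they carry. A sketch, assuming the agent holds each request as a `{ url, resourceType, body }` object; these field names are illustrative, not the MCP server's actual schema:

```javascript
// Sketch: find the likeliest data endpoint among captured requests by
// counting the records in each JSON response body.
function findDataEndpoint(requests) {
  return requests
    .filter((r) => r.resourceType === 'xhr' || r.resourceType === 'fetch')
    .map((r) => {
      try {
        const json = JSON.parse(r.body);
        // A top-level array, or the largest array-valued field.
        const records = Array.isArray(json)
          ? json.length
          : Math.max(0, ...Object.values(json).map((v) => (Array.isArray(v) ? v.length : 0)));
        return { url: r.url, records };
      } catch {
        return null; // not JSON — skip
      }
    })
    .filter(Boolean)
    .sort((a, b) => b.records - a.records)[0] ?? null;
}
```

Against the Dreamhost example, the `/api3/v1/emails/` response would outrank pings and static assets because it carries the largest array.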
Step 4: Re-fetch with larger page size. The agent can execute JavaScript in the browser context (already authenticated). If the original request was paginated:
```javascript
const resp = await fetch('/api3/v1/emails/?size=100&sorts[]=domain_asc');
const data = await resp.json();
// data.emails contains all records as structured JSON
```

Step 5: Format and write. Structure the JSON as markdown suited to the user's vault: tables, grouped lists, whatever fits the existing note format. Append to the relevant note rather than creating a new file.
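Step 5 can be sketched as a small formatter that groups records by domain and emits the list format from the example above. The record fields here (`domain`, `alias`, `destinations`) are assumptions about the API's shape, not a documented schema, and the addresses are placeholders:

```javascript
// Sketch: group email-forwarding records by domain and emit grouped
// markdown. Field names are assumed, not Dreamhost's documented schema.
function emailRulesToMarkdown(emails) {
  const byDomain = new Map();
  for (const e of emails) {
    if (!byDomain.has(e.domain)) byDomain.set(e.domain, []);
    byDomain.get(e.domain).push(`- ${e.alias} → ${e.destinations.join(', ')}`);
  }
  const sections = [...byDomain.entries()].map(
    ([domain, rules]) => `### ${domain}\n${rules.join('\n')}`
  );
  return `## Email Forwarding\n\n${sections.join('\n\n')}`;
}

// Hypothetical records standing in for data.emails:
const md = emailRulesToMarkdown([
  { domain: 'dreamhost.com', alias: 'admin', destinations: ['[email protected]'] },
  { domain: 'dreamhost.com', alias: 'billing', destinations: ['[email protected]', '[email protected]'] },
]);
console.log(md);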
### Gotchas
- Chrome flag doesn't persist. Must relaunch with `--remote-debugging-port=9222` each session. Consider a shell alias.
- Navigate first. The agent sees whatever page is open. The user must be on the right page before the agent starts reading.
- Virtual rendering is common. Any app using React Virtuoso, react-window, ag-Grid virtual mode, or similar will have incomplete DOMs. Always check the network tab first.
- Auth tokens are visible. The agent can see cookies and headers in network responses. Session-scoped, but be aware.
- Some sites detect DevTools. Rare for admin panels, but possible. If the page behaves differently with DevTools open, the site may be checking.
### When to Use This Pattern
- Documenting SaaS tool configurations (hosting panels, DNS, email rules)
- Pulling data from dashboards behind auth (analytics, billing, usage stats)
- Capturing settings from admin interfaces you’d never manually transcribe
- Any web app where the data is structured but the export options are poor