When to Use the Browser
Good fit for the browser
- One-off tasks that don’t need a reusable integration
- UI-only workflows where no API exists
- When source setup is blocked and you need results now
- Scraping or extracting data from web pages
- Filling forms or completing multi-step web workflows
Better with a source
- Repeatable tasks you’ll run regularly
- Team-wide automation and reporting
- Workflows that need stable, programmatic access
- When an API or MCP server already exists for the service
Core Workflow
Every browser interaction follows the same pattern:
Navigate to a page
Load a URL — the agent can navigate to any website, including ones where you’re already logged in.
Inspect the page
The agent takes a snapshot of the page: a structured accessibility tree that identifies every interactive element (buttons, links, inputs) with a reference ID like @e1, @e2, etc.
Interact
Using those references, the agent can click buttons, fill text inputs, select dropdown options, scroll, and send keyboard shortcuts.
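
To make the snapshot step concrete, here is a minimal sketch of how reference IDs could be assigned to interactive elements in an accessibility tree. It is a toy model, not the agent's actual implementation: the `Node` class, the `snapshot` function, the set of interactive roles, and the depth-first numbering order are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Roles treated as interactive in this sketch (an assumption, not the tool's real list).
INTERACTIVE_ROLES = {"button", "link", "textbox", "combobox"}


@dataclass
class Node:
    """Hypothetical accessibility-tree node: a role, a label, and children."""
    role: str
    name: str = ""
    children: List["Node"] = field(default_factory=list)


def snapshot(root: Node) -> Dict[str, Node]:
    """Walk the tree depth-first and give each interactive node a ref like @e1."""
    refs: Dict[str, Node] = {}

    def visit(node: Node) -> None:
        if node.role in INTERACTIVE_ROLES:
            refs[f"@e{len(refs) + 1}"] = node
        for child in node.children:
            visit(child)

    visit(root)
    return refs


# A made-up sign-in page used only to exercise the sketch.
page = Node("document", "Sign in", [
    Node("heading", "Sign in"),
    Node("textbox", "Email"),
    Node("textbox", "Password"),
    Node("button", "Log in"),
    Node("link", "Forgot password?"),
])

refs = snapshot(page)
for ref, node in refs.items():
    print(ref, node.role, node.name)
```

In this model the heading gets no reference because it is not interactive, and the "Log in" button ends up as @e3. An interaction such as "click @e3" is then a lookup in `refs` followed by an action on that node, which is why a fresh snapshot is needed whenever the page changes.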
What You Can Do
Navigate & Click
Open URLs, click buttons and links, go back/forward in history
Fill Forms
Type into text fields, select dropdowns, submit forms
Extract Data
Run JavaScript to query the DOM and pull structured data from any page
Screenshots
Capture full-page or targeted screenshots of specific elements or regions
Inspect Network
See what API calls a page makes — debug failures or discover internal endpoints
Keyboard Input
Send key presses and shortcuts (Enter, Escape, Cmd+K, etc.)
Permissions
Browser tools work in all permission modes, including Explore. The agent can browse, read, and extract data without switching to a higher permission level.

The agent reads a browser tools guide before its first browser interaction in each session. This ensures it uses the tools correctly and follows best practices. If you see a brief pause on the first browser action, that’s why.
Window Lifecycle
The browser window persists across interactions within a session. When the agent is done:

| Action | What happens | When to use |
|---|---|---|
| Close | Window is destroyed, all state lost | Task fully complete, browser not needed |
| Release | Agent overlay dismissed, window stays visible | Agent done, you may want to keep browsing |
| Hide | Window hidden but preserved in memory | Temporarily done, may need browser again later |
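
The three actions in the table can be read as transitions in a small state machine. The sketch below models that reading; the `BrowserWindow` class, its method names, and the `page_state` dictionary are hypothetical stand-ins, and the assumption that Release keeps page state intact is inferred from the table rather than stated in it.

```python
from enum import Enum
from typing import Optional


class WindowState(Enum):
    ACTIVE = "active"      # agent is driving the window
    RELEASED = "released"  # agent overlay dismissed, window stays visible
    HIDDEN = "hidden"      # window hidden but preserved in memory
    CLOSED = "closed"      # window destroyed, all state lost


class BrowserWindow:
    """Toy model of one session's browser window (hypothetical, for illustration)."""

    def __init__(self, url: str) -> None:
        self.state = WindowState.ACTIVE
        # Stand-in for cookies, scroll position, form input, and so on.
        self.page_state: Optional[dict] = {"url": url}

    def _require_open(self) -> None:
        if self.state is WindowState.CLOSED:
            raise RuntimeError("window was closed; open a new one")

    def release(self) -> None:
        self._require_open()
        self.state = WindowState.RELEASED

    def hide(self) -> None:
        self._require_open()
        self.state = WindowState.HIDDEN

    def close(self) -> None:
        self.state = WindowState.CLOSED
        self.page_state = None  # nothing survives a close

    @property
    def visible(self) -> bool:
        return self.state in (WindowState.ACTIVE, WindowState.RELEASED)


window = BrowserWindow("https://example.com/report")
window.hide()
print(window.visible, window.page_state)  # hidden, but page state is kept
window.close()
print(window.visible, window.page_state)  # closed, page state discarded
```

The practical difference shows up in what can happen next: a hidden or released window can be picked up again with its page state intact, while a closed window has to be reopened from scratch.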