Craft Agent includes a built-in Chromium browser that your agent can control directly. Navigate pages, fill forms, click buttons, extract data, run JavaScript, and inspect network traffic — all without leaving the conversation.

When to Use the Browser

Good fit for the browser

  • One-off tasks that don’t need a reusable integration
  • UI-only workflows where no API exists
  • When source setup is blocked and you need results now
  • Scraping or extracting data from web pages
  • Filling forms or completing multi-step web workflows

Better with a source

  • Repeatable tasks you’ll run regularly
  • Team-wide automation and reporting
  • Workflows that need stable, programmatic access
  • When an API or MCP server already exists for the service

Core Workflow

Every browser interaction follows the same pattern:
1. Open the browser
The agent opens a browser window in the background (or reuses an existing one).

2. Navigate to a page
Load a URL. The agent can navigate to any website, including ones where you’re already logged in.

3. Inspect the page
The agent takes a snapshot of the page: a structured accessibility tree that identifies every interactive element (buttons, links, inputs) with a reference ID like @e1, @e2, etc.

4. Interact
Using those references, the agent can click buttons, fill text inputs, select dropdown options, scroll, and send keyboard shortcuts.

5. Extract or verify
Read the results: extract data with JavaScript, take screenshots for visual verification, or inspect network traffic to understand what happened.
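
To make the snapshot step concrete, here is a minimal sketch of how reference IDs such as @e1 and @e2 could be assigned to interactive elements. It is an illustrative model only: the `AxNode` shape, the set of roles treated as interactive, and the `assignRefs` function are assumptions made for this example, not Craft Agent's actual snapshot format or API.

```typescript
// Hypothetical model of a page snapshot; not Craft Agent's actual format.
interface AxNode {
  role: string;
  name: string;
  children?: AxNode[];
}

interface RefEntry {
  ref: string;
  role: string;
  name: string;
}

// Roles treated as interactive in this sketch (an assumption).
const INTERACTIVE_ROLES = new Set(["button", "link", "textbox", "combobox", "checkbox"]);

// Walk the tree depth-first and give each interactive node the next @eN reference.
function assignRefs(root: AxNode): RefEntry[] {
  const entries: RefEntry[] = [];
  const visit = (node: AxNode): void => {
    if (INTERACTIVE_ROLES.has(node.role)) {
      entries.push({ ref: `@e${entries.length + 1}`, role: node.role, name: node.name });
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return entries;
}

const loginPage: AxNode = {
  role: "main",
  name: "Sign in",
  children: [
    { role: "heading", name: "Welcome back" },
    { role: "textbox", name: "Email" },
    { role: "textbox", name: "Password" },
    { role: "button", name: "Sign in" },
  ],
};

const refs = assignRefs(loginPage);
for (const entry of refs) {
  console.log(`${entry.ref} ${entry.role} "${entry.name}"`);
}
// @e1 textbox "Email"
// @e2 textbox "Password"
// @e3 button "Sign in"
```

In this model the heading gets no reference because it is not interactive, which is why an interaction step can address "the Sign in button" simply as @e3.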

What You Can Do

Navigate & Click

Open URLs, click buttons and links, go back/forward in history

Fill Forms

Type into text fields, select dropdowns, submit forms

Extract Data

Run JavaScript to query the DOM and pull structured data from any page
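
As an illustration of the kind of script that pulls structured data from a page, the sketch below turns table rows into arrays of cell text using the standard `querySelectorAll` DOM method. The `NodeLike` interface, the `extractTableRows` function, and the stand-in objects are hypothetical names introduced here so the example runs outside a browser; in a real page, `document` should satisfy the same shape.

```typescript
// Minimal structural type so the sketch runs outside a browser.
// In a page, `document` is expected to satisfy it.
interface NodeLike {
  textContent: string | null;
  querySelectorAll(selector: string): ArrayLike<NodeLike>;
}

// Turn each matching row into an array of trimmed cell strings.
function extractTableRows(root: NodeLike, rowSelector = "table tr"): string[][] {
  return Array.from(root.querySelectorAll(rowSelector))
    .map((row) =>
      Array.from(row.querySelectorAll("th, td")).map((cell) => (cell.textContent ?? "").trim()),
    )
    .filter((cells) => cells.length > 0); // drop rows without cells
}

// Stand-in objects that mimic a two-row table (illustration only).
const fakeCell = (text: string): NodeLike => ({
  textContent: text,
  querySelectorAll: () => [],
});
const fakeRow = (cells: string[]): NodeLike => ({
  textContent: null,
  querySelectorAll: () => cells.map(fakeCell),
});
const fakeDocument: NodeLike = {
  textContent: null,
  querySelectorAll: () => [fakeRow(["Plan", "Price"]), fakeRow([" Pro ", "$20"])],
};

console.log(JSON.stringify(extractTableRows(fakeDocument)));
// [["Plan","Price"],["Pro","$20"]]
```

Returning plain arrays of strings keeps the result easy to serialize, so it can be handed back to the conversation as JSON.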

Screenshots

Capture full-page screenshots, or target specific elements or regions

Inspect Network

See what API calls a page makes — debug failures or discover internal endpoints
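
The sketch below shows one way captured traffic could be summarized to find failing calls and list the endpoints a page uses. The `NetworkEntry` shape, its field names, and the example URLs are assumptions for illustration; the format Craft Agent actually reports is not specified here.

```typescript
// Hypothetical shape of one captured request; field names are assumptions.
interface NetworkEntry {
  method: string;
  url: string;
  status: number;
  resourceType: string;
}

// Keep only XHR/fetch traffic, list distinct endpoints, and collect failures.
function summarizeApiCalls(entries: NetworkEntry[]): { endpoints: string[]; failures: NetworkEntry[] } {
  const apiCalls = entries.filter((e) => e.resourceType === "xhr" || e.resourceType === "fetch");
  // Strip the query string so paginated calls collapse into one endpoint.
  const endpoints = Array.from(new Set(apiCalls.map((e) => `${e.method} ${e.url.replace(/\?.*$/, "")}`)));
  const failures = apiCalls.filter((e) => e.status >= 400);
  return { endpoints, failures };
}

const captured: NetworkEntry[] = [
  { method: "GET", url: "https://app.example.com/api/items?page=1", status: 200, resourceType: "fetch" },
  { method: "GET", url: "https://app.example.com/api/items?page=2", status: 200, resourceType: "fetch" },
  { method: "POST", url: "https://app.example.com/api/orders", status: 500, resourceType: "xhr" },
  { method: "GET", url: "https://app.example.com/logo.png", status: 200, resourceType: "image" },
];

const summary = summarizeApiCalls(captured);
console.log(summary.endpoints);
// [ 'GET https://app.example.com/api/items', 'POST https://app.example.com/api/orders' ]
console.log(summary.failures.map((e) => `${e.status} ${e.method} ${e.url}`));
// [ '500 POST https://app.example.com/api/orders' ]
```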

Keyboard Input

Send key presses and shortcuts (Enter, Escape, Cmd+K, etc.)

Permissions

Browser tools work in all permission modes, including Explore. The agent can browse, read, and extract data without switching to a higher permission level.
The agent reads a browser tools guide before its first browser interaction in each session. This ensures it uses the tools correctly and follows best practices. If you see a brief pause on the first browser action, that’s why.

Window Lifecycle

The browser window persists across interactions within a session. When the agent is done:
| Action | What happens | When to use |
|---|---|---|
| Close | Window is destroyed, all state lost | Task fully complete, browser not needed |
| Release | Agent overlay dismissed, window stays visible | Agent done, you may want to keep browsing |
| Hide | Window hidden but preserved in memory | Temporarily done, may need browser again later |
Closing the browser window via the OS close button hides it rather than destroying it — the agent can re-open it instantly.
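
The lifecycle rules above can be read as a small state machine. The sketch below is a hypothetical model: the state names, action names, and the `applyAction` and `isStatePreserved` functions are invented for illustration and do not correspond to a Craft Agent API.

```typescript
// Hypothetical states and actions modeled on the lifecycle described above.
type BrowserWindowState = "agent-controlled" | "released" | "hidden" | "destroyed";
type LifecycleAction = "close" | "release" | "hide" | "os-close-button";

function applyAction(state: BrowserWindowState, action: LifecycleAction): BrowserWindowState {
  if (state === "destroyed") return "destroyed"; // nothing left to act on
  switch (action) {
    case "close":
      return "destroyed"; // all state lost
    case "release":
      return "released"; // overlay dismissed, window stays visible
    case "hide":
    case "os-close-button":
      return "hidden"; // window preserved in memory
  }
}

// Page state survives every outcome except an explicit close.
function isStatePreserved(state: BrowserWindowState): boolean {
  return state !== "destroyed";
}

console.log(applyAction("agent-controlled", "os-close-button")); // hidden
console.log(isStatePreserved(applyAction("agent-controlled", "close"))); // false
```

The point the model makes explicit is that the OS close button behaves like Hide, not like Close, so only an explicit Close discards the session's browser state.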