What It Does
Browser Automation lets your AI agent control a web browser using plain English commands through a simple CLI. Powered by Stagehand and Claude, it supports navigating URLs, clicking elements, filling forms, extracting structured data, and capturing screenshots.
It works in two modes: a local Chrome browser for development, or a remote Browserbase environment for production-grade scraping with stealth mode and CAPTCHA handling. The skill switches between them automatically based on your configuration.
Key Features
- Natural Language Browser Control — Issue browser actions in plain English using commands like `browser act "click the Sign In button"`. No XPath or CSS selectors required — Stagehand interprets your intent and interacts with the correct element.
- Automatic Local vs. Remote Mode Selection — The skill detects whether Browserbase API keys are present and automatically routes to the remote Browserbase environment or falls back to a local Chrome browser. No manual mode switching or user prompting needed.
- Structured Data Extraction — Use `browser extract "<instruction>"` with an optional JSON schema to pull specific data from any page. The skill returns data in the shape you define, making downstream processing straightforward.
- Screenshots for Verification — Capture the current browser state at any point with `browser screenshot`. This is especially useful for verifying that navigation or actions completed as expected before proceeding.
- Element Discovery with Observe — When an action fails or you're unsure what's on a page, `browser observe "<query>"` surfaces the interactive elements available — helping you craft precise follow-up actions.
- Stealth Mode and CAPTCHA Handling (Browserbase) — In remote mode, Browserbase provides stealth browsing, proxy support, and CAPTCHA bypass — making this skill suitable for production scraping pipelines and sites with bot detection.
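
The commands in the list above can also be driven from a script. The following is a minimal Python sketch, not the skill's own code. It assumes the `browser` executable is on `PATH`, that each instruction is passed as a single positional argument (as in the examples above), and that a failed action is reported through a non-zero exit code. The helper names `browser_argv`, `run_browser`, and `act_with_fallback` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional

# Subcommands named in this document; extend the set if your install has more.
KNOWN_VERBS = {"act", "extract", "observe", "screenshot"}


def browser_argv(verb: str, text: Optional[str] = None) -> List[str]:
    """Build the argument list for one `browser` CLI call."""
    if verb not in KNOWN_VERBS:
        raise ValueError(f"unsupported subcommand: {verb}")
    argv = ["browser", verb]
    if text is not None:
        # Passed as one list element, so no shell quoting is needed.
        argv.append(text)
    return argv


def run_browser(verb: str, text: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run the command and capture its output (needs `browser` on PATH)."""
    if shutil.which("browser") is None:
        raise FileNotFoundError("the `browser` CLI is not on PATH")
    return subprocess.run(browser_argv(verb, text), capture_output=True, text=True)


def act_with_fallback(action: str, query: str) -> subprocess.CompletedProcess:
    """Try an action; if it fails, list candidate elements with observe."""
    result = run_browser("act", action)
    # Assumption: a failed action is signalled by a non-zero exit code.
    if result.returncode != 0:
        return run_browser("observe", query)
    return result
```

For example, `act_with_fallback("click the Sign In button", "find sign-in buttons")` would attempt the click and, if it fails, return the `observe` output so a more precise follow-up action can be written.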
Requirements
- **LLM API Key**: Powers natural language action interpretation via Claude.
- **Browserbase API Key** *(optional)*: Enables remote browser sessions with stealth mode and CAPTCHA support. Requires both `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID`.
- **Local Chrome** *(optional)*: Required only when running in local mode without Browserbase credentials.
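
The mode selection described in this document can be pictured as a check on the two Browserbase variables. The sketch below is one plausible reading of that rule, not the skill's actual implementation. The function name `select_mode` is hypothetical, and treating an empty or whitespace-only value as "not set" is an assumption.

```python
import os
from typing import Mapping, Optional

# Both variables must be present for remote mode, per the requirements above.
REMOTE_VARS = ("BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID")


def select_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "remote" when both Browserbase variables are set, else "local"."""
    env = os.environ if env is None else env
    # Assumption: blank values count as missing.
    if all(env.get(name, "").strip() for name in REMOTE_VARS):
        return "remote"
    return "local"
```

Calling `select_mode()` with no argument reads the current process environment. Passing a plain dictionary makes the rule easy to check without touching real credentials.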
Use Cases
- Automated competitive research — Navigate competitor websites, extract pricing tables or product listings with a defined schema, and feed the structured results into an analysis workflow — all without writing a custom scraper.
- Form-based workflow automation — Have an agent log into a web portal, navigate to a form, fill in fields using natural language actions, and submit — automating repetitive data-entry tasks that lack an API.
- Visual QA and regression testing — After deploying a web application, use the skill to navigate key pages and take screenshots at each step, giving you a visual audit trail to confirm UI correctness.
- Production web scraping pipeline — Run the skill in Browserbase mode to scrape data from sites with anti-bot measures, using stealth mode and proxy support to reliably collect data at scale.
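
For the research and scraping use cases above, it can help to check extracted records before they reach downstream analysis. The sketch below is illustrative only. The schema format (field names mapped to simple type names), passing the schema as a trailing JSON argument to `browser extract`, and the names `PRICING_SCHEMA`, `extract_argv`, and `check_record` are all assumptions, not documented behavior of the skill.

```python
import json
from typing import Any, Dict, List

# Hypothetical schema for a pricing table: field name -> simple type name.
PRICING_SCHEMA: Dict[str, str] = {"plan": "string", "monthly_price": "number"}

_TYPES = {"string": str, "number": (int, float)}


def extract_argv(instruction: str, schema: Dict[str, str]) -> List[str]:
    """Build an extract call; the trailing JSON schema argument is an assumption."""
    return ["browser", "extract", instruction, json.dumps(schema)]


def check_record(record: Dict[str, Any], schema: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for field, type_name in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _TYPES[type_name]):
            problems.append(f"wrong type for {field}: expected {type_name}")
    return problems
```

A record that passes `check_record` with an empty result can be handed to the analysis step. Anything else is worth a screenshot and a retry before the pipeline continues.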