Doramagic Project Pack · Human Manual

browser-use

🌐 Make websites accessible for AI agents. Automate tasks online with ease.

Overview, Installation, and CLI

Related topics: Agent Runtime, Tools, and System Prompts; LLM Providers, MCP, Cloud, and Integrations

1. What is Browser-Use

Browser-Use is an open-source library that enables AI agents to control a real web browser. The agent operates in an iterative loop: it observes the current page, decides on the next action, executes it through the browser, and repeats until the user's request is fulfilled. The core agent loop is described in the system prompt, which states the agent is "designed to operate in an iterative loop to automate browser tasks" with capabilities spanning navigation, form submission, content extraction, and persistent file-system tracking (system_prompt.md).
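
The observe, decide, act cycle described above can be sketched in plain Python. This is an illustration of the loop shape only, not the library's implementation: `run_loop`, `LoopResult`, and the toy counter "page" are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    done: bool
    steps: int
    log: List[str] = field(default_factory=list)


def run_loop(
    observe: Callable[[], str],
    decide: Callable[[str], str],
    act: Callable[[str], None],
    max_steps: int = 10,
) -> LoopResult:
    """Observe, decide, act; stop when decide() returns 'done' or steps run out."""
    log: List[str] = []
    for step in range(1, max_steps + 1):
        state = observe()
        action = decide(state)
        log.append(f"step {step}: {state} -> {action}")
        if action == "done":
            return LoopResult(done=True, steps=step, log=log)
        act(action)
    return LoopResult(done=False, steps=max_steps, log=log)


# Toy "browser": a counter that the agent increments until it reaches 3.
page = {"count": 0}
result = run_loop(
    observe=lambda: f"count={page['count']}",
    decide=lambda state: "done" if state == "count=3" else "increment",
    act=lambda action: page.update(count=page["count"] + 1),
)
print(result.done, result.steps)
```

The `max_steps` bound matters in practice: without it, a model that never emits a terminal action would loop forever.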

The project ships three primary surfaces:

  • Python library: imported via from browser_use import Agent, Browser, ChatBrowserUse for programmatic use.
  • Browser Use CLI 2.0: a direct-CDP browser automation daemon optimized for AI coding agents such as Claude Code and Codex.
  • Actor API: a low-level CDP wrapper exposing BrowserSession, Page, Element, and Mouse classes (actor/README.md).
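
For the Python library surface, a minimal usage sketch might look like the following. It uses only the three names from the import line above plus `agent.run()`; the task string is a placeholder, and it is an assumption that the chosen model's credentials are already available in the environment. The import sits inside the function so the file can be loaded even where browser-use is not installed.

```python
import asyncio


async def run_task(task: str):
    # Imported lazily so this module loads without browser-use installed.
    from browser_use import Agent, Browser, ChatBrowserUse

    browser = Browser()
    llm = ChatBrowserUse()  # assumption: credentials are set in the environment
    agent = Agent(task=task, llm=llm, browser=browser)
    return await agent.run()


# To try it (requires browser-use and valid credentials):
# asyncio.run(run_task("Open example.com and report the page title"))
```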

The library is licensed under MIT, with services and data policy governed by the project's Terms of Service (README.md).

2. Installation

2.1 Standard pip install

The canonical installation is pip install browser-use. The library is published on PyPI and pulls in core dependencies required for agent operation, including async HTTP, Pydantic, and CDP bindings (README.md).

2.2 Optional dependencies

Several integrations are kept out of the core install to avoid supply-chain risk and to keep the package lightweight:

| Optional package | Purpose | Notes |
| --- | --- | --- |
| litellm | Multi-provider LLM routing via ChatLiteLLM | Removed from core deps in v0.12.5 due to a supply-chain incident; install separately with pip install litellm if needed |
| LLM provider SDKs | OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI | Installed per-provider as required |
| uv | Fast Python package manager | Detected and used by install scripts; a fix in v0.12.4 added detection for curl-installed uv |

The v0.12.5 release notes explicitly call out: "pip install browser-use no longer installs litellm … ChatLiteLLM wrapper is preserved - install litellm separately if needed" (README.md).
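
Because litellm is no longer pulled in automatically, code that depends on it may want to fail early with a clear message. The sketch below uses only the standard library; `has_module` and `require_optional` are hypothetical helper names, not part of browser-use.

```python
import importlib.util
from typing import Optional


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(name) is not None


def require_optional(name: str, pip_name: Optional[str] = None) -> None:
    """Raise a helpful error when an optional dependency is missing."""
    if not has_module(name):
        raise ImportError(
            f"Optional dependency '{name}' is not installed; "
            f"run: pip install {pip_name or name}"
        )


print("litellm available:", has_module("litellm"))
```

Calling `require_optional("litellm")` before constructing a LiteLLM-backed model turns a late import failure into an actionable install hint.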

2.3 Versioning and upgrade

The project is currently in the 0.12.x release series, with frequent point releases. Recent releases and their themes:

  • 0.12.3: Browser Use CLI 2.0 introduced; built on direct CDP for ~50ms command latency.
  • 0.12.4: Pinned litellm version after a CVE in aiohttp was patched.
  • 0.12.5: Removed litellm from core deps; raised aiohttp to 3.13.4 to patch a memory-exhaustion vulnerability.
  • 0.12.6: Default temperature set to 1.0 for Gemini-3 models; Bedrock structured-output fix.
  • 0.12.7: Major CLI refactor plus security fixes.
  • 0.12.8: Daemon unix socket restricted to owner-only; evaluate() refused on restricted browser profiles.
  • 0.12.9: Session id passed to judge LLM calls; new-tab pages skip screenshots.

Always upgrade with pip install -U browser-use to inherit the latest security and stability fixes (examples/apps/news-use/README.md).
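
To check whether an environment is behind a given release, one option is to read the installed version through the standard library. This is a sketch: `parse_version` handles only simple dotted numeric versions, and the helper names are illustrative.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.12.9' into (0, 12, 9), keeping the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def needs_upgrade(package: str, minimum: str) -> bool:
    """True when the package is missing or older than the minimum version."""
    current = installed_version(package)
    return current is None or parse_version(current) < parse_version(minimum)


print(installed_version("browser-use"), needs_upgrade("browser-use", "0.12.9"))
```

Comparing tuples rather than strings avoids the classic mistake of ranking "0.12.10" below "0.12.9".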

3. Browser Use CLI 2.0

The CLI 2.0 launch (v0.12.3) is positioned as "the fastest browser automation for AI coding agents", claiming 2x faster execution and 50% fewer tokens relative to the previous Playwright-backed pipeline. The architectural shift is from Playwright orchestration to direct Chrome DevTools Protocol (CDP) communication with a persistent background daemon.

flowchart LR
    A[CLI command] --> B[CLI 2.0 client]
    B -->|IPC / Unix socket| C[Persistent daemon]
    C -->|CDP over WebSocket| D[Chromium browser]
    C --> E[Session state and history]
    D --> F[Target page]

Key design points:

  • Persistent background daemon: eliminates browser startup overhead between commands, giving the ~50ms command latency advertised in the release notes.
  • Owner-only unix socket: added in v0.12.8 to prevent local privilege escalation through the daemon socket.
  • Codex / Claude Code integration: the CLI is the recommended interface when wiring browser-use into AI coding agents. A community request to support codex-cli without an API key (#4895) tracks ongoing work in this area.
  • Security profile: the CLI refuses evaluate() calls on restricted browser profiles (v0.12.8) to prevent arbitrary JS execution in locked-down contexts.
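
The owner-only socket idea above is a general Unix technique. The sketch below shows one common way to do it with the standard library; it is not taken from the project's daemon code, and `bind_owner_only` and the socket file name are hypothetical. It runs on Unix-like systems only.

```python
import os
import socket
import stat
import tempfile


def bind_owner_only(path: str) -> socket.socket:
    """Bind a Unix socket that only the owning user can read or write."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    old_umask = os.umask(0o177)  # files created now get at most rw for the owner
    try:
        server.bind(path)
    finally:
        os.umask(old_umask)  # always restore the process umask
    os.chmod(path, 0o600)  # explicit, in case the platform ignores umask here
    server.listen(1)
    return server


with tempfile.TemporaryDirectory() as tmp:
    sock_path = os.path.join(tmp, "daemon.sock")
    server = bind_owner_only(sock_path)
    mode = stat.S_IMODE(os.stat(sock_path).st_mode)
    server.close()

print(oct(mode))
```

Setting the umask before `bind` closes the short window in which the socket file would otherwise exist with broader permissions.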

4. Configuration and Authentication

4.1 LLM credentials

Most users set an LLM API key in a .env file or export it in the shell. The README pattern is:

export OPENAI_API_KEY='sk-...'
# or
export GEMINI_API_KEY='your-google-api-key-here'
# or
export ANTHROPIC_API_KEY='...'
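
A small pre-flight check can confirm that one of these variables is actually visible to the Python process before an agent starts. The variable names come from the export lines above; the provider labels and the `detect_provider` helper are illustrative.

```python
import os
from typing import Mapping, Optional

# Variable names from the export lines above; provider labels are illustrative.
KEY_VARS = {
    "OPENAI_API_KEY": "openai",
    "GEMINI_API_KEY": "google",
    "ANTHROPIC_API_KEY": "anthropic",
}


def detect_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first provider whose API key variable is set and non-empty."""
    for var, provider in KEY_VARS.items():
        if env.get(var, "").strip():
            return provider
    return None


provider = detect_provider()
if provider is None:
    print("No LLM API key found; set one before starting the agent.")
else:
    print(f"Using credentials for: {provider}")
```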

The README's Tools example shows the canonical way to plug a custom function into the agent (README.md).

4.2 Real browser profiles

For tasks requiring existing logins, the README points users to examples/browser/real_browser.py, which reuses an existing Chrome profile with saved credentials. Remote profile sync is documented via a curl snippet in the README (README.md).

4.3 Agent settings

The Agent constructor exposes a large set of options surfaced through AgentSettings. Relevant fields visible in service.py include llm_timeout, step_timeout, final_response_after_failure, use_judge, ground_truth, enable_planning, planning_replan_on_stall, planning_exploration_limit, loop_detection_window, loop_detection_enabled, message_compaction, and max_clickable_elements_length. These are passed straight through to the running agent and control how the iterative loop behaves (browser_use/agent/service.py).

Message compaction is configured through MessageCompactionSettings defined in views.py, with fields such as enabled, compact_every_n_steps, trigger_char_count, trigger_token_count, keep_last_items, and summary_max_chars (browser_use/agent/views.py).
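
To make the compaction fields more concrete, here is a rough sketch of how such settings could drive a decision. The field names mirror a subset of those listed above, but the defaults and the trigger logic are assumptions made for illustration; the real behavior lives in browser_use/agent/views.py and the agent service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompactionSketch:
    # Field names follow the list above; defaults and semantics are assumptions.
    enabled: bool = True
    compact_every_n_steps: int = 15
    trigger_char_count: Optional[int] = 40_000
    keep_last_items: int = 6


def should_compact(cfg: CompactionSketch, step: int, history_chars: int) -> bool:
    """Compact on a fixed step cadence or once the history passes the char limit."""
    if not cfg.enabled:
        return False
    on_cadence = (
        cfg.compact_every_n_steps > 0
        and step > 0
        and step % cfg.compact_every_n_steps == 0
    )
    too_long = (
        cfg.trigger_char_count is not None
        and history_chars >= cfg.trigger_char_count
    )
    return on_cadence or too_long


def compact(items: List, cfg: CompactionSketch, summary: str) -> List:
    """Replace everything except the last N items with a single summary entry."""
    if len(items) <= cfg.keep_last_items:
        return list(items)
    tail = items[len(items) - cfg.keep_last_items:]
    return [summary] + list(tail)


cfg = CompactionSketch(keep_last_items=3)
print(compact(list(range(10)), cfg, summary="<summary of steps 0-6>"))
```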

5. Common Setup Pitfalls

Several recurring community issues map directly to installation and configuration:

  • Blank Chromium at step 1 (#1020): most often a missing API key or an unsupported model. Verify the LLM credentials resolve and the model name matches a documented import.
  • Wrong model import in docs (#4755): some snippets in the "Supported Models" page reference classes that have been renamed or moved. Always check browser_use/llm/ for the current module path.
  • Azure OpenAI false content-filter blocks (#4783): Azure's Responsible-AI policy can flag normal navigation prompts; the workaround documented in the thread is to disable the content filter or switch providers.
  • Ollama structured-output failures (#2605): some local models return empty strings, which fail Pydantic JSON validation. The agent retries, but the loop can stall.
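
The empty-reply failure in the last item is easy to reproduce outside the agent. The sketch below uses the standard json module rather than Pydantic, and the helper names are hypothetical; it shows a guard that treats empty or non-object replies as "try again" and gives up after a bounded number of attempts instead of stalling.

```python
import json
from typing import Callable, Optional


def parse_model_json(raw: str) -> Optional[dict]:
    """Return the decoded object, or None if the reply is empty or not a JSON object."""
    if not raw or not raw.strip():
        return None
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def call_with_retries(ask: Callable[[], str], max_attempts: int = 3) -> dict:
    """Call the model until it returns a JSON object; fail loudly after max_attempts."""
    for _ in range(max_attempts):
        parsed = parse_model_json(ask())
        if parsed is not None:
            return parsed
    raise ValueError(f"model returned no valid JSON after {max_attempts} attempts")


# Simulated model that fails twice before producing a usable reply.
replies = iter(["", "not json", '{"action": "done"}'])
print(call_with_retries(lambda: next(replies)))
```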

For deterministic web-game or canvas-based tasks, the agent currently relies on evaluate() to inject JavaScript; a feature request for a dedicated hover action (#4964) would close a gap for CSS-hover-driven UI patterns.

See Also

  • Agent Service Internals: details on the iterative loop and message management
  • System Prompts: model-specific prompt templates and reasoning rules
  • LLM Providers: supported model integrations and credentials
  • Examples and Integrations: runnable apps such as news-use

Source: https://github.com/browser-use/browser-use / Human Manual

Agent Runtime, Tools, and System Prompts

Related topics: Overview, Installation, and CLI; Browser Session, Watchdogs, DOM, and Actor; LLM Providers, MCP, Cloud, and Integrations

1. Overview and Purpose

The agent runtime is the orchestrating layer of browser-use that drives an LLM through an iterative loop, observes the browser state, and emits structured tool calls until the user's task is satisfied. The runtime is built around three tightly coupled subsystems:

  • Agent service: the loop, history, and orchestration core (browser_use/agent/service.py).
  • System prompts: model-specific instructions that govern how the LLM should reason, plan, and act (browser_use/agent/system_prompts/).
  • Tools / actions: the callable surface (click, input, navigate, extract, custom tools.action, etc.) that the LLM invokes through the agent.

Together they implement the "perception → reasoning → action → verification" pattern that the README positions as the central abstraction for AI-driven browser automation (README.md).

The design intentionally keeps these subsystems modular: prompt templates can be swapped per model family, tools can be extended at runtime, and the underlying browser transport can be Playwright or direct CDP (as introduced in CLI 2.0, see the 0.12.3 release notes referenced in the community context).

2. Agent Service: The Runtime Loop

The Agent class in service.py wires together the LLM, the browser session, the action registry, and an AgentHistoryList. The constructor accepts a long list of tunables (llm_timeout, step_timeout, use_judge, ground_truth, enable_planning, planning_replan_on_stall, planning_exploration_limit, loop_detection_window, loop_detection_enabled, message_compaction, and max_clickable_elements_length) and propagates them to internal subsystems such as the TokenCost service and the message-compaction controller (browser_use/agent/service.py).

The runtime also exposes a separate ai_step helper that lets you call a one-off "ask the LLM about the current page" operation, optionally with a screenshot, by extracting clean markdown via extract_clean_markdown and feeding it through get_ai_step_user_prompt (browser_use/agent/service.py, browser_use/agent/prompts.py).

A judge LLM (controlled by use_judge and ground_truth) can validate the final result, and a dedicated page_extraction_llm is registered independently for extract operations. Multiple LLMs are tracked in a single TokenCost service so cost reporting remains consistent across them.

The history model โ€” AgentHistoryList โ€” stores per-step entries with timing metadata and supports serialization via save_to_file, making it possible to persist or post-process runs (browser_use/agent/views.py).

3. System Prompts: Model-Specific Reasoning Templates

The agent ships multiple prompt variants under browser_use/agent/system_prompts/, each tuned for a different reasoning regime. The standard system_prompt.md is the most verbose: it requires an explicit thinking block, supplies <todo_examples>, <evaluation_examples>, and <memory_examples> blocks, and frames the agent as an iterative loop driven by <user_request>, <agent_history>, <agent_state>, <browser_state>, <browser_vision>, and one-shot <read_state> (browser_use/agent/system_prompts/system_prompt.md).

system_prompt_no_thinking.md removes the explicit chain-of-thought preamble so it can be used with models that prefer or require hidden reasoning (browser_use/agent/system_prompts/system_prompt_no_thinking.md).

system_prompt_anthropic_flash.md and system_prompt_flash_anthropic.md are trimmed "flash" variants optimized for Anthropic-style tool use: they replace XML blocks with shorter natural-language rule lists, treat screenshots as ground truth, and constrain actions to an AgentOutput tool schema (browser_use/agent/system_prompts/system_prompt_anthropic_flash.md, browser_use/agent/system_prompts/system_prompt_flash_anthropic.md).

All variants share a common contract: the LLM must produce a memory, an evaluation_previous_goal, a next_goal, and a list of actions (typically between 1 and max_actions per step), and must verify outcomes against the screenshot before proceeding.

flowchart TD
    A[User Task] --> B[System Prompt + State]
    B --> C[LLM Call]
    C --> D{Parse AgentOutput}
    D -->|valid| E[Execute Actions<br/>click/input/extract/...]
    E --> F[Observe Browser State]
    F --> G[Judge LLM + TokenCost]
    G -->|task done| H[Final Result + GIF]
    G -->|continue| B
    D -->|invalid| I[Recovery / Loop Detector]
    I --> B
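
The "Parse AgentOutput" decision in the flowchart can be illustrated with a small validator for the output contract described above. The field names follow that description; the `StepOutputSketch` class, the `validate_step` function, and the shape of each action dict are hypothetical and are not the library's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StepOutputSketch:
    # Field names follow the contract described above; the class is hypothetical.
    memory: str
    evaluation_previous_goal: str
    next_goal: str
    actions: List[Dict[str, Any]] = field(default_factory=list)


def validate_step(raw: Dict[str, Any], max_actions: int) -> StepOutputSketch:
    """Check required fields and the 1..max_actions bound, then build the record."""
    for key in ("memory", "evaluation_previous_goal", "next_goal", "actions"):
        if key not in raw:
            raise ValueError(f"missing field: {key}")
    actions = raw["actions"]
    if not isinstance(actions, list) or not 1 <= len(actions) <= max_actions:
        raise ValueError(f"expected between 1 and {max_actions} actions")
    return StepOutputSketch(
        memory=raw["memory"],
        evaluation_previous_goal=raw["evaluation_previous_goal"],
        next_goal=raw["next_goal"],
        actions=actions,
    )


raw = {
    "memory": "Logged in; on the dashboard.",
    "evaluation_previous_goal": "Login succeeded.",
    "next_goal": "Open the reports page.",
    "actions": [{"click": {"index": 4}}],  # illustrative action shape
}
step = validate_step(raw, max_actions=3)
print(step.next_goal, len(step.actions))
```

A failed validation corresponds to the "invalid" branch of the diagram, where the runtime would route to recovery rather than execute anything.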

4. Tools, Actions, and Custom Extensions

The action surface documented in the standard prompt is fixed for browser control: navigate, click, input, scroll, wait, extract, screenshot, switch_tab, go_back, done, write_file, read_file, and replace_file_str (browser_use/agent/system_prompts/system_prompt.md). extract is governed by a dedicated extraction prompt that instructs the LLM to ground answers strictly in the supplied markdown and to avoid hallucination (browser_use/agent/prompts.py).

Custom tools are added through the Tools registry, as shown in the README:

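from browser_use import Agent, Browser, ChatBrowserUse, Tools

tools = Tools()

# Register a plain function as an action the agent can call.
@tools.action(description='Description of what this tool does.')
def custom_tool(param: str) -> str:
    return f"Result: {param}"

llm = ChatBrowserUse()  # any supported chat model can be used here
browser = Browser()
agent = Agent(task="Your task", llm=llm, browser=browser, tools=tools)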

Source: README.md
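
For readers curious how a decorator-based registry of this kind can work, here is a toy version in plain Python. It is a sketch of the general pattern only; `ToolRegistrySketch` is a hypothetical class and does not reflect how the library's Tools class is implemented.

```python
import inspect
from typing import Any, Callable, Dict


class ToolRegistrySketch:
    """A toy action registry; not the library's Tools class."""

    def __init__(self) -> None:
        self.actions: Dict[str, Dict[str, Any]] = {}

    def action(self, description: str) -> Callable:
        def decorator(func: Callable) -> Callable:
            # Record metadata a model could be shown: name, description, parameters.
            self.actions[func.__name__] = {
                "description": description,
                "params": list(inspect.signature(func).parameters),
                "func": func,
            }
            return func  # the function stays directly callable

        return decorator

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self.actions:
            raise KeyError(f"unknown action: {name}")
        return self.actions[name]["func"](**kwargs)


registry = ToolRegistrySketch()


@registry.action(description="Echo the parameter back.")
def custom_tool(param: str) -> str:
    return f"Result: {param}"


print(registry.call("custom_tool", param="hello"))
```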

Community members have asked for additional first-class actions such as hover for CSS :hover interactions (issue #4964) and for the ability to drive browser-use from CLI agents like codex-cli without an API key (issue #4895). These requests reflect an active effort to broaden the action surface beyond the default registry. As of 0.12.8, evaluate() is also refused on restricted browser profiles, and daemon UNIX sockets are owner-only; this security hardening lives alongside the tools layer (release notes for 0.12.8).

5. Observability and Outputs

Every run produces an AgentHistoryList that can be saved to disk or replayed visually via create_history_gif, which overlays task text, per-step goals, and the rendered browser screenshots into a single GIF (browser_use/agent/gif.py, browser_use/agent/views.py). This makes the runtime suitable for both production telemetry and for debugging visual-task failures like the "blank chromium page" reports seen in issue #1020 and the Gold Miner play-through in issue #4939, where the agent loaded a page but failed to identify the right interactive elements.

See Also

  • Browser Session and DOM Layer: covers the underlying browser transport used by the agent.
  • LLM Providers and Model Configuration: model import paths and provider-specific quirks (see also issue #4755 on stale import examples).
  • Cloud Events and Telemetry: browser_use/agent/cloud_events.py integration with the Browser Use cloud.

Browser Session, Watchdogs, DOM, and Actor

Related topics: Agent Runtime, Tools, and System Prompts; LLM Providers, MCP, Cloud, and Integrations

The browser automation stack in browser-use is layered into four cooperating subsystems: a Browser Session that owns the browser lifecycle, a set of Watchdogs that keep that session healthy, a DOM service that converts the live page into an LLM-friendly representation, and a low-level Actor that exposes raw Chrome DevTools Protocol (CDP) primitives for advanced users. Together they separate "control the browser" from "drive the browser" and let the agent loop reason about a stable, structured view of the page.

High-Level Architecture

flowchart TB
    Agent[Agent Loop / service.py] -->|events| BS[Browser Session<br/>session.py]
    BS -->|owns| WD[Watchdogs<br/>watchdog_base.py]
    BS -->|owns| CDPCDP[CDP Connection]
    CDPCDP -->|talks to| Chrome[(Chromium)]
    Agent -->|extract| DOM[DOM Service<br/>markdown_extractor.py]
    DOM -->|reads| BS
    Actor[Actor API<br/>actor/page.py] -.->|wraps| BS
    Profile[BrowserProfile<br/>profile.py] -->|configures| BS
    Events[Event Bus<br/>events.py] <-->|publishes| BS
    Events <-->|subscribes| WD

Browser Session

The BrowserSession class (browser_use/browser/session.py) is the long-lived owner of a Chromium instance. It is exposed in the public API under the alias Browser (browser_use/actor/README.md). The session is responsible for:

  • Lifecycle: start() launches the browser process, close() tears it down. A SessionManager (browser_use/browser/session_manager.py) coordinates multiple sessions, especially in the cloud deployment referenced in the README ("Scalable browser infrastructure / Memory management / Proxy rotation / Stealth browser fingerprinting / High-performance parallel execution").
  • Configuration: A BrowserProfile (browser_use/browser/profile.py) captures persistent settings (user data dir, headless mode, proxy, allowed domains, security flags).
  • Event bus: Navigation, dialogs, downloads, and tab changes are modeled as typed events in browser_use/browser/events.py and consumed by the watchdogs.
  • State: The session exposes structured BrowserState snapshots via browser_use/browser/views.py (URL, tabs, interactive elements, page content).

The agent consumes this state on every step: agent_history, agent_state, and browser_state are injected into the model prompt per the system prompt in browser_use/agent/system_prompts/system_prompt.md.

Watchdogs

Watchdogs are background coroutines attached to a session that react to events on the bus. The base class in browser_use/browser/watchdog_base.py standardizes subscription and teardown. Typical responsibilities include:

| Watchdog | Responsibility |
| --- | --- |
| DOM watchdog | Triggered on navigation completion; asks the DOM service to re-extract the page. |
| Security watchdog | Enforces profile-level restrictions; per release 0.12.8 it can "refuse evaluate() on restricted browser profiles" (see release notes). |
| Downloads watchdog | Auto-saves PDFs into available_file_paths so the agent can read_file them, as documented in system_prompt.md. |
| Screenshots watchdog | Captures bounded-box screenshots; 0.12.9 adds "skip screenshots on new tab pages" to avoid wasted tokens. |
| Dialog/permissions watchdog | Dismisses JS dialogs, handles cookie banners before user actions, per the error_recovery section of browser_use/agent/system_prompts/system_prompt_anthropic_flash.md. |

Watchdogs never run user code directly; they only mutate session state or publish new events, which keeps the agent loop deterministic.
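
The subscribe-and-publish relationship between the event bus and a watchdog can be sketched with asyncio. Everything here is hypothetical (the event classes, `EventBusSketch`, and `DomWatchdogSketch` are invented for illustration) and is far simpler than the real modules in browser_use/browser/.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass
from typing import Awaitable, Callable, DefaultDict, List, Type


@dataclass
class NavigationComplete:  # hypothetical event type
    url: str


@dataclass
class DomRefreshRequested:  # hypothetical event type
    url: str


Handler = Callable[[object], Awaitable[None]]


class EventBusSketch:
    def __init__(self) -> None:
        self.handlers: DefaultDict[Type, List[Handler]] = defaultdict(list)
        self.log: List[object] = []

    def subscribe(self, event_type: Type, handler: Handler) -> None:
        self.handlers[event_type].append(handler)

    async def publish(self, event: object) -> None:
        self.log.append(event)
        for handler in list(self.handlers[type(event)]):
            await handler(event)


class DomWatchdogSketch:
    """Reacts to navigation by publishing a follow-up event, nothing else."""

    def __init__(self, bus: EventBusSketch) -> None:
        self.bus = bus
        bus.subscribe(NavigationComplete, self.on_navigation)

    async def on_navigation(self, event: NavigationComplete) -> None:
        await self.bus.publish(DomRefreshRequested(url=event.url))


async def demo() -> List[object]:
    bus = EventBusSketch()
    DomWatchdogSketch(bus)
    await bus.publish(NavigationComplete(url="https://example.com"))
    return bus.log


events = asyncio.run(demo())
print([type(e).__name__ for e in events])
```

Note that the watchdog only publishes a new event; it does not call back into agent code, which matches the constraint stated above.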

DOM Service

Before each step the agent needs a token-efficient, structured view of the page. The DOM service in browser_use/dom/markdown_extractor.py is invoked from the agent service:

from browser_use.dom.markdown_extractor import extract_clean_markdown
content, content_stats = await extract_clean_markdown(
    browser_session=self.browser_session, extract_links=extract_links
)

Source: browser_use/agent/service.py

The extractor reports three sizes (HTML → initial markdown → filtered markdown) and feeds them into the ai_step prompt via get_ai_step_user_prompt (browser_use/agent/prompts.py). Interactive elements are indexed with [index]<type>text</type> markers so the LLM can reference them by number when emitting click/input actions. The matching schema lives in browser_use/dom/views.py. The agent prompt instructs the model: "Only [indexed] are interactive. Indentation=child. *=new element since last step" (browser_use/agent/system_prompts/system_prompt_flash.md).
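
The indexed-element notation can be made concrete with a small renderer and parser. This is a sketch of the notation only: `ElementSketch`, the two-space indentation, and the placement of the `*` marker directly before the index are assumptions, and the actual serializer in browser_use/dom/ may differ.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class ElementSketch:
    index: int
    tag: str
    text: str
    depth: int = 0        # nesting level, rendered as indentation
    is_new: bool = False  # appeared since the previous step


def render(elements: List[ElementSketch]) -> str:
    """Render one element per line as [index]<tag>text</tag>, indented by depth."""
    lines = []
    for el in elements:
        indent = "  " * el.depth
        marker = "*" if el.is_new else ""
        lines.append(f"{indent}{marker}[{el.index}]<{el.tag}>{el.text}</{el.tag}>")
    return "\n".join(lines)


def interactive_indices(rendered: str) -> List[int]:
    """Collect the indices a model is allowed to reference in click/input actions."""
    pattern = r"^[ \t]*\*?\[(\d+)\]"
    return [int(m) for m in re.findall(pattern, rendered, flags=re.MULTILINE)]


page = [
    ElementSketch(1, "form", ""),
    ElementSketch(2, "input", "Email", depth=1),
    ElementSketch(3, "button", "Submit", depth=1, is_new=True),
]
text = render(page)
print(text)
print(interactive_indices(text))
```

Validating an action's index against `interactive_indices` before executing it is one cheap way to catch a model that references an element that is not on the page.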

Actor (Low-Level CDP API)

The actor package (browser_use/actor/README.md) is a deliberate escape hatch. It bypasses the high-level DOM/markdown pipeline and talks directly to CDP via the existing BrowserSession, which is why the README's recommended entry point is from browser_use import Browser. Three primitives matter: Page, Element, and Mouse.

This is the same direct-CDP approach that powers CLI 2.0, described in release 0.12.3 as giving "~50ms command latency via a persistent background daemon" โ€” useful for the "human-in-the-loop" request in community issue #221 and for the "Codex / Claude Code" integration use case in #4895.

Failure Modes and Community Notes

  • Blank Chromium on first step (#1020): usually a profile/launcher mismatch; verify the BrowserProfile and that watchdogs' start() completed before the agent's first step.
  • Restricted profiles (0.12.8): evaluate() is refused when a profile's security flags disallow arbitrary JS; prefer the indexed action surface.
  • Hover-only UI (#4964): there is no dedicated hover action; the workaround is actor.mouse or evaluate() to dispatch a mouseover event, which is exactly the gap the actor layer exposes today.
  • New-tab screenshots (0.12.9): screenshot watchdog now skips blank new-tab pages, so the agent should rely on browser_state rather than expecting a vision payload on the first step of a new tab.

LLM Providers, MCP, Cloud, and Integrations

Related topics: Agent Runtime, Tools, and System Prompts; Browser Session, Watchdogs, DOM, and Actor

Overview

The browser-use project separates "what to do" (browser automation) from "how to decide what to do" (an LLM). The LLM layer exposes a uniform BaseChatModel interface so the same Agent can be paired with many providers, while integrations and the managed cloud deal with everything that surrounds the model: proxies, browser infrastructure, and third-party services. The officially supported providers, the Browser Use Cloud offering, and the conventions for shipping third-party integrations are described in browser_use/llm/README.md, examples/cloud/README.md, and examples/integrations/README.md.

Officially Supported LLM Providers

The official provider list in browser_use/llm/README.md is intentionally short:

| Provider | Notes |
| --- | --- |
| OpenAI | Default ChatOpenAI |
| Anthropic | ChatAnthropic |
| Google | ChatGoogle |
| Groq | ChatGroq |
| Ollama | Local models |
| DeepSeek | ChatDeepSeek |
| Mistral | Uses MISTRAL_API_KEY; schema keyword stripping |
| Cerebras | ChatCerebras |

The README also documents two escape hatches:

  • ChatLiteLLM: the wrapper is preserved, but as of release 0.12.5 the litellm package is no longer a core dependency (removed in response to the supply-chain attack on versions 1.82.7/1.82.8). Users must pip install litellm separately. Source: README.md, release notes for 0.12.5.
  • ChatLangchain: labeled NOT OFFICIALLY SUPPORTED, intended only as a reference adapter for users who want to reuse a LangChain model object. Source: browser_use/llm/README.md.

A worked example of a non-default provider lives at browser_use/llm/oci_raw/README.md, which wraps Oracle Cloud Infrastructure's Generative AI service via the raw oci SDK (no LangChain). It shows the pattern used throughout: construct a Chat<Provider>(model_id=..., temperature=..., max_tokens=...) object and pass it as llm= to the Agent. Source: browser_use/llm/oci_raw/README.md:35-60.

The recommended default is ChatBrowserUse(), which the integrations README explicitly prefers "unless the example is specifically about another model." Source: examples/integrations/README.md.
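
The "construct a provider object and pass it as llm=" pattern rests on the agent depending only on a shared interface. The sketch below shows that idea with a typing Protocol; `ChatModelLike`, `EchoChat`, and `AgentSketch` are hypothetical stand-ins, and the `ainvoke` signature used here is an assumption rather than the library's BaseChatModel definition.

```python
import asyncio
from typing import List, Protocol


class ChatModelLike(Protocol):
    """Hypothetical stand-in for a shared chat-model interface."""

    model_id: str

    async def ainvoke(self, messages: List[str]) -> str: ...


class EchoChat:
    """Fake provider: repeats the last message, tagged with its model id."""

    def __init__(self, model_id: str, temperature: float = 0.0, max_tokens: int = 256) -> None:
        self.model_id = model_id
        self.temperature = temperature
        self.max_tokens = max_tokens

    async def ainvoke(self, messages: List[str]) -> str:
        reply = f"[{self.model_id}] {messages[-1]}"
        return reply[: self.max_tokens]  # crude cap by characters, not real tokens


class AgentSketch:
    """Depends only on the interface, so any provider object can be passed as llm=."""

    def __init__(self, task: str, llm: ChatModelLike) -> None:
        self.task = task
        self.llm = llm

    async def next_step(self, observation: str) -> str:
        return await self.llm.ainvoke([self.task, observation])


agent = AgentSketch(task="Summarize the page", llm=EchoChat(model_id="echo-1"))
reply = asyncio.run(agent.next_step("page loaded"))
print(reply)
```

Swapping providers then means constructing a different object with the same method, with no change to the agent code.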

Common Provider Pitfalls (Community-Reported)

  • OpenRouter docs drift. Issue #4755 reports that the OpenRouter section of the supported-models docs shows imports that do not exist in the package. Treat the in-repo README as the source of truth and verify any snippet you copy from external docs.
  • Azure OpenAI content filters. Issue #4783 describes normal login/navigation prompts being flagged as ResponsibleAIPolicyViolation by Azure's content filter. Mitigations include switching providers for sensitive flows, lowering temperature, or using a different deployment tuned for tool use.
  • gpt-oss via Ollama. Issue #2605 shows an "EOF while parsing a value" validation error from the agent when the local model emits empty content. The fix is usually to pin a model that returns valid JSON or to enable a stricter parser.

Browser Use Cloud and CLI 2.0

The cloud offering is documented in examples/cloud/README.md.

Cloud handles "scalable browser infrastructure, memory management, proxy rotation, stealth browser fingerprinting, [and] high-performance parallel execution", concerns that are painful to run locally. Examples in examples/cloud/ use 30-second timeouts, retries, environment variables for secrets, and domain restrictions for security. Source: examples/cloud/README.md.
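
The timeout-plus-retries habit mentioned above is independent of any particular cloud API. A minimal asyncio sketch follows; the `with_retries` helper, the backoff schedule, and the simulated `flaky` call are illustrative assumptions and are not taken from the examples directory.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    make_call: Callable[[], Awaitable[T]],
    timeout_s: float = 30.0,
    attempts: int = 3,
    backoff_s: float = 1.0,
) -> T:
    """Run make_call() under a timeout, retrying with exponential backoff."""
    last_error: Exception = RuntimeError("no attempts made")
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                await asyncio.sleep(backoff_s * 2 ** attempt)
    raise last_error


# Simulated remote call that fails twice, then succeeds.
calls = {"n": 0}


async def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = asyncio.run(with_retries(flaky, timeout_s=1.0, backoff_s=0.01))
print(result, calls["n"])
```

Only transient error types are retried here; anything else propagates immediately, which keeps real bugs from being masked by the retry loop.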

Complementing the cloud, Browser Use CLI 2.0 shipped in release 0.12.3. It is "built on direct CDP (Chrome DevTools Protocol) instead of Playwright, giving ~50ms command latency via a persistent background daemon" and is aimed at AI coding agents such as Claude Code and Codex (see also issue #4895, which requests first-class Codex-CLI integration).

Integration Patterns for Third Parties

examples/integrations/README.md codifies where third-party code belongs. The decision tree is:

flowchart TD
    A[New third-party integration] --> B{Is it shipped<br/>as part of browser-use<br/>with tests?}
    B -- Yes --> C[browser_use/integrations/<provider>/]
    B -- No --> D{Is it a small,<br/>runnable example?}
    D -- Yes --> E[examples/integrations/<provider>/]
    D -- No, but provider-agnostic --> F[examples/custom-functions/]
    D -- No, full app --> G[Own repository<br/>+ add to community list]
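
The same decision tree can be written as a small function, which may be handy in a contribution checklist or review script. The function name and argument names are hypothetical; the returned paths are the ones shown in the diagram.

```python
def integration_home(
    shipped_with_tests: bool,
    small_runnable_example: bool,
    provider_agnostic: bool,
    provider: str = "<provider>",
) -> str:
    """Return where a new third-party integration belongs, per the diagram above."""
    if shipped_with_tests:
        return f"browser_use/integrations/{provider}/"
    if small_runnable_example:
        return f"examples/integrations/{provider}/"
    if provider_agnostic:
        return "examples/custom-functions/"
    return "own repository, added to the community list"


print(integration_home(False, True, False, provider="acme"))
```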

The README also lists an example checklist: use uv, document env vars and OAuth scopes, never commit secrets, prefer ChatBrowserUse() unless the example is specifically about another model, and include the exact command that runs the example from the repo root. Source: examples/integrations/README.md.

A concrete integration example is examples/apps/news-use/, which wires browser-use to Google Gemini for news monitoring with sentiment analysis. The README there shows the recurring pattern: install with pip install -U browser-use, export the provider API key (GEMINI_API_KEY), and call a small Python entry point (python news_monitor.py --once). Source: examples/apps/news-use/README.md.

Security and Operational Notes

Several recent releases target the LLM/integration surface:

  • 0.12.5 removes litellm from core deps after the supply-chain compromise. Source: 0.12.5 release notes.
  • 0.12.6 sets the default temperature=1.0 for Gemini 3 models and flattens Bedrock structured-output schemas. Source: 0.12.6 release notes.
  • 0.12.7 upgrades aiohttp to 3.13.4 (memory-exhaustion CVE) and tightens CLI security. Source: 0.12.7 release notes.
  • 0.12.8 restricts the daemon's Unix socket to owner-only access and refuses evaluate() on restricted browser profiles. Source: 0.12.8 release notes.
  • 0.12.9 passes the session id to judge LLM calls and skips screenshots on new-tab pages. Source: 0.12.9 release notes.

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 28 structured pitfall item(s), including 5 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

1. Installation risk: Installation risk requires verification

  • Severity: high
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/browser-use/browser-use/issues/4742

2. Installation risk: Installation risk requires verification

  • Severity: high
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/browser-use/browser-use/issues/4939

3. Configuration risk: Configuration risk requires verification

  • Severity: high
  • Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/browser-use/browser-use/issues/4783

4. Capability evidence risk: Capability evidence risk requires verification

  • Severity: high
  • Finding: Project evidence flags a capability evidence risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/browser-use/browser-use/issues/4755

5. Security or permission risk: Security or permission risk requires verification

  • Severity: high
  • Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/browser-use/browser-use/issues/4579

6. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Developers should check this installation risk before relying on the project: 0.12.0
  • User impact: Upgrade or migration may change expected behavior: 0.12.0
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: 0.12.0. Context: Source discussion did not expose a precise runtime context.
  • Evidence: failure_mode_cluster:github_release | https://github.com/browser-use/browser-use/releases/tag/0.12.0

7. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Developers should check this installation risk before relying on the project: 0.12.3 - Browser Use CLI 2.0
  • User impact: Upgrade or migration may change expected behavior: 0.12.3 - Browser Use CLI 2.0
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: 0.12.3 - Browser Use CLI 2.0. Context: Observed when using playwright
  • Evidence: failure_mode_cluster:github_release | https://github.com/browser-use/browser-use/releases/tag/0.12.3

8. Installation risk: release 0.12.4 requires verification

  • Severity: medium
  • Finding: Release 0.12.4 is flagged as an installation risk. Check it before relying on the project.
  • User impact: Upgrade or migration involving 0.12.4 may change expected behavior.
  • Recommended check: Before packaging this project, run the relevant install, config, and quickstart checks for 0.12.4. Context: observed when using Python.
  • Evidence: failure_mode_cluster:github_release | https://github.com/browser-use/browser-use/releases/tag/0.12.4

9. Installation risk: release 0.12.5 requires verification

  • Severity: medium
  • Finding: Release 0.12.5 is flagged as an installation risk. Check it before relying on the project.
  • User impact: Upgrade or migration involving 0.12.5 may change expected behavior.
  • Recommended check: Before packaging this project, run the relevant install, config, and quickstart checks for 0.12.5. Context: observed when using Python.
  • Evidence: failure_mode_cluster:github_release | https://github.com/browser-use/browser-use/releases/tag/0.12.5

10. Installation risk: release 0.12.6 requires verification

  • Severity: medium
  • Finding: Release 0.12.6 is flagged as an installation risk. Check it before relying on the project.
  • User impact: Upgrade or migration involving 0.12.6 may change expected behavior.
  • Recommended check: Before packaging this project, run the relevant install, config, and quickstart checks for 0.12.6. Context: observed when using Windows.
  • Evidence: failure_mode_cluster:github_release | https://github.com/browser-use/browser-use/releases/tag/0.12.6

11. Installation risk: requires verification

  • Severity: medium
  • Finding: A GitHub issue reports: "Great project! I tried playing Gold Miner via browser-harness in Codex. It can successfully open the webpage and load the game, but fails to determine when to aim the claw and r..." Check this report before relying on the project.
  • User impact: Developers may fail before the first successful local run.
  • Recommended check: Before packaging this project, run the relevant install, config, and quickstart checks for the scenario described in this report. Context: observed when using Python and Windows.
  • Evidence: failure_mode_cluster:github_issue | https://github.com/browser-use/browser-use/issues/4939

12. Configuration risk: release 0.12.1 requires verification

  • Severity: medium
  • Finding: Release 0.12.1 is flagged as a configuration risk. Check it before relying on the project.
  • User impact: Upgrade or migration involving 0.12.1 may change expected behavior.
  • Recommended check: Before packaging this project, run the relevant install, config, and quickstart checks for 0.12.1. Context: observed when using Windows.
  • Evidence: failure_mode_cluster:github_release | https://github.com/browser-use/browser-use/releases/tag/0.12.1

Source: Doramagic discovery, validation, and Project Pack records
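
Several items above recommend reproducing the install and quickstart path in an isolated environment. One possible way to do that is sketched below in Python: it creates a fresh virtual environment per flagged release, attempts a pinned install, and then runs an import smoke test. The helper names (`venv_python`, `install_command`, `check_install`) are hypothetical, and the sketch assumes that each release tag listed above is also published on PyPI under the same version number, which this page does not confirm.

```python
import subprocess
import sys
import venv
from pathlib import Path

# Release tags copied from the risk list above. Assumption: the same
# version strings are installable from PyPI.
FLAGGED_VERSIONS = ["0.12.0", "0.12.1", "0.12.3", "0.12.4", "0.12.5", "0.12.6"]


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_command(env_dir: Path, version: str) -> list[str]:
    """Build the pinned pip command for one release (hypothetical helper)."""
    return [
        str(venv_python(env_dir)),
        "-m", "pip", "install",
        f"browser-use=={version}",
    ]


def check_install(base_dir: Path, version: str) -> bool:
    """Create a clean environment, install one release, and try to import it."""
    env_dir = base_dir / f"browser-use-{version}"
    # clear=True wipes any earlier environment so each run starts clean.
    venv.EnvBuilder(with_pip=True, clear=True).create(env_dir)

    install = subprocess.run(
        install_command(env_dir, version), capture_output=True, text=True
    )
    if install.returncode != 0:
        return False

    # Import smoke test only; this does not launch a browser or an agent.
    smoke = subprocess.run(
        [str(venv_python(env_dir)), "-c", "import browser_use"],
        capture_output=True,
        text=True,
    )
    return smoke.returncode == 0
```

Calling `check_install(Path(tmp_dir), "0.12.6")` for each entry in `FLAGGED_VERSIONS`, with `tmp_dir` pointing at a scratch directory, gives a pass or fail per release. A passing import only shows that the package installs and loads; it does not address the capability or security items, which still need the linked issues reviewed by hand.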

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12 (the count of project-level external discussion links exposed on this manual page).

Use: review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using browser-use with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence