Doramagic Project Pack · Human Manual
browser-use
Make websites accessible for AI agents. Automate tasks online with ease.
Overview, Installation, and CLI
Related topics: Agent Runtime, Tools, and System Prompts, LLM Providers, MCP, Cloud, and Integrations
1. What is Browser-Use
Browser-Use is an open-source library that enables AI agents to control a real web browser. The agent operates in an iterative loop: it observes the current page, decides on the next action, executes it through the browser, and repeats until the user's request is fulfilled. The core agent loop is described in the system prompt, which states the agent is "designed to operate in an iterative loop to automate browser tasks" with capabilities spanning navigation, form submission, content extraction, and persistent file-system tracking (system_prompt.md).
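The observe, decide, act cycle described above can be sketched in a few lines. This is a minimal illustration rather than browser-use's implementation: `FakeBrowser`, `scripted_policy`, and `run_loop` are hypothetical names, and the decision step is a hard-coded function where a real agent would call an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser session."""
    url: str = "about:blank"
    log: list = field(default_factory=list)

    def observe(self) -> dict:
        # Return the state that the decision step gets to see.
        return {"url": self.url}

    def execute(self, action: dict) -> None:
        # Apply one action and record its name.
        self.log.append(action["name"])
        if action["name"] == "navigate":
            self.url = action["url"]


def scripted_policy(state: dict) -> dict:
    """Hypothetical decision step; a real agent would query an LLM here."""
    if state["url"] == "about:blank":
        return {"name": "navigate", "url": "https://example.com"}
    return {"name": "done"}


def run_loop(browser: FakeBrowser, policy, max_steps: int = 10) -> int:
    """Observe, decide, act; stop on 'done' or when the step budget runs out."""
    for step in range(1, max_steps + 1):
        action = policy(browser.observe())
        browser.execute(action)
        if action["name"] == "done":
            return step
    return max_steps
```

The step budget matters in practice: without `max_steps`, a policy that never emits `done` would loop forever.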
The project ships three primary surfaces:
- Python library: imported via from browser_use import Agent, Browser, ChatBrowserUse for programmatic use.
- Browser Use CLI 2.0: a direct-CDP browser automation daemon optimized for AI coding agents such as Claude Code and Codex.
- Actor API: a low-level CDP wrapper exposing BrowserSession, Page, Element, and Mouse classes (actor/README.md).
The library is licensed under MIT, with services and data policy governed by the project's Terms of Service (README.md).
2. Installation
2.1 Standard pip install
The canonical installation is pip install browser-use. The library is published on PyPI and pulls in core dependencies required for agent operation, including async HTTP, Pydantic, and CDP bindings (README.md).
2.2 Optional dependencies
Several integrations are kept out of the core install to avoid supply-chain risk and to keep the package lightweight:
| Optional package | Purpose | Notes |
|---|---|---|
| litellm | Multi-provider LLM routing via ChatLiteLLM | Removed from core deps in v0.12.5 due to a supply-chain incident; install separately with pip install litellm if needed |
| LLM provider SDKs | OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI | Installed per-provider as required |
| uv | Fast Python package manager | Detected and used by install scripts; a fix in v0.12.4 added detection for curl-installed uv |
The v0.12.5 release notes explicitly call out: "pip install browser-use no longer installs litellm … ChatLiteLLM wrapper is preserved - install litellm separately if needed" (README.md).
2.3 Versioning and upgrade
The project follows 0.12.x semver with frequent point releases. Recent releases and their themes:
- 0.12.3: Browser Use CLI 2.0 introduced; built on direct CDP for ~50ms command latency.
- 0.12.4: Pinned litellm version after a CVE in aiohttp was patched.
- 0.12.5: Removed litellm from core deps; raised aiohttp to 3.13.4 to patch a memory-exhaustion vulnerability.
- 0.12.6: Default temperature set to 1.0 for Gemini-3 models; Bedrock structured-output fix.
- 0.12.7: Major CLI refactor plus security fixes.
- 0.12.8: Daemon unix socket restricted to owner-only; evaluate() refused on restricted browser profiles.
- 0.12.9: Session id passed to judge LLM calls; new-tab pages skip screenshots.
Always upgrade with pip install -U browser-use to inherit the latest security and stability fixes (examples/apps/news-use/README.md).
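Because several of the releases listed above carry security fixes, a script may want to confirm that the installed version is recent enough before running. The following is a small sketch using only the standard library; `parse_version`, `is_at_least`, and `installed_version` are hypothetical helper names, and the parser handles plain dotted versions only (it is not a full PEP 440 implementation).

```python
from importlib import metadata
from itertools import takewhile


def parse_version(text: str) -> tuple:
    """Turn '0.12.9' into (0, 12, 9), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so that 0.12.10 sorts after 0.12.9."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package: str = "browser-use"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

A caller could then check `is_at_least(installed_version() or "0", "0.12.5")` and prompt for an upgrade when it returns False.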
3. Browser Use CLI 2.0
The CLI 2.0 launch (v0.12.3) is positioned as "the fastest browser automation for AI coding agents", claiming 2x faster execution and 50% fewer tokens relative to the previous Playwright-backed pipeline. The architectural shift is from Playwright orchestration to direct Chrome DevTools Protocol (CDP) communication with a persistent background daemon.
flowchart LR
A[CLI command] --> B[CLI 2.0 client]
B -->|IPC / Unix socket| C[Persistent daemon]
C -->|CDP over WebSocket| D[Chromium browser]
C --> E[Session state and history]
D --> F[Target page]
Key design points:
- Persistent background daemon: eliminates browser startup overhead between commands, giving the ~50ms command latency advertised in the release notes.
- Owner-only unix socket: added in v0.12.8 to prevent local privilege escalation through the daemon socket.
- Codex / Claude Code integration: the CLI is the recommended interface when wiring browser-use into AI coding agents. A community request to support codex-cli without an API key (#4895) tracks ongoing work in this area.
- Security profile: the CLI refuses evaluate() calls on restricted browser profiles (v0.12.8) to prevent arbitrary JS execution in locked-down contexts.
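To make the owner-only socket point concrete, here is a sketch of how a daemon on a POSIX system could create a Unix socket that only its owner can access. It is an illustration of the general technique and not the CLI's actual code; `bind_owner_only` and `socket_mode` are hypothetical names.

```python
import os
import socket
import stat
import tempfile


def bind_owner_only(path: str) -> socket.socket:
    """Bind a Unix socket whose file is readable and writable by the owner only."""
    # Tighten the umask first so the socket file is never briefly world-accessible.
    old_umask = os.umask(0o177)
    try:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.bind(path)
    finally:
        os.umask(old_umask)
    # Set the mode explicitly as well, in case the platform ignored the umask.
    os.chmod(path, 0o600)
    return sock


def socket_mode(path: str) -> int:
    """Return the permission bits of the socket file."""
    return stat.S_IMODE(os.stat(path).st_mode)
```

Setting the umask before `bind` rather than relying on a later `chmod` alone closes the short window in which another local user could connect.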
4. Configuration and Authentication
4.1 LLM credentials
Most users set an LLM API key in a .env file or export it in the shell. The README pattern is:
export OPENAI_API_KEY='sk-...'
# or
export GEMINI_API_KEY='your-google-api-key-here'
# or
export ANTHROPIC_API_KEY='...'
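Since a missing key is a common cause of first-run failures, it can help to check the environment before starting an agent. The sketch below uses only the three variable names shown above; `find_configured_keys` and `require_any_key` are hypothetical helpers, not part of browser-use.

```python
import os

# Environment variable names taken from the README pattern.
KEY_VARS = ("OPENAI_API_KEY", "GEMINI_API_KEY", "ANTHROPIC_API_KEY")


def find_configured_keys(env=None) -> list:
    """Return the names of the key variables that are set to a non-blank value."""
    env = os.environ if env is None else env
    return [name for name in KEY_VARS if env.get(name, "").strip()]


def require_any_key(env=None) -> str:
    """Return the first configured variable name, or fail with a clear message."""
    found = find_configured_keys(env)
    if not found:
        raise RuntimeError("No LLM API key set; expected one of: " + ", ".join(KEY_VARS))
    return found[0]
```

Passing a plain dict as `env` keeps the check easy to test without touching the real environment.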
The README's Tools example shows the canonical way to plug a custom function into the agent (README.md).
4.2 Real browser profiles
For tasks requiring existing logins, the README points users to examples/browser/real_browser.py, which reuses an existing Chrome profile with saved credentials. Remote profile sync is documented via a curl snippet in the README (README.md).
4.3 Agent settings
The Agent constructor exposes a large set of options surfaced through AgentSettings. Relevant fields visible in service.py include llm_timeout, step_timeout, final_response_after_failure, use_judge, ground_truth, enable_planning, planning_replan_on_stall, planning_exploration_limit, loop_detection_window, loop_detection_enabled, message_compaction, and max_clickable_elements_length. These are passed straight through to the running agent and control how the iterative loop behaves (browser_use/agent/service.py).
Message compaction is configured through MessageCompactionSettings defined in views.py, with fields such as enabled, compact_every_n_steps, trigger_char_count, trigger_token_count, keep_last_items, and summary_max_chars (browser_use/agent/views.py).
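The field names above suggest how compaction might be triggered and applied. The sketch below mirrors those names in a hypothetical `CompactionSketch` dataclass; the default values and the trigger rule are assumptions made for illustration, not the behavior of MessageCompactionSettings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompactionSketch:
    """Hypothetical mirror of the listed field names; all defaults are invented."""
    enabled: bool = True
    compact_every_n_steps: int = 10
    trigger_char_count: Optional[int] = 40_000
    trigger_token_count: Optional[int] = None  # not used by this sketch
    keep_last_items: int = 5
    summary_max_chars: int = 2_000


def should_compact(cfg: CompactionSketch, step: int, history_chars: int) -> bool:
    """Assumed rule: compact on a step boundary once the history is long enough."""
    if not cfg.enabled or step == 0:
        return False
    on_boundary = step % cfg.compact_every_n_steps == 0
    too_long = cfg.trigger_char_count is not None and history_chars >= cfg.trigger_char_count
    return on_boundary and too_long


def compact(items: list, cfg: CompactionSketch, summary: str) -> list:
    """Replace all but the newest items with one truncated summary entry."""
    if len(items) <= cfg.keep_last_items:
        return list(items)
    kept = items[-cfg.keep_last_items:] if cfg.keep_last_items > 0 else []
    return [summary[: cfg.summary_max_chars]] + kept
```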
5. Common Setup Pitfalls
Several recurring community issues map directly to installation and configuration:
- Blank Chromium at step 1 (#1020): most often a missing API key or an unsupported model. Verify the LLM credentials resolve and the model name matches a documented import.
- Wrong model import in docs (#4755): some snippets in the "Supported Models" page reference classes that have been renamed or moved. Always check browser_use/llm/ for the current module path.
- Azure OpenAI false content-filter blocks (#4783): Azure's Responsible-AI policy can flag normal navigation prompts; the workaround documented in the thread is to disable the content filter or switch providers.
- Ollama structured-output failures (#2605): some local models return empty strings, which fail Pydantic JSON validation. The agent retries, but the loop can stall.
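The empty-reply failure in the last item is easy to reproduce in isolation. The sketch below uses the standard-library json module as a stand-in for Pydantic validation and shows why a bounded retry count is worth having; `call_with_retries` is a hypothetical helper, not browser-use's retry logic.

```python
import json


def call_with_retries(generate, max_attempts: int = 3):
    """Call `generate` until its reply parses as JSON; give up after max_attempts.

    Returns (parsed_object, attempts_used).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return json.loads(generate()), attempt
        except json.JSONDecodeError as exc:
            # An empty string is not valid JSON, so it lands here.
            last_error = exc
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error
```

Raising after a fixed number of attempts turns a silent stall into an explicit error that the caller can report or handle by switching models.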
For deterministic web-game or canvas-based tasks, the agent currently relies on evaluate() to inject JavaScript; a feature request for a dedicated hover action (#4964) would close a gap for CSS-hover-driven UI patterns.
See Also
- Agent Service Internals: details on the iterative loop and message management
- System Prompts: model-specific prompt templates and reasoning rules
- LLM Providers: supported model integrations and credentials
- Examples and Integrations: runnable apps such as news-use
Source: https://github.com/browser-use/browser-use / Human Manual
Agent Runtime, Tools, and System Prompts
Related topics: Overview, Installation, and CLI, Browser Session, Watchdogs, DOM, and Actor, LLM Providers, MCP, Cloud, and Integrations
1. Overview and Purpose
The agent runtime is the orchestrating layer of browser-use that drives an LLM through an iterative loop, observes the browser state, and emits structured tool calls until the user's task is satisfied. The runtime is built around three tightly coupled subsystems:
- Agent service: the loop, history, and orchestration core (browser_use/agent/service.py).
- System prompts: model-specific instructions that govern how the LLM should reason, plan, and act (browser_use/agent/system_prompts/).
- Tools / actions: the callable surface (click, input, navigate, extract, custom tools.action, etc.) that the LLM invokes through the agent.
Together they implement the "perception → reasoning → action → verification" pattern that the README positions as the central abstraction for AI-driven browser automation (README.md).
The design intentionally keeps these subsystems modular: prompt templates can be swapped per model family, tools can be extended at runtime, and the underlying browser transport can be Playwright or direct CDP (as introduced in CLI 2.0, see the 0.12.3 release notes referenced in the community context).
2. Agent Service: The Runtime Loop
The Agent class in service.py wires together the LLM, the browser session, the action registry, and an AgentHistoryList. The constructor accepts a long list of tunables (llm_timeout, step_timeout, use_judge, ground_truth, enable_planning, planning_replan_on_stall, planning_exploration_limit, loop_detection_window, loop_detection_enabled, message_compaction, and max_clickable_elements_length) and propagates them to internal subsystems such as the TokenCost service and the message-compaction controller (browser_use/agent/service.py).
The runtime also exposes a separate ai_step helper that lets you call a one-off "ask the LLM about the current page" operation, optionally with a screenshot, by extracting clean markdown via extract_clean_markdown and feeding it through get_ai_step_user_prompt (browser_use/agent/service.py, browser_use/agent/prompts.py).
A judge LLM (controlled by use_judge and ground_truth) can validate the final result, and a dedicated page_extraction_llm is registered independently for extract operations. Multiple LLMs are tracked in a single TokenCost service so cost reporting remains consistent across them.
The history model, AgentHistoryList, stores per-step entries with timing metadata and supports serialization via save_to_file, making it possible to persist or post-process runs (browser_use/agent/views.py).
3. System Prompts: Model-Specific Reasoning Templates
The agent ships multiple prompt variants under browser_use/agent/system_prompts/, each tuned for a different reasoning regime. The standard system_prompt.md is the most verbose: it requires an explicit thinking block, supplies <todo_examples>, <evaluation_examples>, and <memory_examples> blocks, and frames the agent as an iterative loop driven by <user_request>, <agent_history>, <agent_state>, <browser_state>, <browser_vision>, and one-shot <read_state> (browser_use/agent/system_prompts/system_prompt.md).
system_prompt_no_thinking.md removes the explicit chain-of-thought preamble so it can be used with models that prefer or require hidden reasoning (browser_use/agent/system_prompts/system_prompt_no_thinking.md).
system_prompt_anthropic_flash.md and system_prompt_flash_anthropic.md are trimmed "flash" variants optimized for Anthropic-style tool use: they replace XML blocks with shorter natural-language rule lists, treat screenshots as ground truth, and constrain actions to an AgentOutput tool schema (browser_use/agent/system_prompts/system_prompt_anthropic_flash.md, browser_use/agent/system_prompts/system_prompt_flash_anthropic.md).
All variants share a common contract: the LLM must produce a memory, an evaluation_previous_goal, a next_goal, and a list of actions (typically between 1 and max_actions per step), and must verify outcomes against the screenshot before proceeding.
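The contract above can be checked mechanically before any action is executed. The following sketch validates a JSON reply against it; the key name `actions` and the function `validate_output` are assumptions chosen to match the prose, and the real AgentOutput schema may name or structure its fields differently.

```python
import json

TEXT_FIELDS = ("memory", "evaluation_previous_goal", "next_goal")


def validate_output(raw: str, max_actions: int = 3) -> dict:
    """Parse a model reply and enforce the per-step contract described above."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    for name in TEXT_FIELDS:
        if not isinstance(data.get(name), str):
            raise ValueError(f"missing or non-string field: {name}")
    actions = data.get("actions")  # hypothetical key name
    if not isinstance(actions, list) or not 1 <= len(actions) <= max_actions:
        raise ValueError(f"expected between 1 and {max_actions} actions")
    return data
```

Rejecting a malformed reply at this point gives the loop a clean place to retry or recover instead of failing midway through executing actions.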
flowchart TD
A[User Task] --> B[System Prompt + State]
B --> C[LLM Call]
C --> D{Parse AgentOutput}
D -->|valid| E[Execute Actions<br/>click/input/extract/...]
E --> F[Observe Browser State]
F --> G[Judge LLM + TokenCost]
G -->|task done| H[Final Result + GIF]
G -->|continue| B
D -->|invalid| I[Recovery / Loop Detector]
I --> B
4. Tools, Actions, and Custom Extensions
The action surface documented in the standard prompt is fixed for browser control: navigate, click, input, scroll, wait, extract, screenshot, switch_tab, go_back, done, write_file, read_file, and replace_file_str (browser_use/agent/system_prompts/system_prompt.md). extract is governed by a dedicated extraction prompt that instructs the LLM to ground answers strictly in the supplied markdown and to avoid hallucination (browser_use/agent/prompts.py).
Custom tools are added through the Tools registry, as shown in the README:
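from browser_use import Agent, Tools

tools = Tools()

# Register a custom function so the agent can invoke it as an action.
@tools.action(description='Description of what this tool does.')
def custom_tool(param: str) -> str:
    return f"Result: {param}"

# llm and browser are assumed to have been created earlier in the script.
agent = Agent(task="Your task", llm=llm, browser=browser, tools=tools)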
Source: README.md
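For readers curious how a decorator-based registry of this shape can work, here is a self-contained sketch. `MiniTools` is a hypothetical class written for illustration; it imitates the decorator pattern from the README example and does not reflect how the Tools class is implemented.

```python
import inspect


class MiniTools:
    """Hypothetical registry illustrating the decorator pattern only."""

    def __init__(self):
        self._actions = {}

    def action(self, description: str):
        """Return a decorator that records the function under its own name."""
        def decorator(func):
            self._actions[func.__name__] = {
                "func": func,
                "description": description,
                "params": list(inspect.signature(func).parameters),
            }
            return func  # leave the function usable as a normal callable
        return decorator

    def describe(self) -> list:
        """One line per action, suitable for inclusion in a prompt."""
        return [
            f"{name}({', '.join(meta['params'])}): {meta['description']}"
            for name, meta in self._actions.items()
        ]

    def call(self, name: str, **kwargs):
        """Dispatch a call by action name, as a model-emitted tool call would."""
        if name not in self._actions:
            raise KeyError(f"unknown action: {name}")
        return self._actions[name]["func"](**kwargs)


tools = MiniTools()


@tools.action(description="Echo the parameter back.")
def custom_tool(param: str) -> str:
    return f"Result: {param}"
```

The description and parameter names captured at registration time are what allow a model to discover the action and call it by name.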
Community members have asked for additional first-class actions such as hover for CSS :hover interactions (issue #4964) and for the ability to drive browser-use from CLI agents like codex-cli without an API key (issue #4895). These requests reflect an active effort to broaden the action surface beyond the default registry. As of 0.12.8, evaluate() is also refused on restricted browser profiles, and daemon UNIX sockets are owner-only; this security hardening lives alongside the tools layer (release notes for 0.12.8).
5. Observability and Outputs
Every run produces an AgentHistoryList that can be saved to disk or replayed visually via create_history_gif, which overlays task text, per-step goals, and the rendered browser screenshots into a single GIF (browser_use/agent/gif.py, browser_use/agent/views.py). This makes the runtime suitable for both production telemetry and for debugging visual-task failures like the "blank chromium page" reports seen in issue #1020 and the Gold Miner play-through in issue #4939, where the agent loaded a page but failed to identify the right interactive elements.
See Also
- Browser Session and DOM Layer: covers the underlying browser transport used by the agent.
- LLM Providers and Model Configuration: model import paths and provider-specific quirks (see also issue #4755 on stale import examples).
- Cloud Events and Telemetry: browser_use/agent/cloud_events.py integration with the Browser Use cloud.
Browser Session, Watchdogs, DOM, and Actor
Related topics: Agent Runtime, Tools, and System Prompts, LLM Providers, MCP, Cloud, and Integrations
The browser automation stack in browser-use is layered into four cooperating subsystems: a Browser Session that owns the browser lifecycle, a set of Watchdogs that keep that session healthy, a DOM service that converts the live page into an LLM-friendly representation, and a low-level Actor that exposes raw Chrome DevTools Protocol (CDP) primitives for advanced users. Together they separate "control the browser" from "drive the browser" and let the agent loop reason about a stable, structured view of the page.
High-Level Architecture
flowchart TB
Agent[Agent Loop / service.py] -->|events| BS[Browser Session<br/>session.py]
BS -->|owns| WD[Watchdogs<br/>watchdog_base.py]
BS -->|owns| CDPCDP[CDP Connection]
CDPCDP -->|talks to| Chrome[(Chromium)]
Agent -->|extract| DOM[DOM Service<br/>markdown_extractor.py]
DOM -->|reads| BS
Actor[Actor API<br/>actor/page.py] -.->|wraps| BS
Profile[BrowserProfile<br/>profile.py] -->|configures| BS
Events[Event Bus<br/>events.py] <-->|publishes| BS
Events <-->|subscribes| WD
Browser Session
The BrowserSession class (browser_use/browser/session.py) is the long-lived owner of a Chromium instance. It is exposed in the public API under the alias Browser (browser_use/actor/README.md). The session is responsible for:
- Lifecycle: start() launches the browser process, and close() tears it down. A SessionManager (browser_use/browser/session_manager.py) coordinates multiple sessions, especially in the cloud deployment referenced in the README ("Scalable browser infrastructure / Memory management / Proxy rotation / Stealth browser fingerprinting / High-performance parallel execution").
- Configuration: A BrowserProfile (browser_use/browser/profile.py) captures persistent settings (user data dir, headless mode, proxy, allowed domains, security flags).
- Event bus: Navigation, dialogs, downloads, and tab changes are modeled as typed events in browser_use/browser/events.py and consumed by the watchdogs.
- State: The session exposes structured BrowserState snapshots via browser_use/browser/views.py (URL, tabs, interactive elements, page content).
The agent consumes this state on every step: agent_history, agent_state, and browser_state are injected into the model prompt per the system prompt in browser_use/agent/system_prompts/system_prompt.md.
Watchdogs
Watchdogs are background coroutines attached to a session that react to events on the bus. The base class in browser_use/browser/watchdog_base.py standardizes subscription and teardown. Typical responsibilities include:
| Watchdog | Responsibility |
|---|---|
| DOM watchdog | Triggered on navigation completion; asks the DOM service to re-extract the page. |
| Security watchdog | Enforces profile-level restrictions; per release 0.12.8 it can "refuse evaluate() on restricted browser profiles" (see release notes). |
| Downloads watchdog | Auto-saves PDFs into available_file_paths so the agent can read_file them, as documented in system_prompt.md. |
| Screenshots watchdog | Captures bounded-box screenshots; 0.12.9 adds "skip screenshots on new tab pages" to avoid wasted tokens. |
| Dialog/permissions watchdog | Dismisses JS dialogs, handles cookie banners before user actions, per the error_recovery section of browser_use/agent/system_prompts/system_prompt_anthropic_flash.md. |
Watchdogs never run user code directly; they only mutate session state or publish new events, which keeps the agent loop deterministic.
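The subscribe-and-react pattern described above can be illustrated with a tiny in-process event bus. Everything in this sketch is hypothetical (`MiniEventBus`, `DownloadsWatchdogSketch`, and the event names); it shows the shape of the pattern, not the classes in browser_use/browser/.

```python
import asyncio
from collections import defaultdict


class MiniEventBus:
    """Hypothetical event bus: handlers are async callables keyed by event type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type: str, handler) -> None:
        self._handlers[event_type].append(handler)

    async def publish(self, event_type: str, payload: dict) -> None:
        # Copy the list so handlers may subscribe others while we iterate.
        for handler in list(self._handlers[event_type]):
            await handler(payload)


class DownloadsWatchdogSketch:
    """Hypothetical watchdog that only mutates its own state in response to events."""

    def __init__(self, bus: MiniEventBus):
        self.available_file_paths = []
        bus.subscribe("download_finished", self.on_download)

    async def on_download(self, payload: dict) -> None:
        self.available_file_paths.append(payload["path"])


async def demo() -> list:
    bus = MiniEventBus()
    watchdog = DownloadsWatchdogSketch(bus)
    await bus.publish("download_finished", {"path": "/tmp/report.pdf"})
    await bus.publish("navigation_complete", {"url": "https://example.com"})
    return watchdog.available_file_paths
```

The watchdog ignores events it did not subscribe to, which is what keeps each one small and independently testable.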
DOM Service
Before each step the agent needs a token-efficient, structured view of the page. The DOM service in browser_use/dom/markdown_extractor.py is invoked from the agent service:
from browser_use.dom.markdown_extractor import extract_clean_markdown
content, content_stats = await extract_clean_markdown(
    browser_session=self.browser_session, extract_links=extract_links
)
Source: browser_use/agent/service.py
The extractor reports three sizes (HTML → initial markdown → filtered markdown) and feeds them into the ai_step prompt via get_ai_step_user_prompt (browser_use/agent/prompts.py). Interactive elements are indexed with [index]<type>text</type> markers so the LLM can reference them by number when emitting click/input actions. The matching schema lives in browser_use/dom/views.py. The agent prompt instructs the model: "Only [indexed] are interactive. Indentation=child. *=new element since last step" (browser_use/agent/system_prompts/system_prompt_flash.md).
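The [index]<type>text</type> marker format is simple enough to render and parse with a few lines of code. The sketch below is illustrative only: `render_elements` and `parse_line` are hypothetical helpers, and starting the numbering at 1 is an assumption rather than documented behavior.

```python
import re


def render_elements(elements: list) -> str:
    """Render (tag, text) pairs as '[index]<tag>text</tag>' lines, one per element."""
    return "\n".join(
        f"[{i}]<{tag}>{text}</{tag}>"
        for i, (tag, text) in enumerate(elements, start=1)
    )


# Optional leading indentation and '*' (the new-element marker), then the indexed tag.
INDEX_LINE = re.compile(r"^\s*\*?\[(\d+)\]<(\w+)>(.*)</\2>$")


def parse_line(line: str):
    """Return (index, tag, text) for an indexed line, or None for anything else."""
    match = INDEX_LINE.match(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)
```

Lines that do not match are treated as non-interactive text, in keeping with the prompt's rule that only indexed elements are interactive.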
Actor (Low-Level CDP API)
The actor package (browser_use/actor/README.md) is a deliberate escape hatch. It bypasses the high-level DOM/markdown pipeline and talks directly to CDP via the existing BrowserSession, which is why README's recommended entry point is from browser_use import Browser. Three primitives matter:
- Page (browser_use/actor/page.py): tabs, navigation, history, screenshot, and element lookup by CSS selector.
- Element (browser_use/actor/element.py): backend-node-id lookups and an AI-assisted get_element_by_prompt(...) for cases where the indexed DOM tree is insufficient.
- Mouse (browser_use/actor/mouse.py): low-level pointer control.
This is the same direct-CDP approach that powers CLI 2.0, described in release 0.12.3 as giving "~50ms command latency via a persistent background daemon", which is useful for the "human-in-the-loop" request in community issue #221 and for the "Codex / Claude Code" integration use case in #4895.
Failure Modes and Community Notes
- Blank Chromium on first step (#1020): usually a profile/launcher mismatch; verify the BrowserProfile and that watchdogs' start() completed before the agent's first step.
- Restricted profiles (0.12.8): evaluate() is refused when a profile's security flags disallow arbitrary JS; prefer the indexed action surface.
- Hover-only UI (#4964): there is no dedicated hover action; the workaround is actor.mouse or evaluate() to dispatch a mouseover event, which is exactly the gap the actor layer exposes today.
- New-tab screenshots (0.12.9): the screenshot watchdog now skips blank new-tab pages, so the agent should rely on browser_state rather than expecting a vision payload on the first step of a new tab.
LLM Providers, MCP, Cloud, and Integrations
Related topics: Agent Runtime, Tools, and System Prompts, Browser Session, Watchdogs, DOM, and Actor
Overview
The browser-use project separates "what to do" (browser automation) from "how to decide what to do" (an LLM). The LLM layer exposes a uniform BaseChatModel interface so the same Agent can be paired with many providers, while integrations and the managed cloud deal with everything that surrounds the model: proxies, browser infrastructure, and third-party services. The officially supported providers, the Browser Use Cloud offering, and the conventions for shipping third-party integrations are described in browser_use/llm/README.md, examples/cloud/README.md, and examples/integrations/README.md.
Officially Supported LLM Providers
The official provider list in browser_use/llm/README.md is intentionally short:
| Provider | Notes |
|---|---|
| OpenAI | Default ChatOpenAI |
| Anthropic | ChatAnthropic |
| Google | ChatGoogle |
| Groq | ChatGroq |
| Ollama | Local models |
| DeepSeek | ChatDeepSeek |
| Mistral | Uses MISTRAL_API_KEY; schema keyword stripping |
| Cerebras | ChatCerebras |
The README also documents two escape hatches:
- ChatLiteLLM: the wrapper is preserved, but as of release 0.12.5 the litellm package is no longer a core dependency (removed in response to the supply-chain attack on versions 1.82.7/1.82.8). Users must pip install litellm separately. Source: README.md, release notes for 0.12.5.
- ChatLangchain: labeled NOT OFFICIALLY SUPPORTED, intended only as a reference adapter for users who want to reuse a LangChain model object. Source: browser_use/llm/README.md.
A worked example of a non-default provider lives at browser_use/llm/oci_raw/README.md, which wraps Oracle Cloud Infrastructure's Generative AI service via the raw oci SDK (no LangChain). It shows the pattern used throughout: construct a Chat<Provider>(model_id=..., temperature=..., max_tokens=...) object and pass it as llm= to the Agent. Source: browser_use/llm/oci_raw/README.md:35-60.
The recommended default is ChatBrowserUse(), which the integrations README explicitly prefers "unless the example is specifically about another model." Source: examples/integrations/README.md.
Common Provider Pitfalls (Community-Reported)
- OpenRouter docs drift. Issue #4755 reports that the OpenRouter section of the supported-models docs shows imports that do not exist in the package. Treat the in-repo README as the source of truth and verify any snippet you copy from external docs.
- Azure OpenAI content filters. Issue #4783 describes normal login/navigation prompts being flagged as ResponsibleAIPolicyViolation by Azure's content filter. Mitigations include switching providers for sensitive flows, lowering temperature, or using a different deployment tuned for tool use.
- Gpt-OSS via Ollama. Issue #2605 shows an EOF while parsing a value validation error from the agent when the local model emits empty content. The fix is usually to pin a model that returns valid JSON or to enable a stricter parser.
Browser Use Cloud and CLI 2.0
The cloud offering is documented in examples/cloud/README.md.
Cloud handles "scalable browser infrastructure, memory management, proxy rotation, stealth browser fingerprinting, [and] high-performance parallel execution", concerns that are painful to run locally. Examples in examples/cloud/ use 30-second timeouts, retries, environment variables for secrets, and domain restrictions for security. Source: examples/cloud/README.md.
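The timeout-plus-retry convention mentioned above is a generic pattern that can be sketched with asyncio alone. `with_retries` is a hypothetical helper written for illustration; the 30-second default echoes the figure quoted from the cloud examples, while the set of retried exceptions and the backoff are assumptions.

```python
import asyncio


async def with_retries(make_call, attempts: int = 3, timeout_s: float = 30.0, backoff_s: float = 0.0):
    """Await make_call() with a per-attempt timeout, retrying transient failures."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts:
                # Linear backoff between attempts; zero by default.
                await asyncio.sleep(backoff_s * attempt)
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

`make_call` is a zero-argument function returning a fresh coroutine, because a coroutine object cannot be awaited twice.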
Complementing the cloud, Browser Use CLI 2.0 shipped in release 0.12.3. It is "built on direct CDP (Chrome DevTools Protocol) instead of Playwright, giving ~50ms command latency via a persistent background daemon" and is aimed at AI coding agents such as Claude Code and Codex (see also issue #4895, which requests first-class Codex-CLI integration).
Integration Patterns for Third Parties
examples/integrations/README.md codifies where third-party code belongs. The decision tree is:
flowchart TD
A[New third-party integration] --> B{Is it shipped<br/>as part of browser-use<br/>with tests?}
B -- Yes --> C[browser_use/integrations/<provider>/]
B -- No --> D{Is it a small,<br/>runnable example?}
D -- Yes --> E[examples/integrations/<provider>/]
D -- No, but provider-agnostic --> F[examples/custom-functions/]
D -- No, full app --> G[Own repository<br/>+ add to community list]
The README also lists an example checklist: use uv, document env vars and OAuth scopes, never commit secrets, prefer ChatBrowserUse() unless the example is specifically about another model, and include the exact command that runs the example from the repo root. Source: examples/integrations/README.md.
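The placement decision tree can also be read as a short function, which makes the order of the checks explicit. `placement` is a hypothetical helper; the returned paths are copied from the flowchart, and the final string is a paraphrase of its last branch.

```python
def placement(
    shipped_with_tests: bool,
    small_runnable_example: bool,
    provider_agnostic: bool,
    provider: str = "<provider>",
) -> str:
    """Return where a third-party integration belongs, following the flowchart order."""
    if shipped_with_tests:
        return f"browser_use/integrations/{provider}/"
    if small_runnable_example:
        return f"examples/integrations/{provider}/"
    if provider_agnostic:
        return "examples/custom-functions/"
    return "own repository + community list"
```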
A concrete integration example is examples/apps/news-use/, which wires browser-use to Google Gemini for news monitoring with sentiment analysis. The README there shows the recurring pattern: install with pip install -U browser-use, export the provider API key (GEMINI_API_KEY), and call a small Python entry point (python news_monitor.py --once). Source: examples/apps/news-use/README.md.
Security and Operational Notes
Several recent releases target the LLM/integration surface:
- 0.12.5 removes litellm from core deps after the supply-chain compromise. Source: 0.12.5 release notes.
- 0.12.6 sets the default temperature=1.0 for Gemini 3 models and flattens Bedrock structured-output schemas. Source: 0.12.6 release notes.
- 0.12.7 upgrades aiohttp to 3.13.4 (memory-exhaustion CVE) and tightens CLI security. Source: 0.12.7 release notes.
- 0.12.8 restricts the daemon's Unix socket to owner-only access and refuses evaluate() on restricted browser profiles. Source: 0.12.8 release notes.
- 0.12.9 passes the session id to judge LLM calls and skips screenshots on new-tab pages. Source: 0.12.9 release notes.
See Also
- Agent Service and Configuration: how llm and judge_llm are wired into Agent (browser_use/agent/service.py).
- System Prompts and Planning: the prompts that govern how the LLM is queried (browser_use/agent/system_prompts/).
- Browser Use Cloud API Reference: managed browser infrastructure.
- Supported Models (official): always cross-check against the in-repo browser_use/llm/README.md.
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 28 structured pitfall item(s), including 5 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.
1. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/browser-use/browser-use/issues/4742
2. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/browser-use/browser-use/issues/4939
3. Configuration risk: Configuration risk requires verification
- Severity: high
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/browser-use/browser-use/issues/4783
4. Capability evidence risk: Capability evidence risk requires verification
- Severity: high
- Finding: Project evidence flags a capability evidence risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/browser-use/browser-use/issues/4755
5. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/browser-use/browser-use/issues/4579
6. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Developers should check this installation risk before relying on the project: 0.12.0
- User impact: Upgrade or migration may change expected behavior: 0.12.0
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: 0.12.0. Context: Source discussion did not expose a precise runtime context.
- Evidence: failure_mode_cluster:github_release | https://github.com/browser-use/browser-use/releases/tag/0.12.0
7. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Developers should check this installation risk before relying on the project: 0.12.3 - Browser Use CLI 2.0
- User impact: Upgrade or migration may change expected behavior: 0.12.3 - Browser Use CLI 2.0
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: 0.12.3 - Browser Use CLI 2.0. Context: Observed when using playwright
- Evidence: failure_mode_cluster:github_release | https://github.com/browser-use/browser-use/releases/tag/0.12.3
8. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Release 0.12.4 is flagged as an installation risk. Review it before relying on the project.
- User impact: Upgrading or migrating to 0.12.4 may change expected behavior.
- Recommended check: Before packaging this project, run the relevant install, config, and quickstart checks against 0.12.4. Context: observed when using Python.
- Evidence: failure_mode_cluster:github_release | https://github.com/browser-use/browser-use/releases/tag/0.12.4
9. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Release 0.12.5 is flagged as an installation risk. Review it before relying on the project.
- User impact: Upgrading or migrating to 0.12.5 may change expected behavior.
- Recommended check: Before packaging this project, run the relevant install, config, and quickstart checks against 0.12.5. Context: observed when using Python.
- Evidence: failure_mode_cluster:github_release | https://github.com/browser-use/browser-use/releases/tag/0.12.5
10. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Release 0.12.6 is flagged as an installation risk. Review it before relying on the project.
- User impact: Upgrading or migrating to 0.12.6 may change expected behavior.
- Recommended check: Before packaging this project, run the relevant install, config, and quickstart checks against 0.12.6. Context: observed when using Windows.
- Evidence: failure_mode_cluster:github_release | https://github.com/browser-use/browser-use/releases/tag/0.12.6
11. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: A community issue is flagged as an installation risk. The report reads: "Great project! I tried playing Gold Miner via browser-harness in Codex. It can successfully open the webpage and load the game, but fails to determine when to aim the claw and r..."
- User impact: Developers may fail before the first successful local run.
- Recommended check: Before packaging this project, run the relevant install, config, and quickstart checks for the scenario described in the linked issue. Context: observed when using Python on Windows.
- Evidence: failure_mode_cluster:github_issue | https://github.com/browser-use/browser-use/issues/4939
12. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Release 0.12.1 is flagged as a configuration risk. Review it before relying on the project.
- User impact: Upgrading or migrating to 0.12.1 may change expected behavior.
- Recommended check: Before packaging this project, run the relevant install, config, and quickstart checks against 0.12.1. Context: observed when using Windows.
- Evidence: failure_mode_cluster:github_release | https://github.com/browser-use/browser-use/releases/tag/0.12.1
Source: Doramagic discovery, validation, and Project Pack records
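The recommended checks above ask for the install path to be reproduced in an isolated environment, one release at a time. The following is a minimal sketch of such a check using only the Python standard library. It assumes that the flagged release tags (for example 0.12.6) correspond to installable `browser-use` versions on PyPI, which this page does not confirm. The helper names `venv_python`, `pip_install_command`, and `check_release` are hypothetical and are not part of browser-use.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(env_dir: Path, version: str) -> list[str]:
    """Build the pip command that pins browser-use to a single release."""
    return [
        str(venv_python(env_dir)),
        "-m",
        "pip",
        "install",
        f"browser-use=={version}",
    ]


def check_release(env_dir: Path, version: str) -> bool:
    """Create a fresh venv, install the pinned release, and try the import.

    Assumption: the version string matches a release published on PyPI.
    Returns True when `import browser_use` succeeds in the new environment.
    """
    # clear=True wipes any previous contents so each run starts clean.
    venv.create(env_dir, with_pip=True, clear=True)
    subprocess.run(pip_install_command(env_dir, version), check=True)
    result = subprocess.run([str(venv_python(env_dir)), "-c", "import browser_use"])
    return result.returncode == 0
```

Calling `check_release(Path("scratch-env"), "0.12.6")` would install that release into a throwaway environment and report whether the package imports. This covers the install step only. The quickstart itself still has to be run by hand, since it needs a browser and model credentials.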
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using browser-use with real data or production workflows.
- Bug: CDP connection instability causing indefinite hangs with remote bro - github / github_issue
- Bug: Screenshot blob in tool result poisons conversation context - API 4 - github / github_issue
- Documentation: some model import does not exist at all. - github / github_issue
- Bug: ...Azure OpenAI false content_filter / ResponsibleAIPolicyViolation - github / github_issue
- Great project! I tried playing Gold Miner via browser-harness in Codex. - github / github_issue
- Feature Request: Add hover action for triggering CSS :hover dropdowns, t - github / github_issue
- Feature Request: ... - github / github_issue
- 0.13.2 - github / github_release
- 0.13.1 - github / github_release
- [0.13.0 - Rebuilt in Rust [beta]](https://github.com/browser-use/browser-use/releases/tag/0.13.0) - github / github_release
- 0.12.9 - github / github_release
- 0.12.8 - github / github_release
Source: Project Pack community evidence and pitfall evidence