Doramagic Project Pack · Human Manual
browser-agent-driver
LLM-driven browser automation with wallet extension testing. Accessibility tree + optional vision.
Overview & Core Agent Architecture
Related topics: Stealth, Anti-Bot & CAPTCHA (v0.23.0 / Gen 27), Design Audit & Auto-Fix, Benchmarking, Evaluation, Memory & Wallet
Purpose and Scope
@tangle-network/browser-agent-driver (binary: bad) is a general-purpose agentic browser automation system that completes real user outcomes on arbitrary websites: search, extraction, form filling, price comparison, and complex UI navigation. The package is published as a dual CLI and library; the CLI binary bad is declared in package.json:8-10, and the library entry point is exposed through the package exports field. According to the README headline metrics, the system reaches 91.3% on WebVoyager (590 tasks across 15 sites) at $0.09 per task, with the default model being gpt-5.4.
The scope spans three primary use modes surfaced in README.md:
- A one-shot CLI (bad run --goal "...") for ad-hoc automation.
- A programmatic SDK (new BrowserAgent({ driver, config })) for application integration.
- A benchmark harness under bench/ for CI-grade regression and competitive evaluation.
Architecture Overview
The agent loop is decoupled from the browser through a Driver interface, allowing the same decision engine to run against a local Playwright Chromium, a Steel cloud browser, or any other conforming implementation. The diagram below summarizes how the CLI, Brain, Driver, and reporting layer interact for a typical bad run invocation.
flowchart LR
CLI[bad CLI<br/>run.ts] --> Brain[Brain<br/>decision engine]
Brain -->|generate| ModelProvider[(Model Provider<br/>OpenAI / Anthropic /<br/>sandbox-backend)]
Brain -->|decide / verify / scout| Driver[Driver<br/>Playwright or Steel]
Driver --> Browser[(Chromium / Cloud)]
CLI --> Reports[Reporters<br/>json / md / html / junit]
CLI --> Renderer[Live Renderer<br/>stdout / TUI]
The CLI entry point in src/cli/commands/run.ts handles argument parsing, reporter fan-out, stream webhooks, and clean shutdown of browser, persistent context, and single-driver resources. A secondary command, bad showcase, delegates to a capture-and-evaluate pipeline via the thin wrapper in src/cli/commands/showcase.ts, which simply forwards CLI flags to the internal handleShowcase implementation.
Brain Decision Engine
The Brain is the LLM-driven core that observes page state and emits actions. To keep the file maintainable as decision subtasks grow, the Brain uses a delegate + host-interface pattern: each task lives in its own module under src/brain/tasks/, and the Brain class implements a small host interface that exposes only the slice of state the task needs. This makes missing or mistyped members a compile-time error.
The tasks observed in the source tree include:
| Task module | Responsibility |
|---|---|
| goal-verification.ts | Judge whether the agent's claimed result actually achieved the goal on the current page. Supports a dedicated verifier model and falls back to the navigation model under adaptive routing. |
| link-scout.ts | Recommend the single best next visible link from a scored candidate list, using only the top 5 to save 2–8k tokens. |
| knowledge.ts | Distill a completed trajectory into reusable timing/selector/pattern/quirk facts for future runs. |
| design-audit.ts | Vision-based layout, typography, spacing, contrast, and UX analysis returning structured findings with severities. |
| evaluate.ts | Rate the visual quality and professional polish of the current page via a dedicated EVALUATE_PROMPT. |
Each host interface declares only the slice of Brain state the task reads — typically provider, navProvider, navModelName, buildUserContent, and generate — keeping the dependency surface explicit and testable.
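The delegate + host-interface pattern described above can be sketched in a few lines. This is an illustration only: LinkPickerHost, pickFirstLinkImpl, and MiniBrain are hypothetical names, and generate is a synchronous stand-in for a real model call, not the project's actual signature.

```typescript
// Hypothetical names throughout; this is not the project's source.

// The host interface lists only the members this task is allowed to read.
interface LinkPickerHost {
  navModelName: string;
  generate(prompt: string): string;
}

// The task body lives in its own module and depends on the host slice only.
function pickFirstLinkImpl(host: LinkPickerHost, links: string[]): string {
  const prompt = `model=${host.navModelName}; links=${links.join(",")}`;
  return host.generate(prompt);
}

// Declaring `implements` means a missing or mistyped member fails tsc.
class MiniBrain implements LinkPickerHost {
  navModelName = "nav-model";

  generate(prompt: string): string {
    // Stand-in for an LLM call: echo the first link found in the prompt.
    const match = prompt.match(/links=([^,]*)/);
    return match?.[1] ?? "";
  }

  // Thin delegator: the public method forwards to the task implementation.
  pickFirstLink(links: string[]): string {
    return pickFirstLinkImpl(this, links);
  }
}

const brain = new MiniBrain();
const picked = brain.pickFirstLink(["/pricing", "/docs"]);
```

Because the task function receives the host rather than the whole class, it can be unit-tested with a plain object that satisfies the interface.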
Driver Layer and Model Routing
The Driver abstraction decouples the agent loop from the browser transport. The default is a local Playwright driver, while a Steel driver is used for anti-bot, residential proxies, and CAPTCHA solve-as-a-service. Because the Brain only knows the Driver interface, alternative implementations can be substituted without touching decision logic.
Models are configurable per role. The README documents a models map covering planner, executor, verifier, and supervisor, letting cheaper models handle navigation while a stronger model supervises or audits. The goal verifier in src/brain/tasks/goal-verification.ts demonstrates this: if verifierProvider is unset, it falls back to navProvider when adaptiveModelRouting is enabled, otherwise it reuses the main provider. Link scouting in src/brain/tasks/link-scout.ts follows the same precedence: scoutProvider → navProvider → provider.
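The two fallback chains above can be written as small resolver functions. The field names come from the text; the Provider type, the RoutingState shape, and both function names are assumptions made for this sketch, as is the guard that skips navProvider when it is unset.

```typescript
// Sketch under stated assumptions; not the project's actual types.
type Provider = { name: string };

interface RoutingState {
  provider: Provider;           // main model, always present
  navProvider?: Provider;       // cheaper navigation model
  verifierProvider?: Provider;  // dedicated verifier, optional
  scoutProvider?: Provider;     // dedicated link scout, optional
  adaptiveModelRouting: boolean;
}

// Verifier: dedicated model first, then nav model under adaptive routing,
// otherwise the main provider.
function resolveVerifier(s: RoutingState): Provider {
  if (s.verifierProvider) return s.verifierProvider;
  if (s.adaptiveModelRouting && s.navProvider) return s.navProvider;
  return s.provider;
}

// Link scout: scoutProvider, then navProvider, then provider.
function resolveScout(s: RoutingState): Provider {
  return s.scoutProvider ?? s.navProvider ?? s.provider;
}
```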
For sandboxed execution, src/providers/sandbox-backend.ts builds a transcript from ModelMessage[], serializes text and image attachments, and infers the backend type from the model name (claude/sonnet/opus/haiku → claude-code, gpt/o1/o3/o4/codex → codex), throwing if inference fails.
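A minimal sketch of the name-based inference follows. The keyword lists are taken from the description above; the function name, the case-insensitive substring matching, and the error message are assumptions, and the real module may match more strictly.

```typescript
type SandboxBackend = "claude-code" | "codex";

// Assumed matching strategy: case-insensitive substring search.
function inferBackend(modelName: string): SandboxBackend {
  const name = modelName.toLowerCase();
  if (/claude|sonnet|opus|haiku/.test(name)) return "claude-code";
  if (/gpt|o1|o3|o4|codex/.test(name)) return "codex";
  // Mirrors the documented behaviour: inference failure is a thrown error.
  throw new Error(`Cannot infer sandbox backend from model name: ${modelName}`);
}
```

Substring matching is permissive (any name containing "o1" would route to codex), so a production version would likely anchor the patterns.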
Reporting, Scenarios, and Failure Modes
After every run, src/cli/commands/run.ts writes report files for each requested format (json, markdown, html, junit) under the configured report directory, renders them through the live view, and finally — in JSON mode — echoes the structured result to stdout. Cleanup runs in a finally block that detaches the interrupt controller, flushes the webhook streamer, and closes driver, persistent context, and browser resources regardless of success.
The benchmark tracks documented in bench/scenarios/README.md split tasks into local-deterministic, staging-auth, public-web, webbench, and restricted-manual to keep flaky internet and policy-sensitive flows (captchas, third-party account provisioning) out of CI reliability metrics. The competitive harness in bench/competitive/README.md targets comparability against browser-use, Stagehand, Skyvern, and Computer Use — measuring cost-per-task alongside success rate.
Common failure modes worth understanding:
- Provider inference failure: the sandbox backend throws if model name matches neither Claude nor GPT/Codex patterns.
- Reporter errors are swallowed: report generation is best-effort, so missing templates will not abort the run.
- Cleanup throws are caught: close() calls use .catch(() => {}), so resource leaks may be silent.
- Stealth regressions: per the v0.23.0 release notes (Gen 27), previously-blocked sites now require System Chrome, Patchright, and Bezier mouse humanization to be active.
See Also
- Stealth & Anti-Bot Configuration
- Brain Decision Tasks
- Driver Implementations
- Benchmark Suites
- CLI Reference
Source: https://github.com/tangle-network/browser-agent-driver / Human Manual
Stealth, Anti-Bot & CAPTCHA (v0.23.0 / Gen 27)
Related topics: Overview & Core Agent Architecture, Benchmarking, Evaluation, Memory & Wallet
Overview and Scope
v0.23.0 (Gen 27) consolidates browser-agent-driver's evasion surface into a single release focused on real-world anti-bot blocking, CAPTCHA handling, and form intelligence. The release claims that 9 of 13 previously-blocked sites now pass on the WebBench-50 evaluation, with system Chrome, Patchright, and CAPTCHA solvers shipped as defaults.
The stealth subsystem sits at the browser-launch layer and combines four independent evasion techniques: TLS/JA3 fingerprinting, CDP protocol patching, mouse kinematics, and proxy routing. CAPTCHA handling is treated as a recovery job the main agent loop can invoke when blocked. The integration point for all of this is bad run, which is auto-detected for both CLI and SDK consumers and where CLI flags override config values.
Stealth and Anti-Bot Evasion
The browser launch code in src/cli/commands/run.ts is the primary integration point for stealth configuration. For stealth profiles, the launch plan upgrades the bundled Chromium channel to system Chrome: ...(isStealthProfile && browserName === 'chromium' ? { channel: 'chrome' } : {}). Source: src/cli/commands/run.ts. System Chrome provides a real TLS/JA3/HTTP2 fingerprint that bundled Chromium cannot reproduce. The launch code itself documents why the upgrade is gated to stealth profiles only: system Chrome renders differently than bundled Chromium on some sites, producing Allrecipes click timeouts and Amazon layout shifts.
Proxy support is wired through the launch plan: ...(launchPlan.proxyServer ? { proxy: { server: launchPlan.proxyServer, ...(launchPlan.proxyBypass ? { bypass: launchPlan.proxyBypass } : {}) } } : {}). Source: src/cli/commands/run.ts. Residential, SOCKS5, and HTTP proxies are accepted; the --proxy CLI flag and BAD_PROXY_URL environment variable both feed launchPlan.proxyServer.
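The two conditional spreads quoted above can be combined into one helper to show how they compose. This is a sketch: buildLaunchOptions and both interfaces are hypothetical, and the real code may apply the channel and proxy fragments to different option objects.

```typescript
// Hypothetical shapes for illustration; only the spread expressions come from the text.
interface LaunchPlan {
  proxyServer?: string;
  proxyBypass?: string;
}

interface LaunchOptions {
  channel?: string;
  proxy?: { server: string; bypass?: string };
}

function buildLaunchOptions(
  browserName: string,
  isStealthProfile: boolean,
  launchPlan: LaunchPlan,
): LaunchOptions {
  return {
    // Upgrade bundled Chromium to system Chrome only for stealth profiles.
    ...(isStealthProfile && browserName === "chromium" ? { channel: "chrome" } : {}),
    // Add a proxy block only when a server is configured; bypass is optional.
    ...(launchPlan.proxyServer
      ? {
          proxy: {
            server: launchPlan.proxyServer,
            ...(launchPlan.proxyBypass ? { bypass: launchPlan.proxyBypass } : {}),
          },
        }
      : {}),
  };
}
```

Spreading an empty object leaves the key absent rather than set to undefined, which keeps the options object minimal when neither condition holds.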
Headless Chromium exposes itself with HeadlessChrome/... in the default User-Agent, which CDNs like Akamai reject with ERR_HTTP2_PROTOCOL_ERROR before any JS stealth patch can run. The launch code builds a clean UA from the live browser version and a platform-specific token:
// Excerpt from the launch path; `browser` is the live browser handle.
const ver = browser.version()
const platformToken = process.platform === 'win32'
  ? 'Windows NT 10.0; Win64; x64'
  : process.platform === 'linux'
    ? 'X11; Linux x86_64'
    : 'Macintosh; Intel Mac OS X 10_15_7'
return `Mozilla/5.0 (${platformToken}) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/${ver} Safari/537.36`
Source: src/cli/commands/run.ts.
Additional stealth layers documented in the README include Patchright (a Playwright fork that patches CDP protocol leaks), mouse humanization with Bezier curves (8–15 control points plus gaussian click offset), browser fingerprint patches for navigator.webdriver, plugins, languages, WebGL, and canvas noise, and a blocklist of 99+ analytics/tracking domains. The --use-gl=desktop flag enables real GPU WebGL rendering. Source: README.md.
CAPTCHA Solving
CAPTCHA handling is enabled by default and configured via the captcha option: { captcha: { enabled: true, maxAttempts: 5 } }. Source: README.md. Three CAPTCHA families are supported:
- reCAPTCHA v2 — checkbox click followed by an LLM-vision image-grid solver.
- Cloudflare Turnstile — checkbox with behavioral click heuristics.
- Google "unusual traffic" — detected on the page and a solver attempted automatically.
Recovery is automatic: cookie consent, modal blockers, A-B-A-B oscillation loops, form-field resets, date-picker stalls, and CAPTCHA challenges are all handled in the agent loop before the run terminates. Source: README.md. The goal-verification task consults the current page state and a screenshot via buildUserContent(textContent, state.screenshot, true) before declaring success, so a CAPTCHA interstitial that survives the run will cause verification to fail rather than report a false positive. Source: src/brain/tasks/goal-verification.ts.
The link-scout task can use its own cheaper model (scoutProvider, scoutModelName) to pick the next visible link from a candidate list, reducing the cost of recovery loops on anti-bot pages. Source: src/brain/tasks/link-scout.ts.
Configuration, CLI Flags, and Failure Modes
| Concern | Surface | Notes |
|---|---|---|
| Proxy routing | --proxy flag or BAD_PROXY_URL env | Reads launchPlan.proxyServer and optional launchPlan.proxyBypass. Source: src/cli/commands/run.ts |
| Real WebGL | --use-gl=desktop | Avoids software-renderer fingerprint. Source: README.md |
| CAPTCHA policy | captcha: { enabled, maxAttempts } | Default enabled, maxAttempts: 5. Source: README.md |
| Profile gating | isStealthProfile | Channel upgrade only fires when stealth profile is active. Source: src/cli/commands/run.ts |
| Benchmark suite | bench:scoreboard, webbench:import | Package scripts under package.json. Source: package.json |
The scenario suite splits high-friction flows into a restricted-manual track that requires human-in-the-loop and is never run unattended in CI, while webbench and public-web tracks capture realistic anti-bot exposure under benchmark profiles (default, webbench, webvoyager). Source: bench/scenarios/README.md. Competitive benchmarking against browser-use, Stagehand, Skyvern, and the foundation-model Computer Use agents lives under bench/competitive/ and is the empirical ground truth for whether stealth and CAPTCHA work buys net task-completion improvement. Source: bench/competitive/README.md.
Known failure modes from the source:
| Symptom | Likely cause |
|---|---|
| Site rejected with ERR_HTTP2_PROTOCOL_ERROR | Headless Chromium default UA still in flight; confirm isStealthProfile triggers system Chrome |
| Layout shifts or click timeouts on Allrecipes/Amazon | System Chrome renders differently than bundled Chromium; the stealth upgrade is intentionally profile-scoped |
| CAPTCHA loops repeat past maxAttempts | LLM-vision solver exhausted; raise captcha.maxAttempts and inspect screenshots in the run directory |
Source: src/cli/commands/run.ts, README.md.
See Also
- Configuration Reference: README.md
- CLI Reference: bad run, bad snapshot, bad design-audit, bad view, bad competitive
- Benchmark Suite: bench/scenarios/README.md, bench/competitive/README.md
- Brain tasks: src/brain/tasks/goal-verification.ts, src/brain/tasks/link-scout.ts
Source: https://github.com/tangle-network/browser-agent-driver / Human Manual
Design Audit & Auto-Fix
Related topics: Overview & Core Agent Architecture, Benchmarking, Evaluation, Memory & Wallet
Overview
bad design-audit is a dedicated subsystem inside Browser Agent Driver that grades the visual quality, layout, typography, contrast, and overall UX polish of a target URL and emits structured findings. Unlike the agentic loop used by bad run, design audit is a single-pass vision-driven evaluation that scores a page against an explicit checklist of checkpoints. It is extracted from brain/index.ts as a delegate-and-host module under src/brain/tasks/design-audit.ts, and the file's own header documents the split: Brain.auditDesign keeps a thin delegator while the body lives in auditDesignImpl and reads Brain state through the BrainDesignAuditHost interface (src/brain/tasks/design-audit.ts).
The audit serves two related goals:
- Objective scoring: produce a numeric score plus categorized findings (category, severity, ROI hints).
- Optional patch passthrough: the same module carries roi, reference, and judge knobs that feed an auto-fix loop, so a high-severity finding can drive a remediation candidate rather than just a report line.
The CLI entry point is runDesignAudit, wired in src/cli/commands/design-audit.ts. It forwards every relevant flag (--url, --pages, --profile, --model, --reference, --judge, --evolve, etc.) into a single typed options bag (src/cli/commands/design-audit.ts).
Architecture
flowchart LR
CLI["CLI: bad design-audit"] --> CMD["runDesignAudit()"]
CMD --> Brain["Brain.auditDesign (delegator)"]
Brain --> Impl["auditDesignImpl()"]
Impl --> Host["BrainDesignAuditHost<br/>(generate, buildUserContent, debug)"]
Host --> Model["LLM (vision-capable)"]
Model --> Parse["JSON parse → DesignFinding[]"]
Parse --> Score["score + designSystemScore"]
Parse --> Optional["Patch / ROI passthrough"]
Optional --> Report["json | html | junit sink"]
The host interface is intentionally narrow. It only exposes debug, buildUserContent, and generate, so the body cannot reach into unrelated Brain state. Because Brain implements BrainDesignAuditHost, a missing or mistyped member fails tsc at compile time (src/brain/tasks/design-audit.ts).
Pipeline and Prompt
auditDesignImpl builds a single user message that mixes the goal, the explicit checkpoint list, the current URL/title, and the page snapshot, then attaches the screenshot with forceVision: true. This guarantees a vision-capable model is consulted even if vision is disabled elsewhere in the session (src/brain/tasks/design-audit.ts).
GOAL: <goal>
CHECKPOINTS:
1. <c1>
2. <c2>
...
CURRENT PAGE:
URL: <state.url>
Title: <state.title>
ELEMENTS:
<state.snapshot>
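The template above could be assembled along these lines. The layout follows the template; buildAuditMessage and PageStateLike are hypothetical names, and attaching the screenshot (the forceVision step) is left out of this sketch.

```typescript
// Hypothetical helper; only the message layout is taken from the template.
interface PageStateLike {
  url: string;
  title: string;
  snapshot: string;
}

function buildAuditMessage(
  goal: string,
  checkpoints: string[],
  state: PageStateLike,
): string {
  // Number the checkpoints starting from 1, one per line.
  const numbered = checkpoints.map((c, i) => `${i + 1}. ${c}`);
  return [
    `GOAL: ${goal}`,
    "CHECKPOINTS:",
    ...numbered,
    "CURRENT PAGE:",
    `URL: ${state.url}`,
    `Title: ${state.title}`,
    "ELEMENTS:",
    state.snapshot,
  ].join("\n");
}
```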
The system prompt defaults to DESIGN_AUDIT_PROMPT from brain/prompts.ts, but a caller can override it via the systemPrompt argument. The response is sent through generate(..., 8000) to cap output, then parsed with two layers of fallback: trim surrounding fences first, then regex-extract a JSON object if the model returns prose or a truncated payload. Both branches assign into a parseError field rather than throwing, so a malformed response still produces a report line (src/brain/tasks/design-audit.ts).
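The two-layer fallback can be sketched as follows. The order (strip fences, then regex-extract an object) and the non-throwing parseError field follow the description above; the function name, the result shape, and the exact regular expressions are assumptions.

```typescript
// Three backticks, built at runtime so the example itself stays fence-safe.
const FENCE = "`".repeat(3);
const LEADING_FENCE = new RegExp("^" + FENCE + "(?:json)?\\s*", "i");
const TRAILING_FENCE = new RegExp("\\s*" + FENCE + "$");

interface ParsedAudit {
  value?: unknown;      // parsed JSON when either layer succeeds
  parseError?: string;  // set instead of throwing when both layers fail
}

function parseAuditResponse(raw: string): ParsedAudit {
  // Layer 1: drop surrounding code fences and try a direct parse.
  const trimmed = raw.trim().replace(LEADING_FENCE, "").replace(TRAILING_FENCE, "").trim();
  try {
    return { value: JSON.parse(trimmed) };
  } catch (firstError) {
    // Layer 2: pull the outermost {...} span out of surrounding prose.
    const match = trimmed.match(/\{[\s\S]*\}/);
    if (!match) return { parseError: String(firstError) };
    try {
      return { value: JSON.parse(match[0]) };
    } catch (secondError) {
      return { parseError: String(secondError) };
    }
  }
}
```

Returning a parseError rather than throwing lets the caller still emit a report line with the raw text attached. A payload truncated mid-object will fail both layers and surface as parseError.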
Configuration Surface
bad design-audit exposes more than 20 flags. The most relevant ones for the audit + auto-fix loop are summarized below.
| Flag | Purpose |
|---|---|
| --url / --pages | Target URL and number of pages to crawl |
| --profile, --model, --provider, --api-key, --base-url | Model selection (vision-capable) |
| --sink | Output format: json, markdown, html, or junit |
| --json, --headless, --debug | Output verbosity and runtime mode |
| --storage-state | Reuse cookies/storage for authenticated audits |
| --extract-tokens | Extract design tokens (colors, fonts) from the page |
| --evolve, --evolve-rounds | Iterative auto-fix loop: re-audit after each patch |
| --project-dir | Where to write patches |
| --reproducibility | Lock seeds/snapshots for a reproducible audit |
| --rubrics-dir | Override the builtin rubric set with a custom one |
| --audit-passes | Multi-pass auditing for richer findings |
| --skip-ethics | Bypass the Layer 7 ethics rollup floor (testing only) |
| --ethics-rules-dir | Override builtin ethics rules |
| --audience, --regulatory-context, --audience-vulnerability, --modality | Audience predicates that weight findings |
| --reference, --reference-grounded | Opt-in reference-grounded taste judge (v1) |
| --judge, --judge-models | Judge mode (text or vision) and ensemble list |
All flags are declared in src/cli/args.ts and forwarded unchanged through runDesignAudit (src/cli/args.ts, src/cli/commands/design-audit.ts).
Auto-Fix Loop
The auto-fix path is opt-in via --evolve. Each round:
- Run auditDesignImpl on the current state of the page.
- Emit DesignFinding[] (category, severity, ROI).
- For findings above the ROI threshold, project them into a patch candidate and write it under --project-dir.
- Re-render and re-audit until either --evolve-rounds is exhausted or the score crosses a stop band.
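The round structure above can be sketched as a bounded loop. Everything here is illustrative: evolve, Finding, roiThreshold, and stopScore are hypothetical names, the audit and patch steps are injected as plain callbacks, and the loop is synchronous for brevity where the real pipeline would await model and browser calls.

```typescript
// Hypothetical types; the real DesignFinding carries more fields.
interface Finding { category: string; severity: string; roi: number }
interface AuditResult { score: number; findings: Finding[] }

interface EvolveOptions {
  maxRounds: number;     // mirrors --evolve-rounds
  roiThreshold: number;  // assumed cut-off for "worth patching"
  stopScore: number;     // assumed lower edge of the stop band
}

function evolve(
  audit: () => AuditResult,
  applyPatch: (finding: Finding) => void,
  opts: EvolveOptions,
): { rounds: number; finalScore: number } {
  let result = audit();
  let rounds = 0;
  while (rounds < opts.maxRounds && result.score < opts.stopScore) {
    const actionable = result.findings.filter((f) => f.roi >= opts.roiThreshold);
    if (actionable.length === 0) break; // nothing above the threshold: stop early
    actionable.forEach(applyPatch);
    rounds += 1;
    result = audit(); // re-render and re-audit after patching
  }
  return { rounds, finalScore: result.score };
}
```

The early break covers the stall case: when no finding clears the threshold, further rounds would only spend tokens.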
The same module ships the designSystemScore map and tokensUsed counter so a downstream agent can decide whether to spend another round. Two sibling tasks in src/brain/tasks/ illustrate the related decision patterns the auto-fix loop composes with:
- evaluateImpl (src/brain/tasks/evaluate.ts): produces a QualityEvaluation (subjective taste rating) using EVALUATE_PROMPT and the same transport funnel (buildUserContent + generate).
- verifyGoalCompletionImpl (src/brain/tasks/goal-verification.ts): confirms a user-stated goal is satisfied before a run is marked complete, including a buildFirstPartyBoundaryNote site-boundary check that prevents the verifier from claiming success on a first-party page that hasn't actually moved.
Both follow the same delegate-and-host pattern, which means an auto-fix pass can mix "did the design improve?" (audit) with "did the user's stated outcome improve?" (goal verification) and "does the page look professional now?" (evaluate) without duplicating transport plumbing.
Usage
Minimal:
bad design-audit --url https://example.com
With auto-fix, custom rubric, and reference-grounded judging:
bad design-audit \
--url https://example.com \
--pages 3 \
--evolve --evolve-rounds 5 \
--project-dir ./audit-out \
--rubrics-dir ./my-rubrics \
--reference ./ref.png --reference-grounded \
--judge vision --judge-models gpt-5.4,claude-opus-4.6
Reporters follow the same multi-format pattern used by bad run: json, markdown (with turn detail), html, and junit are emitted in parallel when requested, with best-effort error handling so a broken reporter never aborts the audit (src/cli/commands/run.ts).
Common Failure Modes
- Malformed model output. Handled by the trim-fence + regex-extract fallback in auditDesignImpl; the raw text and parseError are still surfaced in the report so a downstream tool can re-prompt.
- Non-vision model selected. Mitigated by forceVision: true in the audit's buildUserContent call.
- Ethics floor blocks the run. Use --skip-ethics only in test scenarios; production should leave the Layer 7 gate on (src/cli/args.ts).
- Auto-fix stalls. Increase --evolve-rounds or widen the ROI threshold; the score and tokensUsed per round are emitted so progress can be measured externally.
See Also
- Brain Decision Engine
- CLI Reference
- Configuration Guide
- Goal Verification
Source: https://github.com/tangle-network/browser-agent-driver / Human Manual
Benchmarking, Evaluation, Memory & Wallet
Related topics: Overview & Core Agent Architecture, Stealth, Anti-Bot & CAPTCHA (v0.23.0 / Gen 27)
The browser-agent-driver ("bad") project ships a tightly integrated suite for measuring agent quality, scoring design and goal outcomes, distilling reusable knowledge from completed runs, and exercising browser-extension wallets during DeFi-style tasks. This page documents each pillar, the scripts that drive them, and how they compose.
1. Benchmarking Infrastructure
The benchmark layer is split into two complementary harnesses.
Scenario Suite
bench/scenarios/README.md defines five tracks. local-deterministic runs on controlled fixtures and is required in CI; staging-auth exercises real product flows with seeded storage state; public-web validates against stable public pages (non-critical in CI); webbench derives cases from the Halluminate WebBench corpus for cross-agent comparability; and restricted-manual covers captcha/phone-verified flows that must never run unattended. Each task is tagged with categories such as navigation, form-completion, product-usage, research, scraping, auth, and blocker-recovery. Source: bench/scenarios/README.md:1-50.
The canonical runner is scripts/run-scenario-track.mjs, invoked as node scripts/run-scenario-track.mjs --cases <file> --config <file> --model <id> --benchmark-profile <profile> --modes <list>. Benchmark profiles tune the noise/cost tradeoff: default is balanced, webbench is fast and low-noise, and webvoyager is evidence-rich. Source: bench/scenarios/README.md:50-80.
A/B experiments use npm run ab:experiment, with outputs summary.json (Wilson CIs, bootstrap delta CI), runs.csv, passrate-series.csv, summary.md, and blocker-adjusted cleanPassRate metrics. The Tier1 Reliability Gate runs deterministic fixtures with --min-full-pass-rate 1 --min-fast-pass-rate 1 and emits tier1-gate-summary.{json,md}. Source: bench/scenarios/README.md:80-130.
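For readers unfamiliar with the Wilson interval mentioned above, the standard formula for a pass rate of k successes in n runs is shown below. This is the textbook Wilson score interval; whether the project's script uses z = 1.96 or exactly this form is an assumption, and wilsonInterval is a hypothetical name.

```typescript
// Wilson score interval for a binomial proportion.
// Returns [lower, upper] bounds clamped to the [0, 1] range.
function wilsonInterval(successes: number, trials: number, z = 1.96): [number, number] {
  if (trials <= 0) throw new Error("trials must be positive");
  const p = successes / trials;
  const z2 = z * z;
  const denom = 1 + z2 / trials;
  // The interval is centred on a value pulled toward 0.5, not on p itself.
  const center = (p + z2 / (2 * trials)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / trials + z2 / (4 * trials * trials))) / denom;
  return [Math.max(0, center - half), Math.min(1, center + half)];
}
```

Unlike the naive p ± z·sqrt(p(1-p)/n) interval, this one stays informative at 0% or 100% pass rates, which matters for small deterministic suites.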
Competitive Harness
bench/competitive/README.md frames a head-to-head comparison of bad against browser-use, Stagehand, Skyvern, OpenAI Computer Use, and Claude Computer Use. Each (framework, task) cell captures success, wallTimeSeconds, turnCount, llmCallCount, token buckets, and costUsd computed from a shared pricing table. Source: bench/competitive/README.md:1-50.
The driver scripts pnpm bench:competitive:setup, pnpm bench:competitive:run, and pnpm bench:competitive:dashboard install runners, execute cells, and render results/_dashboard.md. Reported headline numbers: 91.3% on WebVoyager (590 tasks, 15 sites) at $0.09/task, 100% on a held-out competitive bench, and 95.7% on WebBench-50 excluding DataDome sites. Source: README.md:1-30.
| Track | CI Required | Drift Risk | Example Categories |
|---|---|---|---|
| local-deterministic | Yes | None | navigation, form-completion |
| staging-auth | Yes | Low | auth, product-usage |
| public-web | No | High | research, scraping |
| webbench | No | High | navigation, scraping |
| restricted-manual | Never (human-in-loop) | High | blocker-recovery, auth |
2. Brain Evaluation Tasks
The Brain decision engine exposes three structured evaluation tasks plus the link scout, each implemented as a thin delegator on Brain plus a host-interface slice so the compiler proves completeness.
- evaluateImpl rates a page on a 1–10 scale and returns { score, assessment, strengths, issues, suggestions, raw, tokensUsed }. It always forces vision (forceVision: true) so the model sees the actual screenshot. Source: src/brain/tasks/evaluate.ts:30-80.
- auditDesignImpl returns { score, findings, raw } where findings carry categories, severities, and optional ROI/patch passthrough for design regressions. Source: src/brain/tasks/design-audit.ts:20-60.
- verifyGoalCompletionImpl asks the verifier model (which may differ from the main model via verifierProvider/verifierModel or adaptive routing on navModelName) whether the claimed result actually matches the live PageState. Source: src/brain/tasks/goal-verification.ts:20-70.
- recommendLinkCandidateImpl (link scout) picks the single best next visible link from a deterministic top-5 ranking, optionally using vision when scoutUseVision is set. Source: src/brain/tasks/link-scout.ts:20-60.
flowchart LR
A[Page State] --> B[evaluate]
A --> C[auditDesign]
A --> D[verifyGoalCompletion]
E[Link Candidates] --> F[linkScout]
B --> G[score 1-10]
C --> H[findings]
D --> I{achieved?}
F --> J[next ref]
3. Knowledge Extraction (Memory)
extractKnowledgeImpl distills a completed trajectory into a bounded list of reusable facts with explicit type values: timing (wait durations), selector (reliable element handles), pattern (multi-step interaction sequences), and quirk (app-specific gotchas). The model is capped at 10 facts and must respond with raw JSON (the parser strips ```json fences before validation). Source: src/brain/tasks/knowledge.ts:20-80.
These facts feed downstream caches that speed up repeat visits to the same domain — a behaviour implied by the prompt design ("help an agent complete similar tasks faster next time"). Quality-over-quantity is enforced both in the system prompt and by post-validating each entry against the VALID_TYPES allow-list. Source: src/brain/tasks/knowledge.ts:60-90.
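The validation steps described above (strip fences, parse, filter by type, cap at 10) might look like this. The four type values and the cap come from the text; parseKnowledge, the { type, fact } entry shape, and the choice to return an empty list on malformed JSON are assumptions for this sketch.

```typescript
const VALID_TYPES = ["timing", "selector", "pattern", "quirk"] as const;
type FactType = (typeof VALID_TYPES)[number];

// Assumed entry shape; the real schema may carry additional fields.
interface KnowledgeFact { type: FactType; fact: string }

const MAX_FACTS = 10;

function parseKnowledge(raw: string): KnowledgeFact[] {
  const fence = "`".repeat(3); // three backticks, built at runtime
  const cleaned = raw
    .trim()
    .replace(new RegExp("^" + fence + "(?:json)?\\s*", "i"), "")
    .replace(new RegExp("\\s*" + fence + "$"), "");

  let parsed: unknown;
  try {
    parsed = JSON.parse(cleaned);
  } catch {
    return []; // malformed output yields no facts rather than an exception
  }
  if (!Array.isArray(parsed)) return [];

  const facts: KnowledgeFact[] = [];
  for (const entry of parsed) {
    if (typeof entry !== "object" || entry === null) continue;
    const { type, fact } = entry as { type?: unknown; fact?: unknown };
    if (typeof type !== "string" || typeof fact !== "string") continue;
    // Allow-list check: drop anything outside the four known types.
    if (!(VALID_TYPES as readonly string[]).includes(type)) continue;
    facts.push({ type: type as FactType, fact });
  }
  return facts.slice(0, MAX_FACTS);
}
```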
4. Wallet & DeFi Testing
Wallet flows are first-class in the run pipeline. The CLI shutdown sequence guarantees the auto-approver is stopped and the persistent context is closed before the process exits, even on error. Source: src/cli/commands/run.ts:200-240.
package.json exposes setup-time helpers:
- wallet:setup installs the wallet extension via bench/wallet/setup-extension.mjs.
- wallet:onboard drives onboarding through bench/wallet/setup-onboarding.mjs.
- wallet:configure configures the extension via bench/wallet/... (truncated in context).
Source: package.json:1-40. These scripts pair with the run-time stopWalletAutoApprover hook so wallet UX flows can be exercised end-to-end without manual seeding.
See Also
- README.md: headline benchmarks, install, CLI quick-start.
- bench/scenarios/README.md: full scenario track taxonomy and CLI flags.
- bench/competitive/README.md: competitor runner design and metrics.
- src/brain/tasks/knowledge.ts: memory fact schema and extraction prompt.
Source: https://github.com/tangle-network/browser-agent-driver / Human Manual
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 7 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Configuration risk - Configuration risk requires verification.
1. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.host_targets | https://github.com/tangle-network/browser-agent-driver
2. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/tangle-network/browser-agent-driver
3. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/tangle-network/browser-agent-driver
4. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | https://github.com/tangle-network/browser-agent-driver
5. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | https://github.com/tangle-network/browser-agent-driver
6. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/tangle-network/browser-agent-driver
7. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/tangle-network/browser-agent-driver
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using browser-agent-driver with real data or production workflows.
- Nightly reliability regression - github / github_issue
- v0.24.1 - github / github_release
- v0.24.0 - github / github_release
- v0.23.0 — Gen 27: Stealth + Anti-Bot + Form Intelligence - github / github_release
- v0.14.3 - github / github_release
- v0.14.2 - github / github_release
- v0.14.1 - github / github_release
- v0.11.0 - github / github_release
- v0.10.0 - github / github_release
- v0.9.0 - github / github_release
- v0.8.7 - github / github_release
- Configuration risk requires verification - GitHub / issue
Source: Project Pack community evidence and pitfall evidence