Doramagic Project Pack · Human Manual

browser-agent-driver

LLM-driven browser automation with wallet extension testing. Accessibility tree + optional vision.

Overview & Core Agent Architecture

Related topics: Stealth, Anti-Bot & CAPTCHA (v0.23.0 / Gen 27), Design Audit & Auto-Fix, Benchmarking, Evaluation, Memory & Wallet

Purpose and Scope

@tangle-network/browser-agent-driver (binary: bad) is a general-purpose agentic browser automation system that completes real user outcomes on arbitrary websites: search, extraction, form filling, price comparison, and complex UI navigation. The package is published as a dual CLI and library; the CLI binary bad is declared in package.json:8-10, and the library entry point is exposed through the package exports field. According to the README headline metrics, the system reaches 91.3% on WebVoyager (590 tasks across 15 sites) at $0.09 per task, with the default model being gpt-5.4.

The scope spans three primary use modes surfaced in README.md:

  • A one-shot CLI (bad run --goal "...") for ad-hoc automation.
  • A programmatic SDK (new BrowserAgent({ driver, config })) for application integration.
  • A benchmark harness under bench/ for CI-grade regression and competitive evaluation.

Architecture Overview

The agent loop is decoupled from the browser through a Driver interface, allowing the same decision engine to run against a local Playwright Chromium, a Steel cloud browser, or any other conforming implementation. The diagram below summarizes how the CLI, Brain, Driver, and reporting layer interact for a typical bad run invocation.

flowchart LR
  CLI[bad CLI<br/>run.ts] --> Brain[Brain<br/>decision engine]
  Brain -->|generate| ModelProvider[(Model Provider<br/>OpenAI / Anthropic /<br/>sandbox-backend)]
  Brain -->|decide / verify / scout| Driver[Driver<br/>Playwright or Steel]
  Driver --> Browser[(Chromium / Cloud)]
  CLI --> Reports[Reporters<br/>json / md / html / junit]
  CLI --> Renderer[Live Renderer<br/>stdout / TUI]

The CLI entry point in src/cli/commands/run.ts handles argument parsing, reporter fan-out, stream webhooks, and clean shutdown of browser, persistent context, and single-driver resources. A secondary command, bad showcase, delegates to a capture-and-evaluate pipeline via the thin wrapper in src/cli/commands/showcase.ts, which simply forwards CLI flags to the internal handleShowcase implementation.

Brain Decision Engine

The Brain is the LLM-driven core that observes page state and emits actions. To keep the file maintainable as decision subtasks grow, the Brain uses a delegate + host-interface pattern: each task lives in its own module under src/brain/tasks/, and the Brain class implements a small host interface that exposes only the slice of state the task needs. This makes missing or mistyped members a compile-time error.

The tasks observed in the source tree include:

Task module | Responsibility
goal-verification.ts | Judge whether the agent's claimed result actually achieved the goal on the current page. Supports a dedicated verifier model and falls back to the navigation model under adaptive routing.
link-scout.ts | Recommend the single best next visible link from a scored candidate list, using only the top 5 to save 2–8k tokens.
knowledge.ts | Distill a completed trajectory into reusable timing/selector/pattern/quirk facts for future runs.
design-audit.ts | Vision-based layout, typography, spacing, contrast, and UX analysis returning structured findings with severities.
evaluate.ts | Rate the visual quality and professional polish of the current page via a dedicated EVALUATE_PROMPT.

Each host interface declares only the slice of Brain state the task reads — typically provider, navProvider, navModelName, buildUserContent, and generate — keeping the dependency surface explicit and testable.
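To make the delegate + host-interface pattern concrete, here is a minimal TypeScript sketch. The names ExampleTaskHost, exampleTaskImpl, and ExampleBrain, and all method signatures, are hypothetical rather than the repository's real types; only the member names navModelName, buildUserContent, and generate come from the text above.

```typescript
// Hypothetical host slice: only what the task reads from Brain.
interface ExampleTaskHost {
  navModelName?: string
  buildUserContent(text: string): string
  generate(system: string, user: string): Promise<string>
}

// The task body lives in its own module and sees only the host slice.
async function exampleTaskImpl(host: ExampleTaskHost, pageText: string): Promise<string> {
  const user = host.buildUserContent(pageText)
  return host.generate('You are a careful verifier.', user)
}

class ExampleBrain implements ExampleTaskHost {
  navModelName = 'cheap-nav-model'

  buildUserContent(text: string): string {
    return `PAGE:\n${text}`
  }

  // Stand-in for a real model call: echoes its inputs.
  async generate(system: string, user: string): Promise<string> {
    return `${system} | ${user}`
  }

  // Thin delegator: the public method forwards to the task module.
  runExampleTask(pageText: string): Promise<string> {
    return exampleTaskImpl(this, pageText)
  }
}
```

Because ExampleBrain is declared with implements ExampleTaskHost, deleting or misspelling generate on the class is reported by tsc, which is the compile-time guarantee described above.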

Driver Layer and Model Routing

The Driver abstraction decouples the agent loop from the browser transport. The default is a local Playwright driver, while a Steel driver is used for anti-bot, residential proxies, and CAPTCHA solve-as-a-service. Because the Brain only knows the Driver interface, alternative implementations can be substituted without touching decision logic.
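A sketch of decision logic that depends only on a Driver interface. The three-method Driver, the Action type, FakeDriver, and runOneStep are assumptions made for illustration; the real interface in the repository is richer, and FakeDriver merely stands in for the Playwright or Steel implementations.

```typescript
// Hypothetical minimal action and driver shapes.
type Action = { type: 'click' | 'navigate'; target: string }

interface Driver {
  observe(): Promise<{ url: string; snapshot: string }>
  execute(action: Action): Promise<void>
  close(): Promise<void>
}

// In-memory fake standing in for a Playwright or Steel driver.
class FakeDriver implements Driver {
  url = 'about:blank'
  log: string[] = []

  async observe() {
    return { url: this.url, snapshot: `[page ${this.url}]` }
  }

  async execute(action: Action) {
    this.log.push(`${action.type}:${action.target}`)
    if (action.type === 'navigate') this.url = action.target
  }

  async close() {
    this.log.push('close')
  }
}

// Decision logic typed against the interface, so any conforming Driver works.
async function runOneStep(driver: Driver, goalUrl: string): Promise<string> {
  const state = await driver.observe()
  if (state.url !== goalUrl) await driver.execute({ type: 'navigate', target: goalUrl })
  return (await driver.observe()).url
}
```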

Models are configurable per role. The README documents a models map covering planner, executor, verifier, and supervisor, letting cheaper models handle navigation while a stronger model supervises or audits. The goal verifier in src/brain/tasks/goal-verification.ts demonstrates this: if verifierProvider is unset, it falls back to navProvider when adaptiveModelRouting is enabled, otherwise it reuses the main provider. Link scouting in src/brain/tasks/link-scout.ts follows the same precedence: scoutProvider → navProvider → provider.
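The fallback order can be written as two small resolver functions. The RoutingConfig shape and both function names are hypothetical. The sketch reads the scout chain literally, so it does not gate navProvider on adaptiveModelRouting the way the verifier does; whether link-scout.ts adds that gate is not stated here.

```typescript
// Hypothetical config shape; field names follow the text above.
interface RoutingConfig<P> {
  provider: P
  navProvider?: P
  verifierProvider?: P
  scoutProvider?: P
  adaptiveModelRouting?: boolean
}

// Verifier: dedicated provider, else navProvider under adaptive routing, else the main provider.
function resolveVerifierProvider<P>(c: RoutingConfig<P>): P {
  if (c.verifierProvider !== undefined) return c.verifierProvider
  if (c.adaptiveModelRouting && c.navProvider !== undefined) return c.navProvider
  return c.provider
}

// Scout: scoutProvider, then navProvider, then provider.
function resolveScoutProvider<P>(c: RoutingConfig<P>): P {
  return c.scoutProvider ?? c.navProvider ?? c.provider
}
```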

For sandboxed execution, src/providers/sandbox-backend.ts builds a transcript from ModelMessage[], serializes text and image attachments, and infers the backend type from the model name (claude/sonnet/opus/haiku → claude-code, gpt/o1/o3/o4/codex → codex), throwing if inference fails.
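A sketch of the name-to-backend mapping, assuming simple case-insensitive substring matching. The exact matching rules in sandbox-backend.ts may differ (prefix matching, for example), and inferBackend is a hypothetical name.

```typescript
type SandboxBackend = 'claude-code' | 'codex'

// Hypothetical helper; case-insensitive substring matching is an assumption.
function inferBackend(modelName: string): SandboxBackend {
  const m = modelName.toLowerCase()
  if (/claude|sonnet|opus|haiku/.test(m)) return 'claude-code'
  if (/gpt|o1|o3|o4|codex/.test(m)) return 'codex'
  // Mirrors the documented behavior: an unknown name is an error, not a silent default.
  throw new Error(`Cannot infer sandbox backend from model name "${modelName}"`)
}
```

Throwing rather than defaulting means a typo in a model name surfaces at startup instead of silently routing to the wrong backend.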

Reporting, Scenarios, and Failure Modes

After every run, src/cli/commands/run.ts writes report files for each requested format (json, markdown, html, junit) under the configured report directory, renders them through the live view, and finally — in JSON mode — echoes the structured result to stdout. Cleanup runs in a finally block that detaches the interrupt controller, flushes the webhook streamer, and closes driver, persistent context, and browser resources regardless of success.
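The best-effort reporting and guaranteed cleanup can be sketched as follows. The Format and Closable types and writeReportsAndCleanup are hypothetical. The sketch runs reporters in parallel with Promise.allSettled so a failing format is skipped, and mirrors the source's .catch(() => {}) on every close().

```typescript
type Format = 'json' | 'markdown' | 'html' | 'junit'

interface Closable {
  close(): Promise<void>
}

// Hypothetical helper: fan out reporters best-effort, then always close resources.
async function writeReportsAndCleanup(
  formats: Format[],
  writeReport: (format: Format) => Promise<string>,
  resources: Closable[],
): Promise<string[]> {
  try {
    // allSettled: a failing reporter is skipped instead of aborting the run.
    const results = await Promise.allSettled(formats.map((f) => writeReport(f)))
    return results.flatMap((r) => (r.status === 'fulfilled' ? [r.value] : []))
  } finally {
    // Like the source's .catch(() => {}): one failing close() does not stop the rest.
    for (const res of resources) await res.close().catch(() => {})
  }
}
```

The trade-off is the one listed under failure modes: swallowing close() errors keeps shutdown moving but can hide resource leaks.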

The benchmark tracks documented in bench/scenarios/README.md split tasks into local-deterministic, staging-auth, public-web, webbench, and restricted-manual to keep flaky internet and policy-sensitive flows (captchas, third-party account provisioning) out of CI reliability metrics. The competitive harness in bench/competitive/README.md targets comparability against browser-use, Stagehand, Skyvern, and Computer Use — measuring cost-per-task alongside success rate.

Common failure modes worth understanding:

  • Provider inference failure: the sandbox backend throws if model name matches neither Claude nor GPT/Codex patterns.
  • Reporter errors are swallowed: report generation is best-effort, so missing templates will not abort the run.
  • Cleanup throws are caught: close() calls use .catch(() => {}), so resource leaks may be silent.
  • Stealth regressions: per the v0.23.0 release notes (Gen 27), the previously blocked sites that now pass depend on system Chrome, Patchright, and Bezier mouse humanization all being active; disabling any of them risks re-blocking.

See Also

  • Stealth & Anti-Bot Configuration
  • Brain Decision Tasks
  • Driver Implementations
  • Benchmark Suites
  • CLI Reference

Source: https://github.com/tangle-network/browser-agent-driver / Human Manual

Stealth, Anti-Bot & CAPTCHA (v0.23.0 / Gen 27)

Related topics: Overview & Core Agent Architecture, Benchmarking, Evaluation, Memory & Wallet

Overview and Scope

v0.23.0 (Gen 27) consolidates browser-agent-driver's evasion surface into a single release focused on real-world anti-bot blocking, CAPTCHA handling, and form intelligence. The release claims that 9 of 13 previously blocked sites now pass on the WebBench-50 evaluation, with system Chrome, Patchright, and CAPTCHA solvers shipped as defaults.

The stealth subsystem sits at the browser-launch layer and combines four independent evasion techniques: TLS/JA3 fingerprinting, CDP protocol patching, mouse kinematics, and proxy routing. CAPTCHA handling is treated as a recovery job the main agent loop can invoke when blocked. All of this is wired into bad run: the stealth settings are auto-detected for both CLI and SDK consumers, and CLI flags override config values.

Stealth and Anti-Bot Evasion

The browser launch code in src/cli/commands/run.ts is the primary integration point for stealth configuration. For stealth profiles, the launch plan upgrades the bundled Chromium channel to system Chrome: ...(isStealthProfile && browserName === 'chromium' ? { channel: 'chrome' } : {}). Source: src/cli/commands/run.ts. System Chrome provides a real TLS/JA3/HTTP2 fingerprint that bundled Chromium cannot reproduce. The launch code itself documents why the upgrade is gated to stealth profiles only: system Chrome renders differently than bundled Chromium on some sites, producing Allrecipes click timeouts and Amazon layout shifts.

Proxy support is wired through the launch plan: ...(launchPlan.proxyServer ? { proxy: { server: launchPlan.proxyServer, ...(launchPlan.proxyBypass ? { bypass: launchPlan.proxyBypass } : {}) } } : {}). Source: src/cli/commands/run.ts. Residential, SOCKS5, and HTTP proxies are accepted; the --proxy CLI flag and BAD_PROXY_URL environment variable both feed launchPlan.proxyServer.
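The two conditional spreads above can be combined into a single options builder. LaunchPlan, LaunchOptions, and buildLaunchOptions are hypothetical names; the fields channel, proxy, server, and bypass follow the quoted snippets, and the CLI/environment plumbing that fills proxyServer is left out.

```typescript
// Hypothetical types; field names follow the quoted run.ts snippets.
interface LaunchPlan {
  proxyServer?: string
  proxyBypass?: string
}

interface LaunchOptions {
  channel?: string
  proxy?: { server: string; bypass?: string }
}

function buildLaunchOptions(isStealthProfile: boolean, browserName: string, launchPlan: LaunchPlan): LaunchOptions {
  return {
    // Upgrade to system Chrome only for stealth profiles on Chromium.
    ...(isStealthProfile && browserName === 'chromium' ? { channel: 'chrome' } : {}),
    // Add a proxy block only when a server is configured; bypass only when set.
    ...(launchPlan.proxyServer
      ? {
          proxy: {
            server: launchPlan.proxyServer,
            ...(launchPlan.proxyBypass ? { bypass: launchPlan.proxyBypass } : {}),
          },
        }
      : {}),
  }
}
```

Keeping the channel upgrade behind the profile gate matches the launch code's rationale: system Chrome renders some sites (Allrecipes, Amazon) differently from bundled Chromium.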

Headless Chromium exposes itself with HeadlessChrome/... in the default User-Agent, which CDNs like Akamai reject with ERR_HTTP2_PROTOCOL_ERROR before any JS stealth patch can run. The launch code builds a clean UA from the live browser version and a platform-specific token:

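import type { Browser } from 'playwright' // assumed; Patchright exposes the same type

// Hypothetical wrapper name: in run.ts this logic runs inline in the launch code.
function buildCleanUserAgent(browser: Browser): string {
  // Live browser version; the template below omits the HeadlessChrome token
  const ver = browser.version()
  // Host-specific OS token; macOS is the fallback for any other platform
  const platformToken = process.platform === 'win32'
    ? 'Windows NT 10.0; Win64; x64'
    : process.platform === 'linux'
      ? 'X11; Linux x86_64'
      : 'Macintosh; Intel Mac OS X 10_15_7'
  return `Mozilla/5.0 (${platformToken}) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/${ver} Safari/537.36`
}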

Source: src/cli/commands/run.ts.

Additional stealth layers documented in the README include Patchright (a Playwright fork that patches CDP protocol leaks), mouse humanization with Bezier curves (8–15 control points plus gaussian click offset), browser fingerprint patches for navigator.webdriver, plugins, languages, WebGL, and canvas noise, and a blocklist of 99+ analytics/tracking domains. The --use-gl=desktop flag enables real GPU WebGL rendering. Source: README.md.
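The README names the ingredients of mouse humanization (a Bezier path with 8 to 15 control points and a gaussian click offset) but not the code. The sketch below is one way to combine them: de Casteljau evaluation of an arbitrary-degree Bezier curve plus Box-Muller gaussian noise. The jitter sizes, step count, and function names are assumptions, not the project's values.

```typescript
interface Point {
  x: number
  y: number
}

// Box-Muller transform: one standard-normal sample from two uniforms.
function gaussian(rand: () => number = Math.random): number {
  const u = 1 - rand() // in (0, 1], so Math.log(u) is finite
  const v = rand()
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v)
}

// de Casteljau evaluation of a Bezier curve with any number of control points.
function bezierPoint(ctrl: Point[], t: number): Point {
  let pts = ctrl.slice()
  while (pts.length > 1) {
    const next: Point[] = []
    for (let i = 0; i < pts.length - 1; i++) {
      next.push({
        x: pts[i].x + (pts[i + 1].x - pts[i].x) * t,
        y: pts[i].y + (pts[i + 1].y - pts[i].y) * t,
      })
    }
    pts = next
  }
  return pts[0]
}

// Hypothetical helper: Bezier path from start to a gaussian-jittered click point.
function humanMousePath(
  start: Point,
  target: Point,
  steps = 30,
  clickSigma = 2,
  wobble = 20,
  rand: () => number = Math.random,
): Point[] {
  const n = 8 + Math.floor(rand() * 8) // 8..15 control points
  const end = {
    x: target.x + gaussian(rand) * clickSigma,
    y: target.y + gaussian(rand) * clickSigma,
  }
  const ctrl: Point[] = [start]
  for (let i = 1; i < n - 1; i++) {
    const t = i / (n - 1)
    // Interior points scatter around the straight line between the endpoints.
    ctrl.push({
      x: start.x + (end.x - start.x) * t + gaussian(rand) * wobble,
      y: start.y + (end.y - start.y) * t + gaussian(rand) * wobble,
    })
  }
  ctrl.push(end)
  const path: Point[] = []
  for (let s = 0; s <= steps; s++) path.push(bezierPoint(ctrl, s / steps))
  return path
}
```

Each point could then be fed to Playwright's page.mouse.move(x, y) with short delays; passing a seeded rand makes the path reproducible in tests.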

CAPTCHA Solving

CAPTCHA handling is enabled by default and configured via the captcha option: { captcha: { enabled: true, maxAttempts: 5 } }. Source: README.md. Three CAPTCHA families are supported:

  • reCAPTCHA v2 — checkbox click followed by an LLM-vision image-grid solver.
  • Cloudflare Turnstile — checkbox with behavioral click heuristics.
  • Google "unusual traffic" — detected on the page and a solver attempted automatically.
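A sketch of how the captcha config can bound solver attempts. CaptchaKind, Solver, and solveWithBudget are hypothetical; the actual solving (checkbox click, LLM-vision grid solving, behavioral heuristics) is abstracted behind a callback that resolves true once the challenge clears.

```typescript
// Config shape as documented: { captcha: { enabled: true, maxAttempts: 5 } }.
interface CaptchaConfig {
  enabled: boolean
  maxAttempts: number
}

type CaptchaKind = 'recaptcha-v2' | 'turnstile' | 'google-unusual-traffic'

// Hypothetical solver callback: resolves true once the challenge is cleared.
type Solver = (kind: CaptchaKind, attempt: number) => Promise<boolean>

async function solveWithBudget(
  kind: CaptchaKind,
  cfg: CaptchaConfig,
  solve: Solver,
): Promise<{ solved: boolean; attempts: number }> {
  if (!cfg.enabled) return { solved: false, attempts: 0 }
  for (let attempt = 1; attempt <= cfg.maxAttempts; attempt++) {
    if (await solve(kind, attempt)) return { solved: true, attempts: attempt }
  }
  // Budget spent: control returns to the agent loop; goal verification still guards the result.
  return { solved: false, attempts: cfg.maxAttempts }
}
```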

Recovery is automatic: cookie consent, modal blockers, A-B-A-B oscillation loops, form-field resets, date-picker stalls, and CAPTCHA challenges are all handled in the agent loop before the run terminates. Source: README.md. The goal-verification task consults the current page state and a screenshot via buildUserContent(textContent, state.screenshot, true) before declaring success, so a CAPTCHA interstitial that survives the run will cause verification to fail rather than report a false positive. Source: src/brain/tasks/goal-verification.ts.
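The source names A-B-A-B oscillation but not the detection rule. A minimal detector, assuming the loop keeps a history of page URLs or action signatures, can flag the last 2 * cycles entries alternating between exactly two distinct values; isOscillating and its cycles parameter are hypothetical.

```typescript
// Hypothetical detector: true when the last 2 * cycles entries alternate between two distinct values.
function isOscillating(history: string[], cycles = 2): boolean {
  const recent = history.slice(-2 * cycles)
  if (recent.length < 2 * cycles) return false
  const [a, b] = recent
  if (a === b) return false
  return recent.every((v, i) => v === (i % 2 === 0 ? a : b))
}
```

A loop that sees this return true can break the cycle, for example by choosing a different action, instead of spending more turns bouncing between two pages.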

The link-scout task can use its own cheaper model (scoutProvider, scoutModelName) to pick the next visible link from a candidate list, reducing the cost of recovery loops on anti-bot pages. Source: src/brain/tasks/link-scout.ts.
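To illustrate the top-5 trimming used by link scouting, here is a sketch of a deterministic ranking plus prompt construction. LinkCandidate, topCandidates, scoutPrompt, the score field, and the tie-break by ref are assumptions; the real scoring heuristics are not described here.

```typescript
interface LinkCandidate {
  ref: string
  text: string
  score: number
}

// Deterministic ranking: higher score first, ties broken by ref for a stable order.
function topCandidates(cands: LinkCandidate[], k = 5): LinkCandidate[] {
  return [...cands].sort((a, b) => b.score - a.score || a.ref.localeCompare(b.ref)).slice(0, k)
}

// Only the top-k candidates reach the scout prompt, which is where the token saving comes from.
function scoutPrompt(goal: string, cands: LinkCandidate[], k = 5): string {
  const lines = topCandidates(cands, k).map((c, i) => `${i + 1}. [${c.ref}] ${c.text}`)
  return [`GOAL: ${goal}`, 'Pick the single best next link:', ...lines].join('\n')
}
```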

Configuration, CLI Flags, and Failure Modes

Concern | Surface | Notes
Proxy routing | --proxy flag or BAD_PROXY_URL env | Reads launchPlan.proxyServer and optional launchPlan.proxyBypass. Source: src/cli/commands/run.ts
Real WebGL | --use-gl=desktop | Avoids software-renderer fingerprint. Source: README.md
CAPTCHA policy | captcha: { enabled, maxAttempts } | Default enabled, maxAttempts: 5. Source: README.md
Profile gating | isStealthProfile | Channel upgrade only fires when stealth profile is active. Source: src/cli/commands/run.ts
Benchmark suite | bench:scoreboard, webbench:import | Package scripts under package.json. Source: package.json

The scenario suite splits high-friction flows into a restricted-manual track that requires human-in-the-loop and is never run unattended in CI, while webbench and public-web tracks capture realistic anti-bot exposure under benchmark profiles (default, webbench, webvoyager). Source: bench/scenarios/README.md. Competitive benchmarking against browser-use, Stagehand, Skyvern, and the foundation-model Computer Use agents lives under bench/competitive/ and is the empirical ground truth for whether stealth and CAPTCHA work buys net task-completion improvement. Source: bench/competitive/README.md.

Known failure modes from the source:

Symptom | Likely cause
Site rejected with ERR_HTTP2_PROTOCOL_ERROR | Headless Chromium default UA still in use; confirm isStealthProfile triggers system Chrome
Layout shifts or click timeouts on Allrecipes/Amazon | System Chrome renders differently than bundled Chromium; the stealth upgrade is intentionally profile-scoped
CAPTCHA loops repeat past maxAttempts | LLM-vision solver exhausted; raise captcha.maxAttempts and inspect screenshots in the run directory

Source: src/cli/commands/run.ts, README.md.

Design Audit & Auto-Fix

Related topics: Overview & Core Agent Architecture, Benchmarking, Evaluation, Memory & Wallet

Overview

bad design-audit is a dedicated subsystem inside Browser Agent Driver that grades the visual quality, layout, typography, contrast, and overall UX polish of a target URL and emits structured findings. Unlike the agentic loop used by bad run, design audit is a single-pass vision-driven evaluation that scores a page against an explicit checklist of checkpoints. It is extracted from brain/index.ts as a delegate-and-host module under src/brain/tasks/design-audit.ts, and the file's own header documents the split: Brain.auditDesign keeps a thin delegator while the body lives in auditDesignImpl and reads Brain state through the BrainDesignAuditHost interface (src/brain/tasks/design-audit.ts).

The audit serves two related goals:

  1. Objective scoring — produce a numeric score plus categorized findings (category, severity, ROI hints).
  2. Optional patch passthrough — the same module carries roi, reference, and judge knobs that feed an auto-fix loop, so a high-severity finding can drive a remediation candidate rather than just a report line.

The CLI entry point is runDesignAudit, wired in src/cli/commands/design-audit.ts. It forwards every relevant flag (--url, --pages, --profile, --model, --reference, --judge, --evolve, etc.) into a single typed options bag (src/cli/commands/design-audit.ts).

Architecture

flowchart LR
    CLI["CLI: bad design-audit"] --> CMD["runDesignAudit()"]
    CMD --> Brain["Brain.auditDesign (delegator)"]
    Brain --> Impl["auditDesignImpl()"]
    Impl --> Host["BrainDesignAuditHost<br/>(generate, buildUserContent, debug)"]
    Host --> Model["LLM (vision-capable)"]
    Model --> Parse["JSON parse → DesignFinding[]"]
    Parse --> Score["score + designSystemScore"]
    Parse --> Optional["Patch / ROI passthrough"]
    Optional --> Report["json | html | junit sink"]

The host interface is intentionally narrow. It only exposes debug, buildUserContent, and generate, so the body cannot reach into unrelated Brain state. Because Brain implements BrainDesignAuditHost, a missing or mistyped member fails tsc at compile time (src/brain/tasks/design-audit.ts).

Pipeline and Prompt

auditDesignImpl builds a single user message that mixes the goal, the explicit checkpoint list, the current URL/title, and the page snapshot, then attaches the screenshot with forceVision: true. This guarantees a vision-capable model is consulted even if vision is disabled elsewhere in the session (src/brain/tasks/design-audit.ts).

GOAL: <goal>
CHECKPOINTS:
1. <c1>
2. <c2>
...

CURRENT PAGE:
URL: <state.url>
Title: <state.title>
ELEMENTS:
<state.snapshot>

The system prompt defaults to DESIGN_AUDIT_PROMPT from brain/prompts.ts, but a caller can override it via the systemPrompt argument. The response is sent through generate(..., 8000) to cap output, then parsed with two layers of fallback: trim surrounding fences first, then regex-extract a JSON object if the model returns prose or a truncated payload. Both branches assign into a parseError field rather than throwing, so a malformed response still produces a report line (src/brain/tasks/design-audit.ts).
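The message layout and the two-layer parse fallback can be sketched together. buildAuditUserText, parseAuditResponse, and the AuditParse shape are hypothetical; the screenshot attachment with forceVision: true and the generate(..., 8000) call are left out. The regex layer here only recovers a complete {...} span, so a truly truncated payload ends up in parseError.

```typescript
interface PageState {
  url: string
  title: string
  snapshot: string
}

// Rebuild the text part of the user message in the layout shown above.
function buildAuditUserText(goal: string, checkpoints: string[], state: PageState): string {
  return [
    `GOAL: ${goal}`,
    'CHECKPOINTS:',
    ...checkpoints.map((c, i) => `${i + 1}. ${c}`),
    '',
    'CURRENT PAGE:',
    `URL: ${state.url}`,
    `Title: ${state.title}`,
    'ELEMENTS:',
    state.snapshot,
  ].join('\n')
}

interface AuditParse {
  data?: unknown
  parseError?: string
}

// Layer 1: strip surrounding code fences and parse.
// Layer 2: extract the first "{" through the last "}" and parse again.
// Neither layer throws; failures are recorded in parseError.
function parseAuditResponse(raw: string): AuditParse {
  const unfenced = raw.trim().replace(/^`{3}(?:json)?\s*/i, '').replace(/\s*`{3}$/, '')
  try {
    return { data: JSON.parse(unfenced) }
  } catch (err) {
    const match = raw.match(/\{[\s\S]*\}/)
    if (match) {
      try {
        return { data: JSON.parse(match[0]) }
      } catch {
        // fall through and report the original error
      }
    }
    return { parseError: err instanceof Error ? err.message : String(err) }
  }
}
```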

Configuration Surface

bad design-audit exposes more than 20 flags. The most relevant ones for the audit + auto-fix loop are summarized below.

Flag | Purpose
--url / --pages | Target URL and number of pages to crawl
--profile, --model, --provider, --api-key, --base-url | Model selection (vision-capable)
--sink | Output format: json, markdown, html, or junit
--json, --headless, --debug | Output verbosity and runtime mode
--storage-state | Reuse cookies/storage for authenticated audits
--extract-tokens | Extract design tokens (colors, fonts) from the page
--evolve, --evolve-rounds | Iterative auto-fix loop: re-audit after each patch
--project-dir | Where to write patches
--reproducibility | Lock seeds/snapshots for a reproducible audit
--rubrics-dir | Override the builtin rubric set with a custom one
--audit-passes | Multi-pass auditing for richer findings
--skip-ethics | Bypass the Layer 7 ethics rollup floor (testing only)
--ethics-rules-dir | Override builtin ethics rules
--audience, --regulatory-context, --audience-vulnerability, --modality | Audience predicates that weight findings
--reference, --reference-grounded | Opt-in reference-grounded taste judge (v1)
--judge, --judge-models | Judge mode (text or vision) and ensemble list

All flags are declared in src/cli/args.ts and forwarded unchanged through runDesignAudit (src/cli/args.ts, src/cli/commands/design-audit.ts).

Auto-Fix Loop

The auto-fix path is opt-in via --evolve. Each round:

  1. Run auditDesignImpl on the current state of the page.
  2. Emit DesignFinding[] (category, severity, ROI).
  3. For findings above the ROI threshold, project them into a patch candidate and write it under --project-dir.
  4. Re-render and re-audit until either --evolve-rounds is exhausted or the score crosses a stop band.
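The round structure above can be sketched as a bounded loop. The audit and applyPatches callbacks, RoundResult, and the numeric stopScore are hypothetical stand-ins for auditDesignImpl, patch projection under --project-dir, and the stop band.

```typescript
interface RoundResult {
  score: number
  tokensUsed: number
}

// Hypothetical evolve loop: audit, stop if in the stop band, else patch and go again.
async function evolve(
  audit: () => Promise<RoundResult>,
  applyPatches: () => Promise<void>,
  maxRounds: number,
  stopScore: number,
): Promise<RoundResult[]> {
  const history: RoundResult[] = []
  for (let round = 0; round < maxRounds; round++) {
    const result = await audit()
    history.push(result)
    // Stop in the band, or on the last round so no patch goes unaudited.
    if (result.score >= stopScore || round === maxRounds - 1) break
    await applyPatches()
  }
  return history
}
```

Returning the full history, including tokensUsed per round, lets a caller decide externally whether another round is worth its cost.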

The same module ships the designSystemScore map and tokensUsed counter so a downstream agent can decide whether to spend another round. Two sibling tasks in src/brain/tasks/ illustrate the related decision patterns the auto-fix loop composes with:

  • evaluateImpl (src/brain/tasks/evaluate.ts) — produces a QualityEvaluation (subjective taste rating) using EVALUATE_PROMPT, the same transport funnel (buildUserContent + generate).
  • verifyGoalCompletionImpl (src/brain/tasks/goal-verification.ts) — confirms a user-stated goal is satisfied before a run is marked complete, including a buildFirstPartyBoundaryNote site-boundary check that prevents the verifier from claiming success on a first-party page that hasn't actually moved.

Both follow the same delegate-and-host pattern, which means an auto-fix pass can mix "did the design improve?" (audit) with "did the user's stated outcome improve?" (goal verification) and "does the page look professional now?" (evaluate) without duplicating transport plumbing.

Usage

Minimal:

bad design-audit --url https://example.com

With auto-fix, custom rubric, and reference-grounded judging:

bad design-audit \
  --url https://example.com \
  --pages 3 \
  --evolve --evolve-rounds 5 \
  --project-dir ./audit-out \
  --rubrics-dir ./my-rubrics \
  --reference ./ref.png --reference-grounded \
  --judge vision --judge-models gpt-5.4,claude-opus-4.6

Reporters follow the same multi-format pattern used by bad run: json, markdown (with turn detail), html, and junit are emitted in parallel when requested, with best-effort error handling so a broken reporter never aborts the audit (src/cli/commands/run.ts).

Common Failure Modes

  • Malformed model output. Handled by the trim-fence + regex-extract fallback in auditDesignImpl; the raw text and parseError are still surfaced in the report so a downstream tool can re-prompt.
  • Non-vision model selected. Mitigated by forceVision: true in the audit's buildUserContent call.
  • Ethics floor blocks the run. Use --skip-ethics only in test scenarios; production should leave the Layer 7 gate on (src/cli/args.ts).
  • Auto-fix stalls. Increase --evolve-rounds or widen the ROI threshold; the score and tokensUsed per round are emitted so progress can be measured externally.

See Also

  • Brain Decision Engine
  • CLI Reference
  • Configuration Guide
  • Goal Verification

Benchmarking, Evaluation, Memory & Wallet

Related topics: Overview & Core Agent Architecture, Stealth, Anti-Bot & CAPTCHA (v0.23.0 / Gen 27)

The browser-agent-driver ("bad") project ships a tightly integrated suite for measuring agent quality, scoring design and goal outcomes, distilling reusable knowledge from completed runs, and exercising browser-extension wallets during DeFi-style tasks. This page documents each pillar, the scripts that drive them, and how they compose.

1. Benchmarking Infrastructure

The benchmark layer is split into two complementary harnesses.

Scenario Suite

bench/scenarios/README.md defines five tracks. local-deterministic runs on controlled fixtures and is required in CI; staging-auth exercises real product flows with seeded storage state; public-web validates against stable public pages (non-critical in CI); webbench derives cases from the Halluminate WebBench corpus for cross-agent comparability; and restricted-manual covers captcha/phone-verified flows that must never run unattended. Each task is tagged with categories such as navigation, form-completion, product-usage, research, scraping, auth, and blocker-recovery. Source: bench/scenarios/README.md:1-50.

The canonical runner is scripts/run-scenario-track.mjs, invoked as node scripts/run-scenario-track.mjs --cases <file> --config <file> --model <id> --benchmark-profile <profile> --modes <list>. Benchmark profiles tune the noise/cost tradeoff: default is balanced, webbench is fast and low-noise, and webvoyager is evidence-rich. Source: bench/scenarios/README.md:50-80.

A/B experiments use npm run ab:experiment, with outputs summary.json (Wilson CIs, bootstrap delta CI), runs.csv, passrate-series.csv, summary.md, and blocker-adjusted cleanPassRate metrics. The Tier1 Reliability Gate runs deterministic fixtures with --min-full-pass-rate 1 --min-fast-pass-rate 1 and emits tier1-gate-summary.{json,md}. Source: bench/scenarios/README.md:80-130.
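For reference, the Wilson score interval for a pass rate can be computed as below. This is the textbook formula, not code taken from the ab:experiment script, and z = 1.96 (roughly 95% coverage) is an assumed default.

```typescript
// Wilson score interval for `passes` successes out of `n` runs.
function wilsonInterval(passes: number, n: number, z = 1.96): [number, number] {
  if (n === 0) return [0, 1] // no data: the whole range is plausible
  const p = passes / n
  const z2 = z * z
  const denom = 1 + z2 / n
  const center = (p + z2 / (2 * n)) / denom
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))
  // Clamp against floating-point drift outside [0, 1].
  return [Math.max(0, center - half), Math.min(1, center + half)]
}
```

For 50 passes out of 100 runs this gives roughly [0.404, 0.596]. Unlike the naive p ± z·sqrt(p(1 - p)/n) interval, it stays inside [0, 1] and does not collapse to zero width at 0% or 100%, which matters for small fixture suites.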

Competitive Harness

bench/competitive/README.md frames a head-to-head comparison of bad against browser-use, Stagehand, Skyvern, OpenAI Computer Use, and Claude Computer Use. Each (framework, task) cell captures success, wallTimeSeconds, turnCount, llmCallCount, token buckets, and costUsd computed from a shared pricing table. Source: bench/competitive/README.md:1-50.
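A sketch of computing costUsd from token buckets and a shared pricing table. The Price and TokenBuckets shapes, the per-million-token units, and treating cached input as its own bucket are assumptions; real prices should come from the harness's pricing table.

```typescript
// Hypothetical pricing shape in USD per million tokens.
interface Price {
  inputPerM: number
  outputPerM: number
  cachedInputPerM?: number
}

// Assumes `input` excludes cached tokens, which are billed as their own bucket.
interface TokenBuckets {
  input: number
  output: number
  cachedInput?: number
}

function costUsd(tokens: TokenBuckets, price: Price): number {
  // Without a cached rate, cached tokens are billed at the normal input rate.
  const cachedRate = price.cachedInputPerM ?? price.inputPerM
  const total =
    tokens.input * price.inputPerM +
    tokens.output * price.outputPerM +
    (tokens.cachedInput ?? 0) * cachedRate
  return total / 1e6
}
```

Summing costUsd over a framework's cells and dividing by the number of tasks gives a cost-per-task figure comparable across frameworks, because every cell uses the same table.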

The driver scripts pnpm bench:competitive:setup, :run, and :dashboard install runners, execute cells, and render results/_dashboard.md. Reported headline numbers: 91.3% on WebVoyager (590 tasks, 15 sites) at $0.09/task, 100% on a held-out competitive bench, and 95.7% on WebBench-50 excluding DataDome sites. Source: README.md:1-30.

Track | CI required | Drift risk | Example categories
local-deterministic | Yes | None | navigation, form-completion
staging-auth | Yes | Low | auth, product-usage
public-web | No | High | research, scraping
webbench | No | High | navigation, scraping
restricted-manual | Never (human-in-loop) | High | blocker-recovery, auth

2. Brain Evaluation Tasks

The Brain decision engine exposes four structured tasks, each implemented as a thin delegator on Brain plus a host-interface slice so the compiler proves completeness.

  • evaluateImpl rates a page on a 1–10 scale and returns { score, assessment, strengths, issues, suggestions, raw, tokensUsed }. It always forces vision (forceVision: true) so the model sees the actual screenshot. Source: src/brain/tasks/evaluate.ts:30-80.
  • auditDesignImpl returns { score, findings, raw } where findings carry categories, severities, and optional ROI/patch passthrough for design regressions. Source: src/brain/tasks/design-audit.ts:20-60.
  • verifyGoalCompletionImpl asks the verifier model (which may differ from the main model via verifierProvider / verifierModel or adaptive routing on navModelName) whether the claimed result actually matches the live PageState. Source: src/brain/tasks/goal-verification.ts:20-70.
  • recommendLinkCandidateImpl (link scout) picks the single best next visible link from a deterministic top-5 ranking, optionally using vision when scoutUseVision is set. Source: src/brain/tasks/link-scout.ts:20-60.

flowchart LR
  A[Page State] --> B[evaluate]
  A --> C[auditDesign]
  A --> D[verifyGoalCompletion]
  E[Link Candidates] --> F[linkScout]
  B --> G[score 1-10]
  C --> H[findings]
  D --> I{achieved?}
  F --> J[next ref]

3. Knowledge Extraction (Memory)

extractKnowledgeImpl distills a completed trajectory into a bounded list of reusable facts with explicit type values: timing (wait durations), selector (reliable element handles), pattern (multi-step interaction sequences), and quirk (app-specific gotchas). The model is capped at 10 facts and must respond with raw JSON (the parser strips ```json fences before validation). Source: src/brain/tasks/knowledge.ts:20-80.

These facts feed downstream caches that speed up repeat visits to the same domain — a behaviour implied by the prompt design ("help an agent complete similar tasks faster next time"). Quality-over-quantity is enforced both in the system prompt and by post-validating each entry against the VALID_TYPES allow-list. Source: src/brain/tasks/knowledge.ts:60-90.
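A sketch of the post-validation step, assuming the model returns a JSON array of objects with type, key, and value fields. The key/value field names, isFact, and parseKnowledge are hypothetical; VALID_TYPES lists the four types named above.

```typescript
const VALID_TYPES = ['timing', 'selector', 'pattern', 'quirk'] as const
type FactType = (typeof VALID_TYPES)[number]

// Hypothetical fact shape: key and value are assumed field names.
interface KnowledgeFact {
  type: FactType
  key: string
  value: string
}

// Type guard: accept only objects with an allow-listed type and string key/value.
function isFact(f: unknown): f is KnowledgeFact {
  if (typeof f !== 'object' || f === null) return false
  const r = f as Record<string, unknown>
  return (
    typeof r.type === 'string' &&
    (VALID_TYPES as readonly string[]).includes(r.type) &&
    typeof r.key === 'string' &&
    typeof r.value === 'string'
  )
}

// Strip code fences, parse, drop invalid entries, and cap the count.
function parseKnowledge(raw: string, maxFacts = 10): KnowledgeFact[] {
  const body = raw.trim().replace(/^`{3}(?:json)?\s*/i, '').replace(/\s*`{3}$/, '')
  let parsed: unknown
  try {
    parsed = JSON.parse(body)
  } catch {
    return []
  }
  if (!Array.isArray(parsed)) return []
  return parsed.filter(isFact).slice(0, maxFacts)
}
```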

4. Wallet & DeFi Testing

Wallet flows are first-class in the run pipeline. The CLI shutdown sequence guarantees the auto-approver is stopped and the persistent context is closed before the process exits, even on error. Source: src/cli/commands/run.ts:200-240.

package.json exposes setup-time helpers:

  • wallet:setup — installs the wallet extension via bench/wallet/setup-extension.mjs.
  • wallet:onboard — drives onboarding through bench/wallet/setup-onboarding.mjs.
  • wallet:configure — configures the extension via bench/wallet/... (truncated in context).

Source: package.json:1-40. These scripts pair with the run-time stopWalletAutoApprover hook so wallet UX flows can be exercised end-to-end without manual seeding.

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

medium Configuration risk requires verification

May increase setup, validation, or first-run risk for the user.

medium Capability evidence risk requires verification

May increase setup, validation, or first-run risk for the user.

medium Maintenance risk requires verification

May increase setup, validation, or first-run risk for the user.

medium Security or permission risk requires verification

May increase setup, validation, or first-run risk for the user.

Found 7 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Configuration risk - Configuration risk requires verification.

1. Configuration risk: Configuration risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.host_targets | https://github.com/tangle-network/browser-agent-driver

2. Capability evidence risk: Capability evidence risk requires verification

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.assumptions | https://github.com/tangle-network/browser-agent-driver

3. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/tangle-network/browser-agent-driver

4. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: downstream_validation.risk_items | https://github.com/tangle-network/browser-agent-driver

5. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: risks.scoring_risks | https://github.com/tangle-network/browser-agent-driver

6. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/tangle-network/browser-agent-driver

7. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: release_recency=unknown.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/tangle-network/browser-agent-driver

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12

Count of project-level external discussion links exposed on this manual page.

Use: Review before install

Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using browser-agent-driver with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence