# dashboard - Doramagic AI Context Pack

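> Positioning: a pre-install preview and assessment asset. It gives the host AI a good starting point, but it does not mean the target project has been installed, run, or verified.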

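## Sufficiency Principle

- **Sufficiency, not compression**: an AI Context Pack should be complete enough for the host AI to understand the project's value, capability boundaries, entry points, risks, and evidence sources before starting work. It may be organized in layers, but the shortest possible summary is not the goal.
- **Compression strategy**: compress only noise and duplication; never compress context that affects judgment or the quality of the work that follows.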

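## How the Host AI Should Use This Pack

You are reading the AI Context Pack that Doramagic compiled for dashboard. Treat it as pre-work context: help the user understand who the project is for, what it can do, how to start, what must be verified after installation, and where the risks are. Do not claim that you have installed, run, or executed the target project.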

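## Claim Consumption Rules

- **Source of facts**: Repo Evidence + Claim/Evidence Graph. The Human Wiki supplies only salience, terminology, and narrative structure.
- **Minimum status for a fact**: `supported`
- `supported`: may be used as a project fact, but every answer must cite the claim_id and the evidence path.
- `weak`: only a low-confidence lead; the user must be asked to verify it further.
- `inferred`: only for risk warnings or open questions; never present it as a project fact.
- `unverified`: must not be used as a fact; say explicitly that the evidence is insufficient.
- `contradicted`: show the conflicting sources; do not pick one version on the user's behalf.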
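The status rules above amount to a small lookup from claim status to permitted use. The following Python sketch encodes them; the `Claim` dataclass, `usage_for`, `cite`, and the returned labels are hypothetical names for illustration, not part of any Doramagic tooling.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ClaimStatus(Enum):
    SUPPORTED = "supported"
    WEAK = "weak"
    INFERRED = "inferred"
    UNVERIFIED = "unverified"
    CONTRADICTED = "contradicted"


@dataclass
class Claim:
    claim_id: str
    status: ClaimStatus
    evidence_paths: list[str]


def usage_for(claim: Claim) -> str:
    """Map a claim's status to how the host AI may use it."""
    if claim.status is ClaimStatus.SUPPORTED:
        # Even a supported claim needs a citable evidence path to count as a fact.
        return "fact" if claim.evidence_paths else "insufficient_evidence"
    if claim.status is ClaimStatus.WEAK:
        return "low_confidence_lead"
    if claim.status is ClaimStatus.INFERRED:
        return "risk_or_open_question"
    if claim.status is ClaimStatus.CONTRADICTED:
        return "show_conflict"
    return "insufficient_evidence"  # unverified


def cite(claim: Claim) -> str:
    """Format the citation that a supported fact must carry."""
    paths = ", ".join(f"`{p}`" for p in claim.evidence_paths)
    return f"Evidence: {paths} Claim: `{claim.claim_id}`"
```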

## Who It Is Best For

- **AI researchers and builders of research-oriented agents**: the README is explicitly organized around research, experiment, or paper workflows. Evidence: `README.md` Claim: `clm_0002` supported 0.86
- **Developers using host AIs such as Claude/Codex/Cursor/Gemini**: the README or plugin configuration mentions multiple host AIs. Evidence: `README.md` Claim: `clm_0003` supported 0.86

## What It Can Do

- **Command-line launch or installation flow** (requires post-install verification): the project docs contain executable commands; real use requires running them locally or in the host environment. Evidence: `README.md` Claim: `clm_0001` supported 0.86

## How to Start

- `pip install forge-guardrails                # core only` Evidence: `README.md` Claim: `clm_0004` supported 0.86
- `pip install "forge-guardrails[anthropic]"   # + Anthropic client` Evidence: `README.md` Claim: `clm_0005` supported 0.86
- `git clone https://github.com/antoinezambelli/forge.git` Evidence: `README.md` Claim: `clm_0006` supported 0.86
- `pip install -e ".[dev]"` Evidence: `README.md` Claim: `clm_0007` supported 0.86
- `pip install -e ".[anthropic]"` Evidence: `README.md` Claim: `clm_0008` supported 0.86
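A side-effect-free preflight can confirm, before and after running the commands above, that the interpreter satisfies the Python 3.12+ badge shown in the README and whether the `forge-guardrails` distribution is installed. This is a minimal sketch using only the standard library (`importlib.metadata`); `installed_version` and `preflight` are hypothetical helpers, and the check reads package metadata without importing or running forge.

```python
from __future__ import annotations

import sys
from importlib import metadata


def installed_version(dist: str = "forge-guardrails") -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def preflight(min_python: tuple[int, int] = (3, 12)) -> dict[str, object]:
    """Report interpreter compatibility and install status without side effects."""
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "python": ".".join(map(str, sys.version_info[:3])),
        "forge_guardrails": installed_version(),
    }


if __name__ == "__main__":
    print(preflight())
```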

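## Pre-Continuation Decision Card

- **Current recommendation**: requires administrator/security approval
- **Why**: continuing may involve keys, accounts, external services, or sensitive context, so administrator or security approval is recommended first.

### 30-Second Assessment

- **What to do now**: requires administrator/security approval
- **Smallest safe next step**: run the Prompt Preview first; if credentials or an enterprise environment are involved, get approval before a trial install
- **Do not trust yet**: role quality and task fit cannot be taken at face value.
- **Continuing will touch**: role-selection bias, command execution, the local environment, or project files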

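### What You Can Trust Now

- **Audience signal: AI researchers and builders of research-oriented agents** (supported): backed by a supported claim or project evidence, but this still says nothing about actual install results. Evidence: `README.md` Claim: `clm_0002` supported 0.86
- **Audience signal: developers using host AIs such as Claude/Codex/Cursor/Gemini** (supported): backed by a supported claim or project evidence, but this still says nothing about actual install results. Evidence: `README.md` Claim: `clm_0003` supported 0.86
- **Capability exists: command-line launch or installation flow** (supported): you can trust that the project contains signals of this capability; whether it fits your specific task still needs a trial or post-install verification. Evidence: `README.md` Claim: `clm_0001` supported 0.86
- **Quick Start / install command signals exist** (supported): you can trust that the docs include a launch or install entry point; that is no reason to run it directly in your main environment. Evidence: `README.md` Claim: `clm_0004` supported 0.86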

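### What You Cannot Trust Yet

- **Role quality and task fit cannot be taken at face value.** (unverified): a role library proves there are many roles, not that each role suits your specific task or produces high-quality results.
- **Role descriptions are not real execution capability.** (unverified): before installation you can only judge whether role descriptions match the task profile; that does not prove the role can complete the task inside the host AI.
- **Real output quality cannot be trusted before installation.** (unverified): the Prompt Preview only shows how the project guides the user; it does not prove result quality in a real project.
- **Host AI version compatibility cannot be trusted before installation.** (unverified): loading rules and version differences across hosts such as Claude, Cursor, Codex, and Gemini must be verified in a real environment.
- **"It will not alter existing host AI behavior" cannot be taken at face value.** (inferred): Skills, plugins, and AGENTS/CLAUDE/GEMINI instruction files can change the host AI's default behavior.
- **Safe rollback cannot be assumed.** (unverified): unless the project provides explicit uninstall and restore instructions, verify in an isolated environment first.
- **Will a real install be compatible with the user's current host AI version?** (unverified): compatibility can only be verified in the actual host environment.
- **Does the project's output quality meet the user's specific task?** (unverified): a pre-install preview only shows the flow and boundaries; it cannot replace a real evaluation.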

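### What Continuing Will Touch

- **Role-selection bias**: the user's judgment about which expert role should handle the task. Reason: picking the wrong role makes the AI answer from the wrong professional perspective, wasting time or misleading decisions.
- **Command execution**: package managers, network downloads, local plugin directories, project configuration, or the user's home directory. Reason: the very first command can change the environment, so decide whether it is worth running before you run it. Evidence: `README.md`
- **Local environment or project files**: install artifacts, plugin caches, project configuration, or local dependency directories. Reason: the write scope and rollback path cannot be proven before installation, so verify in isolation. Evidence: `README.md`
- **Environment variables / API keys**: the project's entry docs explicitly mention API keys, tokens, secrets, or account credential configuration. Reason: if a real install needs credentials, use test credentials first and go through a permissions/compliance review. Evidence: `README.md`, `docs/EVAL_GUIDE.md`
- **Host AI context**: the AI Context Pack, Prompt Preview, Skill routing, risk rules, and project facts. Reason: imported context shapes the host AI's later judgments, so unverified items must never be dressed up as facts.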

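### Smallest Safe Next Steps

- **Run the Prompt Preview first**: use the interactive trial to validate the task profile and role fit before importing the whole role library. (Applies to: every project, especially when output quality is unknown.)
- **Trial-install only in an isolated directory or test account**: this keeps install commands from polluting your main host AI, real projects, or home directory. (Applies when: there are signs of command execution, plugin configuration, or local writes.)
- **Do not use real production credentials**: once environment variables or API keys enter the host or toolchain, they can create account and compliance risk. (Applies when: there are environment hints such as API, TOKEN, KEY, or SECRET.)
- **After installing, verify just one minimal task**: confirm loading, compatibility, output quality, and rollback before deciding on deeper use. (Applies when: moving from a trial into a real workflow.)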
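For the Python package itself, one way to follow the isolated-directory advice is a throwaway virtual environment. The sketch below uses the standard `venv` module; `make_sandbox` and `install_command` are hypothetical helpers, and the sandbox isolates Python packages only, not host AI plugin directories or your home directory.

```python
from __future__ import annotations

import os
import tempfile
import venv
from pathlib import Path


def make_sandbox(root: Path | None = None, with_pip: bool = True) -> Path:
    """Create a throwaway virtual environment and return its Python executable."""
    if root is None:
        root = Path(tempfile.mkdtemp(prefix="forge-trial-"))
    env_dir = root / "venv"
    # with_pip=True runs ensurepip so that `python -m pip` works inside the sandbox.
    venv.EnvBuilder(with_pip=with_pip, clear=True).create(env_dir)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_command(python: Path, spec: str = "forge-guardrails") -> list[str]:
    """Build, but do not run, a pip command scoped to the sandbox interpreter."""
    return [str(python), "-m", "pip", "install", spec]
```

The command is only built, not executed; after approval it could be passed to `subprocess.run(cmd, check=True)`, and deleting the temporary directory discards the whole sandbox.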

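### Exit Paths

- **Preserve the pre-install state**: record the original host configuration and project state so you can later tell whether recovery is possible.
- **Keep the original role-selection record**: if the output drifts off-topic, go back to the task-profile stage and reselect a role instead of pushing on with the wrong one.
- **Record install commands and write paths**: without explicit uninstall instructions, you at least need to know which directories or configs to clean up by hand.
- **Be ready to revoke test API keys or tokens**: if a test credential leaks or is misused, you can cut losses quickly.
- **If there is no rollback path, stay out of your main environment**: the lack of rollback blocks continuation; do not proceed on trust or luck.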
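To record write paths when no uninstall instructions exist, you can snapshot a directory before and after an install and diff the two. A minimal sketch, assuming file size and modification time are enough to detect changes (content hashing would be stricter but slower); `snapshot` and `diff` are hypothetical helpers.

```python
from __future__ import annotations

import os
from pathlib import Path


def snapshot(root: Path) -> dict[str, tuple[int, int]]:
    """Map each file under root to (size, mtime_ns) so later writes can be detected."""
    state: dict[str, tuple[int, int]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            state[str(path.relative_to(root))] = (st.st_size, st.st_mtime_ns)
    return state


def diff(before: dict[str, tuple[int, int]],
         after: dict[str, tuple[int, int]]) -> dict[str, list[str]]:
    """List files added, removed, or changed between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }
```

Run `snapshot` on each directory you expect the install to touch (for example a sandbox root or a plugin directory), install, snapshot again, and keep the `diff` output as the manual cleanup list.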

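## What Can Only Be Previewed

- Explain who the project is for and what it can do
- Demonstrate typical conversation flows based on the project docs
- Help the user decide whether it is worth installing or researching further

## What Must Be Verified After Installation

- Actually installing Skills, plugins, or the CLI
- Running scripts, modifying local files, or accessing external services
- Verifying real output quality, performance, and compatibility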

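## Boundary and Risk Card

- **Mistaking the pre-install preview for a real run**: users may overestimate how much configuration, permission, and compatibility verification has already been done. Handling: clearly separate prompt_preview_can_do from runtime_required. Claim: `clm_0009` inferred 0.45
- **Command execution modifies the local environment**: install commands may write to the user's home directory, the host's plugin directory, or project configuration. Handling: run them first in an isolated environment or a test account. Evidence: `README.md` Claim: `clm_0010` supported 0.86
- **Open question**: will a real install be compatible with the user's current host AI version? Reason: compatibility can only be verified in the actual host environment.
- **Open question**: does the project's output quality meet the user's specific task? Reason: a pre-install preview only shows the flow and boundaries; it cannot replace a real evaluation.
- **Open question**: do the install commands need network access, elevated permissions, or global writes? Reason: this determines install risk in both enterprise and personal environments.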

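## Pre-Work Context

### Load Order

- First read how_to_use.host_ai_instruction to establish the boundaries of a pre-install assessment asset.
- Read claim_graph_summary to confirm that facts come from the Claim/Evidence Graph rather than the Human Wiki narrative.
- Then read intended_users, capabilities, and quick_start_candidates to judge whether the user is a match.
- When a concrete task needs to be done, check role_skill_index first, then evidence_index.
- For questions about real installation, file modification, network access, performance, or compatibility, switch to risk_card and boundaries.runtime_required.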
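The load order above can be read as a routing function over the pack's sections. This Python sketch assumes the pack is consumed as keyed sections named as in the list; the topic labels in `RUNTIME_TOPICS` and the `sections_for` function are hypothetical.

```python
from __future__ import annotations

# Fixed reading order, using the pack field names from the list above.
LOAD_ORDER = [
    "how_to_use.host_ai_instruction",
    "claim_graph_summary",
    "intended_users",
    "capabilities",
    "quick_start_candidates",
]

# Hypothetical topic labels that must be routed to the runtime-boundary sections.
RUNTIME_TOPICS = {"install", "file_modification", "network", "performance", "compatibility"}


def sections_for(topic: str | None = None, concrete_task: bool = False) -> list[str]:
    """Return the pack sections to consult, in order, for a given request."""
    order = list(LOAD_ORDER)
    if concrete_task:
        order += ["role_skill_index", "evidence_index"]
    if topic in RUNTIME_TOPICS:
        order += ["risk_card", "boundaries.runtime_required"]
    return order
```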

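### Task Routing

- **Command-line launch or installation flow**: first state that this capability can only be verified after installation, then give a pre-install checklist. Boundary: requires a real install or run to verify. Evidence: `README.md` Claim: `clm_0001` supported 0.86

### Context Size

- Total files: 150
- Key files covered: 40/150
- Evidence index entries: 45
- Role / Skill entries: 30

### Handling Insufficient Evidence

- **missing_evidence**: say the evidence is insufficient and ask the user for the target file, the relevant README passage, or a post-install verification record; do not fill in facts.
- **out_of_scope_request**: say the task is beyond the evidence covered by this AI Context Pack, and suggest checking the Human Manual first or verifying after a real install.
- **runtime_request**: give a pre-install checklist and the source of each command, but do not run commands for the user or claim that they were run.
- **source_conflict**: present all conflicting sources side by side, mark the item as pending verification, and do not pick a version.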
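The four gap types map naturally onto a fixed set of reply templates, which keeps the host AI from improvising facts. A hypothetical sketch: the `Gap` enum mirrors the labels above, while `respond` and its wording are illustrative.

```python
from __future__ import annotations

from enum import Enum


class Gap(Enum):
    MISSING_EVIDENCE = "missing_evidence"
    OUT_OF_SCOPE_REQUEST = "out_of_scope_request"
    RUNTIME_REQUEST = "runtime_request"
    SOURCE_CONFLICT = "source_conflict"


def respond(gap: Gap, sources: list[str] | None = None) -> str:
    """Return a reply template for an evidence gap; never fills in facts."""
    if gap is Gap.MISSING_EVIDENCE:
        return ("Evidence is insufficient. Please share the target file, the relevant "
                "README passage, or a post-install verification record.")
    if gap is Gap.OUT_OF_SCOPE_REQUEST:
        return ("This is outside the evidence in this AI Context Pack. Check the Human "
                "Manual first, or verify after a real install.")
    if gap is Gap.RUNTIME_REQUEST:
        return ("Here is a pre-install checklist with the source of each command. "
                "I have not run any of these commands.")
    # SOURCE_CONFLICT: list every source and leave the choice to the user.
    listed = "; ".join(sources or [])
    return f"Sources conflict (pending verification): {listed}"
```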

## Prompt Recipes

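### Fit Assessment

- Goal: decide whether the project suits the user's current task.
- Expected output: a fit verdict, key reasons, evidence citations, what can be previewed before install, what must be verified after install, and next-step advice.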

```text
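Based on the AI Context Pack for dashboard, first ask me 3 necessary questions, then decide whether it fits my task. The answer must cover: who it is for, what it can do, what it cannot do, whether it is worth installing, and where the evidence comes from. Every project fact must cite evidence_refs, source_paths, or claim_id.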
```

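### Pre-Install Preview

- Goal: let the user experience the core workflow before installing, without presenting the preview as a real capability or a marketing promise.
- Expected output: a preview script with boundary labels, a post-install verification checklist, and cautious advice; no promises of real execution and no hard-sell language.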

```text
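Treat dashboard as a pre-install preview asset, not as an installed tool or a real runtime environment.

Output exactly four sections:
1. First ask me 3 necessary questions.
2. Give a "preview script": use the three labels [Previewable before install], [Must verify after install], and [Insufficient evidence] to show how it might guide the workflow.
3. Give a post-install verification checklist: list which capabilities can only be confirmed after a real install, real host loading, and a real project run.
4. Give cautious advice: you may only say "worth further research / a trial install", "gather more information before deciding", or "not recommended"; do not endorse the project.

Hard boundaries:
- Do not claim to have installed, run, executed tests, modified files, or produced real results.
- Do not use promissory phrases such as "automatically adapts", "guaranteed to pass", "perfect fit", or "strongly recommend installing".
- When describing how it works after installation, use a conditional such as "If the install succeeds and the host loads the Skill correctly, it might...".
- The preview script may only be written as "sample lines / hypothetical flow": use "might ask / might suggest / might show", never "has written, has generated, has passed, is running, is generating".
- The Prompt Preview does not provide install commands; if the user is preparing a trial install, only advise reading the Quick Start and Risk Card first and verifying in an isolated environment.
- Every project fact must come from a supported claim, evidence_refs, or source_paths; inferred/unverified items may only appear as risks or open questions.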

```

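### Role / Skill Selection

- Goal: pick the best-matching roles or Skills from the project.
- Expected output: a list of candidate roles or Skills, each with its use case, evidence path, risk boundary, and whether post-install verification is needed.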

```text
Read role_skill_index and recommend the 3-5 roles or Skills most relevant to my target task. For each recommendation, state the use case, likely output, risk boundary, and evidence_refs.
```

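### Risk Pre-Check

- Goal: identify environment, permission, rule-conflict, and quality risks before installing or adopting the project.
- Expected output: a checklist covering environment, permissions, dependencies, licensing, host conflicts, quality risks, and unknowns.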

```text
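Based on risk_card, boundaries, and quick_start_candidates, give me a pre-install risk checklist. Do not run any commands for me; only explain what I should check, why, and what happens if the check fails.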
```

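### Host AI Kickoff Instruction

- Goal: turn the project context into an instruction for the host AI before a conversation starts.
- Expected output: a pre-work instruction with clear boundaries and explicit evidence citations, ready to paste into the host AI.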

```text
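Based on the AI Context Pack for dashboard, generate a pre-work instruction I can paste into my host AI. The instruction must respect not_runtime=true and must not claim that the project has been installed, run, or produced real results.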
```


## Role / Skill Index

- 30 role / Skill / project-doc entries are indexed.

- **forge** (project_doc): README badge row: PyPI (https://pypi.org/project/forge-guardrails/), Tests (https://github.com/antoinezambelli/forge/actions/workflows/tests.yml), codecov (https://codecov.io/gh/antoinezambelli/forge), Python 3.12+ … Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `README.md`
- **Contributing to forge** (project_doc): Thanks for your interest in contributing. This guide covers how to get set up, run tests, and where to look when adding new functionality. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `CONTRIBUTING.md`
- **Architecture: Agentic Tool-Calling Library** (project_doc): Architecture: Agentic Tool-Calling Library. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/ARCHITECTURE.md`
- **Backend Setup Guide** (project_doc): How to install and run each LLM backend for forge eval and development. All instructions assume Windows 11 with an NVIDIA GPU (16GB VRAM). Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/BACKEND_SETUP.md`
- **Eval Guide** (project_doc): Internal tooling for measuring how reliably a model + backend combo navigates multi-step tool-calling workflows. Not a test suite — run manually against a live backend. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/EVAL_GUIDE.md`
- **Model Guide** (project_doc): Which model and backend to use with forge, based on your hardware and goals. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/MODEL_GUIDE.md`
- **User Guide** (project_doc): Practical usage patterns for forge — from single-turn tool calling to multi-turn conversations. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/USER_GUIDE.md`
- **Workflow** (project_doc): Visual guide to the forge agentic tool-calling loop. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/WORKFLOW.md`
- **ADR-001: Ablation Framework** (project_doc): Status: Implemented (az/ablation branch, Feb 2026). Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/decisions/001-ablation-framework.md`
- **ADR-002: Anthropic Baseline Client** (project_doc): Status: Implemented (az/ablation branch, Feb 2026). Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/decisions/002-anthropic-baseline.md`
- **ADR-003: thinking Label UX — Reasoning Capture Gating** (project_doc): ADR-003: thinking Label UX — Reasoning Capture Gating. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/decisions/003-thinking-label-ux.md`
- **ADR-004: Async on chunk Callback** (project_doc): Status: Done (implemented on az/async think branch). Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/decisions/004-async-on-chunk.md`
- **ADR-005: Parallel Tool Calling** (project_doc): Status: Done (branch az/parallel tools, commit cd2bd69). Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/decisions/005-parallel-tool-calls.md`
- **ADR-006: Tool Prerequisites** (project_doc): Status: Accepted and implemented (March 2026). Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/decisions/006-tool-prerequisites.md`
- **ADR-007: Report Views** (project_doc): Status: Planned (README roadmap item 5). Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/decisions/007-report-views.md`
- **ADR-008: Stateful Eval Scenarios** (project_doc): All 11 eval scenarios use argument-blind tool callables. get info query="rome" returns the Paris canned string. check supplier supplier="anything" routes through a fuzzy match but fundamentally returns a static blob. The only exception is error recovery, which validates a 4-digit format — a type check, not stateful behavior. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/decisions/008-stateful-eval-scenarios.md`
- **ADR-009: BFCL Integration** (project_doc): Status: Implemented (az/bfcl eval branch, Feb 2026). Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/decisions/009-bfcl-integration.md`
- **ADR-010: ToolResolutionError** (project_doc): Status: Implemented (az/tre branch, Mar 2026). Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/decisions/010-tool-resolution-error.md`
- **ADR-011: Guardrail Middleware — Composable Reliability Without Loop Ownership** (project_doc): ADR-011: Guardrail Middleware — Composable Reliability Without Loop Ownership. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/decisions/011-guardrail-middleware.md`
- **ADR-012: OpenAI-Compatible Proxy Server** (project_doc): ADR-012: OpenAI-Compatible Proxy Server. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/decisions/012-openai-proxy.md`
- **ADR-013: Text Response Intent -- When the Model Chooses Not to Call Tools** (project_doc): ADR-013: Text Response Intent -- When the Model Chooses Not to Call Tools. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/decisions/013-text-response-intent.md`
- **ADR-014: Recommended sampling — opt-in flag and proxy pass-through** (project_doc): ADR-014: Recommended sampling — opt-in flag and proxy pass-through. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/decisions/014-recommended-sampling-opt-in.md`
- **Multi-Model Routing — Concept Doc** (project_doc): Allow forge to manage multiple model backends simultaneously and expose them as named clients to the consumer. Forge handles the pool (lifecycle, health, budgets). The consumer handles orchestration (which workflow uses which model, when to swap, event dispatch). Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/decisions/MULTI_MODEL_ROUTING.md`
- **Forge Eval Reports** (project_doc): For model and backend recommendations, see Model Guide (../MODEL_GUIDE.md). Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/results/index.md`
- **Forge Eval — Native vs Prompt llama-server** (project_doc): Forge Eval — Native vs Prompt llama-server. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/results/raw/native-vs-prompt.md`
- **Forge Eval — Reforged vs Bare** (project_doc): claude-haiku-4-5-20251001 anthropic/native. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/results/raw/reforged-vs-bare.md`
- **Forge Eval — Reforged Leaderboard** (project_doc): Scr=score (correct/total), Acc=accuracy (correct/total, excl validate errors), Cmp=completeness (completed/total), Eff=efficiency (ideal/actual calls), Wst=avg wasted calls, Spd=avg time (excl compaction); rel=relevance detection, arg=argument fidelity, tsl=tool selection, b2s=basic 2step, s3s=sequential 3step, crt=conditional routing, srn=sequential reasoning, err=error recovery, dgr=data gap recovery, dge=data gap recove… Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/results/raw/reforged/all.md`
- **Forge Eval — Reforged by Backend** (project_doc): Scr=score (correct/total), Acc=accuracy (correct/total, excl validate errors), Cmp=completeness (completed/total), Eff=efficiency (ideal/actual calls), Wst=avg wasted calls, Spd=avg time (excl compaction); rel=relevance detection, arg=argument fidelity, tsl=tool selection, b2s=basic 2step, s3s=sequential 3step, crt=conditional routing, srn=sequential reasoning, err=error recovery, dgr=data gap recovery, dge=data gap recove… Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/results/raw/reforged/by-backend.md`
- **Forge Eval — Reforged by Model Family** (project_doc): Forge Eval — Reforged by Model Family. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/results/raw/reforged/by-family.md`
- **Changelog** (project_doc): All notable changes to forge are documented here. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `CHANGELOG.md`
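The leaderboard legend defines several metrics as simple ratios. The sketch below computes Scr (correct/total), Cmp (completed/total), and Eff (ideal/actual calls) from per-run records; the `Run` record and `summarize` are hypothetical, not forge's eval code, and pooling Eff across runs rather than averaging per-run ratios is an assumption. Acc is left out because the legend does not spell out exactly how validate errors are excluded.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Run:
    correct: bool
    completed: bool
    ideal_calls: int
    actual_calls: int


def summarize(runs: list[Run]) -> dict[str, float]:
    """Compute Scr, Cmp, and Eff as described in the leaderboard legend."""
    total = len(runs)
    if total == 0:
        raise ValueError("no runs to summarize")
    actual = sum(r.actual_calls for r in runs)
    return {
        "Scr": sum(r.correct for r in runs) / total,    # correct / total
        "Cmp": sum(r.completed for r in runs) / total,  # completed / total
        # ideal / actual calls, pooled across runs (assumption; could be a per-run mean)
        "Eff": sum(r.ideal_calls for r in runs) / actual if actual else 0.0,
    }
```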

## Evidence Index

- 45 evidence entries are indexed.

- **forge** (documentation): README badge row: PyPI (https://pypi.org/project/forge-guardrails/), Tests (https://github.com/antoinezambelli/forge/actions/workflows/tests.yml), codecov (https://codecov.io/gh/antoinezambelli/forge), Python 3.12+ (https://www.python.org/downloads/), License: MIT (LICENSE). Evidence: `README.md`
- **Contributing to forge** (documentation): Thanks for your interest in contributing. This guide covers how to get set up, run tests, and where to look when adding new functionality. Evidence: `CONTRIBUTING.md`
- **Package** (package_manifest): { "name": "dashboard", "private": true, "version": "0.0.0", "type": "module", "scripts": { "dev": "vite", "build": "tsc -b && vite build", "lint": "eslint .", "preview": "vite preview" }, "dependencies": { "react": "^19.2.0", "react-dom": "^19.2.0" }, "devDependencies": { "@eslint/js": "^9.39.1", "@tailwindcss/vite": "^4.2.1", "@types/node": "^24.10.1", "@types/react": "^19.2.7", "@types/react-dom": "^19.2.3", "@vitejs/plugin-react": "^5.1.1", "eslint": "^9.39.1", "eslint-plugin-react-hooks": "^7.0.1", "eslint-plugin-react-refresh": "^0.4.24", "globals": "^16.5.0", "tailwindcss": "^4.2.1", "typescript": "~5.9.3", "typescript-eslint": "^8.48.0", "vite": "^7.3.1", "vite-plugin-singlefile": "^… Evidence: `tests/eval/dashboard/package.json`
- **License** (source_file): Copyright (c) 2025-2026 Antoine Zambelli. Evidence: `LICENSE`
- **Architecture: Agentic Tool-Calling Library** (documentation): Architecture: Agentic Tool-Calling Library. Evidence: `docs/ARCHITECTURE.md`
- **Backend Setup Guide** (documentation): How to install and run each LLM backend for forge eval and development. All instructions assume Windows 11 with an NVIDIA GPU (16GB VRAM). Evidence: `docs/BACKEND_SETUP.md`
- **Eval Guide** (documentation): Internal tooling for measuring how reliably a model + backend combo navigates multi-step tool-calling workflows. Not a test suite — run manually against a live backend. Evidence: `docs/EVAL_GUIDE.md`
- **Model Guide** (documentation): Which model and backend to use with forge, based on your hardware and goals. Evidence: `docs/MODEL_GUIDE.md`
- **User Guide** (documentation): Practical usage patterns for forge — from single-turn tool calling to multi-turn conversations. Evidence: `docs/USER_GUIDE.md`
- **Workflow** (documentation): Visual guide to the forge agentic tool-calling loop. Evidence: `docs/WORKFLOW.md`
- **ADR-001: Ablation Framework** (documentation): Status: Implemented (az/ablation branch, Feb 2026). Evidence: `docs/decisions/001-ablation-framework.md`
- **ADR-002: Anthropic Baseline Client** (documentation): Status: Implemented (az/ablation branch, Feb 2026). Evidence: `docs/decisions/002-anthropic-baseline.md`
- **ADR-003: thinking Label UX — Reasoning Capture Gating** (documentation): ADR-003: thinking Label UX — Reasoning Capture Gating. Evidence: `docs/decisions/003-thinking-label-ux.md`
- **ADR-004: Async on chunk Callback** (documentation): Status: Done (implemented on az/async think branch). Evidence: `docs/decisions/004-async-on-chunk.md`
- **ADR-005: Parallel Tool Calling** (documentation): Status: Done (branch az/parallel tools, commit cd2bd69). Evidence: `docs/decisions/005-parallel-tool-calls.md`
- **ADR-006: Tool Prerequisites** (documentation): Status: Accepted and implemented (March 2026). Evidence: `docs/decisions/006-tool-prerequisites.md`
- **ADR-007: Report Views** (documentation): Status: Planned (README roadmap item 5). Evidence: `docs/decisions/007-report-views.md`
- **ADR-008: Stateful Eval Scenarios** (documentation): All 11 eval scenarios use argument-blind tool callables. get info query="rome" returns the Paris canned string. check supplier supplier="anything" routes through a fuzzy match but fundamentally returns a static blob. The only exception is error recovery, which validates a 4-digit format — a type check, not stateful behavior. Evidence: `docs/decisions/008-stateful-eval-scenarios.md`
- **ADR-009: BFCL Integration** (documentation): Status: Implemented (az/bfcl eval branch, Feb 2026). Evidence: `docs/decisions/009-bfcl-integration.md`
- **ADR-010: ToolResolutionError** (documentation): Status: Implemented (az/tre branch, Mar 2026). Evidence: `docs/decisions/010-tool-resolution-error.md`
- **ADR-011: Guardrail Middleware — Composable Reliability Without Loop Ownership** (documentation): ADR-011: Guardrail Middleware — Composable Reliability Without Loop Ownership. Evidence: `docs/decisions/011-guardrail-middleware.md`
- **ADR-012: OpenAI-Compatible Proxy Server** (documentation): ADR-012: OpenAI-Compatible Proxy Server. Evidence: `docs/decisions/012-openai-proxy.md`
- **ADR-013: Text Response Intent -- When the Model Chooses Not to Call Tools** (documentation): ADR-013: Text Response Intent -- When the Model Chooses Not to Call Tools. Evidence: `docs/decisions/013-text-response-intent.md`
- **ADR-014: Recommended sampling — opt-in flag and proxy pass-through** (documentation): ADR-014: Recommended sampling — opt-in flag and proxy pass-through. Evidence: `docs/decisions/014-recommended-sampling-opt-in.md`
- **Multi-Model Routing — Concept Doc** (documentation): Allow forge to manage multiple model backends simultaneously and expose them as named clients to the consumer. Forge handles the pool (lifecycle, health, budgets). The consumer handles orchestration (which workflow uses which model, when to swap, event dispatch). Evidence: `docs/decisions/MULTI_MODEL_ROUTING.md`
- **Forge Eval Reports** (documentation): For model and backend recommendations, see Model Guide (../MODEL_GUIDE.md). Evidence: `docs/results/index.md`
- **Forge Eval — Native vs Prompt llama-server** (documentation): Forge Eval — Native vs Prompt llama-server. Evidence: `docs/results/raw/native-vs-prompt.md`
- **Forge Eval — Reforged vs Bare** (documentation): claude-haiku-4-5-20251001 anthropic/native. Evidence: `docs/results/raw/reforged-vs-bare.md`
- **Forge Eval — Reforged Leaderboard** (documentation): Scr=score (correct/total), Acc=accuracy (correct/total, excl validate errors), Cmp=completeness (completed/total), Eff=efficiency (ideal/actual calls), Wst=avg wasted calls, Spd=avg time (excl compaction); rel=relevance detection, arg=argument fidelity, tsl=tool selection, b2s=basic 2step, s3s=sequential 3step, crt=conditional routing, srn=sequential reasoning, err=error recovery, dgr=data gap recovery, dge=data gap recovery extended, art=argument transformation, grs=grounded synthesis, iar=inconsistent api recovery, rel s=relevance detection stateful, arg s=argument fidelity stateful, tsl s=tool selection stateful, b2s s=basic 2step stateful, s3s s=sequential 3step stateful, crt s=conditional rou… Evidence: `docs/results/raw/reforged/all.md`
- **Forge Eval — Reforged by Backend** (documentation): Scr=score (correct/total), Acc=accuracy (correct/total, excl validate errors), Cmp=completeness (completed/total), Eff=efficiency (ideal/actual calls), Wst=avg wasted calls, Spd=avg time (excl compaction); rel=relevance detection, arg=argument fidelity, tsl=tool selection, b2s=basic 2step, s3s=sequential 3step, crt=conditional routing, srn=sequential reasoning, err=error recovery, dgr=data gap recovery, dge=data gap recovery extended, art=argument transformation, grs=grounded synthesis, iar=inconsistent api recovery, rel s=relevance detection stateful, arg s=argument fidelity stateful, tsl s=tool selection stateful, b2s s=basic 2step stateful, s3s s=sequential 3step stateful, crt s=conditional rou… Evidence: `docs/results/raw/reforged/by-backend.md`
- **Forge Eval — Reforged by Model Family** (documentation): Forge Eval — Reforged by Model Family. Evidence: `docs/results/raw/reforged/by-family.md`
- **Changelog** (documentation): All notable changes to forge are documented here. Evidence: `CHANGELOG.md`
- **Eval Rigs** (structured_config): { "rig-00": {"gpu": "RTX 5070", "platform": "windows"}, "rig-01": {"gpu": "RTX 5070 Ti", "platform": "linux/ubuntu24.04"}, "rig-02": {"gpu": "2x RTX 5070 Ti", "platform": "linux/ubuntu24.04"}, "rig-03": {"gpu": "AMD Strix Halo APU (128GB unified)", "platform": "linux/fedora43"} } Evidence: `eval_rigs.json`
- **Tsconfig.App** (structured_config): { "compilerOptions": { "tsBuildInfoFile": "./node_modules/.tmp/tsconfig.app.tsbuildinfo", "target": "ES2022", "useDefineForClassFields": true, "lib": ["ES2022", "DOM", "DOM.Iterable"], "module": "ESNext", "types": ["vite/client"], "skipLibCheck": true, … Evidence: `tests/eval/dashboard/tsconfig.app.json`
- **Tsconfig** (structured_config): { "files": [], "references": [{ "path": "./tsconfig.app.json" }, { "path": "./tsconfig.node.json" }] } Evidence: `tests/eval/dashboard/tsconfig.json`
- **Tsconfig.Node** (structured_config): { "compilerOptions": { "tsBuildInfoFile": "./node_modules/.tmp/tsconfig.node.tsbuildinfo", "target": "ES2023", "lib": ["ES2023"], "module": "ESNext", "types": ["node"], "skipLibCheck": true, … Evidence: `tests/eval/dashboard/tsconfig.node.json`
- **Normalize line endings** (source_file): rules under the comment "Normalize line endings": `* text=auto`, `*.py text eol=lf`, `*.md text eol=lf`, `*.toml text eol=lf`, `*.yml text eol=lf`, `*.yaml text eol=lf`; eval results.jsonl filter=lfs diff=lfs merge=lfs -text; eval results rig .jsonl filter=lfs diff=lfs merge=lfs -text. Evidence: `.gitattributes`
- **Python** (source_file): `# Python` ignore rules: `__pycache__/`, `*.py[cod]`, `*.egg-info/`, `dist/`, `build/`, `*.egg`. Evidence: `.gitignore`
- **Codecov** (source_file): comment: false. Evidence: `codecov.yml`
- **Eval Results** (source_file): Git LFS pointer: `version https://git-lfs.github.com/spec/v1`, `oid sha256:b4393d257ba3e22c5bac7b4cf7ab9431f85bdd2aafe662f1ffb118a73203bc2c`, `size 67078449` (about 67 MB). Evidence: `eval_results.jsonl`
- **examples/foreign_loop.py** (source_file): module docstring: "Using forge's guardrail middleware in your own agentic loop." Evidence: `examples/foreign_loop.py`
- **pyproject.toml** (source_file): `[build-system]` requires = ["hatchling"], build-backend = "hatchling.build"; also contains the comment "Integration-only: lifecycle orchestration requiring real backends/threads." Evidence: `pyproject.toml`
- **scripts/migrate_eval_jsonl_gguf_identity.py** (source_file): docstring: "One-shot migration: rewrite llamaserver/llamafile rows to GGUF-stem identity."; comment: "Pinned-in-time translation tables. These are the GGUF MAP and LLAMAFILE MAP…" Evidence: `scripts/migrate_eval_jsonl_gguf_identity.py`
- **scripts/run_ablation.py** (source_file): docstring: "Unattended ablation study runner."; comment: "Rig-00 plan for the model-params re-run: all Qwen3 variants × all backends, …" Evidence: `scripts/run_ablation.py`
- **scripts/smoke_test_proxy.py** (source_file): docstring: "Smoke test for the proxy — starts proxy in external mode against a mock backend, sends one request, verifies the response."; comment: "Read request". Evidence: `scripts/smoke_test_proxy.py`
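The `eval_results.jsonl` evidence entry is a Git LFS pointer, so a clone made without Git LFS contains a small text stub instead of the roughly 67 MB results file. A minimal sketch for detecting that case, based on the pointer format's `key value` lines beginning with the spec `version` line; `parse_lfs_pointer` and `is_lfs_pointer` are hypothetical helpers, and the 1024-byte cutoff reflects the spec's size limit for pointer files.

```python
from __future__ import annotations

from pathlib import Path

LFS_SPEC = "version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict[str, str] | None:
    """Parse a Git LFS pointer; return its fields, or None if text is not a pointer."""
    lines = text.strip().splitlines()
    if not lines or lines[0] != LFS_SPEC:
        return None
    fields: dict[str, str] = {}
    for line in lines:
        key, sep, value = line.partition(" ")
        if not sep:
            return None  # every pointer line is "key value"
        fields[key] = value
    if "oid" not in fields or "size" not in fields:
        return None
    return fields


def is_lfs_pointer(path: Path, limit: int = 1024) -> bool:
    """True if the file on disk is still an un-fetched LFS pointer (pointers are tiny)."""
    with path.open("rb") as fh:
        head = fh.read(limit)
    if len(head) >= limit:
        return False
    return parse_lfs_pointer(head.decode("utf-8", errors="replace")) is not None
```

If `is_lfs_pointer(Path("eval_results.jsonl"))` returns true, running `git lfs pull` in a clone with Git LFS installed should fetch the real content.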

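## Rules the Host AI Must Follow

- **Treat this asset as pre-work context, not a runtime environment**: the AI Context Pack contains only an evidence-based understanding of the project, not an executable state of the target project. Evidence: `README.md`, `CONTRIBUTING.md`, `tests/eval/dashboard/package.json`
- **When answering, separate what can be previewed from what can only be verified after installation**: the value of a pre-install preview comes from reducing mistaken installs and misjudgments, not from pretending to be a real run. Evidence: `README.md`, `CONTRIBUTING.md`, `tests/eval/dashboard/package.json`

## Questions the User Should Answer Before Starting

- Which host AI or local environment do you plan to use it in?
- Do you just want to try the workflow first, or are you preparing a real install?
- What matters most to you: install cost, output quality, or conflicts with your existing rules?

## Acceptance Criteria

- Every capability claim can be traced back to a file path in evidence_refs.
- AI_CONTEXT_PACK.md does not present the preview as a real run.
- A user can understand within 3 minutes who it is for, what it can do, how to start, and where the risk boundaries lie.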
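The first two acceptance criteria can be partly automated. This sketch assumes claims are available as dicts with `claim_id`, `kind`, and `evidence_refs` keys, which is a hypothetical shape; `untraceable` flags capability claims whose evidence paths are missing or unknown, and `runtime_overclaims` is a deliberately crude phrase scan, not a substitute for human review.

```python
from __future__ import annotations

# Phrases a preview must not contain; an illustrative, incomplete list.
RUNTIME_PHRASES = ("has been installed", "was executed", "is running", "tests passed")


def untraceable(claims: list[dict], known_paths: set[str]) -> list[str]:
    """Return claim_ids of capability claims lacking a valid evidence_refs path."""
    bad = []
    for claim in claims:
        if claim.get("kind") != "capability":
            continue
        refs = claim.get("evidence_refs", [])
        if not refs or any(ref not in known_paths for ref in refs):
            bad.append(claim["claim_id"])
    return bad


def runtime_overclaims(text: str) -> list[str]:
    """Crude heuristic: list runtime-success phrases found in preview text."""
    lowered = text.lower()
    return [p for p in RUNTIME_PHRASES if p in lowered]
```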

---

## Doramagic Context Augmentation

The following content reinforces the main body of the Repomix/AI Context Pack. The Human Manual provides only a reading skeleton; pitfall logs are converted into working constraints the host AI must follow.

## Human Manual Skeleton

Usage rule: this is only a reading route and salience signal for the project, not an authority on facts. Specific facts must still come from repo evidence / the Claim Graph.

Hard rules for the host AI:
- Do not treat page titles, section order, summaries, or importance as evidence of project facts.
- When explaining the Human Manual skeleton, state explicitly that it is only a reading route / salience signal.
- Judgments about capability, installation, compatibility, runtime state, and risk must cite repo evidence, a source path, or the Claim Graph.

- **Introduction to Forge**: importance `high`
  - source_paths: README.md, src/forge/__init__.py
- **Installation Guide**: importance `high`
  - source_paths: README.md, pyproject.toml
- **Quick Start Guide**: importance `high`
  - source_paths: README.md, src/forge/core/workflow.py, src/forge/core/runner.py
- **System Architecture**: importance `high`
  - source_paths: docs/ARCHITECTURE.md, src/forge/core/runner.py, src/forge/context/manager.py
- **Module Structure and API**: importance `high`
  - source_paths: src/forge/__init__.py, src/forge/core/messages.py, src/forge/core/workflow.py, src/forge/errors.py
- **Architecture Decision Records**: importance `medium`
  - source_paths: docs/decisions/001-ablation-framework.md, docs/decisions/011-guardrail-middleware.md, docs/decisions/013-text-response-intent.md
- **WorkflowRunner and Agentic Loop**: importance `high`
  - source_paths: src/forge/core/runner.py, src/forge/core/workflow.py, src/forge/core/inference.py, docs/WORKFLOW.md
- **Guardrails Middleware for External Loops**: importance `high`
  - source_paths: src/forge/guardrails/guardrails.py, src/forge/guardrails/__init__.py, examples/foreign_loop.py

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `f1b87b05b863c7d12927f3dbdbd716af2dc3ace1`
- inspected_files: `pyproject.toml`, `README.md`, `docs/USER_GUIDE.md`, `docs/ARCHITECTURE.md`, `docs/EVAL_GUIDE.md`, `docs/MODEL_GUIDE.md`, `docs/BACKEND_SETUP.md`, `docs/WORKFLOW.md`, `docs/results/index.md`, `docs/decisions/006-tool-prerequisites.md`, `docs/decisions/002-anthropic-baseline.md`, `docs/decisions/MULTI_MODEL_ROUTING.md`, `docs/decisions/009-bfcl-integration.md`, `docs/decisions/010-tool-resolution-error.md`, `docs/decisions/013-text-response-intent.md`, `docs/decisions/003-thinking-label-ux.md`, `docs/decisions/004-async-on-chunk.md`, `docs/decisions/012-openai-proxy.md`, `docs/decisions/001-ablation-framework.md`, `docs/decisions/005-parallel-tool-calls.md`

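Hard rules for the host AI:
- Without repo_clone_verified=true, do not claim to have read the source code.
- Without repo_inspection_verified=true, do not present judgments based on README/docs/package files as facts.
- Without quick_start_verified=true, do not claim the Quick Start has been run successfully.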
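The three hard rules above gate what may be claimed on explicit flags. A minimal sketch that treats a missing or non-`True` flag as false, so with the flags recorded in this section (`quick_start_verified` is absent) nothing may be said about the Quick Start having run; `GATES`, `allowed_statements`, and the statement labels are hypothetical names.

```python
from __future__ import annotations

# Each flag unlocks exactly one kind of statement, per the hard rules above.
GATES = {
    "repo_clone_verified": "read_source",
    "repo_inspection_verified": "state_docs_as_fact",
    "quick_start_verified": "claim_quick_start_ran",
}


def allowed_statements(flags: dict[str, object]) -> set[str]:
    """Return statement types unlocked by flags that are explicitly True."""
    return {stmt for flag, stmt in GATES.items() if flags.get(flag) is True}
```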

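## Doramagic Pitfall Constraints

These rules come from project-specific pitfalls found during Doramagic's discovery, verification, or compilation. The host AI must treat them as working constraints, not as ordinary notes.

### Constraint 1: Source evidence: Client sampling params: thread top_p/top_k/min_p/repeat_penalty through request body

- Trigger: GitHub community evidence shows an installation-related issue pending verification: Client sampling params: thread top_p/top_k/min_p/repeat_penalty through request body
- Host AI rule: the source suggests a fix, workaround, or version change may already exist, so the write-up must state which version it applies to.
- Why it matters: may raise the cost of trial use for new users and of production integration.
- Evidence: community_evidence:github | cevd_148dff87195e42549d0ffb88b99e9cbf | https://github.com/antoinezambelli/forge/issues/58 | a usage condition pending verification, surfaced by a github_issue source.
- Hard boundary: do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.

### Constraint 2: Source evidence: Investigate: integration paths with Hermes Agent

- Trigger: GitHub community evidence shows an installation-related issue pending verification: Investigate: integration paths with Hermes Agent
- Host AI rule: the source issue is still open; the Pack Agent must re-check whether it still affects the current version.
- Why it matters: may raise the cost of trial use for new users and of production integration.
- Evidence: community_evidence:github | cevd_e3cbd2d1c9a84a1887887bf24b036865 | https://github.com/antoinezambelli/forge/issues/51 | a usage condition pending verification, surfaced by a github_issue source.
- Hard boundary: do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.

### Constraint 3: Source evidence: Per-model recommended sampling defaults (map keyed by HF model cards)

- Trigger: GitHub community evidence shows an installation-related issue pending verification: Per-model recommended sampling defaults (map keyed by HF model cards)
- Host AI rule: the source suggests a fix, workaround, or version change may already exist, so the write-up must state which version it applies to.
- Why it matters: may block installation or the first run.
- Evidence: community_evidence:github | cevd_057ca2af912e4a608259ffb2a3654d4f | https://github.com/antoinezambelli/forge/issues/59 | the source discussion mentions Python-related conditions; re-check them before installing or trying the project.
- Hard boundary: do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.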
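Constraints 1 and 3 concern threading sampling parameters through the request body and per-model recommended defaults keyed by Hugging Face model cards, and the ADR-014 title describes recommended sampling as an opt-in flag. The sketch below shows one way such a map could merge with explicit caller values; the model id, the numbers, and `build_body` are placeholders, not forge's API and not values from any real model card. Keys such as `top_k`, `min_p`, and `repeat_penalty` are not part of the standard OpenAI chat schema, so whether a given backend honors them must be checked.

```python
from __future__ import annotations

# Hypothetical per-model defaults keyed by Hugging Face model id (placeholder values).
RECOMMENDED_SAMPLING: dict[str, dict[str, object]] = {
    "example-org/example-model": {"temperature": 0.7, "top_p": 0.9, "top_k": 40, "min_p": 0.05},
}

SAMPLING_KEYS = ("temperature", "top_p", "top_k", "min_p", "repeat_penalty")


def build_body(model: str, messages: list[dict], overrides: dict | None = None,
               use_recommended: bool = False) -> dict:
    """Assemble a chat request body; explicit overrides beat model defaults."""
    sampling: dict = {}
    if use_recommended:  # opt-in: nothing changes unless the caller asks
        sampling.update(RECOMMENDED_SAMPLING.get(model, {}))
    for key, value in (overrides or {}).items():
        if key in SAMPLING_KEYS and value is not None:
            sampling[key] = value  # caller values always win
    return {"model": model, "messages": messages, **sampling}
```

Explicit overrides take precedence over defaults, and defaults apply only when requested, so enabling the map never silently changes existing behavior.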

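### Constraint 4: Source evidence: Rescue-parse ChatGPT-style XML tool calls

- Trigger: GitHub community evidence shows an installation-related issue pending verification: Rescue-parse ChatGPT-style XML tool calls
- Host AI rule: the source suggests a fix, workaround, or version change may already exist, so the write-up must state which version it applies to.
- Why it matters: may raise the cost of trial use for new users and of production integration.
- Evidence: community_evidence:github | cevd_471c674c8d73451da75d6b8c9349aabf | https://github.com/antoinezambelli/forge/issues/55 | a usage condition pending verification, surfaced by a github_issue source.
- Hard boundary: do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.

### Constraint 5: Source evidence: Proxy external mode hardcodes native FC — no prompt-injection fallback

- Trigger: GitHub community evidence shows a configuration-related issue pending verification: Proxy external mode hardcodes native FC — no prompt-injection fallback
- Host AI rule: the source issue is still open; the Pack Agent must re-check whether it still affects the current version.
- Why it matters: may raise the cost of trial use for new users and of production integration.
- Evidence: community_evidence:github | cevd_f3a85ec8447a4838b3bc4c846cd9e7a0 | https://github.com/antoinezambelli/forge/issues/53 | a usage condition pending verification, surfaced by a github_issue source.
- Hard boundary: do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.

### Constraint 6: Capability assessment depends on assumptions

- Trigger: README/documentation is current enough for a first validation pass.
- Host AI rule: turn the assumptions into a downstream verification checklist.
- Why it matters: if the assumptions do not hold, the user will not get the promised capability.
- Evidence: capability.assumptions | hn_item:48192383 | https://news.ycombinator.com/item?id=48192383 | README/documentation is current enough for a first validation pass.
- Hard boundary: do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.

### Constraint 7: Maintenance activity unknown

- Trigger: last_activity_observed was not recorded.
- Host AI rule: add signals from recent GitHub commits, releases, and issue/PR responses.
- Why it matters: new, abandoned, and active projects get lumped together, which lowers confidence in the recommendation.
- Evidence: evidence.maintainer_signals | hn_item:48192383 | https://news.ycombinator.com/item?id=48192383 | last_activity_observed missing
- Hard boundary: do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.

### Constraint 8: Downstream validation found risk items

- Trigger: no_demo
- Host AI rule: send the item to the security/permission governance review queue.
- Why it matters: downstream validation has already requested a review, so the page must not downplay it.
- Evidence: downstream_validation.risk_items | hn_item:48192383 | https://news.ycombinator.com/item?id=48192383 | no_demo; severity=medium
- Hard boundary: do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.

### Constraint 9: Scoring risk present

- Trigger: no_demo
- Host AI rule: write the risk into the boundary card and confirm whether a manual review is needed.
- Why it matters: the risk affects whether the project is suitable for ordinary users to install.
- Evidence: risks.scoring_risks | hn_item:48192383 | https://news.ycombinator.com/item?id=48192383 | no_demo; severity=medium
- Hard boundary: do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.

### Constraint 10: Source evidence: Hardware detection: AMD unified-memory rigs fall through to 4K Ollama budget

- Trigger: GitHub community evidence shows a security/permission-related issue pending verification: Hardware detection: AMD unified-memory rigs fall through to 4K Ollama budget
- Host AI rule: the source issue is still open; the Pack Agent must re-check whether it still affects the current version.
- Why it matters: may affect authorization, key configuration, or security boundaries.
- Evidence: community_evidence:github | cevd_4ad226a6d1fa4a5f89fa7702bec11188 | https://github.com/antoinezambelli/forge/issues/61 | the source discussion mentions Python-related conditions; re-check them before installing or trying the project.
- Hard boundary: do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.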