# pxpipe - Doramagic AI Context Pack

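> Positioning: a pre-install experience and decision asset. It helps the host AI get off to a good start, but it does not mean the target project has been installed, executed, or verified.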

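## Sufficiency Principle

- **Sufficiency, not compression**: An AI Context Pack should be complete enough for the host AI to understand the project's value, capability boundaries, entry points, risks, and evidence sources before starting work. It may be organized in layers, but the shortest possible summary is not the goal.
- **Compression strategy**: Compress only noise and repetition, never context that affects judgment or the quality of the work that follows.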

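## How the Host AI Should Use This

You are reading the AI Context Pack that Doramagic compiled for pxpipe. Treat it as pre-work context: help the user understand who the project suits, what it can do, how to get started, what must be verified after installation, and where the risks lie. Do not claim that you have installed, run, or executed the target project.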

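## Claim Consumption Rules

- **Source of facts**: Repo Evidence + Claim/Evidence Graph; the Human Wiki contributes only salience, terminology, and narrative structure.
- **Minimum status for a fact**: `supported`
- `supported`: may be used as a project fact, but the answer must cite the claim_id and the evidence path.
- `weak`: a low-confidence lead only; the user must be asked to verify it further.
- `inferred`: usable only for risk notes or open questions; must not be presented as a project fact.
- `unverified`: must not be used as a fact; state explicitly that the evidence is insufficient.
- `contradicted`: the conflicting sources must be shown; do not pick one version on the user's behalf.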
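
The status rules above can be sketched as a small lookup. This is a minimal illustration, not part of the pack: the `Claim` class, the usage labels, and the conservative handling of unknown statuses are all assumptions introduced here; only the five status names, the claim ids, and the confidence values come from this document.

```python
from dataclasses import dataclass
from typing import Optional

# Status names come from the rules above; the usage labels are hypothetical.
ALLOWED_USE = {
    "supported": "fact",
    "weak": "low_confidence_lead",
    "inferred": "risk_or_open_question",
    "unverified": "not_usable",
    "contradicted": "show_conflict",
}


@dataclass(frozen=True)
class Claim:
    """Hypothetical container for one claim record."""
    claim_id: str
    status: str
    confidence: float
    evidence_path: Optional[str] = None


def usage_for(claim: Claim) -> str:
    """Return how a claim may be used; unknown statuses are treated as unusable."""
    return ALLOWED_USE.get(claim.status, "not_usable")


def cite(claim: Claim) -> str:
    """Format a citation, allowed only for supported claims with an evidence path."""
    if usage_for(claim) != "fact":
        raise ValueError(f"{claim.claim_id} is '{claim.status}' and cannot be cited as a fact")
    if not claim.evidence_path:
        raise ValueError(f"{claim.claim_id} has no evidence path to cite")
    return f"{claim.claim_id} ({claim.evidence_path})"
```

With this sketch, `clm_0002` (supported, `README.md`) can be cited, while `clm_0004` (inferred) is routed to the risk or open-question bucket and `cite` refuses it.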

## Who It Suits Best

- **Developers who use host AIs such as Claude/Codex/Cursor/Gemini**: the README or plugin configuration mentions multiple host AIs. Evidence: `README.md` Claim: `clm_0002` supported 0.86

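## What It Can Do

- **Command-line launch or install flow** (requires post-install verification): the project documentation contains executable commands; real use requires running them locally or in the host environment. Evidence: `README.md` Claim: `clm_0001` supported 0.86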

## How to Start

- `npx pxpipe-proxy                                  # proxy on 127.0.0.1:47821` Evidence: `README.md` Claim: `clm_0003` supported 0.86
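
Before running the quick-start command in a sandbox, it can help to check whether something is already listening on the address quoted in the README comment. The sketch below is an illustrative pre-check only, not something pxpipe ships; the helper name `port_in_use` and the timeout value are assumptions, and only the host and port are taken from the line above.

```python
import socket

# Host and port as quoted in the quick-start comment above.
PXPIPE_HOST = "127.0.0.1"
PXPIPE_PORT = 47821


def port_in_use(host: str = PXPIPE_HOST, port: int = PXPIPE_PORT, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or otherwise unreachable.
        return False


if __name__ == "__main__":
    state = "already in use" if port_in_use() else "free"
    print(f"{PXPIPE_HOST}:{PXPIPE_PORT} is {state}")
```

A result of "already in use" only says that some process is listening there; it does not confirm that the process is pxpipe.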

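## Pre-Continue Decision Card

- **Current recommendation**: start with a role-matching trial
- **Why**: this project looks more like a role library, and the core risk is picking the wrong role or mistaking role copy for execution capability. Try role matching with Prompt Preview first, then decide whether to import it into a sandbox.

### 30-Second Verdict

- **What to do now**: start with a role-matching trial
- **Minimum safe next step**: try role matching with Prompt Preview first; import in isolation only once you are satisfied
- **Do not trust yet**: role quality and task fit cannot be taken at face value.
- **Continuing will touch**: role-selection bias, command execution, the local environment or project files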

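### What You Can Trust Now

- **Audience hint: developers who use host AIs such as Claude/Codex/Cursor/Gemini** (supported): backed by a supported claim or project evidence, but still not equivalent to real installed behavior. Evidence: `README.md` Claim: `clm_0002` supported 0.86
- **Capability exists: command-line launch or install flow** (supported): you can trust that the project contains hints of this capability; whether it fits your specific task still requires a trial or post-install verification. Evidence: `README.md` Claim: `clm_0001` supported 0.86
- **Quick Start / install command hints exist** (supported): you can trust that the project documentation shows a launch or install entry point; do not run it directly in your primary environment on that basis. Evidence: `README.md` Claim: `clm_0003` supported 0.86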

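### What You Cannot Trust Yet

- **Role quality and task fit cannot be taken at face value.** (unverified): a role library shows that many roles exist; it does not show that every role suits your specific task or that a role produces high-quality results.
- **Role copy must not be mistaken for real execution capability.** (unverified): before installation you can only judge whether the role description matches the task profile, not whether it can complete the task inside the host AI.
- **Real output quality cannot be trusted before installation.** (unverified): Prompt Preview only shows how the guidance works; it cannot prove result quality in a real project.
- **Host AI version compatibility cannot be trusted before installation.** (unverified): loading rules and version differences across hosts such as Claude, Cursor, Codex, and Gemini must be verified in a real environment.
- **"It will not contaminate existing host AI behavior" cannot be taken at face value.** (inferred): Skills, plugins, and AGENTS/CLAUDE/GEMINI instructions may change the host AI's default behavior.
- **Safe rollback cannot be assumed.** (unverified): unless the project explicitly provides uninstall and restore instructions, verify in an isolated environment first.
- **After a real installation, is it compatible with the user's current host AI version?** (unverified): compatibility can only be verified in the actual host environment.
- **Does the project's output quality meet the user's specific task?** (unverified): a pre-install preview only shows flow and boundaries; it cannot replace a real evaluation.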

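### What Continuing Will Touch

- **Role-selection bias**: the user's judgment about which expert role should handle the task. Reason: the wrong role makes the AI answer from the wrong professional perspective, wasting time or misleading decisions.
- **Command execution**: package managers, network downloads, local plugin directories, project configuration, or the user's home directory. Reason: the very first command may already change the environment, so decide first whether it is worth running. Evidence: `README.md`
- **Local environment or project files**: install output, plugin caches, project configuration, or local dependency directories. Reason: the write scope and rollback path cannot be proven before installation and need isolated verification. Evidence: `README.md`
- **Host AI context**: the AI Context Pack, Prompt Preview, Skill routing, risk rules, and project facts. Reason: imported context influences the host AI's later judgments, so unverified items must not be presented as facts.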

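### Minimum Safe Next Step

- **Run Prompt Preview first**: use the interactive trial to validate the task profile and role match before importing the whole role library. (Applies to: any project, especially when output quality is unknown.)
- **Trial-install only in an isolated directory or test account**: keep install commands from contaminating the primary host AI, real projects, or the user's home directory. (Applies when: there are hints of command execution, plugin configuration, or local writes.)
- **After installing, verify a single minimal task**: verify loading, compatibility, output quality, and rollback before deciding whether to go deeper. (Applies when: you are about to move from trial to a real workflow.)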

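### Exit Strategy

- **Preserve the pre-install state**: record the original host configuration and project state so you can later tell whether it is recoverable.
- **Keep a record of the original role choice**: if the output drifts off topic, return to the task-profile stage and choose a role again instead of pushing on with the wrong one.
- **Record the install commands and write paths**: without explicit uninstall instructions, you at least need to know which directories or configuration files to clean up by hand.
- **No rollback path means no primary environment**: lack of rollback is a blocker for continuing and should not be overridden by trust or luck.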
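
One way to preserve the pre-install state and record write paths is to hash an isolated test directory before and after a trial install and then compare the two snapshots. The sketch below assumes the trial runs inside a directory you control; the function names `snapshot` and `diff_snapshots` are hypothetical helpers, not part of pxpipe or the pack.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(root: str) -> Dict[str, str]:
    """Map each file under root (by relative POSIX path) to its SHA-256 digest."""
    base = Path(root)
    digests: Dict[str, str] = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            rel = path.relative_to(base).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """List files that were added, removed, or changed between two snapshots."""
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(name for name in shared if before[name] != after[name]),
    }
```

Taking `snapshot(sandbox_dir)` before the trial and again afterwards gives a concrete list of paths to clean up. It covers only the directory you point it at, so writes elsewhere (for example the home directory) need their own snapshot.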

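## Preview-Only Items

- Explaining who the project suits and what it can do
- Demonstrating a typical conversation flow based on the project documentation
- Helping the user decide whether it is worth installing or researching further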

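## Items That Require Post-Install Verification

- Actually installing a Skill, plugin, or CLI
- Executing scripts, modifying local files, or accessing external services
- Verifying real output quality, performance, and compatibility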

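## Boundary and Risk Card

- **Mistaking the pre-install preview for a real run**: the user may overestimate how much configuration, permission, and compatibility verification the project has already completed. Mitigation: clearly separate prompt_preview_can_do from runtime_required. Claim: `clm_0004` inferred 0.45
- **Command execution modifies the local environment**: install commands may write to the user's home directory, the host plugin directory, or project configuration. Mitigation: run them first in an isolated environment or test account. Evidence: `README.md` Claim: `clm_0005` supported 0.86
- **To confirm**: after a real installation, is it compatible with the user's current host AI version? Reason: compatibility can only be verified in the actual host environment.
- **To confirm**: does the project's output quality meet the user's specific task? Reason: a pre-install preview only shows flow and boundaries; it cannot replace a real evaluation.
- **To confirm**: do the install commands require network access, elevated permissions, or global writes? Reason: this affects install risk in both enterprise and personal environments.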

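## Pre-Work Context

### Load Order

- First read how_to_use.host_ai_instruction to establish the boundaries of a pre-install decision asset.
- Read claim_graph_summary to confirm that facts come from the Claim/Evidence Graph rather than the Human Wiki narrative.
- Then read intended_users, capabilities, and quick_start_candidates to judge whether the user is a match.
- When a concrete task must be carried out, check role_skill_index first, then evidence_index.
- For questions about real installation, file modification, network access, performance, or compatibility, switch to risk_card and boundaries.runtime_required.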
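
The load order above can be sketched as an ordered lookup over the pack. This assumes the pack is available as a nested dictionary keyed by the dotted names used in the list; that shape, and the helper names `get_path` and `sections_for`, are assumptions made for illustration rather than a documented schema.

```python
from typing import Any, List, Optional, Tuple

# Section names are taken from the load-order list above.
BASE_ORDER = [
    "how_to_use.host_ai_instruction",
    "claim_graph_summary",
    "intended_users",
    "capabilities",
    "quick_start_candidates",
]
TASK_LOOKUP = ["role_skill_index", "evidence_index"]
RUNTIME_FALLBACK = ["risk_card", "boundaries.runtime_required"]


def get_path(pack: dict, dotted: str) -> Optional[Any]:
    """Resolve a dotted key such as 'boundaries.runtime_required'; None if absent."""
    node: Any = pack
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def sections_for(pack: dict, needs_task: bool = False,
                 touches_runtime: bool = False) -> List[Tuple[str, Any]]:
    """Return (name, content) pairs in the order the host AI should read them."""
    order = list(BASE_ORDER)
    if needs_task:
        order += TASK_LOOKUP
    if touches_runtime:
        order += RUNTIME_FALLBACK
    return [(name, get_path(pack, name)) for name in order]
```

A section that resolves to `None` is a signal to fall back to the insufficient-evidence handling rather than to fill the gap.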

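### Task Routing

- **Command-line launch or install flow**: first state that this capability needs post-install verification, then provide a pre-install checklist. Boundary: must be verified by actually installing or running it. Evidence: `README.md` Claim: `clm_0001` supported 0.86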

### Context Scale

- Total files: 298
- Important-file coverage: 40/298
- Evidence index entries: 79
- Role / Skill entries: 29

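### Handling Insufficient Evidence

- **missing_evidence**: state that the evidence is insufficient and ask the user for the target file, README section, or post-install verification record; do not fill in facts.
- **out_of_scope_request**: state that the task is outside the evidence scope of the current AI Context Pack, and suggest that the user consult the Human Manual or verify after a real installation.
- **runtime_request**: provide a pre-install checklist and the source of the commands, but do not execute commands for the user or claim to have executed them.
- **source_conflict**: show the conflicting sources side by side and mark the item as pending verification; do not force a choice between versions.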
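
The four cases above amount to a dispatch table. A minimal sketch follows, assuming the case has already been classified upstream; the message wording and the `fallback_message` helper are hypothetical, and only the four case names come from the list.

```python
from typing import Sequence

# Case names come from the list above; the message text is illustrative.
FALLBACK_RULES = {
    "missing_evidence": (
        "Evidence is insufficient. Please provide the target file, README section, "
        "or a post-install verification record."
    ),
    "out_of_scope_request": (
        "This task is outside the evidence scope of the AI Context Pack. "
        "Consult the Human Manual or verify after a real installation."
    ),
    "runtime_request": (
        "Here is a pre-install checklist and where the commands come from. "
        "No command has been executed."
    ),
    "source_conflict": "The sources disagree. This item is pending verification.",
}


def fallback_message(case: str, sources: Sequence[str] = ()) -> str:
    """Return the response for a fallback case; conflicts must list every source."""
    if case not in FALLBACK_RULES:
        raise KeyError(f"unknown fallback case: {case}")
    message = FALLBACK_RULES[case]
    if case == "source_conflict":
        if len(sources) < 2:
            raise ValueError("a conflict needs at least two sources to display")
        message = f"{message} Conflicting sources: {'; '.join(sources)}."
    return message
```

Requiring at least two sources for `source_conflict` reflects the rule that both versions are shown rather than one being chosen silently.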

## Prompt Recipes

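### Fit Assessment

- Goal: decide whether this project suits the user's current task.
- Expected output: a fit verdict, key reasons, evidence citations, what can be previewed before installation, what must be verified after installation, and a suggested next step.

```text
Based on the AI Context Pack for pxpipe, first ask me 3 necessary questions, then judge whether it suits my task. The answer must cover: who it suits, what it can do, what it cannot do, whether it is worth installing, and where the evidence comes from. Every project fact must cite evidence_refs, source_paths, or a claim_id.
```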

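### Pre-Install Experience

- Goal: let the user get a feel for the core workflow before installing, without presenting the preview as real capability or a marketing promise.
- Expected output: an experience script with boundary labels, a post-install verification checklist, and a cautious recommendation; no promises of real execution and no strong marketing language.

```text
Treat pxpipe as a pre-install experience asset, not as an installed tool or a real runtime environment.

Output exactly four parts:
1. First ask me 3 necessary questions.
2. Give an "experience script": use the three labels [previewable before install], [must verify after install], and [insufficient evidence] to show how it might guide the workflow.
3. Give a post-install verification checklist: list the capabilities that can only be confirmed after a real installation, real host loading, and a real project run.
4. Give a cautious recommendation: you may only say "worth further research / a trial install", "provide more information before deciding", or "not recommended to continue"; do not endorse the project.

Hard boundaries:
- Do not claim to have installed, run, executed tests, modified files, or produced real results.
- Do not use promissory phrases such as "adapts automatically", "guaranteed to pass", "perfect fit", or "strongly recommend installing".
- When describing how it works after installation, you must use a conditional such as "If the installation succeeds and the host loads the Skill correctly, it might ...".
- The experience script may only be written as "sample lines / a hypothetical flow": use "might ask / might suggest / might show", and do not write "has written, has generated, has passed, is running, is generating".
- Prompt Preview is not responsible for giving install commands; if the user is ready for a trial install, only remind them to read the Quick Start and Risk Card first and to verify in an isolated environment.
- All project facts must come from a supported claim, evidence_refs, or source_paths; inferred/unverified items may serve only as risks or open questions.
```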

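### Role / Skill Selection

- Goal: pick the best-matching assets from the project's roles or Skills.
- Expected output: a list of candidate roles or Skills, each with its applicable scenario, evidence path, risk boundary, and whether post-install verification is needed.

```text
Read role_skill_index and recommend the 3-5 roles or Skills most relevant to my target task. Each recommendation must state the applicable scenario, likely output, risk boundary, and evidence_refs.
```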

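### Risk Pre-Check

- Goal: identify environment, permission, rule-conflict, and quality risks before installing or adopting the project.
- Expected output: a checklist covering environment, permissions, dependencies, licensing, host conflicts, quality risks, and unknowns.

```text
Based on risk_card, boundaries, and quick_start_candidates, give me a pre-install risk checklist. Do not execute commands for me; only explain what I should check, why to check it, and what the impact of a failure would be.
```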

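### Host AI Kickoff Instruction

- Goal: turn the project context into an instruction for the host AI at the start of a conversation.
- Expected output: a kickoff instruction with clear boundaries and explicit evidence citations, ready to paste into the host AI.

```text
Based on the AI Context Pack for pxpipe, generate a kickoff instruction that I can paste into the host AI. The instruction must obey not_runtime=true and must not claim that the project has been installed, run, or has produced real results.
```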

## Role / Skill Index

- 29 role / Skill / project-doc entries indexed in total.

- **pxpipe** (project_doc): Cut Claude Code's input tokens by rendering bulky context as images — the same system prompt, tool docs, and history, in a fraction of the tokens. Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `README.md`
- **pxpipe demos** (project_doc): Two demos, two questions, two honest verdicts. Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `demo/README.md`
- **Reflow Eval Harness** (project_doc): Evaluation harness for the reflow image-rendering mode in pxpipe. Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `eval/README.md`
- **Demo 1 — cost A/B** (project_doc): What it measures: does pxpipe cost less on a real coding task? Honest verdict: ~break-even on cost. The compression is real ~55% fewer real tokens, verified but it lands in cache read — cheap at $ 0.1× , and its weight against a Pro/Max weekly cap is unpublished. The capability story is in ../effective-context/ ../effective-context/README.md . Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `demo/cost-ab/README.md`
- **pricing-engine** (project_doc): A small order-pricing library. Computes an order total from line items, a volume discount, a loyalty-tier discount, and tax. Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `demo/cost-ab/template/README.md`
- **Demo 2 — effective context recall at scale** (project_doc): Demo 2 — effective context recall at scale Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `demo/effective-context/README.md`
- **Gist-recall A/B: does the model lose information when history is imaged?** (project_doc): Gist-recall A/B: does the model lose information when history is imaged? Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `eval/gist-recall/README.md`
- **Per-glyph resolution sweep — why Opus misreads pxpipe renders** (project_doc): Per-glyph resolution sweep — why Opus misreads pxpipe renders Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `eval/glyph-matrix/sweep/README.md`
- **reading-fidelity eval — does the model actually read pxpipe's image?** (project_doc): reading-fidelity eval — does the model actually read pxpipe's image? Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `eval/gsm8k/README.md`
- **needle-haystack eval** (project_doc): Receipts for the needle eval. It measures the worst case for a lossy compressor exact recovery of a random fact from imaged content , not the whole product. Its "dead" conclusion was later reversed on live measurement — see the correction in /FINDINGS.md ../../FINDINGS.md . Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `eval/needle-haystack/README.md`
- **SWE-bench Pro - pxpipe ON vs OFF** (project_doc): Expansion to 19 pairs + navidrome replication 2026-06-11 Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `eval/swe-bench-pro/README.md`
- **SWE-bench Lite pilot — pxpipe ON vs OFF** (project_doc): SWE-bench Lite pilot — pxpipe ON vs OFF Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `eval/swe-bench/README.md`
- **Adaptive chars-per-token plan Task 18** (project_doc): Adaptive chars-per-token plan Task 18 Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `docs/ADAPTIVE_CPT_PLAN.md`
- **Prompt-Caching Alignment And Honest Savings Math** (project_doc): Prompt-Caching Alignment And Honest Savings Math Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `docs/CACHING_AND_SAVINGS.md`
- **How imaged history stays cache-safe as a conversation grows** (project_doc): How imaged history stays cache-safe as a conversation grows Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `docs/HISTORY_CACHE_MODEL.md`
- **Imaged-text legibility audit — 2026-07-01** (project_doc): Imaged-text legibility audit — 2026-07-01 Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `docs/LEGIBILITY-AUDIT-2026-07-01.md`
- **How pxpipe sizes a rendered image — rules, reasons, and history** (project_doc): How pxpipe sizes a rendered image — rules, reasons, and history Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `docs/RENDER_SIZING.md`
- **How pxpipe compresses Claude Code requests** (project_doc): How pxpipe compresses Claude Code requests Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `docs/TRANSFORM_INFO.md`
- **Changelog** (project_doc): All notable changes to pxpipe are documented here. This project adheres to Semantic Versioning https://semver.org/ pre-1.0: minor = features / behavioral changes, patch = fixes . Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `CHANGELOG.md`
- **FINDINGS — pxpipe text→PNG token compression** (project_doc): FINDINGS — pxpipe text→PNG token compression Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `FINDINGS.md`
- **Packed-reflow legibility experiments** (project_doc): Packed-reflow legibility experiments Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `eval/EXPERIMENT_LOG.md`
- **Pricing Engine — Specification** (project_doc): orderTotalCents items, tier returns the final order total as an integer number of cents . Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `demo/cost-ab/template/SPEC.md`
- **Effective-context needle test — attempt log** (project_doc): Effective-context needle test — attempt log Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `demo/effective-context/ATTEMPTS.md`
- **Glyph confusion matrix + render-style A/B Task 7 — PLANNED, paused for usage budget** (project_doc): Glyph confusion matrix + render-style A/B Task 7 — PLANNED, paused for usage budget Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `eval/glyph-matrix/PLAN.md`
- **L1 OCR Fidelity Report** (project_doc): Generated: 2026-05-22T03:36:29.045Z Model: opus Dry run: false Blocks evaluated: 20 Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `eval/results-opus/l1-report.md`
- **L2 Session Replay Report** (project_doc): Generated: 2026-05-22T03:48:55.794Z Replay model: opus Judge model: opus Dry run: false Sessions evaluated: 10 Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `eval/results-opus/l2-report.md`
- **L1 OCR Fidelity Report** (project_doc): Generated: 2026-05-23T01:54:01.508Z Model: opus Dry run: false Blocks evaluated: 20 Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `eval/results/l1-report.md`
- **L2 Session Replay Report** (project_doc): Generated: 2026-05-22T18:09:09.056Z Replay model: opus Judge model: opus Dry run: false Sessions evaluated: 10 Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `eval/results/l2-report.md`
- **Reflow Eval — Combined Summary Report** (project_doc): Reflow Eval — Combined Summary Report Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `eval/results/summary.md`

## Evidence Index

- 79 evidence entries indexed in total.

- **pxpipe** (documentation): Cut Claude Code's input tokens by rendering bulky context as images — the same system prompt, tool docs, and history, in a fraction of the tokens. Evidence: `README.md`
- **pxpipe demos** (documentation): Two demos, two questions, two honest verdicts. Evidence: `demo/README.md`
- **Reflow Eval Harness** (documentation): Evaluation harness for the reflow image-rendering mode in pxpipe. Evidence: `eval/README.md`
- **Demo 1 — cost A/B** (documentation): What it measures: does pxpipe cost less on a real coding task? Honest verdict: ~break-even on cost. The compression is real ~55% fewer real tokens, verified but it lands in cache read — cheap at $ 0.1× , and its weight against a Pro/Max weekly cap is unpublished. The capability story is in ../effective-context/ ../effective-context/README.md . Evidence: `demo/cost-ab/README.md`
- **pricing-engine** (documentation): A small order-pricing library. Computes an order total from line items, a volume discount, a loyalty-tier discount, and tax. Evidence: `demo/cost-ab/template/README.md`
- **Demo 2 — effective context recall at scale** (documentation): Demo 2 — effective context recall at scale Evidence: `demo/effective-context/README.md`
- **Gist-recall A/B: does the model lose information when history is imaged?** (documentation): Gist-recall A/B: does the model lose information when history is imaged? Evidence: `eval/gist-recall/README.md`
- **Per-glyph resolution sweep — why Opus misreads pxpipe renders** (documentation): Per-glyph resolution sweep — why Opus misreads pxpipe renders Evidence: `eval/glyph-matrix/sweep/README.md`
- **reading-fidelity eval — does the model actually read pxpipe's image?** (documentation): reading-fidelity eval — does the model actually read pxpipe's image? Evidence: `eval/gsm8k/README.md`
- **needle-haystack eval** (documentation): Receipts for the needle eval. It measures the worst case for a lossy compressor exact recovery of a random fact from imaged content , not the whole product. Its "dead" conclusion was later reversed on live measurement — see the correction in /FINDINGS.md ../../FINDINGS.md . Evidence: `eval/needle-haystack/README.md`
- **SWE-bench Pro - pxpipe ON vs OFF** (documentation): Expansion to 19 pairs + navidrome replication 2026-06-11 Evidence: `eval/swe-bench-pro/README.md`
- **SWE-bench Lite pilot — pxpipe ON vs OFF** (documentation): SWE-bench Lite pilot — pxpipe ON vs OFF Evidence: `eval/swe-bench/README.md`
- **Package** (package_manifest): { "name": "pxpipe-proxy", "version": "0.7.2", "description": "Token-saving proxy for Claude Code: renders bulky context system prompt, tool docs, old history as dense PNGs to cut input tokens. Runs on Node and Cloudflare Workers.", "type": "module", "bin": { "pxpipe": "bin/cli.js" }, "exports": { ".": { "types": "./dist/core/index.d.ts", "import": "./dist/core/index.js" }, "./transform": { "types": "./dist/core/library.d.ts", "import": "./dist/core/library.js" }, "./measurement": { "types": "./dist/core/measurement.d.ts", "import": "./dist/core/measurement.js" }, "./applicability": { "types": "./dist/core/applicability.d.ts", "import": "./dist/core/applicability.js" }, "./proxy": { "types":… Evidence: `package.json`
- **Package** (package_manifest): { "name": "pricing-engine", "version": "0.1.0", "type": "module", "private": true, "description": "Small order-pricing library. Implement src/pricing.js per SPEC.md so the tests pass.", "scripts": { "test": "node --test" } } Evidence: `demo/cost-ab/template/package.json`
- **License** (source_file): Copyright c 2026 claude-image-proxy contributors Evidence: `LICENSE`
- **Adaptive chars-per-token plan Task 18** (documentation): Adaptive chars-per-token plan Task 18 Evidence: `docs/ADAPTIVE_CPT_PLAN.md`
- **Prompt-Caching Alignment And Honest Savings Math** (documentation): Prompt-Caching Alignment And Honest Savings Math Evidence: `docs/CACHING_AND_SAVINGS.md`
- **How imaged history stays cache-safe as a conversation grows** (documentation): How imaged history stays cache-safe as a conversation grows Evidence: `docs/HISTORY_CACHE_MODEL.md`
- **Imaged-text legibility audit — 2026-07-01** (documentation): Imaged-text legibility audit — 2026-07-01 Evidence: `docs/LEGIBILITY-AUDIT-2026-07-01.md`
- **How pxpipe sizes a rendered image — rules, reasons, and history** (documentation): How pxpipe sizes a rendered image — rules, reasons, and history Evidence: `docs/RENDER_SIZING.md`
- **How pxpipe compresses Claude Code requests** (documentation): How pxpipe compresses Claude Code requests Evidence: `docs/TRANSFORM_INFO.md`
- **Anthropic Client** (source_file): / eval/lib/anthropic-client.mjs Model-call layer for the eval harness. Runs entirely on the local Claude Max subscription by shelling out to the interactive claude TUI via the cci.py shim NOT headless claude -p . NO Anthropic API key is used or required. Why the CLI and not the HTTP API: The operator runs on a Claude Max subscription, which does not expose a raw API key. The claude binary authenticates via the subscription's stored OAuth credentials ~/.claude , so claude -p calls bill against the subscription, not a metered API key. Proxy bypass: The interactive claude shell alias points ANTHROPIC BASE URL at the local pxpipe proxy. The eval MUST NOT go through pxpipe — that would transform… Evidence: `eval/lib/anthropic-client.mjs`
- **Cci** (source_file): CLAUDE = os.environ.get "CCI CLAUDE BIN", os.path.expanduser "~/.claude/local/claude" TIMEOUT = float os.environ.get "CCI TIMEOUT", "300" READY TIMEOUT = float os.environ.get "CCI READY TIMEOUT", "60" QUIET S = float os.environ.get "CCI QUIET S", "4.0" DEBUG = os.environ.get "CCI DEBUG" ROWS = int os.environ.get "CCI ROWS", "60" COLS = int os.environ.get "CCI COLS", "200" ⋮---- def parse argv argv ⋮---- model = None; output format = "text"; allowed = None; prompt = None takes val = {"--model", "--output-format", "--allowedTools", "--allowed-tools", i = 0 ⋮---- a = argv i ⋮---- model = argv i + 1 ; i += 2; continue ⋮---- output format = argv i + 1 ; i += 2; continue ⋮---- allowed = argv i +… Evidence: `eval/lib/cci.py`
- **Cost** (source_file): / eval/lib/cost.mjs Token and USD cost estimation for the reflow eval harness. Based on Claude claude-sonnet-4-5 pricing May 2026 . Image token formula: Anthropic charges by pixel area ≈ ⌈w/28⌉·⌈h/28⌉ patches . A full dense ~1928×1928 page ≈ 69×69 = 4761 vision tokens. We use the empirically-measured 1.17 chars/token for text. / ⋮---- // --------------------------------------------------------------------------- // Model pricing per-million-token rates, USD — May 2026 // These are approximate public rates; update if pricing changes. // --------------------------------------------------------------------------- ⋮---- imageTileTokens: 4761, // dense 1928×1928 page = 69×69 patches ⋮---- / Char… Evidence: `eval/lib/cost.mjs`
- **Diff** (source_file): / eval/lib/diff.mjs Character-level accuracy / edit-distance utilities for the L1 OCR eval. Uses Wagner–Fischer dynamic programming for Levenshtein distance. We operate on Unicode codepoints not UTF-16 code units so that multi-byte characters like ↵ are counted as single edits. / ⋮---- / Convert a string to an array of Unicode codepoints. @param {string} s @returns {number } / function codepoints s ⋮---- / Levenshtein edit distance between two strings, operating at the Unicode codepoint level. Space-optimised: O min a , b memory. @param {string} a @param {string} b @returns {number} / export function levenshtein a, b ⋮---- // Keep shorter string in the inner dimension for cache efficiency ⋮… Evidence: `eval/lib/diff.mjs`
- **Render Bridge** (source_file): / eval/lib/render-bridge.mjs Thin bridge that imports the compiled pxpipe render functions from dist/core/render.js and exposes them to the eval scripts. Why dist/ and not src/? The vitest-based unit tests import from src/ via tsx TypeScript → JS on-the-fly . The eval scripts are plain .mjs files run with node and don't go through tsx, so they need the already-compiled dist/ output. Run npm run build or pnpm run build first if dist/ is stale. The bridge re-exports exactly what the eval harness needs and nothing else. / Evidence: `eval/lib/render-bridge.mjs`
- **Applicability** (source_file): export type PxpipeApplicabilityReason = 'eligible' 'unsupported model' 'unsupported method' 'unsupported path' 'empty body'; ⋮---- export interface PxpipeApplicabilityInput { readonly model?: string null; readonly method?: string null; readonly path?: string null; readonly bodyBytes?: number null; } ⋮---- function baseModelId model: string : string ⋮---- / Dashboard runtime override; null = fall back to PXPIPE MODELS env / built-in default. In-memory only. / ⋮---- / Built-in default scope when PXPIPE MODELS is unset: Fable 5 Claude plus GPT 5.6. GPT 5.5 and Opus 4.8 are intentionally off — same pipeline but measurably worse at reading imaged content FINDINGS.md 2026-06-16: Opus 4.8 ~2pp ari… Evidence: `src/core/applicability.ts`
- **Types** (source_file): export interface TextBlock { type: 'text'; text: string; cache control?: CacheControl; } ⋮---- export interface ImageBlock { type: 'image'; source: { type: 'base64'; media type: 'image/png' 'image/jpeg' 'image/gif' 'image/webp'; data: string; }; cache control?: CacheControl; } ⋮---- export interface ToolUseBlock { type: 'tool use'; id: string; name: string; input: unknown; } ⋮---- export interface ToolResultBlock { type: 'tool result'; tool use id: string; content: string Array ; is error?: boolean; cache control?: CacheControl; } ⋮---- export type ContentBlock = TextBlock ImageBlock ToolUseBlock ToolResultBlock; ⋮---- export interface CacheControl { type: 'ephemeral'; ttl?: '5m' '1h'; } ⋮-… Evidence: `src/core/types.ts`
- **Types** (source_file): export interface StatsPayload { port: number; uptime sec: number; requests: number; compressed requests: number; passthrough: number; baseline input weighted: number; actual input weighted: number; saved input tokens: number; saved pct: number; saved pct input only: number; saved pct of total bill: number; saved pct of all spend: number; all baseline equivalent weighted: number; all actual input weighted: number; all output weighted: number; all usage requests: number; compressed paid requests: number; passthrough paid requests: number; compressed actual usd: number; passthrough actual usd: number; compressed avg usd per request: number; passthrough avg usd per request: number; compressed m… Evidence: `src/dashboard/types.ts`
- **Changelog** (documentation): All notable changes to pxpipe are documented here. This project adheres to Semantic Versioning https://semver.org/ pre-1.0: minor = features / behavioral changes, patch = fixes . Evidence: `CHANGELOG.md`
- **FINDINGS — pxpipe text→PNG token compression** (documentation): FINDINGS — pxpipe text→PNG token compression Evidence: `FINDINGS.md`
- **Packed-reflow legibility experiments** (documentation): Packed-reflow legibility experiments Evidence: `eval/EXPERIMENT_LOG.md`
- **Pricing Engine — Specification** (documentation): orderTotalCents items, tier returns the final order total as an integer number of cents . Evidence: `demo/cost-ab/template/SPEC.md`
- **Effective-context needle test — attempt log** (documentation): Effective-context needle test — attempt log Evidence: `demo/effective-context/ATTEMPTS.md`
- **Glyph confusion matrix + render-style A/B Task 7 — PLANNED, paused for usage budget** (documentation): Glyph confusion matrix + render-style A/B Task 7 — PLANNED, paused for usage budget Evidence: `eval/glyph-matrix/PLAN.md`
- **L1 OCR Fidelity Report** (documentation): Generated: 2026-05-22T03:36:29.045Z Model: opus Dry run: false Blocks evaluated: 20 Evidence: `eval/results-opus/l1-report.md`
- **L2 Session Replay Report** (documentation): Generated: 2026-05-22T03:48:55.794Z Replay model: opus Judge model: opus Dry run: false Sessions evaluated: 10 Evidence: `eval/results-opus/l2-report.md`
- **L1 OCR Fidelity Report** (documentation): Generated: 2026-05-23T01:54:01.508Z Model: opus Dry run: false Blocks evaluated: 20 Evidence: `eval/results/l1-report.md`
- **L2 Session Replay Report** (documentation): Generated: 2026-05-22T18:09:09.056Z Replay model: opus Judge model: opus Dry run: false Sessions evaluated: 10 Evidence: `eval/results/l2-report.md`
- **Reflow Eval — Combined Summary Report** (documentation): Reflow Eval — Combined Summary Report Evidence: `eval/results/summary.md`
- **Tsconfig** (structured_config): { "compilerOptions": { "target": "ES2022", "lib": "ES2022", "WebWorker" , "module": "ESNext", "moduleResolution": "Bundler", "types": "@cloudflare/workers-types", "node" , "strict": true, "noUncheckedIndexedAccess": true, "noImplicitOverride": true, "noFallthroughCasesInSwitch": true, "esModuleInterop": true, "forceConsistentCasingInFileNames": true, "skipLibCheck": true, "resolveJsonModule": true, "isolatedModules": true, "verbatimModuleSyntax": true, "declaration": true, "declarationMap": true, "sourceMap": true, "outDir": "dist", "rootDir": "src" }, "include": "src/ / " , "exclude": "node modules", "dist", "legacy", // src/dashboard/ is the Svelte browser bundle — compiled separately by… Evidence: `tsconfig.json`
- **Probes** (structured_config): { "session": 0, "type": "decision", "q": "Which package was chosen for the store layer?", "gold": "mobx" }, { "session": 0, "type": "numeric", "q": "What exact value in ms was the retry budget set to?", "gold": "7880" }, { "session": 0, "type": "path", "q": "In which file path was the double-flush race found?", "gold": "src/batcher/core.ts" }, { "session": 0, "type": "name", "q": "Who was named as the on-call reviewer for the PR?", "gold": "Tobias Okafor" }, { "session": 0, "type": "negation", "q": "Was LEGACY PINS enabled in prod? Answer ENABLED or OFF.", "gold": "OFF" }, { "session": 0, "type": "unanswerable", "q": "Which database migration version was rolled back?", "gold": "UNKNOWN" },… Evidence: `eval/gist-recall/work/probes.json`
- **Probes** (structured_config): { "session": 0, "type": "decision", "q": "What was the FINAL package chosen for the store layer?", "gold": "nanostores" }, { "session": 0, "type": "numeric", "q": "What exact value in ms was the RETRY BUDGET set to not the cache TTL ?", "gold": "7850" }, { "session": 0, "type": "path", "q": "Which file contained the ROOT CAUSE of the double-flush race?", "gold": "src/mailbox/core.ts" }, { "session": 0, "type": "name", "q": "Who is the on-call REVIEWER for the PR not the author ?", "gold": "Aiko Khoury" }, { "session": 0, "type": "negation", "q": "In PROD specifically, was LEGACY PINS enabled? Answer ENABLED or OFF.", "gold": "OFF" }, { "session": 0, "type": "unanswerable", "q": "Which datab… Evidence: `eval/gist-recall/work2/probes.json`
- **Probes** (structured_config): { "session": 0, "type": "final", "q": "What is the FINAL locked value of BATCH WINDOW MS at the end of the session?", "gold": "8400" }, { "session": 0, "type": "first", "q": "What was the FIRST value BATCH WINDOW MS was set to at the start?", "gold": "9600" }, { "session": 0, "type": "count", "q": "How many distinct values was BATCH WINDOW MS set to over the whole session? Answer with a number.", "gold": "3" }, { "session": 1, "type": "final", "q": "What is the FINAL locked value of BATCH WINDOW MS at the end of the session?", "gold": "1200" }, { "session": 1, "type": "first", "q": "What was the FIRST value BATCH WINDOW MS was set to at the start?", "gold": "5400" }, { "session": 1, "type":… Evidence: `eval/gist-recall/work3/probes.json`
- **Golds** (structured_config): {"s0": {"C":"fa2587c3db43","A":"0a9016292918","B":"4aefc5667127","E":"3b57511a2d37","D":"4e557a6941a3"},{"D":"d9c1a44d9d82","C":"f68401231baa","B":"2556bf62cbeb","E":"1497869832e0","A":"9422c7d44eab"},{"C":"423406430208","B":"e04e32424126","D":"112c052c3cd3","E":"c7977df6e8e7","A":"7808126d9ce2"},{"B":"c1e378234a1f","D":"fc496e0325ca","C":"c3e61c70b689","E":"fd8d4a1f08ef","A":"4c1d5dab770b"} ,"s1": {"C":"fa2587c3db43","A":"0a9016292918","B":"4aefc5667127","E":"3b57511a2d37","D":"4e557a6941a3"},{"D":"d9c1a44d9d82","C":"f68401231baa","B":"2556bf62cbeb","E":"1497869832e0","A":"9422c7d44eab"},{"C":"423406430208","B":"e04e32424126","D":"112c052c3cd3","E":"c7977df6e8e7","A":"7808126d9ce2"},{"B":"… Evidence: `eval/glyph-matrix/sweep/golds.json`
- **L1 Results** (structured_config): { "results": { "blockIdx": 0, "charCount": 211, "role": "user", "baselineImageCount": 1, "reflowImageCount": 1, "baselineScore": { "editDistance": 1, "charAccuracy": 0.995260663507109, "refLen": 211, "hypLen": 210 }, "reflowScore": { "editDistance": 5, "charAccuracy": 0.976303317535545, "refLen": 211, "hypLen": 207 }, "dryRun": false }, { "blockIdx": 1, "charCount": 284, "role": "assistant", "baselineImageCount": 1, "reflowImageCount": 1, "baselineScore": { "editDistance": 1, "charAccuracy": 0.9964788732394366, "refLen": 284, "hypLen": 284 }, "reflowScore": { "editDistance": 20, "charAccuracy": 0.9295774647887324, "refLen": 284, "hypLen": 270 }, "dryRun": false }, { "blockIdx": 2, "charCoun… Evidence: `eval/results-opus/l1-results.json`
- **L2 Results** (structured_config): { "results": { "sessionIdx": 0, "sessionId": "6131a291-9f3e-44bd-8558-8ae470ddc85e", "totalTurns": 1024, "historyCharCount": 279683, "baselineImageCount": 2, "reflowImageCount": 1, "baselineAnswer": "You're right to push on this — let me be honest about what the current tests actually prove, because \"covered\" has been doing a lot of work in my earlier summaries.\n\n What the tests actually verify and don't \n\n LiveKit egress module tests 17 — these mock the LiveKit SDK entirely. They verify", "reflowAnswer": "Here's the current E2E/test structure in pixelpipe :\n\n Test layout\n\nAll tests live in a flat tests/ directory — there's no dedicated e2e/ directory . They split into two kinds:\… Evidence: `eval/results-opus/l2-results.json`
- **L1 Results** (structured_config): { "results": { "blockIdx": 0, "charCount": 211, "role": "user", "variants": { "baseline": { "score": { "editDistance": 5, "charAccuracy": 0.976303317535545, "refLen": 211, "hypLen": 211 }, "imageCount": 1 }, "reflow": { "score": { "editDistance": 6, "charAccuracy": 0.9715639810426541, "refLen": 211, "hypLen": 210 }, "imageCount": 1 }, "reflow-inimage": { "score": { "editDistance": 1, "charAccuracy": 0.995260663507109, "refLen": 211, "hypLen": 210 }, "imageCount": 1 } } }, { "blockIdx": 1, "charCount": 228, "role": "assistant", "variants": { "baseline": { "score": { "editDistance": 2, "charAccuracy": 0.9912280701754386, "refLen": 228, "hypLen": 228 }, "imageCount": 1 }, "reflow": { "score":… Evidence: `eval/results/l1-results.json`
- **L2 Results** (structured_config): { "results": { "sessionIdx": 0, "sessionId": "6131a291-9f3e-44bd-8558-8ae470ddc85e", "totalTurns": 1024, "historyCharCount": 279683, "baselineImageCount": 2, "reflowImageCount": 1, "aaImageCount": 1, "baselineAnswer": "You're right — let me be honest about what the tests actually prove.\n\n The gap\n\nThe 36 new tests mock LiveKit and Deepgram . That means:\n\n- Requirement 1 audio transcription works — utils.test.ts only tests our transcript-formatting helpers. Nothing exercises a real audio stream → Deepg", "reflowAnswer": "Based on the conversation, you've been reviewing PR 30 Phase 4B: Module Tests for LiveKit Functionality , and the open issue is a real coverage gap: the interview audi… Evidence: `eval/results/l2-results.json`
- **Eval Results Off** (structured_config): {"instance future-architect vuls-36456cb151894964ba1683ce7da5c35ada789970": true, "instance element-hq element-web-923ad4323b2006b2b180544429455ffe7d4a6cc3-vnan": false, "instance qutebrowser qutebrowser-0833b5f6f140d04200ec91605f88704dd18e2970-v059c6fdc75567943479b23ebca7c07b5e9a7f34c": true, "instance flipt-io flipt-2ce8a0331e8a8f63f2c1b555db8277ffe5aa2e63": true, "instance tutao tutanota-b4934a0f3c34d9d7649e944b183137e8fad3e859-vbc0d9ba8f0071fbe982809910959a6ff8884dbbf": false, "instance NodeBB NodeBB-0e07f3c9bace416cbab078a30eae972868c0a8a3-vf2cf3cbd463b7ad942381f1c6d077626485a1e9e": true, "instance navidrome navidrome-677d9947f302c9f7bba8c08c788c3dc99f235f39": true, "instance interneta… Evidence: `eval/swe-bench-pro/bench/eval_results_off.json`
- **Eval Results On** (structured_config): {"instance future-architect vuls-36456cb151894964ba1683ce7da5c35ada789970": true, "instance element-hq element-web-923ad4323b2006b2b180544429455ffe7d4a6cc3-vnan": false, "instance qutebrowser qutebrowser-0833b5f6f140d04200ec91605f88704dd18e2970-v059c6fdc75567943479b23ebca7c07b5e9a7f34c": true, "instance flipt-io flipt-2ce8a0331e8a8f63f2c1b555db8277ffe5aa2e63": true, "instance navidrome navidrome-677d9947f302c9f7bba8c08c788c3dc99f235f39": false, "instance NodeBB NodeBB-0e07f3c9bace416cbab078a30eae972868c0a8a3-vf2cf3cbd463b7ad942381f1c6d077626485a1e9e": true, "instance tutao tutanota-b4934a0f3c34d9d7649e944b183137e8fad3e859-vbc0d9ba8f0071fbe982809910959a6ff8884dbbf": false, "instance internet… Evidence: `eval/swe-bench-pro/bench/eval_results_on.json`
- **Instances** (structured_config): "instance future-architect vuls-36456cb151894964ba1683ce7da5c35ada789970", "instance flipt-io flipt-2ce8a0331e8a8f63f2c1b555db8277ffe5aa2e63", "instance element-hq element-web-923ad4323b2006b2b180544429455ffe7d4a6cc3-vnan", "instance protonmail webclients-32ff10999a06455cb2147f6873d627456924ae13", "instance qutebrowser qutebrowser-0833b5f6f140d04200ec91605f88704dd18e2970-v059c6fdc75567943479b23ebca7c07b5e9a7f34c", "instance tutao tutanota-b4934a0f3c34d9d7649e944b183137e8fad3e859-vbc0d9ba8f0071fbe982809910959a6ff8884dbbf", "instance navidrome navidrome-677d9947f302c9f7bba8c08c788c3dc99f235f39", "instance NodeBB NodeBB-0e07f3c9bace416cbab078a30eae972868c0a8a3-vf2cf3cbd463b7ad942381f1c6d077626… Evidence: `eval/swe-bench-pro/bench/instances.json`
- **Preds Off** (structured_config): {"instance id": "instance future-architect vuls-36456cb151894964ba1683ce7da5c35ada789970", "patch": "diff --git a/wordpress/wordpress.go b/wordpress/wordpress.go\nindex 2d44b9f..25e9ddf 100644\n--- a/wordpress/wordpress.go\n+++ b/wordpress/wordpress.go\n@@ -235,6 +235,14 @@ func extractToVulnInfos pkgName string, cves WpCveInfo vinfos models.VulnI\n \treturn\n }\n \n+func searchCache name string, wpVulnCaches map string string string, bool {\n+\tvalue, ok := wpVulnCaches name \n+\tif ok {\n+\t\treturn value, true\n+\t}\n+\treturn \"\", false\n+}\n+\n func httpRequest url, token string string, error {\n \tretry := 1\n \tutil.Log.Debugf \"%s\", url \n", "prefix": ""}, {"instance id": "instanc… Evidence: `eval/swe-bench-pro/bench/preds_off.json`
- **Preds On** (structured_config): {"instance id": "instance future-architect vuls-36456cb151894964ba1683ce7da5c35ada789970", "patch": "diff --git a/wordpress/wordpress.go b/wordpress/wordpress.go\nindex 2d44b9f..41d71d9 100644\n--- a/wordpress/wordpress.go\n+++ b/wordpress/wordpress.go\n@@ -268,6 +268,16 @@ loop:\n \treturn \"\", err\n }\n \n+// searchCache looks for the given name in the cache and returns\n+// the cached response body and whether it was found.\n+func searchCache name string, wpVulnCaches map string string string, bool {\n+\tvalue, ok := wpVulnCaches name \n+\tif ok {\n+\t\treturn value, true\n+\t}\n+\treturn \"\", false\n+}\n+\n func removeInactives pkgs models.WordPressPackages removed models.WordPressPac… Evidence: `eval/swe-bench-pro/bench/preds_on.json`
- **Eval Results Batch1 Off** (structured_config): {"instance qutebrowser qutebrowser-c09e1439f145c66ee3af574386e277dd2388d094-v2ef375ac784985212b1805e1d0431dc8f1b3c171": true, "instance NodeBB NodeBB-cfc237c2b79d8c731bbfc6cadf977ed530bfd57a-v0495b863a912fbff5749c67e860612b91825407c": true, "instance flipt-io flipt-967855b429f749c28c112b8cb1b15bc79157f973": true, "instance internetarchive openlibrary-a48fd6ba9482c527602bc081491d9e8ae6e8226c-vfa6ff903cb27f336e17654595dd900fa943dcd91": true, "instance navidrome navidrome-0488fb92cb02a82924fb1181bf1642f2e87096db": true} Evidence: `eval/swe-bench-pro/bench20/eval_results_batch1_off.json`
- **Eval Results Batch1 On** (structured_config): {"instance qutebrowser qutebrowser-c09e1439f145c66ee3af574386e277dd2388d094-v2ef375ac784985212b1805e1d0431dc8f1b3c171": true, "instance NodeBB NodeBB-cfc237c2b79d8c731bbfc6cadf977ed530bfd57a-v0495b863a912fbff5749c67e860612b91825407c": true, "instance flipt-io flipt-967855b429f749c28c112b8cb1b15bc79157f973": true, "instance internetarchive openlibrary-a48fd6ba9482c527602bc081491d9e8ae6e8226c-vfa6ff903cb27f336e17654595dd900fa943dcd91": true, "instance navidrome navidrome-0488fb92cb02a82924fb1181bf1642f2e87096db": true} Evidence: `eval/swe-bench-pro/bench20/eval_results_batch1_on.json`
- **Eval Results Batch2 Off** (structured_config): {"instance gravitational teleport-1a77b7945a022ab86858029d30ac7ad0d5239d00-vee9b09fb20c43af7e520f57e9239bbcf46b7113d": true, "instance element-hq element-web-7c63d52500e145d6fff6de41dd717f61ab88d02f-vnan": false} Evidence: `eval/swe-bench-pro/bench20/eval_results_batch2_off.json`
- **Eval Results Batch2 On** (structured_config): {"instance gravitational teleport-1a77b7945a022ab86858029d30ac7ad0d5239d00-vee9b09fb20c43af7e520f57e9239bbcf46b7113d": true, "instance element-hq element-web-7c63d52500e145d6fff6de41dd717f61ab88d02f-vnan": false} Evidence: `eval/swe-bench-pro/bench20/eval_results_batch2_on.json`
- **Eval Results Batch3 Off** (structured_config): {"instance ansible ansible-f327e65d11bb905ed9f15996024f857a95592629-vba6da65a0f3baefda7a058ebbd0a8dcafb8512f5": true, "instance tutao tutanota-befce4b146002b9abc86aa95f4d57581771815ce-vee878bb72091875e912c52fc32bc60ec3760227b": false} Evidence: `eval/swe-bench-pro/bench20/eval_results_batch3_off.json`
- **Eval Results Batch3 On** (structured_config): {"instance ansible ansible-f327e65d11bb905ed9f15996024f857a95592629-vba6da65a0f3baefda7a058ebbd0a8dcafb8512f5": true, "instance tutao tutanota-befce4b146002b9abc86aa95f4d57581771815ce-vee878bb72091875e912c52fc32bc60ec3760227b": false} Evidence: `eval/swe-bench-pro/bench20/eval_results_batch3_on.json`
- The remaining 19 evidence entries are listed in `AI_CONTEXT_PACK.json` or `EVIDENCE_INDEX.json`.
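
The `eval_results_*_off.json` and `eval_results_*_on.json` entries above appear to be flat maps from an instance ID to a boolean. A minimal sketch, assuming that shape, of how two such maps could be compared side by side. The `compare_runs` helper and the `inst-*` IDs are hypothetical and do not come from the repository; no real result data is reproduced here.

```python
def compare_runs(off: dict[str, bool], on: dict[str, bool]) -> dict:
    """Compare two {instance_id: resolved} maps over the instances they share."""
    shared = sorted(set(off) & set(on))
    return {
        "shared": len(shared),
        "resolved_off": sum(off[k] for k in shared),
        "resolved_on": sum(on[k] for k in shared),
        # Instances that passed in the "off" run but failed in the "on" run.
        "regressed": [k for k in shared if off[k] and not on[k]],
        # Instances that failed in the "off" run but passed in the "on" run.
        "improved": [k for k in shared if on[k] and not off[k]],
    }


# Hypothetical stand-in data, not taken from the evidence files.
off = {"inst-a": True, "inst-b": False, "inst-c": True}
on = {"inst-a": True, "inst-b": False, "inst-c": False}

summary = compare_runs(off, on)
print(summary)
```

Restricting the comparison to shared instances avoids counting an instance that is missing from one file as a failure. Any conclusion about the real runs still has to come from loading the complete files, since the excerpts above are truncated.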

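## Rules the host AI must follow

- **Treat this asset as pre-work context, not as a runtime environment.** The AI Context Pack contains only an evidence-backed understanding of the project; it does not contain the target project's executable state. Evidence: `README.md`, `demo/README.md`, `eval/README.md`
- **When answering the user, separate what can be previewed from what can only be verified after installation.** The value of a pre-install experience comes from reducing mistaken installs and misjudgments, not from posing as a real run. Evidence: `README.md`, `demo/README.md`, `eval/README.md`

## Questions the user should answer before starting

- Which host AI or local environment do you plan to use it in?
- Do you only want to try the workflow first, or are you ready for a real installation?
- What matters most to you: installation cost, output quality, or conflicts with your existing rules?

## Acceptance criteria

- Every capability claim can be traced back to a file path in evidence_refs.
- AI_CONTEXT_PACK.md does not present a preview as a real run.
- The user can understand within 3 minutes who it is for, what it can do, how to start, and where the risk boundaries are.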
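
The first criterion can be checked mechanically. A minimal sketch, assuming each claim is a record with a `claim_id` and an `evidence_refs` list of file paths; this record layout is an assumption for illustration and is not the confirmed schema of `AI_CONTEXT_PACK.json`. The `clm_9999` entry is a hypothetical failing case.

```python
def claims_missing_evidence(claims: list[dict]) -> list[str]:
    """Return the IDs of claims that have no evidence path attached."""
    return [c["claim_id"] for c in claims if not c.get("evidence_refs")]


# Assumed record layout; clm_9999 is a made-up claim with no evidence.
claims = [
    {"claim_id": "clm_0001", "evidence_refs": ["README.md"]},
    {"claim_id": "clm_9999", "evidence_refs": []},
]

missing = claims_missing_evidence(claims)
print(missing)
```

An empty result would mean the first criterion holds for the claims that were checked; it says nothing about whether the cited files actually support each claim.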

---

## Doramagic Context Augmentation

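The content below reinforces the main body of the Repomix/AI Context Pack. The Human Manual provides only a reading skeleton; the pitfall log is converted into working constraints that the host AI must follow.

## Human Manual skeleton

Usage rule: this is only a reading route through the project and a set of salience signals, not a factual authority. Specific facts must still go back to repo evidence or the Claim Graph.

Hard rules for the host AI:
- Do not treat page titles, section order, summaries, or importance as evidence of project facts.
- When explaining the Human Manual skeleton, state explicitly that it is only a reading route and salience signal.
- Judgments about capability, installation, compatibility, runtime status, and risk must cite repo evidence, a source path, or the Claim Graph.

- **Project overview**: importance `high`
  - source_paths: README.md, demo/README.md, demo/cost-ab/README.md, demo/cost-ab/template/README.md, demo/cost-ab/template/package.json
- **Lib module**: importance `high`
  - source_paths: eval/lib/anthropic-client.mjs, eval/lib/cci.py, eval/lib/cost.mjs, eval/lib/diff.mjs, eval/lib/render-bridge.mjs
- **Src module**: importance `high`
  - source_paths: demo/cost-ab/template/src/catalog.js, demo/cost-ab/template/src/money.js, demo/cost-ab/template/src/pricing.js
- **Applicability.ts module**: importance `high`
  - source_paths: src/core/applicability.ts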

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `9fa16eef541e9555fdf47bd7a6fe00323aa692d6`
- inspected_files: `README.md`, `package.json`, `pnpm-lock.yaml`, `docs/ADAPTIVE_CPT_PLAN.md`, `docs/CACHING_AND_SAVINGS.md`, `docs/HISTORY_CACHE_MODEL.md`, `docs/LEGIBILITY-AUDIT-2026-07-01.md`, `docs/RENDER_SIZING.md`, `docs/TRANSFORM_INFO.md`, `src/core/applicability.ts`, `src/core/atlas-gray.ts`, `src/core/atlas.ts`, `src/core/baseline.ts`, `src/core/export.ts`, `src/core/factsheet.ts`, `src/core/gpt-model-profiles.ts`, `src/core/history.ts`, `src/core/index.ts`, `src/core/library.ts`, `src/core/measurement.ts`

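Hard rules for the host AI:
- Without repo_clone_verified=true, do not claim to have read the source code.
- Without repo_inspection_verified=true, do not state judgments drawn from README/docs/package files as facts.
- Without quick_start_verified=true, do not claim that the Quick Start has run successfully.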
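
These three rules amount to a gate from verification flags to permitted statements. A minimal sketch of that gate, assuming the flags are available as booleans; the `InspectionFlags` class, the `allowed_claims` helper, and the claim labels are hypothetical names chosen for illustration. A flag the pack does not record is treated as false.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InspectionFlags:
    # Any flag that is not recorded defaults to False (not verified).
    repo_clone_verified: bool = False
    repo_inspection_verified: bool = False
    quick_start_verified: bool = False


def allowed_claims(flags: InspectionFlags) -> dict[str, bool]:
    """Map each kind of statement to whether the host AI may make it."""
    return {
        "has_read_source": flags.repo_clone_verified,
        "docs_judgments_as_fact": flags.repo_inspection_verified,
        "quick_start_ran": flags.quick_start_verified,
    }


# This section records the first two flags as true and does not
# record quick_start_verified, so it stays at its default.
flags = InspectionFlags(repo_clone_verified=True, repo_inspection_verified=True)
permitted = allowed_claims(flags)
print(permitted)
```

Defaulting to false keeps the gate conservative: an unrecorded flag blocks the corresponding statement rather than allowing it.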

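## Doramagic Pitfall Constraints

These rules come from project-specific pitfalls found while Doramagic discovered, validated, or compiled this project. The host AI must treat them as working constraints, not as ordinary explanatory text.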

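### Constraint 1: May modify host AI configuration

- Trigger: The project targets hosts such as Claude/Cursor/Codex/Gemini/OpenCode, or the install command touches the user's configuration directory.
- Host AI rule: List the configuration files and directories that will be written, together with the uninstall/rollback steps.
- Why it matters: Installation may change how local AI tools behave; the user needs to know where files are written and how to roll back.
- Evidence: capability.host_targets | https://news.ycombinator.com/item?id=48776464 | host_targets=claude_code, claude
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.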

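### Constraint 2: Capability judgments depend on assumptions

- Trigger: README/documentation is current enough for a first validation pass.
- Host AI rule: Turn the assumptions into a downstream validation checklist.
- Why it matters: If an assumption does not hold, the user will not get the promised capability.
- Evidence: capability.assumptions | https://news.ycombinator.com/item?id=48776464 | README/documentation is current enough for a first validation pass.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.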

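### Constraint 3: Maintenance activity unknown

- Trigger: last_activity_observed was not recorded.
- Host AI rule: Add signals from GitHub: the most recent commits, releases, and issue/PR responsiveness.
- Why it matters: New, abandoned, and active projects get lumped together, which lowers confidence in any recommendation.
- Evidence: evidence.maintainer_signals | https://news.ycombinator.com/item?id=48776464 | last_activity_observed missing
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.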

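### Constraint 4: Downstream validation risk item

- Trigger: no_demo
- Evidence: downstream_validation.risk_items | https://news.ycombinator.com/item?id=48776464 | no_demo; severity=medium
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.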

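### Constraint 5: Scoring risk present

- Trigger: no_demo
- Why it matters: The risk affects whether the project is suitable for ordinary users to install.
- Evidence: risks.scoring_risks | https://news.ycombinator.com/item?id=48776464 | no_demo; severity=medium
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.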

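### Constraint 6: Issue/PR response quality unknown

- Trigger: issue_or_pr_quality=unknown.
- Host AI rule: Sample recent issues/PRs and judge whether they go unhandled for long periods.
- Why it matters: Users cannot tell whether anyone maintains the project when they run into a problem.
- Evidence: evidence.maintainer_signals | https://news.ycombinator.com/item?id=48776464 | issue_or_pr_quality=unknown
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.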

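### Constraint 7: Release cadence unclear

- Trigger: release_recency=unknown.
- Host AI rule: Confirm that the most recent release/tag is consistent with the install command in the README.
- Why it matters: Install commands and documentation may lag behind the code, which raises the chance that users hit problems.
- Evidence: evidence.maintainer_signals | https://news.ycombinator.com/item?id=48776464 | release_recency=unknown
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.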
