# pxpipe - Doramagic AI Context Pack

> Positioning: a pre-install experience and judgment asset. It helps the host AI get off to a good start, but it does not mean the project has already been installed, run, or validated.

## Sufficiency Principle

- **Sufficiency over compression**: The AI Context Pack should be sufficient for the host AI to understand the project's value, capability boundaries, entrypoints, risks, and evidence sources before starting work; it may be layered, but it does not aim for the shortest possible summary.
- **Compression policy**: Compress only noise and duplication, never context that affects judgment or the quality of the work.

## How the Host AI Should Use This

You are reading the AI Context Pack that Doramagic compiled for pxpipe. Treat it as pre-work context: help the user understand who it fits, what it can do, how to start, what must be verified after install, and where the risks are. Do not claim that you have already installed, run, or executed the target project.

## Claim Consumption Rules

- **Fact source**: Repo Evidence + Claim/Evidence Graph; the Human Wiki only supplies salience, terminology, and narrative structure.
- **Minimum status for a fact**: `supported`
- `supported`: May be used as a project fact, but the answer must cite the claim_id and evidence path.
- `weak`: Usable only as a low-confidence lead; the user must be asked to keep verifying.
- `inferred`: Usable only for risk notes or open questions; must not be packaged as a project fact.
- `unverified`: Must not be used as fact; state clearly that evidence is insufficient.
- `contradicted`: Must show the conflicting sources and must not force a single version on the user's behalf.
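
The rules above amount to a lookup from claim status to permitted use. A minimal sketch of that lookup follows; the status names come from the list, while the `Claim` shape, the `Usage` labels, and both function names are hypothetical and not part of any Doramagic or pxpipe API.

```typescript
// Status names are taken from the rules above; every other name here is illustrative.
type ClaimStatus = "supported" | "weak" | "inferred" | "unverified" | "contradicted";

type Usage =
  | "fact"
  | "low_confidence_lead"
  | "risk_or_open_question"
  | "insufficient_evidence"
  | "show_conflict";

interface Claim {
  id: string; // e.g. "clm_0002"
  status: ClaimStatus;
  evidencePath?: string; // e.g. "README.md"
}

// Map each status to the only way the host AI may use a claim with that status.
function usageFor(status: ClaimStatus): Usage {
  switch (status) {
    case "supported":
      return "fact";
    case "weak":
      return "low_confidence_lead";
    case "inferred":
      return "risk_or_open_question";
    case "unverified":
      return "insufficient_evidence";
    case "contradicted":
      return "show_conflict";
  }
}

// A claim is citable as a fact only when it is supported AND carries
// both a claim_id and an evidence path; otherwise return null.
function citeAsFact(claim: Claim): string | null {
  if (usageFor(claim.status) !== "fact" || !claim.id || !claim.evidencePath) {
    return null;
  }
  return `${claim.id} (${claim.evidencePath})`;
}
```

Returning `null` instead of a weaker citation keeps `weak` and `inferred` claims from being quietly promoted to facts.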

## Who It Fits Best

- **Developers already using host AIs such as Claude/Codex/Cursor/Gemini**: The README or plugin config mentions multiple host AIs. Evidence: `README.md` Claim: `clm_0002` supported 0.86

## What It Can Do

- **Command-Line Startup or Install Flow** (Verify after install): The project documentation contains runnable commands; real use requires running them in a local or host environment. Evidence: `README.md` Claim: `clm_0001` supported 0.86

## How to Start

- `npx pxpipe-proxy  # proxy on 127.0.0.1:47821` Evidence: `README.md` Claim: `clm_0003` supported 0.86

## Continue-or-Stop Decision Card

- **Current recommendation**: Trial role matching first
- **Why**: This project is more of a role library; the core risk is picking the wrong role or treating role copy as execution capability. Trial role matching with Prompt Preview first, then decide whether to sandbox-import it.

### 30-Second Read

- **What to do now**: Trial role matching first
- **Minimum safe next step**: Trial role matching with Prompt Preview first; import in isolation only once satisfied
- **Do not trust yet**: Role quality and task fit cannot be trusted directly.
- **Continuing will touch**: Role selection bias, Command execution, Local environment or project files

### What You Can Trust Now

- **Target-audience signal: Developers already using host AIs such as Claude/Codex/Cursor/Gemini** (supported): Backed by a supported claim or project evidence, but that still is not the same as real install results. Evidence: `README.md` Claim: `clm_0002` supported 0.86
- **Capability exists: Command-Line Startup or Install Flow** (supported): You can trust that the project contains signals of this capability; whether it fits your specific task still needs trial or after-install verification. Evidence: `README.md` Claim: `clm_0001` supported 0.86
- **There are Quick Start / install-command signals** (supported): You can trust that the docs mention a startup or install entrypoint; do not run it directly in your primary environment because of that. Evidence: `README.md` Claim: `clm_0003` supported 0.86
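
Each bullet above ends with a trailer of the form ``Evidence: `path` Claim: `id` status score``. The sketch below shows one way a host tool might pull those fields out of a line. `parseCitation` and the `Citation` shape are hypothetical, and reading the trailing number as a confidence score is an assumption, since the pack does not name it.

```typescript
// Hypothetical shape for the trailer fields; "confidence" is an assumed name for the trailing number.
interface Citation {
  evidencePath?: string;
  claimId?: string;
  status?: string;
  confidence?: number;
}

// Extract whichever trailer parts are present; some lines carry only Evidence or only Claim.
function parseCitation(line: string): Citation {
  const evidence = /Evidence: `([^`]+)`/.exec(line);
  const claim = /Claim: `([^`]+)` (\w+) (\d+(?:\.\d+)?)/.exec(line);
  const out: Citation = {};
  if (evidence) {
    out.evidencePath = evidence[1];
  }
  if (claim) {
    out.claimId = claim[1];
    out.status = claim[2];
    out.confidence = Number(claim[3]);
  }
  return out;
}

const example = parseCitation(
  "- Quick Start signal. Evidence: `README.md` Claim: `clm_0003` supported 0.86"
);
console.log(example.claimId, example.status, example.evidencePath);
```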

### What You Cannot Trust Yet

- **Role quality and task fit cannot be trusted directly.** (unverified): A role library proves there are many roles; it does not prove each one fits your specific task or that a role produces high-quality results.
- **Do not treat role copy as real execution capability.** (unverified): Before install you can only judge whether the role description and task profile match; you cannot prove it can complete the task inside the host AI.
- **Real output quality cannot be trusted before install.** (unverified): Prompt Preview can only show how it guides you; it cannot prove result quality in the real project.
- **Host AI version compatibility cannot be trusted before install.** (unverified): Host loading rules and version differences across Claude, Cursor, Codex, Gemini, and others must be verified in a real environment.
- **Do not assume your existing host AI's behavior stays unpolluted.** (inferred): Skill, plugin, and AGENTS/CLAUDE/GEMINI instructions may change the host AI's default behavior.
- **Safe rollback cannot be assumed by default.** (unverified): Unless the project clearly provides uninstall and recovery instructions, verify in an isolated environment first.
- **After a real install, is it compatible with the user's current host AI version?** (unverified): Compatibility can only be verified in the actual host environment.
- **Does the project's output quality meet the user's specific task?** (unverified): The pre-install preview can only show flow and boundaries; it cannot replace real evaluation.

### What Continuing Will Touch

- **Role selection bias**: The user's judgment about which expert role should handle the task. Why: Picking the wrong role makes the AI answer from the wrong expert perspective, wasting time or misleading decisions.
- **Command execution**: Package managers, network downloads, the local plugin directory, project config, or the user's home directory. Why: Even the first command can change your environment, so decide whether it is worth running before you run it. Evidence: `README.md`
- **Local environment or project files**: Install results, plugin caches, project config, or local dependency directories. Why: The write scope and rollback path cannot be proven before install and need isolated verification. Evidence: `README.md`
- **Host AI context**: The AI Context Pack, Prompt Preview, Skill routing, risk rules, and project facts. Why: Importing context affects the host AI's later judgment, so avoid packaging unverified items as facts.

### Minimum Safe Next Steps

- **Run Prompt Preview first**: Use an interactive trial to verify the task profile and role match first; do not import the whole role library up front. (applies when: any project, especially when output quality is unknown.)
- **Trial-install only in an isolated directory or a test account**: Avoid letting install commands pollute your primary host AI, real projects, or home directory. (applies when: there are signals of command execution, plugin config, or local writes.)
- **After install, verify just one minimal task**: Verify loading, compatibility, output quality, and rollback first, then decide whether to use it deeply. (applies when: moving from a trial into a real workflow.)

### Exit Plan

- **Preserve the pre-install state**: Record the original host config and project state so you can later judge whether it is recoverable.
- **Keep a record of the original role selection**: If output goes off-topic, you can return to the task-profiling stage and reselect a role instead of pushing on with the wrong one.
- **Record the install commands and written paths**: Without clear uninstall instructions, you at least need to know which directories or configs to clean up manually.
- **If there is no rollback path, do not enter your primary environment**: No rollback is a blocker before continuing; do not proceed on trust or luck.

## What Can Only Be Previewed

- Explain who the project fits and what it can do
- Demonstrate a typical conversation flow based on project docs
- Help the user decide whether it is worth installing or researching further

## What Must Be Verified After Install

- Actually installing the Skill, plugin, or CLI
- Running scripts, modifying local files, or accessing external services
- Verifying real output quality, performance, and compatibility

## Boundary & Risk Decision Card

- **Mistaking the pre-install preview for a real run**: The user may overestimate how much configuration, permission, and compatibility verification the project has already done. Mitigation: Clearly separate `prompt_preview_can_do` from `runtime_required`. Claim: `clm_0004` inferred 0.45
- **Command execution will modify the local environment**: Install commands may write to the user's home directory, the host plugin directory, or project configuration. Mitigation: Run in an isolated environment or a test account first. Evidence: `README.md` Claim: `clm_0005` supported 0.86
- **To confirm**: After a real install, is it compatible with the user's current host AI version? Why: Compatibility can only be verified in the actual host environment.
- **To confirm**: Does the project's output quality meet the user's specific task? Why: The pre-install preview can only show flow and boundaries; it cannot replace real evaluation.
- **To confirm**: Do the install commands require network access, permissions, or global writes? Why: This affects install risk in both enterprise and personal environments.

## Pre-Work Working Context

### Loading Order

- First read `how_to_use.host_ai_instruction` to establish the boundaries of this pre-install judgment asset.
- Read `claim_graph_summary` to confirm facts come from the Claim/Evidence Graph, not the Human Wiki narrative.
- Then read `intended_users`, `capabilities`, and `quick_start_candidates` to judge whether the user is a match.
- When you need to carry out a concrete task, check `role_skill_index` first, then `evidence_index`.
- For real install, file modification, network access, performance, or compatibility questions, turn to `risk_card` and `boundaries.runtime_required`.

### Task Routes

- **Command-Line Startup or Install Flow**: State that this is an after-install capability first, then give a pre-install checklist. Boundary: Must be verified after a real install or run. Evidence: `README.md` Claim: `clm_0001` supported 0.86

### Context Scale

- Total files: 298
- Important-file coverage: 40/298
- Evidence index entries: 79
- Role / Skill entries: 29
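
To put the coverage figure in perspective, the ratio can be worked out directly. This sketch assumes that "40/298" means 40 of the 298 files are covered as important files; the variable names are illustrative.

```typescript
// Figures copied from the Context Scale list above.
const totalFiles = 298;
const importantFilesCovered = 40;

// Share of the repository's files that the pack treats as covered important files.
const coveragePercent = (importantFilesCovered / totalFiles) * 100;

console.log(`${coveragePercent.toFixed(1)}% of files covered`); // prints "13.4% of files covered"
```

Roughly seven in eight files sit outside that coverage, which is one reason questions about unlisted files should fall back to the insufficient-evidence handling rather than to guesses.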

### Handling Insufficient Evidence

- **missing_evidence**: State that evidence is insufficient and ask the user for the target file, a README section, or after-install verification records; do not fill in facts.
- **out_of_scope_request**: State that the task is beyond the current AI Context Pack's evidence scope and suggest the user check the Human Manual or verify after a real install.
- **runtime_request**: Provide a pre-install checklist and command sources, but do not run commands for the user or claim they have been run.
- **source_conflict**: Show the conflicting sources side by side, mark them as unverified, and do not force a single version.
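
The four cases above can be read as a fixed routing table. A minimal sketch, assuming the host classifies the request into one of the four keys before answering; `GapKind`, `gapResponses`, and `respondToGap` are hypothetical names, and the response strings paraphrase the list.

```typescript
// The four keys mirror the list above; the response wording is a paraphrase, not a required script.
type GapKind = "missing_evidence" | "out_of_scope_request" | "runtime_request" | "source_conflict";

const gapResponses: Record<GapKind, string> = {
  missing_evidence:
    "Evidence is insufficient. Please share the target file, a README section, or after-install verification records.",
  out_of_scope_request:
    "This is beyond the AI Context Pack's evidence scope. Check the Human Manual or verify after a real install.",
  runtime_request:
    "Here is a pre-install checklist and the command sources. No commands have been run on your behalf.",
  source_conflict:
    "The sources conflict. Both versions are shown side by side and marked unverified.",
};

// Typing the table as Record<GapKind, string> makes the compiler reject a missing case.
function respondToGap(kind: GapKind): string {
  return gapResponses[kind];
}

console.log(respondToGap("runtime_request"));
```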

## Prompt Recipes

### Fit assessment

- Goal: Judge whether this project fits the user's current task.
- Expected output: A fit conclusion, key reasons, evidence citations, what can be previewed before install, what must be verified after install, and a next-step recommendation.

```text
Based on the AI Context Pack for pxpipe, ask me 3 necessary questions first, then judge whether it fits my task. The answer must cover: who it fits, what it can do, what it cannot do, whether it is worth installing, and where the evidence comes from. Every project fact must cite evidence_refs, source_paths, or a claim_id.
```

### Pre-install experience

- Goal: Let the user feel the core workflow before installing, while avoiding packaging the preview as real capability or a marketing promise.
- Expected output: An experience script with boundary labels, an after-install verification checklist, and a cautious recommendation, with no real-run promises or strong marketing language.

```text
Treat pxpipe as a pre-install experience asset, not an already-installed tool or a real runtime environment.

Output exactly four parts:
1. Ask me 3 necessary questions first.
2. Give an "experience script": use the three labels [Previewable before install], [Must verify after install], and [Insufficient evidence] to show how it might guide the workflow.
3. Give an after-install verification checklist: list which capabilities can only be confirmed after a real install, real host loading, and a real project run.
4. Give a cautious recommendation: only "worth researching/trialing further", "add information before deciding", or "not recommended to continue"; do not endorse the project.

Hard boundaries:
- Do not claim you have installed, run, executed tests, modified files, or produced real results.
- Do not write promise-like phrasing such as "auto-adapts", "guarantees passing", "perfect fit", or "strongly recommend installing".
- If you describe how it works after install, you must use a conditional such as "if installed successfully and the host loads the Skill correctly, it might...".
- The experience script may only be written as "example lines / hypothetical flow": use "might ask / might suggest / might show", not "has written, has generated, has passed, is running, is generating".
- Prompt Preview does not hand out install commands; if the user is ready to trial, only prompt them to read Quick Start and the Risk Card first and to verify in an isolated environment.
- Every project fact must come from a supported claim, evidence_refs, or source_paths; inferred/unverified items can only be risks or open questions.
```
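
The hard boundaries in the prompt above name specific phrasings to avoid, so a draft answer can be screened mechanically before it is shown. The sketch below is a plain substring check over those quoted phrases; `findBoundaryViolations` is a hypothetical helper, and a clean result does not prove the draft respects the other boundaries.

```typescript
// Phrases quoted in the hard boundaries above.
const promisePhrases = [
  "auto-adapts",
  "guarantees passing",
  "perfect fit",
  "strongly recommend installing",
];
const completedActionPhrases = [
  "has written",
  "has generated",
  "has passed",
  "is running",
  "is generating",
];

// Return every listed phrase that appears in the draft (case-insensitive substring match).
function findBoundaryViolations(draft: string): string[] {
  const lower = draft.toLowerCase();
  return [...promisePhrases, ...completedActionPhrases].filter((phrase) =>
    lower.includes(phrase)
  );
}

console.log(findBoundaryViolations("It might ask which host AI you use."));
console.log(findBoundaryViolations("This is a perfect fit and has passed all tests."));
```

The first call returns an empty array and the second returns `["perfect fit", "has passed"]`. A substring match is deliberately crude: it flags candidates for review and leaves the judgment to the reader.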

### Role / Skill selection

- Goal: Pick the best-matching asset from the project's roles or Skills.
- Expected output: A list of candidate roles or Skills, each with an applicable scenario, evidence paths, risk boundary, and whether after-install verification is needed.

```text
Read role_skill_index and recommend 3-5 of the most relevant roles or Skills for my target task. For each recommendation, state the applicable scenario, likely output, risk boundary, and evidence_refs.
```

### Risk pre-check

- Goal: Identify environment, permission, rule-conflict, and quality risks before installing or adopting.
- Expected output: A checklist of environment, permission, dependency, license, host-conflict, quality risk, and unknown items.

```text
Based on risk_card, boundaries, and quick_start_candidates, give me a pre-install risk pre-check list. Do not run commands for me; only explain what I should check, why, and what impact a failure would have.
```

### Host AI kickoff instruction

- Goal: Turn the project context into a host AI instruction for the start of a conversation.
- Expected output: A pre-work instruction with clear boundaries and clear evidence citations, suitable to copy to a host AI.

```text
Based on the AI Context Pack for pxpipe, generate a pre-work instruction I can paste to my host AI. This instruction must obey not_runtime=true and must not claim the project has been installed, run, or produced real results.
```

## Role / Skill Index

- Indexed 29 role / Skill / project-doc entries.

- **pxpipe** (project_doc): Cut Claude Code's input tokens by rendering bulky context as images — the same system prompt, tool docs, and history, in a fraction of the tokens. Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `README.md`
- **pxpipe demos** (project_doc): Two demos, two questions, two honest verdicts. Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `demo/README.md`
- **Reflow Eval Harness** (project_doc): Evaluation harness for the reflow image-rendering mode in pxpipe. Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `eval/README.md`
- **Demo 1 — cost A/B** (project_doc): What it measures: does pxpipe cost less on a real coding task? Honest verdict: ~break-even on cost. The compression is real (~55% fewer real tokens, verified), but it lands in cache read, which is cheap in dollars (0.1×), and its weight against a Pro/Max weekly cap is unpublished. The capability story is in `../effective-context/README.md`. Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `demo/cost-ab/README.md`
- **pricing-engine** (project_doc): A small order-pricing library. Computes an order total from line items, a volume discount, a loyalty-tier discount, and tax. Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `demo/cost-ab/template/README.md`
- **Demo 2 — effective context recall at scale** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `demo/effective-context/README.md`
- **Gist-recall A/B: does the model lose information when history is imaged?** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `eval/gist-recall/README.md`
- **Per-glyph resolution sweep — why Opus misreads pxpipe renders** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `eval/glyph-matrix/sweep/README.md`
- **reading-fidelity eval — does the model actually read pxpipe's image?** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `eval/gsm8k/README.md`
- **needle-haystack eval** (project_doc): Receipts for the needle eval. It measures the worst case for a lossy compressor (exact recovery of a random fact from imaged content), not the whole product. Its "dead" conclusion was later reversed on live measurement; see the correction in `/FINDINGS.md` (`../../FINDINGS.md`). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `eval/needle-haystack/README.md`
- **SWE-bench Pro - pxpipe ON vs OFF** (project_doc): Expansion to 19 pairs + navidrome replication (2026-06-11). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `eval/swe-bench-pro/README.md`
- **SWE-bench Lite pilot — pxpipe ON vs OFF** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `eval/swe-bench/README.md`
- **Adaptive chars-per-token plan Task 18** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `docs/ADAPTIVE_CPT_PLAN.md`
- **Prompt-Caching Alignment And Honest Savings Math** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `docs/CACHING_AND_SAVINGS.md`
- **How imaged history stays cache-safe as a conversation grows** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `docs/HISTORY_CACHE_MODEL.md`
- **How pxpipe sizes a rendered image — rules, reasons, and history** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `docs/RENDER_SIZING.md`
- **How pxpipe compresses Claude Code requests** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `docs/TRANSFORM_INFO.md`
- **Changelog** (project_doc): All notable changes to pxpipe are documented here. This project adheres to Semantic Versioning (https://semver.org/); pre-1.0: minor = features / behavioral changes, patch = fixes. Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `CHANGELOG.md`
- **FINDINGS — pxpipe text→PNG token compression** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `FINDINGS.md`
- **Imaged-text legibility audit — 2026-07-01** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `docs/LEGIBILITY-AUDIT-2026-07-01.md`
- **Packed-reflow legibility experiments** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `eval/EXPERIMENT_LOG.md`
- **Pricing Engine — Specification** (project_doc): `orderTotalCents(items, tier)` returns the final order total as an integer number of cents. Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `demo/cost-ab/template/SPEC.md`
- **Effective-context needle test — attempt log** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `demo/effective-context/ATTEMPTS.md`
- **Glyph confusion matrix + render-style A/B Task 7 — PLANNED, paused for usage budget** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `eval/glyph-matrix/PLAN.md`
- **L1 OCR Fidelity Report** (project_doc): Generated: 2026-05-22T03:36:29.045Z; Model: opus; Dry run: false; Blocks evaluated: 20. Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `eval/results-opus/l1-report.md`
- **L2 Session Replay Report** (project_doc): Generated: 2026-05-22T03:48:55.794Z; Replay model: opus; Judge model: opus; Dry run: false; Sessions evaluated: 10. Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `eval/results-opus/l2-report.md`
- **L1 OCR Fidelity Report** (project_doc): Generated: 2026-05-23T01:54:01.508Z; Model: opus; Dry run: false; Blocks evaluated: 20. Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `eval/results/l1-report.md`
- **L2 Session Replay Report** (project_doc): Generated: 2026-05-22T18:09:09.056Z; Replay model: opus; Judge model: opus; Dry run: false; Sessions evaluated: 10. Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `eval/results/l2-report.md`
- **Reflow Eval — Combined Summary Report** (project_doc). Activation hint: Reference this when the user needs to understand the project's structure, install path, or boundaries. Evidence: `eval/results/summary.md`

## Evidence Index

- Indexed 79 evidence entries.

- **pxpipe** (documentation): Cut Claude Code's input tokens by rendering bulky context as images — the same system prompt, tool docs, and history, in a fraction of the tokens. Evidence: `README.md`
- **pxpipe demos** (documentation): Two demos, two questions, two honest verdicts. Evidence: `demo/README.md`
- **Reflow Eval Harness** (documentation): Evaluation harness for the reflow image-rendering mode in pxpipe. Evidence: `eval/README.md`
- **Demo 1 — cost A/B** (documentation): What it measures: does pxpipe cost less on a real coding task? Honest verdict: ~break-even on cost. The compression is real (~55% fewer real tokens, verified), but it lands in cache read, which is cheap in dollars (0.1×), and its weight against a Pro/Max weekly cap is unpublished. The capability story is in `../effective-context/README.md`. Evidence: `demo/cost-ab/README.md`
- **pricing-engine** (documentation): A small order-pricing library. Computes an order total from line items, a volume discount, a loyalty-tier discount, and tax. Evidence: `demo/cost-ab/template/README.md`
- **Demo 2 — effective context recall at scale** (documentation). Evidence: `demo/effective-context/README.md`
- **Gist-recall A/B: does the model lose information when history is imaged?** (documentation). Evidence: `eval/gist-recall/README.md`
- **Per-glyph resolution sweep — why Opus misreads pxpipe renders** (documentation). Evidence: `eval/glyph-matrix/sweep/README.md`
- **reading-fidelity eval — does the model actually read pxpipe's image?** (documentation). Evidence: `eval/gsm8k/README.md`
- **needle-haystack eval** (documentation): Receipts for the needle eval. It measures the worst case for a lossy compressor (exact recovery of a random fact from imaged content), not the whole product. Its "dead" conclusion was later reversed on live measurement; see the correction in `/FINDINGS.md` (`../../FINDINGS.md`). Evidence: `eval/needle-haystack/README.md`
- **SWE-bench Pro - pxpipe ON vs OFF** (documentation): Expansion to 19 pairs + navidrome replication (2026-06-11). Evidence: `eval/swe-bench-pro/README.md`
- **SWE-bench Lite pilot — pxpipe ON vs OFF** (documentation). Evidence: `eval/swe-bench/README.md`
- **Package** (package_manifest): { "name": "pxpipe-proxy", "version": "0.7.2", "description": "Token-saving proxy for Claude Code: renders bulky context system prompt, tool docs, old history as dense PNGs to cut input tokens. Runs on Node and Cloudflare Workers.", "type": "module", "bin": { "pxpipe": "bin/cli.js" }, "exports": { ".": { "types": "./dist/core/index.d.ts", "import": "./dist/core/index.js" }, "./transform": { "types": "./dist/core/library.d.ts", "import": "./dist/core/library.js" }, "./measurement": { "types": "./dist/core/measurement.d.ts", "import": "./dist/core/measurement.js" }, "./applicability": { "types": "./dist/core/applicability.d.ts", "import": "./dist/core/applicability.js" }, "./proxy": { "types":… Evidence: `package.json`
- **Package** (package_manifest): { "name": "pricing-engine", "version": "0.1.0", "type": "module", "private": true, "description": "Small order-pricing library. Implement src/pricing.js per SPEC.md so the tests pass.", "scripts": { "test": "node --test" } } Evidence: `demo/cost-ab/template/package.json`
- **Adaptive chars-per-token plan Task 18** (documentation). Evidence: `docs/ADAPTIVE_CPT_PLAN.md`
- **Prompt-Caching Alignment And Honest Savings Math** (documentation). Evidence: `docs/CACHING_AND_SAVINGS.md`
- **How imaged history stays cache-safe as a conversation grows** (documentation). Evidence: `docs/HISTORY_CACHE_MODEL.md`
- **How pxpipe sizes a rendered image — rules, reasons, and history** (documentation). Evidence: `docs/RENDER_SIZING.md`
- **How pxpipe compresses Claude Code requests** (documentation). Evidence: `docs/TRANSFORM_INFO.md`
- **License** (source_file): Copyright (c) 2026 claude-image-proxy contributors Evidence: `LICENSE`
- **Changelog** (documentation): All notable changes to pxpipe are documented here. This project adheres to Semantic Versioning (https://semver.org/); pre-1.0: minor = features / behavioral changes, patch = fixes. Evidence: `CHANGELOG.md`
- **FINDINGS — pxpipe text→PNG token compression** (documentation). Evidence: `FINDINGS.md`
- **Imaged-text legibility audit — 2026-07-01** (documentation). Evidence: `docs/LEGIBILITY-AUDIT-2026-07-01.md`
- **Dashboard** (source_file): import type { ProxyEvent } from './core/proxy.js'; import type { TrackEvent } from './core/tracker.js'; import { computeActualInputEff, computeBaselineInputEff, deriveBaselineWarmth, } from './core/baseline.js'; import { computeOpenAIActualInputEff, computeOpenAIBaselineInputEff, computeOpenAIBaselineRawTokens, openAIOutputRate, } from './core/openai-savings.js'; import { aggregateSessions, claudeCodeMap, filterSessions, type ClaudeCodeSessionRef, type ListOptions, type SessionsPaths, } from './sessions.js'; import { aggregateEventsFile, summaryToJson } from './stats.js'; ⋮---- import { renderPage, renderToggleFragment, renderModelsFragment, renderContextMapFragment, renderSessionSummaryFra… Evidence: `src/dashboard.ts`
- **Node** (source_file): import { createServer, type IncomingMessage, type ServerResponse } from 'node:http'; import { once } from 'node:events'; ⋮---- import { spawnSync } from 'node:child_process'; import { createProxy, parseGatewayHeaders, resolveUpstreams, type ProxyConfig } from './core/proxy.js'; import { parseExportArgv, runExportCore, shouldIncludeFile, type ExportParsed, type ExportResult, } from './core/export.js'; import { toTrackEvent, TRACK_BODY_INLINE_MAX, type Tracker, type TrackEvent, } from './core/tracker.js'; import { DashboardState, dashboardPath, type DashboardRoute, } from './dashboard.js'; ⋮---- interface RuntimeConfig { port: number; upstream: string; openAIUpstream: string; openAIApiKey?: s… Evidence: `src/node.ts`
- **Applicability** (source_file): export type PxpipeApplicabilityReason = 'eligible' 'unsupported model' 'unsupported method' 'unsupported path' 'empty body'; ⋮---- export interface PxpipeApplicabilityInput { readonly model?: string null; readonly method?: string null; readonly path?: string null; readonly bodyBytes?: number null; } ⋮---- function baseModelId model: string : string ⋮---- / Dashboard runtime override; null = fall back to PXPIPE MODELS env / built-in default. In-memory only. / ⋮---- / Built-in default scope when PXPIPE MODELS is unset: Fable 5 Claude plus GPT 5.6. GPT 5.5 and Opus 4.8 are intentionally off — same pipeline but measurably worse at reading imaged content FINDINGS.md 2026-06-16: Opus 4.8 ~2pp ari… Evidence: `src/core/applicability.ts`
- **Gpt Model Profiles** (source_file): export type GptVisionCost = { regime: 'tile'; base: number; perTile: number } { regime: 'patch'; multiplier: number; patchCap: number }; ⋮---- export interface GptModelProfile { vision: GptVisionCost; stripCols: number; maxHeightPx: number; } ⋮---- interface ProfileRule { test: m: string = boolean; profile: GptModelProfile; } ⋮---- const isMiniNanoPatch = m: string : boolean ⋮---- function resolveBuiltin m: string : GptModelProfile ⋮---- function isValidVision v: unknown : v is GptVisionCost ⋮---- function posInt v: unknown, fallback: number : number ⋮---- function parseEnvProfiles raw: string : Map ⋮---- function envProfiles : Map ⋮---- / Resolve the full rendering + vision-cost profile fo… Evidence: `src/core/gpt-model-profiles.ts`
- **History** (source_file): import type { CacheControl, ContentBlock, ImageBlock, Message, TextBlock, ToolUseBlock, ToolResultBlock } from './types.js'; import { DENSE_CONTENT_CHARS_PER_IMAGE, DENSE_CONTENT_COLS, DENSE_RENDER_STYLE, neutralizeSentinel, reflow, renderTextToPngsWithCharLimit, roleSlotSegment, SLOT_MARK_ASSISTANT, SLOT_MARK_USER } from './render.js'; import { factSheetText } from './factsheet.js'; import { bytesToBase64 } from './png.js'; ⋮---- export type ProfitableFn = (text: string, cols: number) => boolean; ⋮---- export interface HistoryCollapseOptions { keepTail: number; minCollapsePrefix: number; cols: number; collapseChunk: number; freezeChunk: number; protectedPrefix: number; reflow: boolean; } ⋮---… Evidence: `src/core/history.ts`
- **Openai History** (source_file): import { renderTextToPngs, reflow, neutralizeSentinel, type RenderedImage } from './render.js'; import { GPT_MAX_HEIGHT_PX } from './gpt-model-profiles.js'; import { countTokens as o200kCountTokens } from 'gpt-tokenizer/encoding/o200k_base'; ⋮---- export type GptProfitableFn = (text: string, cols: number) => boolean; ⋮---- export interface GptHistoryOptions { keepTail: number; minCollapsePrefix: number; minCollapseTokens: number; cols: number; collapseChunk: number; freezeChunk: number; sectionTokens: number; maxHeightPx: number; maxImages: number; reflow: boolean; } ⋮---- export interface HistoryTurn { text: string; openIds: string[]; closeIds: string[]; opaque: boolean; userText?: string; } ⋮-… Evidence: `src/core/openai-history.ts`
- **Openai Savings** (source_file): export function openAICacheReadRate model: string undefined : number ⋮---- export function openAIOutputRate model: string undefined : number ⋮---- export function computeOpenAIActualInputEff inputTokens: number, cachedTokens: number, model?: string, : number ⋮---- export function computeOpenAIBaselineRawTokens inputTokens: number, imageTokens: number, baselineImagedTokens: number, : number ⋮---- export function computeOpenAIBaselineInputEff inputTokens: number, cachedTokens: number, imageTokens: number, baselineImagedTokens: number, model?: string, : number Evidence: `src/core/openai-savings.ts`
- **Openai** (source_file): import { renderTextToPngs, reflow, shrinkColsToContent, PAD_X, CELL_W, type RenderedImage, } from './render.js'; import { resolveGptProfile, DEFAULT_GPT_STRIP_COLS, type GptVisionCost, } from './gpt-model-profiles.js'; import { bytesToBase64 } from './png.js'; import { compactSlabWhitespace, estimateImageCount, sha8, type TransformInfo, type TransformOptions, } from './transform.js'; import { stripSchemaDescriptions } from './schema-strip.js'; import { planGptCollapse, responsesItemsToTurns, chatMessagesToTurns, type GptCollapsePlan, type GptHistoryOptions, } from './openai-history.js'; import { HISTORY_SYNTHETIC_INTRO, HISTORY_SYNTHETIC_OUTRO } from './history.js'; import { factSheetText }… Evidence: `src/core/openai.ts`
- **Proxy** (source_file): import { transformRequest, type TransformOptions, type TransformInfo } from './transform.js'; import { transformOpenAIChatCompletions, transformOpenAIResponses } from './openai.js'; import { isPxpipeSupportedGptModel, isPxpipeSupportedModel } from './applicability.js'; import { buildBaselineCountTokensBody, buildCacheablePrefixCountTokensBody, } from './measurement.js'; import type { Usage } from './types.js'; ⋮---- export interface ProxyConfig { provider?: 'cloudflare-ai-gateway'; gatewayBaseUrl?: string; gatewayHeaders?: Record ; upstream?: string; apiKey?: string; openAIUpstream?: string; openAIApiKey?: string; transform?: TransformOptions = TransformOptions ; onRequest?: event: ProxyEve… Evidence: `src/core/proxy.ts`
- **Render** (source_file): import { ATLAS_CELL_W, ATLAS_CELL_H, ATLAS_PIXELS, ATLAS_OFFSETS, ATLAS_WIDE_FLAGS, atlasRank, } from './atlas.js'; import { ATLAS_GRAY_CELL_W, ATLAS_GRAY_CELL_H, ATLAS_GRAY_PIXELS, ATLAS_GRAY_OFFSETS, ATLAS_GRAY_WIDE_FLAGS, atlasGrayRank, } from './atlas-gray.js'; import { encodeGrayPng, encodeRgbPng } from './png.js'; ⋮---- export interface RenderedImage { png: Uint8Array; width: number; height: number; charsRendered: number; droppedChars: number; droppedCodepoints: Map ; } ⋮---- export interface RenderStyle { grid?: boolean; gridCols?: number; markerScale?: number; markerRed?: boolean; cellHBonus?: number; cellWBonus?: number; aa?: boolean; colorCycle?: boolean; colorByRole?: boolean; }… Evidence: `src/core/render.ts`
- **Schema Strip** (source_file): export function stripSchemaDescriptions node: unknown, depth = 0 : unknown ⋮---- export function schemaHasStructure schema: Record : boolean Evidence: `src/core/schema-strip.ts`
- **Transform** (source_file): import type { ContentBlock, ImageBlock, Message, MessagesRequest, SystemField, TextBlock, ToolDef, ToolResultBlock, ToolUseBlock, } from './types.js'; import { renderTextToPngs, renderTextToPngsMultiCol, reflow, maxFittingCols, shrinkColsToContent, MAX_HEIGHT_PX, NL_SENTINEL, neutralizeSentinel, PAD_X, PAD_Y, CELL_W, CELL_H, READABLE_CHARS_PER_IMAGE, DENSE_CONTENT_CHARS_PER_IMAGE, DENSE_CONTENT_COLS, DENSE_RENDER_STYLE, renderTextToPngsWithCharLimit, } from './render.js'; import { factSheetText } from './factsheet.js'; import { stripSchemaDescriptions, schemaHasStructure } from './schema-strip.js'; import { bytesToBase64 } from './png.js'; import { collapseHistory, HISTORY_SYNTHETIC_INTRO }… Evidence: `src/core/transform.ts`
- **Packed-reflow legibility experiments** (documentation): Packed-reflow legibility experiments Evidence: `eval/EXPERIMENT_LOG.md`
- **Pricing Engine — Specification** (documentation): `orderTotalCents(items, tier)` returns the final order total as an integer number of cents. Evidence: `demo/cost-ab/template/SPEC.md`
- **Effective-context needle test — attempt log** (documentation): Effective-context needle test — attempt log Evidence: `demo/effective-context/ATTEMPTS.md`
- **Glyph confusion matrix + render-style A/B Task 7 — PLANNED, paused for usage budget** (documentation): Glyph confusion matrix + render-style A/B Task 7 — PLANNED, paused for usage budget Evidence: `eval/glyph-matrix/PLAN.md`
- **L1 OCR Fidelity Report** (documentation): Generated: 2026-05-22T03:36:29.045Z; Model: opus; Dry run: false; Blocks evaluated: 20. Evidence: `eval/results-opus/l1-report.md`
- **L2 Session Replay Report** (documentation): Generated: 2026-05-22T03:48:55.794Z; Replay model: opus; Judge model: opus; Dry run: false; Sessions evaluated: 10. Evidence: `eval/results-opus/l2-report.md`
- **L1 OCR Fidelity Report** (documentation): Generated: 2026-05-23T01:54:01.508Z; Model: opus; Dry run: false; Blocks evaluated: 20. Evidence: `eval/results/l1-report.md`
- **L2 Session Replay Report** (documentation): Generated: 2026-05-22T18:09:09.056Z; Replay model: opus; Judge model: opus; Dry run: false; Sessions evaluated: 10. Evidence: `eval/results/l2-report.md`
- **Reflow Eval — Combined Summary Report** (documentation): Reflow Eval — Combined Summary Report Evidence: `eval/results/summary.md`
- **Tsconfig** (structured_config): { "compilerOptions": { "target": "ES2022", "lib": ["ES2022", "WebWorker"], "module": "ESNext", "moduleResolution": "Bundler", "types": ["@cloudflare/workers-types", "node"], "strict": true, "noUncheckedIndexedAccess": true, "noImplicitOverride": true, "noFallthroughCasesInSwitch": true, "esModuleInterop": true, "forceConsistentCasingInFileNames": true, "skipLibCheck": true, "resolveJsonModule": true, "isolatedModules": true, "verbatimModuleSyntax": true, "declaration": true, "declarationMap": true, "sourceMap": true, "outDir": "dist", "rootDir": "src" }, "include": ["src/**/*"], "exclude": ["node_modules", "dist", "legacy", // src/dashboard/ is the Svelte browser bundle — compiled separately by… Evidence: `tsconfig.json`
- **Probes** (structured_config): { "session": 0, "type": "decision", "q": "Which package was chosen for the store layer?", "gold": "mobx" }, { "session": 0, "type": "numeric", "q": "What exact value in ms was the retry budget set to?", "gold": "7880" }, { "session": 0, "type": "path", "q": "In which file path was the double-flush race found?", "gold": "src/batcher/core.ts" }, { "session": 0, "type": "name", "q": "Who was named as the on-call reviewer for the PR?", "gold": "Tobias Okafor" }, { "session": 0, "type": "negation", "q": "Was LEGACY PINS enabled in prod? Answer ENABLED or OFF.", "gold": "OFF" }, { "session": 0, "type": "unanswerable", "q": "Which database migration version was rolled back?", "gold": "UNKNOWN" },… Evidence: `eval/gist-recall/work/probes.json`
- **Probes** (structured_config): { "session": 0, "type": "decision", "q": "What was the FINAL package chosen for the store layer?", "gold": "nanostores" }, { "session": 0, "type": "numeric", "q": "What exact value in ms was the RETRY BUDGET set to not the cache TTL ?", "gold": "7850" }, { "session": 0, "type": "path", "q": "Which file contained the ROOT CAUSE of the double-flush race?", "gold": "src/mailbox/core.ts" }, { "session": 0, "type": "name", "q": "Who is the on-call REVIEWER for the PR not the author ?", "gold": "Aiko Khoury" }, { "session": 0, "type": "negation", "q": "In PROD specifically, was LEGACY PINS enabled? Answer ENABLED or OFF.", "gold": "OFF" }, { "session": 0, "type": "unanswerable", "q": "Which datab… Evidence: `eval/gist-recall/work2/probes.json`
- **Probes** (structured_config): { "session": 0, "type": "final", "q": "What is the FINAL locked value of BATCH WINDOW MS at the end of the session?", "gold": "8400" }, { "session": 0, "type": "first", "q": "What was the FIRST value BATCH WINDOW MS was set to at the start?", "gold": "9600" }, { "session": 0, "type": "count", "q": "How many distinct values was BATCH WINDOW MS set to over the whole session? Answer with a number.", "gold": "3" }, { "session": 1, "type": "final", "q": "What is the FINAL locked value of BATCH WINDOW MS at the end of the session?", "gold": "1200" }, { "session": 1, "type": "first", "q": "What was the FIRST value BATCH WINDOW MS was set to at the start?", "gold": "5400" }, { "session": 1, "type":… Evidence: `eval/gist-recall/work3/probes.json`
- **Golds** (structured_config): {"s0": {"C":"fa2587c3db43","A":"0a9016292918","B":"4aefc5667127","E":"3b57511a2d37","D":"4e557a6941a3"},{"D":"d9c1a44d9d82","C":"f68401231baa","B":"2556bf62cbeb","E":"1497869832e0","A":"9422c7d44eab"},{"C":"423406430208","B":"e04e32424126","D":"112c052c3cd3","E":"c7977df6e8e7","A":"7808126d9ce2"},{"B":"c1e378234a1f","D":"fc496e0325ca","C":"c3e61c70b689","E":"fd8d4a1f08ef","A":"4c1d5dab770b"} ,"s1": {"C":"fa2587c3db43","A":"0a9016292918","B":"4aefc5667127","E":"3b57511a2d37","D":"4e557a6941a3"},{"D":"d9c1a44d9d82","C":"f68401231baa","B":"2556bf62cbeb","E":"1497869832e0","A":"9422c7d44eab"},{"C":"423406430208","B":"e04e32424126","D":"112c052c3cd3","E":"c7977df6e8e7","A":"7808126d9ce2"},{"B":"… Evidence: `eval/glyph-matrix/sweep/golds.json`
- **L1 Results** (structured_config): { "results": { "blockIdx": 0, "charCount": 211, "role": "user", "baselineImageCount": 1, "reflowImageCount": 1, "baselineScore": { "editDistance": 1, "charAccuracy": 0.995260663507109, "refLen": 211, "hypLen": 210 }, "reflowScore": { "editDistance": 5, "charAccuracy": 0.976303317535545, "refLen": 211, "hypLen": 207 }, "dryRun": false }, { "blockIdx": 1, "charCount": 284, "role": "assistant", "baselineImageCount": 1, "reflowImageCount": 1, "baselineScore": { "editDistance": 1, "charAccuracy": 0.9964788732394366, "refLen": 284, "hypLen": 284 }, "reflowScore": { "editDistance": 20, "charAccuracy": 0.9295774647887324, "refLen": 284, "hypLen": 270 }, "dryRun": false }, { "blockIdx": 2, "charCoun… Evidence: `eval/results-opus/l1-results.json`
- **L2 Results** (structured_config): { "results": { "sessionIdx": 0, "sessionId": "6131a291-9f3e-44bd-8558-8ae470ddc85e", "totalTurns": 1024, "historyCharCount": 279683, "baselineImageCount": 2, "reflowImageCount": 1, "baselineAnswer": "You're right to push on this — let me be honest about what the current tests actually prove, because \"covered\" has been doing a lot of work in my earlier summaries.\n\n What the tests actually verify and don't \n\n LiveKit egress module tests 17 — these mock the LiveKit SDK entirely. They verify", "reflowAnswer": "Here's the current E2E/test structure in pixelpipe :\n\n Test layout\n\nAll tests live in a flat tests/ directory — there's no dedicated e2e/ directory . They split into two kinds:\… Evidence: `eval/results-opus/l2-results.json`
- **L1 Results** (structured_config): { "results": { "blockIdx": 0, "charCount": 211, "role": "user", "variants": { "baseline": { "score": { "editDistance": 5, "charAccuracy": 0.976303317535545, "refLen": 211, "hypLen": 211 }, "imageCount": 1 }, "reflow": { "score": { "editDistance": 6, "charAccuracy": 0.9715639810426541, "refLen": 211, "hypLen": 210 }, "imageCount": 1 }, "reflow-inimage": { "score": { "editDistance": 1, "charAccuracy": 0.995260663507109, "refLen": 211, "hypLen": 210 }, "imageCount": 1 } } }, { "blockIdx": 1, "charCount": 228, "role": "assistant", "variants": { "baseline": { "score": { "editDistance": 2, "charAccuracy": 0.9912280701754386, "refLen": 228, "hypLen": 228 }, "imageCount": 1 }, "reflow": { "score":… Evidence: `eval/results/l1-results.json`
- **L2 Results** (structured_config): { "results": { "sessionIdx": 0, "sessionId": "6131a291-9f3e-44bd-8558-8ae470ddc85e", "totalTurns": 1024, "historyCharCount": 279683, "baselineImageCount": 2, "reflowImageCount": 1, "aaImageCount": 1, "baselineAnswer": "You're right — let me be honest about what the tests actually prove.\n\n The gap\n\nThe 36 new tests mock LiveKit and Deepgram . That means:\n\n- Requirement 1 audio transcription works — utils.test.ts only tests our transcript-formatting helpers. Nothing exercises a real audio stream → Deepg", "reflowAnswer": "Based on the conversation, you've been reviewing PR 30 Phase 4B: Module Tests for LiveKit Functionality , and the open issue is a real coverage gap: the interview audi… Evidence: `eval/results/l2-results.json`
- **Eval Results Off** (structured_config): {"instance future-architect vuls-36456cb151894964ba1683ce7da5c35ada789970": true, "instance element-hq element-web-923ad4323b2006b2b180544429455ffe7d4a6cc3-vnan": false, "instance qutebrowser qutebrowser-0833b5f6f140d04200ec91605f88704dd18e2970-v059c6fdc75567943479b23ebca7c07b5e9a7f34c": true, "instance flipt-io flipt-2ce8a0331e8a8f63f2c1b555db8277ffe5aa2e63": true, "instance tutao tutanota-b4934a0f3c34d9d7649e944b183137e8fad3e859-vbc0d9ba8f0071fbe982809910959a6ff8884dbbf": false, "instance NodeBB NodeBB-0e07f3c9bace416cbab078a30eae972868c0a8a3-vf2cf3cbd463b7ad942381f1c6d077626485a1e9e": true, "instance navidrome navidrome-677d9947f302c9f7bba8c08c788c3dc99f235f39": true, "instance interneta… Evidence: `eval/swe-bench-pro/bench/eval_results_off.json`
- **Eval Results On** (structured_config): {"instance future-architect vuls-36456cb151894964ba1683ce7da5c35ada789970": true, "instance element-hq element-web-923ad4323b2006b2b180544429455ffe7d4a6cc3-vnan": false, "instance qutebrowser qutebrowser-0833b5f6f140d04200ec91605f88704dd18e2970-v059c6fdc75567943479b23ebca7c07b5e9a7f34c": true, "instance flipt-io flipt-2ce8a0331e8a8f63f2c1b555db8277ffe5aa2e63": true, "instance navidrome navidrome-677d9947f302c9f7bba8c08c788c3dc99f235f39": false, "instance NodeBB NodeBB-0e07f3c9bace416cbab078a30eae972868c0a8a3-vf2cf3cbd463b7ad942381f1c6d077626485a1e9e": true, "instance tutao tutanota-b4934a0f3c34d9d7649e944b183137e8fad3e859-vbc0d9ba8f0071fbe982809910959a6ff8884dbbf": false, "instance internet… Evidence: `eval/swe-bench-pro/bench/eval_results_on.json`
- **Instances** (structured_config): "instance future-architect vuls-36456cb151894964ba1683ce7da5c35ada789970", "instance flipt-io flipt-2ce8a0331e8a8f63f2c1b555db8277ffe5aa2e63", "instance element-hq element-web-923ad4323b2006b2b180544429455ffe7d4a6cc3-vnan", "instance protonmail webclients-32ff10999a06455cb2147f6873d627456924ae13", "instance qutebrowser qutebrowser-0833b5f6f140d04200ec91605f88704dd18e2970-v059c6fdc75567943479b23ebca7c07b5e9a7f34c", "instance tutao tutanota-b4934a0f3c34d9d7649e944b183137e8fad3e859-vbc0d9ba8f0071fbe982809910959a6ff8884dbbf", "instance navidrome navidrome-677d9947f302c9f7bba8c08c788c3dc99f235f39", "instance NodeBB NodeBB-0e07f3c9bace416cbab078a30eae972868c0a8a3-vf2cf3cbd463b7ad942381f1c6d077626… Evidence: `eval/swe-bench-pro/bench/instances.json`
- **Preds Off** (structured_config): {"instance id": "instance future-architect vuls-36456cb151894964ba1683ce7da5c35ada789970", "patch": "diff --git a/wordpress/wordpress.go b/wordpress/wordpress.go\nindex 2d44b9f..25e9ddf 100644\n--- a/wordpress/wordpress.go\n+++ b/wordpress/wordpress.go\n@@ -235,6 +235,14 @@ func extractToVulnInfos pkgName string, cves WpCveInfo vinfos models.VulnI\n \treturn\n }\n \n+func searchCache name string, wpVulnCaches map string string string, bool {\n+\tvalue, ok := wpVulnCaches name \n+\tif ok {\n+\t\treturn value, true\n+\t}\n+\treturn \"\", false\n+}\n+\n func httpRequest url, token string string, error {\n \tretry := 1\n \tutil.Log.Debugf \"%s\", url \n", "prefix": ""}, {"instance id": "instanc… Evidence: `eval/swe-bench-pro/bench/preds_off.json`
- **Preds On** (structured_config): {"instance id": "instance future-architect vuls-36456cb151894964ba1683ce7da5c35ada789970", "patch": "diff --git a/wordpress/wordpress.go b/wordpress/wordpress.go\nindex 2d44b9f..41d71d9 100644\n--- a/wordpress/wordpress.go\n+++ b/wordpress/wordpress.go\n@@ -268,6 +268,16 @@ loop:\n \treturn \"\", err\n }\n \n+// searchCache looks for the given name in the cache and returns\n+// the cached response body and whether it was found.\n+func searchCache name string, wpVulnCaches map string string string, bool {\n+\tvalue, ok := wpVulnCaches name \n+\tif ok {\n+\t\treturn value, true\n+\t}\n+\treturn \"\", false\n+}\n+\n func removeInactives pkgs models.WordPressPackages removed models.WordPressPac… Evidence: `eval/swe-bench-pro/bench/preds_on.json`
- **Eval Results Batch1 Off** (structured_config): {"instance qutebrowser qutebrowser-c09e1439f145c66ee3af574386e277dd2388d094-v2ef375ac784985212b1805e1d0431dc8f1b3c171": true, "instance NodeBB NodeBB-cfc237c2b79d8c731bbfc6cadf977ed530bfd57a-v0495b863a912fbff5749c67e860612b91825407c": true, "instance flipt-io flipt-967855b429f749c28c112b8cb1b15bc79157f973": true, "instance internetarchive openlibrary-a48fd6ba9482c527602bc081491d9e8ae6e8226c-vfa6ff903cb27f336e17654595dd900fa943dcd91": true, "instance navidrome navidrome-0488fb92cb02a82924fb1181bf1642f2e87096db": true} Evidence: `eval/swe-bench-pro/bench20/eval_results_batch1_off.json`
- **Eval Results Batch1 On** (structured_config): {"instance qutebrowser qutebrowser-c09e1439f145c66ee3af574386e277dd2388d094-v2ef375ac784985212b1805e1d0431dc8f1b3c171": true, "instance NodeBB NodeBB-cfc237c2b79d8c731bbfc6cadf977ed530bfd57a-v0495b863a912fbff5749c67e860612b91825407c": true, "instance flipt-io flipt-967855b429f749c28c112b8cb1b15bc79157f973": true, "instance internetarchive openlibrary-a48fd6ba9482c527602bc081491d9e8ae6e8226c-vfa6ff903cb27f336e17654595dd900fa943dcd91": true, "instance navidrome navidrome-0488fb92cb02a82924fb1181bf1642f2e87096db": true} Evidence: `eval/swe-bench-pro/bench20/eval_results_batch1_on.json`
- The remaining 19 evidence entries are in `AI_CONTEXT_PACK.json` or `EVIDENCE_INDEX.json`.
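
The `editDistance`, `refLen`, and `charAccuracy` fields in the L1 results entries above are numerically consistent with `charAccuracy = 1 - editDistance / refLen` (for example, 1 - 1/211 ≈ 0.99526 and 1 - 20/284 ≈ 0.92958). The sketch below recomputes the metric under that assumption. The function names and the Levenshtein implementation are illustrative only and are not taken from the pxpipe eval code, which may normalise text or count characters differently.

```typescript
// Hypothetical helpers for re-checking L1 scores; not part of the pxpipe repository.

// Classic two-row Levenshtein distance over UTF-16 code units.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j]! + 1,        // deletion
        curr[j - 1]! + 1,    // insertion
        prev[j - 1]! + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length]!;
}

// Assumed formula, inferred from the reported numbers rather than from source code.
function charAccuracy(editDistance: number, refLen: number): number {
  if (refLen <= 0) return 0; // assumption: an empty reference scores 0
  return 1 - editDistance / refLen;
}

// Convenience wrapper: score a model transcription against the reference text.
function scoreTranscription(reference: string, hypothesis: string) {
  const editDistance = levenshtein(reference, hypothesis);
  return {
    editDistance,
    charAccuracy: charAccuracy(editDistance, reference.length),
    refLen: reference.length,
    hypLen: hypothesis.length,
  };
}
```

Calling `charAccuracy(1, 211)` reproduces the first `baselineScore` in `eval/results-opus/l1-results.json` to within floating-point tolerance, which is a quick way to sanity-check a results file before quoting its numbers.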

## Rules the Host AI Must Follow

- **Treat this asset as pre-work context, not a runtime environment**: The AI Context Pack contains only an evidence-backed understanding of the project, not the project's executable state. Evidence: `README.md`, `demo/README.md`, `eval/README.md`
- **When answering the user, distinguish what can be previewed from what can only be verified after install**: The value of the pre-install experience comes from reducing bad installs and misjudgments, not from passing a preview off as a real run. Evidence: `README.md`, `demo/README.md`, `eval/README.md`

## Questions the User Should Answer First

- Which host AI or local environment do you plan to use it in?
- Do you just want to experience the workflow first, or are you ready to actually install?
- What matters most to you: install cost, output quality, or conflicts with your existing rules?

## Acceptance Checks

- Every capability claim can be traced back to a file path in evidence_refs.
- AI_CONTEXT_PACK.md does not present previews as real runs.
- The user can understand who it fits, what it can do, how to start, and the risk boundaries within 3 minutes.
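
The first acceptance check can be automated in principle. The following is a minimal sketch, assuming each capability claim is available as an object with a `claim_id` and an `evidence_refs` array of file paths. That shape and the `untraceableClaims` name are assumptions made for illustration; the actual layout of `AI_CONTEXT_PACK.json` is not shown in this document.

```typescript
// Hypothetical claim shape; the real pack schema may differ.
interface CapabilityClaim {
  claim_id: string;
  evidence_refs: string[];
}

// Returns the ids of claims that have no non-empty evidence path.
function untraceableClaims(claims: CapabilityClaim[]): string[] {
  return claims
    .filter((c) => !c.evidence_refs.some((ref) => ref.trim().length > 0))
    .map((c) => c.claim_id);
}
```

An empty result means every claim points at at least one path. It does not confirm that the path exists in the repository or that the file supports the claim, so those still need a manual or scripted follow-up.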

---

## Doramagic Context Augmentation

The following sections strengthen the repository context for a host AI. Human Manual data is a reading route, and pitfall notes become operating constraints.

## Human Manual Outline

Usage rule: this is only a reading route and salience signal, not factual authority. Concrete claims must still return to repo evidence or Claim Graph.

Host AI hard rules:
- Do not treat page titles, section order, summaries, or importance values as factual project evidence.
- When explaining the Human Manual outline, state that it is only a reading route or salience signal.
- Capability, installation, compatibility, runtime state, and risk claims must cite repo evidence, source paths, or Claim Graph.

- **Overview & Getting Started**: importance `high`
  - source_paths: README.md, package.json, bin/cli.js, CHANGELOG.md
- **Core Architecture — Proxy, Transform Pipeline & Rendering**: importance `high`
  - source_paths: src/node.ts, src/core/transform.ts, src/core/render.ts, src/core/history.ts, src/core/proxy.ts
- **Model Routing, Applicability Gates & Configuration**: importance `high`
  - source_paths: src/core/applicability.ts, src/core/gpt-model-profiles.ts, src/core/openai.ts, src/core/openai-history.ts, src/core/openai-savings.ts
- **Evaluation, Benchmarks & Lossy Limitations**: importance `high`
  - source_paths: FINDINGS.md, docs/HISTORY_CACHE_MODEL.md, docs/CACHING_AND_SAVINGS.md, docs/RENDER_SIZING.md, docs/TRANSFORM_INFO.md

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `7dd54d395d119f5f822da5c1944ba5afbb02fa88`
- inspected_files: `README.md`, `package.json`, `pnpm-lock.yaml`, `docs/ADAPTIVE_CPT_PLAN.md`, `docs/CACHING_AND_SAVINGS.md`, `docs/HISTORY_CACHE_MODEL.md`, `docs/LEGIBILITY-AUDIT-2026-07-01.md`, `docs/RENDER_SIZING.md`, `docs/TRANSFORM_INFO.md`, `src/core/applicability.ts`, `src/core/atlas-gray.ts`, `src/core/atlas.ts`, `src/core/baseline.ts`, `src/core/export.ts`, `src/core/factsheet.ts`, `src/core/gpt-model-profiles.ts`, `src/core/history.ts`, `src/core/index.ts`, `src/core/library.ts`, `src/core/measurement.ts`

Host AI hard rules:
- Without repo_clone_verified=true, do not claim that the source code has been read.
- Without repo_inspection_verified=true, do not write README, docs, or package-file conclusions as facts.
- Without quick_start_verified=true, do not claim that the Quick Start path has run successfully.
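
The three hard rules above amount to a gate on which kinds of statement a host AI may make. A minimal sketch of that gate follows, assuming the flags arrive as a plain object. The `InspectionFlags` type, the `ClaimKind` labels, and the `allowedClaims` function are hypothetical names for illustration and are not part of Doramagic or pxpipe.

```typescript
// Hypothetical types; only the three flag names come from this document.
interface InspectionFlags {
  repo_clone_verified?: boolean;
  repo_inspection_verified?: boolean;
  quick_start_verified?: boolean;
}

type ClaimKind =
  | "source_code_read"
  | "docs_conclusions_as_fact"
  | "quick_start_succeeded";

function allowedClaims(flags: InspectionFlags): ClaimKind[] {
  const allowed: ClaimKind[] = [];
  // A missing flag is treated the same as false.
  if (flags.repo_clone_verified === true) allowed.push("source_code_read");
  if (flags.repo_inspection_verified === true) allowed.push("docs_conclusions_as_fact");
  if (flags.quick_start_verified === true) allowed.push("quick_start_succeeded");
  return allowed;
}
```

With the values listed in this section (`repo_clone_verified` and `repo_inspection_verified` both true, `quick_start_verified` not listed), the function returns only the first two kinds, so a claim that the Quick Start ran successfully stays off limits.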

## Doramagic Pitfall Constraints

These rules come from Doramagic discovery, validation, or compilation findings. The host AI must treat them as operating constraints, not background notes.

### Constraint 1: Capability evidence risk requires verification

- Trigger: README/documentation is current enough for a first validation pass.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://news.ycombinator.com/item?id=48776464
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 2: Security or permission risk requires verification

- Trigger: no_demo
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://news.ycombinator.com/item?id=48776464
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 3: Security or permission risk requires verification

- Trigger: no_demo
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://news.ycombinator.com/item?id=48776464
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.