# reader - Doramagic AI Context Pack

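> Positioning: a pre-install preview and decision asset. It helps the host AI get off to a good start, but it does not mean the target project has been installed, executed, or verified.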

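## Sufficiency Principle

- **Sufficiency, not compression**: An AI Context Pack should be complete enough for the host AI to understand, before starting work, the project's value, capability boundaries, entry points, risks, and evidence sources. It may be organized in layers, but the shortest possible summary is not the goal.
- **Compression policy**: Compress only noise and repetition; never compress context that affects judgment or the quality of the start.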

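## How the Host AI Should Use This

You are reading the AI Context Pack that Doramagic compiled for reader. Treat it as pre-work context: it helps the user understand who the project suits, what it can do, how to get started, what must be verified after installation, and where the risks are. Do not claim that you have installed, run, or executed the target project.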

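## Claim Consumption Rules

- **Source of facts**: Repo Evidence + Claim/Evidence Graph. The Human Wiki only supplies salience, terminology, and narrative structure.
- **Minimum status for facts**: `supported`
- `supported`: may be used as a project fact, but the answer must cite the claim_id and the evidence path.
- `weak`: a low-confidence lead only; the user must be asked to verify it further.
- `inferred`: may be used only for risk notes or open questions; it must not be presented as a project fact.
- `unverified`: must not be used as a fact; state explicitly that the evidence is insufficient.
- `contradicted`: the conflicting sources must be shown; do not pick one version on the user's behalf.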
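
The status rules above can be read as a small lookup table. The following is a minimal sketch, assuming a hypothetical `render_claim` helper and `Policy` record that are not part of the pack; only the five status names come from the rules themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical record: whether a status may be stated as fact, plus a handling note."""
    usable_as_fact: bool
    note: str


# Status names are taken from the rules above; the notes paraphrase them.
POLICIES = {
    "supported": Policy(True, "cite claim_id and evidence path"),
    "weak": Policy(False, "low-confidence lead; ask the user to verify"),
    "inferred": Policy(False, "risk note or open question only"),
    "unverified": Policy(False, "say the evidence is insufficient"),
    "contradicted": Policy(False, "show the conflicting sources; do not pick one"),
}


def render_claim(claim_id: str, status: str, evidence_path: str, text: str) -> str:
    """Format a claim for an answer according to its status."""
    # Assumption of this sketch: an unknown status is handled like "unverified".
    policy = POLICIES.get(status, POLICIES["unverified"])
    if policy.usable_as_fact:
        return f"{text} [claim: {claim_id}, evidence: {evidence_path}]"
    return f"(not a fact: {policy.note}) {text}"
```

Keeping the policy in data rather than in branching code makes it easy to audit that only `supported` ever reaches the fact path.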

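## Who It Suits Best

- **AI researchers or builders of research-oriented agents**: The README is explicitly organized around research, experiment, or paper workflows. Evidence: `README.md` Claim: `clm_0002` supported 0.86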

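## What It Can Do

- **Command-line startup or install flow** (requires post-install verification): The project documentation contains executable commands; real use requires running them locally or in the host environment. Evidence: `README.md` Claim: `clm_0001` supported 0.86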

## How to Start

- `curl 'https://s.jina.ai/When%20was%20Jina%20AI%20founded%3F?site=jina.ai&site=github.com'` Evidence: `README.md` Claim: `clm_0003` supported 0.86
- `curl -H 'X-Respond-With: frontmatter' 'https://r.jina.ai/https://example.com'` Evidence: `README.md` Claim: `clm_0004` supported 0.86
- `curl -X POST 'https://r.jina.ai/' -d 'url=https://example.com/#/route'` Evidence: `README.md` Claim: `clm_0005` supported 0.86
- `curl 'https://r.jina.ai/https://example.com/' -H 'x-timeout: 10'` Evidence: `README.md` Claim: `clm_0006` supported 0.86
- `curl 'https://r.jina.ai/https://example.com/' -H 'x-wait-for-selector: #content'` Evidence: `README.md` Claim: `clm_0007` supported 0.86
- `curl 'https://r.jina.ai/https://example.com/' -H 'x-timeout: 30' -H 'x-wait-for-selector: non-existent-element'` Evidence: `README.md` Claim: `clm_0008` supported 0.86
- `curl -H "Accept: application/json" https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page` Evidence: `README.md` Claim: `clm_0009` supported 0.86
- `curl -H "X-With-Generated-Alt: true" https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page` Evidence: `README.md` Claim: `clm_0010` supported 0.86
- `git clone git@github.com:jina-ai/reader.git` Evidence: `README.md` Claim: `clm_0011` supported 0.86
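
For readers who prefer to inspect a request before sending anything, the curl examples above can be mirrored as a request object that is built but never sent. This is a sketch only: `build_reader_request` and its parameter names are hypothetical, while the endpoint and header names are copied from the commands listed above.

```python
from urllib.request import Request

READER_BASE = "https://r.jina.ai/"  # endpoint taken from the curl examples above


def build_reader_request(target_url, timeout_s=None, wait_for_selector=None, as_json=False):
    """Build (but do not send) a GET request that mirrors the curl examples."""
    headers = {}
    if timeout_s is not None:
        headers["x-timeout"] = str(timeout_s)
    if wait_for_selector is not None:
        headers["x-wait-for-selector"] = wait_for_selector
    if as_json:
        headers["Accept"] = "application/json"
    # The target URL is appended to the base, as in the curl commands.
    return Request(READER_BASE + target_url, headers=headers, method="GET")


# Nothing is sent here; the request is only constructed for inspection.
req = build_reader_request("https://example.com/", timeout_s=10, wait_for_selector="#content")
print(req.full_url)
print(req.header_items())
```

Actually sending the request (for example with `urllib.request.urlopen(req)`) contacts an external service, which is the step this pack asks you to verify and approve first.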

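## Pre-Continue Decision Card

- **Current recommendation**: Administrator/security approval required
- **Why**: Continuing may involve keys, accounts, external services, or sensitive context, so get administrator or security approval first.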

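### 30-Second Decision

- **What to do now**: Administrator/security approval required
- **Minimum safe next step**: Run Prompt Preview first; if credentials or an enterprise environment are involved, get approval before a trial install
- **Do not trust yet**: Real output quality cannot be trusted before installation.
- **Continuing will touch**: command execution, host AI configuration, the local environment, or project files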

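### What You Can Trust Now

- **Audience signal: AI researchers or builders of research-oriented agents** (supported): Backed by a supported claim or project evidence, but this still does not equal real post-install results. Evidence: `README.md` Claim: `clm_0002` supported 0.86
- **Capability exists: command-line startup or install flow** (supported): You can trust that the project contains signals of this capability; whether it fits your specific task still requires a trial or post-install verification. Evidence: `README.md` Claim: `clm_0001` supported 0.86
- **Quick Start / install command signals exist** (supported): You can trust that the project documentation shows a startup or install entry point; do not run it directly in your primary environment on that basis. Evidence: `README.md` Claim: `clm_0003` supported 0.86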

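### What You Cannot Trust Yet

- **Real output quality cannot be trusted before installation.** (unverified): Prompt Preview only shows how the guidance works; it cannot prove result quality in a real project.
- **Host AI version compatibility cannot be trusted before installation.** (unverified): Loading rules and version differences across hosts such as Claude, Cursor, Codex, and Gemini must be verified in a real environment.
- **That it will not contaminate existing host AI behavior cannot be taken on trust.** (inferred): Skill, plugin, and AGENTS/CLAUDE/GEMINI instructions may change the host AI's default behavior. Evidence: `CLAUDE.md`
- **Safe rollback cannot be assumed.** (unverified): Unless the project explicitly provides uninstall and restore instructions, verify in an isolated environment first.
- **After a real install, is it compatible with the user's current host AI version?** (unverified): Compatibility can only be verified in the actual host environment.
- **Does the project's output quality meet the user's specific task?** (unverified): A pre-install preview only shows the flow and boundaries; it cannot replace a real evaluation.
- **Do the install commands require network access, elevated permissions, or global writes?** (unverified): This affects install risk in both enterprise and personal environments. Evidence: `README.md`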

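### What Continuing Will Touch

- **Command execution**: package managers, network downloads, the local plugin directory, project configuration, or the user's home directory. Reason: running the very first command can already change the environment, so decide first whether it is worth running. Evidence: `README.md`
- **Host AI configuration**: plugin, Skill, or rule-loading configuration of hosts such as Claude/Codex/Cursor/Gemini/OpenCode. Reason: host configuration changes how the AI works afterwards and may conflict with the user's existing rules. Evidence: `CLAUDE.md`
- **Local environment or project files**: install results, plugin caches, project configuration, or local dependency directories. Reason: the write scope and rollback method cannot be proven before installation, so isolated verification is needed. Evidence: `README.md`
- **Environment variables / API keys**: The project's entry documentation explicitly mentions API key, token, secret, or account credential configuration. Reason: if a real install needs credentials, use test credentials first and go through a permission/compliance review. Evidence: `CLAUDE.md`, `CONTRIBUTING.md`, `README.md`
- **Host AI context**: AI Context Pack, Prompt Preview, Skill routing, risk rules, and project facts. Reason: imported context influences the host AI's later judgment, so unverified items must not be presented as facts.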

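### Minimum Safe Next Steps

- **Run Prompt Preview first**: Use the interactive pre-install trial to judge whether the working style fits; no authorization or environment change is needed. (Applies to: any project, especially when output quality is unknown.)
- **Trial-install only in an isolated directory or with a test account**: This keeps install commands from contaminating the primary host AI, real projects, or the user's home directory. (Applies when: there are signals of command execution, plugin configuration, or local writes.)
- **Back up the host AI configuration first**: Skill, plugin, and rule files may change the default behavior of Claude/Cursor/Codex. (Applies when: a plugin manifest, Skill, or host rule entry point exists.)
- **Do not use real production credentials**: Once environment variables or API keys enter the host or toolchain, they can create account and compliance risk. (Applies when: environment signals such as API, TOKEN, KEY, or SECRET appear.)
- **After installing, verify only one minimal task**: Verify loading, compatibility, output quality, and rollback first, then decide whether to go deeper. (Applies when: moving from trial to a real workflow.)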

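### Exit Paths

- **Preserve the pre-install state**: Record the original host configuration and project state; only then can you later judge whether recovery is possible.
- **Be ready to remove the host plugin / Skill / rule entry**: If behavior is abnormal after a trial install, this lets you restore the host AI to its pre-trial state.
- **Record the install commands and write paths**: Without explicit uninstall instructions, you at least need to know which directories or configurations to clean up by hand.
- **Be ready to revoke the test API key or token**: This limits the damage quickly if test credentials leak or are misused.
- **If there is no rollback path, stay out of the primary environment**: A missing rollback path is a blocker; do not proceed on trust or luck.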

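## Preview-Only Items

- Explaining who the project suits and what it can do
- Demonstrating a typical conversation flow based on the project documentation
- Helping the user decide whether it is worth installing or researching further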

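## Items That Require Post-Install Verification

- Actually installing a Skill, plugin, or CLI
- Running scripts, modifying local files, or accessing external services
- Verifying real output quality, performance, and compatibility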

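## Boundary and Risk Card

- **Mistaking the pre-install preview for a real run**: The user may overestimate how much configuration, permission, and compatibility verification the project has already completed. Handling: clearly distinguish prompt_preview_can_do from runtime_required. Claim: `clm_0012` inferred 0.45
- **Command execution modifies the local environment**: Install commands may write to the user's home directory, the host plugin directory, or project configuration. Handling: run them first in an isolated environment or with a test account. Evidence: `README.md` Claim: `clm_0013` supported 0.86
- **Open question**: After a real install, is it compatible with the user's current host AI version? Reason: compatibility can only be verified in the actual host environment.
- **Open question**: Does the project's output quality meet the user's specific task? Reason: a pre-install preview only shows the flow and boundaries; it cannot replace a real evaluation.
- **Open question**: Do the install commands require network access, elevated permissions, or global writes? Reason: this affects install risk in both enterprise and personal environments.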

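## Pre-Work Context

### Load Order

- First read how_to_use.host_ai_instruction to establish the boundaries of the pre-install decision asset.
- Read claim_graph_summary to confirm that facts come from the Claim/Evidence Graph rather than the Human Wiki narrative.
- Then read intended_users, capabilities, and quick_start_candidates to judge whether the user is a match.
- When a concrete task must be carried out, consult role_skill_index first, then evidence_index.
- For questions about real installation, file modification, network access, performance, or compatibility, switch to risk_card and boundaries.runtime_required.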

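### Task Routing

- **Command-line startup or install flow**: First state that this capability requires post-install verification, then give a pre-install checklist. Boundary: it must be verified by a real install or run. Evidence: `README.md` Claim: `clm_0001` supported 0.86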

### Context Size

- Total files: 117
- Important files covered: 40/117
- Evidence index entries: 37
- Role / Skill entries: 5

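### Handling Insufficient Evidence

- **missing_evidence**: State that the evidence is insufficient and ask the user for the target file, README passage, or post-install verification record; do not fill in facts.
- **out_of_scope_request**: State that the task is outside the evidence scope of the current AI Context Pack, and suggest that the user consult the Human Manual first or verify after a real install.
- **runtime_request**: Provide a pre-install checklist and the source of the commands, but do not run commands for the user or claim to have run them.
- **source_conflict**: Show the conflicting sources side by side and mark them as pending verification; do not force a choice of one version.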
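
The four fallback cases above can be sketched as a small selector. The case names come from the list; the `pick_fallback` function, its boolean inputs, and the priority order among the cases are assumptions made for illustration only.

```python
from typing import Optional


def pick_fallback(has_evidence: bool, in_scope: bool,
                  needs_runtime: bool, sources_conflict: bool) -> Optional[str]:
    """Return the name of the fallback case to apply, or None if none applies.

    The ordering below is an assumption of this sketch, not a rule from the pack.
    """
    if not in_scope:
        return "out_of_scope_request"
    if sources_conflict:
        return "source_conflict"
    if not has_evidence:
        return "missing_evidence"
    if needs_runtime:
        return "runtime_request"
    return None


# A request that is in scope and evidenced, but asks to run a command.
print(pick_fallback(has_evidence=True, in_scope=True,
                    needs_runtime=True, sources_conflict=False))
```

Returning `None` for the ordinary case keeps the fallback path separate from normal answering, so the host never applies a fallback by default.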

## Prompt Recipes

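### Fit Assessment

- Goal: judge whether this project fits the user's current task.
- Expected output: a fit conclusion, key reasons, evidence citations, what can be previewed before install, what must be verified after install, and suggested next steps.

```text
Based on reader's AI Context Pack, first ask me 3 necessary questions, then judge whether it fits my task. The answer must cover: who it suits, what it can do, what it cannot do, whether it is worth installing, and where the evidence comes from. Every project fact must cite evidence_refs, source_paths, or claim_id.
```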

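### Pre-Install Experience

- Goal: let the user get a feel for the core workflow before installing, without presenting the preview as a real capability or a marketing promise.
- Expected output: an experience script with boundary labels, a post-install verification checklist, and a cautious recommendation; no promises of real execution and no strong marketing language.

```text
Treat reader as a pre-install experience asset, not as an installed tool or a real runtime environment.

Output exactly four parts:
1. First ask me 3 necessary questions.
2. Give an "experience script": use the three labels [Previewable before install], [Must verify after install], and [Insufficient evidence] to show how it might guide the workflow.
3. Give a post-install verification checklist: list the capabilities that can only be confirmed after a real install, real host loading, and a real project run.
4. Give a cautious recommendation: you may only say "worth further research / a trial install", "provide more information before deciding", or "not recommended to continue"; do not endorse the project.

Hard boundaries:
- Do not claim to have installed, run, executed tests, modified files, or produced real results.
- Do not use promissory phrases such as "adapts automatically", "guaranteed to pass", "perfect fit", or "strongly recommend installing".
- When describing how it works after installation, use a conditional such as "If installation succeeds and the host loads the Skill correctly, it might ...".
- The experience script may only be written as "sample lines / a hypothetical flow": use "might ask / might suggest / might show", not "has written, has generated, has passed, is running, is generating".
- Prompt Preview is not responsible for giving install commands; if the user is ready for a trial install, only tell them to read the Quick Start and Risk Card first and to verify in an isolated environment.
- Every project fact must come from a supported claim, evidence_refs, or source_paths; inferred/unverified items may only serve as risks or open questions.
```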

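### Role / Skill Selection

- Goal: pick the best-matching assets from the project's roles or Skills.
- Expected output: a list of candidate roles or Skills, each with its applicable scenario, evidence path, risk boundary, and whether post-install verification is needed.

```text
Read role_skill_index and recommend the 3-5 roles or Skills most relevant to my target task. For each recommendation, state the applicable scenario, likely output, risk boundary, and evidence_refs.
```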

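### Risk Pre-Check

- Goal: identify environment, permission, rule-conflict, and quality risks before installing or adopting the project.
- Expected output: a checklist covering environment, permissions, dependencies, licensing, host conflicts, quality risks, and unknowns.

```text
Based on risk_card, boundaries, and quick_start_candidates, give me a pre-install risk checklist. Do not run commands for me; only explain what I should check, why, and what happens if a check fails.
```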

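### Host AI Kickoff Instruction

- Goal: turn the project context into an instruction for the host AI at the start of a conversation.
- Expected output: a kickoff instruction with clear boundaries and explicit evidence citations, ready to paste into the host AI.

```text
Based on reader's AI Context Pack, generate a kickoff instruction that I can paste into the host AI. The instruction must honor not_runtime=true and must not claim that the project has been installed, run, or has produced real results.
```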

## Role / Skill Index

- 5 role / Skill / project documentation entries indexed.

- **Reader** (project_doc): ! codecov https://codecov.io/gh/jina-ai/reader/branch/main/graph/badge.svg https://codecov.io/gh/jina-ai/reader ! Ask DeepWiki https://deepwiki.com/badge.svg https://deepwiki.com/jina-ai/reader Activation hint: consult when the user needs to understand the project structure, installation method, or boundaries. Evidence: `README.md`
- **CLAUDE.md** (project_doc): This file provides guidance to Claude Code claude.ai/code when working with code in this repository. Activation hint: consult when the user needs to understand the project structure, installation method, or boundaries. Evidence: `CLAUDE.md`
- **Contributing to Reader** (project_doc): Thanks for your interest in contributing. This is the open source branch of the codebase that runs at https://r.jina.ai and https://s.jina.ai . The MongoDB-backed SaaS storage layer is not part of this branch - local development uses the stateless / bucket-cached modes only. Activation hint: consult when the user needs to understand the project structure, installation method, or boundaries. Evidence: `CONTRIBUTING.md`
- **Reader Cookbooks** (project_doc): Recipes for shaping Reader's output to fit a specific downstream pipeline. The default output "drop into an LLM and read" is fine for ad-hoc use; the recipes below trade defaults for token efficiency, latency, or compatibility with a specific consumer. Activation hint: consult when the user needs to understand the project structure, installation method, or boundaries. Evidence: `cookbooks.md`
- **Architecture** (project_doc): Introduction Jina Reader is an API-first SaaS application that turns URLs of web pages, PDFs, and other documents into markdown or images. It's built to help developers prepare data context for LLMs - now widely known as context engineering. Activation hint: consult when the user needs to understand the project structure, installation method, or boundaries. Evidence: `architecture.md`

## Evidence Index

- 37 evidence entries indexed.

- **Reader** (documentation): ! codecov https://codecov.io/gh/jina-ai/reader/branch/main/graph/badge.svg https://codecov.io/gh/jina-ai/reader ! Ask DeepWiki https://deepwiki.com/badge.svg https://deepwiki.com/jina-ai/reader Evidence: `README.md`
- **Package** (package_manifest): { "name": "reader", "version": "0.5.0", "scripts": { "lint": "eslint --ext .js,.ts .", "assets:download": "bash ./download-external-assets.sh", "build": "node ./integrity-check.cjs && tsc -p .", "build:watch": "tsc --watch", "build:clean": "rm -rf ./build", "serve": "npm run build && npm run start", "debug": "npm run build && npm run dev", "start": "node ./build/stand-alone/crawl.js", "dry-run": "NODE ENV=dry-run node ./build/stand-alone/search.js", "test:unit": "tsc -p tests/tsconfig.json && node tests-build/run-unit.js", "test:e2e": "tsc -p tests/tsconfig.json && node tests-build/run.js", "test": "npm run test:unit && npm run test:e2e", "test:unit:coverage": "tsc -p tests/tsconfig.json &&… Evidence: `package.json`
- **CLAUDE.md** (documentation): This file provides guidance to Claude Code claude.ai/code when working with code in this repository. Evidence: `CLAUDE.md`
- **Contributing to Reader** (documentation): Thanks for your interest in contributing. This is the open source branch of the codebase that runs at https://r.jina.ai and https://s.jina.ai . The MongoDB-backed SaaS storage layer is not part of this branch - local development uses the stateless / bucket-cached modes only. Evidence: `CONTRIBUTING.md`
- **License** (source_file): Copyright 2020-2024 Jina AI Limited. All rights reserved. Evidence: `LICENSE`
- **Reader Cookbooks** (documentation): Recipes for shaping Reader's output to fit a specific downstream pipeline. The default output "drop into an LLM and read" is fine for ad-hoc use; the recipes below trade defaults for token efficiency, latency, or compatibility with a specific consumer. Evidence: `cookbooks.md`
- **Docker Compose** (source_file): services: minio: image: minio/minio hostname: minio ports: - 9001:9001 - 9000:9000 environment: MINIO ROOT USER: minio MINIO ROOT PASSWORD: minio123 volumes: - minio-data:/data/minio command: server /data/minio --console-address ":9001" healthcheck: test: 'CMD', 'curl', '-f', 'http://localhost:9001/minio/health/live' interval: 30s timeout: 20s retries: 3 networks: default: aliases: - minio.dev.jina.ai volumes: minio-data: Evidence: `docker-compose.yml`
- **Google Gemini** (source_file): import { HTTPService, HTTPServiceRequestOptions } from 'civkit/http'; import from 'lodash'; import { ProxyAgent } from 'undici'; import { InputServerEventStream } from '../lib/transform-server-event-stream'; import { Readable } from 'stream'; ⋮---- export enum HarmCategory { HARM CATEGORY UNSPECIFIED = "HARM CATEGORY UNSPECIFIED", HARM CATEGORY HATE SPEECH = "HARM CATEGORY HATE SPEECH", HARM CATEGORY SEXUALLY EXPLICIT = "HARM CATEGORY SEXUALLY EXPLICIT", HARM CATEGORY HARASSMENT = "HARM CATEGORY HARASSMENT", HARM CATEGORY DANGEROUS CONTENT = "HARM CATEGORY DANGEROUS CONTENT", } export enum HarmBlockThreshold { HARM BLOCK THRESHOLD UNSPECIFIED = "HARM BLOCK THRESHOLD UNSPECIFIED", BLOCK LOW… Evidence: `src/3rd-party/google-gemini.ts`
- **Open Router** (source_file): import { HTTPService, HTTPServiceRequestOptions } from 'civkit/http'; import type OpenAI from 'openai'; import { InputServerEventStream } from '../lib/transform-server-event-stream'; import from 'lodash'; import { ProxyAgent, Agent } from 'undici'; import { Readable } from 'stream'; export class OpenRouterHTTP extends HTTPService ⋮---- constructor public apiKey: string, public userTitle?: string, public userUrl?: string, listModels completions payload: T, opts?: typeof this 'baseOptions' chatCompletions payload: T, opts?: typeof this 'baseOptions' override async processResponse options: HTTPServiceRequestOptions, r: Response : Promise Evidence: `src/3rd-party/open-router.ts`
- **Openai Compat** (source_file): import { HTTPService, HTTPServiceRequestOptions } from 'civkit/http'; import type OpenAI from 'openai'; import { InputServerEventStream } from '../lib/transform-server-event-stream'; import from 'lodash'; import { ProxyAgent, Agent } from 'undici'; import { Readable } from 'stream'; export abstract class OpenAICompatHTTP extends HTTPService ⋮---- constructor public apiKey: string, baseUri: string, public organization?: string listModels getModelDetail model: string, opts?: typeof this 'baseOptions' completions payload: T, opts?: typeof this 'baseOptions' chatCompletions payload: T, opts?: typeof this 'baseOptions' imagesGenerations payload: T, opts?: typeof this 'baseOptions' imagesEdits pa… Evidence: `src/3rd-party/openai-compat.ts`
- **Crawler** (source_file): import { singleton } from 'tsyringe'; import { randomUUID } from 'crypto'; import from 'lodash'; import { Blob, File } from 'buffer'; import { assignTransferProtocolMeta, RPCHost, RPCReflection, AssertionFailureError, ParamValidationError, RawString, ApplicationError, DataStreamBrokenError, OperationNotAllowedError, assignMeta, extractMeta, } from 'civkit/civ-rpc'; import { marshalErrorLike } from 'civkit/lang'; import { Defer } from 'civkit/defer'; import { retryWith } from 'civkit/decorators'; import { FancyFile } from 'civkit/fancy-file'; import { CONTENT FORMAT, CrawlerOptions, CrawlerOptionsHeaderOnly, ENGINE TYPE, RESPOND TIMING } from '../dto/crawler-options'; import { OutputServerEv… Evidence: `src/api/crawler.ts`
- **Searcher** (source_file): import { singleton } from 'tsyringe'; import { assignTransferProtocolMeta, RPCHost, RPCReflection, AssertionFailureError, assignMeta, RawString, DownstreamServiceFailureError, AuthenticationRequiredError, ArrayOf, } from 'civkit/civ-rpc'; import { marshalErrorLike } from 'civkit/lang'; import { objHashMd5B64Of } from 'civkit/hash'; import from 'lodash'; import { CrawlerHost, ExtraScrappingOptions } from './crawler'; import { CrawlerOptions, RESPOND TIMING } from '../dto/crawler-options'; import { SnapshotFormatter, FormattedPage as RealFormattedPage, FormattedPageDto } from '../services/snapshot-formatter'; import { GoogleSearchExplicitOperatorsDto } from '../services/serper-search'; import… Evidence: `src/api/searcher.ts`
- **Instruct Blip** (source_file): import { Coercible, Prop } from 'civkit/coercible'; import { ReplicateHTTP } from '../../3rd-party/replicate'; import from 'lodash'; import { injectable } from 'tsyringe'; import { AbstractImageInterrogationModel, ImageInterrogationOptions } from './base'; import { EnvConfig } from '../envconfig'; import { GlobalLogger } from '../logger'; export class ReplicateInstructBlipVicuna13bModelOptions extends Coercible ⋮---- export class InstructBLIP extends AbstractImageInterrogationModel ⋮---- constructor protected envConfig: EnvConfig, protected globalLogger: GlobalLogger, override async init override async run client: this 'clients' number , modelOpts: U : Promise Evidence: `src/services/common-iminterrogate/instruct-blip.ts`
- **Llms** (source_file): import from "lodash"; import { injectable } from "tsyringe"; import { AbstractImageInterrogationModel, ImageInterrogationOptions } from './base'; import { EnvConfig } from '../envconfig'; import { GlobalLogger } from '../logger'; import type { AbstractLLM, PromptChunk } from '../common-llm/base'; export function imageInterrogationWithVisualLLM llm: AbstractLLM : any ⋮---- @injectable class LLMDerivedImageInterrogator extends AbstractImageInterrogationModel ⋮---- class LLMDerivedImageInterrogator extends AbstractImageInterrogationModel ⋮---- constructor protected envConfig: EnvConfig, protected globalLogger: GlobalLogger, override async init override async withClient = any func: U, options:… Evidence: `src/services/common-iminterrogate/llms.ts`
- **Google Gemini** (source_file): import { Coercible, DownstreamServiceFailureError, Prop, isCoercibleClass } from 'civkit/civ-rpc'; import { AbstractLLM, DependsOnOptions, DetectFunctions, LLMDto, LLMModelOptions, PromptChunk } from "./base"; import from "lodash"; import { isReadable, once, Readable, Transform, TransformCallback } from "stream"; import { injectable } from "tsyringe"; import { TempFileManager } from '../temp-file'; import { readFile } from 'fs/promises'; import { FunctionCallingAwareLLMMessage, FunctionCallingAwareLLMModelOptions, LLMFunctionCallRequest, LLMFunctionCallResponse, LLMMessage, LLMPeakStream } from './misc'; import { FunctionCallingMode, HarmBlockThreshold, HarmCategory, GenerateContentCandidat… Evidence: `src/services/common-llm/google-gemini.ts`
- **Misc** (source_file): import { randomBytes, randomUUID } from 'crypto'; import path from 'path'; import fsp from 'fs/promises'; import { countGPTToken } from '../../utils/openai'; import from 'lodash'; import { Writable } from 'stream'; import { JSONAccumulation, JSONParserStream, JSONParserStreamOptions } from '../../lib/json-parse-stream'; import { FancyFile } from 'civkit/fancy-file'; import { Coercible, Prop } from 'civkit/coercible'; export type PromptChunk = string URL Buffer File object; export interface LLMMessage { role: 'user' 'system' 'assistant' string, content: string PromptChunk null, name?: string; k: string : any; } export type FunctionCallingAwareLLMMessage = LLMMessage LLMFunctionCallRequest LL… Evidence: `src/services/common-llm/misc.ts`
- **Open Router** (source_file): import { ArrayOf, Coercible, DownstreamServiceFailureError, isCoercibleClass, ParamValidationError, Pick, Prop } from 'civkit/civ-rpc'; import { injectable } from 'tsyringe'; import { isReadable, once, Readable } from 'stream'; import { chatMLEncode, FunctionCallingAwareLLMMessage, FunctionCallingAwareLLMModelOptions, LLMFunctionCallRequest, LLMFunctionCallResponse, LLMMessage, LLMModelOptions, LLMPeakStream, parseJSON, PromptChunk, stringPromptChunks } from './misc'; import { AbstractLLM, DependsOnOptions, DetectFunctions, LLMDto } from './base'; import { OpenRouterHTTP } from '../../3rd-party/open-router'; import { EnvConfig } from '../envconfig'; import { GlobalLogger } from '../logger';… Evidence: `src/services/common-llm/open-router.ts`
- **Misc** (source_file): import { singleton } from 'tsyringe'; import { AsyncService } from 'civkit/async-service'; import { ParamValidationError } from 'civkit/civ-rpc'; import { isIP } from 'node:net'; import { isIPInNonPublicRange } from '../utils/ip'; import { GlobalLogger } from './logger'; import { lookup } from 'node:dns/promises'; import { Threaded } from './threaded'; import { SecurityCompromiseError } from './errors'; import { GeoIPService } from './geoip'; import from 'lodash'; ⋮---- export class MiscService extends AsyncService ⋮---- constructor protected globalLogger: GlobalLogger, protected geoIpService: GeoIPService, override async init ⋮---- async assertNormalizedUrl input: string Evidence: `src/services/misc.ts`
- **Puppeteer** (source_file): import from 'lodash'; import { isIP } from 'net'; import { readFile } from 'fs/promises'; import fs from 'fs'; import { Blob } from 'buffer'; import { container, singleton } from 'tsyringe'; import type { Browser, CookieParam, GoToOptions, HTTPRequest, HTTPResponse, Page, Viewport } from 'puppeteer'; import type { Cookie } from 'set-cookie-parser'; import puppeteer, { TimeoutError } from 'puppeteer'; import { Defer, Deferred } from 'civkit/defer'; import { AssertionFailureError, ParamValidationError } from 'civkit/civ-rpc'; import { AsyncService } from 'civkit/async-service'; import { FancyFile } from 'civkit/fancy-file'; import { delay } from 'civkit/timeout'; import { CurlControl } from '… Evidence: `src/services/puppeteer.ts`
- **Bing** (source_file): import { singleton } from 'tsyringe'; import { AsyncService } from 'civkit/async-service'; import { GlobalLogger } from '../logger'; import { JSDomControl } from '../jsdom'; import { isMainThread } from 'worker threads'; import from 'lodash'; import { WebSearchEntry } from './compat'; import { ScrappingOptions, SERPSpecializedPuppeteerControl } from './puppeteer'; import { CurlControl } from '../curl'; import { ApplicationError } from 'civkit/civ-rpc'; import { ServiceBadApproachError, ServiceBadAttemptError } from '../errors'; import { retry, retryWith } from 'civkit/decorators'; import { SERPProxyProviderService } from '../proxy-provider'; import { readBlob, WEB SUPPORTED ENCODINGS } from… Evidence: `src/services/serp/bing.ts`
- **Compat** (source_file): export interface WebSearchEntry { link: string; title: string; source?: string; date?: string; snippet?: string; imageUrl?: string; siteLinks?: { link: string; title: string; snippet?: string; } ; variant?: 'web' 'images' 'news'; } Evidence: `src/services/serp/compat.ts`
- **Google** (source_file): import { singleton } from 'tsyringe'; import { AsyncService } from 'civkit/async-service'; import { GlobalLogger } from '../logger'; import { JSDomControl } from '../jsdom'; import { isMainThread } from 'worker threads'; import from 'lodash'; import { WebSearchEntry } from './compat'; import { ScrappingOptions, SERPSpecializedPuppeteerControl } from './puppeteer'; import { CurlControl } from '../curl'; import { ApplicationError } from 'civkit/civ-rpc'; import { ServiceBadApproachError, ServiceBadAttemptError } from '../errors'; import { parseJSONText } from 'civkit/vectorize'; import { retry, retryWith } from 'civkit/decorators'; import { SERPProxyProviderService } from '../proxy-provider';… Evidence: `src/services/serp/google.ts`
- **Puppeteer** (source_file): import from 'lodash'; import { Blob } from 'buffer'; import { readFile } from 'fs/promises'; import { container, singleton } from 'tsyringe'; import type { Browser, BrowserContext, CookieParam, GoToOptions, Page, Viewport } from 'puppeteer'; import type { Cookie } from 'set-cookie-parser'; import puppeteer, { TimeoutError } from 'puppeteer'; import { Defer } from 'civkit/defer'; import { AssertionFailureError, ParamValidationError } from 'civkit/civ-rpc'; import { AsyncService } from 'civkit/async-service'; import { FancyFile } from 'civkit/fancy-file'; import { delay } from 'civkit/timeout'; import { SecurityCompromiseError, ServiceCrashedError, ServiceNodeResourceDrainError } from '../err… Evidence: `src/services/serp/puppeteer.ts`
- **Serper** (source_file): import { singleton } from 'tsyringe'; import { GlobalLogger } from '../logger'; import { AsyncLocalContext } from '../async-context'; import { SerperBingHTTP, SerperGoogleHTTP, SerperImageSearchResponse, SerperNewsSearchResponse, SerperSearchQueryParams, SerperWebSearchResponse } from '../../3rd-party/serper-search'; import { BlackHoleDetector } from '../blackhole-detector'; import { Context } from '../registry'; import { AsyncService } from 'civkit/async-service'; import { Coercible, Prop, RPC CALL ENVIRONMENT } from 'civkit/civ-rpc'; import { EnvConfig } from '../envconfig'; ⋮---- export class SerperGoogleSearchService extends AsyncService ⋮---- constructor protected globalLogger: GlobalL… Evidence: `src/services/serp/serper.ts`
- **Markdown** (source_file): export function tidyMarkdown markdown: string : string ⋮---- // Normalize by removing excessive spaces and new lines ⋮---- // Step 2: Normalize regular links that may be broken across lines ⋮---- // Step 3: Replace more than two consecutive empty lines with exactly two empty lines Evidence: `src/utils/markdown.ts`
- **Misc** (source_file): import { ParamValidationError } from 'civkit/civ-rpc'; export function cleanAttribute attribute: string null Evidence: `src/utils/misc.ts`
- **Architecture** (documentation): Introduction Jina Reader is an API-first SaaS application that turns URLs of web pages, PDFs, and other documents into markdown or images. It's built to help developers prepare data context for LLMs - now widely known as context engineering. Evidence: `architecture.md`
- **.C8Rc** (structured_config): { "all": true, "src": "src" , "include": "src/ / .ts" , "exclude": "src/ / .d.ts", "tests/ ", "tests-build/ ", "src/scripts/ " , "reporter": "text", "html", "lcov" , "reports-dir": "coverage", "excludeAfterRemap": true, "skip-full": false } Evidence: `.c8rc.json`
- **Tsconfig** (structured_config): { "compilerOptions": { "module": "node16", Evidence: `tsconfig.json`
- **EditorConfig is awesome: https://EditorConfig.org** (source_file): EditorConfig is awesome: https://EditorConfig.org Evidence: `.editorconfig`
- **Logs** (source_file): Logs logs .log npm-debug.log yarn-debug.log yarn-error.log firebase-debug.log firebase-debug. .log Evidence: `.gitignore`
- **.nvmrc** (source_file): 24 Evidence: `.nvmrc`
- **syntax=docker/dockerfile:1** (source_file): syntax=docker/dockerfile:1 FROM node:24 AS base Evidence: `Dockerfile`
- **Download External Assets** (source_file): set -u if "${SKIP DOWNLOAD EXTERNAL:-}" ; then echo " download-external SKIP DOWNLOAD EXTERNAL set, skipping." exit 0 fi cd "$ dirname "$0" " mkdir -p licensed ARTIFACTS= "GeoLite2-City.mmdb https://raw.githubusercontent.com/P3TERX/GeoLite.mmdb/download/GeoLite2-City.mmdb" "geolite2-asn.mmdb https://cdn.jsdelivr.net/npm/@ip-location-db/geolite2-asn-mmdb/geolite2-asn.mmdb" "SourceHanSansSC-Regular.otf https://raw.githubusercontent.com/adobe-fonts/source-han-sans/refs/heads/release/OTF/SimplifiedChinese/SourceHanSansSC-Regular.otf" "gsa useragents.txt https://raw.githubusercontent.com/searxng/searxng/refs/heads/master/searx/data/gsa useragents.txt" failed=0 for entry in "${ARTIFACTS @ }"; do… Evidence: `download-external-assets.sh`
- **───── Live user-initiated fetches allow ─────** (source_file): ───── Live user-initiated fetches allow ───── User-agent: Claude-User User-agent: ChatGPT-User User-agent: Perplexity-User User-agent: Meta-ExternalFetcher User-agent: DuckAssistBot User-agent: MistralAI-User User-agent: kagi-fetcher User-agent: Manus-User Allow: / Evidence: `public/robots.txt`
- **Config** (source_file): import { container } from 'tsyringe'; import { StorageLayer } from './db/noop-storage'; import { BaseAuthDTO } from './dto/base-auth'; Evidence: `src/config.ts`
- **Types.D** (source_file): import EventEmitter from 'events'; export class JSDOM ⋮---- constructor html: string, options?: any ; ⋮---- export class VirtualConsole extends EventEmitter ⋮---- constructor ; sendTo console: any, options?: any ; ⋮---- import { Duplex } from 'stream'; export function ZSTDCompress lvl: Number : Duplex; export function ZSTDDecompress : Duplex; export function ZSTDDecompressMaybe : Duplex; Evidence: `src/types.d.ts`

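## Rules the Host AI Must Follow

- **Treat this asset as pre-work context, not as a runtime environment**: The AI Context Pack contains only an evidence-based understanding of the project, not any executable state of the target project. Evidence: `README.md`, `package.json`, `CLAUDE.md`
- **When answering the user, distinguish previewable content from content that can only be verified after installation**: The consumer value of the pre-install experience comes from reducing mistaken installs and misjudgments, not from posing as a real run. Evidence: `README.md`, `package.json`, `CLAUDE.md`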

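## Questions the User Should Answer Before Starting

- Which host AI or local environment do you plan to use it in?
- Do you only want to try the workflow first, or are you preparing a real install?
- What matters most to you: install cost, output quality, or conflicts with existing rules?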

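## Acceptance Criteria

- Every capability statement can be traced back to a file path in evidence_refs.
- AI_CONTEXT_PACK.md does not present the preview as a real run.
- The user can understand within 3 minutes who it suits, what it can do, how to start, and where the risk boundaries are.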

---

## Doramagic Context Augmentation

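The content below reinforces the main body of the Repomix/AI Context Pack. The Human Manual only provides a reading skeleton; the pitfall log is converted into working constraints that the host AI must follow.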

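## Human Manual Skeleton

Usage rule: this is only a reading route and a set of salience signals for the project, not an authority on facts. Specific facts must still go back to repo evidence / the Claim Graph.

Hard rules for the host AI:
- Do not treat page titles, section order, summaries, or importance as evidence of project facts.
- When explaining the Human Manual skeleton, state explicitly that it is only a reading route / salience signal.
- Judgments about capability, installation, compatibility, runtime status, and risk must cite repo evidence, a source path, or the Claim Graph.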

- **Overview and System Architecture**：importance `high`
  - source_paths: README.md, architecture.md, src/config.ts, src/api/crawler.ts, src/api/searcher.ts
- **Read Pipeline: Engines, Extraction and LLM/VLM**：importance `high`
  - source_paths: src/dto/crawler-options.ts, src/dto/turndown-tweakable-options.ts, src/dto/base-auth.ts, src/services/puppeteer.ts, src/services/curl.ts
- **Search and SERP Integration**：importance `high`
  - source_paths: src/api/searcher.ts, src/api/serp.ts, src/stand-alone/search.ts, src/stand-alone/serp.ts, src/services/serp/common-serp.ts
- **Deployment, Security, and Common Failure Modes**：importance `high`
  - source_paths: Dockerfile, docker-compose.yml, CONTRIBUTING.md, CLAUDE.md, cookbooks.md

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `1574bfd380d249c86c82db4dace0d9c8fe17e2b1`
- inspected_files: `Dockerfile`, `README.md`, `docker-compose.yml`, `package.json`, `src/3rd-party/anthropic.ts`, `src/3rd-party/cloud-flare.ts`, `src/3rd-party/common-serp.ts`, `src/3rd-party/google-gemini.ts`, `src/3rd-party/internal-cloudrun.ts`, `src/3rd-party/jina-embeddings.ts`, `src/3rd-party/open-router.ts`, `src/3rd-party/openai-compat.ts`, `src/3rd-party/openai.ts`, `src/3rd-party/replicate.ts`, `src/3rd-party/serper-search.ts`, `src/api/crawler.ts`, `src/api/searcher.ts`, `src/api/serp.ts`, `src/config.ts`, `src/db/bucket-storage.ts`

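Hard rules for the host AI:
- Without repo_clone_verified=true, do not claim to have read the source code.
- Without repo_inspection_verified=true, do not state judgments based on README/docs/package files as facts.
- Without quick_start_verified=true, do not claim that the Quick Start has been run successfully.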
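
The three hard rules above amount to gating statements on verification flags. A minimal sketch follows; the `allowed_statements` function and its output keys are hypothetical names, while the flag names and the two `true` values are the ones recorded in this section.

```python
def allowed_statements(flags: dict) -> dict:
    """Map verification flags to the statements a host AI may make.

    A flag that is missing is treated as not verified.
    """
    return {
        "may_claim_source_was_read": flags.get("repo_clone_verified") is True,
        "may_state_doc_findings_as_fact": flags.get("repo_inspection_verified") is True,
        "may_claim_quick_start_ran": flags.get("quick_start_verified") is True,
    }


# Flags as listed in this section; quick_start_verified is not listed at all.
pack_flags = {"repo_clone_verified": True, "repo_inspection_verified": True}
permissions = allowed_statements(pack_flags)
print(permissions)
```

With the flags recorded here, the sketch permits statements about having read and inspected the source, but not any claim that the Quick Start has been run.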

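## Doramagic Pitfall Constraints

These rules come from project-specific pitfalls found while Doramagic discovered, verified, or compiled this pack. The host AI must treat them as working constraints, not as ordinary explanatory text.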

### Constraint 1: Failure mode: security_permissions: Server-Side Request Forgery via domain resolution bypass in self-hosted deployments

- Trigger: Developers should check this security_permissions risk before relying on the project: Server-Side Request Forgery via domain resolution bypass in self-hosted deployments
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: Server-Side Request Forgery via domain resolution bypass in self-hosted deployments. Context: Observed when using docker
- Why it matters: Developers may expose sensitive permissions or credentials: Server-Side Request Forgery via domain resolution bypass in self-hosted deployments
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/1253 | Server-Side Request Forgery via domain resolution bypass in self-hosted deployments
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows that it has been closed.

### Constraint 2: Failure mode: security_permissions: Unauthenticated SSRF via unvalidated HTTP redirects (single-shot SSRF gate not re-applied per...

- Trigger: Developers should check this security_permissions risk before relying on the project: Unauthenticated SSRF via unvalidated HTTP redirects (single-shot SSRF gate not re-applied per redirect hop)
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: Unauthenticated SSRF via unvalidated HTTP redirects (single-shot SSRF gate not re-applied per redirect hop). Context: Source discussion did not expose a precise runtime context.
- Why it matters: Developers may expose sensitive permissions or credentials: Unauthenticated SSRF via unvalidated HTTP redirects (single-shot SSRF gate not re-applied per redirect hop)
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/1252 | Unauthenticated SSRF via unvalidated HTTP redirects (single-shot SSRF gate not re-applied per redirect hop)
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows that it has been closed.
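
To make the phrase "SSRF gate not re-applied per redirect hop" concrete, here is a generic sketch of checking every URL in a redirect chain rather than only the first. It is not the project's code and says nothing about whether or how the linked issues were addressed; `check_redirect_chain`, `is_public_ip`, and the injected `resolve` callable are hypothetical names, and the hostnames in the usage lines are made up.

```python
import ipaddress
from urllib.parse import urlsplit


def is_public_ip(ip_text: str) -> bool:
    """True only for globally routable addresses (rejects loopback, private, link-local)."""
    return ipaddress.ip_address(ip_text).is_global


def check_redirect_chain(urls, resolve):
    """Validate every hop of a redirect chain.

    `resolve` maps a hostname to a list of IP strings; it is injected so the
    sketch can run without real DNS lookups. Returns (ok, first_rejected_url).
    """
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            return False, url
        addresses = resolve(parts.hostname)
        # Reject the hop if resolution fails or any address is non-public.
        if not addresses or not all(is_public_ip(a) for a in addresses):
            return False, url
    return True, None


# Made-up resolution table: the second hop points at a link-local address.
table = {"example.com": ["8.8.8.8"], "internal.test": ["169.254.169.254"]}
result = check_redirect_chain(
    ["https://example.com/a", "http://internal.test/latest"],
    lambda host: table.get(host, []),
)
print(result)
```

A check of this kind is still incomplete on its own: a hostname can resolve to a different address between the check and the actual fetch, so a real deployment would also need to connect to the address it validated.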

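### Constraint 3: Depends on a Docker environment

- Trigger: The install/run entry point includes a Docker command: `docker run --rm -p 3000:8081 ghcr.io/jina-ai/reader:oss # then: curl http://localhost:3000/https://example.com`
- Host AI rule: State the Docker prerequisite, and provide a non-Docker path or a failure hint.
- Why it matters: Non-engineering users may not have Docker installed, which noticeably raises the cost of getting started.
- Evidence: identity.distribution | https://github.com/jina-ai/reader | docker run --rm -p 3000:8081 ghcr.io/jina-ai/reader:oss # then: curl http://localhost:3000/https://example.com
- Hard boundary: Do not present this pitfall as resolved, verified, or safe to ignore unless follow-up verification evidence clearly shows it has been closed.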
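
One way a host AI might act on the rule above is to check for the `docker` executable before suggesting the command, and otherwise fall back to a hint. The sketch below assumes the hosted `https://r.jina.ai/` endpoint from the quickstart commands is an acceptable non-Docker path. The function name `docker_start_hint` is hypothetical, and neither branch has been run against the project.

```python
import shutil
from typing import Callable, Optional

DOCKER_COMMAND = "docker run --rm -p 3000:8081 ghcr.io/jina-ai/reader:oss"
HOSTED_FALLBACK = "curl 'https://r.jina.ai/https://example.com'"


def docker_start_hint(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the Docker command if docker is on PATH, else a fallback hint."""
    if which("docker") is None:
        return (
            "Docker was not found on PATH. Install Docker first, "
            "or try the hosted endpoint instead: " + HOSTED_FALLBACK
        )
    return "Docker found. Start the self-hosted server with: " + DOCKER_COMMAND
```

Finding the executable says nothing about whether the Docker daemon is running or the image can be pulled, so those still need to be verified after installation.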

### Constraint 4: Failure mode: installation: npm run build failed because shared files are not found

- Trigger: Developers should check this installation risk before relying on the project: npm run build failed because shared files are not found
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: npm run build failed because shared files are not found. Context: Observed when using Node.
- Why it matters: Developers may fail before the first successful local run: npm run build failed because shared files are not found
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/3 | npm run build failed because shared files are not found
- Hard boundary: Do not present this pitfall as resolved, verified, or safe to ignore unless follow-up verification evidence clearly shows it has been closed.

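### Constraint 5: Source evidence: npm run build failed because shared files are not found

- Trigger: GitHub community evidence shows an installation-related issue in this project that still needs verification: npm run build failed because shared files are not found
- Host AI rule: The source suggests a fix, workaround, or version change may already exist; any instructions must state the version they apply to.
- Why it matters: May block installation or the first run.
- Evidence: community_evidence:github | https://github.com/jina-ai/reader/issues/3 | The source discussion mentions npm-related conditions; re-check them before installing or trying the project.
- Hard boundary: Do not present this pitfall as resolved, verified, or safe to ignore unless follow-up verification evidence clearly shows it has been closed.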

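### Constraint 6: Source evidence: support docker deployment

- Trigger: GitHub community evidence shows an installation-related issue in this project that still needs verification: support docker deployment
- Host AI rule: The source suggests a fix, workaround, or version change may already exist; any instructions must state the version they apply to.
- Why it matters: May raise the cost of trial use and production integration for new users.
- Evidence: community_evidence:github | https://github.com/jina-ai/reader/issues/2 | The source discussion mentions Docker-related conditions; re-check them before installing or trying the project.
- Hard boundary: Do not present this pitfall as resolved, verified, or safe to ignore unless follow-up verification evidence clearly shows it has been closed.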

### Constraint 7: Failure mode: configuration: Improve content extraction logic to handle dynamic and hidden elements

- Trigger: Developers should check this configuration risk before relying on the project: Improve content extraction logic to handle dynamic and hidden elements
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: Improve content extraction logic to handle dynamic and hidden elements. Context: Observed when using Playwright.
- Why it matters: Developers may misconfigure credentials, environment, or host setup: Improve content extraction logic to handle dynamic and hidden elements
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/1242 | Improve content extraction logic to handle dynamic and hidden elements
- Hard boundary: Do not present this pitfall as resolved, verified, or safe to ignore unless follow-up verification evidence clearly shows it has been closed.

### Constraint 8: Failure mode: configuration: Respect robots.txt and identify your system

- Trigger: Developers should check this configuration risk before relying on the project: Respect robots.txt and identify your system
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: Respect robots.txt and identify your system. Context: Source discussion did not expose a precise runtime context.
- Why it matters: Developers may misconfigure credentials, environment, or host setup: Respect robots.txt and identify your system
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/4 | Respect robots.txt and identify your system
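- Hard boundary: Do not present this pitfall as resolved, verified, or safe to ignore unless follow-up verification evidence clearly shows it has been closed.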
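
For readers unsure what "respect robots.txt and identify your system" would involve, the sketch below uses Python's standard `urllib.robotparser` to evaluate a robots.txt body against an identifying User-Agent string. It says nothing about how the project behaves today. The `USER_AGENT` value and the `allowed_by_robots` helper are hypothetical, and the robots.txt text is passed in directly so the example needs no network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical identifying User-Agent; a real deployment would pick its own
# name and a contact URL that site operators can actually reach.
USER_AGENT = "ExampleReaderBot/1.0 (+https://example.com/bot-info)"


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A caller would fetch `/robots.txt` for the target host, pass its text to `allowed_by_robots`, and send the same `USER_AGENT` string in the `User-Agent` header of the actual request so that site operators can identify the crawler.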

### Constraint 9: Failure mode: configuration: support docker deployment

- Trigger: Developers should check this configuration risk before relying on the project: support docker deployment
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: support docker deployment. Context: Observed when using Docker.
- Why it matters: Developers may misconfigure credentials, environment, or host setup: support docker deployment
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/2 | support docker deployment
- Hard boundary: Do not present this pitfall as resolved, verified, or safe to ignore unless follow-up verification evidence clearly shows it has been closed.

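### Constraint 10: Capability assessment depends on assumptions

- Trigger: README/documentation is current enough for a first validation pass.
- Host AI rule: Turn each assumption into an item on a downstream verification checklist.
- Why it matters: If an assumption does not hold, users will not get the promised capability.
- Evidence: capability.assumptions | https://github.com/jina-ai/reader | README/documentation is current enough for a first validation pass.
- Hard boundary: Do not present this pitfall as resolved, verified, or safe to ignore unless follow-up verification evidence clearly shows it has been closed.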
