# reader - Doramagic AI Context Pack

> Purpose: pre-work context for the user's host AI. This pack does not prove that the project has been installed, run, or validated.

## Project

- canonical_name: `jina-ai/reader`
- capability: Convert any URL into LLM-friendly input by prefixing it with `https://r.jina.ai/`
- expected_user_outcome: Convert any URL into LLM-friendly input by prefixing it with `https://r.jina.ai/`
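
The prefix convention above can be illustrated with plain string construction. This is a minimal sketch: `reader_url` and `READER_PREFIX` are hypothetical names introduced here, and the code only builds the URL. It does not call the service, so it is not evidence that the endpoint was run or validated.

```python
# Hypothetical helper; only the prefix itself comes from the project description.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(target_url: str) -> str:
    """Prepend the Reader prefix to a target URL (string construction only)."""
    return READER_PREFIX + target_url


example = reader_url("https://example.com/page")
```

Any encoding or header requirements for special characters in the target URL are unverified here and should be checked against repo evidence.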

## Operating Boundaries

- Do not claim that the project has been installed, run, called through an API, or used on local files unless separate evidence proves it.
- Project facts must come from repo evidence, Claim Graph, or explicit source references.
- When a capability is not verified, mark it as unverified instead of completing it as fact.
- publish_status: `publishable`
- blocking_gaps: none

---

## Doramagic Context Augmentation

The following sections strengthen the repository context for a host AI. Human Manual data is a reading route, and pitfall notes become operating constraints.

## Human Manual Outline

Usage rule: this is only a reading route and salience signal, not factual authority. Concrete claims must still return to repo evidence or Claim Graph.

Host AI hard rules:
- Do not treat page titles, section order, summaries, or importance values as factual project evidence.
- When explaining the Human Manual outline, state that it is only a reading route or salience signal.
- Capability, installation, compatibility, runtime state, and risk claims must cite repo evidence, source paths, or Claim Graph.

- **System Overview & Architecture**: importance `high`
  - source_paths: README.md, architecture.md, CLAUDE.md, src/api/crawler.ts, src/api/searcher.ts
- **URL Fetching Engines & Content Extraction**: importance `high`
  - source_paths: src/services/puppeteer.ts, src/services/curl.ts, src/services/markify.ts, src/services/pdf-extract.ts, src/services/soffice.ts
- **Security, SSRF Protection & Abuse Mitigation**: importance `high`
  - source_paths: src/services/misc.ts, src/utils/ip.ts, src/services/geoip.ts, src/services/ipasn.ts, src/services/blackhole-detector.ts
- **Search, Proxies, Caching & Self-Hosting Deployment**: importance `high`
  - source_paths: src/services/serp/common-serp.ts, src/services/serp/google.ts, src/services/serp/bing.ts, src/services/serp/serper.ts, src/services/serp/puppeteer.ts

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `1574bfd380d249c86c82db4dace0d9c8fe17e2b1`
- inspected_files: `Dockerfile`, `README.md`, `docker-compose.yml`, `package.json`, `src/3rd-party/anthropic.ts`, `src/3rd-party/cloud-flare.ts`, `src/3rd-party/common-serp.ts`, `src/3rd-party/google-gemini.ts`, `src/3rd-party/internal-cloudrun.ts`, `src/3rd-party/jina-embeddings.ts`, `src/3rd-party/open-router.ts`, `src/3rd-party/openai-compat.ts`, `src/3rd-party/openai.ts`, `src/3rd-party/replicate.ts`, `src/3rd-party/serper-search.ts`, `src/api/crawler.ts`, `src/api/searcher.ts`, `src/api/serp.ts`, `src/config.ts`, `src/db/bucket-storage.ts`

Host AI hard rules:
- Without repo_clone_verified=true, do not claim that the source code has been read.
- Without repo_inspection_verified=true, do not write README, docs, or package-file conclusions as facts.
- Without quick_start_verified=true, do not claim that the Quick Start path has run successfully.
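
One way a host AI might apply these three rules is as a simple gate over the evidence flags. The sketch below is an assumption about how to encode the rules, not part of the pack format: `permitted_claims` and its output keys are hypothetical names, and a flag that is absent is treated the same as `false`.

```python
def permitted_claims(evidence: dict) -> dict:
    """Map evidence flags to the claims a host AI may make.

    A claim is permitted only when its flag is literally True;
    a missing flag counts as unverified.
    """
    return {
        "source_code_read": evidence.get("repo_clone_verified") is True,
        "readme_docs_package_facts": evidence.get("repo_inspection_verified") is True,
        "quick_start_ran": evidence.get("quick_start_verified") is True,
    }


# Flags as listed in this pack: quick_start_verified is not present.
pack_evidence = {"repo_clone_verified": True, "repo_inspection_verified": True}
claims = permitted_claims(pack_evidence)
```

With the flags listed in this pack, the Quick Start claim stays blocked because no `quick_start_verified` flag is recorded.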

## Doramagic Pitfall Constraints

These rules come from Doramagic discovery, validation, or compilation findings. The host AI must treat them as operating constraints, not background notes.

### Constraint 1: Security or permission risk requires verification

- Trigger: Developers should check this security_permissions risk before relying on the project: Server-Side Request Forgery via domain resolution bypass in self-hosted deployments
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: Server-Side Request Forgery via domain resolution bypass in self-hosted deployments. Context: Observed when using Docker.
- Why it matters: Developers may expose sensitive permissions or credentials: Server-Side Request Forgery via domain resolution bypass in self-hosted deployments
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/1253
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.
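
For orientation, the general class of check this constraint refers to can be sketched as "resolve the hostname first, then reject internal addresses". This is a generic illustration using the Python standard library, not the project's implementation and not a reproduction of the linked issue. All function names are hypothetical, and the resolver is injectable so the check can be exercised without network access.

```python
import ipaddress
import socket


def is_internal_address(ip_text: str) -> bool:
    """Return True for addresses a public fetcher should not reach."""
    # Drop an IPv6 zone suffix such as "%eth0" before parsing.
    ip = ipaddress.ip_address(ip_text.split("%", 1)[0])
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def system_resolver(hostname: str) -> list:
    """Resolve a hostname to its unique IP addresses using the OS resolver."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def resolve_public_addresses(hostname: str, resolver=system_resolver) -> list:
    """Resolve first, then reject the host if any address is internal."""
    addresses = resolver(hostname)
    if not addresses:
        raise ValueError(f"{hostname} did not resolve")
    blocked = [a for a in addresses if is_internal_address(a)]
    if blocked:
        raise ValueError(f"{hostname} resolves to internal address(es): {blocked}")
    return addresses
```

A check like this only helps if the later connection uses the addresses that were validated. Resolving again at connect time can return a different answer, which is one way a hostname-based gate gets bypassed.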

### Constraint 2: Security or permission risk requires verification

- Trigger: Developers should check this security_permissions risk before relying on the project: Unauthenticated SSRF via unvalidated HTTP redirects (single-shot SSRF gate not re-applied per redirect hop)
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: Unauthenticated SSRF via unvalidated HTTP redirects (single-shot SSRF gate not re-applied per redirect hop). Context: Source discussion did not expose a precise runtime context.
- Why it matters: Developers may expose sensitive permissions or credentials: Unauthenticated SSRF via unvalidated HTTP redirects (single-shot SSRF gate not re-applied per redirect hop)
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/1252
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.
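
The phrase "not re-applied per redirect hop" describes a gate that checks only the first URL. A minimal sketch of the alternative, where every hop is checked before it is fetched, is shown below. It is a generic illustration, not the project's code: `follow_redirects_checked`, `fetch_once`, and `is_allowed` are hypothetical names, and the fetch function is a stand-in so the example runs offline.

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def follow_redirects_checked(start_url, fetch_once, is_allowed, max_hops=5):
    """Follow redirects manually, applying is_allowed to every hop.

    fetch_once(url) must return (status_code, location_or_None) and must
    not follow redirects on its own.
    """
    url = start_url
    for _ in range(max_hops + 1):
        # The gate runs on every URL, including redirect targets.
        if not is_allowed(url):
            raise PermissionError(f"blocked URL in redirect chain: {url}")
        status, location = fetch_once(url)
        if status not in REDIRECT_STATUSES or not location:
            return url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError(f"more than {max_hops} redirects from {start_url}")


# Stand-in responses so the sketch runs without network access.
FAKE_RESPONSES = {
    "https://public.example/start": (302, "http://127.0.0.1/admin"),
}


def fake_fetch_once(url):
    return FAKE_RESPONSES.get(url, (200, None))


def not_loopback(url):
    # Deliberately simplistic policy for the demo; a real gate would
    # validate the resolved address, not just the literal hostname.
    return urlsplit(url).hostname != "127.0.0.1"
```

Calling `follow_redirects_checked("https://public.example/start", fake_fetch_once, not_loopback)` raises `PermissionError` at the second hop, whereas a single-shot gate would have approved the public start URL and then followed the redirect unchecked.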

### Constraint 3: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: identity.distribution | https://github.com/jina-ai/reader
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 4: Installation risk requires verification

- Trigger: Developers should check this installation risk before relying on the project: npm run build failed because shared files are not found
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: npm run build failed because shared files are not found. Context: Observed when using Node.js.
- Why it matters: Developers may fail before the first successful local run: npm run build failed because shared files are not found
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/3
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 5: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/jina-ai/reader/issues/3
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 6: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/jina-ai/reader/issues/2
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 7: Configuration risk requires verification

- Trigger: Developers should check this configuration risk before relying on the project: Improve content extraction logic to handle dynamic and hidden elements
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: Improve content extraction logic to handle dynamic and hidden elements. Context: Observed when using Playwright.
- Why it matters: Developers may misconfigure credentials, environment, or host setup: Improve content extraction logic to handle dynamic and hidden elements
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/1242
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 8: Configuration risk requires verification

- Trigger: Developers should check this configuration risk before relying on the project: Respect robots.txt and identify your system
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: Respect robots.txt and identify your system. Context: Source discussion did not expose a precise runtime context.
- Why it matters: Developers may misconfigure credentials, environment, or host setup: Respect robots.txt and identify your system
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/4
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.
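
For readers unfamiliar with what "respect robots.txt and identify your system" involves in practice, the sketch below checks a URL against robots.txt rules under a named user agent, using Python's standard `urllib.robotparser`. It illustrates the general technique only. Whether and how the project does this is exactly what the linked issue leaves open, and `may_fetch` and the `ExampleReaderBot/1.0` agent string are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against already-downloaded robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules, parsed from a string so no network access is needed.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical identifying agent string, sent so site owners can recognize the crawler.
AGENT = "ExampleReaderBot/1.0"

blocked = may_fetch(EXAMPLE_ROBOTS, AGENT, "https://example.com/private/report")
allowed = may_fetch(EXAMPLE_ROBOTS, AGENT, "https://example.com/public")
```

Under these example rules, `blocked` is `False` and `allowed` is `True`.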

### Constraint 9: Configuration risk requires verification

- Trigger: Developers should check this configuration risk before relying on the project: support docker deployment
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: support docker deployment. Context: Observed when using Docker.
- Why it matters: Developers may misconfigure credentials, environment, or host setup: support docker deployment
- Evidence: failure_mode_cluster:github_issue | https://github.com/jina-ai/reader/issues/2
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 10: Capability evidence risk requires verification

- Trigger: The pack assumes that the README/documentation is current enough for a first validation pass; this assumption is unverified.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/jina-ai/reader
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.