# opendataloader-pdf - Doramagic AI Context Pack

> Purpose: pre-work context for the user's host AI. This pack does not prove that the project has been installed, run, or validated.

## Project

- canonical_name: `opendataloader-project/opendataloader-pdf`
- capability: PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.
- expected_user_outcome: PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

## Operating Boundaries

- Do not claim that the project has been installed, run, called through an API, or used on local files unless separate evidence proves it.
- Project facts must come from repo evidence, Claim Graph, or explicit source references.
- When a capability is not verified, mark it as unverified instead of completing it as fact.
- publish_status: `publishable`
- blocking_gaps: none

---

## Doramagic Context Augmentation

The following sections strengthen the repository context for a host AI. Human Manual data is a reading route, and pitfall notes become operating constraints.

## Human Manual Outline

Usage rule: this is only a reading route and salience signal, not factual authority. Concrete claims must still return to repo evidence or Claim Graph.

Host AI hard rules:
- Do not treat page titles, section order, summaries, or importance values as factual project evidence.
- When explaining the Human Manual outline, state that it is only a reading route or salience signal.
- Capability, installation, compatibility, runtime state, and risk claims must cite repo evidence, source paths, or Claim Graph.

- **Project Overview and System Architecture**: importance `high`
  - source_paths: README.md, java/pom.xml, java/opendataloader-pdf-core/pom.xml, java/opendataloader-pdf-cli/pom.xml, java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/api/OpenDataLoaderPDF.java
- **Core Processing Pipeline and PDF Element Detection**: importance `high`
  - source_paths: java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java, java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/HeadingProcessor.java, java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/TableBorderProcessor.java, java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/ClusterTableProcessor.java, java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/ListProcessor.java
- **Hybrid AI Mode, Output Generators, and JSON Schema**: importance `high`
  - source_paths: java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/HybridDocumentProcessor.java, java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/hybrid/HybridClient.java, java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/hybrid/HybridConfig.java, java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/hybrid/DoclingFastServerClient.java, java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/hybrid/HancomClient.java
- **Language SDKs, CLI, and Build/Operations**: importance `high`
  - source_paths: python/opendataloader-pdf/src/opendataloader_pdf/wrapper.py, python/opendataloader-pdf/src/opendataloader_pdf/runner.py, python/opendataloader-pdf/src/opendataloader_pdf/hybrid_server.py, python/opendataloader-pdf-mcp/src/opendataloader_pdf_mcp/server.py, node/opendataloader-pdf/src/index.ts

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `0448684162c27a27aa4838eec8fd42274ed60c08`
- inspected_files: `README.md`, `package.json`, `docs/hybrid/docling-speed-optimization-plan.md`, `docs/hybrid/experiments/chunking_strategy/conclusion.json`, `docs/hybrid/experiments/chunking_strategy/docling_benchmark_report.json`, `docs/hybrid/experiments/chunking_strategy/docling_page_range_benchmark.py`, `docs/hybrid/experiments/speed/baseline_results.json`, `docs/hybrid/experiments/speed/fastapi_results.json`, `docs/hybrid/experiments/speed/speed-experiment-2026-01-03.md`, `docs/hybrid/experiments/speed/subprocess_results.json`, `docs/hybrid/experiments/triage/triage-experiments.md`, `docs/hybrid/hybrid-mode-design.md`, `docs/hybrid/hybrid-mode-tasks.md`, `docs/hybrid/research/comparison-summary.md`, `docs/hybrid/research/docling-openapi.json`, `docs/hybrid/research/docling-sample-response-lorem.json`, `docs/hybrid/research/docling-sample-response.json`, `docs/hybrid/research/iobject-structure.md`, `docs/hybrid/research/opendataloader-sample-response.json`, `docs/hybrid/research/opendataloader-sample-response.md`

Host AI hard rules:
- Without repo_clone_verified=true, do not claim that the source code has been read.
- Without repo_inspection_verified=true, do not write README, docs, or package-file conclusions as facts.
- Without quick_start_verified=true, do not claim that the Quick Start path has run successfully.
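
The three rules above act as gates: each flag is a necessary condition for one class of claim. A minimal sketch of that gating follows. It assumes that a flag missing from this pack (here `quick_start_verified`) counts as false, and the claim labels and the `blocked_claims` helper are hypothetical names chosen for illustration, not part of the project or of Doramagic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InspectionEvidence:
    """Verification flags; field names mirror the keys used in this pack."""
    repo_clone_verified: bool = False
    repo_inspection_verified: bool = False
    # Assumption: a flag that the pack does not list is treated as False.
    quick_start_verified: bool = False


def blocked_claims(evidence: InspectionEvidence) -> set:
    """Return the claim classes (hypothetical labels) the host AI must not make."""
    blocked = set()
    if not evidence.repo_clone_verified:
        blocked.add("source_code_was_read")
    if not evidence.repo_inspection_verified:
        blocked.add("readme_docs_package_conclusions_as_fact")
    if not evidence.quick_start_verified:
        blocked.add("quick_start_ran_successfully")
    return blocked


# Flags as recorded under "Repo Inspection Evidence" in this pack.
pack_evidence = InspectionEvidence(
    repo_clone_verified=True,
    repo_inspection_verified=True,
)
print(sorted(blocked_claims(pack_evidence)))
```

A cleared gate only removes a prohibition. Any specific claim still has to cite repo evidence, source paths, or Claim Graph.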

## Doramagic Pitfall Constraints

These rules come from Doramagic discovery, validation, or compilation findings. The host AI must treat them as operating constraints, not background notes.

### Constraint 1: Configuration risk requires verification

- Trigger: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/opendataloader-project/opendataloader-pdf/issues/566
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 2: Capability evidence risk requires verification

- Trigger: Project evidence flags a capability evidence risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/opendataloader-project/opendataloader-pdf/issues/414
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 3: Runtime risk requires verification

- Trigger: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/opendataloader-project/opendataloader-pdf/issues/428
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 4: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/opendataloader-project/opendataloader-pdf/issues/440
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 5: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/opendataloader-project/opendataloader-pdf/issues/528
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 6: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/opendataloader-project/opendataloader-pdf/issues/578
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 7: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/opendataloader-project/opendataloader-pdf/issues/584
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 8: Configuration risk requires verification

- Trigger: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/opendataloader-project/opendataloader-pdf/issues/548
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 9: Capability evidence risk requires verification

- Trigger: The pack assumes, without verification, that the README/documentation is current enough for a first validation pass.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/opendataloader-project/opendataloader-pdf
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 10: Maintenance risk requires verification

- Trigger: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/opendataloader-project/opendataloader-pdf/issues/581
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.
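
Every constraint above shares the same hard boundary: a pitfall stays open until later evidence explicitly closes it. The sketch below shows one way a host AI could hold a constraint as a record and derive its status. The `PitfallConstraint` class, its `closing_evidence` field, and the status strings are hypothetical illustrations, not a Doramagic data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PitfallConstraint:
    number: int
    risk: str
    evidence: str
    # Hypothetical field: a reference that explicitly closes the pitfall.
    closing_evidence: Optional[str] = None


def status_label(constraint: PitfallConstraint) -> str:
    """Report a pitfall as closed only when closing evidence is recorded."""
    return "closed" if constraint.closing_evidence else "unverified"


# Constraint 3 as listed in this pack; no closing evidence is recorded for it.
runtime_risk = PitfallConstraint(
    number=3,
    risk="runtime",
    evidence="https://github.com/opendataloader-project/opendataloader-pdf/issues/428",
)
print(status_label(runtime_risk))
```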
