# PaddleOCR - Doramagic AI Context Pack

> Purpose: pre-work context for the user's host AI. This pack does not prove that the project has been installed, run, or validated.

## Project

- canonical_name: `PaddlePaddle/PaddleOCR`
- capability: Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
- expected_user_outcome: Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

## Operating Boundaries

- Do not claim that the project has been installed, run, called through an API, or used on local files unless separate evidence proves it.
- Project facts must come from repo evidence, Claim Graph, or explicit source references.
- When a capability is not verified, mark it as unverified instead of completing it as fact.
- publish_status: `publishable`
- blocking_gaps: none

---

## Doramagic Context Augmentation

The following sections strengthen the repository context for a host AI. Human Manual data is a reading route, and pitfall notes become operating constraints.

## Human Manual Outline

Usage rule: this is only a reading route and salience signal, not factual authority. Concrete claims must still return to repo evidence or Claim Graph.

Host AI hard rules:
- Do not treat page titles, section order, summaries, or importance values as factual project evidence.
- When explaining the Human Manual outline, state that it is only a reading route or salience signal.
- Capability, installation, compatibility, runtime state, and risk claims must cite repo evidence, source paths, or Claim Graph.

- **Repository Overview and System Architecture**: importance `high`
  - source_paths: README.md, paddleocr/__init__.py, paddleocr/_abstract.py, paddleocr/__main__.py, paddleocr/_env.py
- **Core Pipelines and Models (PP-OCR, PP-StructureV3, PaddleOCR-VL)**: importance `high`
  - source_paths: paddleocr/_pipelines/ocr.py, paddleocr/_pipelines/pp_structurev3.py, paddleocr/_pipelines/paddleocr_vl.py, paddleocr/_pipelines/doc_understanding.py, paddleocr/_pipelines/formula_recognition.py
- **Deployment, SDKs, and Integrations**: importance `high`
  - source_paths: paddleocr/_api_client/client.py, paddleocr/_api_client/async_client.py, paddleocr/_api_client/cli.py, paddleocr/_cli.py, deploy/cpp_infer/src/api/pipelines/ocr.cc
- **Configuration, Training, and Customization**: importance `high`
  - source_paths: ppocr/__init__.py, ppocr/data/__init__.py, ppocr/modeling/__init__.py, ppocr/optimizer/__init__.py, ppocr/losses/__init__.py

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `1af0448a200eae430b9addb40e7118d67f9840ab`
- inspected_files: `README.md`, `pyproject.toml`, `requirements.txt`, `docs/FAQ.en.md`, `docs/FAQ.md`, `docs/community/code_and_doc.en.md`, `docs/community/code_and_doc.md`, `docs/community/community_contribution.en.md`, `docs/community/community_contribution.md`, `docs/data_anno_synth/data_annotation.en.md`, `docs/data_anno_synth/data_annotation.md`, `docs/data_anno_synth/data_synthesis.en.md`, `docs/data_anno_synth/data_synthesis.md`, `docs/data_anno_synth/x_anylabeling.en.md`, `docs/data_anno_synth/x_anylabeling.md`, `docs/datasets/datasets.en.md`, `docs/datasets/datasets.md`, `docs/datasets/handwritten_datasets.en.md`, `docs/datasets/handwritten_datasets.md`, `docs/datasets/kie_datasets.en.md`

Host AI hard rules:
- Without repo_clone_verified=true, do not claim that the source code has been read.
- Without repo_inspection_verified=true, do not write README, docs, or package-file conclusions as facts.
- Without quick_start_verified=true, do not claim that the Quick Start path has run successfully.
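
The three rules above behave like a gate keyed on verification flags. The following is a minimal sketch of that gate, assuming the flags arrive as a plain dictionary. The function name `permitted_claims` and the claim labels are hypothetical illustrations, not part of Doramagic or PaddleOCR.

```python
# Hypothetical mapping from a verification flag to the claim it unlocks.
# The flag names come from this pack; the claim labels are illustrative only.
FLAG_TO_CLAIM = {
    "repo_clone_verified": "source code has been read",
    "repo_inspection_verified": "README, docs, and package files may be cited as facts",
    "quick_start_verified": "Quick Start path has run successfully",
}


def permitted_claims(flags: dict[str, bool]) -> list[str]:
    """Return the claims a host AI may make; a missing flag counts as False."""
    return [claim for flag, claim in FLAG_TO_CLAIM.items() if flags.get(flag) is True]


# Flags as recorded in this pack: quick_start_verified is not present.
pack_flags = {"repo_clone_verified": True, "repo_inspection_verified": True}
claims = permitted_claims(pack_flags)
```

Treating an absent flag as `False` is a deliberate choice in this sketch: the pack never lists `quick_start_verified`, so the Quick Start claim stays locked by default.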

## Doramagic Pitfall Constraints

These rules come from Doramagic discovery, validation, or compilation findings. The host AI must treat them as operating constraints, not background notes.

### Constraint 1: Installation risk requires verification

- Trigger: Developers should check this installation risk before relying on the project: Link Checker Report
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: Link Checker Report. Context: Observed when using Python
- Why it matters: Developers may fail before the first successful local run: Link Checker Report
- Evidence: failure_mode_cluster:github_issue | https://github.com/PaddlePaddle/PaddleOCR/issues/18134, failure_mode_cluster:github_issue | https://github.com/PaddlePaddle/PaddleOCR/issues/18131, failure_mode_cluster:github_issue | https://github.com/PaddlePaddle/PaddleOCR/issues/18126, failure_mode_cluster:github_issue | https://github.com/PaddlePaddle/PaddleOCR/issues/18122, failure_mode_cluster:github_issue | https://github.com/PaddlePaddle/PaddleOCR/issues/18103
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 2: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: failure_mode_cluster:github_issue | https://github.com/PaddlePaddle/PaddleOCR/issues/17974
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.
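
The "isolated environment" rule in this constraint can be prepared as a list of commands that are built but not executed. This is a sketch under stated assumptions: the distribution name `paddleocr` is assumed to match the `paddleocr/` package directory listed in this pack, and the helper `isolated_install_commands` is hypothetical. The official install steps must still be taken from the repository's own documentation.

```python
import sys
from pathlib import Path


def isolated_install_commands(env_dir: str, package: str = "paddleocr") -> list[list[str]]:
    """Build (but do not run) the commands for a clean virtual-environment check.

    `package` is an assumption; confirm the real install target in the README.
    """
    # Virtual environments place the interpreter in Scripts/ on Windows, bin/ elsewhere.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    env_python = str(Path(env_dir) / bin_dir / "python")
    return [
        [sys.executable, "-m", "venv", env_dir],        # create the isolated environment
        [env_python, "-m", "pip", "install", package],  # install inside it
        [env_python, "-c", f"import {package}"],        # smoke-test the import only
    ]


commands = isolated_install_commands("paddleocr-check-env")
```

Each entry can be passed to `subprocess.run(cmd, check=True)` once the user agrees to run it. Until that happens and succeeds, the install remains unverified under this pack's rules.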

### Constraint 3: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/PaddlePaddle/PaddleOCR/issues/18157
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 4: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/PaddlePaddle/PaddleOCR/issues/18194
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 5: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/PaddlePaddle/PaddleOCR/issues/17974
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 6: Configuration risk requires verification

- Trigger: Developers should check this configuration risk before relying on the project: Link Checker Report
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: Link Checker Report. Context: Source discussion did not expose a precise runtime context.
- Why it matters: Developers may misconfigure credentials, environment, or host setup: Link Checker Report
- Evidence: failure_mode_cluster:github_issue | https://github.com/PaddlePaddle/PaddleOCR/issues/18157
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 7: Configuration risk requires verification

- Trigger: Developers should check this configuration risk before relying on the project: PaddleOCR-VL HPS: returnMarkdownImages=false is ineffective with default PaddleX 3.6 SDK
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: PaddleOCR-VL HPS: returnMarkdownImages=false is ineffective with default PaddleX 3.6 SDK. Context: Observed when using Python, Docker
- Why it matters: Developers may misconfigure credentials, environment, or host setup: PaddleOCR-VL HPS: returnMarkdownImages=false is ineffective with default PaddleX 3.6 SDK
- Evidence: failure_mode_cluster:github_issue | https://github.com/PaddlePaddle/PaddleOCR/issues/18194
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 8: Capability evidence risk requires verification

- Trigger: This pack assumes, without verification, that the README/documentation is current enough for a first validation pass.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/PaddlePaddle/PaddleOCR
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 9: Maintenance risk requires verification

- Trigger: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/PaddlePaddle/PaddleOCR
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 10: Security or permission risk requires verification

- Trigger: `no_demo`
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/PaddlePaddle/PaddleOCR
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.
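
Several constraints above cite the same GitHub issue, and every constraint stays open until explicit closing evidence appears. A minimal sketch of both ideas follows, using the issue numbers listed in the Evidence lines of Constraints 1 to 7. The helper names and the `closing_evidence` structure are hypothetical.

```python
from collections import defaultdict

# Issue numbers cited as evidence by Constraints 1-7 in this pack.
# Constraints 8-10 cite only the repository URL and are omitted here.
CONSTRAINT_ISSUES = {
    1: [18134, 18131, 18126, 18122, 18103],
    2: [17974],
    3: [18157],
    4: [18194],
    5: [17974],
    6: [18157],
    7: [18194],
}


def constraints_by_issue(mapping: dict[int, list[int]]) -> dict[int, list[int]]:
    """Invert the mapping: issue number -> constraints that cite it."""
    index = defaultdict(list)
    for constraint, issues in mapping.items():
        for issue in issues:
            index[issue].append(constraint)
    return dict(index)


def open_constraints(mapping: dict[int, list[int]], closing_evidence: dict[int, str]) -> list[int]:
    """A constraint stays open unless a non-empty closing reference is recorded for it."""
    return [c for c in mapping if not closing_evidence.get(c)]


index = constraints_by_issue(CONSTRAINT_ISSUES)
shared = {issue: cs for issue, cs in index.items() if len(cs) > 1}
still_open = open_constraints(CONSTRAINT_ISSUES, closing_evidence={})
```

With no closing evidence recorded, all seven constraints remain open, which matches the hard boundary stated on each one.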
