Doramagic Project Pack · Human Manual
PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Repository Overview and System Architecture
Related topics: Core Pipelines and Models (PP-OCR, PP-StructureV3, PaddleOCR-VL), Deployment, SDKs, and Integrations
1. Purpose and Scope
PaddleOCR is a multilingual, production-grade OCR and document-parsing toolkit built on top of PaddlePaddle. As described in the top-level README.md, the project "converts PDF documents and images into structured, LLM-ready data (JSON/Markdown) with industry-leading accuracy," and is positioned as the bedrock for RAG and Agentic applications, with 70k+ stars and adoption by projects such as Dify, RAGFlow, and Cherry Studio.
The repository organizes the system into three concentric layers:
- Algorithm core: PP-OCR family, PP-Structure, and the PaddleOCR-VL vision-language models.
- Engine layer: Python, C++, Paddle-Lite, ONNX, and PaddleServing inference stacks.
- SDK layer: first-party clients for Python, TypeScript, Go, and a browser bundle (paddleocr-js).
This separation allows the same model zoo to be reused across research notebooks, server-side services, and edge devices.
2. Capability Pillars
2.1 Scene OCR (PP-OCRv6)
The PP-OCR series is the global multilingual text-spotting flagship. According to the release notes embedded in README.md, PP-OCRv6 "supports 50 languages with a single unified" model and is the default text_type: general pipeline. The C++ reference configuration in deploy/cpp_infer/src/configs/OCR.yaml shows the canonical end-to-end recipe: a DocPreprocessor sub-pipeline (orientation + unwarping) followed by TextDetection (PP-OCRv6_medium_det), TextLineOrientation, and TextRecognition (PP-OCRv6_medium_rec).
2.2 Intelligent Document Parsing (PP-Structure & PaddleOCR-VL)
The repository exposes two complementary document-parsing approaches, both summarized in README.md:
- PP-StructureV3 — structure-aware conversion into Markdown or JSON, preserving cell-level coordinates.
- PaddleOCR-VL-1.6 (0.9B) — a NaViT-style dynamic-resolution VLM fused with ERNIE-4.5-0.3B, achieving 96.3% accuracy on OmniDocBench v1.6 and supporting 109–111 languages depending on minor version.
The structural sub-modules live under ppstructure/:
| Sub-module | Path | Purpose | Source |
|---|---|---|---|
| Layout analysis | ppstructure/layout/ | Region segmentation (text/title/figure/table) via PP-PicoDet | ppstructure/layout/README.md |
| Layout recovery | ppstructure/recovery/ | Restore images/PDFs into editable Word files | ppstructure/recovery/README.md |
| KIE | ppstructure/kie/ | Key Information Extraction via VI-LayoutXLM (SER + RE) | ppstructure/kie/README.md |
2.3 Community-driven demand
Community issues show which capabilities users push hardest on. Issue #1048, "Multilingual OCR Development Plan" (72 comments), drove the consolidation toward a single multi-language model in PP-OCRv6. Issue #1663 discusses padding around text-detection crops, the concern that motivates the limit_side_len, max_side_limit, and unclip_ratio parameters that appear in the C++ YAML above.
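As a rough illustration of what an unclip ratio controls in DB-style text detectors, the sketch below computes the offset distance used to expand a detected polygon, following the formula from the DB paper (distance = area × ratio / perimeter). The function name `unclip_distance` is hypothetical, and this is not PaddleOCR's own code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def unclip_distance(polygon: List[Point], unclip_ratio: float = 1.5) -> float:
    """Offset distance for expanding a polygon: area * ratio / perimeter."""
    n = len(polygon)
    area2 = 0.0      # twice the signed area (shoelace formula)
    perimeter = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    if perimeter == 0:
        raise ValueError("degenerate polygon")
    return abs(area2) / 2.0 * unclip_ratio / perimeter


# A 100 x 20 px text box: 100 * 20 * 1.5 / 240 = 12.5 px of expansion.
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
print(unclip_distance(box, unclip_ratio=1.5))
```

A larger ratio yields looser crops around each text line, which is why this parameter interacts with the padding sensitivity discussed in the issue.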
3. Deployment Topology
The deploy/README.md lists five official deployment schemes: Python inference, C++ inference, PaddleServing, Paddle-Lite (ARM CPU/OpenCL ARM GPU), and Paddle2ONNX. Each scheme consumes the same YAML-driven pipeline definition (see the pipeline_name: OCR example in deploy/cpp_infer/src/configs/OCR.yaml) but is compiled against a different runtime:
- C++ Inference — fastest server-side path, uses PaddleInference + TensorRT.
- Paddle-Lite — mobile/IoT path documented in deploy/lite/readme.md, which targets ARM7/ARM8 phones and depends on cross-compilation toolchains.
- Paddle2ONNX — produces interoperable ONNX models for non-Paddle runtimes.
flowchart LR
A[Image / PDF] --> B[DocPreprocessor]
B --> C[TextDetection]
C --> D[TextLineOrientation]
D --> E[TextRecognition]
E --> F[Structured Output]
subgraph "Optional post-processing"
F --> G[PP-StructureV3]
F --> H[PaddleOCR-VL]
F --> I[KIE: SER + RE]
end
4. SDK and Multi-Language Bindings
The api_sdk/ directory contains the official server/client SDKs that wrap the HTTP inference API. Per api_sdk/README.md, the supported languages and their source locations are:
| Language | Source location | Notes |
|---|---|---|
| Python | paddleocr/ | Reference SDK, tested via pytest tests/api_client/ |
| TypeScript | api_sdk/typescript/ | Node ≥ 18, built with tsup, tested with vitest (package.json) |
| Go | api_sdk/go/ | Tested via go test ./... |
A separate browser-oriented bundle lives in paddleocr-js/. Its package.json declares vitest, eslint, and prettier tooling, indicating it is a published client library intended for front-end integration with the official API endpoint rather than an in-browser inference runtime.
The SDK layer intentionally decouples clients from model evolution: the server may upgrade from PP-OCRv5 to PP-OCRv6 (as it did between v3.6 and v3.7.0) without breaking the TypeScript or Go clients as long as the JSON contract is preserved. Recent issues such as #18194 (PaddleOCR-VL HPS — returnMarkdownImages=false ineffective with default PaddleX 3.6 SDK) confirm that contract drift between the PaddleX inference SDK and the hosted API is an active integration risk worth tracking when pinning versions.
5. Configuration Reference (PP-OCR C++ pipeline)
The single most representative configuration in the repo is the C++ inference YAML for PP-OCR. Excerpted from deploy/cpp_infer/src/configs/OCR.yaml:
| Section | Key | Value | Purpose |
|---|---|---|---|
| Top level | text_type | general | Selects the OCR pipeline |
| DocPreprocessor | use_doc_orientation_classify | True | Enables PP-LCNet_x1_0_doc_ori |
| DocPreprocessor | use_doc_unwarping | True | Enables UVDoc |
| TextDetection | model_name | PP-OCRv6_medium_det | Default detector |
| TextDetection | limit_side_len / max_side_limit | 64 / 4000 | Bounds input resizing (a minimum side length when limit_type is min, plus a cap on the longest side); related to the cropping-padding concern raised in #1663 |
| TextDetection | thresh / box_thresh / unclip_ratio | 0.3 / 0.6 / 1.5 | Standard DB++ post-processing |
| TextRecognition | model_name | PP-OCRv6_medium_rec | Default recognizer |
| TextRecognition | batch_size | 6 | Throughput knob |
| TextRecognition | score_thresh | 0.0 | Confidence threshold for discarding low-confidence text; 0.0 keeps every result |
Every model_dir: null entry means PaddleX will resolve the artifact from its model zoo at first run, which is the convention all other YAMLs in the project follow.
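To make the resize-related keys more concrete, here is a minimal sketch of how limit_side_len, limit_type, and max_side_limit could combine to pick the detector's input size. It is written from the parameter names alone: the function `det_resize_shape`, the stride of 32, and the exact rounding are assumptions, not the library's implementation.

```python
from typing import Tuple


def det_resize_shape(
    height: int,
    width: int,
    limit_side_len: int = 64,
    limit_type: str = "min",
    max_side_limit: int = 4000,
    stride: int = 32,  # assumed network stride; not taken from the YAML
) -> Tuple[int, int]:
    """Return the (height, width) an image would be resized to before detection."""
    if limit_type == "min":
        # Upscale only when the shorter side is below the limit.
        short = min(height, width)
        ratio = limit_side_len / short if short < limit_side_len else 1.0
    elif limit_type == "max":
        # Downscale only when the longer side exceeds the limit.
        longest_in = max(height, width)
        ratio = limit_side_len / longest_in if longest_in > limit_side_len else 1.0
    else:
        raise ValueError(f"unsupported limit_type: {limit_type}")

    new_h, new_w = height * ratio, width * ratio

    # Hard cap on the longer side, regardless of limit_type.
    longest = max(new_h, new_w)
    if longest > max_side_limit:
        scale = max_side_limit / longest
        new_h, new_w = new_h * scale, new_w * scale

    def snap(value: float) -> int:
        # Round to the nearest multiple of the stride, never below one stride.
        return max(int(round(value / stride)) * stride, stride)

    return snap(new_h), snap(new_w)


print(det_resize_shape(32, 100))     # small strip is upscaled
print(det_resize_shape(5000, 1000))  # tall scan is capped at max_side_limit
```

Under these assumptions, a 64-pixel minimum mainly protects very small inputs, while the 4000-pixel cap bounds memory use on large scans.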
See Also
- PP-OCRv6 Release Notes (v3.7.0) — accuracy and multilingual details.
- PP-StructureV3 & PaddleOCR-VL-1.6 — document-parsing flagship models.
- Deployment Overview — Python, C++, Serving, Lite, and ONNX paths.
- Official API SDKs — Python / TypeScript / Go clients.
- KIE Guide — SER + RE for form understanding.
Source: https://github.com/PaddlePaddle/PaddleOCR / Human Manual
Core Pipelines and Models (PP-OCR, PP-StructureV3, PaddleOCR-VL)
Related topics: Repository Overview and System Architecture, Deployment, SDKs, and Integrations, Configuration, Training, and Customization
1. Purpose and Scope
PaddleOCR exposes three primary solution families through its Python pipeline layer, each addressing a different class of document-understanding workload:
- PP-OCRv6: a fast, multilingual scene-text spotting stack with about 34.5M parameters that covers 50+ languages in a single unified model (Source: README.md).
- PP-StructureV3: a structure-aware converter that turns complex PDFs and images into Markdown or JSON with fine-grained coordinates (table cells, text blocks) (Source: README.md).
- PaddleOCR-VL (0.9B): the flagship vision-language model for document parsing, achieving 96.3% on OmniDocBench v1.6 with structured Markdown/JSON output (Source: README.md).
The implementation surface is the paddleocr/_pipelines/ package, which contains thin orchestration classes — ocr.py, pp_structurev3.py, paddleocr_vl.py, plus auxiliary modules such as doc_understanding.py, formula_recognition.py, and seal_recognition.py. These pipelines share a common input/output contract so users can switch between them without rewriting client code.
2. Pipeline Architecture
The three pipelines are complementary rather than overlapping. The relationship is shown below.
graph LR
A[Image / PDF Input] --> B{Use case}
B -->|Scene text| C[PP-OCRv6]
B -->|Structured PDF / layout| D[PP-StructureV3]
B -->|VLM parsing| E[PaddleOCR-VL]
C --> F[Text + boxes]
D --> G[Markdown / JSON + cells]
E --> H[Markdown / JSON elements]
D -. layout .-> I[ppstructure/layout]
D -. table .-> J[ppstructure/table]
D -. recovery .-> K[ppstructure/recovery]
D -. KIE .-> L[ppstructure/kie]
PP-OCRv6 is the high-throughput path for plain text extraction. PP-StructureV3 composes four PP-Structure subsystems: layout analysis (Source: ppstructure/layout/README.md), table recognition (Source: ppstructure/table/README.md), layout recovery (Source: ppstructure/recovery/README.md), and Key Information Extraction (Source: ppstructure/kie/README.md). Together they produce document-level Markdown/JSON with explicit cell and block coordinates. PaddleOCR-VL collapses detection, recognition, layout, table, and formula tasks into a single end-to-end model when maximum accuracy on irregular layouts is required.
3. PP-OCRv6 Configuration
The canonical C++ configuration mirrors the Python pipeline and exposes every module name as a tunable parameter (Source: deploy/cpp_infer/src/configs/OCR.yaml).
| Module | Default model | Key knobs |
|---|---|---|
| DocOrientationClassify | PP-LCNet_x1_0_doc_ori | toggled via use_doc_preprocessor |
| DocUnwarping | UVDoc | toggled via use_doc_preprocessor |
| TextDetection | PP-OCRv6_medium_det | thresh, box_thresh, unclip_ratio, limit_side_len |
| TextLineOrientation | PP-LCNet_x1_0_textline_ori | use_textline_orientation, batch_size |
| TextRecognition | PP-OCRv6_medium_rec | score_thresh, batch_size |
Two top-level flags, use_doc_preprocessor and use_textline_orientation, control the doc preprocessor and textline orientation stages, so a deployment can disable orientation handling for already-clean scans without removing the YAML keys. The same composition is reflected in paddleocr/_pipelines/ocr.py, which is the Python entry point exposed to users (Source: paddleocr/_pipelines/ocr.py). A common production failure mode reported by the community is empty recognition output when the preprocessor strips content that the detector expects; tightening box_thresh and unclip_ratio, or disabling use_textline_orientation, is the documented workaround (cf. community issue #17974, "No text output from image recognition", original title "图片识别没有文字输出").
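The effect of the two flags can be sketched as a small helper that lists which stages would run. The stage names come from the configuration table; the function `active_stages` is hypothetical and only models the toggling logic, not the real pipeline class.

```python
from typing import List


def active_stages(use_doc_preprocessor: bool = True,
                  use_textline_orientation: bool = True) -> List[str]:
    """List the stages that would run for a given pair of top-level flags."""
    stages: List[str] = []
    if use_doc_preprocessor:
        stages.append("DocPreprocessor")      # orientation classify + unwarping
    stages.append("TextDetection")            # always runs
    if use_textline_orientation:
        stages.append("TextLineOrientation")
    stages.append("TextRecognition")          # always runs
    return stages


# Clean, upright scans: skip both optional stages.
print(active_stages(use_doc_preprocessor=False, use_textline_orientation=False))
# → ['TextDetection', 'TextRecognition']
```

When debugging empty output, comparing results with the optional stages switched off one at a time is a cheap way to locate the stage that drops the text.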
4. PP-StructureV3 and Auxiliary Pipelines
pp_structurev3.py is the orchestrator that wires the four ppstructure/* submodules into a single end-to-end document-parsing call (Source: paddleocr/_pipelines/pp_structurev3.py). Its main inputs are an image or PDF directory, model directories for layout/table/KIE, and dictionary paths; outputs are Markdown plus an HTML table string and per-element JSON (Source: ppstructure/recovery/README.md).
Specialized pipelines complement it:
- doc_understanding.py: language-model-based semantic parsing of detected regions.
- formula_recognition.py: converts mathematical expressions to LaTeX.
- seal_recognition.py: handles stamp / seal text extraction, a capability highlighted in the v3.4.0 release notes (Source: README.md).
KIE is built on top of LayoutXLM and VI-LayoutXLM, supporting Semantic Entity Recognition (SER) and Relation Extraction (RE), and integrates the PP-OCR inference engine for OCR preprocessing (Source: ppstructure/kie/README.md). On the Chinese XFUND benchmark, VI-LayoutXLM reaches 93.19% Hmean at 15.49 ms / image (Source: ppstructure/kie/README.md).
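For readers unfamiliar with the Hmean metric quoted above, it is the harmonic mean of precision and recall (the same quantity as F1). The short sketch below shows the computation; the input values are illustrative only and are not taken from the XFUND benchmark.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (equivalent to the F1 score)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is predicted correctly
    return 2 * precision * recall / (precision + recall)


# Illustrative values, not benchmark results.
print(round(hmean(0.90, 0.80), 4))
```

Because the harmonic mean is dominated by the smaller of the two inputs, a high Hmean requires both precision and recall to be high.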
5. PaddleOCR-VL and the v3.7.0 Stack
paddleocr_vl.py wraps the PaddleOCR-VL-0.9B model, which combines a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model to handle text, tables, formulas, and charts in 109+ languages (Source: README.md). The model is the recommended default when users need unified element recognition without per-task model switching.
The v3.7.0 release notes (June 2026) highlight that PP-OCRv6 now achieves +4.6% detection and +5.1% recognition improvements over PP-OCRv5_server while "surpassing mainstream VLMs (Qwen3-VL-235B, GPT-5.5) with only 34.5M parameters", a positioning explicitly aimed at users who previously assumed VLMs were always superior (Source: README.md). A known incompatibility is that returnMarkdownImages=false is currently ineffective under the default PaddleX 3.6 SDK (community issue #18194), so callers relying on HPS output must pin a compatible SDK version until the issue is resolved.
6. Deployment and SDK Surface
PaddleOCR ships multiple runtime targets so the same pipeline can be reached from different stacks (Source: deploy/README.md):
- Python inference via the paddleocr package.
- C++ inference configured through deploy/cpp_infer/src/configs/OCR.yaml (Source: deploy/cpp_infer/src/configs/OCR.yaml).
- HubServing exposing nine service modules on ports 8865–8872, including ocr_det, ocr_cls, ocr_rec, ocr_system, structure_table, structure_system, structure_layout, kie_ser, and kie_ser_re (Source: deploy/hubserving/readme.md).
- Official API SDKs in Python, TypeScript (api_sdk/typescript/package.json, Node ≥ 18, Apache-2.0), and Go (Source: api_sdk/README.md and api_sdk/typescript/package.json).
When choosing among the three core pipelines, the practical rule of thumb is: use PP-OCRv6 when speed and language breadth matter most; use PP-StructureV3 when downstream consumers need cell-level coordinates, recoverable Word output, or KIE; use PaddleOCR-VL when document layouts are highly irregular (skewed, warped, photographed) and structured Markdown is the primary deliverable (Source: README.md).
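The rule of thumb can be written down as a small decision helper. This is only a sketch: the function `choose_pipeline` and its parameter names are hypothetical, and the precedence (hard structural requirements first, then layout irregularity) is an assumption that the text does not spell out.

```python
def choose_pipeline(needs_cell_coordinates: bool = False,
                    needs_word_recovery: bool = False,
                    needs_kie: bool = False,
                    irregular_layout: bool = False) -> str:
    """Map workload requirements to one of the three core pipelines."""
    # Structural requirements are treated as hard constraints (assumed precedence).
    if needs_cell_coordinates or needs_word_recovery or needs_kie:
        return "PP-StructureV3"
    # Skewed, warped, or photographed pages with Markdown as the deliverable.
    if irregular_layout:
        return "PaddleOCR-VL"
    # Default: speed and language breadth.
    return "PP-OCRv6"


print(choose_pipeline(irregular_layout=True))
# → PaddleOCR-VL
```

In practice the choice is rarely this clean, so it is worth benchmarking two candidates on a sample of real documents before committing.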
See Also
- PP-OCR Deployment Guide: deploy/README.md
- Layout Analysis Module: ppstructure/layout/README.md
- Table Recognition Module: ppstructure/table/README.md
- Layout Recovery Module: ppstructure/recovery/README.md
- Key Information Extraction: ppstructure/kie/README.md
- Official API SDKs: api_sdk/README.md
- HubServing Module Reference: deploy/hubserving/readme.md
Source: https://github.com/PaddlePaddle/PaddleOCR / Human Manual
Deployment, SDKs, and Integrations
Related topics: Repository Overview and System Architecture, Core Pipelines and Models (PP-OCR, PP-StructureV3, PaddleOCR-VL)
1. Overview
PaddleOCR is a multilingual, document-parsing OCR toolkit that converts PDFs and images into structured, LLM-ready Markdown or JSON. Beyond its core inference engines, the project ships a layered deployment and integration surface that targets three audiences: server-side integrators who need REST or gRPC serving, application developers who consume Python/TypeScript/Go/JavaScript SDKs, and edge/mobile teams that deploy via Paddle-Lite or native Android. Source: README.md.
The repository organizes this surface into five concrete sub-trees: deploy/ for native and serving targets, api_sdk/ for the official PaddleOCR Cloud API client packages, paddleocr-js/ for the browser-oriented JavaScript client, ppstructure/ for downstream document-AI modules, and deploy/android_demo/ for the on-device Android sample.
2. Server-Side Deployment
2.1 Paddle Deployment Matrix
PaddleOCR supports a range of server-side deployment options through the deploy/ directory. According to deploy/README.md, the supported schemes are:
| Deployment Target | Use Case | Source Path |
|---|---|---|
| Python inference | Quick prototyping, batch scripts | doc/doc_en/inference_ppocr_en.md |
| C++ inference | High-throughput production servers | deploy/cpp_infer/readme.md |
| Paddle Serving (Python/C++) | REST/gRPC microservice | deploy/pdserving/README.md |
| Paddle2ONNX | Export to ONNX for cross-framework use | deploy/paddle2onnx/readme.md |
| Paddle-Lite | ARM CPU / OpenCL ARM GPU | deploy/lite/readme.md |
The deployment overview explicitly notes that the PaddlePaddle runtime "provides a variety of deployment schemes to meet the deployment requirements of different scenarios" and refers users to the diagram at ../doc/deployment_en.png for selection guidance. Source: deploy/README.md.
2.2 Paddle-Lite Mobile Path
For on-device deployment, deploy/lite/readme.md describes a two-phase flow: (1) prepare a cross-compilation environment (Docker, Linux, or other supported toolchains) and a Paddle-Lite toolchain, then (2) optimize the inference model with Paddle-Lite's converter and run the resulting model on an ARM7/ARM8 phone. Paddle-Lite itself is positioned as "a lightweight inference engine for PaddlePaddle" that targets mobile and IoT form factors, supporting cross-platform hardware acceleration. Source: deploy/lite/readme.md.
3. Official API SDKs
3.1 Multi-Language Client Packages
The api_sdk/ directory hosts the first-party SDKs that wrap the hosted PaddleOCR Cloud API. The package locations are summarized in api_sdk/README.md:
| Language | Source Location | User Docs |
|---|---|---|
| Python | ../paddleocr | docs/version3.x/inference_deployment/serving/paddleocr_official_api/python.md |
| TypeScript | api_sdk/typescript | docs/version3.x/inference_deployment/serving/paddleocr_official_api/typescript.md |
| Go | api_sdk/go | docs/version3.x/inference_deployment/serving/paddleocr_official_api/go.md |
Each language binding is validated through its own test runner: python -m pytest tests/api_client/, npm run lint && npm test for TypeScript, and go test ./... for Go. Source: api_sdk/README.md.
3.2 TypeScript and JavaScript Build Profiles
The TypeScript SDK is built with tsup and typed against @types/node ^25.9.1 on Node >=18, with vitest as its test runner. It targets the paddleocr keyword space covering ocr, document-parsing, api-sdk, typescript, and official-api. Source: api_sdk/typescript/package.json.
The browser-oriented paddleocr-js/ package uses vitest ^3.2.4 for testing, eslint with typescript-eslint ^8.57.2 for linting, and prettier ^3.8.1 for formatting, with lint-staged configured to run eslint --fix and prettier --write on staged files. Source: paddleocr-js/package.json.
4. Edge and Mobile: Android Demo
The Android sample under deploy/android_demo/ ships a native C++ pipeline that performs polygon clipping for text-region processing. The C++ source wraps a translated Delphi Clipper library, exposed via ocr_clipper.hpp with the namespace ClipperLib and version string CLIPPER_VERSION "6.4.2". Source: deploy/android_demo/app/src/main/cpp/ocr_clipper.hpp.
The companion ocr_clipper.cpp defines the supporting scanline data structures (TEdge, IntPoint), works with the boolean clip operations (ctIntersection, ctUnion, ctDifference, ctXor), and sets constants such as pi = 3.141592653589793238 and def_arc_tolerance = 0.25. These primitives are the geometric foundation that the on-device pipeline uses to merge, intersect, or offset text polygons before recognition. Source: deploy/android_demo/app/src/main/cpp/ocr_clipper.cpp.
5. PP-Structure Downstream Modules
The ppstructure/ tree extends PaddleOCR into document-AI workflows and is tightly coupled to deployment, since the same pipelines can be served through the Python inference or C++ paths.
- Layout analysis provides Chinese, English, and table-region detection built on PaddleDetection's PP-PicoDet. Models are available in ppstructure/docs/models_list_en.md, and the README documents the PubLayNet and CDLA pre-training data download commands. Source: ppstructure/layout/README.md.
- Key Information Extraction (KIE) combines text detection, text recognition, semantic entity recognition (SER), and optional relationship extraction (RE) on top of the VI-LayoutXLM backbone, with pretrained models published in configs/kie/layoutlm_series/. Source: ppstructure/kie/README.md.
- Layout recovery offers two strategies for restoring an editable Word file: a pdf2docx-based path for standard PDFs and an image-format PDF path that combines layout analysis, table recognition, and rule-based parsing. Source: ppstructure/recovery/README.md.
6. Ecosystem Integrations
PaddleOCR is consumed by several top-tier open-source projects; the README badges list RAGFlow (deep document understanding), Pathway (real-time analytics and LLM pipelines), MinerU (multi-type document to Markdown), Umi-OCR (batch offline OCR), Cherry Studio (multi-LLM desktop client), and Haystack (deepset's RAG framework). These integrations typically consume the Python wheel directly or the PaddleOCR-VL/PP-OCRv6 model checkpoints, depending on the host project's deployment shape. Source: README.md.
7. Common Failure Modes
Community-reported issues that intersect with the deployment and SDK surface include:
- PaddleOCR-VL HPS option ignored on PaddleX 3.6: returnMarkdownImages=false does not take effect with the default PaddleX 3.6 SDK, requiring either an SDK upgrade or a workaround. Source: Issue #18194.
- No text output for image input: Symptom of misconfigured detection or recognition parameters at the SDK or serving layer. Source: Issue #17974.
- Windows + torch compatibility: OSError [WinError 127] when installing torch on Windows, which is a prerequisite for some PaddleOCR-VL pipelines. Source: Issue #14979.
- Detection crop padding sensitivity: Long detection crops with large surrounding padding (≈5 px) degrade recognition; a tighter 1–2 px bounding box via OpenCV post-processing is the community-recommended workaround. Source: Issue #1663.
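The crop-tightening workaround from the last item can be sketched without any dependencies. The issue discussion mentions OpenCV; the version below deliberately swaps in plain Python lists so it runs anywhere. The function `tight_crop_box`, the ink threshold of 128, and the dark-text-on-light-background assumption are all illustrative choices.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows; 0 = black ink, 255 = white background


def tight_crop_box(image: Image, margin: int = 2,
                   ink_threshold: int = 128) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) around dark pixels; bottom/right exclusive."""
    height, width = len(image), len(image[0])
    rows = [r for r, row in enumerate(image) if any(p < ink_threshold for p in row)]
    cols = [c for c in range(width) if any(row[c] < ink_threshold for row in image)]
    if not rows or not cols:
        return 0, 0, height, width  # no ink found: keep the original crop
    top = max(rows[0] - margin, 0)
    left = max(cols[0] - margin, 0)
    bottom = min(rows[-1] + 1 + margin, height)
    right = min(cols[-1] + 1 + margin, width)
    return top, left, bottom, right


# A 12 x 20 crop with ink in rows 5-6 and columns 8-11, surrounded by wide padding.
crop = [[255] * 20 for _ in range(12)]
for r in (5, 6):
    for c in range(8, 12):
        crop[r][c] = 0

top, left, bottom, right = tight_crop_box(crop, margin=2)
tight = [row[left:right] for row in crop[top:bottom]]
print((top, left, bottom, right), len(tight), len(tight[0]))
```

With real images the same idea is usually expressed with array operations, and the threshold needs tuning for light-on-dark text or noisy backgrounds.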
Source: https://github.com/PaddlePaddle/PaddleOCR / Human Manual
Configuration, Training, and Customization
Related topics: Core Pipelines and Models (PP-OCR, PP-StructureV3, PaddleOCR-VL), Deployment, SDKs, and Integrations
Overview and Scope
PaddleOCR is a multilingual OCR and document-parsing toolkit that ships a layered configuration and training system. Users can adopt pretrained models out of the box, or retrain and customize virtually every component — text detection, recognition, layout analysis, table recognition, key information extraction (KIE), and VLM-based parsing — to fit domain-specific data. The customization surface is exposed through three primary channels: YAML pipeline definitions, configuration files for individual modules, and per-language scripts under ppstructure/ Source: [README.md].
The project supports PP-OCRv6, PaddleOCR-VL, and PP-StructureV3 as headline models, and provides unified configuration paths for them. Customization typically follows a "config first, then train, then deploy" pattern.
Pipeline Configuration
PaddleOCR's production pipeline is described by a single YAML file that maps model names, module names, and hyperparameters. The canonical example is the C++ inference configuration Source: [deploy/cpp_infer/src/configs/OCR.yaml]:
pipeline_name: OCR
text_type: general
use_doc_preprocessor: True
use_textline_orientation: True
SubPipelines:
  DocPreprocessor:
    pipeline_name: doc_preprocessor
    use_doc_orientation_classify: True
    use_doc_unwarping: True
    SubModules:
      DocOrientationClassify:
        module_name: doc_text_orientation
        model_name: PP-LCNet_x1_0_doc_ori
      DocUnwarping:
        module_name: image_unwarping
        model_name: UVDoc
SubModules:
  TextDetection:
    module_name: text_detection
    model_name: PP-OCRv6_medium_det
    limit_side_len: 64
    limit_type: min
    thresh: 0.3
    box_thresh: 0.6
    unclip_ratio: 1.5
  TextRecognition:
    module_name: text_recognition
    model_name: PP-OCRv6_medium_rec
    batch_size: 6
    score_thresh: 0.0
Key configuration patterns observed in the YAML:
| Field | Purpose | Example Value |
|---|---|---|
| pipeline_name | Declares the high-level pipeline | OCR, doc_preprocessor |
| use_doc_preprocessor | Toggles orientation classification + unwarping | True |
| model_name | Selects a pretrained model checkpoint | PP-OCRv6_medium_det |
| module_name | Maps a model to its runtime module | text_detection |
| limit_side_len / thresh / box_thresh | Detection hyper-parameters | 64, 0.3, 0.6 |
| unclip_ratio | Expansion ratio for detected polygons | 1.5 |
| batch_size / score_thresh | Recognition throughput and confidence gate | 6, 0.0 |
Swapping model_name is the primary way to switch between server, mobile, and multilingual variants. Setting model_dir: null defers model resolution to the runtime, while a populated model_dir overrides the default download Source: [deploy/cpp_infer/src/configs/OCR.yaml:1-39].
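One way to picture the override is as an edit on the parsed configuration. The sketch below mirrors a trimmed version of the YAML as a Python dictionary and applies a model_name or model_dir change without mutating the original. The helper `override_model` and the path `./my_finetuned_rec` are hypothetical; how the runtime actually loads the file is not shown here.

```python
import copy
from typing import Any, Dict, Optional

# Trimmed mirror of the YAML; only the keys used in this sketch.
BASE_CONFIG: Dict[str, Any] = {
    "pipeline_name": "OCR",
    "SubModules": {
        "TextDetection": {"model_name": "PP-OCRv6_medium_det", "model_dir": None},
        "TextRecognition": {"model_name": "PP-OCRv6_medium_rec", "model_dir": None},
    },
}


def override_model(config: Dict[str, Any], module: str,
                   model_name: Optional[str] = None,
                   model_dir: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of the config with one sub-module's model swapped."""
    new_config = copy.deepcopy(config)        # leave the base config untouched
    entry = new_config["SubModules"][module]  # KeyError if the module is unknown
    if model_name is not None:
        entry["model_name"] = model_name
    if model_dir is not None:
        entry["model_dir"] = model_dir        # populated path overrides the default download
    return new_config


custom = override_model(BASE_CONFIG, "TextRecognition", model_dir="./my_finetuned_rec")
print(custom["SubModules"]["TextRecognition"])
```

Keeping the base configuration immutable makes it easy to diff a customized deployment against the shipped defaults.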
flowchart LR
A[YAML Pipeline] --> B[DocPreprocessor]
B --> C[TextDetection]
C --> D[TextLineOrientation]
D --> E[TextRecognition]
E --> F[Structured Output]
G[Custom model_dir] --> C
G --> E
Training Workflows
PaddleOCR exposes a uniform "download pretrained weights → prepare data → train → export → infer" loop. Each sub-module follows it.
Layout Analysis. Training relies on PaddleDetection's PP-PicoDet backbone. The repository documents pretrained downloads such as picodet_lcnet_x1_0_fgd_layout.pdparams for the PubLayNet dataset, and notes that Chinese CDLA and table-specific variants exist for other document types. FGD distillation is supported for accuracy improvements Source: [ppstructure/layout/README.md].
Key Information Extraction (KIE). The KIE pipeline extends layout analysis with semantic entity recognition (SER) and relationship extraction (RE). The repository ships LayoutXLM and VI-LayoutXLM configurations under configs/kie/, with a re_layoutxlm_xfund_zh.yml example reported at 74.83% accuracy. Customization paths include UDML knowledge distillation and textline sorting to fit reading order Source: [ppstructure/kie/README.md].
Layout Recovery. For PDF-to-Word recovery, two custom strategies are available: a rule-based pdf2docx path for standard PDFs, and an image-driven path that combines layout analysis, table recognition, and unwarping for image-based PDFs. Users can choose between them based on input format Source: [ppstructure/recovery/README.md].
Customization and Deployment Surfaces
Beyond core training, PaddleOCR is customizable along several axes:
- Multilingual switching. A single PP-OCRv6 model supports 50 languages (Chinese, English, Japanese, and 46 Latin-script languages), removing the need to swap checkpoints per locale Source: [README.md].
- VLM parsing. PaddleOCR-VL integrates a NaViT-style visual encoder with ERNIE-4.5-0.3B. PaddleOCR-VL-1.5 reaches 94.5% on OmniDocBench, supports 111 languages, and adds PP-DocLayoutV3 for irregular layouts (skew, warping, scanning, illumination, screen photography).
- Deployment targets. Customization extends to deployment: Python inference, C++ inference (deploy/cpp_infer), Paddle Serving, Paddle-Lite for ARM/OpenCL, and Paddle2ONNX for cross-framework export Source: [deploy/README.md].
- Mobile deployment. Paddle-Lite requires cross-compilation toolchains, then Paddle-Lite's model optimization, and finally a phone-side runner. The documentation walks through each step Source: [deploy/lite/readme.md].
- API SDKs. Official SDKs in Python, TypeScript, and Go enable service integration. The TypeScript SDK requires Node ≥ 18 and bundles tsup/vitest tooling Source: [api_sdk/README.md, api_sdk/typescript/package.json].
Common Failure Modes from the Community
Three patterns from community discussions are worth flagging when customizing:
- Border/whitespace sensitivity in recognition. Issue #1663 reports that when detection crops carry wide (≈5px) borders, recognition accuracy degrades noticeably compared to tight 1–2px crops, because training data was synthesized with tight borders. The proposed mitigation is to post-process detected crops (e.g., re-crop to a tight bounding rectangle) before recognition.
- Silent recognition failures. Issue #17974 documents cases where images yield no text output, often traced to pipeline configuration (e.g., use_textline_orientation disabled, aggressive score_thresh, or an inappropriate limit_side_len for tiny text). Verifying the YAML and lowering thresholds typically restores output.
- SDK/HPS parameter drift. Issue #18194 reports that the PaddleOCR-VL HPS option returnMarkdownImages=false is ignored under the default PaddleX 3.6 SDK, illustrating that SDK-side configuration must be validated against the installed runtime, not just the latest docs.
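Validating against the installed runtime can start with a simple version gate. The sketch below flags the 3.6 release line named in issue #18194; the helper names are hypothetical, and whether other release lines are affected is not stated in the source, so the check is intentionally narrow.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.6.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def needs_markdown_images_check(paddlex_version: str) -> bool:
    """True for the 3.6 line, where returnMarkdownImages=false was reported as ignored."""
    return major_minor(paddlex_version) == (3, 6)


for candidate in ("3.6.0", "3.7.1"):
    print(candidate, needs_markdown_images_check(candidate))
```

In a real service the version string could be read with `importlib.metadata.version("paddlex")`, assuming the package is installed under that distribution name, and the result logged at startup so drift is visible before requests fail.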
See Also
- PaddleOCR-VL and PaddleOCR-VL-1.5 release notes — flagship VLM-based document parsing
- PP-OCRv6 architecture — unified multilingual OCR engine
- PP-StructureV3 — structure-aware Markdown/JSON conversion with cell-level coordinates
- deploy/README.md — deployment options matrix
- api_sdk/README.md — multi-language SDK layout
Source: https://github.com/PaddlePaddle/PaddleOCR / Human Manual
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 14 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.
1. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Developers should check this installation risk before relying on the project: Link Checker Report
- User impact: Developers may fail before the first successful local run: Link Checker Report
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Link Checker Report. Context: Observed when using python
- Evidence: failure_mode_cluster:github_issue | https://github.com/PaddlePaddle/PaddleOCR/issues/18134
2. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: failure_mode_cluster:github_issue | https://github.com/PaddlePaddle/PaddleOCR/issues/17974
3. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/PaddlePaddle/PaddleOCR/issues/18157
4. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/PaddlePaddle/PaddleOCR/issues/18194
5. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/PaddlePaddle/PaddleOCR/issues/17974
6. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Developers should check this configuration risk before relying on the project: Link Checker Report
- User impact: Developers may misconfigure credentials, environment, or host setup: Link Checker Report
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Link Checker Report. Context: Source discussion did not expose a precise runtime context.
- Evidence: failure_mode_cluster:github_issue | https://github.com/PaddlePaddle/PaddleOCR/issues/18157
7. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Developers should check this configuration risk before relying on the project: PaddleOCR-VL HPS: returnMarkdownImages=false is ineffective with default PaddleX 3.6 SDK
- User impact: Developers may misconfigure credentials, environment, or host setup: PaddleOCR-VL HPS: returnMarkdownImages=false is ineffective with default PaddleX 3.6 SDK
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: PaddleOCR-VL HPS: returnMarkdownImages=false is ineffective with default PaddleX 3.6 SDK. Context: Observed when using python, docker
- Evidence: failure_mode_cluster:github_issue | https://github.com/PaddlePaddle/PaddleOCR/issues/18194
8. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/PaddlePaddle/PaddleOCR
9. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/PaddlePaddle/PaddleOCR
10. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: No runnable demo is recorded for this project (flag: no_demo).
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | https://github.com/PaddlePaddle/PaddleOCR
11. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: No runnable demo is recorded for this project (flag: no_demo).
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | https://github.com/PaddlePaddle/PaddleOCR
12. Runtime risk: Runtime risk requires verification
- Severity: low
- Finding: Developers should check this performance risk before relying on the project: v3.7.0
- User impact: Upgrade or migration may change expected behavior: v3.7.0
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v3.7.0. Context: Observed when using cuda
- Evidence: failure_mode_cluster:github_release | https://github.com/PaddlePaddle/PaddleOCR/releases/tag/v3.7.0
Source: Doramagic discovery, validation, and Project Pack records
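Several items above recommend reproducing the install and quickstart path in an isolated environment. The following is a minimal sketch of that check using only the Python standard library. It assumes the toolkit is installed from the `paddleocr` package on PyPI; the helper names (`venv_python`, `install_command`, `reproduce_install`) are hypothetical and exist only for this illustration. The official installation guide may require additional steps, such as installing PaddlePaddle itself, so treat this as a starting point rather than the project's documented procedure.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_command(env_dir: Path, package: str = "paddleocr") -> list[str]:
    """Build the pip command that installs `package` into the environment.

    The default package name is an assumption; check the official
    installation guide for the exact packages and versions to install.
    """
    return [str(venv_python(env_dir)), "-m", "pip", "install", package]


def reproduce_install(env_dir: Path, run: bool = False) -> list[str]:
    """Create a fresh environment and optionally run the install in it.

    With run=False this is a dry run: the environment is created without
    pip and the install command is returned but not executed.
    """
    venv.create(env_dir, with_pip=run)  # pip is only needed for a real install
    cmd = install_command(env_dir)
    if run:
        # Requires network access; raises CalledProcessError on failure.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `reproduce_install(Path("ocr-env"), run=True)` would create the environment and attempt the install, surfacing any first-run failure as an exception instead of leaving it to be discovered later. Keeping the environment in a throwaway directory means a failed attempt does not affect the system interpreter.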
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using PaddleOCR with real data or production workflows.
- PaddleOCR-VL HPS: returnMarkdownImages=false is ineffective with default PaddleX 3.6 SDK - github / github_issue
- Community source 2 - github / github_issue
- Link Checker Report - github / github_issue
- Link Checker Report - github / github_issue
- Link Checker Report - github / github_issue
- Link Checker Report - github / github_issue
- Link Checker Report - github / github_issue
- Link Checker Report - github / github_issue
- Link Checker Report - github / github_issue
- Link Checker Report - github / github_issue
- Link Checker Report - github / github_issue
- Link Checker Report - github / github_issue
Source: Project Pack community evidence and pitfall evidence