Doramagic.ai

Vector Retrieval and RAG · Public

kreuzberg

Vector retrieval project for checking embedding storage, query semantics, RAG integration, data boundaries, and rollback.

Vector database · RAG · Embeddings · Semantic search · Data boundaries

Last verification date: 2026-07-05
Verification method: source evidence, semantic profile, public page gate, and static build acceptance.

Publication status · 2026-07-05

What is kreuzberg?

01

Quick decision

Use this section to decide whether the project is worth a deeper read.
Best for: Developers connecting knowledge bases, documents, or app data to semantic retrieval or RAG workflows.

Match the project to your task before installing it.

Capability: Vector database setup checks, embedding model boundaries, collection management, query acceptance, and deletion guidance

Repository: kreuzberg-dev/kreuzberg

8.5k stars · 501 forks

02

What it can do

Translate the upstream project into concrete capabilities the user can judge before installing.
1. Introduction & Capabilities

Related topics: Workspace Layout & Crate Structure, Language Bindings, FFI & Polyglot, Deployment Modes & Serving

Sources: docs/features.md:60-120, community issues #1144 (pruning) and #1149 (PaddleOCR-VL 1.6 / PP-OCRv6 model support).

2. Workspace Layout & Crate Structure

Related topics: Extraction Pipeline & Format Handlers, Language Bindings, FFI & Polyglot

Source: https://github.com/kreuzberg-dev/kreuzberg / Human Manual

3. Extraction Pipeline & Format Handlers

Related topics: OCR Backends & Configuration, Plugin System, Enrichment & Embeddings, Known Issues, Limitations & Migration Notes

Source: https://github.com/kreuzberg-dev/kreuzberg / Human Manual

4. OCR Backends & Configuration

Related topics: Extraction Pipeline & Format Handlers, Known Issues, Limitations & Migration Notes

Source: https://github.com/kreuzberg-dev/kreuzberg / Human Manual

5. Language Bindings, FFI & Polyglot

Related topics: Workspace Layout & Crate Structure, Plugin System, Enrichment & Embeddings

Source: https://github.com/kreuzberg-dev/kreuzberg / Human Manual

Sources: https://github.com/kreuzberg-dev/kreuzberg, Human Manual, Project Pack evidence, and downstream validation signals.

03

Community Discussion Evidence

Project-level external discussion stays visible on the detail page, not only inside the manual.
Stars: 8.5k
Forks: 501
Contributors: 46
License: unknown

12 source-linked items

Review these external discussions before using kreuzberg with real data or production workflows. They are review inputs, not standalone proof that the project is production-ready.

04

How to start

Only source-backed commands are shown here. Verify them in an isolated environment first.
1. Try the prompt first

Test the workflow without installing the upstream project.

2. Read the Human Manual

Understand inputs, outputs, limits, and failure modes.

3. Take context to your AI host

Use the compiled assets in your preferred AI environment.

4. Run sandbox verification

Confirm install commands and rollback before using a primary environment.

pip install kreuzberg

Official start command · https://github.com/kreuzberg-dev/kreuzberg#readme · verified: yes
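
One way to follow the "isolated environment first" advice is a throwaway virtual environment. The sketch below is an assumption about workflow, not something taken from the upstream project: `venv_python`, `install_command`, and `verify_in_sandbox` are hypothetical helper names, and only the package name `kreuzberg` comes from the start command above. It uses Python's standard `venv`, `tempfile`, and `subprocess` modules, and treats deleting the temporary directory as the rollback.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_command(env_dir: Path, package: str = "kreuzberg") -> list[str]:
    """Build a pip command that installs `package` into the sandbox only."""
    return [str(venv_python(env_dir)), "-m", "pip", "install", package]


def verify_in_sandbox(package: str = "kreuzberg") -> bool:
    """Install into a temporary venv and report whether pip succeeded.

    The temporary directory is removed on exit, so nothing is left
    behind in the primary environment.
    """
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "sandbox"
        venv.create(env_dir, with_pip=True)
        result = subprocess.run(install_command(env_dir, package))
        return result.returncode == 0
```

Calling `verify_in_sandbox()` needs network access to the package index. A `False` result means the install step itself failed, which is worth knowing before touching a primary environment.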

05

Human Manual

The English page must expose the real manual, not a short placeholder.

8+ sections · Human Manual

kreuzberg Manual

The Plugin System is kreuzberg's extension surface, letting integrators add custom extraction, post-processing, and validation logic without modifying the core extraction pipeline. It is d...

Open the full manual
  1. https://github.com/kreuzberg-dev/kreuzberg Project Manual
  2. Table of Contents
  3. Introduction & Capabilities
  4. Related Pages
  5. What Kreuzberg Solves
  6. Core Capabilities
  7. Extraction Pipeline
  8. OCR Backends

06

AI Context Pack and portable assets

After deciding to continue, take the project context into your own AI host.

Complete pack plus user-owned assets

These files are planning and verification assets for Claude Code, Codex, Gemini, Cursor, ChatGPT, and other AI hosts.

07

Preflight checks

Treat this page as a planning asset, not proof that your local environment is ready.

08

Pitfall Log and verification risks

Doramagic surfaces high-risk items before users treat a candidate capability as verified.
high

Security or permission risk requires verification

May increase setup, validation, or first-run risk for the user.

medium

Installation risk requires verification

Developers may fail before the first successful local run: bug: HF/ONNX model download fails behind corporate TLS-MITM — no custom CA support

medium

Installation risk requires verification

Developers may fail before the first successful local run: bug: kreuzberg maps PDF ligature glyphs to C0 control characters
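
If this issue applies to your documents, a post-extraction check can flag suspicious output. The sketch below is a generic text scan written under stated assumptions, not part of kreuzberg: `find_c0_controls` and `ALLOWED` are hypothetical names, and it assumes you already hold the extracted text as a Python string. It reports any character in the C0 range (U+0000 to U+001F) other than tab, line feed, and carriage return.

```python
# Whitespace controls that normally appear in extracted text (an assumption;
# extend this set if your documents legitimately use others, e.g. form feed).
ALLOWED = {"\t", "\n", "\r"}


def find_c0_controls(text: str) -> list[tuple[int, str]]:
    """Return (index, "U+XXXX") for each unexpected C0 control character."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ord(ch) < 0x20 and ch not in ALLOWED
    ]
```

For example, `find_c0_controls("o\x02ce")` returns `[(1, "U+0002")]`, while ordinary text with tabs and newlines returns an empty list. A non-empty result does not prove a ligature was lost, only that the text contains control characters worth inspecting.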

medium

Installation risk requires verification

Developers may fail before the first successful local run: feat: support PaddleOCR-VL 1.6 and PP-OCRv6 models

medium

Installation risk requires verification

May increase setup, validation, or first-run risk for the user.
