OpenKB Manual - Doramagic.ai

Doramagic Project Pack · Human Manual

OpenKB

OpenKB: Open LLM Knowledge Base

Overview and Architecture

Related topics: Wiki Foundation: Compilation, Linting, and Lifecycle, Generators: Query, Chat, Skill Factory, Deck, and Visualize, Configuration, LLM Integration, and Deployment

Section Related Pages

Continue reading this section for the full explanation and source context.

Overview and Architecture

OpenKB (Open Knowledge Base) is an open-source system that compiles raw documents into a structured, interlinked wiki-style knowledge base using LLMs. It is powered by PageIndex's vectorless, reasoning-based retrieval for long documents. The system is described in README.md as having two layers: a wiki foundation that compiles and maintains knowledge, and generators (query, chat, Skill Factory, visualize, deck) that turn that knowledge into useful output. The core idea is borrowed from a concept described by Andrej Karpathy: LLMs generate summaries, concept pages, and cross-references that are maintained automatically, so knowledge compounds over time instead of being re-derived on every query (Source: README.md:1-50).

Two-Layer System Design

OpenKB separates concerns into two distinct layers. The wiki foundation is responsible for ingesting documents, converting them, indexing them, and compiling them into a persistent wiki. The generator layer reads from the compiled wiki and produces surfaces such as answers, conversations, skill folders, knowledge graphs, and slide decks.

Layer	Responsibility	Examples
Wiki foundation	Ingest → Convert → Index → Compile	`openkb add`, `openkb watch`, `openkb remove`, `openkb recompile`, `openkb lint`
Generators	Read compiled wiki → Produce output	`openkb query`, `openkb chat`, `openkb skill new`, `openkb visualize`, `openkb deck new`

This separation is described in README.md under the "Layer 2: Generators" section, which states: *"A 'generator' reads from the compiled wiki and produces something usable: an answer, a conversation, a skill folder. The wiki is the substrate; generators are the surfaces."* (Source: README.md:1-50)

The wiki directory structure produced by the foundation layer is:

wiki/
├── index.md         # Table of contents
├── AGENTS.md        # Schema (LLM instructions)
├── sources/         # Full-text conversions
├── summaries/       # Per-document summaries
├── concepts/        # Cross-document synthesis
├── entities/        # Specific named things (people, orgs, places, products)
├── explorations/    # Saved query results
└── reports/         # Lint reports

Compilation Pipeline

The central component of the wiki foundation is the compilation pipeline defined in openkb/agent/compiler.py. The module docstring describes the pipeline as: *"Step 1: Build base context A (schema + document content). Step 2: A → generate summary. Step 3: A + summary → concepts plan (create/update/related). Step 4: Concurrent LLM calls (A cached) → generate new + rewrite updated concepts. Step 5: Code adds cross-ref links to related concepts, updates index."* (Source: openkb/agent/compiler.py:1-50)

A key architectural decision is the use of Anthropic prompt caching via cache_control markers placed at two breakpoints: end of the document message and end of the assistant summary message. This means the system + document is cached across all N+M+2 calls, and the additional summary prefix is cached across the N+M concept-generation calls. For providers that do not support cache_control, the system normalizes the payload into a list-of-blocks content shape that LiteLLM passes through cleanly (Source: openkb/agent/compiler.py:1-50).

Short and long documents are handled differently. Short documents (PDFs below the pageindex_threshold, default 20 pages) are read in full by the LLM via markitdown → Markdown conversion. Long documents are processed by PageIndex into a hierarchical tree index, and the LLM reads the tree instead of the full text. This is described in README.md under "Short vs Long Document Handling."

flowchart TB
    A[Raw Document] --> B{pageindex_threshold?}
    B -->|Below| C[markitdown → Markdown]
    B -->|At/above| D[PageIndex → Tree Index]
    C --> E[Compile: Summary]
    D --> E
    E --> F[Compile: Concepts Plan]
    F --> G[Concurrent: Generate/Update Concepts]
    G --> H[Code: Cross-refs & Backlinks]
    H --> I[Update index.md]
    I --> J[Wiki Complete]
    J --> K[Generators: query / chat / skill / visualize / deck]

Agent Architecture

OpenKB builds multiple specialized LLM agents on top of the openai-agents SDK. Each agent is constructed with a specific role, model, instruction set, and tool set.

The query agent in openkb/agent/query.py is the workhorse for openkb query and openkb chat. It exposes three function tools: read_file (read any Markdown file from the wiki), get_page_content (read specific pages of a PageIndex document, gated to doc_type: pageindex documents), and get_image (view images from the wiki). The get_image tool returns a ToolOutputImage when the path resolves to a real image, or a ToolOutputText fallback. Model settings include extra_headers and timeout extra_args sourced from .openkb/config.yaml (Source: openkb/agent/query.py:1-50).

The lint agent in openkb/agent/linter.py performs semantic quality checks the structural linter cannot: contradictions, gaps, staleness, redundancy, concept coverage, and entity coverage (including orphaned entity pages). It uses list_files and read_file tools and caps execution at MAX_TURNS = 50 to bound cost (Source: openkb/agent/linter.py:1-50).

The skill runner agent in openkb/agent/skill_runner.py clones the base query agent, injects skill body + user intent into instructions, and adds two extra tools: write_file (scoped to wiki/explorations/ and output/) and read_output_or_skill_file (reads text from output/ or skills/). All path writes are validated against an allowlist to prevent escape from the KB root (Source: openkb/agent/skill_runner.py:1-50).

Configuration & LLM Provider Support

OpenKB stores its configuration in .openkb/config.yaml after openkb init. The schema is minimal by default — model, language, and pageindex_threshold — but advanced options such as entity_types, extra_headers, and OAuth can be added. Model names follow the provider/model LiteLLM format, so any LiteLLM-supported provider works: OpenAI (gpt-5.4), Anthropic (anthropic/claude-sonnet-4-6), Gemini (gemini/gemini-3.1-pro-preview), and local backends like Ollama (Source: README.md:1-50).

Timeout and extra headers are surfaced through get_timeout() and get_extra_headers() in openkb.config, which are passed to every LLM call. This directly addresses community request #132 (*"Support passing timeout parameter to LiteLLM"*): users hitting litellm.Timeout on long Ollama jobs can now configure a longer timeout via .openkb/config.yaml. Community request #137 (LiteLLM param passthrough for Ollama) follows the same extra_args mechanism (Source: openkb/agent/query.py:1-50).

The get_timeout_extra_args() function used by the lint agent, and the extra_headers propagation in every agent, ensure that the same provider quirks (custom headers, OAuth, extended timeouts) work uniformly across query, lint, and skill execution.

Document Lifecycle Operations

Beyond the initial compile, OpenKB provides lifecycle operations that operate on the wiki substrate. Community issue #41 (*"Remove Document"*) is addressed by the openkb remove <doc> command, which cleans up the document's wiki pages, images, registry, and PageIndex state. The _remove_doc_from_pages function in openkb/agent/compiler.py handles the symmetric teardown: for each page whose frontmatter sources: lists summaries/{doc_name}, it removes the source from the list, removes - [[summaries/{doc_name}]] entries from ## Related Documents, and removes standalone See also: lines (Source: openkb/agent/compiler.py:1-50).

openkb recompile [doc] [--all] re-runs the compile pipeline on already-indexed documents without re-indexing. It regenerates summaries and rewrites concept pages; manual edits are overwritten. --refresh-schema additionally updates wiki/AGENTS.md. openkb watch watches the raw/ directory and auto-compiles new files, enabling continuous ingestion (Source: README.md:1-50).

Limitations & Roadmap Signals

Several community requests highlight known limitations in the current architecture. Issue #61 (*"S3-compatible object storage support"*) points out that OpenKB currently only supports the local filesystem; enterprise deployments must download from object storage to a local temp directory before ingestion, and upload generated wiki artifacts back. This is a known gap — the architecture is designed around Path-based filesystem operations, and supporting S3/MinIO would require abstracting the storage layer.

Issue #28 (*"Feedback correction mechanism"*) is partially addressed by openkb feedback ["msg"], which opens a prefilled GitHub issue to file feedback. However, the request for an in-wiki review mechanism where users confirm uncertain LLM-generated statements is not yet implemented in the current pipeline. The lint command provides a one-shot audit, but not an interactive review workflow.

Wiki Foundation: Compilation, Linting, and Lifecycle

Related topics: Overview and Architecture, Generators: Query, Chat, Skill Factory, Deck, and Visualize, Configuration, LLM Integration, and Deployment

Section Related Pages

Continue reading this section for the full explanation and source context.

Section 1.1 Pipeline Steps

Continue reading this section for the full explanation and source context.

Section 1.2 Short vs. Long Documents

Continue reading this section for the full explanation and source context.

Section 1.3 Knowledge Outputs

Continue reading this section for the full explanation and source context.

Wiki Foundation: Compilation, Linting, and Lifecycle

The Wiki Foundation is the first of OpenKB's two architectural layers: it is the substrate that ingests raw documents and compiles them into a persistent, interlinked wiki. Where Layer 2 (Generators) consumes the wiki to produce answers, chats, or skills, the Foundation owns the *creation*, *maintenance*, and *health* of that wiki. Its responsibilities span three concerns: the compilation pipeline that turns files into structured pages, the linters that audit structural and semantic quality, and the lifecycle commands that add, remove, recompile, and watch documents.

1. Compilation Pipeline

The compilation pipeline lives in openkb/agent/compiler.py. It is an LLM-driven, multi-step process designed around prompt caching: the system builds a large base context once and reuses it across many calls.

1.1 Pipeline Steps

As documented in the module docstring, the pipeline executes five steps:

Build base context A — schema (LLM instructions) + document content.
A → generate summary — the LLM produces a per-document summary.
A + summary → concepts plan — the LLM returns a structured plan describing which concepts to create, update, or relate.
Concurrent LLM calls (A cached) — generate new concept pages and rewrite updated ones in parallel.
Code-only step — append cross-reference links, write entity backlinks, and update the index.

This design minimises redundant token spend: the same system + document prefix is reused across all N+M+2 calls within a compile. Source: openkb/agent/compiler.py:1-15

1.2 Short vs. Long Documents

The pipeline branches on document length (configured via pageindex_threshold in .openkb/config.yaml):

Path	Trigger	Conversion	LLM reads	Output
Short	Below threshold	`markitdown` → Markdown	Full text	`summary` + `concepts`
Long	PDF ≥ threshold	PageIndex tree index	Document trees	`summary` + `concepts`

Images are extracted inline (short docs via pymupdf) or by PageIndex (long docs).

1.3 Knowledge Outputs

The compile step writes into a fixed directory layout under wiki/:

sources/ — full-text conversions of the raw files
summaries/ — per-document summary pages
concepts/ — cross-document synthesis pages
entities/ — named-thing pages (people, orgs, products, etc.)
explorations/ — saved query results (written by generators)
reports/ — lint reports

The compiler also maintains cross-document backlinks: the summary page links to every related concept/entity, and each related page links back under a ## Related Documents section. The function _backlink_pages() in openkb/agent/compiler.py performs this symmetric linking. Source: openkb/agent/compiler.py:bottom-half backlink helpers

2. Linting

OpenKB ships two complementary lint systems: a structural linter for deterministic checks, and a semantic linter that delegates quality judgments to an LLM agent.

2.1 Structural Checks

Structural linting verifies wikilink integrity — that [[concepts/X]], [[summaries/Y]], and [[entities/Z]] targets resolve to actual files. The compiler calls strip_ghost_wikilinks() (in openkb/lint.py, imported by compiler.py) to remove links pointing to pages that do not exist, preventing broken references from accumulating. Source: openkb/agent/compiler.py:references strip_ghost_wikilinks import

2.2 Semantic Lint Agent

openkb/agent/linter.py defines an LLM-powered lint agent that audits the wiki for issues structural tools cannot detect. The agent is built with two read-only tools — list_files(directory) and read_file(path) — both backed by openkb.agent.tools. The instruction template (loaded from wiki/AGENTS.md via get_agents_md()) asks the agent to check:

Contradictions — pages making conflicting claims about the same fact
Gaps — missing topics or unexplained references
Staleness — references to "recent" work that may now be dated
Redundancy — multiple pages covering the same content
Concept coverage — themes present in summaries but missing a concept page
Entity coverage — orphaned entities, missing entity pages, or redundant ones

The agent's process is methodical: start at index.md, read summaries, then concepts, then entities, then emit a structured Markdown report. It is configured with the same extra_headers and timeout settings used by the rest of the system via get_extra_headers() and get_timeout_extra_args(). Source: openkb/agent/linter.py:lint agent construction

3. Document Lifecycle

The lifecycle is exposed through CLI commands documented in the README. The pipeline (add → recompile → lint) is mirrored by removal (remove) and continuous ingestion (watch).

3.1 Add and Recompile

Command	Description
`openkb add <file>`	Ingest a file through the compile pipeline
`openkb recompile [doc] [--all]`	Re-run the compile pipeline on already-indexed docs without re-indexing; regenerates summaries and rewrites concept pages (manual edits are overwritten). `--dry-run` previews; `--refresh-schema` also updates `wiki/AGENTS.md`

Source: README.md:recompile description

3.2 Remove

The openkb remove <doc> command tears down all artifacts for a document. Per the README, it cleans up wiki pages, images, the registry, and PageIndex state. The flag set supports safe removal:

--dry-run — preview what would be removed
--keep-raw — preserve the original file in raw/
--keep-empty — retain pages even when they become empty after source removal

The compiler exposes symmetric helpers (_remove_doc_from_pages() in openkb/agent/compiler.py) that strip the document from each affected page's sources: frontmatter and from its ## Related Documents section. Source: openkb/agent/compiler.py:_remove_doc_from_pages. This feature directly addresses community request #41 "Remove Document", where users needed a way to supersede or version documents.

3.3 Watch

openkb watch monitors raw/ and automatically triggers compilation when new files appear, enabling drop-folder workflows. Source: README.md:watch command

3.4 Feedback

openkb feedback ["msg"] opens a prefilled GitHub issue; --type bug|feature|question tags the report. This is OpenKB's lightweight correction mechanism, reflecting community interest in issue #28 "Feedback correction mechanism".

4. Compile + Lint Workflow

flowchart TD
    A[raw/ file added or modified] --> B{openkb add / watch}
    B --> C[Compile: build context A]
    C --> D[LLM: summary]
    D --> E[LLM: concepts plan]
    E --> F[Concurrent LLM: write/update concepts]
    F --> G[Code: backlinks + index update]
    G --> H[wiki/ ready]
    H --> I{openkb lint}
    I --> J[Structural: strip ghost wikilinks]
    I --> K[Semantic: LLM lint agent]
    K --> L[wiki/reports/lint.md]
    H --> M{openkb recompile}
    M --> F
    H --> N{openkb remove doc}
    N --> O[Strip sources/backlinks, delete artifacts]

5. Configuration Notes Relevant to the Foundation

Two community-reported configuration gaps intersect directly with the Foundation:

LiteLLM timeouts (#132, #137): long PDFs processed via PageIndex can exceed the default 600 s timeout. Both the compile and lint agents consume get_timeout() / get_timeout_extra_args() from openkb.config, so configuring a custom timeout in .openkb/config.yaml propagates to every Foundation call.
Storage backends (#61): the Foundation currently reads from and writes to the local filesystem; S3-compatible backends are not yet supported and would require changes in the file I/O layer used by the compile pipeline.

Generators: Query, Chat, Skill Factory, Deck, and Visualize

Related topics: Overview and Architecture, Wiki Foundation: Compilation, Linting, and Lifecycle, Configuration, LLM Integration, and Deployment

Section Related Pages

Continue reading this section for the full explanation and source context.

Section openkb query "question"

Continue reading this section for the full explanation and source context.

Section openkb chat

Continue reading this section for the full explanation and source context.

Section openkb skill new

Continue reading this section for the full explanation and source context.

Generators: Query, Chat, Skill Factory, Deck, and Visualize

Overview and Architecture

OpenKB is organized as a two-layer system: a wiki foundation that ingests raw documents and compiles them into summaries, concept pages, entity pages, and cross-references, and a set of generators that read the compiled wiki to produce user-facing outputs (Source: README.md). The wiki is the substrate; generators are the surfaces. The five primary generators are query, chat, skill new, deck new, and visualize.

All generators rely on the wiki's persistent structure rather than re-deriving knowledge on each invocation. Cross-references already exist as [[wikilinks]] between summaries, concepts, and entities, so generators can reason over a stable, accumulated corpus instead of a per-query retrieval step (Source: README.md).

flowchart LR
    A[raw/ documents] --> B[Wiki Foundation<br/>openkb add]
    B --> C[wiki/<br/>summaries • concepts • entities]
    C --> D[query<br/>Grounded answer]
    C --> E[chat<br/>Multi-turn session]
    C --> F[skill new<br/>Redistributable agent skill]
    C --> G[deck new<br/>HTML slide deck]
    C --> H[visualize<br/>Interactive knowledge graph]
    F --> I[output/skills/]
    G --> J[output/decks/]
    H --> K[output/visualize/]

Query and Chat — Conversational Surfaces

`openkb query "question"`

The query generator produces a single grounded answer with inline citations to wiki pages. The implementation in openkb/agent/query.py builds an Agent named wiki-query from the OpenAI Agents SDK and equips it with three function tools (Source: openkb/agent/query.py):

Tool	Purpose
`read_file(path)`	Read any Markdown file from the wiki (e.g. `summaries/paper.md`).
`get_page_content(doc_name, pages)`	Retrieve specific pages from a PageIndex long-document tree.
`get_image(image_path)`	View a figure or diagram by relative path; returns a `ToolOutputImage` or fallback text.

The model is registered as litellm/<model> so any LiteLLM-supported provider works. Long documents handled by PageIndex are navigated page-by-page via get_page_content, while short documents are read end-to-end via read_file. Calling openkb query --save "..." persists the exchange under wiki/explorations/ for later inspection (Source: README.md).

`openkb chat`

The chat generator is an interactive, multi-turn session over the wiki. It reuses the query agent's read tools but adds session persistence so conversations can be listed, resumed, or deleted via openkb chat --resume, --list, and --delete (Source: README.md). Saved chat artifacts land in wiki/explorations/, matching the allow-list enforced by the file-writing tools (Source: openkb/agent/tools.py).

Skill Factory and Skill Runner

`openkb skill new`

Skill Factory distills a redistributable agent skill — a portable folder an external agent (Claude Code, Codex, etc.) can install natively — from a subset of the wiki. The output is a self-contained skill directory (Source: README.md):

<kb>/output/skills/<skill-name>/
├── SKILL.md              # YAML frontmatter + when-to-use + approach
├── references/           # Depth material the agent loads on demand
│   ├── methodology.md
│   └── key-quotes.md
└── (scripts/)            # Optional, only if intent implies computation

Validation runs automatically at the end of skill new; openkb skill validate <name> and --strict re-run the YAML-frontmatter, file-size, wikilink, and script checks (Source: README.md). Skill versions can be inspected and rolled back with openkb skill history and openkb skill rollback <name> --to <n>.

Skill Runner

The runner in openkb/agent/skill_runner.py clones the base query agent and augments its instructions with the skill body plus the user's intent. It registers two additional function tools scoped to the KB root (Source: openkb/agent/skill_runner.py):

write_file(path, content) — Writes only into wiki/explorations/ or output/.
read_output_or_skill_file(path) — Reads artifacts under output/ or skills/.

The path allow-list is implemented in openkb/agent/tools.py: a candidate path must resolve under the KB root *and* match the prefix rules (wiki/explorations/... requires ≥3 path parts, output/... requires ≥2). Paths outside the allow-list return an access-denied message rather than raising, so the agent sees a recoverable error (Source: openkb/agent/tools.py).

Deck and Visualize — Artifact Generators

`openkb deck new`

Introduced in v0.4.1, openkb deck new <skill-name> "<intent>" compiles the wiki into a single self-contained HTML slide deck at output/decks/<skill-name>/index.html. No build step or runtime dependencies are required; navigation keys are ←/→ for slide change, F for full-screen, and P for presenter mode (Source: README.md).

`openkb visualize`

The visualize generator renders the wiki as a single offline HTML page at output/visualize/graph.html, exposing three linked views of the same knowledge base (Source: README.md):

3D force graph for exploring dense connections.
Mind-map rooted at the OpenKB entry point.
Radial tree showing hierarchical structure.

Nodes are coloured by type and linked by the same [[wikilinks]] the wiki uses internally.

Common Concerns and Configuration

Several recurring community questions intersect with generator behavior (Source: README.md):

Timeouts on local models — openkb uses LiteLLM, which defaults to a 10-minute timeout that can be exceeded with local providers such as Ollama on long PageIndex documents (community issue #132). Generators therefore inherit any LiteLLM timeout setting configured via environment or provider config.
Unsupported params with Ollama — Provider-specific LiteLLM parameters may be rejected by some backends (community issue #137). Generators are model-agnostic and surface these errors verbatim.
Storage location — Generators only write to the local KB root under the allow-listed prefixes; S3-compatible object storage is not yet supported (community issue #61).

The compiled wiki consumed by every generator is produced by the pipeline in openkb/agent/compiler.py, which uses Anthropic prompt caching at two breakpoints (end of the document message and end of the assistant summary) to amortize system-prompt and document-prefix cost across N+M+2 calls (Source: openkb/agent/compiler.py). The optional lint agent in openkb/agent/linter.py audits the same wiki for contradictions, gaps, staleness, redundancy, and coverage before generators consume it (Source: openkb/agent/linter.py).

Configuration, LLM Integration, and Deployment

Related topics: Overview and Architecture, Wiki Foundation: Compilation, Linting, and Lifecycle, Generators: Query, Chat, Skill Factory, Deck, and Visualize

Section Related Pages

Continue reading this section for the full explanation and source context.

Section Prompt caching and JSON mode

Continue reading this section for the full explanation and source context.

Configuration, LLM Integration, and Deployment

OpenKB is a wiki-compilation system that delegates all language-model work to LiteLLM, so configuration, model selection, and provider quirks are first-class concerns. This page documents the configuration surface, how OpenKB talks to LLMs, and the deployment-shaped limitations surfaced by recent users.

1. The `.openkb/config.yaml` Surface

OpenKB settings are initialized by openkb init and persisted to .openkb/config.yaml Source: README.md. The shipped template looks like:

model: gpt-5.4                   # LLM model (any LiteLLM-supported provider)
language: en                     # Wiki output language
pageindex_threshold: 20          # PDF pages threshold for PageIndex

Model identifiers follow LiteLLM's provider/model convention; OpenAI may omit the prefix Source: README.md. The same README documents supported provider examples such as gpt-5.4, anthropic/claude-sonnet-4-6, and gemini/gemini-3.1-pro-preview Source: README.md. Advanced fields include an optional entity_types list (overriding the default extraction taxonomy) and extra_headers for custom HTTP headers, with OAuth also supported Source: README.md.

Runtime helpers in openkb/config.py are consumed by every agent module:

get_timeout() and get_timeout_extra_args() are imported by the linter and compiler to inject LiteLLM timeout kwargs Source: openkb/agent/linter.py.
get_extra_headers() is imported wherever the LLM is called so user-supplied headers survive into the request Source: openkb/agent/compiler.py.
DEFAULT_ENTITY_TYPES and resolve_entity_types() centralize the entity taxonomy used during concept compilation Source: openkb/agent/compiler.py.

2. How Agents Reach the LLM

OpenKB ships three named agents, all of which are built on top of LiteLLM:

Agent	Module	Tools	Role
`wiki-query`	openkb/agent/query.py	`read_file`, `get_page_content`, `get_image`	Grounded answering and chat
Semantic linter	openkb/agent/linter.py	`list_wiki_files`, `read_wiki_file`	Audit for contradictions, gaps, staleness
`skill::<name>`	openkb/agent/skill_runner.py	Query tools + `write_file`, `read_output_or_skill_file`	Distill redistributable skills

The query agent is constructed with model=f"litellm/{model}", so the config's model field flows directly to the chat-completion router Source: openkb/agent/query.py. The skill runner clones the base agent, appends skill instructions, and reuses the same wiki_root / kb_root scoping for read and write tools Source: openkb/agent/skill_runner.py.

Prompt caching and JSON mode

The compiler pipeline is explicitly engineered for Anthropic prompt caching. Each user message is wrapped via _cached_text, which produces a list-of-blocks payload with an ephemeral cache_control marker:

return [{"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}]

Source: openkb/agent/compiler.py. The compiler docstring describes two cache breakpoints — end-of-document and end-of-summary — so the same base context is reused across the N+M+2 LLM calls per ingest Source: openkb/agent/compiler.py. Providers that ignore cache_control still receive a valid OpenAI-compatible content shape.

Concept and entity generation is forced into JSON via a module-level constant _JSON_RESPONSE_FORMAT = {"type": "json_object"}, with a comment warning that DeepSeek/Qwen require the literal word "json" in the prompt itself when this flag is set Source: openkb/agent/compiler.py. Compilations run concurrently under DEFAULT_COMPILE_CONCURRENCY, and _close_async_llm_clients() is awaited after each doc to avoid CLOSE_WAIT / FD leaks during long ingests Source: openkb/agent/compiler.py.

3. Deployment Considerations

OpenKB is currently a local-filesystem tool. Documents are ingested from raw/, intermediate artifacts live under wiki/ (with subfolders sources/, summaries/, concepts/, entities/, explorations/, reports/), and generators write to output/ Source: README.md. Community issue #61 explicitly highlights that there is no built-in S3 or MinIO backend, so production users must stage downloads and re-upload artifacts themselves Source: README.md.

Two other operational constraints are worth surfacing:

Timeouts. The default LiteLLM client timeout is 10 minutes. Issue #132 reports litellm.Timeout: Connection timed out. Timeout passed=600.0 for large PDFs routed through Ollama, and asks for a pass-through timeout in config Source: README.md. get_timeout() exists in openkb.config, but the publicly shipped config.yaml template does not expose it yet.
Provider-specific params. Issue #137 shows litellm.UnsupportedParamsError: ollama does not support parameters... when local Ollama models reject fields OpenAI accepts. Until a litellm passthrough block is added to .openkb/config.yaml, users have to pin compatible params per model.

A typical self-hosted Ollama deployment therefore looks like the snippet from issue #132:

language: en
model: ollama/<model>
pageindex_threshold: 20

Source: README.md.

4. Observed Limitations and Workarounds

Pain point	Where it surfaces	Current state
Configurable LiteLLM timeout	Long PDFs on Ollama (issue #132)	`get_timeout()` exists; no UI in shipped template
S3 / MinIO storage	Enterprise ingest (issue #61)	Local filesystem only; staging required
Removing a document	Wiki lifecycle (issue #41)	`openkb remove <doc>` documented in README
User feedback on uncertain claims	Trust and review (issue #28)	`openkb feedback` opens a prefilled GitHub issue
Ollama param rejection	Local model runs (issue #137)	No litellm config passthrough yet

The shipped openkb feedback ["msg"] command already opens a prefilled GitHub issue and accepts a --type bug/feature/question tag, which is OpenKB's existing answer to issue #28 Source: README.md. Likewise, openkb remove <doc> (with --dry-run, --keep-raw, --keep-empty) handles the supersede-a-document workflow from issue #41 Source: README.md. The remaining gaps — configurable timeout, S3 I/O, and a litellm passthrough block — are the most likely near-term additions to this surface.

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

high Installation risk requires verification

May increase setup, validation, or first-run risk for the user.

high Installation risk requires verification

May increase setup, validation, or first-run risk for the user.

high Configuration risk requires verification

May increase setup, validation, or first-run risk for the user.

medium Installation risk requires verification

May increase setup, validation, or first-run risk for the user.

Doramagic Pitfall Log

Found 11 structured pitfall item(s), including 3 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

1. Installation risk: Installation risk requires verification

Severity: high
Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/VectifyAI/OpenKB/issues/132

2. Installation risk: Installation risk requires verification

Severity: high
Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/VectifyAI/OpenKB/issues/130

3. Configuration risk: Configuration risk requires verification

Severity: high
Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/VectifyAI/OpenKB/issues/135

4. Installation risk: Installation risk requires verification

Severity: medium
Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/VectifyAI/OpenKB/issues/77

5. Configuration risk: Configuration risk requires verification

Severity: medium
Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/VectifyAI/OpenKB/issues/137

6. Capability evidence risk: Capability evidence risk requires verification

Severity: medium
Finding: README/documentation is current enough for a first validation pass.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: capability.assumptions | https://github.com/VectifyAI/OpenKB

7. Maintenance risk: Maintenance risk requires verification

Severity: medium
Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/VectifyAI/OpenKB

8. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: no_demo
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: downstream_validation.risk_items | https://github.com/VectifyAI/OpenKB

9. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: no_demo
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: risks.scoring_risks | https://github.com/VectifyAI/OpenKB

10. Maintenance risk: Maintenance risk requires verification

Severity: low
Finding: issue_or_pr_quality=unknown。
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/VectifyAI/OpenKB

11. Maintenance risk: Maintenance risk requires verification

Severity: low
Finding: release_recency=unknown。
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/VectifyAI/OpenKB

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources 12

Count of project-level external discussion links exposed on this manual page.

Use Review before install

Open the linked issues or discussions before treating the pack as ready for your environment.

Community Discussion Evidence

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using OpenKB with real data or production workflows.

[[Feature] Plans to support litellm config in .openkb/config.yaml? (Encou](https://github.com/VectifyAI/OpenKB/issues/137) - github / github_issue
[[Feature] Share PDF page extraction across short PDFs and long PDF page](https://github.com/VectifyAI/OpenKB/issues/135) - github / github_issue
[[Feature] Support passing timeout parameter to LiteLLM](https://github.com/VectifyAI/OpenKB/issues/132) - github / github_issue
Discuss OpenKB OKF profile and wiki layout after #102 - github / github_issue
openkb visualize is not a command - github / github_issue
uv tool install openkb fails - github / github_issue
Suggestion: replacing the default PDF parser with a more capable alterna - github / github_issue
v0.4.0 - github / github_release
v0.4.1 - github / github_release
v0.4.0-rc3 - github / github_release
v0.4.0-rc2 - github / github_release
v0.4.0-rc1 - github / github_release

Source: Project Pack community evidence and pitfall evidence

OpenKB

Overview and Architecture

Related Pages

Overview and Architecture

Two-Layer System Design

Compilation Pipeline

Agent Architecture

Configuration & LLM Provider Support

Document Lifecycle Operations

Limitations & Roadmap Signals

See Also

Wiki Foundation: Compilation, Linting, and Lifecycle

Related Pages

Wiki Foundation: Compilation, Linting, and Lifecycle

1. Compilation Pipeline

1.1 Pipeline Steps

1.2 Short vs. Long Documents

1.3 Knowledge Outputs

2. Linting

2.1 Structural Checks

2.2 Semantic Lint Agent

3. Document Lifecycle

3.1 Add and Recompile

3.2 Remove

3.3 Watch

3.4 Feedback

4. Compile + Lint Workflow

5. Configuration Notes Relevant to the Foundation

See Also

Generators: Query, Chat, Skill Factory, Deck, and Visualize

Related Pages

Generators: Query, Chat, Skill Factory, Deck, and Visualize

Overview and Architecture

Query and Chat — Conversational Surfaces

`openkb query "question"`

`openkb chat`

Skill Factory and Skill Runner

`openkb skill new`

Skill Runner

Deck and Visualize — Artifact Generators

`openkb deck new`

`openkb visualize`

Common Concerns and Configuration

See Also

Configuration, LLM Integration, and Deployment

Related Pages

Configuration, LLM Integration, and Deployment

1. The `.openkb/config.yaml` Surface

2. How Agents Reach the LLM

Prompt caching and JSON mode

3. Deployment Considerations

4. Observed Limitations and Workarounds

See Also

Doramagic Pitfall Log

Doramagic Pitfall Log

1. Installation risk: Installation risk requires verification

2. Installation risk: Installation risk requires verification

3. Configuration risk: Configuration risk requires verification

4. Installation risk: Installation risk requires verification

5. Configuration risk: Configuration risk requires verification

6. Capability evidence risk: Capability evidence risk requires verification

7. Maintenance risk: Maintenance risk requires verification

8. Security or permission risk: Security or permission risk requires verification

9. Security or permission risk: Security or permission risk requires verification

10. Maintenance risk: Maintenance risk requires verification

11. Maintenance risk: Maintenance risk requires verification

Community Discussion Evidence

Community Discussion Evidence