Doramagic Project Pack · Human Manual

seekdb

The AI-Native Search Database. Best for agent storage, it unifies vector, text, structured, and semi-structured data into a single engine. This all-in-one database makes agents smarter, easier to run, and more stable.

Overview, Architecture & Build System

Related topics: Core SQL Engine, Storage & Log Service, Vector, Full-Text, Hybrid Search & AI Service

Project Purpose and Scope

seekdb is an open-source, AI-oriented database built by the OceanBase team. It is designed to serve as a unified state and memory layer for AI agents, supporting streaming writes, millisecond-latency retrieval, and branchable, mergeable, rollback-able work environments (via FORK DATABASE / FORK TABLE / MERGE / DROP primitives). The project exposes a MySQL-compatible protocol, native vector and full-text search, and a Python SDK (pyseekdb) for embedded use. Source: README.md.

The repository is distributed under the Apache License 2.0 and ships as RPM packages (e.g. seekdb-1.3.0.0-100000092026051510 for the v1.3.0 release). Active release branches include master, release/1.3.0, and the historical 4_4_x_release, reflecting the project's multi-platform and high-availability evolution tracked across v1.1.0 → v1.2.0 → v1.3.0.

High-Level Architecture

seekdb follows a layered C/C++ architecture that reuses the proven OceanBase engine and a curated set of third-party dependencies. The top-level entry point is the observer server binary, which bootstraps the SQL engine, storage layer, and network listener. The codebase is organized around a small number of well-defined subsystems:

  • deps/oblib/ — the shared OceanBase common library providing memory allocation, address/network utilities, and storage primitives. It is explicitly documented as "a common library for OceanBase project" and exposes the building blocks reused across the engine. Source: deps/oblib/README.md.
  • deps/easy/ — a custom event-driven I/O framework that wraps libev (ev.h) and provides connection pooling, asynchronous message handling, and structured I/O buffers. The libev integration uses compile-time feature flags (e.g. EV_FEATURE_CODE, EV_FEATURE_API, EV_FEATURE_WATCHERS) to conditionally enable backends, watchers, and priority levels. Source: deps/easy/src/io/ev.h.
  • src/objit/ — the in-process JIT compiler layer for PL/SQL, based on LLVM ORC and an ObPLIRCompiler that sits on top of llvm::orc::IRCompileLayer::IRCompiler. A stub implementation (ob_llvm_helper_stub.cpp) returns OB_NOT_SUPPORTED for platforms where JIT is disabled.
  • src/share/ and tools/ob_error/ — shared error-code definitions generated from src/oberror_errno.def via the gen_errno.pl script; this is the canonical way new error causes and solutions are wired in. Source: tools/ob_error/README.md.
  • src/observer/ — the main server process (e.g. ob_server.cpp, main.cpp) that wires SQL, storage, replication, and the listener together.
flowchart TB
    subgraph Client["Client Layer"]
        SDK["pyseekdb / SQLAlchemy / mysql client"]
    end
    subgraph Server["Observer Process (src/observer)"]
        SQL["SQL / PL Engine"]
        JIT["objit JIT (LLVM ORC)"]
        ERR["Error system (tools/ob_error)"]
    end
    subgraph Common["Common Library (deps/oblib)"]
        MEM["Memory / Allocator"]
        NET["Net / ObAddr"]
        STORE["Storage helpers"]
    end
    subgraph IO["I/O Framework (deps/easy)"]
        EV["libev event loop"]
        EIO["easy_io connection pool"]
    end
    SDK -->|MySQL protocol| Server
    SQL --> JIT
    SQL --> ERR
    Server --> Common
    Server --> IO
    IO --> EV
    IO --> EIO

The architecture cleanly separates protocol/SQL concerns (top), shared utilities (middle), and platform eventing (bottom), which is what allows the same binary to be embedded in Python via pyseekdb or deployed as a standalone server.

Build System

The project uses CMake as its primary build configuration entry point (CMakeLists.txt) and provides two platform-specific driver scripts.

These scripts wrap the underlying CMake invocation, dependency bootstrapping, and obbuild packaging. Community bug reports quote the resulting build metadata (e.g. BUILD_INFO: obbuild-sanity-master-101170 and BUILD_FLAGS: RelWithDebInfo|Sanity), which suggests that sanitizer-enabled builds are a routinely used configuration rather than an optional add-on.

The Windows build path is the source of the most common build-system friction — for example, issue #914 reports a compilation error in ob_req_packet_code.h that breaks ob_geo_func_covered_by2.cpp on Windows. The codebase explicitly accommodates this with _WIN32 guards in platform headers. Source: deps/easy/src/include/easy_define.h.

Key build-time configuration knobs

| Concern | Mechanism | Where to look |
| --- | --- | --- |
| Event-loop features | EV_FEATURES mask + EV_MINPRI/EV_MAXPRI | deps/easy/src/io/ev.h |
| I/O buffer sizes | EASY_IO_BUFFER_SIZE, EASY_IOV_SIZE, EASY_FIRST_MSGLEN | deps/easy/src/io/easy_io_struct.h |
| JIT support | ObPLIRCompiler with optional object cache | src/objit/src/core/ob_pl_ir_compiler.h |
| Error code generation | gen_errno.pl regenerates ob_errno.h/.cpp | tools/ob_error/README.md |
| Platform shims | _WIN32 / __cplusplus branches | deps/easy/src/include/easy_define.h |

Deployment Modes and Platform Support

seekdb ships in three execution modes, all backed by the same build artifacts:

  1. Embedded mode — in-process via pyseekdb, no separate server. This is the path most Python AI/agent applications use.
  2. Server mode — the observer binary listens on a MySQL port and is normally managed by systemd (DEB/RPM packages install a service unit). The README links to docs/deploy-by-systemd/ for offline install and configuration details.
  3. Cluster mode — OceanBase-style deployment for primary-standby replication and high availability, introduced in v1.2.0.

Platform coverage has expanded steadily: v1.1.0 added native macOS 15+ development, v1.2.0 added primary-standby replication and FORK DATABASE, and v1.3.0 (current) introduces async indexes backed by a new Change Stream incremental framework. The same release line also adds Windows portability work, which is still maturing (see issue #914 above).

Source: https://github.com/oceanbase/seekdb / Human Manual

Core SQL Engine, Storage & Log Service

Related topics: Overview, Architecture & Build System, Vector, Full-Text, Hybrid Search & AI Service, FORK/MERGE Sandboxes & Embedded SDKs

seekdb is a hybrid AI database built on the OceanBase engine, exposing a MySQL-compatible SQL surface while internally combining vector, full-text, and relational storage. The "Core SQL Engine, Storage & Log Service" layer is the foundation that processes every query, persists every row, and records every event the system emits. This page describes how these subsystems fit together, drawing on the source files cited inline.

Purpose and Scope

The core layer is responsible for three concerns:

  1. SQL execution — accepting MySQL-protocol queries and turning them into physical operators over hybrid indexes.
  2. Storage — persisting rows, vectors, and index structures, including the async/Change-Stream-based index pipeline introduced in v1.3.0.
  3. Logging & diagnostics — providing the runtime event log, error-code utility, and supporting libraries (memory allocator, common storage helpers) used by every other subsystem.

These are the surfaces a developer will most often interact with when debugging crash reports (for example the SIGSEGV/SIGABRT issues seen recently), instrumenting a new feature, or extending the supported query types. Source: README.md

SQL Engine Layer

seekdb speaks MySQL wire protocol and therefore integrates directly with standard MySQL drivers, SQLAlchemy, and tooling such as LangChain, LlamaIndex, and Dify. A single SQL statement can combine scalar filters, full-text predicates, and vector similarity in one pass:

SELECT id, title,
       l2_distance(embedding, '[0.12, 0.34, ...]') AS dist
FROM articles
WHERE MATCH(content) AGAINST('quarterly report')
ORDER BY dist APPROXIMATE
LIMIT 10;

Source: README.md
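A client-side sketch of how the statement above might be assembled from application values. The helper names (vector_literal, hybrid_query) are hypothetical, and the %s placeholders assume the "format" paramstyle used by common MySQL drivers; whether a given driver binds the vector literal this way against seekdb is an assumption to verify.

```python
from typing import Sequence, Tuple


def vector_literal(values: Sequence[float]) -> str:
    """Serialize floats into the bracketed text form shown in the README query."""
    return "[" + ", ".join(repr(float(v)) for v in values) + "]"


def hybrid_query(keywords: str, embedding: Sequence[float], k: int = 10) -> Tuple[str, tuple]:
    """Return (sql, params) for a README-style hybrid statement.

    Hypothetical helper: the table and column names are copied from the
    README example, and the %s paramstyle is an assumption about the driver.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    sql = (
        "SELECT id, title, l2_distance(embedding, %s) AS dist "
        "FROM articles "
        "WHERE MATCH(content) AGAINST(%s) "
        "ORDER BY dist APPROXIMATE "
        f"LIMIT {int(k)}"
    )
    return sql, (vector_literal(embedding), keywords)
```

Keeping the search text and the embedding as bound parameters, rather than interpolating them into the string, avoids quoting problems in the keyword argument.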

Key engine characteristics visible from the documentation:

  • Hybrid plan operators: VECTOR(N) columns, FULLTEXT INDEX ... WITH PARSER ik, and VECTOR INDEX ... WITH (DISTANCE=l2, TYPE=hnsw, LIB=vsag) can coexist on the same heap-organized table, and the optimizer produces plans that combine them. Source: README.md
  • Python wheel surface: the seekdb package exposes seekdb.open() / seekdb.connect() and returns a DB-API 2.0 cursor, so the same SQL is reachable from embedded mode. Source: package/wheel/README.md
  • MySQL-only error model: Oracle compatibility mode has been removed; the error utility no longer prints Oracle error codes. Source: tools/ob_error/src/ob_error.cpp:print_help

The recent fixes around inner_table.show_table_status (issue #922) and the SIGSEGV in ObExecContext::get_my_session during PX dispatch_sqcs (issue #920) both live inside this layer — they are symptoms of edge cases in execution contexts that the core engine must guard.

Storage Engine and Index Framework

Storage is hybrid by design. Rows are stored on heap-organized tables; vector and full-text indexes are maintained alongside them. The v1.3.0 release (May 2026) introduced async indexes backed by a Change Stream incremental framework that decouples writes from index builds, delivering high ingest throughput without blocking on index maintenance. Source: README.md
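The decoupling described above can be illustrated with a toy model. This is only a sketch of the general pattern (synchronous write, queued change, background index build); the class and method names are hypothetical and it says nothing about how seekdb's Change Stream framework is actually implemented.

```python
import queue
import threading


class AsyncIndexedTable:
    """Toy table: writes land immediately, a worker thread indexes them later."""

    def __init__(self):
        self.rows = {}                  # primary storage, updated synchronously
        self.index = {}                 # token -> set of row ids, built asynchronously
        self._changes = queue.Queue()   # stand-in for a change stream
        self._lock = threading.Lock()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def insert(self, row_id, text):
        self.rows[row_id] = text        # the write path does no index work
        self._changes.put((row_id, text))

    def _drain(self):
        # Background loop: consume queued changes and update the token index.
        while True:
            row_id, text = self._changes.get()
            with self._lock:
                for token in text.lower().split():
                    self.index.setdefault(token, set()).add(row_id)
            self._changes.task_done()

    def wait_for_index(self):
        self._changes.join()            # block until every queued change is indexed

    def search(self, token):
        with self._lock:
            return sorted(self.index.get(token.lower(), set()))
```

In this model a search issued before wait_for_index() may miss recent rows, which is the usual trade-off of asynchronous indexing: ingest is not blocked, but index freshness lags the write.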

Community evidence highlights the storage subsystems under active hardening:

  • Memory fragmentation in iterator tags: BtreeI / iterator memory blocks (~3 MB) caused severe fragmentation and were optimized in issue #900.
  • SIGABRT (memory_sanity_abort) during vector index creation: tracked in issue #892, indicating that the memory-allocator guards interact tightly with vector-index allocation paths.
  • FORK DATABASE / FORK TABLE: copy-on-write sandboxes at database and table level (introduced in v1.1.0/v1.2.0) rely on the storage layer to snapshot pages cheaply.

Underneath, deps/oblib provides the common storage helpers exercised by tests such as test_common_storage.cpp, which validates URI construction, append-file writers, and abnormal writer behavior (writing a buffer shorter than the requested length). Source: deps/oblib/unittest/lib/restore/test_common_storage.cpp

The memory allocator is instrumented via hooks and is covered by test_malloc_hook.cpp, which exercises malloc, realloc, posix_memalign, aligned_alloc, pvalloc, and cross-page reallocation paths. The test asserts that the allocator's Header carries a valid magic code and the correct data_size_, which is the same sanity check that protects production workloads. Source: deps/oblib/unittest/lib/alloc/test_malloc_hook.cpp
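The kind of check described above (a header carrying a magic code and a recorded data size) can be sketched as follows. The magic value, the two-field layout, and the function names are hypothetical and chosen only for illustration; the real Header in deps/oblib is a C++ structure whose fields are not reproduced here.

```python
import struct

MAGIC = 0xA110CA7E               # hypothetical magic value, not the one used by oblib
HEADER = struct.Struct("<II")    # hypothetical layout: (magic, data_size)


def wrap(payload: bytes) -> bytes:
    """Prefix a payload with a header recording the magic code and payload size."""
    return HEADER.pack(MAGIC, len(payload)) + payload


def check(block: bytes) -> bytes:
    """Validate the header and return the payload, raising on any mismatch."""
    if len(block) < HEADER.size:
        raise ValueError("block is shorter than its header")
    magic, data_size = HEADER.unpack_from(block)
    if magic != MAGIC:
        raise ValueError("bad magic code: header overwritten or wrong pointer")
    if data_size != len(block) - HEADER.size:
        raise ValueError("recorded data size does not match the block")
    return block[HEADER.size:]
```

A failed magic check usually means something wrote past a neighbouring allocation, which is why this style of guard tends to surface as an abort far from the code that caused the corruption.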

flowchart LR
  A[Client SQL / MySQL Wire] --> B[SQL Engine<br/>parser + planner + executor]
  B --> C[Hybrid Storage<br/>heap rows + HNSW + FULLTEXT]
  B --> D[Change Stream<br/>async index pipeline]
  D --> C
  C --> E[(Persisted data<br/>+ index files)]
  B --> F[Log Service<br/>easy_log + ob_error]
  C --> F

Log Service and Supporting Libraries

The log service is implemented by the easy framework, which is also the asynchronous I/O backbone. The logging macros in easy_log.h provide a leveled surface (from easy_fatal_log through easy_trace_log) plus a SYS_ERROR convenience macro, and they all funnel through easy_common_log with configurable formatters. Source: deps/easy/src/io/easy_log.h

Event-driven I/O — required for the high-throughput write path that the v1.3.0 async index pipeline relies on — is handled by an embedded libev whose feature flags (EV_FEATURE_CODE, EV_FEATURE_DATA, … EV_FEATURE_OS) are feature-tested at compile time, with sensible compile-time defaults for EV_MINPRI, EV_MAXPRI, EV_MULTIPLICITY, and per-feature toggles such as EV_PERIODIC_ENABLE. Source: deps/easy/src/io/ev.h
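To make the feature mask concrete, the sketch below decodes an EV_FEATURES value into the feature groups it enables. The bit assignments follow upstream libev's ev.h (CODE=1 through OS=64); that seekdb's bundled copy uses the same values is an assumption, and the Python names are illustrative only.

```python
# Bit values as defined in upstream libev's ev.h; the bundled copy may differ.
EV_FEATURE_BITS = {
    "CODE": 1,
    "DATA": 2,
    "CONFIG": 4,
    "API": 8,
    "WATCHERS": 16,
    "BACKENDS": 32,
    "OS": 64,
}


def enabled_features(mask: int) -> list:
    """Return the names of the feature groups whose bit is set in the mask."""
    return [name for name, bit in EV_FEATURE_BITS.items() if mask & bit]
```

For example, upstream libev defaults to a mask of 0x7f (everything on) and uses 0x7c when optimizing for size, which drops the CODE and DATA groups.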

The connection / request model used by every SQL and RPC path is defined in easy_io_struct.h:

| Structure | Role |
| --- | --- |
| easy_connection_t | Per-connection state: status flags, ssl_sm_, keepalive_failed, doing_request_count, summary node, TLS version option |
| easy_request_t | Inbound/outbound packet carrier (ipacket/opacket) with magic-code debug guards |
| easy_io_t | Event-loop wrapper holding handler/read/write callbacks, send queue, rx request queue, and user data |

Source: deps/easy/src/io/easy_io_struct.h

A summary/observability subsystem tracks per-connection counters (send_bytes, recv_bytes, keepalive counters, ratelimit_enabled) and exposes diff/html output through easy_summary_diff / easy_summary_html_output, which is what powers the per-request diagnostic dumps seen in SIGABRT/SIGSEGV bug reports. Source: deps/easy/src/io/easy_summary.h

The companion ob_error tool resolves numeric error codes to human-readable messages, causes, and solutions for the MySQL facility, and prints both the seekdb error name and the equivalent MySQL errno plus SQLSTATE. Its print_help documents the supported invocation modes (ob_error error_code, ob_error MY error_code). Source: tools/ob_error/src/ob_error.cpp

The shared utilities (oblib) — described simply as "a common library for OceanBase project" — bundle the allocator, restore/storage helpers, and other primitives consumed by both the SQL engine and the storage engine. Source: deps/oblib/README.md

Common Failure Modes

Based on recent issues, developers working in this layer should watch for:

  • Crash in async RPC callbacks (ObExecContext::get_my_session, issue #920) — the session pointer must outlive any pending callback queued through easy_io.
  • Standby query timeout after switchover during dump (issue #921) — log service timestamps and replay markers must be consistent across primary/standby.
  • DDL core dumps in T1_DDLTaskExecu (issue #906) — long-running DDL tasks interact with the storage and log layers; tracing requires the per-task magic-code debug fields in easy_request_t.
  • Memory-sanity aborts during vector index creation (issue #892) — vector-index allocation paths must respect the Header magic-code contract validated by test_malloc_hook.cpp.

Vector, Full-Text, Hybrid Search & AI Service

Related topics: Overview, Architecture & Build System, Core SQL Engine, Storage & Log Service, FORK/MERGE Sandboxes & Embedded SDKs

Overview

seekdb is a MySQL-compatible database that extends a traditional relational engine with first-class vector search, full-text search, hybrid retrieval, and AI-oriented service primitives. The project's positioning — "the state store for the agent era" — is built around a single SQL surface that can express vector similarity, lexical matching, structured filters, and copy-on-write (COW) branching in one query.

The hybrid search and AI surface is documented in the top-level README.md through one end-to-end example. An articles table carries a VECTOR(384) embedding column, a FULLTEXT INDEX ... WITH PARSER ik, and a VECTOR INDEX ... WITH (DISTANCE=l2, TYPE=hnsw, LIB=vsag); it is then queried with one SELECT that combines MATCH(content) AGAINST(...) and l2_distance(embedding, ...) with ORDER BY dist APPROXIMATE LIMIT 10. This example defines the public contract of the search/AI surface and is the entry point for every downstream SDK and integration.

The Python wheel package documented in package/wheel/README.md describes the AI-oriented feature set at a higher level: vector storage and search, scalar and fuzzy search, unified hybrid (vector + scalar) ranking, and built-in "AI functions" that simplify AI application development. Both READMEs advertise MySQL protocol compatibility so that hybrid retrieval is reachable from LangChain, LlamaIndex, Dify, and any standard MySQL driver.

Core Capabilities

Vector Columns and ANN Indexes

A vector column is declared inline with the table DDL using the VECTOR(dim) type, and an approximate-nearest-neighbor (ANN) index is created with a VECTOR INDEX clause that takes a WITH option list for distance metric, index type, and underlying library. The README example wires all three:

CREATE TABLE articles (
  id        INT PRIMARY KEY,
  title     TEXT,
  content   TEXT,
  embedding VECTOR(384),
  FULLTEXT INDEX idx_fts (content) WITH PARSER ik,
  VECTOR   INDEX idx_vec (embedding) WITH (DISTANCE=l2, TYPE=hnsw, LIB=vsag)
) ORGANIZATION = HEAP;

Source: README.md

Three configuration knobs are surfaced through the index options:

| Option | Example value | Meaning |
| --- | --- | --- |
| DISTANCE | l2 | Distance function (L2 / Euclidean) |
| TYPE | hnsw | ANN algorithm family |
| LIB | vsag | Underlying vector-search library implementation |

The ORGANIZATION = HEAP clause is the row-storage option exposed for tables that are dominated by vector and text workloads rather than transactional key lookups.
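When the embedding width or index options vary per deployment, the DDL above can be rendered from parameters. The following is a minimal sketch: articles_ddl is a hypothetical helper, the only option values known from the README are l2, hnsw, and vsag, and any other value passed in is an unverified assumption.

```python
import re

# Accept only plain identifiers as option values so nothing unexpected
# is spliced into the DDL text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def articles_ddl(dim, distance="l2", index_type="hnsw", lib="vsag"):
    """Render the README-style CREATE TABLE statement for a given vector width."""
    if isinstance(dim, bool) or not isinstance(dim, int) or dim <= 0:
        raise ValueError("dim must be a positive integer")
    for option in (distance, index_type, lib):
        if not _IDENT.match(option):
            raise ValueError(f"unexpected option value: {option!r}")
    return (
        "CREATE TABLE articles (\n"
        "  id        INT PRIMARY KEY,\n"
        "  title     TEXT,\n"
        "  content   TEXT,\n"
        f"  embedding VECTOR({dim}),\n"
        "  FULLTEXT INDEX idx_fts (content) WITH PARSER ik,\n"
        f"  VECTOR   INDEX idx_vec (embedding) WITH (DISTANCE={distance}, TYPE={index_type}, LIB={lib})\n"
        ") ORGANIZATION = HEAP;"
    )
```

The vector width has to match the dimensionality of whatever embedding model produces the values stored in the column.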

Full-Text Search

Full-text search is exposed through the standard MySQL FULLTEXT INDEX ... WITH PARSER syntax. The ik parser shown in the README example is the analyzer used for Chinese / mixed-language text. Queries use MATCH(col) AGAINST(expr) as in stock MySQL, and the result is composable with vector distance and relational predicates in a single statement.

Hybrid Search (Vector + Text + Relational)

Hybrid retrieval is expressed as a single SQL statement: the MATCH ... AGAINST predicate filters by lexical relevance, the l2_distance(embedding, ...) expression produces a numeric distance, and the ORDER BY dist APPROXIMATE LIMIT 10 clause delegates the top-k computation to the ANN index. The exact statement from the README is:

SELECT id, title,
       l2_distance(embedding, '[0.12, 0.34, ...]') AS dist
FROM articles
WHERE MATCH(content) AGAINST('quarterly report')
ORDER BY dist APPROXIMATE
LIMIT 10;

Source: README.md

The APPROXIMATE keyword on ORDER BY is the planner's switch from an exact sort to the HNSW-backed top-k path. Because everything is expressed in SQL, the same query is reachable from any MySQL driver, SQLAlchemy, or the in-process pyseekdb SDK.
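One way to reason about what APPROXIMATE trades away is to compare an ANN result against an exact brute-force ranking on a small sample. The sketch below assumes the conventional Euclidean definition of L2 distance; the function names are hypothetical and nothing here calls into seekdb.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(rows, query, k):
    """Exact nearest neighbours: rows maps id -> vector; returns [(distance, id), ...]."""
    return heapq.nsmallest(k, ((l2_distance(vec, query), rid) for rid, vec in rows.items()))


def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k ids that the approximate result also returned."""
    if not exact_ids:
        return 1.0
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)
```

Feeding recall_at_k the ids from exact_top_k and the ids returned by the APPROXIMATE query gives a rough quality measure for a chosen index configuration on that sample.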

Python SDK and AI Service

The pyseekdb wheel (and the renamed seekdb wheel) is the canonical entry point for application developers. The wheel README documents a three-step lifecycle — open, connect, cursor — and exposes a MySQL-compatible connection that can read system catalogs such as oceanbase.DBA_OB_USERS:

import seekdb

seekdb.open()
conn = seekdb.connect()
cursor = conn.cursor()
cursor.execute("SELECT * FROM oceanbase.DBA_OB_USERS")
results = cursor.fetchall()
conn.close()

Source: package/wheel/README.md

The open() / connect() separation is what enables embedded mode (in-process, no server) and server / OceanBase mode with the same code path. Wheel-level requirements are CPython 3.8+ on Linux x86_64 and aarch64; macOS 15+ builds are documented as supported starting with v1.1.0.
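Because the connection follows the DB-API 2.0 shape, query helpers can be written against that interface rather than against a specific driver. The sketch below is an assumption-laden illustration: fetch_all is a hypothetical helper, and the standard library's sqlite3 module is suggested only as a stand-in connection for trying it out, not as a statement about seekdb's SQL dialect or parameter style.

```python
from contextlib import closing


def fetch_all(conn, sql, params=()):
    """Run one statement on any DB-API 2.0 connection and return all rows.

    closing() guarantees cursor.close() is called even if execute() raises.
    """
    with closing(conn.cursor()) as cursor:
        cursor.execute(sql, params)
        return cursor.fetchall()
```

With a stand-in connection such as sqlite3.connect(":memory:"), calling fetch_all(conn, "SELECT 1 + 1") exercises the same cursor lifecycle that the wheel README example spells out by hand.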

The wheel README also enumerates the AI-service surface: vector storage and search, scalar and fuzzy search, hybrid (vector + scalar) ranking, and "AI functions" — built-in stored functions that simplify common AI retrieval patterns. These AI functions sit on top of the same SQL types and indexes described above; they are not a separate query language, they are SQL functions callable from any MySQL client.

Architecture and Code Paths

The high-level data flow for a hybrid query is the same regardless of whether the entry point is a MySQL client, SQLAlchemy, or the embedded pyseekdb SDK:

flowchart LR
  A[SQL / Driver] --> B[MySQL Protocol / pyseekdb]
  B --> C[SQL Parser & Planner]
  C --> D{Hybrid Plan}
  D --> E[Full-Text MATCH via FTS index]
  D --> F[Vector distance via HNSW / vsag]
  D --> G[Relational predicates on heap rows]
  E --> H[Top-k via ORDER BY ... APPROXIMATE]
  F --> H
  G --> H
  H --> I[Result set]

The JIT layer that compiles PL/SQL is implemented in src/objit/src/core/ob_pl_ir_compiler.h (an IRCompileLayer::IRCompiler that drives the LLVM TargetMachine) and src/objit/src/ob_llvm_helper_stub.cpp provides the ObLLVMHelper facade. The src/objit/README.md confirms that the module "contains LLVM-based Code Generator logic." Collation-aware string comparison, which the planner uses during hybrid top-k fusion, is implemented in src/objit/src/expr/string_cmp.cpp (functions such as ob_strnncollsp_utf8mb4_help). The general-purpose utility surface that the search and AI layers depend on lives in the common library documented at deps/oblib/README.md.

Release Highlights and Known Issues

v1.3.0 (May 25, 2026) introduces async indexes backed by a new Change Stream incremental framework that decouples writes from index builds, which directly affects vector index construction throughput and stability. The community has tracked a memory_sanity_abort core during vector index creation in issue #892, and the async index path is the project's response to that class of bug. v1.2.0 added primary-standby replication and FORK DATABASE; v1.1.0 introduced experimental FORK TABLE and macOS builds, both of which are prerequisites for the "agent sandbox" use case in which a hybrid-search workload is branched for safe experimentation. Long-standing community requests to expose this surface to other language ecosystems, namely TypeScript / JavaScript (#40), Rust (#76), and a stable C ABI for embedded use (#104), all require a foreign-function entry point on top of the same hybrid SQL surface documented above.

FORK/MERGE Sandboxes & Embedded SDKs

Related topics: Overview, Architecture & Build System, Core SQL Engine, Storage & Log Service, Vector, Full-Text, Hybrid Search & AI Service

Overview and Purpose

seekdb ships two complementary capabilities designed for AI-agent workflows: a database-level copy-on-write (COW) sandbox mechanism (FORK / MERGE / DROP) and an embedded in-process SDK (pyseekdb). Together they let agents branch state for experimentation, run a single binary without external services, and stream continuous memory writes with millisecond-latency retrieval.

The COW sandbox is built around the idea that an agent often needs to *try things* — re-rank results, run speculative writes, re-embed a corpus — and only later commit or roll back. seekdb exposes this as a first-class SQL operation rather than a sidecar feature. Source: README.md describes the feature as: *"FORK DATABASE for safe experimentation, MERGE to accept, DROP to roll back"*.

FORK TABLE was introduced as an experimental feature in v1.1.0 and FORK DATABASE graduated in v1.2.0 alongside primary–standby replication. Source: v1.1.0 release notes, v1.2.0 release notes.

FORK / MERGE / DROP Sandbox Architecture

The sandbox mechanism is implemented in the rootserver layer as a coordinated set of services, tasks, and helper utilities. The high-level flow is shown below.

flowchart LR
    A[SQL: FORK DATABASE / FORK TABLE] --> B[ob_fork_database_service.cpp / ob_fork_table_service.cpp]
    B --> C[ob_fork_table_info_builder.cpp]
    C --> D[ob_fork_table_task.cpp]
    D --> E[ob_fork_table_helper.cpp]
    E --> F[(Storage Layer - COW snapshots)]
    F -->|MERGE| G[Apply branch into source]
    F -->|DROP| H[Discard branch]

Source: src/rootserver/fork_table/ob_fork_database_service.cpp

Components

| Component | File | Role |
| --- | --- | --- |
| Fork database service | src/rootserver/fork_table/ob_fork_database_service.cpp | Coordinates FORK DATABASE lifecycle; gate for branch creation at database granularity. |
| Fork table service | src/rootserver/fork_table/ob_fork_table_service.cpp | Handles FORK TABLE requests; the experimental entry point introduced in v1.1.0. |
| Fork table task | src/rootserver/fork_table/ob_fork_table_task.cpp | Async task that performs the actual COW copy of metadata/data references. |
| Fork table helper | src/rootserver/fork_table/ob_fork_table_helper.cpp | Shared utility routines for validation, locking, and resolution. |
| Info builder | src/rootserver/fork_table/ob_fork_table_info_builder.cpp | Constructs the snapshot description (schemas, partitions, indexes) handed to the task. |
| Shared util | src/share/ob_fork_table_util.cpp | Cross-component helpers used by both services and the rootserver. |

Lifecycle

  1. FORK: A user/agent issues FORK DATABASE <src> TO <branch> (or FORK TABLE in v1.1.0+). The request enters the appropriate service (ob_fork_database_service.cpp / ob_fork_table_service.cpp), which validates state and dispatches a task.
  2. Snapshot: ob_fork_table_info_builder.cpp enumerates the source objects, and ob_fork_table_task.cpp installs COW pointers rather than copying physical blocks.
  3. Branch: Writes in the branch diverge from the source; reads remain shared until COW is triggered.
  4. MERGE: A merge operation promotes the branch back into the source. After merge, the branch is no longer independent.
  5. DROP: Discarding the branch frees the divergent blocks. This is the rollback path. Source: described in README.md: *"DROP to roll back"*.
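The lifecycle above can be modelled with a toy copy-on-write store. This is a conceptual sketch only: CowStore and its methods are hypothetical, the model works on key/value pairs rather than storage blocks, and it deliberately leaves out writes to the source after a fork, deletes, and conflict handling.

```python
class CowStore:
    """Toy copy-on-write store: a branch records only the keys it has changed."""

    def __init__(self, data=None):
        self.base = dict(data or {})   # the source data
        self.branches = {}             # branch name -> overlay of divergent writes

    def fork(self, name):
        if name in self.branches:
            raise ValueError(f"branch {name!r} already exists")
        self.branches[name] = {}       # nothing is copied at fork time

    def write(self, name, key, value):
        self.branches[name][key] = value   # divergence lives in the overlay only

    def read(self, name, key):
        overlay = self.branches[name]
        # Unchanged keys are still served from the shared source data.
        return overlay[key] if key in overlay else self.base.get(key)

    def merge(self, name):
        # Accept the experiment: apply the overlay to the source and retire the branch.
        self.base.update(self.branches.pop(name))

    def drop(self, name):
        # Roll back: discard the overlay, leaving the source untouched.
        del self.branches[name]
```

The cost of fork() in this model is independent of data size, which is the property that makes branching cheap enough to use for speculative agent work.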

Community-Driven Refinements

Issue #50 (*"[Feature]: Lightweight Table-Level Forking for Multi-Version Data in AI Workflows"*, 4 comments) argues for finer-grained branching. Today, the implementation supports both database- and table-level forking. Source: issue #50, v1.1.0 release notes.

Embedded SDKs (`pyseekdb`)

The Python SDK pyseekdb is the primary embedded entry point. It can be installed in one line and runs the engine *in-process* — no separate observer process required. Source: README.md: *"pip install -U pyseekdb ... No servers, no schemas, no embedding setup. Embedded mode runs in-process; switch to server / OceanBase mode with one line."*

Connection Modes

| Mode | Trigger | Process model |
| --- | --- | --- |
| Embedded | pyseekdb.Client(path="./agent_state.db") | In-process; engine library linked into the Python interpreter. Source: README.md agent memory example |
| Server | Connect to a remote observer | Standard MySQL protocol client |
| OceanBase | Connect to a full OceanBase cluster | Distributed deployment |

Agent Memory Pattern

The canonical use of the embedded SDK is the "write-then-retrieve" loop, where an agent persists an observation and queries relevant context milliseconds later. The README's example demonstrates:

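import pyseekdb

# Embedded mode: the engine runs in-process, no separate server.
client = pyseekdb.Client(path="./agent_state.db")
memory = client.get_or_create_collection(name="episodic")

# `agent` is the application's own agent object; it is not provided by pyseekdb.
for step in agent.run():
    # Persist the new observation, then immediately query for related context.
    memory.upsert(ids=[step.id], documents=[step.observation])
    relevant = memory.query(query_texts=step.next_query, n_results=5)
    agent.act(relevant)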

Source: README.md. The pattern works because seekdb's async index pipeline (introduced in v1.3.0 via the Change Stream incremental framework) keeps writes and incremental HNSW updates decoupled. Source: v1.3.0 release notes.

Multi-Language SDK Requests

Community requests for non-Python SDKs — TypeScript/JavaScript (issue #40), Rust (issue #76), and a stable C ABI for embedding from Rust/Go/Java (issue #104) — all reference the embedded mode as the desired integration point. Source: issue #40, issue #76, issue #104. Today, pyseekdb is the only officially shipped embedded client.

Common Failure Modes and Caveats

  • FORK TABLE is experimental: FORK TABLE shipped as *experimental* in v1.1.0; database-level FORK DATABASE is the stable form as of v1.2.0. Source: v1.1.0 release notes, v1.2.0 release notes.
  • Merge semantics: Once merged, the branch cannot be re-opened, so a misbehaving branch has to be discarded with DROP *before* it is merged, not after.
  • Index consistency under fork: Vector and full-text indexes are COW-pointer snapshotted; if your workflow mutates embeddings on the branch, expect incremental rebuild work on the branch's own index.
  • Windows builds: A compilation error in ob_req_packet_code.h affected ob_geo_func_covered_by2.cpp on Windows. Source: issue #914. Embedded SDK users on Windows should pin to a build that includes the fix.
  • Memory pressure: Background index work and sandbox branches can co-exist with live traffic; check release-specific tuning notes when running both at scale. Community issues #901 and #900 track ongoing memory cleanups. Source: issue #901, issue #900.

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 30 structured pitfall item(s), including 4 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

1. Installation risk: Installation risk requires verification

  • Severity: high
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/914

2. Installation risk: Installation risk requires verification

  • Severity: high
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/892

3. Installation risk: Installation risk requires verification

  • Severity: high
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/921

4. Installation risk: Installation risk requires verification

  • Severity: high
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/920

5. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Developers should check this installation risk before relying on the project: Query times out on new standby after switchover during dump on primary
  • User impact: Developers may fail before the first successful local run: Query times out on new standby after switchover during dump on primary
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Query times out on new standby after switchover during dump on primary. Context: Observed during installation or first-run setup.
  • Evidence: failure_mode_cluster:github_issue | https://github.com/oceanbase/seekdb/issues/921

6. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Developers should check this installation risk before relying on the project: the v1.3.0 release.
  • User impact: Upgrade or migration to v1.3.0 may change expected behavior.
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check against v1.3.0. Context: observed when using Node.js, Python, and Windows.
  • Evidence: failure_mode_cluster:github_release | https://github.com/oceanbase/seekdb/releases/tag/v1.3.0

7. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/922

8. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/902

9. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/906

10. Configuration risk: Configuration risk requires verification

  • Severity: medium
  • Finding: Developers should check this configuration risk before relying on the project: the v1.1.0 release.
  • User impact: Upgrade or migration to v1.1.0 may change expected behavior.
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check against v1.1.0. Context: observed when using Python, Docker, and macOS.
  • Evidence: failure_mode_cluster:github_release | https://github.com/oceanbase/seekdb/releases/tag/v1.1.0

11. Configuration risk: Configuration risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/815

12. Capability evidence risk: Capability evidence risk requires verification

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.assumptions | https://github.com/oceanbase/seekdb

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12 (the count of project-level external discussion links exposed on this manual page).

Use: Review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using seekdb with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence