Doramagic Project Pack · Human Manual
seekdb
The AI-Native Search Database. Best for agent storage, it unifies vector, text, structured, and semi-structured data into a single engine. This all-in-one database makes agents smarter, easier to run, and more stable.
Overview, Architecture & Build System
Related topics: Core SQL Engine, Storage & Log Service, Vector, Full-Text, Hybrid Search & AI Service
Project Purpose and Scope
seekdb is an open-source, AI-oriented database built by the OceanBase team. It is designed to serve as a unified state and memory layer for AI agents, supporting streaming writes, millisecond-latency retrieval, and branchable, mergeable, rollback-able work environments (via FORK DATABASE / FORK TABLE / MERGE / DROP primitives). The project exposes a MySQL-compatible protocol, native vector and full-text search, and a Python SDK (pyseekdb) for embedded use. Source: README.md.
The repository is distributed under the Apache License 2.0 and ships as RPM packages (e.g. seekdb-1.3.0.0-100000092026051510 for the v1.3.0 release). Active release branches include master, release/1.3.0, and the historical 4_4_x_release, reflecting the project's multi-platform and high-availability evolution tracked across v1.1.0 → v1.2.0 → v1.3.0.
High-Level Architecture
seekdb follows a layered C/C++ architecture that reuses the proven OceanBase engine and a curated set of third-party dependencies. The top-level entry point is the observer server binary, which bootstraps the SQL engine, storage layer, and network listener. The codebase is organized around a small number of well-defined subsystems:
- deps/oblib/: the shared OceanBase common library providing memory allocation, address/network utilities, and storage primitives. It is explicitly documented as "a common library for OceanBase project" and exposes the building blocks reused across the engine. Source: deps/oblib/README.md.
- deps/easy/: a custom event-driven I/O framework that wraps libev (ev.h) and provides connection pooling, asynchronous message handling, and structured I/O buffers. The libev integration uses compile-time feature flags (e.g. EV_FEATURE_CODE, EV_FEATURE_API, EV_FEATURE_WATCHERS) to conditionally enable backends, watchers, and priority levels. Source: deps/easy/src/io/ev.h.
- src/objit/: the in-process JIT compiler layer for PL/SQL, based on LLVM ORC and an ObPLIRCompiler that sits on top of llvm::orc::IRCompileLayer::IRCompiler. A stub implementation (ob_llvm_helper_stub.cpp) returns OB_NOT_SUPPORTED for platforms where JIT is disabled.
- src/share/ and tools/ob_error/: shared error-code definitions generated from src/oberror_errno.def via the gen_errno.pl script; this is the canonical way new error causes and solutions are wired in. Source: tools/ob_error/README.md.
- src/observer/: the main server process (e.g. ob_server.cpp, main.cpp) that wires SQL, storage, replication, and the listener together.
flowchart TB
subgraph Client["Client Layer"]
SDK["pyseekdb / SQLAlchemy / mysql client"]
end
subgraph Server["Observer Process (src/observer)"]
SQL["SQL / PL Engine"]
JIT["objit JIT (LLVM ORC)"]
ERR["Error system (tools/ob_error)"]
end
subgraph Common["Common Library (deps/oblib)"]
MEM["Memory / Allocator"]
NET["Net / ObAddr"]
STORE["Storage helpers"]
end
subgraph IO["I/O Framework (deps/easy)"]
EV["libev event loop"]
EIO["easy_io connection pool"]
end
SDK -->|MySQL protocol| Server
SQL --> JIT
SQL --> ERR
Server --> Common
Server --> IO
IO --> EV
IO --> EIO
The architecture cleanly separates protocol/SQL concerns (top), shared utilities (middle), and platform eventing (bottom), which is what allows the same binary to be embedded in Python via pyseekdb or deployed as a standalone server.
Build System
The project uses CMake as its primary build configuration entry point (CMakeLists.txt) and provides two platform-specific driver scripts:
These scripts wrap the underlying CMake invocation, dependency bootstrapping, and obbuild packaging that the community references in bug reports (e.g. BUILD_INFO: obbuild-sanity-master-101170 and BUILD_FLAGS: RelWithDebInfo|Sanity). The supported build configurations observed in production issues include RelWithDebInfo|Sanity, indicating that sanitizers are first-class build configurations rather than optional add-ons.
The Windows build path is the source of the most common build-system friction — for example, issue #914 reports a compilation error in ob_req_packet_code.h that breaks ob_geo_func_covered_by2.cpp on Windows. The codebase explicitly accommodates this with _WIN32 guards in platform headers. Source: deps/easy/src/include/easy_define.h.
Key build-time configuration knobs
| Concern | Mechanism | Where to look |
|---|---|---|
| Event-loop features | EV_FEATURES mask + EV_MINPRI/EV_MAXPRI | deps/easy/src/io/ev.h |
| I/O buffer sizes | EASY_IO_BUFFER_SIZE, EASY_IOV_SIZE, EASY_FIRST_MSGLEN | deps/easy/src/io/easy_io_struct.h |
| JIT support | ObPLIRCompiler with optional object cache | src/objit/src/core/ob_pl_ir_compiler.h |
| Error code generation | gen_errno.pl regenerates ob_errno.h/.cpp | tools/ob_error/README.md |
| Platform shims | _WIN32 / __cplusplus branches | deps/easy/src/include/easy_define.h |
Deployment Modes and Platform Support
seekdb ships in three execution modes, all backed by the same build artifacts:
- Embedded mode: in-process via pyseekdb, no separate server. This is the path most Python AI/agent applications use.
- Server mode: the observer binary listens on a MySQL port and is normally managed by systemd (DEB/RPM packages install a service unit). The README links to docs/deploy-by-systemd/ for offline install and configuration details.
- Cluster mode: OceanBase-style deployment for primary-standby replication and high availability, introduced in v1.2.0.
Platform coverage has expanded steadily: v1.1.0 added native macOS 15+ development, v1.2.0 added primary-standby replication and FORK DATABASE, and v1.3.0 (current) introduces async indexes backed by a new Change Stream incremental framework. The same release line also adds Windows portability work, which is still maturing (see issue #914 above).
See Also
Source: https://github.com/oceanbase/seekdb / Human Manual
Core SQL Engine, Storage & Log Service
Related topics: Overview, Architecture & Build System, Vector, Full-Text, Hybrid Search & AI Service, FORK/MERGE Sandboxes & Embedded SDKs
seekdb is a hybrid AI database built on the OceanBase engine, exposing a MySQL-compatible SQL surface while internally combining vector, full-text, and relational storage. The "Core SQL Engine, Storage & Log Service" layer is the foundation that processes every query, persists every row, and records every event the system emits. This page describes how these subsystems fit together, drawing from the source files referenced above.
Purpose and Scope
The core layer is responsible for three concerns:
- SQL execution — accepting MySQL-protocol queries and turning them into physical operators over hybrid indexes.
- Storage — persisting rows, vectors, and index structures, including the async/Change-Stream-based index pipeline introduced in v1.3.0.
- Logging & diagnostics — providing the runtime event log, error-code utility, and supporting libraries (memory allocator, common storage helpers) used by every other subsystem.
These are the surfaces a developer will most often interact with when debugging crash reports (for example the SIGSEGV/SIGABRT issues seen recently), instrumenting a new feature, or extending the supported query types. Source: README.md
SQL Engine Layer
seekdb speaks MySQL wire protocol and therefore integrates directly with standard MySQL drivers, SQLAlchemy, and tooling such as LangChain, LlamaIndex, and Dify. A single SQL statement can combine scalar filters, full-text predicates, and vector similarity in one pass:
SELECT id, title,
l2_distance(embedding, '[0.12, 0.34, ...]') AS dist
FROM articles
WHERE MATCH(content) AGAINST('quarterly report')
ORDER BY dist APPROXIMATE
LIMIT 10;
Source: README.md
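As a hedged sketch of how an application might assemble the statement above, the helpers below render a Python list as the bracketed vector literal and return a parameterized query. The names `vector_literal` and `hybrid_query` are hypothetical, and the `%s` placeholder style is an assumption borrowed from common MySQL drivers; check the paramstyle of the driver you actually use.

```python
from typing import Sequence, Tuple


def vector_literal(values: Sequence[float]) -> str:
    """Render floats in the bracketed text form shown in the README query."""
    if not values:
        raise ValueError("embedding must not be empty")
    return "[" + ", ".join(repr(float(v)) for v in values) + "]"


def hybrid_query(embedding: Sequence[float], text: str,
                 limit: int = 10) -> Tuple[str, tuple]:
    """Return (sql, params) for a DB-API cursor.

    Assumption: the driver accepts %s placeholders. LIMIT is inlined as a
    validated integer rather than bound as a parameter.
    """
    limit = int(limit)
    if limit <= 0:
        raise ValueError("limit must be positive")
    sql = (
        "SELECT id, title, l2_distance(embedding, %s) AS dist "
        "FROM articles "
        "WHERE MATCH(content) AGAINST(%s) "
        "ORDER BY dist APPROXIMATE "
        f"LIMIT {limit}"
    )
    return sql, (vector_literal(embedding), text)


sql, params = hybrid_query([0.12, 0.34, 0.56], "quarterly report")
print(sql)
print(params)
```

The resulting pair would be passed as `cursor.execute(sql, params)`, which keeps the search text and the embedding out of the SQL string itself.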
Key engine characteristics visible from the documentation:
- Hybrid plan operators: VECTOR(N) columns, FULLTEXT INDEX ... WITH PARSER ik, and VECTOR INDEX ... WITH (DISTANCE=l2, TYPE=hnsw, LIB=vsag) can coexist on the same heap-organized table, and the optimizer produces plans that combine them. Source: README.md
- Python wheel surface: the seekdb package exposes seekdb.open() / seekdb.connect() and returns a DB-API 2.0 cursor, so the same SQL is reachable from embedded mode. Source: package/wheel/README.md
- MySQL-only error model: Oracle compatibility mode has been removed; the error utility no longer prints Oracle error codes. Source: tools/ob_error/src/ob_error.cpp:print_help
The recent fixes around inner_table.show_table_status (issue #922) and the SIGSEGV in ObExecContext::get_my_session during PX dispatch_sqcs (issue #920) both live inside this layer — they are symptoms of edge cases in execution contexts that the core engine must guard.
Storage Engine and Index Framework
Storage is hybrid by design. Rows are stored on heap-organized tables; vector and full-text indexes are maintained alongside them. The v1.3.0 release (May 2026) introduced async indexes backed by a Change Stream incremental framework that decouples writes from index builds, delivering high ingest throughput without blocking on index maintenance. Source: README.md
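The decoupling described above can be pictured with a toy model. This is only a sketch of the general idea (writes enqueue a change record and return; a background worker applies the records to an index later). The class `ToyTable`, its token index, and the queue-based change log are all hypothetical and are not seekdb's Change Stream implementation.

```python
import queue
import threading


class ToyTable:
    """Toy model: rows are written immediately, indexing happens later."""

    def __init__(self):
        self.rows = {}
        self.index = {}                  # token -> set of row ids
        self._changes = queue.Queue()    # stand-in for a change stream
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def insert(self, row_id, text):
        self.rows[row_id] = text
        self._changes.put((row_id, text))   # does not wait for indexing

    def _run(self):
        while True:
            item = self._changes.get()
            try:
                if item is None:             # shutdown marker
                    return
                row_id, text = item
                for token in text.lower().split():
                    self.index.setdefault(token, set()).add(row_id)
            finally:
                self._changes.task_done()

    def wait_for_index(self):
        """Block until every queued change has been applied."""
        self._changes.join()

    def close(self):
        self._changes.put(None)
        self._worker.join()


table = ToyTable()
table.insert(1, "quarterly report draft")
table.insert(2, "annual report")
table.wait_for_index()
print(sorted(table.index["report"]))  # [1, 2]
table.close()
```

The point of the sketch is the shape of the write path: `insert` never blocks on index maintenance, and readers that need a fully caught-up index must wait for the backlog to drain.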
Community evidence highlights the storage subsystems under active hardening:
- Memory fragmentation in iterator tags: BtreeI/ iterator memory blocks (~3 MB) caused severe fragmentation and were optimized in issue #900.
- SIGABRT (memory_sanity_abort) during vector index creation: tracked in issue #892, indicating the memory-allocator guards interact tightly with vector-index allocation paths.
- FORK DATABASE / FORK TABLE: Copy-on-Write sandboxes at database and table level (FORK TABLE in v1.1.0, FORK DATABASE in v1.2.0) rely on the storage layer to snapshot pages cheaply.
Underneath, deps/oblib provides the common storage helpers exercised by tests such as test_common_storage.cpp, which validates URI construction, append-file writers, and abnormal writer behavior (writing a buffer shorter than the requested length). Source: deps/oblib/unittest/lib/restore/test_common_storage.cpp
The memory allocator is instrumented via hooks and is covered by test_malloc_hook.cpp, which exercises malloc, realloc, posix_memalign, aligned_alloc, pvalloc, and cross-page reallocation paths. The test asserts that the allocator's Header carries a valid magic code and the correct data_size_, which is the same sanity check that protects production workloads. Source: deps/oblib/unittest/lib/alloc/test_malloc_hook.cpp
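To make the header check concrete, here is a minimal sketch of the same kind of sanity test in Python. The magic value, the field layout (`<IQ`: a 32-bit magic followed by a 64-bit data size), and the function names are hypothetical; the real Header in deps/oblib is a C++ structure whose layout is not described here.

```python
import struct

MAGIC = 0xA110C8ED               # hypothetical magic code
HEADER = struct.Struct("<IQ")    # hypothetical layout: magic, data_size


def alloc(size: int) -> bytearray:
    """Return a block whose first bytes hold the header, then `size` data bytes."""
    block = bytearray(HEADER.size + size)
    HEADER.pack_into(block, 0, MAGIC, size)
    return block


def check(block: bytearray) -> int:
    """Validate the header and return the recorded data size."""
    magic, data_size = HEADER.unpack_from(block, 0)
    if magic != MAGIC:
        raise MemoryError(f"bad magic code: {magic:#x}")
    if data_size != len(block) - HEADER.size:
        raise MemoryError(f"data_size {data_size} does not match block length")
    return data_size


block = alloc(64)
print(check(block))  # 64

block[0] ^= 0xFF     # simulate an overwrite of the header
try:
    check(block)
except MemoryError as exc:
    print("detected:", exc)
```

A check like this is cheap because it reads only the header, which is why it can run on every free or realloc rather than only in tests.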
flowchart LR
A[Client SQL / MySQL Wire] --> B[SQL Engine<br/>parser + planner + executor]
B --> C[Hybrid Storage<br/>heap rows + HNSW + FULLTEXT]
B --> D[Change Stream<br/>async index pipeline]
D --> C
C --> E[(Persisted data<br/>+ index files)]
B --> F[Log Service<br/>easy_log + ob_error]
C --> F
Log Service and Supporting Libraries
The log service is implemented by the easy framework, which is also the asynchronous I/O backbone. The logging macros in easy_log.h provide a leveled surface (easy_fatal_log … easy_trace_log) plus a SYS_ERROR convenience macro, and they all funnel through easy_common_log with configurable formatters. Source: deps/easy/src/io/easy_log.h
Event-driven I/O — required for the high-throughput write path that the v1.3.0 async index pipeline relies on — is handled by an embedded libev whose feature flags (EV_FEATURE_CODE, EV_FEATURE_DATA, … EV_FEATURE_OS) are feature-tested at compile time, with sensible compile-time defaults for EV_MINPRI, EV_MAXPRI, EV_MULTIPLICITY, and per-feature toggles such as EV_PERIODIC_ENABLE. Source: deps/easy/src/io/ev.h
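The feature mask can be decoded with a few lines of Python. The bit values and the priority defaults below follow upstream libev's ev.h (EV_FEATURE_CODE is bit 1 through EV_FEATURE_OS at bit 64, with EV_MINPRI/EV_MAXPRI defaulting to -2/+2 only when the CONFIG bit is set); whether the copy embedded in deps/easy keeps exactly these values is an assumption, and `describe` is a hypothetical helper.

```python
from enum import IntFlag


class EvFeature(IntFlag):
    # Bit assignments as in upstream libev; assumed unchanged in deps/easy.
    CODE = 1
    DATA = 2
    CONFIG = 4
    API = 8
    WATCHERS = 16
    BACKENDS = 32
    OS = 64


def describe(mask: int) -> dict:
    """Report which feature groups a mask enables and the derived priorities."""
    features = EvFeature(mask)
    has_config = bool(features & EvFeature.CONFIG)
    return {
        "enabled": [f.name for f in EvFeature if f in features],
        "EV_MINPRI": -2 if has_config else 0,
        "EV_MAXPRI": 2 if has_config else 0,
    }


print(describe(0x7F))  # upstream default: every feature group enabled
print(describe(0x7C))  # upstream size-optimized default: CODE and DATA off
```

Reading a build's EV_FEATURES value this way is a quick check of whether priorities and optional watcher types are compiled in at all.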
The connection / request model used by every SQL and RPC path is defined in easy_io_struct.h:
| Structure | Role |
|---|---|
| easy_connection_t | Per-connection state: status flags, ssl_sm_, keepalive_failed, doing_request_count, summary node, TLS version option |
| easy_request_t | Inbound/outbound packet carrier (ipacket/opacket) with magic-code debug guards |
| easy_io_t | Event-loop wrapper holding handler/read/write callbacks, send queue, rx request queue, and user data |
Source: deps/easy/src/io/easy_io_struct.h
A summary/observability subsystem tracks per-connection counters (send_bytes, recv_bytes, keepalive counters, ratelimit_enabled) and exposes diff/html output through easy_summary_diff / easy_summary_html_output, which is what powers the per-request diagnostic dumps seen in SIGABRT/SIGSEGV bug reports. Source: deps/easy/src/io/easy_summary.h
The companion ob_error tool resolves numeric error codes to human-readable messages, causes, and solutions for the MySQL facility, and prints both the seekdb error name and the equivalent MySQL errno plus SQLSTATE. Its print_help documents the supported invocation modes (ob_error error_code, ob_error MY error_code). Source: tools/ob_error/src/ob_error.cpp
The shared utilities (oblib) — described simply as "a common library for OceanBase project" — bundle the allocator, restore/storage helpers, and other primitives consumed by both the SQL engine and the storage engine. Source: deps/oblib/README.md
Common Failure Modes
Based on recent issues, developers working in this layer should watch for:
- Crash in async RPC callbacks (ObExecContext::get_my_session, issue #920): the session pointer must outlive any pending callback queued through easy_io.
- Standby query timeout after switchover during dump (issue #921): log service timestamps and replay markers must be consistent across primary/standby.
- DDL core dumps in T1_DDLTaskExecu (issue #906): long-running DDL tasks interact with the storage and log layers; tracing requires the per-task magic-code debug fields in easy_request_t.
- Memory-sanity aborts during vector index creation (issue #892): vector-index allocation paths must respect the Header magic-code contract validated by test_malloc_hook.cpp.
See Also
- README.md — Project overview, hybrid search examples, ecosystem integrations.
- package/wheel/README.md — Python wheel packaging and quickstart.
- tools/ob_error/src/ob_error.cpp — Error-code lookup CLI.
- pyseekdb User Guide — Python SDK walkthrough.
- Seekdb User Guide — Full integration reference.
Source: https://github.com/oceanbase/seekdb / Human Manual
Vector, Full-Text, Hybrid Search & AI Service
Related topics: Overview, Architecture & Build System, Core SQL Engine, Storage & Log Service, FORK/MERGE Sandboxes & Embedded SDKs
Overview
seekdb is a MySQL-compatible database that extends a traditional relational engine with first-class vector search, full-text search, hybrid retrieval, and AI-oriented service primitives. The project's positioning — "the state store for the agent era" — is built around a single SQL surface that can express vector similarity, lexical matching, structured filters, and copy-on-write (COW) branching in one query.
The hybrid search and AI surface is documented in the top-level README.md through one end-to-end example: an articles table carries a VECTOR(384) embedding column, a FULLTEXT INDEX ... WITH PARSER ik, and a VECTOR INDEX ... WITH (DISTANCE=l2, TYPE=hnsw, LIB=vsag), and is then queried with one SELECT that combines MATCH(content) AGAINST(...) and l2_distance(embedding, ...) with ORDER BY dist APPROXIMATE LIMIT 10. This example defines the public contract of the search/AI surface and is the entry point for every downstream SDK and integration.
The Python wheel package documented in package/wheel/README.md describes the AI-oriented feature set at a higher level: vector storage and search, scalar and fuzzy search, unified hybrid (vector + scalar) ranking, and built-in "AI functions" that simplify AI application development. Both READMEs advertise MySQL protocol compatibility so that hybrid retrieval is reachable from LangChain, LlamaIndex, Dify, and any standard MySQL driver.
Core Capabilities
Vector Columns and ANN Indexes
A vector column is declared inline with the table DDL using the VECTOR(dim) type, and an approximate-nearest-neighbor (ANN) index is created with a VECTOR INDEX clause that takes a WITH option list for distance metric, index type, and underlying library. The README example wires all three:
CREATE TABLE articles (
id INT PRIMARY KEY,
title TEXT,
content TEXT,
embedding VECTOR(384),
FULLTEXT INDEX idx_fts (content) WITH PARSER ik,
VECTOR INDEX idx_vec (embedding) WITH (DISTANCE=l2, TYPE=hnsw, LIB=vsag)
) ORGANIZATION = HEAP;
Source: README.md
Three configuration knobs are surfaced through the index options:
| Option | Example value | Meaning |
|---|---|---|
| DISTANCE | l2 | Distance function (L2 / Euclidean) |
| TYPE | hnsw | ANN algorithm family |
| LIB | vsag | Underlying vector-search library implementation |
The ORGANIZATION = HEAP clause is the row-storage option exposed for tables that are dominated by vector and text workloads rather than transactional key lookups.
Full-Text Search
Full-text search is exposed through the standard MySQL FULLTEXT INDEX ... WITH PARSER syntax. The ik parser shown in the README example is the analyzer used for Chinese / mixed-language text. Queries use MATCH(col) AGAINST(expr) as in stock MySQL, and the result is composable with vector distance and relational predicates in a single statement.
Hybrid Search (Vector + Text + Relational)
Hybrid retrieval is expressed as a single SQL statement: the MATCH ... AGAINST predicate filters by lexical relevance, the l2_distance(embedding, ...) expression produces a numeric distance, and the ORDER BY dist APPROXIMATE LIMIT 10 clause delegates the top-k computation to the ANN index. The exact statement from the README is:
SELECT id, title,
l2_distance(embedding, '[0.12, 0.34, ...]') AS dist
FROM articles
WHERE MATCH(content) AGAINST('quarterly report')
ORDER BY dist APPROXIMATE
LIMIT 10;
Source: README.md
The APPROXIMATE keyword on ORDER BY is the planner's switch from an exact sort to the HNSW-backed top-k path. Because everything is expressed in SQL, the same query is reachable from any MySQL driver, SQLAlchemy, or the in-process pyseekdb SDK.
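For intuition about what the approximate path is approximating, the sketch below computes the exact answer by brute force: filter rows, compute the L2 distance to the query vector, and keep the k smallest. It assumes l2_distance means Euclidean distance with the square root (the ranking is identical for the squared form), and the `predicate` argument is a simple stand-in for the MATCH ... AGAINST filter, not a full-text scorer. All names are hypothetical.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(rows, query, k, predicate=lambda row: True):
    """Exact top-k by distance over the rows that pass `predicate`."""
    candidates = (
        (l2_distance(row["embedding"], query), row["id"])
        for row in rows
        if predicate(row)
    )
    return heapq.nsmallest(k, candidates)


rows = [
    {"id": 1, "content": "quarterly report", "embedding": [0.0, 0.0]},
    {"id": 2, "content": "quarterly report", "embedding": [3.0, 4.0]},
    {"id": 3, "content": "release notes", "embedding": [1.0, 0.0]},
]
hits = exact_top_k(rows, [0.0, 0.0], k=2,
                   predicate=lambda r: "report" in r["content"])
print(hits)  # [(0.0, 1), (5.0, 2)]
```

The exact scan touches every candidate row, which is the cost an HNSW index avoids at the price of possibly missing some true neighbors.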
Python SDK and AI Service
The pyseekdb wheel (and the renamed seekdb wheel) is the canonical entry point for application developers. The wheel README documents a three-step lifecycle — open, connect, cursor — and exposes a MySQL-compatible connection that can read system catalogs such as oceanbase.DBA_OB_USERS:
import seekdb
seekdb.open()
conn = seekdb.connect()
cursor = conn.cursor()
cursor.execute("SELECT * FROM oceanbase.DBA_OB_USERS")
results = cursor.fetchall()
conn.close()
Source: package/wheel/README.md
The open() / connect() separation is what enables embedded mode (in-process, no server) and server / OceanBase mode with the same code path. Wheel-level requirements are CPython 3.8+ on Linux x86_64 and aarch64; macOS 15+ builds are documented as supported starting with v1.1.0.
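Because the wheel exposes a DB-API 2.0 style connection, resource handling can be written once against that interface. The sketch below uses the standard library's sqlite3 module purely as a stand-in driver so that it runs anywhere; `fetch_all` is a hypothetical helper, and passing seekdb's own `connect` callable (after `seekdb.open()`) in its place is an assumption based on the wheel README, not something verified here.

```python
import sqlite3
from contextlib import closing


def fetch_all(connect, sql, params=()):
    """Run one query through any DB-API 2.0 connect callable.

    Both the connection and the cursor are closed even if execute() raises.
    """
    with closing(connect()) as conn, closing(conn.cursor()) as cursor:
        cursor.execute(sql, params)
        return cursor.fetchall()


# sqlite3 stands in for the real driver in this sketch.
rows = fetch_all(lambda: sqlite3.connect(":memory:"), "SELECT 1 + 1")
print(rows)  # [(2,)]
```

Taking the connect callable as a parameter keeps the helper independent of the deployment mode, which matches the idea that embedded and server connections share one code path.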
The wheel README also enumerates the AI-service surface: vector storage and search, scalar and fuzzy search, hybrid (vector + scalar) ranking, and "AI functions" — built-in stored functions that simplify common AI retrieval patterns. These AI functions sit on top of the same SQL types and indexes described above; they are not a separate query language, they are SQL functions callable from any MySQL client.
Architecture and Code Paths
The high-level data flow for a hybrid query is the same regardless of whether the entry point is a MySQL client, SQLAlchemy, or the embedded pyseekdb SDK:
flowchart LR
A[SQL / Driver] --> B[MySQL Protocol / pyseekdb]
B --> C[SQL Parser & Planner]
C --> D{Hybrid Plan}
D --> E[Full-Text MATCH via FTS index]
D --> F[Vector distance via HNSW / vsag]
D --> G[Relational predicates on heap rows]
E --> H[Top-k via ORDER BY ... APPROXIMATE]
F --> H
G --> H
H --> I[Result set]
The JIT layer that compiles PL/SQL is implemented in src/objit/src/core/ob_pl_ir_compiler.h (an IRCompileLayer::IRCompiler that drives the LLVM TargetMachine) and src/objit/src/ob_llvm_helper_stub.cpp provides the ObLLVMHelper facade. The src/objit/README.md confirms that the module "contains LLVM-based Code Generator logic." Collation-aware string comparison, which the planner uses during hybrid top-k fusion, is implemented in src/objit/src/expr/string_cmp.cpp (functions such as ob_strnncollsp_utf8mb4_help). The general-purpose utility surface that the search and AI layers depend on lives in the common library documented at deps/oblib/README.md.
Release Highlights and Known Issues
v1.3.0 (May 25, 2026) introduces async indexes backed by a new Change Stream incremental framework that decouples writes from index builds, which directly affects vector index construction throughput and stability. The community has tracked a memory_sanity_abort core during vector index creation in issue #892, and the async index path is the project's response to that class of bug. v1.2.0 added primary-standby replication and FORK DATABASE; v1.1.0 introduced experimental FORK TABLE and macOS builds, both of which are prerequisites for the "agent sandbox" use case in which a hybrid-search workload is branched for safe experimentation. Long-standing community requests to expose this surface to other language ecosystems, namely TypeScript / JavaScript (#40), Rust (#76), and a stable C ABI for embedded use (#104), all require a foreign-function entry point on top of the same hybrid SQL surface documented above.
See Also
- README.md: project overview, hybrid SQL example, use cases
- package/wheel/README.md: Python SDK (seekdb / pyseekdb) reference
- deps/oblib/README.md: common library used by the search and AI layers
- src/objit/README.md: LLVM-based code generator that backs PL/SQL execution
Source: https://github.com/oceanbase/seekdb / Human Manual
FORK/MERGE Sandboxes & Embedded SDKs
Related topics: Overview, Architecture & Build System, Core SQL Engine, Storage & Log Service, Vector, Full-Text, Hybrid Search & AI Service
Overview and Purpose
seekdb ships two complementary capabilities designed for AI-agent workflows: a database-level copy-on-write (COW) sandbox mechanism (FORK / MERGE / DROP) and an embedded in-process SDK (pyseekdb). Together they let agents branch state for experimentation, run a single binary without external services, and stream continuous memory writes with millisecond-latency retrieval.
The COW sandbox is built around the idea that an agent often needs to *try things* — re-rank results, run speculative writes, re-embed a corpus — and only later commit or roll back. seekdb exposes this as a first-class SQL operation rather than a sidecar feature. Source: README.md describes the feature as: *"FORK DATABASE for safe experimentation, MERGE to accept, DROP to roll back"*.
FORK TABLE was introduced as an experimental feature in v1.1.0 and FORK DATABASE graduated in v1.2.0 alongside primary–standby replication. Source: v1.1.0 release notes, v1.2.0 release notes.
FORK / MERGE / DROP Sandbox Architecture
The sandbox mechanism is implemented in the rootserver layer as a coordinated set of services, tasks, and helper utilities. The high-level flow is shown below.
flowchart LR
A[SQL: FORK DATABASE / FORK TABLE] --> B[ob_fork_database_service.cpp / ob_fork_table_service.cpp]
B --> C[ob_fork_table_info_builder.cpp]
C --> D[ob_fork_table_task.cpp]
D --> E[ob_fork_table_helper.cpp]
E --> F[(Storage Layer - COW snapshots)]
F -->|MERGE| G[Apply branch into source]
F -->|DROP| H[Discard branch]
G --> I[Source: src/rootserver/fork_table/ob_fork_database_service.cpp]
H --> I
Components
| Component | File | Role |
|---|---|---|
| Fork database service | src/rootserver/fork_table/ob_fork_database_service.cpp | Coordinates FORK DATABASE lifecycle; gate for branch creation at database granularity. Source: src/rootserver/fork_table/ob_fork_database_service.cpp |
| Fork table service | src/rootserver/fork_table/ob_fork_table_service.cpp | Handles FORK TABLE requests; the experimental entry point introduced in v1.1.0. Source: src/rootserver/fork_table/ob_fork_table_service.cpp |
| Fork table task | src/rootserver/fork_table/ob_fork_table_task.cpp | Async task that performs the actual COW copy of metadata/data references. Source: src/rootserver/fork_table/ob_fork_table_task.cpp |
| Fork table helper | src/rootserver/fork_table/ob_fork_table_helper.cpp | Shared utility routines for validation, locking, and resolution. Source: src/rootserver/fork_table/ob_fork_table_helper.cpp |
| Info builder | src/rootserver/fork_table/ob_fork_table_info_builder.cpp | Constructs the snapshot description (schemas, partitions, indexes) handed to the task. Source: src/rootserver/fork_table/ob_fork_table_info_builder.cpp |
| Shared util | src/share/ob_fork_table_util.cpp | Cross-component helpers used by both services and the rootserver. Source: src/share/ob_fork_table_util.cpp |
Lifecycle
- FORK: A user/agent issues FORK DATABASE <src> TO <branch> (or FORK TABLE in v1.1.0+). The request enters the appropriate service (ob_fork_database_service.cpp / ob_fork_table_service.cpp), which validates state and dispatches a task.
- Snapshot: ob_fork_table_info_builder.cpp enumerates the source objects, and ob_fork_table_task.cpp installs COW pointers rather than copying physical blocks.
- Branch: Writes in the branch diverge from the source; reads remain shared until COW is triggered.
- MERGE: A merge operation promotes the branch back into the source. After merge, the branch is no longer independent.
- DROP: Discarding the branch frees the divergent blocks. This is the rollback path. Source: described in README.md: *"DROP to roll back"*.
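The lifecycle above can be modeled in a few lines to make the sharing behavior visible. This is a toy, in-memory illustration of copy-on-write branching built on `collections.ChainMap`; the class `ToyCatalog`, its method names, and the "branch wins on conflict" merge rule are assumptions for the sketch and say nothing about how the rootserver services implement FORK, MERGE, or DROP. Deletes are not modeled.

```python
from collections import ChainMap


class ToyCatalog:
    """Toy COW model: forks share frozen layers and write to a private top layer."""

    def __init__(self):
        self._dbs = {}         # name -> ChainMap (top layer first)
        self._source_of = {}   # branch name -> source name

    def create(self, name):
        self._dbs[name] = ChainMap()

    def fork(self, src, branch):
        if branch in self._dbs:
            raise ValueError(f"{branch} already exists")
        frozen = self._dbs[src]
        # Both sides get a fresh top layer; the layers below are shared, not copied.
        self._dbs[src] = frozen.new_child()
        self._dbs[branch] = frozen.new_child()
        self._source_of[branch] = src

    def write(self, db, key, value):
        self._dbs[db][key] = value   # ChainMap writes only touch the top layer

    def read(self, db, key):
        return self._dbs[db][key]

    def merge(self, branch):
        src = self._source_of.pop(branch)
        # Apply only the branch's divergent writes; assumed rule: branch wins.
        self._dbs[src].update(self._dbs.pop(branch).maps[0])

    def drop(self, branch):
        self._source_of.pop(branch)
        del self._dbs[branch]        # discards only the divergent layer


cat = ToyCatalog()
cat.create("prod")
cat.write("prod", "doc1", "v1")
cat.fork("prod", "sandbox")
cat.write("sandbox", "doc1", "v2")
print(cat.read("prod", "doc1"), cat.read("sandbox", "doc1"))  # v1 v2
cat.merge("sandbox")
print(cat.read("prod", "doc1"))  # v2
```

Calling `drop("sandbox")` instead of `merge` would leave "prod" at v1, which is the rollback path in miniature.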
Community-Driven Refinements
Issue #50 (*"[Feature]: Lightweight Table-Level Forking for Multi-Version Data in AI Workflows"*, 4 comments) argues for finer-grained branching. Today, the implementation supports both database- and table-level forking. Source: issue #50, v1.1.0 release notes.
Embedded SDKs (`pyseekdb`)
The Python SDK pyseekdb is the primary embedded entry point. It can be installed in one line and runs the engine *in-process* — no separate observer process required. Source: README.md: *"pip install -U pyseekdb ... No servers, no schemas, no embedding setup. Embedded mode runs in-process; switch to server / OceanBase mode with one line."*
Connection Modes
| Mode | Trigger | Process model |
|---|---|---|
| Embedded | pyseekdb.Client(path="./agent_state.db") | In-process; engine library linked into the Python interpreter. Source: README.md agent memory example |
| Server | Connect to a remote observer | Standard MySQL protocol client |
| OceanBase | Connect to a full OceanBase cluster | Distributed deployment |
Agent Memory Pattern
The canonical use of the embedded SDK is the "write-then-retrieve" loop, where an agent persists an observation and queries relevant context milliseconds later. The README's example demonstrates:
import pyseekdb
client = pyseekdb.Client(path="./agent_state.db")
memory = client.get_or_create_collection(name="episodic")
for step in agent.run():
memory.upsert(ids=[step.id], documents=[step.observation])
relevant = memory.query(query_texts=step.next_query, n_results=5)
agent.act(relevant)
Source: README.md. The pattern works because seekdb's async index pipeline (introduced in v1.3.0 via the Change Stream incremental framework) keeps writes and incremental HNSW updates decoupled. Source: v1.3.0 release notes.
Multi-Language SDK Requests
Community requests for non-Python SDKs — TypeScript/JavaScript (issue #40), Rust (issue #76), and a stable C ABI for embedding from Rust/Go/Java (issue #104) — all reference the embedded mode as the desired integration point. Source: issue #40, issue #76, issue #104. Today, pyseekdb is the only officially shipped embedded client.
Common Failure Modes and Caveats
- FORK TABLE experimental: FORK TABLE shipped as *experimental* in v1.1.0; database-level FORK DATABASE is the stable form as of v1.2.0. Source: v1.1.0 release notes, v1.2.0 release notes.
- Merge semantics: Once merged, the branch cannot be re-opened; a branch with unwanted changes must be discarded via DROP *before* merge, not after.
- Index consistency under fork: Vector and full-text indexes are COW-pointer snapshotted; if your workflow mutates embeddings on the branch, expect incremental rebuild work on the branch's own index.
- Windows builds: A compilation error in ob_req_packet_code.h affected ob_geo_func_covered_by2.cpp on Windows. Source: issue #914. Embedded SDK users on Windows should pin to a build that includes the fix.
- Memory pressure: Background index work and sandbox branches can co-exist with live traffic; check release-specific tuning notes when running both at scale. Community issues #901 and #900 track ongoing memory cleanups. Source: issue #901, issue #900.
See Also
- pyseekdb User Guide: full Python SDK walkthrough
- seekdb launch blog: performance and design narrative
- v1.3.0 release notes: Change Stream async index framework
- v1.2.0 release notes: primary-standby and FORK DATABASE GA
- v1.1.0 release notes: macOS support and experimental FORK TABLE
- Developer Guide: build and contribution instructions
- Community issue #50: table-level forking motivation
- Community issue #104: stable C ABI for embedded multi-language SDKs
Source: https://github.com/oceanbase/seekdb / Human Manual
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 30 structured pitfall item(s), including 4 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.
1. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/914
2. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/892
3. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/921
4. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/920
5. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Developers should check this installation risk before relying on the project: Query times out on new standby after switchover during dump on primary
- User impact: Developers may fail before the first successful local run: Query times out on new standby after switchover during dump on primary
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Query times out on new standby after switchover during dump on primary. Context: Observed during installation or first-run setup.
- Evidence: failure_mode_cluster:github_issue | https://github.com/oceanbase/seekdb/issues/921
6. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Developers should check this installation risk before relying on the project: v1.3.0
- User impact: Upgrade or migration may change expected behavior: v1.3.0
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v1.3.0. Context: Observed when using node, python, windows
- Evidence: failure_mode_cluster:github_release | https://github.com/oceanbase/seekdb/releases/tag/v1.3.0
7. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/922
8. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/902
9. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/906
10. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Developers should check this configuration risk before relying on the project: v1.1.0
- User impact: Upgrade or migration may change expected behavior: v1.1.0
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v1.1.0. Context: Observed when using python, docker, macos
- Evidence: failure_mode_cluster:github_release | https://github.com/oceanbase/seekdb/releases/tag/v1.1.0
11. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/oceanbase/seekdb/issues/815
12. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/oceanbase/seekdb
Source: Doramagic discovery, validation, and Project Pack records
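One way to work through the log above is to turn the twelve listed items into a review queue: highest severity first, with each evidence link visited once. The sketch below is illustrative only. The severities, categories, and URLs are copied from the items above, while the names `PITFALLS`, `SEVERITY_RANK`, and `review_queue` are hypothetical helpers and not part of seekdb, pyseekdb, or Doramagic.

```python
# Minimal triage sketch (assumption: not an official tool of any project named here).
# Each tuple is (severity, category, evidence URL), copied from the items listed above.
PITFALLS = [
    ("high", "installation", "https://github.com/oceanbase/seekdb/issues/914"),
    ("high", "installation", "https://github.com/oceanbase/seekdb/issues/892"),
    ("high", "installation", "https://github.com/oceanbase/seekdb/issues/921"),
    ("high", "installation", "https://github.com/oceanbase/seekdb/issues/920"),
    ("medium", "installation", "https://github.com/oceanbase/seekdb/issues/921"),
    ("medium", "installation", "https://github.com/oceanbase/seekdb/releases/tag/v1.3.0"),
    ("medium", "installation", "https://github.com/oceanbase/seekdb/issues/922"),
    ("medium", "installation", "https://github.com/oceanbase/seekdb/issues/902"),
    ("medium", "installation", "https://github.com/oceanbase/seekdb/issues/906"),
    ("medium", "configuration", "https://github.com/oceanbase/seekdb/releases/tag/v1.1.0"),
    ("medium", "configuration", "https://github.com/oceanbase/seekdb/issues/815"),
    ("medium", "capability evidence", "https://github.com/oceanbase/seekdb"),
]

# Hypothetical ordering: a lower rank is reviewed earlier.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


def review_queue(pitfalls):
    """Return unique evidence links, highest severity first.

    A link that appears under several items keeps its most severe entry;
    within one severity level the original list order is preserved.
    """
    best = {}
    for order, (severity, category, url) in enumerate(pitfalls):
        rank = SEVERITY_RANK[severity]
        if url not in best or rank < best[url][0]:
            best[url] = (rank, order, severity, category)
    ranked = sorted(best.items(), key=lambda item: (item[1][0], item[1][1]))
    return [(severity, category, url) for url, (_, _, severity, category) in ranked]


if __name__ == "__main__":
    for severity, category, url in review_queue(PITFALLS):
        print(f"{severity:<6} {category:<20} {url}")
```

Issue #921 appears under both a high and a medium item, so the queue keeps only its high entry; the remaining links follow in the order the log lists them.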
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using seekdb with real data or production workflows.
- Optimize binary file size - github / github_issue
- Clean up memory usage by removing SQL Audit and 3A code - github / github_issue
- Fix inner_table.show_table_status bug (bug49621617) for master < 4_4_x_r - github / github_issue
- SIGSEGV crash in ObExecContext::get_my_session during PX dispatch_sqcs a - github / github_issue
- Query times out on new standby after switchover during dump on primary - github / github_issue
- Investigate core dump potentially related to T1_DDLTaskExecu thread - github / github_issue
- Optimize memory usage of Iterator/BtreeI tags - github / github_issue
- Fix more asynchronous RPC semantic implementation errors - github / github_issue
- Fix core SIGABRT (memory_sanity_abort) during vector index creation - github / github_issue
- Fix Windows compilation error in ob_req_packet_code.h - github / github_issue
- [Enhancement]: patch 820 to release/1.3.0 (https://github.com/oceanbase/seekdb/issues/909) - github / github_issue
- v1.3.0 - github / github_release
Source: Project Pack community evidence and pitfall evidence