# https://github.com/Vektor-Memory/Vex Project Manual

Generated at: 2026-06-20 05:06:47 UTC

## Table of Contents

- [Project Overview and CLI Reference](#page-1)
- [Vector Store Connectors and Migration Engine](#page-2)
- [Extraction Pipeline, LLM Integration, and Conversation Imports](#page-3)
- [Data Formats, Sovereign Backup, Cryptographic Operations, and Converters](#page-4)

<a id='page-1'></a>

## Project Overview and CLI Reference

### Related Pages

Related topics: [Vector Store Connectors and Migration Engine](#page-2), [Extraction Pipeline, LLM Integration, and Conversation Imports](#page-3), [Data Formats, Sovereign Backup, Cryptographic Operations, and Converters](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/Vektor-Memory/Vex/blob/main/README.md)
- [package.json](https://github.com/Vektor-Memory/Vex/blob/main/package.json)
- [CHANGELOG.md](https://github.com/Vektor-Memory/Vex/blob/main/CHANGELOG.md)
- [pipeline/index.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/index.js)
- [pipeline/02-extract.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/02-extract.js)
- [formats/vmig.js](https://github.com/Vektor-Memory/Vex/blob/main/formats/vmig.js)
- [SPEC.md](https://github.com/Vektor-Memory/Vex/blob/main/SPEC.md)
- [adapters/convert/index.js](https://github.com/Vektor-Memory/Vex/blob/main/adapters/convert/index.js)
- [adapters/langchain.js](https://github.com/Vektor-Memory/Vex/blob/main/adapters/langchain.js)
- [connectors/pinecone.js](https://github.com/Vektor-Memory/Vex/blob/main/connectors/pinecone.js)
- [connectors/chroma.js](https://github.com/Vektor-Memory/Vex/blob/main/connectors/chroma.js)
- [SECURITY.md](https://github.com/Vektor-Memory/Vex/blob/main/SECURITY.md)
</details>

# Project Overview and CLI Reference

## 1. Purpose and Scope

Vex is a vector-memory portability CLI published by VEKTOR Memory under the Apache-2.0 license. It targets the long-standing friction of moving conversational and embedding data between heterogeneous vector stores, agents, and LLM training pipelines. The project positions itself around a portable JSONL container (`.vmig.jsonl`) and an LLM-driven extraction pipeline that turns raw conversation logs into atomic, scored, deduplicated facts. Source: [package.json:1-15]().

The repository's primary deliverable is the executable entry point `vex.mjs`, accompanied by modular directories for connectors (`connectors/`), formats (`formats/`), pipeline stages (`pipeline/`), adapters (`adapters/`), and utilities (`utils/`). Source: [package.json:30-45](). Vex runs on Node.js >= 18 and intentionally treats most vector-store SDKs as optional peer dependencies, allowing users to install only the connectors they need.

Architecturally, Vex separates four concerns:

1. **Storage abstraction** — a uniform record shape (`id`, `text`, `vector`, `dims`, `metadata`, `source_store`, `vex_version`) flowing across every connector.
2. **Pipeline orchestration** — a 7-stage import pipeline configurable in `raw`, `extract`, or `smart` mode.
3. **Format portability** — the `vmig` JSONL format with BLAKE3 + Ed25519 signing for tamper-evident archives.
4. **Ecosystem integration** — adapters for LangChain, OpenAI/Anthropic fine-tuning, and conversation exports from Claude and ChatGPT.

## 2. CLI Command Surface

The `vex` command exposes a small, composable verb set. Running `vex` with no arguments in a real TTY launches the interactive Ink-powered TUI introduced in v0.8.6; invoking `vex --help` retains the plain-text reference. Source: [README.md:10-30](), [CHANGELOG.md:25-50]().

| Command | Purpose |
|---|---|
| `vex export` | Export memory from a store to a `.vmig.jsonl` file |
| `vex import` | Import a `.vmig.jsonl` file into any supported store |
| `vex migrate` | Stream directly between two vector stores |
| `vex convert` | Convert a `.vmig.jsonl` into training or chat formats |
| `vex sign` | Sign an export with BLAKE3 + Ed25519 |
| `vex verify` | Verify an export's signature |
| `vex inspect` | Show stats, namespaces, and dimensions for a file |
| `vex validate` | Lint all records in a `.vmig.jsonl` file |
| `vex adapters` | List available convert adapters |

The interactive TUI provides a guided wizard per command — for example, `export` walks through store selection, DB path, and output path before emitting the equivalent CLI invocation. Source: [CHANGELOG.md:30-55]().

## 3. Pipeline Architecture

The extraction pipeline defined in `pipeline/index.js` executes seven steps. Source: [pipeline/index.js:1-30]().

```mermaid
flowchart LR
    A[1. CHUNK] --> B[2. EXTRACT]
    B --> C[3. SCORE]
    C --> D[4. DEDUP]
    D --> E[5. EMBED]
    E --> F[6. GRAPH]
    F --> G[7. STORE]
```

- **CHUNK** splits conversation exports into processable units.
- **EXTRACT** calls an LLM to produce atomic facts with `tags`, `entities`, and three pre-generated `potential` queries for downstream BM25 recall. Source: [pipeline/02-extract.js:1-25]().
- **SCORE** assigns importance and recency values.
- **DEDUP** removes near-duplicate facts (default cosine threshold 0.72).
- **EMBED** generates vectors via OpenAI, Ollama, or custom endpoints.
- **GRAPH** extracts entities and writes temporal, tag-similarity, and causal edges.
- **STORE** writes to the target connector.

Three modes gate which steps run: `raw` (steps 1, 5, 6, and 7; fast blob storage), `extract` (all 7; requires an LLM key), and `smart` (all 7, with the chunking strategy chosen adaptively: `exchange` for short conversations, `conversation` for long ones). Source: [pipeline/index.js:18-30]().
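The mode gating can be pictured as a lookup from mode name to step list. The sketch below is illustrative only: `STEPS`, `MODE_STEPS`, and `stepsForMode` are hypothetical names and are not taken from `pipeline/index.js`.

```javascript
// Hypothetical sketch: which pipeline steps run in each mode.
const STEPS = ['chunk', 'extract', 'score', 'dedup', 'embed', 'graph', 'store'];

const MODE_STEPS = {
  raw: [1, 5, 6, 7],
  extract: [1, 2, 3, 4, 5, 6, 7],
  smart: [1, 2, 3, 4, 5, 6, 7], // same steps as extract; only the chunking strategy differs
};

function stepsForMode(mode) {
  if (!Object.hasOwn(MODE_STEPS, mode)) {
    throw new Error(`Unknown mode: ${mode}`);
  }
  // Step numbers are 1-based, array indices are 0-based.
  return MODE_STEPS[mode].map((n) => STEPS[n - 1]);
}

console.log(stepsForMode('raw').join(' -> ')); // chunk -> embed -> graph -> store
```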

The extraction stage implements a provider cascade with automatic fallback (`groq → ollama → openai → anthropic`) and supports round-robin key rotation (`--groq-key k1,k2,k3`). It reads the exact `retry-after` window from rate-limit responses before retrying, and a chunk that exhausts its retries is skipped rather than aborting the job. Source: [pipeline/02-extract.js:10-30]().

## 4. Connectors, Adapters, and Format

Vex ships connectors for VEKTOR Slipstream (local SQLite), Pinecone, Qdrant, ChromaDB, Weaviate, pgvector, Redis (with RediSearch VSS), Milvus/Zilliz, Neo4j, plus read-only importers for Claude and ChatGPT conversation exports. Source: [README.md:60-90](). Connector code follows a uniform streaming pattern with progress reporting; for example, the Pinecone connector iterates `listPaginated` and yields records normalised through a shared `toRecord` helper. Source: [connectors/pinecone.js:1-15](). Chroma follows the same shape with paginated `get` requests offset by `pageSize`. Source: [connectors/chroma.js:1-20]().

Convert adapters — selected via `vex convert` — produce training-ready formats for OpenAI fine-tuning, Anthropic Messages, generic chat (also used as an alias for Mistral, Groq, and Together), and plain text. Source: [adapters/convert/index.js:1-20]().

The LangChain integration exposes a `VektorMemory` class that subclasses `BaseMemory` when `@langchain/core` is installed and falls back to a duck-typed shim otherwise. Conversations saved via `saveContext()` are tagged `episodic` by default, with an `opts.memory_type` override for `semantic` storage. Source: [adapters/langchain.js:1-30]().

The `.vmig.jsonl` container is a strict spec: each line is one record with an `id` that is unique within the file, optional `text` and `vector` (at least one required), `dims` equal to `vector.length`, scalar-only metadata, and a `vex_version` Semver string. A sidecar `meta` file holds checksums; `vex sign` and `vex verify` produce or check BLAKE3 + Ed25519 signatures. Source: [SPEC.md:1-25](), [formats/vmig.js:1-20]().
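To make the record rules concrete, here is a minimal validation sketch. It is not the code behind `vex validate`; `validateRecords` and `isScalar` are hypothetical helpers, and the loose Semver pattern is an assumption.

```javascript
// Hypothetical validator for the record rules described above.
function isScalar(value) {
  return value === null || ['string', 'number', 'boolean'].includes(typeof value);
}

function validateRecords(records) {
  const errors = [];
  const seenIds = new Set();
  records.forEach((rec, index) => {
    const line = index + 1;
    if (rec.id == null) errors.push(`line ${line}: missing id`);
    else if (seenIds.has(rec.id)) errors.push(`line ${line}: duplicate id ${rec.id}`);
    else seenIds.add(rec.id);

    const hasText = rec.text != null;
    const hasVector = Array.isArray(rec.vector);
    if (!hasText && !hasVector) errors.push(`line ${line}: needs text or vector`);
    if (hasVector && rec.dims !== rec.vector.length) {
      errors.push(`line ${line}: dims does not equal vector.length`);
    }
    for (const [key, value] of Object.entries(rec.metadata || {})) {
      if (!isScalar(value)) errors.push(`line ${line}: metadata.${key} is not scalar`);
    }
    // Loose Semver check (major.minor.patch prefix); an assumption for this sketch.
    if (!/^\d+\.\d+\.\d+/.test(String(rec.vex_version))) {
      errors.push(`line ${line}: vex_version is not Semver`);
    }
  });
  return errors;
}
```

Returning a list of messages rather than throwing on the first problem mirrors how a linter reports every issue in a file in one pass.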

## 5. Security and Operational Notes

Vex is local-only by design. The security policy is explicit: VEKTOR Memory does not store user data, never asks for API keys, vault files, or credentials, and supports only the latest release with security fixes. Source: [SECURITY.md:1-30](). Optional Ed25519 signing depends on `@noble/ed25519` and `@noble/hashes`, declared as optional dependencies so that the core CLI remains usable without crypto support. Source: [package.json:15-25]().

---

## See Also

- Pipeline Modes and Provider Cascade
- VMIG Format Specification
- Connector Matrix and Optional Dependencies
- LangChain Integration Guide

---

<a id='page-2'></a>

## Vector Store Connectors and Migration Engine

### Related Pages

Related topics: [Project Overview and CLI Reference](#page-1), [Extraction Pipeline, LLM Integration, and Conversation Imports](#page-3), [Data Formats, Sovereign Backup, Cryptographic Operations, and Converters](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [connectors/index.js](https://github.com/Vektor-Memory/Vex/blob/main/connectors/index.js)
- [connectors/claude-export.js](https://github.com/Vektor-Memory/Vex/blob/main/connectors/claude-export.js)
- [connectors/pinecone.js](https://github.com/Vektor-Memory/Vex/blob/main/connectors/pinecone.js)
- [formats/vmig.js](https://github.com/Vektor-Memory/Vex/blob/main/formats/vmig.js)
- [core/migrate.js](https://github.com/Vektor-Memory/Vex/blob/main/core/migrate.js)
- [pipeline/index.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/index.js)
- [README.md](https://github.com/Vektor-Memory/Vex/blob/main/README.md)
- [package.json](https://github.com/Vektor-Memory/Vex/blob/main/package.json)
- [SPEC.md](https://github.com/Vektor-Memory/Vex/blob/main/SPEC.md)
</details>

# Vector Store Connectors and Migration Engine

## Overview

Vex is a cross-standard vector database migration tool that treats every supported store as a pluggable adapter behind a common `.vmig.jsonl` interchange format. The "Vector Store Connectors and Migration Engine" subsystem is the layer that makes that promise real: a centralized registry maps connector name strings to source and target modules, and a reusable migration engine streams records between them with dimension checking, optional re-embedding, and vec2vec projection fallback (Source: [package.json:1-120](), [README.md:1-80]()).

The system is composed of three collaborating parts:

- A **connector registry** in `connectors/index.js` that names every adapter Vex understands.
- A set of **per-store connectors** that implement a uniform read/write surface.
- A **migration engine** in `core/migrate.js` that orchestrates streaming import/export and embedding reconciliation between any source and any target.

Source connectors additionally support importing LLM conversation exports (Claude, ChatGPT) so memory can be ingested from chat platforms rather than existing vector stores (Source: [connectors/index.js:1-30](), [connectors/claude-export.js:1-20]()).

## Connector Registry and Architecture

All connectors are registered in a single `connectors` object exported from `connectors/index.js`. Each value is a module exposing at minimum `import`, `export`, and (for stores that need it) `inspect` methods. Aliases are first-class: `postgres` maps to the pgvector adapter, and `zilliz` maps to the Milvus adapter, so users can use the most familiar name for each cloud (Source: [connectors/index.js:1-30]()).
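A registry with first-class aliases can be as simple as two keys pointing at the same module object. The sketch below uses stub modules and a hypothetical `getConnector` lookup; it illustrates the idea and is not the contents of `connectors/index.js`.

```javascript
// Hypothetical registry shape; the stub objects stand in for real connector modules.
const pgvector = { name: 'pgvector' };
const milvus = { name: 'milvus' };

const connectors = {
  pgvector,
  postgres: pgvector, // alias for the pgvector adapter
  milvus,
  zilliz: milvus, // alias for the Milvus adapter
};

function getConnector(name) {
  if (!Object.hasOwn(connectors, name)) {
    const known = Object.keys(connectors).join(', ');
    throw new Error(`Unknown connector "${name}". Known connectors: ${known}`);
  }
  return connectors[name];
}

console.log(getConnector('postgres') === getConnector('pgvector')); // true
```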

```mermaid
flowchart LR
    A[vex CLI / pipeline/index.js] --> B[connectors/index.js registry]
    B --> C1[vektor / jsonl]
    B --> C2[pinecone / qdrant / chroma / weaviate]
    B --> C3[pgvector / redis / milvus / neo4j]
    B --> C4[claude-export / chatgpt-export]
    C1 & C2 & C3 & C4 <--> D[.vmig.jsonl interchange]
    D <--> E[core/migrate.js engine]
    E --> F[utils/embed.js re-embed]
    E --> G[utils/adapt.js vec2vec]
```

Every record that crosses this layer is normalized into the VMIG record shape — `id`, `text`, `vector`, `dims`, `namespace`, `metadata`, `created_at`, `source_store`, `modality`, `score`, and `vex_version` (Source: [formats/vmig.js:1-40](), [SPEC.md:1-30]()). This is the contract that lets the same pipeline work for every target store.
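As a rough illustration of this normalization, the sketch below fills the listed fields from a partial input. `toRecordSketch` is a hypothetical stand-in for the real helper, and every default value (`'default'`, `'text'`, `null`, the placeholder version) is an assumption made for the example.

```javascript
// Hypothetical normaliser; field names follow the list above, defaults are assumptions.
const VEX_VERSION = '0.0.0'; // placeholder, not the real release number

function toRecordSketch(partial, sourceStore) {
  const vector = Array.isArray(partial.vector) ? partial.vector : null;
  return {
    id: String(partial.id),
    text: partial.text ?? null,
    vector,
    dims: vector ? vector.length : null, // always derived, never trusted from input
    namespace: partial.namespace ?? 'default',
    metadata: partial.metadata ?? {},
    created_at: partial.created_at ?? null,
    source_store: sourceStore,
    modality: partial.modality ?? 'text',
    score: partial.score ?? null,
    vex_version: VEX_VERSION,
  };
}

// Example: mapping a store-specific shape ({ id, values, metadata }) into the common one.
const match = { id: 'vec-1', values: [0.1, 0.2, 0.3], metadata: { topic: 'db' } };
const record = toRecordSketch({ id: match.id, vector: match.values, metadata: match.metadata }, 'pinecone');
console.log(record.dims); // 3
```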

## Migration Engine

The reusable engine lives in `core/migrate.js` and is invoked by `vex migrate`, `vex export`, and `vex import`. It is streaming-aware: once the record count exceeds `STREAM_THRESHOLD = 100_000`, it switches to line-by-line reading via `readline` to keep memory bounded (Source: [core/migrate.js:1-25]()).

The engine performs three reconciliation passes before writing to the target:

1. **Dimension check** — every record whose `vector.length` does not equal the target `dims` is flagged (Source: [core/migrate.js:10-30]()).
2. **Adapter projection** — when `--adapter` is supplied together with `--adapter-model`, mismatched vectors are projected via `utils/adapt.js` into the target embedding space without re-running the source model (Source: [core/migrate.js:12-35]()).
3. **Re-embedding** — when an adapter is not available, `utils/embed.js` re-embeds the record's `text` against the configured provider (OpenAI, Ollama, custom OpenAI-compatible endpoint) (Source: [core/migrate.js:1-15]()).

Records that are still dimension-mismatched after adapter projection and re-embedding trigger a warning and are skipped rather than aborting the run, so a partially failed migration still produces a usable output file (Source: [core/migrate.js:25-40]()).
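The decision order described in this section can be sketched as a single function that returns what should happen to a record. The function name and the option names (`adapter`, `adapterModel`, `canEmbed`) are hypothetical and do not reflect the actual signature in `core/migrate.js`.

```javascript
// Hypothetical decision helper mirroring the reconciliation order described above.
function reconcile(record, targetDims, opts = {}) {
  const dims = Array.isArray(record.vector) ? record.vector.length : null;

  // 1. Dimension check: matching vectors are written unchanged.
  if (dims === targetDims) return 'write';

  // 2. Adapter projection: needs an existing vector plus both adapter options.
  if (dims !== null && opts.adapter && opts.adapterModel) return 'project';

  // 3. Re-embedding: needs source text and a configured embedding provider.
  if (opts.canEmbed && typeof record.text === 'string' && record.text !== '') return 'reembed';

  // Nothing can fix the record: warn and skip instead of aborting the run.
  return 'skip';
}

console.log(reconcile({ vector: [1, 2] }, 3)); // skip
```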

## Pipeline Orchestration and Source Connectors

The full ingestion path is driven by `pipeline/index.js`, which composes seven steps: chunk, extract, score, dedup, embed, graph, and store. The connector layer plugs into the last step; the first six steps produce VMIG records regardless of where they originated (Source: [pipeline/index.js:1-25]()).

Source connectors carry their own behavior for raw mode. The Claude connector, for instance, accepts two input shapes (the `claude.ai` `conversations.json` export and the API `messages` array) and supports three chunking modes — `turn`, `conversation`, and `exchange` — controlled by `--chunk-mode` and `--sender` (Source: [connectors/claude-export.js:1-20]()). If no embedding options are provided, records are still valid VMIG and remain BM25/FTS5-searchable on VEKTOR, but will not be insertable into pure ANN stores such as Pinecone or Qdrant (Source: [connectors/claude-export.js:15-25]()).

Per-store streaming is illustrated by the Pinecone connector, which paginates server-side vector fetches, maps each Pinecone record into the VMIG shape (extracting `id`, `values`, `metadata` into the standard fields), and reports a progress line per page (Source: [connectors/pinecone.js:1-30]()).

## Usage Patterns and Failure Modes

The typical direct migration invocation is:

```bash
vex migrate --from claude-export --to vektor \
  --file conversations.json --db memory.db
```

For a full LLM-extracted import, add a mode flag and an API key:

```bash
vex migrate --from claude-export --to vektor \
  --file conversations.json --db memory.db \
  --mode extract --groq-key $KEY
```

Common failure modes to watch for:

- **Dimension mismatch with no adapter** — engine warns and skips; supply `--adapter` together with `--adapter-model`, or `--embed-model`, to recover (Source: [core/migrate.js:10-30]()).
- **Missing optional peer dependency** — connectors such as Redis, Milvus, Neo4j, and pgvector require the corresponding SDK; without it the connector will not load (Source: [package.json:60-100]()).
- **No text, no vector** — VMIG validation requires at least one of the two to be non-null, and `dims` must equal `vector.length` when both are present (Source: [SPEC.md:1-20]()).
- **Rate limits during extraction** — the extraction provider applies a cascade with key rotation and reads exact `retry-after` values; exhausted chunks are skipped and logged so the job always finishes (Source: [pipeline/02-extract.js:1-25]()).

## See Also

- VMIG Interchange Format Specification — see [SPEC.md]()
- Extraction Pipeline and Provider Cascade — see `pipeline/02-extract.js`
- LangChain and Convert Adapters — see `adapters/langchain.js` and `adapters/convert/index.js`
- Sovereign Backup (`vex sync`): encrypted push to GitHub, Codeberg, Gitea, or GitLab

---

<a id='page-3'></a>

## Extraction Pipeline, LLM Integration, and Conversation Imports

### Related Pages

Related topics: [Project Overview and CLI Reference](#page-1), [Vector Store Connectors and Migration Engine](#page-2), [Data Formats, Sovereign Backup, Cryptographic Operations, and Converters](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [pipeline/index.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/index.js)
- [pipeline/01-chunk.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/01-chunk.js)
- [pipeline/02-extract.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/02-extract.js)
- [pipeline/03-score.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/03-score.js)
- [pipeline/04-dedup.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/04-dedup.js)
- [pipeline/05-embed.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/05-embed.js)
- [pipeline/06-graph.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/06-graph.js)
- [pipeline/07-store.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/07-store.js)
- [formats/vmig.js](https://github.com/Vektor-Memory/Vex/blob/main/formats/vmig.js)
- [connectors/claude-export.js](https://github.com/Vektor-Memory/Vex/blob/main/connectors/claude-export.js)
- [connectors/pinecone.js](https://github.com/Vektor-Memory/Vex/blob/main/connectors/pinecone.js)
- [adapters/langchain.js](https://github.com/Vektor-Memory/Vex/blob/main/adapters/langchain.js)
- [README.md](https://github.com/Vektor-Memory/Vex/blob/main/README.md)
- [CHANGELOG.md](https://github.com/Vektor-Memory/Vex/blob/main/CHANGELOG.md)
- [SPEC.md](https://github.com/Vektor-Memory/Vex/blob/main/SPEC.md)
- [package.json](https://github.com/Vektor-Memory/Vex/blob/main/package.json)
</details>

# Extraction Pipeline, LLM Integration, and Conversation Imports

The **Extraction Pipeline** is the core ingestion engine of Vex. It transforms raw conversation exports (Claude, ChatGPT, or generic JSONL) into a structured, deduplicated, embedded, and graph-linked memory store, then writes the result to any of nine supported vector backends. The pipeline is invoked by `vex migrate` and `vex import`, and it is the recommended path for producing high-quality, semantically searchable memory from chat history. Source: [pipeline/index.js:1-31]()

## Pipeline Architecture

Vex executes a 7-step orchestration for every import job. Steps 2–4 run only in `extract` and `smart` modes; `raw` mode performs only chunking, embedding, graph construction, and storage. Source: [pipeline/index.js:15-22]()

```mermaid
flowchart LR
    A[Raw Conversations] --> B[1. CHUNK]
    B --> C[2. EXTRACT]
    C --> D[3. SCORE]
    D --> E[4. DEDUP]
    E --> F[5. EMBED]
    F --> G[6. GRAPH]
    G --> H[7. STORE]
    H --> I[(Vector DB)]
```

| Step | Module | Responsibility |
|------|--------|----------------|
| 1 | [pipeline/01-chunk.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/01-chunk.js) | Split conversations into processable units (turn / exchange / conversation / smart) |
| 2 | [pipeline/02-extract.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/02-extract.js) | LLM fact extraction with provider cascade |
| 3 | [pipeline/03-score.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/03-score.js) | Importance + recency scoring |
| 4 | [pipeline/04-dedup.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/04-dedup.js) | Remove near-duplicate facts (default threshold 0.72) |
| 5 | [pipeline/05-embed.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/05-embed.js) | Generate vectors via OpenAI, Ollama, or custom endpoint |
| 6 | [pipeline/06-graph.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/06-graph.js) | Build temporal, tag-similarity, and causal edges |
| 7 | [pipeline/07-store.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/07-store.js) | Write final records to the target connector |
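Step 4 in the table above can be illustrated with a greedy near-duplicate filter. Because deduplication runs before embedding, this sketch assumes cosine similarity over simple term-count vectors; that choice, along with the names `termCounts`, `cosine`, and `dedup`, is an assumption and may not match `pipeline/04-dedup.js`.

```javascript
// Hypothetical near-duplicate filter using cosine similarity over term counts.
function termCounts(text) {
  const counts = new Map();
  for (const token of text.toLowerCase().match(/[a-z0-9]+/g) || []) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  return counts;
}

function cosine(a, b) {
  let dot = 0;
  for (const [token, count] of a) dot += count * (b.get(token) || 0);
  const norm = (m) => Math.sqrt([...m.values()].reduce((sum, c) => sum + c * c, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Keep a fact only if it is not too similar to any fact already kept.
function dedup(facts, threshold = 0.72) {
  const kept = [];
  for (const fact of facts) {
    const counts = termCounts(fact);
    const isDuplicate = kept.some((k) => cosine(k.counts, counts) >= threshold);
    if (!isDuplicate) kept.push({ fact, counts });
  }
  return kept.map((k) => k.fact);
}
```

A greedy pass is quadratic in the number of kept facts, which is acceptable for a sketch but would need indexing or blocking for very large imports.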

### Pipeline Modes

Three operating modes are exposed through `--mode`:

- **`raw`** — Steps 1, 5, 6, 7 only. Stores whole messages as opaque blobs. Fast, no LLM cost. Source: [README.md:148-152]()
- **`extract`** — All 7 steps. Requires an LLM provider. Produces atomic, self-contained facts. Source: [pipeline/index.js:25-26]()
- **`smart`** — All 7 steps with adaptive chunking: `exchange` for short conversations (<20 turns), `conversation` for long ones. Source: [pipeline/01-chunk.js:5-9]()
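The adaptive choice in `smart` mode reduces to a threshold on turn count. A minimal sketch, assuming the 20-turn cutoff quoted above; `pickChunkMode` and the constant name are hypothetical.

```javascript
// Hypothetical helper: choose a chunking strategy the way `smart` mode is described above.
const SHORT_CONVERSATION_TURNS = 20;

function pickChunkMode(turnCount) {
  // Fewer than 20 turns counts as a short conversation.
  return turnCount < SHORT_CONVERSATION_TURNS ? 'exchange' : 'conversation';
}

console.log(pickChunkMode(6), pickChunkMode(45)); // exchange conversation
```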

## LLM Provider Cascade

The extraction step in [pipeline/02-extract.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/02-extract.js) implements a **waterfall failover** across providers so that a job never stops because of one provider outage. Source: [pipeline/02-extract.js:9-12]()
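A waterfall failover of this kind can be sketched as a loop that tries each provider in order and moves on when one fails. The provider objects and `extractWithCascade` below are hypothetical stubs for illustration, not the real provider clients.

```javascript
// Hypothetical cascade: try each provider in order, fall through on failure.
async function extractWithCascade(providers, chunk) {
  const failures = [];
  for (const provider of providers) {
    try {
      const facts = await provider.extract(chunk);
      return { provider: provider.name, facts };
    } catch (err) {
      // Record the failure and let the next provider in the list try.
      failures.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed (${failures.join('; ')})`);
}

// Usage with stub providers: the first one is "down", the second one answers.
const stubProviders = [
  { name: 'groq', extract: async () => { throw new Error('503 unavailable'); } },
  { name: 'ollama', extract: async (chunk) => [`fact from: ${chunk}`] },
];
extractWithCascade(stubProviders, 'hello').then((result) => console.log(result.provider)); // ollama
```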

### Supported Providers and Defaults

Each provider is configured in `PROVIDER_DEFAULTS` with endpoint URL, default model, and a per-minute token budget. Source: [pipeline/02-extract.js:67-74]()

- **Groq** — `llama-3.1-8b-instant` (TPM 6,000)
- **OpenAI** — `gpt-4o-mini` (TPM 60,000)
- **Anthropic** — `claude-haiku-4-5-20251001` (TPM 20,000, native Messages API)
- **Mistral** — `mistral-small-latest` (TPM 10,000)
- **Together** — `meta-llama/Llama-3.2-3B-Instruct-Turbo` (TPM 30,000)
- **Ollama** — local endpoint, configurable model

### Key Rotation and Speculative Decoding

Multiple Groq or OpenAI keys can be supplied comma-separated to rotate round-robin, multiplying the effective tokens-per-minute budget. Source: [pipeline/02-extract.js:13-14](), [README.md:166-170]()
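Round-robin rotation over a comma-separated key list can be sketched with a closure that remembers the next index. `createKeyRotator` is a hypothetical helper written for this example.

```javascript
// Hypothetical round-robin rotator for a comma-separated key list.
function createKeyRotator(keyList) {
  const keys = keyList.split(',').map((k) => k.trim()).filter(Boolean);
  if (keys.length === 0) throw new Error('No API keys supplied');
  let next = 0;
  return () => {
    const key = keys[next];
    next = (next + 1) % keys.length; // wrap around after the last key
    return key;
  };
}

const nextKey = createKeyRotator('k1,k2,k3');
console.log(nextKey(), nextKey(), nextKey(), nextKey()); // k1 k2 k3 k1
```

With three keys, each key sees roughly one third of the requests, which is why rotation raises the effective per-minute budget.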

Ollama additionally supports **speculative decoding** through a draft model (`--ollama-draft llama3.2`), which can yield 2–4× speedups on long contexts. Source: [pipeline/02-extract.js:15-16]()

### Resilience Mechanisms

Three helpers implement the resilience layer (Source: [pipeline/02-extract.js:43-54]()):

- `isRateLimitError(e)` — matches `429`, `rate limit`, `TPM`, etc.
- `isTransientError(e)` — also matches `503`, `502`, `timeout`, `econnrefused`.
- `parseRetryAfter(errorMessage)` — extracts exact wait time from Groq/OpenAI error messages such as "Please try again in 3.435s".

When a chunk exhausts all retries it is **skipped and logged**, never blocking the overall job. Source: [pipeline/02-extract.js:18-19]()
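The three helpers named above could look roughly like the following. These are re-implementations for illustration under assumed patterns; the real regular expressions in `pipeline/02-extract.js` may differ, and returning milliseconds from `parseRetryAfter` is an assumption.

```javascript
// Hypothetical re-implementations of the helpers named above.
function isRateLimitError(err) {
  return /429|rate limit|TPM/i.test(String(err && err.message));
}

function isTransientError(err) {
  return isRateLimitError(err) || /503|502|timeout|econnrefused/i.test(String(err && err.message));
}

// Returns the suggested wait in milliseconds, or null when the message has no hint.
function parseRetryAfter(errorMessage) {
  const match = /try again in (\d+(?:\.\d+)?)(ms|s)/i.exec(errorMessage);
  if (!match) return null;
  const value = parseFloat(match[1]);
  return Math.round(match[2].toLowerCase() === 'ms' ? value : value * 1000);
}

console.log(parseRetryAfter('Rate limit reached. Please try again in 3.435s.')); // 3435
```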

### Extraction Prompt and Fact Schema

Each extracted fact is a JSON object with the following fields (Source: [pipeline/02-extract.js:1-22]()):

- `fact` — atomic, self-contained statement
- `type` — `decision | preference | fact | entity | action | relationship`
- `importance` — 0.0–1.0 score
- `entities[]` — extracted entity names
- `tags[]` — 1–4 short lowercase keywords
- `potential[]` — exactly 3 natural-language questions this fact would answer

Pre-generating `potential` questions at extraction time is intended to improve BM25 recall quality, according to the changelog. Source: [CHANGELOG.md:1-9]()
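Since LLM output does not always follow a schema, a check like the one below can catch malformed facts before scoring. It is a sketch built from the field list in this section; `factProblems` is a hypothetical name and the pipeline's own validation, if any, may differ.

```javascript
// Hypothetical check of one extracted fact against the field list in this section.
const FACT_TYPES = ['decision', 'preference', 'fact', 'entity', 'action', 'relationship'];

function factProblems(f) {
  const problems = [];
  if (typeof f.fact !== 'string' || f.fact.trim() === '') {
    problems.push('fact must be a non-empty string');
  }
  if (!FACT_TYPES.includes(f.type)) problems.push('type is not a known value');
  if (!(typeof f.importance === 'number' && f.importance >= 0 && f.importance <= 1)) {
    problems.push('importance must be a number between 0.0 and 1.0');
  }
  if (!Array.isArray(f.entities)) problems.push('entities must be an array');
  const tagsOk = Array.isArray(f.tags) && f.tags.length >= 1 && f.tags.length <= 4 &&
    f.tags.every((t) => typeof t === 'string' && t === t.toLowerCase());
  if (!tagsOk) problems.push('tags must be 1 to 4 lowercase strings');
  if (!Array.isArray(f.potential) || f.potential.length !== 3) {
    problems.push('potential must hold exactly 3 questions');
  }
  return problems;
}
```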

## Conversation Imports

### The `.vmig` Format

Vex's portable interchange format is `.vmig.jsonl`. Each line is a record with at least `id` and one of `text`/`vector`, plus a `vex_version` field. Source: [SPEC.md](), [formats/vmig.js]()

Records are built by the `toRecord({...}, 'claude-export')` helper, which normalizes them to the shared shape, and are written out through `writeJsonl()` and `writeMeta()`. Source: [formats/vmig.js:1-32]()

### Claude Export Connector

The Claude source connector accepts two input shapes (Source: [connectors/claude-export.js:34-42]()):

1. **claude.ai export** — `conversations.json` with `{ uuid, name, chat_messages: [{ uuid, sender, text, created_at }] }`
2. **API format** — array of `{ id, messages: [{ role, content }] }`

It supports three chunking modes via `--chunk-mode` (Source: [connectors/claude-export.js:18-22]()):

- `turn` (default) — one record per message
- `conversation` — one record per full conversation
- `exchange` — one record per adjacent human+assistant pair

A `--sender` filter (`both` | `human` | `assistant`) further refines output. The `extractText()` helper normalizes both string and block-array content shapes from the Anthropic Messages API. Source: [connectors/claude-export.js:18-30]()
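The chunk modes and sender filter can be sketched against the claude.ai export shape shown above. This is an illustration under assumptions: `chunkConversation` is a hypothetical function, the way text is joined in `conversation` and `exchange` modes is invented for the example, and `extractText` here is a simplified stand-in for the real helper.

```javascript
// Simplified stand-in: accept a plain string or an array of content blocks.
function extractText(content) {
  if (typeof content === 'string') return content;
  if (Array.isArray(content)) {
    return content.filter((b) => b && b.type === 'text').map((b) => b.text).join('\n');
  }
  return '';
}

// Hypothetical chunker for one conversation in the claude.ai export shape.
function chunkConversation(conv, { chunkMode = 'turn', sender = 'both' } = {}) {
  const messages = conv.chat_messages
    .filter((m) => sender === 'both' || m.sender === sender)
    .map((m) => ({ sender: m.sender, text: extractText(m.text) }));

  if (chunkMode === 'turn') return messages.map((m) => m.text);
  if (chunkMode === 'conversation') {
    return [messages.map((m) => `${m.sender}: ${m.text}`).join('\n')];
  }
  if (chunkMode === 'exchange') {
    const pairs = [];
    for (let i = 0; i + 1 < messages.length; i++) {
      if (messages[i].sender === 'human' && messages[i + 1].sender === 'assistant') {
        pairs.push(`${messages[i].text}\n${messages[i + 1].text}`);
        i++; // skip the assistant message that was just paired
      }
    }
    return pairs;
  }
  throw new Error(`Unknown chunk mode: ${chunkMode}`);
}
```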

### Other Connectors

Vex ships with read/write connectors for VEKTOR Slipstream, Pinecone, Qdrant, ChromaDB, Weaviate, pgvector, Redis (with RediSearch VSS), Milvus/Zilliz, and Neo4j. The Pinecone connector streams pages and emits a progress bar with the namespace and total record count. Source: [connectors/pinecone.js](), [README.md:79-92]()

## Usage Patterns

### Basic Migration

```bash
vex migrate --from claude-export --to vektor \
  --file conversations.json --db mem.db \
  --mode extract --groq-key $GROQ_API_KEY
```

Source: [pipeline/index.js:28-31]()

### Provider Cascade

```bash
# Auto-detect from local config
vex migrate ... --provider auto

# Explicit cascade with override model per provider
vex migrate ... --provider groq,ollama \
  --extract-model groq:llama-3.3-70b-versatile,ollama:mistral
```

Source: [pipeline/02-extract.js:21-29](), [README.md:154-165]()

### LangChain Integration

The `VektorMemory` class in [adapters/langchain.js](https://github.com/Vektor-Memory/Vex/blob/main/adapters/langchain.js) subclasses `BaseMemory` when `@langchain/core` is installed, otherwise falls back to a duck-typed implementation. Conversations saved through `saveContext()` are auto-tagged `episodic`; pass `opts.memory_type` to override. Source: [adapters/langchain.js:5-18]()

## Common Failure Modes

- **Rate limits on free Groq tier** — Use `--groq-key k1,k2,k3` for rotation, drop `--concurrency` to 1, or set `--rate-limit <ms>` for a fixed delay. Source: [pipeline/02-extract.js:24-27]()
- **Anthropic Messages API** — Uses the native `/v1/messages` endpoint, not the OpenAI-compatible shim, so model names must be Anthropic-format. Source: [pipeline/02-extract.js:70-71]()
- **Records without embeddings** — Vector is `null`. Valid in `.vmig` for BM25/FTS5 stores (VEKTOR), but rejected by ANN targets (Pinecone, Qdrant). Source: [connectors/claude-export.js:23-30]()

## See Also

- [Connector Reference](connectors.md)
- [LLM Provider Cascade Details](llm-cascade.md)
- [vmig Format Specification](vmig-spec.md)
- [LangChain Adapter Guide](langchain-adapter.md)

---

<a id='page-4'></a>

## Data Formats, Sovereign Backup, Cryptographic Operations, and Converters

### Related Pages

Related topics: [Project Overview and CLI Reference](#page-1), [Vector Store Connectors and Migration Engine](#page-2), [Extraction Pipeline, LLM Integration, and Conversation Imports](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [formats/vmig.js](https://github.com/Vektor-Memory/Vex/blob/main/formats/vmig.js)
- [pipeline/02-extract.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/02-extract.js)
- [pipeline/index.js](https://github.com/Vektor-Memory/Vex/blob/main/pipeline/index.js)
- [adapters/langchain.js](https://github.com/Vektor-Memory/Vex/blob/main/adapters/langchain.js)
- [connectors/pinecone.js](https://github.com/Vektor-Memory/Vex/blob/main/connectors/pinecone.js)
- [README.md](https://github.com/Vektor-Memory/Vex/blob/main/README.md)
- [CHANGELOG.md](https://github.com/Vektor-Memory/Vex/blob/main/CHANGELOG.md)
- [SECURITY.md](https://github.com/Vektor-Memory/Vex/blob/main/SECURITY.md)
- [SPEC.md](https://github.com/Vektor-Memory/Vex/blob/main/SPEC.md)
- [package.json](https://github.com/Vektor-Memory/Vex/blob/main/package.json)
</details>

# Data Formats, Sovereign Backup, Cryptographic Operations, and Converters

Vex is an open-source CLI and library for portable agent memory. Beyond import/export and store migration, it ships four tightly-related subsystems: the **`.vmig.jsonl` portable data format**, the **`vex sign` / `vex verify` cryptographic operations**, the **sovereign `vex sync` hybrid backup**, and the **format converters** that re-shape exported memories into LLM training or chat payloads. Together they let users inspect, sign, encrypt, redistribute, and re-inject their memory with no server in the middle.

## The `.vmig.jsonl` Portable Format

The `.vmig.jsonl` (Vex Migrate Interchange Graph) format is the canonical interchange file used by every Vex command. Each line is one JSON record with stable, well-defined fields. The `writeJsonl` helper serializes each record with `JSON.stringify`, writes one record per line with a trailing newline, and then calls `writeMeta` to write a sidecar metadata file capturing `exported_at`, `source_store`, `record_count`, and a checksum.

Required and optional fields are defined in the project spec. The minimum rule is that at least one of `text` or `vector` must be non-null, `dims` must equal `vector.length` when both are present, and `id` must be unique within a file (`SPEC.md`). Optional fields include `metadata` (scalar-only), `created_at`, `source_store`, `modality`, `score`, and a `vex_version` field stamped automatically by `writeJsonl` (`formats/vmig.js`).
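A minimal sketch of the serialization side, assuming nothing about the real `writeJsonl` and `writeMeta` beyond what is described in this section. `toJsonl` and `buildMeta` are hypothetical names, and SHA-256 is used as a stand-in because the checksum algorithm of the sidecar file is not stated here.

```javascript
const nodeCrypto = require('node:crypto');

// Hypothetical serialiser: one JSON object per line, each line newline-terminated.
function toJsonl(records) {
  return records.map((r) => JSON.stringify(r) + '\n').join('');
}

// Hypothetical sidecar metadata; SHA-256 is a stand-in checksum for this sketch.
function buildMeta(records, sourceStore, body) {
  return {
    exported_at: new Date().toISOString(),
    source_store: sourceStore,
    record_count: records.length,
    checksum: nodeCrypto.createHash('sha256').update(body).digest('hex'),
  };
}

const demoRecords = [{ id: 'mem_1', text: 'hello' }, { id: 'mem_2', text: 'world' }];
const demoBody = toJsonl(demoRecords);
console.log(buildMeta(demoRecords, 'vektor', demoBody).record_count); // 2
```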

A typical record looks like:

```json
{
  "id": "mem_1780525199",
  "text": "Mini chose SQLite for the slipstream-memory.",
  "vector": [0.012, -0.044, ...],
  "dims": 1536,
  "model": "text-embedding-3-small",
  "namespace": "default",
  "metadata": {
    "importance": 0.85,
    "tags": ["decision", "architecture"],
    "potential": [
      "What database did Mini choose?",
      "Why was SQLite selected over Postgres?",
      "What storage approach does VEKTOR use?"
    ]
  }
}
```

The `metadata.tags` and `metadata.potential` fields are part of the new extraction output: every LLM-extracted fact now carries lowercase tags and exactly three pre-generated natural-language questions to improve BM25 recall quality (`CHANGELOG.md`). Users can inspect or validate these files without a vector store: `vex inspect` prints stats, namespaces, and dimensions; `vex validate` lints every record.

## Cryptographic Operations: BLAKE3 + Ed25519

Vex signs and verifies `.vmig.jsonl` exports locally, with no network round-trip. Signatures use **BLAKE3** for the content hash and **Ed25519** for the asymmetric signature. The signing primitives live in the optional peer-dependency block of `package.json` (`@noble/ed25519 ^2.3.0`, `@noble/hashes ^1.8.0`), so users who do not need signing do not pay the dependency cost.

The CLI exposes this through two commands documented in `README.md`:

```bash
vex sign   memories.vmig.jsonl   # BLAKE3 + Ed25519 signature
vex verify memories.vmig.jsonl   # exit 0=valid, 1=tampered
```

The signing flow lets users produce a verifiable artifact before sharing a memory export with collaborators, embedding it in a paper, or storing it on a third-party host. Because verification is fully local, a tampered record is detected immediately on the user's own machine, which is the foundation that the sovereign backup feature builds on.
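The hash-then-sign idea can be sketched with Node's built-in `crypto` module. This is not how Vex implements it: the sketch substitutes SHA-256 for BLAKE3 (Node's built-in crypto has no BLAKE3) and uses Node's native Ed25519 support instead of the `@noble` packages. `signExport` and `verifyExport` are hypothetical names.

```javascript
const nodeCrypto = require('node:crypto');

// Stand-in sketch: SHA-256 replaces BLAKE3 here; the signature scheme is Ed25519.
function signExport(body, privateKey) {
  const digest = nodeCrypto.createHash('sha256').update(body).digest();
  // Ed25519 takes no separate digest algorithm, so the first argument is null.
  return nodeCrypto.sign(null, digest, privateKey);
}

function verifyExport(body, signature, publicKey) {
  const digest = nodeCrypto.createHash('sha256').update(body).digest();
  return nodeCrypto.verify(null, digest, publicKey, signature);
}

const { publicKey, privateKey } = nodeCrypto.generateKeyPairSync('ed25519');
const exportBody = '{"id":"mem_1","text":"hello"}\n';
const signature = signExport(exportBody, privateKey);
console.log(verifyExport(exportBody, signature, publicKey)); // true
console.log(verifyExport(exportBody + 'tampered', signature, publicKey)); // false
```

Signing a fixed-size digest rather than the whole file keeps the signed payload small no matter how large the export is.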

## Sovereign Hybrid Backup with `vex sync`

Building on the signed `.vmig.jsonl` format, Vex provides a **client-side encrypted** backup system that uses a remote Git host only as a dumb blob store. The `vex sync` subcommand set (`init`, `push`, `pull`, `status`, `diff`) shipped in v0.8.5 and is the headline feature for users who want to back up VEKTOR memory to a host they do not fully trust (`CHANGELOG.md`).

The threat model and architecture are explicit in the security policy: "Your memories, credentials, and vault files live on your machine only. We have no access to them and never will" (`SECURITY.md`). The technical implementation is described in `CHANGELOG.md`:

- **Encryption**: AES-256-GCM with a key derived via HKDF-SHA256 from the local machine-id plus a SHA-256 hash of the user's token. Neither the raw token nor the derived key is ever transmitted.
- **Storage layout**: the encrypted blob is pushed to `memory/vektor-backup.enc` in a private Git repository; a plaintext `memory/manifest.json` containing only count and timestamp is also pushed so that the host can see a file exists but cannot see its contents.
- **Config**: stored at `~/.vex/sync.json`; the derived key at `~/.vex/sync.key` with `chmod 600` permissions.
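The key derivation and encryption steps listed above can be sketched with Node's built-in `crypto` module. How Vex actually combines the machine-id and the token hash is not stated here, so the arrangement below (machine-id as input keying material, token hash as salt) is an assumption, as are the `info` label, the helper names, and the manifest field names.

```javascript
const crypto = require('node:crypto');

// ASSUMPTION: machine-id is the HKDF input keying material and
// SHA-256(token) is the salt; the real combination may differ.
function deriveKey(machineId, token) {
  const tokenHash = crypto.createHash('sha256').update(token).digest();
  const info = Buffer.from('vex-sync-sketch'); // hypothetical context label
  return Buffer.from(crypto.hkdfSync('sha256', Buffer.from(machineId), tokenHash, info, 32));
}

// AES-256-GCM with a fresh 96-bit IV per encryption.
function encryptBlob(key, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Decryption throws if the key is wrong or the blob was modified.
function decryptBlob(key, { iv, ciphertext, tag }) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

const syncKey = deriveKey('machine-a', 'example-token');
const sealed = encryptBlob(syncKey, Buffer.from('{"id":"m1"}\n'));
const restored = decryptBlob(syncKey, sealed);

// Plaintext manifest carrying only a count and a timestamp (field names assumed).
const manifest = { count: 1, timestamp: new Date().toISOString() };
console.log(restored.toString(), manifest);
```

GCM's authentication tag is what lets a client detect a modified blob when it pulls, so the remote host never needs to be trusted with integrity checks.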

Four Git providers are supported and documented in `README.md`: GitHub, Codeberg (recommended — free, nonprofit, GDPR-compliant), self-hosted Gitea, and GitLab. Restoring on a new machine only requires re-running `vex sync init` with the original token and then `vex sync pull`.

## Format Converters and Pipeline Integration

The `vex convert` command (alongside the `vex migrate` pipeline orchestrator in `pipeline/index.js`) reshapes exported memories into other useful formats. The pipeline's 7-step flow (`CHUNK → EXTRACT → SCORE → DEDUP → EMBED → GRAPH → STORE`) runs in `raw`, `extract`, or `smart` mode; `smart` mode is recommended for mixed-size conversations (`pipeline/index.js`).
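As a rough mental model of the seven-step flow, the stages can be treated as functions applied in order to a shared state object. The sketch below uses toy stand-ins for every stage (line splitting for chunking, string length for scoring, a one-element array for embeddings); none of it reflects the real logic in `pipeline/index.js`, and `runPipeline` is a hypothetical name.

```javascript
// Toy stand-ins for the seven stages; the real stages are not reproduced here.
const stages = [
  ['CHUNK', (s) => ({ ...s, chunks: s.input.split('\n').filter(Boolean) })],
  ['EXTRACT', (s) => ({ ...s, facts: s.chunks.map((text) => ({ text: text.trim() })) })],
  ['SCORE', (s) => ({
    ...s,
    facts: s.facts.map((f) => ({ ...f, score: Math.min(1, f.text.length / 40) })),
  })],
  ['DEDUP', (s) => {
    const seen = new Set();
    const facts = s.facts.filter((f) => {
      const key = f.text.toLowerCase(); // case-insensitive exact match only
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    });
    return { ...s, facts };
  }],
  ['EMBED', (s) => ({ ...s, facts: s.facts.map((f) => ({ ...f, vector: [f.text.length] })) })],
  ['GRAPH', (s) => ({ ...s, edges: s.facts.slice(1).map((_, i) => [i, i + 1]) })],
  ['STORE', (s) => ({ ...s, stored: s.facts.length })],
];

// Hypothetical orchestrator: run each stage in order and record the order taken.
function runPipeline(input) {
  let state = { input };
  const trace = [];
  for (const [name, stage] of stages) {
    state = stage(state);
    trace.push(name);
  }
  return { ...state, trace };
}

const result = runPipeline('User likes tea\nuser likes tea\nUser lives in Oslo');
console.log(result.trace.join(' -> '), result.stored);
```

Keeping each stage a pure function of the state makes it straightforward to skip or swap stages, which is one plausible way the different modes could be realized.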

`vex convert` targets five output formats for downstream consumption, with sample record fields that round-trip cleanly through the `.vmig.jsonl` shape produced by `writeJsonl` (`formats/vmig.js`). Extracted facts can also be re-injected into agents through the LangChain adapter (`adapters/langchain.js`). Its `VektorMemory` class implements LangChain's `BaseMemory` interface, supports `topK` recall, defaults to the `input`/`output` keys, and tags each record with a `memory_type` (`episodic` for `saveContext()` results, overridable to `semantic`).
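To make the adapter's behavior concrete, the sketch below mimics the method names of LangChain's `BaseMemory` (`memoryKeys`, `saveContext`, `loadMemoryVariables`) in a dependency-free class. It is a stand-in, not the real `VektorMemory`: the class name `SketchMemory`, the `history` key, the constructor option names, and the keyword-overlap recall are all assumptions made for illustration.

```javascript
// Stand-in only: imports neither LangChain nor the real adapter.
// ASSUMPTION: keyword overlap replaces vector similarity for recall.
class SketchMemory {
  constructor({ topK = 3, inputKey = 'input', outputKey = 'output', memoryType = 'episodic' } = {}) {
    Object.assign(this, { topK, inputKey, outputKey, memoryType });
    this.records = [];
  }

  get memoryKeys() {
    return ['history']; // assumed key name
  }

  // Store one conversational turn, tagged with the configured memory type.
  async saveContext(inputValues, outputValues) {
    this.records.push({
      text: `${inputValues[this.inputKey]} -> ${outputValues[this.outputKey]}`,
      memory_type: this.memoryType,
    });
  }

  // Return the topK stored turns that share the most terms with the query.
  async loadMemoryVariables(values) {
    const terms = String(values[this.inputKey] ?? '').toLowerCase().split(/\s+/).filter(Boolean);
    const ranked = this.records
      .map((r) => ({ r, hits: terms.filter((t) => r.text.toLowerCase().includes(t)).length }))
      .filter((x) => x.hits > 0)
      .sort((a, b) => b.hits - a.hits)
      .slice(0, this.topK);
    return { history: ranked.map((x) => x.r.text).join('\n') };
  }
}

async function demo() {
  const memory = new SketchMemory({ topK: 1 });
  await memory.saveContext({ input: 'I prefer tea' }, { output: 'Noted' });
  await memory.saveContext({ input: 'I live in Oslo' }, { output: 'Got it' });
  return memory.loadMemoryVariables({ input: 'tea' });
}

demo().then((vars) => console.log(vars.history));
```

Passing `memoryType: 'semantic'` to the constructor corresponds to the override mentioned above; the real adapter's option name may differ.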

```mermaid
flowchart LR
  A[Conversation export] --> B[CHUNK]
  B --> C[EXTRACT<br/>LLM cascade]
  C --> D[SCORE / DEDUP]
  D --> E[EMBED]
  E --> F[GRAPH edges]
  F --> G[.vmig.jsonl]
  G --> H[vex sign<br/>BLAKE3 + Ed25519]
  G --> I[vex sync push<br/>AES-256-GCM]
  G --> J[vex convert<br/>OpenAI / Anthropic / chat / text]
  J --> K[Agent runtime<br/>LangChain / CrewAI / n8n]
```

The diagram traces how a raw conversation becomes an encrypted, signed, distributable, and re-injectable memory artifact, entirely under the user's control and with no Vex server in the loop. The Pinecone connector (`connectors/pinecone.js`) and the other ten supported stores are the write targets for the `STORE` step; they all read and emit the same `.vmig.jsonl` shape, which makes the format the single source of truth for everything described above.
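The shared-shape idea rests on JSON Lines: one JSON object per line, so any tool that can read and write lines can exchange records. A minimal round-trip sketch follows; `toJsonl` and `fromJsonl` are hypothetical helpers (not the `writeJsonl` from `formats/vmig.js`), and the `id` and `text` fields are illustrative rather than the documented `.vmig.jsonl` schema.

```javascript
// Serialize records as JSON Lines: one object per line, newline-terminated.
function toJsonl(records) {
  return records.map((r) => JSON.stringify(r)).join('\n') + '\n';
}

// Parse JSON Lines back into objects, ignoring blank lines.
function fromJsonl(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Illustrative records; field names other than memory_type are assumptions.
const records = [
  { id: 'm1', text: 'User prefers tea', memory_type: 'semantic' },
  { id: 'm2', text: 'User lives in Oslo', memory_type: 'episodic' },
];

const serialized = toJsonl(records);
const roundTripped = fromJsonl(serialized);
console.log(serialized);
```

Because each line is independent, a reader can stream a large export record by record instead of parsing one big JSON document.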

## See Also

- Pipeline Architecture and Provider Cascade — covers `CHUNK`, `EXTRACT`, `SCORE`, `DEDUP`, `EMBED`, `GRAPH`, `STORE` in detail and the LLM provider waterfall (groq → ollama → openai → anthropic).
- Vector Store Connectors — per-connector usage, index types, and configuration for VEKTOR, Pinecone, Qdrant, Chroma, Weaviate, pgvector, Redis, Milvus, and Neo4j.
- Interactive TUI (v0.8.6) — the arrow-key command palette and per-command wizard flows.
- Security Policy — disclosure process and the "we will never ask for your keys" guarantee.

---


## Pitfall Log

Project: Vektor-Memory/Vex

Summary: Found 7 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Configuration risk - Configuration risk requires verification.

## 1. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.host_targets | https://github.com/Vektor-Memory/Vex

## 2. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/Vektor-Memory/Vex

## 3. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Vektor-Memory/Vex

## 4. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/Vektor-Memory/Vex

## 5. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/Vektor-Memory/Vex

## 6. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Vektor-Memory/Vex

## 7. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Vektor-Memory/Vex

<!-- canonical_name: Vektor-Memory/Vex; human_manual_source: deepwiki_human_wiki -->
