# https://github.com/microsoft/kernel-memory Project Manual

Generated at: 2026-06-26 06:30:16 UTC

## Table of Contents

- [Overview & Core Architecture](#page-overview)
- [Ingestion Pipeline & Retrieval (RAG)](#page-pipeline)
- [Extensions, Connectors & Client Integrations](#page-extensions)
- [Deployment, Configuration & Customization](#page-deploy)

<a id='page-overview'></a>

## Overview & Core Architecture

### Related Pages

Related topics: [Ingestion Pipeline & Retrieval (RAG)](#page-pipeline), [Extensions, Connectors & Client Integrations](#page-extensions), [Deployment, Configuration & Customization](#page-deploy)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [applications/evaluation/README.md](https://github.com/microsoft/kernel-memory/blob/main/applications/evaluation/README.md)
- [service/Service/README.md](https://github.com/microsoft/kernel-memory/blob/main/service/Service/README.md)
- [examples/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/README.md)
- [examples/005-dotnet-async-memory-custom-pipeline/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/005-dotnet-async-memory-custom-pipeline/README.md)
- [examples/106-dotnet-retrieve-synthetics/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/106-dotnet-retrieve-synthetics/README.md)
- [tools/README.md](https://github.com/microsoft/kernel-memory/blob/main/tools/README.md)
- [extensions/Chunkers/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Chunkers/README.md)
- [extensions/Aspire/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Aspire/README.md)
- [extensions/OpenAI/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/OpenAI/README.md)
- [extensions/Ollama/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Ollama/README.md)
- [extensions/LlamaSharp/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/LlamaSharp/README.md)
- [extensions/Tiktoken/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Tiktoken/README.md)
- [extensions/Qdrant/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Qdrant/README.md)
- [extensions/AzureBlobs/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/AzureBlobs/README.md)
- [extensions/AWS/S3/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/AWS/S3/README.md)
- [extensions/AzureAIDocIntel/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/AzureAIDocIntel/README.md)
</details>

# Overview & Core Architecture

## Purpose and Scope

Kernel Memory (KM) is an open-source, multi-modal retrieval-augmented generation (RAG) service. It is designed to ingest heterogeneous content (PDFs, Office documents, images, audio, web pages, raw text), transform it into embeddings, store it in a vector database, and expose a unified API for semantic search and answer generation. As described in [service/Service/README.md](https://github.com/microsoft/kernel-memory/blob/main/service/Service/README.md), the project is a *Knowledge Management* system, not merely a vector store: a text-generation LLM is part of the default pipeline so that queries return synthesized answers grounded in retrieved passages.

The repository is organized around a small Core library and a set of pluggable extension projects. The Core provides the ingestion pipeline, the public `IKernelMemory` interface, and the dependency-injection builder (`KernelMemoryBuilder`). Extension projects contribute concrete implementations for storage, AI models, and vector databases. This separation lets the same code base run in three different deployment topologies, which is the central architectural decision of the project.

## Architectural Pillars

### Three Logical Layers

KM separates three concerns, each mapped to one or more folders in the repository:

| Layer | Responsibility | Example Projects |
|-------|----------------|------------------|
| **Core** | Pipeline, interfaces, builder, defaults | `service/Core`, `service/Abstractions` |
| **Connectors** | Concrete AI / Storage / Vector-DB adapters | `extensions/OpenAI`, `extensions/Qdrant`, `extensions/AzureBlobs` |
| **Service & Apps** | Web API, async pipeline host, evaluation | `service/Service`, `applications/evaluation` |

The Core never references a specific vendor; every dependency (LLM, embedding generator, vector DB, document store, OCR engine) is injected through interfaces. This is what allows the same `IKernelMemory` instance to be reconfigured from OpenAI to Ollama, or from Azure AI Search to Qdrant, by changing builder extension methods (see [examples/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/README.md)).

### The `IKernelMemory` Surface

The public entry point is the `IKernelMemory` interface, implemented by the object that `KernelMemoryBuilder.Build()` returns. From the caller's perspective, KM is a small object with two main verbs: the `Import*Async` family (`ImportDocumentAsync`, `ImportTextAsync`, `ImportWebPageAsync`) for ingestion, and `AskAsync` / `SearchAsync` for retrieval. The service README clarifies that the same interface is used in both *serverless* (in-process) and *service* (web + queue) modes, so application code does not change between the two. The `Build()` method also accepts options that control which optional services are wired up; this was generalized in 0.96.250116.1 (release notes: *Support Build() options in KM builder extension methods*).
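As an illustration of that surface, the following minimal sketch builds an in-process memory, imports a short text, and asks a question. It assumes the `Microsoft.KernelMemory` NuGet package, the `WithOpenAIDefaults` builder method, and an OpenAI key in the `OPENAI_API_KEY` environment variable. The document ID and sample text are made up for the example.

```csharp
using System;
using Microsoft.KernelMemory;

// Assumption: an OpenAI API key is available in the environment.
var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
    ?? throw new InvalidOperationException("Set OPENAI_API_KEY first.");

// Build an in-process (serverless) memory; no web service or queue is involved.
IKernelMemory memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(apiKey)
    .Build<MemoryServerless>();

// Ingestion: the text is chunked, embedded, and stored by the default pipeline.
await memory.ImportTextAsync(
    "Kernel Memory ingests documents and answers questions about them.",
    documentId: "note-001"); // hypothetical document ID

// Retrieval: AskAsync runs a search and then generates a grounded answer.
MemoryAnswer answer = await memory.AskAsync("What does Kernel Memory do?");
Console.WriteLine(answer.Result);
```

Because the variable is typed as `IKernelMemory`, only the construction lines would need to change if the application later moved to the service mode.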

### The Ingestion Pipeline

Document ingestion is implemented as a sequence of named *steps* executed by handlers. A canonical sequence is visible in [examples/005-dotnet-async-memory-custom-pipeline/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/005-dotnet-async-memory-custom-pipeline/README.md):

1. `extract_text` — decode the binary document into text (plain decoder or Azure AI Document Intelligence via [extensions/AzureAIDocIntel/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/AzureAIDocIntel/README.md)).
2. `split_text_in_partitions` — chunking, delegated to the chunker package ([extensions/Chunkers/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Chunkers/README.md)).
3. `generate_embeddings` — call the configured embedding model.
4. `save_memory_records` — persist vectors to the configured memory DB.

The pipeline is asynchronous and queue-driven when running as a service: the service README states that the *Core assembly includes also a basic in-memory queue called SimpleQueues, useful for tests and demos*, while production deployments use Azure Queues or RabbitMQ for *reliability and horizontal scaling*. The same pipeline can be customized by registering additional handlers, allowing custom enrichment (summarization, tagging, translation) — see the synthetic-memory example in [examples/106-dotnet-retrieve-synthetics/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/106-dotnet-retrieve-synthetics/README.md).

```mermaid
flowchart LR
    A[Document Upload] --> B[extract_text]
    B --> C[split_text_in_partitions]
    C --> D[generate_embeddings]
    D --> E[save_memory_records]
    E --> F[(Vector DB)]
    G[User Query] --> H[AskAsync]
    H --> I[Vector Search]
    I --> F
    F --> J[LLM Answer Generation]
    J --> K[Synthesized Answer]
```

## Deployment Modes

KM supports three deployment topologies, all sharing the same `IKernelMemory` API:

- **Serverless (in-process)** — `KernelMemoryBuilder` is built inside the host application. No external services are required beyond the configured LLM and vector DB. Suited for small files, tests, and single-tenant apps.
- **Service (web + async pipeline)** — A stand-alone web service accepts uploads and exposes a documented REST API (Swagger UI at `/swagger/index.html` when running locally). Handlers run in background processes consuming a persistent queue. The official Docker image is published at `kernelmemory/service`; the source Dockerfile in the repository root can be used for custom builds (see [service/Service/README.md](https://github.com/microsoft/kernel-memory/blob/main/service/Service/README.md)).
- **.NET Aspire** — The Aspire extension ([extensions/Aspire/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Aspire/README.md)) wires KM into an Aspire AppHost for local orchestration and cloud deployment, introduced in 0.95.241216.1 and expanded in subsequent releases.
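For the service topology, a client-side sketch might look like the following. It assumes the `Microsoft.KernelMemory.WebClient` package, a service already listening at `http://127.0.0.1:9001/` (adjust to your deployment), and a local file named `manual.pdf`, which is hypothetical.

```csharp
using System;
using Microsoft.KernelMemory;

// Assumption: a Kernel Memory service is reachable at this address.
IKernelMemory memory = new MemoryWebClient("http://127.0.0.1:9001/");

// The upload returns quickly; handlers process the file in the background.
string documentId = await memory.ImportDocumentAsync("manual.pdf"); // hypothetical file

// Ingestion is asynchronous, so this may print False right after the upload.
bool ready = await memory.IsDocumentReadyAsync(documentId);
Console.WriteLine($"Document {documentId} ready: {ready}");
```

The calls are the same ones used in serverless mode; only the object behind `IKernelMemory` differs.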

The Service README warns that, since the 0.96.250115.1 release, *the system throws an exception when mixing volatile and persistent data*, so a deployment must be consistent about whether memory records are ephemeral or durable.

## Ecosystem and Extensibility

Around the Core, the repository ships a rich set of official extensions, each published as a separate NuGet package:

- **AI** — OpenAI, Ollama, LlamaSharp (local Llama), Anthropic, Semantic Kernel text completion, and Tiktoken/GPT tokenizers ([extensions/OpenAI/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/OpenAI/README.md), [extensions/Ollama/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Ollama/README.md), [extensions/LlamaSharp/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/LlamaSharp/README.md), [extensions/Tiktoken/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Tiktoken/README.md)).
- **Vector DBs** — Qdrant (with a documented caveat about its GUID/INT point-ID limitation forcing an extra round-trip on upsert — [extensions/Qdrant/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Qdrant/README.md)), plus Azure AI Search, Elasticsearch, Postgres, Redis, and SQL Server.
- **Document Storage** — Azure Blob Storage and AWS S3 (with `ForcePathStyle` support for MinIO added in 0.98.250324.1 — [extensions/AzureBlobs/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/AzureBlobs/README.md), [extensions/AWS/S3/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/AWS/S3/README.md)).
- **OCR / parsing** — Azure AI Document Intelligence.

In addition, the `tools/` directory ([tools/README.md](https://github.com/microsoft/kernel-memory/blob/main/tools/README.md)) provides CLI clients (`km-cli/upload-file.sh`, `ask.sh`, `search.sh`) and Docker launch scripts for local vector DBs (Elasticsearch, MSSQL, Qdrant, Redis). The `applications/evaluation` project ([applications/evaluation/README.md](https://github.com/microsoft/kernel-memory/blob/main/applications/evaluation/README.md)) ships a `TestSetGenerator` that synthesizes evaluation queries from an existing index and computes standard RAG metrics — Faithfulness, Answer Relevancy, Context Recall/Precision, Context Relevancy, Context Entity Recall, Answer Semantic Similarity, and Answer Correctness.

A final architectural trait worth highlighting is the project's preference for **composability over monolithism**: the main interfaces have multiple concrete implementations, and the README's example list explicitly groups topics into *Customizations* (custom handlers, embeddings, decoders, web scrapers) and *Local models and external connectors*. Recent releases follow the same direction. For example, 0.98.250508.3 added a Japanese text split character and fixed OpenAPI specifications for upload tags/steps, while 0.94.241201.1 introduced response streaming. Changes like these are easier to ship because the Core exposes a small, stable surface and defers most vendor-specific behavior to extensions.

## See Also

- [Ingestion Pipeline & Retrieval (RAG)](#page-pipeline)
- [Extensions, Connectors & Client Integrations](#page-extensions)
- [Deployment, Configuration & Customization](#page-deploy)

---

<a id='page-pipeline'></a>

## Ingestion Pipeline & Retrieval (RAG)

### Related Pages

Related topics: [Overview & Core Architecture](#page-overview), [Extensions, Connectors & Client Integrations](#page-extensions), [Deployment, Configuration & Customization](#page-deploy)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [service/Abstractions/Pipeline/IPipelineStepHandler.cs](https://github.com/microsoft/kernel-memory/blob/main/service/Abstractions/Pipeline/IPipelineStepHandler.cs)
- [service/Core/Handlers/TextExtractionHandler.cs](https://github.com/microsoft/kernel-memory/blob/main/service/Core/Handlers/TextExtractionHandler.cs)
- [service/Core/Handlers/TextPartitioningHandler.cs](https://github.com/microsoft/kernel-memory/blob/main/service/Core/Handlers/TextPartitioningHandler.cs)
- [service/Core/Handlers/GenerateEmbeddingsHandler.cs](https://github.com/microsoft/kernel-memory/blob/main/service/Core/Handlers/GenerateEmbeddingsHandler.cs)
- [service/Core/Handlers/GenerateEmbeddingsParallelHandler.cs](https://github.com/microsoft/kernel-memory/blob/main/service/Core/Handlers/GenerateEmbeddingsParallelHandler.cs)
- [service/Core/Handlers/GenerateEmbeddingsHandlerBase.cs](https://github.com/microsoft/kernel-memory/blob/main/service/Core/Handlers/GenerateEmbeddingsHandlerBase.cs)
- [examples/005-dotnet-async-memory-custom-pipeline/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/005-dotnet-async-memory-custom-pipeline/README.md)
- [examples/106-dotnet-retrieve-synthetics/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/106-dotnet-retrieve-synthetics/README.md)
- [applications/evaluation/README.md](https://github.com/microsoft/kernel-memory/blob/main/applications/evaluation/README.md)
- [service/Service/README.md](https://github.com/microsoft/kernel-memory/blob/main/service/Service/README.md)
- [extensions/Chunkers/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Chunkers/README.md)
</details>

# Ingestion Pipeline & Retrieval (RAG)

## Overview

Kernel Memory processes user content through a modular ingestion pipeline and answers user questions through a Retrieval-Augmented Generation (RAG) loop. The pipeline is composed of discrete step handlers that each advance a shared `DataPipeline` object, while retrieval combines vector search, prompt construction, and LLM-based answer generation. Source: [service/Abstractions/Pipeline/IPipelineStepHandler.cs:1-]()

The `IPipelineStepHandler` interface defines the contract that every handler implements, making the pipeline composable and extensible. Standard handlers shipped in the Core project cover text extraction, partitioning (chunking), embedding generation, and persisting memory records to the configured vector store. Source: [service/Core/Handlers/TextExtractionHandler.cs:1-]()

## Ingestion Pipeline

### Default Step Sequence

When a `Document` is submitted through `ImportDocumentAsync`, Kernel Memory enqueues a pipeline that flows through a sequence of named steps. Each step is a discrete handler hosted either in-process (serverless mode) or as a background service.

| Step name | Handler | Responsibility |
|---|---|---|
| `extract_text` | `TextExtractionHandler` | Decode raw files (PDF, DOCX, images via Azure AI Doc Intel, etc.) into plain text |
| `split_text_in_partitions` | `TextPartitioningHandler` | Chunk text into smaller partitions suitable for embedding and retrieval |
| `generate_embeddings` | `GenerateEmbeddingsHandler` | Produce vector embeddings for each partition (sequential) |
| `generate_embeddings_parallel` | `GenerateEmbeddingsParallelHandler` | Variant that batches embedding calls concurrently for higher throughput |
| `summarize` | `SummarizationHandler` | Optional synthetic memory generation (LLM-based summary of the source) |
| `save_memory_records` | `SaveRecordsHandler` | Persist partitions and vectors to the configured memory DB |

Source: [examples/005-dotnet-async-memory-custom-pipeline/README.md:1-](), [service/Core/Handlers/TextPartitioningHandler.cs:1-](), [service/Core/Handlers/GenerateEmbeddingsHandlerBase.cs:1-]()

### Pipeline Data Flow

```mermaid
sequenceDiagram
    participant Client
    participant Queue
    participant Extract as TextExtractionHandler
    participant Chunk as TextPartitioningHandler
    participant Embed as GenerateEmbeddingsHandler
    participant Save as SaveRecordsHandler
    Client->>Queue: ImportDocumentAsync(file, tags, steps)
    Queue->>Extract: extract_text
    Extract->>Chunk: split_text_in_partitions
    Chunk->>Embed: generate_embeddings
    Embed->>Save: save_memory_records
    Save-->>Client: document ready (via IsDocumentReadyAsync)
```

### Selecting and Customizing Steps

Steps can be chosen per request via the `steps` argument, and handlers can run as hosted background services through `AddHandlerAsHostedService`. Source: [examples/005-dotnet-async-memory-custom-pipeline/README.md:1-]()

```csharp
// Register one hosted-service handler per step. Each string is the step name
// that the handler listens for on the queue.
host.Services.AddHandlerAsHostedService<TextExtractionHandler>("extract_text");
host.Services.AddHandlerAsHostedService<TextPartitioningHandler>("split_text_in_partitions");
host.Services.AddHandlerAsHostedService<SummarizationHandler>("summarize");
host.Services.AddHandlerAsHostedService<GenerateEmbeddingsHandler>("generate_embeddings");
host.Services.AddHandlerAsHostedService<SaveRecordsHandler>("save_memory_records");

// Import a document and list the steps to run. "summarize" is registered above
// but not requested here, so no summary is generated for this document.
string docId = await memory.ImportDocumentAsync(
    new Document("inProcessTest")
        .AddFile("file1-Wikipedia-Carbon.txt")
        .AddTag("testName", "example3"),
    steps: new[] {
        "extract_text",
        "split_text_in_partitions",
        "generate_embeddings",
        "save_memory_records"
    });
```

By dropping `summarize` from the `steps` array, callers skip synthetic-data generation; by inserting a custom step name they can wire their own `IPipelineStepHandler` implementation into the same flow. Source: [examples/005-dotnet-async-memory-custom-pipeline/README.md:1-]()
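A custom step could be sketched as follows. The handler name `LogFilesHandler` and the step name `log_files` are hypothetical. The `InvokeAsync` signature shown here (returning a `ReturnType` value together with the updated pipeline) is an assumption based on recent releases, and the return type has changed between versions, so check `IPipelineStepHandler` in the package version you reference.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.KernelMemory.Pipeline;

// Hypothetical handler: logs the files attached to the pipeline and passes it on.
public sealed class LogFilesHandler : IPipelineStepHandler
{
    public LogFilesHandler(string stepName)
    {
        this.StepName = stepName;
    }

    // Must match the name used in the `steps` array of the import call.
    public string StepName { get; }

    public Task<(ReturnType returnType, DataPipeline updatedPipeline)> InvokeAsync(
        DataPipeline pipeline, CancellationToken cancellationToken = default)
    {
        foreach (var file in pipeline.Files)
        {
            Console.WriteLine($"[{this.StepName}] {pipeline.DocumentId}: {file.Name}");
        }

        // Report success so the orchestrator moves on to the next step.
        return Task.FromResult((ReturnType.Success, pipeline));
    }
}
```

Under these assumptions, the handler would be registered with `AddHandlerAsHostedService<LogFilesHandler>("log_files")` and activated by adding `"log_files"` to the `steps` array at the desired position.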

## Retrieval and RAG

Kernel Memory exposes two retrieval primitives:

- `SearchAsync` — returns relevant partitions (and citations) from the memory store without invoking an LLM.
- `AskAsync` — performs full RAG: it searches, builds a grounded prompt from the hits, and asks the configured text generator to produce an answer.
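The difference between the two primitives can be sketched with a small helper. The method name `CompareRetrievalAsync` is hypothetical, and the property names (`Results`, `Partitions`, `Relevance`, `RelevantSources`) reflect the result models as I understand them in recent releases; treat them as assumptions to verify against your package version.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.KernelMemory;

static async Task CompareRetrievalAsync(IKernelMemory memory, string question)
{
    // SearchAsync: retrieval only; no text generator is called.
    SearchResult search = await memory.SearchAsync(question, limit: 3);
    foreach (var citation in search.Results)
    {
        foreach (var partition in citation.Partitions)
        {
            Console.WriteLine(
                $"{citation.SourceName} ({partition.Relevance:F2}): {partition.Text}");
        }
    }

    // AskAsync: retrieval plus answer generation, with the sources it relied on.
    MemoryAnswer answer = await memory.AskAsync(question);
    Console.WriteLine(answer.Result);
    foreach (var source in answer.RelevantSources)
    {
        Console.WriteLine($"  cited: {source.SourceName}");
    }
}
```

`SearchAsync` is the cheaper call when the application only needs passages, for example to feed its own prompt.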

The evaluation harness measures Faithfulness, Answer Relevancy, Context Recall, Context Precision, Context Relevancy, Context Entity Recall, Answer Semantic Similarity, and Answer Correctness. Source: [applications/evaluation/README.md:1-]()

Since release `0.96.250115.1`, duplicate facts are discarded by default during RAG answer synthesis, improving precision in the generated output. Source: community release note at [packages-0.96.250115.1](https://github.com/microsoft/kernel-memory/releases/tag/packages-0.96.250115.1).

Synthetic memories such as summaries are first-class retrieval targets: the `summarize` step writes them back through the same indexing path, so they can be returned alongside raw chunks at query time. Source: [examples/106-dotnet-retrieve-synthetics/README.md:1-]()

### Configuration Highlights

- **Chunkers**: shipped as a dedicated package, `Microsoft.KernelMemory.Chunkers`, configurable per deployment. Source: [extensions/Chunkers/README.md:1-]()
- **Embedding generator**: pluggable; defaults to the configured text-embedding model, but custom generators can be substituted. Source: [service/Service/README.md:1-]()
- **LLM**: used both at ingestion (synthetic data) and at answer time; the service has been tested primarily with OpenAI GPT-3.5 and GPT-4. Source: [service/Service/README.md:1-]()
- **Queue**: in-process `SimpleQueues` for tests and demos; production deployments use Azure Queues or RabbitMQ for reliability and horizontal scaling. Source: [service/Service/README.md:1-]()
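As a rough sketch of how the chunking setting from the list above could be expressed in code, the builder below overrides the partitioning options. The method `WithCustomTextPartitioningOptions`, the `TextPartitioningOptions` type, and its two properties are assumptions based on recent releases, and the numeric values are arbitrary illustrations rather than recommended settings.

```csharp
using System;
using Microsoft.KernelMemory;
using Microsoft.KernelMemory.Configuration;

// Assumption: an OpenAI API key is available in the environment.
var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
    ?? throw new InvalidOperationException("Set OPENAI_API_KEY first.");

var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(apiKey)
    // Illustrative values: smaller chunks with a modest overlap between them.
    .WithCustomTextPartitioningOptions(new TextPartitioningOptions
    {
        MaxTokensPerParagraph = 512,
        OverlappingTokens = 50
    })
    .Build<MemoryServerless>();
```

Smaller partitions usually give more precise matches at the cost of more embedding calls, so the values are worth tuning per corpus.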

## Common Failure Modes and Tips

- **Mixing volatile and persistent data** in the same pipeline raises an exception by design (added in `0.96.250115.1`). Source: [packages-0.96.250115.1](https://github.com/microsoft/kernel-memory/releases/tag/packages-0.96.250115.1)
- **Step name typos** cause the pipeline to wait indefinitely — the strings passed to `AddHandlerAsHostedService` must exactly match those passed in the `steps` array. Source: [examples/005-dotnet-async-memory-custom-pipeline/README.md:1-]()
- **Async completion**: poll `IsDocumentReadyAsync` after `ImportDocumentAsync` to confirm that the background handlers finished. Source: [examples/005-dotnet-async-memory-custom-pipeline/README.md:1-]()
- **AWS S3 with MinIO** requires `ForcePathStyle = true` on `AWSS3Config` (added in `0.98.250324.1`). Source: [packages-0.98.250324.1](https://github.com/microsoft/kernel-memory/releases/tag/packages-0.98.250324.1)
- **Localization**: chunker split characters must match the language; a Japanese split character was added in `0.98.250508.3`. Source: [packages-0.98.250508.3](https://github.com/microsoft/kernel-memory/releases/tag/packages-0.98.250508.3)
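The async-completion tip above can be made safer with a bounded polling loop, so a step-name typo shows up as a timeout instead of an endless wait. The helper name `WaitUntilAsync` is hypothetical, and the sketch uses a stand-in readiness check so that it runs without a Kernel Memory instance.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Polls `isReady` until it returns true or `timeout` elapses.
// Returns true when the check succeeded, false on timeout.
static async Task<bool> WaitUntilAsync(
    Func<Task<bool>> isReady, TimeSpan pollInterval, TimeSpan timeout)
{
    var clock = Stopwatch.StartNew();
    while (true)
    {
        if (await isReady()) { return true; }
        if (clock.Elapsed >= timeout) { return false; }
        await Task.Delay(pollInterval);
    }
}

// Stand-in check that becomes true on the third call (hypothetical; in real
// code this would be `() => memory.IsDocumentReadyAsync(documentId)`).
int calls = 0;
bool ready = await WaitUntilAsync(
    () => Task.FromResult(++calls >= 3),
    pollInterval: TimeSpan.FromMilliseconds(10),
    timeout: TimeSpan.FromSeconds(5));

Console.WriteLine($"ready={ready} after {calls} checks"); // prints "ready=True after 3 checks"
```

A `false` result is a prompt to inspect the registered step names and the queue rather than to keep waiting.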

## See Also

- Service architecture overview: [service/Service/README.md](https://github.com/microsoft/kernel-memory/blob/main/service/Service/README.md)
- Examples index (serverless, async, custom pipelines, RAG): [examples/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/README.md)
- Text chunkers extension: [extensions/Chunkers/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Chunkers/README.md)
- Evaluation harness (RAG quality metrics): [applications/evaluation/README.md](https://github.com/microsoft/kernel-memory/blob/main/applications/evaluation/README.md)

---

<a id='page-extensions'></a>

## Extensions, Connectors & Client Integrations

### Related Pages

Related topics: [Overview & Core Architecture](#page-overview), [Ingestion Pipeline & Retrieval (RAG)](#page-pipeline), [Deployment, Configuration & Customization](#page-deploy)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [extensions/Chunkers/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Chunkers/README.md)
- [extensions/Aspire/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Aspire/README.md)
- [extensions/AWS/S3/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/AWS/S3/README.md)
- [extensions/Ollama/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Ollama/README.md)
- [extensions/AzureAIDocIntel/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/AzureAIDocIntel/README.md)
- [extensions/LlamaSharp/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/LlamaSharp/README.md)
- [extensions/Tiktoken/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Tiktoken/README.md)
- [service/Service/README.md](https://github.com/microsoft/kernel-memory/blob/main/service/Service/README.md)
- [tools/README.md](https://github.com/microsoft/kernel-memory/blob/main/tools/README.md)
- [applications/evaluation/README.md](https://github.com/microsoft/kernel-memory/blob/main/applications/evaluation/README.md)
- [examples/005-dotnet-async-memory-custom-pipeline/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/005-dotnet-async-memory-custom-pipeline/README.md)
- [examples/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/README.md)
</details>

# Extensions, Connectors & Client Integrations

The Kernel Memory repository is built around a small Core package and a large set of satellite extension projects published as independent NuGet packages. The `extensions/` folder is the home for these integrations, and it spans three broad families: LLM/embedding connectors, storage and content-extraction connectors, and developer-tooling projects such as .NET Aspire, chunkers, and tokenizers. The evaluation harness lives separately under `applications/evaluation`, and the `examples/` folder provides runnable, step-by-step demos for the most common customizations.

## Extension Architecture

Core defines the abstract interfaces that any connector must implement, while each project under `extensions/` provides a concrete implementation. The service overview makes the separation explicit: Kernel Memory has a clear boundary between the orchestration engine and the underlying storage, embeddings, and LLM dependencies, which is what makes plug-in style extensions practical (Source: [service/Service/README.md:18-24]()). Extensions follow a consistent shape — they expose a typed configuration class plus one or more `KernelMemoryBuilder` extension methods (e.g. `WithOllamaTextGeneration`, `WithOllamaTextEmbeddingGeneration`) that register the dependency in the DI container used by the memory pipeline (Source: [extensions/Ollama/README.md:11-23]()).

## Catalog of Official Extensions

| Package / Project | Role | Reference |
| --- | --- | --- |
| `Microsoft.KernelMemory.AI.Ollama` | LLM and embedding generation via a local Ollama daemon | [extensions/Ollama/README.md:1-23]() |
| `Microsoft.KernelMemory.AI.LlamaSharp` | On-device Llama inference using LLamaSharp | [extensions/LlamaSharp/README.md:1-12]() |
| `Microsoft.KernelMemory.AI.Tiktoken` | Token counting/clamping via Tiktoken | [extensions/Tiktoken/README.md:1-9]() |
| `Microsoft.KernelMemory.Chunkers` | Standalone text partitioning primitives | [extensions/Chunkers/README.md:1-9]() |
| Aspire extension (`extensions/Aspire`) | .NET Aspire AppHost integration for local/cloud | [extensions/Aspire/README.md:1-9]() |
| `Microsoft.KernelMemory.DataFormats.AzureAIDocIntel` | Azure AI Document Intelligence for OCR/layout | [extensions/AzureAIDocIntel/README.md:1-8]() |
| AWS S3 adapter | S3-backed binary content storage (MinIO compatible) | [extensions/AWS/S3/README.md:1-9]() |
| `km-cli/` shell scripts | `upload`, `ask`, `search` clients over HTTP | [tools/README.md:1-30]() |
| `applications/evaluation` | Offline RAG quality harness (faithfulness, recall, etc.) | [applications/evaluation/README.md:3-13]() |

The catalog is intentionally open: contributors are encouraged to add new connectors under `extensions/`, and the `examples/` folder ships a curated list of sample projects covering custom partitioning, embeddings, content decoders, web scrapers, handlers, and provider integrations (Source: [examples/README.md:1-30]()).

## LLM and Embedding Connectors

Every LLM connector wraps a third-party model API and exposes it through the `ITextGenerator` and `ITextEmbeddingGenerator` interfaces defined in Core. The Ollama connector is a representative example: it accepts an `OllamaConfig` containing an endpoint URL plus two `OllamaModelConfig` entries (one for text generation, one for embeddings) and is wired in with two builder calls (Source: [extensions/Ollama/README.md:13-23]()). The same pattern is used by the LlamaSharp connector for fully local Llama inference (Source: [extensions/LlamaSharp/README.md:1-12]()), by the Azure OpenAI and OpenAI connectors, and by the Anthropic connector. The service README notes that the service has been tested primarily with GPT-3.5 and GPT-4, and warns that the available token budget directly impacts summarization and answer quality (Source: [service/Service/README.md:12-18]()).
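A minimal sketch of the Ollama wiring described above might look like this. It assumes the `Microsoft.KernelMemory.AI.Ollama` package and a local Ollama daemon on its default port. The model names and token limits are illustrative and must match models you have actually pulled, and the two-argument `OllamaModelConfig` constructor is an assumption to verify against your package version.

```csharp
using Microsoft.KernelMemory;
using Microsoft.KernelMemory.AI.Ollama;

var config = new OllamaConfig
{
    // Ollama listens on port 11434 by default.
    Endpoint = "http://localhost:11434",
    // Illustrative models: one for text generation, one for embeddings.
    TextModel = new OllamaModelConfig("phi3:medium-128k", 131072),
    EmbeddingModel = new OllamaModelConfig("nomic-embed-text", 2048)
};

// The two builder calls register the text generator and the embedding generator.
var memory = new KernelMemoryBuilder()
    .WithOllamaTextGeneration(config)
    .WithOllamaTextEmbeddingGeneration(config)
    .Build<MemoryServerless>();
```

Replacing these two calls with another connector's builder methods is the only change needed to switch providers; the import and query code stays as it is.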

Token management is a first-class concern. The Tiktoken extension is a tokenizer implementation that any connector can be configured to use for accurate token counts, which is critical for chunking and prompt assembly (Source: [extensions/Tiktoken/README.md:1-9]()). The Chunkers extension complements it with reusable text-splitting primitives (Source: [extensions/Chunkers/README.md:1-9]()) that other pipelines can consume without pulling in the full Core.

## Storage, Document Intelligence, and Tooling

The repository ships adapters for storing the binary content that backs memory records outside the vector DB. The AWS S3 adapter uploads and retrieves documents using the standard S3 API; recent work added a `ForcePathStyle` flag to make the same code path work against MinIO (Source: [extensions/AWS/S3/README.md:1-9]()). For richer content extraction, the Azure AI Document Intelligence adapter enables high-accuracy OCR and layout-aware parsing of images and PDFs (Source: [extensions/AzureAIDocIntel/README.md:1-8]()).

On the developer-experience side, the Aspire extension provides a curated set of AppHost extension methods so the service, vector store, and LLM can be orchestrated through .NET Aspire for local and cloud deployments (Source: [extensions/Aspire/README.md:1-9]()). Shell-based clients for `upload`, `ask`, and `search` live under `tools/km-cli/` and are documented alongside Docker helpers for spinning up Elasticsearch, MS SQL, Qdrant, and Redis for local debugging (Source: [tools/README.md:1-30]()). The `applications/evaluation` project adds an offline quality harness that scores a RAG pipeline on faithfulness, answer relevancy, context recall/precision, context relevancy, context entity recall, answer semantic similarity, and answer correctness (Source: [applications/evaluation/README.md:3-13]()). A `TestSetGenerator` is also provided, which synthesizes a test set from an existing memory and index using a configurable distribution of question types (Source: [applications/evaluation/README.md:13-30]()).

## Integration Pattern

In practice a connector is selected at build time and then ignored by application code. The example for async memory with a custom pipeline shows the typical flow: a `KernelMemoryBuilder` is created, connectors are registered through builder extension methods, pipeline handlers are registered via `AddHandlerAsHostedService`, the builder produces an `IKernelMemory` implementation (for example `MemoryServerless` for in-process use), and the application calls `ImportDocumentAsync` / `AskAsync` against the same high-level API regardless of which LLM, embedder, vector DB, or storage backend is wired in (Source: [examples/005-dotnet-async-memory-custom-pipeline/README.md:30-58]()). The list of example projects under `examples/README.md` covers the most common customizations, including custom partitioning, custom embeddings, custom content decoders, custom web scrapers, custom handlers, and Anthropic/Ollama/LlamaSharp/LM Studio integrations (Source: [examples/README.md:6-30]()). This uniform contract is what makes the extension ecosystem composable: swapping one connector for another is a builder change, not an application-code change.

## See Also

- Service deployment and Docker: [service/Service/README.md](https://github.com/microsoft/kernel-memory/blob/main/service/Service/README.md)
- Example catalog: [examples/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/README.md)
- Evaluation harness: [applications/evaluation/README.md](https://github.com/microsoft/kernel-memory/blob/main/applications/evaluation/README.md)
- Tooling scripts and CLI: [tools/README.md](https://github.com/microsoft/kernel-memory/blob/main/tools/README.md)

---

<a id='page-deploy'></a>

## Deployment, Configuration & Customization

### Related Pages

Related topics: [Overview & Core Architecture](#page-overview), [Ingestion Pipeline & Retrieval (RAG)](#page-pipeline), [Extensions, Connectors & Client Integrations](#page-extensions)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [service/Service/README.md](https://github.com/microsoft/kernel-memory/blob/main/service/Service/README.md)
- [examples/002-dotnet-Serverless/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/002-dotnet-Serverless/README.md)
- [examples/005-dotnet-AsyncMemoryCustomPipeline/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/005-dotnet-AsyncMemoryCustomPipeline/README.md)
- [examples/007-dotnet-serverless-azure/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/007-dotnet-serverless-azure/README.md)
- [extensions/Aspire/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Aspire/README.md)
- [extensions/AWS/S3/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/AWS/S3/README.md)
- [extensions/Ollama/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Ollama/README.md)
- [extensions/Chunkers/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Chunkers/README.md)
- [tools/README.md](https://github.com/microsoft/kernel-memory/blob/main/tools/README.md)
- [infra/README.md](https://github.com/microsoft/kernel-memory/blob/main/infra/README.md)
- [examples/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/README.md)
</details>

## Overview

Kernel Memory supports a wide spectrum of deployment topologies, from fully in-process "serverless" use to a horizontally scalable web service backed by persistent queues. Customization is achieved through extension packages (LLM connectors, vector stores, content decoders, chunkers) and through the `KernelMemoryBuilder` fluent API. This page summarizes how the project is deployed, configured, and extended, drawing on the official service README, example projects, extensions, and infrastructure deployment guides.

The service exposes a web API for upload and query, plus an asynchronous data pipeline that ingests documents in the background. Source: [service/Service/README.md](https://github.com/microsoft/kernel-memory/blob/main/service/Service/README.md).

## Deployment Topologies

### Serverless (In-Process)

For small workloads and demos, all logic runs locally inside the host process. No service is deployed; the application uses `MemoryServerless` and the default C# handlers. Files can be stored on disk or in Azure Blobs depending on configuration. Source: [examples/002-dotnet-Serverless/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/002-dotnet-Serverless/README.md).

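```csharp
using Microsoft.KernelMemory;

// Build an in-process memory instance; the API key is read from the environment.
var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(Environment.GetEnvironmentVariable("OPENAI_API_KEY"))
    .Build<MemoryServerless>();

// Import three files as one document, tagged so it can be filtered at query time.
await memory.ImportDocumentAsync(new Document("doc012")
    .AddFiles([ "file2.txt", "file3.docx", "file4.pdf" ])
    .AddTag("user", "Blake"));
```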
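
Once a document has been imported, the same in-process instance can answer questions about it. The following is a minimal sketch rather than the project's reference code: the file name, document ID, and question are hypothetical, and it assumes `OPENAI_API_KEY` is set. Import and query happen in the same process because the default serverless setup may keep data only in memory.

```csharp
using System;
using Microsoft.KernelMemory;

var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
             ?? throw new InvalidOperationException("OPENAI_API_KEY is not set");

var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(apiKey)
    .Build<MemoryServerless>();

// Hypothetical file and document ID, tagged so the query below can filter on it.
await memory.ImportDocumentAsync(new Document("doc012")
    .AddFile("file4.pdf")
    .AddTag("user", "Blake"));

// Restrict retrieval to memories tagged user=Blake.
var answer = await memory.AskAsync(
    "What is this document about?",
    filter: MemoryFilters.ByTag("user", "Blake"));

Console.WriteLine(answer.Result);

// Each citation names a source document that was used to ground the answer.
foreach (var source in answer.RelevantSources)
{
    Console.WriteLine($"- {source.SourceName}");
}
```

Filtering by tag keeps answers scoped to one user's documents even when several users share an index.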

### Async Pipeline (Custom Handlers)

When reliability and scale matter, ingestion can run via hosted background services, with explicit pipeline steps such as `extract_text`, `split_text_in_partitions`, `generate_embeddings`, and `save_memory_records`. Source: [examples/005-dotnet-AsyncMemoryCustomPipeline/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/005-dotnet-AsyncMemoryCustomPipeline/README.md).
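
The hosted-handler setup can be sketched roughly as follows. This is an illustration loosely modeled on the pattern described above, not a copy of the example project: the use of `Host.CreateApplicationBuilder`, the choice of `SimpleQueues`, the document ID, and the file name are assumptions, and the handler-to-step mapping simply reuses the step names listed in the paragraph.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.KernelMemory;
using Microsoft.KernelMemory.Handlers;

var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
             ?? throw new InvalidOperationException("OPENAI_API_KEY is not set");

var hostBuilder = Host.CreateApplicationBuilder(args);

// Share the host's service collection so hosted handlers can resolve memory dependencies.
var memoryBuilder = new KernelMemoryBuilder(hostBuilder.Services)
    .WithoutDefaultHandlers()      // only the steps registered below will exist
    .WithSimpleQueuesPipeline()    // assumption: test-grade queues for a local run
    .WithOpenAIDefaults(apiKey);

// Each handler runs as a background service bound to one step name.
hostBuilder.Services.AddHandlerAsHostedService<TextExtractionHandler>("extract_text");
hostBuilder.Services.AddHandlerAsHostedService<TextPartitioningHandler>("split_text_in_partitions");
hostBuilder.Services.AddHandlerAsHostedService<GenerateEmbeddingsHandler>("generate_embeddings");
hostBuilder.Services.AddHandlerAsHostedService<SaveRecordsHandler>("save_memory_records");

var memory = memoryBuilder.Build<MemoryService>();
var host = hostBuilder.Build();
await host.StartAsync();

// The steps argument selects which handlers process this document, and in what order.
var documentId = await memory.ImportDocumentAsync(
    new Document("doc001").AddFile("file1.docx"),   // hypothetical ID and file
    steps: new[] { "extract_text", "split_text_in_partitions", "generate_embeddings", "save_memory_records" });

// Ingestion is asynchronous, so wait for the pipeline to finish before shutting down.
while (!await memory.IsDocumentReadyAsync(documentId))
{
    await Task.Delay(TimeSpan.FromSeconds(1));
}

await host.StopAsync();
```

Because the step names are plain strings, a document can be routed through a subset of handlers by passing a shorter `steps` list.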

### Kernel Memory as a Service

The reference deployment packages a web service and an asynchronous handler pipeline as separate, independently scalable components. Persistent queues (Azure Queues, RabbitMQ, or the built-in `SimpleQueues` for tests) decouple ingestion from the API. Source: [service/Service/README.md](https://github.com/microsoft/kernel-memory/blob/main/service/Service/README.md).
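
From the client's point of view, a deployed service can be used through the web client with the same high-level calls. The sketch below assumes a service is listening on `http://127.0.0.1:9001/`; the file name, document ID, and question are hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.KernelMemory;

// Assumption: a Kernel Memory service is reachable at this address.
var memory = new MemoryWebClient("http://127.0.0.1:9001/");

// The upload call returns once the document is accepted; ingestion continues in the background.
var documentId = await memory.ImportDocumentAsync(
    "meeting-transcript.docx",
    documentId: "doc001",
    tags: new TagCollection { { "user", "Blake" } });

// Poll until the asynchronous pipeline has finished processing the document.
while (!await memory.IsDocumentReadyAsync(documentId))
{
    await Task.Delay(TimeSpan.FromSeconds(2));
}

var answer = await memory.AskAsync("What was decided in the meeting?");
Console.WriteLine(answer.Result);
```

Polling matters here because the queue decouples upload from processing: a query issued immediately after upload may not yet see the new content.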

### Docker and Azure Infrastructure

A pre-built image is published on Docker Hub (`kernelmemory/service`). A quick-start in demo mode only requires the `OPENAI_API_KEY` environment variable:

```
docker run -e OPENAI_API_KEY="..." -p 9001:9001 -it --rm kernelmemory/service
```

A production-style run mounts an `appsettings.Production.json` file into `/app`. Source: [service/Service/README.md](https://github.com/microsoft/kernel-memory/blob/main/service/Service/README.md).

For full cloud provisioning, the `infra/` folder contains an ARM/Bicep template that deploys the entire stack via the "Deploy to Azure" button. The deployment relies on the `Microsoft.AlertsManagement`, `Microsoft.App`, and `Microsoft.ContainerService` resource providers being registered in the target subscription, and it can take up to 20 minutes. Source: [infra/README.md](https://github.com/microsoft/kernel-memory/blob/main/infra/README.md).

```mermaid
flowchart LR
    Client[Client / Web App] -->|HTTP| API[KM Web Service<br/>:9001]
    API -->|enqueue| Q[(Queue: Azure / RabbitMQ / SimpleQueues)]
    Q --> Worker[Async Pipeline Handlers]
    Worker -->|read/write| Blob[(Blob Storage)]
    Worker -->|embeddings + chunks| Vec[(Vector DB)]
    API -->|search| Vec
```

## Configuration

Configuration follows standard ASP.NET Core conventions. Endpoints and authentication details are stored in `appsettings.json` and can be overridden by `appsettings.Development.json` when `ASPNETCORE_ENVIRONMENT=Development`. Source: [examples/007-dotnet-serverless-azure/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/007-dotnet-serverless-azure/README.md).
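
The layering described above can be reproduced in a console application with the standard .NET configuration packages. This is a sketch under assumptions: the `KernelMemory:Services:OpenAI` section path is assumed to match the layout of your `appsettings.json`, and the code expects the `Microsoft.Extensions.Configuration.Json`, `.Binder`, and `.EnvironmentVariables` packages to be referenced.

```csharp
using System;
using Microsoft.Extensions.Configuration;
using Microsoft.KernelMemory;

var environment = Environment.GetEnvironmentVariable("ASPNETCORE_ENVIRONMENT") ?? "Production";

// Later sources override earlier ones: base file, then environment-specific file, then env vars.
IConfiguration configuration = new ConfigurationBuilder()
    .AddJsonFile("appsettings.json", optional: false)
    .AddJsonFile($"appsettings.{environment}.json", optional: true)
    .AddEnvironmentVariables()
    .Build();

// Assumed section path; adjust it to the structure of your settings file.
var openAIConfig = configuration.GetSection("KernelMemory:Services:OpenAI").Get<OpenAIConfig>()
                   ?? throw new InvalidOperationException("OpenAI settings are missing");

var memory = new KernelMemoryBuilder()
    .WithOpenAI(openAIConfig)
    .Build<MemoryServerless>();
```

Keeping secrets in the environment-specific file or in environment variables keeps them out of the base `appsettings.json` that is usually committed to source control.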

Common configuration areas include:

| Area | Notes |
|------|-------|
| LLM endpoint | OpenAI, Azure OpenAI, Anthropic, Ollama, LlamaSharp, LM Studio |
| Embedding generator | Pluggable; bring your own via `WithCustomEmbeddingGenerator` |
| Vector store | Azure AI Search, Elasticsearch, Postgres, Qdrant, Redis, MS SQL |
| Content storage | Local disk, Azure Blobs, AWS S3 |
| Queues | `SimpleQueues` (default), Azure Queues, RabbitMQ |
| Tokenizer | Selectable via configuration (GA 1.0.0) |

Source: [service/Service/README.md](https://github.com/microsoft/kernel-memory/blob/main/service/Service/README.md) and [examples/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/README.md).

When running the service, the service README recommends persistent queues, such as Azure Queues or RabbitMQ, for reliability and horizontal scaling. Source: [service/Service/README.md](https://github.com/microsoft/kernel-memory/blob/main/service/Service/README.md).

A "service config check" was introduced in release 0.96.250115.1 to validate the configuration at startup. The same release also began throwing an exception when callers inadvertently mix volatile and persistent data. Source: release notes referenced in community context.
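
One way to avoid mixing volatile and persistent data in a local setup is to choose the persistence mode explicitly for every built-in store. The sketch below assumes the development-grade `SimpleFileStorage` and `SimpleVectorDb` backends and their `Persistent` presets. The exact conditions that trigger the exception are not described here, so treat this as an illustration of a consistent configuration rather than a statement of what the check enforces.

```csharp
using System;
using Microsoft.KernelMemory;
using Microsoft.KernelMemory.DocumentStorage.DevTools;
using Microsoft.KernelMemory.MemoryStorage.DevTools;

var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
             ?? throw new InvalidOperationException("OPENAI_API_KEY is not set");

// Both stores are persisted to disk, so files and memory records share the same lifetime.
var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(apiKey)
    .WithSimpleFileStorage(SimpleFileStorageConfig.Persistent)
    .WithSimpleVectorDb(SimpleVectorDbConfig.Persistent)
    .Build<MemoryServerless>();
```

For production workloads, the same principle applies to the real backends listed in the table above: pick durable storage for files, records, and queues together.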

## Customization & Extensions

Kernel Memory is designed for plug-and-play customization. The `extensions/` folder ships first-party adapters, while the `examples/` folder demonstrates common customization patterns. Source: [examples/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/README.md).

### Extensions

- **Ollama** — Connects to a local Ollama service for both text generation and embeddings. Configure endpoint and per-model token limits. Source: [extensions/Ollama/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Ollama/README.md).
- **AWS S3** — Storage adapter that uploads documents and tracks pipeline state in S3 buckets. Source: [extensions/AWS/S3/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/AWS/S3/README.md).
- **Chunkers** — Standalone `Microsoft.KernelMemory.Chunkers` package for advanced text partitioning, including language-specific separators such as the Japanese split character added in 0.98.250508.3. Source: [extensions/Chunkers/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Chunkers/README.md).
- **Aspire** — .NET Aspire extensions for local and cloud orchestration of Kernel Memory components. Source: [extensions/Aspire/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Aspire/README.md).
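
As an illustration of how an extension plugs into the builder, the following sketch wires the Ollama connector for both text generation and embeddings. It assumes an Ollama server on its default local port. The model names and token limits are hypothetical placeholders, and the tokenizer argument is left at its default, which may not match the tokenizer your models actually use.

```csharp
using Microsoft.KernelMemory;
using Microsoft.KernelMemory.AI.Ollama;

// Endpoint plus one entry per model; the second argument is the model's token limit.
var config = new OllamaConfig
{
    Endpoint = "http://localhost:11434",
    TextModel = new OllamaModelConfig("phi3:medium-128k", 131072),       // hypothetical model
    EmbeddingModel = new OllamaModelConfig("nomic-embed-text", 2048)     // hypothetical model
};

var memory = new KernelMemoryBuilder()
    .WithOllamaTextGeneration(config)
    .WithOllamaTextEmbeddingGeneration(config)
    .Build<MemoryServerless>();
```

Only the builder calls change relative to the OpenAI setup; the `ImportDocumentAsync` and `AskAsync` calls in application code stay the same.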

### Custom Pipelines, Prompts, and Decoders

The example catalogue covers custom partitioning (102), custom embedding generators (103), custom LLMs (104), custom content decoders (108), custom web scrapers (109), and custom ingestion handlers (201). RAG prompts and summarization prompts can also be overridden (101), and context parameters can tune the prompt per request (209). Source: [examples/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/README.md).

For advanced scenarios, a single asynchronous pipeline handler can be deployed as a standalone service (202), and `Memory` instances can be constructed without `KernelMemoryBuilder` (210). Source: [examples/README.md](https://github.com/microsoft/kernel-memory/blob/main/examples/README.md).

### CLI and Operational Tools

The `tools/` folder includes shell scripts (`upload-file.sh`, `ask.sh`, `search.sh`) for command-line interaction, scripts to launch Elasticsearch, MS SQL, Qdrant, and Redis containers, and an `InteractiveSetup` project that generates `appsettings.Development.json`. Source: [tools/README.md](https://github.com/microsoft/kernel-memory/blob/main/tools/README.md).

## Common Failure Modes

- Mixing volatile and persistent data without explicit configuration now raises an exception (release 0.96.250115.1). Plan your index and storage choices before deployment.
- SQL Server-backed deployments require the ICU library, which was added to the Docker image in release 0.98.250323.1. Missing ICU causes globalization-related runtime failures.
- MinIO compatibility with AWS S3 requires `ForcePathStyle = true` in `AWSS3Config`, added in release 0.98.250324.1.
- OpenAPI clients should regenerate against the latest schema, as the `/upload` endpoint specification for `tags` and `steps` was corrected in release 0.98.250508.3.
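
For the MinIO case, a configuration along the following lines may help. Only `ForcePathStyle` and `AWSS3Config` are named in the notes above. The namespace, the other property names, the `AuthTypes` enum value, the `WithAWSS3DocumentStorage` builder method, the endpoint, the bucket name, and the environment variable names are assumptions to verify against the S3 extension's README.

```csharp
using System;
using Microsoft.KernelMemory;
using Microsoft.KernelMemory.DocumentStorage.AWSS3;   // assumed namespace

var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
             ?? throw new InvalidOperationException("OPENAI_API_KEY is not set");

var s3Config = new AWSS3Config
{
    Auth = AWSS3Config.AuthTypes.AccessKey,                                      // assumed enum value
    AccessKey = Environment.GetEnvironmentVariable("MINIO_ACCESS_KEY") ?? "",    // hypothetical variable
    SecretAccessKey = Environment.GetEnvironmentVariable("MINIO_SECRET_KEY") ?? "",
    Endpoint = "http://localhost:9000",   // a typical local MinIO address
    BucketName = "kernel-memory",         // hypothetical bucket
    ForcePathStyle = true                 // address buckets by path, as MinIO expects
};

var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(apiKey)
    .WithAWSS3DocumentStorage(s3Config)
    .Build<MemoryServerless>();
```

Path-style addressing puts the bucket name in the URL path instead of the host name, which is why S3-compatible servers running on a single host typically need it.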

## See Also

- [extensions/Chunkers/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Chunkers/README.md) — Text partitioning extensions
- [extensions/Aspire/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Aspire/README.md) — .NET Aspire orchestration
- [extensions/Ollama/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/Ollama/README.md) — Local LLM via Ollama
- [extensions/AWS/S3/README.md](https://github.com/microsoft/kernel-memory/blob/main/extensions/AWS/S3/README.md) — S3 storage adapter
- [infra/README.md](https://github.com/microsoft/kernel-memory/blob/main/infra/README.md) — Azure deployment accelerator
- [tools/README.md](https://github.com/microsoft/kernel-memory/blob/main/tools/README.md) — CLI scripts and dev tools

---


## Pitfall Log

Project: microsoft/kernel-memory

Summary: Found 7 structured pitfall items, none of them high or blocking. Top priority: item 1, installation risk requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: runtime_trace
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Repro command: `docker run -e OPENAI_API_KEY="..." -it --rm -p 9001:9001 kernelmemory/service`
- Evidence: identity.distribution | https://github.com/microsoft/kernel-memory

## 2. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/microsoft/kernel-memory

## 3. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/microsoft/kernel-memory

## 4. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/microsoft/kernel-memory

## 5. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/microsoft/kernel-memory

## 6. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/microsoft/kernel-memory

## 7. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/microsoft/kernel-memory
