# https://github.com/chopratejas/headroom Project Manual

Generated at: 2026-06-02 10:58:52 UTC

## Table of Contents

- [Introduction](#introduction)
- [Getting Started](#getting-started)
- [Architecture](#architecture)
- [Compression Pipeline](#compression-pipeline)
- [Compression Algorithms](#compression-algorithms)
- [CCR (Reversible Compression)](#ccr-reversible-compression)
- [Memory System](#memory-system)
- [MCP Integration](#mcp-integration)
- [Proxy Deployment](#proxy-deployment)
- [CLI Wrappers](#cli-wrappers)

<a id='introduction'></a>

## Introduction

### Related Pages

Related topics: [Getting Started](#getting-started), [Architecture](#architecture)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/chopratejas/headroom/blob/main/README.md)
- [headroom/cli/wrap.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/wrap.py)
- [examples/README.md](https://github.com/chopratejas/headroom/blob/main/examples/README.md)
- [headroom/cli/learn.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/learn.py)
- [headroom/cli/init.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/init.py)
- [examples/langchain_demo/README.md](https://github.com/chopratejas/headroom/blob/main/examples/langchain_demo/README.md)
</details>

# Introduction

Headroom is a context compression framework designed to reduce token usage and costs when working with large language models (LLMs) in AI-assisted coding workflows. By compressing conversation history, tool outputs, and other context before a request is sent to the LLM, Headroom reports 60–90% token savings while preserving critical information.

## Overview

Headroom intercepts and optimizes AI traffic through multiple integration points:

| Integration Method | Use Case | Configuration |
|-------------------|----------|---------------|
| CLI Wrapper (`headroom wrap`) | Claude Code, Codex, Continue, Goose, OpenHands | `headroom wrap claude` |
| SDK Integration | Python/TypeScript applications | `withHeadroom(new Anthropic())` |
| ASGI Middleware | Web applications | `app.add_middleware(CompressionMiddleware)` |
| MCP Server | Model Context Protocol clients | `headroom mcp install` |
| Proxy Server | Any HTTP-based LLM traffic | `headroom proxy --port 8080` |

Source: [README.md:1-25]()

## Core Architecture

The Headroom pipeline exposes one stable request lifecycle across all integration methods:

```
Setup → Pre-Start → Post-Start → Input Received → Input Cached → Input Routed → Input Compressed → Input Remembered → Pre-Send → Post-Send → Response Received
```

### Transform Components

| Component | Purpose | Savings |
|-----------|--------|---------|
| **SmartCrusher** | Universal JSON compression (arrays, nested objects, mixed types) | 40–70% |
| **CodeCompressor** | AST-aware compression for Python, JS, Go, Rust, Java, C++ | 50–80% |
| **Kompress-base** | HuggingFace model trained on agentic traces | 40–90% |
| **CacheAligner** | Stabilizes prefixes for Anthropic/OpenAI KV cache hits | Variable |
| **IntelligentContext** | Score-based context fitting with learned importance | 30–60% |
| **CCR (Context Compression Retrieval)** | Reversible compression with on-demand retrieval | 40–80% |

Source: [README.md:45-60]()
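
The CacheAligner row above describes stabilizing prefixes so that provider-side caches can hit. A minimal sketch of one way a prefix could be stabilized, assuming volatile timestamps in the system prompt are what breaks the cache; the function names, the regular expression, and the `<TIMESTAMP>` placeholder are assumptions for illustration and are not confirmed to match how CacheAligner works:

```python
import hashlib
import re

# Matches ISO-8601 style timestamps such as 2026-06-02T10:58:52Z
_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z?")


def align_prefix(system_prompt: str) -> tuple[str, list[str]]:
    """Replace volatile timestamps with a fixed placeholder.

    Returns the stabilized prompt and the extracted values, which a caller
    could append after the cached prefix instead.
    """
    extracted = _TIMESTAMP.findall(system_prompt)
    stable = _TIMESTAMP.sub("<TIMESTAMP>", system_prompt)
    return stable, extracted


def prefix_hash(text: str) -> str:
    """Hash used here only to show that two prefixes are byte-identical."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a, _ = align_prefix("You are a coding agent. Now: 2026-06-02T10:58:52Z")
b, _ = align_prefix("You are a coding agent. Now: 2026-06-02T11:03:10Z")
print(prefix_hash(a) == prefix_hash(b))
```

Because both prompts reduce to the same text, a cache keyed on the prefix would treat them as the same entry.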

### Extension Seams

- **Pipeline extensions** — observe or customize lifecycle stages via `on_pipeline_event(...)`
- **Compression hooks** — additional extension points alongside the canonical lifecycle
- **Proxy extensions** — server/app integration seam for ASGI middleware, routes, and startup policy

Source: [README.md:55-58]()
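
A minimal sketch of how an extension might observe lifecycle stages through `on_pipeline_event(...)`. Only the hook name and the stage names come from the text above; the `Stage` enum, the `StageRecorder` class, the `run_lifecycle` helper, and the `(stage, payload)` signature are assumptions for illustration, not Headroom's actual API:

```python
from enum import Enum
from typing import Any, Protocol


class Stage(Enum):
    """Lifecycle stages in the order listed above (hypothetical enum)."""
    SETUP = "Setup"
    PRE_START = "Pre-Start"
    POST_START = "Post-Start"
    INPUT_RECEIVED = "Input Received"
    INPUT_CACHED = "Input Cached"
    INPUT_ROUTED = "Input Routed"
    INPUT_COMPRESSED = "Input Compressed"
    INPUT_REMEMBERED = "Input Remembered"
    PRE_SEND = "Pre-Send"
    POST_SEND = "Post-Send"
    RESPONSE_RECEIVED = "Response Received"


class Extension(Protocol):
    def on_pipeline_event(self, stage: Stage, payload: dict[str, Any]) -> None: ...


class StageRecorder:
    """Example extension: records the order in which stages fire."""

    def __init__(self) -> None:
        self.seen: list[str] = []

    def on_pipeline_event(self, stage: Stage, payload: dict[str, Any]) -> None:
        self.seen.append(stage.value)


def run_lifecycle(extensions: list[Extension], payload: dict[str, Any]) -> None:
    """Notify every extension at every stage, in declaration order."""
    for stage in Stage:
        for ext in extensions:
            ext.on_pipeline_event(stage, payload)


recorder = StageRecorder()
run_lifecycle([recorder], {"messages": []})
print(" → ".join(recorder.seen))
```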

## CLI Wrappers

The `headroom wrap` command provides zero-configuration setup for popular AI coding assistants:

```bash
headroom wrap claude                    # Start everything
headroom wrap claude --memory           # With persistent memory
headroom wrap claude --resume <id>      # Resume a session
headroom wrap claude --code-graph       # With code graph intelligence
headroom wrap claude --no-context-tool  # Skip CLI context-tool setup
```

### Supported Agents

| Agent | Command | Key Features |
|-------|---------|--------------|
| Claude Code | `headroom wrap claude` | Memory sync, MCP retrieve, Serena integration |
| Codex | `headroom wrap codex` | RTK injection, MCP registration, config snapshot |
| Continue | `headroom wrap continue` | Config.toml modification, systemMessage injection |
| Goose | `headroom wrap goose` | Independent session handling |
| OpenHands | `headroom wrap openhands` | Recent support (v0.22.4) |
| OpenCode | Planned | Feature request [#74](https://github.com/chopratejas/headroom/issues/74) |

Source: [headroom/cli/wrap.py:1-50]()

## RTK Context Tool Integration

Headroom integrates with RTK (Rewritten Tool Kit) for CLI output compression. Commands are prefixed with `rtk`, which trims their output by roughly 60–99% depending on the command category:

```bash
# Files & Search (60-75% savings)
rtk ls <path>           rtk read <file>         rtk grep <pattern>
rtk find <pattern>      rtk diff <file>

# Test (90-99% savings) — shows failures only
rtk pytest tests/       rtk cargo test          rtk test <cmd>

# Build & Lint (80-90% savings) — shows errors only
rtk tsc                 rtk lint                rtk cargo build
rtk prettier --check    rtk mypy                rtk ruff check
```

The RTK block is injected into agent configuration files with idempotent markers (`<!-- headroom:rtk-instructions -->`) to prevent duplicate insertions.

Source: [headroom/cli/wrap.py:25-75]()
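
The idempotent-marker approach described above can be sketched as follows. The marker string is taken from the text; the `inject_block` helper, the file name, and the block contents are hypothetical and only illustrate the pattern, not the code in `wrap.py`:

```python
from pathlib import Path
import tempfile

# Marker string taken from the text above; the block contents are placeholders.
RTK_MARKER = "<!-- headroom:rtk-instructions -->"


def inject_block(file_path: Path, marker: str, body: str) -> bool:
    """Append `marker` + `body` to the file unless the marker is already there.

    Returns True if the file was modified, False if the block was present.
    """
    existing = file_path.read_text(encoding="utf-8") if file_path.exists() else ""
    if marker in existing:
        return False  # Already injected: do nothing, so repeated runs are safe
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    file_path.write_text(f"{existing}{separator}{marker}\n{body}\n", encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "AGENTS.md"
    first = inject_block(target, RTK_MARKER, "Prefix shell commands with `rtk`.")
    second = inject_block(target, RTK_MARKER, "Prefix shell commands with `rtk`.")
    marker_count = target.read_text(encoding="utf-8").count(RTK_MARKER)
    print(first, second, marker_count)
```

Checking for the marker before writing is what makes repeated runs of the wrapper safe: the second call leaves the file untouched.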

## SDK Integration

### Python SDK

```python
from anthropic import Anthropic
from headroom import with_headroom

client = with_headroom(Anthropic())
response = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
```

### TypeScript SDK

```typescript
import { withHeadroom } from "@headroom/sdk";
import OpenAI from "openai";

// Wrap the client so requests are compressed before they are sent
const client = withHeadroom(new OpenAI({ apiKey: process.env.OPENAI_API_KEY }));
```

### Other Framework Integrations

| Framework | Integration Method |
|-----------|-------------------|
| OpenAI SDK | `withHeadroom(new OpenAI())` |
| Vercel AI SDK | `wrapLanguageModel({ model, middleware: headroomMiddleware() })` |
| LangChain | `HeadroomChatModel(your_llm)` |
| Agno | `HeadroomAgnoModel(your_model)` |
| Strands | See [Strands guide](https://headroom-docs.vercel.app/docs/strands) |

Source: [README.md:30-40]()

## Memory System

Headroom provides cross-agent memory capabilities for persistent knowledge:

```bash
headroom wrap claude --memory  # Enable persistent cross-session memory
```

The memory system injects guidance into agent configuration:

```
## Memory

Use the `headroom_memory` MCP server for persistent cross-session knowledge.

**Before** answering questions about prior decisions, conventions, project context,
architecture, user preferences — call `memory_search` first.

**After** making durable decisions, discovering conventions — call `memory_save`.
```

Memory storage is per-project to prevent context bleeding between projects (fixed in v0.21.34).

Source: [headroom/cli/wrap.py:200-220]()

## Pipeline Lifecycle

```mermaid
graph TD
    A[Setup] --> B[Pre-Start]
    B --> C[Post-Start]
    C --> D[Input Received]
    D --> E[Input Cached]
    E --> F[Input Routed]
    F --> G[Input Compressed]
    G --> H[Input Remembered]
    H --> I[Pre-Send]
    I --> J[Post-Send]
    J --> K[Response Received]
    
    L[Transforms] --> F
    M[Extensions] --> D
    N[Hooks] --> G
```

Provider and tool-specific behavior lives under `headroom/providers/` so core orchestration stays focused on lifecycle, sequencing, and policy:

- **CLI/tool slices**: `headroom/providers/claude`, `copilot`, `codex`, `openai`, `gemini`
- **Core transforms**: CacheAligner, ContentRouter, SmartCrusher, CodeCompressor

Source: [README.md:50-65]()

## Learning from Failures

The `headroom learn` command analyzes past tool call failures to generate preventive context:

```bash
headroom learn                        # Auto-detect agent & model
headroom learn --apply                # Write recommendations to context files
headroom learn --model gpt-4o         # Use specific model for analysis
headroom learn --all                  # Analyze all discovered projects
```

A plugin architecture supports multiple coding agents, with built-in support for Claude Code, Codex, and Gemini CLI.

Source: [headroom/cli/learn.py:1-50]()

## Known Limitations

### MCP Endpoint Availability

The `headroom proxy` command does **not** expose an HTTP MCP endpoint at `/mcp`. The stdio MCP server works correctly, but the HTTP endpoint returns 404. See [issue #460](https://github.com/chopratejas/headroom/issues/460) for details.

### CCR Multi-Agent Attribution

In multi-agent setups, CCR proactive expansion may corrupt message attribution when injected into messages containing XML markup (`<peer_turn from="AgentX">`). See [issue #503](https://github.com/chopratejas/headroom/issues/503).

### Provider-Agnostic Proxy

The proxy currently intercepts traffic at the Anthropic API level (`/v1/messages`). Users on AWS Bedrock, OpenAI, or Google Vertex cannot use the proxy due to provider-specific SDKs. See [issue #510](https://github.com/chopratejas/headroom/issues/510).

### LiteLLM Security Concern

The `litellm` PyPI package was subject to a supply chain attack in version 1.82.8. See [issue #56](https://github.com/chopratejas/headroom/issues/56) for mitigation recommendations.

Source: [README.md:1-30]()

## Contributing

```bash
git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest
```

Devcontainers are available in `.devcontainer/` (default, plus `memory-stack` with Qdrant and Neo4j).

Source: [README.md:100-105]()

## Community Resources

| Resource | Link |
|----------|------|
| Live Leaderboard | [headroomlabs.ai/dashboard](https://headroomlabs.ai/dashboard) — 60B+ tokens saved |
| Discord | [discord.gg/yRmaUNpsPJ](https://discord.gg/yRmaUNpsPJ) |
| HuggingFace Model | [huggingface.co/chopratejas/kompress-base](https://huggingface.co/chopratejas/kompress-base) |

## Recent Releases

| Version | Date | Key Changes |
|---------|------|-------------|
| v0.22.4 | Latest | wrap CLI breadth for cline, continue, goose, openhands |
| v0.22.2 | 2026-05-20 | Memory IDs exposure in auto-tail + memory_list tool |
| v0.22.0 | 2026-05-19 | `--exclude-tools` flag + `HEADROOM_EXCLUDE_TOOLS` env var |
| v0.21.34 | 2026-05-13 | Per-project memory storage (fixes #462) |
| v0.21.33 | 2026-05-13 | Narrow compressed type for mypy 1.14 compatibility |

Source: [README.md:20-45]()

## Next Steps

- **[Installation Guide](../installation)** — Set up Headroom for your preferred integration method
- **[Quick Start](../quickstart)** — Get started with `headroom wrap` in under 5 minutes
- **[SDK Reference](../sdk)** — Detailed API documentation for Python and TypeScript SDKs
- **[Proxy Configuration](../proxy)** — Advanced proxy setup and configuration options
- **[MCP Integration](../mcp)** — Connect Headroom as a Model Context Protocol server

---

<a id='getting-started'></a>

## Getting Started

### Related Pages

Related topics: [Introduction](#introduction), [Proxy Deployment](#proxy-deployment)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/chopratejas/headroom/blob/main/README.md)
- [headroom/cli/wrap.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/wrap.py)
- [headroom/cli/main.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/main.py)
- [examples/README.md](https://github.com/chopratejas/headroom/blob/main/examples/README.md)
- [sdk/typescript/src/index.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/index.ts)
- [headroom/cli/learn.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/learn.py)
- [sdk/typescript/examples/shared-context-multi-agent.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/examples/shared-context-multi-agent.ts)
</details>

# Getting Started

Headroom is a context compression platform for AI coding assistants that reduces token usage by 40-90% while preserving relevance. It intercepts LLM API traffic through a local proxy, compresses conversation context using ML-based transforms, and restores original content when needed via CCR (Context Compression & Retrieval) markers.

## Prerequisites

Before installing Headroom, ensure you have:

| Requirement | Version/Details |
|-------------|-----------------|
| Python | 3.10+ |
| API Key | Anthropic, OpenAI, or compatible provider |
| Supported OS | Linux, macOS, Windows |
| Package Manager | pip, uv, or conda |

## Installation

### Standard Installation

Install Headroom from PyPI:

```bash
pip install headroom-ai
```

### Development Installation

For contributing or testing latest features:

```bash
git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest
```

Source: [README.md:1-10]()

### Dev Container

Headroom provides pre-configured devcontainers:

- `.devcontainer/`: default devcontainer (basic Python development)
- `.devcontainer/memory-stack/`: memory-stack devcontainer (with Qdrant and Neo4j)

## Quick Start with Claude

The fastest way to use Headroom is with the `headroom wrap` command, which starts the proxy and configures Claude Code automatically:

```bash
headroom wrap claude
```

This single command:
1. Starts the Headroom proxy on the default port
2. Configures Claude Code to route API traffic through the proxy
3. Sets up the RTK context tool for efficient CLI output
4. Registers the MCP retrieve tool for CCR decompression

### Options

| Flag | Description |
|------|-------------|
| `--memory` | Enable persistent cross-session memory |
| `--resume <id>` | Resume a previous session |
| `--no-context-tool` | Skip RTK/lean-ctx CLI tool setup |
| `--no-mcp` | Skip MCP retrieve tool registration |
| `--no-serena` | Skip Serena MCP server registration |
| `--code-graph` | Enable code graph indexing via codebase-memory-mcp |
| `--no-proxy` | Use existing proxy instead of starting one |
| `--learn` | Enable live traffic learning (patterns saved to AGENTS.md) |
| `--port <n>` | Custom proxy port (default: 8080) |

Example with memory enabled:

```bash
headroom wrap claude --memory
```

Source: [headroom/cli/wrap.py:1-50]()

## Quick Start with Codex

Headroom also supports OpenAI's Codex:

```bash
headroom wrap codex
```

For Codex-specific options:

| Flag | Description |
|------|-------------|
| `--backend anyllm` | Use any-llm backend |
| `--anyllm-provider <provider>` | Provider for any-llm: openai, mistral, groq, etc. |
| `--region <region>` | Cloud region for Bedrock/Vertex |

Source: [headroom/cli/wrap.py:200-280]()

## SDK Integration

### Python SDK

#### Basic Usage

```python
from headroom import Headroom

# Initialize with your API key
h = Headroom(api_key="sk-ant-...")

# Compress a prompt
result = h.compress("Your long prompt here...")
print(result.compressed)      # Compressed text
print(result.original_tokens) # Original token count
print(result.saved_tokens)    # Tokens saved
```

#### Streaming Responses

```python
from headroom import Headroom

h = Headroom(api_key="sk-ant-...")

# Streaming with automatic compression
for chunk in h.stream("Your prompt"):
    print(chunk, end="", flush=True)
```

#### Integration with Anthropic SDK

```python
from anthropic import Anthropic
from headroom import with_headroom

# Wrap any SDK client
client = with_headroom(Anthropic())

# All calls are automatically compressed
response = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Your prompt"}]
)
```

Source: [headroom/cli/main.py:1-50]()

### TypeScript SDK

#### Installation

```bash
npm install @headroom/sdk
# or
yarn add @headroom/sdk
# or
pnpm add @headroom/sdk
```

#### Basic Usage

```typescript
import { Headroom } from "@headroom/sdk";

const headroom = new Headroom({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const result = await headroom.compress({
  content: "Your long prompt here...",
});

console.log(result.compressed);
console.log(`Saved ${result.savingsPercent.toFixed(0)}% tokens`);
```

#### Streaming

```typescript
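import { streamText, wrapLanguageModel } from "ai";
import { headroomMiddleware } from "@headroom/sdk/middleware";

// Wrap the model so prompts are compressed before they are sent
const model = wrapLanguageModel({
  model: yourModel,
  middleware: headroomMiddleware(),
});

const result = streamText({ model, prompt: "Your prompt" });

// Print text chunks as they arrive
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}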
```

#### Shared Context (Multi-Agent)

```typescript
import { SharedContext } from "@headroom/sdk";

const ctx = new SharedContext({
  projectId: "my-agent-team",
});

// Agent 1: Researcher
await ctx.put("k8s-scaling-research", researchData);

// Agent 2: Writer (reads compressed context)
const compressed = await ctx.get("k8s-scaling-research");
console.log(`Reading compressed context (${compressed?.length ?? 0} chars)`);

// Stats
const stats = ctx.stats();
console.log(`Total saved: ${stats.totalTokensSaved} (${stats.savingsPercent.toFixed(0)}%)`);
```

Source: [sdk/typescript/src/index.ts:1-80]()

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `HEADROOM_API_KEY` | API key for LLM provider | Required |
| `HEADROOM_PROXY_PORT` | Proxy port | 8080 |
| `HEADROOM_BACKEND` | Backend type: anthropic, anyllm, litellm-vertex | anthropic |
| `HEADROOM_ANYLLM_PROVIDER` | Provider for any-llm backend | - |
| `HEADROOM_REGION` | Cloud region for Bedrock/Vertex | - |
| `HEADROOM_EXCLUDE_TOOLS` | Comma-separated tool names to exclude | - |
| `HEADROOM_CONTEXT_TOOL` | CLI context tool: rtk, lean-ctx | rtk |
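
A minimal sketch of reading these variables with their documented defaults. The variable names and defaults come from the table above; the `ProxySettings` dataclass and `load_settings` function are hypothetical and are not part of Headroom's API:

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass
class ProxySettings:
    """Hypothetical settings object; field names mirror the table above."""
    api_key: str
    proxy_port: int = 8080
    backend: str = "anthropic"
    anyllm_provider: Optional[str] = None
    region: Optional[str] = None
    exclude_tools: list[str] = field(default_factory=list)
    context_tool: str = "rtk"


def load_settings(env: Mapping[str, str] = os.environ) -> ProxySettings:
    """Build settings from environment variables, applying the documented defaults."""
    api_key = env.get("HEADROOM_API_KEY")
    if not api_key:
        raise ValueError("HEADROOM_API_KEY is required")
    raw_tools = env.get("HEADROOM_EXCLUDE_TOOLS", "")
    return ProxySettings(
        api_key=api_key,
        proxy_port=int(env.get("HEADROOM_PROXY_PORT", "8080")),
        backend=env.get("HEADROOM_BACKEND", "anthropic"),
        anyllm_provider=env.get("HEADROOM_ANYLLM_PROVIDER"),
        region=env.get("HEADROOM_REGION"),
        # Comma-separated list; blank entries and surrounding spaces are dropped
        exclude_tools=[t.strip() for t in raw_tools.split(",") if t.strip()],
        context_tool=env.get("HEADROOM_CONTEXT_TOOL", "rtk"),
    )


settings = load_settings({"HEADROOM_API_KEY": "sk-test", "HEADROOM_EXCLUDE_TOOLS": "Read, Grep,"})
print(settings.proxy_port, settings.exclude_tools)
```

Passing the environment as a mapping keeps the loader easy to test without touching the real process environment.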

### Compression Profiles

Headroom supports multiple compression strategies:

| Profile | Description | Typical Savings |
|---------|-------------|-----------------|
| `balanced` | Default profile, good for most use cases | 50-70% |
| `aggressive` | Maximum compression, may lose some detail | 70-90% |
| `conservative` | Minimal compression, preserves more detail | 30-50% |
| `custom` | User-defined weights for different transform types | Varies |

### Compression Transforms

Headroom uses multiple compression transforms:

| Transform | Purpose | Savings |
|-----------|---------|---------|
| **SmartCrusher** | Universal JSON compression (arrays, dicts, nested objects) | 40-60% |
| **CodeCompressor** | AST-aware compression for Python, JS, Go, Rust, Java, C++ | 60-75% |
| **Kompress-base** | HuggingFace model trained on agentic traces | 40-90% |
| **CacheAligner** | Stabilizes prefixes for KV cache efficiency | Variable |
| **IntelligentContext** | Score-based context fitting | 30-50% |
| **CCR** | Reversible compression with on-demand retrieval | 50-80% |

Source: [README.md:50-100]()

## MCP Server Setup

Headroom's Model Context Protocol (MCP) server lets the agent retrieve the original content behind CCR markers when it is needed.

### Installation

```bash
headroom mcp install
```

### Available MCP Tools

| Tool | Description |
|------|-------------|
| `headroom_retrieve` | Retrieves original content for CCR markers |
| `headroom_stats` | Shows compression statistics |
| `headroom_memory_search` | Search persistent memory (requires `--memory`) |
| `headroom_memory_save` | Save to persistent memory (requires `--memory`) |

### Note on MCP Endpoint

> **Important:** The `headroom proxy` command does not expose an HTTP MCP endpoint at `/mcp`. The MCP server uses stdio transport and must be configured in your IDE/editor's MCP settings. See [Issue #460](https://github.com/chopratejas/headroom/issues/460) for details.

Source: [headroom/cli/main.py:100-150]()
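
The idea behind `headroom_retrieve` (swap long content for a short marker, then fetch the original on demand) can be sketched as follows. This is a toy in-memory model: the `ReversibleStore` class, the `[[ccr:<key>]]` marker format, and the preview length are assumptions for illustration and do not reflect Headroom's real marker syntax or storage:

```python
import hashlib


class ReversibleStore:
    """Hypothetical in-memory store: swaps long text for a short marker and back."""

    def __init__(self, preview_chars: int = 40) -> None:
        self._originals: dict[str, str] = {}
        self._preview_chars = preview_chars

    def compress(self, text: str) -> str:
        """Return a short preview plus a marker; keep the original for later."""
        if len(text) <= self._preview_chars:
            return text  # Nothing to gain from compressing short content
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._originals[key] = text
        preview = text[: self._preview_chars]
        return f"{preview}... [[ccr:{key}]]"

    def retrieve(self, key: str) -> str:
        """Return the original content for a marker key (KeyError if unknown)."""
        return self._originals[key]


store = ReversibleStore()
original = "line of tool output\n" * 200
compressed = store.compress(original)

# Pull the key back out of the marker, as a retrieve tool would
key = compressed.split("[[ccr:")[1].rstrip("]")
restored = store.retrieve(key)
print(len(original), "->", len(compressed), "| restored:", restored == original)
```

The compression is reversible because nothing is discarded: the model sees only the preview and marker, while the full text stays available for retrieval.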

## Memory System

Headroom provides persistent cross-session memory using vector storage.

### Enable Memory

```bash
headroom wrap claude --memory
```

### Memory Features

- **Per-project storage**: Memories are isolated per project directory
- **Auto-dedup**: Duplicate memories are automatically filtered
- **Agent provenance**: Tracks which agent saved each memory
- **Semantic search**: Query past decisions, conventions, and context

### Usage in Claude

When memory is enabled, Claude Code automatically:

1. **Searches memory** before answering questions about past decisions
2. **Saves important facts** discovered during the session
3. **References project context** from previous sessions

Source: [headroom/cli/wrap.py:300-350]()
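
A minimal sketch of three of the memory features listed above: per-project isolation, de-duplication, and agent provenance. Everything here is hypothetical, including the `ProjectMemory` class and its hashing scheme, and a plain substring match stands in for semantic search:

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class MemoryEntry:
    text: str
    agent: str  # Provenance: which agent saved this memory


class ProjectMemory:
    """Hypothetical in-memory model of per-project storage with de-duplication."""

    def __init__(self) -> None:
        # project key -> {content hash -> entry}
        self._by_project: dict[str, dict[str, MemoryEntry]] = {}

    @staticmethod
    def _project_key(project_dir: Path) -> str:
        return hashlib.sha256(str(project_dir.resolve()).encode("utf-8")).hexdigest()[:16]

    @staticmethod
    def _content_hash(text: str) -> str:
        normalized = " ".join(text.lower().split())  # Ignore case and whitespace runs
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def save(self, project_dir: Path, text: str, agent: str) -> bool:
        """Store a memory; return False if an equivalent one already exists."""
        bucket = self._by_project.setdefault(self._project_key(project_dir), {})
        digest = self._content_hash(text)
        if digest in bucket:
            return False
        bucket[digest] = MemoryEntry(text=text, agent=agent)
        return True

    def search(self, project_dir: Path, query: str) -> list[MemoryEntry]:
        """Naive substring search, standing in for semantic search."""
        bucket = self._by_project.get(self._project_key(project_dir), {})
        return [e for e in bucket.values() if query.lower() in e.text.lower()]


memory = ProjectMemory()
memory.save(Path("/tmp/project-a"), "Use pytest for all tests", agent="claude")
duplicate = memory.save(Path("/tmp/project-a"), "use  pytest for all tests", agent="codex")
other_project = memory.search(Path("/tmp/project-b"), "pytest")
print(duplicate, other_project)
```

Keying every lookup by project directory is what keeps one project's memories from appearing in another's search results.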

## Learning System

The `headroom learn` command analyzes past tool call failures in a project and generates preventive context:

```bash
headroom learn --project /path/to/project --apply
```

### Options

| Flag | Description |
|------|-------------|
| `--project <path>` | Project directory to analyze |
| `--all` | Analyze all discovered projects |
| `--apply` | Write recommendations to context/memory files |
| `--agent <name>` | Specific agent to analyze (claude, codex, gemini, auto) |
| `--model <model>` | LLM model for analysis |
| `--workers <n>` | Parallel workers (default: auto) |

Source: [headroom/cli/learn.py:1-60]()

## Examples

The repository includes comprehensive examples:

### Python Examples

```bash
# Basic usage
export OPENAI_API_KEY='your-key'
python examples/basic_usage.py

# Anthropic integration
export ANTHROPIC_API_KEY='your-key'
python examples/anthropic_example.py

# Streaming
python examples/streaming_example.py

# Evaluation
python examples/smart_vs_naive_eval.py
python examples/real_world_eval.py
```

### TypeScript Examples

```bash
# Shared context multi-agent
npx tsx sdk/typescript/examples/shared-context-multi-agent.ts
```

### LangChain Integration

```bash
# Compression demo
PYTHONPATH=. python -m examples.langchain_demo.show_compression

# Full comparison
export OPENAI_API_KEY='your-key'
PYTHONPATH=. python -m examples.langchain_demo.run_comparison
```

Source: [examples/README.md:1-80]()

## Next Steps

| Topic | Description |
|-------|-------------|
| [CLI Reference](cli.md) | Full documentation of `headroom` commands |
| [Proxy Configuration](proxy.md) | Advanced proxy settings and backends |
| [Memory System](memory.md) | Deep dive into cross-session memory |
| [SDK Reference](sdk.md) | Complete API documentation |
| [Compression Internals](compression.md) | How Headroom's transforms work |
| [Contributing](../contributing.md) | Development setup and guidelines |

## Troubleshooting

### Claude not found

```
Error: 'claude' not found in PATH.
Install Claude Code: https://docs.anthropic.com/en/docs/claude-code
```

**Solution:** Install Claude Code or use the SDK directly.

### MCP retrieve tool not working

**Symptoms:** CCR markers appear but content isn't retrieved.

**Solutions:**
1. Ensure `--no-mcp` was not used
2. Check MCP server is registered in your IDE settings
3. Verify proxy is running: `headroom status`

### Token savings lower than expected

**Possible causes:**
- Short prompts (less data to compress)
- Already compressed content
- High-relevance content that can't be safely reduced

**Solutions:**
- Switch to the `aggressive` compression profile
- Use `--learn` to optimize for your patterns

## Community

- **[Discord](https://discord.gg/yRmaUNpsPJ)** — Questions, feedback, support
- **[Live leaderboard](https://headroomlabs.ai/dashboard)** — 60B+ tokens saved and counting
- **[HuggingFace](https://huggingface.co/chopratejas/kompress-base)** — Kompress-base model

---

<a id='architecture'></a>

## Architecture

### Related Pages

Related topics: [Compression Pipeline](#compression-pipeline), [CCR (Reversible Compression)](#ccr-reversible-compression)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [crates/headroom-proxy/src/proxy.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/src/proxy.rs)
- [crates/headroom-core/src/lib.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-core/src/lib.rs)
- [headroom/pipeline.py](https://github.com/chopratejas/headroom/blob/main/headroom/pipeline.py)
- [headroom/cli/wrap.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/wrap.py)
- [crates/headroom-py/src/lib.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-py/src/lib.rs)
- [sdk/typescript/examples/shared-context-multi-agent.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/examples/shared-context-multi-agent.ts)
</details>

# Architecture

## Overview

Headroom is a context compression proxy and SDK designed to reduce token costs when running AI coding agents. The architecture follows a layered design with a Rust-based proxy core, a Python SDK, and a TypeScript SDK that expose a unified request lifecycle across all integration paths.

The system intercepts LLM API calls at the proxy layer, applies a pipeline of compression transforms, and routes compressed requests to upstream providers while maintaining the ability to retrieve original content via CCR markers.

Source: [crates/headroom-core/src/lib.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-core/src/lib.rs)

## High-Level Architecture

```mermaid
graph TD
    subgraph "Client Layer"
        CLI[CLI<br/>headroom wrap]
        SDK_PY[Python SDK]
        SDK_TS[TypeScript SDK]
        MCP[MCP Clients]
    end

    subgraph "Proxy Layer"
        PROXY[Headroom Proxy]
        MIDDLEWARE[ASGI Middleware]
    end

    subgraph "Core Engine"
        PIPELINE[Compression Pipeline]
        TRANSFORMS[Transforms]
        MEMORY[Memory System]
    end

    subgraph "Transforms"
        CC[CacheAligner]
        CR[ContentRouter]
        SC[SmartCrusher]
        CODEC[CodeCompressor]
        KB[Kompress-base]
        IC[IntelligentContext]
    end

    subgraph "Storage"
        QDRANT[Qdrant]
        NEO4J[Neo4j]
        SQLITE[SQLite]
    end

    CLI --> PROXY
    SDK_PY --> PROXY
    SDK_TS --> PROXY
    MCP --> MIDDLEWARE
    
    PROXY --> PIPELINE
    MIDDLEWARE --> PIPELINE
    
    PIPELINE --> TRANSFORMS
    TRANSFORMS --> MEMORY
    
    MEMORY --> QDRANT
    MEMORY --> NEO4J
    MEMORY --> SQLITE

    style PROXY fill:#4a90d9
    style PIPELINE fill:#5ba85b
    style TRANSFORMS fill:#d94a4a
```

## Request Lifecycle

All compression passes through a stable, 11-stage request lifecycle that exposes consistent hooks regardless of integration method:

`Setup` → `Pre-Start` → `Post-Start` → `Input Received` → `Input Cached` → `Input Routed` → `Input Compressed` → `Input Remembered` → `Pre-Send` → `Post-Send` → `Response Received`

```mermaid
graph LR
    A[Setup] --> B[Pre-Start]
    B --> C[Post-Start]
    C --> D[Input Received]
    D --> E[Input Cached]
    E --> F[Input Routed]
    F --> G[Input Compressed]
    G --> H[Input Remembered]
    H --> I[Pre-Send]
    I --> J[Post-Send]
    J --> K[Response Received]
    
    style A fill:#f0f0f0
    style G fill:#5ba85b
    style K fill:#4a90d9
```

### Lifecycle Stages

| Stage | Purpose | Extension Point |
|-------|---------|-----------------|
| Setup | Initialize transforms, load config | `on_pipeline_event()` |
| Pre-Start | Prepare upstream connection | `on_pipeline_event()` |
| Post-Start | Confirm upstream health | `on_pipeline_event()` |
| Input Received | Capture raw request | `on_pipeline_event()` |
| Input Cached | Check KV cache alignment | CacheAligner |
| Input Routed | Route to appropriate compression path | ContentRouter |
| Input Compressed | Apply compression transforms | SmartCrusher, CodeCompressor, Kompress-base |
| Input Remembered | Store in cross-agent memory | Memory system |
| Pre-Send | Finalize compressed request | `on_pipeline_event()` |
| Post-Send | Record outcome metrics | RequestOutcome funnel |
| Response Received | Process streaming/final response | Compression hooks |

Source: [headroom/pipeline.py](https://github.com/chopratejas/headroom/blob/main/headroom/pipeline.py)

## Compression Transforms

The transform layer applies specialized compression algorithms. Each transform handles a specific content type.

### Transform Components

| Transform | Function | Reduction |
|-----------|----------|-----------|
| **CacheAligner** | Stabilizes prefixes so Anthropic/OpenAI KV caches hit | Indirect |
| **ContentRouter** | Routes content to appropriate compression path | 10-40% |
| **SmartCrusher** | Universal JSON compression (arrays, nested objects) | 60-90% |
| **CodeCompressor** | AST-aware for Python, JS, Go, Rust, Java, C++ | 60-75% |
| **Kompress-base** | HuggingFace model for ML-based token compression | 40-90% |
| **IntelligentContext** | Score-based context fitting with learned importance | Variable |
| **RollingWindow** | Fixed-context summarization | Variable |
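
A minimal sketch of the routing step, assuming content is classified by trying to parse it. The `route_content` function, the route names, and the route-to-transform mapping are assumptions for illustration; the source does not describe how ContentRouter decides:

```python
import ast
import json

# Hypothetical mapping from detected content type to a transform name
ROUTES = {"json": "SmartCrusher", "python": "CodeCompressor", "text": "Kompress-base"}


def route_content(text: str) -> str:
    """Classify content as 'json', 'python', or 'text' (hypothetical route names)."""
    stripped = text.strip()
    if stripped[:1] in ("{", "["):
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            pass  # Looked like JSON but is not; fall through to other checks
    if any(keyword in stripped for keyword in ("def ", "class ", "import ")):
        try:
            ast.parse(stripped)
            return "python"
        except (SyntaxError, ValueError):
            pass  # Mentions code keywords but does not parse as Python
    return "text"


for sample in ('[{"id": 1}]', "def f(x):\n    return x\n", "Plain log line"):
    kind = route_content(sample)
    print(kind, "->", ROUTES[kind])
```

Parsing rather than pattern-matching keeps prose that merely mentions `import` or starts with a bracket from being sent down the wrong path.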

### SmartCrusher Configuration

```python
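from dataclasses import dataclass


@dataclass
class SmartCrusherConfig:
    enabled: bool = True               # Turn the transform on or off
    min_items_to_analyze: int = 3      # Minimum item count before analysis
    min_tokens_to_crush: int = 500     # Minimum token count before crushing
    max_items_after_crush: int = 50    # Upper bound on items kept
    relevance_threshold: float = 0.3   # Relevance score cutoff
    enable_ccr_marker: bool = True     # Emit CCR markers for retrieval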

Source: [crates/headroom-py/src/lib.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-py/src/lib.rs)
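
A minimal sketch of how two of these thresholds might gate array compression. The field names and defaults come from the configuration above; the `CrushConfig` subset, the `crush_array` function, the `_omitted_items` key, and the keep-the-ends strategy are stand-ins for illustration, not SmartCrusher's actual selection logic:

```python
import json
from dataclasses import dataclass


@dataclass
class CrushConfig:
    """Subset of the fields shown above, with the same defaults."""
    min_items_to_analyze: int = 3
    max_items_after_crush: int = 50


def crush_array(items: list, config: CrushConfig) -> list:
    """Keep the head and tail of a long list and note how many items were dropped."""
    too_short = len(items) < config.min_items_to_analyze
    if too_short or len(items) <= config.max_items_after_crush:
        return items  # Below the thresholds: leave the data untouched
    keep = config.max_items_after_crush
    head = items[: keep // 2]
    tail = items[len(items) - (keep - len(head)):]
    omitted = len(items) - len(head) - len(tail)
    return head + [{"_omitted_items": omitted}] + tail


rows = [{"id": i} for i in range(200)]
crushed = crush_array(rows, CrushConfig(max_items_after_crush=4))
print(json.dumps(crushed))
```

Short lists pass through unchanged, so the transform only pays off on the large tool outputs where most tokens are spent.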

### CodeCompressor

AST-aware compression supports:
- Python (via `ast` module)
- JavaScript, Go, Rust, Java, C++ (via tree-sitter)

Requires the optional `code` extra: `pip install "headroom-ai[code]"`

Enable it with the `--code-aware` flag or the `HEADROOM_CODE_AWARE_ENABLED=1` environment variable.

Source: [headroom/cli/proxy.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/proxy.py)
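
A minimal sketch of AST-aware compression for Python using the standard `ast` module. It keeps signatures and docstrings and replaces function bodies with `...`. The `compress_python` function is hypothetical and shows only the general technique, not what CodeCompressor actually keeps or drops:

```python
import ast


def compress_python(source: str) -> str:
    """Replace each function body with its docstring (if any) and `...`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            new_body: list[ast.stmt] = []
            docstring = ast.get_docstring(node, clean=False)
            if docstring is not None:
                new_body.append(ast.Expr(value=ast.Constant(value=docstring)))
            # An Ellipsis expression keeps the function syntactically valid
            new_body.append(ast.Expr(value=ast.Constant(value=...)))
            node.body = new_body
    return ast.unparse(ast.fix_missing_locations(tree))


sample = '''
def add(a, b):
    """Return the sum of a and b."""
    total = a + b
    return total
'''

compressed_source = compress_python(sample)
print(compressed_source)
```

Working on the syntax tree rather than raw text guarantees that the output still parses, which plain truncation cannot do.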

## Memory System

### Architecture

```mermaid
graph TD
    subgraph "Memory Layer"
        MEM[Memory Manager]
        RANKER[MemoryRanker]
        DECISION[MemoryDecision]
    end
    
    subgraph "Storage Backends"
        QDRANT[Qdrant<br/>Vector Search]
        NEO4J[Neo4j<br/>Graph]
        SQLITE[SQLite<br/>Project-local]
    end
    
    subgraph "Tools"
        SEARCH[memory_search]
        SAVE[memory_save]
        LIST[memory_list]
    end
    
    MEM --> RANKER
    MEM --> DECISION
    MEM --> QDRANT
    MEM --> NEO4J
    MEM --> SQLITE
    
    SEARCH --> MEM
    SAVE --> MEM
    LIST --> MEM
```

### Per-Project Storage

Memory storage is isolated per project to prevent cross-contamination:

> **Bug Fix**: v0.21.34 introduced per-project storage, so memories no longer bleed between projects.

Source: [Release v0.21.34](https://github.com/chopratejas/headroom/releases/tag/v0.21.34)

### Memory Integration in CLI

The `headroom wrap` command injects memory guidance into `AGENTS.md`:

```python
def _inject_memory_agents_md(file_path: Path) -> bool:
    """Inject memory usage guidance into AGENTS.md.

    Idempotent — skips if marker already present.
    """
    memory_block = (
        f"{_MEMORY_AGENTS_MARKER}\n"
        "## Memory\n\n"
        "Use the `headroom_memory` MCP server for persistent cross-session knowledge.\n\n"
        "**Before** answering questions about prior decisions, conventions, project context,\n"
        "architecture, user preferences — call `memory_search` first.\n\n"
        "**After** making durable decisions — call `memory_save` to persist them.\n\n"
    )
```

Source: [headroom/cli/wrap.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/wrap.py)

## Proxy Architecture

### Request Flow

```mermaid
sequenceDiagram
    participant Client
    participant Proxy as Headroom Proxy
    participant Pipeline
    participant Upstream as LLM Provider
    
    Client->>Proxy: /v1/messages (raw)
    Proxy->>Pipeline: Input Received
    Pipeline->>Pipeline: Input Cached
    Pipeline->>Pipeline: Input Routed
    Pipeline->>Pipeline: Input Compressed
    Pipeline->>Pipeline: Input Remembered
    Pipeline->>Proxy: Compressed request
    Proxy->>Upstream: /v1/messages (compressed)
    Upstream->>Proxy: Response
    Proxy->>Pipeline: Response Received
    Pipeline->>Pipeline: Post-Send (outcome)
    Proxy->>Client: Streaming/Final response
```

### Proxy Configuration

| Option | Env Var | Default | Description |
|--------|---------|---------|-------------|
| `--port` | `HEADROOM_PORT` | 8787 | Proxy port |
| `--backend` | `HEADROOM_BACKEND` | anthropic | API backend |
| `--memory` | - | false | Enable memory |
| `--code-graph` | - | false | Code graph indexing |
| `--budget` | `HEADROOM_BUDGET` | None | Daily budget limit (USD) |
| `--exclude-tools` | `HEADROOM_EXCLUDE_TOOLS` | None | Tools to skip |
| `--code-aware` | `HEADROOM_CODE_AWARE_ENABLED` | false | AST-based compression |

Source: [headroom/cli/proxy.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/proxy.py)
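
The table pairs most flags with an environment variable. As a rough illustration, the sketch below resolves an option in the common order of explicit flag, then environment variable, then default. The precedence order and the `resolve_option` helper are assumptions for illustration; `proxy.py` may resolve options differently.

```python
import os
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


def resolve_option(
    cli_value: Optional[T],
    env_var: str,
    default: T,
    cast: Callable[[str], T],
    env: Optional[Mapping[str, str]] = None,
) -> T:
    """Return the CLI value if given, else the env var (cast), else the default."""
    env = os.environ if env is None else env
    if cli_value is not None:
        return cli_value
    raw = env.get(env_var)
    if raw is not None and raw != "":
        return cast(raw)
    return default


# Defaults mirror the table above; fake_env stands in for os.environ.
fake_env = {"HEADROOM_PORT": "9000"}
port = resolve_option(None, "HEADROOM_PORT", 8787, int, env=fake_env)
backend = resolve_option(None, "HEADROOM_BACKEND", "anthropic", str, env=fake_env)
```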

### RequestOutcome Funnel

v0.21.38 introduced the `RequestOutcome` funnel to collapse streaming finalizers:

> **Refactor**: proxy: introduce RequestOutcome funnel; collapse 3 streaming finalizers

Source: [Release v0.21.38](https://github.com/chopratejas/headroom/releases/tag/v0.21.38)

## Integration Architecture

### SDK Integration Points

| Integration | Method |
|-------------|--------|
| Python app | `compress(messages, model=...)` |
| TypeScript app | `await compress(messages, { model })` |
| Anthropic/OpenAI SDK | `withHeadroom(new Anthropic())` |
| Vercel AI SDK | `wrapLanguageModel({ model, middleware: headroomMiddleware() })` |
| LiteLLM | `litellm.callbacks = [HeadroomCallback()]` |
| LangChain | `HeadroomChatModel(your_llm)` |
| Agno | `HeadroomAgnoModel(your_model)` |
| ASGI apps | `app.add_middleware(CompressionMiddleware)` |

Source: [README.md](https://github.com/chopratejas/headroom/blob/main/README.md)

### CLI Wrapper Architecture

```mermaid
graph TD
    subgraph "headroom wrap <agent>"
        WRAP[wrap.py]
        RTK[RTK Setup]
        MCP[MCP Registration]
        PROXY[Proxy Startup]
    end
    
    subgraph "Agent Types"
        CLAUDE[Claude]
        CODEX[Codex]
        OPENCODE[OpenCode]
        COPILOT[Copilot]
        AIDER[Aider]
        OPENCLAW[OpenClaw]
    end
    
    WRAP --> RTK
    WRAP --> MCP
    WRAP --> PROXY
    
    PROXY --> CLAUDE
    PROXY --> CODEX
    PROXY --> OPENCODE
    PROXY --> COPILOT
    PROXY --> AIDER
    PROXY --> OPENCLAW
```

Each agent wrapper:
1. Snapshots pre-wrap config (e.g., `~/.codex/config.toml`)
2. Sets up CLI context tool (RTK or lean-ctx)
3. Registers MCP server for CCR retrieval
4. Starts proxy if not already running
5. Launches the agent

Source: [headroom/cli/wrap.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/wrap.py)

## Multi-Agent Shared Context

### Architecture

```mermaid
graph LR
    subgraph "Agent A"
        CA[Claude]
    end
    
    subgraph "Agent B"  
        CB[Codex]
    end
    
    subgraph "Shared Context"
        SC["SharedContext<br/>.put() / .get()"]
    end
    
    CA <--> SC
    CB <--> SC
    
    SC --> COMPRESS[Compression]
    COMPRESS --> TRANSFORM[Transforms]
```

### Usage Example

```typescript
import { SharedContext } from "headroom-ai";

// Create shared context
const ctx = new SharedContext({ 
    projectId: "k8s-scaling-research" 
});

// Agent A: Publish findings
await ctx.put("k8s-scaling-research", {
    role: "assistant",
    content: "Research findings on K8s autoscaling..."
});

// Agent B: Retrieve compressed context
const compressed = await ctx.get("k8s-scaling-research");
```

Source: [sdk/typescript/examples/shared-context-multi-agent.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/examples/shared-context-multi-agent.ts)

## CCR (Compress-Cache-Retrieve)

CCR provides reversible compression:

1. **Compress**: Original content stored, marker inserted
2. **Cache**: Markers indexed for retrieval
3. **Retrieve**: Agent calls `headroom_retrieve` tool to expand marker
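
The three steps can be sketched as a small in-memory store. This is an illustrative toy, not Headroom's implementation: the `ToyCcrStore` class, the truncation-based "compression", and the `[ccr:<key>]` marker format are all assumptions.

```python
import hashlib
from typing import Dict, Optional


class ToyCcrStore:
    """Toy compress-cache-retrieve store keyed by a content hash."""

    def __init__(self) -> None:
        self._originals: Dict[str, str] = {}

    def compress(self, content: str, keep_chars: int = 40) -> str:
        """Store the original and return a truncated preview plus a marker."""
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
        self._originals[key] = content  # cache step: original kept under its hash
        # Hypothetical marker format; the real marker syntax is not shown here.
        return f"{content[:keep_chars]}... [ccr:{key}]"

    def retrieve(self, key: str) -> Optional[str]:
        """Return the original content for a key, or None if the key is unknown."""
        return self._originals.get(key)
```

Because the original is stored before the shortened text is returned, a later `retrieve` call with the key from the marker recovers the full content.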

### MCP Registration for Retrieval

```python
# Register headroom MCP server in ~/.codex/config.toml so Codex can
# call headroom_retrieve on compression markers from the proxy.
if not no_mcp:
    from headroom.mcp_registry import CodexRegistrar
    _setup_headroom_mcp(CodexRegistrar(), port, verbose=verbose, force=True)
```

Source: [headroom/cli/wrap.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/wrap.py)

### Known Limitations

> **[BUG #503]**: CCR proactive expansion blocks corrupt message attribution in multi-agent threads. The `_append_context_to_latest_non_frozen_user_turn()` function injects proactive expansion blocks into the latest user message content. In multi-agent setups, that message can contain structured XML attribution markup (`<peer_turn from="AgentX">`). The injected block ends up corrupting the attribution.

Source: [GitHub Issue #503](https://github.com/chopratejas/headroom/issues/503)

## Provider Architecture

Provider-specific behavior lives under `headroom/providers/` to keep core orchestration focused:

```
headroom/providers/
├── claude/      # Claude Code integration
├── copilot/     # GitHub Copilot CLI
├── codex/       # OpenAI Codex
└── open/        # OpenAI native clients
```

This separation ensures:
- Core pipeline remains provider-agnostic
- Provider-specific auth and routing handled at edges
- New providers can be added without modifying core logic

Source: [README.md](https://github.com/chopratejas/headroom/blob/main/README.md)

## Rust Extension

The Rust extension (`headroom-core`) provides performance-critical transforms:

### Exports to Python

```rust
use headroom_core::transforms::{
    compress_openai_responses_live_zone,
    detect as rust_detect_chain,
    is_json_array_of_dicts,
    LogCompressor,
    SearchCompressor,
    DiffCompressor,
    DiffCompressorConfig,
};
```

### Build Optimizations

v0.21.37 introduced wheel size optimizations:

> **Build**: shrink Rust extension wheels (strip + thin-LTO + single codegen unit)

Source: [Release v0.21.37](https://github.com/chopratejas/headroom/releases/tag/v0.21.37)

## Extension Points

### Pipeline Extensions

- `on_pipeline_event(...)` — Hook into lifecycle stages
- Compression hooks — Additional seam alongside canonical lifecycle
- Proxy extensions — ASGI middleware, routes, startup policy

### Plugin System

```python
# headroom learn registers via entry point
# 'headroom.learn_plugin'
```

Source: [headroom/cli/learn.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/learn.py)

## Observability

### RTK Metrics

RTK (Rewrite Tool Kit) metrics are wired into the observability stack:

> **Fix**: fix(observability): RTK metrics + Rust observability (Phase H blocker)

Source: [Release v0.22.4](https://github.com/chopratejas/headroom/releases/tag/v0.22.4)

### Logging Options

| Option | Purpose |
|--------|---------|
| `--log-file` | Path to JSONL log file |
| `--log-messages` | Full message logging (request/response content) |
| `--codex-wire-debug` | Local Codex wire snapshots + proxy.log traces |

Source: [headroom/cli/proxy.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/proxy.py)

## Development Setup

```bash
git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest
```

Devcontainers are available in `.devcontainer/`:
- Default
- `memory-stack` with Qdrant & Neo4j

Source: [README.md](https://github.com/chopratejas/headroom/blob/main/README.md)

## Related Documentation

- [Contributing Guide](CONTRIBUTING.md)
- [MCP Setup](docs/) — Note: `/mcp` HTTP endpoint returns 404; stdio MCP server works ([Issue #460](https://github.com/chopratejas/headroom/issues/460))
- [Provider-agnostic proxy mode](https://github.com/chopratejas/headroom/issues/510) — Planned for Bedrock, OpenAI, Vertex support

---

<a id='compression-pipeline'></a>

## Compression Pipeline

### Related Pages

Related topics: [Architecture](#architecture), [Compression Algorithms](#compression-algorithms)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/chopratejas/headroom/blob/main/README.md)
- [headroom/transforms/pipeline.py](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/pipeline.py)
- [headroom/transforms/content_router.py](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py)
- [headroom/transforms/smart_crusher.py](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/smart_crusher.py)
- [sdk/typescript/src/hooks.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/hooks.ts)
- [sdk/typescript/src/types.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/types.ts)
- [crates/headroom-py/src/lib.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-py/src/lib.rs)
- [crates/headroom-core/src/transforms/mod.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-core/src/transforms/mod.rs)
- [examples/langchain_demo/README.md](https://github.com/chopratejas/headroom/blob/main/examples/langchain_demo/README.md)
</details>

# Compression Pipeline

The Compression Pipeline is Headroom's core orchestration system for reducing token usage in LLM requests. It exposes a single, stable request lifecycle that operates consistently across the Python SDK, TypeScript SDK, CLI, and proxy server. The pipeline sequences multiple transform components to analyze, route, and compress content while preserving critical information through the CCR (Compress-Cache-Retrieve) pattern.

## Architecture Overview

The pipeline follows a deterministic lifecycle with defined stages, configurable transforms, and extension points for hooks and plugins. Each request flows through the same stages regardless of entry point (SDK, CLI, or proxy), ensuring predictable behavior and observable outcomes.

```mermaid
graph TD
    subgraph Lifecycle["Request Lifecycle"]
        A[Input Received] --> B[Input Cached]
        B --> C[Input Routed]
        C --> D[Input Compressed]
        D --> E[Input Remembered]
        E --> F[Pre-Send]
        F --> G[Post-Send]
        G --> H[Response Received]
    end
    
    subgraph Transforms["Transform Components"]
        T1[CacheAligner]
        T2[ContentRouter]
        T3[SmartCrusher]
        T4[CodeCompressor]
        T5[Kompress-base]
        T6[IntelligentContext]
        T7[RollingWindow]
    end
    
    C --> T1
    T1 --> T2
    T2 --> T3
    T3 --> T4
    T4 --> T5
    T5 --> T6
    T6 --> T7
```

## Request Lifecycle Stages

The pipeline implements **11 lifecycle stages** that execute in order. Each stage is observable and can be extended or intercepted by pipeline extensions.

| Stage | Purpose | Extensions Available |
|-------|---------|---------------------|
| `Setup` | Initialize request context and configuration | Yes |
| `Pre-Start` | Pre-processing before transform execution | Yes |
| `Post-Start` | Post-processing after initialization | Yes |
| `Input Received` | Capture raw request input | Yes |
| `Input Cached` | Check and update cache state | Yes |
| `Input Routed` | Route content to appropriate transforms | Yes |
| `Input Compressed` | Apply compression transforms | Yes |
| `Input Remembered` | Store relevant context for memory | Yes |
| `Pre-Send` | Final modifications before LLM call | Yes |
| `Post-Send` | Process response metadata | Yes |
| `Response Received` | Handle and log response | Yes |

Source: [README.md](https://github.com/chopratejas/headroom/blob/main/README.md)
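
To make the ordering concrete, the sketch below emits the eleven stages in sequence and calls any handlers registered for each one. The stage names come from the table above. The `LifecycleBus` class and the handler signature are hypothetical and do not represent Headroom's `on_pipeline_event(...)` API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

STAGES = [
    "Setup", "Pre-Start", "Post-Start", "Input Received", "Input Cached",
    "Input Routed", "Input Compressed", "Input Remembered",
    "Pre-Send", "Post-Send", "Response Received",
]

Handler = Callable[[str, dict], None]


class LifecycleBus:
    """Hypothetical dispatcher that runs handlers stage by stage."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def on(self, stage: str, handler: Handler) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self._handlers[stage].append(handler)

    def run(self, context: dict) -> None:
        """Emit every stage in order, calling handlers registered for it."""
        for stage in STAGES:
            for handler in self._handlers[stage]:
                handler(stage, context)


seen: List[str] = []
bus = LifecycleBus()
bus.on("Pre-Send", lambda stage, ctx: seen.append(stage))
bus.on("Input Compressed", lambda stage, ctx: seen.append(stage))
bus.run({"model": "example-model"})
```

Handlers fire in lifecycle order, not registration order, so `seen` records `Input Compressed` before `Pre-Send` here.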

## Transform Components

Transforms are the execution units within the pipeline. Each transform specializes in a specific compression strategy.

### SmartCrusher

SmartCrusher is the primary content-aware compressor for structured data. It analyzes JSON arrays, tool outputs, and log files using statistical selection to preserve critical items.

**Key capabilities:**
- **100% ERROR preservation** — Never drops error entries from tool outputs
- **Anomaly detection** — Statistical identification of outliers (high CPU, memory spikes)
- **Boundary preservation** — Always keeps first and last items in arrays
- **Relevance scoring** — Weights items by relevance to the user's query
- **Change point detection** — Identifies significant transitions in data

**Configuration options:**

| Parameter | Default | Description |
|-----------|---------|-------------|
| `enabled` | `true` | Enable/disable the transform |
| `min_items_to_analyze` | `10` | Minimum array size to apply analysis |
| `min_tokens_to_crush` | `500` | Minimum content size to trigger compression |
| `max_items_after_crush` | `20` | Target maximum items after compression |
| `relevance_threshold` | `0.3` | Score threshold for item retention |
| `bias` | `1.0` | Compression bias (>1 preserves more, <1 compresses more) |

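Source: [headroom/transforms/smart_crusher.py](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/smart_crusher.py), [crates/headroom-py/src/lib.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-py/src/lib.rs)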
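
As a rough illustration of how error preservation, boundary preservation, and anomaly detection can combine, the sketch below keeps the first and last items, every error item, and any item whose metric is a z-score outlier. The `level` and `cpu` field names, the z-score test, and the `select_items` helper are assumptions; SmartCrusher's actual statistics are not described here.

```python
from statistics import mean, pstdev
from typing import Any, Dict, List


def select_items(
    items: List[Dict[str, Any]],
    metric: str,
    z_threshold: float = 2.0,
) -> List[Dict[str, Any]]:
    """Keep boundary items, error items, and outliers on `metric`."""
    if len(items) <= 2:
        return list(items)
    values = [float(item[metric]) for item in items]
    mu, sigma = mean(values), pstdev(values)
    keep = {0, len(items) - 1}  # boundary preservation
    for i, item in enumerate(items):
        if str(item.get("level", "")).upper() == "ERROR":
            keep.add(i)  # errors are never dropped
        elif sigma > 0 and abs(values[i] - mu) / sigma > z_threshold:
            keep.add(i)  # statistical outlier
    return [items[i] for i in sorted(keep)]


# Twenty uniform rows with one error and one CPU spike.
rows = [{"id": i, "level": "INFO", "cpu": 10.0} for i in range(20)]
rows[7]["level"] = "ERROR"
rows[12]["cpu"] = 95.0
kept = select_items(rows, metric="cpu")
```

In this example the twenty rows reduce to four: the two boundaries, the error row, and the spike.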

### ContentRouter

ContentRouter determines which transforms should be applied to each content block based on content type detection. It routes JSON arrays, code, logs, and text to appropriate specialized compressors.

**Routing logic:**
- Detects content type (JSON array, code, log, plain text)
- Applies scoring weights for each content category
- Selects optimal compression profile per block

Source: [headroom/transforms/content_router.py](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py)
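
Content type detection of this kind is often heuristic. The sketch below shows one simple way to classify a block as a JSON array, log, code, or plain text. The regular expressions, thresholds, and the `detect_content_type` function are illustrative assumptions and do not reflect the rules in `content_router.py`.

```python
import json
import re

# A line looks like a log line if it starts with a date or a log level.
LOG_LINE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2}|\[?(DEBUG|INFO|WARN|WARNING|ERROR)\b)")
# A line looks like code if it starts with a common declaration keyword.
CODE_HINT = re.compile(r"^\s*(def |class |import |from \S+ import |function |const |fn |#include)")


def detect_content_type(text: str) -> str:
    """Classify text as 'json_array', 'log', 'code', or 'text' using heuristics."""
    stripped = text.strip()
    if stripped.startswith("["):
        try:
            parsed = json.loads(stripped)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, list) and parsed and all(isinstance(x, dict) for x in parsed):
            return "json_array"
    lines = [ln for ln in stripped.splitlines() if ln.strip()]
    if not lines:
        return "text"
    if sum(bool(LOG_LINE.match(ln)) for ln in lines) / len(lines) >= 0.5:
        return "log"
    if sum(bool(CODE_HINT.match(ln)) for ln in lines) / len(lines) >= 0.2:
        return "code"
    return "text"
```

A router built on such a function would then pick a compressor per label, for example sending `json_array` blocks to a structured-data compressor and `text` blocks to a general-purpose one.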

### CacheAligner

CacheAligner stabilizes request prefixes to maximize KV cache hit rates across Anthropic and OpenAI providers. It analyzes common prefix patterns and aligns new requests to existing cache entries.

**Behavior:**
- Computes prefix stability scores
- Aligns new requests to cached prefixes when beneficial
- Records cache prefix metrics for observability

### Kompress-base

Kompress-base is Headroom's ML-based text compressor using a fine-tuned model. It provides aggressive token reduction (up to 90%) for arbitrary text content.

**Usage:** Applied after specialized compressors have processed structured data

### IntelligentContext / RollingWindow

Two complementary context management strategies:

| Strategy | Description |
|----------|-------------|
| `IntelligentContext` | Score-based context fitting with learned importance weights |
| `RollingWindow` | Maintains recent turns with configurable window size |

### CodeCompressor

AST-aware code compression using tree-sitter parsing. Preserves code structure while removing whitespace, comments, and non-essential formatting.

### SearchCompressor

Specialized compressor for search results and ranked lists. Applies relevance-based selection and deduplication.

### LogCompressor

Format-aware log compression supporting multiple log formats. Detects format automatically and applies appropriate compression strategies.

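Source: [crates/headroom-py/src/lib.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-py/src/lib.rs), [crates/headroom-core/src/transforms/mod.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-core/src/transforms/mod.rs)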

## CCR Pattern (Compress-Cache-Retrieve)

CCR provides reversible, lossless compression by storing originals and allowing retrieval on demand.

```mermaid
graph LR
    A[Original Content] --> B[Compress]
    B --> C[Compressed + Hash]
    C --> D[Storage]
    D --> E[Retrieve by Hash]
    E --> F[Original Restored]
    
    style C fill:#90EE90
    style F fill:#90EE90
```

**How it works:**

1. **Compress** — Content is analyzed and compressed, generating a `cache_key` (hash)
2. **Cache** — Original content stored in the compression store keyed by hash
3. **Retrieve** — Agent uses `headroom_retrieve` tool to access originals when needed

**Rust bindings expose CCR functionality:**

```python
# Python usage via Rust extension
result = compressor.compress(content, bias=1.0)
# result.inner.cache_key contains the CCR hash
```

Source: [crates/headroom-py/src/lib.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-py/src/lib.rs)

## Compression Hooks

Hooks provide extension points for customizing compression behavior in the TypeScript SDK.

### CompressContext

Context object passed to hook methods:

```typescript
interface CompressContext {
  model: string;
  userQuery: string;
  turnNumber: number;
  toolCalls: string[];
  provider: string;
}
```

### CompressEvent

Event object received by post-compression hooks:

```typescript
interface CompressEvent {
  tokensBefore: number;
  tokensAfter: number;
  tokensSaved: number;
  compressionRatio: number;
  transformsApplied: string[];
  ccrHashes: string[];
  model: string;
  userQuery: string;
  provider: string;
}
```

### Hook Methods

| Method | Timing | Can Modify? | Purpose |
|--------|--------|-------------|---------|
| `preCompress` | Before compression | Yes | Modify messages before pipeline |
| `computeBiases` | During routing | Yes | Per-message compression weights |
| `postCompress` | After compression | No | Observability and logging |

**Example implementation:**

```typescript
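// Assumes both names are exported from the package root.
import { CompressionHooks, type CompressEvent } from "headroom-ai";

// Logs token savings after each compression; postCompress cannot modify messages.
class LoggingHooks extends CompressionHooks {
  postCompress(event: CompressEvent) {
    const percent = (event.compressionRatio * 100).toFixed(1);
    console.log(`Saved ${event.tokensSaved} tokens (${percent}%)`);
  }
}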
```

Source: [sdk/typescript/src/hooks.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/hooks.ts)

## Compression Results

The `CompressResult` type returned by compression operations:

| Field | Type | Description |
|-------|------|-------------|
| `messages` | `any[]` | Compressed messages in same format as input |
| `tokensBefore` | `number` | Token count before compression |
| `tokensAfter` | `number` | Token count after compression |
| `tokensSaved` | `number` | Absolute tokens saved |
| `compressionRatio` | `number` | Reduction expressed as a fraction between 0 and 1 |
| `transformsApplied` | `string[]` | List of transforms that modified content |
| `ccrHashes` | `string[]` | CCR cache keys for retrievable content |
| `compressed` | `boolean` | Whether compression actually occurred |

Source: [sdk/typescript/src/types.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/types.ts)

## SDK Integration

### TypeScript SDK

```typescript
import { compress } from "headroom-ai";

// Direct compression
const result = await compress(messages, {
  model: "claude-sonnet-4-20250514",
  hooks: new LoggingHooks()
});
```

### Python SDK

```python
from headroom import compress

result = compress(messages, model="claude-sonnet-4-20250514")
```

### CLI

```bash
headroom wrap claude
```

Source: `sdk/typescript/examples/basic-compress.ts`, [examples/langchain_demo/README.md](https://github.com/chopratejas/headroom/blob/main/examples/langchain_demo/README.md)

## Configuration Profiles

Compression behavior can be tuned via profiles:

| Profile | Bias | Min K | Max K | Use Case |
|---------|------|-------|-------|----------|
| `balanced` | 1.0 | 2 | 8 | General purpose |
| `aggressive` | 0.7 | 1 | 5 | Long contexts |
| `conservative` | 1.3 | 3 | 12 | High-fidelity |

Configuration interface:

```typescript
interface CompressionProfile {
  bias?: number;
  minK?: number;
  maxK?: number | null;
}
```

Source: `sdk/typescript/src/types/config.ts`
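
One plausible reading of these three fields is that `bias` scales a base item count and `minK`/`maxK` clamp the result. The sketch below illustrates that reading only. The profile values are copied from the table above, while the `items_to_keep` formula and the `base_k` input are assumptions, not Headroom's documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    bias: float = 1.0
    min_k: int = 2
    max_k: Optional[int] = 8


# Values copied from the profile table above.
PROFILES = {
    "balanced": Profile(1.0, 2, 8),
    "aggressive": Profile(0.7, 1, 5),
    "conservative": Profile(1.3, 3, 12),
}


def items_to_keep(base_k: int, profile: Profile) -> int:
    """Scale a base item count by the bias, then clamp to [min_k, max_k]."""
    k = round(base_k * profile.bias)
    k = max(k, profile.min_k)
    if profile.max_k is not None:
        k = min(k, profile.max_k)
    return k
```

Under this reading, a base of 6 items yields 6 for `balanced`, 4 for `aggressive`, and 8 for `conservative`.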

## Observability

Pipeline stages emit lifecycle events for monitoring:

| Metric | Description |
|--------|-------------|
| `tokens_saved` | Cumulative tokens saved by compression |
| `compression_ratio` | Real-time reduction percentage |
| `cache_hit_rate` | Percentage of requests aligned to cache |
| `transform_timing` | Per-transform latency breakdown |

## Known Limitations

### CCR in Multi-Agent Threads

**Issue #503** — CCR proactive expansion blocks can corrupt message attribution in multi-agent setups. When `_append_context_to_latest_non_frozen_user_turn()` injects expansion blocks into messages containing XML attribution markup (`<peer_turn from="AgentX">`), the injected block can interfere with structured attribution.

**Workaround:** Avoid using CCR retrieval markers in multi-agent threads with peer attribution until the issue is resolved.

## Extension Points

The pipeline supports three extension mechanisms:

| Extension Type | Scope | Use Case |
|---------------|-------|----------|
| **Pipeline Extensions** | Lifecycle stages | Custom stage logic |
| **Compression Hooks** | Pre/post processing | Logging, bias computation |
| **Proxy Extensions** | Server integration | ASGI middleware, routes |

Provider and tool-specific behavior lives under `headroom/providers/` to keep core orchestration focused on lifecycle, sequencing, and policy.

Source: [README.md](https://github.com/chopratejas/headroom/blob/main/README.md)

---

<a id='compression-algorithms'></a>

## Compression Algorithms

### Related Pages

Related topics: [Compression Pipeline](#compression-pipeline), [CCR (Reversible Compression)](#ccr-reversible-compression)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [crates/headroom-core/src/transforms/smart_crusher/mod.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-core/src/transforms/smart_crusher/mod.rs)
- [crates/headroom-core/src/transforms/smart_crusher/compaction/mod.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-core/src/transforms/smart_crusher/compaction/mod.rs)
- [headroom/transforms/code_compressor.py](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/code_compressor.py)
- [headroom/transforms/kompress_compressor.py](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/kompress_compressor.py)
- [headroom/image/compressor.py](https://github.com/chopratejas/headroom/blob/main/headroom/image/compressor.py)
- [docs/content/docs/code-compression.mdx](https://github.com/chopratejas/headroom/blob/main/docs/content/docs/code-compression.mdx)
</details>

# Compression Algorithms

Headroom employs a multi-layered compression system that reduces token usage by 60–95% across AI agent workflows. The compression algorithms work together in a configurable pipeline, with each algorithm optimized for specific content types.

## Overview

Headroom's compression stack includes six distinct algorithms:

| Algorithm | Primary Use Case | Typical Savings |
|-----------|-----------------|-----------------|
| SmartCrusher | Tool outputs (JSON arrays, logs) | 70–90% |
| CodeCompressor | Source code files | 60–80% |
| Kompress-base | General text via ML model | 50–70% |
| CacheAligner | API request prefixes | 20–40% |
| IntelligentContext | Long conversations | 40–60% |
| RollingWindow | Simple context trimming | Variable |

Source: [README.md:smart-crusher](https://github.com/chopratejas/headroom/blob/main/README.md)

## Architecture

```mermaid
graph TD
    A[Input Messages] --> B[CacheAligner]
    B --> C[ContentRouter]
    C --> D{Select Algorithm}
    D -->|Tool Output| E[SmartCrusher]
    D -->|Code| F[CodeCompressor]
    D -->|Text| G[Kompress-base]
    D -->|Long Context| H[IntelligentContext]
    E --> I[CCR Store]
    F --> I
    G --> I
    H --> I
    I --> J[Output to LLM]
```

### Pipeline Lifecycle

The stable request lifecycle that all compression algorithms follow:

`Setup` → `Pre-Start` → `Post-Start` → `Input Received` → `Input Cached` → `Input Routed` → `Input Compressed` → `Input Remembered` → `Pre-Send` → `Post-Send` → `Response Received`

Transforms execute during the `Input Compressed` stage, with each algorithm responsible for specific content types.

Source: [README.md:pipeline-internals](https://github.com/chopratejas/headroom/blob/main/README.md)

## SmartCrusher

SmartCrusher is Headroom's primary algorithm for compressing structured tool outputs, particularly JSON arrays from command results.

### Core Features

- **100% ERROR preservation** — Never drops error items from output
- **Boundary preservation** — Always keeps first and last items
- **Anomaly detection** — Statistically identifies outliers (CPU spikes, high error rates)
- **Relevance scoring** — Prioritizes items matching the user's query
- **Change point detection** — Identifies significant transitions in data

### Configuration

```python
class SmartCrusherConfig:
    enabled: bool = True
    min_items_to_analyze: int = 10
    min_tokens_to_crush: int = 500
    max_items_after_crush: int | None = None
    relevance_threshold: float = 0.5
    enable_ccr_marker: bool = True
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `min_items_to_analyze` | 10 | Minimum items before analysis activates |
| `min_tokens_to_crush` | 500 | Minimum token count to trigger compression |
| `max_items_after_crush` | None | Cap on output items (None = unlimited) |
| `relevance_threshold` | 0.5 | Score threshold for item retention |

Source: [crates/headroom-py/src/lib.rs:PySmartCrusherConfig](https://github.com/chopratejas/headroom/blob/main/crates/headroom-py/src/lib.rs)

### CrushResult

The compression result object exposes:

| Property | Type | Description |
|----------|------|-------------|
| `compressed` | str | The compressed output |
| `original` | str | The original input |
| `was_modified` | bool | Whether compression occurred |
| `strategy` | str | Strategy used ("preserve_all", "crush", etc.) |

Source: [crates/headroom-py/src/lib.rs:PyCrushResult](https://github.com/chopratejas/headroom/blob/main/crates/headroom-py/src/lib.rs)

## CodeCompressor

CodeCompressor uses AST-aware analysis via tree-sitter to compress source code while preserving semantic structure.

### Compression Strategy

1. **AST Parsing** — Parse code into an abstract syntax tree
2. **Importance Scoring** — Rank nodes by relevance to the query
3. **Selective Retention** — Keep high-importance nodes, summarize low-importance regions
4. **CCR Markers** — Insert reversible markers for compressed sections

### Supported Languages

CodeCompressor supports 75+ programming languages through tree-sitter grammars, including Python, JavaScript, TypeScript, Rust, Go, Java, C++, and more.

Source: [headroom/transforms/code_compressor.py](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/code_compressor.py)

### Language-Aware Features

- Preserves function signatures and class definitions
- Retains docstrings and comments for critical functions
- Compresses implementation details proportionally to relevance
- Maintains indentation structure for readability
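
The idea of keeping signatures and docstrings while dropping implementation detail can be shown with Python's standard `ast` module. This is a stand-in for illustration: CodeCompressor is described as using tree-sitter, and the `outline` function below is hypothetical, handles Python source only, and requires Python 3.9 or later for `ast.unparse`.

```python
import ast


def outline(source: str) -> str:
    """Keep signatures and docstrings; replace function bodies with `...`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            # Rebuild the body as an optional docstring followed by an ellipsis.
            body = [ast.Expr(ast.Constant(doc))] if doc else []
            body.append(ast.Expr(ast.Constant(...)))
            node.body = body
    return ast.unparse(ast.fix_missing_locations(tree))


example = '''
def add(a: int, b: int) -> int:
    """Return the sum of a and b."""
    total = a + b
    return total
'''
compact = outline(example)
```

The result keeps the `add` signature and its docstring but no longer contains the `total = a + b` implementation lines. A relevance-aware compressor would apply this selectively rather than to every function.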

## Kompress-base

Kompress-base is an ML-based compression model trained specifically for text compression in AI agent contexts.

### Model Information

| Property | Value |
|----------|-------|
| Model Name | kompress-base |
| Provider | HuggingFace |
| Publisher | chopratejas |
| Architecture | Transformer-based |

Source: [README.md:kompress-base-huggingface](https://github.com/chopratejas/headroom/blob/main/README.md)

### Usage

```python
from headroom.transforms.kompress_compressor import KompressCompressor

compressor = KompressCompressor()
result = compressor.compress(
    content="...",
    bias=1.0  # Higher = preserve more
)
```

The model is automatically used when ContentRouter classifies content as general-purpose text.

## CacheAligner

CacheAligner optimizes request prefixes to maximize KV cache hit rates across Anthropic and OpenAI providers.

### How It Works

1. Analyze the prefix structure of incoming requests
2. Identify stable vs. variable components
3. Reorder or normalize prefix content for better cache alignment
4. Track prefix metrics for observability
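
To illustrate why normalizing variable components helps, the sketch below replaces timestamps in a system prompt with a fixed placeholder so that two otherwise identical prefixes hash to the same value. The regular expression, the `<DATE>` placeholder, and both helper functions are assumptions for illustration; a real aligner would more likely relocate volatile content than discard it.

```python
import hashlib
import re

# Volatile fragments that commonly break prefix caching; the pattern is illustrative.
ISO_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}(?:[T ]\d{2}:\d{2}(?::\d{2})?)?")


def normalize_prefix(system_prompt: str) -> str:
    """Replace timestamps with a fixed placeholder and strip trailing spaces."""
    text = ISO_TIMESTAMP.sub("<DATE>", system_prompt)
    return "\n".join(line.rstrip() for line in text.splitlines())


def prefix_hash(system_prompt: str) -> str:
    """Hash the normalized prefix so stable prompts map to the same key."""
    return hashlib.sha256(normalize_prefix(system_prompt).encode("utf-8")).hexdigest()


a = "You are a coding agent.\nToday is 2026-06-01T09:15:00."
b = "You are a coding agent.\nToday is 2026-06-02T17:40:12."
```

Here `prefix_hash(a)` and `prefix_hash(b)` are equal even though the raw prompts differ, which is the condition a provider-side prefix cache needs in order to reuse earlier work.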

### Configuration

```python
class CacheAlignerConfig:
    enabled: bool = True
    validation_marker: str | None = None
    feedback_enabled: bool = True
    min_items_to_cache: int = 3
    inject_tool: bool = True
    inject_system_instructions: bool = True
    marker_template: str | None = None
```

Source: `sdk/typescript/src/types/config.ts:CacheAlignerConfig`

## IntelligentContext

IntelligentContext uses score-based context fitting with learned importance weights to determine what content to retain.

### Configuration

```python
class IntelligentContextConfig:
    enabled: bool = True
    scoring_weights: ScoringWeights | None = None
    relevance_scorer: RelevanceScorerConfig | None = None
    anchor_config: AnchorConfig | None = None
```

| Component | Purpose |
|-----------|---------|
| `scoring_weights` | Tune importance factors (recency, relevance, role) |
| `relevance_scorer` | Configure relevance detection |
| `anchor_config` | Pin critical messages to prevent compression |

Source: `sdk/typescript/src/types/config.ts:IntelligentContextConfig`
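
Score-based context fitting can be sketched as a greedy selection under a token budget. The weights, the `Message` fields, and the `fit_to_budget` function below are hypothetical. They show the general technique using the recency, relevance, and role factors named in the table, not Headroom's learned weights.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    role: str
    tokens: int
    relevance: float  # 0..1, assumed to come from a separate relevance scorer


def fit_to_budget(messages: List[Message], budget: int,
                  w_recency: float = 0.5, w_relevance: float = 0.4,
                  w_role: float = 0.1) -> List[Message]:
    """Greedily keep the highest-scoring messages that fit within `budget` tokens."""
    n = len(messages)
    scored = []
    for i, m in enumerate(messages):
        recency = (i + 1) / n  # later messages score higher
        role_bonus = 1.0 if m.role == "system" else 0.0
        score = w_recency * recency + w_relevance * m.relevance + w_role * role_bonus
        scored.append((score, i))
    kept, used = set(), 0
    for _, i in sorted(scored, reverse=True):
        if used + messages[i].tokens <= budget:
            kept.add(i)
            used += messages[i].tokens
    return [messages[i] for i in sorted(kept)]  # restore conversation order


history = [
    Message("system", 50, 0.2),
    Message("user", 400, 0.1),
    Message("assistant", 300, 0.9),
    Message("user", 100, 0.8),
]
fitted = fit_to_budget(history, budget=500)
```

With a 500-token budget, the large low-relevance user message is dropped and the other three messages are kept in their original order.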

## RollingWindow

RollingWindow provides simple context trimming for straightforward compression needs.

### Configuration

```python
class RollingWindowConfig:
    enabled: bool = True
    max_turns: int | None = None
    preserve_system: bool = True
    preserve_last_n: int = 2
```

Source: `sdk/typescript/src/types/config.ts:RollingWindowConfig`
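
A minimal sketch of how these fields could interact is shown below. It treats each non-system message as one turn and assumes `preserve_last_n` acts as a floor on the window size. Both are simplifying assumptions, and the `rolling_window` function is not Headroom's implementation.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def rolling_window(
    messages: List[Message],
    max_turns: Optional[int] = None,
    preserve_system: bool = True,
    preserve_last_n: int = 2,
) -> List[Message]:
    """Keep the most recent non-system messages, plus system messages if requested."""
    if max_turns is None:
        return list(messages)  # no limit configured
    window = max(max_turns, preserve_last_n)  # never trim below preserve_last_n
    non_system = [m for m in messages if m["role"] != "system"]
    recent = non_system[-window:] if window > 0 else []
    recent_ids = {id(m) for m in recent}
    return [
        m for m in messages
        if id(m) in recent_ids or (preserve_system and m["role"] == "system")
    ]


msgs = [{"role": "system", "content": "s"}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": str(i)}
    for i in range(6)
]
trimmed = rolling_window(msgs, max_turns=3)
```

In this example the system message survives and only the three most recent messages follow it.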

## Compress-Cache-Retrieve (CCR)

CCR enables reversible compression — originals are stored and can be retrieved by the LLM on demand.

### Mechanism

1. **Compress** — Algorithm compresses content, produces a hash
2. **Cache** — Original stored in the CompressionStore
3. **Retrieve** — LLM uses `headroom_retrieve` tool to access original

### Usage Tracking

```python
class CCRStats:
    entries: int
    total_original_tokens: int
    total_compressed_tokens: int
    total_tokens_saved: int
    savings_percent: float
```

Source: `sdk/typescript/src/types/models.ts:CCRStats`
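
The fields relate to one another by simple arithmetic. The sketch below derives them from per-entry token counts, assuming `savings_percent` is tokens saved divided by original tokens on a 0 to 100 scale. That definition and the `summarize` helper are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class CCRStats:
    entries: int
    total_original_tokens: int
    total_compressed_tokens: int
    total_tokens_saved: int
    savings_percent: float


def summarize(pairs: Iterable[Tuple[int, int]]) -> CCRStats:
    """Aggregate (original_tokens, compressed_tokens) pairs into CCRStats."""
    pairs = list(pairs)
    original = sum(o for o, _ in pairs)
    compressed = sum(c for _, c in pairs)
    saved = original - compressed
    # Guard against division by zero when the store is empty.
    percent = 100.0 * saved / original if original else 0.0
    return CCRStats(len(pairs), original, compressed, saved, percent)


stats = summarize([(1000, 200), (3000, 800)])
```

For the two entries above, 4000 original tokens compress to 1000, so 3000 tokens are saved and `savings_percent` is 75.0.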

## Image Compression

Image content is handled separately through the image compressor module.

### Features

- Intelligent downsampling based on content type
- Format optimization (JPEG for photos, PNG for graphics)
- Size limits configurable per request

Source: [headroom/image/compressor.py](https://github.com/chopratejas/headroom/blob/main/headroom/image/compressor.py)

## Compression Hooks

The TypeScript SDK exposes hooks for customizing compression behavior:

```typescript
export declare class CompressionHooks {
  preCompress(messages: any[], ctx: CompressContext): any[] | Promise<any[]>;
  computeBiases(messages: any[], ctx: CompressContext): Record<number, number>;
  postCompress(event: CompressEvent): void | Promise<void>;
}
```

### CompressContext

```typescript
interface CompressContext {
  model: string;
  userQuery: string;
  turnNumber: number;
  toolCalls: string[];
  provider: string;
}
```

### CompressEvent

```typescript
interface CompressEvent {
  tokensBefore: number;
  tokensAfter: number;
  tokensSaved: number;
  compressionRatio: number;
  transformsApplied: string[];
  ccrHashes: string[];
  model: string;
  userQuery: string;
  provider: string;
}
```

Source: [sdk/typescript/src/hooks.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/hooks.ts)

## Configuration Profiles

Headroom supports compression profiles for different use cases:

```python
class CompressionProfile:
    bias: float = 1.0      # >1 = preserve more, <1 = compress more
    minK: int = 10         # Minimum items to keep
    maxK: int | None = None # Maximum items to keep
```

### Preset Profiles

| Profile | Bias | Use Case |
|---------|------|----------|
| `balanced` | 1.0 | General purpose |
| `aggressive` | 0.5 | Maximize compression |
| `conservative` | 2.0 | Preserve more context |

## Performance Characteristics

| Algorithm | Latency | Memory | Best For |
|-----------|---------|--------|----------|
| SmartCrusher | Low | Low | Tool outputs |
| CodeCompressor | Medium | Medium | Source files |
| Kompress-base | Higher | Higher | Free-form text |
| CacheAligner | Low | Low | Prefix optimization |

## Known Limitations

### Multi-Agent Attribution Issue

In multi-agent setups, CCR proactive expansion can corrupt message attribution. When `_append_context_to_latest_non_frozen_user_turn()` injects expansion blocks into messages containing XML attribution markup (`<peer_turn from="AgentX">`), the injected block may interfere with attribution parsing.

**Workaround:** Use explicit CCR retrieval calls instead of relying on proactive expansion in multi-agent threads.

Source: [GitHub Issue #503](https://github.com/chopratejas/headroom/issues/503)

## API Reference

### Python SDK

```python
from headroom import compress

result = compress(
    messages,
    model="claude-sonnet-4-20250514",
    profile="balanced"
)
```

### TypeScript SDK

```typescript
import { compress } from "headroom-ai";

const result = await compress(messages, {
  model: "gpt-4o",
  hooks: new LoggingHooks(),
  tokenBudget: 100000
});
```

### CLI

```bash
headroom compress --input messages.json --output compressed.json
```

## See Also

- [Pipeline Internals](pipeline-internals) — Detailed compression lifecycle
- [Configuration Reference](configuration) — Full configuration options
- [CCR System](compression-store) — Storage and retrieval mechanism
- [Code Compression](code-compression) — Code-specific compression docs

---

<a id='ccr-reversible-compression'></a>

## CCR (Reversible Compression)

### Related Pages

Related topics: [Compression Algorithms](#compression-algorithms), [MCP Integration](#mcp-integration)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [crates/headroom-core/src/ccr/mod.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-core/src/ccr/mod.rs)
- [crates/headroom-core/src/ccr/backends/mod.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-core/src/ccr/backends/mod.rs)
- [headroom/ccr/mcp_server.py](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/mcp_server.py)
- [headroom/ccr/response_handler.py](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/response_handler.py)
- [docs/content/docs/ccr.mdx](https://github.com/chopratejas/headroom/blob/main/docs/content/docs/ccr.mdx)
</details>

# CCR (Reversible Compression)

## Overview

CCR (Compress-Cache-Retrieve) is Headroom's reversible compression mechanism: it reduces context size without permanently losing information. Unlike purely lossy compression, which discards information for good, CCR keeps the original content in a store next to its compressed form, so the LLM can retrieve full details on demand through a dedicated tool.

The core value proposition is straightforward: achieve aggressive token reduction while keeping the originals recoverable. When the agent needs original content (for example when debugging an error, reviewing a code change, or examining a log entry), it calls `headroom_retrieve` with the cache key, and the stored original is returned in full.

Source: [README.md](https://github.com/chopratejas/headroom/blob/main/README.md)

## Architecture

CCR consists of three primary phases that form a continuous cycle:

```mermaid
graph LR
    A[Compress] -->|Store originals| B[Cache]
    B -->|Insert placeholder| C[Send to LLM]
    C -->|Agent requests| D[Retrieve]
    D -->|Return originals| C
```

### Core Components

| Component | Responsibility | Location |
|-----------|---------------|----------|
| `InMemoryCcrStore` | Runtime storage of compressed originals | `crates/headroom-core/src/ccr/mod.rs` |
| CCR Backend | Pluggable storage implementations | `crates/headroom-core/src/ccr/backends/mod.rs` |
| `headroom_retrieve` | MCP tool for on-demand retrieval | `headroom/ccr/mcp_server.py` |
| Response Handler | Processes retrieval responses | `headroom/ccr/response_handler.py` |

### Data Flow

```mermaid
sequenceDiagram
    participant Transform as Compression Transform
    participant Store as CCR Store
    participant Proxy as Headroom Proxy
    participant LLM as LLM
    participant Agent as AI Agent

    Transform->>Store: compress(content)
    Store->>Store: Generate cache_key
    Store->>Store: Store original with key
    Transform->>Proxy: Return compressed + cache_key
    Proxy->>LLM: Send compressed content
    LLM->>Agent: Request via headroom_retrieve
    Agent->>Proxy: retrieve(cache_key)
    Proxy->>Store: Lookup original
    Store->>Proxy: Return original
    Proxy->>Agent: Decompressed content
```

## Compression Phase

During the compression phase, Headroom transforms apply lossy compression strategies (SmartCrusher, DiffCompressor, LogCompressor, etc.) while simultaneously preserving originals in the CCR store.

### Cache Key Generation

Each compressed item receives a unique cache key that serves as the retrieval identifier. The key format enables:

- Fast O(1) lookup in the store
- Correlation with specific compression transforms
- Version tracking for cache invalidation

Source: [crates/headroom-core/src/ccr/mod.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-core/src/ccr/mod.rs)
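
The properties listed above can be illustrated with a content-addressed key. This is a minimal sketch, not Headroom's actual key format: the `make_cache_key` helper, the `<transform>:v<version>:<digest>` layout, and the 16-character digest length are all assumptions made for illustration.

```python
import hashlib


def make_cache_key(content: str, transform: str, version: int = 1) -> str:
    """Build a hypothetical cache key: '<transform>:v<version>:<digest>'."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
    return f"{transform}:v{version}:{digest}"


# A plain dict gives average O(1) lookup by key.
store: dict[str, str] = {}
original = "line 1\nline 2\nline 3"
key = make_cache_key(original, "smart_crusher")
store[key] = original

# The transform name and version can be read back from the key itself.
transform_name, version_tag, _digest = key.split(":")
```

Embedding the transform name and a version in the key is one way to support correlation and invalidation: bumping the version yields a different key for the same content.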

### Storage Backend

The default backend is `InMemoryCcrStore`, which provides:

- Thread-safe in-memory storage during a session
- Automatic cleanup on session end
- Minimal latency for retrieval operations

```rust
// The Python shim creates an InMemoryCcrStore for the Rust compression call.
let store = headroom_core::ccr::InMemoryCcrStore::new();
let (result, stats) = self.inner.compress_with_store(&owned, bias, Some(&store));
```

Source: [crates/headroom-py/src/lib.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-py/src/lib.rs)
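
To make the listed properties concrete, here is a minimal Python sketch of a thread-safe, session-scoped store. It is an illustration only: the real `InMemoryCcrStore` is implemented in Rust, and the `SessionStore` class and its methods below are hypothetical names.

```python
import threading
from typing import Optional


class SessionStore:
    """Hypothetical session-scoped store guarded by a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: dict[str, str] = {}

    def put(self, key: str, original: str) -> None:
        with self._lock:
            self._items[key] = original

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._items.get(key)

    def clear(self) -> None:
        """Drop everything, for example when the session ends."""
        with self._lock:
            self._items.clear()

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)


store = SessionStore()


def worker(n: int) -> None:
    store.put(f"key-{n}", f"original-{n}")


# Several threads write concurrently; the lock keeps the dict consistent.
threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```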

## Retrieval Phase

### MCP Tool: `headroom_retrieve`

The `headroom_retrieve` tool is exposed via the Headroom MCP server and lets agents retrieve the original content on demand.

```python
# headroom/ccr/mcp_server.py exposes retrieval capabilities
class HeadroomMcpServer:
    def retrieve(self, cache_key: str) -> str:
        """Retrieve original content by cache key."""
```

#### Tool Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `cache_key` | string | Yes | The unique identifier returned during compression |
| `search_query` | string | No | Optional search within retrieved content |

### Retrieval Response Handling

When an agent requests content, the response handler processes the lookup and formats the result:

```python
# headroom/ccr/response_handler.py
def handle_retrieve_request(cache_key: str) -> RetrieveResult:
    original = store.get(cache_key)
    return format_response(original)
```

Source: [headroom/ccr/response_handler.py](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/response_handler.py)
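
A slightly fuller sketch of the same flow, covering the optional `search_query` parameter and the cache-miss case, might look like the following. The `RetrieveResult` fields, the line-based substring filter, and the plain-dict store are assumptions for illustration; the actual handler in `response_handler.py` may behave differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetrieveResult:
    found: bool
    content: str = ""
    error: Optional[str] = None


def handle_retrieve(
    store: dict[str, str], cache_key: str, search_query: Optional[str] = None
) -> RetrieveResult:
    original = store.get(cache_key)
    if original is None:
        # Miss: the key is unknown or the entry has already expired.
        return RetrieveResult(found=False, error=f"no entry for {cache_key!r}")
    if search_query:
        # Assumed semantics: keep only lines containing the query (case-insensitive).
        needle = search_query.lower()
        lines = [ln for ln in original.splitlines() if needle in ln.lower()]
        return RetrieveResult(found=True, content="\n".join(lines))
    return RetrieveResult(found=True, content=original)


demo_store = {"abc123": "INFO start\nERROR disk full\nINFO done"}
hit = handle_retrieve(demo_store, "abc123", search_query="error")
miss = handle_retrieve(demo_store, "missing")
```

Returning an explicit `found` flag rather than raising lets the caller decide how to fall back when a key is no longer available.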

## Configuration

CCR behavior is controlled through the main `CCRConfig`:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | bool | `true` | Enable/disable CCR entirely |
| `store_type` | string | `"in_memory"` | Storage backend selection |
| `ttl_seconds` | int | `3600` | Cache expiration for stored originals |
| `max_store_size` | int | `10000` | Maximum entries before eviction |

Source: [docs/content/docs/ccr.mdx](https://github.com/chopratejas/headroom/blob/main/docs/content/docs/ccr.mdx)

### Environment Variables

| Variable | Description |
|----------|-------------|
| `HEADROOM_CCR_ENABLED` | Set to `0` to disable CCR |
| `HEADROOM_CCR_STORE_TYPE` | Override storage backend |
| `HEADROOM_CCR_TTL` | Override TTL in seconds |
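
The interaction between the defaults and the environment overrides can be sketched as follows. The dataclass below is a stand-in that only mirrors the documented option names and defaults; the `config_from_env` helper and the way values are parsed are assumptions, not Headroom's loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class CCRConfig:
    # Defaults mirror the option table above.
    enabled: bool = True
    store_type: str = "in_memory"
    ttl_seconds: int = 3600
    max_store_size: int = 10000


def config_from_env(env: Mapping[str, str] = os.environ) -> CCRConfig:
    """Apply environment overrides on top of the defaults (parsing is assumed)."""
    cfg = CCRConfig()
    if env.get("HEADROOM_CCR_ENABLED") == "0":
        cfg.enabled = False
    if "HEADROOM_CCR_STORE_TYPE" in env:
        cfg.store_type = env["HEADROOM_CCR_STORE_TYPE"]
    if "HEADROOM_CCR_TTL" in env:
        cfg.ttl_seconds = int(env["HEADROOM_CCR_TTL"])
    return cfg


# Pass an explicit mapping so the example does not depend on the real environment.
cfg = config_from_env({"HEADROOM_CCR_ENABLED": "0", "HEADROOM_CCR_TTL": "600"})
```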

## Multi-Agent Considerations

When using CCR in multi-agent workflows, content attribution becomes critical. The system supports structured XML markup for tracking content provenance:

```xml
<peer_turn from="AgentX">
  <!-- Agent-generated content -->
</peer_turn>
```

### Known Limitation: Attribution Corruption

A known issue exists where CCR proactive expansion can corrupt message attribution in multi-agent threads. When `_append_context_to_latest_non_frozen_user_turn()` injects expansion blocks into messages containing peer attribution markup, the injected block may interfere with the structured XML.

**Issue Reference:** [#503 - CCR proactive expansion blocks corrupt message attribution in multi-agent threads](https://github.com/chopratejas/headroom/issues/503)

This affects multi-agent setups where:
- Messages contain structured XML attribution markup
- CCR proactive expansion is enabled
- Multiple agents contribute to the same thread

## Transform Integration

CCR is integrated into the compression pipeline as a sidecar mechanism:

```mermaid
graph TD
    A[Input Content] --> B[Transform Applies]
    B --> C{CCR Enabled?}
    C -->|Yes| D[Store Original]
    C -->|No| E[Skip CCR]
    D --> F[Return Compressed + Key]
    E --> G[Return Compressed Only]
```

### Supported Transforms

| Transform | CCR Support | Typical Savings |
|-----------|-------------|-----------------|
| SmartCrusher | Full | 40-70% |
| DiffCompressor | Full | 60-90% |
| LogCompressor | Full | 70-85% |
| SearchCompressor | Full | 50-75% |
| CacheAligner | Metadata only | N/A |

Source: [crates/headroom-core/src/ccr/backends/mod.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-core/src/ccr/backends/mod.rs)

## Observability

CCR operations emit metrics for monitoring:

| Metric | Description |
|--------|-------------|
| `ccr_entries_stored` | Total originals stored |
| `ccr_retrievals` | Total retrieval requests |
| `ccr_hit_rate` | Retrieval success rate |
| `ccr_store_size` | Current store memory usage |
| `ccr_ttl_evictions` | Entries expired by TTL |
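
As an illustration of how these values relate, `ccr_hit_rate` can be read as successful retrievals divided by total retrieval requests. The counter class below is a hypothetical sketch, not Headroom's metrics implementation; only the metric names come from the table above.

```python
from dataclasses import dataclass


@dataclass
class CcrCounters:
    entries_stored: int = 0
    retrievals: int = 0
    hits: int = 0

    def record_store(self) -> None:
        self.entries_stored += 1

    def record_retrieval(self, found: bool) -> None:
        self.retrievals += 1
        if found:
            self.hits += 1

    def snapshot(self) -> dict[str, float]:
        # Avoid division by zero before the first retrieval.
        hit_rate = self.hits / self.retrievals if self.retrievals else 0.0
        return {
            "ccr_entries_stored": self.entries_stored,
            "ccr_retrievals": self.retrievals,
            "ccr_hit_rate": hit_rate,
        }


counters = CcrCounters()
counters.record_store()
for found in (True, True, False, True):
    counters.record_retrieval(found)
metrics = counters.snapshot()
```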

### Stats Object

The compression result includes a `stats` dictionary with diagnostic information:

```python
result = compressor.compress(content)
print(f"Cache key: {result.cache_key}")  # For retrieval
print(f"Stats: {result.stats}")           # Observability data
```

Source: [crates/headroom-py/src/lib.rs:45-52](https://github.com/chopratejas/headroom/blob/main/crates/headroom-py/src/lib.rs)

## Best Practices

1. **Session-Based Usage**: The CCR store is designed for session-scoped operation. For long-running agents, configure an appropriate TTL to bound memory use.

2. **Key Preservation**: Cache keys must be preserved in the conversation context for retrieval to work. The LLM must pass the key back to `headroom_retrieve`.

3. **Error Handling**: Implement fallback behavior when retrieval fails—either re-compress with lower settings or request original content through alternative means.

4. **Multi-Agent Attribution**: In multi-agent setups, track content provenance explicitly to avoid the attribution corruption issue documented in [#503](https://github.com/chopratejas/headroom/issues/503).

## Related Documentation

- [Pipeline Internals](README.md#pipeline-internals) - How CCR fits into the broader compression pipeline
- [MCP Server Setup](docs/content/docs/ccr.mdx) - Detailed MCP configuration
- [Compression Hooks](sdk/typescript/src/hooks.ts) - Pre/post compression customization

---

<a id='memory-system'></a>

## Memory System

### Related Pages

Related topics: [MCP Integration](#mcp-integration)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [headroom/cli/wrap.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/wrap.py)
- [headroom/cli/memory.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/memory.py)
- [headroom/cli/evals.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/evals.py)
- [crates/headroom-py/src/lib.rs](https://github.com/chopratejas/headroom/blob/main/crates/headroom-py/src/lib.rs)
- [sdk/typescript/examples/shared-context-multi-agent.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/examples/shared-context-multi-agent.ts)
- [examples/langchain_demo/README.md](https://github.com/chopratejas/headroom/blob/main/examples/langchain_demo/README.md)
- [docs/content/docs/memory.mdx](https://github.com/chopratejas/headroom/blob/main/docs/content/docs/memory.mdx)
</details>

# Memory System

The Headroom Memory System provides persistent cross-session knowledge storage and retrieval for AI coding agents. It enables agents to remember important decisions, conventions, project context, architecture details, and user preferences across multiple sessions without requiring manual re-entry.

## Overview

The Memory System addresses a fundamental limitation of AI coding assistants: the context window is ephemeral. When a session ends, all learned information is lost. Headroom's memory system solves this by providing:

- **Persistent storage** - Memories survive session boundaries
- **Multi-agent awareness** - Shared store with agent provenance
- **Automatic retrieval** - Relevant memories surfaced when needed
- **Reversible compression** - Full fidelity retrieval via CCR

Source: [headroom/cli/wrap.py:1-50]()

## Architecture

```mermaid
graph TD
    subgraph "Client Layer"
        CLI[headroom memory CLI]
        MCP[Memory MCP Server]
        Wrap[headroom wrap]
    end
    
    subgraph "Core Memory"
        Bridge[Memory Bridge]
        Core[Memory Core]
    end
    
    subgraph "Backends"
        Local[Local SQLite]
        Mem0[Mem0 Backend]
        QdrantNeo4j[Qdrant + Neo4j]
    end
    
    subgraph "Integrations"
        ClaudeMCP[Claude MCP]
        CodexMCP[Codex MCP]
    end
    
    CLI --> Core
    MCP --> Bridge
    Wrap --> Bridge
    Bridge --> Core
    Core --> Local
    Core --> Mem0
    Core --> QdrantNeo4j
    
    ClaudeMCP -.->|memory_search| MCP
    ClaudeMCP -.->|memory_save| MCP
```

## Memory Scopes

Memories are organized by scope, allowing fine-grained control over persistence and visibility:

| Scope | Description | Use Case |
|-------|-------------|----------|
| `USER` | User-wide memories | Preferences, coding style, org info |
| `SESSION` | Session-specific memories | Current task context |
| `AGENT` | Agent-specific memories | Agent identity, capabilities |
| `TURN` | Single turn memories | Ephemeral context |

Source: [headroom/cli/memory.py:20-25]()
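
One way to model these scopes in client code is an enumeration plus a filter. This is a sketch under assumptions: the `MemoryScope` and `Memory` types and the `filter_by_scope` helper are hypothetical and only reuse the scope names from the table above.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryScope(str, Enum):
    USER = "USER"
    SESSION = "SESSION"
    AGENT = "AGENT"
    TURN = "TURN"


@dataclass
class Memory:
    content: str
    scope: MemoryScope


def filter_by_scope(memories: list[Memory], scope: MemoryScope) -> list[Memory]:
    return [m for m in memories if m.scope is scope]


memories = [
    Memory("Prefers tabs over spaces", MemoryScope.USER),
    Memory("Currently refactoring the auth module", MemoryScope.SESSION),
    Memory("Uses pytest for all tests", MemoryScope.USER),
]
user_memories = filter_by_scope(memories, MemoryScope.USER)
```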

## CLI Commands

### Memory Management

```bash
headroom memory list                     # List all stored memories
headroom memory list --limit 10          # List the 10 most recent memories
headroom memory list --scope USER        # List only USER-level memories
headroom memory list --since 7d          # List memories from the last 7 days
headroom memory show <id>                # Show full details of a memory
headroom memory stats                    # Show memory statistics
headroom memory edit <id> --content ...  # Edit a memory's content
headroom memory delete <id>              # Delete a memory
headroom memory prune --older-than 30d   # Delete memories older than 30 days
headroom memory purge --confirm          # Delete ALL memories
headroom memory export --output file.json  # Export all memories to JSON
headroom memory import file.json         # Import memories from JSON
```

Source: [headroom/cli/memory.py:1-30]()
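
Options such as `--since 7d` and `--older-than 30d` take relative durations. The sketch below shows how such a value could be turned into a cutoff timestamp. Only the `d` suffix appears in the commands above; the `h` and `w` suffixes and both function names are assumptions, and the CLI's real parser may accept a different syntax.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"h": "hours", "d": "days", "w": "weeks"}  # only "d" appears in the docs


def parse_duration(value: str) -> timedelta:
    """Parse strings such as '7d' or '30d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([hdw])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})


def cutoff(value: str, now: datetime) -> datetime:
    """Return the timestamp that lies `value` before `now`."""
    return now - parse_duration(value)


now = datetime(2026, 1, 31, tzinfo=timezone.utc)
print(cutoff("7d", now).date())  # 2026-01-24
```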

## Memory Integration in Wrapped Agents

When running `headroom wrap` with the `--memory` flag, the system automatically:

1. Registers the `headroom_memory` MCP server
2. Injects memory usage guidance into `AGENTS.md`
3. Enables `memory_search` and `memory_save` tools

Source: [headroom/cli/wrap.py:50-80]()

### Memory Guidance Injection

The system injects guidance into `AGENTS.md` files to instruct agents on when to use memory:

```markdown
<!-- headroom:memory-instructions -->
## Memory

Use the `headroom_memory` MCP server for persistent cross-session knowledge.

**Before** answering questions about prior decisions, conventions, project context,
architecture, user preferences, org info, codenames, debugging history, or anything
from past sessions — call `memory_search` first.

**After** making durable decisions, discovering conventions, or learning important
facts — call `memory_save` to persist them for future sessions.

Memory is your first source of truth for anything not visible in the current conversation.
```

Source: [headroom/cli/wrap.py:80-100]()

## Memory Evaluation

Headroom includes comprehensive evaluation suites for memory systems:

### LoCoMo V2 Evaluation

Tests the architecture where:
- LLM decides what to save (`memory_save` tool)
- LLM decides when to search (`memory_search` tool)
- Graph relationships enable multi-hop reasoning

```bash
headroom evals memory-v2 -n 3
headroom evals memory-v2 --answer-model gpt-4o --save-model gpt-4o-mini
```

Parameters:

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--n-conversations` | Number of conversations to evaluate | All (10) |
| `--categories` | Question categories to evaluate (1-5) | 1,2,3,4 |
| `--include-adversarial` | Include category 5 (unanswerable) | False |
| `--f1-threshold` | F1 score threshold for 'correct' | 0.5 |
| `--answer-model` | LLM model for generating answers | None |
| `--llm-judge` | Use LLM-as-judge scoring | False |
| `--judge-model` | Model for judging | None |
| `--parallel` | Parallel evaluation workers | 1 |

Source: [headroom/cli/evals.py:30-80]()
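
To show what `--f1-threshold` controls, here is a sketch of a common token-overlap F1 score, where an answer counts as correct when the score reaches the threshold. The normalization used (lowercasing and whitespace splitting) is an assumption; the evaluation suite's actual scoring may normalize text differently.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two strings (lowercased, whitespace-split)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def is_correct(prediction: str, reference: str, threshold: float = 0.5) -> bool:
    return token_f1(prediction, reference) >= threshold


# One of four predicted tokens matches: precision 0.25, recall 1.0, F1 0.4.
score = token_f1("she moved to Paris", "Paris")
```

With the default threshold of 0.5, this verbose answer would be scored as incorrect even though it contains the reference, which is one reason an LLM judge (`--llm-judge`) can be a useful complement.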

## Storage Backends

### Per-Project Storage

As of v0.21.34, memories use per-project storage, preventing cross-project memory leakage. Each project has isolated memory storage.

Source: [headroom/cli/evals.py](), Community Release v0.21.34

### Local SQLite Backend

The default backend stores memories in SQLite and supports:
- Scope-based filtering
- Time-based queries
- Full-text search
- Import/export
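
Scope-based and time-based filtering map naturally onto SQL. The following sketch uses Python's built-in `sqlite3` module with a hypothetical schema; the table layout, column names, and the `list_memories` helper are assumptions rather than Headroom's actual storage format, and full-text search is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real table layout is not documented here.
conn.execute(
    """
    CREATE TABLE memories (
        id INTEGER PRIMARY KEY,
        content TEXT NOT NULL,
        scope TEXT NOT NULL,
        created_at TEXT NOT NULL  -- ISO 8601, so string comparison sorts by time
    )
    """
)
rows = [
    ("Use strict mode in TypeScript", "USER", "2026-01-10T09:00:00"),
    ("Debugging flaky login test", "SESSION", "2026-01-28T14:30:00"),
    ("Team prefers squash merges", "USER", "2026-01-29T08:15:00"),
]
conn.executemany(
    "INSERT INTO memories (content, scope, created_at) VALUES (?, ?, ?)", rows
)


def list_memories(scope: str, since: str, limit: int = 10) -> list[str]:
    """Scope-based and time-based filtering, newest first."""
    cur = conn.execute(
        "SELECT content FROM memories "
        "WHERE scope = ? AND created_at >= ? "
        "ORDER BY created_at DESC LIMIT ?",
        (scope, since, limit),
    )
    return [row[0] for row in cur.fetchall()]


recent_user = list_memories("USER", "2026-01-20T00:00:00")
```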

### Mem0 Backend

External Mem0 integration for users with existing Mem0 deployments.

### Qdrant + Neo4j Backend

Advanced backend providing:
- Vector search via Qdrant
- Graph relationships via Neo4j
- Multi-hop reasoning capabilities

Configuration:

| Option | Description | Default |
|--------|-------------|---------|
| `--memory-qdrant-url` | Full Qdrant URL | None |
| `--memory-qdrant-host` | Qdrant host | localhost |
| `--memory-qdrant-port` | Qdrant port | 6333 |
| `--memory-neo4j-uri` | Neo4j URI | None |
| `--memory-neo4j-user` | Neo4j user | None |

Source: [headroom/cli/proxy.py]()

## MCP Tools

The memory MCP server exposes the following tools:

| Tool | Description | Parameters |
|------|-------------|------------|
| `memory_search` | Search memories by query | `query`, `limit`, `scope`, `session_id` |
| `memory_save` | Save a new memory | `content`, `scope`, `agent_id`, `session_id` |
| `memory_list` | List memories | `limit`, `scope`, `since`, `search` |
| `memory_show` | Show memory details | `id` |
| `memory_edit` | Edit memory content | `id`, `content` |
| `memory_delete` | Delete a memory | `id` |

Source: [headroom/cli/wrap.py:100-150]()

## Multi-Agent Memory

In multi-agent setups, the memory system provides:

- **Shared store** - All agents can access common memories
- **Agent provenance** - Track which agent saved each memory
- **Auto-dedup** - Prevent duplicate memories
- **Cross-agent context** - Memory context passed across agent boundaries

```mermaid
graph LR
    subgraph "Agent A"
        A_Save[memory_save]
        A_Search[memory_search]
    end
    
    subgraph "Agent B"
        B_Save[memory_save]
        B_Search[memory_search]
    end
    
    subgraph "Shared Memory"
        Store[(Memory Store)]
    end
    
    A_Save --> Store
    B_Save --> Store
    Store --> A_Search
    Store --> B_Search
```
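
The combination of a shared store, agent provenance, and deduplication can be sketched as below. Everything here is hypothetical: the `SharedMemoryStore` class, the fingerprint based on case and whitespace normalization, and the substring search are stand-ins, and the real system may deduplicate semantically rather than by exact normalized text.

```python
import hashlib
from dataclasses import dataclass, field


def _fingerprint(content: str) -> str:
    # Normalize case and whitespace so trivially different copies collide.
    normalized = " ".join(content.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class SharedMemoryStore:
    # fingerprint -> (content, set of agent ids that saved it)
    _items: dict[str, tuple[str, set[str]]] = field(default_factory=dict)

    def save(self, content: str, agent_id: str) -> bool:
        """Store content; return False if it was a duplicate."""
        fp = _fingerprint(content)
        if fp in self._items:
            self._items[fp][1].add(agent_id)  # keep provenance of every saver
            return False
        self._items[fp] = (content, {agent_id})
        return True

    def search(self, query: str) -> list[tuple[str, list[str]]]:
        needle = query.lower()
        return [
            (content, sorted(agents))
            for content, agents in self._items.values()
            if needle in content.lower()
        ]


shared = SharedMemoryStore()
first = shared.save("Deploy target is us-east", agent_id="agent-a")
second = shared.save("deploy target is   US-EAST", agent_id="agent-b")
results = shared.search("deploy")
```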

## Known Issues

### CCR Proactive Expansion in Multi-Agent Threads

Issue [#503](https://github.com/chopratejas/headroom/issues/503): CCR proactive expansion blocks corrupt message attribution in multi-agent threads.

**TL;DR**: The `_append_context_to_latest_non_frozen_user_turn()` function injects proactive expansion blocks into the latest user message content. In multi-agent setups, that message can contain structured XML attribution markup (`<peer_turn from="AgentX">`). The injected block can corrupt this attribution.

**Status**: Open, under investigation.

## Configuration Options

### Proxy Configuration

| Option | Description | Default |
|--------|-------------|---------|
| `--memory` | Enable memory integration | False |
| `--memory-storage` | Storage backend | `local` |
| `--memory-project-root` | Override project root | `""` |
| `--no-memory-tools` | Disable memory tool injection | False |
| `--no-memory-context` | Disable memory context injection | False |
| `--memory-top-k` | Memories to inject as context | 10 |

Source: [headroom/cli/proxy.py]()

## Usage Examples

### Basic Memory Workflow

```python
# After discovering a convention
await mcp_client.call_tool("memory_save", {
    "content": "Use TypeScript strict mode in all new projects",
    "scope": "USER"
})

# In a new session, before answering
results = await mcp_client.call_tool("memory_search", {
    "query": "coding conventions and project standards"
})
```

### Multi-Agent Shared Context

```typescript
import { createSharedContext } from "@headroom/sdk";

const ctx = createSharedContext({
  agentId: "architect-agent",
  projectId: "k8s-scaling"
});

// Save findings
await ctx.set("research", { provider: "aws", region: "us-east" });

// Another agent reads it
const compressed = await ctx.get("research");
```

Source: [sdk/typescript/examples/shared-context-multi-agent.ts]()

## Performance Considerations

- **Memory ID exposure** - As of v0.22.2, memory IDs are exposed in the auto-tail and in the `memory_list` tool, along with ID-usage guidance
- **Regex-based prefix extraction removed** - v0.21.35 dropped regex-based pref extraction and now filters out system-reminder noise
- **Query cap removed** - v0.22.0 dropped the 500-character query cap for memory search

## Further Reading

- [Memory Documentation](https://headroom-docs.vercel.app/docs/memory)
- [Memory MCP Setup](https://headroom-docs.vercel.app/docs/mcp)
- [Failure Learning](https://headroom-docs.vercel.app/docs/failure-learning)

---

<a id='mcp-integration'></a>

## MCP Integration

### Related Pages

Related topics: [CCR (Reversible Compression)](#ccr-reversible-compression), [Memory System](#memory-system)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [headroom/cli/wrap.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/wrap.py)
- [headroom/cli/mcp.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/mcp.py)
- [examples/README.md](https://github.com/chopratejas/headroom/blob/main/examples/README.md)
- [sdk/typescript/examples/shared-context-multi-agent.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/examples/shared-context-multi-agent.ts)
- [examples/mcp_demo/](https://github.com/chopratejas/headroom/tree/main/examples/mcp_demo)
</details>

# MCP Integration

MCP (Model Context Protocol) integration enables Headroom to expose compression, retrieval, and memory tools to AI coding assistants like Claude Code. This integration is foundational to Headroom's CCR (Compress-Cache-Retrieve) pattern, allowing agents to work with compressed content summaries and retrieve original data on demand.

## Overview

MCP integration serves three primary purposes in Headroom:

1. **Content Retrieval** — Exposes `headroom_retrieve` as an MCP tool that Claude Code calls to decompress compressed content
2. **Memory Persistence** — Provides a persistent cross-session memory MCP server (`headroom_memory`) for knowledge retention
3. **Subscription Access** — Enables CCR functionality for Claude Code subscription users without API key access

The MCP server operates as a stdio-based service, meaning it communicates via standard input/output rather than HTTP. This is distinct from an HTTP endpoint — the `/mcp` path is not an HTTP route on the proxy server.

## Architecture

```mermaid
graph TD
    A[Claude Code] -->|stdio| B[Headroom MCP Server]
    B -->|retrieve| C[Headroom Proxy]
    C -->|compressed content| B
    B -->|original content| A
    
    D[Claude Code] -->|stdio| E[Memory MCP Server]
    E -->|persist/query| F[(SQLite DB)]
    
    G[headroom wrap] -->|configures| A
    G -->|registers| B
    G -->|registers| E
```

## Installation and Setup

### Automatic Setup via CLI Wrapper

The recommended approach uses the `headroom wrap` command, which automatically configures MCP servers:

```bash
# For Claude Code
headroom wrap claude

# For Codex
headroom wrap codex

# For Claude Code with persistent memory
headroom wrap claude --memory

# For Codex with persistent memory
headroom wrap codex --memory
```

The `wrap` command handles multiple setup steps including proxy startup, CLI context tool configuration, and MCP server registration.

### Manual MCP Installation

For manual configuration, use the MCP CLI commands:

```bash
# Install MCP server for Claude Code
headroom mcp install

# Verify installation
headroom mcp status

# Uninstall MCP server
headroom mcp uninstall
```

Source: [headroom/cli/mcp.py:60-80]()

### Standalone MCP Server

Start the MCP server independently when the proxy runs separately:

```bash
# Start the MCP server (requires proxy running)
headroom mcp serve

# With custom proxy URL
headroom mcp serve --proxy-url http://127.0.0.1:8787
```

## MCP Server Implementation

### MCP Command Structure

The CLI provides a command group for MCP operations:

```python
@main.group()
def mcp() -> None:
    """MCP server for Claude Code integration."""
```

Source: [headroom/cli/mcp.py:43-60]()

### Configuration Management

MCP configuration is stored in `~/.claude/mcp.json`:

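```python
import json
from pathlib import Path
from typing import Any

# Paths inferred from the location stated above (~/.claude/mcp.json).
CLAUDE_CONFIG_DIR = Path.home() / ".claude"
MCP_CONFIG_PATH = CLAUDE_CONFIG_DIR / "mcp.json"


def load_mcp_config() -> dict[str, Any]:
    """Load existing MCP config or return empty structure."""
    if MCP_CONFIG_PATH.exists():
        with open(MCP_CONFIG_PATH) as f:
            return json.load(f)
    return {"mcpServers": {}}


def save_mcp_config(config: dict[str, Any]) -> None:
    """Save MCP config, creating directory if needed."""
    CLAUDE_CONFIG_DIR.mkdir(parents=True, exist_ok=True)
    with open(MCP_CONFIG_PATH, "w") as f:
        json.dump(config, f, indent=2)
```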

Source: [headroom/cli/mcp.py:25-40]()

### Headroom Command Generation

The MCP server command is generated dynamically:

```python
def get_headroom_command() -> list[str]:
    """Get the command to run headroom MCP server."""
    return ["headroom", "mcp", "serve"]
```

Source: [headroom/cli/mcp.py:18-23]()
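
Putting the pieces together, installing the server amounts to writing an entry under `mcpServers` that points at this command. The sketch below is an assumption-laden illustration: the entry name `headroom`, the `command`/`args` entry shape, and the `register_server` helper are not taken from the source, and the example writes to a temporary directory rather than `~/.claude`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any

SERVER_NAME = "headroom"  # hypothetical entry name


def register_server(config_path: Path, command: list[str]) -> dict[str, Any]:
    """Add or overwrite the server entry and write the file back."""
    config: dict[str, Any] = {"mcpServers": {}}
    if config_path.exists():
        config = json.loads(config_path.read_text())
    config.setdefault("mcpServers", {})[SERVER_NAME] = {
        "command": command[0],
        "args": command[1:],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return config


# Use a temporary directory so the sketch never touches the real config.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".claude" / "mcp.json"
    register_server(path, ["headroom", "mcp", "serve"])
    register_server(path, ["headroom", "mcp", "serve"])  # repeat changes nothing
    written = json.loads(path.read_text())
```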

## CCR (Compress-Cache-Retrieve) Workflow

The MCP integration enables the CCR pattern for Claude Code subscription users:

```mermaid
sequenceDiagram
    participant CC as Claude Code
    participant MCP as Headroom MCP Server
    participant Proxy as Headroom Proxy
    
    CC->>Proxy: API request with ANTHROPIC_BASE_URL
    Proxy->>Proxy: Compress large tool outputs
    Proxy-->>CC: Compressed summary with hash markers
    CC->>MCP: headroom_retrieve(hash)
    MCP->>Proxy: Fetch original content
    Proxy-->>MCP: Original data
    MCP-->>CC: Full content restored
```

### How CCR Works

1. **Compression** — The proxy compresses large tool outputs (file listings, search results) and replaces them with hash markers
2. **Caching** — Original content is stored temporarily with the hash as key
3. **Retrieval** — When Claude Code needs full details, it calls `headroom_retrieve` with the hash
4. **Restoration** — The MCP server fetches and returns the original content

Source: [headroom/cli/mcp.py:50-75]()

## Memory MCP Server

Headroom includes a dedicated MCP server for persistent cross-session memory:

### Registration

For Codex, the memory MCP server is registered in `~/.codex/config.toml`:

```python
def _inject_memory_mcp_config(db_path: str, user_id: str) -> None:
    """Register headroom memory as an MCP server in Codex's config.toml."""
    mcp_section = (
        f"\n{_MEMORY_MCP_MARKER}\n"
        f"[mcp_servers.headroom_memory]\n"
        f'command = "{python_bin}"\n'
        f'args = ["-m", "headroom.memory.mcp_server", "--db", "{db_path_toml}", "--user", "{user_id}"]\n'
        f"startup_timeout_sec = 30\n"
        f"tool_timeout_sec = 30\n"
        f"{_MEMORY_MCP_END}\n"
    )
```

Source: [headroom/cli/wrap.py:120-145]()

### Memory Operations

The memory MCP server provides tools for:

- `memory_search` — Query persistent knowledge from past sessions
- `memory_save` — Store important decisions, conventions, and context
- `memory_list` — List stored memories

### Memory Usage Guidance

Memory instructions are injected into `AGENTS.md`:

```python
def _inject_memory_agents_md(file_path: Path) -> bool:
    """Inject memory usage guidance into AGENTS.md."""
    memory_block = (
        f"{_MEMORY_AGENTS_MARKER}\n"
        "## Memory\n\n"
        "Use the `headroom_memory` MCP server for persistent cross-session knowledge.\n\n"
        "**Before** answering questions about prior decisions, conventions...\n"
        "**After** making durable decisions... call `memory_save` to persist them.\n"
    )
```

Source: [headroom/cli/wrap.py:170-200]()

## Codex Integration

For Codex, the MCP server registration differs slightly:

### Config File Paths

```python
def _codex_config_paths() -> tuple[Path, Path]:
    """Return ``(config_file, backup_file)`` paths for the Codex TOML config."""
    config_dir = Path.home() / ".codex"
    config_file = config_dir / "config.toml"
    backup_file = config_dir / f"config.toml{_CODEX_CONFIG_BACKUP_SUFFIX}"
    return config_file, backup_file
```

Source: [headroom/cli/wrap.py:95-102]()

### Idempotent Registration

MCP registration is idempotent — existing sections are replaced:

```python
if _MEMORY_MCP_MARKER in content:
    start = content.index(_MEMORY_MCP_MARKER)
    end = content.index(_MEMORY_MCP_END) + len(_MEMORY_MCP_END)
    content = content[:start].rstrip("\n") + mcp_section + content[end:].lstrip("\n")
else:
    content = content.rstrip() + "\n" + mcp_section
```

Source: [headroom/cli/wrap.py:140-155]()
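
The same marker-based pattern can be written as a small self-contained function. This is a sketch rather than the code in `wrap.py`: the marker strings and the `upsert_section` helper are hypothetical, and this variant normalizes the blank line before the section so that repeated application produces identical output.

```python
MARKER_START = "# --- headroom memory (example marker) ---"  # hypothetical
MARKER_END = "# --- end headroom memory ---"  # hypothetical


def upsert_section(content: str, body: str) -> str:
    """Insert the marked section, or replace it if it is already present."""
    section = f"{MARKER_START}\n{body}\n{MARKER_END}\n"
    if MARKER_START in content:
        start = content.index(MARKER_START)
        end = content.index(MARKER_END, start) + len(MARKER_END)
        before, after = content[:start], content[end:].lstrip("\n")
    else:
        before, after = content, ""
    before = before.rstrip("\n")
    # Separate the section from preceding content with exactly one blank line.
    prefix = before + "\n\n" if before else ""
    return prefix + section + after


base = 'model = "example"\n'
once = upsert_section(base, "[mcp_servers.headroom_memory]")
twice = upsert_section(once, "[mcp_servers.headroom_memory]")
updated = upsert_section(twice, "[mcp_servers.headroom_memory_v2]")
```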

### Configuration Backup

Pre-wrap state is snapshotted to enable clean unwrapping:

```python
def _snapshot_codex_config_if_unwrapped(config_file: Path, backup_file: Path) -> None:
    """Snapshot ~/.codex/config.toml BEFORE any wrap-time mutation."""
```

Source: [headroom/cli/wrap.py:104-120]()

## TypeScript SDK Integration

The TypeScript SDK supports shared context for multi-agent setups:

```typescript
import { SharedContext } from "@headroomhq/sdk";

// Create shared context
const ctx = new SharedContext();

// Store compressed content
await ctx.put("k8s-scaling-research", compressedContent);

// Retrieve later
const compressed = await ctx.get("k8s-scaling-research");

// Access stats
const stats = ctx.stats();
console.log(`Total saved: ${stats.totalTokensSaved}`);
```

Source: [sdk/typescript/examples/shared-context-multi-agent.ts]()

## Configuration Options

### CLI Options

| Option | Description | Default |
|--------|-------------|---------|
| `--no-mcp` | Skip MCP retrieve tool registration | `False` |
| `--no-serena` | Skip Serena MCP registration | `False` |
| `--memory` | Enable persistent cross-session memory | `False` |
| `--code-graph` | Enable code graph indexing via codebase-memory-mcp | `False` |

Source: [headroom/cli/wrap.py:40-75]()

### Environment Variables

| Variable | Description |
|----------|-------------|
| `ANTHROPIC_BASE_URL` | Route Claude Code traffic through proxy (set to `http://127.0.0.1:8787`) |

## Known Limitations

### Multi-Agent Message Attribution

> **Issue #503** — CCR proactive expansion blocks may corrupt message attribution in multi-agent threads. The `_append_context_to_latest_non_frozen_user_turn()` function injects proactive expansion blocks into the latest user message content, which can contain structured XML attribution markup (`<peer_turn from="AgentX">`). The injected block ends up corrupting this structure.

### HTTP Endpoint Misconception

> **Issue #460** — The MCP server operates via stdio, not HTTP. The proxy does not expose an HTTP endpoint at `/mcp`. Users should run `headroom mcp serve` as a standalone process, not expect `/mcp` on the proxy server.

## Examples

### Running the MCP Demo

```bash
# Configure API key
export OPENAI_API_KEY='your-key'

# Run the MCP demo
PYTHONPATH=. python -m examples.mcp_demo.run_agent_eval
```

Source: [examples/README.md]()

### AWS Bedrock with Strands

```bash
# Configure AWS credentials
export AWS_ACCESS_KEY_ID='your-access-key'
export AWS_SECRET_ACCESS_KEY='your-secret-key'
export AWS_DEFAULT_REGION='us-west-2'

# Run the demo
python examples/strands_bedrock_demo.py
```

Source: [examples/README.md]()

## Related Documentation

- [Claude Code Integration](cli-integration/claude-code.md) — Complete Claude Code setup
- [Codex Integration](cli-integration/codex.md) — Complete Codex setup
- [Memory System](features/memory.md) — Cross-session memory architecture
- [CCR Pattern](features/CCR.md) — Compress-Cache-Retrieve details

---

<a id='proxy-deployment'></a>

## Proxy Deployment

### Related Pages

Related topics: [Getting Started](#getting-started), [CLI Wrappers](#cli-wrappers)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/chopratejas/headroom/blob/main/README.md)
- [headroom/cli/proxy.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/proxy.py)
- [headroom/cli/wrap.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/wrap.py)
- [headroom/cli/perf.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/perf.py)
- [sdk/typescript/src/types/config.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/types/config.ts)
- [sdk/typescript/src/index.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/index.ts)
</details>

# Proxy Deployment

## Overview

The Headroom proxy is a central component that intercepts, optimizes, and routes LLM API traffic through Headroom's context compression pipeline. It serves as the contextual optimization layer between AI coding tools (Claude Code, Codex, Goose, Continue, etc.) and upstream LLM providers like Anthropic and OpenAI.

The proxy enables:
- **Token savings** through compression transforms (CacheAligner, SmartCrusher, IntelligentContext)
- **Reversible compression** via CCR (Compress-Cache-Retrieve): originals remain retrievable on demand
- **Shared memory** across multi-agent workflows
- **Semantic caching** for repeated query patterns
- **Cross-agent context passing** via SharedContext

Source: [README.md]()

## Architecture

```mermaid
graph TD
    subgraph "AI Coding Tools"
        Claude[Claude Code]
        Codex[OpenAI Codex]
        Goose[Goose]
        Continue[Continue Dev]
        OpenHands[OpenHands]
        Custom[Custom SDK / App]
    end

    subgraph "Headroom Proxy"
        Intercept[Request Interception]
        Pipeline[Compression Pipeline]
        Memory[Cross-Agent Memory]
        CCR[CCR Retrieval]
        Cache[Semantic Cache]
    end

    subgraph "LLM Providers"
        Anthropic[Anthropic /v1/messages]
        OpenAI[OpenAI /v1/chat/completions]
        Vertex[Vertex AI]
        Bedrock[AWS Bedrock]
    end

    Claude --> Intercept
    Codex --> Intercept
    Goose --> Intercept
    Continue --> Intercept
    OpenHands --> Intercept
    Custom --> Intercept

    Intercept --> Pipeline
    Pipeline --> Memory
    Pipeline --> CCR
    Pipeline --> Cache

    Pipeline --> Anthropic
    Pipeline --> OpenAI
```

### Request Lifecycle

The proxy exposes one stable request lifecycle across all integration paths:

`Setup` → `Pre-Start` → `Post-Start` → `Input Received` → `Input Cached` → `Input Routed` → `Input Compressed` → `Input Remembered` → `Pre-Send` → `Post-Send` → `Response Received`

Source: [README.md]()
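
The stage order above can be captured as an ordered enumeration. The following is a minimal sketch, not Headroom's implementation: the `LifecycleStage` enum and the `stages_between` helper are hypothetical names introduced here, and only the stage labels come from the lifecycle string.

```python
from enum import Enum


class LifecycleStage(Enum):
    """Stage labels copied from the lifecycle string, in declaration order."""

    SETUP = "Setup"
    PRE_START = "Pre-Start"
    POST_START = "Post-Start"
    INPUT_RECEIVED = "Input Received"
    INPUT_CACHED = "Input Cached"
    INPUT_ROUTED = "Input Routed"
    INPUT_COMPRESSED = "Input Compressed"
    INPUT_REMEMBERED = "Input Remembered"
    PRE_SEND = "Pre-Send"
    POST_SEND = "Post-Send"
    RESPONSE_RECEIVED = "Response Received"


def stages_between(start: LifecycleStage, end: LifecycleStage) -> list[str]:
    """Return the stage labels from start to end, inclusive, in lifecycle order."""
    ordered = list(LifecycleStage)  # Enum iteration preserves declaration order
    i, j = ordered.index(start), ordered.index(end)
    if i > j:
        raise ValueError("start must not come after end")
    return [stage.value for stage in ordered[i : j + 1]]


if __name__ == "__main__":
    per_request = stages_between(LifecycleStage.INPUT_RECEIVED, LifecycleStage.PRE_SEND)
    print(" -> ".join(per_request))
```

Keeping the stages in one ordered type makes it easy to assert that a hook or log line refers to a stage that actually exists in the documented sequence.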

### Pipeline Transforms

| Transform | Purpose |
|-----------|---------|
| CacheAligner | Stabilizes prefixes so KV caches hit effectively |
| ContentRouter | Routes content to appropriate compression strategies |
| SmartCrusher | ML-based compression routing (~90% reduction) |
| CodeCompressor | Specialized code content handling |
| Kompress-base | Trained ML text compression |
| IntelligentContext | Score-based context fitting with learned importance |
| RollingWindow | Sliding conversation window management |

Source: [README.md]()

## CLI Commands

### `headroom proxy`

Starts the Headroom proxy server. This is the primary command for deploying the proxy as a standalone service.

```bash
headroom proxy [OPTIONS]
```

#### Core Options

| Option | Default | Description |
|--------|---------|-------------|
| `--host` | `127.0.0.1` | Host to bind to |
| `--port`, `-p` | `8787` | Proxy port |
| `--mode` | `optimize` | Proxy mode: `audit`, `optimize`, `simulate` |
| `--backend` | `anthropic` | API backend: `anthropic`, `anyllm`, `litellm-vertex` |
| `--anyllm-provider` | `None` | Provider for any-llm backend |
| `--region` | `None` | Cloud region for Bedrock/Vertex |
| `--exclude-tools` | — | Comma-separated tools to exclude from processing |
| `--no-optimize` | — | Disable optimization (passthrough mode) |
| `--no-cache` | — | Disable semantic caching |
| `--no-rate-limit` | — | Disable rate limiting |
| `--no-subscription-tracking` | — | Disable Anthropic subscription usage poller |
| `--intercept-tool-results` | — | Enable tool_result interceptors (opt-in) |
| `--memory` | — | Enable persistent cross-session memory |
| `--learn` | — | Enable live traffic learning |
| `--verbose`, `-v` | — | Verbose output |

Source: [headroom/cli/proxy.py](headroom/cli/proxy.py)

#### Environment Variables

| Variable | Description |
|----------|-------------|
| `HEADROOM_HOST` | Proxy host binding |
| `HEADROOM_PORT` | Proxy port |
| `HEADROOM_MODE` | Proxy mode |
| `HEADROOM_BACKEND` | API backend selection |
| `HEADROOM_ANYLLM_PROVIDER` | Provider for any-llm backend |
| `HEADROOM_REGION` | Cloud region for Bedrock/Vertex |
| `HEADROOM_EXCLUDE_TOOLS` | Tools to exclude |
| `HEADROOM_NO_SUBSCRIPTION_TRACKING` | Disable subscription poller |
| `HEADROOM_PROXY_EXTENSIONS` | Enabled proxy extensions |
| `HEADROOM_CONTEXT_TOOL` | Context tool selection: `rtk` or `lean-ctx` |

Source: [headroom/cli/proxy.py](headroom/cli/proxy.py)
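
As an illustration of how these variables relate to the option defaults listed earlier, here is a hedged sketch that resolves a settings object from an environment mapping. `ProxySettings` and `resolve_proxy_settings` are hypothetical names, and the precedence between CLI flags and environment variables in `headroom/cli/proxy.py` is not confirmed here; the sketch only applies the documented defaults when a variable is unset.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_MODES = ("audit", "optimize", "simulate")


@dataclass(frozen=True)
class ProxySettings:
    host: str
    port: int
    mode: str
    backend: str
    region: Optional[str]


def resolve_proxy_settings(env: Mapping[str, str] = os.environ) -> ProxySettings:
    """Hypothetical helper: read HEADROOM_* variables, falling back to documented defaults."""
    mode = env.get("HEADROOM_MODE", "optimize")
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    return ProxySettings(
        host=env.get("HEADROOM_HOST", "127.0.0.1"),
        port=int(env.get("HEADROOM_PORT", "8787")),
        mode=mode,
        backend=env.get("HEADROOM_BACKEND", "anthropic"),
        region=env.get("HEADROOM_REGION"),  # None when unset, matching the table default
    )


if __name__ == "__main__":
    print(resolve_proxy_settings({"HEADROOM_PORT": "9999", "HEADROOM_MODE": "audit"}))
```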

### `headroom perf`

Analyzes proxy performance from logs.

```bash
headroom perf [OPTIONS]
```

| Option | Default | Description |
|--------|---------|-------------|
| `--hours` | `168` (7 days) | Analyze logs from last N hours |
| `--raw` | — | Show raw PERF records instead of formatted report |

Source: [headroom/cli/perf.py](headroom/cli/perf.py)

### `headroom wrap <tool>`

Launches AI coding tools with the proxy automatically configured. Available wrappers:

| Tool | Command |
|------|---------|
| Claude Code | `headroom wrap claude` |
| GitHub Copilot CLI | `headroom wrap copilot` |
| OpenAI Codex | `headroom wrap codex` |
| Aider | `headroom wrap aider` |
| Cursor | `headroom wrap cursor` |
| Goose | `headroom wrap goose` |
| OpenHands | `headroom wrap openhands` |
| Continue | `headroom wrap continue` |
| OpenClaw | `headroom wrap openclaw` |

Each wrap command shares common options:

| Option | Description |
|--------|-------------|
| `--port`, `-p` | Proxy port (default: 8787) |
| `--no-context-tool` / `--no-rtk` | Skip CLI context-tool setup |
| `--no-proxy` | Skip proxy startup (use existing) |
| `--no-mcp` | Skip headroom MCP server registration |
| `--no-serena` | Skip Serena MCP server registration |
| `--code-graph` | Enable code graph indexing via codebase-memory-mcp |
| `--memory` | Enable persistent cross-session memory |
| `--learn` | Enable live traffic learning |
| `--backend` | API backend selection |
| `--anyllm-provider` | Provider for any-llm backend |
| `--region` | Cloud region for Bedrock/Vertex |
| `--verbose`, `-v` | Verbose output |
| `--prepare-only` | Prepare environment without launching tool |

Source: [headroom/cli/wrap.py](headroom/cli/wrap.py)

## Configuration

### Proxy Modes

| Mode | Description |
|------|-------------|
| `audit` | Log requests/responses without modification |
| `optimize` | Full compression and optimization enabled |
| `simulate` | Preview compression effects without API calls |
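
One way to read the table is as two independent switches: whether compression transforms run, and whether the request reaches the upstream provider. The sketch below encodes that reading. It is an interpretation of the table's wording only; `ModeBehavior` and `MODE_BEHAVIOR` are hypothetical names, and the real proxy may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeBehavior:
    compresses: bool      # are compression transforms applied to the request?
    calls_upstream: bool  # is the request forwarded to the LLM provider?


# Hypothetical mapping derived from the mode descriptions in the table.
MODE_BEHAVIOR = {
    "audit": ModeBehavior(compresses=False, calls_upstream=True),
    "optimize": ModeBehavior(compresses=True, calls_upstream=True),
    "simulate": ModeBehavior(compresses=True, calls_upstream=False),
}


def describe(mode: str) -> str:
    """Summarize a mode; raises KeyError for an unknown mode name."""
    behavior = MODE_BEHAVIOR[mode]
    compress = "compresses" if behavior.compresses else "does not modify"
    forward = "forwards to" if behavior.calls_upstream else "does not call"
    return f"{mode}: {compress} requests and {forward} the upstream API"


if __name__ == "__main__":
    for name in MODE_BEHAVIOR:
        print(describe(name))
```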

### TypeScript Configuration Types

```typescript
export type HeadroomMode = "audit" | "optimize" | "simulate";

export interface CompressionProfile {
  cacheAligner?: CacheAlignerConfig;
  rollingWindow?: RollingWindowConfig;
  scoringWeights?: ScoringWeights;
  intelligentContext?: IntelligentContextConfig;
  smartCrusher?: SmartCrusherConfig;
  cacheOptimizer?: CacheOptimizerConfig;
  ccr?: CCRConfig;
  prefixFreeze?: PrefixFreezeConfig;
}
```

Source: [sdk/typescript/src/types/config.ts](sdk/typescript/src/types/config.ts)

### HeadroomConfig Interface

```typescript
export interface HeadroomConfig {
  mode?: HeadroomMode;
  optimize?: boolean;
  cacheEnabled?: boolean;
  rateLimitEnabled?: boolean;
  profile?: CompressionProfile;
  toolCrusher?: ToolCrusherConfig;
  memory?: MemoryConfig;
  extensions?: string[];
}
```

Source: [sdk/typescript/src/types/config.ts](sdk/typescript/src/types/config.ts)

## Integration Patterns

### SDK Integration

#### Python

```python
from headroom import compress

result = compress(messages, model="claude-sonnet-4-20250514")
```

#### TypeScript

```typescript
import { compress } from 'headroom-ai';

const result = await compress(messages, { model: 'claude-sonnet-4-20250514' });
```

Source: [README.md]()

### SDK Wrapper Integration

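```typescript
import Anthropic from '@anthropic-ai/sdk';
import OpenAI from 'openai';
import { withHeadroom } from 'headroom-ai';

// Wrap the Anthropic SDK client
const anthropic = withHeadroom(new Anthropic());

// Wrap the OpenAI SDK client
const openai = withHeadroom(new OpenAI());
```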

Source: [README.md]()

### Vercel AI SDK Integration

```typescript
import { wrapLanguageModel } from 'ai';
import { headroomMiddleware } from 'headroom-ai';

const model = wrapLanguageModel({
  model: yourModel,
  middleware: headroomMiddleware(),
});
```

Source: [README.md]()

### LiteLLM Integration

```python
import litellm
from headroom.integrations.litellm import HeadroomCallback

litellm.callbacks = [HeadroomCallback()]
```

### LangChain Integration

Headroom also integrates with LangChain through a callback-based hook. Install the `headroom-ai[langchain]` extra to pull in the integration.

## Supported API Routes

The proxy routes traffic to different upstream providers:

| Route | Upstream Target |
|-------|-----------------|
| `/v1/messages` | Anthropic API |
| `/v1/chat/completions` | OpenAI API |
| `/v1/responses` | OpenAI API (HTTP + WebSocket) |
| `/v1internal:streamGenerateContent` | CloudCode API |

Source: [headroom/cli/proxy.py](headroom/cli/proxy.py)
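
The routing table can be pictured as a simple path lookup. The following sketch is illustrative only: `ROUTE_UPSTREAMS` and `upstream_for` are hypothetical names, the upstream labels are copied from the table, and the proxy's real matching logic (for example prefix matching or WebSocket upgrades) is not shown.

```python
from typing import Optional

# Route paths and upstream labels copied from the table above.
ROUTE_UPSTREAMS = {
    "/v1/messages": "Anthropic API",
    "/v1/chat/completions": "OpenAI API",
    "/v1/responses": "OpenAI API",
    "/v1internal:streamGenerateContent": "CloudCode API",
}


def upstream_for(path: str) -> Optional[str]:
    """Return the upstream label for a request path, or None if it is not a listed route."""
    route = path.split("?", 1)[0]  # ignore any query string
    return ROUTE_UPSTREAMS.get(route)


if __name__ == "__main__":
    for candidate in ("/v1/messages?beta=true", "/v1/chat/completions", "/mcp"):
        print(candidate, "->", upstream_for(candidate))
```

Under this reading, a path such as `/mcp` has no entry in the table, which is consistent with the proxy not serving an MCP endpoint over HTTP.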

## Installation

### Python Package

```bash
# Full installation
pip install "headroom-ai[all]"

# Granular extras
pip install "headroom-ai[proxy]"    # Proxy only
pip install "headroom-ai[mcp]"      # MCP support
pip install "headroom-ai[ml]"       # Kompress-base ML
pip install "headroom-ai[agno]"     # Agno framework
pip install "headroom-ai[langchain]" # LangChain integration
pip install "headroom-ai[evals]"     # Evaluation tools
```

Requires Python 3.10+.

### Docker

```bash
docker pull ghcr.io/chopratejas/headroom:latest
```

### npm / TypeScript

```bash
npm install headroom-ai
```

Source: [README.md]()

## Known Limitations and Issues

### MCP Endpoint Unavailability

The proxy does not expose an HTTP MCP endpoint at `/mcp`. The MCP server functionality requires stdio-based communication, not HTTP routing. Users should use `headroom mcp install` for MCP integration rather than expecting HTTP endpoint access through the proxy.

Source: [GitHub Issue #460](https://github.com/chopratejas/headroom/issues/460)

### CCR in Multi-Agent Threads

The `_append_context_to_latest_non_frozen_user_turn()` function injects proactive expansion blocks into the latest user message content. In multi-agent setups where messages contain structured XML attribution markup (`<peer_turn from="AgentX">`), injected blocks may corrupt message attribution.

Source: [GitHub Issue #503](https://github.com/chopratejas/headroom/issues/503)

### Provider-Agnostic Limitations

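Issue #510 requests a provider-agnostic proxy mode. As reported there, the proxy intercepts traffic at the Anthropic API level (`/v1/messages`), so users whose LLM traffic goes through provider-specific SDKs with different authentication mechanisms (SigV4 for Bedrock, API keys for OpenAI) may be unable to route it through the proxy. This is narrower than a blanket lack of support: the `--backend` and `--region` options and the `/v1/chat/completions` route documented above target some of these providers.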

Source: [GitHub Issue #510](https://github.com/chopratejas/headroom/issues/510)

## Proxy Extensions

Proxy extensions provide integration points for ASGI middleware, custom routes, and startup policy:

```bash
headroom proxy --proxy-extension my-extension --proxy-extension another-extension
```

Use `--proxy-extension '*'` to enable all discovered extensions.

Source: [headroom/cli/proxy.py](headroom/cli/proxy.py)

## Memory and Learning

### Persistent Memory

Enable cross-session memory with the `--memory` flag:

```bash
headroom proxy --memory
```

Memory storage is per-project to prevent cross-project memory bleeding (fixed in v0.21.34).

### Live Traffic Learning

Enable pattern learning from agent failures:

```bash
headroom proxy --learn
headroom wrap claude --learn
```

Patterns are saved to `AGENTS.md` and used to improve future compression decisions.

## Exports Reference

The TypeScript SDK exports the following proxy-related types and functions:

```typescript
export type {
  HeadroomMode,
  RelevanceTier,
  ContentType,
  BlockKind,
  CompressionProfile,
  HeadroomConfig,
  WasteSignals,
  CachePrefixMetrics,
  TransformDiff,
  RequestMetrics,
  ProxyStats,
} from "./types/config.js";

export type {
  MetricsSummary,
  HealthStatus,
  ProxyStats,
  MemoryUsage,
} from "./types/models.js";
```

Source: [sdk/typescript/src/index.ts](sdk/typescript/src/index.ts)

---

<a id='cli-wrappers'></a>

## CLI Wrappers

### Related Pages

Related topics: [Proxy Deployment](#proxy-deployment)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [headroom/cli/wrap.py](https://github.com/chopratejas/headroom/blob/main/headroom/cli/wrap.py)
- [headroom/providers/claude/runtime.py](https://github.com/chopratejas/headroom/blob/main/headroom/providers/claude/runtime.py)
- [headroom/providers/codex/runtime.py](https://github.com/chopratejas/headroom/blob/main/headroom/providers/codex/runtime.py)
- [headroom/providers/cursor/runtime.py](https://github.com/chopratejas/headroom/blob/main/headroom/providers/cursor/runtime.py)
- [plugins/openclaw/src/index.ts](https://github.com/chopratejas/headroom/blob/main/plugins/openclaw/src/index.ts)
</details>

# CLI Wrappers

## Overview

CLI Wrappers (`headroom wrap`) are the primary entry point for integrating Headroom's context compression with standalone AI coding assistants. They automate the setup of the Headroom proxy, MCP servers, CLI context tools (RTK or lean-ctx), and memory integration—eliminating manual configuration for supported tools.

The wrapper system acts as a launch orchestrator that:

1. Starts the Headroom proxy server on a configurable port
2. Configures the target CLI tool to route API calls through the proxy
3. Registers MCP servers for compression marker retrieval
4. Injects context tool instructions into the CLI's configuration files
5. Optionally enables persistent cross-session memory

**Source:** [headroom/cli/wrap.py:1-100]()
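
The numbered responsibilities above can be sketched as a small planning function that turns wrapper flags into an ordered list of steps. This is a hypothetical illustration: `WrapOptions` and `plan_wrap_steps` do not exist in Headroom, the step order simply mirrors the numbered list, and the actual execution order in `headroom/cli/wrap.py` may differ.

```python
from dataclasses import dataclass


@dataclass
class WrapOptions:
    """Hypothetical subset of wrap flags; field names mirror the CLI options."""

    port: int = 8787
    no_proxy: bool = False
    no_mcp: bool = False
    no_context_tool: bool = False
    memory: bool = False


def plan_wrap_steps(tool: str, opts: WrapOptions) -> list[str]:
    """Return the setup steps the wrapper would perform, in list order."""
    steps: list[str] = []
    if not opts.no_proxy:
        steps.append(f"start proxy on port {opts.port}")
    steps.append(f"route {tool} API calls through the proxy")
    if not opts.no_mcp:
        steps.append("register MCP servers")
    if not opts.no_context_tool:
        steps.append("inject context tool instructions")
    if opts.memory:
        steps.append("enable persistent memory")
    return steps


if __name__ == "__main__":
    for step in plan_wrap_steps("claude", WrapOptions(memory=True)):
        print("-", step)
```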

## Supported CLI Tools

| Tool | Command | Supported Options |
|------|---------|-------------------|
| Claude Code | `headroom wrap claude` | `--memory`, `--resume`, `--model`, `--code-graph`, `--no-context-tool`, `--no-mcp`, `--no-serena` |
| OpenAI Codex | `headroom wrap codex` | `--port`, `--backend`, `--anyllm-provider`, `--no-context-tool`, `--no-mcp`, `--no-serena`, `--no-proxy` |
| Continue | `headroom wrap continue` | `--config`, `--memory`, `--no-rtk`, `--no-proxy`, `--learn` |
| Goose | `headroom wrap goose` | Standard wrap options |
| OpenHands | `headroom wrap openhands` | Standard wrap options |
| Cursor | `headroom wrap cursor` | Standard wrap options |

**Source:** [headroom/cli/wrap.py:150-300]()

## Architecture

```mermaid
graph TD
    A["headroom wrap <tool>"] --> B[Parse CLI Arguments]
    B --> C{prepare_only flag?}
    C -->|Yes| D[Setup Context Tool Only]
    C -->|No| E[Snapshot Pre-Wrap Config]
    E --> F[Setup Context Tool]
    F --> G[Register MCP Servers]
    G --> H[Start Headroom Proxy]
    H --> I[Inject Config Into CLI]
    I --> J[Launch Target CLI Tool]
    J --> K[Monitor & Forward Traffic]
    
    L[Proxy Server] <--> M[Compression Engine]
    M --> N[CacheAligner]
    M --> O[SmartCrusher]
    M --> P[CCR Markers]
    
    K --> L
    P --> Q[MCP Retrieve Tool]
    Q --> R[LLM Retrieval on Demand]
```

### Component Responsibilities

| Component | File | Role |
|-----------|------|------|
| Wrap Command Dispatcher | `headroom/cli/wrap.py` | Parses arguments, routes to provider runtime |
| Claude Runtime | `headroom/providers/claude/runtime.py` | Claude Code specific setup and lifecycle |
| Codex Runtime | `headroom/providers/codex/runtime.py` | OpenAI Codex specific setup |
| MCP Registry | `headroom/mcp_registry/` | MCP server registration for all tools |
| Proxy Manager | `plugins/openclaw/src/index.ts` | Cross-platform proxy command resolution |

**Source:** [headroom/cli/wrap.py:200-280]()

## Common Command Options

### Proxy Configuration

| Option | Environment Variable | Description |
|--------|---------------------|-------------|
| `--port <n>` | `HEADROOM_PORT` | Proxy listen port (default: 8787) |
| `--backend <backend>` | `HEADROOM_BACKEND` | API backend: `anthropic`, `anyllm`, `litellm-vertex` |
| `--anyllm-provider <provider>` | `HEADROOM_ANYLLM_PROVIDER` | Provider for any-llm: `openai`, `mistral`, `groq` |
| `--region <region>` | `HEADROOM_REGION` | Cloud region for Bedrock/Vertex |
| `--no-proxy` | - | Use existing proxy instead of starting new one |

**Source:** [headroom/cli/wrap.py:220-260]()
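
When launching a wrapper from a script, the options in this table translate into an argument vector. The sketch below assembles one without invoking anything. `build_wrap_command` is a hypothetical helper, not part of Headroom; the flag names come from the table, and the `--` separator follows the usage shown in the wrapper examples.

```python
import shlex
from typing import Optional, Sequence


def build_wrap_command(
    tool: str,
    *,
    port: Optional[int] = None,
    backend: Optional[str] = None,
    anyllm_provider: Optional[str] = None,
    region: Optional[str] = None,
    no_proxy: bool = False,
    tool_args: Sequence[str] = (),
) -> list[str]:
    """Build the argv for `headroom wrap <tool>`; unset options are omitted."""
    command = ["headroom", "wrap", tool]
    valued_flags = (
        ("--port", port),
        ("--backend", backend),
        ("--anyllm-provider", anyllm_provider),
        ("--region", region),
    )
    for flag, value in valued_flags:
        if value is not None:
            command += [flag, str(value)]
    if no_proxy:
        command.append("--no-proxy")
    if tool_args:
        command += ["--", *tool_args]  # everything after `--` goes to the wrapped tool
    return command


if __name__ == "__main__":
    print(shlex.join(build_wrap_command("codex", port=9999, tool_args=["fix the bug"])))
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting problems with prompts that contain spaces.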

### Context Tool Options

| Option | Description |
|--------|-------------|
| `--no-context-tool` / `--no-rtk` | Skip CLI context-tool setup (RTK or lean-ctx) |
| `--learn` | Enable live traffic learning, patterns saved to AGENTS.md |

### MCP Integration Options

| Option | Description |
|--------|-------------|
| `--no-mcp` | Skip headroom MCP server registration |
| `--no-serena` | Skip Serena MCP server registration |

### Memory Options

| Option | Description |
|--------|-------------|
| `--memory` | Enable persistent cross-session memory |
| `--resume <id>` | Resume a specific memory session (Claude-specific) |

**Source:** [headroom/cli/wrap.py:260-320]()

## Claude Code Wrapper

The `headroom wrap claude` command provides deep integration with Anthropic's Claude Code CLI.

```bash
# Basic usage
headroom wrap claude

# With persistent memory
headroom wrap claude --memory

# Resume a session
headroom wrap claude --resume <session-id>

# Pass arguments to Claude
headroom wrap claude -- "fix the bug"

# With code graph intelligence
headroom wrap claude --code-graph

# Skip context tool setup
headroom wrap claude --no-context-tool
```

### Claude-Specific Setup Flow

```mermaid
sequenceDiagram
    participant User
    participant CLI as headroom wrap claude
    participant RTK as RTK/lean-ctx
    participant Config as Claude Config
    participant Proxy as Headroom Proxy
    participant MCP as MCP Server
    
    User->>CLI: headroom wrap claude --memory
    CLI->>Config: Snapshot pre-wrap state
    CLI->>RTK: Setup context tool
    RTK->>Config: Inject instructions into CLAUDE.md
    CLI->>MCP: Register headroom MCP server
    CLI->>MCP: Register Serena MCP server
    CLI->>Proxy: Start proxy on port 8787
    CLI->>Config: Set ANTHROPIC_BASE_URL to proxy
    CLI->>User: Launch Claude Code
```

**Source:** [headroom/providers/claude/runtime.py:1-150]()

## Codex Wrapper

The `headroom wrap codex` command integrates with the OpenAI Codex CLI.

```bash
# Basic usage
headroom wrap codex

# Custom proxy port
headroom wrap codex --port 9999

# Pass prompt to codex
headroom wrap codex -- "fix the bug"

# With specific backend
headroom wrap codex --backend anyllm --anyllm-provider groq

# Skip all tool registration
headroom wrap codex --no-context-tool --no-mcp --no-serena
```

### Codex Configuration Handling

The wrapper snapshots `~/.codex/config.toml` before any modifications, ensuring `headroom unwrap codex` can restore the original state byte-for-byte.

```python
# Snapshot happens BEFORE MCP install
_codex_config_file, _codex_backup_file = _codex_config_paths()
_snapshot_codex_config_if_unwrapped(_codex_config_file, _codex_backup_file)
```

**Source:** [headroom/cli/wrap.py:60-100]()
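
The snapshot-then-restore idea can be sketched with the standard library. This is an assumption-laden illustration rather than the code in `wrap.py`: `snapshot_if_missing` and `restore_snapshot` are hypothetical names, and the sketch assumes that "if unwrapped" means "only when no backup exists yet", so that wrapping twice does not overwrite the pristine copy.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_if_missing(config: Path, backup: Path) -> bool:
    """Copy config to backup unless a backup already exists; True if a snapshot was taken."""
    if backup.exists() or not config.exists():
        return False
    shutil.copyfile(config, backup)  # byte-for-byte copy of the file contents
    return True


def restore_snapshot(config: Path, backup: Path) -> bool:
    """Overwrite config with the backup and delete the backup; True if restored."""
    if not backup.exists():
        return False
    shutil.copyfile(backup, config)
    backup.unlink()
    return True


def demo() -> bool:
    """Wrap twice, then unwrap, and check the original bytes come back."""
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "config.toml"
        backup = Path(tmp) / "config.toml.bak"
        original = b'model = "example"\n'
        config.write_bytes(original)
        snapshot_if_missing(config, backup)
        config.write_bytes(original + b"# wrapper edits\n")
        snapshot_if_missing(config, backup)  # second wrap must not clobber the backup
        restore_snapshot(config, backup)
        return config.read_bytes() == original


if __name__ == "__main__":
    print("restored byte-for-byte:", demo())
```

Taking the snapshot before any other step matters because later steps, such as MCP registration, also write to the same file.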

## Continue IDE Wrapper

The `headroom wrap continue` command configures the Continue VS Code/JetBrains extension.

```bash
# Basic usage
headroom wrap continue

# With custom config path
headroom wrap continue --config .continue/config.json

# Enable learning
headroom wrap continue --learn
```

### Continue Configuration Injection

The wrapper injects RTK guidance into both top-level and per-model `systemMessage` fields:

```python
# Non-string systemMessage values are NEVER overwritten.
# Only string values get the RTK marker injected.
if isinstance(existing_value, str):
    ...  # Append RTK instructions (body elided in this excerpt)
```

**Source:** [headroom/cli/wrap.py:400-500]()
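
To make the string-only rule concrete, here is a hedged sketch of injecting guidance into a Continue-style config dictionary. The `MARKER` and `GUIDANCE` strings and the `inject_guidance` function are hypothetical, the config shape (a top-level `systemMessage` plus a `models` list) is an assumption, and absent `systemMessage` keys are simply left alone because the source does not say how they are handled.

```python
import copy

MARKER = "<!-- rtk-guidance -->"  # hypothetical marker used to detect prior injection
GUIDANCE = MARKER + "\nPrefer rtk-wrapped shell commands."  # hypothetical guidance text


def _inject(container: dict) -> None:
    """Append guidance to container['systemMessage'] only if it is a string without the marker."""
    value = container.get("systemMessage")
    if isinstance(value, str) and MARKER not in value:
        container["systemMessage"] = value + "\n\n" + GUIDANCE


def inject_guidance(config: dict) -> dict:
    """Return a copy of config with guidance added at the top level and per model."""
    updated = copy.deepcopy(config)  # never mutate the caller's config
    _inject(updated)
    for model in updated.get("models") or []:
        if isinstance(model, dict):
            _inject(model)
    return updated


if __name__ == "__main__":
    sample = {
        "systemMessage": "Be concise.",
        "models": [
            {"title": "a", "systemMessage": {"role": "structured"}},  # non-string: untouched
            {"title": "b", "systemMessage": "Answer in English."},
        ],
    }
    print(inject_guidance(sample))
```

Checking for the marker before appending keeps the operation idempotent, so running the wrapper repeatedly does not stack duplicate instructions.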

## Context Tool Integration

### RTK (Default)

The default context tool uses [RTK](https://github.com/rtk-ai/rtk) for shell output rewriting.

| Command Category | Commands | Typical Savings |
|-----------------|----------|-----------------|
| Git | `rtk git diff`, `rtk git log` | 40-60% |
| Files & Search | `rtk ls`, `rtk read`, `rtk grep` | 60-75% |
| Testing | `rtk pytest`, `rtk cargo test` | 90-99% |
| Build & Lint | `rtk tsc`, `rtk lint`, `rtk ruff check` | 80-90% |
| Infrastructure | `rtk docker ps`, `rtk kubectl get` | 85% |

### lean-ctx Alternative

Set `HEADROOM_CONTEXT_TOOL=lean-ctx` before running wrap commands to use [lean-ctx](https://github.com/yvgude/lean-ctx) instead of RTK.

**Source:** [headroom/cli/wrap.py:500-600]()
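
Selecting between the two context tools can be sketched as a small environment lookup. `select_context_tool` is a hypothetical helper; the variable name and its two accepted values come from this manual, while the RTK default follows the "RTK (Default)" heading and the `disabled` parameter stands in for `--no-context-tool`.

```python
import os
from typing import Mapping, Optional

SUPPORTED_CONTEXT_TOOLS = ("rtk", "lean-ctx")


def select_context_tool(
    env: Mapping[str, str] = os.environ, disabled: bool = False
) -> Optional[str]:
    """Return the context tool to set up, or None when setup is skipped."""
    if disabled:  # corresponds to passing --no-context-tool / --no-rtk
        return None
    choice = env.get("HEADROOM_CONTEXT_TOOL", "rtk").strip().lower()
    if choice not in SUPPORTED_CONTEXT_TOOLS:
        raise ValueError(f"unsupported context tool: {choice!r}")
    return choice


if __name__ == "__main__":
    print(select_context_tool({"HEADROOM_CONTEXT_TOOL": "lean-ctx"}))
```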

## MCP Server Registration

CLI wrappers automatically register MCP servers that enable on-demand decompression of CCR markers.

```mermaid
graph LR
    A[Compressed Content<br/>with CCR Markers] --> B[headroom_retrieve MCP Tool]
    B --> C[Headroom Proxy]
    C --> D[Decompressed Original]
    D --> E[LLM Processing]
```

### Supported MCP Tools

| Tool | Purpose | Registration |
|------|---------|--------------|
| headroom | Primary compression retrieval | Auto-registered in CLI config |
| serena | Additional context handling | Auto-registered unless `--no-serena` |

**Source:** [headroom/cli/wrap.py:100-150]()

## Memory Integration

When `--memory` is enabled, the wrapper:

1. Syncs Headroom's memory database with the CLI's conversation files
2. Enables cross-session context persistence
3. Registers memory-specific MCP tools

```python
# Memory is automatically synced before proxy startup
if memory:
    _memory_sync(proxy_holder, port)
```

**Source:** [headroom/providers/claude/runtime.py:200-250]()

## Cleanup and Unwrap

CLI wrappers handle graceful cleanup on SIGINT/SIGTERM:

1. Restore original CLI configuration files
2. Stop the proxy server
3. Remove MCP server registrations

```python
import signal

cleanup = _make_cleanup(proxy_holder, port)
# SIGINT and SIGTERM are registered with different handlers
signal.signal(signal.SIGINT, _ignore_child_sigint)
signal.signal(signal.SIGTERM, cleanup)
```

**Source:** [headroom/cli/wrap.py:300-350]()
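
A cleanup handler of this kind is usually written so that it is safe to call more than once, since a signal can arrive while normal shutdown is already in progress. The sketch below shows that pattern with the standard `signal` module. `make_cleanup` and `install_sigterm_handler` are hypothetical names, and the three actions merely mirror the numbered list; this is not the body of `_make_cleanup`.

```python
import signal
from typing import Callable, Iterable


def make_cleanup(actions: Iterable[Callable[[], None]]):
    """Return a signal-compatible handler that runs each action at most once, in order."""
    pending = list(actions)

    def cleanup(signum=None, frame=None):
        # Pop before calling so a repeated signal never re-runs a finished action.
        while pending:
            action = pending.pop(0)
            action()

    return cleanup


def install_sigterm_handler(cleanup):
    """Register cleanup for SIGTERM; must be called from the main thread."""
    return signal.signal(signal.SIGTERM, cleanup)


if __name__ == "__main__":
    log: list[str] = []
    handler = make_cleanup(
        [
            lambda: log.append("restore CLI config"),
            lambda: log.append("stop proxy"),
            lambda: log.append("remove MCP registrations"),
        ]
    )
    install_sigterm_handler(handler)
    handler()  # normal shutdown path
    handler(signal.SIGTERM, None)  # a late signal is a no-op
    print(log)
```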

## Known Issues and Limitations

### MCP Endpoint Unavailability

The MCP docs currently imply that `headroom proxy` can be used as an HTTP MCP endpoint at `/mcp`, but the installed package returns `404` for this endpoint while the stdio MCP server works correctly.

> See: [Issue #460](https://github.com/chopratejas/headroom/issues/460) - docs: clarify MCP setup when proxy /mcp is unavailable

### CCR in Multi-Agent Threads

When using CCR (Compress-Cache-Retrieve) with multi-agent setups, proactive expansion blocks can corrupt message attribution when injected into messages containing XML markup like `<peer_turn from="AgentX">`.

> See: [Issue #503](https://github.com/chopratejas/headroom/issues/503) - CCR proactive expansion blocks corrupt message attribution in multi-agent threads

## Related Community Features

### Feature Requests

- **#74**: [headroom wrap opencode — CLI wrapper for OpenCode](https://github.com/chopratejas/headroom/issues/74)
- **#76**: [OpenCode plugin — headroom-opencode npm package](https://github.com/chopratejas/headroom/issues/76)
- **#510**: [provider-agnostic proxy mode (Bedrock, OpenAI, Vertex)](https://github.com/chopratejas/headroom/issues/510)

## See Also

- [Headroom Proxy](headroom-proxy.md) - The compression proxy architecture
- [CCR (Compress-Cache-Retrieve)](context-compression-retrieval.md) - Reversible compression system
- [MCP Integration](mcp-integration.md) - Model Context Protocol setup
- [Memory System](memory-system.md) - Cross-session memory management

---


## Pitfall Log

Project: chopratejas/headroom

Summary: Found 15 structured pitfall item(s), including 2 high/blocking item(s). Top priority: Configuration risk - Configuration risk requires verification.

## 1. Configuration risk - Configuration risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_04817419db9f40abb9c953ce30494c44 | https://github.com/chopratejas/headroom/issues/517

## 2. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_7be4ca48f77a496cadb0a00d943bb95a | https://github.com/chopratejas/headroom/issues/488

## 3. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_46c97725a0304b659cfaa50b79312fcd | https://github.com/chopratejas/headroom/issues/525

## 4. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_109db57bc201482abc7bf318a0ee4792 | https://github.com/chopratejas/headroom/issues/460

## 5. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.host_targets | github_repo:1129940957 | https://github.com/chopratejas/headroom

## 6. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | github_repo:1129940957 | https://github.com/chopratejas/headroom

## 7. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:1129940957 | https://github.com/chopratejas/headroom

## 8. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | github_repo:1129940957 | https://github.com/chopratejas/headroom

## 9. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | github_repo:1129940957 | https://github.com/chopratejas/headroom

## 10. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_a0e9cf430514488eb093dc09b617e6ca | https://github.com/chopratejas/headroom/issues/520

## 11. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_9ccb556fe9cd431cba78c2ee3ebb27ea | https://github.com/chopratejas/headroom/issues/510

## 12. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_5ab74a3c022a4924998aaa72cc334c04 | https://github.com/chopratejas/headroom/issues/503

## 13. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_5f07fe5a304a4d13b4c219d359e69e4a | https://github.com/chopratejas/headroom/issues/260

## 14. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:1129940957 | https://github.com/chopratejas/headroom

## 15. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:1129940957 | https://github.com/chopratejas/headroom

<!-- canonical_name: chopratejas/headroom; human_manual_source: deepwiki_human_wiki -->