Doramagic Project Pack · Human Manual

headroom

Related topics: Getting Started, Architecture

Introduction

Headroom is a context compression framework designed to reduce token usage and costs when working with large language models (LLMs) in AI-assisted coding workflows. By intelligently compressing conversation history, tool outputs, and context before sending to the LLM, Headroom achieves 60–90% token savings while preserving critical information.

Overview

Headroom intercepts and optimizes AI traffic through multiple integration points:

| Integration Method | Use Case | Configuration |
|---|---|---|
| CLI Wrapper (headroom wrap) | Claude Code, Codex, Continue, Goose, OpenHands | headroom wrap claude |
| SDK Integration | Python/TypeScript applications | withHeadroom(new Anthropic()) |
| ASGI Middleware | Web applications | app.add_middleware(CompressionMiddleware) |
| MCP Server | Model Context Protocol clients | headroom mcp install |
| Proxy Server | Any HTTP-based LLM traffic | headroom proxy --port 8080 |

Source: README.md:1-25

Core Architecture

The Headroom pipeline exposes one stable request lifecycle across all integration methods:

Setup → Pre-Start → Post-Start → Input Received → Input Cached → Input Routed → Input Compressed → Input Remembered → Pre-Send → Post-Send → Response Received

Transform Components

| Component | Purpose | Savings |
|---|---|---|
| SmartCrusher | Universal JSON compression (arrays, nested objects, mixed types) | 40–70% |
| CodeCompressor | AST-aware compression for Python, JS, Go, Rust, Java, C++ | 50–80% |
| Kompress-base | HuggingFace model trained on agentic traces | 40–90% |
| CacheAligner | Stabilizes prefixes for Anthropic/OpenAI KV cache hits | Variable |
| IntelligentContext | Score-based context fitting with learned importance | 30–60% |
| CCR (Context Compression Retrieval) | Reversible compression with on-demand retrieval | 40–80% |

Source: README.md:45-60

Extension Seams

  • Pipeline extensions — observe or customize lifecycle stages via on_pipeline_event(...)
  • Compression hooks — additional extension points alongside the canonical lifecycle
  • Proxy extensions — server/app integration seam for ASGI middleware, routes, and startup policy

Source: README.md:55-58
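The sketch below makes the on_pipeline_event(...) seam concrete. It shows an observer that is notified once per lifecycle stage, in the documented order. The PipelineObserverRegistry class, the run_lifecycle helper, and the (stage, request) callback signature are hypothetical. Only the stage names come from this page.

```python
from typing import Callable, Dict, List

# Stage names, in the order of the documented request lifecycle.
LIFECYCLE_STAGES: List[str] = [
    "Setup", "Pre-Start", "Post-Start", "Input Received", "Input Cached",
    "Input Routed", "Input Compressed", "Input Remembered", "Pre-Send",
    "Post-Send", "Response Received",
]

# Hypothetical callback type: receives the stage name and the request dict.
PipelineCallback = Callable[[str, Dict[str, object]], None]


class PipelineObserverRegistry:
    """Hypothetical registry that fans lifecycle events out to observers."""

    def __init__(self) -> None:
        self._callbacks: List[PipelineCallback] = []

    def on_pipeline_event(self, callback: PipelineCallback) -> PipelineCallback:
        # Usable as a decorator; returns the callback unchanged.
        self._callbacks.append(callback)
        return callback

    def emit(self, stage: str, request: Dict[str, object]) -> None:
        for callback in self._callbacks:
            callback(stage, request)


def run_lifecycle(registry: PipelineObserverRegistry, request: Dict[str, object]) -> None:
    # Walk every stage in order so observers always see the same sequence.
    for stage in LIFECYCLE_STAGES:
        registry.emit(stage, request)


registry = PipelineObserverRegistry()
seen: List[str] = []


@registry.on_pipeline_event
def record_stage(stage: str, request: Dict[str, object]) -> None:
    seen.append(stage)


run_lifecycle(registry, {"messages": []})
print(len(seen), seen[0], seen[-1])  # 11 Setup Response Received
```

A decorator-based registry keeps observers decoupled from the stages they watch. Headroom's real extension API may pass richer event objects.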

CLI Wrappers

The headroom wrap command provides zero-configuration setup for popular AI coding assistants:

headroom wrap claude                    # Start everything
headroom wrap claude --memory           # With persistent memory
headroom wrap claude --resume <id>      # Resume a session
headroom wrap claude --code-graph       # With code graph intelligence
headroom wrap claude --no-context-tool  # Skip CLI context-tool setup

Supported Agents

| Agent | Command | Key Features |
|---|---|---|
| Claude Code | headroom wrap claude | Memory sync, MCP retrieve, Serena integration |
| Codex | headroom wrap codex | RTK injection, MCP registration, config snapshot |
| Continue | headroom wrap continue | Config.toml modification, systemMessage injection |
| Goose | headroom wrap goose | Independent session handling |
| OpenHands | headroom wrap openhands | Recent support (v0.22.4) |
| OpenCode | Planned | Feature request #74 |

Source: headroom/cli/wrap.py:1-50

RTK Context Tool Integration

Headroom integrates with RTK (Rewritten Tool Kit) for CLI output compression. Commands are prefixed with rtk to achieve 60–90% savings:

# Files & Search (60-75% savings)
rtk ls <path>           rtk read <file>         rtk grep <pattern>
rtk find <pattern>      rtk diff <file>

# Test (90-99% savings) — shows failures only
rtk pytest tests/       rtk cargo test          rtk test <cmd>

# Build & Lint (80-90% savings) — shows errors only
rtk tsc                 rtk lint                rtk cargo build
rtk prettier --check    rtk mypy                rtk ruff check

The RTK block is injected into agent configuration files with idempotent markers (<!-- headroom:rtk-instructions -->) to prevent duplicate insertions.

Source: headroom/cli/wrap.py:25-75
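Here is a minimal sketch of marker-based idempotent injection. It assumes the block is simply appended to the end of the target file. The inject_block helper is hypothetical, and Headroom's actual placement and file handling may differ. The marker string is the one quoted above.

```python
import tempfile
from pathlib import Path

# Marker quoted in the paragraph above.
RTK_MARKER = "<!-- headroom:rtk-instructions -->"


def inject_block(file_path: Path, marker: str, body: str) -> bool:
    """Append marker + body to file_path unless the marker is already present.

    Returns True if the file was modified, False if it was left untouched.
    """
    existing = file_path.read_text(encoding="utf-8") if file_path.exists() else ""
    if marker in existing:
        return False  # Already injected: skip to avoid a duplicate block.
    # Make sure the new block starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    file_path.write_text(existing + separator + marker + "\n" + body + "\n", encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "AGENTS.md"
    first = inject_block(target, RTK_MARKER, "Prefix shell commands with rtk.")
    second = inject_block(target, RTK_MARKER, "Prefix shell commands with rtk.")
    count = target.read_text(encoding="utf-8").count(RTK_MARKER)

print(first, second, count)  # True False 1
```

Because the check is just a substring search for the marker, running the wrapper repeatedly leaves exactly one copy of the block in the file.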

SDK Integration

Python SDK

from anthropic import Anthropic
from headroom import with_headroom

client = with_headroom(Anthropic())
response = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

TypeScript SDK

import { withHeadroom } from "@headroom/sdk";
import OpenAI from "openai";

// Wrap the official OpenAI client so its requests are compressed
const client = withHeadroom(new OpenAI({ apiKey: process.env.OPENAI_API_KEY }));

Other Framework Integrations

| Framework | Integration Method |
|---|---|
| OpenAI SDK | withHeadroom(new OpenAI()) |
| Vercel AI SDK | wrapLanguageModel({ model, middleware: headroomMiddleware() }) |
| LangChain | HeadroomChatModel(your_llm) |
| Agno | HeadroomAgnoModel(your_model) |
| Strands | See Strands guide |

Source: README.md:30-40

Memory System

Headroom provides cross-agent memory capabilities for persistent knowledge:

headroom wrap claude --memory  # Enable persistent cross-session memory

The memory system injects guidance into agent configuration:

## Memory

Use the `headroom_memory` MCP server for persistent cross-session knowledge.

**Before** answering questions about prior decisions, conventions, project context,
architecture, user preferences — call `memory_search` first.

**After** making durable decisions, discovering conventions — call `memory_save`.

Memory storage is per-project to prevent context bleeding between projects (fixed in v0.21.34).

Source: headroom/cli/wrap.py:200-220
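One way to isolate memories per project is to key storage on the resolved project directory. This is a sketch under stated assumptions. The project_memory_path helper, the SHA-256 naming scheme, and the memory.db file name are hypothetical and are not taken from Headroom's code.

```python
import hashlib
from pathlib import Path


def project_memory_path(project_dir: str, base_dir: str) -> Path:
    """Map a project directory to its own memory store (hypothetical layout)."""
    # Resolve so that relative and absolute spellings of one project agree.
    resolved = str(Path(project_dir).resolve())
    # A short, stable digest keeps the directory name filesystem-safe.
    digest = hashlib.sha256(resolved.encode("utf-8")).hexdigest()[:16]
    return Path(base_dir) / "projects" / digest / "memory.db"


api_store = project_memory_path("/work/api-service", "/tmp/headroom")
web_store = project_memory_path("/work/web-frontend", "/tmp/headroom")
print(api_store != web_store)  # True: the two projects never share a store
```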

Pipeline Lifecycle

graph TD
    A[Setup] --> B[Pre-Start]
    B --> C[Post-Start]
    C --> D[Input Received]
    D --> E[Input Cached]
    E --> F[Input Routed]
    F --> G[Input Compressed]
    G --> H[Input Remembered]
    H --> I[Pre-Send]
    I --> J[Post-Send]
    J --> K[Response Received]
    
    L[Transforms] --> F
    M[Extensions] --> D
    N[Hooks] --> G

Provider- and tool-specific behavior lives under headroom/providers/, so core orchestration stays focused on lifecycle, sequencing, and policy.

Source: README.md:50-65

Learning from Failures

The headroom learn command analyzes past tool call failures to generate preventive context:

headroom learn                        # Auto-detect agent & model
headroom learn --apply                # Write recommendations to context files
headroom learn --model gpt-4o         # Use specific model for analysis
headroom learn --all                  # Analyze all discovered projects

Plugin architecture supports multiple coding agents with built-in support for Claude Code, Codex, and Gemini CLI.

Source: headroom/cli/learn.py:1-50

Known Limitations

MCP Endpoint Availability

The headroom proxy command does not expose an HTTP MCP endpoint at /mcp. The stdio MCP server works correctly, but the HTTP endpoint returns 404. See issue #460 for details.

CCR Multi-Agent Attribution

In multi-agent setups, CCR proactive expansion may corrupt message attribution when injected into messages containing XML markup (<peer_turn from="AgentX">). See issue #503.

Provider-Agnostic Proxy

The proxy currently intercepts traffic at the Anthropic API level (/v1/messages). Users on AWS Bedrock, OpenAI, or Google Vertex cannot use the proxy due to provider-specific SDKs. See issue #510.

LiteLLM Security Concern

The litellm PyPI package was subject to a supply chain attack in version 1.82.8. See issue #56 for mitigation recommendations.

Source: README.md:1-30

Contributing

git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest

Devcontainers available in .devcontainer/ (default + memory-stack with Qdrant & Neo4j).

Source: README.md:100-105

Community Resources

| Resource | Link |
|---|---|
| Live Leaderboard | headroomlabs.ai/dashboard (60B+ tokens saved) |
| Discord | discord.gg/yRmaUNpsPJ |
| HuggingFace Model | huggingface.co/chopratejas/kompress-base |

Recent Releases

| Version | Date | Key Changes |
|---|---|---|
| v0.22.4 | Latest | wrap CLI breadth for cline, continue, goose, openhands |
| v0.22.2 | 2026-05-20 | Memory IDs exposure in auto-tail + memory_list tool |
| v0.22.0 | 2026-05-19 | --exclude-tools flag + HEADROOM_EXCLUDE_TOOLS env var |
| v0.21.34 | 2026-05-13 | Per-project memory storage (fixes #462) |
| v0.21.33 | 2026-05-13 | Narrow compressed type for mypy 1.14 compatibility |

Source: README.md:20-45

Source: https://github.com/chopratejas/headroom / Human Manual

Getting Started

Related topics: Introduction, Proxy Deployment

Headroom is a context compression platform for AI coding assistants that reduces token usage by 40-90% while preserving relevance. It intercepts LLM API traffic through a local proxy, compresses conversation context using ML-based transforms, and restores original content when needed via CCR (Context Compression & Retrieval) markers.

Prerequisites

Before installing Headroom, ensure you have:

| Requirement | Version/Details |
|---|---|
| Python | 3.10+ |
| API Key | Anthropic, OpenAI, or compatible provider |
| Supported OS | Linux, macOS, Windows |
| Package Manager | pip, uv, or conda |

Installation

Standard Installation

Install Headroom from PyPI, where it is published as headroom-ai:

pip install headroom-ai

Development Installation

For contributing or testing latest features:

git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest

Source: README.md:1-10

Dev Container

Headroom provides pre-configured devcontainers:

# Default devcontainer (basic Python development)
# .devcontainer/ directory

# Memory-stack devcontainer (with Qdrant & Neo4j)
# .devcontainer/memory-stack/

Quick Start with Claude

The fastest way to use Headroom is with the headroom wrap command, which starts the proxy and configures Claude Code automatically:

headroom wrap claude

This single command:

  1. Starts the Headroom proxy on the default port
  2. Configures Claude Code to route API traffic through the proxy
  3. Sets up the RTK context tool for efficient CLI output
  4. Registers the MCP retrieve tool for CCR decompression

Options

| Flag | Description |
|---|---|
| --memory | Enable persistent cross-session memory |
| --resume <id> | Resume a previous session |
| --no-context-tool | Skip RTK/lean-ctx CLI tool setup |
| --no-mcp | Skip MCP retrieve tool registration |
| --no-serena | Skip Serena MCP server registration |
| --code-graph | Enable code graph indexing via codebase-memory-mcp |
| --no-proxy | Use existing proxy instead of starting one |
| --learn | Enable live traffic learning (patterns saved to AGENTS.md) |
| --port <n> | Custom proxy port (default: 8080) |

Example with memory enabled:

headroom wrap claude --memory

Source: headroom/cli/wrap.py:1-50

Quick Start with Codex

Headroom also supports OpenAI's Codex CLI:

headroom wrap codex

For Codex-specific options:

| Flag | Description |
|---|---|
| --backend anyllm | Use any-llm backend |
| --anyllm-provider <provider> | Provider for any-llm: openai, mistral, groq, etc. |
| --region <region> | Cloud region for Bedrock/Vertex |

Source: headroom/cli/wrap.py:200-280

SDK Integration

Python SDK

Basic Usage

from headroom import Headroom

# Initialize with your API key
h = Headroom(api_key="sk-ant-...")

# Compress a prompt
result = h.compress("Your long prompt here...")
print(result.compressed)      # Compressed text
print(result.original_tokens) # Original token count
print(result.saved_tokens)    # Tokens saved

Streaming Responses

from headroom import Headroom

h = Headroom(api_key="sk-ant-...")

# Streaming with automatic compression
for chunk in h.stream("Your prompt"):
    print(chunk, end="", flush=True)

Integration with Anthropic SDK

from anthropic import Anthropic
from headroom import with_headroom

# Wrap any SDK client
client = with_headroom(Anthropic())

# All calls are automatically compressed
response = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Your prompt"}]
)

Source: headroom/cli/main.py:1-50

TypeScript SDK

Installation

npm install @headroom/sdk
# or
yarn add @headroom/sdk
# or
pnpm add @headroom/sdk

Basic Usage

import { Headroom } from "@headroom/sdk";

const headroom = new Headroom({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const result = await headroom.compress({
  content: "Your long prompt here...",
});

console.log(result.compressed);
console.log(`Saved ${result.savingsPercent.toFixed(0)}% tokens`);

Vercel AI SDK Middleware

import { generateText, wrapLanguageModel } from "ai";
import { headroomMiddleware } from "@headroom/sdk/middleware";

// yourModel: any AI SDK language model instance
const result = await generateText({
  model: wrapLanguageModel({ model: yourModel, middleware: headroomMiddleware() }),
  prompt: "Your prompt",
});

Shared Context (Multi-Agent)

import { SharedContext } from "@headroom/sdk";

const ctx = new SharedContext({
  projectId: "my-agent-team",
});

// Agent 1: Researcher
await ctx.put("k8s-scaling-research", researchData);

// Agent 2: Writer (reads compressed context)
const compressed = await ctx.get("k8s-scaling-research");
console.log(`Reading compressed context (${compressed?.length ?? 0} chars)`);

// Stats
const stats = ctx.stats();
console.log(`Total saved: ${stats.totalTokensSaved} (${stats.savingsPercent.toFixed(0)}%)`);

Source: sdk/typescript/src/index.ts:1-80

Configuration

Environment Variables

| Variable | Description | Default |
|---|---|---|
| HEADROOM_API_KEY | API key for LLM provider | Required |
| HEADROOM_PROXY_PORT | Proxy port | 8080 |
| HEADROOM_BACKEND | Backend type: anthropic, anyllm, litellm-vertex | anthropic |
| HEADROOM_ANYLLM_PROVIDER | Provider for any-llm backend | - |
| HEADROOM_REGION | Cloud region for Bedrock/Vertex | - |
| HEADROOM_EXCLUDE_TOOLS | Comma-separated tool names to exclude | - |
| HEADROOM_CONTEXT_TOOL | CLI context tool: rtk, lean-ctx | rtk |
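Here is a minimal sketch of how these variables could be parsed. The defaults mirror the table above. The ProxySettings class and the settings_from_env function are hypothetical, not Headroom's own configuration loader.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping


@dataclass
class ProxySettings:
    port: int = 8080
    backend: str = "anthropic"
    context_tool: str = "rtk"
    exclude_tools: List[str] = field(default_factory=list)


def settings_from_env(env: Mapping[str, str] = os.environ) -> ProxySettings:
    """Build settings from HEADROOM_* variables (hypothetical loader)."""
    raw_excludes = env.get("HEADROOM_EXCLUDE_TOOLS", "")
    return ProxySettings(
        port=int(env.get("HEADROOM_PROXY_PORT", "8080")),
        backend=env.get("HEADROOM_BACKEND", "anthropic"),
        context_tool=env.get("HEADROOM_CONTEXT_TOOL", "rtk"),
        # Comma-separated list: strip whitespace and drop empty entries.
        exclude_tools=[name.strip() for name in raw_excludes.split(",") if name.strip()],
    )


s = settings_from_env({"HEADROOM_PROXY_PORT": "9000", "HEADROOM_EXCLUDE_TOOLS": "Bash, WebFetch,"})
print(s.port, s.backend, s.exclude_tools)  # 9000 anthropic ['Bash', 'WebFetch']
```

Passing the environment in as a mapping keeps the loader easy to test without touching the real process environment.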

Compression Profiles

Headroom supports multiple compression strategies:

| Profile | Description | Typical Savings |
|---|---|---|
| balanced | Default profile, good for most use cases | 50-70% |
| aggressive | Maximum compression, may lose some detail | 70-90% |
| conservative | Minimal compression, preserves more detail | 30-50% |
| custom | User-defined weights for different transform types | Varies |

Compression Transforms

Headroom uses multiple compression transforms:

| Transform | Purpose | Savings |
|---|---|---|
| SmartCrusher | Universal JSON compression (arrays, dicts, nested objects) | 40-60% |
| CodeCompressor | AST-aware compression for Python, JS, Go, Rust, Java, C++ | 60-75% |
| Kompress-base | HuggingFace model trained on agentic traces | 40-90% |
| CacheAligner | Stabilizes prefixes for KV cache efficiency | Variable |
| IntelligentContext | Score-based context fitting | 30-50% |
| CCR | Reversible compression with on-demand retrieval | 50-80% |

Source: README.md:50-100

MCP Server Setup

Model Context Protocol (MCP) enables Headroom to retrieve compressed content when needed.

Installation

headroom mcp install

Available MCP Tools

| Tool | Description |
|---|---|
| headroom_retrieve | Retrieves original content for CCR markers |
| headroom_stats | Shows compression statistics |
| headroom_memory_search | Search persistent memory (requires --memory) |
| headroom_memory_save | Save to persistent memory (requires --memory) |

Note on MCP Endpoint

Important: The headroom proxy command does not expose an HTTP MCP endpoint at /mcp. The MCP server uses stdio transport and must be configured in your IDE/editor's MCP settings. See Issue #460 for details.

Source: headroom/cli/main.py:100-150

Memory System

Headroom provides persistent cross-session memory using vector storage:

Enable Memory

headroom wrap claude --memory

Memory Features

  • Per-project storage: Memories are isolated per project directory
  • Auto-dedup: Duplicate memories are automatically filtered
  • Agent provenance: Tracks which agent saved each memory
  • Semantic search: Query past decisions, conventions, and context

Usage in Claude

When memory is enabled, Claude Code automatically:

  1. Searches memory before answering questions about past decisions
  2. Saves important facts discovered during the session
  3. References project context from previous sessions

Source: headroom/cli/wrap.py:300-350

Learning System

The headroom learn command analyzes past tool call failures in your agent sessions and can write preventive recommendations to context files:

headroom learn --project /path/to/project --apply

Options

| Flag | Description |
|---|---|
| --project <path> | Project directory to analyze |
| --all | Analyze all discovered projects |
| --apply | Write recommendations to context/memory files |
| --agent <name> | Specific agent to analyze (claude, codex, gemini, auto) |
| --model <model> | LLM model for analysis |
| --workers <n> | Parallel workers (default: auto) |

Source: headroom/cli/learn.py:1-60

Examples

The repository includes comprehensive examples:

Python Examples

# Basic usage
export OPENAI_API_KEY='your-key'
python examples/basic_usage.py

# Anthropic integration
export ANTHROPIC_API_KEY='your-key'
python examples/anthropic_example.py

# Streaming
python examples/streaming_example.py

# Evaluation
python examples/smart_vs_naive_eval.py
python examples/real_world_eval.py

TypeScript Examples

# Shared context multi-agent
npx tsx sdk/typescript/examples/shared-context-multi-agent.ts

LangChain Integration

# Compression demo
PYTHONPATH=. python -m examples.langchain_demo.show_compression

# Full comparison
export OPENAI_API_KEY='your-key'
PYTHONPATH=. python -m examples.langchain_demo.run_comparison

Source: examples/README.md:1-80

Next Steps

| Topic | Description |
|---|---|
| CLI Reference | Full documentation of headroom commands |
| Proxy Configuration | Advanced proxy settings and backends |
| Memory System | Deep dive into cross-session memory |
| SDK Reference | Complete API documentation |
| Compression Internals | How Headroom's transforms work |
| Contributing | Development setup and guidelines |

Troubleshooting

Claude not found

Error: 'claude' not found in PATH.
Install Claude Code: https://docs.anthropic.com/en/docs/claude-code

Solution: Install Claude Code or use the SDK directly.

MCP retrieve tool not working

Symptoms: CCR markers appear but content isn't retrieved.

Solutions:

  1. Ensure --no-mcp was not used
  2. Check MCP server is registered in your IDE settings
  3. Verify proxy is running: headroom status

Token savings lower than expected

Possible causes:

  • Short prompts (less data to compress)
  • Already compressed content
  • High relevance content that can't be safely reduced

Solutions:

  • Switch to the aggressive compression profile
  • Use --learn to optimize for your patterns

Architecture

Related topics: Compression Pipeline, CCR (Reversible Compression)

Overview

Headroom is a context compression proxy and SDK designed to reduce token costs when running AI coding agents. The architecture is layered: a Rust-based proxy core, a Python SDK, and a TypeScript SDK, all of which expose the same request lifecycle across every integration path.

The system intercepts LLM API calls at the proxy layer, applies a pipeline of compression transforms, and routes compressed requests to upstream providers while maintaining the ability to retrieve original content via CCR markers.

Source: crates/headroom-core/src/lib.rs

High-Level Architecture

graph TD
    subgraph "Client Layer"
        CLI[CLI<br/>headroom wrap]
        SDK_PY[Python SDK]
        SDK_TS[TypeScript SDK]
        MCP[MCP Clients]
    end

    subgraph "Proxy Layer"
        PROXY[Headroom Proxy]
        MIDDLEWARE[ASGI Middleware]
    end

    subgraph "Core Engine"
        PIPELINE[Compression Pipeline]
        TRANSFORMS[Transforms]
        MEMORY[Memory System]
    end

    subgraph "Transforms"
        CC[CacheAligner]
        CR[ContentRouter]
        SC[SmartCrusher]
        CODEC[CodeCompressor]
        KB[Kompress-base]
        IC[IntelligentContext]
    end

    subgraph "Storage"
        QDRANT[Qdrant]
        NEO4J[Neo4j]
        SQLITE[SQLite]
    end

    CLI --> PROXY
    SDK_PY --> PROXY
    SDK_TS --> PROXY
    MCP --> MIDDLEWARE
    
    PROXY --> PIPELINE
    MIDDLEWARE --> PIPELINE
    
    PIPELINE --> TRANSFORMS
    TRANSFORMS --> MEMORY
    
    MEMORY --> QDRANT
    MEMORY --> NEO4J
    MEMORY --> SQLITE

    style PROXY fill:#4a90d9
    style PIPELINE fill:#5ba85b
    style TRANSFORMS fill:#d94a4a

Request Lifecycle

All compression passes through a stable, 11-stage request lifecycle that exposes consistent hooks regardless of integration method:

Setup → Pre-Start → Post-Start → Input Received → Input Cached → Input Routed → Input Compressed → Input Remembered → Pre-Send → Post-Send → Response Received

graph LR
    A[Setup] --> B[Pre-Start]
    B --> C[Post-Start]
    C --> D[Input Received]
    D --> E[Input Cached]
    E --> F[Input Routed]
    F --> G[Input Compressed]
    G --> H[Input Remembered]
    H --> I[Pre-Send]
    I --> J[Post-Send]
    J --> K[Response Received]
    
    style A fill:#f0f0f0
    style G fill:#5ba85b
    style K fill:#4a90d9

Lifecycle Stages

| Stage | Purpose | Extension Point |
|---|---|---|
| Setup | Initialize transforms, load config | on_pipeline_event() |
| Pre-Start | Prepare upstream connection | on_pipeline_event() |
| Post-Start | Confirm upstream health | on_pipeline_event() |
| Input Received | Capture raw request | on_pipeline_event() |
| Input Cached | Check KV cache alignment | CacheAligner |
| Input Routed | Route to appropriate compression path | ContentRouter |
| Input Compressed | Apply compression transforms | SmartCrusher, CodeCompressor, Kompress-base |
| Input Remembered | Store in cross-agent memory | Memory system |
| Pre-Send | Finalize compressed request | on_pipeline_event() |
| Post-Send | Record outcome metrics | RequestOutcome funnel |
| Response Received | Process streaming/final response | Compression hooks |

Source: headroom/pipeline.py

Compression Transforms

The transform layer applies specialized compression algorithms. Each transform handles a specific content type.

Transform Components

| Transform | Function | Reduction |
|---|---|---|
| CacheAligner | Stabilizes prefixes so Anthropic/OpenAI KV caches hit | Indirect |
| ContentRouter | Routes content to appropriate compression path | 10-40% |
| SmartCrusher | Universal JSON compression (arrays, nested objects) | 60-90% |
| CodeCompressor | AST-aware for Python, JS, Go, Rust, Java, C++ | 60-75% |
| Kompress-base | HuggingFace model for ML-based token compression | 40-90% |
| IntelligentContext | Score-based context fitting with learned importance | Variable |
| RollingWindow | Fixed-context summarization | Variable |

SmartCrusher Configuration

from dataclasses import dataclass


@dataclass
class SmartCrusherConfig:
    enabled: bool = True
    min_items_to_analyze: int = 3
    min_tokens_to_crush: int = 500
    max_items_after_crush: int = 50
    relevance_threshold: float = 0.3
    enable_ccr_marker: bool = True

Source: crates/headroom-py/src/lib.rs
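The configuration fields suggest a gate-then-rank process. This hedged sketch shows one plausible reading of them. The crush_array function, the caller-supplied relevance scores, and the placeholder {"_ccr_omitted": n} marker are all hypothetical. SmartCrusher's real scoring and marker format are not described on this page.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SmartCrusherConfig:
    enabled: bool = True
    min_items_to_analyze: int = 3
    min_tokens_to_crush: int = 500
    max_items_after_crush: int = 50
    relevance_threshold: float = 0.3
    enable_ccr_marker: bool = True


def crush_array(items: List[Dict[str, Any]], scores: List[float],
                token_count: int, config: SmartCrusherConfig) -> List[Dict[str, Any]]:
    """Hypothetical gating logic driven by SmartCrusherConfig fields."""
    # Leave small or disabled payloads untouched.
    if (not config.enabled
            or len(items) < config.min_items_to_analyze
            or token_count < config.min_tokens_to_crush):
        return items
    # Rank item indices by relevance score, highest first.
    ranked = sorted(zip(scores, range(len(items))), reverse=True)
    # Keep items at or above the threshold, capped at max_items_after_crush.
    kept = [idx for score, idx in ranked if score >= config.relevance_threshold]
    kept = sorted(kept[: config.max_items_after_crush])  # restore original order
    result = [items[idx] for idx in kept]
    if config.enable_ccr_marker and len(result) < len(items):
        # Illustrative placeholder; real CCR markers use Headroom's own format.
        result.append({"_ccr_omitted": len(items) - len(result)})
    return result


cfg = SmartCrusherConfig(max_items_after_crush=2)
rows = [{"id": n} for n in range(5)]
out = crush_array(rows, [0.9, 0.1, 0.5, 0.8, 0.2], token_count=1200, config=cfg)
print(out)  # [{'id': 0}, {'id': 3}, {'_ccr_omitted': 3}]
```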

CodeCompressor

AST-aware compression supports:

  • Python (via ast module)
  • JavaScript, Go, Rust, Java, C++ (via tree-sitter)

Requires optional dependency: pip install headroom-ai[code]

Enabled via --code-aware flag or HEADROOM_CODE_AWARE_ENABLED=1 environment variable.

Source: headroom/cli/proxy.py
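Python's standard ast module shows the basic idea behind AST-aware compression: parse the code, keep signatures and docstrings, and elide function bodies. This is an illustrative technique only. The compress_python function and its keep-docstring rule are hypothetical, and CodeCompressor's actual retention rules are not described here.

```python
import ast


class _BodyElider(ast.NodeTransformer):
    """Replace function bodies with `...`, keeping signatures and docstrings."""

    def _elide(self, node):
        first = node.body[0] if node.body else None
        new_body = []
        # Keep a leading string literal (the docstring) if there is one.
        if (isinstance(first, ast.Expr) and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            new_body.append(first)
        new_body.append(ast.Expr(value=ast.Constant(value=Ellipsis)))
        node.body = new_body
        return node

    visit_FunctionDef = _elide
    visit_AsyncFunctionDef = _elide


def compress_python(source: str) -> str:
    """Return a signature-only view of `source` (illustrative, not CodeCompressor)."""
    tree = _BodyElider().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


example = (
    "def add(a, b):\n"
    '    """Return the sum."""\n'
    "    total = a + b\n"
    "    return total\n"
)
compressed = compress_python(example)
print(compressed)
```

Methods inside classes are handled too, because the default generic_visit for a ClassDef dispatches each method to visit_FunctionDef.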

Memory System

Architecture

graph TD
    subgraph "Memory Layer"
        MEM[Memory Manager]
        RANKER[MemoryRanker]
        DECISION[MemoryDecision]
    end
    
    subgraph "Storage Backends"
        QDRANT[Qdrant<br/>Vector Search]
        NEO4J[Neo4j<br/>Graph]
        SQLITE[SQLite<br/>Project-local]
    end
    
    subgraph "Tools"
        SEARCH[memory_search]
        SAVE[memory_save]
        LIST[memory_list]
    end
    
    MEM --> RANKER
    MEM --> DECISION
    MEM --> QDRANT
    MEM --> NEO4J
    MEM --> SQLITE
    
    SEARCH --> MEM
    SAVE --> MEM
    LIST --> MEM

Per-Project Storage

Memory storage is isolated per project to prevent cross-contamination:

Bug Fix: v0.21.34 introduced per-project storage so projects can no longer bleed memories.

Source: Release v0.21.34

Memory Integration in CLI

The headroom wrap command injects memory guidance into AGENTS.md:

from pathlib import Path


def _inject_memory_agents_md(file_path: Path) -> bool:
    """Inject memory usage guidance into AGENTS.md.

    Idempotent — skips if marker already present.
    """
    memory_block = (
        f"{_MEMORY_AGENTS_MARKER}\n"
        "## Memory\n\n"
        "Use the `headroom_memory` MCP server for persistent cross-session knowledge.\n\n"
        "**Before** answering questions about prior decisions, conventions, project context,\n"
        "architecture, user preferences — call `memory_search` first.\n\n"
        "**After** making durable decisions — call `memory_save` to persist them.\n\n"
    )
    # Excerpt: _MEMORY_AGENTS_MARKER is defined elsewhere in wrap.py, and the
    # marker check and file write that complete this function are not shown.

Source: headroom/cli/wrap.py

Proxy Architecture

Request Flow

sequenceDiagram
    participant Client
    participant Proxy as Headroom Proxy
    participant Pipeline
    participant Upstream as LLM Provider
    
    Client->>Proxy: /v1/messages (raw)
    Proxy->>Pipeline: Input Received
    Pipeline->>Pipeline: Input Cached
    Pipeline->>Pipeline: Input Routed
    Pipeline->>Pipeline: Input Compressed
    Pipeline->>Pipeline: Input Remembered
    Pipeline->>Proxy: Compressed request
    Proxy->>Upstream: /v1/messages (compressed)
    Upstream->>Proxy: Response
    Proxy->>Pipeline: Response Received
    Pipeline->>Pipeline: Post-Send (outcome)
    Proxy->>Client: Streaming/Final response

Proxy Configuration

| Option | Env Var | Default | Description |
|---|---|---|---|
| --port | HEADROOM_PORT | 8787 | Proxy port |
| --backend | HEADROOM_BACKEND | anthropic | API backend |
| --memory | - | false | Enable memory |
| --code-graph | - | false | Code graph indexing |
| --budget | HEADROOM_BUDGET | None | Daily budget limit (USD) |
| --exclude-tools | HEADROOM_EXCLUDE_TOOLS | None | Tools to skip |
| --code-aware | HEADROOM_CODE_AWARE_ENABLED | false | AST-based compression |

Source: headroom/cli/proxy.py

RequestOutcome Funnel

v0.21.38 introduced the RequestOutcome funnel to collapse streaming finalizers:

Refactor: proxy: introduce RequestOutcome funnel; collapse 3 streaming finalizers

Source: Release v0.21.38

Integration Architecture

SDK Integration Points

| Integration | Method |
|---|---|
| Python app | compress(messages, model=...) |
| TypeScript app | await compress(messages, { model }) |
| Anthropic/OpenAI SDK | withHeadroom(new Anthropic()) |
| Vercel AI SDK | wrapLanguageModel({ model, middleware: headroomMiddleware() }) |
| LiteLLM | litellm.callbacks = [HeadroomCallback()] |
| LangChain | HeadroomChatModel(your_llm) |
| Agno | HeadroomAgnoModel(your_model) |
| ASGI apps | app.add_middleware(CompressionMiddleware) |

Source: README.md

CLI Wrapper Architecture

graph TD
    subgraph "headroom wrap <agent>"
        WRAP[wrap.py]
        RTK[RTK Setup]
        MCP[MCP Registration]
        PROXY[Proxy Startup]
    end
    
    subgraph "Agent Types"
        CLAUDE[Claude]
        CODEX[Codex]
        OPENCODE[OpenCode]
        COPILOT[Copilot]
        AIDER[Aider]
        OPENCLAW[OpenClaw]
    end
    
    WRAP --> RTK
    WRAP --> MCP
    WRAP --> PROXY
    
    PROXY --> CLAUDE
    PROXY --> CODEX
    PROXY --> OPENCODE
    PROXY --> COPILOT
    PROXY --> AIDER
    PROXY --> OPENCLAW

Each agent wrapper:

  1. Snapshots pre-wrap config (e.g., ~/.codex/config.toml)
  2. Sets up CLI context tool (RTK or lean-ctx)
  3. Registers MCP server for CCR retrieval
  4. Starts proxy if not already running
  5. Launches the agent

Source: headroom/cli/wrap.py
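The snapshot step (1) can be sketched as copying the config aside before any edits and moving it back afterwards. The source only states that Headroom snapshots the pre-wrap config. The .pre-wrap suffix and both helper functions are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def snapshot_config(config_path: Path) -> Optional[Path]:
    """Copy config_path to a sibling .pre-wrap backup before modifying it.

    Returns the backup path, or None if there was no config to snapshot.
    """
    if not config_path.exists():
        return None
    backup = config_path.with_name(config_path.name + ".pre-wrap")
    shutil.copy2(config_path, backup)  # copy2 also preserves file metadata
    return backup


def restore_config(config_path: Path, backup: Optional[Path]) -> None:
    """Move the pre-wrap copy back over the (possibly modified) config."""
    if backup is not None and backup.exists():
        backup.replace(config_path)  # overwrites the modified file


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.toml"
    cfg.write_text('model = "example"\n', encoding="utf-8")
    backup = snapshot_config(cfg)
    cfg.write_text("# edited by wrapper\n", encoding="utf-8")  # simulated wrap-time edit
    restore_config(cfg, backup)
    restored = cfg.read_text(encoding="utf-8")
    backup_left = backup is not None and backup.exists()

print(repr(restored), backup_left)
```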

Multi-Agent Shared Context

Architecture

graph LR
    subgraph "Agent A"
        CA[Claude]
    end
    
    subgraph "Agent B"  
        CB[Codex]
    end
    
    subgraph "Shared Context"
        SC["SharedContext<br/>.put() / .get()"]
    end
    
    CA <--> SC
    CB <--> SC
    
    SC --> COMPRESS[Compression]
    COMPRESS --> TRANSFORM[Transforms]

Usage Example

import { SharedContext } from "@headroom/sdk";

// Create shared context
const ctx = new SharedContext({ 
    projectId: "k8s-scaling-research" 
});

// Agent A: Publish findings
await ctx.put("k8s-scaling-research", {
    role: "assistant",
    content: "Research findings on K8s autoscaling..."
});

// Agent B: Retrieve compressed context
const compressed = await ctx.get("k8s-scaling-research");

Source: sdk/typescript/examples/shared-context-multi-agent.ts

CCR (Compress-Cache-Retrieve)

CCR provides reversible compression:

  1. Compress: Original content stored, marker inserted
  2. Cache: Markers indexed for retrieval
  3. Retrieve: Agent calls headroom_retrieve tool to expand marker
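The three steps can be sketched with an in-memory dictionary. The CCRStore class and the [ccr:<key>] marker syntax are hypothetical. Headroom's real marker format and storage backend are not specified here, and in practice retrieval happens when the agent calls the headroom_retrieve tool.

```python
import hashlib
import re
from typing import Dict


class CCRStore:
    """Hypothetical compress-cache-retrieve store (marker format is illustrative)."""

    MARKER_RE = re.compile(r"\[ccr:([0-9a-f]{12})\]")

    def __init__(self) -> None:
        self._cache: Dict[str, str] = {}

    def compress(self, content: str, keep_chars: int = 40) -> str:
        # 1. Compress: keep a short preview and replace the rest with a marker.
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
        # 2. Cache: index the original content by the marker key.
        self._cache[key] = content
        return f"{content[:keep_chars]}... [ccr:{key}]"

    def retrieve(self, compressed: str) -> str:
        # 3. Retrieve: look up the original content referenced by the marker.
        match = self.MARKER_RE.search(compressed)
        if match is None or match.group(1) not in self._cache:
            raise KeyError("no known CCR marker in text")
        return self._cache[match.group(1)]


store = CCRStore()
log = "\n".join(f"line {n}: ok" for n in range(500))
short = store.compress(log)
print(len(short) < len(log), store.retrieve(short) == log)  # True True
```

Keying the cache on a content hash makes repeated compression of identical content reuse the same marker.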

MCP Registration for Retrieval

# Register headroom MCP server in ~/.codex/config.toml so Codex can
# call headroom_retrieve on compression markers from the proxy.
if not no_mcp:
    from headroom.mcp_registry import CodexRegistrar
    _setup_headroom_mcp(CodexRegistrar(), port, verbose=verbose, force=True)

Source: headroom/cli/wrap.py

Known Limitations

[BUG #503]: CCR proactive expansion blocks corrupt message attribution in multi-agent threads. The _append_context_to_latest_non_frozen_user_turn() function injects proactive expansion blocks into the latest user message content. In multi-agent setups, that message can contain structured XML attribution markup (<peer_turn from="AgentX">). The injected block ends up corrupting the attribution.

Source: GitHub Issue #503

Provider Architecture

Provider-specific behavior lives under headroom/providers/ to keep core orchestration focused:

headroom/providers/
├── claude/      # Claude Code integration
├── copilot/     # GitHub Copilot CLI
├── codex/       # OpenAI Codex
└── open/        # OpenAI native clients

This separation ensures:

  • Core pipeline remains provider-agnostic
  • Provider-specific auth and routing handled at edges
  • New providers can be added without modifying core logic

Source: README.md
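Here is a minimal sketch of that separation, assuming a simple name-to-adapter registry. The register_provider and prepare_request functions and the adapter signature are hypothetical; they are not the contents of headroom/providers/. The /v1/messages path is the Anthropic endpoint mentioned in Known Limitations.

```python
from typing import Callable, Dict

# Hypothetical adapter signature: takes a request dict, returns a provider-specific one.
ProviderAdapter = Callable[[dict], dict]

_PROVIDERS: Dict[str, ProviderAdapter] = {}


def register_provider(name: str) -> Callable[[ProviderAdapter], ProviderAdapter]:
    """Decorator that registers a provider adapter under `name`."""
    def decorator(adapter: ProviderAdapter) -> ProviderAdapter:
        _PROVIDERS[name] = adapter
        return adapter
    return decorator


@register_provider("claude")
def claude_adapter(request: dict) -> dict:
    # Provider-specific details (paths, auth headers) stay at the edge.
    return {**request, "path": "/v1/messages"}


def prepare_request(provider: str, request: dict) -> dict:
    """Core code only looks up an adapter; it never branches on provider names."""
    try:
        adapter = _PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None
    return adapter(request)


print(prepare_request("claude", {"messages": []}))  # {'messages': [], 'path': '/v1/messages'}
```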

Rust Extension

The Rust extension (headroom-core) provides performance-critical transforms:

Exports to Python

use headroom_core::transforms::{
    compress_openai_responses_live_zone,
    detect as rust_detect_chain,
    is_json_array_of_dicts,
    LogCompressor,
    SearchCompressor,
    DiffCompressor,
    DiffCompressorConfig,
};

Build Optimizations

v0.21.37 introduced wheel size optimizations:

Build: shrink Rust extension wheels (strip + thin-LTO + single codegen unit)

Source: Release v0.21.37

Extension Points

Pipeline Extensions

  • on_pipeline_event(...) — Hook into lifecycle stages
  • Compression hooks — Additional seam alongside canonical lifecycle
  • Proxy extensions — ASGI middleware, routes, startup policy

Plugin System

# headroom learn registers via entry point
# 'headroom.learn_plugin'

Source: headroom/cli/learn.py

Observability

RTK Metrics

RTK (Rewrite Tool Kit) metrics are wired into the observability stack:

Fix: fix(observability): RTK metrics + Rust observability (Phase H blocker)

Source: Release v0.22.4

Logging Options

| Option | Purpose |
|---|---|
| --log-file | Path to JSONL log file |
| --log-messages | Full message logging (request/response content) |
| --codex-wire-debug | Local Codex wire snapshots + proxy.log traces |

Source: headroom/cli/proxy.py

Development Setup

git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest

Devcontainers available in .devcontainer/:

  • Default
  • memory-stack with Qdrant & Neo4j

Source: README.md

Compression Pipeline

Related topics: Architecture, Compression Algorithms

The Compression Pipeline is Headroom's core orchestration system for reducing token usage in LLM requests. It exposes a single, stable request lifecycle that operates consistently across the Python SDK, TypeScript SDK, CLI, and proxy server. The pipeline sequences multiple transform components to analyze, route, and compress content while preserving critical information through the CCR (Compress-Cache-Retrieve) pattern.

Architecture Overview

The pipeline follows a deterministic lifecycle with defined stages, configurable transforms, and extension points for hooks and plugins. Each request flows through the same stages regardless of entry point (SDK, CLI, or proxy), ensuring predictable behavior and observable outcomes.

graph TD
    subgraph Lifecycle["Request Lifecycle"]
        A[Input Received] --> B[Input Cached]
        B --> C[Input Routed]
        C --> D[Input Compressed]
        D --> E[Input Remembered]
        E --> F[Pre-Send]
        F --> G[Post-Send]
        G --> H[Response Received]
    end
    
    subgraph Transforms["Transform Components"]
        T1[CacheAligner]
        T2[ContentRouter]
        T3[SmartCrusher]
        T4[CodeCompressor]
        T5[Kompress-base]
        T6[IntelligentContext]
        T7[RollingWindow]
    end
    
    C --> T1
    T1 --> T2
    T2 --> T3
    T3 --> T4
    T4 --> T5
    T5 --> T6
    T6 --> T7

Request Lifecycle Stages

The pipeline implements 11 lifecycle stages that execute in order. Each stage is observable and can be extended or intercepted by pipeline extensions.

| Stage | Purpose | Extensions Available |
|---|---|---|
| Setup | Initialize request context and configuration | Yes |
| Pre-Start | Pre-processing before transform execution | Yes |
| Post-Start | Post-processing after initialization | Yes |
| Input Received | Capture raw request input | Yes |
| Input Cached | Check and update cache state | Yes |
| Input Routed | Route content to appropriate transforms | Yes |
| Input Compressed | Apply compression transforms | Yes |
| Input Remembered | Store relevant context for memory | Yes |
| Pre-Send | Final modifications before LLM call | Yes |
| Post-Send | Process response metadata | Yes |
| Response Received | Handle and log response | Yes |

Source: README.md
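To make the stage ordering concrete, here is a minimal sketch of a stage-ordered dispatcher. The `Stage` enum mirrors the 11 stages in the table. The `Pipeline` class, its `register` method, and the handler shape are hypothetical illustrations; they are not Headroom's `on_pipeline_event` extension API.

```python
from enum import Enum
from typing import Callable


class Stage(Enum):
    # Declaration order is the lifecycle order; Enum iteration preserves it.
    SETUP = "Setup"
    PRE_START = "Pre-Start"
    POST_START = "Post-Start"
    INPUT_RECEIVED = "Input Received"
    INPUT_CACHED = "Input Cached"
    INPUT_ROUTED = "Input Routed"
    INPUT_COMPRESSED = "Input Compressed"
    INPUT_REMEMBERED = "Input Remembered"
    PRE_SEND = "Pre-Send"
    POST_SEND = "Post-Send"
    RESPONSE_RECEIVED = "Response Received"


Handler = Callable[[Stage, dict], None]


class Pipeline:
    """Hypothetical dispatcher: walks every stage in order and notifies its handlers."""

    def __init__(self) -> None:
        self._handlers: dict[Stage, list[Handler]] = {s: [] for s in Stage}

    def register(self, stage: Stage, handler: Handler) -> None:
        self._handlers[stage].append(handler)

    def run(self, request: dict) -> dict:
        for stage in Stage:
            for handler in self._handlers[stage]:
                handler(stage, request)  # handlers may inspect or mutate the request
        return request
```

Handlers fire in lifecycle order, not registration order. A handler registered for Input Compressed therefore always runs after one registered for Setup.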

Transform Components

Transforms are the execution units within the pipeline. Each transform specializes in a specific compression strategy.

SmartCrusher

SmartCrusher is the primary content-aware compressor for structured data. It analyzes JSON arrays, tool outputs, and log files using statistical selection to preserve critical items.

Key capabilities:

  • 100% ERROR preservation — Never drops error entries from tool outputs
  • Anomaly detection — Statistical identification of outliers (high CPU, memory spikes)
  • Boundary preservation — Always keeps first and last items in arrays
  • Relevance scoring — Weights items by relevance to the user's query
  • Change point detection — Identifies significant transitions in data

Configuration options:

| Parameter | Default | Description |
|---|---|---|
| enabled | true | Enable/disable the transform |
| min_items_to_analyze | 10 | Minimum array size to apply analysis |
| min_tokens_to_crush | 500 | Minimum content size to trigger compression |
| max_items_after_crush | 20 | Target maximum items after compression |
| relevance_threshold | 0.3 | Score threshold for item retention |
| bias | 1.0 | Compression bias (>1 preserves more, <1 compresses more) |

Source: headroom/transforms/smart_crusher.py, crates/headroom-py/src/lib.rs
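The selection rules above can be sketched in simplified form. This is not the SmartCrusher implementation. It makes several assumptions: items are dicts, and an item with `level == "ERROR"` counts as an error. A z-score on one numeric field (`metric_key`) stands in for anomaly detection, and substring matching against the query stands in for relevance scoring. `crush`, `metric_key`, and the 2.0 cutoff are all hypothetical.

```python
import statistics


def crush(items: list[dict], query: str = "", metric_key: str = "value",
          max_items: int = 20, z_cutoff: float = 2.0) -> list[dict]:
    """Hypothetical SmartCrusher-style selection over a JSON array of dicts."""
    n = len(items)
    if n <= max_items:
        return list(items)
    keep: set[int] = {0, n - 1}  # boundary preservation: first and last items
    # Error preservation: errors are always kept, even beyond max_items.
    keep |= {i for i, it in enumerate(items) if it.get("level") == "ERROR"}
    # Anomaly detection: keep values far from the mean of the numeric field.
    numeric = [(i, it[metric_key]) for i, it in enumerate(items)
               if isinstance(it.get(metric_key), (int, float))]
    if len(numeric) >= 2:
        values = [v for _, v in numeric]
        mean, stdev = statistics.mean(values), statistics.pstdev(values)
        if stdev > 0:
            keep |= {i for i, v in numeric if abs(v - mean) / stdev >= z_cutoff}
    # Relevance: spend any remaining slots on items that mention the query.
    if query:
        for i, it in enumerate(items):
            if len(keep) >= max_items:
                break
            if query.lower() in str(it).lower():
                keep.add(i)
    return [items[i] for i in sorted(keep)]
```

Errors and anomalies are added before relevance matches, so they take priority when slots are scarce. The output also stays in the original order.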

ContentRouter

ContentRouter determines which transforms should be applied to each content block based on content type detection. It routes JSON arrays, code, logs, and text to appropriate specialized compressors.

Routing logic:

  • Detects content type (JSON array, code, log, plain text)
  • Applies scoring weights for each content category
  • Selects optimal compression profile per block

Source: headroom/transforms/content_router.py
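Here is a rough sketch of the detection step, using simple heuristics: a JSON parse, a log-line regex, and code-keyword prefixes. The function name `detect_content_type`, the category labels, the regexes, and the thresholds are all illustrative assumptions. The actual router applies its own scoring weights.

```python
import json
import re

LOG_LINE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}|\[?(DEBUG|INFO|WARN|WARNING|ERROR)\]?)")
CODE_HINT = re.compile(r"^\s*(def |class |import |from \S+ import |function |fn |func |public |#include)")


def detect_content_type(text: str) -> str:
    """Hypothetical classifier returning 'json_array', 'log', 'code', or 'text'."""
    stripped = text.strip()
    if stripped.startswith("["):
        try:
            if isinstance(json.loads(stripped), list):
                return "json_array"
        except json.JSONDecodeError:
            pass  # looked like JSON but is not; fall through to line heuristics
    lines = [ln for ln in stripped.splitlines() if ln.strip()]
    if not lines:
        return "text"
    log_share = sum(bool(LOG_LINE.match(ln)) for ln in lines) / len(lines)
    code_share = sum(bool(CODE_HINT.match(ln)) for ln in lines) / len(lines)
    # Logs are checked first. Code needs a lower share because most code
    # lines (bodies, returns) do not start with a keyword.
    if log_share >= 0.5:
        return "log"
    if code_share >= 0.2:
        return "code"
    return "text"
```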

CacheAligner

CacheAligner stabilizes request prefixes to maximize KV cache hit rates across Anthropic and OpenAI providers. It analyzes common prefix patterns and aligns new requests to existing cache entries.

Behavior:

  • Computes prefix stability scores
  • Aligns new requests to cached prefixes when beneficial
  • Records cache prefix metrics for observability
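One way to picture a prefix stability score is to compare consecutive requests and measure how much of the earlier serialized request the next one reuses verbatim. This matters because provider KV caches only reuse an exact leading match. The metric below (`serialize`, `common_prefix_len`, `stability_score`) is a hypothetical illustration, not CacheAligner's formula.

```python
import json


def serialize(messages: list[dict]) -> str:
    # Canonical JSON so that dict key order cannot break a prefix match.
    return json.dumps(messages, sort_keys=True, separators=(",", ":"))


def common_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def stability_score(prev: list[dict], curr: list[dict]) -> float:
    """Fraction of the previous request's serialized form reused verbatim by the next one."""
    p, c = serialize(prev), serialize(curr)
    return common_prefix_len(p, c) / len(p) if p else 0.0
```

Appending a turn keeps the score near 1.0. Putting something volatile, such as today's date, at the start of the system prompt drops it close to 0, which is the pattern prefix alignment aims to avoid.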

Kompress-base

Kompress-base is Headroom's ML-based text compressor using a fine-tuned model. It provides aggressive token reduction (up to 90%) for arbitrary text content.

Usage: Applied after specialized compressors have processed structured data

IntelligentContext / RollingWindow

Two complementary context management strategies:

| Strategy | Description |
|---|---|
| IntelligentContext | Score-based context fitting with learned importance weights |
| RollingWindow | Maintains recent turns with configurable window size |

CodeCompressor

AST-aware code compression using tree-sitter parsing. Preserves code structure while removing whitespace, comments, and non-essential formatting.

SearchCompressor

Specialized compressor for search results and ranked lists. Applies relevance-based selection and deduplication.

LogCompressor

Format-aware log compression supporting multiple log formats. Detects format automatically and applies appropriate compression strategies.

Source: crates/headroom-py/src/lib.rs, crates/headroom-core/src/transforms/mod.rs

CCR Pattern (Compress-Cache-Retrieve)

CCR provides reversible, lossless compression by storing originals and allowing retrieval on demand.

graph LR
    A[Original Content] --> B[Compress]
    B --> C[Compressed + Hash]
    C --> D[Storage]
    D --> E[Retrieve by Hash]
    E --> F[Original Restored]
    
    style C fill:#90EE90
    style F fill:#90EE90

How it works:

  1. Compress — Content is analyzed and compressed, generating a cache_key (hash)
  2. Cache — Original content stored in the compression store keyed by hash
  3. Retrieve — Agent uses headroom_retrieve tool to access originals when needed

Rust bindings expose CCR functionality:

# Python usage via Rust extension
result = compressor.compress(content, bias=1.0)
# result.inner.cache_key contains the CCR hash

Source: crates/headroom-py/src/lib.rs
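The three CCR steps can be sketched in plain Python. The sketch assumes a truncated SHA-256 content hash as the cache key and a dict as the store. `CcrStore`, `compress_with_ccr`, and the omission marker text are hypothetical. They do not reflect Headroom's actual key format or CCR marker.

```python
from __future__ import annotations

import hashlib
import json


class CcrStore:
    """Hypothetical in-memory store mapping cache keys to original content."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, original: str) -> str:
        key = hashlib.sha256(original.encode("utf-8")).hexdigest()[:16]
        self._data[key] = original
        return key

    def get(self, key: str) -> str | None:
        return self._data.get(key)


def compress_with_ccr(items: list, store: CcrStore, keep: int = 3) -> str:
    """Lossy step keeps the first `keep` items; the full list is cached for retrieval."""
    original = json.dumps(items)
    if len(items) <= keep:
        return original  # nothing to drop, so nothing to cache
    key = store.put(original)  # Cache: original saved under its content hash
    marker = f"[{len(items) - keep} items omitted; retrieve with key={key}]"
    return json.dumps(items[:keep] + [marker])
```

The compressed output stays valid JSON, and the key inside the marker is all the agent needs to get the full list back.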

Compression Hooks

Hooks provide extension points for customizing compression behavior in the TypeScript SDK.

CompressContext

Context object passed to hook methods:

interface CompressContext {
  model: string;
  userQuery: string;
  turnNumber: number;
  toolCalls: string[];
  provider: string;
}

CompressEvent

Event object received by post-compression hooks:

interface CompressEvent {
  tokensBefore: number;
  tokensAfter: number;
  tokensSaved: number;
  compressionRatio: number;
  transformsApplied: string[];
  ccrHashes: string[];
  model: string;
  userQuery: string;
  provider: string;
}

Hook Methods

| Method | Timing | Can Modify? | Purpose |
|---|---|---|---|
| preCompress | Before compression | Yes | Modify messages before pipeline |
| computeBiases | During routing | Yes | Per-message compression weights |
| postCompress | After compression | No | Observability and logging |

Example implementation:

class LoggingHooks extends CompressionHooks {
  postCompress(event: CompressEvent) {
    console.log(`Saved ${event.tokensSaved} tokens (${event.compressionRatio})`);
  }
}

Source: sdk/typescript/src/hooks.ts

Compression Results

The CompressResult type returned by compression operations:

| Field | Type | Description |
|---|---|---|
| messages | any[] | Compressed messages in same format as input |
| tokensBefore | number | Token count before compression |
| tokensAfter | number | Token count after compression |
| tokensSaved | number | Absolute tokens saved |
| compressionRatio | number | Reduction as a fraction (0-1) |
| transformsApplied | string[] | List of transforms that modified content |
| ccrHashes | string[] | CCR cache keys for retrievable content |
| compressed | boolean | Whether compression actually occurred |

Source: sdk/typescript/src/types.ts
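The numeric fields are related. The sketch below assumes that tokensSaved equals tokensBefore minus tokensAfter, and that compressionRatio is that saving divided by tokensBefore. It mirrors those fields in Python using a hypothetical `CompressSummary` class.

```python
from dataclasses import dataclass, field


@dataclass
class CompressSummary:
    tokens_before: int
    tokens_after: int
    transforms_applied: list[str] = field(default_factory=list)

    @property
    def tokens_saved(self) -> int:
        return self.tokens_before - self.tokens_after

    @property
    def compression_ratio(self) -> float:
        # 0.0 means nothing was saved; 0.75 means three quarters of the input was removed.
        return self.tokens_saved / self.tokens_before if self.tokens_before else 0.0

    @property
    def compressed(self) -> bool:
        return self.tokens_after < self.tokens_before
```

For example, `CompressSummary(10_000, 2_500)` reports 7,500 tokens saved and a ratio of 0.75.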

SDK Integration

TypeScript SDK

import { compress } from "headroom-ai";

// Direct compression
const result = await compress(messages, {
  model: "claude-sonnet-4-20250514",
  hooks: new LoggingHooks()
});

Python SDK

from headroom import compress

result = compress(messages, model="claude-sonnet-4-20250514")

CLI

headroom wrap -- model claude "Analyze this codebase"

Source: sdk/typescript/examples/basic-compress.ts, examples/langchain_demo/README.md

Configuration Profiles

Compression behavior can be tuned via profiles:

| Profile | Bias | Min K | Max K | Use Case |
|---|---|---|---|---|
| balanced | 1.0 | 2 | 8 | General purpose |
| aggressive | 0.7 | 1 | 5 | Long contexts |
| conservative | 1.3 | 3 | 12 | High-fidelity |

Configuration interface:

interface CompressionProfile {
  bias?: number;
  minK?: number;
  maxK?: number | null;
}

Source: sdk/typescript/src/types/config.ts
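The table does not state how bias, minK, and maxK combine. One plausible reading, sketched here purely as an assumption, is that bias scales a base item count, which is then clamped into [minK, maxK]. `items_to_keep` and `base_k` are hypothetical. The preset values are copied from the table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CompressionProfile:
    bias: float = 1.0
    min_k: int = 2
    max_k: int | None = None  # None means no upper bound


def items_to_keep(profile: CompressionProfile, base_k: int) -> int:
    """Hypothetical: scale base_k by bias, then clamp to the profile's bounds."""
    k = max(profile.min_k, round(base_k * profile.bias))
    if profile.max_k is not None:
        k = min(k, profile.max_k)
    return k


PROFILES = {
    "balanced": CompressionProfile(bias=1.0, min_k=2, max_k=8),
    "aggressive": CompressionProfile(bias=0.7, min_k=1, max_k=5),
    "conservative": CompressionProfile(bias=1.3, min_k=3, max_k=12),
}
```

Under this reading, bias shifts the target, while minK and maxK act as hard floors and ceilings regardless of bias.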

Observability

Pipeline stages emit lifecycle events for monitoring:

| Metric | Description |
|---|---|
| tokens_saved | Cumulative tokens saved |
| compression_ratio | Real-time reduction percentage |
| cache_hit_rate | Percentage of requests aligned to cache |
| transform_timing | Per-transform latency breakdown |

Known Limitations

CCR in Multi-Agent Threads

Issue #503 — CCR proactive expansion blocks can corrupt message attribution in multi-agent setups. When _append_context_to_latest_non_frozen_user_turn() injects expansion blocks into messages containing XML attribution markup (<peer_turn from="AgentX">), the injected block can interfere with structured attribution.

Workaround: Avoid using CCR retrieval markers in multi-agent threads with peer attribution until the issue is resolved.

Extension Points

The pipeline supports three extension mechanisms:

| Extension Type | Scope | Use Case |
|---|---|---|
| Pipeline Extensions | Lifecycle stages | Custom stage logic |
| Compression Hooks | Pre/post processing | Logging, bias computation |
| Proxy Extensions | Server integration | ASGI middleware, routes |

Provider and tool-specific behavior lives under headroom/providers/ to keep core orchestration focused on lifecycle, sequencing, and policy.

Source: README.md

Compression Algorithms

Related topics: Compression Pipeline, CCR (Reversible Compression)

Headroom employs a multi-layered compression system that reduces token usage by 60–95% across AI agent workflows. The compression algorithms work together in a configurable pipeline, with each algorithm optimized for specific content types.

Overview

Headroom's compression stack includes six distinct algorithms:

| Algorithm | Primary Use Case | Typical Savings |
|---|---|---|
| SmartCrusher | Tool outputs (JSON arrays, logs) | 70–90% |
| CodeCompressor | Source code files | 60–80% |
| Kompress-base | General text via ML model | 50–70% |
| CacheAligner | API request prefixes | 20–40% |
| IntelligentContext | Long conversations | 40–60% |
| RollingWindow | Simple context trimming | Variable |

Source: README.md:smart-crusher

Architecture

graph TD
    A[Input Messages] --> B[CacheAligner]
    B --> C[ContentRouter]
    C --> D{Select Algorithm}
    D -->|Tool Output| E[SmartCrusher]
    D -->|Code| F[CodeCompressor]
    D -->|Text| G[Kompress-base]
    D -->|Long Context| H[IntelligentContext]
    E --> I[CCR Store]
    F --> I
    G --> I
    H --> I
    I --> J[Output to LLM]

Pipeline Lifecycle

The stable request lifecycle that all compression algorithms follow:

Setup → Pre-Start → Post-Start → Input Received → Input Cached → Input Routed → Input Compressed → Input Remembered → Pre-Send → Post-Send → Response Received

Transforms execute during the Input Compressed stage, with each algorithm responsible for specific content types.

Source: README.md:pipeline-internals

SmartCrusher

SmartCrusher is Headroom's primary algorithm for compressing structured tool outputs, particularly JSON arrays from command results.

Core Features

  • 100% ERROR preservation — Never drops error items from output
  • Boundary preservation — Always keeps first and last items
  • Anomaly detection — Statistically identifies outliers (CPU spikes, high error rates)
  • Relevance scoring — Prioritizes items matching the user's query
  • Change point detection — Identifies significant transitions in data

Configuration

class SmartCrusherConfig:
    enabled: bool = True
    min_items_to_analyze: int = 10
    min_tokens_to_crush: int = 500
    max_items_after_crush: int | None = None
    relevance_threshold: float = 0.5
    enable_ccr_marker: bool = True

| Parameter | Default | Description |
|---|---|---|
| min_items_to_analyze | 10 | Minimum items before analysis activates |
| min_tokens_to_crush | 500 | Minimum token count to trigger compression |
| max_items_after_crush | None | Cap on output items (None = unlimited) |
| relevance_threshold | 0.5 | Score threshold for item retention |

Source: crates/headroom-py/src/lib.rs:PySmartCrusherConfig

CrushResult

The compression result object exposes:

| Property | Type | Description |
|---|---|---|
| compressed | str | The compressed output |
| original | str | The original input |
| was_modified | bool | Whether compression occurred |
| strategy | str | Strategy used ("preserve_all", "crush", etc.) |

Source: crates/headroom-py/src/lib.rs:PyCrushResult

CodeCompressor

CodeCompressor uses AST-aware analysis via tree-sitter to compress source code while preserving semantic structure.

Compression Strategy

  1. AST Parsing — Parse code into an abstract syntax tree
  2. Importance Scoring — Rank nodes by relevance to the query
  3. Selective Retention — Keep high-importance nodes, summarize low-importance regions
  4. CCR Markers — Insert reversible markers for compressed sections

Supported Languages

CodeCompressor supports 75+ programming languages through tree-sitter grammars, including Python, JavaScript, TypeScript, Rust, Go, Java, C++, and more.

Source: headroom/transforms/code_compressor.py

Language-Aware Features

  • Preserves function signatures and class definitions
  • Retains docstrings and comments for critical functions
  • Compresses implementation details proportionally to relevance
  • Maintains indentation structure for readability
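To illustrate signature-preserving compression without tree-sitter, here is a Python-only sketch built on the standard library `ast` module (Python 3.9+ for `ast.unparse`). It swaps in `ast` for tree-sitter and replaces each function body with its docstring plus `...`, unless the function name is listed in `keep`. `skeletonize` is a hypothetical helper, not CodeCompressor's API. Unlike the behavior described above, `ast.unparse` normalizes formatting and drops all comments.

```python
import ast
from collections.abc import Collection


def skeletonize(source: str, keep: Collection[str] = ()) -> str:
    """Replace function bodies with docstring + '...' unless the name is in keep."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name not in keep:
            body: list[ast.stmt] = []
            doc = ast.get_docstring(node, clean=False)
            if doc is not None:
                body.append(ast.Expr(value=ast.Constant(value=doc)))
            body.append(ast.Expr(value=ast.Constant(value=...)))
            node.body = body  # name, arguments, return annotation, decorators stay intact
    return ast.unparse(tree)
```

Passing the names of query-relevant functions in `keep` gives a crude version of relevance-proportional compression. Those functions keep their full bodies, and every other function shrinks to its signature.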

Kompress-base

Kompress-base is an ML-based compression model trained specifically for text compression in AI agent contexts.

Model Information

| Property | Value |
|---|---|
| Model Name | kompress-base |
| Provider | HuggingFace |
| Publisher | chopratejas |
| Architecture | Transformer-based |

Source: README.md:kompress-base-huggingface

Usage

from headroom.transforms.kompress_compressor import KompressCompressor

compressor = KompressCompressor()
result = compressor.compress(
    content="...",
    bias=1.0  # Higher = preserve more
)

The model is automatically used when ContentRouter classifies content as general-purpose text.

CacheAligner

CacheAligner optimizes request prefixes to maximize KV cache hit rates across Anthropic and OpenAI providers.

How It Works

  1. Analyze the prefix structure of incoming requests
  2. Identify stable vs. variable components
  3. Reorder or normalize prefix content for better cache alignment
  4. Track prefix metrics for observability

Configuration

class CacheAlignerConfig:
    enabled: bool = True
    validation_marker: str | None = None
    feedback_enabled: bool = True
    min_items_to_cache: int = 3
    inject_tool: bool = True
    inject_system_instructions: bool = True
    marker_template: str | None = None

Source: sdk/typescript/src/types/config.ts:CacheAlignerConfig

IntelligentContext

IntelligentContext uses score-based context fitting with learned importance weights to determine what content to retain.

Configuration

from __future__ import annotations  # the referenced config types are defined elsewhere

class IntelligentContextConfig:
    enabled: bool = True
    scoring_weights: ScoringWeights | None = None
    relevance_scorer: RelevanceScorerConfig | None = None
    anchor_config: AnchorConfig | None = None

| Component | Purpose |
|---|---|
| scoring_weights | Tune importance factors (recency, relevance, role) |
| relevance_scorer | Configure relevance detection |
| anchor_config | Pin critical messages to prevent compression |

Source: sdk/typescript/src/types/config.ts:IntelligentContextConfig
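Score-based fitting can be sketched in four steps:

1. Score every message.
2. Always keep anchored messages.
3. Add the highest-scoring remaining messages until a token budget is spent.
4. Restore chronological order.

The fixed weights, the whitespace token count, the `pinned` flag standing in for anchor_config, and the `fit_context` name are all assumptions. The real component uses learned weights and its own relevance scorer.

```python
def fit_context(messages: list[dict], query: str, budget: int,
                w_recency: float = 0.5, w_relevance: float = 0.4,
                w_role: float = 0.1) -> list[dict]:
    """Hypothetical score-and-fill selection; tokens approximated by whitespace words."""
    def tokens(m: dict) -> int:
        return len(m["content"].split())

    n = len(messages)
    q = set(query.lower().split())

    def score(i: int, m: dict) -> float:
        recency = (i + 1) / n                      # later messages score higher
        words = set(m["content"].lower().split())
        relevance = len(q & words) / len(q) if q else 0.0
        role = 1.0 if m["role"] == "user" else 0.5
        return w_recency * recency + w_relevance * relevance + w_role * role

    # Anchors (system messages or explicitly pinned ones) are always kept.
    chosen = {i for i, m in enumerate(messages) if m["role"] == "system" or m.get("pinned")}
    used = sum(tokens(messages[i]) for i in chosen)
    ranked = sorted((i for i in range(n) if i not in chosen),
                    key=lambda i: score(i, messages[i]), reverse=True)
    for i in ranked:
        if used + tokens(messages[i]) <= budget:   # greedy fill within the budget
            chosen.add(i)
            used += tokens(messages[i])
    return [messages[i] for i in sorted(chosen)]
```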

RollingWindow

RollingWindow provides simple context trimming for straightforward compression needs.

Configuration

class RollingWindowConfig:
    enabled: bool = True
    max_turns: int | None = None
    preserve_system: bool = True
    preserve_last_n: int = 2

Source: sdk/typescript/src/types/config.ts:RollingWindowConfig
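Here is a minimal sketch of the trimming rule these fields suggest. It assumes that a "turn" is one non-system message, that system messages come first, and that preserve_last_n acts as a floor that overrides a smaller max_turns. `apply_rolling_window` is hypothetical. The dataclass simply mirrors the fields listed above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RollingWindowConfig:
    enabled: bool = True
    max_turns: int | None = None
    preserve_system: bool = True
    preserve_last_n: int = 2


def apply_rolling_window(messages: list[dict], cfg: RollingWindowConfig) -> list[dict]:
    """Hypothetical: keep (optionally) system messages plus the most recent turns."""
    if not cfg.enabled or cfg.max_turns is None:
        return list(messages)
    if cfg.preserve_system:
        pinned = [m for m in messages if m["role"] == "system"]
        rest = [m for m in messages if m["role"] != "system"]
    else:
        pinned, rest = [], list(messages)
    keep = max(cfg.max_turns, cfg.preserve_last_n)
    # rest[-0:] would return everything, so keep == 0 is handled explicitly.
    return pinned + (rest[-keep:] if keep > 0 else [])
```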

Compress-Cache-Retrieve (CCR)

CCR enables reversible compression — originals are stored and can be retrieved by the LLM on demand.

Mechanism

  1. Compress — Algorithm compresses content, produces a hash
  2. Cache — Original stored in the CompressionStore
  3. Retrieve — LLM uses headroom_retrieve tool to access original

Usage Tracking

class CCRStats:
    entries: int
    total_original_tokens: int
    total_compressed_tokens: int
    total_tokens_saved: int
    savings_percent: float

Source: sdk/typescript/src/types/models.ts:CCRStats
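The stats fields can be derived from per-entry token counts. This sketch assumes savings_percent is total_tokens_saved divided by total_original_tokens, times 100. The `ccr_stats` helper is hypothetical; the field names match the class above.

```python
from dataclasses import dataclass


@dataclass
class CCRStats:
    entries: int
    total_original_tokens: int
    total_compressed_tokens: int
    total_tokens_saved: int
    savings_percent: float


def ccr_stats(entries: list[tuple[int, int]]) -> CCRStats:
    """entries: (original_tokens, compressed_tokens) for each stored original."""
    original = sum(o for o, _ in entries)
    compressed = sum(c for _, c in entries)
    saved = original - compressed
    pct = 100.0 * saved / original if original else 0.0
    return CCRStats(len(entries), original, compressed, saved, pct)
```

For example, two entries of (1000, 200) and (500, 100) give 1,200 tokens saved out of 1,500, which is 80%.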

Image Compression

Image content is handled separately through the image compressor module.

Features

  • Intelligent downsampling based on content type
  • Format optimization (JPEG for photos, PNG for graphics)
  • Size limits configurable per request

Source: headroom/image/compressor.py
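Downsampling under a size limit usually means scaling both dimensions by one factor so that the longer side fits. Below is a stdlib-only sketch of that calculation and of the format rule listed above. `fit_within`, `choose_format`, and the 1024-pixel default are assumptions; the module's actual limits and content-type detection may differ.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side; never upscale."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest  # same factor on both axes preserves the aspect ratio
    return max(1, round(width * scale)), max(1, round(height * scale))


def choose_format(is_photo: bool) -> str:
    # Lossy JPEG for photos, lossless PNG for graphics.
    return "JPEG" if is_photo else "PNG"
```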

Compression Hooks

The TypeScript SDK exposes hooks for customizing compression behavior:

export declare class CompressionHooks {
  preCompress(messages: any[], ctx: CompressContext): any[] | Promise<any[]>;
  computeBiases(messages: any[], ctx: CompressContext): Record<number, number>;
  postCompress(event: CompressEvent): void | Promise<void>;
}

CompressContext

interface CompressContext {
  model: string;
  userQuery: string;
  turnNumber: number;
  toolCalls: string[];
  provider: string;
}

CompressEvent

interface CompressEvent {
  tokensBefore: number;
  tokensAfter: number;
  tokensSaved: number;
  compressionRatio: number;
  transformsApplied: string[];
  ccrHashes: string[];
  model: string;
  userQuery: string;
  provider: string;
}

Source: sdk/typescript/src/hooks.ts

Configuration Profiles

Headroom supports compression profiles for different use cases:

class CompressionProfile:
    bias: float = 1.0      # >1 = preserve more, <1 = compress more
    minK: int = 10         # Minimum items to keep
    maxK: int | None = None # Maximum items to keep

Preset Profiles

| Profile | Bias | Use Case |
|---|---|---|
| balanced | 1.0 | General purpose |
| aggressive | 0.5 | Maximize compression |
| conservative | 2.0 | Preserve more context |

Performance Characteristics

| Algorithm | Latency | Memory | Best For |
|---|---|---|---|
| SmartCrusher | Low | Low | Tool outputs |
| CodeCompressor | Medium | Medium | Source files |
| Kompress-base | Higher | Higher | Free-form text |
| CacheAligner | Low | Low | Prefix optimization |

Known Limitations

Multi-Agent Attribution Issue

In multi-agent setups, CCR proactive expansion can corrupt message attribution. When _append_context_to_latest_non_frozen_user_turn() injects expansion blocks into messages containing XML attribution markup (<peer_turn from="AgentX">), the injected block may interfere with attribution parsing.

Workaround: Use explicit CCR retrieval calls instead of relying on proactive expansion in multi-agent threads.

Source: GitHub Issue #503

API Reference

Python SDK

from headroom import compress

result = compress(
    messages,
    model="claude-sonnet-4-20250514",
    profile="balanced"
)

TypeScript SDK

import { compress } from "headroom-ai";

const result = await compress(messages, {
  model: "gpt-4o",
  hooks: new LoggingHooks(),
  tokenBudget: 100000
});

CLI

headroom compress --input messages.json --output compressed.json

CCR (Reversible Compression)

Related topics: Compression Algorithms, MCP Integration

Overview

CCR (Compress-Cache-Retrieve) is Headroom's reversible compression mechanism, which makes context reduction lossless. Lossy compression permanently discards information. CCR instead stores each original alongside its compressed form, so the LLM can retrieve full details on demand through a dedicated tool.

The core value proposition is straightforward: achieve aggressive token reduction while maintaining zero data loss. When the agent needs original content—whether debugging an error, reviewing a code change, or examining a log entry—it calls headroom_retrieve with a cache key to decompress and return the full original content.

Source: README.md

Architecture

CCR consists of three primary phases that form a continuous cycle:

graph LR
    A[Compress] -->|Store originals| B[Cache]
    B -->|Insert placeholder| C[Send to LLM]
    C -->|Agent requests| D[Retrieve]
    D -->|Return originals| C

Core Components

| Component | Responsibility | Location |
|---|---|---|
| InMemoryCcrStore | Runtime storage of compressed originals | crates/headroom-core/src/ccr/mod.rs |
| CCR Backend | Pluggable storage implementations | crates/headroom-core/src/ccr/backends/mod.rs |
| headroom_retrieve | MCP tool for on-demand retrieval | headroom/ccr/mcp_server.py |
| Response Handler | Processes retrieval responses | headroom/ccr/response_handler.py |

Data Flow

sequenceDiagram
    participant Transform as Compression Transform
    participant Store as CCR Store
    participant Proxy as Headroom Proxy
    participant LLM as LLM
    participant Agent as AI Agent

    Transform->>Store: compress(content)
    Store->>Store: Generate cache_key
    Store->>Store: Store original with key
    Transform->>Proxy: Return compressed + cache_key
    Proxy->>LLM: Send compressed content
    LLM->>Agent: Request via headroom_retrieve
    Agent->>Proxy: retrieve(cache_key)
    Proxy->>Store: Lookup original
    Store->>Proxy: Return original
    Proxy->>Agent: Decompressed content

Compression Phase

During the compression phase, Headroom transforms apply lossy compression strategies (SmartCrusher, DiffCompressor, LogCompressor, etc.) while simultaneously preserving originals in the CCR store.

Cache Key Generation

Each compressed item receives a unique cache key that serves as the retrieval identifier. The key format enables:

  • Fast O(1) lookup in the store
  • Correlation with specific compression transforms
  • Version tracking for cache invalidation

Source: crates/headroom-core/src/ccr/mod.rs
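A key with all three properties can be built from a version tag, the transform name, and a content hash. The `v1:smart_crusher:<hash>` layout and the helper names are assumptions for illustration only. The real key format is defined in the Rust source cited above.

```python
import hashlib

KEY_VERSION = "v1"  # hypothetical; bumping it invalidates keys minted by older formats


def make_cache_key(transform: str, content: str) -> str:
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:24]
    return f"{KEY_VERSION}:{transform}:{digest}"


def parse_cache_key(key: str) -> tuple[str, str, str]:
    version, transform, digest = key.split(":", 2)
    return version, transform, digest
```

Because the key is deterministic, using it as a dict key gives O(1) lookup, and identical content compressed twice maps to a single entry.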

Storage Backend

The default backend is InMemoryCcrStore, which provides:

  • Thread-safe in-memory storage during a session
  • Automatic cleanup on session end
  • Minimal latency for retrieval operations

// The Python shim creates an InMemoryCcrStore for the Rust compressor
let store = headroom_core::ccr::InMemoryCcrStore::new();
let (result, stats) = self.inner.compress_with_store(&owned, bias, Some(&store));

Source: crates/headroom-py/src/lib.rs

Retrieval Phase

MCP Tool: `headroom_retrieve`

The headroom_retrieve tool is exposed via the Headroom MCP server and allows agents to decompress original content on demand.

# headroom/ccr/mcp_server.py exposes retrieval capabilities
class HeadroomMcpServer:
    def retrieve(self, cache_key: str) -> str:
        """Retrieve original content by cache key."""

Tool Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| cache_key | string | Yes | The unique identifier returned during compression |
| search_query | string | No | Optional search within retrieved content |

Retrieval Response Handling

When an agent requests content, the response handler processes the lookup and formats the result:

# headroom/ccr/response_handler.py
def handle_retrieve_request(cache_key: str) -> RetrieveResult:
    original = store.get(cache_key)
    return format_response(original)

Source: headroom/ccr/response_handler.py
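The optional search_query parameter suggests that retrieval can return only the matching part of an original. This sketch assumes case-insensitive line matching and a plain dict as the store. `retrieve` and its message strings are hypothetical, not the MCP server's actual responses.

```python
from __future__ import annotations


def retrieve(store: dict[str, str], cache_key: str, search_query: str | None = None) -> str:
    original = store.get(cache_key)
    if original is None:
        # Unknown or expired key: say so explicitly so the agent can fall back.
        return f"error: no content stored for key {cache_key!r}"
    if not search_query:
        return original
    needle = search_query.lower()
    hits = [line for line in original.splitlines() if needle in line.lower()]
    return "\n".join(hits) if hits else f"no lines match {search_query!r}"
```

Returning an explicit error string rather than raising gives the agent something to act on, which fits the fallback advice under Best Practices.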

Configuration

CCR behavior is controlled through the main CCRConfig:

| Option | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable/disable CCR entirely |
| store_type | string | "in_memory" | Storage backend selection |
| ttl_seconds | int | 3600 | Cache expiration for stored originals |
| max_store_size | int | 10000 | Maximum entries before eviction |

Source: docs/content/docs/ccr.mdx
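Together, ttl_seconds and max_store_size imply an expiring, bounded store. This sketch makes three assumptions:

- expiry is checked lazily on read;
- the oldest entry is evicted when the store is full;
- the TTL default can come from the HEADROOM_CCR_TTL environment variable.

`TtlCcrStore` itself is hypothetical.

```python
from __future__ import annotations

import os
import time
from collections import OrderedDict
from typing import Callable


class TtlCcrStore:
    """Hypothetical bounded CCR store: entries expire after a TTL; oldest evicted when full."""

    def __init__(self, ttl_seconds: float | None = None, max_size: int = 10_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if ttl_seconds is None:
            # Default mirrors ttl_seconds = 3600, overridable via HEADROOM_CCR_TTL.
            ttl_seconds = float(os.environ.get("HEADROOM_CCR_TTL", "3600"))
        self.ttl = ttl_seconds
        self.max_size = max_size
        self.clock = clock
        self.ttl_evictions = 0
        self._data: OrderedDict[str, tuple[float, str]] = OrderedDict()

    def put(self, key: str, original: str) -> None:
        self._data[key] = (self.clock() + self.ttl, original)
        self._data.move_to_end(key)  # a re-stored key counts as newest
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # size-based eviction of the oldest entry

    def get(self, key: str) -> str | None:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, original = entry
        if self.clock() >= expires_at:
            # Lazy expiry: a stale entry is removed when someone asks for it.
            del self._data[key]
            self.ttl_evictions += 1
            return None
        return original
```

The injectable `clock` makes expiry testable without sleeping, and `ttl_evictions` corresponds to the kind of count reported as ccr_ttl_evictions.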

Environment Variables

| Variable | Description |
|---|---|
| HEADROOM_CCR_ENABLED | Set to 0 to disable CCR |
| HEADROOM_CCR_STORE_TYPE | Override storage backend |
| HEADROOM_CCR_TTL | Override TTL in seconds |

Multi-Agent Considerations

When using CCR in multi-agent workflows, content attribution becomes critical. The system supports structured XML markup for tracking content provenance:

<peer_turn from="AgentX">
  <!-- Agent-generated content -->
</peer_turn>

Known Limitation: Attribution Corruption

A known issue exists where CCR proactive expansion can corrupt message attribution in multi-agent threads. When _append_context_to_latest_non_frozen_user_turn() injects expansion blocks into messages containing peer attribution markup, the injected block may interfere with the structured XML.

Issue Reference: #503 - CCR proactive expansion blocks corrupt message attribution in multi-agent threads

This affects multi-agent setups where:

  • Messages contain structured XML attribution markup
  • CCR proactive expansion is enabled
  • Multiple agents contribute to the same thread

Transform Integration

CCR is integrated into the compression pipeline as a sidecar mechanism:

graph TD
    A[Input Content] --> B[Transform Applies]
    B --> C{CCR Enabled?}
    C -->|Yes| D[Store Original]
    C -->|No| E[Skip CCR]
    D --> F[Return Compressed + Key]
    E --> G[Return Compressed Only]

Supported Transforms

| Transform | CCR Support | Typical Savings |
|---|---|---|
| SmartCrusher | Full | 40-70% |
| DiffCompressor | Full | 60-90% |
| LogCompressor | Full | 70-85% |
| SearchCompressor | Full | 50-75% |
| CacheAligner | Metadata only | N/A |

Source: crates/headroom-core/src/ccr/backends/mod.rs

Observability

CCR operations emit metrics for monitoring:

| Metric | Description |
|---|---|
| ccr_entries_stored | Total originals stored |
| ccr_retrievals | Total retrieval requests |
| ccr_hit_rate | Retrieval success rate |
| ccr_store_size | Current store memory usage |
| ccr_ttl_evictions | Entries expired by TTL |

Stats Object

The compression result includes a stats dictionary with diagnostic information:

result = compressor.compress(content)
print(f"Cache key: {result.cache_key}")  # For retrieval
print(f"Stats: {result.stats}")           # Observability data

Source: crates/headroom-py/src/lib.rs:45-52

Best Practices

  1. Session-Based Usage: CCR store is designed for session-scoped operation. For long-running agents, configure appropriate TTL values to manage memory.
  2. Key Preservation: Cache keys must be preserved in the conversation context for retrieval to work. The LLM must pass the key back to headroom_retrieve.
  3. Error Handling: Implement fallback behavior when retrieval fails: either re-compress with lower settings or request original content through alternative means.
  4. Multi-Agent Attribution: In multi-agent setups, track content provenance explicitly to avoid the attribution corruption issue documented in #503.

Memory System

Related topics: MCP Integration

The Headroom Memory System provides persistent cross-session knowledge storage and retrieval for AI coding agents. It enables agents to remember important decisions, conventions, project context, architecture details, and user preferences across multiple sessions without requiring manual re-entry.

Overview

The Memory System addresses a fundamental limitation of AI coding assistants: the context window is ephemeral. When a session ends, all learned information is lost. Headroom's memory system solves this by providing:

  • Persistent storage - Memories survive session boundaries
  • Multi-agent awareness - Shared store with agent provenance
  • Automatic retrieval - Relevant memories surfaced when needed
  • Reversible compression - Full fidelity retrieval via CCR

Source: headroom/cli/wrap.py:1-50

Architecture

graph TD
    subgraph "Client Layer"
        CLI[headroom memory CLI]
        MCP[Memory MCP Server]
        Wrap[headroom wrap]
    end
    
    subgraph "Core Memory"
        Bridge[Memory Bridge]
        Core[Memory Core]
    end
    
    subgraph "Backends"
        Local[Local SQLite]
        Mem0[Mem0 Backend]
        QdrantNeo4j[Qdrant + Neo4j]
    end
    
    subgraph "Integrations"
        ClaudeMCP[Claude MCP]
        CodexMCP[Codex MCP]
    end
    
    CLI --> Core
    MCP --> Bridge
    Wrap --> Bridge
    Bridge --> Core
    Core --> Local
    Core --> Mem0
    Core --> QdrantNeo4j
    
    ClaudeMCP -.->|memory_search| MCP
    ClaudeMCP -.->|memory_save| MCP

Memory Scopes

Memories are organized by scope, allowing fine-grained control over persistence and visibility:

| Scope | Description | Use Case |
|---|---|---|
| USER | User-wide memories | Preferences, coding style, org info |
| SESSION | Session-specific memories | Current task context |
| AGENT | Agent-specific memories | Agent identity, capabilities |
| TURN | Single turn memories | Ephemeral context |

Source: headroom/cli/memory.py:20-25

CLI Commands

Memory Management

headroom memory list                     # List all stored memories
headroom memory list --limit 10          # List the 10 most recent memories
headroom memory list --scope USER        # List only USER-level memories
headroom memory list --since 7d          # List memories from the last 7 days
headroom memory show <id>                # Show full details of a memory
headroom memory stats                    # Show memory statistics
headroom memory edit <id> --content ...  # Edit a memory's content
headroom memory delete <id>              # Delete a memory
headroom memory prune --older-than 30d   # Delete memories older than 30 days
headroom memory purge --confirm          # Delete ALL memories
headroom memory export --output file.json  # Export all memories to JSON
headroom memory import file.json         # Import memories from JSON

Source: headroom/cli/memory.py:1-30
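The --scope, --since, and --limit filters compose naturally. This sketch shows them over an in-memory list. It assumes each memory carries `scope` and `created_at` fields, and that durations are a number plus a d/h/m suffix, as in `7d` and `30d`. `parse_duration` and `filter_memories` are hypothetical, not the CLI's internals.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"d": "days", "h": "hours", "m": "minutes"}


def parse_duration(text: str) -> timedelta:
    m = re.fullmatch(r"(\d+)([dhm])", text.strip())
    if not m:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{_UNITS[m.group(2)]: int(m.group(1))})


def filter_memories(memories, scope=None, since=None, limit=None, now=None):
    """Newest first; scope is USER/SESSION/AGENT/TURN; since looks like '7d'."""
    now = now or datetime.now(timezone.utc)
    out = [m for m in memories if scope is None or m["scope"] == scope]
    if since is not None:
        cutoff = now - parse_duration(since)
        out = [m for m in out if m["created_at"] >= cutoff]
    out.sort(key=lambda m: m["created_at"], reverse=True)
    return out[:limit] if limit is not None else out
```

Under the same assumptions, `prune --older-than 30d` is the complement of the `since` filter: it selects entries whose `created_at` falls before the cutoff.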

Memory Integration in Wrapped Agents

When running headroom wrap with the --memory flag, the system automatically:

  1. Registers the headroom_memory MCP server
  2. Injects memory usage guidance into AGENTS.md
  3. Enables memory_search and memory_save tools

Source: headroom/cli/wrap.py:50-80

Memory Guidance Injection

The system injects guidance into AGENTS.md files to instruct agents on when to use memory:

<!-- headroom:memory-instructions -->
## Memory

Use the `headroom_memory` MCP server for persistent cross-session knowledge.

**Before** answering questions about prior decisions, conventions, project context,
architecture, user preferences, org info, codenames, debugging history, or anything
from past sessions — call `memory_search` first.

**After** making durable decisions, discovering conventions, or learning important
facts — call `memory_save` to persist them for future sessions.

Memory is your first source of truth for anything not visible in the current conversation.

Source: headroom/cli/wrap.py:80-100
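Because the injected block starts with a fixed HTML comment marker, the injection can be idempotent: append the block only if the marker is absent. This sketch assumes that check uses the marker shown above. `inject_memory_guidance` is hypothetical, and the guidance text is abbreviated.

```python
from pathlib import Path

MARKER = "<!-- headroom:memory-instructions -->"
GUIDANCE = (MARKER + "\n## Memory\n\n"
            "Use the `headroom_memory` MCP server for persistent cross-session knowledge.\n")


def inject_memory_guidance(agents_md: Path) -> bool:
    """Append the memory section once; return True if the file was changed."""
    text = agents_md.read_text(encoding="utf-8") if agents_md.exists() else ""
    if MARKER in text:
        return False  # already injected on an earlier run
    if not text or text.endswith("\n\n"):
        separator = ""
    elif text.endswith("\n"):
        separator = "\n"
    else:
        separator = "\n\n"
    agents_md.write_text(text + separator + GUIDANCE, encoding="utf-8")
    return True
```

Running `headroom wrap --memory` repeatedly would then leave a single copy of the section rather than stacking duplicates.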

Memory Evaluation

Headroom includes comprehensive evaluation suites for memory systems:

LoCoMo V2 Evaluation

This suite tests an architecture in which:

  • The LLM decides what to save (memory_save tool)
  • The LLM decides when to search (memory_search tool)
  • Graph relationships enable multi-hop reasoning

headroom evals memory-v2 -n 3
headroom evals memory-v2 --answer-model gpt-4o --save-model gpt-4o-mini

Parameters:

| Parameter | Description | Default |
| --- | --- | --- |
| --n-conversations | Number of conversations to evaluate | All (10) |
| --categories | Categories to evaluate (1-5) | 1,2,3,4 |
| --include-adversarial | Include category 5 (unanswerable) | False |
| --f1-threshold | F1 score threshold for 'correct' | 0.5 |
| --answer-model | LLM model for generating answers | None |
| --llm-judge | Use LLM-as-judge scoring | False |
| --judge-model | Model for judging | None |
| --parallel | Parallel evaluation workers | 1 |

Source: headroom/cli/evals.py:30-80
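
The `--f1-threshold` option implies that each answer gets a token-overlap F1 score against the reference answer and counts as correct once it clears the threshold. The sketch below uses the SQuAD-style token F1 that LoCoMo-style QA evaluations commonly use. The normalization steps and the `>=` comparison are assumptions, not a transcription of headroom/cli/evals.py.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-words overlap)."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def is_correct(prediction: str, reference: str, threshold: float = 0.5) -> bool:
    """Count an answer as correct when its F1 reaches the threshold."""
    return token_f1(prediction, reference) >= threshold


print(round(token_f1("She moved to Paris in 2021", "Paris"), 3))  # 0.286
print(is_correct("in Paris", "Paris"))  # True
```

Verbose answers are penalized through precision, which is why a correct but wordy answer can still fall below the 0.5 default.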

Storage Backends

Per-Project Storage

As of v0.21.34, memories are stored per project, so memories saved in one project do not leak into another.

Source: headroom/cli/evals.py, Community Release v0.21.34

Local SQLite Backend

The default backend stores memories in SQLite and supports:

  • Scope-based filtering
  • Time-based queries
  • Full-text search
  • Import/export

Mem0 Backend

External Mem0 integration for users with existing Mem0 deployments.

Qdrant + Neo4j Backend

Advanced backend providing:

  • Vector search via Qdrant
  • Graph relationships via Neo4j
  • Multi-hop reasoning capabilities

Configuration:

| Option | Description | Default |
| --- | --- | --- |
| --memory-qdrant-url | Full Qdrant URL | None |
| --memory-qdrant-host | Qdrant host | localhost |
| --memory-qdrant-port | Qdrant port | 6333 |
| --memory-neo4j-uri | Neo4j URI | None |
| --memory-neo4j-user | Neo4j user | None |

Source: headroom/cli/proxy.py

MCP Tools

The memory MCP server exposes the following tools:

| Tool | Description | Parameters |
| --- | --- | --- |
| memory_search | Search memories by query | query, limit, scope, session_id |
| memory_save | Save a new memory | content, scope, agent_id, session_id |
| memory_list | List memories | limit, scope, since, search |
| memory_show | Show memory details | id |
| memory_edit | Edit memory content | id, content |
| memory_delete | Delete a memory | id |

Source: headroom/cli/wrap.py:100-150

Multi-Agent Memory

In multi-agent setups, the memory system provides:

  • Shared store - All agents can access common memories
  • Agent provenance - Track which agent saved each memory
  • Auto-dedup - Prevent duplicate memories
  • Cross-agent context - Memory context passed across agent boundaries

graph LR
    subgraph "Agent A"
        A_Save[memory_save]
        A_Search[memory_search]
    end
    
    subgraph "Agent B"
        B_Save[memory_save]
        B_Search[memory_search]
    end
    
    subgraph "Shared Memory"
        Store[(Memory Store)]
    end
    
    A_Save --> Store
    B_Save --> Store
    Store --> A_Search
    Store --> B_Search
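
The sketch below shows the shared-store properties listed above in a small in-memory form: a single store shared by all agents, the saving agent recorded as provenance, and duplicates collapsed on save. `SharedMemoryStore` and `Memory` are hypothetical names. The dedup here is exact matching after case and whitespace normalization, which is an assumption; Headroom's auto-dedup may be semantic rather than textual.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Memory:
    content: str
    agent_id: str  # provenance: the agent that saved it first
    seen_by: set[str] = field(default_factory=set)


class SharedMemoryStore:
    """In-memory stand-in for a store shared by several agents."""

    def __init__(self) -> None:
        self._by_key: dict[str, Memory] = {}

    @staticmethod
    def _key(content: str) -> str:
        # Exact-match dedup after collapsing case and whitespace.
        normalized = " ".join(content.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def save(self, content: str, agent_id: str) -> bool:
        """Store a memory; return False if it duplicated an existing one."""
        key = self._key(content)
        if key in self._by_key:
            self._by_key[key].seen_by.add(agent_id)
            return False
        self._by_key[key] = Memory(content, agent_id, {agent_id})
        return True

    def search(self, query: str) -> list[Memory]:
        """Naive substring search, visible to every agent."""
        needle = query.lower()
        return [m for m in self._by_key.values() if needle in m.content.lower()]


store = SharedMemoryStore()
print(store.save("Cluster runs in us-east", "architect-agent"))  # True
print(store.save("cluster runs in  US-EAST", "ops-agent"))  # False
hit = store.search("us-east")[0]
print(hit.agent_id, sorted(hit.seen_by))  # architect-agent ['architect-agent', 'ops-agent']
```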

Known Issues

CCR Proactive Expansion in Multi-Agent Threads

Issue #503: CCR proactive expansion blocks corrupt message attribution in multi-agent threads.

TL;DR: The _append_context_to_latest_non_frozen_user_turn() function injects proactive expansion blocks into the latest user message content. In multi-agent setups, that message can contain structured XML attribution markup (<peer_turn from="AgentX">). The injected block can corrupt this attribution.

Status: Open, under investigation.

Configuration Options

Proxy Configuration

| Option | Description | Default |
| --- | --- | --- |
| --memory | Enable memory integration | False |
| --memory-storage | Storage backend | local |
| --memory-project-root | Override project root | "" |
| --no-memory-tools | Disable memory tool injection | False |
| --no-memory-context | Disable memory context injection | False |
| --memory-top-k | Memories to inject as context | 10 |

Source: headroom/cli/proxy.py

Usage Examples

Basic Memory Workflow

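# Assumes `mcp_client` is an initialized MCP ClientSession connected to the
# headroom_memory server, and that this code runs inside an async function.

# After discovering a convention
await mcp_client.call_tool("memory_save", {
    "content": "Use TypeScript strict mode in all new projects",
    "scope": "USER",
})

# In a new session, before answering
results = await mcp_client.call_tool("memory_search", {
    "query": "coding conventions and project standards",
})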

Multi-Agent Shared Context

import { createSharedContext } from "@headroom/sdk";

const ctx = createSharedContext({
  agentId: "architect-agent",
  projectId: "k8s-scaling"
});

// Save findings
await ctx.set("research", { provider: "aws", region: "us-east" });

// Another agent reads it
const compressed = await ctx.get("research");

Source: sdk/typescript/examples/shared-context-multi-agent.ts

Performance Considerations

  • Memory ID exposure - As of v0.22.2, memory IDs are exposed in the auto-tail and in the memory_list tool, together with guidance on how to use them
  • Regex-based prefix extraction removed - v0.21.35 dropped regex-based pref extraction and now filters system-reminder noise
  • Query cap removed - v0.22.0 dropped the 500-character query cap for memory search

Further Reading

Source: https://github.com/chopratejas/headroom / Human Manual

MCP Integration

Related topics: CCR (Reversible Compression), Memory System

MCP (Model Context Protocol) integration enables Headroom to expose compression, retrieval, and memory tools to AI coding assistants like Claude Code. This integration is foundational to Headroom's CCR (Compress-Cache-Retrieve) pattern, allowing agents to work with compressed content summaries and retrieve original data on demand.

Overview

MCP integration serves three primary purposes in Headroom:

  1. Content Retrieval — Exposes headroom_retrieve as an MCP tool that Claude Code calls to decompress compressed content
  2. Memory Persistence — Provides a persistent cross-session memory MCP server (headroom_memory) for knowledge retention
  3. Subscription Access — Enables CCR functionality for Claude Code subscription users without API key access

The MCP server is a stdio-based service: it communicates over standard input/output rather than HTTP. It is not reachable through the proxy server, which has no HTTP route at /mcp.
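
Because the transport is stdio, a client talks to the server by writing one JSON-RPC 2.0 message per line to the server's standard input. The sketch below only builds the `tools/call` line for `memory_search`. In practice, an MCP client library performs the `initialize` handshake first and handles this framing for you. The `limit` value is illustrative.

```python
import json


def tools_call_line(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one newline-terminated line.

    The stdio transport carries newline-delimited JSON-RPC 2.0 messages, so
    the payload itself must not contain raw newlines; json.dumps escapes any
    newlines inside string values.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message, separators=(",", ":")) + "\n"


line = tools_call_line(2, "memory_search", {"query": "coding conventions", "limit": 5})
print(line, end="")
```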

Architecture

graph TD
    A[Claude Code] -->|stdio| B[Headroom MCP Server]
    B -->|retrieve| C[Headroom Proxy]
    C -->|original content| B
    B -->|original content| A
    
    A -->|stdio| E[Memory MCP Server]
    E -->|persist/query| F[(SQLite DB)]
    
    G[headroom wrap] -->|configures| A
    G -->|registers| B
    G -->|registers| E

Installation and Setup

Automatic Setup via CLI Wrapper

The recommended approach uses the headroom wrap command, which automatically configures MCP servers:

# For Claude Code
headroom wrap claude

# For Codex
headroom wrap codex

# For Claude Code with persistent memory
headroom wrap claude --memory

# For Codex with persistent memory
headroom wrap codex --memory

The wrap command handles multiple setup steps including proxy startup, CLI context tool configuration, and MCP server registration.

Manual MCP Installation

For manual configuration, use the MCP CLI commands:

# Install MCP server for Claude Code
headroom mcp install

# Verify installation
headroom mcp status

# Uninstall MCP server
headroom mcp uninstall

Source: headroom/cli/mcp.py:60-80

Standalone MCP Server

Start the MCP server independently when the proxy runs separately:

# Start the MCP server (requires proxy running)
headroom mcp serve

# With custom proxy URL
headroom mcp serve --proxy-url http://127.0.0.1:8787

MCP Server Implementation

MCP Command Structure

The CLI provides a command group for MCP operations:

@main.group()
def mcp() -> None:
    """MCP server for Claude Code integration."""

Source: headroom/cli/mcp.py:43-60

Configuration Management

MCP configuration is stored in ~/.claude/mcp.json:

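import json
from pathlib import Path
from typing import Any

# Constants reconstructed from the path stated above (not part of the quoted excerpt).
CLAUDE_CONFIG_DIR = Path.home() / ".claude"
MCP_CONFIG_PATH = CLAUDE_CONFIG_DIR / "mcp.json"

def load_mcp_config() -> dict[str, Any]:
    """Load existing MCP config or return empty structure."""
    if MCP_CONFIG_PATH.exists():
        with open(MCP_CONFIG_PATH) as f:
            return json.load(f)
    return {"mcpServers": {}}

def save_mcp_config(config: dict) -> None:
    """Save MCP config, creating directory if needed."""
    CLAUDE_CONFIG_DIR.mkdir(parents=True, exist_ok=True)
    with open(MCP_CONFIG_PATH, "w") as f:
        json.dump(config, f, indent=2)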

Source: headroom/cli/mcp.py:25-40

Headroom Command Generation

The command that launches the MCP server comes from a small helper:

def get_headroom_command() -> list[str]:
    """Get the command to run headroom MCP server."""
    return ["headroom", "mcp", "serve"]

Source: headroom/cli/mcp.py:18-23
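
Combining the helpers above, installing the server amounts to adding one entry under `mcpServers`. The sketch below writes that step as a pure function, so it can be tested without touching ~/.claude. `with_headroom_server` is hypothetical. The entry name and its `command`/`args` shape are assumptions about what `headroom mcp install` writes.

```python
import copy
import json
from typing import Any


def get_headroom_command() -> list[str]:
    """Same command list as the helper quoted above."""
    return ["headroom", "mcp", "serve"]


def with_headroom_server(config: dict[str, Any], name: str = "headroom") -> dict[str, Any]:
    """Return a copy of an mcp.json-style config with a headroom entry added.

    Assumptions: the entry key is "headroom" and uses the common
    {"command": ..., "args": [...]} shape; other servers are left alone.
    """
    command, *args = get_headroom_command()
    updated = copy.deepcopy(config)
    updated.setdefault("mcpServers", {})[name] = {"command": command, "args": args}
    return updated


existing = {"mcpServers": {"other": {"command": "other-tool", "args": []}}}
print(json.dumps(with_headroom_server(existing), indent=2))
```

Returning a copy rather than mutating in place lets a caller diff old and new configs before handing the result to `save_mcp_config`.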

CCR (Compress-Cache-Retrieve) Workflow

The MCP integration enables the CCR pattern for Claude Code subscription users:

sequenceDiagram
    participant CC as Claude Code
    participant MCP as Headroom MCP Server
    participant Proxy as Headroom Proxy
    
    CC->>Proxy: API request with ANTHROPIC_BASE_URL
    Proxy->>Proxy: Compress large tool outputs
    Proxy-->>CC: Compressed summary with hash markers
    CC->>MCP: headroom_retrieve(hash)
    MCP->>Proxy: Fetch original content
    Proxy-->>MCP: Original data
    MCP-->>CC: Full content restored

How CCR Works

  1. Compression — The proxy compresses large tool outputs (file listings, search results) and replaces them with hash markers
  2. Caching — Original content is stored temporarily with the hash as key
  3. Retrieval — When Claude Code needs full details, it calls headroom_retrieve with the hash
  4. Restoration — The MCP server fetches and returns the original content

Source: headroom/cli/mcp.py:50-75
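
The four steps can be seen end to end in a toy store. It compresses a long tool output to a short head plus a hash marker, caches the original under that hash, and returns it on request. Everything concrete here is hypothetical and not Headroom's format: the `CCRStore` name, the `[headroom:<hash>]` marker, the 12-character hash prefix, and the keep-the-first-lines summary.

```python
import hashlib


class CCRStore:
    """Toy compress-cache-retrieve store.

    Hypothetical details: the "[headroom:<hash>]" marker, the 12-character
    hash prefix, and the keep-the-first-lines summary. The real proxy uses its
    own transforms and marker syntax.
    """

    def __init__(self, keep_lines: int = 3) -> None:
        self.keep_lines = keep_lines
        self._originals: dict[str, str] = {}

    def compress(self, tool_output: str) -> str:
        lines = tool_output.splitlines()
        if len(lines) <= self.keep_lines:
            return tool_output  # small outputs pass through untouched
        digest = hashlib.sha256(tool_output.encode("utf-8")).hexdigest()[:12]
        self._originals[digest] = tool_output  # cache the original under its hash
        omitted = len(lines) - self.keep_lines
        head = "\n".join(lines[: self.keep_lines])
        return f"{head}\n... {omitted} more lines [headroom:{digest}]"

    def retrieve(self, digest: str) -> str:
        """What a headroom_retrieve(hash) call would hand back."""
        return self._originals[digest]


store = CCRStore()
listing = "\n".join(f"src/module_{i}.py" for i in range(100))
summary = store.compress(listing)
digest = summary.rsplit("[headroom:", 1)[1].rstrip("]")
print(summary.splitlines()[-1])
print(store.retrieve(digest) == listing)  # True
```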

Memory MCP Server

Headroom includes a dedicated MCP server for persistent cross-session memory:

Registration

The memory MCP server is registered in Codex's config.toml (~/.codex/config.toml):

def _inject_memory_mcp_config(db_path: str, user_id: str) -> None:
    """Register headroom memory as an MCP server in Codex's config.toml."""
    # Excerpt: python_bin, db_path_toml and the marker constants are not shown here.
    mcp_section = (
        f"\n{_MEMORY_MCP_MARKER}\n"
        f"[mcp_servers.headroom_memory]\n"
        f'command = "{python_bin}"\n'
        f'args = ["-m", "headroom.memory.mcp_server", "--db", "{db_path_toml}", "--user", "{user_id}"]\n'
        f"startup_timeout_sec = 30\n"
        f"tool_timeout_sec = 30\n"
        f"{_MEMORY_MCP_END}\n"
    )

Source: headroom/cli/wrap.py:120-145

Memory Operations

The memory MCP server provides tools for:

  • memory_search — Query persistent knowledge from past sessions
  • memory_save — Store important decisions, conventions, and context
  • memory_list — List stored memories

Memory Usage Guidance

Memory instructions are injected into AGENTS.md:

def _inject_memory_agents_md(file_path: Path) -> bool:
    """Inject memory usage guidance into AGENTS.md."""
    memory_block = (
        f"{_MEMORY_AGENTS_MARKER}\n"
        "## Memory\n\n"
        "Use the `headroom_memory` MCP server for persistent cross-session knowledge.\n\n"
        "**Before** answering questions about prior decisions, conventions...\n"
        "**After** making durable decisions... call `memory_save` to persist them.\n"
    )

Source: headroom/cli/wrap.py:170-200

Codex Integration

For Codex, MCP servers are registered in a TOML config file under ~/.codex rather than in Claude Code's JSON config:

Config File Paths

def _codex_config_paths() -> tuple[Path, Path]:
    """Return ``(config_file, backup_file)`` paths for the Codex TOML config."""
    config_dir = Path.home() / ".codex"
    config_file = config_dir / "config.toml"
    backup_file = config_dir / f"config.toml{_CODEX_CONFIG_BACKUP_SUFFIX}"
    return config_file, backup_file

Source: headroom/cli/wrap.py:95-102

Idempotent Registration

MCP registration is idempotent — existing sections are replaced:

if _MEMORY_MCP_MARKER in content:
    start = content.index(_MEMORY_MCP_MARKER)
    end = content.index(_MEMORY_MCP_END) + len(_MEMORY_MCP_END)
    content = content[:start].rstrip("\n") + mcp_section + content[end:].lstrip("\n")
else:
    content = content.rstrip() + "\n" + mcp_section

Source: headroom/cli/wrap.py:140-155
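
The same marker-delimited upsert can be written as a pure string function and tested in isolation. This sketch uses placeholder marker strings, and `upsert_section` is hypothetical. There is one deliberate difference from the excerpt. The excerpt, as quoted, adds a blank line before the section on first insert but not on later replacements. This version always joins the pieces with a single blank line, so re-running it on its own output is byte-for-byte stable.

```python
# Placeholder markers: the real _MEMORY_MCP_MARKER / _MEMORY_MCP_END values are
# not shown in this manual. TOML treats lines starting with '#' as comments.
START = "# >>> headroom memory mcp >>>"
END = "# <<< headroom memory mcp <<<"


def upsert_section(content: str, body: str) -> str:
    """Insert the marked section, or replace it in place if it already exists."""
    section = f"{START}\n{body}\n{END}"
    before, after = content, ""
    if START in content:
        start = content.index(START)
        end = content.index(END, start) + len(END)
        before, after = content[:start], content[end:]
    # Join the pieces with exactly one blank line so repeated runs are stable.
    parts = [p for p in (before.rstrip("\n"), section, after.strip("\n")) if p]
    return "\n\n".join(parts) + "\n"


body = '[mcp_servers.headroom_memory]\ncommand = "python"'
once = upsert_section('existing_key = "value"\n', body)
twice = upsert_section(once, body)
print(once == twice)  # True
```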

Configuration Backup

Pre-wrap state is snapshotted to enable clean unwrapping:

def _snapshot_codex_config_if_unwrapped(config_file: Path, backup_file: Path) -> None:
    """Snapshot ~/.codex/config.toml BEFORE any wrap-time mutation."""

Source: headroom/cli/wrap.py:104-120
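
The snapshot is only useful if it captures the pre-wrap bytes. It must therefore be skipped when a backup already exists, for example on a second wrap while the first is still active. The sketch below shows that rule plus the matching restore. `snapshot_if_unwrapped` and `restore` are simplified stand-ins for the wrap.py helpers, and the backup suffix is a placeholder. The sketch ignores the case where no config existed before wrapping.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_if_unwrapped(config_file: Path, backup_file: Path) -> None:
    """Copy the pre-wrap config once; later wraps must not overwrite the backup."""
    if backup_file.exists() or not config_file.exists():
        return
    shutil.copy2(config_file, backup_file)


def restore(config_file: Path, backup_file: Path) -> None:
    """Move the saved bytes back over the wrapped config."""
    if backup_file.exists():
        backup_file.replace(config_file)


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.toml"
    bak = Path(tmp) / "config.toml.headroom-backup"  # placeholder suffix
    cfg.write_bytes(b'model = "o3"\n')
    snapshot_if_unwrapped(cfg, bak)  # first wrap: snapshot taken
    cfg.write_bytes(b"# mutated by wrap\n")
    snapshot_if_unwrapped(cfg, bak)  # second wrap: existing snapshot kept
    restore(cfg, bak)
    print(cfg.read_bytes())  # b'model = "o3"\n'
```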

TypeScript SDK Integration

The TypeScript SDK supports shared context for multi-agent setups:

import { SharedContext } from "@headroomhq/sdk";

// Create shared context
const ctx = new SharedContext();

// Store compressed content
await ctx.put("k8s-scaling-research", compressedContent);

// Retrieve later
const compressed = await ctx.get("k8s-scaling-research");

// Access stats
const stats = ctx.stats();
console.log(`Total saved: ${stats.totalTokensSaved}`);

Source: sdk/typescript/examples/shared-context-multi-agent.ts

Configuration Options

CLI Options

| Option | Description | Default |
| --- | --- | --- |
| --no-mcp | Skip MCP retrieve tool registration | False |
| --no-serena | Skip Serena MCP registration | False |
| --memory | Enable persistent cross-session memory | False |
| --code-graph | Enable code graph indexing via codebase-memory-mcp | False |

Source: headroom/cli/wrap.py:40-75

Environment Variables

| Variable | Description |
| --- | --- |
| ANTHROPIC_BASE_URL | Route Claude Code traffic through proxy (set to http://127.0.0.1:8787) |

Known Limitations

Multi-Agent Message Attribution

Issue #503 — CCR proactive expansion blocks may corrupt message attribution in multi-agent threads. The _append_context_to_latest_non_frozen_user_turn() function injects proactive expansion blocks into the latest user message content, which can contain structured XML attribution markup (<peer_turn from="AgentX">). The injected block ends up corrupting this structure.

HTTP Endpoint Misconception

Issue #460: The MCP server operates via stdio, not HTTP, and the proxy does not expose an HTTP endpoint at /mcp. Run headroom mcp serve as a standalone process rather than expecting /mcp on the proxy server.

Examples

Running the MCP Demo

# Configure API key
export OPENAI_API_KEY='your-key'

# Run the MCP demo
PYTHONPATH=. python -m examples.mcp_demo.run_agent_eval

Source: examples/README.md

AWS Bedrock with Strands

# Configure AWS credentials
export AWS_ACCESS_KEY_ID='your-access-key'
export AWS_SECRET_ACCESS_KEY='your-secret-key'
export AWS_DEFAULT_REGION='us-west-2'

# Run the demo
python examples/strands_bedrock_demo.py

Source: examples/README.md

Proxy Deployment

Related topics: Getting Started, CLI Wrappers

Overview

The Headroom proxy is a central component that intercepts, optimizes, and routes LLM API traffic through Headroom's context compression pipeline. It serves as the contextual optimization layer between AI coding tools (Claude Code, Codex, Goose, Continue, etc.) and upstream LLM providers like Anthropic and OpenAI.

The proxy enables:

  • Token savings through compression transforms (CacheAligner, SmartCrusher, IntelligentContext)
  • Reversible compression via CCR (Compress-Cache-Retrieve): originals remain retrievable on demand
  • Shared memory across multi-agent workflows
  • Semantic caching for repeated query patterns
  • Cross-agent context passing via SharedContext

Source: README.md

Architecture

graph TD
    subgraph "AI Coding Tools"
        Claude[Claude Code]
        Codex[OpenAI Codex]
        Goose[Goose]
        Continue[Continue Dev]
        OpenHands[OpenHands]
        Custom[Custom SDK / App]
    end

    subgraph "Headroom Proxy"
        Intercept[Request Interception]
        Pipeline[Compression Pipeline]
        Memory[Cross-Agent Memory]
        CCR[CCR Retrieval]
        Cache[Semantic Cache]
    end

    subgraph "LLM Providers"
        Anthropic[Anthropic /v1/messages]
        OpenAI[OpenAI /v1/chat/completions]
        Vertex[Vertex AI]
        Bedrock[AWS Bedrock]
    end

    Claude --> Intercept
    Codex --> Intercept
    Goose --> Intercept
    Continue --> Intercept
    OpenHands --> Intercept
    Custom --> Intercept

    Intercept --> Pipeline
    Pipeline --> Memory
    Pipeline --> CCR
    Pipeline --> Cache

    Pipeline --> Anthropic
    Pipeline --> OpenAI

Request Lifecycle

The proxy exposes one stable request lifecycle across all integration paths:

Setup → Pre-Start → Post-Start → Input Received → Input Cached → Input Routed → Input Compressed → Input Remembered → Pre-Send → Post-Send → Response Received

Source: README.md

Pipeline Transforms

| Transform | Purpose |
| --- | --- |
| CacheAligner | Stabilizes prefixes so KV caches hit effectively |
| ContentRouter | Routes content to appropriate compression strategies |
| SmartCrusher | Universal JSON compression (arrays, nested objects, mixed types) |
| CodeCompressor | AST-aware compression for Python, JS, Go, Rust, Java, C++ |
| Kompress-base | Trained ML text compression (HuggingFace model trained on agentic traces) |
| IntelligentContext | Score-based context fitting with learned importance |
| RollingWindow | Sliding conversation window management |

Source: README.md
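
RollingWindow is described only as sliding conversation window management. The simplest version of that technique keeps system messages and drops the oldest remaining turns, as in the sketch below. `rolling_window` and its `max_turns` parameter are hypothetical names, not Headroom's configuration.

```python
def rolling_window(messages: list[dict], max_turns: int) -> list[dict]:
    """Keep every system message plus the newest `max_turns` other messages."""
    if max_turns < 0:
        raise ValueError("max_turns must be non-negative")
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    kept = rest[-max_turns:] if max_turns else []  # rest[-0:] would keep everything
    return system + kept


history = [{"role": "system", "content": "You are a coding agent."}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(6)
]
print([m["content"] for m in rolling_window(history, 2)])
# ['You are a coding agent.', 'turn 4', 'turn 5']
```

A production window also has to avoid separating a tool call from its tool result, which this sketch does not handle.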

CLI Commands

`headroom proxy`

Starts the Headroom proxy server. This is the primary command for deploying the proxy as a standalone service.

headroom proxy [OPTIONS]

#### Core Options

| Option | Default | Description |
| --- | --- | --- |
| --host | 127.0.0.1 | Host to bind to |
| --port, -p | 8787 | Proxy port |
| --mode | optimize | Proxy mode: audit, optimize, simulate |
| --backend | anthropic | API backend: anthropic, anyllm, litellm-vertex |
| --anyllm-provider | None | Provider for any-llm backend |
| --region | None | Cloud region for Bedrock/Vertex |
| --exclude-tools | | Comma-separated tools to exclude from processing |
| --no-optimize | | Disable optimization (passthrough mode) |
| --no-cache | | Disable semantic caching |
| --no-rate-limit | | Disable rate limiting |
| --no-subscription-tracking | | Disable Anthropic subscription usage poller |
| --intercept-tool-results | | Enable tool_result interceptors (opt-in) |
| --memory | | Enable persistent cross-session memory |
| --learn | | Enable live traffic learning |
| --verbose, -v | | Verbose output |

Source: headroom/cli/proxy.py

#### Environment Variables

| Variable | Description |
| --- | --- |
| HEADROOM_HOST | Proxy host binding |
| HEADROOM_PORT | Proxy port |
| HEADROOM_MODE | Proxy mode |
| HEADROOM_BACKEND | API backend selection |
| HEADROOM_ANYLLM_PROVIDER | Provider for any-llm backend |
| HEADROOM_REGION | Cloud region for Bedrock/Vertex |
| HEADROOM_EXCLUDE_TOOLS | Tools to exclude |
| HEADROOM_NO_SUBSCRIPTION_TRACKING | Disable subscription poller |
| HEADROOM_PROXY_EXTENSIONS | Enabled proxy extensions |
| HEADROOM_CONTEXT_TOOL | Context tool selection: rtk or lean-ctx |

Source: headroom/cli/proxy.py
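
Most options above have both a flag and a HEADROOM_* variable. The sketch below applies one common precedence rule: explicit flag, then environment variable, then built-in default. That order is an assumption about headroom/cli/proxy.py, `resolve` is a hypothetical helper, and the defaults come from the Core Options table.

```python
import os
from collections.abc import Callable, Mapping
from typing import Optional, TypeVar

T = TypeVar("T")


def resolve(
    flag: Optional[T],
    env_var: str,
    default: T,
    cast: Callable[[str], T],
    env: Mapping[str, str] = os.environ,
) -> T:
    """Explicit flag first, then the HEADROOM_* variable, then the default."""
    if flag is not None:
        return flag
    raw = env.get(env_var)
    if raw:  # unset and empty-string variables both fall through
        return cast(raw)
    return default


fake_env = {"HEADROOM_PORT": "9000", "HEADROOM_MODE": "simulate"}
print(resolve(None, "HEADROOM_PORT", 8787, int, fake_env))  # 9000
print(resolve("audit", "HEADROOM_MODE", "optimize", str, fake_env))  # audit
print(resolve(None, "HEADROOM_HOST", "127.0.0.1", str, fake_env))  # 127.0.0.1
```

Passing the environment as a mapping keeps the rule testable without mutating `os.environ`.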

`headroom perf`

Analyzes proxy performance from logs.

headroom perf [OPTIONS]

| Option | Default | Description |
| --- | --- | --- |
| --hours | 168 (7 days) | Analyze logs from last N hours |
| --raw | | Show raw PERF records instead of formatted report |

Source: headroom/cli/perf.py

`headroom wrap <tool>`

Launches AI coding tools with the proxy automatically configured. Available wrappers:

| Tool | Command |
| --- | --- |
| Claude Code | headroom wrap claude |
| GitHub Copilot CLI | headroom wrap copilot |
| OpenAI Codex | headroom wrap codex |
| Aider | headroom wrap aider |
| Cursor | headroom wrap cursor |
| Goose | headroom wrap goose |
| OpenHands | headroom wrap openhands |
| Continue | headroom wrap continue |
| OpenClaw | headroom wrap openclaw |

Each wrap command shares common options:

| Option | Description |
| --- | --- |
| --port, -p | Proxy port (default: 8787) |
| --no-context-tool / --no-rtk | Skip CLI context-tool setup |
| --no-proxy | Skip proxy startup (use existing) |
| --no-mcp | Skip headroom MCP server registration |
| --no-serena | Skip Serena MCP server registration |
| --code-graph | Enable code graph indexing via codebase-memory-mcp |
| --memory | Enable persistent cross-session memory |
| --learn | Enable live traffic learning |
| --backend | API backend selection |
| --anyllm-provider | Provider for any-llm backend |
| --region | Cloud region for Bedrock/Vertex |
| --verbose, -v | Verbose output |
| --prepare-only | Prepare environment without launching tool |

Source: headroom/cli/wrap.py

Configuration

Proxy Modes

| Mode | Description |
| --- | --- |
| audit | Log requests/responses without modification |
| optimize | Full compression and optimization enabled |
| simulate | Preview compression effects without API calls |

TypeScript Configuration Types

export type HeadroomMode = "audit" | "optimize" | "simulate";

export interface CompressionProfile {
  cacheAligner?: CacheAlignerConfig;
  rollingWindow?: RollingWindowConfig;
  scoringWeights?: ScoringWeights;
  intelligentContext?: IntelligentContextConfig;
  smartCrusher?: SmartCrusherConfig;
  cacheOptimizer?: CacheOptimizerConfig;
  ccr?: CCRConfig;
  prefixFreeze?: PrefixFreezeConfig;
}

Source: sdk/typescript/src/types/config.ts

HeadroomConfig Interface

export interface HeadroomConfig {
  mode?: HeadroomMode;
  optimize?: boolean;
  cacheEnabled?: boolean;
  rateLimitEnabled?: boolean;
  profile?: CompressionProfile;
  toolCrusher?: ToolCrusherConfig;
  memory?: MemoryConfig;
  extensions?: string[];
}

Source: sdk/typescript/src/types/config.ts

Integration Patterns

SDK Integration

#### Python

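from headroom import compress

# Chat messages in role/content form
messages = [{"role": "user", "content": "Summarize the failing test output."}]
result = compress(messages, model="claude-sonnet-4-20250514")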

#### TypeScript

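import { compress } from 'headroom-ai';

// Chat messages in role/content form
const messages = [{ role: 'user', content: 'Summarize the failing test output.' }];
const result = await compress(messages, { model: 'claude-sonnet-4-20250514' });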

Source: README.md

SDK Wrapper Integration

from anthropic import Anthropic
from openai import OpenAI

from headroom import withHeadroom

# Wrap Anthropic SDK
client = withHeadroom(Anthropic())

# Wrap OpenAI SDK
client = withHeadroom(OpenAI())

Source: README.md

Vercel AI SDK Integration

import { wrapLanguageModel } from 'ai';
import { headroomMiddleware } from 'headroom-ai';

const model = wrapLanguageModel({
  model: yourModel,
  middleware: headroomMiddleware(),
});

Source: README.md

LiteLLM Integration

import litellm
from headroom.integrations.litellm import HeadroomCallback

litellm.callbacks = [HeadroomCallback()]

LangChain Integration

LangChain supports callback-based integration for Headroom compression.

Supported API Routes

The proxy routes traffic to different upstream providers:

| Route | Upstream Target |
| --- | --- |
| /v1/messages | Anthropic API |
| /v1/chat/completions | OpenAI API |
| /v1/responses | OpenAI API (HTTP + WebSocket) |
| /v1internal:streamGenerateContent | CloudCode API |

Source: headroom/cli/proxy.py
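
The route table is a plain path-to-upstream lookup. The sketch below copies the table and adds two normalizations: it ignores the query string and a trailing slash. Those normalizations and the `upstream_for` name are assumptions for illustration.

```python
# Route table copied from the documentation above.
UPSTREAMS = {
    "/v1/messages": "Anthropic API",
    "/v1/chat/completions": "OpenAI API",
    "/v1/responses": "OpenAI API",
    "/v1internal:streamGenerateContent": "CloudCode API",
}


def upstream_for(path: str) -> str:
    """Map a request path (query string and trailing slash ignored) to its upstream."""
    route = path.split("?", 1)[0].rstrip("/")
    if route not in UPSTREAMS:
        raise LookupError(f"no upstream for {route!r}")
    return UPSTREAMS[route]


print(upstream_for("/v1/messages?beta=true"))  # Anthropic API
```

In this sketch `/mcp` raises, consistent with the 404 described in Issue #460.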

Installation

Python Package

# Full installation
pip install "headroom-ai[all]"

# Granular extras
pip install "headroom-ai[proxy]"    # Proxy only
pip install "headroom-ai[mcp]"      # MCP support
pip install "headroom-ai[ml]"       # Kompress-base ML
pip install "headroom-ai[agno]"     # Agno framework
pip install "headroom-ai[langchain]" # LangChain integration
pip install "headroom-ai[evals]"     # Evaluation tools

Requires Python 3.10+.

Docker

docker pull ghcr.io/chopratejas/headroom:latest

npm / TypeScript

npm install headroom-ai

Source: README.md

Known Limitations and Issues

MCP Endpoint Unavailability

The proxy does not expose an HTTP MCP endpoint at /mcp. The MCP server functionality requires stdio-based communication, not HTTP routing. Users should use headroom mcp install for MCP integration rather than expecting HTTP endpoint access through the proxy.

Source: GitHub Issue #460

CCR in Multi-Agent Threads

The _append_context_to_latest_non_frozen_user_turn() function injects proactive expansion blocks into the latest user message content. In multi-agent setups where messages contain structured XML attribution markup (<peer_turn from="AgentX">), injected blocks may corrupt message attribution.

Source: GitHub Issue #503

Provider-Agnostic Limitations

Issue #510 reports that the proxy intercepts traffic at the Anthropic API level (/v1/messages), so users on AWS Bedrock, OpenAI, or Google Vertex whose LLM traffic goes through provider-specific SDKs with different authentication mechanisms (SigV4 for Bedrock, API keys for OpenAI) cannot route that traffic through the proxy.

Source: GitHub Issue #510

Proxy Extensions

Proxy extensions provide integration points for ASGI middleware, custom routes, and startup policy:

headroom proxy --proxy-extension my-extension --proxy-extension another-extension

Use --proxy-extension '*' to enable all discovered extensions.

Source: headroom/cli/proxy.py
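
The repeatable `--proxy-extension` flag and the `'*'` wildcard suggest a simple selection rule. The sketch below shows one such rule. It assumes unknown names are rejected and duplicates are collapsed. `select_extensions` is hypothetical, and extension discovery itself is not shown.

```python
def select_extensions(requested: list[str], discovered: list[str]) -> list[str]:
    """Resolve repeated --proxy-extension values against discovered extensions."""
    if "*" in requested:
        return sorted(discovered)  # wildcard: everything that was discovered
    unknown = [name for name in requested if name not in discovered]
    if unknown:
        raise ValueError(f"unknown proxy extension(s): {', '.join(unknown)}")
    return list(dict.fromkeys(requested))  # drop duplicates, keep flag order


found = ["my-extension", "another-extension", "audit-log"]
print(select_extensions(["my-extension", "another-extension"], found))
# ['my-extension', 'another-extension']
print(select_extensions(["*"], found))
# ['another-extension', 'audit-log', 'my-extension']
```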

Memory and Learning

Persistent Memory

Enable cross-session memory with the --memory flag:

headroom proxy --memory

Memory storage is per-project to prevent cross-project memory bleeding (fixed in v0.21.34).

Live Traffic Learning

Enable pattern learning from agent failures:

headroom proxy --learn
headroom wrap claude --learn

Patterns are saved to AGENTS.md and used to improve future compression decisions.

Exports Reference

The TypeScript SDK exports the following proxy-related types and functions:

export type {
  HeadroomMode,
  RelevanceTier,
  ContentType,
  BlockKind,
  CompressionProfile,
  HeadroomConfig,
  WasteSignals,
  CachePrefixMetrics,
  TransformDiff,
  RequestMetrics,
} from "./types/config.js";

export type {
  MetricsSummary,
  HealthStatus,
  ProxyStats,
  MemoryUsage,
} from "./types/models.js";

Source: sdk/typescript/src/index.ts

CLI Wrappers

Related topics: Proxy Deployment

Overview

CLI Wrappers (headroom wrap) are the primary entry point for integrating Headroom's context compression with standalone AI coding assistants. They automate the setup of the Headroom proxy, MCP servers, CLI context tools (RTK or lean-ctx), and memory integration—eliminating manual configuration for supported tools.

The wrapper system acts as a launch orchestrator that:

  1. Starts the Headroom proxy server on a configurable port
  2. Configures the target CLI tool to route API calls through the proxy
  3. Registers MCP servers for compression marker retrieval
  4. Injects context tool instructions into the CLI's configuration files
  5. Optionally enables persistent cross-session memory

Source: headroom/cli/wrap.py:1-100

Supported CLI Tools

| Tool | Command | Supported Options |
| --- | --- | --- |
| Claude Code | headroom wrap claude | --memory, --resume, --model, --code-graph, --no-context-tool, --no-mcp, --no-serena |
| OpenAI Codex | headroom wrap codex | --port, --backend, --anyllm-provider, --no-context-tool, --no-mcp, --no-serena, --no-proxy |
| Continue | headroom wrap continue | --config, --memory, --no-rtk, --no-proxy, --learn |
| Goose | headroom wrap goose | Standard wrap options |
| OpenHands | headroom wrap openhands | Standard wrap options |
| Cursor | headroom wrap cursor | Standard wrap options |

Source: headroom/cli/wrap.py:150-300

Architecture

graph TD
    A["headroom wrap <tool>"] --> B[Parse CLI Arguments]
    B --> C{prepare_only flag?}
    C -->|Yes| D[Setup Context Tool Only]
    C -->|No| E[Snapshot Pre-Wrap Config]
    E --> F[Setup Context Tool]
    F --> G[Register MCP Servers]
    G --> H[Start Headroom Proxy]
    H --> I[Inject Config Into CLI]
    I --> J[Launch Target CLI Tool]
    J --> K[Monitor & Forward Traffic]
    
    L[Proxy Server] <--> M[Compression Engine]
    M --> N[CacheAligner]
    M --> O[SmartCrusher]
    M --> P[CCR Markers]
    
    K --> L
    P --> Q[MCP Retrieve Tool]
    Q --> R[LLM Retrieval on Demand]

Component Responsibilities

| Component | File | Role |
| --- | --- | --- |
| Wrap Command Dispatcher | headroom/cli/wrap.py | Parses arguments, routes to provider runtime |
| Claude Runtime | headroom/providers/claude/runtime.py | Claude Code specific setup and lifecycle |
| Codex Runtime | headroom/providers/codex/runtime.py | OpenAI Codex specific setup |
| MCP Registry | headroom/mcp_registry/ | MCP server registration for all tools |
| Proxy Manager | plugins/openclaw/src/index.ts | Cross-platform proxy command resolution |

Source: headroom/cli/wrap.py:200-280

Common Command Options

Proxy Configuration

| Option | Environment Variable | Description |
| --- | --- | --- |
| --port <n> | HEADROOM_PORT | Proxy listen port (default: 8787) |
| --backend <backend> | HEADROOM_BACKEND | API backend: anthropic, anyllm, litellm-vertex |
| --anyllm-provider <provider> | HEADROOM_ANYLLM_PROVIDER | Provider for any-llm: openai, mistral, groq |
| --region <region> | HEADROOM_REGION | Cloud region for Bedrock/Vertex |
| --no-proxy | - | Use existing proxy instead of starting new one |

Source: headroom/cli/wrap.py:220-260

Context Tool Options

| Option | Description |
| --- | --- |
| --no-context-tool / --no-rtk | Skip CLI context-tool setup (RTK or lean-ctx) |
| --learn | Enable live traffic learning, patterns saved to AGENTS.md |

MCP Integration Options

| Option | Description |
| --- | --- |
| --no-mcp | Skip headroom MCP server registration |
| --no-serena | Skip Serena MCP server registration |

Memory Options

| Option | Description |
| --- | --- |
| --memory | Enable persistent cross-session memory |
| --resume <id> | Resume a specific memory session (Claude-specific) |

Source: headroom/cli/wrap.py:260-320

Claude Code Wrapper

The headroom wrap claude command provides deep integration with Anthropic's Claude Code CLI.

# Basic usage
headroom wrap claude

# With persistent memory
headroom wrap claude --memory

# Resume a session
headroom wrap claude --resume <session-id>

# Pass arguments to Claude
headroom wrap claude -- "fix the bug"

# With code graph intelligence
headroom wrap claude --code-graph

# Skip context tool setup
headroom wrap claude --no-context-tool

Claude-Specific Setup Flow

sequenceDiagram
    participant User
    participant CLI as headroom wrap claude
    participant RTK as RTK/lean-ctx
    participant Config as Claude Config
    participant Proxy as Headroom Proxy
    participant MCP as MCP Server
    
    User->>CLI: headroom wrap claude --memory
    CLI->>Config: Snapshot pre-wrap state
    CLI->>RTK: Setup context tool
    RTK->>Config: Inject instructions into CLAUDE.md
    CLI->>MCP: Register headroom MCP server
    CLI->>MCP: Register Serena MCP server
    CLI->>Proxy: Start proxy on port 8787
    CLI->>Config: Set ANTHROPIC_BASE_URL to proxy
    CLI->>User: Launch Claude Code

Source: headroom/providers/claude/runtime.py:1-150

OpenAI Codex Wrapper

The headroom wrap codex command integrates with OpenAI's Codex CLI.

# Basic usage
headroom wrap codex

# Custom proxy port
headroom wrap codex --port 9999

# Pass prompt to codex
headroom wrap codex -- "fix the bug"

# With specific backend
headroom wrap codex --backend anyllm --anyllm-provider groq

# Skip all tool registration
headroom wrap codex --no-context-tool --no-mcp --no-serena

Codex Configuration Handling

The wrapper snapshots ~/.codex/config.toml before any modifications, ensuring headroom unwrap codex can restore the original state byte-for-byte.

# Snapshot happens BEFORE MCP install
_codex_config_file, _codex_backup_file = _codex_config_paths()
_snapshot_codex_config_if_unwrapped(_codex_config_file, _codex_backup_file)

Source: headroom/cli/wrap.py:60-100

Continue IDE Wrapper

The headroom wrap continue command configures the Continue VS Code/JetBrains extension.

# Basic usage
headroom wrap continue

# With custom config path
headroom wrap continue --config .continue/config.json

# Enable learning
headroom wrap continue --learn

Continue Configuration Injection

The wrapper injects RTK guidance into both top-level and per-model systemMessage fields:

# Non-string systemMessage values are NEVER overwritten;
# only string values get the RTK marker injected.
if isinstance(existing_value, str):
    ...  # append RTK instructions to the existing string

Source: headroom/cli/wrap.py:400-500
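
The excerpt only states the rule. The sketch below gives a fuller version of the injection. It walks the top-level systemMessage and each entry in Continue's `models` list, appends guidance only to string values, and skips values that already carry the marker. `inject_rtk`, the marker text, and the guidance wording are placeholders, and leaving a missing systemMessage alone is an assumption.

```python
RTK_MARKER = "<!-- headroom:rtk -->"  # placeholder marker text
RTK_GUIDANCE = f"{RTK_MARKER}\nPrefer rtk-wrapped commands such as `rtk git diff`."


def inject_rtk(config: dict) -> int:
    """Append RTK guidance to string systemMessage fields; return how many changed.

    Non-string systemMessage values (and missing ones) are left untouched.
    """
    targets = [config] + [m for m in config.get("models", []) if isinstance(m, dict)]
    changed = 0
    for target in targets:
        value = target.get("systemMessage")
        if not isinstance(value, str) or RTK_MARKER in value:
            continue  # not a string, or already injected
        prefix = value.rstrip() + "\n\n" if value.strip() else ""
        target["systemMessage"] = prefix + RTK_GUIDANCE
        changed += 1
    return changed


config = {
    "systemMessage": "Be concise.",
    "models": [
        {"title": "fast", "systemMessage": "Use tabs."},
        {"title": "structured", "systemMessage": {"role": "system"}},
        {"title": "plain"},
    ],
}
print(inject_rtk(config), inject_rtk(config))  # 2 0
```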

Context Tool Integration

RTK (Default)

The default context tool uses RTK for shell output rewriting.

| Command Category | Commands | Typical Savings |
|---|---|---|
| Git | rtk git diff, rtk git log | 40-60% |
| Files & Search | rtk ls, rtk read, rtk grep | 60-75% |
| Testing | rtk pytest, rtk cargo test | 90-99% |
| Build & Lint | rtk tsc, rtk lint, rtk ruff check | 80-90% |
| Infrastructure | rtk docker ps, rtk kubectl get | 85% |
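To see why test commands can reach the top of this range, consider a hypothetical filter that keeps only failure lines and the final summary from verbose pytest output. This is not RTK's implementation (its rewriting rules are not shown in this section). The line patterns are assumptions based on pytest's usual verbose output, and the character ratio is only a rough stand-in for token savings.

```python
import re
from typing import List

# Lines worth keeping from verbose pytest output (assumed patterns):
# any FAILED/ERROR line, plus the "=== N failed, M passed ... ===" summary.
_KEEP = re.compile(r"(FAILED|ERROR)\b|^=+ .*\b(passed|failed|error)")


def condense_pytest_output(text: str) -> str:
    """Keep failure/error lines and the summary line; drop per-test PASSED noise."""
    kept: List[str] = [line for line in text.splitlines() if _KEEP.search(line)]
    return "\n".join(kept)


def savings(original: str, condensed: str) -> float:
    """Fraction of characters removed, a rough proxy for token savings."""
    return 1 - len(condensed) / len(original) if original else 0.0
```

On a mostly passing suite almost every line is a PASSED line, so dropping them removes the bulk of the output while the failure and the totals remain visible to the model.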

lean-ctx Alternative

Set HEADROOM_CONTEXT_TOOL=lean-ctx before running wrap commands to use lean-ctx instead of RTK.

Source: headroom/cli/wrap.py:500-600
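The environment-variable switch can be sketched as a small resolver. The default of rtk and the two accepted values come from this section. The function name, and the choice to reject unknown values with ValueError instead of silently falling back to RTK, are assumptions for illustration.

```python
import os
from typing import Mapping, Optional

SUPPORTED_CONTEXT_TOOLS = ("rtk", "lean-ctx")


def resolve_context_tool(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the tool named by HEADROOM_CONTEXT_TOOL, or "rtk" when unset or empty."""
    source = os.environ if env is None else env
    value = source.get("HEADROOM_CONTEXT_TOOL") or "rtk"
    if value not in SUPPORTED_CONTEXT_TOOLS:
        # Assumption: unknown values are rejected rather than silently ignored.
        raise ValueError(
            f"unsupported HEADROOM_CONTEXT_TOOL={value!r}; "
            f"expected one of: {', '.join(SUPPORTED_CONTEXT_TOOLS)}"
        )
    return value
```

From a shell, setting the variable for a single run looks like HEADROOM_CONTEXT_TOOL=lean-ctx headroom wrap claude.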

MCP Server Registration

CLI wrappers automatically register MCP servers that enable on-demand decompression of CCR markers.

```mermaid
graph LR
    A[Compressed Content<br/>with CCR Markers] --> B[headroom_retrieve MCP Tool]
    B --> C[Headroom Proxy]
    C --> D[Decompressed Original]
    D --> E[LLM Processing]
```

Supported MCP Tools

| Tool | Purpose | Registration |
|---|---|---|
| headroom | Primary compression retrieval | Auto-registered in CLI config |
| serena | Additional context handling | Auto-registered unless --no-serena |

Source: headroom/cli/wrap.py:100-150

Memory Integration

When --memory is enabled, the wrapper:

  1. Syncs Headroom's memory database with the CLI's conversation files
  2. Enables cross-session context persistence
  3. Registers memory-specific MCP tools

```python
# Memory is automatically synced before proxy startup
if memory:
    _memory_sync(proxy_holder, port)
```

Source: headroom/providers/claude/runtime.py:200-250
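Syncing a memory database with on-disk conversation files can be sketched with the standard-library sqlite3 module. Everything here is an assumption made for illustration: the table layout, the *.jsonl file pattern, and skipping files whose modification time has not changed since the last sync. It shows the general shape of an incremental sync, not Headroom's _memory_sync.

```python
import sqlite3
from pathlib import Path


def sync_conversations(db: sqlite3.Connection, conv_dir: Path) -> int:
    """Copy new or changed *.jsonl conversation files into db; return the count.

    A file is skipped when its stored mtime equals its current mtime, so a sync
    run before every proxy start only touches new or edited conversations.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS conversations ("
        "path TEXT PRIMARY KEY, mtime REAL NOT NULL, body TEXT NOT NULL)"
    )
    changed = 0
    for path in sorted(conv_dir.glob("*.jsonl")):
        mtime = path.stat().st_mtime
        row = db.execute(
            "SELECT mtime FROM conversations WHERE path = ?", (str(path),)
        ).fetchone()
        if row is not None and row[0] == mtime:
            continue  # unchanged since the last sync
        db.execute(
            "INSERT OR REPLACE INTO conversations (path, mtime, body) VALUES (?, ?, ?)",
            (str(path), mtime, path.read_text(encoding="utf-8")),
        )
        changed += 1
    db.commit()
    return changed
```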

Cleanup and Unwrap

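CLI wrappers clean up gracefully when they are shut down. In the excerpt below, SIGTERM is bound directly to the cleanup handler, while SIGINT is bound to _ignore_child_sigint rather than to cleanup. Cleanup performs three steps: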

  1. Restore original CLI configuration files
  2. Stop the proxy server
  3. Remove MCP server registrations

```python
cleanup = _make_cleanup(proxy_holder, port)
signal.signal(signal.SIGINT, _ignore_child_sigint)
signal.signal(signal.SIGTERM, cleanup)
```

Source: headroom/cli/wrap.py:300-350
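The three cleanup steps map naturally onto a closure that runs them at most once, whichever path triggers it (a SIGTERM or the normal end of the wrapped session). make_cleanup is a hypothetical stand-in for _make_cleanup: it takes the steps as callables instead of a proxy holder and port, and it records a failing step and keeps going, so that (for example) a stuck proxy does not block the config restore.

```python
import signal
from typing import Callable, List, Optional, Sequence


def make_cleanup(steps: Sequence[Callable[[], object]]) -> Callable[..., List[Exception]]:
    """Return a signal-compatible handler that runs the cleanup steps at most once."""
    done = False

    def cleanup(signum: Optional[int] = None, frame: object = None) -> List[Exception]:
        nonlocal done
        if done:
            return []  # a repeated signal or the normal exit path must not redo work
        done = True
        errors: List[Exception] = []
        for step in steps:
            try:
                step()
            except Exception as exc:  # record and continue so later steps still run
                errors.append(exc)
        return errors

    return cleanup


def install_sigterm(cleanup: Callable[..., object]) -> None:
    """Bind the handler to SIGTERM; signal.signal must be called from the main thread."""
    signal.signal(signal.SIGTERM, cleanup)
```

A typical wiring, with hypothetical step functions, would build cleanup = make_cleanup([restore_configs, stop_proxy, unregister_mcp_servers]), call install_sigterm(cleanup), run the wrapped CLI, and call cleanup() once it exits; the once-only guard makes a late SIGTERM harmless.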

Known Issues and Limitations

MCP Endpoint Unavailability

The MCP docs currently imply that headroom proxy serves an HTTP MCP endpoint at /mcp. In the installed package, however, requests to /mcp return 404, while the stdio MCP server works correctly.

See: Issue #460 - docs: clarify MCP setup when proxy /mcp is unavailable

CCR in Multi-Agent Threads

In multi-agent setups, CCR (Context Compression Retrieval) proactive expansion blocks can corrupt message attribution when they are injected into messages that contain XML markup such as <peer_turn from="AgentX">.

See: Issue #503 - CCR proactive expansion blocks corrupt message attribution in multi-agent threads

Source: https://github.com/chopratejas/headroom / Human Manual

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 15 structured pitfall items, including 2 high/blocking items. Top priority: configuration risk requires verification.

1. Configuration risk: Configuration risk requires verification

  • Severity: high
  • Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_04817419db9f40abb9c953ce30494c44 | https://github.com/chopratejas/headroom/issues/517

2. Security or permission risk: Security or permission risk requires verification

  • Severity: high
  • Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_7be4ca48f77a496cadb0a00d943bb95a | https://github.com/chopratejas/headroom/issues/488

3. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_46c97725a0304b659cfaa50b79312fcd | https://github.com/chopratejas/headroom/issues/525

4. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_109db57bc201482abc7bf318a0ee4792 | https://github.com/chopratejas/headroom/issues/460

5. Configuration risk: Configuration risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.host_targets | github_repo:1129940957 | https://github.com/chopratejas/headroom

6. Capability evidence risk: Capability evidence risk requires verification

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.assumptions | github_repo:1129940957 | https://github.com/chopratejas/headroom

7. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | github_repo:1129940957 | https://github.com/chopratejas/headroom

8. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no demo is available (no_demo)
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: downstream_validation.risk_items | github_repo:1129940957 | https://github.com/chopratejas/headroom

9. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no demo is available (no_demo)
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: risks.scoring_risks | github_repo:1129940957 | https://github.com/chopratejas/headroom

10. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_a0e9cf430514488eb093dc09b617e6ca | https://github.com/chopratejas/headroom/issues/520

11. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_9ccb556fe9cd431cba78c2ee3ebb27ea | https://github.com/chopratejas/headroom/issues/510

12. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_5ab74a3c022a4924998aaa72cc334c04 | https://github.com/chopratejas/headroom/issues/503

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

This page exposes 12 project-level external discussion links. Review them before install: open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using headroom with real data or production workflows.

  • [[FEATURE] Support Copilot CLI subscription mode (no BYOK/API key)](https://github.com/chopratejas/headroom/issues/488) - github / github_issue
  • [[FEATURE] Hermes agent support](https://github.com/chopratejas/headroom/issues/526) - github / github_issue
  • [[BUG] Installation fails on macOS x86_64 (Intel) — ort-sys has no preb](https://github.com/chopratejas/headroom/issues/525) - github / github_issue
  • Container image logs spurious "PyTorch was not found" warning on every s - github / github_issue
  • [[BUG] Docs website gives 404 for all non-index pages](https://github.com/chopratejas/headroom/issues/517) - github / github_issue
  • [[BUG] headroom init codex creates invalid config.toml by appending keys](https://github.com/chopratejas/headroom/issues/260) - github / github_issue
  • Feature: provider-agnostic proxy mode (Bedrock, OpenAI, Vertex) - github / github_issue
  • [[BUG] CCR proactive expansion blocks corrupt message attribution in mult](https://github.com/chopratejas/headroom/issues/503) - github / github_issue
  • docs: clarify MCP setup when proxy /mcp is unavailable - github / github_issue
  • Release v0.22.4 - github / github_release
  • Release v0.22.2 - github / github_release
  • Release v0.22.1 - github / github_release

Source: Project Pack community evidence and pitfall evidence