Doramagic Project Pack · Human Manual
headroom
Related topics: Getting Started, Architecture
Introduction
Headroom is a context compression framework designed to reduce token usage and costs when working with large language models (LLMs) in AI-assisted coding workflows. By intelligently compressing conversation history, tool outputs, and context before sending to the LLM, Headroom achieves 60–90% token savings while preserving critical information.
Overview
Headroom intercepts and optimizes AI traffic through multiple integration points:
| Integration Method | Use Case | Configuration |
|---|---|---|
| CLI Wrapper (headroom wrap) | Claude Code, Codex, Continue, Goose, OpenHands | headroom wrap claude |
| SDK Integration | Python/TypeScript applications | withHeadroom(new Anthropic()) |
| ASGI Middleware | Web applications | app.add_middleware(CompressionMiddleware) |
| MCP Server | Model Context Protocol clients | headroom mcp install |
| Proxy Server | Any HTTP-based LLM traffic | headroom proxy --port 8080 |
Source: README.md:1-25
Core Architecture
The Headroom pipeline exposes one stable request lifecycle across all integration methods:
Setup → Pre-Start → Post-Start → Input Received → Input Cached → Input Routed → Input Compressed → Input Remembered → Pre-Send → Post-Send → Response Received
Transform Components
| Component | Purpose | Savings |
|---|---|---|
| SmartCrusher | Universal JSON compression (arrays, nested objects, mixed types) | 40–70% |
| CodeCompressor | AST-aware compression for Python, JS, Go, Rust, Java, C++ | 50–80% |
| Kompress-base | HuggingFace model trained on agentic traces | 40–90% |
| CacheAligner | Stabilizes prefixes for Anthropic/OpenAI KV cache hits | Variable |
| IntelligentContext | Score-based context fitting with learned importance | 30–60% |
| CCR (Compress-Cache-Retrieve) | Reversible compression with on-demand retrieval | 40–80% |
Source: README.md:45-60
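To make the CacheAligner row more concrete: provider-side prompt caches generally match on an exact prefix, so a volatile value near the top of a prompt can defeat them. The sketch below is an illustration only and not Headroom's implementation; `stabilize_prefix` and the date pattern are hypothetical. It replaces ISO dates with a fixed placeholder so the leading text stays identical across requests, and collects the real values into a suffix.

```python
import re

# Hypothetical rule: ISO dates such as 2026-05-20 are treated as volatile.
_ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def stabilize_prefix(system_prompt: str) -> tuple[str, str]:
    """Split a prompt into a stable prefix and a volatile suffix.

    Dates are replaced by a fixed placeholder in the prefix and collected
    into a suffix that can be appended after the cacheable part.
    """
    volatile = _ISO_DATE.findall(system_prompt)
    stable = _ISO_DATE.sub("<date>", system_prompt)
    suffix = ("Dates referenced above, in order: " + ", ".join(volatile)) if volatile else ""
    return stable, suffix


monday, _ = stabilize_prefix("Today is 2026-05-18. You are a coding assistant.")
tuesday, note = stabilize_prefix("Today is 2026-05-19. You are a coding assistant.")
print(monday == tuesday)  # True: the cacheable prefix no longer changes daily
print(note)               # Dates referenced above, in order: 2026-05-19
```

The design point is only that stability of the leading bytes matters more than their content; which values count as volatile is a policy decision.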
Extension Seams
- Pipeline extensions: observe or customize lifecycle stages via on_pipeline_event(...)
- Compression hooks: additional extension points alongside the canonical lifecycle
- Proxy extensions: server/app integration seam for ASGI middleware, routes, and startup policy
Source: README.md:55-58
CLI Wrappers
The headroom wrap command provides zero-configuration setup for popular AI coding assistants:
headroom wrap claude # Start everything
headroom wrap claude --memory # With persistent memory
headroom wrap claude --resume <id> # Resume a session
headroom wrap claude --code-graph # With code graph intelligence
headroom wrap claude --no-context-tool # Skip CLI context-tool setup
Supported Agents
| Agent | Command | Key Features |
|---|---|---|
| Claude Code | headroom wrap claude | Memory sync, MCP retrieve, Serena integration |
| Codex | headroom wrap codex | RTK injection, MCP registration, config snapshot |
| Continue | headroom wrap continue | Config.toml modification, systemMessage injection |
| Goose | headroom wrap goose | Independent session handling |
| OpenHands | headroom wrap openhands | Recent support (v0.22.4) |
| OpenCode | Planned | Feature request #74 |
Source: headroom/cli/wrap.py:1-50
RTK Context Tool Integration
Headroom integrates with RTK (Rust Token Killer) for CLI output compression. Commands are prefixed with rtk, which typically achieves 60–90% savings (per-category figures are noted in the comments):
# Files & Search (60-75% savings)
rtk ls <path>          rtk read <file>     rtk grep <pattern>
rtk find <pattern>     rtk diff <file>
# Test (90-99% savings), shows failures only
rtk pytest tests/      rtk cargo test      rtk test <cmd>
# Build & Lint (80-90% savings), shows errors only
rtk tsc                rtk lint            rtk cargo build
rtk prettier --check   rtk mypy            rtk ruff check
The RTK block is injected into agent configuration files with idempotent markers (<!-- headroom:rtk-instructions -->) to prevent duplicate insertions.
Source: headroom/cli/wrap.py:25-75
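The idempotent-marker idea can be sketched in a few lines. This is a minimal illustration, not the code in wrap.py: the marker string is the one quoted above, while `inject_once` and the sample block text are hypothetical.

```python
RTK_MARKER = "<!-- headroom:rtk-instructions -->"


def inject_once(existing: str, block: str, marker: str = RTK_MARKER) -> tuple[str, bool]:
    """Append `block` under `marker` unless the marker is already present.

    Returns the (possibly unchanged) text and whether anything was written.
    """
    if marker in existing:
        return existing, False
    # Add a newline only when the existing text does not already end with one.
    separator = "" if (not existing or existing.endswith("\n")) else "\n"
    return f"{existing}{separator}{marker}\n{block}\n", True


config = "# Project notes\n"
config, changed_first = inject_once(config, "Prefix shell commands with rtk.")
config, changed_second = inject_once(config, "Prefix shell commands with rtk.")
print(changed_first, changed_second)  # True False
print(config.count(RTK_MARKER))       # 1
```

Running the injection twice leaves exactly one copy of the block, which is the property that makes repeated `headroom wrap` invocations safe.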
SDK Integration
Python SDK
from anthropic import Anthropic
from headroom import with_headroom
client = with_headroom(Anthropic())
response = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
TypeScript SDK
import OpenAI from "openai";
import { withHeadroom } from "@headroom/sdk";
const client = withHeadroom(new OpenAI({ apiKey: process.env.OPENAI_API_KEY }));
Other Framework Integrations
| Framework | Integration Method |
|---|---|
| OpenAI SDK | withHeadroom(new OpenAI()) |
| Vercel AI SDK | wrapLanguageModel({ model, middleware: headroomMiddleware() }) |
| LangChain | HeadroomChatModel(your_llm) |
| Agno | HeadroomAgnoModel(your_model) |
| Strands | See Strands guide |
Source: README.md:30-40
Memory System
Headroom provides cross-agent memory capabilities for persistent knowledge:
headroom wrap claude --memory # Enable persistent cross-session memory
The memory system injects guidance into agent configuration:
## Memory
Use the `headroom_memory` MCP server for persistent cross-session knowledge.
**Before** answering questions about prior decisions, conventions, project context,
architecture, user preferences — call `memory_search` first.
**After** making durable decisions, discovering conventions — call `memory_save`.
Memory storage is per-project to prevent context bleeding between projects (fixed in v0.21.34).
Source: headroom/cli/wrap.py:200-220
Pipeline Lifecycle
graph TD
A[Setup] --> B[Pre-Start]
B --> C[Post-Start]
C --> D[Input Received]
D --> E[Input Cached]
E --> F[Input Routed]
F --> G[Input Compressed]
G --> H[Input Remembered]
H --> I[Pre-Send]
I --> J[Post-Send]
J --> K[Response Received]
L[Transforms] --> F
M[Extensions] --> D
N[Hooks] --> G
Provider and tool-specific behavior lives under headroom/providers/ so core orchestration stays focused on lifecycle, sequencing, and policy:
- CLI/tool slices: headroom/providers/claude,copilot,codex,openai,gemini
- Core transforms: CacheAligner, ContentRouter, SmartCrusher, CodeCompressor
Source: README.md:50-65
Learning from Failures
The headroom learn command analyzes past tool call failures to generate preventive context:
headroom learn # Auto-detect agent & model
headroom learn --apply # Write recommendations to context files
headroom learn --model gpt-4o # Use specific model for analysis
headroom learn --all # Analyze all discovered projects
The plugin architecture supports multiple coding agents, with built-in support for Claude Code, Codex, and Gemini CLI.
Source: headroom/cli/learn.py:1-50
Known Limitations
MCP Endpoint Availability
The headroom proxy command does not expose an HTTP MCP endpoint at /mcp. The stdio MCP server works correctly, but the HTTP endpoint returns 404. See issue #460 for details.
CCR Multi-Agent Attribution
In multi-agent setups, CCR proactive expansion may corrupt message attribution when injected into messages containing XML markup (<peer_turn from="AgentX">). See issue #503.
Provider-Agnostic Proxy
The proxy currently intercepts traffic at the Anthropic API level (/v1/messages). Users on AWS Bedrock, OpenAI, or Google Vertex cannot use the proxy due to provider-specific SDKs. See issue #510.
LiteLLM Security Concern
The litellm PyPI package was subject to a supply chain attack in version 1.82.8. See issue #56 for mitigation recommendations.
Source: README.md:1-30
Contributing
git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest
Devcontainers available in .devcontainer/ (default + memory-stack with Qdrant & Neo4j).
Source: README.md:100-105
Community Resources
| Resource | Link |
|---|---|
| Live Leaderboard | headroomlabs.ai/dashboard — 60B+ tokens saved |
| Discord | discord.gg/yRmaUNpsPJ |
| HuggingFace Model | huggingface.co/chopratejas/kompress-base |
Recent Releases
| Version | Date | Key Changes |
|---|---|---|
| v0.22.4 | Latest | wrap CLI breadth for cline, continue, goose, openhands |
| v0.22.2 | 2026-05-20 | Memory IDs exposure in auto-tail + memory_list tool |
| v0.22.0 | 2026-05-19 | --exclude-tools flag + HEADROOM_EXCLUDE_TOOLS env var |
| v0.21.34 | 2026-05-13 | Per-project memory storage (fixes #462) |
| v0.21.33 | 2026-05-13 | Narrow compressed type for mypy 1.14 compatibility |
Source: README.md:20-45
Next Steps
- Installation Guide: Set up Headroom for your preferred integration method
- Quick Start: Get started with headroom wrap in under 5 minutes
- SDK Reference: Detailed API documentation for Python and TypeScript SDKs
- Proxy Configuration: Advanced proxy setup and configuration options
- MCP Integration: Connect Headroom as a Model Context Protocol server
Source: https://github.com/chopratejas/headroom / Human Manual
Getting Started
Related topics: Introduction, Proxy Deployment
Headroom is a context compression platform for AI coding assistants that reduces token usage by 40-90% while preserving relevance. It intercepts LLM API traffic through a local proxy, compresses conversation context using ML-based transforms, and restores original content when needed via CCR (Compress-Cache-Retrieve) markers.
Prerequisites
Before installing Headroom, ensure you have:
| Requirement | Version/Details |
|---|---|
| Python | 3.10+ |
| API Key | Anthropic, OpenAI, or compatible provider |
| Supported OS | Linux, macOS, Windows |
| Package Manager | pip, uv, or conda |
Installation
Standard Installation
Install Headroom with all dependencies:
pip install "headroom-ai[all]"
Development Installation
For contributing or testing latest features:
git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest
Source: README.md:1-10
Dev Container
Headroom provides pre-configured devcontainers:
# Default devcontainer (basic Python development)
# .devcontainer/ directory
# Memory-stack devcontainer (with Qdrant & Neo4j)
# .devcontainer/memory-stack/
Quick Start with Claude
The fastest way to use Headroom is with the headroom wrap command, which starts the proxy and configures Claude Code automatically:
headroom wrap claude
This single command:
- Starts the Headroom proxy on the default port
- Configures Claude Code to route API traffic through the proxy
- Sets up the RTK context tool for efficient CLI output
- Registers the MCP retrieve tool for CCR decompression
Options
| Flag | Description |
|---|---|
| --memory | Enable persistent cross-session memory |
| --resume <id> | Resume a previous session |
| --no-context-tool | Skip RTK/lean-ctx CLI tool setup |
| --no-mcp | Skip MCP retrieve tool registration |
| --no-serena | Skip Serena MCP server registration |
| --code-graph | Enable code graph indexing via codebase-memory-mcp |
| --no-proxy | Use existing proxy instead of starting one |
| --learn | Enable live traffic learning (patterns saved to AGENTS.md) |
| --port <n> | Custom proxy port (default: 8080) |
Example with memory enabled:
headroom wrap claude --memory
Source: headroom/cli/wrap.py:1-50
Quick Start with Codex
Headroom also supports OpenAI's Codex:
headroom wrap codex
For Codex-specific options:
| Flag | Description |
|---|---|
| --backend anyllm | Use any-llm backend |
| --anyllm-provider <provider> | Provider for any-llm: openai, mistral, groq, etc. |
| --region <region> | Cloud region for Bedrock/Vertex |
Source: headroom/cli/wrap.py:200-280
SDK Integration
Python SDK
#### Basic Usage
from headroom import Headroom
# Initialize with your API key
h = Headroom(api_key="sk-ant-...")
# Compress a prompt
result = h.compress("Your long prompt here...")
print(result.compressed) # Compressed text
print(result.original_tokens) # Original token count
print(result.saved_tokens) # Tokens saved
#### Streaming Responses
from headroom import Headroom
h = Headroom(api_key="sk-ant-...")
# Streaming with automatic compression
for chunk in h.stream("Your prompt"):
    print(chunk, end="", flush=True)
#### Integration with Anthropic SDK
from anthropic import Anthropic
from headroom import with_headroom
# Wrap any SDK client
client = with_headroom(Anthropic())
# All calls are automatically compressed
response = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Your prompt"}],
)
Source: headroom/cli/main.py:1-50
TypeScript SDK
#### Installation
npm install @headroom/sdk
# or
yarn add @headroom/sdk
# or
pnpm add @headroom/sdk
#### Basic Usage
import { Headroom } from "@headroom/sdk";
const headroom = new Headroom({
  apiKey: process.env.ANTHROPIC_API_KEY,
});
const result = await headroom.compress({
  content: "Your long prompt here...",
});
console.log(result.compressed);
console.log(`Saved ${result.savingsPercent.toFixed(0)}% tokens`);
#### Streaming
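import { streamText, wrapLanguageModel } from "ai";
import { headroomMiddleware } from "@headroom/sdk/middleware";
// yourModel is any Vercel AI SDK language model; it is wrapped once with the middleware.
const result = streamText({
  model: wrapLanguageModel({ model: yourModel, middleware: headroomMiddleware() }),
  prompt: "Your prompt",
});
// Print chunks as they arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}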
#### Shared Context (Multi-Agent)
import { SharedContext } from "@headroom/sdk";
const ctx = new SharedContext({
  projectId: "my-agent-team",
});
// Agent 1: Researcher
await ctx.put("k8s-scaling-research", researchData);
// Agent 2: Writer (reads compressed context)
const compressed = await ctx.get("k8s-scaling-research");
console.log(`Reading compressed context (${compressed?.length ?? 0} chars)`);
// Stats
const stats = ctx.stats();
console.log(`Total saved: ${stats.totalTokensSaved} (${stats.savingsPercent.toFixed(0)}%)`);
Source: sdk/typescript/src/index.ts:1-80
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| HEADROOM_API_KEY | API key for LLM provider | Required |
| HEADROOM_PROXY_PORT | Proxy port | 8080 |
| HEADROOM_BACKEND | Backend type: anthropic, anyllm, litellm-vertex | anthropic |
| HEADROOM_ANYLLM_PROVIDER | Provider for any-llm backend | - |
| HEADROOM_REGION | Cloud region for Bedrock/Vertex | - |
| HEADROOM_EXCLUDE_TOOLS | Comma-separated tool names to exclude | - |
| HEADROOM_CONTEXT_TOOL | CLI context tool: rtk, lean-ctx | rtk |
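As a rough illustration of how the variables in this table could be resolved, the sketch below reads them from a mapping and applies the defaults listed above. `HeadroomEnv` and `load_env` are hypothetical names introduced here; Headroom's own configuration loader may behave differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class HeadroomEnv:
    api_key: str
    proxy_port: int
    backend: str
    context_tool: str
    exclude_tools: tuple[str, ...]


def load_env(env: Mapping[str, str] = os.environ) -> HeadroomEnv:
    """Read the variables from the table, applying the documented defaults."""
    api_key = env.get("HEADROOM_API_KEY")
    if not api_key:
        raise ValueError("HEADROOM_API_KEY is required")
    raw_tools = env.get("HEADROOM_EXCLUDE_TOOLS", "")
    return HeadroomEnv(
        api_key=api_key,
        proxy_port=int(env.get("HEADROOM_PROXY_PORT", "8080")),
        backend=env.get("HEADROOM_BACKEND", "anthropic"),
        context_tool=env.get("HEADROOM_CONTEXT_TOOL", "rtk"),
        # Split the comma-separated list and drop empty entries.
        exclude_tools=tuple(t.strip() for t in raw_tools.split(",") if t.strip()),
    )


cfg = load_env({"HEADROOM_API_KEY": "sk-test", "HEADROOM_EXCLUDE_TOOLS": "Bash, Read"})
print(cfg.proxy_port, cfg.backend, cfg.exclude_tools)  # 8080 anthropic ('Bash', 'Read')
```

Passing a plain dict instead of `os.environ` keeps the function easy to test without touching the real environment.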
Compression Profiles
Headroom supports multiple compression strategies:
| Profile | Description | Typical Savings |
|---|---|---|
| balanced | Default profile, good for most use cases | 50-70% |
| aggressive | Maximum compression, may lose some detail | 70-90% |
| conservative | Minimal compression, preserves more detail | 30-50% |
| custom | User-defined weights for different transform types | Varies |
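The savings figures in this table are percentages of tokens removed. A small worked example, with made-up token counts and hypothetical helper names, shows how a measured result can be compared against a profile's typical range:

```python
def savings_percent(original_tokens: int, compressed_tokens: int) -> float:
    """Percentage of tokens removed by compression."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 100.0 * (original_tokens - compressed_tokens) / original_tokens


# Typical ranges copied from the profile table (percent saved).
PROFILE_RANGES = {
    "conservative": (30, 50),
    "balanced": (50, 70),
    "aggressive": (70, 90),
}


def within_typical_range(profile: str, percent: float) -> bool:
    low, high = PROFILE_RANGES[profile]
    return low <= percent <= high


pct = savings_percent(original_tokens=12_000, compressed_tokens=4_200)
print(f"{pct:.0f}% saved")                      # 65% saved
print(within_typical_range("balanced", pct))    # True
print(within_typical_range("aggressive", pct))  # False
```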
Compression Transforms
Headroom uses multiple compression transforms:
| Transform | Purpose | Savings |
|---|---|---|
| SmartCrusher | Universal JSON compression (arrays, dicts, nested objects) | 40-60% |
| CodeCompressor | AST-aware compression for Python, JS, Go, Rust, Java, C++ | 60-75% |
| Kompress-base | HuggingFace model trained on agentic traces | 40-90% |
| CacheAligner | Stabilizes prefixes for KV cache efficiency | Variable |
| IntelligentContext | Score-based context fitting | 30-50% |
| CCR | Reversible compression with on-demand retrieval | 50-80% |
Source: README.md:50-100
MCP Server Setup
Model Context Protocol (MCP) enables Headroom to retrieve compressed content when needed.
Installation
headroom mcp install
Available MCP Tools
| Tool | Description |
|---|---|
| headroom_retrieve | Retrieves original content for CCR markers |
| headroom_stats | Shows compression statistics |
| headroom_memory_search | Search persistent memory (requires --memory) |
| headroom_memory_save | Save to persistent memory (requires --memory) |
Note on MCP Endpoint
Important: The headroom proxy command does not expose an HTTP MCP endpoint at /mcp. The MCP server uses stdio transport and must be configured in your IDE/editor's MCP settings. See Issue #460 for details.
Source: headroom/cli/main.py:100-150
Memory System
Headroom provides persistent cross-session memory using vector storage:
Enable Memory
headroom wrap claude --memory
Memory Features
- Per-project storage: Memories are isolated per project directory
- Auto-dedup: Duplicate memories are automatically filtered
- Agent provenance: Tracks which agent saved each memory
- Semantic search: Query past decisions, conventions, and context
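The features listed above can be pictured with a toy in-memory store. Everything in this sketch is hypothetical (class names, method names, and the substring match that stands in for vector search); it only illustrates per-project isolation, de-duplication, and provenance.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    text: str
    agent: str  # provenance: which agent saved this entry


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical entries compare equal."""
    return " ".join(text.lower().split())


@dataclass
class ProjectMemoryStore:
    """Toy in-memory store keyed by project directory."""

    by_project: dict[str, list[Memory]] = field(default_factory=dict)

    def save(self, project: str, text: str, agent: str) -> bool:
        entries = self.by_project.setdefault(project, [])
        if any(_normalize(m.text) == _normalize(text) for m in entries):
            return False  # duplicate within this project: skip
        entries.append(Memory(text=text, agent=agent))
        return True

    def search(self, project: str, query: str) -> list[Memory]:
        # Substring match stands in for real semantic (vector) search.
        return [m for m in self.by_project.get(project, []) if query.lower() in m.text.lower()]


store = ProjectMemoryStore()
store.save("/repo/api", "Use pytest for all tests", agent="claude")
dup = store.save("/repo/api", "use  pytest for all tests", agent="codex")
hits_api = store.search("/repo/api", "pytest")
hits_web = store.search("/repo/web", "pytest")
print(dup, len(hits_api), hits_api[0].agent, len(hits_web))  # False 1 claude 0
```

The second project sees no results even though the query matches, which is the isolation property the per-project storage is meant to guarantee.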
Usage in Claude
When memory is enabled, Claude Code automatically:
- Searches memory before answering questions about past decisions
- Saves important facts discovered during the session
- References project context from previous sessions
Source: headroom/cli/wrap.py:300-350
Learning System
Headroom can analyze your coding patterns and optimize compression:
headroom learn --project /path/to/project --apply
Options
| Flag | Description |
|---|---|
| --project <path> | Project directory to analyze |
| --all | Analyze all discovered projects |
| --apply | Write recommendations to context/memory files |
| --agent <name> | Specific agent to analyze (claude, codex, gemini, auto) |
| --model <model> | LLM model for analysis |
| --workers <n> | Parallel workers (default: auto) |
Source: headroom/cli/learn.py:1-60
Examples
The repository includes comprehensive examples:
Python Examples
# Basic usage
export OPENAI_API_KEY='your-key'
python examples/basic_usage.py
# Anthropic integration
export ANTHROPIC_API_KEY='your-key'
python examples/anthropic_example.py
# Streaming
python examples/streaming_example.py
# Evaluation
python examples/smart_vs_naive_eval.py
python examples/real_world_eval.py
TypeScript Examples
# Shared context multi-agent
npx tsx sdk/typescript/examples/shared-context-multi-agent.ts
LangChain Integration
# Compression demo
PYTHONPATH=. python -m examples.langchain_demo.show_compression
# Full comparison
export OPENAI_API_KEY='your-key'
PYTHONPATH=. python -m examples.langchain_demo.run_comparison
Source: examples/README.md:1-80
Next Steps
| Topic | Description |
|---|---|
| CLI Reference | Full documentation of headroom commands |
| Proxy Configuration | Advanced proxy settings and backends |
| Memory System | Deep dive into cross-session memory |
| SDK Reference | Complete API documentation |
| Compression Internals | How Headroom's transforms work |
| Contributing | Development setup and guidelines |
Troubleshooting
Claude not found
Error: 'claude' not found in PATH.
Install Claude Code: https://docs.anthropic.com/en/docs/claude-code
Solution: Install Claude Code or use the SDK directly.
MCP retrieve tool not working
Symptoms: CCR markers appear but content isn't retrieved.
Solutions:
- Ensure --no-mcp was not used
- Check that the MCP server is registered in your IDE settings
- Verify the proxy is running: headroom status
Token savings lower than expected
Possible causes:
- Short prompts (less data to compress)
- Already compressed content
- High relevance content that can't be safely reduced
Solutions:
- Enable more aggressive compression profiles
- Use --learn to optimize for your patterns
Community
- Discord — Questions, feedback, support
- Live leaderboard — 60B+ tokens saved and counting
- HuggingFace — Kompress-base model
Architecture
Related topics: Compression Pipeline, CCR (Reversible Compression)
Overview
Headroom is a context compression proxy and SDK designed to reduce token costs when running AI coding agents. The architecture follows a layered design with a Rust-based proxy core, a Python SDK, and a TypeScript SDK that expose a unified request lifecycle across all integration paths.
The system intercepts LLM API calls at the proxy layer, applies a pipeline of compression transforms, and routes compressed requests to upstream providers while maintaining the ability to retrieve original content via CCR markers.
Source: crates/headroom-core/src/lib.rs()
High-Level Architecture
graph TD
subgraph "Client Layer"
CLI[CLI<br/>headroom wrap]
SDK_PY[Python SDK]
SDK_TS[TypeScript SDK]
MCP[MCP Clients]
end
subgraph "Proxy Layer"
PROXY[Headroom Proxy]
MIDDLEWARE[ASGI Middleware]
end
subgraph "Core Engine"
PIPELINE[Compression Pipeline]
TRANSFORMS[Transforms]
MEMORY[Memory System]
end
subgraph "Transforms"
CC[CacheAligner]
CR[ContentRouter]
SC[SmartCrusher]
CODEC[CodeCompressor]
KB[Kompress-base]
IC[IntelligentContext]
end
subgraph "Storage"
QDRANT[Qdrant]
NEO4J[Neo4j]
SQLITE[SQLite]
end
CLI --> PROXY
SDK_PY --> PROXY
SDK_TS --> PROXY
MCP --> MIDDLEWARE
PROXY --> PIPELINE
MIDDLEWARE --> PIPELINE
PIPELINE --> TRANSFORMS
TRANSFORMS --> MEMORY
MEMORY --> QDRANT
MEMORY --> NEO4J
MEMORY --> SQLITE
style PROXY fill:#4a90d9
style PIPELINE fill:#5ba85b
style TRANSFORMS fill:#d94a4a
Request Lifecycle
All compression passes through a stable, 11-stage request lifecycle that exposes consistent hooks regardless of integration method:
Setup → Pre-Start → Post-Start → Input Received → Input Cached → Input Routed → Input Compressed → Input Remembered → Pre-Send → Post-Send → Response Received
graph LR
A[Setup] --> B[Pre-Start]
B --> C[Post-Start]
C --> D[Input Received]
D --> E[Input Cached]
E --> F[Input Routed]
F --> G[Input Compressed]
G --> H[Input Remembered]
H --> I[Pre-Send]
I --> J[Post-Send]
J --> K[Response Received]
style A fill:#f0f0f0
style G fill:#5ba85b
style K fill:#4a90d9
Lifecycle Stages
| Stage | Purpose | Extension Point |
|---|---|---|
| Setup | Initialize transforms, load config | on_pipeline_event() |
| Pre-Start | Prepare upstream connection | on_pipeline_event() |
| Post-Start | Confirm upstream health | on_pipeline_event() |
| Input Received | Capture raw request | on_pipeline_event() |
| Input Cached | Check KV cache alignment | CacheAligner |
| Input Routed | Route to appropriate compression path | ContentRouter |
| Input Compressed | Apply compression transforms | SmartCrusher, CodeCompressor, Kompress-base |
| Input Remembered | Store in cross-agent memory | Memory system |
| Pre-Send | Finalize compressed request | on_pipeline_event() |
| Post-Send | Record outcome metrics | RequestOutcome funnel |
| Response Received | Process streaming/final response | Compression hooks |
Source: headroom/pipeline.py()
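The stage table can be mirrored as an ordered enumeration with a single observer callback. This is a sketch under stated assumptions: the stage names come from the table, but the `Stage` enum, `run_lifecycle`, and the `(stage, context)` callback signature are hypothetical and may not match the real `on_pipeline_event(...)` API.

```python
from enum import Enum
from typing import Callable


class Stage(Enum):
    """The 11 stages, in the order listed in the table."""

    SETUP = "Setup"
    PRE_START = "Pre-Start"
    POST_START = "Post-Start"
    INPUT_RECEIVED = "Input Received"
    INPUT_CACHED = "Input Cached"
    INPUT_ROUTED = "Input Routed"
    INPUT_COMPRESSED = "Input Compressed"
    INPUT_REMEMBERED = "Input Remembered"
    PRE_SEND = "Pre-Send"
    POST_SEND = "Post-Send"
    RESPONSE_RECEIVED = "Response Received"


def run_lifecycle(on_pipeline_event: Callable[[Stage, dict], None]) -> None:
    """Emit one event per stage, in order. The callback signature is assumed."""
    context: dict = {"request_id": "demo"}
    for stage in Stage:  # Enum iteration preserves definition order
        on_pipeline_event(stage, context)


seen: list[str] = []
run_lifecycle(lambda stage, ctx: seen.append(stage.value))
print(len(seen), seen[0], seen[-1])  # 11 Setup Response Received
```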
Compression Transforms
The transform layer applies specialized compression algorithms. Each transform handles a specific content type.
Transform Components
| Transform | Function | Reduction |
|---|---|---|
| CacheAligner | Stabilizes prefixes so Anthropic/OpenAI KV caches hit | Indirect |
| ContentRouter | Routes content to appropriate compression path | 10-40% |
| SmartCrusher | Universal JSON compression (arrays, nested objects) | 60-90% |
| CodeCompressor | AST-aware for Python, JS, Go, Rust, Java, C++ | 60-75% |
| Kompress-base | HuggingFace model for ML-based token compression | 40-90% |
| IntelligentContext | Score-based context fitting with learned importance | Variable |
| RollingWindow | Fixed-context summarization | Variable |
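To illustrate what "routing content to the appropriate compression path" could mean, the sketch below picks a transform name from the table using simple heuristics. The heuristics and the `route` function are assumptions made for illustration; the real ContentRouter may use entirely different signals.

```python
import ast
import json


def route(content: str) -> str:
    """Pick a transform name for a piece of content (illustrative heuristics)."""
    # Structured JSON (arrays or objects) goes to the JSON crusher.
    try:
        parsed = json.loads(content)
        if isinstance(parsed, (list, dict)):
            return "SmartCrusher"
    except json.JSONDecodeError:
        pass
    # Text that parses as Python and contains a def/class/import is treated as code.
    try:
        tree = ast.parse(content)
        code_nodes = (ast.FunctionDef, ast.ClassDef, ast.Import, ast.ImportFrom)
        if any(isinstance(node, code_nodes) for node in tree.body):
            return "CodeCompressor"
    except (SyntaxError, ValueError):
        pass
    # Everything else falls back to the general-purpose model.
    return "Kompress-base"


print(route('[{"id": 1}, {"id": 2}]'))          # SmartCrusher
print(route("def f(x):\n    return x\n"))       # CodeCompressor
print(route("The build failed twice today."))   # Kompress-base
```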
SmartCrusher Configuration
from dataclasses import dataclass
@dataclass
class SmartCrusherConfig:
    enabled: bool = True
    min_items_to_analyze: int = 3
    min_tokens_to_crush: int = 500
    max_items_after_crush: int = 50
    relevance_threshold: float = 0.3
    enable_ccr_marker: bool = True
Source: crates/headroom-py/src/lib.rs()
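One plausible reading of the size-related fields in SmartCrusherConfig is that they act as gates: small inputs pass through untouched, and large ones are capped. The sketch below encodes that reading. It is an assumption, not the documented behavior; `CrushThresholds`, `estimate_tokens`, and `maybe_crush` are hypothetical, and the keep-first-N strategy is a placeholder for relevance ranking.

```python
from dataclasses import dataclass


@dataclass
class CrushThresholds:
    # Field names and defaults mirror SmartCrusherConfig above.
    min_items_to_analyze: int = 3
    min_tokens_to_crush: int = 500
    max_items_after_crush: int = 50


def estimate_tokens(items: list[dict]) -> int:
    """Rough stand-in for a tokenizer: about four characters per token."""
    return sum(len(str(item)) for item in items) // 4


def maybe_crush(items: list[dict], cfg: CrushThresholds) -> list[dict]:
    """Return the items unchanged when below thresholds, else a capped list."""
    if len(items) < cfg.min_items_to_analyze:
        return items
    if estimate_tokens(items) < cfg.min_tokens_to_crush:
        return items
    # Assumed strategy: keep the first N items. A real crusher would rank
    # items by relevance rather than by position.
    return items[: cfg.max_items_after_crush]


rows = [{"id": i, "status": "ok", "detail": "x" * 40} for i in range(200)]
print(len(maybe_crush(rows[:2], CrushThresholds())))  # 2: too few items to analyze
print(len(maybe_crush(rows, CrushThresholds())))      # 50: capped at max_items_after_crush
```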
CodeCompressor
AST-aware compression supports:
- Python (via ast module)
- JavaScript, Go, Rust, Java, C++ (via tree-sitter)
Requires optional dependency: pip install headroom-ai[code]
Enabled via --code-aware flag or HEADROOM_CODE_AWARE_ENABLED=1 environment variable.
Source: headroom/cli/proxy.py()
Memory System
Architecture
graph TD
subgraph "Memory Layer"
MEM[Memory Manager]
RANKER[MemoryRanker]
DECISION[MemoryDecision]
end
subgraph "Storage Backends"
QDRANT[Qdrant<br/>Vector Search]
NEO4J[Neo4j<br/>Graph]
SQLITE[SQLite<br/>Project-local]
end
subgraph "Tools"
SEARCH[memory_search]
SAVE[memory_save]
LIST[memory_list]
end
MEM --> RANKER
MEM --> DECISION
MEM --> QDRANT
MEM --> NEO4J
MEM --> SQLITE
SEARCH --> MEM
SAVE --> MEM
LIST --> MEM
Per-Project Storage
Memory storage is isolated per project to prevent cross-contamination:
Bug Fix: v0.21.34 introduced per-project storage so projects can no longer bleed memories.
Source: Release v0.21.34()
Memory Integration in CLI
The headroom wrap command injects memory guidance into AGENTS.md:
from pathlib import Path
def _inject_memory_agents_md(file_path: Path) -> bool:
    """Inject memory usage guidance into AGENTS.md.

    Idempotent: skips if the marker is already present.
    """
    # _MEMORY_AGENTS_MARKER is a module-level constant defined elsewhere in wrap.py.
    memory_block = (
        f"{_MEMORY_AGENTS_MARKER}\n"
        "## Memory\n\n"
        "Use the `headroom_memory` MCP server for persistent cross-session knowledge.\n\n"
        "**Before** answering questions about prior decisions, conventions, project context,\n"
        "architecture, user preferences — call `memory_search` first.\n\n"
        "**After** making durable decisions — call `memory_save` to persist them.\n\n"
    )
Source: headroom/cli/wrap.py()
Proxy Architecture
Request Flow
sequenceDiagram
participant Client
participant Proxy as Headroom Proxy
participant Pipeline
participant Upstream as LLM Provider
Client->>Proxy: /v1/messages (raw)
Proxy->>Pipeline: Input Received
Pipeline->>Pipeline: Input Cached
Pipeline->>Pipeline: Input Routed
Pipeline->>Pipeline: Input Compressed
Pipeline->>Pipeline: Input Remembered
Pipeline->>Proxy: Compressed request
Proxy->>Upstream: /v1/messages (compressed)
Upstream->>Proxy: Response
Proxy->>Pipeline: Response Received
Pipeline->>Pipeline: Post-Send (outcome)
Proxy->>Client: Streaming/Final response
Proxy Configuration
| Option | Env Var | Default | Description |
|---|---|---|---|
| --port | HEADROOM_PORT | 8787 | Proxy port |
| --backend | HEADROOM_BACKEND | anthropic | API backend |
| --memory | - | false | Enable memory |
| --code-graph | - | false | Code graph indexing |
| --budget | HEADROOM_BUDGET | None | Daily budget limit (USD) |
| --exclude-tools | HEADROOM_EXCLUDE_TOOLS | None | Tools to skip |
| --code-aware | HEADROOM_CODE_AWARE_ENABLED | false | AST-based compression |
Source: headroom/cli/proxy.py()
RequestOutcome Funnel
v0.21.38 introduced the RequestOutcome funnel to collapse streaming finalizers:
Refactor: proxy: introduce RequestOutcome funnel; collapse 3 streaming finalizers
Source: Release v0.21.38()
Integration Architecture
SDK Integration Points
| Integration | Method |
|---|---|
| Python app | compress(messages, model=...) |
| TypeScript app | await compress(messages, { model }) |
| Anthropic/OpenAI SDK | withHeadroom(new Anthropic()) |
| Vercel AI SDK | wrapLanguageModel({ model, middleware: headroomMiddleware() }) |
| LiteLLM | litellm.callbacks = [HeadroomCallback()] |
| LangChain | HeadroomChatModel(your_llm) |
| Agno | HeadroomAgnoModel(your_model) |
| ASGI apps | app.add_middleware(CompressionMiddleware) |
Source: README.md()
CLI Wrapper Architecture
graph TD
subgraph "headroom wrap <agent>"
WRAP[wrap.py]
RTK[RTK Setup]
MCP[MCP Registration]
PROXY[Proxy Startup]
end
subgraph "Agent Types"
CLAUDE[Claude]
CODEX[Codex]
OPENCODE[OpenCode]
COPILOT[Copilot]
AIDER[Aider]
OPENCLAW[OpenClaw]
end
WRAP --> RTK
WRAP --> MCP
WRAP --> PROXY
PROXY --> CLAUDE
PROXY --> CODEX
PROXY --> OPENCODE
PROXY --> COPILOT
PROXY --> AIDER
PROXY --> OPENCLAW
Each agent wrapper:
- Snapshots pre-wrap config (e.g., ~/.codex/config.toml)
- Sets up CLI context tool (RTK or lean-ctx)
- Registers MCP server for CCR retrieval
- Starts proxy if not already running
- Launches the agent
Source: headroom/cli/wrap.py()
Multi-Agent Shared Context
Architecture
graph LR
subgraph "Agent A"
CA[Claude]
end
subgraph "Agent B"
CB[Codex]
end
subgraph "Shared Context"
SC[SharedContext<br/>.put() / .get()]
end
CA <--> SC
CB <--> SC
SC --> COMPRESS[Compression]
COMPRESS --> TRANSFORM[Transforms]
Usage Example
import { SharedContext } from "@headroom/sdk";
// Create shared context
const ctx = new SharedContext({
  projectId: "k8s-scaling-research",
});
// Agent A: Publish findings
await ctx.put("k8s-scaling-research", {
  role: "assistant",
  content: "Research findings on K8s autoscaling...",
});
// Agent B: Retrieve compressed context
const compressed = await ctx.get("k8s-scaling-research");
Source: sdk/typescript/examples/shared-context-multi-agent.ts()
CCR (Compress-Cache-Retrieve)
CCR provides reversible compression:
- Compress: Original content stored, marker inserted
- Cache: Markers indexed for retrieval
- Retrieve: Agent calls the headroom_retrieve tool to expand the marker
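The three steps above can be sketched as a tiny reversible store. This is an illustration only: `CCRStore`, the `[ccr:<key>]` marker format, and the hash-based key are hypothetical and not taken from Headroom's source.

```python
import hashlib


class CCRStore:
    """Toy compress/cache/retrieve store. Marker format is hypothetical."""

    def __init__(self) -> None:
        self._originals: dict[str, str] = {}

    def compress(self, content: str, keep_chars: int = 40) -> str:
        # Compress: keep a short preview and replace the rest with a marker.
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
        # Cache: index the original under the marker key.
        self._originals[key] = content
        return f"{content[:keep_chars]}... [ccr:{key}]"

    def retrieve(self, key: str) -> str:
        # Retrieve: expand a marker back to the full original content.
        return self._originals[key]


store = CCRStore()
original = "ERROR line 1\n" * 500
compressed = store.compress(original)
key = compressed.rsplit("[ccr:", 1)[1].rstrip("]")
print(len(compressed) < len(original))   # True
print(store.retrieve(key) == original)   # True: nothing is lost
```

Because the original is stored rather than discarded, the compression is lossy only in what the model sees by default, not in what it can ask for.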
MCP Registration for Retrieval
# Register headroom MCP server in ~/.codex/config.toml so Codex can
# call headroom_retrieve on compression markers from the proxy.
if not no_mcp:
    from headroom.mcp_registry import CodexRegistrar
    _setup_headroom_mcp(CodexRegistrar(), port, verbose=verbose, force=True)
Source: headroom/cli/wrap.py()
Known Limitations
[BUG #503]: CCR proactive expansion blocks corrupt message attribution in multi-agent threads. The _append_context_to_latest_non_frozen_user_turn() function injects proactive expansion blocks into the content of the latest user message. In multi-agent setups, that message can contain structured XML attribution markup (<peer_turn from="AgentX">), and the injected block ends up corrupting the attribution.
Source: GitHub Issue #503()
Provider Architecture
Provider-specific behavior lives under headroom/providers/ to keep core orchestration focused:
headroom/providers/
├── claude/ # Claude Code integration
├── copilot/ # GitHub Copilot CLI
├── codex/ # OpenAI Codex
└── openai/ # OpenAI native clients
This separation ensures:
- Core pipeline remains provider-agnostic
- Provider-specific auth and routing handled at edges
- New providers can be added without modifying core logic
Source: README.md()
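The separation described above resembles a registry pattern, where core code looks providers up by name and never branches on them. The sketch below is a generic illustration with hypothetical function names, not Headroom's provider interface. The two header conventions shown are the standard ones for the Anthropic and OpenAI HTTP APIs.

```python
from typing import Callable

HeaderBuilder = Callable[[str], dict[str, str]]

# Registry mapping a provider name to a function that builds auth headers.
_PROVIDERS: dict[str, HeaderBuilder] = {}


def register(name: str) -> Callable[[HeaderBuilder], HeaderBuilder]:
    """Decorator that adds a provider without touching core logic."""
    def decorator(fn: HeaderBuilder) -> HeaderBuilder:
        _PROVIDERS[name] = fn
        return fn
    return decorator


@register("claude")
def _claude_headers(api_key: str) -> dict[str, str]:
    return {"x-api-key": api_key}


@register("codex")
def _codex_headers(api_key: str) -> dict[str, str]:
    return {"Authorization": f"Bearer {api_key}"}


def build_headers(provider: str, api_key: str) -> dict[str, str]:
    """Core code: knows nothing about individual providers."""
    try:
        return _PROVIDERS[provider](api_key)
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None


print(build_headers("claude", "k1"))  # {'x-api-key': 'k1'}
print(build_headers("codex", "k2"))   # {'Authorization': 'Bearer k2'}
```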
Rust Extension
The Rust extension (headroom-core) provides performance-critical transforms:
Exports to Python
use headroom_core::transforms::{
    compress_openai_responses_live_zone,
    detect as rust_detect_chain,
    is_json_array_of_dicts,
    LogCompressor,
    SearchCompressor,
    DiffCompressor,
    DiffCompressorConfig,
};
Build Optimizations
v0.21.37 introduced wheel size optimizations:
Build: shrink Rust extension wheels (strip + thin-LTO + single codegen unit)
Source: Release v0.21.37()
Extension Points
Pipeline Extensions
- on_pipeline_event(...): Hook into lifecycle stages
- Compression hooks: Additional seam alongside canonical lifecycle
- Proxy extensions: ASGI middleware, routes, startup policy
Plugin System
# headroom learn registers via entry point
# 'headroom.learn_plugin'
Source: headroom/cli/learn.py()
Observability
RTK Metrics
RTK (Rust Token Killer) metrics are wired into the observability stack:
Fix: fix(observability): RTK metrics + Rust observability (Phase H blocker)
Source: Release v0.22.4()
Logging Options
| Option | Purpose |
|---|---|
| --log-file | Path to JSONL log file |
| --log-messages | Full message logging (request/response content) |
| --codex-wire-debug | Local Codex wire snapshots + proxy.log traces |
Source: headroom/cli/proxy.py()
Development Setup
git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest
Devcontainers available in .devcontainer/:
- Default
- memory-stack with Qdrant & Neo4j
Source: README.md()
Related Documentation
- Contributing Guide
- MCP Setup. Note: the /mcp HTTP endpoint returns 404; the stdio MCP server works (Issue #460)
- Provider-agnostic proxy mode: Planned for Bedrock, OpenAI, Vertex support
Source: https://github.com/chopratejas/headroom / Human Manual
Compression Pipeline
Related topics: Architecture, Compression Algorithms
The Compression Pipeline is Headroom's core orchestration system for reducing token usage in LLM requests. It exposes a single, stable request lifecycle that operates consistently across the Python SDK, TypeScript SDK, CLI, and proxy server. The pipeline sequences multiple transform components to analyze, route, and compress content while preserving critical information through the CCR (Compress-Cache-Retrieve) pattern.
Architecture Overview
The pipeline follows a deterministic lifecycle with defined stages, configurable transforms, and extension points for hooks and plugins. Each request flows through the same stages regardless of entry point (SDK, CLI, or proxy), ensuring predictable behavior and observable outcomes.
graph TD
subgraph Lifecycle["Request Lifecycle"]
A[Input Received] --> B[Input Cached]
B --> C[Input Routed]
C --> D[Input Compressed]
D --> E[Input Remembered]
E --> F[Pre-Send]
F --> G[Post-Send]
G --> H[Response Received]
end
subgraph Transforms["Transform Components"]
T1[CacheAligner]
T2[ContentRouter]
T3[SmartCrusher]
T4[CodeCompressor]
T5[Kompress-base]
T6[IntelligentContext]
T7[RollingWindow]
end
C --> T1
T1 --> T2
T2 --> T3
T3 --> T4
T4 --> T5
T5 --> T6
T6 --> T7
Request Lifecycle Stages
The pipeline implements 11 lifecycle stages that execute in order. Each stage is observable and can be extended or intercepted by pipeline extensions.
| Stage | Purpose | Extensions Available |
|---|---|---|
| Setup | Initialize request context and configuration | Yes |
| Pre-Start | Pre-processing before transform execution | Yes |
| Post-Start | Post-processing after initialization | Yes |
| Input Received | Capture raw request input | Yes |
| Input Cached | Check and update cache state | Yes |
| Input Routed | Route content to appropriate transforms | Yes |
| Input Compressed | Apply compression transforms | Yes |
| Input Remembered | Store relevant context for memory | Yes |
| Pre-Send | Final modifications before LLM call | Yes |
| Post-Send | Process response metadata | Yes |
| Response Received | Handle and log response | Yes |
Source: README.md
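The ordered stages in the table can be pictured as a small dispatcher that fires registered callbacks stage by stage. This is a minimal sketch, not Headroom's implementation: only the stage names come from the table, while `STAGES`, `Pipeline`, `register`, and `run` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Callable

# Stage names follow the lifecycle table; the identifiers are hypothetical.
STAGES = [
    "setup", "pre_start", "post_start", "input_received", "input_cached",
    "input_routed", "input_compressed", "input_remembered",
    "pre_send", "post_send", "response_received",
]


class Pipeline:
    """Runs registered callbacks for each stage in a fixed order."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def register(self, stage: str, handler: Callable[[dict], None]) -> None:
        # Reject unknown stages so typos fail early.
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self._handlers[stage].append(handler)

    def run(self, context: dict) -> dict:
        # Stages always execute in lifecycle order.
        for stage in STAGES:
            for handler in self._handlers[stage]:
                handler(context)
        return context


pipeline = Pipeline()
trace: list[str] = []
pipeline.register("pre_send", lambda ctx: trace.append("pre_send"))
pipeline.register("input_routed", lambda ctx: trace.append("input_routed"))
pipeline.run({"messages": []})
print(trace)
```

Handlers fire in lifecycle order rather than registration order, which is the property that makes behavior predictable across entry points.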
Transform Components
Transforms are the execution units within the pipeline. Each transform specializes in a specific compression strategy.
SmartCrusher
SmartCrusher is the primary content-aware compressor for structured data. It analyzes JSON arrays, tool outputs, and log files using statistical selection to preserve critical items.
Key capabilities:
- 100% ERROR preservation — Never drops error entries from tool outputs
- Anomaly detection — Statistical identification of outliers (high CPU, memory spikes)
- Boundary preservation — Always keeps first and last items in arrays
- Relevance scoring — Weights items by relevance to the user's query
- Change point detection — Identifies significant transitions in data
Configuration options:
| Parameter | Default | Description |
|---|---|---|
| enabled | true | Enable/disable the transform |
| min_items_to_analyze | 10 | Minimum array size to apply analysis |
| min_tokens_to_crush | 500 | Minimum content size to trigger compression |
| max_items_after_crush | 20 | Target maximum items after compression |
| relevance_threshold | 0.3 | Score threshold for item retention |
| bias | 1.0 | Compression bias (>1 preserves more, <1 compresses more) |
Source: headroom/transforms/smart_crusher.py, crates/headroom-py/src/lib.rs
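The selection rules listed under key capabilities can be sketched in a few lines. This is an illustration under stated assumptions, not SmartCrusher's actual algorithm: the `level` field, the precomputed `relevance` scores, and the function name `select_indices` are hypothetical, and anomaly and change point detection are left out.

```python
def select_indices(items, relevance, threshold=0.3, min_items=10):
    """Return the sorted indices of items to keep.

    Assumes each item is a dict with an optional "level" field and that
    relevance[i] is a precomputed score in [0, 1] for items[i].
    """
    if len(items) < min_items:
        # Too small to analyze: keep everything.
        return list(range(len(items)))

    keep = {0, len(items) - 1}  # boundary preservation
    for i, item in enumerate(items):
        if item.get("level") == "ERROR":
            keep.add(i)  # error entries are never dropped
        elif relevance[i] >= threshold:
            keep.add(i)  # relevant to the user's query
    return sorted(keep)


items = [{"level": "INFO", "msg": f"step {i}"} for i in range(12)]
items[5]["level"] = "ERROR"
relevance = [0.0] * 12
relevance[7] = 0.9
print(select_indices(items, relevance))
```

In this toy input the first item, the last item, the single error, and the one relevant entry survive; the other eight are candidates for removal.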
ContentRouter
ContentRouter determines which transforms should be applied to each content block based on content type detection. It routes JSON arrays, code, logs, and text to appropriate specialized compressors.
Routing logic:
- Detects content type (JSON array, code, log, plain text)
- Applies scoring weights for each content category
- Selects optimal compression profile per block
Source: headroom/transforms/content_router.py
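A rough sketch of the detection step follows. The heuristics, the `detect_content_type` function, and the `ROUTES` mapping are assumptions made for illustration; the real ContentRouter applies scoring weights that are not reproduced here.

```python
import json

# Hypothetical mapping from detected type to a transform name.
ROUTES = {
    "json_array": "SmartCrusher",
    "log": "LogCompressor",
    "code": "CodeCompressor",
    "text": "Kompress-base",
}

LOG_LEVELS = ("DEBUG", "INFO", "WARN", "ERROR")
CODE_STARTS = ("def ", "class ", "import ", "from ", "fn ", "function ")


def detect_content_type(text: str) -> str:
    """Classify a content block with simple, illustrative heuristics."""
    stripped = text.strip()
    if stripped.startswith("["):
        try:
            parsed = json.loads(stripped)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, list) and parsed and all(isinstance(x, dict) for x in parsed):
            return "json_array"

    lines = stripped.splitlines()
    if lines:
        # Treat the block as a log when at least half the lines carry a level.
        log_like = sum(1 for ln in lines if any(lvl in ln for lvl in LOG_LEVELS))
        if log_like / len(lines) >= 0.5:
            return "log"
        if any(ln.lstrip().startswith(CODE_STARTS) for ln in lines):
            return "code"
    return "text"


block = '[{"cpu": 10}, {"cpu": 95}]'
print(ROUTES[detect_content_type(block)])
```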
CacheAligner
CacheAligner stabilizes request prefixes to maximize KV cache hit rates across Anthropic and OpenAI providers. It analyzes common prefix patterns and aligns new requests to existing cache entries.
Behavior:
- Computes prefix stability scores
- Aligns new requests to cached prefixes when beneficial
- Records cache prefix metrics for observability
Kompress-base
Kompress-base is Headroom's ML-based text compressor using a fine-tuned model. It provides aggressive token reduction (up to 90%) for arbitrary text content.
Usage: Applied after specialized compressors have processed structured data
IntelligentContext / RollingWindow
Two complementary context management strategies:
| Strategy | Description |
|---|---|
| IntelligentContext | Score-based context fitting with learned importance weights |
| RollingWindow | Maintains recent turns with configurable window size |
CodeCompressor
AST-aware code compression using tree-sitter parsing. Preserves code structure while removing whitespace, comments, and non-essential formatting.
SearchCompressor
Specialized compressor for search results and ranked lists. Applies relevance-based selection and deduplication.
LogCompressor
Format-aware log compression supporting multiple log formats. Detects format automatically and applies appropriate compression strategies.
Source: crates/headroom-py/src/lib.rs, crates/headroom-core/src/transforms/mod.rs
CCR Pattern (Compress-Cache-Retrieve)
CCR provides reversible, lossless compression by storing originals and allowing retrieval on demand.
graph LR
A[Original Content] --> B[Compress]
B --> C[Compressed + Hash]
C --> D[Storage]
D --> E[Retrieve by Hash]
E --> F[Original Restored]
style C fill:#90EE90
style F fill:#90EE90
How it works:
- Compress: content is analyzed and compressed, generating a cache_key (hash)
- Cache: the original content is stored in the compression store, keyed by that hash
- Retrieve: the agent uses the headroom_retrieve tool to access originals when needed
Rust bindings expose CCR functionality:
# Python usage via Rust extension
result = compressor.compress(content, bias=1.0)
# result.inner.cache_key contains the CCR hash
Source: crates/headroom-py/src/lib.rs
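The compress, cache, and retrieve steps can be illustrated with a toy store. This is a sketch under assumptions, not the Rust implementation: `CompressionStore`, `compress_with_ccr`, the truncated SHA-256 key, and the marker format are all hypothetical.

```python
import hashlib
from typing import Optional


class CompressionStore:
    """Toy in-memory store keyed by a content hash."""

    def __init__(self) -> None:
        self._originals: dict[str, str] = {}

    def put(self, original: str) -> str:
        # Hypothetical key format: first 16 hex chars of SHA-256.
        key = hashlib.sha256(original.encode("utf-8")).hexdigest()[:16]
        self._originals[key] = original
        return key

    def get(self, key: str) -> Optional[str]:
        return self._originals.get(key)


def compress_with_ccr(lines: list[str], store: CompressionStore, keep: int = 2):
    """Keep the first and last `keep` lines and stash the original."""
    original = "\n".join(lines)
    if len(lines) <= keep * 2:
        return original, None  # nothing worth compressing
    key = store.put(original)
    omitted = len(lines) - 2 * keep
    marker = f"[{omitted} lines omitted; retrieve with cache_key={key}]"
    return "\n".join(lines[:keep] + [marker] + lines[-keep:]), key


store = CompressionStore()
lines = [f"line {i}" for i in range(10)]
compressed, key = compress_with_ccr(lines, store)
print(compressed)
print(store.get(key) == "\n".join(lines))
```

The compressed text is lossy on its own; reversibility comes entirely from the stored original and the key embedded in the marker.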
Compression Hooks
Hooks provide extension points for customizing compression behavior in the TypeScript SDK.
CompressContext
Context object passed to hook methods:
interface CompressContext {
  model: string;
  userQuery: string;
  turnNumber: number;
  toolCalls: string[];
  provider: string;
}
CompressEvent
Event object received by post-compression hooks:
interface CompressEvent {
  tokensBefore: number;
  tokensAfter: number;
  tokensSaved: number;
  compressionRatio: number;
  transformsApplied: string[];
  ccrHashes: string[];
  model: string;
  userQuery: string;
  provider: string;
}
Hook Methods
| Method | Timing | Can Modify? | Purpose |
|---|---|---|---|
| preCompress | Before compression | Yes | Modify messages before pipeline |
| computeBiases | During routing | Yes | Per-message compression weights |
| postCompress | After compression | No | Observability and logging |
Example implementation:
class LoggingHooks extends CompressionHooks {
  postCompress(event: CompressEvent) {
    console.log(`Saved ${event.tokensSaved} tokens (${event.compressionRatio})`);
  }
}
Source: sdk/typescript/src/hooks.ts
Compression Results
The CompressResult type returned by compression operations:
| Field | Type | Description |
|---|---|---|
| messages | any[] | Compressed messages in same format as input |
| tokensBefore | number | Token count before compression |
| tokensAfter | number | Token count after compression |
| tokensSaved | number | Absolute tokens saved |
| compressionRatio | number | Reduction expressed as a fraction (0-1) |
| transformsApplied | string[] | List of transforms that modified content |
| ccrHashes | string[] | CCR cache keys for retrievable content |
| compressed | boolean | Whether compression actually occurred |
Source: sdk/typescript/src/types.ts
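The numeric fields are related by simple arithmetic. The sketch below assumes that `compressionRatio` is the fraction of tokens removed (saved divided by before); the source only states a 0-1 range, so treat that definition and the `summarize` helper as assumptions.

```python
def summarize(tokens_before: int, tokens_after: int) -> dict:
    """Derive the summary fields from before/after token counts."""
    saved = tokens_before - tokens_after
    # Guard against division by zero for empty inputs.
    ratio = saved / tokens_before if tokens_before else 0.0
    return {
        "tokensBefore": tokens_before,
        "tokensAfter": tokens_after,
        "tokensSaved": saved,
        "compressionRatio": ratio,
        "compressed": saved > 0,
    }


print(summarize(1000, 250))
```

Under this definition, going from 1000 to 250 tokens saves 750 tokens for a ratio of 0.75.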
SDK Integration
TypeScript SDK
import { compress } from "headroom-ai";
// Direct compression
const result = await compress(messages, {
  model: "claude-sonnet-4-20250514",
  hooks: new LoggingHooks()
});
Python SDK
from headroom import compress
result = compress(messages, model="claude-sonnet-4-20250514")
CLI
headroom wrap claude
Source: sdk/typescript/examples/basic-compress.ts, examples/langchain_demo/README.md
Configuration Profiles
Compression behavior can be tuned via profiles:
| Profile | Bias | Min K | Max K | Use Case |
|---|---|---|---|---|
| balanced | 1.0 | 2 | 8 | General purpose |
| aggressive | 0.7 | 1 | 5 | Long contexts |
| conservative | 1.3 | 3 | 12 | High-fidelity |
Configuration interface:
interface CompressionProfile {
  bias?: number;
  minK?: number;
  maxK?: number | null;
}
Source: sdk/typescript/src/types/config.ts
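One plausible way the three profile fields could combine is to scale a baseline item count by the bias and then clamp it between Min K and Max K. The source does not state the formula, so the `items_to_keep` function below is purely an assumption; only the preset values come from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompressionProfile:
    bias: float = 1.0
    min_k: int = 2
    max_k: Optional[int] = 8


# Preset values copied from the profile table.
PROFILES = {
    "balanced": CompressionProfile(1.0, 2, 8),
    "aggressive": CompressionProfile(0.7, 1, 5),
    "conservative": CompressionProfile(1.3, 3, 12),
}


def items_to_keep(base_k: int, profile: CompressionProfile) -> int:
    """Hypothetical rule: scale by bias, then clamp to [min_k, max_k]."""
    k = round(base_k * profile.bias)
    k = max(k, profile.min_k)
    if profile.max_k is not None:
        k = min(k, profile.max_k)
    return k


for name, profile in PROFILES.items():
    print(name, items_to_keep(6, profile), items_to_keep(20, profile))
```

With a baseline of 6 items the bias drives the result; with a baseline of 20 the Max K cap dominates for every preset.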
Observability
Pipeline stages emit lifecycle events for monitoring:
| Metric | Description |
|---|---|
| tokens_saved | Cumulative tokens saved by compression |
| compression_ratio | Real-time reduction percentage |
| cache_hit_rate | Percentage of requests aligned to cache |
| transform_timing | Per-transform latency breakdown |
Known Limitations
CCR in Multi-Agent Threads
Issue #503 — CCR proactive expansion blocks can corrupt message attribution in multi-agent setups. When _append_context_to_latest_non_frozen_user_turn() injects expansion blocks into messages containing XML attribution markup (<peer_turn from="AgentX">), the injected block can interfere with structured attribution.
Workaround: Avoid using CCR retrieval markers in multi-agent threads with peer attribution until the issue is resolved.
Extension Points
The pipeline supports three extension mechanisms:
| Extension Type | Scope | Use Case |
|---|---|---|
| Pipeline Extensions | Lifecycle stages | Custom stage logic |
| Compression Hooks | Pre/post processing | Logging, bias computation |
| Proxy Extensions | Server integration | ASGI middleware, routes |
Provider and tool-specific behavior lives under headroom/providers/ to keep core orchestration focused on lifecycle, sequencing, and policy.
Source: README.md
Compression Algorithms
Related topics: Compression Pipeline, CCR (Reversible Compression)
Headroom employs a multi-layered compression system that reduces token usage by 60–90% across AI agent workflows. The compression algorithms work together in a configurable pipeline, with each algorithm optimized for specific content types.
Overview
Headroom's compression stack includes six distinct algorithms:
| Algorithm | Primary Use Case | Typical Savings |
|---|---|---|
| SmartCrusher | Tool outputs (JSON arrays, logs) | 70–90% |
| CodeCompressor | Source code files | 60–80% |
| Kompress-base | General text via ML model | 50–70% |
| CacheAligner | API request prefixes | 20–40% |
| IntelligentContext | Long conversations | 40–60% |
| RollingWindow | Simple context trimming | Variable |
Source: README.md:smart-crusher
Architecture
graph TD
A[Input Messages] --> B[CacheAligner]
B --> C[ContentRouter]
C --> D{Select Algorithm}
D -->|Tool Output| E[SmartCrusher]
D -->|Code| F[CodeCompressor]
D -->|Text| G[Kompress-base]
D -->|Long Context| H[IntelligentContext]
E --> I[CCR Store]
F --> I
G --> I
H --> I
I --> J[Output to LLM]
Pipeline Lifecycle
The stable request lifecycle that all compression algorithms follow:
Setup → Pre-Start → Post-Start → Input Received → Input Cached → Input Routed → Input Compressed → Input Remembered → Pre-Send → Post-Send → Response Received
Transforms execute during the Input Compressed stage, with each algorithm responsible for specific content types.
Source: README.md:pipeline-internals
SmartCrusher
SmartCrusher is Headroom's primary algorithm for compressing structured tool outputs, particularly JSON arrays from command results.
Core Features
- 100% ERROR preservation — Never drops error items from output
- Boundary preservation — Always keeps first and last items
- Anomaly detection — Statistically identifies outliers (CPU spikes, high error rates)
- Relevance scoring — Prioritizes items matching the user's query
- Change point detection — Identifies significant transitions in data
Configuration
class SmartCrusherConfig:
    enabled: bool = True
    min_items_to_analyze: int = 10
    min_tokens_to_crush: int = 500
    max_items_after_crush: int | None = None
    relevance_threshold: float = 0.5
    enable_ccr_marker: bool = True
| Parameter | Default | Description |
|---|---|---|
| min_items_to_analyze | 10 | Minimum items before analysis activates |
| min_tokens_to_crush | 500 | Minimum token count to trigger compression |
| max_items_after_crush | None | Cap on output items (None = unlimited) |
| relevance_threshold | 0.5 | Score threshold for item retention |
Source: crates/headroom-py/src/lib.rs:PySmartCrusherConfig
CrushResult
The compression result object exposes:
| Property | Type | Description |
|---|---|---|
| compressed | str | The compressed output |
| original | str | The original input |
| was_modified | bool | Whether compression occurred |
| strategy | str | Strategy used ("preserve_all", "crush", etc.) |
Source: crates/headroom-py/src/lib.rs:PyCrushResult
CodeCompressor
CodeCompressor uses AST-aware analysis via tree-sitter to compress source code while preserving semantic structure.
Compression Strategy
- AST Parsing — Parse code into an abstract syntax tree
- Importance Scoring — Rank nodes by relevance to the query
- Selective Retention — Keep high-importance nodes, summarize low-importance regions
- CCR Markers — Insert reversible markers for compressed sections
Supported Languages
CodeCompressor supports 75+ programming languages through tree-sitter grammars, including Python, JavaScript, TypeScript, Rust, Go, Java, C++, and more.
Source: headroom/transforms/code_compressor.py
Language-Aware Features
- Preserves function signatures and class definitions
- Retains docstrings and comments for critical functions
- Compresses implementation details proportionally to relevance
- Maintains indentation structure for readability
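The idea of keeping signatures and docstrings while dropping implementation detail can be shown with Python's built-in `ast` module. Note the substitution: CodeCompressor is described as using tree-sitter and relevance scoring, whereas this sketch uses `ast` (Python 3.9+ for `ast.unparse`) and strips every function body unconditionally. The `skeletonize` name is hypothetical.

```python
import ast


def skeletonize(source: str) -> str:
    """Replace each function body with its docstring (if any) and `...`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            new_body = []
            doc = ast.get_docstring(node)
            if doc is not None:
                # Keep the docstring as the first statement.
                new_body.append(ast.Expr(ast.Constant(doc)))
            # An Ellipsis placeholder stands in for the removed body.
            new_body.append(ast.Expr(ast.Constant(...)))
            node.body = new_body
    return ast.unparse(ast.fix_missing_locations(tree))


SOURCE = '''
def add(a, b):
    """Add two numbers."""
    total = a + b
    return total
'''

skeleton = skeletonize(SOURCE)
print(skeleton)
```

The result still parses as valid Python, so downstream tooling can treat it like ordinary source.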
Kompress-base
Kompress-base is an ML-based compression model trained specifically for text compression in AI agent contexts.
Model Information
| Property | Value |
|---|---|
| Model Name | kompress-base |
| Provider | HuggingFace |
| Publisher | chopratejas |
| Architecture | Transformer-based |
Source: README.md:kompress-base-huggingface
Usage
from headroom.transforms.kompress_compressor import KompressCompressor
compressor = KompressCompressor()
result = compressor.compress(
    content="...",
    bias=1.0,  # Higher = preserve more
)
The model is automatically used when ContentRouter classifies content as general-purpose text.
CacheAligner
CacheAligner optimizes request prefixes to maximize KV cache hit rates across Anthropic and OpenAI providers.
How It Works
- Analyze the prefix structure of incoming requests
- Identify stable vs. variable components
- Reorder or normalize prefix content for better cache alignment
- Track prefix metrics for observability
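The steps above can be sketched as follows. This is a simplified illustration that assumes the only volatile content is a line containing an ISO date; `align_prefix`, `common_prefix_len`, and the regular expression are hypothetical and do not reflect how CacheAligner actually classifies content.

```python
import re

# Assumption: lines containing an ISO date are the "variable" component.
VOLATILE = re.compile(r"\d{4}-\d{2}-\d{2}")


def align_prefix(system_prompt: str) -> str:
    """Move volatile lines to the end so the stable part forms the prefix."""
    stable, volatile = [], []
    for line in system_prompt.splitlines():
        (volatile if VOLATILE.search(line) else stable).append(line)
    return "\n".join(stable + volatile)


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by both strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


day1 = "Today is 2025-01-01\nYou are a coding assistant.\nFollow the style guide."
day2 = "Today is 2025-01-02\nYou are a coding assistant.\nFollow the style guide."

before = common_prefix_len(day1, day2)
after = common_prefix_len(align_prefix(day1), align_prefix(day2))
print(before, after)
```

A longer shared prefix between consecutive requests is what gives a provider-side prefix cache more to reuse.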
Configuration
class CacheAlignerConfig:
    enabled: bool = True
    validation_marker: str | None = None
    feedback_enabled: bool = True
    min_items_to_cache: int = 3
    inject_tool: bool = True
    inject_system_instructions: bool = True
    marker_template: str | None = None
Source: sdk/typescript/src/types/config.ts:CacheAlignerConfig
IntelligentContext
IntelligentContext uses score-based context fitting with learned importance weights to determine what content to retain.
Configuration
class IntelligentContextConfig:
    enabled: bool = True
    scoring_weights: ScoringWeights | None = None
    relevance_scorer: RelevanceScorerConfig | None = None
    anchor_config: AnchorConfig | None = None
| Component | Purpose |
|---|---|
| scoring_weights | Tune importance factors (recency, relevance, role) |
| relevance_scorer | Configure relevance detection |
| anchor_config | Pin critical messages to prevent compression |
Source: sdk/typescript/src/types/config.ts:IntelligentContextConfig
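Score-based fitting can be pictured as ranking messages by a weighted score and admitting them greedily until a token budget is reached. The weights, the `pinned` flag standing in for anchors, the per-message `tokens` and `relevance` fields, and the `fit_context` function are all assumptions for illustration; the learned weights mentioned above are not reproduced.

```python
def fit_context(messages, budget, weights=None):
    """Keep pinned messages, then the highest-scoring ones that fit the budget."""
    weights = weights or {"recency": 0.5, "relevance": 0.4, "role": 0.1}
    n = len(messages)

    scored = []
    for i, m in enumerate(messages):
        recency = (i + 1) / n  # later messages score higher
        role = 1.0 if m["role"] == "system" else 0.5
        score = (
            weights["recency"] * recency
            + weights["relevance"] * m.get("relevance", 0.0)
            + weights["role"] * role
        )
        scored.append((score, i))

    # Anchored (pinned) messages are always kept.
    kept = {i for i, m in enumerate(messages) if m.get("pinned")}
    used = sum(messages[i]["tokens"] for i in kept)

    for _, i in sorted(scored, reverse=True):
        if i not in kept and used + messages[i]["tokens"] <= budget:
            kept.add(i)
            used += messages[i]["tokens"]

    # Preserve the original conversation order.
    return [messages[i] for i in sorted(kept)]


history = [
    {"role": "system", "tokens": 50, "pinned": True},
    {"role": "user", "tokens": 400, "relevance": 0.1},
    {"role": "assistant", "tokens": 300, "relevance": 0.9},
    {"role": "user", "tokens": 100, "relevance": 0.8},
]
fitted = fit_context(history, budget=500)
print([m["tokens"] for m in fitted])
```

Here the old, low-relevance 400-token message is the one that gets dropped, while the pinned system message survives regardless of its score.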
RollingWindow
RollingWindow provides simple context trimming for straightforward compression needs.
Configuration
class RollingWindowConfig:
    enabled: bool = True
    max_turns: int | None = None
    preserve_system: bool = True
    preserve_last_n: int = 2
Source: sdk/typescript/src/types/config.ts:RollingWindowConfig
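A minimal sketch of how these options might interact is shown below. It assumes that each non-system message counts as one turn, that preserved system messages are emitted first, and that `preserve_last_n` acts as a floor on the window size; none of these details are confirmed by the source, and `rolling_window` is a hypothetical name.

```python
def rolling_window(messages, max_turns=None, preserve_system=True, preserve_last_n=2):
    """Trim history to the most recent turns, optionally keeping system messages."""
    if preserve_system:
        system = [m for m in messages if m["role"] == "system"]
        rest = [m for m in messages if m["role"] != "system"]
    else:
        system, rest = [], list(messages)

    if max_turns is None:
        return system + rest  # no trimming configured

    # Never keep fewer than preserve_last_n recent messages.
    keep = max(max_turns, preserve_last_n)
    return system + (rest[-keep:] if keep > 0 else [])


chat = [{"role": "system", "content": "rules"}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"m{i}"}
    for i in range(6)
]
trimmed = rolling_window(chat, max_turns=3)
print([m["content"] for m in trimmed])
```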
Compress-Cache-Retrieve (CCR)
CCR enables reversible compression — originals are stored and can be retrieved by the LLM on demand.
Mechanism
- Compress: the algorithm compresses content and produces a hash
- Cache: the original is stored in the CompressionStore
- Retrieve: the LLM uses the headroom_retrieve tool to access the original
Usage Tracking
class CCRStats:
    entries: int
    total_original_tokens: int
    total_compressed_tokens: int
    total_tokens_saved: int
    savings_percent: float
Source: sdk/typescript/src/types/models.ts:CCRStats
Image Compression
Image content is handled separately through the image compressor module.
Features
- Intelligent downsampling based on content type
- Format optimization (JPEG for photos, PNG for graphics)
- Size limits configurable per request
Source: headroom/image/compressor.py
Compression Hooks
The TypeScript SDK exposes hooks for customizing compression behavior:
export class CompressionHooks {
  preCompress(messages: any[], ctx: CompressContext): any[] | Promise<any[]>;
  computeBiases(messages: any[], ctx: CompressContext): Record<number, number>;
  postCompress(event: CompressEvent): void | Promise<void>;
}
CompressContext
interface CompressContext {
  model: string;
  userQuery: string;
  turnNumber: number;
  toolCalls: string[];
  provider: string;
}
CompressEvent
interface CompressEvent {
  tokensBefore: number;
  tokensAfter: number;
  tokensSaved: number;
  compressionRatio: number;
  transformsApplied: string[];
  ccrHashes: string[];
  model: string;
  userQuery: string;
  provider: string;
}
Source: sdk/typescript/src/hooks.ts
Configuration Profiles
Headroom supports compression profiles for different use cases:
class CompressionProfile:
    bias: float = 1.0        # >1 = preserve more, <1 = compress more
    minK: int = 10           # Minimum items to keep
    maxK: int | None = None  # Maximum items to keep
Preset Profiles
| Profile | Bias | Use Case |
|---|---|---|
| balanced | 1.0 | General purpose |
| aggressive | 0.5 | Maximize compression |
| conservative | 2.0 | Preserve more context |
Performance Characteristics
| Algorithm | Latency | Memory | Best For |
|---|---|---|---|
| SmartCrusher | Low | Low | Tool outputs |
| CodeCompressor | Medium | Medium | Source files |
| Kompress-base | Higher | Higher | Free-form text |
| CacheAligner | Low | Low | Prefix optimization |
Known Limitations
Multi-Agent Attribution Issue
In multi-agent setups, CCR proactive expansion can corrupt message attribution. When _append_context_to_latest_non_frozen_user_turn() injects expansion blocks into messages containing XML attribution markup (<peer_turn from="AgentX">), the injected block may interfere with attribution parsing.
Workaround: Use explicit CCR retrieval calls instead of relying on proactive expansion in multi-agent threads.
Source: GitHub Issue #503
API Reference
Python SDK
from headroom import compress
result = compress(
    messages,
    model="claude-sonnet-4-20250514",
    profile="balanced",
)
TypeScript SDK
import { compress } from "headroom-ai";
const result = await compress(messages, {
  model: "gpt-4o",
  hooks: new LoggingHooks(),
  tokenBudget: 100000
});
CLI
headroom compress --input messages.json --output compressed.json
See Also
- Pipeline Internals — Detailed compression lifecycle
- Configuration Reference — Full configuration options
- CCR System — Storage and retrieval mechanism
- Code Compression — Code-specific compression docs
CCR (Reversible Compression)
Related topics: Compression Algorithms, MCP Integration
Overview
CCR (Compress-Cache-Retrieve) is Headroom's reversible compression mechanism that enables lossless context reduction. Unlike traditional compression that permanently discards information, CCR stores compressed content alongside its original form, allowing the LLM to retrieve full details on demand through a specialized tool.
The core value proposition is straightforward: achieve aggressive token reduction while maintaining zero data loss. When the agent needs original content—whether debugging an error, reviewing a code change, or examining a log entry—it calls headroom_retrieve with a cache key to decompress and return the full original content.
Source: README.md
Architecture
CCR consists of three primary phases that form a continuous cycle:
graph LR
A[Compress] -->|Store originals| B[Cache]
B -->|Insert placeholder| C[Send to LLM]
C -->|Agent requests| D[Retrieve]
D -->|Return originals| C
Core Components
| Component | Responsibility | Location |
|---|---|---|
| InMemoryCcrStore | Runtime storage of compressed originals | crates/headroom-core/src/ccr/mod.rs |
| CCR Backend | Pluggable storage implementations | crates/headroom-core/src/ccr/backends/mod.rs |
| headroom_retrieve | MCP tool for on-demand retrieval | headroom/ccr/mcp_server.py |
| Response Handler | Processes retrieval responses | headroom/ccr/response_handler.py |
Data Flow
sequenceDiagram
participant Transform as Compression Transform
participant Store as CCR Store
participant Proxy as Headroom Proxy
participant LLM as LLM
participant Agent as AI Agent
Transform->>Store: compress(content)
Store->>Store: Generate cache_key
Store->>Store: Store original with key
Transform->>Proxy: Return compressed + cache_key
Proxy->>LLM: Send compressed content
LLM->>Agent: Request via headroom_retrieve
Agent->>Proxy: retrieve(cache_key)
Proxy->>Store: Lookup original
Store->>Proxy: Return original
Proxy->>Agent: Decompressed content
Compression Phase
During the compression phase, Headroom transforms apply lossy compression strategies (SmartCrusher, DiffCompressor, LogCompressor, etc.) while simultaneously preserving originals in the CCR store.
Cache Key Generation
Each compressed item receives a unique cache key that serves as the retrieval identifier. The key format enables:
- Fast O(1) lookup in the store
- Correlation with specific compression transforms
- Version tracking for cache invalidation
Source: crates/headroom-core/src/ccr/mod.rs
Storage Backend
The default backend is InMemoryCcrStore, which provides:
- Thread-safe in-memory storage during a session
- Automatic cleanup on session end
- Minimal latency for retrieval operations
// The Python shim creates an InMemoryCcrStore for the Rust compressor
let store = headroom_core::ccr::InMemoryCcrStore::new();
let (result, stats) = self.inner.compress_with_store(&owned, bias, Some(&store));
Source: crates/headroom-py/src/lib.rs
Retrieval Phase
MCP Tool: `headroom_retrieve`
The headroom_retrieve tool is exposed via the Headroom MCP server and allows agents to decompress original content on demand.
# headroom/ccr/mcp_server.py exposes retrieval capabilities
class HeadroomMcpServer:
    def retrieve(self, cache_key: str) -> str:
        """Retrieve original content by cache key."""
Tool Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| cache_key | string | Yes | The unique identifier returned during compression |
| search_query | string | No | Optional search within retrieved content |
Retrieval Response Handling
When an agent requests content, the response handler processes the lookup and formats the result:
# headroom/ccr/response_handler.py
def handle_retrieve_request(cache_key: str) -> RetrieveResult:
    original = store.get(cache_key)
    return format_response(original)
Source: headroom/ccr/response_handler.py
Configuration
CCR behavior is controlled through the main CCRConfig:
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable/disable CCR entirely |
| store_type | string | "in_memory" | Storage backend selection |
| ttl_seconds | int | 3600 | Cache expiration for stored originals |
| max_store_size | int | 10000 | Maximum entries before eviction |
Source: docs/content/docs/ccr.mdx
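How `ttl_seconds` and `max_store_size` might bound a store is sketched below. The `TtlStore` class and its oldest-first eviction policy are assumptions; the source names the options but not the eviction strategy. The clock is injectable so that expiry can be demonstrated without sleeping.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class TtlStore:
    """Toy store with per-entry expiry and a size cap (oldest evicted first)."""

    def __init__(self, ttl_seconds: float = 3600, max_store_size: int = 10000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._max = max_store_size
        self._clock = clock
        self._entries: "OrderedDict[str, tuple[float, str]]" = OrderedDict()

    def put(self, key: str, value: str) -> None:
        self._evict_expired()
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        # Enforce the size cap by dropping the oldest entries.
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily drop expired entries
            return None
        return value

    def _evict_expired(self) -> None:
        now = self._clock()
        for key in [k for k, (exp, _) in self._entries.items() if now >= exp]:
            del self._entries[key]


now = [0.0]
ttl_store = TtlStore(ttl_seconds=10, max_store_size=2, clock=lambda: now[0])
ttl_store.put("a", "first")
ttl_store.put("b", "second")
ttl_store.put("c", "third")   # exceeds the cap, so "a" is evicted
print(ttl_store.get("a"), ttl_store.get("b"))
now[0] = 11.0                 # advance the fake clock past the TTL
print(ttl_store.get("b"))
```

A retrieval that returns nothing, whether from expiry or eviction, is the case the agent-side fallback has to handle.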
Environment Variables
| Variable | Description |
|---|---|
| HEADROOM_CCR_ENABLED | Set to 0 to disable CCR |
| HEADROOM_CCR_STORE_TYPE | Override storage backend |
| HEADROOM_CCR_TTL | Override TTL in seconds |
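As a rough illustration of how these variables could override the defaults from the configuration table, consider the sketch below. The variable names and default values come from the tables; the `CCRConfig` dataclass shape and the `load_ccr_config` parsing rules are assumptions, not Headroom's loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class CCRConfig:
    enabled: bool = True
    store_type: str = "in_memory"
    ttl_seconds: int = 3600


def load_ccr_config(env: Optional[Mapping[str, str]] = None) -> CCRConfig:
    """Build a config from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return CCRConfig(
        # Only the literal value "0" disables CCR in this sketch.
        enabled=env.get("HEADROOM_CCR_ENABLED", "1") != "0",
        store_type=env.get("HEADROOM_CCR_STORE_TYPE", "in_memory"),
        ttl_seconds=int(env.get("HEADROOM_CCR_TTL", "3600")),
    )


print(load_ccr_config({"HEADROOM_CCR_ENABLED": "0", "HEADROOM_CCR_TTL": "60"}))
```

Passing a plain mapping instead of reading `os.environ` directly keeps the loader easy to test.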
Multi-Agent Considerations
When using CCR in multi-agent workflows, content attribution becomes critical. The system supports structured XML markup for tracking content provenance:
<peer_turn from="AgentX">
<!-- Agent-generated content -->
</peer_turn>
Known Limitation: Attribution Corruption
A known issue exists where CCR proactive expansion can corrupt message attribution in multi-agent threads. When _append_context_to_latest_non_frozen_user_turn() injects expansion blocks into messages containing peer attribution markup, the injected block may interfere with the structured XML.
Issue Reference: #503 - CCR proactive expansion blocks corrupt message attribution in multi-agent threads
This affects multi-agent setups where:
- Messages contain structured XML attribution markup
- CCR proactive expansion is enabled
- Multiple agents contribute to the same thread
Transform Integration
CCR is integrated into the compression pipeline as a sidecar mechanism:
graph TD
A[Input Content] --> B[Transform Applies]
B --> C{CCR Enabled?}
C -->|Yes| D[Store Original]
C -->|No| E[Skip CCR]
D --> F[Return Compressed + Key]
E --> G[Return Compressed Only]
Supported Transforms
| Transform | CCR Support | Typical Savings |
|---|---|---|
| SmartCrusher | Full | 40-70% |
| DiffCompressor | Full | 60-90% |
| LogCompressor | Full | 70-85% |
| SearchCompressor | Full | 50-75% |
| CacheAligner | Metadata only | N/A |
Source: crates/headroom-core/src/ccr/backends/mod.rs
Observability
CCR operations emit metrics for monitoring:
| Metric | Description |
|---|---|
| ccr_entries_stored | Total originals stored |
| ccr_retrievals | Total retrieval requests |
| ccr_hit_rate | Retrieval success rate |
| ccr_store_size | Current store memory usage |
| ccr_ttl_evictions | Entries expired by TTL |
Stats Object
The compression result includes a stats dictionary with diagnostic information:
result = compressor.compress(content)
print(f"Cache key: {result.cache_key}") # For retrieval
print(f"Stats: {result.stats}") # Observability data
Source: crates/headroom-py/src/lib.rs:45-52
Best Practices
- Session-Based Usage: CCR store is designed for session-scoped operation. For long-running agents, configure appropriate TTL values to manage memory.
- Key Preservation: Cache keys must be preserved in the conversation context for retrieval to work. The LLM must pass the key back to headroom_retrieve.
- Error Handling: Implement fallback behavior when retrieval fails—either re-compress with lower settings or request original content through alternative means.
- Multi-Agent Attribution: In multi-agent setups, track content provenance explicitly to avoid the attribution corruption issue documented in #503.
Related Documentation
- Pipeline Internals - How CCR fits into the broader compression pipeline
- MCP Server Setup - Detailed MCP configuration
- Compression Hooks - Pre/post compression customization
Memory System
Related topics: MCP Integration
The Headroom Memory System provides persistent cross-session knowledge storage and retrieval for AI coding agents. It enables agents to remember important decisions, conventions, project context, architecture details, and user preferences across multiple sessions without requiring manual re-entry.
Overview
The Memory System addresses a fundamental limitation of AI coding assistants: the context window is ephemeral. When a session ends, all learned information is lost. Headroom's memory system solves this by providing:
- Persistent storage - Memories survive session boundaries
- Multi-agent awareness - Shared store with agent provenance
- Automatic retrieval - Relevant memories surfaced when needed
- Reversible compression - Full fidelity retrieval via CCR
Source: headroom/cli/wrap.py:1-50
Architecture
graph TD
subgraph "Client Layer"
CLI[headroom memory CLI]
MCP[Memory MCP Server]
Wrap[headroom wrap]
end
subgraph "Core Memory"
Bridge[Memory Bridge]
Core[Memory Core]
end
subgraph "Backends"
Local[Local SQLite]
Mem0[Mem0 Backend]
QdrantNeo4j[Qdrant + Neo4j]
end
subgraph "Integrations"
ClaudeMCP[Claude MCP]
CodexMCP[Codex MCP]
end
CLI --> Core
MCP --> Bridge
Wrap --> Bridge
Bridge --> Core
Core --> Local
Core --> Mem0
Core --> QdrantNeo4j
ClaudeMCP -.->|memory_search| MCP
ClaudeMCP -.->|memory_save| MCP
Memory Scopes
Memories are organized by scope, allowing fine-grained control over persistence and visibility:
| Scope | Description | Use Case |
|---|---|---|
| USER | User-wide memories | Preferences, coding style, org info |
| SESSION | Session-specific memories | Current task context |
| AGENT | Agent-specific memories | Agent identity, capabilities |
| TURN | Single turn memories | Ephemeral context |
Source: headroom/cli/memory.py:20-25
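The four scopes map naturally onto an enumeration, and scope filtering then becomes a simple predicate. The sketch below is illustrative only: the scope names come from the table, while the `Scope` enum, the `Memory` record, and the `list_memories` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scope(Enum):
    USER = "USER"
    SESSION = "SESSION"
    AGENT = "AGENT"
    TURN = "TURN"


@dataclass
class Memory:
    id: int
    scope: Scope
    content: str


def list_memories(memories, scope: Optional[Scope] = None, limit: Optional[int] = None):
    """Filter by scope, then truncate to `limit` entries if one is given."""
    selected = [m for m in memories if scope is None or m.scope is scope]
    return selected if limit is None else selected[:limit]


memories = [
    Memory(1, Scope.USER, "Prefers tabs over spaces"),
    Memory(2, Scope.SESSION, "Debugging the proxy timeout"),
    Memory(3, Scope.USER, "Org uses trunk-based development"),
]
print([m.id for m in list_memories(memories, scope=Scope.USER)])
```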
CLI Commands
Memory Management
headroom memory list # List all stored memories
headroom memory list --limit 10 # List the 10 most recent memories
headroom memory list --scope USER # List only USER-level memories
headroom memory list --since 7d # List memories from the last 7 days
headroom memory show <id> # Show full details of a memory
headroom memory stats # Show memory statistics
headroom memory edit <id> --content ... # Edit a memory's content
headroom memory delete <id> # Delete a memory
headroom memory prune --older-than 30d # Delete memories older than 30 days
headroom memory purge --confirm # Delete ALL memories
headroom memory export --output file.json # Export all memories to JSON
headroom memory import file.json # Import memories from JSON
Source: headroom/cli/memory.py:1-30
Memory Integration in Wrapped Agents
When running headroom wrap with the --memory flag, the system automatically:
- Registers the headroom_memory MCP server
- Injects memory usage guidance into AGENTS.md
- Enables the memory_search and memory_save tools
Source: headroom/cli/wrap.py:50-80
Memory Guidance Injection
The system injects guidance into AGENTS.md files to instruct agents on when to use memory:
<!-- headroom:memory-instructions -->
## Memory
Use the `headroom_memory` MCP server for persistent cross-session knowledge.
**Before** answering questions about prior decisions, conventions, project context,
architecture, user preferences, org info, codenames, debugging history, or anything
from past sessions — call `memory_search` first.
**After** making durable decisions, discovering conventions, or learning important
facts — call `memory_save` to persist them for future sessions.
Memory is your first source of truth for anything not visible in the current conversation.
Source: headroom/cli/wrap.py:80-100
Memory Evaluation
Headroom includes comprehensive evaluation suites for memory systems:
LoCoMo V2 Evaluation
Tests the architecture where:
- LLM decides what to save (`memory_save` tool)
- LLM decides when to search (`memory_search` tool)
- Graph relationships enable multi-hop reasoning
headroom evals memory-v2 -n 3
headroom evals memory-v2 --answer-model gpt-4o --save-model gpt-4o-mini
Parameters:
| Parameter | Description | Default |
|---|---|---|
| `--n-conversations` | Number of conversations to evaluate | All (10) |
| `--categories` | Categories 1-5 (default: 1,2,3,4) | 1,2,3,4 |
| `--include-adversarial` | Include category 5 (unanswerable) | False |
| `--f1-threshold` | F1 score threshold for 'correct' | 0.5 |
| `--answer-model` | LLM model for generating answers | None |
| `--llm-judge` | Use LLM-as-judge scoring | False |
| `--judge-model` | Model for judging | None |
| `--parallel` | Parallel evaluation workers | 1 |
Source: headroom/cli/evals.py:30-80
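The `--f1-threshold` option implies that an answer counts as correct once its F1 score against the reference reaches the threshold. The source does not say how F1 is computed; the sketch below assumes the token-overlap F1 that is common in question-answering benchmarks. The function names are hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Two empty answers match; one empty answer does not.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def is_correct(prediction: str, reference: str, threshold: float = 0.5) -> bool:
    """Apply the threshold the way --f1-threshold is described."""
    return token_f1(prediction, reference) >= threshold
```

With the default threshold of 0.5, a prediction of "paris france" against the reference "paris" scores 2/3 and would be counted as correct under this assumed metric.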
Storage Backends
Per-Project Storage
As of v0.21.34, memories use per-project storage, preventing cross-project memory leakage. Each project has isolated memory storage.
Source: headroom/cli/evals.py, Community Release v0.21.34
Local SQLite Backend
Default backend using SQLite for storage with support for:
- Scope-based filtering
- Time-based queries
- Full-text search
- Import/export
Mem0 Backend
External Mem0 integration for users with existing Mem0 deployments.
Qdrant + Neo4j Backend
Advanced backend providing:
- Vector search via Qdrant
- Graph relationships via Neo4j
- Multi-hop reasoning capabilities
Configuration:
| Option | Description | Default |
|---|---|---|
| `--memory-qdrant-url` | Full Qdrant URL | None |
| `--memory-qdrant-host` | Qdrant host | localhost |
| `--memory-qdrant-port` | Qdrant port | 6333 |
| `--memory-neo4j-uri` | Neo4j URI | None |
| `--memory-neo4j-user` | Neo4j user | None |
Source: headroom/cli/proxy.py
MCP Tools
The memory MCP server exposes the following tools:
| Tool | Description | Parameters |
|---|---|---|
| `memory_search` | Search memories by query | query, limit, scope, session_id |
| `memory_save` | Save a new memory | content, scope, agent_id, session_id |
| `memory_list` | List memories | limit, scope, since, search |
| `memory_show` | Show memory details | id |
| `memory_edit` | Edit memory content | id, content |
| `memory_delete` | Delete a memory | id |
Source: headroom/cli/wrap.py:100-150
Multi-Agent Memory
In multi-agent setups, the memory system provides:
- Shared store - All agents can access common memories
- Agent provenance - Track which agent saved each memory
- Auto-dedup - Prevent duplicate memories
- Cross-agent context - Memory context passed across agent boundaries
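The shared-store, provenance, and auto-dedup properties listed above can be sketched with a toy in-memory store. Everything here is an assumption made for illustration: the class name `SharedMemoryStore` is hypothetical, and deduplication by normalized exact match is a stand-in, since the source does not describe how Headroom detects duplicates.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SharedMemoryStore:
    """Toy shared store: one dict that every agent reads and writes."""
    _items: dict[str, dict] = field(default_factory=dict)

    def save(self, content: str, agent_id: str) -> bool:
        """Store content with provenance; return False for a duplicate."""
        normalized = " ".join(content.lower().split())
        key = hashlib.sha256(normalized.encode()).hexdigest()
        if key in self._items:
            return False  # auto-dedup: the first writer keeps provenance
        self._items[key] = {"content": content, "agent_id": agent_id}
        return True

    def search(self, query: str) -> list[dict]:
        """Naive substring search over every agent's memories."""
        q = query.lower()
        return [m for m in self._items.values() if q in m["content"].lower()]


store = SharedMemoryStore()
saved_by_a = store.save("Use strict mode", "agent-a")
saved_by_b = store.save("use  strict MODE", "agent-b")  # same fact, new casing
hits = store.search("strict")
```

Agent B can find what agent A saved, and each hit still records which agent wrote it.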
graph LR
subgraph "Agent A"
A_Save[memory_save]
A_Search[memory_search]
end
subgraph "Agent B"
B_Save[memory_save]
B_Search[memory_search]
end
subgraph "Shared Memory"
Store[(Memory Store)]
end
A_Save --> Store
B_Save --> Store
Store --> A_Search
Store --> B_Search
Known Issues
CCR Proactive Expansion in Multi-Agent Threads
Issue #503: CCR proactive expansion blocks corrupt message attribution in multi-agent threads.
TL;DR: The _append_context_to_latest_non_frozen_user_turn() function injects proactive expansion blocks into the latest user message content. In multi-agent setups, that message can contain structured XML attribution markup (<peer_turn from="AgentX">). The injected block can corrupt this attribution.
Status: Open, under investigation.
Configuration Options
Proxy Configuration
| Option | Description | Default |
|---|---|---|
| `--memory` | Enable memory integration | False |
| `--memory-storage` | Storage backend | local |
| `--memory-project-root` | Override project root | "" |
| `--no-memory-tools` | Disable memory tool injection | False |
| `--no-memory-context` | Disable memory context injection | False |
| `--memory-top-k` | Memories to inject as context | 10 |
Source: headroom/cli/proxy.py
Usage Examples
Basic Memory Workflow
# After discovering a convention
await mcp_client.call_tool("memory_save", {
"content": "Use TypeScript strict mode in all new projects",
"scope": "USER"
})
# In a new session, before answering
results = await mcp_client.call_tool("memory_search", {
"query": "coding conventions and project standards"
})
Multi-Agent Shared Context
import { createSharedContext } from "@headroom/sdk";
const ctx = createSharedContext({
agentId: "architect-agent",
projectId: "k8s-scaling"
});
// Save findings
await ctx.set("research", { provider: "aws", region: "us-east" });
// Another agent reads it
const compressed = await ctx.get("research");
Source: sdk/typescript/examples/shared-context-multi-agent.ts
Performance Considerations
- Memory ID exposure - As of v0.22.2, memory IDs are exposed in auto-tail and memory_list tool with ID-usage guidance
- Regex-based prefix extraction removed - v0.21.35 dropped regex-based pref extraction and filters system-reminder noise
- Query cap removed - v0.22.0 dropped the 500-char query cap for memory search
Further Reading
Source: https://github.com/chopratejas/headroom / Human Manual
MCP Integration
Related topics: CCR (Reversible Compression), Memory System
MCP (Model Context Protocol) integration enables Headroom to expose compression, retrieval, and memory tools to AI coding assistants like Claude Code. This integration is foundational to Headroom's CCR (Compress-Cache-Retrieve) pattern, allowing agents to work with compressed content summaries and retrieve original data on demand.
Overview
MCP integration serves three primary purposes in Headroom:
- Content Retrieval: exposes `headroom_retrieve` as an MCP tool that Claude Code calls to decompress compressed content
- Memory Persistence: provides a persistent cross-session memory MCP server (`headroom_memory`) for knowledge retention
- Subscription Access: enables CCR functionality for Claude Code subscription users without API key access
The MCP server operates as a stdio-based service, meaning it communicates via standard input/output rather than HTTP. This is distinct from an HTTP endpoint — the /mcp path is not an HTTP route on the proxy server.
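To make the stdio point concrete: an MCP client talks to a stdio server by writing newline-delimited JSON-RPC 2.0 messages to the server's standard input, with tool invocations sent as `tools/call` requests. The sketch below only builds such a message. The argument name `hash` is an assumption based on the `headroom_retrieve(hash)` call shown elsewhere in this manual, and `build_tool_call` is a hypothetical helper.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single stdio line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport frames each message as one line of JSON.
    return json.dumps(message) + "\n"


# "hash" is an assumed argument name, used here for illustration only.
line = build_tool_call(1, "headroom_retrieve", {"hash": "abc123"})
```

Nothing in this exchange involves an HTTP route, which is why a request to `/mcp` on the proxy has nothing to answer it.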
Architecture
graph TD
A[Claude Code] -->|stdio| B[Headroom MCP Server]
B -->|retrieve| C[Headroom Proxy]
C -->|compressed content| B
B -->|original content| A
D[Claude Code] -->|stdio| E[Memory MCP Server]
E -->|persist/query| F[(SQLite DB)]
G[headroom wrap] -->|configures| A
G -->|registers| B
G -->|registers| E
Installation and Setup
Automatic Setup via CLI Wrapper
The recommended approach uses the headroom wrap command, which automatically configures MCP servers:
# For Claude Code
headroom wrap claude
# For Codex
headroom wrap codex
# For Claude Code with persistent memory
headroom wrap claude --memory
# For Codex with persistent memory
headroom wrap codex --memory
The wrap command handles multiple setup steps including proxy startup, CLI context tool configuration, and MCP server registration.
Manual MCP Installation
For manual configuration, use the MCP CLI commands:
# Install MCP server for Claude Code
headroom mcp install
# Verify installation
headroom mcp status
# Uninstall MCP server
headroom mcp uninstall
Source: headroom/cli/mcp.py:60-80
Standalone MCP Server
Start the MCP server independently when the proxy runs separately:
# Start the MCP server (requires proxy running)
headroom mcp serve
# With custom proxy URL
headroom mcp serve --proxy-url http://127.0.0.1:8787
MCP Server Implementation
MCP Command Structure
The CLI provides a command group for MCP operations:
@main.group()
def mcp() -> None:
    """MCP server for Claude Code integration."""
Source: headroom/cli/mcp.py:43-60
Configuration Management
MCP configuration is stored in ~/.claude/mcp.json:
def load_mcp_config() -> dict[str, Any]:
    """Load existing MCP config or return empty structure."""
    if MCP_CONFIG_PATH.exists():
        with open(MCP_CONFIG_PATH) as f:
            return json.load(f)
    return {"mcpServers": {}}

def save_mcp_config(config: dict) -> None:
    """Save MCP config, creating directory if needed."""
    CLAUDE_CONFIG_DIR.mkdir(parents=True, exist_ok=True)
    with open(MCP_CONFIG_PATH, "w") as f:
        json.dump(config, f, indent=2)
Source: headroom/cli/mcp.py:25-40
Headroom Command Generation
The MCP server command is generated dynamically:
def get_headroom_command() -> list[str]:
    """Get the command to run headroom MCP server."""
    return ["headroom", "mcp", "serve"]
Source: headroom/cli/mcp.py:18-23
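Putting the load/save helpers and the generated command together, registration amounts to writing one entry under `mcpServers`. The following sketch works on a temporary directory instead of `~/.claude`. The entry shape (`command` plus `args`) follows the convention commonly used by MCP client configs and is an assumption here, as are the helper name `register_server` and the server key `headroom`.

```python
import json
import tempfile
from pathlib import Path


def register_server(config_path: Path, name: str, command: list[str]) -> dict:
    """Add or overwrite one server entry in an mcp.json-style file."""
    if config_path.exists():
        config = json.loads(config_path.read_text())
    else:
        config = {"mcpServers": {}}
    # Assumed entry shape: executable in "command", the rest in "args".
    config.setdefault("mcpServers", {})[name] = {
        "command": command[0],
        "args": command[1:],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return config


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".claude" / "mcp.json"
    result = register_server(path, "headroom", ["headroom", "mcp", "serve"])
    saved = json.loads(path.read_text())
```

Because the entry is keyed by name, running the registration twice leaves a single entry rather than a duplicate.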
CCR (Compress-Cache-Retrieve) Workflow
The MCP integration enables the CCR pattern for Claude Code subscription users:
sequenceDiagram
participant CC as Claude Code
participant MCP as Headroom MCP Server
participant Proxy as Headroom Proxy
CC->>Proxy: API request with ANTHROPIC_BASE_URL
Proxy->>Proxy: Compress large tool outputs
Proxy-->>CC: Compressed summary with hash markers
CC->>MCP: headroom_retrieve(hash)
MCP->>Proxy: Fetch original content
Proxy-->>MCP: Original data
MCP-->>CC: Full content restored
How CCR Works
- Compression: the proxy compresses large tool outputs (file listings, search results) and replaces them with hash markers
- Caching: original content is stored temporarily with the hash as key
- Retrieval: when Claude Code needs full details, it calls `headroom_retrieve` with the hash
- Restoration: the MCP server fetches and returns the original content
Source: headroom/cli/mcp.py:50-75
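The four steps above can be sketched as a tiny in-memory store. This is a toy model, not Headroom's implementation: the class name `CCRStore`, the `[ccr:...]` marker format, the truncation used as a "summary", and the length threshold are all assumptions made for the example.

```python
import hashlib


class CCRStore:
    """Toy Compress-Cache-Retrieve store keyed by content hash."""

    def __init__(self, threshold: int = 200) -> None:
        self.threshold = threshold
        self._cache: dict[str, str] = {}

    def compress(self, text: str) -> str:
        """Replace large text with a short preview plus a hash marker."""
        if len(text) <= self.threshold:
            return text  # small outputs pass through untouched
        digest = hashlib.sha256(text.encode()).hexdigest()[:12]
        self._cache[digest] = text  # cache the original under its hash
        return f"{text[:40]}... [ccr:{digest}]"

    def retrieve(self, digest: str) -> str:
        """Return the original content for a marker's hash."""
        return self._cache[digest]


store = CCRStore(threshold=50)
original = "line\n" * 100
summary = store.compress(original)
digest = summary.rsplit("[ccr:", 1)[1].rstrip("]")
restored = store.retrieve(digest)
```

The compression is reversible only while the cache entry exists, which matches the description of originals being stored temporarily.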
Memory MCP Server
Headroom includes a dedicated MCP server for persistent cross-session memory:
Registration
The memory MCP server is registered in Codex's config.toml:
def _inject_memory_mcp_config(db_path: str, user_id: str) -> None:
    """Register headroom memory as an MCP server in Codex's config.toml."""
    mcp_section = (
        f"\n{_MEMORY_MCP_MARKER}\n"
        f"[mcp_servers.headroom_memory]\n"
        f'command = "{python_bin}"\n'
        f'args = ["-m", "headroom.memory.mcp_server", "--db", "{db_path_toml}", "--user", "{user_id}"]\n'
        f"startup_timeout_sec = 30\n"
        f"tool_timeout_sec = 30\n"
        f"{_MEMORY_MCP_END}\n"
    )
Source: headroom/cli/wrap.py:120-145
Memory Operations
The memory MCP server provides tools for:
- `memory_search`: query persistent knowledge from past sessions
- `memory_save`: store important decisions, conventions, and context
- `memory_list`: list stored memories
Memory Usage Guidance
Memory instructions are injected into AGENTS.md:
def _inject_memory_agents_md(file_path: Path) -> bool:
    """Inject memory usage guidance into AGENTS.md."""
    memory_block = (
        f"{_MEMORY_AGENTS_MARKER}\n"
        "## Memory\n\n"
        "Use the `headroom_memory` MCP server for persistent cross-session knowledge.\n\n"
        "**Before** answering questions about prior decisions, conventions...\n"
        "**After** making durable decisions... call `memory_save` to persist them.\n"
    )
Source: headroom/cli/wrap.py:170-200
Codex Integration
For Codex, the MCP server registration differs slightly:
Config File Paths
def _codex_config_paths() -> tuple[Path, Path]:
    """Return ``(config_file, backup_file)`` paths for the Codex TOML config."""
    config_dir = Path.home() / ".codex"
    config_file = config_dir / "config.toml"
    backup_file = config_dir / f"config.toml{_CODEX_CONFIG_BACKUP_SUFFIX}"
    return config_file, backup_file
Source: headroom/cli/wrap.py:95-102
Idempotent Registration
MCP registration is idempotent — existing sections are replaced:
if _MEMORY_MCP_MARKER in content:
    start = content.index(_MEMORY_MCP_MARKER)
    end = content.index(_MEMORY_MCP_END) + len(_MEMORY_MCP_END)
    content = content[:start].rstrip("\n") + mcp_section + content[end:].lstrip("\n")
else:
    content = content.rstrip() + "\n" + mcp_section
Source: headroom/cli/wrap.py:140-155
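A self-contained variation of the same marker-replacement idea is sketched below. The marker strings are hypothetical stand-ins for `_MEMORY_MCP_MARKER` and `_MEMORY_MCP_END`, whose real values are not shown here, and the append branch is simplified so that applying the function twice yields byte-identical output.

```python
# Hypothetical marker values; the real constants are not shown in the source.
START = "# --- headroom memory start ---"
END = "# --- headroom memory end ---"


def upsert_section(content: str, body: str) -> str:
    """Insert the marked section, or replace it if it already exists."""
    section = f"\n{START}\n{body}\n{END}\n"
    if START in content:
        start = content.index(START)
        end = content.index(END) + len(END)
        return content[:start].rstrip("\n") + section + content[end:].lstrip("\n")
    return content.rstrip("\n") + section


base = 'model = "x"\n'
once = upsert_section(base, "[mcp_servers.headroom_memory]")
twice = upsert_section(once, "[mcp_servers.headroom_memory]")
updated = upsert_section(twice, "[mcp_servers.other]")
```

The start and end markers are what make the operation safe to repeat: the function never has to guess where its own previous output begins or ends.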
Configuration Backup
Pre-wrap state is snapshotted to enable clean unwrapping:
def _snapshot_codex_config_if_unwrapped(config_file: Path, backup_file: Path) -> None:
    """Snapshot ~/.codex/config.toml BEFORE any wrap-time mutation."""
Source: headroom/cli/wrap.py:104-120
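Only the signature and docstring of the snapshot function are shown above, so the body below is a guess at the general technique, not the actual implementation. It assumes that an existing backup file means the config is already wrapped and must not be overwritten. The helper name `snapshot_if_missing` and the `.bak` suffix are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_if_missing(config_file: Path, backup_file: Path) -> bool:
    """Copy config_file to backup_file once; never overwrite a backup."""
    if backup_file.exists() or not config_file.exists():
        return False
    shutil.copy2(config_file, backup_file)
    return True


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "config.toml"
    backup = Path(tmp) / "config.toml.bak"  # hypothetical suffix
    config.write_text('model = "original"\n')

    first = snapshot_if_missing(config, backup)   # takes the snapshot
    config.write_text('model = "wrapped"\n')      # simulate a wrap-time edit
    second = snapshot_if_missing(config, backup)  # backup exists, so no-op
    restored = backup.read_text()
```

Refusing to overwrite an existing backup is what keeps a second `wrap` from snapshotting an already-mutated file and losing the true original.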
TypeScript SDK Integration
The TypeScript SDK supports shared context for multi-agent setups:
import { SharedContext } from "@headroomhq/sdk";
// Create shared context
const ctx = new SharedContext();
// Store compressed content
await ctx.put("k8s-scaling-research", compressedContent);
// Retrieve later
const compressed = await ctx.get("k8s-scaling-research");
// Access stats
const stats = ctx.stats();
console.log(`Total saved: ${stats.totalTokensSaved}`);
Source: sdk/typescript/examples/shared-context-multi-agent.ts
Configuration Options
CLI Options
| Option | Description | Default |
|---|---|---|
| `--no-mcp` | Skip MCP retrieve tool registration | False |
| `--no-serena` | Skip Serena MCP registration | False |
| `--memory` | Enable persistent cross-session memory | False |
| `--code-graph` | Enable code graph indexing via codebase-memory-mcp | False |
Source: headroom/cli/wrap.py:40-75
Environment Variables
| Variable | Description |
|---|---|
| `ANTHROPIC_BASE_URL` | Route Claude Code traffic through proxy (set to http://127.0.0.1:8787) |
Known Limitations
Multi-Agent Message Attribution
Issue #503: CCR proactive expansion blocks may corrupt message attribution in multi-agent threads. The `_append_context_to_latest_non_frozen_user_turn()` function injects proactive expansion blocks into the latest user message content, which can contain structured XML attribution markup (`<peer_turn from="AgentX">`). The injected block can corrupt this structure.
HTTP Endpoint Misconception
Issue #460: The MCP server operates via stdio, not HTTP. The proxy does not expose an HTTP endpoint at `/mcp`. Users should run `headroom mcp serve` as a standalone process rather than expect `/mcp` on the proxy server.
Examples
Running the MCP Demo
# Configure API key
export OPENAI_API_KEY='your-key'
# Run the MCP demo
PYTHONPATH=. python -m examples.mcp_demo.run_agent_eval
Source: examples/README.md
AWS Bedrock with Strands
# Configure AWS credentials
export AWS_ACCESS_KEY_ID='your-access-key'
export AWS_SECRET_ACCESS_KEY='your-secret-key'
export AWS_DEFAULT_REGION='us-west-2'
# Run the demo
python examples/strands_bedrock_demo.py
Source: examples/README.md
Related Documentation
- Claude Code Integration — Complete Claude Code setup
- Codex Integration — Complete Codex setup
- Memory System — Cross-session memory architecture
- CCR Pattern — Compress-Cache-Retrieve details
Proxy Deployment
Related topics: Getting Started, CLI Wrappers
Overview
The Headroom proxy is a central component that intercepts, optimizes, and routes LLM API traffic through Headroom's context compression pipeline. It serves as the contextual optimization layer between AI coding tools (Claude Code, Codex, Goose, Continue, etc.) and upstream LLM providers like Anthropic and OpenAI.
The proxy enables:
- Token savings through compression transforms (CacheAligner, SmartCrusher, IntelligentContext)
- Reversible compression via CCR (Compress-Cache-Retrieve): originals remain retrievable on demand
- Shared memory across multi-agent workflows
- Semantic caching for repeated query patterns
- Cross-agent context passing via SharedContext
Source: README.md
Architecture
graph TD
subgraph "AI Coding Tools"
Claude[Claude Code]
Codex[OpenAI Codex]
Goose[Goose]
Continue[Continue Dev]
OpenHands[OpenHands]
Custom[Custom SDK / App]
end
subgraph "Headroom Proxy"
Intercept[Request Interception]
Pipeline[Compression Pipeline]
Memory[Cross-Agent Memory]
CCR[CCR Retrieval]
Cache[Semantic Cache]
end
subgraph "LLM Providers"
Anthropic[Anthropic /v1/messages]
OpenAI[OpenAI /v1/chat/completions]
Vertex[Vertex AI]
Bedrock[AWS Bedrock]
end
Claude --> Intercept
Codex --> Intercept
Goose --> Intercept
Continue --> Intercept
OpenHands --> Intercept
Custom --> Intercept
Intercept --> Pipeline
Pipeline --> Memory
Pipeline --> CCR
Pipeline --> Cache
Pipeline --> Anthropic
Pipeline --> OpenAI
Request Lifecycle
The proxy exposes one stable request lifecycle across all integration paths:
Setup → Pre-Start → Post-Start → Input Received → Input Cached → Input Routed → Input Compressed → Input Remembered → Pre-Send → Post-Send → Response Received
Source: README.md
Pipeline Transforms
| Transform | Purpose |
|---|---|
| CacheAligner | Stabilizes prefixes so KV caches hit effectively |
| ContentRouter | Routes content to appropriate compression strategies |
| SmartCrusher | Universal JSON compression for arrays, nested objects, and mixed types (40–70% savings) |
| CodeCompressor | Specialized code content handling |
| Kompress-base | Trained ML text compression |
| IntelligentContext | Score-based context fitting with learned importance |
| RollingWindow | Sliding conversation window management |
Source: README.md
CLI Commands
`headroom proxy`
Starts the Headroom proxy server. This is the primary command for deploying the proxy as a standalone service.
headroom proxy [OPTIONS]
#### Core Options
| Option | Default | Description |
|---|---|---|
| `--host` | 127.0.0.1 | Host to bind to |
| `--port`, `-p` | 8787 | Proxy port |
| `--mode` | optimize | Proxy mode: audit, optimize, simulate |
| `--backend` | anthropic | API backend: anthropic, anyllm, litellm-vertex |
| `--anyllm-provider` | None | Provider for any-llm backend |
| `--region` | None | Cloud region for Bedrock/Vertex |
| `--exclude-tools` | - | Comma-separated tools to exclude from processing |
| `--no-optimize` | - | Disable optimization (passthrough mode) |
| `--no-cache` | - | Disable semantic caching |
| `--no-rate-limit` | - | Disable rate limiting |
| `--no-subscription-tracking` | - | Disable Anthropic subscription usage poller |
| `--intercept-tool-results` | - | Enable tool_result interceptors (opt-in) |
| `--memory` | - | Enable persistent cross-session memory |
| `--learn` | - | Enable live traffic learning |
| `--verbose`, `-v` | - | Verbose output |
Source: headroom/cli/proxy.py
#### Environment Variables
| Variable | Description |
|---|---|
| `HEADROOM_HOST` | Proxy host binding |
| `HEADROOM_PORT` | Proxy port |
| `HEADROOM_MODE` | Proxy mode |
| `HEADROOM_BACKEND` | API backend selection |
| `HEADROOM_ANYLLM_PROVIDER` | Provider for any-llm backend |
| `HEADROOM_REGION` | Cloud region for Bedrock/Vertex |
| `HEADROOM_EXCLUDE_TOOLS` | Tools to exclude |
| `HEADROOM_NO_SUBSCRIPTION_TRACKING` | Disable subscription poller |
| `HEADROOM_PROXY_EXTENSIONS` | Enabled proxy extensions |
| `HEADROOM_CONTEXT_TOOL` | Context tool selection: rtk or lean-ctx |
Source: headroom/cli/proxy.py
`headroom perf`
Analyzes proxy performance from logs.
headroom perf [OPTIONS]
| Option | Default | Description |
|---|---|---|
| `--hours` | 168 (7 days) | Analyze logs from last N hours |
| `--raw` | - | Show raw PERF records instead of formatted report |
Source: headroom/cli/perf.py
`headroom wrap <tool>`
Launches AI coding tools with the proxy automatically configured. Available wrappers:
| Tool | Command |
|---|---|
| Claude Code | headroom wrap claude |
| GitHub Copilot CLI | headroom wrap copilot |
| OpenAI Codex | headroom wrap codex |
| Aider | headroom wrap aider |
| Cursor | headroom wrap cursor |
| Goose | headroom wrap goose |
| OpenHands | headroom wrap openhands |
| Continue | headroom wrap continue |
| OpenClaw | headroom wrap openclaw |
Each wrap command shares common options:
| Option | Description |
|---|---|
| `--port`, `-p` | Proxy port (default: 8787) |
| `--no-context-tool` / `--no-rtk` | Skip CLI context-tool setup |
| `--no-proxy` | Skip proxy startup (use existing) |
| `--no-mcp` | Skip headroom MCP server registration |
| `--no-serena` | Skip Serena MCP server registration |
| `--code-graph` | Enable code graph indexing via codebase-memory-mcp |
| `--memory` | Enable persistent cross-session memory |
| `--learn` | Enable live traffic learning |
| `--backend` | API backend selection |
| `--anyllm-provider` | Provider for any-llm backend |
| `--region` | Cloud region for Bedrock/Vertex |
| `--verbose`, `-v` | Verbose output |
| `--prepare-only` | Prepare environment without launching tool |
Source: headroom/cli/wrap.py
Configuration
Proxy Modes
| Mode | Description |
|---|---|
| `audit` | Log requests/responses without modification |
| `optimize` | Full compression and optimization enabled |
| `simulate` | Preview compression effects without API calls |
TypeScript Configuration Types
export type HeadroomMode = "audit" | "optimize" | "simulate";
export interface CompressionProfile {
cacheAligner?: CacheAlignerConfig;
rollingWindow?: RollingWindowConfig;
scoringWeights?: ScoringWeights;
intelligentContext?: IntelligentContextConfig;
smartCrusher?: SmartCrusherConfig;
cacheOptimizer?: CacheOptimizerConfig;
ccr?: CCRConfig;
prefixFreeze?: PrefixFreezeConfig;
}
Source: sdk/typescript/src/types/config.ts
HeadroomConfig Interface
export interface HeadroomConfig {
mode?: HeadroomMode;
optimize?: boolean;
cacheEnabled?: boolean;
rateLimitEnabled?: boolean;
profile?: CompressionProfile;
toolCrusher?: ToolCrusherConfig;
memory?: MemoryConfig;
extensions?: string[];
}
Source: sdk/typescript/src/types/config.ts
Integration Patterns
SDK Integration
#### Python
from headroom import compress
result = compress(messages, model="claude-sonnet-4-20250514")
#### TypeScript
import { compress } from 'headroom-ai';
const result = await compress(messages, { model: 'claude-sonnet-4-20250514' });
Source: README.md
SDK Wrapper Integration
from anthropic import Anthropic
from openai import OpenAI
from headroom import withHeadroom

# Wrap the Anthropic SDK client
anthropic_client = withHeadroom(Anthropic())
# Wrap the OpenAI SDK client
openai_client = withHeadroom(OpenAI())
Source: README.md
Vercel AI SDK Integration
import { wrapLanguageModel } from 'ai';
import { headroomMiddleware } from 'headroom-ai';
const model = wrapLanguageModel({
model: yourModel,
middleware: headroomMiddleware(),
});
Source: README.md
LiteLLM Integration
import litellm
from headroom.integrations.litellm import HeadroomCallback
litellm.callbacks = [HeadroomCallback()]
LangChain Integration
LangChain supports callback-based integration for Headroom compression.
Supported API Routes
The proxy routes traffic to different upstream providers:
| Route | Upstream Target |
|---|---|
| `/v1/messages` | Anthropic API |
| `/v1/chat/completions` | OpenAI API |
| `/v1/responses` | OpenAI API (HTTP + WebSocket) |
| `/v1internal:streamGenerateContent` | CloudCode API |
Source: headroom/cli/proxy.py
Installation
Python Package
# Full installation
pip install "headroom-ai[all]"
# Granular extras
pip install "headroom-ai[proxy]" # Proxy only
pip install "headroom-ai[mcp]" # MCP support
pip install "headroom-ai[ml]" # Kompress-base ML
pip install "headroom-ai[agno]" # Agno framework
pip install "headroom-ai[langchain]" # LangChain integration
pip install "headroom-ai[evals]" # Evaluation tools
Requires Python 3.10+.
Docker
docker pull ghcr.io/chopratejas/headroom:latest
npm / TypeScript
npm install headroom-ai
Source: README.md
Known Limitations and Issues
MCP Endpoint Unavailability
The proxy does not expose an HTTP MCP endpoint at /mcp. The MCP server functionality requires stdio-based communication, not HTTP routing. Users should use headroom mcp install for MCP integration rather than expecting HTTP endpoint access through the proxy.
Source: GitHub Issue #460
CCR in Multi-Agent Threads
The _append_context_to_latest_non_frozen_user_turn() function injects proactive expansion blocks into the latest user message content. In multi-agent setups where messages contain structured XML attribution markup (<peer_turn from="AgentX">), injected blocks may corrupt message attribution.
Source: GitHub Issue #503
Provider-Agnostic Limitations
According to Issue #510, the proxy intercepts traffic at the Anthropic API level (/v1/messages), and users on AWS Bedrock, OpenAI, or Google Vertex cannot use it because their LLM traffic goes through provider-specific SDKs with different authentication mechanisms (SigV4 for Bedrock, API keys for OpenAI).
Source: GitHub Issue #510
Proxy Extensions
Proxy extensions provide integration points for ASGI middleware, custom routes, and startup policy:
headroom proxy --proxy-extension my-extension --proxy-extension another-extension
Use --proxy-extension '*' to enable all discovered extensions.
Source: headroom/cli/proxy.py
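The repeated `--proxy-extension` flag and the `'*'` wildcard suggest a selection step along these lines. This is a sketch under assumptions: the `select_extensions` helper is hypothetical, and so are the way extensions are discovered and the choice to raise an error for unknown names.

```python
def select_extensions(requested: list[str], discovered: list[str]) -> list[str]:
    """Resolve --proxy-extension values against discovered extensions."""
    if "*" in requested:
        return sorted(discovered)  # wildcard enables everything found
    unknown = [name for name in requested if name not in discovered]
    if unknown:
        raise ValueError(f"unknown proxy extensions: {unknown}")
    # Drop repeats while keeping the order given on the command line.
    return list(dict.fromkeys(requested))


discovered = ["my-extension", "another-extension", "metrics"]
everything = select_extensions(["*"], discovered)
chosen = select_extensions(["my-extension", "my-extension"], discovered)
```

Quoting the wildcard on the command line, as in `'*'`, keeps the shell from expanding it into file names before Headroom sees it.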
Memory and Learning
Persistent Memory
Enable cross-session memory with the --memory flag:
headroom proxy --memory
Memory storage is per-project to prevent cross-project memory bleeding (fixed in v0.21.34).
Live Traffic Learning
Enable pattern learning from agent failures:
headroom proxy --learn
headroom wrap claude --learn
Patterns are saved to AGENTS.md and used to improve future compression decisions.
Exports Reference
The TypeScript SDK exports the following proxy-related types and functions:
export type {
HeadroomMode,
RelevanceTier,
ContentType,
BlockKind,
CompressionProfile,
HeadroomConfig,
WasteSignals,
CachePrefixMetrics,
TransformDiff,
RequestMetrics,
ProxyStats,
} from "./types/config.js";
export type {
MetricsSummary,
HealthStatus,
ProxyStats,
MemoryUsage,
} from "./types/models.js";
Source: sdk/typescript/src/index.ts
CLI Wrappers
Related topics: Proxy Deployment
Overview
CLI Wrappers (headroom wrap) are the primary entry point for integrating Headroom's context compression with standalone AI coding assistants. They automate the setup of the Headroom proxy, MCP servers, CLI context tools (RTK or lean-ctx), and memory integration—eliminating manual configuration for supported tools.
The wrapper system acts as a launch orchestrator that:
- Starts the Headroom proxy server on a configurable port
- Configures the target CLI tool to route API calls through the proxy
- Registers MCP servers for compression marker retrieval
- Injects context tool instructions into the CLI's configuration files
- Optionally enables persistent cross-session memory
Source: headroom/cli/wrap.py:1-100
Supported CLI Tools
| Tool | Command | Supported Options |
|---|---|---|
| Claude Code | headroom wrap claude | --memory, --resume, --model, --code-graph, --no-context-tool, --no-mcp, --no-serena |
| OpenAI Codex | headroom wrap codex | --port, --backend, --anyllm-provider, --no-context-tool, --no-mcp, --no-serena, --no-proxy |
| Continue | headroom wrap continue | --config, --memory, --no-rtk, --no-proxy, --learn |
| Goose | headroom wrap goose | Standard wrap options |
| OpenHands | headroom wrap openhands | Standard wrap options |
| Cursor | headroom wrap cursor | Standard wrap options |
Source: headroom/cli/wrap.py:150-300
Architecture
graph TD
A["headroom wrap <tool>"] --> B[Parse CLI Arguments]
B --> C{prepare_only flag?}
C -->|Yes| D[Setup Context Tool Only]
C -->|No| E[Snapshot Pre-Wrap Config]
E --> F[Setup Context Tool]
F --> G[Register MCP Servers]
G --> H[Start Headroom Proxy]
H --> I[Inject Config Into CLI]
I --> J[Launch Target CLI Tool]
J --> K[Monitor & Forward Traffic]
L[Proxy Server] <--> M[Compression Engine]
M --> N[CacheAligner]
M --> O[SmartCrusher]
M --> P[CCR Markers]
K --> L
P --> Q[MCP Retrieve Tool]
Q --> R[LLM Retrieval on Demand]
Component Responsibilities
| Component | File | Role |
|---|---|---|
| Wrap Command Dispatcher | headroom/cli/wrap.py | Parses arguments, routes to provider runtime |
| Claude Runtime | headroom/providers/claude/runtime.py | Claude Code specific setup and lifecycle |
| Codex Runtime | headroom/providers/codex/runtime.py | Codex specific setup |
| MCP Registry | headroom/mcp_registry/ | MCP server registration for all tools |
| Proxy Manager | plugins/openclaw/src/index.ts | Cross-platform proxy command resolution |
Source: headroom/cli/wrap.py:200-280
Common Command Options
Proxy Configuration
| Option | Environment Variable | Description |
|---|---|---|
| `--port <n>` | HEADROOM_PORT | Proxy listen port (default: 8787) |
| `--backend <backend>` | HEADROOM_BACKEND | API backend: anthropic, anyllm, litellm-vertex |
| `--anyllm-provider <provider>` | HEADROOM_ANYLLM_PROVIDER | Provider for any-llm: openai, mistral, groq |
| `--region <region>` | HEADROOM_REGION | Cloud region for Bedrock/Vertex |
| `--no-proxy` | - | Use existing proxy instead of starting new one |
Source: headroom/cli/wrap.py:220-260
Context Tool Options
| Option | Description |
|---|---|
| `--no-context-tool` / `--no-rtk` | Skip CLI context-tool setup (RTK or lean-ctx) |
| `--learn` | Enable live traffic learning, patterns saved to AGENTS.md |
MCP Integration Options
| Option | Description |
|---|---|
| `--no-mcp` | Skip headroom MCP server registration |
| `--no-serena` | Skip Serena MCP server registration |
Memory Options
| Option | Description |
|---|---|
| `--memory` | Enable persistent cross-session memory |
| `--resume <id>` | Resume a specific memory session (Claude-specific) |
Source: headroom/cli/wrap.py:260-320
Claude Code Wrapper
The headroom wrap claude command provides deep integration with Anthropic's Claude Code CLI.
# Basic usage
headroom wrap claude
# With persistent memory
headroom wrap claude --memory
# Resume a session
headroom wrap claude --resume <session-id>
# Pass arguments to Claude
headroom wrap claude -- "fix the bug"
# With code graph intelligence
headroom wrap claude --code-graph
# Skip context tool setup
headroom wrap claude --no-context-tool
Claude-Specific Setup Flow
sequenceDiagram
participant User
participant CLI as headroom wrap claude
participant RTK as RTK/lean-ctx
participant Config as Claude Config
participant Proxy as Headroom Proxy
participant MCP as MCP Server
User->>CLI: headroom wrap claude --memory
CLI->>Config: Snapshot pre-wrap state
CLI->>RTK: Setup context tool
RTK->>Config: Inject instructions into CLAUDE.md
CLI->>MCP: Register headroom MCP server
CLI->>MCP: Register Serena MCP server
CLI->>Proxy: Start proxy on port 8787
CLI->>Config: Set ANTHROPIC_BASE_URL to proxy
CLI->>User: Launch Claude Code
Source: headroom/providers/claude/runtime.py:1-150
Codex Wrapper
The headroom wrap codex command integrates with the OpenAI Codex CLI.
# Basic usage
headroom wrap codex
# Custom proxy port
headroom wrap codex --port 9999
# Pass prompt to codex
headroom wrap codex -- "fix the bug"
# With specific backend
headroom wrap codex --backend anyllm --anyllm-provider groq
# Skip all tool registration
headroom wrap codex --no-context-tool --no-mcp --no-serena
Codex Configuration Handling
The wrapper snapshots ~/.codex/config.toml before any modifications, ensuring headroom unwrap codex can restore the original state byte-for-byte.
# Snapshot happens BEFORE MCP install
_codex_config_file, _codex_backup_file = _codex_config_paths()
_snapshot_codex_config_if_unwrapped(_codex_config_file, _codex_backup_file)
Source: headroom/cli/wrap.py:60-100
Continue IDE Wrapper
The headroom wrap continue command configures the Continue VS Code/JetBrains extension.
# Basic usage
headroom wrap continue
# With custom config path
headroom wrap continue --config .continue/config.json
# Enable learning
headroom wrap continue --learn
Continue Configuration Injection
The wrapper injects RTK guidance into both top-level and per-model systemMessage fields:
# Non-string systemMessage values are NEVER overwritten
# Only string values get the RTK marker injected
if isinstance(existing_value, str):
    ...  # Append RTK instructions (body elided in this excerpt)
Source: headroom/cli/wrap.py:400-500
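The rule described above (touch string `systemMessage` values only, at the top level and per model) can be sketched as follows. The marker text, the `inject_guidance` helper, the use of a `models` list, and the choice to leave missing fields alone are assumptions made for illustration and are not taken from the wrapper's source.

```python
# Hypothetical marker; the real marker string is not shown in the source.
RTK_MARKER = "<!-- headroom:rtk-instructions -->"


def inject_guidance(config: dict, guidance: str) -> dict:
    """Append guidance to string systemMessage fields, in place."""

    def patch(node: dict) -> None:
        value = node.get("systemMessage")
        # Skip non-strings, and skip strings that were already patched.
        if isinstance(value, str) and RTK_MARKER not in value:
            node["systemMessage"] = f"{value}\n\n{RTK_MARKER}\n{guidance}"

    patch(config)  # top-level field
    for model in config.get("models", []):  # assumed per-model list
        if isinstance(model, dict):
            patch(model)
    return config


config = {
    "systemMessage": "Be terse.",
    "models": [
        {"title": "a", "systemMessage": "Hi"},
        {"title": "b", "systemMessage": {"structured": True}},
    ],
}
inject_guidance(config, "Prefer rtk commands for shell output.")
inject_guidance(config, "Prefer rtk commands for shell output.")  # no-op
```

Checking for the marker before appending keeps repeated wraps from stacking the same guidance, and the structured value on model "b" is left exactly as it was.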
Context Tool Integration
RTK (Default)
The default context tool uses RTK for shell output rewriting.
| Command Category | Commands | Typical Savings |
|---|---|---|
| Git | rtk git diff, rtk git log | 40-60% |
| Files & Search | rtk ls, rtk read, rtk grep | 60-75% |
| Testing | rtk pytest, rtk cargo test | 90-99% |
| Build & Lint | rtk tsc, rtk lint, rtk ruff check | 80-90% |
| Infrastructure | rtk docker ps, rtk kubectl get | 85% |
lean-ctx Alternative
Set HEADROOM_CONTEXT_TOOL=lean-ctx before running wrap commands to use lean-ctx instead of RTK.
Source: headroom/cli/wrap.py:500-600
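A minimal sketch of how an environment-variable switch like this could be resolved. Only the variable name HEADROOM_CONTEXT_TOOL and the two tool names come from the text above; the `select_context_tool` function, the case-insensitive matching, and the error on unknown values are assumptions for illustration.

```python
import os
from typing import Mapping, Optional

SUPPORTED_TOOLS = ("rtk", "lean-ctx")


def select_context_tool(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the context tool, falling back to RTK when the variable is unset."""
    env = os.environ if env is None else env
    raw = env.get("HEADROOM_CONTEXT_TOOL", "").strip().lower()
    if not raw:
        return "rtk"  # RTK is the documented default
    if raw not in SUPPORTED_TOOLS:
        # Assumption: fail loudly instead of silently using the default.
        raise ValueError(
            f"unsupported HEADROOM_CONTEXT_TOOL {raw!r}; "
            f"expected one of {SUPPORTED_TOOLS}"
        )
    return raw
```

Passing the environment as a parameter keeps the function easy to test, for example `select_context_tool({"HEADROOM_CONTEXT_TOOL": "lean-ctx"})`.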
MCP Server Registration
CLI wrappers automatically register MCP servers that enable on-demand decompression of CCR markers.
graph LR
A[Compressed Content<br/>with CCR Markers] --> B[headroom_retrieve MCP Tool]
B --> C[Headroom Proxy]
C --> D[Decompressed Original]
D --> E[LLM Processing]
Supported MCP Tools
| Tool | Purpose | Registration |
|---|---|---|
| headroom | Primary compression retrieval | Auto-registered in CLI config |
| serena | Additional context handling | Auto-registered unless --no-serena |
Source: headroom/cli/wrap.py:100-150
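To make the retrieval flow in the diagram concrete, here is a toy model of reversible compression with markers. It is not Headroom's CCR implementation: the `[[ccr:...]]` marker syntax, the `RetrievalStore` class, and the truncate-and-store strategy are all hypothetical and only illustrate the idea that a marker left in the compressed text can be exchanged for the original on demand.

```python
import hashlib
import re
from typing import Dict

# Hypothetical marker syntax; the real CCR marker format may differ.
MARKER_RE = re.compile(r"\[\[ccr:([0-9a-f]{12})\]\]")


class RetrievalStore:
    """Keeps originals so that compressed text can be expanded later."""

    def __init__(self) -> None:
        self._originals: Dict[str, str] = {}

    def compress(self, text: str, keep: int = 40) -> str:
        """Replace long text with a short preview plus a retrieval marker."""
        if len(text) <= keep:
            return text  # too short to be worth compressing
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._originals[key] = text
        return f"{text[:keep]}... [[ccr:{key}]]"

    def retrieve(self, key: str) -> str:
        """Return the stored original for a marker key."""
        return self._originals[key]

    def expand(self, compressed: str) -> str:
        """Swap compressed text for its original when it carries a marker."""
        match = MARKER_RE.search(compressed)
        if match is None:
            return compressed
        return self.retrieve(match.group(1))
```

In this toy version `retrieve` plays the role that the diagram assigns to the headroom_retrieve tool: the model sees only the preview and the marker, and asks for the full content when it needs it.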
Memory Integration
When --memory is enabled, the wrapper:
- Syncs Headroom's memory database with the CLI's conversation files
- Enables cross-session context persistence
- Registers memory-specific MCP tools
# Memory is automatically synced before proxy startup
if memory:
    _memory_sync(proxy_holder, port)
Source: headroom/providers/claude/runtime.py:200-250
Cleanup and Unwrap
CLI wrappers handle graceful cleanup on SIGINT/SIGTERM:
- Restore original CLI configuration files
- Stop the proxy server
- Remove MCP server registrations
cleanup = _make_cleanup(proxy_holder, port)
signal.signal(signal.SIGINT, _ignore_child_sigint)
signal.signal(signal.SIGTERM, cleanup)
Source: headroom/cli/wrap.py:300-350
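The cleanup sequence listed above can be sketched with the standard `signal` module. The names `make_cleanup`, `install_handlers`, and the injectable `exit_fn` are hypothetical, and the step callables stand in for the real restore, stop, and unregister logic. Mirroring the excerpt, SIGINT is routed to a no-op handler; the idea that Ctrl-C is left for the child CLI to handle is an assumption.

```python
import signal
import sys
from typing import Callable, List, Optional


def make_cleanup(
    steps: List[Callable[[], None]],
    exit_fn: Callable[[int], None] = sys.exit,
) -> Callable[..., None]:
    """Build a handler that runs every cleanup step at most once."""
    done = False

    def cleanup(signum: Optional[int] = None, frame: object = None) -> None:
        nonlocal done
        if done:
            return  # a second signal must not repeat the cleanup
        done = True
        for step in steps:
            try:
                step()
            except Exception as exc:
                # Keep going so one failure does not block later steps.
                print(f"cleanup step failed: {exc}", file=sys.stderr)
        if signum is not None:
            exit_fn(128 + signum)  # conventional exit code for a signal

    return cleanup


def install_handlers(cleanup: Callable[..., None]) -> None:
    """Register the handlers; call this from the main thread only."""
    signal.signal(signal.SIGINT, lambda signum, frame: None)
    signal.signal(signal.SIGTERM, cleanup)
```

A caller would build the handler with steps in the order listed above, for example `make_cleanup([restore_config, stop_proxy, unregister_mcp])` with hypothetical callables, then pass it to `install_handlers`. The `done` flag matters because a process can receive the same signal twice while cleanup is still running.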
Known Issues and Limitations
MCP Endpoint Unavailability
The MCP docs currently imply that headroom proxy can be used as an HTTP MCP endpoint at /mcp, but the installed package returns 404 for this endpoint while the stdio MCP server works correctly.
See: Issue #460 - docs: clarify MCP setup when proxy /mcp is unavailable
CCR in Multi-Agent Threads
When using CCR (Context Compression Retrieval) with multi-agent setups, proactive expansion blocks can corrupt message attribution when injected into messages containing XML markup like <peer_turn from="AgentX">.
See: Issue #503 - CCR proactive expansion blocks corrupt message attribution in multi-agent threads
Related Community Features
Feature Requests
- #74: headroom wrap opencode — CLI wrapper for OpenCode
- #76: OpenCode plugin — headroom-opencode npm package
- #510: provider-agnostic proxy mode (Bedrock, OpenAI, Vertex)
See Also
- Headroom Proxy - The compression proxy architecture
- CCR (Context Compression Retrieval) - Reversible compression system
- MCP Integration - Model Context Protocol setup
- Memory System - Cross-session memory management
Source: https://github.com/chopratejas/headroom / Human Manual
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 15 structured pitfall item(s), including 2 high/blocking item(s). Top priority: Configuration risk - Configuration risk requires verification.
1. Configuration risk: Configuration risk requires verification
- Severity: high
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_04817419db9f40abb9c953ce30494c44 | https://github.com/chopratejas/headroom/issues/517
2. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_7be4ca48f77a496cadb0a00d943bb95a | https://github.com/chopratejas/headroom/issues/488
3. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_46c97725a0304b659cfaa50b79312fcd | https://github.com/chopratejas/headroom/issues/525
4. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_109db57bc201482abc7bf318a0ee4792 | https://github.com/chopratejas/headroom/issues/460
5. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.host_targets | github_repo:1129940957 | https://github.com/chopratejas/headroom
6. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | github_repo:1129940957 | https://github.com/chopratejas/headroom
7. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:1129940957 | https://github.com/chopratejas/headroom
8. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | github_repo:1129940957 | https://github.com/chopratejas/headroom
9. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | github_repo:1129940957 | https://github.com/chopratejas/headroom
10. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_a0e9cf430514488eb093dc09b617e6ca | https://github.com/chopratejas/headroom/issues/520
11. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_9ccb556fe9cd431cba78c2ee3ebb27ea | https://github.com/chopratejas/headroom/issues/510
12. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_5ab74a3c022a4924998aaa72cc334c04 | https://github.com/chopratejas/headroom/issues/503
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using headroom with real data or production workflows.
- [[FEATURE] Support Copilot CLI subscription mode (no BYOK/API key)](https://github.com/chopratejas/headroom/issues/488) - github / github_issue
- [[FEATURE] Hermes agent support](https://github.com/chopratejas/headroom/issues/526) - github / github_issue
- [[BUG] Installation fails on macOS x86_64 (Intel) — ort-sys has no preb](https://github.com/chopratejas/headroom/issues/525) - github / github_issue
- Container image logs spurious "PyTorch was not found" warning on every s - github / github_issue
- [[BUG] Docs website gives 404 for all non-index pages](https://github.com/chopratejas/headroom/issues/517) - github / github_issue
- [[BUG] headroom init codex creates invalid config.toml by appending keys](https://github.com/chopratejas/headroom/issues/260) - github / github_issue
- Feature: provider-agnostic proxy mode (Bedrock, OpenAI, Vertex) - github / github_issue
- [[BUG] CCR proactive expansion blocks corrupt message attribution in mult](https://github.com/chopratejas/headroom/issues/503) - github / github_issue
- docs: clarify MCP setup when proxy /mcp is unavailable - github / github_issue
- Release v0.22.4 - github / github_release
- Release v0.22.2 - github / github_release
- Release v0.22.1 - github / github_release
Source: Project Pack community evidence and pitfall evidence