# Velune-CLI Project Manual

Repository: https://github.com/Surya-Hariharan/Velune-CLI

Generated at: 2026-06-27 15:29:44 UTC

## Table of Contents

- [Repository Overview & System Architecture](#page-1)
- [Council Orchestration, Memory Tiers & Retrieval](#page-2)
- [Providers, MCP & Plugin Extensibility](#page-3)
- [CLI Reference, Session Modes & Operations](#page-4)

<a id='page-1'></a>

## Repository Overview & System Architecture

### Related Pages

Related topics: [Council Orchestration, Memory Tiers & Retrieval](#page-2), [Providers, MCP & Plugin Extensibility](#page-3), [CLI Reference, Session Modes & Operations](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- Source: [README.md](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/README.md)
- Source: [pyproject.toml](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/pyproject.toml)
- Source: [velune/__init__.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/__init__.py)
- Source: [velune/__main__.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/__main__.py)
- Source: [velune/main.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/main.py)
- Source: [velune/kernel/entrypoint.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/kernel/entrypoint.py)
</details>

---

<a id='page-2'></a>

## Council Orchestration, Memory Tiers & Retrieval

### Related Pages

Related topics: [Repository Overview & System Architecture](#page-1), [Providers, MCP & Plugin Extensibility](#page-3), [CLI Reference, Session Modes & Operations](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [velune/cognition/agents/planner.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cognition/agents/planner.py)
- [velune/cognition/agents/reviewer.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cognition/agents/reviewer.py)
- [velune/cognition/council/base.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cognition/council/base.py)
- [velune/cognition/council/messages.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cognition/council/messages.py)
- [velune/cognition/state.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cognition/state.py)
- [velune/cognition/firewall.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cognition/firewall.py)
- [velune/cli/commands/ask.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cli/commands/ask.py)
- [velune/cli/commands/run.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cli/commands/run.py)
- [velune/models/specializations.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/models/specializations.py)
- [velune/core/types/task.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/core/types/task.py)
- [velune/core/event_loop.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/core/event_loop.py)
- [velune/repository/schemas.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/repository/schemas.py)
- [velune/providers/task_classifier.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/task_classifier.py)
- [README.md](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/README.md)

</details>

## Overview

Velune is a terminal-first AI developer CLI whose central abstraction is the **Reasoning Council** — a coordinated ensemble of role-specialized agents that together translate a natural-language prompt into a planned, executed, and reviewed change set. Around the Council, a five-tier memory system and a hybrid retrieval pipeline supply long-horizon context (repository structure, prior sessions, semantic recall) so prompts like *"fix the auth issue from yesterday"* can be grounded without the user re-explaining intent. Source: [README.md]()

This page documents how those subsystems fit together: the agents that compose the Council, the memory tiers that feed them, the retrieval pipeline that selects context, and the CLI command surface (`velune ask` and `velune run`) that drives them.

## Council Architecture & Agent Roles

The Council is composed of specialized agents that all derive from a shared `BaseCouncilAgent` and communicate via strongly-typed messages such as `PlannerMessage` and `ReviewerMessage`. Each agent is bound to a `ModelDescriptor` and a `ModelProvider` so orchestration can route different roles to different models based on capability profiles. Source: [velune/cognition/agents/planner.py](), [velune/cognition/agents/reviewer.py](), [velune/cognition/council/base.py](), [velune/cognition/council/messages.py]()

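Two agents are documented here from the source files listed for this page. A third agent, the Coder (`velune/cognition/agents/coder.py`), appears in the orchestration flow below and is covered in [CLI Reference, Session Modes & Operations](#page-4).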

| Agent | Role | Output | Constraints |
|-------|------|--------|-------------|
| **Planner** | Decomposes the task into a strict JSON DAG of `TaskStep` records | `TaskPlan` written into `CouncilState` | Wall-clock budget enforcement; raises `TimeoutError` if `planner_timeout_seconds` is exceeded |
| **Reviewer** | Audits proposed code changes for logical flaws, security, performance, and alignment | JSON `{passed, critical_issues, suggestions, confidence_rating}` | Decision logic applied to determine retry vs. escalate |

The Planner's system prompt mandates raw JSON (no Markdown fences) describing a `TaskPlan` with a `task_id` and a list of `steps`, each carrying `id`, `description`, `target_files`, `expected_outcome`, and `agent_role`. Source: [velune/cognition/agents/planner.py](), [velune/core/types/task.py]()
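
The plan shape described above can be sketched with plain dataclasses. This is a minimal illustration, assuming the five step fields are all strings or string lists; the real `TaskPlan` and `TaskStep` types in `velune/core/types/task.py` may use a different base class and carry extra fields (for example, dependency edges for the DAG). `parse_plan` is a hypothetical helper, not a function from the repository.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TaskStep:
    # Field names follow the Planner prompt described above; types are assumed.
    id: str
    description: str
    target_files: list[str]
    expected_outcome: str
    agent_role: str


@dataclass
class TaskPlan:
    task_id: str
    steps: list[TaskStep] = field(default_factory=list)


def parse_plan(raw: str) -> TaskPlan:
    """Parse a raw JSON reply (no Markdown fences) into a TaskPlan."""
    data = json.loads(raw)
    steps = [TaskStep(**step) for step in data["steps"]]
    return TaskPlan(task_id=data["task_id"], steps=steps)


# Illustrative reply; the task and file names are made up.
raw_reply = """
{"task_id": "t-1",
 "steps": [{"id": "s1",
            "description": "Add input validation to the login handler",
            "target_files": ["auth/login.py"],
            "expected_outcome": "Invalid payloads are rejected",
            "agent_role": "coder"}]}
"""
plan = parse_plan(raw_reply)
print(plan.steps[0].agent_role)
```

Because `TaskStep(**step)` rejects unknown or missing keys with a `TypeError`, a reply that drifts from the schema fails at parse time rather than later in the loop.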

All agents read and write a shared `CouncilState` object that tracks wall-clock budget, plan progress, and review decisions, giving the orchestration loop a single source of truth. Source: [velune/cognition/state.py]()

### Orchestration Flow

```mermaid
flowchart LR
    U[User prompt] --> CLI["velune ask / velune run"]
    CLI --> FW[CognitiveFirewall]
    FW --> R["Retrieval: BM25 + Vector + Graph"]
    R --> P[Planner]
    P --> C["Coder (agent_role)"]
    C --> RV[Reviewer]
    RV -->|pass| OUT[Result surfaced to user]
    RV -->|fail| P
```

The firewall guards every prompt before it reaches the Council; the Planner consumes retrieved context to build a plan; the Coder (selected via `agent_role`) executes steps; the Reviewer either approves the diff or loops it back to the Planner. Source: [velune/cognition/firewall.py](), [velune/cli/commands/run.py]()
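
The plan, code, review loop in the diagram can be sketched as a small driver function. This is an illustration only: `run_council`, its callable parameters, and the `max_rounds` cap are hypothetical, and the real orchestration also involves the firewall, retrieval, and `CouncilState`.

```python
from typing import Callable


def run_council(
    task: str,
    plan: Callable[[str, list[str]], list[str]],
    code: Callable[[list[str]], str],
    review: Callable[[str], dict],
    max_rounds: int = 3,  # hypothetical cap, not taken from the source
) -> str:
    """Plan, code, and review until the review passes or rounds run out."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        steps = plan(task, feedback)
        diff = code(steps)
        verdict = review(diff)
        if verdict["passed"]:
            return diff
        # A failed review feeds its issues back into the next planning round.
        feedback = verdict["critical_issues"]
    raise RuntimeError("review did not pass within max_rounds")


# Stub agents that stand in for model-backed Planner, Coder, and Reviewer.
def fake_plan(task: str, feedback: list[str]) -> list[str]:
    return [task] + [f"address: {issue}" for issue in feedback]


def fake_code(steps: list[str]) -> str:
    return "\n".join(steps)


def fake_review(diff: str) -> dict:
    ok = "address:" in diff
    return {"passed": ok, "critical_issues": [] if ok else ["missing null check"]}


result = run_council("fix login", fake_plan, fake_code, fake_review)
```

With these stubs the first round fails review, and the second round passes because the Planner has folded the reviewer's issue into its steps.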

## Memory Tiers

Velune maintains five memory tiers so that context survives beyond a single REPL turn and across sessions. Source: [README.md]()

| Tier | Scope | Storage | Purpose |
|------|-------|---------|---------|
| **Working** | Current conversation | In-process | TTL-evicted turn buffer for the active loop |
| **Episodic** | Session history | SQLite (`~/.velune/`) | "What did I do last run?" |
| **Semantic** | Past interactions | LanceDB / Qdrant (opt-in `[rag]` extra) | Vector recall over earlier work |
| **Graph** | Repository symbols | Local graph store | Symbol relationships and structural queries |
| **Lineage** | Decision history | Persisted | What was tried, why, and outcome |

According to the v0.9.2 release notes, the `[rag]` extra was introduced to keep `pip install velune-cli` lean by default; it pulls in `lancedb`, `pyarrow`, and `qdrant-client` only on demand. Source: [README.md]() (release notes for v0.9.2)
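
One way a caller could detect whether the optional semantic tier is usable is to probe for the modules without importing them. This is a sketch under assumptions: `RAG_MODULES`, `missing_rag_modules`, and `semantic_tier_available` are hypothetical names, and the source does not show how Velune itself performs this check.

```python
from importlib.util import find_spec

# Import names for the packages pulled in by the [rag] extra.
# Note that the qdrant-client distribution is imported as qdrant_client.
RAG_MODULES = ("lancedb", "pyarrow", "qdrant_client")


def missing_rag_modules() -> list[str]:
    """Return the optional modules that are not installed."""
    return [name for name in RAG_MODULES if find_spec(name) is None]


def semantic_tier_available() -> bool:
    """True only when every optional dependency can be imported."""
    return not missing_rag_modules()


if not semantic_tier_available():
    print("Semantic memory disabled; missing:", ", ".join(missing_rag_modules()))
```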

## Retrieval Pipeline

Before the Council deliberates, the prompt is enriched through a hybrid retrieval pass that fuses three signals:

- **BM25** lexical scoring over the indexed repository
- **Vector** similarity over the semantic memory store
- **Graph** traversal over the repository structure (symbol references, imports, call edges)

The result is packaged as `retrieved_context` and forwarded to the Planner, which embeds it into the system prompt that drives `generate_plan`. Source: [velune/cognition/agents/planner.py](), [velune/cognition/firewall.py]()
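
The source does not state how the three signals are fused. Reciprocal rank fusion (RRF) is one common technique for merging ranked lists, and it is used below purely as an illustration; the function name, the constant `k = 60`, and the file names are assumptions, not Velune's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists; items ranked high in many lists win."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up result lists, one per signal.
bm25_hits = ["auth/login.py", "auth/session.py", "README.md"]
vector_hits = ["auth/session.py", "auth/login.py", "docs/auth.md"]
graph_hits = ["auth/session.py", "auth/tokens.py"]

retrieved_context = reciprocal_rank_fusion([bm25_hits, vector_hits, graph_hits])
print(retrieved_context[:2])
```

RRF needs only ranks, not comparable scores, which is why it suits signals as different as lexical scoring, vector similarity, and graph traversal.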

A `TaskClassifier` upstream of retrieval routes prompts by keyword families — coding (`refactor`, `rewrite`, `query`, `schema`), reasoning (`explain`, `analyze`, `compare`), summarization (`tldr`, `summary`, `digest`), and quick-question patterns (`what is`, `define`) — into a `TaskProfile` that influences which memory tiers are consulted and how aggressive the budget guard becomes. Source: [velune/providers/task_classifier.py]()
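
Keyword-family routing of this kind can be sketched in a few lines. The keyword lists below are the ones quoted above; the `TaskProfile` fields, the `"general"` fallback, and the first-match precedence are assumptions made for the example, and the real classifier also tracks complexity and latency sensitivity.

```python
from dataclasses import dataclass

KEYWORDS = {
    "coding": ("refactor", "rewrite", "query", "schema"),
    "reasoning": ("explain", "analyze", "compare"),
    "summarization": ("tldr", "summary", "digest"),
}
QUICK_PATTERNS = ("what is", "define")


@dataclass
class TaskProfile:
    task_type: str
    is_quick: bool


def classify(prompt: str) -> TaskProfile:
    """Assign the first keyword family that matches the prompt."""
    text = prompt.lower()
    is_quick = any(text.startswith(pattern) for pattern in QUICK_PATTERNS)
    for task_type, words in KEYWORDS.items():
        if any(word in text for word in words):
            return TaskProfile(task_type, is_quick)
    # Hypothetical fallback when no family matches.
    return TaskProfile("general", is_quick)


print(classify("Refactor the auth schema").task_type)
```

Plain substring matching is cheap but coarse: a prompt such as "explain this query" matches two families, so the iteration order of `KEYWORDS` decides the result.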

## Command Surface & Execution Modes

Two CLI entry points feed the Council:

- **`velune ask <prompt>`** — Interactive question path; routes natural language through the firewall and Council but does not execute sandboxed tools. Supports `--council-tier` override (`instant`, `standard`, `full`). Source: [velune/cli/commands/ask.py]()
- **`velune run <task>`** — Autonomous path; the Council plans, writes code, and executes it in a sandbox. Flags `--dry-run`, `--force`, and `--yes` control write/execute permissions and confirm prompts. Source: [velune/cli/commands/run.py]()

Both commands hand the async work off to `velune.core.event_loop.submit()` so the synchronous Typer boundary stays responsive. Source: [velune/core/event_loop.py]()

Session-mode toggles (`/normal`, `/optimus`, `/godly`) and cognition-depth toggles (`/cognition quick|standard|deep`) adjust the council tier and context cap (4k → 128k tokens) without restarting the REPL. The v0.9.3-beta.1 release notes further describe the move to "explicit, on-demand cognition" — the REPL no longer runs automatic repository indexing on launch; cognition is user-driven via slash commands. Source: [README.md]()

### Failure Modes

- **Budget exhaustion**: `PlannerAgent.generate_plan` raises `ValueError("Wall-clock budget exhausted before Planner could run")` if `state.is_budget_exhausted()` is true. Source: [velune/cognition/agents/planner.py]()
- **Planner timeout**: The call is wrapped in an `asyncio` timeout and raises `TimeoutError` if `planner_timeout_seconds` is exceeded. Source: [velune/cognition/agents/planner.py]()
- **Reviewer rejection**: A `passed=false` review loops the diff back to the Planner for revision rather than surfacing it to the user. Source: [velune/cognition/agents/reviewer.py]()
- **Sandbox writes blocked**: Without `--force`, `velune run` may halt at a human-confirmation threshold before code is written; without `--yes`, it may also wait on a cost-confirmation prompt. Source: [velune/cli/commands/run.py]()
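
The first two failure modes can be reproduced with a small stand-in. This is a sketch, not the code in `planner.py`: the `CouncilState` class below tracks only the wall-clock budget, and the free function `generate_plan` and its `call_model` parameter are hypothetical.

```python
import asyncio
import time


class CouncilState:
    """Hypothetical stand-in that tracks only the wall-clock budget."""

    def __init__(self, budget_seconds: float, planner_timeout_seconds: float):
        self._deadline = time.monotonic() + budget_seconds
        self.planner_timeout_seconds = planner_timeout_seconds

    def remaining_budget_seconds(self) -> float:
        return max(0.0, self._deadline - time.monotonic())

    def is_budget_exhausted(self) -> bool:
        return self.remaining_budget_seconds() <= 0.0


async def generate_plan(state: CouncilState, call_model) -> str:
    # Refuse to start when no budget is left.
    if state.is_budget_exhausted():
        raise ValueError("Wall-clock budget exhausted before Planner could run")
    # Never wait longer than either the planner timeout or the remaining budget.
    timeout = min(state.planner_timeout_seconds, state.remaining_budget_seconds())
    try:
        return await asyncio.wait_for(call_model(), timeout=timeout)
    except asyncio.TimeoutError as exc:
        raise TimeoutError("Planner exceeded its time budget") from exc


async def slow_model() -> str:
    await asyncio.sleep(1.0)  # simulates a model call that is too slow
    return "{}"


def demo() -> str:
    state = CouncilState(budget_seconds=5.0, planner_timeout_seconds=0.05)
    try:
        asyncio.run(generate_plan(state, slow_model))
    except TimeoutError:
        return "timeout"
    return "ok"


print(demo())
```

Using `time.monotonic()` rather than `time.time()` keeps the budget immune to system clock adjustments.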

## See Also

- [CLI Commands & REPL](wiki/velune-cli-repl.md) — Interactive prompt and tab-completion surface
- [Provider Adapters](wiki/velune-providers.md) — How `ModelProvider` and `ModelDescriptor` back each Council agent
- [Cognitive Firewall](wiki/velune-firewall.md) — Pre-Council prompt guard
- [Repository Cognition](wiki/velune-repository-cognition.md) — How the Graph memory tier is built

---

<a id='page-3'></a>

## Providers, MCP & Plugin Extensibility

### Related Pages

Related topics: [Repository Overview & System Architecture](#page-1), [Council Orchestration, Memory Tiers & Retrieval](#page-2), [CLI Reference, Session Modes & Operations](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [velune/providers/adapters/nvidia.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/adapters/nvidia.py)
- [velune/providers/adapters/cohere.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/adapters/cohere.py)
- [velune/providers/adapters/together.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/adapters/together.py)
- [velune/providers/adapters/groq.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/adapters/groq.py)
- [velune/providers/adapters/google.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/adapters/google.py)
- [velune/providers/discovery/openai.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/discovery/openai.py)
- [velune/providers/discovery/together.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/discovery/together.py)
- [velune/providers/task_classifier.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/task_classifier.py)
- [velune/cognition/agents/planner.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cognition/agents/planner.py)
- [velune/cognition/agents/reviewer.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cognition/agents/reviewer.py)
- [velune/cli/commands/mcp.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cli/commands/mcp.py)
- [velune/cli/commands/trust.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cli/commands/trust.py)
- [velune/cli/commands/ask.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cli/commands/ask.py)
- [README.md](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/README.md)
</details>

Velune is a terminal-first AI developer CLI whose value is delivered through three orthogonal extension surfaces: **model providers**, the **Model Context Protocol (MCP)**, and a **plugin loader**. This page documents how each surface is structured in the source tree, how they are wired together, and what configuration knobs they expose.

## Provider Subsystem

### Adapters and Model Catalogs

Velune ships dedicated adapter modules under [velune/providers/adapters/](https://github.com/Surya-Hariharan/Velune-CLI/tree/main/velune/providers/adapters), one per upstream inference vendor. Each adapter is responsible for two things: enumerating a static catalog of `ModelDescriptor` records (capability levels, context length, cost, tags) and translating a generic `InferenceRequest` into the vendor-specific HTTP payload.

| Adapter | Notable models catalogued | Source |
| --- | --- | --- |
| NVIDIA NIM | Llama 3.1 70B, Mistral Large 2, Nemotron 70B | [velune/providers/adapters/nvidia.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/adapters/nvidia.py) |
| Cohere | Command models with chat-history translation | [velune/providers/adapters/cohere.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/adapters/cohere.py) |
| Together AI | Qwen Coder, DeepSeek R1, Mistral 7B | [velune/providers/adapters/together.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/adapters/together.py) |
| Groq | Llama 3.x, Mixtral, Gemma 2 (free tier) | [velune/providers/adapters/groq.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/adapters/groq.py) |
| Google | Gemini 2.0 Flash, Gemini 2.0 Flash Thinking | [velune/providers/adapters/google.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/adapters/google.py) |

Capabilities are scored on a `CapabilityLevel` enum (BASIC → INTERMEDIATE → ADVANCED → EXPERT) across dimensions such as `coding`, `reasoning`, `planning`, `summarization`, `instruction_following`, `tool_use`, and `long_context`. The Groq catalog, for example, marks `gemma2-9b-it` with `free_tier=True`, `cost_per_1k_tokens=0.0`, and `speed_tier="fast"` so the router can prefer it when cost matters. Source: [velune/providers/adapters/groq.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/adapters/groq.py).

### Discovery vs. Static Catalogs

Two parallel directories coexist under `velune/providers/`:

- `adapters/`: hand-curated, opinionated catalogs with capability ratings already attached.
- `discovery/`: runtime probe logic that infers capabilities from model identifiers. For example, [velune/providers/discovery/openai.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/discovery/openai.py) pattern-matches substrings such as `"gpt-4"`, `"gpt-4o"`, and `"gpt-3.5"` to assign capability profiles when the vendor's own metadata is missing.

Discovery is used when the user enables a new model at runtime that the adapter has not pre-registered, keeping the system forward-compatible with new vendor releases without code changes.
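
Substring-based discovery has one subtlety worth showing: patterns must be checked from most specific to least specific, because `"gpt-4"` is itself a substring of `"gpt-4o"`. The sketch below illustrates that ordering; the profile labels, `PROFILE_PATTERNS`, `FALLBACK_PROFILE`, and `infer_profile` are hypothetical and do not mirror the repository's data structures.

```python
# Ordered most specific first: "gpt-4o" must be tested before "gpt-4".
PROFILE_PATTERNS = [
    ("gpt-4o", "profile-gpt-4o"),
    ("gpt-4", "profile-gpt-4"),
    ("gpt-3.5", "profile-gpt-3.5"),
]
# Hypothetical label for the conservative default used on unknown models.
FALLBACK_PROFILE = "profile-conservative"


def infer_profile(model_id: str) -> str:
    """Return the first profile whose pattern occurs in the model id."""
    lowered = model_id.lower()
    for pattern, profile in PROFILE_PATTERNS:
        if pattern in lowered:
            return profile
    return FALLBACK_PROFILE


print(infer_profile("gpt-4o-mini"))
```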

### Task Classification and Routing

The router does not pick models arbitrarily. Before inference, [velune/providers/task_classifier.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/task_classifier.py) tags each prompt with keyword sets (`CODING_KEYWORDS`, `REASONING_KEYWORDS`, `SUMMARIZATION_KEYWORDS`, `QUICK_PATTERNS`) and returns a `TaskProfile` carrying task type, complexity, latency-sensitivity, and a long-context flag (`total_tokens > 8000`). The router then matches the profile against `ModelDescriptor.capabilities` to select the smallest, cheapest model that still meets the bar. Source: [velune/providers/task_classifier.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/task_classifier.py).
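
The "cheapest model that still meets the bar" rule can be sketched as a filter followed by a minimum. This is an illustration under assumptions: the numeric enum values, the simplified `ModelDescriptor` fields, the `select_model` helper, the two `example-*` models, and every capability rating and price other than the zero cost of `gemma2-9b-it` are invented for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class CapabilityLevel(IntEnum):
    # IntEnum makes levels comparable with >=; the numbers are assumed.
    BASIC = 1
    INTERMEDIATE = 2
    ADVANCED = 3
    EXPERT = 4


@dataclass
class ModelDescriptor:
    model_id: str
    capabilities: dict[str, CapabilityLevel]
    cost_per_1k_tokens: float


def select_model(
    models: list[ModelDescriptor], dimension: str, required: CapabilityLevel
) -> Optional[ModelDescriptor]:
    """Pick the cheapest model whose rating on one dimension meets the bar."""
    eligible = [
        m for m in models
        if m.capabilities.get(dimension, CapabilityLevel.BASIC) >= required
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


# Illustrative catalog; ratings and non-zero prices are made up.
catalog = [
    ModelDescriptor("gemma2-9b-it", {"coding": CapabilityLevel.INTERMEDIATE}, 0.0),
    ModelDescriptor("example-mid", {"coding": CapabilityLevel.ADVANCED}, 0.3),
    ModelDescriptor("example-large", {"coding": CapabilityLevel.EXPERT}, 0.9),
]
choice = select_model(catalog, "coding", CapabilityLevel.ADVANCED)
print(choice.model_id if choice else "no eligible model")
```

The sketch ranks on cost alone; the router described above also weighs model size, latency sensitivity, and the long-context flag.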

### Council Agent Integration

Selected providers back the specialized council roles. The Planner ([velune/cognition/agents/planner.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cognition/agents/planner.py)) and Reviewer ([velune/cognition/agents/reviewer.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cognition/agents/reviewer.py)) agents each accept a `ModelDescriptor` and `ModelProvider` at construction time and emit strict JSON via a system prompt that forbids Markdown wrapping. Planner requests a DAG-style `TaskPlan`; Reviewer returns a `passed` boolean plus `critical_issues` and a `confidence_rating`. This uniform contract is what allows the same router to mix-and-match council members across providers.

## MCP Integration

### Server and Client Surface

The README positions Velune as both an MCP **server** (`velune mcp serve`) and an MCP **client**, allowing it to expose its tools to other agents and consume tools from third-party servers. The CLI entry points live in [velune/cli/commands/mcp.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cli/commands/mcp.py). The `connect` subcommand accepts an SSE `server_url` and a `name`, reads the operator-configured `mcp.allowed_hosts` allowlist via `ConfigLoader`, and instantiates `VeluneMCPClient(server_url, name, allowed_hosts=...)` before printing the discovered tool list. Source: [velune/cli/commands/mcp.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cli/commands/mcp.py).
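
A host allowlist check of the kind implied by `mcp.allowed_hosts` can be sketched with the standard library. How `VeluneMCPClient` actually enforces the list is not shown in the source, so `is_allowed`, the exact-match rule, and the scheme restriction are assumptions.

```python
from urllib.parse import urlparse


def is_allowed(server_url: str, allowed_hosts: list[str]) -> bool:
    """Accept only http(s) URLs whose hostname is exactly on the allowlist."""
    parsed = urlparse(server_url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname  # already lower-cased, port stripped
    return host is not None and host in {h.lower() for h in allowed_hosts}


allowed = ["tools.example.com"]
print(is_allowed("https://tools.example.com:8443/sse", allowed))
```

Matching on the parsed hostname, rather than on a string prefix of the URL, avoids accepting lookalikes such as `tools.example.com.evil.net`.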

### Trust and Project-Level Config

MCP servers can ship project-level configuration. To prevent silent execution of untrusted code, Velune requires the operator to explicitly opt in per directory. The `velune trust add`, `velune trust forget`, and `velune trust list` subcommands in [velune/cli/commands/trust.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cli/commands/trust.py) maintain a persistent allowlist. Until a directory is trusted, the REPL prints a hint pointing the user to `velune trust add`. Source: [velune/cli/commands/trust.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cli/commands/trust.py).
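
A persistent per-directory allowlist can be as simple as a JSON file of resolved paths. The sketch below is hypothetical throughout: the `TrustStore` class, its method names, and the file format are assumptions that mirror the `add`, `forget`, and `list` subcommands in spirit only.

```python
import json
import tempfile
from pathlib import Path


class TrustStore:
    """Hypothetical JSON-backed allowlist of trusted directories."""

    def __init__(self, store_file: Path):
        self._file = store_file

    def _load(self) -> set[str]:
        if not self._file.exists():
            return set()
        return set(json.loads(self._file.read_text(encoding="utf-8")))

    def _save(self, entries: set[str]) -> None:
        self._file.write_text(json.dumps(sorted(entries)), encoding="utf-8")

    def add(self, directory: Path) -> None:
        entries = self._load()
        entries.add(str(directory.resolve()))
        self._save(entries)

    def forget(self, directory: Path) -> None:
        entries = self._load()
        entries.discard(str(directory.resolve()))
        self._save(entries)

    def is_trusted(self, directory: Path) -> bool:
        return str(directory.resolve()) in self._load()


# Exercise the store in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    store = TrustStore(Path(tmp) / "trusted.json")
    project = Path(tmp) / "project"
    project.mkdir()
    before = store.is_trusted(project)
    store.add(project)
    after_add = store.is_trusted(project)
    store.forget(project)
    after_forget = store.is_trusted(project)
```

Storing resolved paths means a symlink or a relative path to a trusted directory is recognized as the same entry.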

### Community Acknowledgement

The community audit captured in [Issue #9](https://github.com/Surya-Hariharan/Velune-CLI/issues/9) describes Velune as having "one of the most mature MCP implementations" among open-source AI coding assistants, noting full client/server coverage. Subsequent releases (0.9.x line) have continued hardening the MCP path: 0.9.1 closed Windows path-handling defects and 0.9.3-beta.1 re-architected startup so cognition is on-demand rather than automatic, which directly affects when MCP tools become reachable inside a session.

## Plugin Loader

The third extensibility surface is the plugin system. Per the README project layout, the `plugins/` directory contains the declarative plugin loader, `SKILL.md` injection logic, and hook wiring. Plugins are how third parties contribute tools and skills without forking the core CLI. Source: [README.md](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/README.md).

The 0.9.0 release notes are explicit that the **plugin sandbox remains unimplemented or disabled** for standard CLI operations, so plugins currently run with the same privileges as the Velune process itself. Operators should treat plugin installation with the same caution they apply to `pip install`-style trust decisions.

## How the Three Surfaces Compose

```mermaid
flowchart LR
    A[User Prompt] --> B[task_classifier.py]
    B --> C{Provider Router}
    C --> D[Adapter: NVIDIA / Cohere / Together / Groq / Google]
    D --> E[InferenceResponse]
    C --> F[Council Agents: Planner / Reviewer]
    E --> G[Trust-aware REPL]
    F --> G
    G --> H[MCP Server outbound]
    G --> I[MCP Client inbound]
    G --> J[Plugin Loader / SKILL.md]
```

A user prompt is classified, routed to a provider-backed agent (optionally wrapped by the council), and the resulting action is mediated by the trust-aware REPL before it can reach MCP endpoints, inbound MCP tools, or locally installed plugins.

## Configuration and Failure Modes

- **Lean install**: Heavy provider-agnostic extras such as `[rag]`, `[parsing]`, `[telemetry]`, `[git]`, `[gguf]`, and `[docker]` are opt-in. Removing them never breaks chat or tool use; it only degrades the affected feature. Source: [README.md](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/README.md).
- **HTTP failures**: Adapters surface upstream `httpx.HTTPStatusError` exceptions directly (see [velune/providers/adapters/nvidia.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/adapters/nvidia.py)), so a 429 from a cloud vendor will bubble up unless a retry layer wraps it.
- **Untrusted MCP config**: Until `velune trust add <path>` is invoked, project-level MCP servers are ignored. This is the primary defense for multi-repo workspaces.
- **Missing capabilities**: When `discovery/` cannot infer a profile for a new model identifier, the router falls back to a conservative capability set for that model, which may cause it to pick a more expensive model than necessary.
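
Since adapters surface HTTP errors directly, a caller that wants resilience against rate limits needs its own retry layer. The sketch below shows exponential backoff in isolation; `RateLimited` is a hypothetical stand-in for an `httpx.HTTPStatusError` carrying a 429 status, and `with_retries` is not part of Velune.

```python
import time


class RateLimited(Exception):
    """Stand-in for an upstream 429 response."""


def with_retries(call, attempts: int = 3, base_delay: float = 0.5, sleep=time.sleep):
    """Retry a callable on RateLimited, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return call()
        except RateLimited:
            if attempt == attempts - 1:
                raise  # out of attempts: let the error bubble up
            sleep(base_delay * 2 ** attempt)


# A fake provider call that is rate-limited twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimited()
    return "ok"


delays: list[float] = []
result = with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; a production version would typically add jitter and honor a `Retry-After` header when the vendor sends one.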

## See Also

- [Architecture Overview](README.md)
- [CLI Commands](velune/cli/commands/)
- [Council Agents](velune/cognition/agents/)
- [Security Posture](docs/SECURITY.md)
- [Contributing Guide](docs/CONTRIBUTING.md)
- [Changelog](docs/CHANGELOG.md)

---

<a id='page-4'></a>

## CLI Reference, Session Modes & Operations

### Related Pages

Related topics: [Repository Overview & System Architecture](#page-1), [Council Orchestration, Memory Tiers & Retrieval](#page-2), [Providers, MCP & Plugin Extensibility](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/README.md)
- [velune/cli/commands/ask.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cli/commands/ask.py)
- [velune/cli/commands/run.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cli/commands/run.py)
- [velune/cli/commands/session.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cli/commands/session.py)
- [velune/cli/commands/workspace.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cli/commands/workspace.py)
- [velune/cognition/agents/planner.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cognition/agents/planner.py)
- [velune/cognition/agents/coder.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cognition/agents/coder.py)
- [velune/cognition/agents/reviewer.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/cognition/agents/reviewer.py)
- [velune/providers/task_classifier.py](https://github.com/Surya-Hariharan/Velune-CLI/blob/main/velune/providers/task_classifier.py)
</details>

The Velune command-line interface is a terminal-first entry point for routing natural-language tasks and questions to the Reasoning Council. This page documents the user-facing commands, the three session modes, and the operational flow that connects the CLI to the planner, coder, and reviewer agents.

## Overview

Velune exposes its capabilities through a Typer-based CLI that dispatches to async command implementations via the centralized event loop helper `velune.core.event_loop.submit`. Every command resolves a `CLIContext` from `ctx.obj`, then submits a coroutine to the loop. Source: [velune/cli/commands/ask.py:43-50](), [velune/cli/commands/run.py:24-30]().

The CLI is partitioned into four primary sub-commands:

| Sub-command | Purpose | File |
|-------------|---------|------|
| `velune ask` | Interactive prompt routing (no code execution) | [velune/cli/commands/ask.py]() |
| `velune run` | Autonomous Council deliberation + sandbox execution | [velune/cli/commands/run.py]() |
| `velune session` | List, resume, delete, or export chat sessions | [velune/cli/commands/session.py]() |
| `velune workspace` | Initialize and explain the indexed workspace | [velune/cli/commands/workspace.py]() |

## Session Modes

Velune supports three session modes that balance speed, quality, and context budget. Source: [README.md]()

| Mode | Slash Command | Council Tier | Model | Context Cap |
|------|---------------|--------------|-------|-------------|
| Normal | `/normal` | auto | current | 16 k tokens |
| Optimus | `/optimus` | instant | smallest | 4 k tokens |
| Godly | `/godly` | full | largest | 128 k tokens |

Switching modes is done mid-session through slash commands; the prompt badge updates immediately to reflect the active mode. Optimus prioritizes low latency and small context, Godly activates the full multi-agent council and uses the largest available model, and Normal is the default auto-tiered middle ground.
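
The mode table maps naturally onto a lookup keyed by slash command. This is a sketch with assumed names (`SessionMode`, `SESSION_MODES`, `switch_mode`), and it reads "k" as 1,000 tokens, which the source does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionMode:
    council_tier: str
    model: str
    context_cap_tokens: int


# Values copied from the table above; "k" is assumed to mean 1,000.
SESSION_MODES = {
    "/normal": SessionMode("auto", "current", 16_000),
    "/optimus": SessionMode("instant", "smallest", 4_000),
    "/godly": SessionMode("full", "largest", 128_000),
}


def switch_mode(command: str, current: SessionMode) -> SessionMode:
    """Return the mode for a slash command, or keep the current one."""
    return SESSION_MODES.get(command.strip().lower(), current)


mode = switch_mode("/godly", SESSION_MODES["/normal"])
print(mode.council_tier, mode.context_cap_tokens)
```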

## Core Commands

### `velune ask` — Interactive Prompt Routing

`ask` is the read-only entry point: it routes a natural-language question to the Council but never writes files or executes scripts. The command accepts an optional positional `prompt` and an optional `--council-tier` override (values: `instant`, `standard`, `full`). When the prompt is omitted, the command falls back to `typer.prompt` for interactive capture; in JSON mode it errors out instead. Source: [velune/cli/commands/ask.py:18-42]().
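
The prompt fallback described above can be sketched without Typer. In this illustration the built-in `input` stands in for `typer.prompt`, and the helper name `resolve_prompt` and the shape of the JSON error object are assumptions.

```python
import json


def resolve_prompt(prompt, json_mode: bool, ask=input) -> str:
    """Return the prompt, asking interactively only outside JSON mode."""
    if prompt:
        return prompt
    if json_mode:
        # Raising SystemExit with a string prints it to stderr and exits
        # with status 1. The error shape here is illustrative.
        raise SystemExit(json.dumps({"error": "prompt is required in JSON mode"}))
    return ask("Prompt: ")


print(resolve_prompt("explain the planner", json_mode=True))
```

Refusing to prompt in JSON mode keeps scripted callers from hanging on an interactive read they cannot answer.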

### `velune run` — Autonomous Council Execution

`run` triggers the full Reasoning Council: the planner decomposes the task, the coder produces diffs, and the reviewer audits them before any modification is applied. The command exposes three options:

- `--dry-run` / `-d` — deliberate but skip writes and execution
- `--force` / `-f` — bypass human confirmation thresholds
- `--yes` / `-y` — skip cost confirmation prompts for scripting

The `--yes` flag is propagated into `cli_context.yes` so downstream async helpers can read it. Source: [velune/cli/commands/run.py:13-36]().

### `velune session` — Session Lifecycle

The `session` sub-command manages persisted chat history. The `list` subcommand supports `--all`/`-a` (show sessions from every workspace) and `--limit`/`-n` to cap the number of rows; it can emit JSON when `cli_context.json_mode` is active. Source: [velune/cli/commands/session.py:18-52]().

### `velune workspace` — Repository Cognition

`workspace explain` runs the `TechnologyDetector` and `ArchitectureDetector` against the local index — without calling any AI provider — and renders a plain-English summary covering framework, routing, auth, state management, and entry points. It accepts `--path`/`-p` (default `Path.cwd()`) and `--json` for machine-readable output, and errors out with a non-zero exit code when no `.velune` directory exists. Source: [velune/cli/commands/workspace.py:78-105]().

## Operation Flow

A `velune run` invocation flows through the Council in three stages, each guarded by the `CouncilState` budget. The planner decomposes the task into a JSON `TaskPlan` and respects `state.remaining_budget_seconds()`. Source: [velune/cognition/agents/planner.py:55-75](). The coder then drafts diffs using the model-family edit-format preference order (`search_replace` → `whole_file` → `udiff`) within `state.budget.max_tokens_per_agent`. Source: [velune/cognition/agents/coder.py:120-145](). Finally, the reviewer returns a `ReviewDecision` (passed/critical_issues/suggestions/confidence_rating) that gates downstream writes. Source: [velune/cognition/agents/reviewer.py:30-45]().
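
The edit-format preference order can be read as "take the first format the model supports". The sketch below encodes that reading; `PREFERENCE_ORDER` and `pick_edit_format` are hypothetical names, and the source suggests the order may vary by model family, which this example does not model.

```python
# Order quoted from the paragraph above.
PREFERENCE_ORDER = ("search_replace", "whole_file", "udiff")


def pick_edit_format(supported: set[str]) -> str:
    """Return the most preferred edit format that the model supports."""
    for fmt in PREFERENCE_ORDER:
        if fmt in supported:
            return fmt
    raise ValueError("model supports none of the known edit formats")


print(pick_edit_format({"udiff", "whole_file"}))
```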

The `TaskClassifier` upstream of these agents inspects the prompt for keyword clusters (coding, reasoning, summarization, quick-question patterns) and emits a `TaskProfile` that the router uses to pick a council tier and check whether long context is required. Source: [velune/providers/task_classifier.py:60-100]().

```mermaid
flowchart LR
    A[velune run/ask] --> B[CLIContext]
    B --> C{TaskClassifier}
    C -->|profile| D[Council Tier]
    D --> E[PlannerAgent]
    E --> F[TaskPlan]
    F --> G[CoderAgent]
    G --> H[ReviewerAgent]
    H --> I[ReviewDecision]
    I -->|passed| J[Sandbox / Write]
    I -->|failed| K[Refine / Replan]
```

## Common Failure Modes

- **Missing `.velune` directory**: `workspace explain` exits with code 1 and prints a danger-styled message instructing the user to run `velune workspace init` first. Source: [velune/cli/commands/workspace.py:95-101]().
- **Budget exhaustion**: The planner raises `ValueError("Wall-clock budget exhausted before Planner could run")` if `state.is_budget_exhausted()` is true. Source: [velune/cognition/agents/planner.py:60-65]().
- **Planner/Coder timeout**: Both agents wrap their `deliberate` calls in `asyncio.wait_for` with `state.budget.planner_timeout_seconds` and the remaining wall-clock budget; a timeout is re-raised as `TimeoutError` and logged. Source: [velune/cognition/agents/planner.py:75-80](), [velune/cognition/agents/coder.py:130-135]().
- **Empty prompt in JSON mode**: `velune ask` writes a JSON error object and exits with code 1 instead of falling back to interactive input. Source: [velune/cli/commands/ask.py:38-45]().
- **Context overflow**: When `prompt_tokens + context_tokens` exceeds 8 000, the classifier sets `requires_long_context = True`, which routes the request to a long-context-capable model. Source: [velune/providers/task_classifier.py:90-95]().
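
The long-context rule is a strict threshold, so a request of exactly 8,000 tokens does not trip it. A minimal sketch, with `LONG_CONTEXT_THRESHOLD` and `requires_long_context` as assumed names:

```python
LONG_CONTEXT_THRESHOLD = 8000  # tokens, as stated above


def requires_long_context(prompt_tokens: int, context_tokens: int) -> bool:
    """True when the combined token count exceeds the threshold."""
    return prompt_tokens + context_tokens > LONG_CONTEXT_THRESHOLD


print(requires_long_context(prompt_tokens=500, context_tokens=9000))
```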

## See Also

- Reasoning Council and agent roles
- Memory tiers and persistent storage
- Provider adapters (OpenAI, Anthropic, Google, Groq, Cohere, NVIDIA, Together)
- MCP server and client integration

The community audit in [Issue #9](https://github.com/Surya-Hariharan/Velune-CLI/issues/9) describes Velune as having one of the most mature MCP implementations among open-source AI coding assistants. The council-based orchestration model is the central abstraction users interact with through the commands documented above.

---


## Pitfall Log

Project: Surya-Hariharan/Velune-CLI

Summary: Found 9 structured pitfall items, none of them high or blocking. Top priority: configuration risk, which requires verification.

## 1. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.host_targets | https://github.com/Surya-Hariharan/Velune-CLI

## 2. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/Surya-Hariharan/Velune-CLI

## 3. Runtime risk - Runtime risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: packet_text.keyword_scan | https://github.com/Surya-Hariharan/Velune-CLI

## 4. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Surya-Hariharan/Velune-CLI

## 5. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/Surya-Hariharan/Velune-CLI

## 6. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/Surya-Hariharan/Velune-CLI

## 7. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Surya-Hariharan/Velune-CLI/issues/9

## 8. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `issue_or_pr_quality=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Surya-Hariharan/Velune-CLI

## 9. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `release_recency=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Surya-Hariharan/Velune-CLI

