gemini-cli Manual - Doramagic.ai

Doramagic Project Pack · Human Manual

gemini-cli

An open-source AI agent that brings the power of Gemini directly into your terminal.

Overview & Getting Started

Related topics: System Architecture & Agent Loop

Gemini CLI is an open-source AI agent that brings the power of Google's Gemini models directly into the developer's terminal. It is designed for coding workflows, codebase exploration, automation, and integration with external systems through a flexible tool and extension model. This page provides a high-level orientation for new users and contributors, covering purpose, installation paths, authentication, and the major subsystems that make up the project.

What Gemini CLI Is

At its core, Gemini CLI is a terminal-based conversational agent that can read, edit, and run code in your local environment. The README.md describes it as an AI agent with features such as Google Search grounding for real-time information, conversation checkpointing to save and resume complex sessions, and custom context files (GEMINI.md) to tailor behavior for your projects.

The project is organized as a multi-package monorepo. The packages/core directory holds the shared engine, with a packages/core/package.json that lists dependencies on Zod, simple-git, tree-sitter-bash, web-tree-sitter, undici, and other building blocks used for shell parsing, file I/O, git operations, and structured tool input validation. Built on top of the core is the packages/cli package, which provides the interactive command-line interface, and a packages/a2a-server package that exposes the agent over an HTTP/A2A transport (see packages/a2a-server/src/http/app.ts). A separate packages/sdk publishes a small public surface for embedding the agent programmatically, including a typed tool() helper documented in packages/sdk/src/tool.ts.

```mermaid
flowchart LR
  User[Developer / CI] --> CLI["packages/cli<br/>Interactive CLI"]
  User --> A2A["packages/a2a-server<br/>HTTP / A2A transport"]
  User --> SDK["packages/sdk<br/>Programmatic API"]
  CLI --> Core["packages/core<br/>Agent engine, tools, scheduler"]
  A2A --> Core
  SDK --> Core
  Core --> Models[("Gemini API<br/>& Code Assist")]
  Core --> Tools["File System<br/>Shell<br/>Web Fetch<br/>MCP Servers"]
```

Installation and First Run

The README.md documents a gemini command as the entry point. Two quick examples illustrate the intended workflows:

Start a new project interactively:

```bash
cd new-project/
gemini
> Write me a Discord bot that answers questions using a FAQ.md file I will provide
```

Analyze an existing repository:

```bash
git clone https://github.com/google-gemini/gemini-cli
cd gemini-cli
gemini
> Give me a summary of all of the changes that went in yesterday
```

For non-interactive use, the README also shows headless invocation with structured output:

```bash
gemini -p "Run tests and deploy" --output-format stream-json
```
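Scripts that consume headless output need to split the stream into events. The sketch below makes two assumptions: that `stream-json` emits newline-delimited JSON with one object per line, and that every object carries a string `type` field. The event names in the sample are hypothetical, so check the headless-mode documentation for the actual event schema before depending on it.

```typescript
// Sketch of a consumer for `gemini -p "..." --output-format stream-json`.
// Assumptions: newline-delimited JSON, one object per line, each with a
// string `type` field. The event names in `sample` are hypothetical.

interface StreamEvent {
  type: string;
  [key: string]: unknown;
}

function isStreamEvent(value: unknown): value is StreamEvent {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { type?: unknown }).type === "string"
  );
}

// Parse a block of NDJSON text; malformed lines are collected, not thrown.
function parseStreamJson(text: string): { events: StreamEvent[]; badLines: string[] } {
  const events: StreamEvent[] = [];
  const badLines: string[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    try {
      const value: unknown = JSON.parse(line);
      if (isStreamEvent(value)) events.push(value);
      else badLines.push(line);
    } catch {
      badLines.push(line);
    }
  }
  return { events, badLines };
}

// Tally events per type, e.g. to summarize a CI run.
function countByType(events: readonly StreamEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) counts.set(e.type, (counts.get(e.type) ?? 0) + 1);
  return counts;
}

const sample = [
  '{"type":"message","content":"Running tests"}',
  '{"type":"tool_use","tool_name":"run_shell_command"}',
  "",
  '{"type":"message","content":"Done"}',
  "not json",
].join("\n");

const { events, badLines } = parseStreamJson(sample);
const counts = countByType(events);
```

For a live pipe, you can apply the same per-line logic to lines read from standard input with Node's `readline` module.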

Authentication Options

The CLI supports multiple authentication paths, documented in README.md:

| Method | Best for | Quota / Notes |
| --- | --- | --- |
| Sign in with Google (OAuth) | Individual developers, Gemini Code Assist license holders | 60 requests/min, 1,000 requests/day; Gemini 3 models with 1M token context; no API key required |
| API key (e.g., Gemini API key, Vertex AI) | Programmatic / CI usage | Quotas depend on the underlying provider |

Authentication logic is wired in packages/cli/src/config/auth.ts (referenced from the documentation) and the Code Assist onboarding flow in packages/core/src/code_assist/setup.ts, which together handle OAuth flows, key storage, and project selection.

Core Features and Built-in Tools

Gemini CLI ships with a set of first-party tools that the model can invoke:

File system operations: read, write, edit, and search files. The community has reported edge cases here. For example, issue #2553 ("WriteFile will generate '�' character after saving to a text file") suggests that text outside the expected encoding can be corrupted when it is written.
Shell commands: sandboxed execution with optional user confirmation.
Web fetch and search: the agent can pull live information into the conversation, including via Google Search grounding.
Checkpointing: save and resume complex multi-turn sessions, useful for long refactors.
Custom context files (GEMINI.md): per-project instructions injected into the system prompt.

Beyond the built-ins, the CLI integrates with the Model Context Protocol (MCP). The example in packages/cli/src/commands/extensions/examples/mcp-server/README.md shows a minimal server exposing a fetch_posts tool and a poem-writer prompt. Once configured, MCP servers are addressed from the CLI by name, as in the README's @github, @slack, and @database examples. A second example, packages/cli/src/commands/extensions/examples/policies/README.md, demonstrates how extensions can contribute TOML security policies to the policy engine. It includes a rule that requires confirmation for rm -rf and another that denies grep on sensitive files. That README explicitly notes: "For security, Gemini CLI ignores any allow decisions or yolo mode configurations contributed by extensions."

Advanced Surfaces

Several advanced features extend Gemini CLI beyond the standard terminal loop:

Headless mode — non-interactive invocation suitable for CI and scripting, as shown by --output-format stream-json in README.md.
IDE integration — a VS Code companion that surfaces the same agent inside the editor.
Sandboxing and trusted folders — execution is gated by a per-folder trust model. This is directly relevant to community concerns about destructive actions; issue #26856 ("Your idiotic AI disobeyed me completely lied and has now cost me 300 dollars worth of work…") underscores why sandboxing, confirmation prompts, and trusted-folder controls matter for real-world use.
Telemetry and monitoring — opt-in usage tracking.
Agent Client Protocol (ACP) — described in packages/cli/src/acp/README.md, the CLI implements modules such as acpSession.ts, acpFileSystemService.ts, and acpCommandHandler.ts so that editor clients can drive Gemini CLI over the ACP transport, including @path file resolution and slash command interception.
A2A server — packages/a2a-server/src/agent/task.test.ts and packages/a2a-server/src/agent/task-event-driven.test.ts show that the A2A package exposes a Task abstraction that wires a Scheduler to an ExecutionEventBus, with processRestorableToolCalls used to recover tool state.

The packages/sdk/src/tool.ts file additionally defines a small public API:

- a ToolDefinition (name, description, Zod input schema, optional sendErrorsToModel);
- a Tool type that adds an executable action;
- a ModelVisibleError for surfacing failures back to the model;
- a tool() factory.

This appears to be the intended entry point for embedding Gemini CLI in other Node applications and for writing custom tools that compose with the built-in toolset.

Roadmap and Community Direction

The community has flagged several recurring themes that shape the near-term direction of the project. Issue #4191 ("Public Roadmap") tracks the living roadmap, which is a natural place to look for planned work. Two highly engaged feature requests stand out:

Plan Mode (#4666) — a mode that blocks write tools so the model can only reason and propose a plan before any mutations occur, directly addressing the destructive-action concerns behind #26856.
Skills (#11506) — a request to add reusable, packaged "skills" inspired by Anthropic's agent skills work, intended to make extension composition more ergonomic than today's MCP + custom-commands mix.

Together, these give a reasonable mental model: the project is iterating on safer execution (Plan Mode, sandboxing, policy engine), richer extension surfaces (Skills, MCP, policies), and broader transport options (CLI, A2A, ACP, SDK).

System Architecture & Agent Loop

Related topics: Overview & Getting Started, Built-in Tools & MCP Integration

1. Overview and Scope

Gemini CLI is an open-source (Apache 2.0) command-line agent that brings Gemini models into a developer's terminal with a 1M-token context window, OAuth or API-key authentication, conversation checkpointing, custom GEMINI.md context files, and extensibility via the Model Context Protocol (MCP) and project-level extensions. Source: README.md.

The System Architecture & Agent Loop page focuses on the runtime structure: how user input becomes a model call, how tool invocations are scheduled, how approvals are gated, and how external surfaces (HTTP/A2A, ACP/IDE, MCP) connect to the same core. The "agent loop" in this codebase is a turn-driven, message-bus-based scheduler that runs model turns, dispatches tool calls, awaits confirmation when required, and streams results back to the model until the turn terminates.

The codebase is organized as a workspace of packages with clean separation of concerns:

| Package | Role |
| --- | --- |
| `packages/core` | Engine: model client, tool framework, scheduler, policy engine, message bus, shell/file/Git utilities. Declares the runtime stack: `zod`, `simple-git`, `tree-sitter-bash`, `undici`, `systeminformation`, `web-tree-sitter`. Source: packages/core/package.json. |
| `packages/cli` | Interactive REPL, slash commands, ACP (Agent Client Protocol) integration, extension loader, policy examples. Source: packages/cli/src/acp/README.md. |
| `packages/a2a-server` | Headless HTTP/JSON-RPC server exposing the Coder Agent as an A2A-compliant task endpoint. Source: packages/a2a-server/src/http/app.ts. |
| `packages/sdk` | Programmatic TypeScript SDK: `tool()`, `skillDir()`, `GeminiCliAgent`, `SystemInstructions`, `SessionContext`. Source: packages/sdk/src/tool.ts, packages/sdk/src/skills.ts, packages/sdk/src/types.ts. |
| `packages/test-utils` | `TestMcpServerBuilder` and helpers for deterministic MCP tool tests. Source: packages/test-utils/src/test-mcp-server.ts. |

2. High-Level Architecture

The agent is built around three runtime abstractions:

- a Config (session-wide settings);
- a MessageBus (typed pub/sub for tool-call lifecycle events);
- a Scheduler (the per-task loop that runs turns).

Together they form a turn-driven pipeline that behaves the same whether the user is in the TUI, working in an IDE through ACP, or calling the A2A HTTP server remotely. Source: packages/a2a-server/src/agent/task-event-driven.test.ts. The test instantiates Task with isEventDrivenSchedulerEnabled: () => true and asserts that task.scheduler is an instance of Scheduler. This indicates that the scheduler implementation is pluggable and selected through configuration.
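A tiny typed pub/sub can illustrate the MessageBus role. This is a sketch, not the core package's API: `MessageBus`, `ToolCallsUpdate`, and the status names are simplified stand-ins, and only a `TOOL_CALLS_UPDATE` message is modelled. It shows why one scheduler can serve several frontends: each frontend subscribes independently and receives the same updates.

```typescript
// Hypothetical, simplified stand-ins: these are not gemini-cli-core's types.
type ToolCallStatus = "scheduled" | "awaiting_approval" | "executing" | "success" | "error";

interface ToolCallsUpdate {
  type: "TOOL_CALLS_UPDATE";
  calls: ReadonlyArray<{ callId: string; name: string; status: ToolCallStatus }>;
}

type BusMessage = ToolCallsUpdate; // a real bus would carry a wider union
type Handler = (msg: BusMessage) => void;

class MessageBus {
  private readonly handlers = new Map<BusMessage["type"], Set<Handler>>();

  // Register a handler for one message type; returns an unsubscribe function.
  subscribe(type: BusMessage["type"], handler: Handler): () => void {
    const set = this.handlers.get(type) ?? new Set<Handler>();
    this.handlers.set(type, set);
    set.add(handler);
    return () => {
      set.delete(handler);
    };
  }

  // Deliver a message synchronously to every current subscriber of its type.
  publish(msg: BusMessage): void {
    for (const handler of [...(this.handlers.get(msg.type) ?? [])]) handler(msg);
  }
}

// Two frontends (say, a TUI and an A2A adapter) observe the same updates.
const bus = new MessageBus();
const tuiLog: string[] = [];
const a2aLog: string[] = [];
bus.subscribe("TOOL_CALLS_UPDATE", (m) => {
  tuiLog.push(m.calls.map((c) => `${c.name}:${c.status}`).join(","));
});
const stopA2a = bus.subscribe("TOOL_CALLS_UPDATE", (m) => {
  a2aLog.push(m.calls.length > 0 ? m.calls[0].status : "none");
});

bus.publish({ type: "TOOL_CALLS_UPDATE", calls: [{ callId: "1", name: "read_file", status: "executing" }] });
stopA2a();
bus.publish({ type: "TOOL_CALLS_UPDATE", calls: [{ callId: "1", name: "read_file", status: "success" }] });
```

Delivery here is synchronous and in subscription order. Copying the handler set before iterating lets a handler unsubscribe safely during delivery.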

```mermaid
flowchart LR
    User[User / IDE / HTTP Client] -->|prompt| Frontend[CLI TUI / ACP / A2A HTTP]
    Frontend -->|execute| Task[Task + Config + MessageBus]
    Task --> Scheduler
    Scheduler -->|turn| GeminiChat[GeminiChat / ContentGenerator]
    GeminiChat -->|tool_call| Scheduler
    Scheduler -->|TOOL_CALLS_UPDATE| MessageBus
    MessageBus -->|events| Frontend
    Scheduler -->|invoke| Tools[Built-in + MCP + Extension Tools]
    Tools -->|ToolResult| Scheduler
    Scheduler -->|next turn| GeminiChat
    Policies[Policy Engine / ApprovalMode] -. confirms/denies .-> Scheduler
    Skills["Skills via skillDir()"] -. loaded into .-> Task
```

Key roles of the components:

Task is the per-conversation orchestrator. It holds the Config, the MessageBus, and the active Scheduler. Source: packages/a2a-server/src/agent/task.test.ts — exercises scheduleToolCalls and verifies that the scheduler never mutates the input requests array.
Scheduler runs the turn loop and subscribes to TOOL_CALLS_UPDATE events, mapping internal status changes (e.g., ToolConfirmationOutcome) into A2A/ACP-visible state. Source: packages/a2a-server/src/agent/task-event-driven.test.ts.
Tools are declarative, Zod-typed units. The SDK helper tool(definition, action) returns a Tool<T> whose inputSchema is a Zod schema and whose action(params, context?) returns a serializable result sent back to the model. Source: packages/sdk/src/tool.ts.
Policies gate dangerous actions (e.g., rm -rf, searching .env) and validate file paths. The example extension at packages/cli/src/commands/extensions/examples/policies shows policies/*.toml files that *strengthen* security; for safety, Gemini CLI ignores any allow or yolo decisions contributed by extensions, so they can never bypass user confirmation. Source: packages/cli/src/commands/extensions/examples/policies/README.md.

3. The Agent Loop and Turn Execution

A single agent turn follows this contract:

Prompt ingestion. A frontend (CLI, ACP client, or A2A HTTP request) creates a Task for the session. On the A2A server, customUserBuilder resolves the caller from the Authorization header. It recognizes Bearer and HTTP Basic credentials (the source's sample values are Bearer valid-token and admin:password) and otherwise returns an UnauthenticatedUser. Source: packages/a2a-server/src/http/app.ts.
System instructions. The agent's behavior is steered by SystemInstructions, which may be a static string or a (context: SessionContext) => string | Promise<string> function. SDK consumers are explicitly warned to sanitize any data pulled from the session context (strip newlines and ], escape < and >) to avoid prompt injection. Source: packages/sdk/src/types.ts.
Model call. The scheduler issues a turn via GeminiChat/ContentGenerator. The model returns either text or one or more tool_call requests.
Tool scheduling. The scheduler enqueues tool calls. The unit test scheduleToolCalls should not modify the input requests array enforces that the scheduler treats its input as read-only, which is critical for checkpoint/restore semantics. Source: packages/a2a-server/src/agent/task.test.ts.
Confirmation gate. For sensitive tools, the scheduler consults the policy engine and ApprovalMode. Outcomes are expressed via ToolConfirmationOutcome. For A2A, the task is event-driven, so any status change is published on the ExecutionEventBus for the client. Source: packages/a2a-server/src/agent/task-event-driven.test.ts.
Tool execution and streaming. Built-in tools, MCP tools, and extension tools all share the Tool<T> shape. Their action receives validated params (inferred from the Zod schema) and an optional SessionContext, and returns a value that the core serializes back to the model. If sendErrorsToModel is true, thrown errors are surfaced to the model so it can self-correct; the SDK also exposes a ModelVisibleError for the same purpose. Source: packages/sdk/src/tool.ts.
Loop or terminate. Tool results are appended to the conversation and the scheduler issues the next model turn. The loop ends when the model returns a final assistant message with no further tool calls.
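The steps above can be condensed into a small synchronous sketch. All names are hypothetical: `runAgentLoop`, `ModelTurn`, `Decision`, and the `checkPolicy` and `confirm` callbacks. The model is a scripted stub in place of GeminiChat/ContentGenerator, and real tool execution is asynchronous. The sketch illustrates three properties described above: requests are treated as read-only, a policy gate runs before every tool, and the loop ends on a turn with no tool calls.

```typescript
// Hypothetical shapes for illustration; not the gemini-cli-core API.
interface ToolCallRequest { name: string; args: Record<string, unknown> }
interface ModelTurn { text?: string; toolCalls: readonly ToolCallRequest[] }
type Decision = "allow" | "ask_user" | "deny";
interface ChatMessage { role: "user" | "model" | "tool"; content: string }

type Model = (history: readonly ChatMessage[]) => ModelTurn; // stub for GeminiChat
type ToolImpl = (args: Record<string, unknown>) => string;

function runAgentLoop(
  prompt: string,
  model: Model,
  tools: Record<string, ToolImpl>,
  checkPolicy: (call: ToolCallRequest) => Decision, // stands in for policy engine + ApprovalMode
  confirm: (call: ToolCallRequest) => boolean, // stands in for the user's answer
  maxTurns = 10,
): ChatMessage[] {
  const history: ChatMessage[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const result = model(history);
    if (result.text) history.push({ role: "model", content: result.text });
    // Terminate on a turn that requests no further tools.
    if (result.toolCalls.length === 0) return history;

    // The requests are only read, never mutated (they are typed readonly).
    for (const call of result.toolCalls) {
      const decision = checkPolicy(call);
      const approved = decision === "allow" || (decision === "ask_user" && confirm(call));
      const impl = tools[call.name] as ToolImpl | undefined;
      let output: string;
      if (!approved) output = `Tool call ${call.name} was rejected.`;
      else if (!impl) output = `Unknown tool: ${call.name}`;
      else output = impl(call.args);
      // Results are appended so the next model turn can see them.
      history.push({ role: "tool", content: output });
    }
  }
  return history; // turn budget exhausted
}

// Scripted model: the first turn requests two tools, the second answers.
const requests = Object.freeze([
  { name: "read_file", args: { path: "README.md" } },
  { name: "run_shell_command", args: { command: "rm -rf build" } },
]);
let modelCalls = 0;
const scriptedModel: Model = () =>
  modelCalls++ === 0 ? { toolCalls: requests } : { text: "Done.", toolCalls: [] };

const transcript = runAgentLoop(
  "Clean the build directory",
  scriptedModel,
  { read_file: () => "# Project", run_shell_command: () => "removed" },
  (call) => (call.name === "run_shell_command" ? "ask_user" : "allow"),
  () => false, // the user declines the shell command
);
```

In this sketch, the read is allowed and the `rm -rf` call is routed to confirmation and declined. The rejection is returned to the model as a tool result instead of aborting the turn.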

4. Extension Points, Safety, and Community Concerns

Gemini CLI is designed to be extended without forking the core:

MCP servers. A custom MCP server can be linked as an extension. The reference example at packages/cli/src/commands/extensions/examples/mcp-server exposes a fetch_posts tool and a poem-writer prompt via @modelcontextprotocol/sdk, and is registered with a gemini-extension.json manifest. Source: packages/cli/src/commands/extensions/examples/mcp-server/README.md. The TestMcpServerBuilder in packages/test-utils lets unit tests assemble a test server with addTool(name, description, response, inputSchema?), where response can be a string or a structured { content: [{type:'text', text}] } object. Source: packages/test-utils/src/test-mcp-server.ts.
Skills. A newer SDK primitive, skillDir(path): { type: 'dir', path }, references a directory that contributes prompts, tools, and behaviors to the agent. Source: packages/sdk/src/skills.ts. This is the in-tree answer to community request #11506 ("Add Skill to Gemini CLI like Claude Code").
Policies. TOML-based rules and safety checkers load from extension policies/ directories. The example adds an rm -rf confirmation rule, a grep-for-secrets denial, and a path validator for write operations. The runtime deliberately ignores extension-supplied allow and yolo decisions, so extensions can only harden, never weaken, the safety floor. Source: packages/cli/src/commands/extensions/examples/policies/README.md.
Headless and IDE surfaces. A2A exposes the same Task over HTTP/JSON-RPC with a Coder Agent card (skills: code_generation, with examples like *"Write a python function to calculate fibonacci numbers."*), and ACP integrates Gemini CLI into IDEs by mapping internal tool kinds, file services, and slash commands (/memory, /init) into ACP primitives. Sources: packages/a2a-server/src/http/app.ts, packages/cli/src/acp/README.md.
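The `addTool` contract in the first item above accepts either a plain string or a structured `{ content: [{ type: 'text', text }] }` response. The sketch below shows how a builder can normalize both forms. It uses a hypothetical `MiniMcpServerBuilder` class and `build()` method rather than the actual test-utils API.

```typescript
// Hypothetical mini builder mirroring the addTool contract described above;
// this is not the packages/test-utils implementation.
interface TextContent { type: "text"; text: string }
interface ToolResponse { content: TextContent[] }

interface MockTool {
  name: string;
  description: string;
  response: ToolResponse;
  inputSchema?: Record<string, unknown>; // JSON Schema, when provided
}

class MiniMcpServerBuilder {
  private readonly tools: MockTool[] = [];

  addTool(
    name: string,
    description: string,
    response: string | ToolResponse,
    inputSchema?: Record<string, unknown>,
  ): this {
    // A bare string becomes a single text content block.
    const normalized: ToolResponse =
      typeof response === "string" ? { content: [{ type: "text", text: response }] } : response;
    this.tools.push({ name, description, response: normalized, inputSchema });
    return this;
  }

  // Return a snapshot so later addTool calls do not alter a built config.
  build(): { tools: readonly MockTool[] } {
    return { tools: [...this.tools] };
  }
}

const mockConfig = new MiniMcpServerBuilder()
  .addTool("fetch_posts", "Fetch recent posts", "[]")
  .addTool(
    "echo",
    "Echo a message",
    { content: [{ type: "text", text: "hi" }] },
    { type: "object", properties: { message: { type: "string" } } },
  )
  .build();
```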

Community-driven limitations reflected in the architecture

Plan Mode (#4666). The community has repeatedly requested a Claude Code-style "plan only" mode that blocks write tools. The current agent loop has no built-in plan/read-only mode. However, the primitives one would need already exist: ApprovalMode, ToolConfirmationOutcome, and the policy engine's deny decisions. A plan mode could therefore plausibly be built as a higher-priority ApprovalMode plus a tool-class allowlist. Source: packages/a2a-server/src/agent/task-event-driven.test.ts, packages/cli/src/commands/extensions/examples/policies/README.md.
Destructive automation (#26856). Reports of the agent deleting user data in unattended runs underscore why ToolConfirmationOutcome and the policy engine sit *inside* the scheduler rather than at the frontend. Confirmations are required for any tool that the policy engine flags, regardless of whether the call originated in the TUI, ACP, or A2A HTTP. Source: packages/a2a-server/src/agent/task-event-driven.test.ts.
Encoding regressions (#2553). The files reviewed here do not show an encoding-specific fix for WriteFile corruption in non-UTF-8 contexts. The relevant mechanism in packages/sdk/src/tool.ts is error visibility. When a tool throws ModelVisibleError, or is defined with sendErrorsToModel: true, the failure is reported to the model so it can notice and correct the problem. Source: packages/sdk/src/tool.ts.
Roadmap tracking (#4191). The repository maintains a public roadmap project. Releases are published automatically by gemini-cli-robot, and recent release notes cover items such as a PR size labeler, batch workflows, and a fork-PR write-access fix. Source: README.md.

A complementary maintenance surface, tools/gemini-cli-bot, runs a three-phase loop — deterministic metric collection, root-cause reasoning, and a critique/publish step — that proposes and ships improvements to the same agent loop. Source: tools/gemini-cli-bot/README.md.

Built-in Tools & MCP Integration

Related topics: Plan Mode & Skills System

Overview

Gemini CLI exposes its capabilities to the underlying model through a single, well-typed tool system. Every action the agent can take — reading a file, running a shell command, fetching a URL, or invoking an external integration — is modelled as a Tool and registered with a central registry that the model can call on the user's behalf. The same abstraction is reused for first-party built-in tools and for third-party Model Context Protocol (MCP) servers, giving extension authors and integrators a uniform surface for adding new capabilities.

The repository ships:

A declarative tool SDK for authoring tools with Zod-validated inputs (packages/sdk/src/tool.ts).
A set of built-in tools for file system, shell, and web operations, documented in the project README (README.md).
An MCP server extension mechanism that loads tools dynamically from local or remote MCP servers (packages/cli/src/commands/extensions/examples/mcp-server/README.md).
A policy engine for adding safety rules and checkers that gate every tool call (packages/cli/src/commands/extensions/examples/policies/README.md).
A test MCP server builder for integration testing (packages/test-utils/src/test-mcp-server.ts).

Tool Definition Framework

The SDK provides a small but complete contract for defining a tool. A tool is the combination of a ToolDefinition (a name, a human-readable description, and a Zod input schema) and an action function that performs the work:

```typescript
// Source: packages/sdk/src/tool.ts (ToolDefinition, Tool, tool factory)
import { z, tool } from '@google/gemini-cli-sdk';

const getWeather = tool(
  {
    name: 'get_weather',
    description: 'Get the current weather for a location',
    inputSchema: z.object({ city: z.string() }),
  },
  async (params) => `Weather in ${params.city}: Sunny, 25°C`,
);
```

Three additional contract details matter when authoring tools:

sendErrorsToModel — when true, exceptions thrown from action are surfaced to the model as part of the conversation so it can retry or self-correct; the default is false (packages/sdk/src/tool.ts).
ModelVisibleError — a special error class that, when thrown, is *always* reported back to the model, regardless of the sendErrorsToModel flag (packages/sdk/src/tool.ts).
SessionContext — an optional second argument passed to action, giving the tool access to the active session's filesystem, shell, and shared state (packages/sdk/src/tool.ts).
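The interaction between `sendErrorsToModel` and `ModelVisibleError` can be sketched as a wrapper around a tool's action. This is an illustrative re-implementation of the semantics described above, not the SDK's code. `ModelVisibleError` here is a local stand-in class, and `runToolAction` and `ToolOutcome` are hypothetical. The text does not say what the real runtime does with errors that are not sent to the model, so the sketch simply returns them to the caller.

```typescript
// Local stand-in for the SDK's ModelVisibleError; illustrative only.
class ModelVisibleError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ModelVisibleError";
    Object.setPrototypeOf(this, ModelVisibleError.prototype); // keeps instanceof working on ES5 targets
  }
}

// Hypothetical outcome type for this sketch.
type ToolOutcome =
  | { kind: "result"; value: unknown }
  | { kind: "error_for_model"; message: string } // fed back to the model as the tool result
  | { kind: "error_hidden_from_model"; message: string }; // left to the host to handle

async function runToolAction<P>(
  action: (params: P) => Promise<unknown>,
  params: P,
  sendErrorsToModel = false,
): Promise<ToolOutcome> {
  try {
    return { kind: "result", value: await action(params) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // ModelVisibleError always reaches the model; other errors only when opted in.
    if (err instanceof ModelVisibleError || sendErrorsToModel) {
      return { kind: "error_for_model", message };
    }
    return { kind: "error_hidden_from_model", message };
  }
}

const missingFile = async (p: { path: string }): Promise<unknown> => {
  throw new ModelVisibleError(`File not found: ${p.path}; try listing the directory first.`);
};
const diskFailure = async (): Promise<unknown> => {
  throw new Error("disk unavailable");
};
```

The `missingFile` action shows the intended use: its message gives the model something actionable to retry with.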

This contract is what makes it possible to mix built-in tools and dynamically loaded MCP tools in a single registry without special-casing either.
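A sketch of such a registry follows. It assumes a hypothetical `RegisteredTool` shape and an `McpCall` function that stands in for a real MCP client's tool invocation. The caller (ultimately the model) sees the same declaration and call path for a built-in tool and an MCP-backed one.

```typescript
// Hypothetical uniform contract shared by built-in and MCP-backed tools.
interface RegisteredTool {
  name: string;
  description: string;
  source: "builtin" | "mcp";
  invoke(args: Record<string, unknown>): Promise<string>;
}

class ToolRegistry {
  private readonly tools = new Map<string, RegisteredTool>();

  // This sketch rejects duplicate names rather than silently overwriting.
  register(tool: RegisteredTool): void {
    if (this.tools.has(tool.name)) throw new Error(`Tool already registered: ${tool.name}`);
    this.tools.set(tool.name, tool);
  }

  // What the model is offered: names and descriptions, whatever the origin.
  declarations(): Array<{ name: string; description: string }> {
    return [...this.tools.values()].map(({ name, description }) => ({ name, description }));
  }

  call(name: string, args: Record<string, unknown>): Promise<string> {
    const tool = this.tools.get(name);
    if (!tool) return Promise.reject(new Error(`Unknown tool: ${name}`));
    return tool.invoke(args);
  }
}

// Stand-in for a real MCP client's tool invocation (assumption).
type McpCall = (toolName: string, args: Record<string, unknown>) => Promise<string>;

function mcpBackedTool(toolName: string, description: string, callMcp: McpCall): RegisteredTool {
  return { name: toolName, description, source: "mcp", invoke: (args) => callMcp(toolName, args) };
}

const registry = new ToolRegistry();
registry.register({
  name: "read_file",
  description: "Read a file from the workspace",
  source: "builtin",
  invoke: async (args) => `contents of ${String(args.path)}`,
});
registry.register(mcpBackedTool("fetch_posts", "Fetch posts", async (toolName) => `${toolName}: []`));
```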

Built-in Tools

The CLI ships with several built-in tools that the model can invoke without any user setup. They are surfaced in the documentation as three top-level categories (README.md):

| Category | Examples | Notes |
| --- | --- | --- |
| File System Operations | Read, write, edit files in the workspace | Subject to trusted-folder and policy rules |
| Shell Commands | Run shell commands, capture output | Supports sandboxing and confirmation prompts |
| Web Fetch & Search | Fetch URLs, perform grounded search | Used for real-time information lookup |

A separate code_generation skill is exposed by the A2A server's coderAgentCard for the Coder Agent, which "Generates code snippets or complete files based on user requests, streaming the results" (packages/a2a-server/src/http/app.ts). The card also declares the supported inputModes (text) and outputModes (text), which lets A2A clients understand the agent's interface before invoking it.
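For readers new to A2A, the sketch below gives a typed view of the kind of object `coderAgentCard` describes. The field names are modelled on the public A2A AgentCard schema and should be checked against the A2A specification. Apart from the skill description and example quoted in this section, every value (name, URL, version, tags) is a placeholder, not the server's real configuration. `acceptsText` is a hypothetical client-side check.

```typescript
// Field names modelled on the A2A AgentCard schema; values other than the
// quoted skill description and example are placeholders.
interface AgentSkill {
  id: string;
  name: string;
  description: string;
  tags: string[];
  examples?: string[];
  inputModes?: string[];
  outputModes?: string[];
}

interface AgentCard {
  name: string;
  description: string;
  url: string;
  version: string;
  capabilities: { streaming?: boolean };
  defaultInputModes: string[];
  defaultOutputModes: string[];
  skills: AgentSkill[];
}

const coderCardSketch: AgentCard = {
  name: "Coder Agent", // placeholder
  description: "Code-generation agent backed by Gemini CLI", // placeholder
  url: "http://localhost:8080/", // placeholder host and port
  version: "0.0.0", // placeholder
  capabilities: { streaming: true }, // the skill description mentions streaming
  defaultInputModes: ["text"],
  defaultOutputModes: ["text"],
  skills: [
    {
      id: "code_generation",
      name: "Code Generation", // placeholder
      description:
        "Generates code snippets or complete files based on user requests, streaming the results",
      tags: ["code"], // placeholder
      examples: ["Write a python function to calculate fibonacci numbers."],
      inputModes: ["text"],
      outputModes: ["text"],
    },
  ],
};

// Client-side check before sending a text prompt to a given skill.
function acceptsText(card: AgentCard, skillId: string): boolean {
  const skill = card.skills.find((s) => s.id === skillId);
  if (!skill) return false;
  return (skill.inputModes ?? card.defaultInputModes).includes("text");
}
```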

MCP Server Integration

MCP is the official extension protocol for adding new tools and prompts. The CLI ships a runnable example under packages/cli/src/commands/extensions/examples/mcp-server/ that demonstrates a minimal server built on @modelcontextprotocol/sdk and exposes a fetch_posts tool and a poem-writer prompt (packages/cli/src/commands/extensions/examples/mcp-server/README.md).

The example's manifest is gemini-extension.json, and its only runtime dependency besides the MCP SDK is zod for input validation (packages/cli/src/commands/extensions/examples/mcp-server/package.json). Once installed, tools and prompts declared by the server appear in the CLI as if they were built-in tools.

End users configure MCP servers in ~/.gemini/settings.json and invoke their tools with the @server shorthand:

> @github List my open pull requests
> @slack Send a summary of today's commits to #dev channel
> @database Run a query to find inactive users

Source: README.md. For tests, packages/test-utils/src/test-mcp-server.ts ships a TestMcpServerBuilder that lets test code register mocked tools (addTool(name, description, response, inputSchema)) and assemble a complete TestMcpConfig without standing up a real server.
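The `mcpServers` section of `~/.gemini/settings.json` ties a server name to how the CLI launches or reaches that server. A typed sketch follows. The `command`, `args`, `env`, and `url` keys reflect the commonly documented local (stdio) and remote forms, but treat the exact key set as an assumption and confirm it against the configuration documentation. `validateMcpServers` and its exactly-one-transport rule are hypothetical.

```typescript
// Assumed shape of the mcpServers section of ~/.gemini/settings.json; confirm
// the exact keys against the CLI's configuration documentation.
interface McpServerConfig {
  command?: string; // stdio transport: executable to spawn
  args?: string[];
  env?: Record<string, string>;
  url?: string; // remote transport: server endpoint
}

interface GeminiSettings {
  mcpServers?: Record<string, McpServerConfig>;
}

// Hypothetical check: this sketch requires exactly one of command or url.
function validateMcpServers(settings: GeminiSettings): string[] {
  const problems: string[] = [];
  const servers: Record<string, McpServerConfig> = settings.mcpServers ?? {};
  for (const [serverName, cfg] of Object.entries(servers)) {
    const hasCommand = typeof cfg.command === "string" && cfg.command.length > 0;
    const hasUrl = typeof cfg.url === "string" && cfg.url.length > 0;
    if (hasCommand === hasUrl) problems.push(`${serverName}: set exactly one of "command" or "url"`);
  }
  return problems;
}

// Assumed: the key (e.g. "github") is the name used after "@" in prompts.
const exampleSettings: GeminiSettings = {
  mcpServers: {
    github: {
      command: "node",
      args: ["path/to/github-mcp-server.js"], // hypothetical path
      env: { GITHUB_TOKEN: "<token>" }, // placeholder value
    },
    misconfigured: {},
  },
};
```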

The Agent Client Protocol (ACP) layer is the analogous integration path for IDE-style clients. The ACP dispatcher, session manager, and filesystem service all share the same SessionContext and tool registry as the headless CLI (packages/cli/src/acp/README.md), which means a tool that works in the terminal works in an ACP host unchanged.

Policy Engine & Safety

Every tool call — built-in or MCP — is funnelled through a policy engine that can require confirmation, deny a call, or run a safety checker against the proposed arguments. The policies example extension contributes three concrete rules (packages/cli/src/commands/extensions/examples/policies/README.md):

A rule that requires user confirmation for rm -rf commands.
A rule that denies grep searches for sensitive files (e.g. .env).
An allowed-path safety checker that validates the destination path of every write operation.

Critically, "Gemini CLI ignores any allow decisions or yolo mode configurations contributed by extensions. This ensures that extensions can strengthen security but cannot bypass user confirmation." Source: packages/cli/src/commands/extensions/examples/policies/README.md.
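The three rules and the extension guarantee can be modelled in a few lines. This is a simplified sketch, not the policy engine's implementation. The `PolicyRule` fields, the decision names, the `ask_user` fallback, and the "strictest matching decision wins" resolution are all assumptions. Its one firm property matches the quoted guarantee: `allow` rules from extensions are discarded. An extension can therefore add `ask_user` or `deny` rules but cannot loosen the outcome.

```typescript
// Simplified model of the policy behaviour described above; field names,
// decision names, and "strictest match wins" are assumptions of this sketch.
type PolicyDecision = "allow" | "ask_user" | "deny";

interface PolicyRule {
  toolName: string;
  argsPattern?: RegExp; // matched against the JSON-serialized arguments
  decision: PolicyDecision;
  origin: "user" | "extension";
}

const STRICTNESS: Record<PolicyDecision, number> = { allow: 0, ask_user: 1, deny: 2 };

// Extensions may only tighten: their allow rules are discarded.
function mergeRules(userRules: PolicyRule[], extensionRules: PolicyRule[]): PolicyRule[] {
  return [...userRules, ...extensionRules.filter((r) => r.decision !== "allow")];
}

function evaluate(
  rules: readonly PolicyRule[],
  toolName: string,
  args: Record<string, unknown>,
  fallback: PolicyDecision = "ask_user",
): PolicyDecision {
  const serialized = JSON.stringify(args);
  const matches = rules.filter(
    (r) => r.toolName === toolName && (r.argsPattern === undefined || r.argsPattern.test(serialized)),
  );
  if (matches.length === 0) return fallback;
  // The strictest matching decision wins, so an added deny cannot be outvoted.
  return matches.reduce<PolicyDecision>(
    (worst, r) => (STRICTNESS[r.decision] > STRICTNESS[worst] ? r.decision : worst),
    "allow",
  );
}

const rules = mergeRules(
  [{ toolName: "read_file", decision: "allow", origin: "user" }],
  [
    { toolName: "run_shell_command", argsPattern: /rm\s+-rf/, decision: "ask_user", origin: "extension" },
    { toolName: "grep", argsPattern: /\.env\b/, decision: "deny", origin: "extension" },
    { toolName: "run_shell_command", decision: "allow", origin: "extension" }, // dropped by mergeRules
  ],
);
```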

Community Considerations

Several long-running community discussions are directly tied to this tool surface and are worth keeping in mind:

Plan Mode (#4666) — users want a mode that blocks write tools so the agent can plan complex changes safely before executing them. This is essentially a policy-engine feature that gates a class of tools.
Skills system (#11506) — request to add a Claude-Code-style "skills" abstraction. Because skills, MCP prompts, and built-in tools all share the same registry, a skills system would be a new registration path rather than a parallel infrastructure.
Destructive write actions (#26856) — the project's allowed-path safety checker and rm -rf confirmation policy are the user-facing mitigations for this concern; the policy engine's stance that extensions cannot grant allow decisions is the architectural guarantee behind them.

Plan Mode & Skills System

Related topics: Built-in Tools & MCP Integration, Security: Sandbox, Policy Engine & Hooks

Gemini CLI exposes two closely related extensibility surfaces that the community has repeatedly highlighted as priorities: a Skills system for packaging prompts, tools, and behaviors, and a Plan Mode pattern for safely scoping agent writes. The source tree today contains the SDK-level primitives for skills and an existing ApprovalMode infrastructure that a Plan Mode builds on, while the user-facing Plan Mode workflow (e.g. a Shift+Tab cycle) is still primarily a community feature request tracked in issue #4666.

Skills System

SDK Primitives

The @google/gemini-cli-sdk package defines a minimal but typed surface for declaring skills. A skill is a directory reference that an agent can load to gain additional prompts, tools, and behaviors.

```typescript
// packages/sdk/src/skills.ts
export type SkillReference = { type: 'dir'; path: string };

export function skillDir(path: string): SkillReference {
  return { type: 'dir', path };
}
```

SkillReference is a tagged object type: a type: 'dir' tag plus a path field, with 'dir' currently its only variant. skillDir() is a small factory that returns a properly typed reference. The same approach underlies the Claude Code-style "Skills" feature the community is requesting in issue #11506.

Integration with Agent Options

SkillReference values are designed to be passed into a GeminiCliAgent configuration alongside instructions, tools, and other options. The agent options interface in packages/sdk/src/types.ts co-locates these settings:

```typescript
// packages/sdk/src/types.ts
export interface GeminiCliAgentOptions {
  instructions: SystemInstructions;
  // ... tools, skill references, etc.
}
```

This positions the Skills system as a first-class extension point rather than a hard-coded list, so extension authors and downstream applications can compose agents with bespoke capability bundles.
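Because `instructions` can be a function of the session context, sanitize any context data before interpolating it. packages/sdk/src/types.ts advises stripping newlines and `]` and escaping `<` and `>`. The minimal sketch below uses local stand-in types; the `cwd` and `userName` fields are hypothetical and are not claimed to be `SessionContext` members.

```typescript
// Local stand-ins; the real SessionContext and SystemInstructions types live in
// packages/sdk/src/types.ts. The fields below are hypothetical.
interface SessionContextLike {
  cwd: string;
  userName: string;
}
type SystemInstructionsLike =
  | string
  | ((context: SessionContextLike) => string | Promise<string>);

// The sanitization steps the SDK docs describe: strip newlines and "]",
// escape "<" and ">" (HTML entities are this sketch's choice of escape).
function sanitizeForPrompt(value: string): string {
  return value
    .replace(/[\r\n]/g, "")
    .replace(/\]/g, "")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const dynamicInstructions: SystemInstructionsLike = (ctx) =>
  `You are a coding assistant working in [${sanitizeForPrompt(ctx.cwd)}] ` +
  `for user ${sanitizeForPrompt(ctx.userName)}.`;

// Resolve either form to the final instruction text.
async function resolveInstructions(
  instructions: SystemInstructionsLike,
  ctx: SessionContextLike,
): Promise<string> {
  return typeof instructions === "string" ? instructions : await instructions(ctx);
}
```

Removing `]` matters here. Without it, a crafted directory name could close the bracketed field early and append text that the model might read as an instruction. This sketch does not escape `&`, since the guidance above names only `<` and `>`.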

Plan Mode

Approval Mode Foundation

The repository already ships an ApprovalMode type and a policy engine that drive the CLI's confirmation prompts. The a2a-server tests import ApprovalMode and ToolConfirmationOutcome directly, confirming that the core loop is: model proposes a tool call → user (or policy) approves or denies → the call is executed or rejected.

```typescript
// packages/a2a-server/src/agent/task-event-driven.test.ts
import { ApprovalMode, ToolConfirmationOutcome, Scheduler } from '@google/gemini-cli-core';
```

The policy engine is documented in the policies extension example. It states that extensions can strengthen security but cannot contribute allow decisions or yolo mode: "For security, Gemini CLI ignores any allow decisions or yolo mode configurations contributed by extensions." Source: packages/cli/src/commands/extensions/examples/policies/README.md.

This existing pipeline is the natural substrate for a Claude Code-style Plan Mode (issue #4666), where a separate "plan" approval tier would block write tools until the user reviews and confirms the proposed diff.

Tasks, Schedulers, and Write Tool Gating

Long-running agent work is mediated by the Task class. Task wraps a Scheduler and subscribes to MessageBusType.TOOL_CALLS_UPDATE to map tool status changes onto A2A events:

```typescript
// packages/a2a-server/src/agent/task-event-driven.test.ts
it('should subscribe to TOOL_CALLS_UPDATE and map status changes', ...);
```

When the event-driven scheduler is enabled (isEventDrivenSchedulerEnabled: () => true), Task instantiates a real Scheduler rather than running tools sequentially. A Plan Mode implementation can hook into the same scheduler to short-circuit destructive tool calls (write_file, replace, run_shell_command) and force a "review plan" step before resuming the loop.
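A sketch of that short-circuit follows. It assumes a hypothetical `"plan"` mode alongside default, auto-edit, and yolo-style modes, and uses the three mutating tool names from the paragraph above. The `Mode` type, `gateToolCall`, and `partitionForPlan` are illustrative names; the real `ApprovalMode` enum and scheduler hooks may differ.

```typescript
// Hypothetical approval modes; the real ApprovalMode enum may differ.
type Mode = "default" | "auto_edit" | "yolo" | "plan";

// Mutating tools named in the text above.
const MUTATING_TOOLS: ReadonlySet<string> = new Set(["write_file", "replace", "run_shell_command"]);

interface PendingCall { name: string; args: Record<string, unknown> }

type Gate =
  | { action: "execute" }
  | { action: "confirm" }
  | { action: "defer_to_plan"; summary: string };

function gateToolCall(mode: Mode, call: PendingCall): Gate {
  // Read-only tools pass through in every mode.
  if (!MUTATING_TOOLS.has(call.name)) return { action: "execute" };
  switch (mode) {
    case "plan":
      // Record the intended change for review instead of performing it.
      return { action: "defer_to_plan", summary: `${call.name} ${JSON.stringify(call.args)}` };
    case "yolo":
      return { action: "execute" };
    case "auto_edit":
      // Assumed: file edits are auto-approved, shell commands still ask.
      return call.name === "run_shell_command" ? { action: "confirm" } : { action: "execute" };
    default:
      return { action: "confirm" };
  }
}

// Split a batch into calls forwarded to the normal path and plan entries.
function partitionForPlan(mode: Mode, calls: readonly PendingCall[]): { forwarded: PendingCall[]; plan: string[] } {
  const forwarded: PendingCall[] = [];
  const plan: string[] = [];
  for (const call of calls) {
    const gate = gateToolCall(mode, call);
    if (gate.action === "defer_to_plan") plan.push(gate.summary);
    else forwarded.push(call);
  }
  return { forwarded, plan };
}
```

Gating by tool name is the weak point of this sketch. An MCP or extension tool that writes files under a different name would pass straight through, so a real Plan Mode would need to classify tools by what they do rather than by a fixed list.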

Tools as the Common Language

Plan Mode and Skills both ultimately operate on Tool definitions. The SDK exposes a tool() factory that pairs a Zod-validated schema with an async action:

```typescript
// packages/sdk/src/tool.ts
export function tool<T extends z.ZodTypeAny>(
  definition: ToolDefinition<T>,
  action: (params: z.infer<T>, context?: SessionContext) => Promise<unknown>,
): Tool<T> { ... }
```

A ModelVisibleError is provided so a tool can surface actionable failures back to the model, which is critical in plan-then-execute flows where the model must be able to self-correct after a user rejects a plan.

End-to-End Relationship

```mermaid
flowchart LR
  User[User] -->|prompt| Agent[GeminiCliAgent]
  Agent -->|reads| Skills[SkillReference<br/>skillDir path]
  Agent -->|invokes| Tools[Tool definitions<br/>via tool factory]
  Tools -->|confirmation| Approval{ApprovalMode<br/>+ Policy Engine}
  Approval -->|plan only| PlanReview[Plan Mode<br/>#4666]
  PlanReview -->|approved| Execute[Scheduler / Task]
  Approval -->|allow| Execute
  Approval -->|deny| Reject[ToolConfirmationOutcome denied]
  Execute -->|result| Agent
```

Community Context and Caveats

Plan Mode is the most-discussed open feature (issue #4666) and the SDK does not yet ship a public enter_plan_mode tool in the files reviewed; implementers should expect to layer it on top of ApprovalMode and the policy engine rather than a parallel system.
Skills have first-class SDK types (packages/sdk/src/skills.ts) but the loader, manager, and CLI command wiring referenced in the request (#11506) are not present in the files inspected here, so any integration should verify the current implementation status before depending on it.
The same write-tool surface that Plan Mode needs to gate is also reachable through MCP servers and custom extensions (README.md, mcp-server example), so a Plan Mode toggle must remain authoritative over extension-supplied tools to avoid the "agent deleted thousands of files" scenario reported in issue #26856.

Security: Sandbox, Policy Engine & Hooks

Related topics: Plan Mode & Skills System, Sessions, Checkpointing & Rewind

Overview

Gemini CLI exposes a layered security model that combines OS-level sandboxing, a TOML-driven policy engine, a typed tool/hook system, and explicit authentication boundaries for agent endpoints. The sandbox isolates shell execution, the policy engine gates tool invocations against user-contributed rules, and the tool/hook layer enforces schema validation and exposes structured error feedback. Trusted folders, MCP server trust, and prompt-injection-safe system-instruction construction round out the model. Source: README.md

The architecture is intentionally extensible: extensions can contribute *additional* rules and safety checkers, but they are deliberately forbidden from weakening the base policy. As documented in the example extension, "For security, Gemini CLI ignores any allow decisions or yolo mode configurations contributed by extensions. This ensures that extensions can strengthen security but cannot bypass user confirmation." Source: packages/cli/src/commands/extensions/examples/policies/README.md
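The "strengthen but never weaken" rule can be sketched as a merge step that discards permissive contributions. The rule shape, `applyExtension`, and the `yolo` field below are hypothetical illustrations of the documented behavior, not the policy engine's actual data model.

```typescript
// Hypothetical rule shape; the real engine loads TOML and tracks more fields.
type RuleDecision = "allow" | "deny" | "ask_user";

interface PolicyRule {
  toolName: string;
  decision: RuleDecision;
}

interface PolicyState {
  rules: PolicyRule[];
  yolo: boolean; // only the user can turn this on
}

interface ExtensionContribution {
  rules: PolicyRule[];
  yolo?: boolean;
}

function applyExtension(base: PolicyState, ext: ExtensionContribution): PolicyState {
  // Keep only rules that can tighten policy; drop any "allow" from the extension.
  const tightening = ext.rules.filter((rule) => rule.decision !== "allow");
  // ext.yolo is deliberately ignored, so yolo stays whatever the user chose.
  return { rules: [...base.rules, ...tightening], yolo: base.yolo };
}
```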

Policy Engine

The Policy Engine is the central rule evaluator for tool invocations. It accepts policy files written in TOML and supports three distinct rule classes, as demonstrated in the bundled policies/ example extension:

Rule Class	Example Behavior	Source
Confirmation	Requires user confirmation for destructive shell commands (e.g., `rm -rf`).	policies/README.md
Deny	Rejects searches over sensitive files (e.g., `grep` against `.env`) and surfaces a custom deny message.	policies/README.md
Safety Checker	Validates file paths for every write operation via an `allowed-path` checker.	policies/README.md
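The confirmation and deny classes in the table can be illustrated with an in-memory evaluator. This is a sketch under stated assumptions: the rule fields, the `shell` and `grep` tool names, and the "most restrictive match wins" resolution are hypothetical, and the real engine's TOML keys and priority handling may differ.

```typescript
// Hypothetical in-memory form of policy rules; the real TOML keys, matching
// options, and priority handling may differ.
type Verdict = "allow" | "deny" | "ask_user";

interface MatchRule {
  toolName: string;
  argsPattern?: RegExp; // tested against the serialized tool arguments
  decision: Verdict;
  denyMessage?: string;
}

interface Evaluation {
  decision: Verdict;
  message?: string;
}

// Assumption: when several rules match, the most restrictive one wins.
function evaluateRules(rules: MatchRule[], toolName: string, args: string, fallback: Verdict = "ask_user"): Evaluation {
  const matches = rules.filter(
    (r) => r.toolName === toolName && (r.argsPattern === undefined || r.argsPattern.test(args)),
  );
  const deny = matches.find((r) => r.decision === "deny");
  if (deny) return { decision: "deny", message: deny.denyMessage };
  if (matches.some((r) => r.decision === "ask_user")) return { decision: "ask_user" };
  if (matches.some((r) => r.decision === "allow")) return { decision: "allow" };
  return { decision: fallback };
}

// Rules modeled on the confirmation and deny examples in the table.
const exampleRules: MatchRule[] = [
  { toolName: "shell", argsPattern: /\brm\s+-rf\b/, decision: "ask_user" },
  { toolName: "shell", argsPattern: /^ls\b/, decision: "allow" },
  { toolName: "grep", argsPattern: /\.env\b/, decision: "deny", denyMessage: "Searching .env files is not allowed." },
];
```

Here a `grep` over `.env` is denied with the custom message, `rm -rf build` requires confirmation, and an unmatched tool falls back to asking the user.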

Extensions contribute policies through a policies/ directory containing .toml files and a gemini-extension.json manifest. After installing with gemini extensions link, the policies become active for the next session. The engine is consulted before tool execution, which means even MCP-served tools go through the same gating path. Source: packages/cli/src/commands/extensions/examples/policies/README.md, packages/cli/src/commands/extensions/examples/mcp-server/README.md

A simplified request flow through these layers looks like:

flowchart LR
  A[Model emits tool call] --> B[Tool input validated by Zod schema]
  B --> C{Policy Engine check}
  C -- deny --> X[Reject with deny message]
  C -- allow + needs confirm --> Y[Prompt user]
  C -- allow --> D{Safety checker(s)}
  D -- fail --> X
  D -- pass --> E[Execute in Sandbox]
  E --> F[Return result to model]
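The ordering in the flowchart can be expressed as a short pipeline. Each stage function below is a hypothetical stand-in for the real component named in the diagram (schema check, Policy Engine, user prompt, safety checkers, sandboxed execution), and the `/workspace` root used by the example allowed-path checker is an assumption.

```typescript
// Hypothetical stage functions; each stands in for the real component named
// in the flowchart above.
type StageVerdict = "allow" | "deny" | "ask_user";

interface PipelineCall {
  tool: string;
  args: Record<string, unknown>;
}

interface Stages {
  validate(call: PipelineCall): boolean;    // schema check
  policy(call: PipelineCall): StageVerdict; // Policy Engine
  confirm(call: PipelineCall): boolean;     // user prompt
  safetyCheckers: Array<(call: PipelineCall) => boolean>;
  execute(call: PipelineCall): string;      // sandboxed execution
}

type Outcome = { status: "rejected"; reason: string } | { status: "ok"; result: string };

function runToolCall(stages: Stages, call: PipelineCall): Outcome {
  if (!stages.validate(call)) return { status: "rejected", reason: "invalid input" };
  const verdict = stages.policy(call);
  if (verdict === "deny") return { status: "rejected", reason: "denied by policy" };
  // Assumption: a confirmed call still has to pass the safety checkers.
  if (verdict === "ask_user" && !stages.confirm(call)) return { status: "rejected", reason: "declined by user" };
  for (const check of stages.safetyCheckers) {
    if (!check(call)) return { status: "rejected", reason: "safety check failed" };
  }
  return { status: "ok", result: stages.execute(call) };
}

// Example allowed-path checker: writes must stay under a hypothetical /workspace root.
function allowedPathChecker(call: PipelineCall): boolean {
  const target = call.args["path"];
  return typeof target === "string" && target.startsWith("/workspace/") && !target.split("/").includes("..");
}
```

Placing execution last means no single stage can be skipped by an earlier `allow`: even a policy-approved write is refused if a safety checker fails.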

Tool & Hook Validation

Every tool declared through the SDK must conform to the ToolDefinition<T> interface, which couples a human-readable description with a Zod schema used for runtime validation. The Zod schema is converted to JSON Schema so the model can see parameter shapes, and the same schema is re-applied on the server side to refuse malformed invocations. Source: packages/sdk/src/tool.ts

Two flags are especially relevant to security:

sendErrorsToModel — when true, any exception thrown by the tool's action is surfaced to the model as part of the conversation instead of being swallowed, letting the model self-correct. Source: packages/sdk/src/tool.ts
ModelVisibleError — a dedicated error class whose message is guaranteed to be visible to Gemini. Use this when you intentionally want a failure to influence the next model turn. Source: packages/sdk/src/tool.ts
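The interaction between the two flags can be sketched as a wrapper around a tool action. `ModelVisibleErrorSketch` is a hypothetical local class standing in for the SDK's `ModelVisibleError`, and the generic message returned for non-visible errors is an assumption about what "swallowed" means in practice.

```typescript
// Hypothetical local class standing in for the SDK's ModelVisibleError; import
// the real class from the SDK in actual code.
class ModelVisibleErrorSketch extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ModelVisibleErrorSketch";
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working on ES5 targets
  }
}

interface ActionResult {
  modelContent: string; // what the model sees on its next turn
  isError: boolean;
}

async function runAction(action: () => Promise<string>, sendErrorsToModel: boolean): Promise<ActionResult> {
  try {
    return { modelContent: await action(), isError: false };
  } catch (err: unknown) {
    const message = err instanceof Error ? err.message : String(err);
    // ModelVisibleError always reaches the model; other errors only when opted in.
    if (err instanceof ModelVisibleErrorSketch || sendErrorsToModel) {
      return { modelContent: `Error: ${message}`, isError: true };
    }
    // Assumption: otherwise the model only learns that the call failed.
    return { modelContent: "Error: tool call failed", isError: true };
  }
}
```

Throwing a model-visible error with an actionable message (for example, naming a file that does exist) gives the model something concrete to correct on its next turn, while internal details of other errors stay hidden unless `sendErrorsToModel` is set.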

Skills augment the agent with extra prompts, tools, and behaviors and are loaded as directory references via skillDir(path). Because a skill can contribute prompts and tool definitions, its contents are subject to the same policy and schema validation path as built-in tools. Source: packages/sdk/src/skills.ts

The event-driven scheduler used in the a2a-server wires confirmation outcomes back into the agent loop. Messages of type TOOL_CALLS_UPDATE on the MessageBus propagate tool-call status changes, and the Scheduler re-injects the user decision (via ToolConfirmationOutcome and ApprovalMode) before the next tool call. Source: packages/a2a-server/src/agent/task-event-driven.test.ts
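A minimal publish/subscribe sketch shows how a user decision can become a status update that other components observe. `MiniBus`, the message shapes, and the outcome strings are hypothetical; the real MessageBus API and ToolConfirmationOutcome values may differ.

```typescript
// Hypothetical minimal bus; the real MessageBus API and message shapes may differ.
type BusMessage =
  | { type: "TOOL_CALLS_UPDATE"; callId: string; status: "awaiting_approval" | "scheduled" | "cancelled" }
  | { type: "TOOL_CONFIRMATION_RESPONSE"; callId: string; outcome: "proceed_once" | "cancel" };

type Handler = (message: BusMessage) => void;

class MiniBus {
  private handlers = new Map<BusMessage["type"], Handler[]>();

  subscribe(type: BusMessage["type"], handler: Handler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  publish(message: BusMessage): void {
    for (const handler of this.handlers.get(message.type) ?? []) handler(message);
  }
}

// Scheduler-like wiring: a user decision becomes a tool-call status update.
function wireScheduler(bus: MiniBus): void {
  bus.subscribe("TOOL_CONFIRMATION_RESPONSE", (m) => {
    if (m.type !== "TOOL_CONFIRMATION_RESPONSE") return;
    bus.publish({
      type: "TOOL_CALLS_UPDATE",
      callId: m.callId,
      status: m.outcome === "cancel" ? "cancelled" : "scheduled",
    });
  });
}
```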

Authentication, Sandbox & Trusted Boundaries

For agent-facing HTTP endpoints (the a2a-server), authentication is performed inside a UserBuilder that inspects the Authorization header and returns either a tagged user or an UnauthenticatedUser. The reference implementation recognizes two schemes:

Scheme	Example	Outcome	Source
`Bearer`	`Bearer valid-token`	`{ userName: 'bearer-user', isAuthenticated: true }`	packages/a2a-server/src/http/app.ts
`Basic`	`Basic` + `admin:password` (base64)	`{ userName: 'basic-user', isAuthenticated: true }`	packages/a2a-server/src/http/app.ts
(none / unknown)	—	`UnauthenticatedUser`	packages/a2a-server/src/http/app.ts

The header scheme is logged for observability, and downstream code can branch on the isAuthenticated flag. Source: packages/a2a-server/src/http/app.ts
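The header handling in the table can be re-created as a small function. This is a hypothetical sketch of the reference behavior, not the code in app.ts: the literal credentials are test fixtures, and a real deployment must verify tokens and passwords against an identity provider rather than compare strings.

```typescript
// Hypothetical re-creation of the reference behavior; the hard-coded
// credentials are fixtures, never a pattern for production.
interface AuthUser {
  userName?: string;
  isAuthenticated: boolean;
}

const UNAUTHENTICATED: AuthUser = { isAuthenticated: false };

function buildUser(authorization: string | undefined): AuthUser {
  if (!authorization) return UNAUTHENTICATED;
  const space = authorization.indexOf(" ");
  if (space < 0) return UNAUTHENTICATED;
  // HTTP auth scheme names are case-insensitive.
  const scheme = authorization.slice(0, space).toLowerCase();
  const credentials = authorization.slice(space + 1).trim();

  if (scheme === "bearer" && credentials === "valid-token") {
    return { userName: "bearer-user", isAuthenticated: true };
  }
  if (scheme === "basic") {
    let decoded = "";
    try {
      decoded = atob(credentials); // base64-encoded "user:password"
    } catch {
      return UNAUTHENTICATED; // malformed base64
    }
    if (decoded === "admin:password") return { userName: "basic-user", isAuthenticated: true };
  }
  return UNAUTHENTICATED;
}
```

Any unknown scheme, missing header, or malformed credential falls through to the unauthenticated result, so downstream code only needs to check `isAuthenticated`.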

For local execution, Gemini CLI ships sandbox documentation that covers platform-specific isolation, and the README links out to a dedicated "Sandboxing & Security" page. The sandbox sits *below* the policy engine — the engine decides *whether* a call is allowed, and the sandbox constrains *what the call can touch* at the OS level. Source: README.md

Two additional trust controls are surfaced in the documentation:

Trusted Folders — execution policies can be tuned per working directory, so an untrusted folder cannot silently inherit a permissive configuration. Source: README.md
ACP file system service — when the CLI runs in Agent Client Protocol mode, the file system is "restricted by the workspace boundaries and permissions," giving the IDE a hardened view of the agent's capabilities. Source: packages/cli/src/acp/README.md
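Both controls rest on the same primitive: deciding whether a path lies inside a trusted or workspace root. The helpers below are a hypothetical sketch for POSIX-style absolute paths that does not resolve symlinks (a real boundary check should). `effectiveApprovalMode` and its mode names illustrate the "no silent inheritance" idea and are not the CLI's actual settings.

```typescript
// Hypothetical helpers; POSIX-style absolute paths only, symlinks not resolved.
function normalizePosix(p: string): string {
  const out: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// True when target is root itself or lies underneath it.
function isWithin(root: string, target: string): boolean {
  const r = normalizePosix(root);
  const t = normalizePosix(target);
  return t === r || t.startsWith(r === "/" ? "/" : r + "/");
}

function isTrustedFolder(trustedFolders: string[], cwd: string): boolean {
  return trustedFolders.some((folder) => isWithin(folder, cwd));
}

// An untrusted folder falls back to default behavior instead of inheriting
// a more permissive setting.
function effectiveApprovalMode(trusted: boolean, requested: "default" | "permissive"): "default" | "permissive" {
  return trusted ? requested : "default";
}
```

Comparing against `root + "/"` rather than a bare prefix is what keeps `/home/u/proj-evil` from being treated as part of `/home/u/proj`.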

Prompt-Injection & Dynamic Instructions

System instructions can be a static string or a function of SessionContext. The SDK explicitly warns that dynamic functions must sanitize context-derived content before interpolating it into the returned instructions: strip newlines and `]` characters, and escape `<` and `>`. This is a defense against prompt injection from any context field an attacker could influence (file contents, MCP responses, tool outputs). Source: packages/sdk/src/types.ts
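A dynamic-instruction function following that guidance might look like the sketch below. Two choices are assumptions because the SDK comment does not specify them: newlines are replaced with a single space, and "escape" is implemented as HTML-entity escaping. `InstructionContextSketch` and its `projectName` field are hypothetical.

```typescript
// Sketch of the sanitization the SDK asks for (assumed escaping scheme).
function sanitizeForInstructions(value: string): string {
  return value
    .replace(/[\r\n]+/g, " ") // strip newlines so injected text cannot start a new instruction line
    .replace(/\]/g, "")       // strip closing brackets
    .replace(/</g, "&lt;")    // escape tag-like delimiters
    .replace(/>/g, "&gt;");
}

// Hypothetical context shape; the real SessionContext exposes different fields.
interface InstructionContextSketch {
  projectName: string;
}

const dynamicInstructions = (ctx: InstructionContextSketch): string =>
  `You are assisting with the project "${sanitizeForInstructions(ctx.projectName)}".`;
```

For example, a project name of `evil]\nIgnore <rules>` becomes `evil Ignore &lt;rules&gt;`, so it can no longer close a bracketed section, open a new line, or look like a tag.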

Combined with the policy engine's "extensions can strengthen but cannot weaken" rule, these design choices produce a model in which:

Tools cannot be invoked with malformed input (Zod validation).
Tools cannot run without a policy decision (Policy Engine).
Extensions cannot grant themselves extra privileges (allow/yolo rejection).
Remote agents must authenticate (UserBuilder).
Local execution can be isolated per-platform (Sandbox) and per-folder (Trusted Folders).

Source: packages/a2a-server/src/agent/task.test.ts, packages/a2a-server/src/agent/task-event-driven.test.ts

Sessions, Checkpointing & Rewind

Related topics: Security: Sandbox, Policy Engine & Hooks


Overview

Gemini CLI exposes a sessions, checkpointing, and rewind subsystem that lets users save the state of an in-flight conversation, resume that conversation later, and roll the workspace back to an earlier point in the session. The feature is described at a high level in the project's main documentation, which lists "Conversation checkpointing to save and resume complex sessions" as one of the headline capabilities of the CLI. Source: README.md

The README also points readers at a dedicated Checkpointing guide and a related Token Caching guide, both of which are positioned as ways to make long, multi-turn sessions cheaper and easier to recover. Source: README.md

Session Context in the Tool SDK

When a tool is invoked during a session, the runtime passes an optional SessionContext object to the tool's action function. The SDK documents this contract explicitly: the context provides "access to filesystem, shell, and other session state" and is typed as SessionContext | undefined. Source: packages/sdk/src/tool.ts

This design has two important consequences for checkpointing and rewind:

Tool actions read the current session's filesystem and shell state from the context, rather than from module-level globals. A checkpoint implementation can therefore snapshot the context and re-bind it on rewind.
The same SessionContext is used both for invocations created by the SDK and for invocations created through the createInvocationWithContext path, so the contract is consistent regardless of how a tool is registered. Source: packages/sdk/src/tool.ts
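Because tools receive state through the context rather than through globals, a rewind can in principle be as simple as re-binding a tool to an earlier snapshot. The sketch assumes a toy in-memory filesystem; `FsLike`, `MemoryFs`, `SessionContextSketch`, and `appendLine` are hypothetical, and the real SessionContext exposes filesystem and shell access through its own interfaces.

```typescript
// Hypothetical context shape; not the SDK's SessionContext.
interface FsLike {
  read(path: string): string | undefined;
  write(path: string, data: string): void;
}

interface SessionContextSketch {
  fs: FsLike;
}

class MemoryFs implements FsLike {
  constructor(private readonly files: Map<string, string> = new Map()) {}
  read(path: string): string | undefined {
    return this.files.get(path);
  }
  write(path: string, data: string): void {
    this.files.set(path, data);
  }
  // Copy-on-checkpoint: later writes to this instance do not affect the snapshot.
  snapshot(): MemoryFs {
    return new MemoryFs(new Map(this.files));
  }
}

// The tool reads and writes only through the context it is given.
async function appendLine(params: { path: string; line: string }, context?: SessionContextSketch): Promise<string> {
  if (!context) throw new Error("this tool needs a session context");
  const before = context.fs.read(params.path) ?? "";
  context.fs.write(params.path, before + params.line + "\n");
  return `appended to ${params.path}`;
}
```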

The tool SDK also exposes a ModelVisibleError class, whose message is always surfaced to the model so it can retry or adjust its approach; when sendErrorsToModel is set on the tool definition, other errors thrown by the action are also sent back "as part of the conversation." A tool could use this mechanism to tell the model that an action could not be safely checkpointed (for example, a destructive operation that the user might later want to rewind). Source: packages/sdk/src/tool.ts

ACP Session Lifecycle and Resume

The Agent Client Protocol (ACP) layer is the most user-visible surface for sessions, and the source includes explicit support for both session lifecycle and session resume. Source: packages/cli/src/acp/README.md

The ACP module is split across several files, each with a focused responsibility:

File	Responsibility
`acpSessionManager.ts`	Manages the lifecycle of ACP sessions and their configuration
`acpSession.ts`	Handles prompt execution, `@path` file resolution, tool execution, command interception, and streaming updates back to the client
`acpResume.test.ts`	Integration tests for loading and resuming sessions
`acpFileSystemService.ts`	File system access restricted by workspace boundaries and permissions

Source: packages/cli/src/acp/README.md

The presence of acpResume.test.ts as a co-located integration test confirms that resume is a first-class feature: the suite covers loading a previously saved session and continuing from the point at which it was left. A correct resume requires that the persisted session record the conversation history, the tool invocations that were issued, and any side effects on the workspace that need to be replayed or skipped. Source: packages/cli/src/acp/README.md
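The three requirements above (history, issued tool calls, workspace side effects) can be captured in a persisted record. The layout below is a hypothetical sketch, since the actual ACP persistence format is not shown in the sources; it performs only shallow validation on load, and the tool names in any usage are placeholders.

```typescript
// Hypothetical record layout; the real ACP session format may differ.
interface PersistedToolCall {
  callId: string;
  name: string;
  params: Record<string, unknown>;
  result?: string; // absent when the call never completed
}

interface PersistedSession {
  sessionId: string;
  history: Array<{ role: "user" | "model"; text: string }>;
  toolCalls: PersistedToolCall[];
  // Side effects are recorded so a resume can decide to skip rather than re-run them.
  sideEffects: Array<{ callId: string; kind: "file_write" | "shell"; target: string }>;
}

function saveSession(session: PersistedSession): string {
  return JSON.stringify(session);
}

function loadSession(raw: string): PersistedSession {
  const s = JSON.parse(raw) as Partial<PersistedSession> | null;
  if (
    s === null || typeof s !== "object" ||
    typeof s.sessionId !== "string" ||
    !Array.isArray(s.history) || !Array.isArray(s.toolCalls) || !Array.isArray(s.sideEffects)
  ) {
    throw new Error("not a saved session record");
  }
  return s as PersistedSession;
}

// On resume, calls that already produced a result are not executed again.
function pendingCalls(session: PersistedSession): PersistedToolCall[] {
  return session.toolCalls.filter((c) => c.result === undefined);
}
```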

State Propagation Through the Message Bus

Session state in Gemini CLI is propagated through a typed MessageBus rather than through shared mutable globals. The A2A server's event-driven scheduler test demonstrates this pattern: a MessageBus is created as part of the test's mock Config, and the test asserts that the Task subscribes to MessageBusType-tagged updates for tool calls and maps status changes to the appropriate events on the bus. Source: packages/a2a-server/src/agent/task-event-driven.test.ts

For checkpointing, this matters: because session-affecting events flow over the bus, a checkpointing implementation could subscribe to the same stream and persist events in order. A rewind could then replay the recorded events up to the chosen checkpoint, rebuilding state without re-executing the underlying tools. The A2A HTTP layer, which exposes the agent card and authenticated routes for code_generation, is one plausible surface where persisted bus history could be reattached to a resumed task. Source: packages/a2a-server/src/http/app.ts
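An ordered event log with named checkpoints is enough to sketch the record-and-rewind idea. `EventLog` and `SessionEvent` are hypothetical; in a real integration `record` would be called from a bus subscriber, and real bus messages carry more fields than this.

```typescript
// Hypothetical event shape; real bus messages carry more fields.
interface SessionEvent {
  seq: number;
  kind: "message" | "tool_result";
  payload: string;
}

class EventLog {
  private events: SessionEvent[] = [];
  private checkpoints = new Map<string, number>();

  // Would be invoked by a bus subscriber, in arrival order.
  record(kind: SessionEvent["kind"], payload: string): void {
    this.events.push({ seq: this.events.length, kind, payload });
  }

  // A checkpoint is just the log length at the moment it is taken.
  checkpoint(label: string): void {
    this.checkpoints.set(label, this.events.length);
  }

  // Truncate back to the checkpoint and return the surviving events; state is
  // rebuilt from recorded results, so no tool is re-executed.
  rewind(label: string): SessionEvent[] {
    const end = this.checkpoints.get(label);
    if (end === undefined) throw new Error(`unknown checkpoint: ${label}`);
    this.events = this.events.slice(0, end);
    return [...this.events];
  }
}
```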

Test Infrastructure for Stateful Components

The TestMcpServerBuilder in the test utilities is representative of how stateful components are exercised in tests. It exposes a fluent addTool API that registers a tool with a name, description, response, and optional input schema, and it serializes the configuration into a structure that the test MCP server can consume at runtime. Source: packages/test-utils/src/test-mcp-server.ts

The shape of a recorded tool call (name, schema, response) mirrors the minimum data a checkpoint system would need to replay an invocation: capture the tool name, the validated parameters, and the response, and persist them as a replayable record. This is the same pattern that the test utilities use to make MCP server behavior deterministic across runs.
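The fluent-builder-plus-recorded-response pattern can be sketched as follows. `FakeToolServerBuilder` and `replayRecorded` are hypothetical mirrors of the idea; the real TestMcpServerBuilder's method signatures and serialized format may differ.

```typescript
// Hypothetical mirror of a fluent addTool API; not the real TestMcpServerBuilder.
interface RecordedTool {
  name: string;
  description: string;
  response: string;
  inputSchema?: object;
}

class FakeToolServerBuilder {
  private tools: RecordedTool[] = [];

  // Returning `this` enables chained addTool(...).addTool(...) calls.
  addTool(name: string, description: string, response: string, inputSchema?: object): this {
    this.tools.push({ name, description, response, inputSchema });
    return this;
  }

  // Serialize to a plain structure a fake server could load at runtime.
  build(): string {
    return JSON.stringify({ tools: this.tools });
  }
}

// Deterministic replay: the same config and tool name always yield the same response.
function replayRecorded(config: string, toolName: string): string {
  const { tools } = JSON.parse(config) as { tools: RecordedTool[] };
  const tool = tools.find((t) => t.name === toolName);
  if (!tool) throw new Error(`no recorded tool named ${toolName}`);
  return tool.response;
}
```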

Limitations of This Page

The primary implementation files for the checkpointing and rewind subsystems (for example, the session utility, the checkpoint utilities, and the React hooks that drive rewind from the terminal UI) were not present in the source context used to generate this page. The descriptions above are intentionally limited to what is verifiable from the SDK, ACP, A2A, and test-utility files that were available. For the precise file layout, configuration keys, and rewind workflow, consult the linked Checkpointing guide and the source files under packages/core/src/utils/ and packages/cli/src/ui/hooks/ in the repository. Source: README.md

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.


Found 36 structured pitfall item(s), including 16 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

1. Installation risk: Installation risk requires verification

Severity: high
Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/google-gemini/gemini-cli/issues/22741

2. Installation risk: Installation risk requires verification

Severity: high
Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/google-gemini/gemini-cli/issues/26523

3. Configuration risk: Configuration risk requires verification

Severity: high
Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/google-gemini/gemini-cli/issues/26516

4. Runtime risk: Runtime risk requires verification

Severity: high
Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/google-gemini/gemini-cli/issues/23313

5. Runtime risk: Runtime risk requires verification

Severity: high
Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/google-gemini/gemini-cli/issues/22600

6. Runtime risk: Runtime risk requires verification

Severity: high
Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/google-gemini/gemini-cli/issues/23571

7. Runtime risk: Runtime risk requires verification

Severity: high
Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/google-gemini/gemini-cli/issues/23166

8. Maintenance risk: Maintenance risk requires verification

Severity: high
Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/google-gemini/gemini-cli/issues/24246

9. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Developers should check this security_permissions risk before relying on the project: Add deterministic redaction and reduce Auto Memory logging
User impact: Developers may expose sensitive permissions or credentials: Add deterministic redaction and reduce Auto Memory logging
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Add deterministic redaction and reduce Auto Memory logging. Context: Source discussion did not expose a precise runtime context.
Evidence: failure_mode_cluster:github_issue | https://github.com/google-gemini/gemini-cli/issues/26525

10. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Developers should check this security_permissions risk before relying on the project: Robust component level evalutions
User impact: Developers may expose sensitive permissions or credentials: Robust component level evalutions
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Robust component level evalutions. Context: Source discussion did not expose a precise runtime context.
Evidence: failure_mode_cluster:github_issue | https://github.com/google-gemini/gemini-cli/issues/24353

11. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/google-gemini/gemini-cli/issues/26525

12. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/google-gemini/gemini-cli/issues/22745

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12 project-level external discussion links are exposed on this manual page.

Review before install: open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using gemini-cli with real data or production workflows.

Add deterministic redaction and reduce Auto Memory logging - github / github_issue
Surface or quarantine invalid Auto Memory inbox patches - github / github_issue
Stop Auto Memory from retrying low-signal sessions indefinitely - github / github_issue
Memory system bugs and quality improvements - github / github_issue
Shell command execution gets stuck with "Waiting input" after command co - github / github_issue
Corruption after exiting external editors in terminalBuffer mode - github / github_issue
Robust component level evalutions - github / github_issue
Gemini CLI encounters 400 error with > 128 tools - github / github_issue
Model frequently creates tmp scripts in random spots - github / github_issue
Change the steering eval test to always pass - github / github_issue
Stabilize and Enhance Internal Project Evaluations - github / github_issue
Investigate using AST aware tools to search and perform file reads - github / github_issue

Source: Project Pack community evidence and pitfall evidence
