# https://github.com/sidebutton/sidebutton Project Documentation

Generated: 2026-05-16 13:21:37 UTC

## Table of Contents

- [Introduction to SideButton](#introduction)
- [Getting Started](#getting-started)
- [System Architecture](#architecture)
- [Package Overview](#packages-overview)
- [Workflow Engine](#workflow-engine)
- [Step Types Reference](#step-types)
- [Workflow Examples](#workflow-examples)
- [MCP Server Integration](#mcp-server)
- [Chrome Extension](#chrome-extension)
- [Knowledge Packs](#knowledge-packs)

<a id='introduction'></a>

## Introduction to SideButton

### Related Pages

Related topics: [Getting Started](#getting-started), [System Architecture](#architecture), [MCP Server Integration](#mcp-server)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)
- [AGENTS.md](https://github.com/sidebutton/sidebutton/blob/main/AGENTS.md)
- [CONTRIBUTING.md](https://github.com/sidebutton/sidebutton/blob/main/CONTRIBUTING.md)
- [packages/core/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/core/README.md)
- [packages/server/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/README.md)
- [packages/sidebutton/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/sidebutton/README.md)
</details>

SideButton is an open-source AI agent platform that combines an MCP (Model Context Protocol) server with browser automation tools, a YAML-based workflow engine, and extensible knowledge packs for domain-specific expertise. It enables autonomous AI agents to interact with web applications through standardized browser controls, CLI operations, and pre-built workflow automations.

Sources: [AGENTS.md]()

## High-Level Architecture

SideButton follows a modular monorepo architecture with four primary packages working together to provide a complete automation platform.

```mermaid
graph TB
    subgraph "Client Layer"
        EXT["Chrome Extension"]
        CLI["CLI Tool"]
        MCP["MCP Clients<br/>(Claude, Cursor)"]
    end
    
    subgraph "packages/server"
        API["REST API & Dashboard"]
        MCP_SRV["MCP Endpoint"]
        WS["WebSocket Bridge"]
    end
    
    subgraph "packages/core"
        PARSER["Workflow Parser"]
        EXEC["Step Executor"]
    end
    
    subgraph "packages/dashboard"
        UI["Svelte Web UI"]
    end
    
    EXT --> WS
    CLI --> API
    MCP --> MCP_SRV
    API --> EXEC
    MCP_SRV --> EXEC
    WS --> EXEC
    EXEC --> PARSER
    UI --> API
```

Sources: [AGENTS.md](), [CONTRIBUTING.md]()

## Package Structure

The repository is organized as a monorepo using pnpm workspaces. Each package has a focused responsibility.

| Package | Purpose | Role |
|---------|---------|----------|
| `packages/core` | Workflow engine — parser, executor, and step implementations | Workflow execution runtime |
| `packages/server` | HTTP server, MCP endpoint, CLI, and WebSocket bridge | API layer and server runtime |
| `packages/dashboard` | Svelte web UI served at `localhost:9876` | User interface |
| `packages/sidebutton` | CLI entry point for `npx sidebutton@latest` | Command-line interface |
| `extension/` | Chrome extension (Manifest V3) | Browser automation |

Sources: [AGENTS.md](), [CONTRIBUTING.md]()

## Core Concepts

### Workflows

Workflows are YAML files that define sequences of steps for automation tasks. They can include browser interactions, shell commands, LLM calls, and control flow logic.

```yaml
name: example_workflow
steps:
  - type: navigate
    url: https://example.com
  
  - type: click
    selector: "#submit-button"
  
  - type: extract
    selector: ".result"
    as: result
```

Sources: [packages/core/README.md]()
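
Once parsed, the YAML above can be pictured as a small typed structure. The sketch below is illustrative only: `WorkflowDefinition`, `WorkflowStep`, `stepTypes`, and `outputVariables` are hypothetical names, not the types or functions exported by `@sidebutton/core`.

```typescript
// Hypothetical shapes for a parsed workflow; the real @sidebutton/core types may differ.
interface WorkflowStep {
  type: string;             // e.g. "navigate", "click", "extract"
  [field: string]: unknown; // step-specific fields such as url, selector, as
}

interface WorkflowDefinition {
  name: string;
  steps: WorkflowStep[];
}

// The example workflow from above, written as the object a YAML parser would produce.
const exampleWorkflow: WorkflowDefinition = {
  name: "example_workflow",
  steps: [
    { type: "navigate", url: "https://example.com" },
    { type: "click", selector: "#submit-button" },
    { type: "extract", selector: ".result", as: "result" },
  ],
};

// Return the step types in execution order.
function stepTypes(workflow: WorkflowDefinition): string[] {
  return workflow.steps.map((step) => step.type);
}

// Collect the variable names that steps store through the `as` field.
function outputVariables(workflow: WorkflowDefinition): string[] {
  return workflow.steps
    .map((step) => step.as)
    .filter((value): value is string => typeof value === "string");
}
```

Keeping step-specific fields behind an index signature is a simplification; a stricter model would use one interface per step type.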

### Step Types

SideButton provides multiple categories of steps for different automation needs.

| Category | Steps | Purpose |
|----------|-------|---------|
| Browser | `navigate`, `click`, `type`, `scroll`, `hover`, `wait`, `extract`, `extractAll`, `exists`, `key` | Web page interaction |
| Shell | `shell.run`, `terminal.open`, `terminal.run` | Command execution |
| LLM | `llm.classify`, `llm.generate` | AI-powered operations |
| Control | `control.if`, `control.retry`, `control.stop`, `workflow.call` | Flow control |
| Data | `data.first` | Data manipulation |

Sources: [packages/core/README.md]()
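
For tooling that needs to group or validate steps, the table above can be turned into a lookup. This is a minimal sketch: the step names are copied from the table, while `STEP_CATEGORIES` and `categoryOf` are hypothetical helpers that are not part of SideButton.

```typescript
// Step names copied from the table above; the lookup itself is a hypothetical helper.
const STEP_CATEGORIES = {
  Browser: ["navigate", "click", "type", "scroll", "hover", "wait", "extract", "extractAll", "exists", "key"],
  Shell: ["shell.run", "terminal.open", "terminal.run"],
  LLM: ["llm.classify", "llm.generate"],
  Control: ["control.if", "control.retry", "control.stop", "workflow.call"],
  Data: ["data.first"],
} as const;

type StepCategory = keyof typeof STEP_CATEGORIES;

// Find which category a step type belongs to, or undefined if it is not in the table.
function categoryOf(stepType: string): StepCategory | undefined {
  for (const category of Object.keys(STEP_CATEGORIES) as StepCategory[]) {
    const names: readonly string[] = STEP_CATEGORIES[category];
    if (names.indexOf(stepType) !== -1) {
      return category;
    }
  }
  return undefined;
}
```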

### Knowledge Packs

Knowledge packs (also called skill packs) teach autonomous AI agents how specific web applications work. They bundle markdown files containing selectors, data models, state definitions, and agentic workflows per web app.

Key capabilities of knowledge packs:

- **Selectors**: CSS/XPath selectors for UI elements
- **Data Models**: Structured data representations
- **Agentic Workflows**: Pre-defined sequences for common tasks
- **Role Playbooks**: Instructions for AI agent behavior

Sources: [packages/sidebutton/README.md](), [AGENTS.md]()

## MCP Integration

SideButton provides MCP (Model Context Protocol) server functionality for integration with AI coding assistants. The MCP tools allow AI agents to control browser automation and execute workflows programmatically.

### Supported Clients

| Client | Transport | Configuration |
|--------|-----------|---------------|
| Claude Desktop | stdio | `npx sidebutton --stdio` |
| Claude Code | SSE | `http://localhost:9876/mcp` |
| Cursor | HTTP | `http://localhost:9876/mcp` |

Sources: [packages/server/README.md](), [packages/sidebutton/README.md]()

### MCP Tools

| Tool | Description |
|------|-------------|
| `run_workflow` | Execute a workflow by ID |
| `list_workflows` | List available workflows |
| `get_workflow` | Get workflow YAML definition |
| `get_run_log` | Get execution log for a run |
| `list_run_logs` | List recent workflow executions |
| `get_browser_status` | Check browser extension connection |
| `capture_page` | Capture selectors from current page |
| `navigate` | Navigate browser to URL |
| `snapshot` | Get page accessibility snapshot |
| `click` | Click an element |
| `type` | Type text into an element |
| `scroll` | Scroll the page |
| `screenshot` | Capture page screenshot |
| `hover` | Hover over element |
| `extract` | Extract text from element |
| `extract_all` | Extract all matching elements |

Sources: [packages/server/README.md](), [README.md]()

## CLI Commands

The SideButton CLI provides commands for managing the server, workflows, and knowledge packs.

```bash
sidebutton                           # Start server (default port 9876)
sidebutton --stdio                   # Start with stdio transport (Claude Desktop)
sidebutton -p 8080                   # Custom port

sidebutton list                      # List available workflows
sidebutton run <id>                  # Run a workflow by ID
sidebutton status                    # Check server status
```

### Knowledge Pack Management

| Command | Description |
|---------|-------------|
| `sidebutton registry add <path\|url>` | Register and install all knowledge packs |
| `sidebutton registry update [name]` | Update installed packs from registry |
| `sidebutton registry remove <name>` | Uninstall packs and remove registry |
| `sidebutton registry list` | Show registries and pack counts |
| `sidebutton search [query]` | Search packs across registries |
| `sidebutton install <path\|url\|name>` | One-off knowledge pack install |
| `sidebutton uninstall <domain>` | Remove an installed knowledge pack |
| `sidebutton init [domain]` | Scaffold a new knowledge pack |
| `sidebutton validate [path]` | Lint and validate a knowledge pack |

Sources: [packages/sidebutton/README.md](), [packages/server/README.md]()

## Quick Start

### Published Package (No Clone Required)

```bash
npx sidebutton@latest   # starts server + dashboard on port 9876
```

Sources: [AGENTS.md]()

### Local Development Setup

```bash
# Clone the repo
git clone https://github.com/sidebutton/sidebutton.git
cd sidebutton

# Install dependencies
pnpm install

# Build all packages
pnpm build

# Start the server
pnpm start
# Open http://localhost:9876
```

Sources: [CONTRIBUTING.md]()

### Development Prerequisites

| Requirement | Version |
|-------------|---------|
| Node.js | 20+ |
| pnpm | 9.15+ |
| Chrome | Latest (for browser automation) |

Sources: [CONTRIBUTING.md]()

## Development Workflow

### Running Components

Start everything in watch mode with hot reload:

```bash
pnpm dev
```

Run components individually:

| Command | Description |
|---------|-------------|
| `pnpm dev:server` | Server with auto-restart on :9876 |
| `pnpm dev:dashboard` | Dashboard with HMR on :5173 |
| `pnpm build` | Build all packages |
| `pnpm test` | Run all tests |

Sources: [CONTRIBUTING.md]()

## Provider Preference

When multiple integration methods exist, SideButton follows this preference order:

```mermaid
graph LR
    A["API Provider"] --> B["CLI Tool"] --> C["Browser Automation"]
    style A fill:#90EE90
    style B fill:#FFD700
    style C fill:#FFA07A
```

- **API** is fastest and most reliable
- **CLI** provides programmatic access
- **Browser automation** is the universal fallback

Sources: [packages/server/defaults/roles/software-engineer.md]()
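
The preference order above amounts to "take the first available method in a fixed list". A minimal sketch follows; `IntegrationMethod`, `PREFERENCE_ORDER`, and `pickMethod` are hypothetical names used for illustration and do not come from the SideButton codebase.

```typescript
type IntegrationMethod = "api" | "cli" | "browser";

// Preference order from the diagram above: API first, then CLI, then browser automation.
const PREFERENCE_ORDER: IntegrationMethod[] = ["api", "cli", "browser"];

// Pick the most preferred method among those available for a given integration.
function pickMethod(available: IntegrationMethod[]): IntegrationMethod | undefined {
  for (const method of PREFERENCE_ORDER) {
    if (available.indexOf(method) !== -1) {
      return method;
    }
  }
  return undefined; // no integration method is available
}
```

For example, a tool that exposes only a CLI and a web UI would resolve to `"cli"`, with browser automation kept as the fallback.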

## Repository Directories

| Directory | What it is |
|-----------|------------|
| `packages/core/` | Workflow engine — parser, executor, step implementations |
| `packages/server/` | HTTP server, MCP endpoint, CLI, WebSocket bridge |
| `packages/dashboard/` | Svelte web UI served at localhost:9876 |
| `extension/` | Chrome extension for browser automation |
| `workflows/` | Public workflow library (YAML files) |
| `actions/` | User-created workflows (gitignored) |

Sources: [CONTRIBUTING.md]()

## Related Packages

| Package | NPM Link |
|---------|----------|
| `@sidebutton/core` | [npmjs.com](https://www.npmjs.com/package/@sidebutton/core) |
| `@sidebutton/server` | [npmjs.com](https://www.npmjs.com/package/@sidebutton/server) |

Sources: [packages/core/README.md](), [packages/server/README.md]()

## External Resources

| Resource | URL |
|----------|-----|
| Documentation | [docs.sidebutton.com](https://docs.sidebutton.com) |
| GitHub Repository | [github.com/sidebutton/sidebutton](https://github.com/sidebutton/sidebutton) |
| Website | [sidebutton.com](https://sidebutton.com) |
| Knowledge Packs | [sidebutton.com/skills](https://sidebutton.com/skills) |

## License

SideButton is licensed under **Apache-2.0**.

Sources: [CONTRIBUTING.md](), [packages/core/README.md](), [packages/server/README.md]()

---

<a id='getting-started'></a>

## Getting Started

### Related Pages

Related topics: [Introduction to SideButton](#introduction)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [package.json](https://github.com/sidebutton/sidebutton/blob/main/package.json)
- [packages/server/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/README.md)
- [packages/sidebutton/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/sidebutton/README.md)
- [packages/core/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/core/README.md)
- [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)
</details>

SideButton is a Model Context Protocol (MCP) server that provides browser automation, workflow execution, and knowledge pack management capabilities. It enables AI assistants like Claude Desktop and Cursor to interact with web browsers, execute predefined workflows, and leverage domain-specific knowledge packs.

## Prerequisites

Before installing SideButton, ensure your environment meets the following requirements:

| Requirement | Version/Details |
|-------------|-----------------|
| Node.js | v18 or higher |
| Package Manager | pnpm (recommended) or npm |
| Browser | Chrome/Chromium (for browser automation features) |
| OS | macOS, Windows, Linux |

Sources: [README.md:1-50](https://github.com/sidebutton/sidebutton/blob/main/README.md)

## Installation

### Package Manager Installation

Install the SideButton CLI globally using your preferred package manager:

```bash
# Using npm
npm install -g sidebutton

# Using pnpm
pnpm add -g sidebutton

# Using yarn
yarn global add sidebutton
```

Verify the installation:

```bash
sidebutton --version
```

### Development Setup (From Source)

For contributing or running the latest development version:

```bash
# Clone the repository
git clone https://github.com/sidebutton/sidebutton.git
cd sidebutton

# Install dependencies
pnpm install

# Build all packages
pnpm build

# Run CLI directly
pnpm cli --version
```

Sources: [CONTRIBUTING.md:1-20](https://github.com/sidebutton/sidebutton/blob/main/CONTRIBUTING.md)

## Quick Start

### Starting the Server

The default command starts the SideButton server on port 9876:

```bash
sidebutton
```

To use a custom port:

```bash
sidebutton -p 8080
```

The server provides:
- REST API endpoint
- MCP (Model Context Protocol) endpoint
- WebSocket connection for browser extension
- Dashboard UI at `http://localhost:9876`

Sources: [packages/sidebutton/README.md:1-30](https://github.com/sidebutton/sidebutton/blob/main/packages/sidebutton/README.md)

### Architecture Overview

```mermaid
graph TD
    A[Claude Desktop / Cursor] -->|MCP Protocol| B[SideButton Server]
    A -->|stdio| B
    B -->|REST API| C[Dashboard UI]
    B -->|WebSocket| D[Chrome Extension]
    B -->|Execute| E[Workflow Engine]
    E -->|Browser Actions| F[Chrome Browser]
    E -->|CLI Tools| G[Shell/CLI]
    E -->|LLM Calls| H[OpenAI/Anthropic/Ollama]
```

## MCP Integration

SideButton can be integrated with various AI coding assistants through the MCP protocol.

### Claude Desktop

Add SideButton to your Claude Desktop configuration file. On macOS it is located at `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "sidebutton": {
      "command": "npx",
      "args": ["sidebutton", "--stdio"]
    }
  }
}
```

After adding the configuration, restart Claude Desktop to load the new MCP server.

Sources: [README.md:50-70](https://github.com/sidebutton/sidebutton/blob/main/README.md)

### Cursor

Add SideButton to your Cursor MCP configuration file at `~/.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "sidebutton": {
      "url": "http://localhost:9876/mcp"
    }
  }
}
```

Ensure the SideButton server is running before using Cursor with this configuration.

Sources: [packages/server/README.md:20-35](https://github.com/sidebutton/sidebutton/blob/main/packages/server/README.md)

### Available MCP Tools

| Tool | Description |
|------|-------------|
| `run_workflow` | Execute a workflow by ID |
| `list_workflows` | List all available workflows |
| `get_workflow` | Get workflow YAML definition |
| `get_run_log` | Get execution log for a run |
| `list_run_logs` | List recent workflow executions |
| `get_browser_status` | Check browser extension connection |
| `capture_page` | Capture selectors from current page |
| `navigate` | Navigate browser to URL |
| `snapshot` | Get page accessibility snapshot |
| `click` | Click an element |
| `type` | Type text into an element |
| `scroll` | Scroll the page |
| `screenshot` | Capture page screenshot |
| `hover` | Hover over element |
| `extract` | Extract text from element |
| `extract_all` | Extract all matching elements |

Sources: [packages/server/README.md:40-60](https://github.com/sidebutton/sidebutton/blob/main/packages/server/README.md)

## CLI Commands

### Basic Commands

```bash
# Start the server (default port 9876)
sidebutton

# Start with stdio transport for Claude Desktop
sidebutton --stdio

# Start on custom port
sidebutton -p 8080

# List available workflows
sidebutton list

# Run a specific workflow
sidebutton run <workflow-id>

# Check server status
sidebutton status
```

### Knowledge Packs Management

```bash
# Add a registry
sidebutton registry add <path|url>

# Update installed packs
sidebutton registry update [name]

# Remove a registry
sidebutton registry remove <name>

# List all registries
sidebutton registry list

# Search packs across registries
sidebutton search [query]

# Install a knowledge pack
sidebutton install <path|url|name>

# Uninstall a knowledge pack
sidebutton uninstall <domain>
```

### Knowledge Pack Development

```bash
# Scaffold a new knowledge pack
sidebutton init [domain]

# Validate a knowledge pack
sidebutton validate [path]

# Publish to registry
sidebutton publish
```

Sources: [packages/sidebutton/README.md:60-100](https://github.com/sidebutton/sidebutton/blob/main/packages/sidebutton/README.md)

## Dashboard

The SideButton dashboard provides a web-based UI for managing workflows and viewing execution history.

Access the dashboard at: `http://localhost:9876`

### Dashboard Features

- View and manage shortcuts
- Browse available workflows
- View execution logs
- Add workflows to dashboard
- Monitor browser extension status

## Chrome Extension

Install the SideButton Chrome extension from the [Chrome Web Store](https://chromewebstore.google.com/detail/sidebutton/odaefhmdmgijnhdbkfagnlnmobphgkij).

### Extension Features

- 40+ browser commands for navigation, clicking, typing, extraction
- Real DOM access via CSS selectors
- Recording mode to capture manual actions as workflows
- Embed action buttons into web pages
- WebSocket connection to the server with stable reconnection

Sources: [README.md:80-100](https://github.com/sidebutton/sidebutton/blob/main/README.md)

## Workflow Execution

### Running a Workflow via CLI

```bash
# List all available workflows
sidebutton list

# Execute a workflow by ID
sidebutton run <workflow-id>

# With parameters
sidebutton run <workflow-id> --param value
```

### Running a Workflow via MCP

When connected to an MCP client like Claude Desktop:

```
# Use the run_workflow tool
run_workflow({ id: "workflow-id", params: { key: "value" } })
```

### Workflow Step Types

| Category | Steps |
|----------|-------|
| Browser | `navigate`, `click`, `type`, `scroll`, `hover`, `wait`, `extract`, `extractAll`, `exists`, `key` |
| Shell | `shell.run`, `terminal.open`, `terminal.run` |
| LLM | `llm.classify`, `llm.generate` |
| Control | `control.if`, `control.retry`, `control.stop`, `workflow.call` |
| Data | `data.first` |

Sources: [packages/core/README.md:20-40](https://github.com/sidebutton/sidebutton/blob/main/packages/core/README.md)

## Next Steps

- Explore the [Workflow Engine](#workflow-engine) section for creating custom automations
- Set up [Knowledge Packs](#knowledge-packs) for domain-specific capabilities
- Configure [Provider Integrations](providers.md) for GitHub, Jira, and other tools
- Review [Testing Guide](testing.md) for quality assurance workflows

---

<a id='architecture'></a>

## System Architecture

### Related Pages

Related topics: [Package Overview](#packages-overview), [MCP Server Integration](#mcp-server), [Workflow Engine](#workflow-engine)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)
- [packages/server/src/server.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/server.ts)
- [packages/core/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/core/README.md)
- [packages/server/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/README.md)
- [packages/sidebutton/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/sidebutton/README.md)
- [packages/server/src/mcp/handler.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/mcp/handler.ts)
- [packages/server/src/cli.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/cli.ts)
</details>

## Overview

SideButton is a browser automation and workflow orchestration platform that enables AI agents (such as Claude Desktop, Cursor) to interact with web applications through a unified MCP (Model Context Protocol) interface. The system combines browser automation, CLI tools, LLM capabilities, and external integrations into a coherent workflow execution engine.

The architecture follows a modular monorepo design with four primary packages:

| Package | Purpose |
|---------|---------|
| `packages/sidebutton` | CLI entry point and stdio transport for MCP |
| `packages/server` | Fastify-based MCP server with REST API and dashboard |
| `packages/core` | Workflow definition parsing and execution engine |
| `packages/dashboard` | Svelte-based web UI for workflow management |

Sources: [README.md:1-50]()

---

## High-Level Architecture

```mermaid
graph TB
    subgraph "Client Layer"
        CLI["CLI Client<br/>sidebutton"]
        BrowserExt["Chrome Extension"]
        AI["AI Agents<br/>(Claude, Cursor)"]
    end

    subgraph "Transport Layer"
        Stdio["stdio"]
        SSE["Server-Sent Events"]
        WebSocket["WebSocket"]
        HTTP["HTTP/REST"]
    end

    subgraph "Server Package"
        MCP["MCP Handler<br/>handler.ts"]
        API["REST API<br/>server.ts"]
        Dashboard["Dashboard<br/>Svelte App"]
    end

    subgraph "Core Package"
        Executor["Workflow Executor<br/>executor.ts"]
        Workflow["Workflow Parser"]
        Steps["Step Handlers"]
    end

    subgraph "Providers"
        GitHub["GitHub Provider"]
        Browser["Browser Provider"]
        LLM["LLM Provider"]
        Shell["Shell Provider"]
    end

    CLI --> Stdio
    BrowserExt --> WebSocket
    AI --> SSE
    AI --> HTTP

    Stdio --> MCP
    SSE --> MCP
    WebSocket --> MCP
    HTTP --> API

    MCP --> Executor
    API --> Executor

    Executor --> Workflow
    Workflow --> Steps

    Steps --> GitHub
    Steps --> Browser
    Steps --> LLM
    Steps --> Shell
```

Sources: [packages/server/src/mcp/handler.ts:1-50]()

---

## Package Structure

### CLI Package (`packages/sidebutton`)

The CLI package serves as the primary entry point for users and as an MCP transport adapter.

**Key responsibilities:**

- Parse CLI arguments and commands
- Initialize and start the MCP server
- Provide `stdio` transport for AI agent integration
- Manage local configuration and authentication

**Transport modes:**

| Mode | Command | Use Case |
|------|---------|----------|
| HTTP Server | `sidebutton` | Dashboard + API access |
| stdio | `sidebutton --stdio` | Claude Desktop integration |
| Custom Port | `sidebutton -p 8080` | Development/custom deployments |

Sources: [packages/sidebutton/README.md:1-30]()
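
To make the three transport modes concrete, here is a small sketch of how the `--stdio` and `-p` flags from the table could be interpreted. It is not the actual CLI implementation: `CliOptions` and `parseCliArgs` are hypothetical, and ignoring an invalid port is an assumption made for this example.

```typescript
// Hypothetical option shape; the real CLI's internals are not shown in the source.
interface CliOptions {
  mode: "http" | "stdio";
  port: number;
}

const DEFAULT_PORT = 9876;

// Interpret the flags listed in the table: `--stdio` and `-p <port>`.
function parseCliArgs(argv: string[]): CliOptions {
  const options: CliOptions = { mode: "http", port: DEFAULT_PORT };
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--stdio") {
      options.mode = "stdio";
    } else if (argv[i] === "-p" && i + 1 < argv.length) {
      const port = Number(argv[i + 1]);
      // ASSUMPTION: an invalid port is ignored and the default is kept.
      if (port > 0 && port < 65536 && Math.floor(port) === port) {
        options.port = port;
      }
      i++; // skip the value that was just consumed
    }
  }
  return options;
}
```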

### Server Package (`packages/server`)

The server package implements the MCP protocol and exposes REST APIs for workflow management.

```mermaid
graph LR
    subgraph "Server Components"
        Fastify["Fastify Server"]
        MCPHandler["MCP Handler"]
        WorkflowManager["Workflow Manager"]
        RunLogManager["Run Log Manager"]
    end

    Fastify --> MCPHandler
    Fastify --> RESTAPI["REST API"]
    MCPHandler --> WorkflowManager
    RESTAPI --> WorkflowManager
    WorkflowManager --> RunLogManager
```

**Core server configuration:**

```typescript
interface ServerConfig {
  port: number;           // Default: 9876
  actionsDir: string;     // User-defined workflows
  workflowsDir: string;   // Bundled workflows
  templatesDir: string;   // Importable templates
  runLogsDir: string;     // Execution logs
}
```

Sources: [packages/server/src/server.ts:1-50]()

**REST API Endpoints:**

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/workflows` | GET | List all workflows |
| `/api/workflows/:id` | GET | Get workflow definition |
| `/api/workflows/:id/run` | POST | Execute a workflow |
| `/api/templates` | GET | List available templates |
| `/api/templates/:id/import` | POST | Import template to actions |
| `/api/runs` | GET | List run logs |
| `/api/runs/:id` | GET/DELETE | Get or delete run log |

Sources: [packages/server/src/server.ts:100-200]()
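
A client for these endpoints mostly needs to assemble the right method and URL. The sketch below builds request descriptors without sending them. `ApiCall`, `listWorkflowsCall`, and `runWorkflowCall` are hypothetical helpers, and the JSON body shape for the run endpoint is an assumption, since the table above does not document it.

```typescript
// Describes an HTTP call without performing it, so the sketch stays side-effect free.
interface ApiCall {
  method: "GET" | "POST" | "DELETE";
  url: string;
  body?: string;
}

const BASE_URL = "http://localhost:9876"; // default port from the server configuration

// GET /api/workflows
function listWorkflowsCall(): ApiCall {
  return { method: "GET", url: `${BASE_URL}/api/workflows` };
}

// POST /api/workflows/:id/run
// ASSUMPTION: sending parameters as a JSON body under "params" is a guess, not documented above.
function runWorkflowCall(id: string, params: Record<string, string> = {}): ApiCall {
  return {
    method: "POST",
    url: `${BASE_URL}/api/workflows/${encodeURIComponent(id)}/run`,
    body: JSON.stringify({ params }),
  };
}
```

A descriptor can then be handed to any HTTP client, for example `fetch(call.url, { method: call.method, body: call.body })` on a runtime that provides `fetch`.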

### Core Package (`packages/core`)

The core package contains the workflow execution engine that parses YAML definitions and executes steps sequentially.

**Workflow execution flow:**

```mermaid
sequenceDiagram
    participant Client
    participant Executor
    participant StepHandler
    participant Provider

    Client->>Executor: executeWorkflow(workflow, context)
    Executor->>Executor: Parse YAML definition
    loop For each step
        Executor->>StepHandler: Execute step
        StepHandler->>Provider: Provider action
        Provider-->>StepHandler: Result
        StepHandler-->>Executor: Step result
    end
    Executor-->>Client: Final result
```

Sources: [packages/core/README.md:1-30]()
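
The sequence above can be sketched as a loop that looks up a handler for each step, runs it, and records the result. This is a simplified illustration, not the code in `packages/core`: `ExecStep`, `StepHandler`, and `executeSteps` are hypothetical names, and handlers are synchronous here for brevity, whereas real browser, shell, and LLM steps would be asynchronous.

```typescript
interface ExecStep {
  type: string;
  as?: string;              // optional variable name for the step result
  [field: string]: unknown; // step-specific fields
}

type ExecContext = Record<string, unknown>;

// SIMPLIFICATION: synchronous handlers; a real engine would return promises.
type StepHandler = (step: ExecStep, context: ExecContext) => unknown;

// Run steps one at a time, in order, storing results named by `as` in the context.
function executeSteps(
  steps: ExecStep[],
  handlers: Record<string, StepHandler>,
  context: ExecContext = {},
): ExecContext {
  for (const step of steps) {
    const handler = handlers[step.type];
    if (!handler) {
      throw new Error(`No handler registered for step type "${step.type}"`);
    }
    const result = handler(step, context);
    if (step.as) {
      context[step.as] = result; // make the result available to later steps
    }
  }
  return context;
}
```

Failing fast on an unknown step type is a design choice for this sketch; it surfaces typos in a workflow before any later step runs.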

**Step Types:**

| Category | Steps | Description |
|----------|-------|-------------|
| Browser | `navigate`, `click`, `type`, `scroll`, `hover`, `wait`, `extract`, `extractAll`, `exists`, `key` | DOM interaction via Chrome extension |
| Shell | `shell.run`, `terminal.open`, `terminal.run` | Execute CLI commands |
| LLM | `llm.classify`, `llm.generate`, `llm.decide` | AI-driven operations |
| Control | `control.if`, `control.retry`, `control.stop`, `workflow.call` | Flow control |
| Data | `data.first`, `data.get`, `variable.set` | Data manipulation |

Sources: [packages/core/README.md:30-60]()

### Dashboard Package (`packages/dashboard`)

The dashboard is a Svelte/Vite application served by the Fastify server.

**Structure:**

```
packages/dashboard/
├── index.html          # Entry HTML
└── src/
    └── main.ts         # Svelte mount point
```

The dashboard provides:

- Workflow listing and management UI
- Run log viewer
- Settings configuration
- Knowledge pack management

Sources: [packages/dashboard/index.html:1-20]()

---

## MCP Protocol Integration

### MCP Tools

The MCP handler exposes tools for AI agent interaction:

| Tool | Parameters | Description |
|------|------------|-------------|
| `run_workflow` | `id`, `params?` | Execute a workflow by ID |
| `list_workflows` | `source?` | List workflows (actions/workflows/all) |
| `get_workflow` | `id` | Get workflow YAML definition |
| `get_run_log` | `id` | Get execution log for a run |
| `list_run_logs` | `limit?` | List recent workflow executions |
| `get_browser_status` | - | Check Chrome extension connection |
| `capture_page` | `selectors?` | Capture CSS selectors from current page |
| `navigate` | `url` | Navigate browser to URL |
| `snapshot` | - | Get page accessibility snapshot |
| `click` | `selector` | Click an element |
| `type` | `selector`, `text` | Type text into an element |
| `scroll` | `selector?`, `direction?` | Scroll the page |

Sources: [packages/server/README.md:40-60]()
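
MCP clients invoke tools through JSON-RPC 2.0 `tools/call` requests. The sketch below shows what such a request could look like for the tools in the table. `ToolCallRequest` and `buildToolCall` are hypothetical helpers, and the workflow ID and parameter values are placeholders.

```typescript
// JSON-RPC 2.0 envelope used by MCP for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a `tools/call` request for any tool in the table above.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Example: run a workflow with one parameter ("workflow-id" and "key" are placeholders).
const runRequest = buildToolCall(1, "run_workflow", {
  id: "workflow-id",
  params: { key: "value" },
});
```

In practice an MCP client library builds this envelope; writing it out by hand is mainly useful for debugging what goes over the wire.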

### MCP Handler Architecture

```mermaid
graph TD
    MCP["MCP Handler"]
    Tools["Tool Registry"]
    Actions["Actions Loader"]
    Workflows["Workflows Loader"]
    Templates["Templates Loader"]

    MCP --> Tools
    Tools --> Actions
    Tools --> Workflows
    Tools --> Templates

    Actions --> YAML["YAML Files<br/>(~/.sidebutton/actions/)"]
    Workflows --> Bundles["Bundled Workflows<br/>(bundles/)"]
    Templates --> Defaults["Default Templates<br/>(defaults/templates/)"]
```

Sources: [packages/server/src/mcp/handler.ts:50-100]()

---

## Data Flow

### Workflow Execution Pipeline

```mermaid
graph LR
    A[Workflow YAML] --> B[Parse Steps]
    B --> C[Initialize Context]
    C --> D{For Each Step}
    D -->|Browser Step| E[Browser Provider]
    D -->|Shell Step| F[Shell Provider]
    D -->|LLM Step| G[LLM Provider]
    D -->|Control Step| H[Control Handler]
    E --> I[Record Result]
    F --> I
    G --> I
    H --> I
    I -->|More Steps| D
    I -->|Complete| J[Save Run Log]
    J --> K[Return Result]
```

### Variable Interpolation

Workflows support variable interpolation using `{{variable}}` syntax:

```yaml
steps:
  - type: browser.extract
    selector: ".username"
    as: user
  - type: shell.run
    cmd: "echo 'Hello, {{user}}!'"
```

Variables are stored in the execution context and can be:

- Extracted from previous steps using `as` attribute
- Passed as workflow parameters
- Set explicitly via `variable.set` step

Sources: [README.md:150-180]()
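
The `{{variable}}` substitution described above can be approximated with one regular-expression replace. This is a minimal sketch rather than the engine's implementation: `interpolate` is a hypothetical function, and two behaviors are assumptions for this example, namely tolerating whitespace inside the braces and leaving unknown variables untouched.

```typescript
// Replace each {{name}} placeholder with the matching context value.
// ASSUMPTION: unknown variables are left as-is; the real engine may raise an error instead.
function interpolate(template: string, context: Record<string, unknown>): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(context, key) ? String(context[key]) : match,
  );
}

// The shell command from the example above, with `user` taken from the context.
const greeting = interpolate("echo 'Hello, {{user}}!'", { user: "alice" });
```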

---

## Provider System

Providers are integration modules that execute specific step types.

### GitHub Provider

Located at `packages/core/src/providers/github.ts`, the GitHub provider supports:

| Operation | Methods |
|-----------|---------|
| Pull Requests | `listPRs`, `getPR`, `createPR` |
| Issues | `listIssues`, `getIssue`, `createIssue`, `comment`, `transition` |
| Repository | `getRepoInfo` |

**Configuration:**

```yaml
providers:
  github:
    type: github
    # Uses GitHub CLI (gh) for operations
```

Sources: [packages/core/src/providers/github.ts:1-50]()

### Browser Provider

Browser automation is handled via the Chrome extension:

- **Protocol:** WebSocket (stable reconnection)
- **Access:** Real DOM via CSS selectors (not pixel coordinates)
- **Features:** Recording mode, embed buttons, page snapshots

**Connection flow:**

```mermaid
sequenceDiagram
    participant Extension
    participant Server
    participant Browser

    Extension->>Server: Connect WebSocket
    Server->>Extension: Connection confirmed
    Extension->>Server: Send browser commands
    Server->>Browser: Execute via CDP
    Browser-->>Server: Result
    Server-->>Extension: Response
```

Sources: [README.md:60-90]()

---

## Configuration System

### Server Configuration

```typescript
interface Config {
  port: number;           // Default: 9876
  host: string;           // Default: 'localhost'
  dataDir: string;        // ~/.sidebutton
  actionsDir: string;     // {dataDir}/actions
  workflowsDir: string;   // {dataDir}/workflows
  templatesDir: string;   // {dataDir}/templates
  runLogsDir: string;     // {dataDir}/run-logs
  mcpPort: number;        // Default: 9877
}
```

### Environment Variables

The server supports environment variables for provider configuration:

| Variable | Description |
|----------|-------------|
| `GITHUB_TOKEN` | GitHub authentication token |
| `OPENAI_API_KEY` | OpenAI API key for LLM steps |
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `SIDEBUTTON_API_BASE` | API base URL for browser extension |

Sources: [packages/server/src/server.ts:50-100]()
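
The `{dataDir}/...` comments in the configuration interface suggest that the directory settings are derived from a single data directory. The sketch below illustrates that derivation for a subset of the fields. `SketchConfig` and `resolveConfig` are hypothetical, and letting explicit overrides win over derived defaults is an assumption.

```typescript
// Subset of the configuration fields shown above.
interface SketchConfig {
  port: number;
  host: string;
  dataDir: string;
  actionsDir: string;
  workflowsDir: string;
  templatesDir: string;
  runLogsDir: string;
}

// Derive directory settings from dataDir, following the {dataDir}/... comments above.
// ASSUMPTION: explicit overrides take precedence over derived defaults.
function resolveConfig(dataDir: string, overrides: Partial<SketchConfig> = {}): SketchConfig {
  const base = dataDir.replace(/\/+$/, ""); // drop trailing slashes
  const defaults: SketchConfig = {
    port: 9876,
    host: "localhost",
    dataDir: base,
    actionsDir: `${base}/actions`,
    workflowsDir: `${base}/workflows`,
    templatesDir: `${base}/templates`,
    runLogsDir: `${base}/run-logs`,
  };
  return { ...defaults, ...overrides };
}
```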

---

## Knowledge Packs System

Knowledge packs (also called "skill packs") are installable domain-specific modules.

### Structure

```
{domain}/
├── manifest.yaml      # Pack metadata
├── selectors/         # CSS selectors for UI elements
├── models/            # Data models and entity types
├── states/            # State machine definitions
├── roles/             # Role-specific playbooks
└── tasks/             # Common task procedures
```

### Registry Commands

| Command | Description |
|---------|-------------|
| `sidebutton registry add <path\|url>` | Add a registry |
| `sidebutton registry list` | List installed registries |
| `sidebutton install <path\|url\|name>` | Install a knowledge pack |
| `sidebutton uninstall <domain>` | Remove a knowledge pack |
| `sidebutton search [query]` | Search packs across registries |

Sources: [packages/sidebutton/README.md:30-50]()

### Publishing Knowledge Packs

```bash
# Initialize a new pack
sidebutton init github.com

# Validate
sidebutton validate ./github.com

# Publish to registry
sidebutton publish
```

Sources: [packages/server/src/cli.ts:100-150]()

---

## Deployment Modes

### Local Development

```bash
# Start with dashboard and API
sidebutton

# Start with stdio for AI agent
sidebutton --stdio
```

### AI Agent Integration

**Claude Desktop:**

```json
{
  "mcpServers": {
    "sidebutton": {
      "command": "npx",
      "args": ["sidebutton", "--stdio"]
    }
  }
}
```

**Cursor:**

```json
{
  "mcpServers": {
    "sidebutton": {
      "url": "http://localhost:9876/mcp"
    }
  }
}
```

Sources: [packages/sidebutton/README.md:60-90]()
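
The two client configurations above differ only in the entry under `mcpServers.sidebutton`: a command for stdio, or a URL for HTTP. A small sketch that generates either file is shown below. `McpServerEntry`, `stdioEntry`, `httpEntry`, and `clientConfig` are hypothetical helpers, not part of SideButton.

```typescript
type McpServerEntry =
  | { command: string; args: string[] } // stdio transport (Claude Desktop example)
  | { url: string };                    // HTTP transport (Cursor example)

// Entry that launches SideButton over stdio, as in the Claude Desktop config above.
function stdioEntry(): McpServerEntry {
  return { command: "npx", args: ["sidebutton", "--stdio"] };
}

// Entry that points at an already running server, as in the Cursor config above.
function httpEntry(port: number = 9876): McpServerEntry {
  return { url: `http://localhost:${port}/mcp` };
}

// Serialize the full client configuration file.
function clientConfig(entry: McpServerEntry): string {
  return JSON.stringify({ mcpServers: { sidebutton: entry } }, null, 2);
}
```

With the HTTP entry the server must already be running; with the stdio entry the client starts it on demand.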

---

## Security Considerations

### Authentication Flow

```mermaid
graph TD
    A[Login Command] --> B[Credentials Input]
    B --> C[Validate Credentials]
    C -->|Valid| D[Store Token]
    C -->|Invalid| E[Error]
    D --> F[Attach to Requests]

    F --> G[/api/* Requests]
    G --> H{Auth Required?}
    H -->|Yes| I[Verify Token]
    H -->|No| J[Allow]
    I -->|Valid| J
    I -->|Invalid| K[401 Unauthorized]
```

### Token Storage

- Tokens are stored in `~/.sidebutton/config.json`
- Not committed to version control
- Protected by file system permissions

Source: [packages/server/src/cli.ts:1-50]()
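
The authentication flow can be approximated in a few lines. The sketch below shows a client attaching a stored token as a bearer header, and a server-side check that mirrors the branches of the diagram. The `StoredConfig` shape, both function names, and the use of the `Authorization: Bearer` scheme for `/api/*` requests are assumptions made for illustration; the actual logic in `cli.ts` and the server may differ.

```typescript
// Hypothetical shape of ~/.sidebutton/config.json; only the token matters here.
interface StoredConfig {
  token?: string;
}

// Client side: attach the stored token, if any, to outgoing request headers.
function authHeaders(config: StoredConfig): Record<string, string> {
  const headers: Record<string, string> = {};
  if (config.token) {
    headers["Authorization"] = `Bearer ${config.token}`;
  }
  return headers;
}

type AuthOutcome = "allow" | "unauthorized";

// Server side: allow unauthenticated routes, otherwise verify the bearer token.
function checkRequest(
  authRequired: boolean,
  authorizationHeader: string | undefined,
  validTokens: Set<string>,
): AuthOutcome {
  if (!authRequired) {
    return "allow";
  }
  const prefix = "Bearer ";
  if (authorizationHeader === undefined || !authorizationHeader.startsWith(prefix)) {
    return "unauthorized"; // maps to the 401 branch in the diagram
  }
  const token = authorizationHeader.slice(prefix.length);
  return validTokens.has(token) ? "allow" : "unauthorized";
}
```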

---

## Summary

The SideButton system architecture consists of:

1. **CLI Layer** - Entry point for users and AI agents
2. **Transport Layer** - stdio, SSE, HTTP, WebSocket support
3. **Server Layer** - Fastify MCP server with REST API
4. **Execution Layer** - Workflow parsing and step execution
5. **Provider Layer** - GitHub, Browser, Shell, LLM integrations
6. **UI Layer** - React dashboard for workflow management

The modular design allows:
- AI agents to execute complex browser automation workflows
- Users to create custom workflows via YAML
- Extensible provider system for new integrations
- Installable knowledge packs for domain-specific automation

---

<a id='packages-overview'></a>

## Package Overview

### Related Pages

Related topics: [System Architecture](#architecture), [Workflow Engine](#workflow-engine), [Chrome Extension](#chrome-extension)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [packages/core/package.json](https://github.com/sidebutton/sidebutton/blob/main/packages/core/package.json)
- [packages/server/package.json](https://github.com/sidebutton/sidebutton/blob/main/packages/server/package.json)
- [packages/dashboard/package.json](https://github.com/sidebutton/sidebutton/blob/main/packages/dashboard/package.json)
- [packages/sidebutton/package.json](https://github.com/sidebutton/sidebutton/blob/main/packages/sidebutton/package.json)
- [packages/core/src/index.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/core/src/index.ts)
- [packages/core/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/core/README.md)
- [packages/server/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/README.md)
- [packages/sidebutton/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/sidebutton/README.md)
- [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)
</details>

# Package Overview

SideButton is a browser automation platform organized as a monorepo with four primary packages. It supports workflow-driven automation through YAML definitions, MCP (Model Context Protocol) integration, a REST API, and a Chrome extension. This page describes each package's architecture, responsibilities, and dependencies on the other packages.

## Architecture Overview

SideButton follows a layered architecture pattern with clear separation of concerns across packages. The core workflow engine handles execution logic, the server package provides API endpoints and MCP connectivity, the CLI package offers command-line interaction, and the dashboard provides a web-based user interface.

```mermaid
graph TD
    User[User] --> CLI[CLI Package]
    User --> Dashboard[Dashboard Package]
    User --> MCP[MCP Client]
    User --> REST[REST API Client]
    
    CLI --> Server[Server Package]
    Dashboard --> Server
    MCP --> Server
    REST --> Server
    
    Server --> Core[Core Package]
    Server --> BrowserExtension[Chrome Extension]
    
    Core --> WorkflowEngine[Workflow Engine]
    Core --> Providers[Provider Integrations]
```

## Package Structure

The repository contains four main packages under the `packages/` directory:

| Package | Description |
|---------|-------------|
| `@sidebutton/core` | Core workflow engine and execution runtime |
| `@sidebutton/server` | MCP server, REST API, and embedded dashboard |
| `@sidebutton/sidebutton` | Command-line interface |
| `dashboard` | Frontend React application for the web UI |

## Core Package (`@sidebutton/core`)

The core package contains the fundamental workflow orchestration engine. It handles parsing, validation, and execution of YAML-based workflow definitions.

### Purpose and Scope

The core package is responsible for the runtime execution of automations. It provides the foundational primitives that the server package wraps with API endpoints. Workflows are defined in YAML and executed through a step-by-step interpreter that supports multiple action types.

### Core Exports

The package exposes three primary functions for workflow management:

```typescript
// packages/core/src/index.ts
export { parseWorkflow, validateWorkflow, executeWorkflow }
```

| Function | Purpose |
|----------|---------|
| `parseWorkflow` | Parse YAML workflow definition into internal representation |
| `validateWorkflow` | Validate workflow structure and step types |
| `executeWorkflow` | Execute a workflow with provided context and parameters |

### Step Types

The core package supports multiple categories of step types for workflow construction:

| Category | Steps |
|----------|-------|
| **Browser** | `navigate`, `click`, `type`, `scroll`, `hover`, `wait`, `extract`, `extractAll`, `exists`, `key` |
| **Shell** | `shell.run`, `terminal.open`, `terminal.run` |
| **LLM** | `llm.classify`, `llm.generate` |
| **Control** | `control.if`, `control.retry`, `control.stop`, `workflow.call` |
| **Data** | `data.first` |

### Provider Integrations

The core package includes provider implementations for external service integration. GitHub integration is implemented in `packages/core/src/providers/github.ts` and provides the following capabilities:

- `listPRs` - List pull requests
- `getPR` - Get pull request details
- `createPR` - Create a pull request
- `listIssues` - List repository issues
- `getIssue` - Get issue details

Source: [packages/core/src/providers/github.ts](packages/core/src/providers/github.ts)

## Server Package (`@sidebutton/server`)

The server package serves as the central hub for all external interactions with the workflow engine. It wraps the core package with MCP protocol support, REST API endpoints, and embeds the dashboard application.

### MCP Server

The MCP server implementation exposes workflow execution capabilities to MCP-compatible clients including Claude Desktop and Cursor. The server runs on port 9876 by default and provides the following tools:

| MCP Tool | Description |
|---------|-------------|
| `run_workflow` | Execute a workflow by ID |
| `list_workflows` | List available workflows |
| `get_workflow` | Get workflow YAML definition |
| `get_run_log` | Get execution log |
| `list_run_logs` | List recent executions |
| `get_browser_status` | Check extension connection |
| `capture_page` | Capture page selectors |
| `navigate` | Navigate browser to URL |
| `snapshot` | Get accessibility tree |
| `click` | Click element |
| `type` | Type text |
| `scroll` | Scroll page |
| `extract` | Extract text |
| `screenshot` | Capture screenshot |
| `hover` | Hover over element |

Source: [packages/server/README.md](packages/server/README.md)

### REST API

The server exposes 60+ JSON endpoints for external integrations. The API supports the same workflow operations available through MCP, enabling programmatic access from any HTTP client.

```bash
# Run a workflow
curl -X POST http://localhost:9876/api/workflows/check_ticket/run \
  -H "Content-Type: application/json" \
  -d '{"params": {"ticket_id": "PROJ-123"}}'

# List workflows
curl http://localhost:9876/api/workflows

# Get run log
curl http://localhost:9876/api/runs/latest
```

Source: [README.md](README.md)
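
For callers that prefer code over `curl`, the run request can be assembled as plain data. This is a minimal sketch: the helper `buildRunRequest` and the `RunRequest` type are hypothetical, and the endpoint path and body shape are simply copied from the `curl` example above.

```typescript
// Hypothetical description of the HTTP call; not part of any SideButton package.
interface RunRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request for POST /api/workflows/<id>/run with a JSON `params` body.
function buildRunRequest(
  workflowId: string,
  params: Record<string, unknown>,
  baseUrl: string = "http://localhost:9876",
): RunRequest {
  return {
    url: `${baseUrl}/api/workflows/${encodeURIComponent(workflowId)}/run`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ params }),
  };
}

const runRequest = buildRunRequest("check_ticket", { ticket_id: "PROJ-123" });
// runRequest.url  -> "http://localhost:9876/api/workflows/check_ticket/run"
// runRequest.body -> '{"params":{"ticket_id":"PROJ-123"}}'
```

The resulting fields can be handed to any HTTP client, for example `fetch(runRequest.url, { method: runRequest.method, headers: runRequest.headers, body: runRequest.body })`.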

### Embedded Dashboard

The server embeds a React-based dashboard application served from `packages/dashboard/`. The dashboard provides:

- Workflow browsing and execution
- Run log viewing
- Shortcut management
- Action library
- Workflow recording

### Workflow Engine Extensions

The server extends the core workflow engine with 34+ step types, providing additional capabilities beyond the core package:

| Extended Category | Additional Steps |
|-------------------|------------------|
| **Browser** | `fill`, `press_key`, `scroll_into_view`, `evaluate`, `select_option` |
| **Extended** | `check_writing_quality`, `capture_page` |

### Knowledge Packs

The server supports knowledge packs (also called skill packs) that provide domain-specific knowledge for AI-driven tasks. Knowledge packs include:

- **Selectors** — CSS selectors for UI elements
- **Data models** — entity types, fields, relationships, valid states
- **State machines** — valid transitions per state
- **Role playbooks** — role-specific procedures (QA, SE, PM, SD)
- **Common tasks** — step-by-step procedures, gotchas, edge cases

Source: [README.md](README.md)

## CLI Package (`@sidebutton/sidebutton`)

The CLI package provides command-line interaction with the SideButton platform. It serves as the primary interface for local development and scripting workflows.

### Installation

```bash
npm install -g sidebutton
```

### Core Commands

| Command | Description |
|---------|-------------|
| `sidebutton` | Start server (default port 9876) |
| `sidebutton --stdio` | Start with stdio transport (Claude Desktop) |
| `sidebutton -p 8080` | Start on custom port |

### Workflow Management

| Command | Description |
|---------|-------------|
| `sidebutton list` | List available workflows |
| `sidebutton run <id>` | Run a workflow |
| `sidebutton status` | Check server status |

### Knowledge Pack Commands

```bash
# Registry management
sidebutton registry add <path|url>   # Install from registry
sidebutton registry update [name]    # Update installed packs
sidebutton registry remove <name>    # Remove registry and packs
sidebutton registry list             # Show registries

# Search and install
sidebutton search [query]            # Search packs across registries
sidebutton install <path|url|name>   # Install a single knowledge pack
sidebutton uninstall <domain>        # Remove a knowledge pack

# Development
sidebutton init [domain]             # Scaffold a new knowledge pack
sidebutton validate [path]           # Lint and validate a knowledge pack
sidebutton publish                   # Publish to registry
```

Source: [packages/sidebutton/README.md](packages/sidebutton/README.md)

### Publishing Knowledge Packs

The CLI publishes knowledge packs (skill packs) to a remote registry with the `publish` command:

```bash
sidebutton publish
```

This command uploads the pack manifest and its associated files to the configured remote registry at `${REMOTE_BASE_URL}/api/skill-packs/publish`. The request must be authenticated with a bearer token.

## Dashboard Package

The dashboard is a React-based frontend application that provides the visual interface for managing workflows and viewing execution logs.

### Entry Point

The dashboard application is mounted in `packages/dashboard/index.html`:

```html
<div id="app"></div>
<script type="module" src="/src/main.ts"></script>
```

### Key Pages

| Page | Route | Purpose |
|------|-------|---------|
| Dashboard Home | `/` | Display workflow shortcuts |
| Actions | `/actions` | Browse and search available workflows |
| Action Detail | `/actions/:id` | View workflow details and run |
| Workflows | `/workflows` | Library of workflows |
| Workflow Detail | `/workflows/:id` | Read-only workflow view |
| Recordings | `/recordings` | View recorded automations |
| Run Logs | `/run-logs` | View execution history |

## Chrome Extension

While not a separate npm package, the Chrome extension is an integral part of the SideButton ecosystem. It provides browser automation capabilities with:

- 40+ browser commands (navigate, click, type, extract, scroll, wait, snapshot)
- Real DOM access via CSS selectors
- Recording mode for capturing manual actions as workflows
- Embed buttons for injecting action buttons into web pages
- WebSocket connection with stable reconnection

Source: [README.md](README.md)

## Dependency Graph

The packages have the following dependency relationships:

```mermaid
graph LR
    CLI["@sidebutton/sidebutton"] --> Server["@sidebutton/server"]
    Dashboard["dashboard"] --> Server
    Server --> Core["@sidebutton/core"]
    Server --> BrowserExt["Chrome Extension"]
    
    style Core fill:#e1f5fe
    style Server fill:#fff3e0
    style CLI fill:#e8f5e9
    style Dashboard fill:#f3e5f5
```

| Consumer | Dependency | Relationship |
|----------|------------|---------------|
| `@sidebutton/sidebutton` | `@sidebutton/server` | CLI wraps server functionality |
| `dashboard` | `@sidebutton/server` | Dashboard is embedded in and served by the server |
| `@sidebutton/server` | `@sidebutton/core` | Server uses core for workflow execution |

## Technology Stack

| Layer | Technology |
|-------|------------|
| Runtime | Node.js |
| Core Engine | TypeScript |
| Server | Fastify |
| API Protocol | MCP (Model Context Protocol) |
| Dashboard | React, Vite |
| Browser Automation | Chrome Extension (Manifest V3) |
| Package Manager | pnpm (monorepo) |

## Quick Start

### Running the Server

```bash
# Start the server
sidebutton

# Or with custom port
sidebutton -p 8080
```

### Running a Workflow

```bash
# List available workflows
sidebutton list

# Run a specific workflow
sidebutton run <workflow-id>
```

### Integrating with Claude Desktop

Add the following to `~/Library/Application Support/Claude/claude_desktop_config.json` (the macOS location of the Claude Desktop config):

```json
{
  "mcpServers": {
    "sidebutton": {
      "command": "npx",
      "args": ["sidebutton", "--stdio"]
    }
  }
}
```

## Related Documentation

- [Full Documentation](https://docs.sidebutton.com)
- [GitHub Repository](https://github.com/sidebutton/sidebutton)
- [Website](https://sidebutton.com)
- [Chrome Web Store](https://chromewebstore.google.com/detail/sidebutton/odaefhmdmgijnhdbkfagnlnmobphgkij)

---

<a id='workflow-engine'></a>

## Workflow Engine

### Related Pages

Related topics: [Step Types Reference](#step-types), [Workflow Examples](#workflow-examples), [Package Overview](#packages-overview)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)
- [packages/core/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/core/README.md)
- [packages/server/src/server.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/server.ts)
- [packages/server/src/mcp/handler.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/mcp/handler.ts)
- [packages/core/src/providers/github.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/core/src/providers/github.ts)
- [packages/server/defaults/roles/software-engineer.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/roles/software-engineer.md)
- [packages/server/defaults/targets/_provider-github-cli.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/targets/_provider-github-cli.md)
</details>

# Workflow Engine

## Overview

The SideButton Workflow Engine is a YAML-first orchestration system that enables automation of complex tasks through a declarative step-based approach. It provides 34+ built-in step types spanning browser automation, shell execution, LLM integration, and programmatic control flow operations.

The engine executes workflows defined in YAML format, supporting variable interpolation, conditional branching, retry logic, and cross-workflow chaining. Workflows can be triggered via MCP (Model Context Protocol), the REST API, or the dashboard interface.

Source: [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)

## Architecture

### Core Components

```mermaid
graph TD
    A[Workflow YAML] --> B[Parser]
    B --> C[AST / Step Definitions]
    C --> D[Executor]
    D --> E[Step Handlers]
    
    F[Variables/Context] --> D
    D --> F
    
    G[MCP Client] --> D
    H[REST API] --> D
    I[Dashboard] --> D
```

The engine consists of three primary layers:

1. **Parser**: Validates and parses YAML workflow definitions into structured step objects
2. **Executor**: Orchestrates step execution, manages state, and handles control flow
3. **Step Handlers**: Provider-specific implementations for each step type

Source: [packages/core/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/core/README.md)

### Workflow Execution Flow

```mermaid
sequenceDiagram
    participant Client
    participant Executor
    participant StepHandler
    participant Context
    
    Client->>Executor: executeWorkflow(workflow, context)
    Executor->>Executor: Parse YAML
    loop For Each Step
        Executor->>StepHandler: Execute Step
        StepHandler->>Context: Update Variables
        StepHandler-->>Executor: Result
        Executor->>Executor: Check Control Flow
    end
    Executor-->>Client: Execution Result
```
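
The execution sequence can be sketched as a small interpreter loop. The types and the `runSteps` function below are illustrative stand-ins, not the `@sidebutton/core` API, and the treatment of `control.stop` and the `as` field is deliberately simplified. The `demo.*` step types exist only in this example.

```typescript
// Illustrative stand-ins only; none of these names come from @sidebutton/core.
interface Step {
  type: string;
  as?: string;
  [key: string]: unknown;
}
type Variables = Record<string, unknown>;
type StepHandler = (step: Step, vars: Variables) => unknown;

interface RunResult {
  status: "success" | "stopped";
  variables: Variables;
}

// Runs steps in order, storing each result under the step's `as` name.
function runSteps(
  steps: Step[],
  handlers: Record<string, StepHandler>,
  vars: Variables = {},
): RunResult {
  for (const step of steps) {
    if (step.type === "control.stop") {
      return { status: "stopped", variables: vars }; // control flow check
    }
    const handler = handlers[step.type];
    if (!handler) {
      throw new Error(`No handler registered for step type "${step.type}"`);
    }
    const result = handler(step, vars);
    if (step.as) {
      vars[step.as] = result; // update the shared context
    }
  }
  return { status: "success", variables: vars };
}

const outcome = runSteps(
  [
    { type: "demo.constant", value: 21, as: "n" },
    { type: "demo.double", as: "doubled" },
  ],
  {
    "demo.constant": (step) => step["value"],
    "demo.double": (_step, vars) => Number(vars["n"]) * 2,
  },
);
// outcome.variables["doubled"] -> 42
```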

## Step Types

The workflow engine supports five major categories of steps:

Source: [README.md:1-50]()

### Browser Steps

Used for web automation tasks. All browser steps require an active browser connection.

| Step Type | Description |
|-----------|-------------|
| `browser.navigate` | Open a URL in the browser |
| `browser.click` | Click an element by CSS selector |
| `browser.type` | Type text into an input element |
| `browser.fill` | Fill input value directly (React-compatible) |
| `browser.scroll` | Scroll the page |
| `browser.extract` | Extract text from an element into a variable |
| `browser.extractAll` | Extract all matching elements |
| `browser.extractMap` | Extract structured data from repeated elements |
| `browser.wait` | Wait for element or fixed delay |
| `browser.exists` | Check if element exists (returns boolean) |
| `browser.hover` | Position cursor over element |
| `browser.key` | Send keyboard keys |
| `browser.snapshot` | Capture accessibility tree snapshot |
| `browser.injectCSS` | Inject CSS styles into page |
| `browser.injectJS` | Execute JavaScript in page context |
| `browser.select_option` | Select dropdown option |
| `browser.scrollIntoView` | Scroll element into viewport |

Source: [README.md:44-63]()

### Shell Steps

Execute command-line operations on the host system.

| Step Type | Description |
|-----------|-------------|
| `shell.run` | Execute a bash/shell command |
| `terminal.open` | Open a visible terminal window (macOS) |
| `terminal.run` | Run command in visible terminal window |

Source: [README.md:64-66]()

### LLM Steps

Integrate with large language models for AI-driven operations.

| Step Type | Description |
|-----------|-------------|
| `llm.classify` | Structured classification with predefined categories |
| `llm.generate` | Free-form text generation |
| `llm.decide` | Make decisions based on context |

Supported providers include Ollama (local), OpenAI, Anthropic, and Google.

Source: [README.md:67-73]()

### Control Flow Steps

Manage workflow execution logic and branching.

| Step Type | Description |
|-----------|-------------|
| `control.if` | Conditional branching based on expression evaluation |
| `control.retry` | Retry block with configurable backoff |
| `control.stop` | End workflow with success/error message |
| `workflow.call` | Call another workflow with parameters |
| `variable.set` | Set a variable value |

Source: [README.md:74-78]()

### Data Steps

Manipulate and transform data between steps.

| Step Type | Description |
|-----------|-------------|
| `data.first` | Extract first item from a list |
| `data.get` | Retrieve stored data value |

Source: [README.md:79-82]()

## Variable Interpolation

The workflow engine uses `{{variable}}` syntax for referencing extracted values and parameters.

```yaml
steps:
  - type: browser.extract
    selector: ".username"
    as: user
  - type: shell.run
    cmd: "echo 'Hello, {{user}}!'"
```

Variables can be:
- Extracted from page elements using the `as` parameter
- Passed as workflow parameters
- Set via `variable.set` steps
- Returned from nested workflow calls

Source: [README.md:103-115]()
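
The `{{variable}}` substitution can be sketched with a single regular expression. The `interpolate` function below is a hypothetical illustration, not the engine's implementation. Leaving unknown names untouched and allowing only word characters and dots in names are assumptions.

```typescript
// Replaces each {{name}} with the matching variable value.
// Assumption: names that are not defined are left in place unchanged.
function interpolate(template: string, vars: Record<string, unknown>): string {
  return template.replace(
    /\{\{\s*([\w.]+)\s*\}\}/g,
    (match: string, name: string) =>
      Object.prototype.hasOwnProperty.call(vars, name) ? String(vars[name]) : match,
  );
}

const greeting = interpolate("echo 'Hello, {{user}}!'", { user: "octocat" });
// greeting -> "echo 'Hello, octocat!'"
```

This sketch does no escaping, so a value interpolated into a `shell.run` command is inserted verbatim.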

## Workflow Definition Schema

A workflow is defined with the following structure:

```yaml
id: workflow_identifier
title: "Display Title"
description: "What this workflow does"
params:
  param_name: string  # or number, boolean, array, object
steps:
  - type: browser.navigate
    url: "https://example.com"
  - type: browser.extract
    selector: ".element"
    as: extracted_value
```

### Required Fields

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique workflow identifier |
| `title` | string | Human-readable title |
| `steps` | array | Ordered list of step definitions |

### Optional Fields

| Field | Type | Description |
|-------|------|-------------|
| `description` | string | Workflow description |
| `params` | object | Parameter definitions with types |
| `category` | string | Workflow category |
| `platform` | string | Target platform |

Source: [packages/server/src/server.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/server.ts)
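
A check of the required fields might look like the sketch below. It operates on an already parsed workflow object, and `findSchemaErrors` is a hypothetical stand-in rather than the real `validateWorkflow`. The requirement that every step carries a string `type` is inferred from the examples, not from a published schema.

```typescript
// Hypothetical validator for the required fields listed above.
function findSchemaErrors(workflow: Record<string, unknown>): string[] {
  const errors: string[] = [];

  // `id` and `title` are required strings.
  for (const field of ["id", "title"]) {
    const value = workflow[field];
    if (typeof value !== "string" || value.length === 0) {
      errors.push(`"${field}" must be a non-empty string`);
    }
  }

  // `steps` is a required array; each step is assumed to need a string `type`.
  const steps = workflow["steps"];
  if (!Array.isArray(steps)) {
    errors.push('"steps" must be an array');
  } else {
    steps.forEach((step: unknown, index: number) => {
      const type = (step as { type?: unknown } | null)?.type;
      if (typeof type !== "string") {
        errors.push(`steps[${index}] is missing a string "type"`);
      }
    });
  }
  return errors;
}
```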

## Control Flow Patterns

### Conditional Branching

```yaml
- type: control.if
  condition: "{{current_status}} != 'Done'"
  then:
    - type: llm.classify
      prompt: "Should this ticket be closed?"
      classes: [close, keep_open]
      as: decision
```

Source: [README.md:117-125]()

### Retry with Backoff

```yaml
- type: control.retry
  max_attempts: 3
  backoff: 1000  # milliseconds
  steps:
    - type: shell.run
      cmd: "curl -f https://api.example.com/health"
```
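
The retry pattern can be expressed as a small helper. This is a sketch under stated assumptions: `retry` is a hypothetical function, `backoff` is treated as a fixed delay in milliseconds between attempts, and the `sleep` function is injected by the caller so that the example does not depend on any particular timer API.

```typescript
// Hypothetical helper mirroring `max_attempts` and `backoff` from the YAML above.
// Assumption: `backoffMs` is a fixed delay between attempts, not an exponential one.
async function retry<T>(
  action: () => Promise<T>,
  maxAttempts: number,
  backoffMs: number,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action(); // success ends the loop immediately
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        await sleep(backoffMs); // wait before the next attempt
      }
    }
  }
  throw lastError; // every attempt failed
}
```

A caller would pass something like `(ms) => new Promise((resolve) => setTimeout(resolve, ms))` as `sleep`.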

### Workflow Chaining

```yaml
- type: workflow.call
  workflow_id: another_workflow
  params:
    input_value: "{{extracted_data}}"
```

Source: [packages/server/defaults/roles/software-engineer.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/roles/software-engineer.md)

## MCP Integration

The workflow engine exposes functionality through the Model Context Protocol, enabling AI assistants to execute and manage workflows.

Source: [packages/server/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/README.md)

### Available MCP Tools

| Tool | Description |
|------|-------------|
| `run_workflow` | Execute a workflow by ID |
| `list_workflows` | List available workflows |
| `get_workflow` | Get workflow YAML definition |
| `get_run_log` | Get execution log for a run |
| `list_run_logs` | List recent workflow executions |
| `get_browser_status` | Check browser extension connection |
| `capture_page` | Capture selectors from current page |

### MCP Tool Handlers

```mermaid
graph LR
    A[MCP Request] --> B[Handler]
    B --> C{tool_name}
    C -->|list_workflows| D[toolListWorkflows]
    C -->|get_workflow| E[toolGetWorkflow]
    C -->|list_run_logs| F[toolListRunLogs]
    C -->|run_workflow| G[executeWorkflow]
```

Source: [packages/server/src/mcp/handler.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/mcp/handler.ts)
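
The routing in the diagram amounts to a lookup from tool name to handler function. The sketch below illustrates that idea only: the tool names come from the table above, while `handleToolCall`, the `Map`-based registry, and the placeholder handler bodies are assumptions and do not reflect the actual code in `handler.ts`.

```typescript
// Illustrative dispatch table; handler bodies are placeholders.
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => unknown;

const toolHandlers = new Map<string, ToolHandler>([
  ["list_workflows", () => [{ id: "check_ticket", title: "Check ticket" }]],
  ["get_workflow", (args) => `id: ${String(args["id"])}`],
]);

// Looks up the handler by tool name and fails loudly for unknown tools.
function handleToolCall(name: string, args: ToolArgs = {}): unknown {
  const handler = toolHandlers.get(name);
  if (handler === undefined) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return handler(args);
}
```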

### List Workflows Response Format

```typescript
interface WorkflowListItem {
  workflow: Workflow;
  source: 'actions' | 'workflows';
}

// Response includes:
// - workflow.id
// - workflow.title
// - workflow.params (if any)
// - source identifier
```

Source: [packages/server/src/mcp/handler.ts:1-50]()

## GitHub Integration

The engine provides specialized steps for GitHub operations through the GitHub CLI provider.

Source: [packages/core/src/providers/github.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/core/src/providers/github.ts)

### GitHub Step Types

| Step | Description |
|------|-------------|
| `git.listPRs` | List pull requests with state filter |
| `git.getPR` | Get PR details and diff statistics |
| `git.createPR` | Create a new pull request |
| `git.listIssues` | List issues with filters |
| `git.getIssue` | Get issue details |
| `issues.create` | Create an issue |
| `issues.comment` | Add a comment to issue/PR |
| `issues.transition` | Change issue status |

### Create Pull Request Parameters

```typescript
interface CreatePRParams {
  repo?: string;       // Repository in format "owner/repo"
  title: string;       // PR title
  body?: string;       // PR description
  head: string;        // Head branch name
  base?: string;       // Base branch (default: main)
}
```

Source: [packages/core/src/providers/github.ts:1-50]()
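
Because the provider is described as wrapping the GitHub CLI, these parameters map naturally onto `gh pr create` flags. The sketch below shows one possible mapping. The function `ghCreatePrArgs` is hypothetical, and how the real provider builds and runs its command is an assumption. The flags themselves (`--title`, `--body`, `--head`, `--base`, `--repo`) are standard `gh pr create` options.

```typescript
// Same shape as the interface above, repeated so the example is self-contained.
interface CreatePRParams {
  repo?: string;
  title: string;
  body?: string;
  head: string;
  base?: string;
}

// Hypothetical mapping from parameters to `gh pr create` arguments.
function ghCreatePrArgs(params: CreatePRParams): string[] {
  const args = [
    "pr", "create",
    "--title", params.title,
    "--head", params.head,
    "--base", params.base ?? "main", // default base branch, as documented above
  ];
  if (params.body !== undefined) {
    args.push("--body", params.body);
  }
  if (params.repo !== undefined) {
    args.push("--repo", params.repo);
  }
  return args;
}

const prArgs = ghCreatePrArgs({ title: "Fix login", head: "fix-login" });
// prArgs -> ["pr", "create", "--title", "Fix login", "--head", "fix-login", "--base", "main"]
```

Returning an argument array rather than a single command string lets a caller pass it to a process API without shell quoting.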

### Common GitHub Workflows

**Review Open PRs:**
1. `git.listPRs` with `state: "open"` — view pending reviews
2. `git.getPR` with PR number — read details and diff stats
3. Use browser tools for visual diff review

**Autonomous Development Cycle:**
1. `git.listIssues` — browse available issues
2. `git.getIssue` — read candidate details
3. `llm.decide` — select best issue based on priority
4. `issues.comment` — signal work is starting
5. `git.createPR` — submit completed work

Source: [packages/server/defaults/targets/_provider-github-cli.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/targets/_provider-github-cli.md)

## Execution Context

Each workflow execution maintains a context object that stores:

- **Variables**: Extracted values and set variables
- **Parameters**: Input parameters passed to the workflow
- **Results**: Step execution results
- **Logs**: Execution logs for debugging

```yaml
# Context is passed to executeWorkflow
context:
  params:
    ticket_id: "PROJ-123"
  variables:
    current_status: "In Progress"
```

## Error Handling

The engine provides multiple mechanisms for error handling:

1. **control.retry**: Automatically retry failed steps with a configurable backoff
2. **control.stop**: Gracefully end workflow with error message
3. **Conditional checks**: Use `browser.exists` to verify elements before actions

```yaml
steps:
  - type: browser.exists
    selector: ".error-message"
    as: has_error
  - type: control.if
    condition: "{{has_error}}"
    then:
      - type: control.stop
        status: error
        message: "Error detected on page"
```

## Dashboard Integration

The workflow engine integrates with the SideButton dashboard for:

- **Workflow Library**: Browse and install workflows
- **Run History**: View execution logs and results
- **Quick Run**: Execute workflows with parameter inputs

Source: [packages/server/src/server.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/server.ts)

### Workflow Installation Flow

1. User navigates to workflow in library
2. Dashboard renders install confirmation page
3. POST request submits workflow to local server
4. Server saves workflow to user's action library
5. Success page confirms installation

```mermaid
graph TD
    A[Browse Workflow] --> B[Click Install]
    B --> C[POST /install/:workflowId]
    C --> D[Server Validates]
    D --> E[Save to Actions Library]
    E --> F[Show Success Page]
```

## Best Practices

1. **Use extracted variables immediately**: Variable references should occur close to their extraction step
2. **Add wait conditions**: Use `browser.wait` before extracting dynamic content
3. **Handle missing elements**: Check existence before interacting
4. **Limit retry attempts**: Configure appropriate `max_attempts` for unreliable operations
5. **Keep workflows focused**: Prefer workflow chaining over monolithic single workflows

Source: [packages/server/defaults/roles/qa.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/roles/qa.md)

---

<a id='step-types'></a>

## Step Types Reference

### Related Pages

Related topics: [Workflow Engine](#workflow-engine), [Workflow Examples](#workflow-examples)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [packages/core/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/core/README.md)
- [packages/server/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/README.md)
- [packages/sidebutton/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/sidebutton/README.md)
- [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)
- [packages/core/src/providers/github.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/core/src/providers/github.ts)
- [packages/server/src/mcp/handler.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/mcp/handler.ts)
</details>

# Step Types Reference

SideButton workflows are composed of discrete steps that define actions to be executed sequentially. Each step type serves a specific purpose—from interacting with web pages to executing shell commands, making decisions via LLM, or orchestrating control flow. Understanding the available step types is essential for building effective automations.

## Overview

Steps are the fundamental building blocks of SideButton workflows. They are defined within workflow YAML files and specify:

- **What action** to perform (the step type)
- **What parameters** to use for that action
- **How to handle results** (variable assignment, conditional logic)

Source: [packages/core/README.md]()

```mermaid
graph TD
    A[Workflow YAML] --> B[Step Execution Engine]
    B --> C[Browser Steps]
    B --> D[Shell Steps]
    B --> E[LLM Steps]
    B --> F[Control Steps]
    B --> G[Data Steps]
    B --> H[Git Steps]
    B --> I[Issues Steps]
```

## Step Type Categories

SideButton organizes steps into the following categories:

| Category | Purpose | Primary Use Case |
|----------|---------|------------------|
| Browser | Web page interaction | UI automation, data extraction |
| Shell | Command execution | Build tools, CLI operations |
| LLM | AI-powered decisions | Classification, content generation |
| Control | Flow control | Conditionals, retries, sub-workflows |
| Data | Data manipulation | Variable assignment, extraction |
| Git | Version control | PRs, issues, repository ops |
| Issues | Issue tracking | Bug tracking, task management |

Source: [packages/core/README.md]()
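
Since step types are namespaced with a dotted prefix (for example `browser.click` or `shell.run`), the category can usually be read off the prefix. The sketch below is an illustration only: `categoryOf` is hypothetical, and placing `terminal.*` under Shell and `workflow.*` under Control follows the groupings used in this documentation rather than any exported API.

```typescript
type StepCategory = "Browser" | "Shell" | "LLM" | "Control" | "Data" | "Git" | "Issues";

// Prefix-to-category lookup based on the groupings in this documentation.
const PREFIX_TO_CATEGORY = new Map<string, StepCategory>([
  ["browser", "Browser"],
  ["shell", "Shell"],
  ["terminal", "Shell"],
  ["llm", "LLM"],
  ["control", "Control"],
  ["workflow", "Control"],
  ["data", "Data"],
  ["git", "Git"],
  ["issues", "Issues"],
]);

// Returns the category for a dotted step type, or undefined if the prefix is unknown.
function categoryOf(stepType: string): StepCategory | undefined {
  const [prefix = ""] = stepType.split(".");
  return PREFIX_TO_CATEGORY.get(prefix);
}

const clickCategory = categoryOf("browser.click");
// clickCategory -> "Browser"
```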

---

## Browser Steps

Browser steps interact with web pages through a connected Chrome extension. These steps provide real DOM access via CSS selectors, enabling precise UI automation.

### Available Browser Steps

| Step Type | Description |
|-----------|-------------|
| `navigate` | Navigate browser to a URL |
| `click` | Click an element by selector |
| `type` | Type text into an input field |
| `scroll` | Scroll the page |
| `hover` | Hover over an element |
| `wait` | Wait for condition or duration |
| `extract` | Extract text from single element |
| `extractAll` | Extract text from all matching elements |
| `extract_map` | Extract structured data from repeated elements |
| `exists` | Check if element exists |
| `key` | Press keyboard key |
| `screenshot` | Capture page screenshot |
| `snapshot` | Get accessibility tree snapshot |
| `select_option` | Select dropdown option |

Source: [README.md]()

### Common Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `selector` | string | CSS selector for target element |
| `url` | string | Target URL (for navigate) |
| `text` | string | Text to type or extract |
| `timeout` | number | Wait timeout in milliseconds |
| `as` | string | Variable name to store result |

### Example: Basic Browser Workflow

```yaml
steps:
  - type: browser.navigate
    url: "https://github.com/sidebutton/sidebutton"

  - type: browser.snapshot

  - type: browser.extract
    selector: ".readme h1"
    as: repo_title

  - type: browser.extractAll
    selector: ".file-info a"
    as: file_links
```

### Example: Form Interaction

```yaml
steps:
  - type: browser.navigate
    url: "https://example.com/search"

  - type: browser.type
    selector: "#search-input"
    text: "{{query}}"

  - type: browser.click
    selector: "#search-button"

  - type: browser.wait
    selector: ".results"
    timeout: 5000
```

---

## Shell Steps

Shell steps execute command-line commands on the local system. They support both synchronous execution and interactive terminal sessions.

### Available Shell Steps

| Step Type | Description |
|-----------|-------------|
| `shell.run` | Execute a shell command |
| `terminal.open` | Open an interactive terminal |
| `terminal.run` | Run command in open terminal |

Source: [packages/core/README.md]()

### Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `cmd` | string | Shell command to execute |
| `cwd` | string | Working directory |
| `env` | object | Environment variables |
| `timeout` | number | Execution timeout (the example below suggests milliseconds) |
| `async` | boolean | Run asynchronously |

### Example: Shell Command Execution

```yaml
steps:
  - type: shell.run
    cmd: "pnpm build"
    cwd: "/path/to/project"
    timeout: 120000
    as: build_output

  - type: shell.run
    cmd: "git status --short"
    as: git_status
```

---

## LLM Steps

LLM steps call an AI model to generate content, classify input, or make decisions, so a workflow can adapt to context at run time.

### Available LLM Steps

| Step Type | Description |
|-----------|-------------|
| `llm.generate` | Generate content with LLM |
| `llm.classify` | Classify input into categories |
| `llm.decide` | Make decisions based on context |
| `llm.extract` | Extract structured data from unstructured text |

Sources: [README.md](), [packages/core/README.md]()

### Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `prompt` | string | LLM prompt template |
| `model` | string | Model to use |
| `temperature` | number | Sampling temperature |
| `as` | string | Variable name for result |

### Example: Content Generation

```yaml
steps:
  - type: llm.generate
    prompt: |
      Write a concise changelog entry for this commit:

      === COMMIT ===
      {{commit_message}}

      Format: - <one-line summary of the change>
    as: changelog_entry
```

### Example: Classification

```yaml
steps:
  - type: llm.classify
    input: "{{issue_body}}"
    categories:
      - bug
      - feature
      - documentation
      - question
    as: issue_type
```
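
A classification result is only useful to later steps if it is one of the declared categories. The sketch below shows one way a caller might build the prompt and normalize the model's reply. `buildClassifyPrompt` and `normalizeLabel` are hypothetical helpers, not SideButton APIs, and the prompt wording is an assumption.

```typescript
// Hypothetical helpers; not part of SideButton's API.
function buildClassifyPrompt(input: string, categories: string[]): string {
  return [
    "Classify the input into exactly one of these categories:",
    categories.map((c) => `- ${c}`).join("\n"),
    "Reply with the category name only.",
    "",
    input,
  ].join("\n");
}

// Map a raw model reply onto a declared category, or null if none matches.
function normalizeLabel(reply: string, categories: string[]): string | null {
  const cleaned = reply.trim().toLowerCase();
  return categories.find((c) => c.toLowerCase() === cleaned) ?? null;
}
```

Returning `null` for an unexpected reply lets the caller decide whether to retry or stop, rather than passing an arbitrary string to a later `control.if`.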

---

## Control Steps

Control steps manage workflow execution flow, enabling conditionals, loops, error handling, and sub-workflow invocation.

### Available Control Steps

| Step Type | Description |
|-----------|-------------|
| `control.if` | Conditional execution |
| `control.retry` | Retry failed steps |
| `control.stop` | Stop execution with message |
| `workflow.call` | Call another workflow |

Source: [packages/core/README.md]()

### control.if

Execute steps conditionally based on a boolean expression.

```yaml
steps:
  - type: control.if
    condition: "{{is_authenticated}}"
    then:
      - type: browser.click
        selector: ".dashboard-link"
    else:
      - type: browser.click
        selector: ".login-button"
```
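
The conditions in this document take two shapes: a bare value (`{{is_authenticated}}`) and a comparison (`{{sentiment}} == 'negative'`). As a rough illustration, the sketch below evaluates a condition string after its variables have been substituted. `evaluateCondition` is a hypothetical function and its truthiness rules are assumptions; the real engine may support other operators or semantics.

```typescript
// Hypothetical evaluator for an already-interpolated condition string.
// Supports "a == b", "a != b", and bare truthiness; the real engine may differ.
function stripQuotes(s: string): string {
  const t = s.trim();
  const quoted = /^(['"])(.*)\1$/.exec(t);
  return quoted ? quoted[2] : t;
}

function evaluateCondition(condition: string): boolean {
  const match = /^(.*?)\s*(==|!=)\s*(.*)$/.exec(condition);
  if (match) {
    // Compare both sides as strings, ignoring surrounding quotes.
    const equal = stripQuotes(match[1]) === stripQuotes(match[3]);
    return match[2] === "==" ? equal : !equal;
  }
  // No operator: treat empty, "false", and "0" as false.
  const value = condition.trim().toLowerCase();
  return value !== "" && value !== "false" && value !== "0";
}
```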

### control.retry

Retry a step or block of steps on failure.

```yaml
steps:
  - type: control.retry
    attempts: 3
    delay: 1000
    steps:
      - type: shell.run
        cmd: "curl -f https://api.example.com/health"
```
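
The behavior described above can be sketched as a small helper. This is a minimal illustration, not the engine's implementation: `retry` and `Sleep` are hypothetical names, the delay is assumed to be in milliseconds, and the delay is fixed rather than increasing between attempts.

```typescript
// Hypothetical retry helper mirroring `attempts` and `delay` (assumed milliseconds).
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retry<T>(
  attempts: number,
  delayMs: number,
  action: () => Promise<T>,
  sleep: Sleep = realSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      // Wait between attempts, but not after the final failure.
      if (attempt < attempts) await sleep(delayMs);
    }
  }
  throw lastError ?? new Error("attempts must be at least 1");
}
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.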

### control.stop

Stop workflow execution and output a message.

```yaml
steps:
  - type: control.stop
    message: |
      === Processing Complete ===

      Extracted {{item_count}} items
      Status: {{final_status}}
```

### workflow.call

Invoke another workflow as a subroutine.

```yaml
steps:
  - type: workflow.call
    workflow_id: "send_notification"
    params:
      channel: "#alerts"
      message: "{{summary}}"
```

---

## Data Steps

Data steps manipulate variables and extract information for use by subsequent steps.

### Available Data Steps

| Step Type | Description |
|-----------|-------------|
| `data.first` | Get first item from array |
| `data.get` | Get value from object |
| `data.set` | Set a variable value |
| `variable.set` | Set a workflow variable |

Sources: [packages/core/README.md](), [README.md]()

### Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `from` | string | Source variable to extract from |
| `default` | any | Default value if not found |
| `as` | string | Variable name for result |

### Example: Data Extraction

```yaml
steps:
  # extractAll returns a list, which data.first can take the first item from
  - type: browser.extractAll
    selector: ".user-profile"
    as: profiles

  - type: data.first
    from: "{{profiles}}"
    as: first_profile

  - type: data.set
    key: "user_name"
    value: "{{first_profile}}"
```

---

## Git Steps

Git steps interact with GitHub repositories via the `gh` CLI tool. These steps support pull requests, issues, and repository operations.

### Available Git Steps

| Step Type | Description |
|-----------|-------------|
| `git.listPRs` | List pull requests |
| `git.getPR` | Get PR details |
| `git.createPR` | Create pull request |
| `git.listIssues` | List issues |
| `git.getIssue` | Get issue details |

Sources: [packages/core/src/providers/github.ts](), [README.md]()

### Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `repo` | string | Repository (owner/repo) |
| `state` | string | Issue/PR state (open, closed, all) |
| `labels` | string | Filter by labels |
| `limit` | number | Maximum results |

### Example: List Pull Requests

```yaml
steps:
  - type: git.listPRs
    repo: "sidebutton/sidebutton"
    state: "open"
    limit: 10
    as: open_prs

  - type: control.stop
    message: |
      === Open PRs ===
      {{open_prs}}
```

### Example: Create Pull Request

```yaml
steps:
  - type: git.createPR
    repo: "sidebutton/sidebutton"
    title: "feat: add new automation step"
    head: "feature-branch"
    base: "main"
    body: |
      ## Summary
      This PR adds support for...

      ## Testing
      - [ ] Unit tests pass
      - [ ] Manual testing completed
    as: pr_result
```

---

## Issues Steps

Issues steps manage issue tracking across connected providers. They support searching, creating, and updating issues.

### Available Issues Steps

| Step Type | Description |
|-----------|-------------|
| `issues.search` | Search for issues |
| `issues.get` | Get issue details |
| `issues.create` | Create new issue |
| `issues.comment` | Add comment to issue |
| `issues.transition` | Change issue status |
| `issues.attach` | Attach file to issue |

Source: [README.md]()

### Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | string | Search query |
| `repo` | string | Repository |
| `title` | string | Issue title |
| `body` | string | Issue body |
| `labels` | array | Issue labels |

### Example: Search and Create Issue

```yaml
steps:
  - type: issues.search
    query: "is:issue is:open label:bug"
    repo: "sidebutton/sidebutton"
    as: existing_bugs

  - type: control.if
    condition: "{{existing_bugs.length}} == 0"
    then:
      - type: issues.create
        repo: "sidebutton/sidebutton"
        title: "Bug: {{bug_description}}"
        body: |
          ## Description
          {{bug_description}}

          ## Steps to Reproduce
          1.
          2.
          3.
        labels: ["bug"]
        as: new_issue
```

---

## Variable Interpolation

Steps reference variables with `{{variable}}` syntax, so later steps can use values produced by earlier ones.

```yaml
steps:
  - type: browser.extract
    selector: ".username"
    as: user

  - type: shell.run
    cmd: "echo 'Hello, {{user}}!'"

  - type: llm.generate
    prompt: "Write a welcome message for user: {{user}}"
    as: welcome_message
```

### Variable Scoping

| Scope | Description |
|-------|-------------|
| `{{step_id.value}}` | Result from specific step |
| `{{params.name}}` | Workflow parameter value |
| `{{env.VAR_NAME}}` | Environment variable |
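
To make the dotted scopes in the table concrete, here is a minimal sketch of `{{...}}` substitution over a nested context. `lookup` and `interpolate` are hypothetical helpers, and the context shape (top-level `params` and `env` objects next to step results) is an assumption, not SideButton's actual data model.

```typescript
// Hypothetical interpolation helper; the context shape is an assumption.
type Context = { [key: string]: unknown };

// Walk a dotted path such as "params.name" through nested objects.
function lookup(context: Context, path: string): unknown {
  let current: unknown = context;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Context)[key];
  }
  return current;
}

// Replace each {{path}} with its value; unknown paths are left untouched.
function interpolate(template: string, context: Context): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (whole: string, path: string) => {
    const value = lookup(context, path);
    return value === undefined ? whole : String(value);
  });
}
```

Leaving unknown placeholders in place, instead of replacing them with an empty string, makes a misspelled variable name visible in the output.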

---

## Step Execution Order

Steps execute sequentially by default. The execution order can be controlled through:

1. **Sequential**: Steps run in definition order
2. **Conditional**: `control.if` skips or executes blocks
3. **Retry**: `control.retry` repeats failed steps
4. **Sub-workflow**: `workflow.call` executes external workflows

```mermaid
graph LR
    A[Step 1] --> B{control.if}
    B -->|condition true| C[Step 2a]
    B -->|condition false| D[Step 2b]
    C --> E[Step 3]
    D --> E
    E --> F[control.retry]
    F -->|success| G[Step 4]
    F -->|failure| F
    G --> H[workflow.call]
    H --> I[Step 5]
```

---

## Error Handling

### Retry on Failure

```yaml
steps:
  - type: control.retry
    attempts: 3
    delay: 2000
    steps:
      - type: shell.run
        cmd: "npm test"
```

### Conditional Failure Handling

```yaml
steps:
  - type: control.if
    condition: "{{command_success}}"
    then:
      - type: browser.click
        selector: ".continue-button"
    else:
      - type: issues.create
        title: "Deployment failed"
        body: "Command exited with error"
```

---

## Best Practices

### 1. Use Specific Selectors

Prefer specific CSS selectors over generic ones:

```yaml
# Good
- type: browser.click
  selector: "#main-nav .settings-link"

# Avoid
- type: browser.click
  selector: "a:nth-child(2)"
```

### 2. Add Timeouts for Dynamic Content

```yaml
steps:
  - type: browser.wait
    selector: ".loading-complete"
    timeout: 10000
```

### 3. Store Intermediate Results

```yaml
steps:
  - type: browser.extractAll
    selector: ".data-table tr"
    as: rows

  - type: data.first
    from: "{{rows}}"
    as: first_row
```

### 4. Use LLM for Dynamic Decisions

```yaml
steps:
  - type: llm.classify
    input: "{{user_feedback}}"
    categories: ["positive", "negative", "neutral"]
    as: sentiment

  - type: control.if
    condition: "{{sentiment}} == 'negative'"
    then:
      - type: issues.create
        title: "Negative feedback received"
```

---

## Summary

SideButton groups its step types into seven categories:

| Category | Key Steps |
|----------|-----------|
| Browser | `navigate`, `click`, `type`, `extract`, `snapshot` |
| Shell | `shell.run`, `terminal.open` |
| LLM | `llm.generate`, `llm.classify`, `llm.decide` |
| Control | `control.if`, `control.retry`, `workflow.call` |
| Data | `data.first`, `data.get` |
| Git | `git.listPRs`, `git.createPR` |
| Issues | `issues.search`, `issues.create` |

For more details on workflow configuration and YAML syntax, refer to the [Workflow Configuration Reference](https://docs.sidebutton.com).

---

<a id='workflow-examples'></a>

## Workflow Examples

### Related Pages

Related topics: [Workflow Engine](#workflow-engine), [Step Types Reference](#step-types)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [bundles/github/workflows/create_release.yaml](https://github.com/sidebutton/sidebutton/blob/main/bundles/github/workflows/create_release.yaml)
- [bundles/github/workflows/github_pr_claude_review.yaml](https://github.com/sidebutton/sidebutton/blob/main/bundles/github/workflows/github_pr_claude_review.yaml)
- [workflows/llm_summarize.yaml](https://github.com/sidebutton/sidebutton/blob/main/workflows/llm_summarize.yaml)
- [workflows/wikipedia_open.yaml](https://github.com/sidebutton/sidebutton/blob/main/workflows/wikipedia_open.yaml)
- [packages/server/defaults/workflows/llm_summarize.yaml](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/workflows/llm_summarize.yaml)
</details>

# Workflow Examples

## Overview

SideButton ships with pre-built YAML workflows that combine different step types to accomplish real-world tasks. They are usable as-is and also serve as references for building custom automations.

SideButton workflows can reach an external system in several ways. When more than one is available, the preferred order is:

| Preference | Method | Notes |
|------------|--------|-------|
| 1st | API Provider | Fastest and most reliable |
| 2nd | CLI Tool | Good for git operations |
| 3rd | Browser Automation | Universal fallback |

Source: [packages/server/defaults/roles/software-engineer.md]()
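
The preference order in the table above amounts to "take the first available method". A minimal sketch, assuming hypothetical names (`Method`, `PREFERENCE`, `pickMethod`) that do not come from the SideButton codebase:

```typescript
// Hypothetical sketch: pick the most preferred integration method that is available.
type Method = "api" | "cli" | "browser";

// Preference order from the table above: API, then CLI, then browser automation.
const PREFERENCE: Method[] = ["api", "cli", "browser"];

function pickMethod(available: Method[]): Method | undefined {
  return PREFERENCE.find((method) => available.includes(method));
}
```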

## Workflow Structure

Every workflow in SideButton follows a standardized YAML format with the following core structure:

```yaml
id: workflow_identifier
title: "Human Readable Title"
description: "What this workflow accomplishes"
steps:
  - type: step.type
    property: value
```

Source: [CONTRIBUTING.md]()
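
For readers who load workflow files in their own tooling, the structure above might be modeled and checked as follows. The type names and `validateWorkflow` are hypothetical, and which fields are required is an assumption (here `id`, `title`, and `steps`); consult the repository for the authoritative schema.

```typescript
// Hypothetical types mirroring the YAML fields shown above; not SideButton's own definitions.
interface WorkflowStep {
  type: string;
  [property: string]: unknown;
}

interface WorkflowFile {
  id: string;
  title: string;
  description?: string;
  steps: WorkflowStep[];
}

// Return a list of problems; an empty list means the shape looks valid.
function validateWorkflow(value: unknown): string[] {
  if (typeof value !== "object" || value === null) return ["workflow must be an object"];
  const problems: string[] = [];
  const wf = value as Partial<WorkflowFile>;
  if (typeof wf.id !== "string" || wf.id === "") problems.push("id is required");
  if (typeof wf.title !== "string" || wf.title === "") problems.push("title is required");
  if (!Array.isArray(wf.steps)) {
    problems.push("steps must be a list");
  } else {
    wf.steps.forEach((step, index) => {
      if (typeof step?.type !== "string") problems.push(`steps[${index}].type is required`);
    });
  }
  return problems;
}
```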

### Basic Anatomy of a Workflow

```mermaid
graph TD
    A[YAML Workflow File] --> B[Workflow Engine]
    B --> C{Step Type Router}
    C -->|Browser| D[Browser Tools]
    C -->|Shell| E[Terminal/CLI]
    C -->|LLM| F[AI Provider]
    C -->|Control| G[Flow Control]
    C -->|Data| H[Data Manipulation]
    
    D --> I[Execute Action]
    E --> I
    F --> I
    G --> I
    H --> I
    
    I --> J[Run Log Entry]
```
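
The routing step in the diagram can be pictured as a lookup on the prefix of the step type. The sketch below is illustrative only: `categoryOf` is a hypothetical function, and the grouping of `terminal.*`, `workflow.*`, and `variable.*` follows the tables in this document, not the engine's source.

```typescript
// Hypothetical router: map a step type to the category that would handle it.
type Category = "browser" | "shell" | "llm" | "control" | "data";

function categoryOf(stepType: string): Category | undefined {
  const prefix = stepType.split(".")[0];
  switch (prefix) {
    case "browser":
      return "browser";
    case "shell":
    case "terminal":
      return "shell";
    case "llm":
      return "llm";
    case "control":
    case "workflow":
      return "control";
    case "data":
    case "variable":
      return "data";
    default:
      return undefined; // Unknown step types are left to the caller.
  }
}
```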

## Step Types Reference

SideButton provides five categories of steps that can be combined in workflows:

Source: [packages/core/README.md]()

### Browser Steps

| Step | Purpose |
|------|---------|
| `navigate` | Navigate browser to URL |
| `click` | Click an element |
| `type` | Type text into an element |
| `scroll` | Scroll the page |
| `hover` | Hover over element |
| `wait` | Wait for condition |
| `extract` | Extract text from element |
| `extractAll` | Extract all matching elements |
| `exists` | Check element exists |
| `key` | Press keyboard key |
| `snapshot` | Get accessibility tree |
| `screenshot` | Capture screenshot |

Source: [README.md]()

### Shell/Terminal Steps

| Step | Purpose |
|------|---------|
| `shell.run` | Execute shell command |
| `terminal.open` | Open visible terminal window (macOS) |
| `terminal.run` | Run command in terminal window |

### LLM Steps

| Step | Purpose |
|------|---------|
| `llm.classify` | Structured classification with categories |
| `llm.generate` | Free-form text generation |
| `llm.decide` | AI-driven decision making |

### Control Flow Steps

| Step | Purpose |
|------|---------|
| `control.if` | Conditional branching |
| `control.retry` | Retry with backoff |
| `control.stop` | End workflow with message |
| `workflow.call` | Call another workflow with parameters |

### Data Steps

| Step | Purpose |
|------|---------|
| `data.first` | Extract first item from list |
| `data.get` | Get value from data object |

## GitHub Workflow Examples

SideButton includes pre-built workflows for GitHub automation stored in the `bundles/github/workflows/` directory.

### GitHub PR Claude Review

This workflow demonstrates how to automate PR review using Claude AI. The workflow follows a typical review pattern:

```mermaid
graph LR
    A[List Open PRs] --> B[Get PR Details]
    B --> C[Extract Diff Stats]
    C --> D[Navigate to PR]
    D --> E[Review Files Changed]
    E --> F[Generate Review Comment]
```

Source: [bundles/github/workflows/github_pr_claude_review.yaml]()

**Common PR Review Sequence:**

1. `git.listPRs` with `state: "open"` — see what needs review
2. `git.getPR` with number — read details and diff stats
3. Use browser tools for visual diff review if needed

Source: [packages/server/defaults/targets/_provider-github-cli.md]()

### Create Release Workflow

The release creation workflow demonstrates orchestrating multiple operations:

1. Open the release page using browser automation
2. Decide the next version tag based on commit history
3. Create the release with proper version naming

Source: [bundles/github/workflows/create_release.yaml]()

**Creating a PR after coding:**

1. `git.createPR` with title, head branch, base branch
2. `issues.comment` on related issue linking the PR

Source: [packages/server/defaults/targets/_provider-github-cli.md]()

## LLM-Based Workflows

### Summarize Workflow

The `llm_summarize.yaml` workflow demonstrates integration with AI providers for text processing:

```yaml
id: llm_summarize
title: "Summarize Content"
description: "Generate a summary using LLM"
steps:
  - type: llm.generate
    prompt: "{{input_text}}"
    instruction: "Provide a concise summary"
```

Sources: [workflows/llm_summarize.yaml](), [packages/server/defaults/workflows/llm_summarize.yaml]()

**LLM Provider Support:**

LLM steps work with multiple providers:
- Ollama (local)
- OpenAI
- Anthropic
- Google

Source: [README.md]()

### Decision Workflows

The `llm.decide` step type enables autonomous decision-making:

```mermaid
graph TD
    A[Issue Received] --> B{llm.decide}
    B -->|Clear & well-scoped| C[Pick and start work]
    B -->|Ambiguous/blocked| D[Skip, pick next]
    B -->|Same priority| E[Prefer smaller scope]
    B -->|No suitable issues| F[Stop and report]
```

Source: [packages/server/defaults/roles/software-engineer.md]()

## Browser-Based Workflows

### Wikipedia Open Example

This demonstrates basic browser navigation and content extraction:

```yaml
id: wikipedia_open
title: "Open Wikipedia Page"
steps:
  - type: browser.navigate
    url: "{{wiki_url}}"
  - type: browser.snapshot
    as: page_content
  - type: browser.extract
    selector: "{{element_selector}}"
    as: extracted_text
```

Source: [workflows/wikipedia_open.yaml]()

**Best Practices for Browser Steps:**

1. Use `snapshot` to understand page structure before taking actions
2. Use `extract` to pull specific content from pages
3. Use `screenshot` for visual verification

Source: [packages/server/defaults/roles/software-engineer.md]()

## Variable Interpolation

All workflows support variable interpolation using `{{variable}}` syntax:

```yaml
steps:
  - type: browser.extract
    selector: ".username"
    as: user
  - type: shell.run
    cmd: "echo 'Hello, {{user}}!'"
  - type: llm.generate
    prompt: "Write a greeting for {{user}}"
```

Source: [README.md]()

## Parameterized Workflows

Workflows can accept parameters for flexibility:

```yaml
id: check_ticket_status
title: "Check Ticket Status"
params:
  ticket_id: string
steps:
  - type: browser.navigate
    url: "https://jira.example.com/browse/{{ticket_id}}"
  - type: browser.extract
    selector: "[data-testid='status-field']"
    as: current_status
```

Source: [README.md]()

## Execution Flow

```mermaid
sequenceDiagram
    participant User
    participant MCP as MCP Handler
    participant Engine as Workflow Engine
    participant Steps as Step Executors
    
    User->>MCP: run_workflow(workflow_id, params)
    MCP->>Engine: executeWorkflow(workflow, context)
    Engine->>Steps: Execute Step 1
    Steps-->>Engine: Step Result
    Engine->>Steps: Execute Step 2
    Steps-->>Engine: Step Result
    Engine->>MCP: Run Log Entry
    MCP-->>User: Execution Result
```

Sources: [packages/server/src/mcp/handler.ts](), [packages/core/README.md]()
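
The engine's part of the sequence above, running steps in order, storing results under `as`, and recording a log entry per step, can be sketched as a simple loop. Everything here is hypothetical (`RunStep`, `LogEntry`, `Executor`, `runSteps`), and the sketch is synchronous for brevity, whereas a real engine would almost certainly be asynchronous.

```typescript
// Hypothetical sketch of a sequential run loop; executor and log shapes are assumptions.
interface RunStep {
  type: string;
  as?: string;
  [property: string]: unknown;
}

interface LogEntry {
  type: string;
  ok: boolean;
  error?: string;
}

type Executor = (step: RunStep, vars: Record<string, unknown>) => unknown;

function runSteps(
  steps: RunStep[],
  executors: Record<string, Executor>,
  vars: Record<string, unknown> = {},
): { vars: Record<string, unknown>; log: LogEntry[] } {
  const log: LogEntry[] = [];
  for (const step of steps) {
    const execute = executors[step.type];
    if (!execute) {
      log.push({ type: step.type, ok: false, error: "unknown step type" });
      break; // Stop at the first failure.
    }
    try {
      const result = execute(step, vars);
      if (step.as) vars[step.as] = result; // Store the result under `as`.
      log.push({ type: step.type, ok: true });
    } catch (error) {
      log.push({ type: step.type, ok: false, error: String(error) });
      break;
    }
  }
  return { vars, log };
}
```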

## Workflow Bundles

SideButton organizes related workflows into bundles. The GitHub bundle includes:

```json
{
  "name": "sidebutton/github",
  "version": "1.0.0",
  "title": "GitHub Automation",
  "description": "Workflows for GitHub releases, PR reviews, and repository management",
  "workflows": [
    "open_release_page.yaml",
    "decide_next_tag.yaml",
    "create_release.yaml",
    "github_pr_claude_review.yaml"
  ],
  "requires": {
    "llm": true,
    "browser": true
  }
}
```

Source: [bundles/github/bundle.json]()
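
The `requires` block suggests that a bundle declares the capabilities it needs. How SideButton enforces this is not shown here, so the following is only a sketch of one plausible check. `BundleManifest` and `missingCapabilities` are hypothetical names.

```typescript
// Hypothetical check of a bundle's `requires` block against available capabilities.
interface BundleManifest {
  name: string;
  workflows: string[];
  requires?: { [capability: string]: boolean };
}

// Return the capabilities the bundle requires but the host lacks.
function missingCapabilities(bundle: BundleManifest, available: string[]): string[] {
  const requires = bundle.requires ?? {};
  return Object.keys(requires).filter(
    (capability) => requires[capability] && !available.includes(capability),
  );
}
```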

## Adding Custom Workflows

The easiest way to contribute is by adding workflows to the `workflows/` directory:

```bash
# Create a new workflow file
cat > workflows/my_workflow.yaml << 'EOF'
id: my_workflow
title: "My Workflow"
description: "What this workflow does"
steps:
  - type: shell.run
    cmd: "echo 'Hello!'"
EOF
```

Source: [CONTRIBUTING.md]()

## See Also

- [Step Reference](https://docs.sidebutton.com) — Complete documentation for all step types
- [MCP Tools](https://docs.sidebutton.com/mcp) — Model Context Protocol integration
- [Server Documentation](../server/README.md) — Backend workflow execution
- [Core Engine](../core/README.md) — Workflow engine internals

---

<a id='mcp-server'></a>

## MCP Server Integration

### Related Pages

Related topics: [System Architecture](#architecture), [Chrome Extension](#chrome-extension), [Knowledge Packs](#knowledge-packs)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [packages/server/src/mcp/handler.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/mcp/handler.ts)
- [packages/server/src/mcp/tools.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/mcp/tools.ts)
- [packages/server/src/mcp/stdio.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/mcp/stdio.ts)
- [packages/server/src/server.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/server.ts)
- [packages/server/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/README.md)
</details>

# MCP Server Integration

## Overview

The SideButton MCP Server is the core integration layer that exposes browser automation tools, workflow execution capabilities, and knowledge pack management through the Model Context Protocol (MCP). This enables AI assistants like Claude Desktop, Claude Code, and Cursor to control web browsers and execute automated workflows using a standardized interface.

**Source:** [packages/server/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/README.md)

## Architecture

The MCP server is built as part of the `@sidebutton/server` package and supports multiple transport mechanisms for different AI assistant clients.

### Transport Modes

| Transport | Use Case | Configuration |
|-----------|----------|---------------|
| **HTTP/SSE** | Claude Code, Cursor | `type: "sse"`, `url: "http://localhost:9876/mcp"` |
| **stdio** | Claude Desktop | `command: "npx"`, `args: ["sidebutton", "--stdio"]` |
| **WebSocket** | Chrome Extension | Automatic reconnection support |

**Source:** [packages/sidebutton/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/sidebutton/README.md)

### Component Flow

```mermaid
graph TD
    subgraph "AI Assistant"
        A[Claude Desktop / Claude Code / Cursor]
    end
    
    subgraph "MCP Transport"
        B[stdio / HTTP-SSE]
    end
    
    subgraph "SideButton Server"
        C[MCP Handler]
        D[Tool Registry]
        E[Workflow Engine]
        F[Browser Controller]
    end
    
    subgraph "Browser Layer"
        G[Chrome Extension]
        H[Real DOM Access]
    end
    
    A --> B
    B --> C
    C --> D
    C --> E
    C --> F
    F <--> G
    G <--> H
```

## Tool Registry

The MCP server exposes a comprehensive set of tools organized by functionality. Each tool follows a consistent schema with annotations for the Claude Connectors Directory.

**Source:** [packages/server/src/mcp/tools.ts:1-50](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/mcp/tools.ts)

### Tool Annotations

| Annotation | Purpose | Example |
|------------|---------|---------|
| `title` | Human-readable display name | `"Run Workflow"` |
| `readOnlyHint` | Indicates observation-only tools | `true` for `snapshot` |
| `destructiveHint` | Indicates state-mutating tools | `true` for `run_workflow` |
| `openWorldHint` | Indicates external world interaction | `true` for browser tools |

**Source:** [packages/server/src/mcp/tools.ts:14-23](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/mcp/tools.ts)

## Workflow Tools

### Core Workflow Operations

| Tool | Description | Mutates State |
|------|-------------|---------------|
| `run_workflow` | Execute a workflow automation by ID | Yes |
| `list_workflows` | List all available workflows | No |
| `get_workflow` | Get workflow YAML definition | No |
| `list_run_logs` | List recent workflow executions | No |
| `get_run_log` | Get execution log for a specific run | No |

### run_workflow Parameters

```typescript
{
  workflow_id: string;  // Required: Unique identifier
  params?: {            // Optional: Key-value parameters
    [key: string]: string;
  };
}
```

**Source:** [packages/server/src/mcp/tools.ts:35-52](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/mcp/tools.ts)
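
MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request, so a `run_workflow` call would carry the parameters above in its `arguments` field. The sketch below only builds the request body; `buildRunWorkflowRequest` is a hypothetical helper, and transport details such as session initialization and headers are omitted.

```typescript
// Sketch of an MCP `tools/call` request body for `run_workflow` (JSON-RPC 2.0 framing).
interface RunWorkflowArgs {
  workflow_id: string;
  params?: { [key: string]: string };
}

// Hypothetical helper; a real MCP client library would normally build this for you.
function buildRunWorkflowRequest(id: number, args: RunWorkflowArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "run_workflow", arguments: args },
  };
}
```

For example, `buildRunWorkflowRequest(1, { workflow_id: "check_ticket_status", params: { ticket_id: "ABC-1" } })` produces a body that targets the parameterized workflow shown earlier; the ticket ID is a made-up value.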

## Browser Automation Tools

The MCP server provides direct browser control through the connected Chrome Extension.

### Navigation & State

| Tool | Description |
|------|-------------|
| `navigate` | Navigate browser to a URL |
| `snapshot` | Get page accessibility tree (DOM structure) |
| `screenshot` | Capture page screenshot |
| `get_browser_status` | Check extension connection status |
| `capture_page` | Capture CSS selectors from current page |

### Interaction Tools

| Tool | Description | Read-Only |
|------|-------------|-----------|
| `click` | Click an element by selector | No |
| `type` | Type text into an input element | No |
| `scroll` | Scroll the page | No |
| `hover` | Hover over an element | No |
| `extract` | Extract text from an element | Yes |
| `extract_all` | Extract text from all matching elements | Yes |
| `extract_map` | Extract structured data from repeated elements | Yes |
| `select_option` | Select a dropdown option | No |
| `wait` | Wait for element or condition | No |
| `exists` | Check if element exists | Yes |
| `key` | Press a keyboard key | No |

**Source:** [packages/server/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/README.md)

### Browser Tool Annotations

```typescript
{
  name: 'snapshot',
  description: 'Get page accessibility snapshot',
  inputSchema: {
    type: 'object',
    properties: {
      // Configuration options
    }
  },
  annotations: {
    title: 'Page Snapshot',
    readOnlyHint: true,      // Observation only
    openWorldHint: true      // Interacts with browser
  }
}
```

## Provider Integration Tools

SideButton integrates with external providers for enhanced functionality:

| Category | Tools |
|----------|-------|
| **Git** | `git.listPRs`, `git.getPR`, `git.createPR`, `git.listIssues`, `git.getIssue` |
| **Issues** | `issues.search`, `issues.get`, `issues.create`, `issues.transition`, `issues.comment`, `issues.attach` |
| **Chat** | `chat.readChannel`, `chat.readThread`, `chat.listChannels` |
| **Terminal** | `terminal.open`, `terminal.run` |
| **LLM** | `llm.generate`, `llm.decide`, `llm.classify` |

### Git Provider Implementation

The GitHub CLI connector provides programmatic access to GitHub operations:

```typescript
async createPullRequest(params: {
  repo?: string;
  title: string;
  body?: string;
  head: string;
  base?: string;
}): Promise<{ number: number; url: string }>
```

**Source:** [packages/core/src/providers/github.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/core/src/providers/github.ts)

## MCP Endpoint Configuration

### Server Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/mcp` | SSE | Server-Sent Events for Claude Code/Cursor |
| `/mcp` | POST | Tool invocation requests |
| `/mcp` | GET | Server info and capabilities |

**Source:** [packages/server/src/server.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/server.ts)

### Client Configuration Examples

#### Claude Desktop

```json
{
  "mcpServers": {
    "sidebutton": {
      "command": "npx",
      "args": ["sidebutton", "--stdio"]
    }
  }
}
```

#### Claude Code

```json
{
  "mcpServers": {
    "sidebutton": {
      "type": "sse",
      "url": "http://localhost:9876/mcp"
    }
  }
}
```

#### Cursor

```json
{
  "mcpServers": {
    "sidebutton": {
      "url": "http://localhost:9876/mcp"
    }
  }
}
```

**Source:** [packages/sidebutton/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/sidebutton/README.md)

## CLI Commands for MCP

The `sidebutton` CLI provides workflow management commands:

```bash
sidebutton list              # List available workflows
sidebutton run <id>          # Run a workflow by ID
sidebutton status            # Check server status
```

**Source:** [packages/sidebutton/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/sidebutton/README.md)

## Data Models

### MCP Tool Schema

```typescript
export interface McpTool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  annotations?: McpToolAnnotations;
}

export interface McpToolAnnotations {
  title: string;
  readOnlyHint?: true;
  destructiveHint?: true;
  openWorldHint?: true;
}
```

**Source:** [packages/server/src/mcp/tools.ts:7-23](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/mcp/tools.ts)
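
One practical use of these annotations is letting a client separate observation-only tools from state-changing ones. The sketch below repeats the two interfaces so it compiles on its own; `readOnlyToolNames` is a hypothetical helper, not something the server exports.

```typescript
// Local copies of the interfaces above so the sketch compiles on its own.
interface McpToolAnnotations {
  title: string;
  readOnlyHint?: true;
  destructiveHint?: true;
  openWorldHint?: true;
}

interface McpTool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  annotations?: McpToolAnnotations;
}

// Hypothetical helper: names of tools that declare themselves observation-only.
function readOnlyToolNames(tools: McpTool[]): string[] {
  return tools
    .filter((tool) => tool.annotations?.readOnlyHint === true)
    .map((tool) => tool.name);
}
```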

## Quick Start

1. **Start the server:**
   ```bash
   npx sidebutton@latest
   ```

2. **Connect your AI assistant** using the appropriate configuration above

3. **Verify connection:**
   ```bash
   sidebutton status
   ```

4. **Execute a workflow:**
   ```bash
   sidebutton run <workflow-id>
   ```

**Source:** [AGENTS.md](https://github.com/sidebutton/sidebutton/blob/main/AGENTS.md)

## See Also

- [Core Workflow Engine](../core/README.md) - Workflow execution runtime
- [Chrome Extension](../extension/) - Browser control implementation
- [Knowledge Packs](../knowledge-packs/) - Domain-specific automation packs

---

<a id='chrome-extension'></a>

## Chrome Extension

### Related Pages

Related topics: [MCP Server Integration](#mcp-server), [Step Types Reference](#step-types)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)
- [CONTRIBUTING.md](https://github.com/sidebutton/sidebutton/blob/main/CONTRIBUTING.md)
- [AGENTS.md](https://github.com/sidebutton/sidebutton/blob/main/AGENTS.md)
- [packages/server/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/README.md)
- [packages/core/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/core/README.md)
- [packages/server/defaults/roles/software-engineer.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/roles/software-engineer.md)
- [packages/server/defaults/roles/qa.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/roles/qa.md)
- [packages/server/defaults/targets/_provider-github-browser.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/targets/_provider-github-browser.md)
</details>

# Chrome Extension

The SideButton Chrome Extension is a Manifest V3 browser extension that provides real-time browser automation capabilities through a WebSocket connection to the local MCP server. It enables AI agents and workflows to interact with web pages using real DOM access via CSS selectors.

## Overview

The Chrome Extension serves as the primary interface between SideButton's workflow engine and web pages. Rather than relying on pixel coordinates or screenshots, it provides direct DOM manipulation capabilities, making browser automation precise and reliable.

**Distribution:** Available on the [Chrome Web Store](https://chromewebstore.google.com/detail/sidebutton/odaefhmdmgijnhdbkfagnlnmobphgkij)

**Source location:** `extension/` directory in the repository

Source: [CONTRIBUTING.md](https://github.com/sidebutton/sidebutton/blob/main/CONTRIBUTING.md)

## Architecture

```mermaid
graph TD
    subgraph "Browser Context"
        CE[Chrome Extension]
        WS[WebSocket Connection]
    end
    
    subgraph "Local Server"
        MCP[MCP Server :9876]
        API[REST API]
    end
    
    subgraph "Workflow Engine"
        WE[Workflow Executor]
        ST[Step Types]
    end
    
    CE -->|Real DOM Access| PAGE[Web Pages]
    CE -->|WebSocket| WS
    WS --> MCP
    MCP --> WE
    WE --> ST
    ST -->|browser.* steps| CE
```

### Connection Flow

1. User clicks the SideButton extension icon in Chrome
2. Extension establishes a WebSocket connection to the local server at `localhost:9876`
3. MCP server validates the connection and exposes browser tools
4. Workflows execute browser steps through the extension
5. Extension interacts with web pages via Chrome DevTools Protocol

Source: [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)

## Browser Commands

The extension supports 40+ browser commands. The most commonly used ones are listed below:

| Command | Description | Use Case |
|---------|-------------|----------|
| `navigate` | Navigate browser to URL | Open pages for automation |
| `click` | Click an element by selector | Interact with buttons, links |
| `type` | Type text into an element | Form input |
| `scroll` | Scroll the page | Load more content |
| `hover` | Hover over element | Trigger hover states |
| `extract` | Extract text from element | Read page content |
| `extract_all` | Extract all matching elements | Get lists of items |
| `extract_map` | Extract structured data from repeated elements | Scrape data tables |
| `select_option` | Select dropdown option | Choose from selects |
| `fill` | Fill input value (React-compatible) | Handle React inputs |
| `press_key` | Send keyboard keys | Keyboard shortcuts |
| `scroll_into_view` | Scroll element into viewport | Ensure element visible |
| `evaluate` | Execute JavaScript in browser | Custom interactions |
| `exists` | Check if element exists | Conditional logic |
| `wait` | Wait for element or delay | Synchronize with page |
| `screenshot` | Capture page screenshot | Visual verification |
| `snapshot` | Get page accessibility tree | Understand page structure |
| `capture_page` | Capture selectors from current page | Identify elements |
| `check_writing_quality` | Evaluate text quality | Content validation |

Source: [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)

## Key Features

### Real DOM Access

Unlike screen-based automation tools that rely on pixel coordinates or OCR, SideButton uses real DOM access through CSS selectors. This provides:

- Precise element targeting
- Support for dynamically rendered content
- Correct handling of single-page applications (SPAs)
- Faster execution than vision-based alternatives

Source: [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)

### Recording Mode

The extension includes a recording mode that captures manual actions as reusable workflows. A typical recording session looks like this:

1. Browse manually through the steps of the desired workflow
2. The extension records each action together with its selector
3. Export the recording as a YAML workflow definition
4. Replay it with the workflow engine
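
The recording flow can be pictured as a small conversion from captured actions to workflow steps. The sketch below is illustrative only: the `RecordedAction` shape and the `toWorkflowYaml` helper are hypothetical names, not the extension's actual recorder format. The emitted fields (`type`, `url`, `selector`, `text`) follow the workflow examples elsewhere in this document.

```typescript
// Hypothetical shape of one recorded action; the real recorder format may differ.
type RecordedAction =
  | { kind: "navigate"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "type"; selector: string; text: string };

// Render recorded actions as a YAML `steps:` list using browser.* step types.
// JSON.stringify is used for quoting because JSON strings are valid YAML scalars.
function toWorkflowYaml(actions: RecordedAction[]): string {
  const lines: string[] = ["steps:"];
  for (const action of actions) {
    lines.push(`  - type: browser.${action.kind}`);
    switch (action.kind) {
      case "navigate":
        lines.push(`    url: ${JSON.stringify(action.url)}`);
        break;
      case "click":
        lines.push(`    selector: ${JSON.stringify(action.selector)}`);
        break;
      case "type":
        lines.push(`    selector: ${JSON.stringify(action.selector)}`);
        lines.push(`    text: ${JSON.stringify(action.text)}`);
        break;
    }
  }
  return lines.join("\n");
}
```

Calling `toWorkflowYaml` with a navigate action followed by a click action yields a two-step `steps:` list that could be saved as a workflow file.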

### Embed Buttons

SideButton can inject action buttons into any web page, enabling:

- Quick access to defined actions
- On-page automation triggers
- Custom UI integration

### WebSocket Connection

The extension maintains a stable WebSocket connection with automatic reconnection:

```mermaid
sequenceDiagram
    participant EXT as Extension
    participant WS as WebSocket
    participant MCP as MCP Server
    participant PAGE as Web Page
    
    EXT->>WS: Connect
    WS->>MCP: Establish Session
    MCP-->>WS: Connected
    WS-->>EXT: Ready
    
    loop On Command
        MCP->>EXT: Execute Tool
        EXT->>PAGE: DOM Action
        PAGE-->>EXT: Result
        EXT-->>MCP: Response
    end
    
    Note over EXT,WS: Auto-reconnect on disconnect
```

### Stable Reconnection

The WebSocket implementation handles connection drops gracefully:

- Automatic retry with exponential backoff
- Works with local server instances
- Supports remote server connections
- Maintains session state across reconnections

Source: [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)
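
To make "retry with exponential backoff" concrete, here is a minimal sketch of the general technique. It is not the extension's actual reconnection code: the constants `BASE_DELAY_MS` and `MAX_DELAY_MS` and the helper names are assumptions chosen for illustration.

```typescript
// Assumed tuning values; the extension's real retry settings are not documented here.
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 30000;

// Delay before reconnect attempt N (0-based): base * 2^N, capped at the maximum.
function backoffDelay(attempt: number): number {
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * Math.pow(2, attempt));
}

// Default sleep helper based on setTimeout.
function sleepMs(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Call `connect` until it succeeds or `maxAttempts` is exhausted,
// waiting backoffDelay(attempt) after each failure.
async function connectWithRetry(
  connect: () => Promise<void>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void> = sleepMs,
): Promise<number> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await connect();
      return attempt; // number of failed attempts before success
    } catch {
      await sleep(backoffDelay(attempt));
    }
  }
  throw new Error(`gave up after ${maxAttempts} attempts`);
}
```

With these assumed values the waits grow as 500 ms, 1 s, 2 s, 4 s and so on until they reach the 30 s cap. Capping the delay keeps a long outage from pushing retries arbitrarily far apart.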

## Installation

### From Chrome Web Store

1. Visit the [Chrome Web Store listing](https://chromewebstore.google.com/detail/sidebutton/odaefhmdmgijnhdbkfagnlnmobphgkij)
2. Click "Add to Chrome"
3. Grant necessary permissions

### From Source (Development)

1. Go to `chrome://extensions/`
2. Enable **Developer mode**
3. Click **Load unpacked** and select the `extension/` folder
4. Navigate to any page and click the extension icon to connect

Source: [CONTRIBUTING.md](https://github.com/sidebutton/sidebutton/blob/main/CONTRIBUTING.md)

## Connection States

| State | Indicator | Meaning |
|-------|-----------|---------|
| Connected | Green dot | Extension linked to server |
| Disconnected | Red dot | No active connection |
| Reconnecting | Yellow dot | Attempting to reconnect |

Verify connection status using the MCP `get_browser_status` tool:

```json
{
  "tool": "get_browser_status",
  "expected": { "connected": true }
}
```

Source: [packages/server/defaults/roles/qa.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/roles/qa.md)

## Usage in Workflows

Browser steps are defined in YAML workflows:

```yaml
steps:
  - type: browser.navigate
    url: "https://github.com/owner/repo/issues"
  
  - type: browser.snapshot
    as: page_state
  
  - type: browser.click
    selector: ".btn-primary"
  
  - type: browser.type
    selector: "#title"
    text: "{{issue_title}}"
  
  - type: browser.extract
    selector: ".issue-number"
    as: new_issue_id
```

### Variable Interpolation

Use `{{variable}}` syntax to reference extracted values:

```yaml
steps:
  - type: browser.extract
    selector: ".username"
    as: user
  - type: shell.run
    cmd: "echo 'Hello, {{user}}!'"
```

Source: [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)
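
The `{{variable}}` substitution can be sketched in a few lines. This is an illustration of the idea, not the workflow engine's implementation: the `interpolate` function is a hypothetical name, and leaving unknown placeholders untouched is an assumption (the engine's behavior for missing variables is not described in this document).

```typescript
// Replace each {{name}} placeholder with the matching value from `vars`.
// Placeholders without a matching variable are left as-is (assumed behavior).
function interpolate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? String(vars[key]) : match,
  );
}
```

For the workflow above, a value extracted into `user` would be substituted into the `cmd` string before the shell step runs.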

## Step Types Reference

### Navigation Steps

| Step Type | Parameters | Description |
|-----------|------------|-------------|
| `browser.navigate` | `url` | Open URL in connected browser |

### Interaction Steps

| Step Type | Parameters | Description |
|-----------|------------|-------------|
| `browser.click` | `selector` | Click element by CSS selector |
| `browser.type` | `selector`, `text` | Type text into input |
| `browser.fill` | `selector`, `value` | Fill input value (React-compatible) |
| `browser.hover` | `selector` | Hover over element |
| `browser.select_option` | `selector`, `value` | Select dropdown option |
| `browser.press_key` | `keys` | Send keyboard keys |
| `browser.scroll` | `direction`, `amount` | Scroll page |
| `browser.scroll_into_view` | `selector` | Scroll element into view |

### Extraction Steps

| Step Type | Parameters | Description |
|-----------|------------|-------------|
| `browser.extract` | `selector`, `as` | Extract text from single element |
| `browser.extract_all` | `selector`, `as` | Extract all matching elements |
| `browser.extract_map` | `selector`, `mapping`, `as` | Extract structured data |
| `browser.snapshot` | `as` | Get accessibility tree |
| `browser.screenshot` | `as` | Capture screenshot |

### Verification Steps

| Step Type | Parameters | Description |
|-----------|------------|-------------|
| `browser.exists` | `selector` | Check if element exists |
| `browser.wait` | `selector` or `ms` | Wait for element or delay |

### Advanced Steps

| Step Type | Parameters | Description |
|-----------|------------|-------------|
| `browser.capture_page` | - | Capture selectors from current page |
| `browser.evaluate` | `script` | Execute JavaScript |

Source: [packages/core/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/core/README.md)
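
The parameter columns in the tables above lend themselves to a simple pre-flight check on a parsed workflow. The following sketch assumes the parameter names exactly as listed in this reference. `REQUIRED_PARAMS`, `WorkflowStep`, and `missingParams` are hypothetical names, not part of `@sidebutton/core`.

```typescript
// Required parameters per step type, transcribed from the tables above (subset).
const REQUIRED_PARAMS: Record<string, string[]> = {
  "browser.navigate": ["url"],
  "browser.click": ["selector"],
  "browser.type": ["selector", "text"],
  "browser.fill": ["selector", "value"],
  "browser.extract": ["selector", "as"],
  "browser.press_key": ["keys"],
};

// A workflow step as parsed from YAML: a `type` plus arbitrary parameters.
type WorkflowStep = { type: string } & Record<string, unknown>;

// Return the names of required parameters that the step does not provide.
function missingParams(step: WorkflowStep): string[] {
  const required = REQUIRED_PARAMS[step.type];
  if (required === undefined) return []; // unknown step types are not checked in this sketch
  return required.filter((param) => step[param] === undefined);
}
```

Running such a check before execution reports a misconfigured step up front rather than partway through a browser session.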

## Integration with Providers

The extension works with platform-specific browser providers for deeper integration:

### GitHub Browser Provider

When configured with `GITHUB_BROWSER_URL`, the extension can:

1. Navigate to repository pages
2. Read PR details via snapshot
3. Review diffs by clicking "Files changed" tab
4. List and filter pull requests
5. Create issues through the web interface

**Configuration:** Set `GITHUB_BROWSER_URL` in Settings > Environment Variables (e.g., `https://github.com`)

**Requirements:** Must be logged into GitHub in the connected browser session

Source: [packages/server/defaults/targets/_provider-github-browser.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/targets/_provider-github-browser.md)

## Provider Preference

When multiple integration methods exist, SideButton follows this preference order:

1. **API Provider** — Fastest and most reliable
2. **CLI Tool** — Good for git operations, builds
3. **Browser Automation** — Universal fallback for visual tasks

```mermaid
graph LR
    A[Task] --> B{API Available?}
    B -->|Yes| C[Use API]
    B -->|No| D{CLI Available?}
    D -->|Yes| E[Use CLI]
    D -->|No| F[Browser Automation]
    
    C -->|Browser needed| G[Browser via Extension]
    E -->|Visual review| G
```

Browser tools complement CLI for visual tasks like:

- Diff viewing
- Board reviews
- UI bug identification
- Screenshot evidence

Source: [packages/server/defaults/roles/software-engineer.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/roles/software-engineer.md)
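
The preference order reads naturally as a short decision function. The sketch below only mirrors the documented order. The `Availability` shape and `chooseProvider` name are hypothetical, and how SideButton actually detects available providers is not covered here.

```typescript
type Provider = "api" | "cli" | "browser";

// Which integration methods are available for a task (hypothetical input shape).
interface Availability {
  api: boolean;
  cli: boolean;
}

// Follow the documented order: API first, then CLI, then browser automation as the fallback.
function chooseProvider(available: Availability): Provider {
  if (available.api) return "api";
  if (available.cli) return "cli";
  return "browser";
}
```

Browser automation remains available as a complement for visual tasks even when an API or CLI provider is chosen for the main operation.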

## Smoke Test

Verify extension connectivity during deployment testing:

### Step 1: Server Health

```
GET http://localhost:9876/health
```

Expected response:
```json
{"status":"ok","version":"...","browser_connected":true}
```

If `browser_connected` is `false`, stop: the Chrome extension is not connected.
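
A script can apply the same rule to the raw response body. This is a minimal sketch based only on the three fields shown in the expected response. `HealthResponse` and `canProceed` are hypothetical names.

```typescript
// Shape of the /health response shown above.
interface HealthResponse {
  status: string;
  version: string;
  browser_connected: boolean;
}

// Decide whether the smoke test may continue, given the raw response body.
// Both conditions must hold: the server reports "ok" and a browser is connected.
function canProceed(body: string): boolean {
  const health = JSON.parse(body) as Partial<HealthResponse>;
  return health.status === "ok" && health.browser_connected === true;
}
```

The `body` argument would be the text returned by `GET http://localhost:9876/health`.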

### Step 2: Extension Connection

Call the `get_browser_status` tool.

Expected: `{ "connected": true }`

If disconnected:

1. Open Chrome
2. Verify that the SideButton extension is enabled at `chrome://extensions`
3. Refresh the page

### Step 3: Snapshot Test

Navigate to any page, then call `snapshot`.

Verify: the result is non-empty structured YAML that contains page elements with element refs (`ref=N`).

Source: [packages/server/defaults/roles/qa.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/roles/qa.md)

## Error Handling

Common extension issues and solutions:

| Issue | Cause | Solution |
|-------|-------|----------|
| `browser_connected: false` | Extension not connected | Click extension icon to connect |
| WebSocket timeout | Server not running | Start with `pnpm dev:server` |
| Element not found | Selector changed | Use `capture_page` to refresh selectors |
| React input issues | Virtual DOM | Use `fill` instead of `type` |

## Security Considerations

- The browser extension requires broad permissions for DOM access
- The WebSocket connection is local by default
- Remote connections should use authenticated endpoints
- Never store credentials in workflow definitions

Source: [AGENTS.md](https://github.com/sidebutton/sidebutton/blob/main/AGENTS.md)

## Related Documentation

- [MCP Tools Reference](https://docs.sidebutton.com) — Full tool documentation
- [Workflow Engine](../core/workflow-engine.md) — Workflow execution
- [REST API](../server/rest-api.md) — HTTP API alternative
- [Knowledge Packs](../knowledge-packs/overview.md) — Domain-specific extensions

---

<a id='knowledge-packs'></a>

## Knowledge Packs

### Related Pages

Related topics: [MCP Server Integration](#mcp-server), [Getting Started](#getting-started)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [packages/server/src/skill-pack.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/skill-pack.ts)
- [packages/server/src/registry.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/registry.ts)
- [packages/server/src/cli.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/cli.ts)
- [packages/server/defaults/targets/github.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/targets/github.md)
- [packages/server/defaults/roles/software-engineer.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/roles/software-engineer.md)
</details>

# Knowledge Packs

Knowledge Packs (also referred to as **Skill Packs** in CLI commands and code) are installable domain-specific modules that teach autonomous AI agents how specific web applications work. They serve as the foundational knowledge layer powering AI code review, automated testing, and enterprise AI agent deployments.

## Overview

Knowledge Packs provide structured, domain-specific intelligence to the SideButton platform. Rather than requiring AI agents to learn from scratch how to interact with each web application, Knowledge Packs pre-package essential information that enables immediate, accurate automation.

The SideButton registry currently hosts **11 domains** with **28+ modules** published, and maintains an open registry where anyone can build and share packs for any web application.

Source: [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)

## Pack Components

Each Knowledge Pack comprises five core module types that together provide comprehensive domain understanding:

| Component | Description | Purpose |
|-----------|-------------|---------|
| **Selectors** | CSS selectors for UI elements | Precise DOM element targeting without pixel coordinates or screenshots |
| **Data Models** | Entity types, fields, relationships, valid states | Structured understanding of domain objects |
| **State Machines** | Valid transitions per state | Predictable, safe workflow execution |
| **Role Playbooks** | Role-specific procedures (QA, SE, PM, SD) | Context-aware guidance for different user roles |
| **Common Tasks** | Step-by-step procedures, gotchas, edge cases | Handling typical operations with best practices |

Source: [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)
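
To illustrate what "valid transitions per state" means for the State Machines component, here is a small sketch. The states and transitions are invented for an issue-like entity and do not come from any published pack. `TRANSITIONS` and `canTransition` are hypothetical names.

```typescript
// Hypothetical transition table for an issue-like entity; real packs define their own states.
const TRANSITIONS: Record<string, string[]> = {
  open: ["in_progress", "closed"],
  in_progress: ["in_review", "open"],
  in_review: ["closed", "in_progress"],
  closed: ["open"],
};

// True when moving from `from` to `to` is listed as a valid transition.
function canTransition(from: string, to: string): boolean {
  const allowed = TRANSITIONS[from];
  return allowed !== undefined && allowed.indexOf(to) !== -1;
}
```

An agent that consults such a table before acting can refuse a move like `closed` to `in_review` instead of attempting it in the UI and failing.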

### Selector Modules

Selectors provide CSS-based targeting for browser automation, ensuring reliability across different browsers and viewport sizes. Unlike coordinate-based or screenshot-based approaches, CSS selectors remain stable as long as the application's DOM structure is maintained.

### Role Playbooks

Role playbooks define standard operating procedures for specific personas. For example, the `software-engineer` role includes:

- Decision guidance for issue prioritization
- Step types for common development tasks
- Integration patterns for git, issues, chat, and terminal operations

Source: [packages/server/defaults/roles/software-engineer.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/defaults/roles/software-engineer.md)

## Architecture

```mermaid
graph TD
    A[User/Agent] -->|sidebutton install| B[CLI]
    B --> C{Source Type}
    C -->|Local Path| D[Local Directory]
    C -->|Git URL| E[Git Repository]
    C -->|Registry Name| F[SideButton Registry]
    
    D --> G[Install Skill Pack]
    E --> G
    F --> H[Fetch from Registry API]
    H --> G
    
    G --> I[Parse Manifest]
    I --> J[Copy to ~/.sidebutton/packs/]
    J --> K[Knowledge Pack Active]
    
    L[Workflow Engine] -->|Uses| K
    M[MCP Tools] -->|Reads| K
```

## Installation Methods

Knowledge Packs can be installed from multiple sources:

| Source Type | Command Example | Use Case |
|-------------|-----------------|----------|
| Local directory | `sidebutton install ./my-pack` | Development and testing |
| Git URL | `sidebutton install https://github.com/org/skill-packs` | Remote repositories |
| Registry name | `sidebutton install github.com` | Published registry packs |

```bash
# Install from registry
sidebutton install github.com
sidebutton install atlassian.net

# Install from local path
sidebutton install ./custom-pack

# Install from Git URL
sidebutton install https://github.com/org/skill-packs

# Force reinstall
sidebutton install github.com --force
```

Source: [packages/server/src/cli.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/cli.ts)
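
The three source types can be told apart by the shape of the argument alone. The heuristics below are assumptions made for illustration. The actual detection logic in `cli.ts` may differ, and `classifySource` is a hypothetical name.

```typescript
type SourceType = "local" | "git" | "registry";

// Guess the source type from the argument's shape (assumed heuristics):
// URLs and git@ addresses are Git sources, path-like arguments are local,
// and anything else is treated as a registry name such as "github.com".
function classifySource(arg: string): SourceType {
  if (/^(https?:\/\/|git@)/.test(arg)) return "git";
  if (/^(\.{1,2}\/|\/|~)/.test(arg)) return "local";
  return "registry";
}
```

Applied to the commands above, `./custom-pack` is classified as local, `https://github.com/org/skill-packs` as Git, and `github.com` as a registry name.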

## Registry Management

The registry system allows centralized distribution and discovery of Knowledge Packs.

### Registry CLI Commands

| Command | Description |
|---------|-------------|
| `sidebutton registry add <path\|url>` | Register and install all packs from a registry |
| `sidebutton registry update [name]` | Update installed packs from registry |
| `sidebutton registry remove <name>` | Uninstall packs and remove registry |
| `sidebutton registry list` | Show registries and pack counts |
| `sidebutton search [query]` | Search packs across registries |

Source: [packages/server/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/server/README.md)

### Registry Configuration

Registries are stored in the SideButton configuration directory (`~/.sidebutton/registries.json`) and contain metadata about available skill pack sources.

## Publishing Knowledge Packs

### Publishing Process

1. **Initialize** a new pack using `sidebutton init [domain]`
2. **Develop** the pack with manifest and modules
3. **Validate** using `sidebutton validate [path]`
4. **Authenticate** with `sidebutton login`
5. **Publish** via `sidebutton publish`

### Manifest Structure

The `manifest.json` defines the pack's metadata:

```json
{
  "domain": "github.com",
  "title": "GitHub",
  "version": "1.0.0",
  "description": "GitHub integration for AI agents",
  "tagline": "Streamlined GitHub workflows",
  "category": "development",
  "modules": ["selectors", "data-models", "state-machines"],
  "roles": ["software-engineer", "qa"]
}
```

Source: [packages/server/src/cli.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/cli.ts)
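
A quick structural check of a manifest might look like the sketch below. Which fields are mandatory is an assumption here (`domain` and `version` are treated as required, the list fields as optional arrays). It is not the logic behind `sidebutton validate`, and `validateManifest` is a hypothetical name.

```typescript
// Collect human-readable problems instead of throwing on the first one.
// Assumption: `domain` and `version` are required; `modules` and `roles` are optional arrays.
function validateManifest(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) return ["manifest must be a JSON object"];
  const manifest = raw as Record<string, unknown>;
  const problems: string[] = [];

  const domain = manifest["domain"];
  if (typeof domain !== "string" || domain === "") problems.push("domain is required");

  const version = manifest["version"];
  if (typeof version !== "string" || !/^\d+\.\d+\.\d+/.test(version)) {
    problems.push("version should look like 1.0.0");
  }

  for (const key of ["modules", "roles"]) {
    const value = manifest[key];
    if (value !== undefined && !Array.isArray(value)) problems.push(`${key} must be an array`);
  }
  return problems;
}
```

An empty result means the manifest passed these checks. The example manifest above would pass, while one without a `domain` would be reported.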

### Publishing Endpoint

```typescript
const res = await fetch(`${REMOTE_BASE_URL}/api/skill-packs/publish`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${auth.token}`,
  },
  body: JSON.stringify({
    domain: manifest.domain,
    name: manifest.title || manifest.name || manifest.domain,
    version: manifest.version,
    description: manifest.description || '',
    tagline: manifest.tagline || '',
    modules: manifest.modules || [],
    roles: manifest.roles || [],
    category: manifest.category || '',
    manifest,
    files,
  }),
});
```

Source: [packages/server/src/cli.ts](https://github.com/sidebutton/sidebutton/blob/main/packages/server/src/cli.ts)

## Integration with Workflow Engine

Knowledge Packs integrate with the core SideButton workflow engine through step types that reference pack-specific configurations:

```mermaid
graph LR
    A[Knowledge Pack] --> B[Step Type Resolution]
    B --> C[Provider Selection]
    C --> D[Git Provider]
    C --> E[Issues Provider]
    C --> F[Chat Provider]
    C --> G[Browser Provider]
```

### Available Step Types

| Category | Steps |
|----------|-------|
| Browser | `navigate`, `click`, `type`, `scroll`, `hover`, `wait`, `extract`, `extractAll`, `exists`, `key` |
| Shell | `shell.run`, `terminal.open`, `terminal.run` |
| LLM | `llm.classify`, `llm.generate` |
| Control | `control.if`, `control.retry`, `control.stop`, `workflow.call` |
| Data | `data.first` |
| Git | `git.listPRs`, `git.getPR`, `git.createPR`, `git.listIssues`, `git.getIssue` |
| Issues | `issues.search`, `issues.get`, `issues.create`, `issues.transition`, `issues.comment` |
| Chat | `chat.readChannel`, `chat.readThread`, `chat.listChannels` |

Source: [packages/core/README.md](https://github.com/sidebutton/sidebutton/blob/main/packages/core/README.md)

## Development Workflow

### Creating a New Pack

```bash
# Initialize a new knowledge pack
sidebutton init my-app.com

# Scaffolded structure:
# my-app.com/
# ├── manifest.json
# ├── modules/
# │   ├── selectors/
# │   ├── data-models/
# │   └── state-machines/
# ├── roles/
# │   └── software-engineer.md
# └── targets/
#     └── github.md
```

### Validation

Before publishing, validate the pack structure:

```bash
sidebutton validate ./my-app.com
```

This command lints and checks:
- Manifest completeness
- Module structure validity
- Selector syntax correctness
- File integrity

## Configuration Locations

| Path | Purpose |
|------|---------|
| `~/.sidebutton/packs/` | Installed Knowledge Pack directories |
| `~/.sidebutton/registries.json` | Registry configurations |
| `~/.sidebutton/config.json` | Main SideButton configuration |

## Best Practices

1. **Selector Stability**: Use semantic CSS selectors that won't change with visual updates
2. **Versioning**: Follow semantic versioning for pack updates
3. **Error Handling**: Include edge case documentation in Common Tasks
4. **Role Coverage**: Provide at least one role playbook for each major user persona
5. **State Documentation**: Clearly define all valid state transitions

## Available Packs

The SideButton registry includes Knowledge Packs for popular platforms:

| Domain | Category | Modules |
|--------|----------|---------|
| github.com | Development | Selectors, Data Models, SE Role |
| atlassian.net | Development | Selectors, Data Models |
| *(9 more domains)* | Various | Various |

Source: [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)

## See Also

- [Core Workflow Engine](../core/README.md) - `@sidebutton/core` package
- [MCP Server](../server/README.md) - `@sidebutton/server` package with REST API
- [Chrome Extension](../extension/README.md) - Browser extension integration
- [Full Documentation](https://docs.sidebutton.com)

---

## Doramagic Pitfall Log

Project: sidebutton/sidebutton

Summary: 10 potential pitfalls found, 0 of them high/blocking. Highest priority: installation pitfall, source evidence: Add control.foreach step type for iterating over lists.

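## 1. Installation pitfall · Source evidence: Add control.foreach step type for iterating over lists

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: Add control.foreach step type for iterating over lists
- Impact on users: May affect upgrades, migration, or version selection.
- Suggested check: The source issue is still open; the Pack Agent should re-verify whether it still affects the current version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_d7567e53b1794e828bb3342cb0699f6f | https://github.com/sidebutton/sidebutton/issues/1 | Unverified usage condition surfaced by source type github_issue.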

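## 2. Capability pitfall · Capability assessment relies on assumptions

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- Impact on users: If the assumption does not hold, users will not get the promised capability.
- Suggested check: Turn the assumption into a downstream verification checklist.
- Safeguard: Assumptions must become verification items; they must not be written as facts before verification results exist.
- Evidence: capability.assumptions | github_repo:1124378210 | https://github.com/sidebutton/sidebutton | README/documentation is current enough for a first validation pass.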

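## 3. Maintenance pitfall · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: `last_activity_observed` is not recorded.
- Impact on users: New, abandoned, and active projects become indistinguishable, which lowers confidence in recommendations.
- Suggested check: Add signals for recent GitHub commits, releases, and issue/PR responses.
- Safeguard: While maintenance activity is unknown, the recommendation must not be marked high-trust.
- Evidence: evidence.maintainer_signals | github_repo:1124378210 | https://github.com/sidebutton/sidebutton | last_activity_observed missing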

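## 4. Security/permissions pitfall · Downstream validation found risk items

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- Impact on users: Downstream has already requested a review; this must not be played down on the page.
- Suggested check: Place in the security/permissions governance review queue.
- Safeguard: While downstream risk exists, the review/recommendation must remain downgraded.
- Evidence: downstream_validation.risk_items | github_repo:1124378210 | https://github.com/sidebutton/sidebutton | no_demo; severity=medium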

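## 5. Security/permissions pitfall · Safety notes present

- Severity: medium
- Evidence strength: source_linked
- Finding: No sandbox install has been executed yet; downstream must verify before user use.
- Impact on users: Users need to know the permission boundaries and sensitive operations before installing.
- Suggested check: Convert into an explicit permissions list and security review notice.
- Safeguard: Safety notes must be shown to users up front.
- Evidence: risks.safety_notes | github_repo:1124378210 | https://github.com/sidebutton/sidebutton | No sandbox install has been executed yet; downstream must verify before user use.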

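## 6. Security/permissions pitfall · Scoring risk present

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- Impact on users: The risk affects whether the project is suitable for ordinary users to install.
- Suggested check: Record the risk on the boundary card and confirm whether manual review is needed.
- Safeguard: Scoring risks must appear on the boundary card, not only as an internal score.
- Evidence: risks.scoring_risks | github_repo:1124378210 | https://github.com/sidebutton/sidebutton | no_demo; severity=medium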

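## 7. Security/permissions pitfall · Source evidence: Native `<select>` elements cannot be programmatically selected via click/type tools

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: Native `<select>` elements cannot be programmatically selected via click/type tools
- Impact on users: May block installation or first run.
- Suggested check: The source issue is still open; the Pack Agent should re-verify whether it still affects the current version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_a2eadefee45c45b2be1b8501c1a0724f | https://github.com/sidebutton/sidebutton/issues/12 | The source discussion mentions node-related conditions that should be re-verified before installation or trial.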

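## 8. Security/permissions pitfall · Source evidence: v1.1.0

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: v1.1.0
- Impact on users: May block installation or first run.
- Suggested check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_a9cfb55ebc664ee99dcc39773859d715 | https://github.com/sidebutton/sidebutton/releases/tag/v1.1.0 | Unverified usage condition surfaced by source type github_release.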

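## 9. Maintenance pitfall · Issue/PR response quality unknown

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- Impact on users: Users cannot tell whether anyone will respond when they hit a problem.
- Suggested check: Sample recent issues/PRs to determine whether they go unhandled for long periods.
- Safeguard: While issue/PR responsiveness is unknown, the maintenance risk must be flagged.
- Evidence: evidence.maintainer_signals | github_repo:1124378210 | https://github.com/sidebutton/sidebutton | issue_or_pr_quality=unknown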

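## 10. Maintenance pitfall · Release cadence unclear

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- Impact on users: Install commands and documentation may lag behind the code, making pitfalls more likely.
- Suggested check: Confirm that the latest release/tag matches the install commands in the README.
- Safeguard: While release cadence is unknown or stale, installation instructions must note possible drift.
- Evidence: evidence.maintainer_signals | github_repo:1124378210 | https://github.com/sidebutton/sidebutton | release_recency=unknown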

<!-- canonical_name: sidebutton/sidebutton; human_manual_source: deepwiki_human_wiki -->
