Doramagic Project Pack · Human Manual

sidebutton

SideButton is a browser automation and workflow orchestration platform that enables AI agents (such as Claude Desktop, Cursor) to interact with web applications through a unified MCP (Mode...

Introduction to SideButton

Related topics: Getting Started, System Architecture, MCP Server Integration

SideButton is an open-source AI agent platform that combines an MCP (Model Context Protocol) server with browser automation tools, a YAML-based workflow engine, and extensible knowledge packs for domain-specific expertise. It enables autonomous AI agents to interact with web applications through standardized browser controls, CLI operations, and pre-built workflow automations.

Sources: AGENTS.md

High-Level Architecture

SideButton follows a modular monorepo architecture with four primary packages working together to provide a complete automation platform.

graph TB
    subgraph "Client Layer"
        EXT["Chrome Extension"]
        CLI["CLI Tool"]
        MCP["MCP Clients<br/>(Claude, Cursor)"]
    end
    
    subgraph "packages/server"
        API["REST API & Dashboard"]
        MCP_SRV["MCP Endpoint"]
        WS["WebSocket Bridge"]
    end
    
    subgraph "packages/core"
        PARSER["Workflow Parser"]
        EXEC["Step Executor"]
    end
    
    subgraph "packages/dashboard"
        UI["Svelte Web UI"]
    end
    
    EXT --> WS
    CLI --> API
    MCP --> MCP_SRV
    API --> EXEC
    MCP_SRV --> EXEC
    WS --> EXEC
    EXEC --> PARSER
    UI --> API

Sources: AGENTS.md, CONTRIBUTING.md

Package Structure

The repository is organized as a monorepo using pnpm workspaces. Each package has a focused responsibility.

| Package | Purpose | Role |
| --- | --- | --- |
| packages/core | Workflow engine: parser, executor, and step implementations | Workflow execution runtime |
| packages/server | HTTP server, MCP endpoint, CLI, and WebSocket bridge | API layer and server runtime |
| packages/dashboard | Svelte web UI served at localhost:9876 | User interface |
| packages/sidebutton | CLI entry point for npx sidebutton@latest | Command-line interface |
| extension/ | Chrome extension (Manifest V3) | Browser automation |

Sources: AGENTS.md, CONTRIBUTING.md

Core Concepts

Workflows

Workflows are YAML files that define sequences of steps for automation tasks. They can include browser interactions, shell commands, LLM calls, and control flow logic.

name: example_workflow
steps:
  - type: navigate
    url: https://example.com
  
  - type: click
    selector: "#submit-button"
  
  - type: extract
    selector: ".result"
    as: result

Sources: packages/core/README.md

Step Types

SideButton provides multiple categories of steps for different automation needs.

| Category | Steps | Purpose |
| --- | --- | --- |
| Browser | navigate, click, type, scroll, hover, wait, extract, extractAll, exists, key | Web page interaction |
| Shell | shell.run, terminal.open, terminal.run | Command execution |
| LLM | llm.classify, llm.generate | AI-powered operations |
| Control | control.if, control.retry, control.stop, workflow.call | Flow control |
| Data | data.first | Data manipulation |

Sources: packages/core/README.md

Knowledge Packs

Knowledge packs (also called skill packs) teach autonomous AI agents how specific web applications work. They bundle markdown files containing selectors, data models, state definitions, and agentic workflows per web app.

Key capabilities of knowledge packs:

  • Selectors: CSS/XPath selectors for UI elements
  • Data Models: Structured data representations
  • Agentic Workflows: Pre-defined sequences for common tasks
  • Role Playbooks: Instructions for AI agent behavior

Sources: packages/sidebutton/README.md, AGENTS.md

MCP Integration

SideButton provides MCP (Model Context Protocol) server functionality for integration with AI coding assistants. The MCP tools allow AI agents to control browser automation and execute workflows programmatically.

Supported Clients

| Client | Transport | Configuration |
| --- | --- | --- |
| Claude Desktop | stdio | npx sidebutton --stdio |
| Claude Code | SSE | http://localhost:9876/mcp |
| Cursor | HTTP | http://localhost:9876/mcp |

Sources: packages/server/README.md, packages/sidebutton/README.md

MCP Tools

| Tool | Description |
| --- | --- |
| run_workflow | Execute a workflow by ID |
| list_workflows | List available workflows |
| get_workflow | Get workflow YAML definition |
| get_run_log | Get execution log for a run |
| list_run_logs | List recent workflow executions |
| get_browser_status | Check browser extension connection |
| capture_page | Capture selectors from current page |
| navigate | Navigate browser to URL |
| snapshot | Get page accessibility snapshot |
| click | Click an element |
| type | Type text into an element |
| scroll | Scroll the page |
| screenshot | Capture page screenshot |
| hover | Hover over element |
| extract | Extract text from element |
| extract_all | Extract all matching elements |

Sources: packages/server/README.md, README.md

CLI Commands

The SideButton CLI provides commands for managing the server, workflows, and knowledge packs.

sidebutton                           # Start server (default port 9876)
sidebutton --stdio                   # Start with stdio transport (Claude Desktop)
sidebutton -p 8080                   # Custom port

sidebutton list                      # List available workflows
sidebutton run <id>                  # Run a workflow by ID
sidebutton status                    # Check server status

Knowledge Pack Management

| Command | Description |
| --- | --- |
| `sidebutton registry add <path\|url>` | Register and install all knowledge packs |
| `sidebutton registry update [name]` | Update installed packs from registry |
| `sidebutton registry remove <name>` | Uninstall packs and remove registry |
| `sidebutton registry list` | Show registries and pack counts |
| `sidebutton search [query]` | Search packs across registries |
| `sidebutton install <path\|url\|name>` | One-off knowledge pack install |
| `sidebutton uninstall <domain>` | Remove an installed knowledge pack |
| `sidebutton init [domain]` | Scaffold a new knowledge pack |
| `sidebutton validate [path]` | Lint and validate a knowledge pack |

Sources: packages/sidebutton/README.md, packages/server/README.md

Quick Start

Published Package (No Clone Required)

npx sidebutton@latest   # starts server + dashboard on port 9876

Sources: AGENTS.md

Local Development Setup

# Clone the repo
git clone https://github.com/sidebutton/sidebutton.git
cd sidebutton

# Install dependencies
pnpm install

# Build all packages
pnpm build

# Start the server
pnpm start
# Open http://localhost:9876

Sources: CONTRIBUTING.md

Development Prerequisites

| Requirement | Version |
| --- | --- |
| Node.js | 20+ |
| pnpm | 9.15+ |
| Chrome | Latest (for browser automation) |

Sources: CONTRIBUTING.md

Development Workflow

Running Components

Start everything in watch mode with hot reload:

pnpm dev

Run components individually:

| Command | Description |
| --- | --- |
| pnpm dev:server | Server with auto-restart on :9876 |
| pnpm dev:dashboard | Dashboard with HMR on :5173 |
| pnpm build | Build all packages |
| pnpm test | Run all tests |

Sources: CONTRIBUTING.md

Provider Preference

When multiple integration methods exist, SideButton follows this preference order:

graph LR
    A["API Provider"] --> B["CLI Tool"] --> C["Browser Automation"]
    style A fill:#90EE90
    style B fill:#FFD700
    style C fill:#FFA07A

  • API is fastest and most reliable
  • CLI provides programmatic access
  • Browser automation is the universal fallback

Sources: packages/server/defaults/roles/software-engineer.md
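
The preference order above can be sketched as a small selection function. This is an illustration only: `ProviderKind`, `PREFERENCE`, and `pickProvider` are hypothetical names, not part of SideButton's API, and the role playbook states the order as guidance rather than as code.

```typescript
// Hypothetical names for the three integration methods in the diagram above.
type ProviderKind = "api" | "cli" | "browser";

// Most preferred first: API, then CLI, then browser automation as the fallback.
const PREFERENCE: readonly ProviderKind[] = ["api", "cli", "browser"];

// Return the most preferred method among those available for a task,
// or undefined when none is available.
function pickProvider(available: readonly ProviderKind[]): ProviderKind | undefined {
  for (const kind of PREFERENCE) {
    if (available.indexOf(kind) !== -1) {
      return kind;
    }
  }
  return undefined;
}

// A task reachable through the CLI and the browser resolves to the CLI.
const chosen = pickProvider(["browser", "cli"]);
```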

Data Directories

| Directory | What it is |
| --- | --- |
| packages/core/ | Workflow engine: parser, executor, step implementations |
| packages/server/ | HTTP server, MCP endpoint, CLI, WebSocket bridge |
| packages/dashboard/ | Svelte web UI served at localhost:9876 |
| extension/ | Chrome extension for browser automation |
| workflows/ | Public workflow library (YAML files) |
| actions/ | User-created workflows (gitignored) |

Sources: CONTRIBUTING.md

| Package | NPM Link |
| --- | --- |
| @sidebutton/core | npmjs.com |
| @sidebutton/server | npmjs.com |

Sources: packages/core/README.md, packages/server/README.md

External Resources

| Resource | URL |
| --- | --- |
| Documentation | docs.sidebutton.com |
| GitHub Repository | github.com/sidebutton/sidebutton |
| Website | sidebutton.com |
| Knowledge Packs | sidebutton.com/skills |

License

SideButton is licensed under Apache-2.0.

Sources: CONTRIBUTING.md, packages/core/README.md, packages/server/README.md

Sources: [AGENTS.md]()

Getting Started

Related topics: Introduction to SideButton

SideButton is a Model Context Protocol (MCP) server that provides browser automation, workflow execution, and knowledge pack management capabilities. It enables AI assistants like Claude Desktop and Cursor to interact with web browsers, execute predefined workflows, and leverage domain-specific knowledge packs.

Prerequisites

Before installing SideButton, ensure your environment meets the following requirements:

| Requirement | Version/Details |
| --- | --- |
| Node.js | v18 or higher |
| Package Manager | pnpm (recommended) or npm |
| Browser | Chrome/Chromium (for browser automation features) |
| OS | macOS, Windows, Linux |

Sources: README.md:1-50

Installation

Package Manager Installation

Install the SideButton CLI globally using your preferred package manager:

# Using npm
npm install -g sidebutton

# Using pnpm
pnpm add -g sidebutton

# Using yarn
yarn global add sidebutton

Verify the installation:

sidebutton --version

Development Setup (From Source)

For contributing or running the latest development version:

# Clone the repository
git clone https://github.com/sidebutton/sidebutton.git
cd sidebutton

# Install dependencies
pnpm install

# Build all packages
pnpm build

# Run CLI directly
pnpm cli --version

Sources: CONTRIBUTING.md:1-20

Quick Start

Starting the Server

The default command starts the SideButton server on port 9876:

sidebutton

To use a custom port:

sidebutton -p 8080

The server provides:

  • REST API endpoint
  • MCP (Model Context Protocol) endpoint
  • WebSocket connection for browser extension
  • Dashboard UI at http://localhost:9876

Sources: packages/sidebutton/README.md:1-30

Architecture Overview

graph TD
    A[Claude Desktop / Cursor] -->|MCP Protocol| B[SideButton Server]
    A -->|stdio| B
    B -->|REST API| C[Dashboard UI]
    B -->|WebSocket| D[Chrome Extension]
    B -->|Execute| E[Workflow Engine]
    E -->|Browser Actions| F[Chrome Browser]
    E -->|CLI Tools| G[Shell/CLI]
    E -->|LLM Calls| H[OpenAI/Anthropic/Ollama]

MCP Integration

SideButton can be integrated with various AI coding assistants through the MCP protocol.

Claude Desktop

Add SideButton to your Claude Desktop configuration file at ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "sidebutton": {
      "command": "npx",
      "args": ["sidebutton", "--stdio"]
    }
  }
}

After adding the configuration, restart Claude Desktop to load the new MCP server.

Sources: README.md:50-70

Cursor

Add SideButton to your Cursor MCP configuration file at ~/.cursor/mcp.json:

{
  "mcpServers": {
    "sidebutton": {
      "url": "http://localhost:9876/mcp"
    }
  }
}

Ensure the SideButton server is running before using Cursor with this configuration.

Sources: packages/server/README.md:20-35
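
Both configuration files above share the same `mcpServers` shape, differing only in whether an entry launches a command (stdio) or points at a URL. The sketch below shows one way to add the SideButton entry to an existing configuration object without overwriting other servers. The type names, the `withServer` helper, and the `other` server entry are hypothetical and exist only for illustration.

```typescript
// An entry either launches a process (stdio) or points at an HTTP endpoint.
type McpServerEntry =
  | { command: string; args: string[] }
  | { url: string };

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Return a copy of `config` with `entry` stored under `serverName`,
// leaving every other configured server untouched.
function withServer(config: McpConfig, serverName: string, entry: McpServerEntry): McpConfig {
  return { mcpServers: { ...config.mcpServers, [serverName]: entry } };
}

// Hypothetical pre-existing configuration with an unrelated server.
const existingConfig: McpConfig = {
  mcpServers: { other: { url: "http://localhost:3000/mcp" } },
};

// Cursor-style entry, using the URL from the configuration above.
const updatedConfig = withServer(existingConfig, "sidebutton", {
  url: "http://localhost:9876/mcp",
});
```

Serializing `updatedConfig` with `JSON.stringify(updatedConfig, null, 2)` yields text that can be written back to the configuration file.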

Available MCP Tools

| Tool | Description |
| --- | --- |
| run_workflow | Execute a workflow by ID |
| list_workflows | List all available workflows |
| get_workflow | Get workflow YAML definition |
| get_run_log | Get execution log for a run |
| list_run_logs | List recent workflow executions |
| get_browser_status | Check browser extension connection |
| capture_page | Capture selectors from current page |
| navigate | Navigate browser to URL |
| snapshot | Get page accessibility snapshot |
| click | Click an element |
| type | Type text into an element |
| scroll | Scroll the page |
| screenshot | Capture page screenshot |
| hover | Hover over element |
| extract | Extract text from element |
| extract_all | Extract all matching elements |

Sources: packages/server/README.md:40-60

CLI Commands

Basic Commands

# Start the server (default port 9876)
sidebutton

# Start with stdio transport for Claude Desktop
sidebutton --stdio

# Start on custom port
sidebutton -p 8080

# List available workflows
sidebutton list

# Run a specific workflow
sidebutton run <workflow-id>

# Check server status
sidebutton status

Knowledge Packs Management

# Add a registry
sidebutton registry add <path|url>

# Update installed packs
sidebutton registry update [name]

# Remove a registry
sidebutton registry remove <name>

# List all registries
sidebutton registry list

# Search packs across registries
sidebutton search [query]

# Install a knowledge pack
sidebutton install <path|url|name>

# Uninstall a knowledge pack
sidebutton uninstall <domain>

Knowledge Pack Development

# Scaffold a new knowledge pack
sidebutton init [domain]

# Validate a knowledge pack
sidebutton validate [path]

# Publish to registry
sidebutton publish

Sources: packages/sidebutton/README.md:60-100

Dashboard

The SideButton dashboard provides a web-based UI for managing workflows and viewing execution history.

Access the dashboard at: http://localhost:9876

Dashboard Features

  • View and manage shortcuts
  • Browse available workflows
  • View execution logs
  • Add workflows to dashboard
  • Monitor browser extension status

Chrome Extension

Install the SideButton Chrome extension from the Chrome Web Store.

Extension Features

  • 40+ browser commands for navigation, clicking, typing, extraction
  • Real DOM access via CSS selectors
  • Recording mode to capture manual actions as workflows
  • Embed action buttons into web pages
  • WebSocket connection for stable reconnection

Sources: README.md:80-100

Workflow Execution

Running a Workflow via CLI

# List all available workflows
sidebutton list

# Execute a workflow by ID
sidebutton run <workflow-id>

# With parameters
sidebutton run <workflow-id> --param value

Running a Workflow via MCP

When connected to an MCP client like Claude Desktop:

# Use the run_workflow tool
run_workflow({ id: "workflow-id", params: { key: "value" } })
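
Under the hood, an MCP client turns a tool invocation like the one above into a JSON-RPC `tools/call` request. The sketch below builds such a request object. The envelope (`jsonrpc`, `method`, `params.name`, `params.arguments`) follows the MCP specification; the argument keys `id` and `params` are copied from the example above and may not match the tool's actual input schema. `buildToolCall` is a hypothetical helper.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build the request an MCP client would send to invoke `toolName`.
function buildToolCall(
  requestId: number,
  toolName: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: requestId,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Argument keys mirror the run_workflow example above (an assumption).
const runCall = buildToolCall(1, "run_workflow", {
  id: "workflow-id",
  params: { key: "value" },
});
```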

Workflow Step Types

| Category | Steps |
| --- | --- |
| Browser | navigate, click, type, scroll, hover, wait, extract, extractAll, exists, key |
| Shell | shell.run, terminal.open, terminal.run |
| LLM | llm.classify, llm.generate |
| Control | control.if, control.retry, control.stop, workflow.call |
| Data | data.first |

Sources: packages/core/README.md:20-40

Sources: [README.md:1-50](https://github.com/sidebutton/sidebutton/blob/main/README.md)

System Architecture

Related topics: Package Overview, MCP Server Integration, Workflow Engine

Overview

SideButton is a browser automation and workflow orchestration platform that enables AI agents (such as Claude Desktop, Cursor) to interact with web applications through a unified MCP (Model Context Protocol) interface. The system combines browser automation, CLI tools, LLM capabilities, and external integrations into a coherent workflow execution engine.

The architecture follows a modular monorepo design with four primary packages:

| Package | Purpose |
| --- | --- |
| packages/sidebutton | CLI entry point and stdio transport for MCP |
| packages/server | Fastify-based MCP server with REST API and dashboard |
| packages/core | Workflow definition parsing and execution engine |
| packages/dashboard | Svelte-based web UI for workflow management |

Sources: README.md:1-50


Package Overview

Related topics: System Architecture, Workflow Engine, Chrome Extension

SideButton is a browser automation platform organized as a monorepo with four primary packages. The system enables workflow-driven automation through YAML definitions, MCP (Model Context Protocol) integration, a REST API, and a Chrome extension. This document provides a comprehensive overview of each package's architecture, responsibilities, and interdependencies.

Architecture Overview

SideButton follows a layered architecture pattern with clear separation of concerns across packages. The core workflow engine handles execution logic, the server package provides API endpoints and MCP connectivity, the CLI package offers command-line interaction, and the dashboard provides a web-based user interface.

graph TD
    User[User] --> CLI[CLI Package]
    User --> Dashboard[Dashboard Package]
    User --> MCP[MCP Client]
    User --> REST[REST API Client]
    
    CLI --> Server[Server Package]
    Dashboard --> Server
    MCP --> Server
    REST --> Server
    
    Server --> Core[Core Package]
    Server --> BrowserExtension[Chrome Extension]
    
    Core --> WorkflowEngine[Workflow Engine]
    Core --> Providers[Provider Integrations]

Package Structure

The repository contains four main packages under the packages/ directory:

| Package | Description |
| --- | --- |
| @sidebutton/core | Core workflow engine and execution runtime |
| @sidebutton/server | MCP server, REST API, and embedded dashboard |
| @sidebutton/sidebutton | Command-line interface |
| dashboard | Frontend Svelte application for the web UI |

Core Package (`@sidebutton/core`)

The core package contains the fundamental workflow orchestration engine. It handles parsing, validation, and execution of YAML-based workflow definitions.

Purpose and Scope

The core package is responsible for the runtime execution of automations. It provides the foundational primitives that the server package wraps with API endpoints. Workflows are defined in YAML and executed through a step-by-step interpreter that supports multiple action types.

Core Exports

The package exposes three primary functions for workflow management:

// packages/core/src/index.ts
export { parseWorkflow, validateWorkflow, executeWorkflow }

| Function | Purpose |
| --- | --- |
| parseWorkflow | Parse YAML workflow definition into internal representation |
| validateWorkflow | Validate workflow structure and step types |
| executeWorkflow | Execute a workflow with provided context and parameters |

Step Types

The core package supports multiple categories of step types for workflow construction:

| Category | Steps |
| --- | --- |
| Browser | navigate, click, type, scroll, hover, wait, extract, extractAll, exists, key |
| Shell | shell.run, terminal.open, terminal.run |
| LLM | llm.classify, llm.generate |
| Control | control.if, control.retry, control.stop, workflow.call |
| Data | data.first |

Provider Integrations

The core package includes provider implementations for external service integration. GitHub integration is implemented in packages/core/src/providers/github.ts and provides the following capabilities:

  • listPRs - List pull requests
  • getPR - Get pull request details
  • createPR - Create a pull request
  • listIssues - List repository issues
  • getIssue - Get issue details

Sources: packages/core/src/providers/github.ts

Server Package (`@sidebutton/server`)

The server package serves as the central hub for all external interactions with the workflow engine. It wraps the core package with MCP protocol support, REST API endpoints, and embeds the dashboard application.

MCP Server

The MCP server implementation exposes workflow execution capabilities to MCP-compatible clients including Claude Desktop and Cursor. The server runs on port 9876 by default and provides the following tools:

| MCP Tool | Description |
| --- | --- |
| run_workflow | Execute a workflow by ID |
| list_workflows | List available workflows |
| get_workflow | Get workflow YAML definition |
| get_run_log | Get execution log |
| list_run_logs | List recent executions |
| get_browser_status | Check extension connection |
| capture_page | Capture page selectors |
| navigate | Navigate browser to URL |
| snapshot | Get accessibility tree |
| click | Click element |
| type | Type text |
| scroll | Scroll page |
| extract | Extract text |
| screenshot | Capture screenshot |
| hover | Hover over element |

Sources: packages/server/README.md

REST API

The server exposes 60+ JSON endpoints for external integrations. The API supports the same workflow operations available through MCP, enabling programmatic access from any HTTP client.

# Run a workflow
curl -X POST http://localhost:9876/api/workflows/check_ticket/run \
  -H "Content-Type: application/json" \
  -d '{"params": {"ticket_id": "PROJ-123"}}'

# List workflows
curl http://localhost:9876/api/workflows

# Get run log
curl http://localhost:9876/api/runs/latest

Sources: README.md
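
The first curl call above can be mirrored from code. The sketch below only assembles the request (URL, method, headers, body) so it stays independent of any HTTP client; the endpoint path and payload shape are taken from the curl example, while `RunRequest` and `buildRunRequest` are hypothetical names.

```typescript
// Plain description of the HTTP request; field names are illustrative.
interface RunRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assemble the request that runs a workflow through the REST API,
// matching the curl example: POST /api/workflows/<id>/run with {"params": {...}}.
function buildRunRequest(
  baseUrl: string,
  workflowId: string,
  params: Record<string, string>,
): RunRequest {
  return {
    url: `${baseUrl}/api/workflows/${encodeURIComponent(workflowId)}/run`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ params }),
  };
}

const runRequest = buildRunRequest("http://localhost:9876", "check_ticket", {
  ticket_id: "PROJ-123",
});
```

The resulting object can be handed to any HTTP client, for example `fetch(runRequest.url, { method: runRequest.method, headers: runRequest.headers, body: runRequest.body })`.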

Embedded Dashboard

The server embeds a Svelte-based dashboard application served from packages/dashboard/. The dashboard provides:

  • Workflow browsing and execution
  • Run log viewing
  • Shortcut management
  • Action library
  • Workflow recording

Workflow Engine Extensions

The server extends the core workflow engine with 34+ step types, providing additional capabilities beyond the core package:

| Extended Category | Additional Steps |
| --- | --- |
| Browser | fill, press_key, scroll_into_view, evaluate, select_option |
| Extended | check_writing_quality, capture_page |

Knowledge Packs

The server supports knowledge packs (also called skill packs) that provide domain-specific knowledge for AI-driven tasks. Knowledge packs include:

  • Selectors — CSS selectors for UI elements
  • Data models — entity types, fields, relationships, valid states
  • State machines — valid transitions per state
  • Role playbooks — role-specific procedures (QA, SE, PM, SD)
  • Common tasks — step-by-step procedures, gotchas, edge cases

Sources: README.md

CLI Package (`@sidebutton/sidebutton`)

The CLI package provides command-line interaction with the SideButton platform. It serves as the primary interface for local development and scripting workflows.

Installation

npm install -g sidebutton

Core Commands

| Command | Description |
| --- | --- |
| sidebutton | Start server (default port 9876) |
| sidebutton --stdio | Start with stdio transport (Claude Desktop) |
| sidebutton -p 8080 | Start on custom port |

Workflow Management

| Command | Description |
| --- | --- |
| sidebutton list | List available workflows |
| sidebutton run <id> | Run a workflow |
| sidebutton status | Check server status |

Knowledge Pack Commands

# Registry management
sidebutton registry add <path|url>   # Install from registry
sidebutton registry update [name]    # Update installed packs
sidebutton registry remove <name>    # Remove registry and packs
sidebutton registry list             # Show registries

# Search and install
sidebutton search [query]            # Search packs across registries
sidebutton install <path|url|name>   # Install a single knowledge pack
sidebutton uninstall <domain>        # Remove a knowledge pack

# Development
sidebutton init [domain]             # Scaffold a new knowledge pack
sidebutton validate [path]           # Lint and validate a knowledge pack
sidebutton publish                   # Publish to registry

Sources: packages/sidebutton/README.md

Publishing Workflows

The CLI supports publishing skill packs to remote registries via the publish command:

sidebutton publish

This command sends the manifest and all associated files to the configured remote registry at ${REMOTE_BASE_URL}/api/skill-packs/publish. Authentication is required via bearer token.

Dashboard Package

The dashboard is a Svelte-based frontend application that provides the visual interface for managing workflows and viewing execution logs.

Entry Point

The dashboard application is mounted in packages/dashboard/index.html:

<div id="app"></div>
<script type="module" src="/src/main.ts"></script>

Key Pages

| Page | Route | Purpose |
| --- | --- | --- |
| Dashboard Home | / | Display workflow shortcuts |
| Actions | /actions | Browse and search available workflows |
| Action Detail | /actions/:id | View workflow details and run |
| Workflows | /workflows | Library of workflows |
| Workflow Detail | /workflows/:id | Read-only workflow view |
| Recordings | /recordings | View recorded automations |
| Run Logs | /run-logs | View execution history |

Chrome Extension

While not a separate npm package, the Chrome extension is an integral part of the SideButton ecosystem. It provides browser automation capabilities with:

  • 40+ browser commands (navigate, click, type, extract, scroll, wait, snapshot)
  • Real DOM access via CSS selectors
  • Recording mode for capturing manual actions as workflows
  • Embed buttons for injecting action buttons into web pages
  • WebSocket connection with stable reconnection

Sources: README.md

Dependency Graph

The packages have the following dependency relationships:

graph LR
    CLI["@sidebutton/sidebutton"] --> Server["@sidebutton/server"]
    Dashboard["dashboard"] --> Server
    Server --> Core["@sidebutton/core"]
    Server --> BrowserExt["Chrome Extension"]
    
    style Core fill:#e1f5fe
    style Server fill:#fff3e0
    style CLI fill:#e8f5e9
    style Dashboard fill:#f3e5f5

| Consumer | Dependency | Relationship |
| --- | --- | --- |
| @sidebutton/sidebutton | @sidebutton/server | CLI wraps server functionality |
| dashboard | @sidebutton/server | Dashboard is embedded in the server |
| @sidebutton/server | @sidebutton/core | Server uses core for workflow execution |

Technology Stack

| Layer | Technology |
| --- | --- |
| Runtime | Node.js |
| Core Engine | TypeScript |
| Server | Fastify |
| API Protocol | MCP (Model Context Protocol) |
| Dashboard | Svelte, Vite |
| Browser Automation | Chrome Extension (Manifest V3) |
| Package Manager | pnpm (monorepo) |

Quick Start

Running the Server

# Start the server
sidebutton

# Or with custom port
sidebutton -p 8080

Running a Workflow

# List available workflows
sidebutton list

# Run a specific workflow
sidebutton run <workflow-id>

Integrating with Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "sidebutton": {
      "command": "npx",
      "args": ["sidebutton", "--stdio"]
    }
  }
}

Sources: [packages/core/src/providers/github.ts](packages/core/src/providers/github.ts)

Workflow Engine

Related topics: Step Types Reference, Workflow Examples, Package Overview

Overview

The SideButton Workflow Engine is a YAML-first orchestration system that enables automation of complex tasks through a declarative step-based approach. It provides 34+ built-in step types spanning browser automation, shell execution, LLM integration, and programmatic control flow operations.

The engine executes workflows defined in YAML format, supporting variable interpolation, conditional branching, retry logic, and cross-workflow chaining. Workflows can be triggered via MCP (Model Context Protocol), the REST API, or the dashboard interface.

Sources: README.md

Architecture

Core Components

graph TD
    A[Workflow YAML] --> B[Parser]
    B --> C[AST / Step Definitions]
    C --> D[Executor]
    D --> E[Step Handlers]
    
    F[Variables/Context] --> D
    D --> F
    
    G[MCP Client] --> D
    H[REST API] --> D
    I[Dashboard] --> D

The engine consists of three primary layers:

  1. Parser: Validates and parses YAML workflow definitions into structured step objects
  2. Executor: Orchestrates step execution, manages state, and handles control flow
  3. Step Handlers: Provider-specific implementations for each step type

Sources: packages/core/README.md

Workflow Execution Flow

sequenceDiagram
    participant Client
    participant Executor
    participant StepHandler
    participant Context
    
    Client->>Executor: executeWorkflow(workflow, context)
    Executor->>Executor: Parse YAML
    loop For each step
        Executor->>StepHandler: Execute Step
        StepHandler->>Context: Update Variables
        StepHandler-->>Executor: Result
        Executor->>Executor: Check Control Flow
    end
    Executor-->>Client: Execution Result
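
The loop in the diagram can be sketched as a small executor that looks up a handler per step type and stores each result under the step's `as` name. This is a simplified, synchronous illustration, not the engine's implementation: the real executor presumably runs handlers asynchronously and also handles control flow. All names here (`WorkflowStep`, `StepHandler`, `runSteps`, `stubHandlers`) are hypothetical.

```typescript
type Variables = Record<string, string>;

interface WorkflowStep {
  type: string;
  as?: string; // variable name that receives the step result
  [field: string]: unknown;
}

// A handler receives the step and the current variables and returns a result.
type StepHandler = (step: WorkflowStep, vars: Variables) => string;

// Run steps in order; store each result under `as` when the step names one.
function runSteps(
  steps: WorkflowStep[],
  handlers: Record<string, StepHandler>,
  initial: Variables = {},
): Variables {
  const vars: Variables = { ...initial };
  for (const step of steps) {
    const handler = handlers[step.type];
    if (!handler) {
      throw new Error(`No handler registered for step type "${step.type}"`);
    }
    const result = handler(step, vars);
    if (step.as) {
      vars[step.as] = result;
    }
  }
  return vars;
}

// Stub handlers standing in for real browser and shell integrations.
const stubHandlers: Record<string, StepHandler> = {
  "browser.extract": (step) => `text of ${String(step["selector"])}`,
  "shell.run": (step, vars) => `${String(step["cmd"])} [user=${String(vars["user"])}]`,
};

const finalVars = runSteps(
  [
    { type: "browser.extract", selector: ".username", as: "user" },
    { type: "shell.run", cmd: "echo hi", as: "output" },
  ],
  stubHandlers,
);
```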

Step Types

The workflow engine supports five major categories of steps:

Sources: README.md:1-50

Browser Steps

Used for web automation tasks. All browser steps require an active browser connection.

| Step Type | Description |
| --- | --- |
| browser.navigate | Open a URL in the browser |
| browser.click | Click an element by CSS selector |
| browser.type | Type text into an input element |
| browser.fill | Fill input value directly (React-compatible) |
| browser.scroll | Scroll the page |
| browser.extract | Extract text from an element into a variable |
| browser.extractAll | Extract all matching elements |
| browser.extractMap | Extract structured data from repeated elements |
| browser.wait | Wait for element or fixed delay |
| browser.exists | Check if element exists (returns boolean) |
| browser.hover | Position cursor over element |
| browser.key | Send keyboard keys |
| browser.snapshot | Capture accessibility tree snapshot |
| browser.injectCSS | Inject CSS styles into page |
| browser.injectJS | Execute JavaScript in page context |
| browser.select_option | Select dropdown option |
| browser.scrollIntoView | Scroll element into viewport |

Sources: README.md:44-63

Shell Steps

Execute command-line operations on the host system.

| Step Type | Description |
| --- | --- |
| shell.run | Execute a bash/shell command |
| terminal.open | Open a visible terminal window (macOS) |
| terminal.run | Run command in visible terminal window |

Sources: README.md:64-66

LLM Steps

Integrate with large language models for AI-driven operations.

| Step Type | Description |
| --- | --- |
| llm.classify | Structured classification with predefined categories |
| llm.generate | Free-form text generation |
| llm.decide | Make decisions based on context |

Supported providers include Ollama (local), OpenAI, Anthropic, and Google.

Sources: README.md:67-73

Control Flow Steps

Manage workflow execution logic and branching.

| Step Type | Description |
| --- | --- |
| control.if | Conditional branching based on expression evaluation |
| control.retry | Retry block with configurable backoff |
| control.stop | End workflow with success/error message |
| workflow.call | Call another workflow with parameters |
| variable.set | Set a variable value |

Sources: README.md:74-78

Data Steps

Manipulate and transform data between steps.

| Step Type | Description |
| --- | --- |
| data.first | Extract first item from a list |
| data.get | Retrieve stored data value |

Sources: README.md:79-82

Variable Interpolation

The workflow engine uses {{variable}} syntax for referencing extracted values and parameters.

steps:
  - type: browser.extract
    selector: ".username"
    as: user
  - type: shell.run
    cmd: "echo 'Hello, {{user}}!'"

Variables can be:

  • Extracted from page elements using the `as` parameter
  • Passed as workflow parameters
  • Set via variable.set steps
  • Returned from nested workflow calls

Sources: README.md:103-115
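
A minimal sketch of how `{{variable}}` substitution might work is shown below. It is an illustration under stated assumptions, not the engine's implementation: `interpolate` is a hypothetical helper, and leaving unknown placeholders untouched is an assumed behavior (the real engine may raise an error or substitute an empty string).

```typescript
// Replace each {{name}} placeholder with its value from `vars`.
// Placeholders with no matching variable are left as written (an assumption).
function interpolate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match: string, key: string) => {
    const known = Object.prototype.hasOwnProperty.call(vars, key);
    return known ? String(vars[key]) : match;
  });
}

// The shell command from the example above, with `user` already extracted.
const greeting = interpolate("echo 'Hello, {{user}}!'", { user: "ada" });
```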

Workflow Definition Schema

A workflow is defined with the following structure:

id: workflow_identifier
title: "Display Title"
description: "What this workflow does"
params:
  param_name: string  # or number, boolean, array, object
steps:
  - type: browser.navigate
    url: "https://example.com"
  - type: browser.extract
    selector: ".element"
    as: extracted_value

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique workflow identifier |
| title | string | Human-readable title |
| steps | array | Ordered list of step definitions |

Optional Fields

| Field | Type | Description |
| --- | --- | --- |
| description | string | Workflow description |
| params | object | Parameter definitions with types |
| category | string | Workflow category |
| platform | string | Target platform |

Sources: packages/server/src/server.ts
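To make the required and optional fields concrete, here is a minimal validation sketch. The interface follows the two field tables above; the function `validateWorkflow` and its error messages are hypothetical and do not reflect how the server actually validates workflows.

```typescript
// Shape taken from the field tables above; the validation rules are assumptions.
interface WorkflowDefinition {
  id: string;
  title: string;
  steps: Array<{ type: string; [key: string]: unknown }>;
  description?: string;
  params?: Record<string, string>;
  category?: string;
  platform?: string;
}

// Return a list of problems; an empty list means the required fields look valid.
function validateWorkflow(candidate: unknown): string[] {
  if (typeof candidate !== "object" || candidate === null) {
    return ["workflow must be an object"];
  }
  const wf = candidate as Record<string, unknown>;
  const problems: string[] = [];
  for (const field of ["id", "title"]) {
    if (typeof wf[field] !== "string" || wf[field] === "") {
      problems.push(`missing or empty required field: ${field}`);
    }
  }
  const steps = wf["steps"];
  if (!Array.isArray(steps)) {
    problems.push("missing required field: steps (must be an array)");
  } else {
    steps.forEach((step: unknown, index: number) => {
      const hasType =
        typeof step === "object" && step !== null &&
        typeof (step as Record<string, unknown>)["type"] === "string";
      if (!hasType) problems.push(`step ${index} has no string "type"`);
    });
  }
  return problems;
}

const example: WorkflowDefinition = {
  id: "workflow_identifier",
  title: "Display Title",
  steps: [{ type: "browser.navigate", url: "https://example.com" }],
};
const exampleProblems = validateWorkflow(example);
```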

Control Flow Patterns

Conditional Branching

- type: control.if
  condition: "{{current_status}} != 'Done'"
  then:
    - type: llm.classify
      prompt: "Should this ticket be closed?"
      classes: [close, keep_open]
      as: decision

Sources: README.md:117-125

Retry with Backoff

- type: control.retry
  max_attempts: 3
  backoff: 1000  # milliseconds
  steps:
    - type: shell.run
      cmd: "curl -f https://api.example.com/health"
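The semantics of max_attempts and backoff can be sketched as a retry loop. This is an assumption-laden illustration: it uses a fixed delay, runs synchronously for simplicity, and takes the `wait` function as a parameter; `runWithRetry` is a hypothetical name and the engine's actual retry logic may differ.

```typescript
// Illustrative only: a synchronous retry loop with a fixed delay between attempts.
// `wait` is injected so the sketch stays runtime-agnostic.
function runWithRetry<T>(
  action: () => T,
  maxAttempts: number,
  backoffMs: number,
  wait: (ms: number) => void
): T {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return action();
    } catch (error) {
      lastError = error;
      // Pause before the next attempt, but not after the final failure.
      if (attempt < maxAttempts) wait(backoffMs);
    }
  }
  throw lastError;
}

// Usage: an action that fails twice, then succeeds on the third attempt.
const waits: number[] = [];
let calls = 0;
const retryResult = runWithRetry(
  () => {
    calls += 1;
    if (calls < 3) throw new Error("health check failed");
    return "healthy";
  },
  3,
  1000,
  (ms) => waits.push(ms)
);
```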

Workflow Chaining

- type: workflow.call
  workflow_id: another_workflow
  params:
    input_value: "{{extracted_data}}"

Sources: packages/server/defaults/roles/software-engineer.md

MCP Integration

The workflow engine exposes functionality through the Model Context Protocol, enabling AI assistants to execute and manage workflows.

Sources: packages/server/README.md

Available MCP Tools

| Tool | Description |
| --- | --- |
| run_workflow | Execute a workflow by ID |
| list_workflows | List available workflows |
| get_workflow | Get workflow YAML definition |
| get_run_log | Get execution log for a run |
| list_run_logs | List recent workflow executions |
| get_browser_status | Check browser extension connection |
| capture_page | Capture selectors from current page |

MCP Tool Handlers

graph LR
    A[MCP Request] --> B[Handler]
    B --> C{tool_name}
    C -->|list_workflows| D[toolListWorkflows]
    C -->|get_workflow| E[toolGetWorkflow]
    C -->|list_run_logs| F[toolListRunLogs]
    C -->|run_workflow| G[executeWorkflow]

Sources: packages/server/src/mcp/handler.ts
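The routing in the diagram can be pictured as a lookup from tool name to handler function. The sketch below is hypothetical: the tool names come from the table above, but the `Map`-based registry, the `dispatch` function, and the stand-in handler bodies are assumptions rather than the contents of handler.ts.

```typescript
// Hypothetical dispatch table; handler bodies are stand-ins, not the real implementations.
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => unknown;

const toolHandlers = new Map<string, ToolHandler>([
  ["list_workflows", () => [{ id: "hello_world", title: "Hello World" }]],
  ["get_workflow", (args) => `id: ${String(args["workflow_id"])}`],
]);

// Route a request to its handler, rejecting names that are not registered.
function dispatch(toolName: string, args: ToolArgs = {}): unknown {
  const handler = toolHandlers.get(toolName);
  if (handler === undefined) {
    throw new Error(`Unknown tool: ${toolName}`);
  }
  return handler(args);
}

const workflowYaml = dispatch("get_workflow", { workflow_id: "demo" });
```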

List Workflows Response Format

interface WorkflowListItem {
  workflow: Workflow;
  source: 'actions' | 'workflows';
}

// Response includes:
// - workflow.id
// - workflow.title
// - workflow.params (if any)
// - source identifier

Sources: packages/server/src/mcp/handler.ts:1-50

GitHub Integration

The engine provides specialized steps for GitHub operations through the GitHub CLI provider.

Sources: packages/core/src/providers/github.ts

GitHub Step Types

| Step | Description |
| --- | --- |
| git.listPRs | List pull requests with state filter |
| git.getPR | Get PR details and diff statistics |
| git.createPR | Create a new pull request |
| git.listIssues | List issues with filters |
| git.getIssue | Get issue details |
| issues.create | Create an issue |
| issues.comment | Add a comment to issue/PR |
| issues.transition | Change issue status |

Create Pull Request Parameters

interface CreatePRParams {
  repo?: string;       // Repository in format "owner/repo"
  title: string;       // PR title
  body?: string;       // PR description
  head: string;        // Head branch name
  base?: string;       // Base branch (default: main)
}

Sources: packages/core/src/providers/github.ts:1-50
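As a rough illustration of how these parameters could map onto the GitHub CLI, the sketch below builds an argument list for `gh pr create`. It assumes the provider shells out to `gh` with these flags; the helper name `buildCreatePrArgs` is hypothetical, and the actual code in github.ts may assemble the command differently.

```typescript
interface CreatePRParams {
  repo?: string;  // Repository in format "owner/repo"
  title: string;  // PR title
  body?: string;  // PR description
  head: string;   // Head branch name
  base?: string;  // Base branch (default: main)
}

// Assumption: arguments are passed to `gh pr create`; this is not the provider's real code.
function buildCreatePrArgs(params: CreatePRParams): string[] {
  const args = ["pr", "create", "--title", params.title, "--head", params.head];
  args.push("--base", params.base ?? "main");
  // Passing --body explicitly keeps the command non-interactive.
  args.push("--body", params.body ?? "");
  if (params.repo !== undefined) args.push("--repo", params.repo);
  return args;
}

const prArgs = buildCreatePrArgs({ title: "Fix login", head: "fix/login" });
```

Returning an argument array rather than a single command string avoids shell quoting problems when titles or bodies contain spaces or quotes.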

Common GitHub Workflows

Review Open PRs:

  1. git.listPRs with state: "open" — view pending reviews
  2. git.getPR with PR number — read details and diff stats
  3. Use browser tools for visual diff review

Autonomous Development Cycle:

  1. git.listIssues — browse available issues
  2. git.getIssue — read candidate details
  3. llm.decide — select best issue based on priority
  4. issues.comment — signal work is starting
  5. git.createPR — submit completed work

Sources: packages/server/defaults/targets/_provider-github-cli.md

Execution Context

Each workflow execution maintains a context object that stores:

  • Variables: Extracted values and set variables
  • Parameters: Input parameters passed to the workflow
  • Results: Step execution results
  • Logs: Execution logs for debugging

# Context is passed to executeWorkflow
context:
  params:
    ticket_id: "PROJ-123"
  variables:
    current_status: "In Progress"

Error Handling

The engine provides multiple mechanisms for error handling:

  1. control.retry: Automatically retry failed steps with configurable backoff
  2. control.stop: Gracefully end workflow with error message
  3. Conditional checks: Use browser.exists to verify elements before actions

steps:
  - type: browser.exists
    selector: ".error-message"
    as: has_error
  - type: control.if
    condition: "{{has_error}}"
    then:
      - type: control.stop
        status: error
        message: "Error detected on page"

Dashboard Integration

The workflow engine integrates with the SideButton dashboard for:

  • Workflow Library: Browse and install workflows
  • Run History: View execution logs and results
  • Quick Run: Execute workflows with parameter inputs

Sources: packages/server/src/server.ts

Workflow Installation Flow

  1. User navigates to workflow in library
  2. Dashboard renders install confirmation page
  3. POST request submits workflow to local server
  4. Server saves workflow to user's action library
  5. Success page confirms installation

graph TD
    A[Browse Workflow] --> B[Click Install]
    B --> C[POST /install/:workflowId]
    C --> D[Server Validates]
    D --> E[Save to Actions Library]
    E --> F[Show Success Page]

Best Practices

  1. Use extracted variables immediately: Variable references should occur close to their extraction step
  2. Add wait conditions: Use browser.wait before extracting dynamic content
  3. Handle missing elements: Check existence before interacting
  4. Limit retry attempts: Configure appropriate max_attempts for unreliable operations
  5. Keep workflows focused: Prefer workflow chaining over monolithic single workflows

Sources: packages/server/defaults/roles/qa.md

Sources: [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)

Step Types Reference

Related topics: Workflow Engine, Workflow Examples

SideButton workflows are composed of discrete steps that define actions to be executed sequentially. Each step type serves a specific purpose—from interacting with web pages to executing shell commands, making decisions via LLM, or orchestrating control flow. Understanding the available step types is essential for building effective automations.

Overview

Steps are the fundamental building blocks of SideButton workflows. They are defined within workflow YAML files and specify:

  • What action to perform (the step type)
  • What parameters to use for that action
  • How to handle results (variable assignment, conditional logic)

Sources: packages/core/README.md

graph TD
    A[Workflow YAML] --> B[Step Execution Engine]
    B --> C[Browser Steps]
    B --> D[Shell Steps]
    B --> E[LLM Steps]
    B --> F[Control Steps]
    B --> G[Data Steps]
    B --> H[Git Steps]
    B --> I[Issues Steps]

Step Type Categories

SideButton organizes steps into the following categories:

| Category | Purpose | Primary Use Case |
| --- | --- | --- |
| Browser | Web page interaction | UI automation, data extraction |
| Shell | Command execution | Build tools, CLI operations |
| LLM | AI-powered decisions | Classification, content generation |
| Control | Flow control | Conditionals, retries, sub-workflows |
| Data | Data manipulation | Variable assignment, extraction |
| Git | Version control | PRs, issues, repository ops |
| Issues | Issue tracking | Bug tracking, task management |

Sources: packages/core/README.md

Workflow Examples

Related topics: Workflow Engine, Step Types Reference

Overview

Workflow Examples in SideButton are pre-built YAML-based automation sequences that demonstrate how to combine different step types to accomplish real-world tasks. These examples serve as both practical utilities and learning references for building custom automations.

The SideButton workflow system supports multiple integration methods depending on available providers:

| Provider Preference | Method | Reliability |
| --- | --- | --- |
| 1st | API Provider | Fastest and most reliable |
| 2nd | CLI Tool | Good for git operations |
| 3rd | Browser Automation | Universal fallback |

Sources: packages/server/defaults/roles/software-engineer.md

Workflow Structure

Every workflow in SideButton follows a standardized YAML format with the following core structure:

id: workflow_identifier
title: "Human Readable Title"
description: "What this workflow accomplishes"
steps:
  - type: step.type
    property: value

Sources: CONTRIBUTING.md

Basic Anatomy of a Workflow

graph TD
    A[YAML Workflow File] --> B[Workflow Engine]
    B --> C{Step Type Router}
    C -->|Browser| D[Browser Tools]
    C -->|Shell| E[Terminal/CLI]
    C -->|LLM| F[AI Provider]
    C -->|Control| G[Flow Control]
    C -->|Data| H[Data Manipulation]
    
    D --> I[Execute Action]
    E --> I
    F --> I
    G --> I
    H --> I
    
    I --> J[Run Log Entry]

Step Types Reference

The examples in this section combine steps from five core categories. Provider steps such as git.* and issues.* appear in the GitHub examples later in this section.

Sources: packages/core/README.md

Browser Steps

| Step | Purpose |
| --- | --- |
| navigate | Navigate browser to URL |
| click | Click an element |
| type | Type text into an element |
| scroll | Scroll the page |
| hover | Hover over element |
| wait | Wait for condition |
| extract | Extract text from element |
| extractAll | Extract all matching elements |
| exists | Check element exists |
| key | Press keyboard key |
| snapshot | Get accessibility tree |
| screenshot | Capture screenshot |

Sources: README.md

Shell/Terminal Steps

| Step | Purpose |
| --- | --- |
| shell.run | Execute shell command |
| terminal.open | Open visible terminal window (macOS) |
| terminal.run | Run command in terminal window |

LLM Steps

| Step | Purpose |
| --- | --- |
| llm.classify | Structured classification with categories |
| llm.generate | Free-form text generation |
| llm.decide | AI-driven decision making |

Control Flow Steps

| Step | Purpose |
| --- | --- |
| control.if | Conditional branching |
| control.retry | Retry with backoff |
| control.stop | End workflow with message |
| workflow.call | Call another workflow with parameters |

Data Steps

| Step | Purpose |
| --- | --- |
| data.first | Extract first item from list |
| data.get | Get value from data object |

GitHub Workflow Examples

SideButton includes pre-built workflows for GitHub automation stored in the bundles/github/workflows/ directory.

GitHub PR Claude Review

This workflow demonstrates how to automate PR review using Claude AI. The workflow follows a typical review pattern:

graph LR
    A[List Open PRs] --> B[Get PR Details]
    B --> C[Extract Diff Stats]
    C --> D[Navigate to PR]
    D --> E[Review Files Changed]
    E --> F[Generate Review Comment]

Sources: bundles/github/workflows/github_pr_claude_review.yaml

Common PR Review Sequence:

  1. git.listPRs with state: "open" — see what needs review
  2. git.getPR with number — read details and diff stats
  3. Use browser tools for visual diff review if needed

Sources: packages/server/defaults/targets/_provider-github-cli.md

Create Release Workflow

The release creation workflow demonstrates orchestrating multiple operations:

  1. Open the release page using browser automation
  2. Decide the next version tag based on commit history
  3. Create the release with proper version naming

Sources: bundles/github/workflows/create_release.yaml

Creating a PR after coding:

  1. git.createPR with title, head branch, base branch
  2. issues.comment on related issue linking the PR

Sources: packages/server/defaults/targets/_provider-github-cli.md

LLM-Based Workflows

Summarize Workflow

The llm_summarize.yaml workflow demonstrates integration with AI providers for text processing:

id: llm_summarize
title: "Summarize Content"
description: "Generate a summary using LLM"
steps:
  - type: llm.generate
    prompt: "{{input_text}}"
    instruction: "Provide a concise summary"

Sources: workflows/llm_summarize.yaml, packages/server/defaults/workflows/llm_summarize.yaml

LLM Provider Support:

LLM steps work with multiple providers:

  • Ollama (local)
  • OpenAI
  • Anthropic
  • Google

Sources: README.md

Decision Workflows

The llm.decide step type enables autonomous decision-making:

graph TD
    A[Issue Received] --> B{llm.decide}
    B -->|Clear & well-scoped| C[Pick and start work]
    B -->|Ambiguous/blocked| D[Skip, pick next]
    B -->|Same priority| E[Prefer smaller scope]
    B -->|No suitable issues| F[Stop and report]

Sources: packages/server/defaults/roles/software-engineer.md

Browser-Based Workflows

Wikipedia Open Example

This demonstrates basic browser navigation and content extraction:

id: wikipedia_open
title: "Open Wikipedia Page"
steps:
  - type: browser.navigate
    url: "{{wiki_url}}"
  - type: browser.snapshot
    as: page_content
  - type: browser.extract
    selector: "{{element_selector}}"
    as: extracted_text

Sources: workflows/wikipedia_open.yaml

Best Practices for Browser Steps:

  1. Use snapshot to understand page structure before taking actions
  2. Use extract to pull specific content from pages
  3. Use screenshot for visual verification

Sources: packages/server/defaults/roles/software-engineer.md

Variable Interpolation

All workflows support variable interpolation using {{variable}} syntax:

steps:
  - type: browser.extract
    selector: ".username"
    as: user
  - type: shell.run
    cmd: "echo 'Hello, {{user}}!'"
  - type: llm.generate
    prompt: "Write a greeting for {{user}}"

Sources: README.md

Parameterized Workflows

Workflows can accept parameters for flexibility:

id: check_ticket_status
title: "Check Ticket Status"
params:
  ticket_id: string
steps:
  - type: browser.navigate
    url: "https://jira.example.com/browse/{{ticket_id}}"
  - type: browser.extract
    selector: "[data-testid='status-field']"
    as: current_status

Sources: README.md

Execution Flow

sequenceDiagram
    participant User
    participant MCP as MCP Handler
    participant Engine as Workflow Engine
    participant Steps as Step Executors
    
    User->>MCP: run_workflow(workflow_id, params)
    MCP->>Engine: executeWorkflow(workflow, context)
    Engine->>Steps: Execute Step 1
    Steps-->>Engine: Step Result
    Engine->>Steps: Execute Step 2
    Steps-->>Engine: Step Result
    Engine->>MCP: Run Log Entry
    MCP-->>User: Execution Result
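The loop in the sequence diagram can be approximated in a few lines: run each step in order, store its output under the name given by `as`, and append an entry to the run log. Everything here is a simplified assumption, including the synchronous executors, the stop-on-first-failure policy, and the stand-in step types `demo.emit` and `demo.fail`.

```typescript
// Simplified, hypothetical model of sequential step execution.
interface Step { type: string; [key: string]: unknown }
interface StepResult { type: string; ok: boolean; output?: unknown; error?: string }
type Executor = (step: Step, vars: Record<string, unknown>) => unknown;

function runSteps(steps: Step[], executors: Map<string, Executor>) {
  const vars: Record<string, unknown> = {};
  const log: StepResult[] = [];
  for (const step of steps) {
    const executor = executors.get(step.type);
    if (executor === undefined) {
      log.push({ type: step.type, ok: false, error: `no executor for ${step.type}` });
      break;
    }
    try {
      const output = executor(step, vars);
      // Store the result under the name given by `as`, if present.
      const target = step["as"];
      if (typeof target === "string") vars[target] = output;
      log.push({ type: step.type, ok: true, output });
    } catch (error) {
      // Assumption: the first failing step ends the run.
      log.push({ type: step.type, ok: false, error: String(error) });
      break;
    }
  }
  return { log, vars };
}

// Stand-in executors for illustration; these are not real SideButton step types.
const demoExecutors = new Map<string, Executor>([
  ["demo.emit", (step) => step["value"]],
  ["demo.fail", () => { throw new Error("boom"); }],
]);
const run = runSteps(
  [
    { type: "demo.emit", value: "Ada", as: "user" },
    { type: "demo.fail" },
    { type: "demo.emit", value: "never reached" },
  ],
  demoExecutors
);
```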

Sources: packages/server/src/mcp/handler.ts, packages/core/README.md

Workflow Bundles

SideButton organizes related workflows into bundles. The GitHub bundle includes:

{
  "name": "sidebutton/github",
  "version": "1.0.0",
  "title": "GitHub Automation",
  "description": "Workflows for GitHub releases, PR reviews, and repository management",
  "workflows": [
    "open_release_page.yaml",
    "decide_next_tag.yaml",
    "create_release.yaml",
    "github_pr_claude_review.yaml"
  ],
  "requires": {
    "llm": true,
    "browser": true
  }
}

Sources: bundles/github/bundle.json

Adding Custom Workflows

The easiest way to contribute is by adding workflows to the workflows/ directory:

# Create a new workflow file
cat > workflows/my_workflow.yaml << 'EOF'
id: my_workflow
title: "My Workflow"
description: "What this workflow does"
steps:
  - type: shell.run
    cmd: "echo 'Hello!'"
EOF

Sources: CONTRIBUTING.md

Sources: [packages/server/defaults/roles/software-engineer.md]()

MCP Server Integration

Related topics: System Architecture, Chrome Extension, Knowledge Packs

Overview

The SideButton MCP Server is the core integration layer that exposes browser automation tools, workflow execution capabilities, and knowledge pack management through the Model Context Protocol (MCP). This enables AI assistants like Claude Desktop, Claude Code, and Cursor to control web browsers and execute automated workflows using a standardized interface.

Sources: packages/server/README.md

Architecture

The MCP server is built as part of the @sidebutton/server package and supports multiple transport mechanisms for different AI assistant clients.

Transport Modes

| Transport | Use Case | Configuration |
| --- | --- | --- |
| HTTP/SSE | Claude Code, Cursor | type: "sse", url: "http://localhost:9876/mcp" |
| stdio | Claude Desktop | command: "npx", args: ["sidebutton", "--stdio"] |
| WebSocket | Chrome Extension | Automatic reconnection support |

Sources: packages/sidebutton/README.md

Component Flow

graph TD
    subgraph "AI Assistant"
        A[Claude Desktop / Claude Code / Cursor]
    end
    
    subgraph "MCP Transport"
        B[stdio / HTTP-SSE]
    end
    
    subgraph "SideButton Server"
        C[MCP Handler]
        D[Tool Registry]
        E[Workflow Engine]
        F[Browser Controller]
    end
    
    subgraph "Browser Layer"
        G[Chrome Extension]
        H[Real DOM Access]
    end
    
    A --> B
    B --> C
    C --> D
    C --> E
    C --> F
    F <--> G
    G <--> H

Tool Registry

The MCP server exposes a comprehensive set of tools organized by functionality. Each tool follows a consistent schema with annotations for the Claude Connectors Directory.

Sources: packages/server/src/mcp/tools.ts:1-50

Tool Annotations

| Annotation | Purpose | Example |
| --- | --- | --- |
| title | Human-readable display name | "Run Workflow" |
| readOnlyHint | Indicates observation-only tools | true for snapshot |
| destructiveHint | Indicates state-mutating tools | true for run_workflow |
| openWorldHint | Indicates external world interaction | true for browser tools |

Sources: packages/server/src/mcp/tools.ts:14-23

Workflow Tools

Core Workflow Operations

| Tool | Description | Mutates State |
| --- | --- | --- |
| run_workflow | Execute a workflow automation by ID | Yes |
| list_workflows | List all available workflows | No |
| get_workflow | Get workflow YAML definition | No |
| list_run_logs | List recent workflow executions | No |
| get_run_log | Get execution log for a specific run | No |

run_workflow Parameters

{
  workflow_id: string;  // Required: Unique identifier
  params?: {            // Optional: Key-value parameters
    [key: string]: string;
  };
}

Sources: packages/server/src/mcp/tools.ts:35-52

Browser Automation Tools

The MCP server provides direct browser control through the connected Chrome Extension.

Navigation & State

| Tool | Description |
| --- | --- |
| navigate | Navigate browser to a URL |
| snapshot | Get page accessibility tree (DOM structure) |
| screenshot | Capture page screenshot |
| get_browser_status | Check extension connection status |
| capture_page | Capture CSS selectors from current page |

Interaction Tools

| Tool | Description | Read-Only |
| --- | --- | --- |
| click | Click an element by selector | No |
| type | Type text into an input element | No |
| scroll | Scroll the page | No |
| hover | Hover over an element | No |
| extract | Extract text from an element | Yes |
| extract_all | Extract text from all matching elements | Yes |
| extract_map | Extract structured data from repeated elements | Yes |
| select_option | Select a dropdown option | No |
| wait | Wait for element or condition | No |
| exists | Check if element exists | Yes |
| key | Press a keyboard key | No |

Sources: packages/server/README.md

Browser Tool Annotations

{
  name: 'snapshot',
  description: 'Get page accessibility snapshot',
  inputSchema: {
    type: 'object',
    properties: {
      // Configuration options
    }
  },
  annotations: {
    title: 'Page Snapshot',
    readOnlyHint: true,      // Observation only
    openWorldHint: true      // Interacts with browser
  }
}

Provider Integration Tools

SideButton integrates with external providers for enhanced functionality:

| Category | Tools |
| --- | --- |
| Git | git.listPRs, git.getPR, git.createPR, git.listIssues, git.getIssue |
| Issues | issues.search, issues.get, issues.create, issues.transition, issues.comment, issues.attach |
| Chat | chat.readChannel, chat.readThread, chat.listChannels |
| Terminal | terminal.open, terminal.run |
| LLM | llm.generate, llm.decide, llm.classify |

Git Provider Implementation

The GitHub CLI connector provides programmatic access to GitHub operations:

async createPullRequest(params: {
  repo?: string;
  title: string;
  body?: string;
  head: string;
  base?: string;
}): Promise<{ number: number; url: string }>

Sources: packages/core/src/providers/github.ts

MCP Endpoint Configuration

Server Endpoints

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /mcp | SSE | Server-Sent Events for Claude Code/Cursor |
| /mcp | POST | Tool invocation requests |
| /mcp | GET | Server info and capabilities |

Sources: packages/server/src/server.ts

Client Configuration Examples

Claude Desktop

{
  "mcpServers": {
    "sidebutton": {
      "command": "npx",
      "args": ["sidebutton", "--stdio"]
    }
  }
}

Claude Code

{
  "mcpServers": {
    "sidebutton": {
      "type": "sse",
      "url": "http://localhost:9876/mcp"
    }
  }
}

Cursor

{
  "mcpServers": {
    "sidebutton": {
      "url": "http://localhost:9876/mcp"
    }
  }
}

Sources: packages/sidebutton/README.md

CLI Commands for MCP

The sidebutton CLI provides workflow management commands:

sidebutton list              # List available workflows
sidebutton run <id>          # Run a workflow by ID
sidebutton status            # Check server status

Sources: packages/sidebutton/README.md

Data Models

MCP Tool Schema

export interface McpTool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  annotations?: McpToolAnnotations;
}

export interface McpToolAnnotations {
  title: string;
  readOnlyHint?: true;
  destructiveHint?: true;
  openWorldHint?: true;
}

Sources: packages/server/src/mcp/tools.ts:7-23
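To show how the schema and annotations fit together, the sketch below declares two tool definitions and filters them by hint. The interfaces repeat the ones above; the `inputSchema` contents for run_workflow and the helper `readOnlyToolNames` are assumptions made for illustration.

```typescript
interface McpToolAnnotations {
  title: string;
  readOnlyHint?: true;
  destructiveHint?: true;
  openWorldHint?: true;
}

interface McpTool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  annotations?: McpToolAnnotations;
}

const sampleTools: McpTool[] = [
  {
    name: "snapshot",
    description: "Get page accessibility snapshot",
    inputSchema: { type: "object", properties: {} },
    annotations: { title: "Page Snapshot", readOnlyHint: true, openWorldHint: true },
  },
  {
    name: "run_workflow",
    description: "Execute a workflow automation by ID",
    // Assumed JSON Schema for the parameters described earlier in this section.
    inputSchema: {
      type: "object",
      properties: { workflow_id: { type: "string" }, params: { type: "object" } },
      required: ["workflow_id"],
    },
    annotations: { title: "Run Workflow", destructiveHint: true },
  },
];

// Hypothetical helper: list tools a client could call without mutating state.
function readOnlyToolNames(list: McpTool[]): string[] {
  return list
    .filter((tool) => tool.annotations?.readOnlyHint === true)
    .map((tool) => tool.name);
}

const safeTools = readOnlyToolNames(sampleTools);
```

Typing each hint as the literal `true` rather than `boolean` means a hint is either present or omitted, which matches the interface shown above.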

Quick Start

  1. Start the server:

npx sidebutton@latest

  2. Connect your AI assistant using the appropriate configuration above

  3. Verify connection:

sidebutton status

  4. Execute a workflow:

sidebutton run <workflow-id>

Sources: AGENTS.md

Source: https://github.com/sidebutton/sidebutton / Human Manual

Chrome Extension

Related topics: MCP Server Integration, Step Types Reference

The SideButton Chrome Extension is a Manifest V3 browser extension that provides real-time browser automation capabilities through a WebSocket connection to the local MCP server. It enables AI agents and workflows to interact with web pages using real DOM access via CSS selectors.

Overview

The Chrome Extension serves as the primary interface between SideButton's workflow engine and web pages. Rather than relying on pixel coordinates or screenshots, it provides direct DOM manipulation capabilities, making browser automation precise and reliable.

Distribution: Available on the Chrome Web Store

Source location: extension/ directory in the repository

Sources: CONTRIBUTING.md

Architecture

graph TD
    subgraph "Browser Context"
        CE[Chrome Extension]
        WS[WebSocket Connection]
    end
    
    subgraph "Local Server"
        MCP[MCP Server :9876]
        API[REST API]
    end
    
    subgraph "Workflow Engine"
        WE[Workflow Executor]
        ST[Step Types]
    end
    
    CE -->|Real DOM Access| PAGE[Web Pages]
    CE -->|WebSocket| WS
    WS --> MCP
    MCP --> WE
    WE --> ST
    ST -->|browser.* steps| CE

Connection Flow

  1. User clicks the SideButton extension icon in Chrome
  2. Extension establishes a WebSocket connection to the local server at localhost:9876
  3. MCP server validates the connection and exposes browser tools
  4. Workflows execute browser steps through the extension
  5. Extension interacts with web pages via Chrome DevTools Protocol

Sources: README.md

Browser Commands

The extension supports 40+ browser commands, including:

| Command | Description | Use Case |
| --- | --- | --- |
| navigate | Navigate browser to URL | Open pages for automation |
| click | Click an element by selector | Interact with buttons, links |
| type | Type text into an element | Form input |
| scroll | Scroll the page | Load more content |
| hover | Hover over element | Trigger hover states |
| extract | Extract text from element | Read page content |
| extract_all | Extract all matching elements | Get lists of items |
| extract_map | Extract structured data from repeated elements | Scrape data tables |
| select_option | Select dropdown option | Choose from selects |
| fill | Fill input value (React-compatible) | Handle React inputs |
| press_key | Send keyboard keys | Keyboard shortcuts |
| scroll_into_view | Scroll element into viewport | Ensure element visible |
| evaluate | Execute JavaScript in browser | Custom interactions |
| exists | Check if element exists | Conditional logic |
| wait | Wait for element or delay | Synchronize with page |
| screenshot | Capture page screenshot | Visual verification |
| snapshot | Get page accessibility tree | Understand page structure |
| capture_page | Capture selectors from current page | Identify elements |
| check_writing_quality | Evaluate text quality | Content validation |

Sources: README.md

Key Features

Real DOM Access

Unlike screen-based automation tools that rely on pixel coordinates or OCR, SideButton uses real DOM access through CSS selectors. This provides:

  • Precise element targeting
  • Works with dynamically rendered content
  • Handles SPA (Single Page Applications) correctly
  • Faster execution than vision-based alternatives

Sources: README.md

Recording Mode

The extension includes a recording mode that captures manual actions as reusable workflows. Recording works as follows:

  1. Browse manually through the desired workflow steps
  2. The extension records each action with its selector
  3. Export the recording as a YAML workflow definition
  4. Replay it with the workflow engine

Embed Buttons

SideButton can inject action buttons into any web page, enabling:

  • Quick access to defined actions
  • On-page automation triggers
  • Custom UI integration

WebSocket Connection

The extension maintains a stable WebSocket connection with automatic reconnection:

sequenceDiagram
    participant EXT as Extension
    participant WS as WebSocket
    participant MCP as MCP Server
    participant PAGE as Web Page
    
    EXT->>WS: Connect
    WS->>MCP: Establish Session
    MCP-->>WS: Connected
    WS-->>EXT: Ready
    
    loop On Command
        MCP->>EXT: Execute Tool
        EXT->>PAGE: DOM Action
        PAGE-->>EXT: Result
        EXT-->>MCP: Response
    end
    
    Note over EXT,WS: Auto-reconnect on disconnect

Stable Reconnection

The WebSocket implementation handles connection drops gracefully:

  • Automatic retry with exponential backoff
  • Works with local server instances
  • Supports remote server connections
  • Maintains session state across reconnections

Sources: README.md
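Exponential backoff means the wait between reconnection attempts doubles each time, up to a cap. The sketch below shows only the delay calculation; the function name `reconnectDelay` and the 500 ms base and 30 s cap are illustrative assumptions, not values taken from the extension.

```typescript
// Delay before reconnection attempt N (0-based), doubling each time up to a cap.
// Assumption: baseMs and maxMs are made-up defaults, not the extension's real settings.
function reconnectDelay(attempt: number, baseMs: number = 500, maxMs: number = 30000): number {
  const exponent = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * Math.pow(2, exponent));
}

// Delays for the first five attempts: 500, 1000, 2000, 4000, 8000 ms.
const firstDelays = [0, 1, 2, 3, 4].map((attempt) => reconnectDelay(attempt));
```

Production implementations often add random jitter to each delay so that many clients do not reconnect at the same instant; that is omitted here for clarity.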

Installation

From Chrome Web Store

  1. Visit the Chrome Web Store listing
  2. Click "Add to Chrome"
  3. Grant necessary permissions

From Source (Development)

  1. Go to chrome://extensions/
  2. Enable Developer mode
  3. Click Load unpacked and select the extension/ folder
  4. Navigate to any page and click the extension icon to connect

Sources: CONTRIBUTING.md

Connection States

| State | Indicator | Meaning |
| --- | --- | --- |
| Connected | Green dot | Extension linked to server |
| Disconnected | Red dot | No active connection |
| Reconnecting | Yellow dot | Attempting to reconnect |

Verify connection status using the MCP get_browser_status tool:

{
  "tool": "get_browser_status",
  "expected": { "connected": true }
}

Sources: packages/server/defaults/roles/qa.md

Usage in Workflows

Browser steps are defined in YAML workflows:

steps:
  - type: browser.navigate
    url: "https://github.com/owner/repo/issues"
  
  - type: browser.snapshot
    as: page_state
  
  - type: browser.click
    selector: ".btn-primary"
  
  - type: browser.type
    selector: "#title"
    text: "{{issue_title}}"
  
  - type: browser.extract
    selector: ".issue-number"
    as: new_issue_id

Variable Interpolation

Use {{variable}} syntax to reference extracted values:

steps:
  - type: browser.extract
    selector: ".username"
    as: user
  - type: shell.run
    cmd: "echo 'Hello, {{user}}!'"

Sources: README.md

Step Types Reference

Navigation Steps

| Step Type | Parameters | Description |
| --- | --- | --- |
| browser.navigate | url | Open URL in connected browser |

Interaction Steps

| Step Type | Parameters | Description |
| --- | --- | --- |
| browser.click | selector | Click element by CSS selector |
| browser.type | selector, text | Type text into input |
| browser.fill | selector, value | Fill input value (React-compatible) |
| browser.hover | selector | Hover over element |
| browser.select_option | selector, value | Select dropdown option |
| browser.press_key | keys | Send keyboard keys |
| browser.scroll | direction, amount | Scroll page |
| browser.scroll_into_view | selector | Scroll element into view |

Extraction Steps

| Step Type | Parameters | Description |
| --- | --- | --- |
| browser.extract | selector, as | Extract text from single element |
| browser.extract_all | selector, as | Extract all matching elements |
| browser.extract_map | selector, mapping, as | Extract structured data |
| browser.snapshot | as | Get accessibility tree |
| browser.screenshot | as | Capture screenshot |

Verification Steps

| Step Type | Parameters | Description |
| --- | --- | --- |
| browser.exists | selector | Check if element exists |
| browser.wait | selector or ms | Wait for element or delay |

Advanced Steps

| Step Type | Parameters | Description |
| --- | --- | --- |
| browser.capture_page | - | Capture selectors from current page |
| browser.evaluate | script | Execute JavaScript |

Sources: packages/core/README.md

Integration with Providers

The extension works with platform-specific browser providers for deeper integration:

GitHub Browser Provider

When configured with GITHUB_BROWSER_URL, the extension can:

  1. Navigate to repository pages
  2. Read PR details via snapshot
  3. Review diffs by clicking "Files changed" tab
  4. List and filter pull requests
  5. Create issues through the web interface

Configuration: Set GITHUB_BROWSER_URL in Settings > Environment Variables (e.g., https://github.com)

Requirements: Must be logged into GitHub in the connected browser session

Sources: packages/server/defaults/targets/_provider-github-browser.md

Provider Preference

When multiple integration methods exist, SideButton follows this preference order:

  1. API Provider: Fastest and most reliable
  2. CLI Tool: Good for git operations, builds
  3. Browser Automation: Universal fallback for visual tasks

graph LR
    A[Task] --> B{API Available?}
    B -->|Yes| C[Use API]
    B -->|No| D{CLI Available?}
    D -->|Yes| E[Use CLI]
    D -->|No| F[Browser Automation]
    
    C -->|Browser needed| G[Browser via Extension]
    E -->|Visual review| G

Browser tools complement CLI for visual tasks like:

  • Diff viewing
  • Board reviews
  • UI bug identification
  • Screenshot evidence

Sources: packages/server/defaults/roles/software-engineer.md
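The preference order reduces to a simple fallback chain. The sketch below is a hypothetical illustration: the `IntegrationMethod` type and `pickMethod` function are invented names, and SideButton's roles describe this order as guidance for agents rather than as a documented function.

```typescript
type IntegrationMethod = "api" | "cli" | "browser";

// Choose the most preferred method that is available; the browser is always the fallback.
function pickMethod(available: { api: boolean; cli: boolean }): IntegrationMethod {
  if (available.api) return "api";
  if (available.cli) return "cli";
  return "browser";
}

// With no API provider configured but a CLI installed, the CLI is chosen.
const chosenMethod = pickMethod({ api: false, cli: true });
```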

Smoke Test

Verify extension connectivity during deployment testing:

Step 1: Server Health

GET http://localhost:9876/health

Expected response:

{"status":"ok","version":"...","browser_connected":true}

If browser_connected: false — stop, Chrome extension is not connected.
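The check in Step 1 can be scripted by parsing the response body and testing both fields. The field names below come from the expected response above; the helper names `parseHealth` and `readyForBrowserTests`, and the placeholder version string, are assumptions for illustration.

```typescript
interface HealthResponse {
  status: string;
  version: string;
  browser_connected: boolean;
}

// Parse and validate the body returned by GET /health.
function parseHealth(body: string): HealthResponse {
  const data: unknown = JSON.parse(body);
  if (typeof data !== "object" || data === null) {
    throw new Error("health response is not an object");
  }
  const record = data as Record<string, unknown>;
  const status = record["status"];
  const version = record["version"];
  const connected = record["browser_connected"];
  if (typeof status !== "string" || typeof version !== "string" || typeof connected !== "boolean") {
    throw new Error("health response is missing expected fields");
  }
  return { status, version, browser_connected: connected };
}

// The smoke test should continue only when the server is up and the extension is attached.
function readyForBrowserTests(health: HealthResponse): boolean {
  return health.status === "ok" && health.browser_connected;
}

// "0.0.0" is a placeholder; the real version string is elided in the manual.
const sampleHealth = parseHealth('{"status":"ok","version":"0.0.0","browser_connected":true}');
const canContinue = readyForBrowserTests(sampleHealth);
```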

Step 2: Extension Connection

Use get_browser_status tool:

Expected: { "connected": true }

If disconnected:

  1. Open Chrome
  2. Verify SideButton extension is enabled at chrome://extensions
  3. Refresh the page

Step 3: Snapshot Test

Navigate to any page, then use snapshot:

Verify: Returns structured YAML with element refs (ref=N), not empty, contains page elements.

Sources: packages/server/defaults/roles/qa.md

Error Handling

Common extension issues and solutions:

| Issue | Cause | Solution |
| --- | --- | --- |
| browser_connected: false | Extension not connected | Click extension icon to connect |
| WebSocket timeout | Server not running | Start with pnpm dev:server |
| Element not found | Selector changed | Use capture_page to refresh selectors |
| React input issues | Virtual DOM | Use fill instead of type |

Security Considerations

  • Browser extension requires significant permissions for DOM access
  • WebSocket connection is local by default
  • Remote connections should use authenticated endpoints
  • Never store credentials in workflow definitions

Sources: AGENTS.md

Sources: [CONTRIBUTING.md](https://github.com/sidebutton/sidebutton/blob/main/CONTRIBUTING.md)

Knowledge Packs

Related topics: MCP Server Integration, Getting Started

Knowledge Packs (also referred to as Skill Packs in CLI commands and code) are installable domain-specific modules that teach autonomous AI agents how specific web applications work. They serve as the foundational knowledge layer powering AI code review, automated testing, and enterprise AI agent deployments.

Overview

Knowledge Packs provide structured, domain-specific intelligence to the SideButton platform. Rather than requiring AI agents to learn from scratch how to interact with each web application, Knowledge Packs pre-package essential information that enables immediate, accurate automation.

The SideButton registry currently hosts 11 domains with 28+ modules published, and maintains an open registry where anyone can build and share packs for any web application.

Sources: README.md

Pack Components

Each Knowledge Pack comprises five core module types that together provide comprehensive domain understanding:

| Component | Description | Purpose |
| --- | --- | --- |
| Selectors | CSS selectors for UI elements | Precise DOM element targeting without pixel coordinates or screenshots |
| Data Models | Entity types, fields, relationships, valid states | Structured understanding of domain objects |
| State Machines | Valid transitions per state | Predictable, safe workflow execution |
| Role Playbooks | Role-specific procedures (QA, SE, PM, SD) | Context-aware guidance for different user roles |
| Common Tasks | Step-by-step procedures, gotchas, edge cases | Handling typical operations with best practices |
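
To make the State Machines component concrete, here is one way "valid transitions per state" might be modelled. The on-disk format of a pack's state machine module is not shown in the source, so the `Transitions` type, the sample issue lifecycle, and `canTransition` are all illustrative assumptions.

```typescript
// Hypothetical shape: the pack file format for state machines is not shown in the source.
type Transitions = Record<string, readonly string[]>;

// Illustrative issue lifecycle, not taken from any published pack.
const issueStates: Transitions = {
  open: ["in_progress", "closed"],
  in_progress: ["open", "closed"],
  closed: ["open"],
};

// True only when `to` is listed as a valid next state for `from`.
function canTransition(machine: Transitions, from: string, to: string): boolean {
  const next = machine[from];
  return next !== undefined && next.indexOf(to) !== -1;
}
```

An agent that consults such a table before acting can refuse a move like closed to in_progress instead of discovering the failure in the UI.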

Sources: README.md

Selector Modules

Selectors provide CSS-based targeting for browser automation, ensuring reliability across different browsers and viewport sizes. Unlike coordinate-based or screenshot-based approaches, CSS selectors remain stable as long as the application's DOM structure is maintained.

Role Playbooks

Role playbooks define standard operating procedures for specific personas. For example, the software-engineer role includes:

  • Decision guidance for issue prioritization
  • Step types for common development tasks
  • Integration patterns for git, issues, chat, and terminal operations

Sources: packages/server/defaults/roles/software-engineer.md

Architecture

graph TD
    A[User/Agent] -->|sidebutton install| B[CLI]
    B --> C{Source Type}
    C -->|Local Path| D[Local Directory]
    C -->|Git URL| E[Git Repository]
    C -->|Registry Name| F[SideButton Registry]
    
    D --> G[Install Skill Pack]
    E --> G
    F --> H[Fetch from Registry API]
    H --> G
    
    G --> I[Parse Manifest]
    I --> J[Copy to ~/.sidebutton/packs/]
    J --> K[Knowledge Pack Active]
    
    L[Workflow Engine] -->|Uses| K
    M[MCP Tools] -->|Reads| K

Installation Methods

Knowledge Packs can be installed from multiple sources:

| Source Type | Command Example | Use Case |
| --- | --- | --- |
| Local directory | sidebutton install ./my-pack | Development and testing |
| Git URL | sidebutton install https://github.com/org/skill-packs | Remote repositories |
| Registry name | sidebutton install github.com | Published registry packs |

# Install from registry
sidebutton install github.com
sidebutton install atlassian.net

# Install from local path
sidebutton install ./custom-pack

# Install from Git URL
sidebutton install https://github.com/org/skill-packs

# Force reinstall
sidebutton install github.com --force
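
The three source types can be told apart from the argument alone. The sketch below is a hypothetical heuristic written for illustration; the CLI's real detection logic lives in packages/server/src/cli.ts and is not reproduced in this manual.

```typescript
type InstallSource = "local" | "git" | "registry";

// Hypothetical heuristic; the CLI's real detection logic is not shown in the source.
function classifyInstallSource(arg: string): InstallSource {
  // URL schemes and scp-style git remotes are treated as Git sources.
  if (/^(https?|ssh|git):\/\//.test(arg) || /^git@/.test(arg)) return "git";
  // Relative, absolute, and home-relative paths are treated as local directories.
  if (/^(\.{1,2}\/|\/|~)/.test(arg)) return "local";
  // Anything else is assumed to be a registry name such as a domain.
  return "registry";
}

console.log(classifyInstallSource("github.com")); // prints "registry"
```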

Sources: packages/server/src/cli.ts

Registry Management

The registry system allows centralized distribution and discovery of Knowledge Packs.

Registry CLI Commands

| Command | Description |
| --- | --- |
| `sidebutton registry add <path\|url>` | Register and install all packs from a registry |
| `sidebutton registry update [name]` | Update installed packs from registry |
| `sidebutton registry remove <name>` | Uninstall packs and remove registry |
| `sidebutton registry list` | Show registries and pack counts |
| `sidebutton search [query]` | Search packs across registries |

Sources: packages/server/README.md

Registry Configuration

Registries are stored in the SideButton configuration directory (~/.sidebutton/registries.json) and contain metadata about available skill pack sources.

Publishing Knowledge Packs

Publishing Process

  1. Initialize a new pack using sidebutton init [domain]
  2. Develop the pack with manifest and modules
  3. Validate using sidebutton validate [path]
  4. Authenticate with sidebutton login
  5. Publish via sidebutton publish

Manifest Structure

The manifest.json defines the pack's metadata:

{
  "domain": "github.com",
  "title": "GitHub",
  "version": "1.0.0",
  "description": "GitHub integration for AI agents",
  "tagline": "Streamlined GitHub workflows",
  "category": "development",
  "modules": ["selectors", "data-models", "state-machines"],
  "roles": ["software-engineer", "qa"]
}
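
For pack authors working in TypeScript, the manifest above might be typed and sanity-checked as follows. The field names come from the example; which fields are required, and the specific checks, are assumptions (`domain` and `version` are treated as required because the publish code shown in this manual supplies no fallback for them). `PackManifest` and `checkManifest` are hypothetical names, and this is not a substitute for `sidebutton validate`.

```typescript
// Fields mirror the example manifest; which ones are required is an assumption.
interface PackManifest {
  domain: string;
  version: string;
  title?: string;
  description?: string;
  tagline?: string;
  category?: string;
  modules?: string[];
  roles?: string[];
}

// Returns human-readable problems; an empty array means the basic checks passed.
function checkManifest(m: Partial<PackManifest>): string[] {
  const problems: string[] = [];
  if (!m.domain) problems.push("missing domain");
  if (!m.version) {
    problems.push("missing version");
  } else if (!/^\d+\.\d+\.\d+/.test(m.version)) {
    problems.push("version is not semver-like");
  }
  return problems;
}
```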

Sources: packages/server/src/cli.ts

Publishing Endpoint

const res = await fetch(`${REMOTE_BASE_URL}/api/skill-packs/publish`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${auth.token}`,
  },
  body: JSON.stringify({
    domain: manifest.domain,
    name: manifest.title || manifest.name || manifest.domain,
    version: manifest.version,
    description: manifest.description || '',
    tagline: manifest.tagline || '',
    modules: manifest.modules || [],
    roles: manifest.roles || [],
    category: manifest.category || '',
    manifest,
    files,
  }),
});
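
The fallback rules in the excerpt above are easier to inspect when the request body is built by a standalone function. The sketch below mirrors those `||` fallbacks; `ManifestLike`, `buildPublishBody`, and the `Record<string, string>` shape of `files` are assumptions made for illustration.

```typescript
interface ManifestLike {
  domain: string;
  version: string;
  title?: string;
  name?: string;
  description?: string;
  tagline?: string;
  category?: string;
  modules?: string[];
  roles?: string[];
}

// Mirrors the fallbacks in the excerpt above; the `files` shape is an assumption.
function buildPublishBody(manifest: ManifestLike, files: Record<string, string>) {
  return {
    domain: manifest.domain,
    // The display name falls back from title to name to the domain itself.
    name: manifest.title || manifest.name || manifest.domain,
    version: manifest.version,
    description: manifest.description || "",
    tagline: manifest.tagline || "",
    modules: manifest.modules || [],
    roles: manifest.roles || [],
    category: manifest.category || "",
    manifest,
    files,
  };
}
```

A manifest that sets only `domain` and `version` therefore publishes with the domain as its name and empty strings or arrays for the remaining fields.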

Sources: packages/server/src/cli.ts

Integration with Workflow Engine

Knowledge Packs integrate with the core SideButton workflow engine through step types that reference pack-specific configurations:

graph LR
    A[Knowledge Pack] --> B[Step Type Resolution]
    B --> C[Provider Selection]
    C --> D[Git Provider]
    C --> E[Issues Provider]
    C --> F[Chat Provider]
    C --> G[Browser Provider]

Available Step Types

| Category | Steps |
| --- | --- |
| Browser | navigate, click, type, scroll, hover, wait, extract, extractAll, exists, key |
| Shell | shell.run, terminal.open, terminal.run |
| LLM | llm.classify, llm.generate |
| Control | control.if, control.retry, control.stop, workflow.call |
| Data | data.first |
| Git | git.listPRs, git.getPR, git.createPR, git.listIssues, git.getIssue |
| Issues | issues.search, issues.get, issues.create, issues.transition, issues.comment |
| Chat | chat.readChannel, chat.readThread, chat.listChannels |
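
When authoring workflows, it can help to check step names against this list before running anything. The lookup below copies the step names from the table; `STEP_TYPES` and `categoryOf` are hypothetical helpers and not part of the SideButton packages.

```typescript
// Step names copied from the table above; the lookup helper itself is illustrative.
const STEP_TYPES: ReadonlyArray<readonly [string, readonly string[]]> = [
  ["Browser", ["navigate", "click", "type", "scroll", "hover", "wait", "extract", "extractAll", "exists", "key"]],
  ["Shell", ["shell.run", "terminal.open", "terminal.run"]],
  ["LLM", ["llm.classify", "llm.generate"]],
  ["Control", ["control.if", "control.retry", "control.stop", "workflow.call"]],
  ["Data", ["data.first"]],
  ["Git", ["git.listPRs", "git.getPR", "git.createPR", "git.listIssues", "git.getIssue"]],
  ["Issues", ["issues.search", "issues.get", "issues.create", "issues.transition", "issues.comment"]],
  ["Chat", ["chat.readChannel", "chat.readThread", "chat.listChannels"]],
];

// Returns the category for a known step name, or undefined for an unknown one.
function categoryOf(step: string): string | undefined {
  for (const [category, steps] of STEP_TYPES) {
    if (steps.indexOf(step) !== -1) return category;
  }
  return undefined;
}

console.log(categoryOf("issues.transition")); // prints "Issues"
```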

Sources: packages/core/README.md

Development Workflow

Creating a New Pack

# Initialize a new knowledge pack
sidebutton init my-app.com

# Scaffolded structure:
# my-app.com/
# ├── manifest.json
# ├── modules/
# │   ├── selectors/
# │   ├── data-models/
# │   └── state-machines/
# ├── roles/
# │   └── software-engineer.md
# └── targets/
#     └── github.md

Validation

Before publishing, validate the pack structure:

sidebutton validate ./my-app.com

This command lints and checks:

  • Manifest completeness
  • Module structure validity
  • Selector syntax correctness
  • File integrity

Configuration Locations

| Path | Purpose |
| --- | --- |
| ~/.sidebutton/packs/ | Installed Knowledge Pack directories |
| ~/.sidebutton/registries.json | Registry configurations |
| ~/.sidebutton/config.json | Main SideButton configuration |
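
Scripts that inspect an installation need these paths in absolute form. The helper below expands them against a caller-supplied home directory; `sidebuttonPaths` is a hypothetical name, and joining with `/` assumes a POSIX-style layout.

```typescript
// Paths from the table above, resolved against a caller-supplied home directory.
function sidebuttonPaths(home: string) {
  // Drop trailing slashes so the joined paths never contain "//".
  const root = `${home.replace(/\/+$/, "")}/.sidebutton`;
  return {
    packs: `${root}/packs/`,
    registries: `${root}/registries.json`,
    config: `${root}/config.json`,
  };
}

console.log(sidebuttonPaths("/home/dev").config); // prints "/home/dev/.sidebutton/config.json"
```

In a Node.js script the argument would typically come from `os.homedir()`.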

Best Practices

  1. Selector Stability: Use semantic CSS selectors that won't change with visual updates
  2. Versioning: Follow semantic versioning for pack updates
  3. Error Handling: Include edge case documentation in Common Tasks
  4. Role Coverage: Provide at least one role playbook for each major user persona
  5. State Documentation: Clearly define all valid state transitions

Available Packs

The SideButton registry includes Knowledge Packs for popular platforms:

| Domain | Category | Modules |
| --- | --- | --- |
| github.com | Development | Selectors, Data Models, SE Role |
| atlassian.net | Development | Selectors, Data Models |
| *(10 more domains)* | Various | Various |

Sources: README.md

Sources: [README.md](https://github.com/sidebutton/sidebutton/blob/main/README.md)

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Doramagic extracted 10 source-linked risk signals. Review them before installing or handing real data to the project.

1. Installation risk: Add control.foreach step type for iterating over lists

  • Severity: medium
  • Finding: Installation risk is backed by a source signal: Add control.foreach step type for iterating over lists. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/sidebutton/sidebutton/issues/1

2. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | github_repo:1124378210 | https://github.com/sidebutton/sidebutton | README/documentation is current enough for a first validation pass.

3. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:1124378210 | https://github.com/sidebutton/sidebutton | last_activity_observed missing

4. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | github_repo:1124378210 | https://github.com/sidebutton/sidebutton | no_demo; severity=medium

5. Security or permission risk: No sandbox install has been executed yet; downstream must verify before user use.

  • Severity: medium
  • Finding: No sandbox install has been executed yet; downstream must verify before user use.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.safety_notes | github_repo:1124378210 | https://github.com/sidebutton/sidebutton | No sandbox install has been executed yet; downstream must verify before user use.

6. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.scoring_risks | github_repo:1124378210 | https://github.com/sidebutton/sidebutton | no_demo; severity=medium

7. Security or permission risk: Native <select> elements cannot be programmatically selected via click/type tools

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: Native <select> elements cannot be programmatically selected via click/type tools. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/sidebutton/sidebutton/issues/12

8. Security or permission risk: v1.1.0

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: v1.1.0. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/sidebutton/sidebutton/releases/tag/v1.1.0

9. Maintenance risk: issue_or_pr_quality=unknown

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:1124378210 | https://github.com/sidebutton/sidebutton | issue_or_pr_quality=unknown

10. Maintenance risk: release_recency=unknown

  • Severity: low
  • Finding: release_recency=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:1124378210 | https://github.com/sidebutton/sidebutton | release_recency=unknown

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 4 (count of project-level external discussion links exposed on this manual page).

Use: Review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using sidebutton with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence