Doramagic Project Pack · Human Manual

skyvern

Introduction to Skyvern

Related topics: System Architecture, Browser Automation Engine

Overview

Skyvern is an open-source browser automation platform that enables AI agents to interact with websites by understanding natural language instructions. The platform combines large language model (LLM) powered reasoning with browser automation capabilities, allowing developers to create workflows that can navigate websites, fill out forms, extract data, download files, and perform complex multi-step web tasks autonomously.

Skyvern interprets user prompts and executes browser actions over a CDP (Chrome DevTools Protocol) connection, giving AI applications the ability to interact with the web much as a human user would.

Sources: README.md:1-50

Key Features

Multi-LLM Support

Skyvern supports integration with multiple LLM providers, enabling flexible deployment options:

| Provider | Supported Models |
| --- | --- |
| OpenAI | GPT-5.5, GPT-5.4, GPT-5, GPT-4.1, o3, o4-mini |
| Anthropic | Claude 4.7 Opus, Claude 4.6 (Sonnet, Opus), Claude 4.5 (Haiku, Sonnet, Opus) |
| Azure OpenAI | Any GPT models deployed to Azure subscription |
| AWS Bedrock | Claude 4.7, Claude 4.6 (Sonnet, Opus), Claude 4.5 (Sonnet, Opus) |
| Gemini | Gemini 3.1 Pro, Gemini 3 Flash |

Sources: README.md:65-72

Workflow Automation

Skyvern enables the creation of automated workflows that can:

  • Navigate to websites and interact with web elements
  • Fill out forms and submit data
  • Extract structured information from web pages
  • Handle authentication and credential management
  • Download files and manage browser sessions
  • Handle multi-factor authentication (2FA/TOTP)
  • Schedule and execute tasks on a recurring basis

Sources: skyvern-frontend/src/routes/tasks/create/CreateNewTaskForm.tsx:1-30

Model Context Protocol (MCP) Integration

Skyvern provides an MCP server implementation for integration with AI applications. AI applications can connect to Skyvern and use its browser automation capabilities through a standardized protocol.

Sources: integrations/mcp/README.md:1-25

Architecture Overview

System Components

graph TD
    A[AI Application] -->|MCP Protocol| B[Skyvern MCP Server]
    B --> C[Skyvern API]
    C --> D[Task Executor]
    D --> E[Browser Automation Engine]
    E --> F[CDP Browser Instance]
    
    G[LLM Provider] -->|Reasoning| D
    H[Credential Vault] -->|Auth| D
    I[Schedule Manager] -->|Trigger| C

Browser Connection Options

Skyvern supports multiple browser connection modes:

  1. Local CDP Browser - Connect to a locally running Chrome instance
  2. Skyvern Cloud Browser - Use managed browser infrastructure
  3. Browser Tunneling - Expose local browser to Skyvern Cloud via tunnel

Sources: README.md:85-120

Getting Started

Installation and Setup

Requirements: a Python 3.11+ environment.

Sources: integrations/mcp/README.md:15

# Install Skyvern
pip install skyvern

# Initialize configuration
skyvern init

# Run the server (local mode only)
skyvern run server

Quickstart for Contributors

# Install dependencies using uv
uv sync --group dev

# Run setup wizard
uv run skyvern quickstart

# Access UI at http://localhost:8080

Sources: README.md:45-60

SDK Usage

Python SDK

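import asyncio

from skyvern import Skyvern


async def main() -> None:
    skyvern = Skyvern(api_key="your-api-key")
    skyvern.set_browser_context(
        browser_type="cdp-connect",
        remote_debugging_url="http://127.0.0.1:9222"
    )
    # run_task is awaited, so it has to be called from inside a coroutine
    task = await skyvern.run_task(
        prompt="Find the top post on hackernews today"
    )


asyncio.run(main())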

MCP Tools

Skyvern provides comprehensive MCP tools for browser automation:

| Category | Tools |
| --- | --- |
| Navigation | skyvern_navigate, skyvern_click, skyvern_select_option, skyvern_press_key, skyvern_drag |
| Data Extraction | skyvern_extract, skyvern_screenshot, skyvern_find, skyvern_validate, skyvern_get_html |
| Authentication | skyvern_login, skyvern_credential_list, skyvern_credential_get |
| Tabs & Frames | skyvern_tab_new, skyvern_tab_list, skyvern_tab_switch, skyvern_frame_list |
| Network | skyvern_console_messages, skyvern_network_requests, skyvern_network_route |

Sources: skyvern/cli/mcp_tools/README.md:1-50

Workflows

Skyvern supports workflow-based automation where complex tasks can be defined as a series of steps with conditional logic, evaluations, and human interaction checkpoints.

graph LR
    A[Start] --> B[Block 1: Action]
    B --> C[Block 2: Condition]
    C -->|True| D[Block 3: Evaluation]
    C -->|False| E[Block 4: Fallback]
    D --> F[Human Interaction]
    F --> G[Continue to Next]
    E --> G

Workflow Block Types

| Block Type | Purpose |
| --- | --- |
| Action | Execute browser actions (click, type, navigate) |
| Condition | Branch logic based on page state |
| Evaluation | Run JavaScript to validate or extract data |
| Human Interaction | Pause workflow for manual input |

Sources: skyvern-frontend/src/routes/workflows/workflowRun/WorkflowRunTimelineBlockItem.tsx:1-60

Authentication and Credentials

Credential Services

Skyvern supports multiple credential backends:

  • Skyvern Vault (built-in)
  • Bitwarden
  • 1Password
  • Azure Key Vault
  • Custom credential services via API configuration

Sources: skyvern-frontend/src/components/CustomCredentialServiceConfigForm.tsx:1-40

2FA/TOTP Handling

Skyvern provides automated TOTP code extraction and attachment to runs:

<PushTotpCodeForm
  showAdvancedFields
  onSuccess={handleFormSuccess}
/>

The system extracts verification codes from push notifications and attaches them to relevant workflow runs automatically.

Sources: skyvern-frontend/src/routes/credentials/CredentialsTotpTab.tsx:1-30
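
As a rough illustration of the extraction step, the sketch below pulls a numeric one-time code out of a notification message with a regular expression. The helper name `extract_otp_code` and the 4-to-8 digit pattern are assumptions made for this example; the source does not show how Skyvern performs the extraction.

```python
import re
from typing import Optional

# Hypothetical pattern: a standalone run of 4 to 8 digits, not part of a longer number.
_OTP_PATTERN = re.compile(r"(?<!\d)(\d{4,8})(?!\d)")


def extract_otp_code(message: str) -> Optional[str]:
    """Return the first 4-8 digit code found in the message, or None if there is none."""
    match = _OTP_PATTERN.search(message)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_otp_code("Your verification code is 482913. It expires in 10 minutes."))
```

The lookarounds keep the pattern from matching part of a longer number such as an order ID, which is a common source of false positives in this kind of parsing.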

Task Creation

Navigation Goal

Tasks are defined using natural language prompts that describe what Skyvern should do:

prompt="Find the top post on hackernews today"

Advanced Settings

| Parameter | Description |
| --- | --- |
| Navigation Payload | JSON parameters for routes/states |
| Proxy Location | Route through geographic proxies |
| Browser Session ID | Use persistent browser sessions |
| Browser Address | CDP server address |

Sources: skyvern-frontend/src/routes/tasks/create/PromptBox.tsx:1-50

Scheduling

Tasks and workflows can be scheduled using cron expressions with timezone support:

schedule = await skyvern.create_schedule(
    workflow_id="workflow_xxx",
    cron_expression="0 9 * * *",  # Daily at 9 AM
    timezone="America/New_York"
)

Sources: skyvern-frontend/src/routes/workflows/editor/panels/schedulePanel/CreateScheduleDialog.tsx:1-60
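
To make the five-field cron format concrete, here is a small sketch that expands an expression such as `0 9 * * *` into the values each field allows. The function `parse_cron` is hypothetical and is not part of the Skyvern SDK; it handles only `*`, single numbers, and comma-separated lists, and it ignores the timezone, which a real scheduler would apply when computing the next run.

```python
from typing import Dict, List

# Field names and allowed ranges for a standard five-field cron expression.
_CRON_FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),
]


def parse_cron(expression: str) -> Dict[str, List[int]]:
    """Expand a cron expression into the explicit values each field allows.

    Supports "*", single numbers, and comma-separated lists only
    (no ranges or step values), which is enough for this illustration.
    """
    parts = expression.split()
    if len(parts) != len(_CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")

    expanded: Dict[str, List[int]] = {}
    for part, (name, low, high) in zip(parts, _CRON_FIELDS):
        if part == "*":
            expanded[name] = list(range(low, high + 1))
            continue
        values = [int(item) for item in part.split(",")]
        for value in values:
            if not low <= value <= high:
                raise ValueError(f"{name} value {value} outside {low}-{high}")
        expanded[name] = sorted(values)
    return expanded
```

For `0 9 * * *` the minute field expands to `[0]` and the hour field to `[9]`, while the remaining fields cover their full ranges, which is what "daily at 9 AM" means in cron terms.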

Cloud Integration

Browser Tunneling

Connect Skyvern Cloud to your local browser with existing cookies and extensions:

# Start Chrome with tunnel to Skyvern Cloud
skyvern browser serve --tunnel

This command creates a tunnel URL that can be used to run tasks with your local browser state.

Sources: README.md:115-135

Claude Desktop Integration

Skyvern provides downloadable .mcpb bundles for quick Claude Desktop setup:

./scripts/package-mcpb.sh 1.0.23

Sources: skyvern/cli/mcpb/claude_desktop/README.md:1-25

Telemetry

By default, Skyvern collects basic usage statistics to understand how the platform is being used. To opt-out:

export SKYVERN_TELEMETRY=false

Sources: README.md:35-38

License

Skyvern's open-source repository is licensed under AGPL-3.0. The core automation logic is available in this repository, with the exception of anti-bot measures, which are available in the managed cloud offering.

Sources: README.md:40-43

System Architecture

Related topics: Introduction to Skyvern, Browser Automation Engine, Workflow System

Overview

Skyvern is an AI-powered web automation framework that enables programmatic browser control through natural language instructions. The system architecture consists of three primary layers: a React-based frontend interface, a Python backend API (Forge), and a browser automation engine. This document provides a comprehensive technical overview of the system's components, data flows, and integration patterns.

High-Level Architecture

graph TD
    subgraph Frontend["Frontend Layer (React/TypeScript)"]
        UI[User Interface]
        Forms[Task & Workflow Forms]
        Stream[Browser Stream Viewer]
    end
    
    subgraph Backend["Backend Layer (Python/Forge)"]
        API[Forge API]
        Agent[AI Agent Engine]
        Workflow[Workflow Engine]
        Scheduler[Scheduler Service]
    end
    
    subgraph Browser["Browser Automation Layer"]
        BrowserMgr[Browser Manager]
        CDP[Chrome DevTools Protocol]
        BrowserInst[Browser Instances]
    end
    
    subgraph Storage["Storage & External Services"]
        S3[S3 Storage]
        DB[(Database)]
        LLM[LLM Providers]
    end
    
    UI --> Forms
    Forms --> API
    UI --> Stream
    Stream --> BrowserMgr
    API --> Agent
    API --> Workflow
    API --> Scheduler
    Agent --> BrowserMgr
    Agent --> LLM
    Workflow --> S3
    Scheduler --> DB
    BrowserMgr --> CDP
    CDP --> BrowserInst

Frontend Architecture

The frontend is a React-based Single Page Application (SPA) located in the skyvern-frontend/ directory. It provides user interfaces for task creation, workflow management, credentials handling, and real-time browser streaming.

Component Structure

| Component Category | Location | Purpose |
| --- | --- | --- |
| Task Forms | src/routes/tasks/create/ | Task creation and management forms |
| Workflow Editor | src/routes/workflows/editor/ | Visual workflow building interface |
| Credentials | src/routes/credentials/ | Credential and TOTP management |
| Schedules | src/routes/schedules/ | Schedule viewing and configuration |
| Shared Components | src/components/ | Reusable UI components |

Key Frontend Components

#### BrowserStream Component

The BrowserStream component handles real-time browser visualization. It displays animated loading states while establishing connections and renders rotating messages to indicate progress.

// skyvern-frontend/src/components/BrowserStream.tsx
<RotateThrough interval={7 * 1000}>
  <span>Hm, working on the connection...</span>
  <span>Hang tight, we're almost there...</span>
  <span>Just a moment...</span>
  <span>Backpropagating...</span>
  <span>Attention is all I need...</span>
  <span>Consulting the manual...</span>
</RotateThrough>

Sources: skyvern-frontend/src/components/BrowserStream.tsx

#### Task Forms

Task creation is handled through two primary form components:

  1. CreateNewTaskForm: Used for creating new tasks with navigation goals
  2. SavedTaskForm: Used for creating tasks from saved templates

Both forms support advanced settings including navigation payloads for specifying parameters, routes, or states:

// Navigation Payload field in SavedTaskForm
<FormField
  control={form.control}
  name="navigationPayload"
  render={({ field }) => (
    <FormItem>
      <FormLabel>
        <h1 className="text-lg">Navigation Payload</h1>
        <h2 className="text-base text-slate-400">
          Specify important parameters, routes, or states
        </h2>
      </FormLabel>
      <CodeEditor {...field} language="json" />
    </FormItem>
  )}
/>

Sources: skyvern-frontend/src/routes/tasks/create/SavedTaskForm.tsx

#### Workflow Editor Workspace

The workflow editor workspace provides local execution capabilities with a dialog-based interface for running code locally:

// skyvern-frontend/src/routes/workflows/editor/Workspace.tsx
function bash(command: string, code?: string) {
  return <code className="rounded bg-slate-800 px-1.5 py-0.5">{command}</code>;
}

// Installation and setup instructions
// 1. Install skyvern: pip install skyvern
// 2. Set up skyvern: skyvern quickstart
// 3. Run the code: skyvern run code --params '{...}' main.py

Sources: skyvern-frontend/src/routes/workflows/editor/Workspace.tsx

Backend Architecture (Forge)

The Forge backend is the core Python application that handles task execution, workflow orchestration, and browser automation. Key modules include:

Forge Application

The main application entry point in skyvern/forge/forge_app.py initializes the FastAPI application, configures middleware, and registers routes.

AI Agent Engine

The agent system in skyvern/forge/agent.py processes natural language instructions and generates executable browser actions. The agent:

  1. Receives task definitions and navigation goals
  2. Interacts with LLM providers for decision-making
  3. Generates action sequences for browser automation
  4. Handles error recovery and retry logic

Workflow Service

Workflow definitions are managed through the SDK service in skyvern/forge/sdk/workflow/service.py. This module provides:

  • Workflow creation and versioning
  • Script management with cache keys
  • Execution history tracking

Browser Automation Layer

Browser Manager

The browser manager (skyvern/webeye/browser_manager.py) orchestrates browser instances using Chrome DevTools Protocol (CDP). It provides:

  • Browser pool management
  • Session persistence
  • Screenshot and recording capabilities
  • Multi-tab support

Browser Configuration Options

The frontend exposes several browser configuration parameters:

| Parameter | Type | Purpose |
| --- | --- | --- |
| proxyLocation | string | Proxy server routing |
| browserSessionId | string | Persistent session identifier (format: pbs_xxx) |
| cdpAddress | string | Remote CDP endpoint (e.g., http://127.0.0.1:9222) |

Sources: skyvern-frontend/src/routes/tasks/create/PromptBox.tsx

Data Storage and External Services

AWS Integration

Skyvern uses AWS services for storage and cloud operations. The S3Uri class provides URI parsing for S3 resources:

# skyvern/forge/sdk/api/aws.py
from urllib.parse import urlparse


class S3Uri:
    """Parse and manipulate S3 URIs."""
    
    def __init__(self, uri: str) -> None:
        self._parsed = urlparse(uri, allow_fragments=False)
    
    @property
    def bucket(self) -> str:
        return self._parsed.netloc
    
    @property
    def key(self) -> str:
        if self._parsed.query:
            return self._parsed.path.lstrip("/") + "?" + self._parsed.query
        return self._parsed.path.lstrip("/")

Sources: skyvern/forge/sdk/api/aws.py
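
To see how the two properties behave, the sketch below repeats a trimmed copy of the class so that it runs standalone and parses a few example URIs. The bucket and key names are made up for illustration.

```python
from urllib.parse import urlparse


class S3Uri:
    """Trimmed copy of the class above, repeated so this example runs standalone."""

    def __init__(self, uri: str) -> None:
        # allow_fragments=False keeps "#" inside the path instead of splitting it off
        self._parsed = urlparse(uri, allow_fragments=False)

    @property
    def bucket(self) -> str:
        return self._parsed.netloc

    @property
    def key(self) -> str:
        # "?" is split off as a query by urlparse, so it is glued back onto the key
        if self._parsed.query:
            return self._parsed.path.lstrip("/") + "?" + self._parsed.query
        return self._parsed.path.lstrip("/")


plain = S3Uri("s3://example-bucket/artifacts/run_1/screenshot.png")
with_question_mark = S3Uri("s3://example-bucket/reports/q1?draft.pdf")
with_hash = S3Uri("s3://example-bucket/notes/a#b.txt")

if __name__ == "__main__":
    for uri in (plain, with_question_mark, with_hash):
        print(uri.bucket, uri.key)
```

The bucket comes from the network-location part of the URI and the key from everything after it, so characters that `urlparse` would normally treat as URL syntax survive in the key.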

Workflow Scripts Storage

Scripts are stored with metadata including cache keys and revision counts:

| Field | Description |
| --- | --- |
| Cache Key Value | Unique identifier for the script |
| Total Revisions | Number of versions |
| Runs | Execution count |
| Last Updated | Most recent modification timestamp |

Sources: skyvern-frontend/src/routes/workflows/WorkflowScriptsPage.tsx

Task Execution Model

Task Creation Flow

sequenceDiagram
    participant User
    participant Frontend
    participant Forge API
    participant Agent
    participant Browser
    
    User->>Frontend: Enter navigation goal
    User->>Frontend: Configure advanced settings
    User->>Frontend: Submit task
    Frontend->>Forge API: POST /v1/tasks
    Forge API->>Agent: Create task instance
    Agent->>Browser: Initialize browser session
    Browser-->>Agent: Session established
    Agent-->>Forge API: Task created
    Forge API-->>Frontend: Task response
    Frontend-->>User: Display task status

Task Parameters

| Parameter | Description |
| --- | --- |
| Navigation Goal | Primary instruction for the agent |
| Navigation Payload | Additional parameters, routes, states |
| Proxy Location | Optional proxy routing |
| Browser Session ID | Persistent session reference |

Workflow System Architecture

Workflow Components

| Component | Purpose |
| --- | --- |
| Workflow Scripts | Cached code blocks with versioning |
| Schedules | Cron-based execution triggers |
| Workflow Runs | Individual execution instances |
| Workflow History | Version tracking and modification history |

Schedule Configuration

Schedules support timezone-aware cron expressions:

// Schedule display components
<div className="space-y-2">
  <span className="text-sm text-slate-400">Timezone</span>
  <span className="text-sm text-slate-50">{schedule.timezone}</span>
</div>
<div className="space-y-2">
  <span className="text-sm text-slate-400">Cron</span>
  <code className="font-mono text-xs">{schedule.cron_expression}</code>
</div>

Sources: skyvern-frontend/src/routes/schedules/ScheduleDetailPage.tsx

Script Versioning

Each workflow script maintains a revision history:

// Revision count calculation
{versions?.versions
  ? versions.versions.filter(
      (v) => v.version < (activeVersion ?? 0),
    ).length
  : 0}
<span className="text-sm font-normal">prior</span>

Sources: skyvern-frontend/src/routes/workflows/WorkflowScriptDetailPage.tsx

Credentials and Authentication

TOTP/2FA Management

Skyvern supports 2FA code management for authenticated workflows:

| Component | Description |
| --- | --- |
| PushTotpCodeForm | Form for submitting verification codes |
| Identifier Filter | Filter by email or phone |
| OTP Type Filter | Filter by type (TOTP/Magic Link) |

Sources: skyvern-frontend/src/routes/credentials/CredentialsTotpTab.tsx

LLM Provider Integration

Skyvern supports multiple LLM providers through a unified interface:

| Provider | Supported Models |
| --- | --- |
| OpenAI | GPT-5.5, GPT-5.4, GPT-5, GPT-4.1, o3, o4-mini |
| Anthropic | Claude 4.7 Opus, Claude 4.6, Claude 4.5 |
| Azure OpenAI | Any deployed GPT models |
| AWS Bedrock | Claude 4.7, Claude 4.6, Claude 4.5 |
| Gemini | Gemini 3.1 Pro, Gemini 3 Flash |

Sources: README.md

Development and Deployment

Local Development Setup

# 1. Create virtual environment
uv sync --group dev

# 2. Initialize configuration
uv run skyvern quickstart

# 3. Access UI
# Navigate to http://localhost:8080

Sources: README.md

Running Workflows Locally

The workspace editor provides local execution capabilities:

# 1. Install skyvern
pip install skyvern

# 2. Set up skyvern
skyvern quickstart

# 3. Run workflow code
skyvern run code --params '{"param1": "val1"}' main.py

System Data Flow

graph LR
    subgraph Input["User Input"]
        Prompt[Natural Language Prompt]
        Payload[Navigation Payload]
        Config[Configuration]
    end
    
    subgraph Processing["Forge Processing"]
        Parse[Parse & Validate]
        Agent[Agent Reasoning]
        Plan[Action Planning]
    end
    
    subgraph Execution["Browser Execution"]
        Navigate[Navigate]
        Interact[Interact]
        Extract[Extract Data]
    end
    
    subgraph Output["Results"]
        Screenshots[Screenshots]
        Data[Extracted Data]
        Logs[Execution Logs]
    end
    
    Input --> Parse
    Parse --> Agent
    Agent --> Plan
    Plan --> Execution
    Execution --> Output
    
    style Input fill:#e1f5fe
    style Processing fill:#fff3e0
    style Execution fill:#e8f5e9
    style Output fill:#f3e5f5

Summary

The Skyvern system architecture follows a modular design with clear separation of concerns:

  1. Frontend Layer: React SPA providing task creation, workflow editing, and real-time visualization
  2. Backend Layer: Python FastAPI application handling agent orchestration, workflow management, and scheduling
  3. Browser Layer: Chrome DevTools Protocol-based automation engine for web interaction
  4. Storage Layer: S3 for large objects, database for structured data, and LLM providers for reasoning

The system supports multiple LLM providers, enables persistent browser sessions, and provides comprehensive workflow versioning and scheduling capabilities.

Sources: skyvern-frontend/src/components/BrowserStream.tsx

Browser Automation Engine

Related topics: Introduction to Skyvern, AI-Powered Commands

Overview

The Browser Automation Engine is the core component of Skyvern that enables AI agents to interact with websites through browser control. Instead of relying on fragile XPath-based selectors that break with website layout changes, Skyvern leverages Vision LLMs combined with Playwright and Chrome DevTools Protocol (CDP) to visually understand and interact with web pages.

The engine provides a unified interface for:

  • Launching and managing browser sessions
  • Navigating to URLs with configurable behavior
  • Executing actions (click, type, scroll, hover, etc.)
  • Capturing screenshots for LLM analysis
  • Extracting structured data from web pages
  • Handling multi-step workflows across websites

Sources: README.md:60-80

Architecture

System Components

graph TD
    A[Agent / Task Request] --> B[Browser Manager]
    B --> C[Real Browser Manager]
    C --> D[Playwright Browser]
    C --> E[CDP Connection]
    D --> F[Browser State]
    F --> G[Screenshot Capture]
    F --> H[DOM Extraction]
    E --> I[DevTools Protocol]
    G --> J[Vision LLM Analysis]
    J --> K[Action Handler]
    K --> C

Module Structure

| Module | Purpose |
| --- | --- |
| webeye/__init__.py | Public API exports and core abstractions |
| browser_manager.py | Abstract browser manager interface |
| real_browser_manager.py | Concrete Playwright-based implementation |
| browser_state.py | Page state representation and snapshot |
| actions/handler.py | Action execution and coordination |
| cdp_connection.py | Chrome DevTools Protocol communication |

Sources: skyvern/forge/sdk/routes/agent_protocol.py:30-50

Browser Session Management

Session Lifecycle

stateDiagram-v2
    [*] --> Created: browser_session_id
    Created --> Launching: launch()
    Launching --> Ready: browser ready
    Ready --> Navigating: navigate(url)
    Navigating --> Ready: page loaded
    Ready --> Executing: perform_action()
    Executing --> Ready: action complete
    Ready --> Closed: close()
    Closed --> [*]
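
The lifecycle in the diagram can be read as a small state machine. The sketch below encodes the same states and transitions in a lookup table; the names `SessionState`, `next_state`, and `replay` are invented for this illustration and do not refer to Skyvern classes.

```python
from enum import Enum
from typing import Iterable


class SessionState(Enum):
    CREATED = "created"
    LAUNCHING = "launching"
    READY = "ready"
    NAVIGATING = "navigating"
    EXECUTING = "executing"
    CLOSED = "closed"


# Allowed transitions, keyed by (current state, event), copied from the diagram.
TRANSITIONS = {
    (SessionState.CREATED, "launch"): SessionState.LAUNCHING,
    (SessionState.LAUNCHING, "browser_ready"): SessionState.READY,
    (SessionState.READY, "navigate"): SessionState.NAVIGATING,
    (SessionState.NAVIGATING, "page_loaded"): SessionState.READY,
    (SessionState.READY, "perform_action"): SessionState.EXECUTING,
    (SessionState.EXECUTING, "action_complete"): SessionState.READY,
    (SessionState.READY, "close"): SessionState.CLOSED,
}


def next_state(current: SessionState, event: str) -> SessionState:
    """Return the state reached from `current` on `event`, or raise if not allowed."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {current.name}") from None


def replay(events: Iterable[str]) -> SessionState:
    """Apply a sequence of events starting from CREATED and return the final state."""
    state = SessionState.CREATED
    for event in events:
        state = next_state(state, event)
    return state
```

Keeping the transitions in one table makes invalid sequences, such as navigating before the browser has launched, fail loudly instead of silently corrupting session state.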

Persistent Browser Sessions

Skyvern supports persistent browser sessions that maintain cookies, local storage, and login states across task executions:

# Create a persistent browser session
browser_session_id = "pbs_xxxxxxxxxxxx"

# Reuse session for subsequent tasks
task = await skyvern.run_task(
    prompt="Download invoice from my account",
    browser_session_id=browser_session_id,
)

Sources: skyvern-frontend/src/routes/tasks/create/PromptBox.tsx:40-60

Session Configuration Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| browser_session_id | string | ID of a persistent browser session |
| cdp_address | string | Browser DevTools address (e.g., http://127.0.0.1:9222) |
| proxy_location | string | Geographic proxy for requests |
| extra_http_headers | dict | Custom HTTP headers for requests |
| totp_identifier | string | 2FA identifier for authenticated flows |

Sources: skyvern/forge/sdk/routes/agent_protocol.py:40-55

Chrome DevTools Protocol Integration

CDP Connection

The CDP connection module provides low-level access to Chrome's debugging interface:

# CDP connection configuration
cdp_address = "http://127.0.0.1:9222"

Skyvern can connect to:

  1. Local Chrome - Chrome with remote debugging enabled
  2. Existing Browser - Your Chrome with cookies and extensions
  3. Cloud Browser - Skyvern-hosted browser via tunnel

Sources: skyvern-frontend/src/routes/tasks/create/PromptBox.tsx:65-80

Remote Debugging Setup

# Step 1: Open Chrome with remote debugging
chrome --remote-debugging-port=9222

# Or use Skyvern's CLI helper
skyvern init browser

Chrome then serves its DevTools HTTP endpoint at http://127.0.0.1:9222. The WebSocket debugger URL used for CDP commands is reported by http://127.0.0.1:9222/json/version.

Sources: README.md:45-65
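
When Chrome runs with remote debugging enabled, its debugging port also answers plain HTTP requests, and `/json/version` returns a JSON document that includes a `webSocketDebuggerUrl` field. The sketch below reads that field using only the standard library. The function names are hypothetical, and `discover_websocket_url` only works while a Chrome instance is actually listening on the given address.

```python
import json
from urllib.request import urlopen


def websocket_url_from_version(payload: str) -> str:
    """Pull the browser-level WebSocket URL out of a /json/version response body."""
    info = json.loads(payload)
    return info["webSocketDebuggerUrl"]


def discover_websocket_url(cdp_address: str = "http://127.0.0.1:9222") -> str:
    """Ask a running Chrome for its WebSocket debugger URL (Chrome must be up)."""
    with urlopen(f"{cdp_address}/json/version", timeout=5) as response:
        return websocket_url_from_version(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Parsing a canned response, so this part runs without a browser.
    sample = json.dumps(
        {
            "Browser": "Chrome/120.0.0.0",
            "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/example-id",
        }
    )
    print(websocket_url_from_version(sample))
```

Separating the parsing from the network call keeps the parsing testable without a browser, and makes it easy to check connectivity before handing the address to an automation client.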

Browser State Representation

State Components

graph LR
    A[Browser State] --> B[Current URL]
    A --> C[Screenshot]
    A --> D[DOM Tree]
    A --> E[Cookies]
    A --> F[Local Storage]
    A --> G[Viewport Info]

Browser State Object

| Property | Description |
| --- | --- |
| url | Current page URL |
| title | Page title |
| screenshot | Base64-encoded screenshot |
| dom_tree | Parsed DOM structure |
| viewport | Viewport dimensions |
| elements | Interactive element mapping |

Sources: skyvern/webeye/browser_state.py

Action Handler

Supported Actions

The action handler executes LLM-decided actions on the browser:

| Action | Parameters | Description |
| --- | --- | --- |
| click | element_selector | Click on specified element |
| type | text, element_selector | Enter text into input field |
| hover | element_selector | Mouse hover over element |
| scroll | direction, amount | Scroll page view |
| select | value, element_selector | Select dropdown option |
| press_key | key | Press keyboard key |
| wait | duration | Wait for page to settle |
| navigate | url | Go to URL |
| screenshot | - | Capture current view |
| extract | schema | Extract data per schema |

Sources: skyvern/webeye/actions/handler.py
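
Because the actions come from an LLM, it is useful to check them before execution. The sketch below validates an action against the parameter list in the table above. `REQUIRED_PARAMS` and `validate_action` are names made up for this example, and the real handler may validate differently.

```python
from typing import Any, Dict, Tuple

# Required parameters per action, taken from the table above.
REQUIRED_PARAMS: Dict[str, Tuple[str, ...]] = {
    "click": ("element_selector",),
    "type": ("text", "element_selector"),
    "hover": ("element_selector",),
    "scroll": ("direction", "amount"),
    "select": ("value", "element_selector"),
    "press_key": ("key",),
    "wait": ("duration",),
    "navigate": ("url",),
    "screenshot": (),
    "extract": ("schema",),
}


def validate_action(name: str, params: Dict[str, Any]) -> None:
    """Raise ValueError if the action is unknown or a required parameter is missing."""
    if name not in REQUIRED_PARAMS:
        raise ValueError(f"unknown action: {name}")
    missing = [param for param in REQUIRED_PARAMS[name] if param not in params]
    if missing:
        raise ValueError(f"{name} is missing parameters: {', '.join(missing)}")


if __name__ == "__main__":
    validate_action("type", {"text": "hello", "element_selector": "#search"})
    print("action accepted")
```

Rejecting malformed actions up front gives the agent a clear error to react to instead of a failure deep inside the browser layer.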

Action Execution Flow

sequenceDiagram
    participant LLM as Vision LLM
    participant AH as Action Handler
    participant BM as Browser Manager
    participant Browser as Playwright/CDP
    
    LLM->>AH: Decide action from screenshot
    AH->>BM: Execute action request
    BM->>Browser: CDP/Playwright command
    Browser-->>BM: Action result
    BM-->>AH: Updated browser state
    AH-->>LLM: State + screenshot for next decision
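
The sequence above is an observe, decide, act loop. The sketch below shows that loop with stand-ins: `FakeBrowser` replaces the browser manager and `scripted_decider` replaces the vision LLM. All names are hypothetical, and no real browser or model is involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Action = Dict[str, str]
State = Dict[str, str]


@dataclass
class FakeBrowser:
    """Stand-in for the browser manager; records actions instead of driving Chrome."""

    url: str = "about:blank"
    history: List[Action] = field(default_factory=list)

    def snapshot(self) -> State:
        # A real implementation would return a screenshot plus DOM data.
        return {"url": self.url, "steps_taken": str(len(self.history))}

    def execute(self, action: Action) -> None:
        self.history.append(action)
        if action["type"] == "navigate":
            self.url = action["url"]


def run_loop(
    decide: Callable[[State], Action], browser: FakeBrowser, max_steps: int = 10
) -> List[Action]:
    """Observe state, ask the decider for an action, execute it, repeat until 'done'."""
    for _ in range(max_steps):
        action = decide(browser.snapshot())
        if action["type"] == "done":
            break
        browser.execute(action)
    return browser.history


def scripted_decider(state: State) -> Action:
    """Toy replacement for the vision LLM: navigate first, click once, then stop."""
    if state["url"] == "about:blank":
        return {"type": "navigate", "url": "https://example.com"}
    if state["steps_taken"] == "1":
        return {"type": "click", "element_selector": "#submit"}
    return {"type": "done"}
```

The `max_steps` cap plays the same role as the `max_steps` option on a task: it bounds the loop when the decider never reports completion.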

Browser Configuration Options

Launch Configuration

| Option | Default | Description |
| --- | --- | --- |
| headless | true | Run browser without visible window |
| viewport_width | 1280 | Browser viewport width |
| viewport_height | 720 | Browser viewport height |
| user_agent | auto | User agent string |
| ignore_https_errors | false | Allow invalid certs |

Navigation Options

| Option | Type | Description |
| --- | --- | --- |
| url | string | Target URL |
| navigation_payload | object | Parameters, routes, or initial states |
| follow_redirects | boolean | Auto-follow HTTP redirects |
| timeout | int | Navigation timeout in ms |

Sources: skyvern-frontend/src/routes/tasks/detail/TaskParameters.tsx:20-40

Integration with Agent System

Agent Protocol Integration

The browser automation engine integrates with Skyvern's agent protocol:

run_request=TaskRunRequest(
    engine=RunEngine.skyvern_v2,
    prompt=task_v2.prompt,
    url=task_v2.url,
    browser_session_id=run_request.browser_session_id,
    totp_identifier=task_v2.totp_identifier,
    proxy_location=task_v2.proxy_location,
    max_steps=run_request.max_steps,
)

Workflow Block Execution

graph TD
    A[Workflow Run] --> B[Initialize Browser]
    B --> C[Go To URL Block]
    C --> D[Browser Navigation]
    D --> E[Action Block]
    E --> F[Extract/Process]
    F --> G{More Blocks?}
    G -->|Yes| E
    G -->|No| H[Close Browser]
    H --> I[Return Results]

Sources: skyvern-frontend/src/routes/workflows/workflowRun/TaskBlockParameters.tsx:10-50
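
The block loop in the diagram amounts to running blocks in order while sharing their outputs. The sketch below illustrates this with plain dictionaries and toy handlers. The block fields (`type`, `label`, `url`, `from`, `fields`) and the function names are assumptions made for the example, not Skyvern's block schema.

```python
from typing import Any, Callable, Dict, List

Block = Dict[str, Any]
Context = Dict[str, Any]
Handler = Callable[[Block, Context], Any]


def run_blocks(blocks: List[Block], handlers: Dict[str, Handler]) -> Context:
    """Run blocks in order, storing each block's output under its label."""
    context: Context = {}
    for block in blocks:
        handler = handlers[block["type"]]
        context[block["label"]] = handler(block, context)
    return context


# Toy handlers; real blocks would drive a browser session instead.
def goto_url(block: Block, context: Context) -> Dict[str, str]:
    return {"visited": block["url"]}


def extract(block: Block, context: Context) -> Dict[str, Any]:
    # Reads the output of an earlier block by label.
    page = context[block["from"]]
    return {"source": page["visited"], "fields": block["fields"]}


HANDLERS: Dict[str, Handler] = {"goto_url": goto_url, "extract": extract}

WORKFLOW: List[Block] = [
    {"type": "goto_url", "label": "open_site", "url": "https://example.com"},
    {"type": "extract", "label": "details", "from": "open_site", "fields": ["title"]},
]

if __name__ == "__main__":
    print(run_blocks(WORKFLOW, HANDLERS))
```

Storing outputs by label lets later blocks consume earlier results, which is what allows an extraction block to depend on the page opened by a preceding navigation block.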

Advanced Features

Custom Browser Connection

Connect Skyvern Cloud to a local browser running on your machine:

# Start Chrome with tunnel to Skyvern Cloud
skyvern browser serve --tunnel

This enables:

  • Use existing cookies and logins
  • Bypass VPN restrictions
  • Full browser control via Skyvern API

Sources: README.md:80-100

Proxy Support

Route browser traffic through geographic proxies:

task = await skyvern.run_task(
    prompt="Search for local restaurants",
    proxy_location="us-east-1",  # or "eu-west-1", "ap-south-1"
)

Available proxy locations provide access to region-specific content.

Sources: skyvern-frontend/src/routes/tasks/create/PromptBox.tsx:25-35

Error Handling

Browser-Specific Errors

| Error Type | Cause | Recovery |
| --- | --- | --- |
| Navigation timeout | Page fails to load | Retry with extended timeout |
| Element not found | Dynamic content issues | Re-screenshot and retry |
| Browser crash | Memory/extension issues | Restart browser session |
| CDP connection lost | Network disruption | Reconnect and resume |
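
The "retry with extended timeout" recovery for navigation timeouts can be sketched as follows. `NavigationTimeout` and `retry_with_longer_timeout` are hypothetical names, and the doubling factor is an arbitrary choice for illustration; the source does not describe how Skyvern implements its retries.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NavigationTimeout(Exception):
    """Hypothetical error raised when a page fails to load in time."""


def retry_with_longer_timeout(
    operation: Callable[[float], T],
    timeout: float = 5.0,
    attempts: int = 3,
    growth: float = 2.0,
    pause: float = 0.0,
) -> T:
    """Call operation(timeout), multiplying the timeout after each NavigationTimeout."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation(timeout)
        except NavigationTimeout:
            if attempt == attempts:
                raise  # out of attempts: surface the last timeout to the caller
            timeout *= growth
            time.sleep(pause)
    raise AssertionError("unreachable")
```

Only the timeout error is retried here; other exceptions propagate immediately, since the remaining rows of the table call for different recovery steps.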

Error Code Mapping

Custom error codes can be mapped for workflow-specific handling:

task = await skyvern.run_task(
    prompt="Process order",
    error_code_mapping={
        "ERR_LOGIN_FAILED": "retry_with_2fa",
        "ERR_PAYMENT_DECLINED": "notify_user",
    },
)

Sources: skyvern-frontend/src/routes/workflows/workflowRun/TaskBlockParameters.tsx:45-65

Security Considerations

Browser Tunneling Security

> [!WARNING]
> Always use --api-key when exposing your browser via tunnel. Without it, anyone with the URL has full control of your browser.

Best practices:

  • Never expose browser tunnels publicly
  • Use authenticated connections only
  • Rotate tunnel URLs frequently
  • Limit browser session access

Sources: README.md:95-105

Secure Credential Management

TOTP/2FA codes are handled through secure credential storage:

task = await skyvern.run_task(
    prompt="Login to bank account",
    totp_identifier="user@example.com",
)

The system extracts codes from push notifications or SMS and attaches them to relevant workflow steps.

Sources: skyvern-frontend/src/routes/credentials/CredentialsTotpTab.tsx:10-30

Summary

The Browser Automation Engine provides Skyvern's core capability to automate web interactions using Vision LLMs. Key aspects:

  • Unified abstraction over Playwright and CDP protocols
  • Persistent sessions for maintaining login states
  • Visual understanding via screenshot-based LLM analysis
  • Flexible configuration for proxy, headers, and browser options
  • Integrated with workflows for complex multi-step automation

This architecture enables Skyvern to operate on websites it has never seen before, adapt to layout changes automatically, and apply the same workflow across many different sites.

Sources: README.md:60-80

Workflow System

Related topics: System Architecture, Database Models

Overview

The Skyvern Workflow System is a core automation framework that enables chaining multiple tasks together to form cohesive units of work. It allows users to create complex multi-step automations by composing reusable building blocks called "workflow blocks."

Architecture

graph TD
    subgraph "Frontend Layer"
        WE[Workflow Editor]
        RR[Run Workflow Form]
        DP[Debugger Panel]
    end
    
    subgraph "API Layer"
        AP[Agent Protocol Routes]
        WS[Webhook Endpoint]
    end
    
    subgraph "Service Layer"
        WFS[Workflow Service]
        BS[Block Service]
    end
    
    subgraph "Core SDK"
        WMS[Workflow Models]
        BMS[Block Models]
        WDC[Definition Converter]
        WSS[Workflow Service SDK]
    end
    
    WE --> AP
    RR --> AP
    AP --> WFS
    WFS --> BS
    WFS --> WMS
    BS --> BMS
    WDC --> WMS
    WDC --> BMS
    WSS --> WMS
    WSS --> BMS

Workflow Model

WorkflowDefinition

The WorkflowDefinition is the core model representing a workflow:

class WorkflowDefinition(BaseModel):
    title: str
    description: Optional[str] = None
    blocks: List[WorkflowBlockDefinition]
    parameters: List[WorkflowParameter] = []

| Field | Type | Description |
| --- | --- | --- |
| title | str | Human-readable workflow title |
| description | Optional[str] | Optional description of workflow purpose |
| blocks | List[WorkflowBlockDefinition] | Ordered list of block definitions |
| parameters | List[WorkflowParameter] | Input parameters for workflow execution |

Sources: skyvern/forge/sdk/workflow/models/workflow.py
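As a rough illustration of how these fields fit together, the sketch below mirrors the model with standard-library dataclasses. The dataclass names and the `block_type` / `label` keys inside each block dictionary are assumptions made for the example, not Skyvern's actual Pydantic classes or block schema.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ParameterSketch:
    # Stand-in for WorkflowParameter; only the fields used in this example.
    key: str
    workflow_parameter_type: str
    required: bool = True


@dataclass
class WorkflowDefinitionSketch:
    # Stand-in for WorkflowDefinition, reusing the documented field names.
    title: str
    blocks: list[dict[str, Any]]
    description: Optional[str] = None
    parameters: list[ParameterSketch] = field(default_factory=list)


definition = WorkflowDefinitionSketch(
    title="Download monthly invoice",
    description="Log in, open billing, save the latest invoice",
    blocks=[
        # Block dictionaries are illustrative; the real block schema may differ.
        {"block_type": "login", "label": "sign_in"},
        {"block_type": "file_download", "label": "get_invoice"},
    ],
    parameters=[ParameterSketch(key="account_id", workflow_parameter_type="string")],
)

# Blocks keep the order in which they were declared.
block_labels = [block["label"] for block in definition.blocks]
```

The ordered `blocks` list is what makes a definition a workflow rather than a bag of tasks: execution order follows list order unless a control-flow block redirects it.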

WorkflowParameter

Workflows accept typed input parameters:

class WorkflowParameter(BaseModel):
    key: str
    workflow_parameter_type: WorkflowParameterType
    default_value: Optional[Any] = None
    description: Optional[str] = None
    required: bool = True

| Field | Type | Description |
| --- | --- | --- |
| key | str | Parameter identifier |
| workflow_parameter_type | WorkflowParameterType | Type: string, integer, float, boolean, json |
| default_value | Optional[Any] | Default value if not provided |
| description | Optional[str] | Parameter description |
| required | bool | Whether parameter is mandatory |

Sources: skyvern/forge/sdk/workflow/models/workflow.py
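One way to picture how `required` and `default_value` interact at run time is the sketch below. It is a simplified stand-in: `ParameterSpec`, `resolve_parameters`, and the type mapping are hypothetical names introduced here, and Skyvern's own validation may behave differently.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ParameterSpec:
    # Stand-in for WorkflowParameter with the fields described above.
    key: str
    workflow_parameter_type: str
    default_value: Optional[Any] = None
    required: bool = True


# Hypothetical mapping from documented type names to Python types.
_PYTHON_TYPES = {
    "string": str,
    "integer": int,
    "float": (int, float),
    "boolean": bool,
    "json": (dict, list),
}


def resolve_parameters(specs: list[ParameterSpec], supplied: dict[str, Any]) -> dict[str, Any]:
    """Merge supplied values with defaults, then check required keys and types."""
    resolved: dict[str, Any] = {}
    for spec in specs:
        if spec.key in supplied:
            value = supplied[spec.key]
        elif spec.default_value is not None:
            value = spec.default_value
        elif spec.required:
            raise ValueError(f"missing required parameter: {spec.key}")
        else:
            continue  # optional, no default, not supplied
        # bool is a subclass of int, so reject it explicitly for non-boolean types.
        if isinstance(value, bool) and spec.workflow_parameter_type != "boolean":
            raise TypeError(f"{spec.key} must be {spec.workflow_parameter_type}")
        if not isinstance(value, _PYTHON_TYPES[spec.workflow_parameter_type]):
            raise TypeError(f"{spec.key} must be {spec.workflow_parameter_type}")
        resolved[spec.key] = value
    return resolved


specs = [
    ParameterSpec("city", "string"),
    ParameterSpec("limit", "integer", default_value=10, required=False),
]
resolved = resolve_parameters(specs, {"city": "Oslo"})
```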

Block Types

Skyvern supports 23 block types for multi-step automations, each serving a specific purpose in workflow execution. The diagram and table in this section cover a subset of them.

graph TD
    A[Workflow Start] --> B{Block Type}
    B --> C[Browser Tasks]
    B --> D[Data Operations]
    B --> E[Control Flow]
    B --> F[External Integration]
    
    C --> C1[Task v2]
    C --> C2[Browser Action]
    C --> C3[Navigation]
    C --> C4[Login]
    
    D --> D1[Extraction]
    D --> D2[HTTP Request]
    D --> D3[File Download]
    
    E --> E1[Conditional]
    E --> E2[For Loop]
    E --> E3[Wait]
    
    F --> F1[Email]
    F --> F2[Text Prompt]
    F --> F3[Print Page]

Supported Block Types

| Block Type | Purpose | Key Parameters |
| --- | --- | --- |
| Taskv2 | Multi-step browser automation | prompt, url, max_steps, totp_verification_url, disable_cache |
| URL | Navigate to a URL | url, continue_on_failure |
| Wait | Pause execution | duration |
| TextPrompt | LLM text generation | prompt, llm_key, json_schema |
| HTTPRequest | External API calls | url, method, headers, body |
| Extraction | Data extraction from page | prompt, llm_key |
| Validation | Validate extracted data | prompt, error_codes |
| PrintPage | Print to PDF | format, landscape, print_background |
| HumanInteraction | Pause for human input | instructions, positive_descriptor, negative_descriptor |
| Conditional | Branch logic | expression |
| ForLoop | Iterate over items | items, variable_name |
| FileDownload | Download files | url, follow_redirects, save_response_as_file |
| BrowserAction | Single browser action | action_type, element_id |
| Login | Handle authentication | credential_id, totp_identifier |

Sources: skyvern/forge/sdk/workflow/models/block.py, skyvern/cli/mcp_tools/README.md

Block Execution Model

WorkflowBlockExecution

Each block execution is tracked with its status:

class WorkflowBlockExecution(BaseModel):
    workflow_run_id: str
    block_id: str
    block_type: WorkflowBlockType
    status: WorkflowBlockStatus
    output: Optional[Any] = None
    failure_reason: Optional[str] = None
    executed_branch_expression: Optional[str] = None
    executed_branch_result: Optional[bool] = None
    executed_branch_next_block: Optional[str] = None

| Status | Description |
| --- | --- |
| created | Block added to execution queue |
| queued | Waiting for execution |
| running | Currently executing |
| completed | Successfully finished |
| failed | Execution failed |
| cancelled | Cancelled by user |

Block Parameters by Type

#### Taskv2BlockParameters

class Taskv2BlockParameters(BaseModel):
    prompt: str
    url: Optional[str] = None
    max_steps: Optional[int] = None
    totp_verification_url: Optional[str] = None
    totp_identifier: Optional[str] = None
    disable_cache: bool = False

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | str | - | Navigation goal for the browser agent |
| url | Optional[str] | None | Starting URL for navigation |
| max_steps | Optional[int] | None | Maximum steps before stopping |
| totp_verification_url | Optional[str] | None | URL for 2FA verification |
| totp_identifier | Optional[str] | None | Identifier for TOTP credentials |
| disable_cache | bool | False | Disable action caching |

Sources: skyvern/forge/sdk/workflow/models/block.py

#### GotoUrlBlockParameters

class GotoUrlBlockParameters(BaseModel):
    url: str
    continue_on_failure: bool = False

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | str | - | Target URL to navigate to |
| continue_on_failure | bool | False | Continue workflow on navigation failure |

#### WaitBlockParameters

class WaitBlockParameters(BaseModel):
    duration: int

#### PrintPageBlockParameters

class PrintPageBlockParameters(BaseModel):
    format: PrintFormat = PrintFormat.A4
    landscape: bool = False
    print_background: bool = False
    include_timestamp: bool = True
    custom_filename: Optional[str] = None

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| format | PrintFormat | A4 | Page format: A4, Letter, Legal |
| landscape | bool | False | Use landscape orientation |
| print_background | bool | False | Print background colors |
| include_timestamp | bool | True | Include timestamp in footer |
| custom_filename | Optional[str] | None | Custom output filename |

#### HumanInteractionBlockParameters

class HumanInteractionBlockParameters(BaseModel):
    instructions: Optional[str] = None
    positive_descriptor: Optional[str] = None
    negative_descriptor: Optional[str] = None

| Parameter | Type | Description |
| --- | --- | --- |
| instructions | Optional[str] | Instructions for the human |
| positive_descriptor | Optional[str] | Label for positive confirmation |
| negative_descriptor | Optional[str] | Label for negative/cancellation action |

Workflow Execution Flow

sequenceDiagram
    participant Client
    participant API
    participant WorkflowService
    participant BlockService
    participant Executor

    Client->>API: POST /workflows/{id}/run
    API->>WorkflowService: create_workflow_run()
    WorkflowService->>WorkflowService: Validate parameters
    WorkflowService->>WorkflowService: Create WorkflowRun record
    WorkflowService-->>API: WorkflowRun
    
    loop For each block
        API->>BlockService: execute_block()
        BlockService->>Executor: Process block
        Executor-->>BlockService: Block result
        BlockService-->>API: WorkflowBlockExecution
    end
    
    API->>Client: Webhook callback (optional)

Workflow Service API

Core Operations

| Method | Description | Source |
| --- | --- | --- |
| create_workflow | Create new workflow | skyvern/forge/sdk/workflow/service.py |
| get_workflow | Retrieve workflow by ID | skyvern/forge/sdk/workflow/service.py |
| update_workflow | Update workflow definition | skyvern/forge/sdk/workflow/service.py |
| delete_workflow | Delete workflow | skyvern/forge/sdk/workflow/service.py |
| list_workflows | List all workflows | skyvern/forge/sdk/workflow/service.py |
| run_workflow | Execute workflow | skyvern/services/workflow_service.py |
| cancel_workflow_run | Cancel running workflow | skyvern/services/workflow_service.py |

Running Workflows

Workflows can be executed via:

  1. API: POST /workflows/{workflow_id}/run
  2. MCP tool: skyvern_workflow_run (CLI equivalent: skyvern workflow run <workflow_id>)
  3. Schedule: Cron-based scheduled execution

Run Parameters

When running a workflow, the following parameters can be specified:

| Parameter | Type | Description |
| --- | --- | --- |
| parameters | Dict[str, Any] | Workflow input parameters |
| webhook_callback_url | Optional[str] | URL for result callback |
| proxy_location | Optional[ProxyLocation] | Geographic proxy location |
| run_with | RunWith | agent or code execution mode |
| ai_fallback | bool | Fall back to AI if code generation fails |

Sources: skyvern-frontend/src/routes/workflows/RunWorkflowForm.tsx
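To make the run parameters concrete, the sketch below assembles a request URL and JSON body for the `POST /workflows/{workflow_id}/run` endpoint named above. The helper `build_run_request`, the base URL, and the choice to omit unset optional fields are assumptions for illustration; authentication headers and the exact request schema are not covered here.

```python
import json
from typing import Any, Optional


def build_run_request(
    base_url: str,
    workflow_id: str,
    parameters: dict[str, Any],
    webhook_callback_url: Optional[str] = None,
    proxy_location: Optional[str] = None,
    run_with: str = "agent",
    ai_fallback: bool = False,
) -> tuple[str, str]:
    """Return the target URL and the JSON body for a workflow run request."""
    if run_with not in ("agent", "code"):
        raise ValueError("run_with must be 'agent' or 'code'")
    body: dict[str, Any] = {
        "parameters": parameters,
        "run_with": run_with,
        "ai_fallback": ai_fallback,
    }
    # Optional fields are left out rather than sent as null (an assumption).
    if webhook_callback_url is not None:
        body["webhook_callback_url"] = webhook_callback_url
    if proxy_location is not None:
        body["proxy_location"] = proxy_location
    url = f"{base_url.rstrip('/')}/workflows/{workflow_id}/run"
    return url, json.dumps(body, sort_keys=True)


url, body = build_run_request(
    "http://localhost:8000/",  # assumed address of a local server
    "wf_123",
    {"account_id": "A-42"},
    webhook_callback_url="https://example.com/hooks/skyvern",
)
```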

Webhook Integration

Workflows support webhook callbacks for asynchronous result delivery:

graph LR
    A[Workflow Run] --> B{Complete?}
    B -->|Yes| C[Send webhook]
    B -->|No| D[Retry queue]
    D --> B
    C --> E[Customer Endpoint]

The webhook payload includes:

{
    "workflow_run_id": str,
    "workflow_id": str,
    "status": WorkflowRunStatus,
    "output": Optional[Any],
    "failure_reason": Optional[str],
    "created_at": datetime,
    "modified_at": datetime,
    "blocks": List[WorkflowBlockExecution]
}
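A receiving service typically needs only a few of these fields. The following sketch parses a body shaped like the payload above and pulls out the failed blocks; `summarize_webhook` and the sample identifiers are made up for the example, and the per-block keys are assumed to match the WorkflowBlockExecution fields.

```python
import json


def summarize_webhook(raw_body: str) -> dict:
    """Extract the run status and any failed blocks from a webhook body."""
    payload = json.loads(raw_body)
    failed_blocks = [
        {"block_id": block["block_id"], "failure_reason": block.get("failure_reason")}
        for block in payload.get("blocks", [])
        if block.get("status") == "failed"
    ]
    return {
        "workflow_run_id": payload["workflow_run_id"],
        "status": payload["status"],
        "failed_blocks": failed_blocks,
    }


# Sample body with made-up identifiers, shaped like the payload above.
sample = json.dumps(
    {
        "workflow_run_id": "wr_1",
        "workflow_id": "wf_1",
        "status": "failed",
        "failure_reason": "login rejected",
        "blocks": [
            {"block_id": "b1", "status": "completed"},
            {"block_id": "b2", "status": "failed", "failure_reason": "login rejected"},
        ],
    }
)
summary = summarize_webhook(sample)
```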

MCP Integration

Skyvern provides MCP (Model Context Protocol) tools for workflow management:

Available Tools

| Tool | Description |
| --- | --- |
| skyvern_workflow_create | Create new workflow |
| skyvern_workflow_list | List all workflows |
| skyvern_workflow_get | Get workflow details |
| skyvern_workflow_run | Execute workflow |
| skyvern_workflow_status | Check run status |
| skyvern_workflow_update | Update workflow |
| skyvern_workflow_delete | Delete workflow |
| skyvern_workflow_cancel | Cancel running workflow |
| skyvern_block_schema | Get block type schema |
| skyvern_block_validate | Validate block definition |

Sources: skyvern/cli/mcp_tools/README.md

Frontend Components

Workflow Editor

Located at /workflows/{workflow_id}/build, the editor provides:

  • Visual block composition
  • Block parameter configuration
  • Workflow validation
  • Preview mode

Run Workflow Form

Located at /workflows/{workflow_id}/run, supports:

  • Parameter input with type validation
  • Run method selection (agent or code)
  • Webhook URL configuration
  • Proxy location selection

Debugger Panel

Located at /workflows/{workflow_id}/debug, provides:

  • Real-time execution status
  • Block-by-block output inspection
  • Extracted information viewer
  • Failure reason analysis

Workflow Run Timeline

Displays execution history with:

  • Block status indicators
  • Execution timestamps
  • Extracted data per block
  • Navigation to diagnostics

Data Flow

graph TD
    subgraph "Definition Layer"
        WD[Workflow Definition]
        BD[Block Definitions]
        WP[Workflow Parameters]
    end
    
    subgraph "Execution Layer"
        WR[Workflow Run]
        BR[Block Executions]
        ST[State Management]
    end
    
    subgraph "Output Layer"
        OT[Output Data]
        ER[Error Reports]
        WH[Webhook Events]
    end
    
    WD --> WR
    BD --> BR
    WP --> WR
    BR --> ST
    ST --> OT
    BR -->|on failure| ER
    WR --> WH

Key Features

Conditional Execution

The Conditional block evaluates expressions and branches workflow execution:

class ConditionalBlockParameters(BaseModel):
    expression: str  # e.g., "data.status == 'approved'"

After evaluation, the system records:

  • executed_branch_expression: The evaluated expression
  • executed_branch_result: Boolean result
  • executed_branch_next_block: Next block ID based on result
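The three recorded fields can be pictured with a deliberately small evaluator. This sketch handles only expressions of the form `<dotted.path> == <literal>`; the functions `evaluate_equality` and `run_conditional`, and the block labels, are hypothetical, and Skyvern's real expression language is likely richer.

```python
import ast
from typing import Any


def evaluate_equality(expression: str, context: dict[str, Any]) -> bool:
    """Evaluate '<dotted.path> == <literal>' against a nested context dict."""
    left, _, right = expression.partition("==")
    value: Any = context
    for part in left.strip().split("."):
        value = value[part]
    # literal_eval accepts only Python literals, so no arbitrary code runs.
    return value == ast.literal_eval(right.strip())


def run_conditional(expression: str, context: dict, if_true: str, if_false: str) -> dict:
    """Return the three branch fields recorded on the block execution."""
    result = evaluate_equality(expression, context)
    return {
        "executed_branch_expression": expression,
        "executed_branch_result": result,
        "executed_branch_next_block": if_true if result else if_false,
    }


record = run_conditional(
    "data.status == 'approved'",
    {"data": {"status": "approved"}},
    if_true="send_email",        # hypothetical block labels
    if_false="request_review",
)
```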

For Loop Iteration

The ForLoop block iterates over collections:

class ForLoopBlockParameters(BaseModel):
    items: List[Any]
    variable_name: str  # Variable to expose in loop context
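
A minimal sketch of the iteration semantics, assuming each iteration sees the current item under `variable_name` in its context. The helper `run_for_loop` and the context dictionary are illustrative and not part of Skyvern's API.

```python
from typing import Any, Callable


def run_for_loop(
    items: list[Any],
    variable_name: str,
    body: Callable[[dict[str, Any]], Any],
    context: dict[str, Any],
) -> list[Any]:
    """Call body once per item, with the item bound under variable_name."""
    outputs = []
    for item in items:
        # Build a fresh context so one iteration cannot leak state into the next.
        loop_context = {**context, variable_name: item}
        outputs.append(body(loop_context))
    return outputs


urls = run_for_loop(
    items=["inv-1", "inv-2"],
    variable_name="invoice_id",
    body=lambda ctx: f"{ctx['base_url']}/invoices/{ctx['invoice_id']}",
    context={"base_url": "https://example.com"},
)
```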

Error Handling

Blocks support a continue_on_failure flag for graceful degradation:

class GotoUrlBlockParameters(BaseModel):
    url: str
    continue_on_failure: bool = False

When the flag is enabled, the workflow continues with the next block after a failure instead of stopping.
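
The effect of the flag on a sequential run can be sketched as follows. `run_blocks` and `fake_execute` are hypothetical helpers, and blocks are plain dictionaries here rather than Skyvern's block models.

```python
from typing import Callable


def run_blocks(blocks: list[dict], execute: Callable[[dict], None]) -> list[tuple[str, str]]:
    """Run blocks in order; stop at the first failure unless the block opts out."""
    results: list[tuple[str, str]] = []
    for block in blocks:
        try:
            execute(block)
            results.append((block["label"], "completed"))
        except Exception:
            results.append((block["label"], "failed"))
            if not block.get("continue_on_failure", False):
                break  # remaining blocks are never started
    return results


def fake_execute(block: dict) -> None:
    # Hypothetical executor: fails for any block marked as broken.
    if block.get("broken"):
        raise RuntimeError("navigation failed")


outcome = run_blocks(
    [
        {"label": "open_home", "broken": True, "continue_on_failure": True},
        {"label": "open_report"},
    ],
    fake_execute,
)
```

Without the flag, the same failure would end the run after the first block.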

TOTP/2FA Support

Browser tasks can handle two-factor authentication:

class Taskv2BlockParameters(BaseModel):
    # Excerpt: TOTP-related fields only
    totp_verification_url: Optional[str] = None
    totp_identifier: Optional[str] = None

Users can push verification codes via the frontend or API.

Security Considerations

Webhook Signature Validation

Webhook endpoints must validate signatures:

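import hmac

from fastapi import HTTPException, Request, Response

# generate_skyvern_signature and settings come from the Skyvern codebase.
async def webhook(request: Request) -> Response:
    signature = request.headers.get("x-skyvern-signature")
    timestamp = request.headers.get("x-skyvern-timestamp")

    # Reject requests that are missing either header.
    if not signature or not timestamp:
        raise HTTPException(status_code=400)

    payload = await request.body()
    expected = generate_skyvern_signature(
        payload.decode("utf-8"),
        settings.SKYVERN_API_KEY
    )

    # Constant-time comparison; reject payloads whose signature does not match.
    if not hmac.compare_digest(signature, expected):
        raise HTTPException(status_code=401)

    return Response(status_code=200)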
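
For readers who want to test a receiver without the Skyvern codebase, the sketch below shows one plausible signing scheme: a hex-encoded HMAC-SHA256 of the raw body keyed with the API key. This scheme is an assumption, as are the names `sign_payload` and `is_valid_signature`; check the actual `generate_skyvern_signature` implementation before relying on it.

```python
import hashlib
import hmac


def sign_payload(payload: str, api_key: str) -> str:
    """Hex-encoded HMAC-SHA256 of the payload, keyed with the API key (assumed scheme)."""
    return hmac.new(api_key.encode("utf-8"), payload.encode("utf-8"), hashlib.sha256).hexdigest()


def is_valid_signature(payload: str, api_key: str, received_signature: str) -> bool:
    """Compare in constant time so timing does not reveal how much matched."""
    expected = sign_payload(payload, api_key)
    return hmac.compare_digest(expected, received_signature)


body = '{"workflow_run_id": "wr_1", "status": "completed"}'
good_signature = sign_payload(body, "test-key")
```

Signing the exact bytes received matters: re-serializing the JSON before hashing can change whitespace or key order and produce a different digest.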

Credential Management

Workflows requiring authentication reference stored credentials by ID rather than embedding sensitive data.

CLI Commands

# Switch between environments
skyvern mcp switch

# List workflows
skyvern workflow list

# Run workflow
skyvern workflow run <workflow_id>

Sources: skyvern/forge/sdk/workflow/models/workflow.py

AI-Powered Commands

Related topics: Browser Automation Engine, LLM Provider Configuration


AI-Powered Commands

Skyvern provides a comprehensive suite of AI-powered commands that enable intelligent browser automation through natural language instructions. These commands leverage Large Language Models (LLMs) to interpret user intent and execute complex browser interactions autonomously.

Overview

AI-Powered Commands replace scripted, step-by-step automation with intent-based browser control. Instead of writing precise instructions for each step, users describe the goal in natural language, and Skyvern's AI agents interpret it and execute the necessary browser actions.

The system integrates with multiple LLM providers including OpenAI (GPT-4.1, o3, o4-mini), Anthropic (Claude 4.5-4.7), Azure OpenAI, AWS Bedrock, and Google Gemini to power the AI decision-making engine.

Architecture

graph TD
    A[User Input / Natural Language] --> B[Copilot Agent]
    B --> C[LLM Provider]
    C --> D[Decision Engine]
    D --> E[Browser Actions]
    E --> F[Element Interaction]
    F --> G[State Validation]
    G --> H[Continue / Complete]
    
    B --> I[Tool Selection]
    I --> J[Data Extraction]
    I --> K[Visual Validation]
    I --> L[Network Monitoring]
    
    subgraph Tools
        J
        K
        L
    end

Core Components

Copilot Agent (`skyvern/forge/sdk/copilot/agent.py`)

The Copilot Agent serves as the central orchestration layer for AI-powered commands. It maintains conversation context, manages tool selection, and coordinates the execution flow between user instructions and browser actions.

| Component | Responsibility |
| --- | --- |
| Context Manager | Maintains conversation history and state |
| Tool Selector | Chooses appropriate tools based on intent |
| Action Executor | Executes browser actions |
| Response Formatter | Formats AI responses for user consumption |

Tool System (`skyvern/forge/sdk/copilot/tools.py`)

Skyvern's tool system provides a comprehensive set of primitives for browser automation. Each tool is designed to handle specific interaction patterns while being composable for complex workflows.

Browser Page AI (`skyvern/library/skyvern_browser_page_ai.py`)

This module provides the foundational AI capabilities for understanding and interacting with web page content. It includes element identification, content extraction, and visual analysis capabilities.

Navigation and Interaction Commands

Element Interactions

| Command | Purpose | Parameters |
| --- | --- | --- |
| skyvern_click | Click on identified elements | element_selector, options |
| skyvern_type | Enter text into input fields | text, element_selector |
| skyvern_hover | Hover over elements | element_selector |
| skyvern_scroll | Scroll within page or elements | direction, amount |
| skyvern_select_option | Select dropdown options | value, element_selector |
| skyvern_press_key | Press keyboard keys | key, modifiers |
| skyvern_drag | Drag and drop operations | source, target |
| skyvern_wait | Wait for conditions | condition, timeout |
| skyvern_file_upload | Upload files to elements | file_path, element_selector |

Browser Navigation

| Command | Purpose |
| --- | --- |
| skyvern_navigate | Navigate to URLs |
| skyvern_go_back | Navigate browser history back |
| skyvern_go_forward | Navigate browser history forward |
| skyvern_reload | Reload current page |

Tab and Frame Management

| Command | Purpose |
| --- | --- |
| skyvern_tab_new | Open new browser tab |
| skyvern_tab_list | List all open tabs |
| skyvern_tab_switch | Switch to specific tab |
| skyvern_tab_close | Close current or specified tab |
| skyvern_tab_wait_for_new | Wait for new tab to open |
| skyvern_frame_list | List all iframes on page |
| skyvern_frame_switch | Switch to iframe context |

Data Extraction Commands

Skyvern provides multiple methods for extracting structured data from web pages:

Structured Extraction

| Command | Purpose | Output Format |
| --- | --- | --- |
| skyvern_extract | Extract structured data | JSON with defined schema |
| skyvern_get_html | Get page HTML | Raw HTML string |
| skyvern_get_value | Get form element values | String or JSON |

Visual Extraction

| Command | Purpose |
| --- | --- |
| skyvern_screenshot | Capture full or partial screenshots |
| skyvern_get_styles | Get computed CSS styles |
| skyvern_find | Find elements by visual similarity |

Content Analysis

The extraction system uses AI to understand page structure and extract relevant information based on user intent. It supports:

  • Dynamic schema generation based on natural language requests
  • Multi-field extraction from complex layouts
  • Nested data structures and repeating elements
  • Confidence scoring for extracted values

Validation and Verification

AI-Powered Validation

| Command | Purpose |
| --- | --- |
| skyvern_validate | Validate element states or page conditions |
| skyvern_evaluate | Run JavaScript for custom validation |
| skyvern_evaluate_async | Execute async JavaScript operations |

Validation commands use the LLM to interpret complex conditions that would be difficult to express in traditional selectors or XPath expressions.

Screenshot Validation

The screenshot command supports comparison against reference images and can detect visual regressions:

result = await skyvern.screenshot(
    full_page=True,
    compare_with="baseline.png",
    threshold=0.1  # 10% allowed difference
)

Network and Console Commands

Network Monitoring

| Command | Purpose |
| --- | --- |
| skyvern_network_requests | List network requests |
| skyvern_network_request_detail | Get request/response details |
| skyvern_network_route | Intercept and modify requests |
| skyvern_network_unroute | Remove request interception |
| skyvern_har_start | Start HAR recording |
| skyvern_har_stop | Stop and export HAR data |

Console Inspection

| Command | Purpose |
| --- | --- |
| skyvern_console_messages | Retrieve console logs |
| skyvern_get_errors | Get JavaScript errors |
| skyvern_handle_dialog | Handle browser dialogs (alert, confirm, prompt) |

Authentication and Credentials

Login Commands

Skyvern supports intelligent login flows with multiple authentication methods:

| Command | Purpose |
| --- | --- |
| skyvern_login | Execute automated login |
| skyvern_credential_list | List stored credentials |
| skyvern_credential_get | Retrieve specific credentials |
| skyvern_credential_delete | Remove stored credentials |

Credential Management

The credential system integrates with:

  • Skyvern Vault: Built-in secure storage
  • Bitwarden: Enterprise password management
  • 1Password: Team password sharing
  • Azure Key Vault: Cloud credential storage

Two-Factor Authentication

Skyvern handles 2FA/TOTP flows automatically:

  1. Detects the OTP requirement during login
  2. Extracts codes from configured sources
  3. Supports magic link authentication
  4. Handles push notifications

Sources: skyvern/cli/skills/README.md

State Management

Session State

| Command | Purpose |
| --- | --- |
| skyvern_state_save | Save current browser state |
| skyvern_state_load | Restore saved state |
| skyvern_get_session_storage | Read session storage |
| skyvern_set_session_storage | Write to session storage |
| skyvern_clear_session_storage | Clear session storage |
| skyvern_clear_local_storage | Clear local storage |

Clipboard Operations

| Command | Purpose |
| --- | --- |
| skyvern_clipboard_read | Read from clipboard |
| skyvern_clipboard_write | Write to clipboard |

Workflow Integration

AI-Powered Commands can be orchestrated into complete workflows:

graph LR
    A[Navigation] --> B[Authentication]
    B --> C[Data Extraction]
    C --> D[Validation]
    D --> E{Success?}
    E -->|No| F[Retry Logic]
    F --> B
    E -->|Yes| G[Output Results]

Workflow Commands

| Command | Purpose |
| --- | --- |
| skyvern_workflow_create | Create new workflow |
| skyvern_workflow_list | List available workflows |
| skyvern_workflow_get | Get workflow details |
| skyvern_workflow_run | Execute workflow |
| skyvern_workflow_cancel | Cancel running workflow |

Agent Functions (`skyvern/forge/agent_functions.py`)

The agent functions module provides the core building blocks for AI-driven browser automation:

Function Categories

  1. Navigation Functions: Handle URL navigation, back/forward, and reload
  2. Interaction Functions: Click, type, hover, scroll, and element manipulation
  3. Extraction Functions: HTML retrieval, value extraction, screenshot capture
  4. Validation Functions: Element presence, state verification, screenshot comparison
  5. State Functions: Local/session storage, clipboard, authentication state

Function Interface

All agent functions follow a consistent interface:

async def agent_function(
    task_id: str,
    step_id: str,
    **kwargs  # Function-specific parameters
) -> AgentFunctionCallResult:
    """
    Execute AI-powered browser action
    
    Returns:
        AgentFunctionCallResult with:
        - success: bool
        - extracted_data: Optional[dict]
        - screenshot: Optional[str] base64
        - error: Optional[str]
    """
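
To show what a function following this interface might look like, here is a self-contained sketch. The `AgentFunctionCallResult` dataclass is reconstructed from the docstring above, and `get_page_title` with its `page` keyword argument is a hypothetical function; neither is taken from the Skyvern source.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentFunctionCallResult:
    # Field names follow the docstring above; the real class may differ.
    success: bool
    extracted_data: Optional[dict] = None
    screenshot: Optional[str] = None
    error: Optional[str] = None


async def get_page_title(task_id: str, step_id: str, **kwargs) -> AgentFunctionCallResult:
    """Hypothetical agent function that reads a title from a supplied page dict."""
    page = kwargs.get("page")
    if page is None:
        # Report the problem in the result instead of raising.
        return AgentFunctionCallResult(success=False, error="no page supplied")
    return AgentFunctionCallResult(success=True, extracted_data={"title": page["title"]})


result = asyncio.run(get_page_title("tsk_1", "stp_1", page={"title": "Pricing"}))
```

Returning errors inside the result object keeps every function's return type uniform, so callers can branch on `success` without wrapping each call in its own exception handler.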

Integration with Skills Package

The skills package (skyvern/cli/skills/README.md) bundles AI-powered commands for coding agents:

Available Skills

| Skill | Description |
| --- | --- |
| qa | QA test frontend changes in real browser |
| skyvern | Full CLI reference for browser automation |
| smoke-test | CI-oriented smoke testing |

QA Skill Workflow

graph TD
    A[git diff] --> B[Generate Tests]
    B --> C[Run Against Dev Server]
    C --> D[Report Results]
    D --> E{Screenshots}
    E --> F[Pass/Fail Status]

Configuration

Environment Variables

| Variable | Purpose | Default |
| --- | --- | --- |
| SKYVERN_TELEMETRY | Enable/disable usage telemetry | true |
| SKYVERN_BASE_URL | API endpoint for Skyvern Cloud | Local server |
| SKYVERN_API_KEY | Authentication key | None |
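
A small sketch of reading these variables with their documented defaults. The helper `load_client_settings`, the accepted spellings for disabling telemetry, and the `http://localhost:8000` placeholder for "local server" are all assumptions made for the example.

```python
import os
from typing import Mapping, Optional


def load_client_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the three variables above, applying the documented defaults."""
    env = os.environ if env is None else env
    telemetry_raw = env.get("SKYVERN_TELEMETRY", "true")
    return {
        # Common falsy spellings disable telemetry; the accepted values are assumed.
        "telemetry": telemetry_raw.strip().lower() not in ("false", "0", "no"),
        # Placeholder address for "local server"; the real default may differ.
        "base_url": env.get("SKYVERN_BASE_URL", "http://localhost:8000"),
        "api_key": env.get("SKYVERN_API_KEY"),
    }


settings = load_client_settings({"SKYVERN_TELEMETRY": "false", "SKYVERN_API_KEY": "sk-test"})
```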

Browser Configuration

| Parameter | Purpose |
| --- | --- |
| BROWSER_TYPE | Browser engine (chromium, firefox, webkit) |
| BROWSER_HEADLESS | Run without visible UI |
| BROWSER_REMOTE_DEBUGGING_URL | Connect to remote browser instance |

Best Practices

Effective Command Usage

  1. Be Specific with Selectors: Use precise element identifiers when available
  2. Add Validation Steps: Always validate state changes after actions
  3. Handle Timing: Use wait commands for dynamic content
  4. Screenshot for Debugging: Capture screenshots at key decision points

Error Handling

try:
    result = await skyvern.act("click", selector="#submit-button")
    if not result.success:
        # Fallback or retry logic
        await skyvern.validate("element_visible", selector="#error-message")
except Exception as e:
    await skyvern.screenshot()
    raise

Summary

AI-Powered Commands in Skyvern transform browser automation from rigid scripting to intelligent, adaptive interactions. By combining natural language understanding with comprehensive browser control primitives, developers can create robust automation flows that handle complexity and edge cases gracefully.

The modular architecture allows commands to be used individually for simple tasks or combined into sophisticated workflows for enterprise-scale automation needs.

Source: https://github.com/Skyvern-AI/skyvern / Human Manual

Database Models

Related topics: Artifact Storage, Workflow System


Database Models

Overview

Skyvern's database layer is built using SQLAlchemy ORM with Alembic for database migrations. The persistence layer is located in skyvern/forge/sdk/db/ and provides the data models for all core entities including Tasks, Workflows, Workflow Runs, Browser Profiles, Credentials, and Schedules.

The database models define the schema for persistent storage of automation tasks, execution state, workflow definitions, and runtime data.

Architecture

graph TD
    A[API Layer] --> B[Repository Layer]
    B --> C[SQLAlchemy Models]
    C --> D[(PostgreSQL Database)]
    B --> E[Task Repository]
    B --> F[Workflow Repository]
    B --> G[Workflow Run Repository]

Core Entities

Task Model

The Task model represents an automation task with its configuration and execution state.

| Field | Type | Description |
| --- | --- | --- |
| task_id | String | Unique identifier (UUID) |
| workflow_run_id | String (nullable) | Associated workflow run |
| status | TaskStatus | Current task status |
| request | JSON | Task request configuration |
| navigation_goal | String | Navigation objective |
| navigation_payload | JSON | Additional navigation parameters |
| data_extraction_goal | String | Data extraction objective |
| extracted_information_schema | JSON | Expected output schema |
| created_at | DateTime | Creation timestamp |
| modified_at | DateTime | Last modification timestamp |
| organization_id | String | Organization ownership |

Sources: skyvern/forge/sdk/db/models.py

Workflow Model

The Workflow model stores workflow definitions and configurations.

| Field | Type | Description |
| --- | --- | --- |
| workflow_id | String | Unique workflow identifier |
| title | String | Workflow name |
| description | String | Workflow description |
| workflow_definition | JSON | Workflow structure and steps |
| webhook_callback_url | String (nullable) | Callback URL for completion |
| organization_id | String | Organization ownership |
| created_at | DateTime | Creation timestamp |
| modified_at | DateTime | Last modification timestamp |

Sources: skyvern/forge/sdk/db/models.py

Workflow Run Model

The WorkflowRun model tracks individual executions of workflows.

| Field | Type | Description |
| --- | --- | --- |
| workflow_run_id | String | Unique run identifier |
| workflow_id | String | Parent workflow reference |
| status | WorkflowRunStatus | Run status |
| organization_id | String | Organization ownership |
| started_at | DateTime | Execution start time |
| completed_at | DateTime (nullable) | Execution completion time |
| error | String (nullable) | Error message if failed |

Sources: skyvern/forge/sdk/db/models.py

Task Status Enum

The TaskStatus enum defines possible task states:

class TaskStatus(str, Enum):
    created = "created"
    pending = "pending"
    running = "running"
    completed = "completed"
    failed = "failed"
    cancelled = "cancelled"

Sources: skyvern/forge/sdk/db/enums.py

Task Status Flow

stateDiagram-v2
    [*] --> created: Task Created
    created --> pending: Queued for Execution
    pending --> running: Agent Starts
    running --> completed: Success
    running --> failed: Error
    running --> cancelled: User Cancelled
    completed --> [*]
    failed --> [*]
    cancelled --> [*]
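
The state diagram can also be expressed as a transition table, which is handy for rejecting illegal status updates. The enum repeats the definition above; `ALLOWED_TRANSITIONS`, `can_transition`, and `is_terminal` are hypothetical helpers written for this sketch.

```python
from enum import Enum


class TaskStatus(str, Enum):
    created = "created"
    pending = "pending"
    running = "running"
    completed = "completed"
    failed = "failed"
    cancelled = "cancelled"


# Edges copied from the state diagram; terminal states have no outgoing edges.
ALLOWED_TRANSITIONS = {
    TaskStatus.created: {TaskStatus.pending},
    TaskStatus.pending: {TaskStatus.running},
    TaskStatus.running: {TaskStatus.completed, TaskStatus.failed, TaskStatus.cancelled},
    TaskStatus.completed: set(),
    TaskStatus.failed: set(),
    TaskStatus.cancelled: set(),
}


def can_transition(current: TaskStatus, target: TaskStatus) -> bool:
    """True when the diagram has an edge from current to target."""
    return target in ALLOWED_TRANSITIONS[current]


def is_terminal(status: TaskStatus) -> bool:
    """True for states with no outgoing edges."""
    return not ALLOWED_TRANSITIONS[status]
```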

Repository Pattern

Skyvern uses a repository pattern to abstract database operations.

TaskRepository

Provides CRUD operations for Task entities:

  • create_task() - Create new task record
  • get_task() - Retrieve task by ID
  • update_task() - Update task fields
  • get_tasks_for_workflow_run() - Get tasks for workflow execution
  • get_tasks_by_organization() - List organization tasks

Sources: skyvern/forge/sdk/db/repositories/tasks.py

WorkflowRepository

Manages Workflow entity persistence:

  • create_workflow() - Create new workflow
  • get_workflow() - Retrieve workflow definition
  • update_workflow() - Update workflow
  • get_workflows_by_organization() - List organization workflows

Sources: skyvern/forge/sdk/db/repositories/workflows.py

WorkflowRunRepository

Handles WorkflowRun entity operations:

  • create_workflow_run() - Start new workflow execution
  • get_workflow_run() - Get run details
  • update_workflow_run() - Update run status
  • get_workflow_runs_for_workflow() - List runs for a workflow

Sources: skyvern/forge/sdk/db/repositories/workflow_runs.py

Database Migrations

Alembic manages database schema migrations in the alembic/versions/ directory.

Migration files follow the naming convention: {version}_{description}.py

Example migration operations:

  • Adding new columns to existing tables
  • Creating new tables for additional entities
  • Index creation for query optimization
  • Data type modifications

Sources: alembic/versions

Relationships

erDiagram
    Organization ||--o{ Task : owns
    Organization ||--o{ Workflow : owns
    Organization ||--o{ WorkflowRun : owns
    Workflow ||--o{ WorkflowRun : executes
    WorkflowRun ||--o{ Task : contains

Additional Models

The database layer also includes models for:

| Model | Purpose |
| --- | --- |
| BrowserProfile | Browser configuration settings |
| Credential | Authentication credentials storage |
| Schedule | Cron-based task scheduling |
| ScheduleRun | Scheduled execution tracking |

Sources: skyvern/forge/sdk/db/models.py

Usage Example

import asyncio

from skyvern.forge.sdk.db.repositories.tasks import TaskRepository


async def main() -> None:
    task_repo = TaskRepository()
    # create_task is awaited, so the call has to live inside an async function.
    new_task = await task_repo.create_task(
        organization_id="org_123",
        navigation_goal="Search for flights",
        navigation_payload={"origin": "SFO", "destination": "LAX"},
    )
    print(new_task.task_id)


asyncio.run(main())

Configuration

Database connection is configured via environment variables:

| Variable | Description |
| --- | --- |
| DATABASE_URL | PostgreSQL connection string |
| SKYVERN_ORG_ID | Default organization ID |

Sources: skyvern/forge/sdk/db/models.py


Artifact Storage

Related topics: Database Models


Artifact Storage

Overview

Artifact Storage is a core system in Skyvern responsible for persisting and retrieving various artifacts generated during task execution and workflow runs. These artifacts include screenshots, HTML content, LLM prompts and responses, element trees, download files, and execution logs. The system provides a pluggable storage backend architecture that supports multiple storage providers while maintaining a consistent API.

The storage layer abstracts away the complexity of different storage backends (local filesystem, Amazon S3, Azure Blob Storage) from the rest of the application, allowing deployments to choose the most appropriate storage solution for their infrastructure requirements.

Architecture

High-Level Architecture

graph TD
    A[API Clients] --> B[Agent Protocol Routes]
    B --> C[Artifact Manager]
    C --> D[Storage Factory]
    D --> E[Local Storage]
    D --> F[S3 Storage]
    D --> G[Azure Blob Storage]
    
    H[Artifact Models] --> C
    C --> H
    
    I[Configuration] --> D

Component Responsibilities

| Component | File | Responsibility |
| --- | --- | --- |
| Artifact Manager | artifact/manager.py | Orchestrates artifact operations, lifecycle management |
| Storage Factory | artifact/storage/factory.py | Creates appropriate storage backend based on configuration |
| Local Storage | artifact/storage/local.py | Filesystem-based storage implementation |
| S3 Storage | artifact/storage/s3.py | AWS S3 / S3-compatible storage implementation |
| Azure Blob Storage | artifact/storage/azure.py | Azure Blob Storage implementation |
| Artifact Models | artifact/models.py | Data models for artifacts and artifact types |
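
The factory's role can be sketched as a lookup from a configuration value to a backend class. Everything here is illustrative: the `describe` method, the `create_storage` function, and the configuration strings (`local`, `s3`, `azureblob`) are assumptions, not the names used in Skyvern's factory module.

```python
from abc import ABC, abstractmethod


class BaseStorage(ABC):
    @abstractmethod
    def describe(self) -> str:
        """Return a short label for the backend (illustrative only)."""


class LocalStorage(BaseStorage):
    def describe(self) -> str:
        return "local filesystem"


class S3Storage(BaseStorage):
    def describe(self) -> str:
        return "S3 bucket"


class AzureStorage(BaseStorage):
    def describe(self) -> str:
        return "Azure Blob container"


# Hypothetical configuration values; the real setting names may differ.
_BACKENDS: dict[str, type[BaseStorage]] = {
    "local": LocalStorage,
    "s3": S3Storage,
    "azureblob": AzureStorage,
}


def create_storage(storage_type: str) -> BaseStorage:
    """Instantiate the backend registered for the given configuration value."""
    try:
        return _BACKENDS[storage_type]()
    except KeyError:
        raise ValueError(f"unknown storage type: {storage_type}") from None
```

Because callers depend only on the `BaseStorage` interface, switching a deployment from local disk to object storage is a configuration change rather than a code change.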

Artifact Types

Skyvern distinguishes between multiple artifact types, each serving a specific purpose in documenting and debugging task execution.

Supported Artifact Types

class ArtifactType(str, Enum):
    SCREENSHOT_LLM = "screenshot_llm"
    SCREENSHOT_ACTION = "screenshot_action"
    HTML_SCRAPE = "html_scrape"
    ELEMENT_TREE = "element_tree"
    ELEMENT_TREE_VISIBLE = "element_tree_visible"
    LLM_PROMPT = "llm_prompt"
    LLM_RESPONSE_PARSED = "llm_response_parsed"
    DOWNLOAD = "download"
    SKYVERN_LOG = "skyvern_log"

| Type | Description | Content-Type |
| --- | --- | --- |
| SCREENSHOT_LLM | Annotated screenshots for LLM context | image/png |
| SCREENSHOT_ACTION | Action screenshots captured during execution | image/png |
| HTML_SCRAPE | Raw HTML content from web pages | text/html |
| ELEMENT_TREE | Complete DOM element tree | application/json |
| ELEMENT_TREE_VISIBLE | Filtered visible elements tree | application/json |
| LLM_PROMPT | Prompt sent to LLM for decision making | text/plain |
| LLM_RESPONSE_PARSED | Parsed LLM response with action list | application/json |
| DOWNLOAD | Downloaded file content | application/octet-stream |
| SKYVERN_LOG | Skyvern execution logs | text/plain |

Sources: skyvern/forge/sdk/artifact/models.py

Data Models

Artifact Model

The Artifact model represents a single stored artifact with metadata:

class Artifact(BaseModel):
    artifact_id: str
    organization_id: str
    run_id: str | None = None
    task_id: str | None = None
    step_id: str | None = None
    workflow_run_id: str | None = None
    workflow_block_execution_id: str | None = None
    artifact_type: ArtifactType
    uri: str
    filename: str | None = None
    content_type: str | None = None
    metadata: dict[str, Any] | None = None
    created_at: datetime
    modified_at: datetime | None = None

Sources: skyvern/forge/sdk/artifact/models.py

Content-Type Mapping

_ARTIFACT_CONTENT_TYPES: dict[ArtifactType, str] = {
    ArtifactType.SCREENSHOT_LLM: "image/png",
    ArtifactType.SCREENSHOT_ACTION: "image/png",
    ArtifactType.HTML_SCRAPE: "text/html",
    ArtifactType.ELEMENT_TREE: "application/json",
    ArtifactType.ELEMENT_TREE_VISIBLE: "application/json",
    ArtifactType.LLM_PROMPT: "text/plain",
    ArtifactType.LLM_RESPONSE_PARSED: "application/json",
    ArtifactType.DOWNLOAD: "application/octet-stream",
    ArtifactType.SKYVERN_LOG: "text/plain",
}

Storage Backends

Local Storage

The local storage backend stores artifacts on the filesystem, ideal for development and single-instance deployments.

class LocalStorage(BaseStorage):
    def __init__(self, artifact_path: str = settings.ARTIFACT_STORAGE_PATH) -> None:
        self.artifact_path = artifact_path

Key implementation details:

  • Path Construction: Uses organization and artifact IDs to create hierarchical directory structures
  • Windows Compatibility: Replaces colons with dashes in timestamps and removes invalid filename characters on Windows systems
  • SHA256 Verification: Computes SHA256 checksums for stored files

The filename helpers handle the Windows cases:

def _safe_timestamp() -> str:
    ts = datetime.utcnow().isoformat()
    return ts.replace(":", "-") if WINDOWS else ts

def _windows_safe_filename(name: str) -> str:
    if not WINDOWS:
        return name
    invalid = '<>:"/\\|?*'
    name = "".join("-" if ch in invalid else ch for ch in name)
    return name.rstrip(" .")

Sources: skyvern/forge/sdk/artifact/storage/local.py
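
The path construction and checksum bullets above can be illustrated with a minimal sketch. The function name `store_with_checksum` and the `<root>/<organization_id>/<artifact_id>/` layout are assumptions made for this example; the source only states that organization and artifact IDs form the hierarchy and that SHA256 checksums are computed.

```python
import hashlib
from pathlib import Path


def store_with_checksum(
    root: Path,
    organization_id: str,
    artifact_id: str,
    filename: str,
    content: bytes,
) -> tuple[Path, str]:
    """Write content under root/<organization_id>/<artifact_id>/ (assumed layout).

    Returns the file path and the SHA256 hex digest of the stored bytes.
    """
    target_dir = root / organization_id / artifact_id
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(content)
    # Hash what was actually written so the digest reflects the stored file.
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    return target, digest
```

Re-reading the file before hashing costs an extra read, but it means the digest describes the bytes on disk rather than the bytes in memory.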

S3 Storage

The S3 backend provides scalable object storage suitable for production deployments.

Configuration Environment Variables:

| Variable | Description |
| --- | --- |
| AWS_ACCESS_KEY_ID | AWS access key for authentication |
| AWS_SECRET_ACCESS_KEY | AWS secret key for authentication |
| AWS_REGION | AWS region for bucket operations |
| S3_BUCKET_NAME | Name of the S3 bucket |
| ARTIFACT_S3_ENDPOINT_URL | Custom S3-compatible endpoint (optional) |

Sources: skyvern/forge/sdk/artifact/storage/s3.py

Azure Blob Storage

The Azure backend integrates with Azure Blob Storage for cloud deployments.

Configuration Environment Variables:

| Variable | Description |
| --- | --- |
| AZURE_STORAGE_CONNECTION_STRING | Azure storage connection string |
| AZURE_STORAGE_CONTAINER_NAME | Container name for artifacts |

Sources: skyvern/forge/sdk/artifact/storage/azure.py

Storage Factory

The storage factory pattern enables runtime selection of the appropriate storage backend:

graph LR
    A[Configuration] --> B[Storage Factory]
    B --> C{Backend Type}
    C -->|local| D[LocalStorage]
    C -->|s3| E[S3Storage]
    C -->|azure| F[AzureBlobStorage]

Backend Selection Logic:

def get_storage_backend() -> BaseStorage:
    if settings.ARTIFACT_STORAGE_BACKEND == "s3":
        return S3Storage()
    elif settings.ARTIFACT_STORAGE_BACKEND == "azure":
        return AzureBlobStorage()
    else:
        return LocalStorage()

Sources: skyvern/forge/sdk/artifact/storage/factory.py

API Endpoints

Get Artifact Content

Retrieves raw content of an artifact with support for range requests and HMAC-signed URLs.

Endpoint: GET /api/v1/artifacts/{artifact_id}/content

Query Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| sig | string | HMAC signature for URL authentication |
| expiry | string | Expiration timestamp for signed URLs |
| kid | string | Key identifier for signature verification |
| artifact_name | string | Optional filename override |
| artifact_type | string | Expected artifact type |
| x-api-key | string | API key authentication (header) |
| authorization | string | Bearer token authentication (header) |

Responses:

| Status | Description |
| --- | --- |
| 200 | Raw artifact content |
| 206 | Partial content (Range request) |
| 403 | Invalid or expired artifact URL |
| 404 | Artifact not found |
| 416 | Range not satisfiable |

Content-Disposition Behavior:

if artifact.artifact_type == ArtifactType.DOWNLOAD:
    # Use attachment disposition for downloads
    return media_type, _build_attachment_disposition(raw_name)
return media_type, "inline"  # Inline for all other types

Sources: skyvern/forge/sdk/routes/agent_protocol.py

Range Request Support

The artifact content endpoint supports HTTP range requests for partial content retrieval:

def _parse_range_header(range_header: str | None, content_length: int) -> tuple[int, int] | None:
    """Return one satisfiable byte range, _RANGE_UNSATISFIABLE when unsatisfiable, or None when ignored."""
    if not range_header:
        return None
    # Parses "bytes=start-end" format
    # Validates ASCII digits, rejects negatives

Range Header Format: bytes=start-end (RFC 7233 compliant)

Sources: skyvern/forge/sdk/routes/agent_protocol.py

HMAC URL Signing

Artifact URLs can be signed using HMAC for time-limited access without requiring API key authentication:

sequenceDiagram
    Client->>Server: Request with sig, expiry, kid
    Server->>Server: Validate HMAC signature
    Server->>Storage: Fetch artifact
    Storage-->>Server: Artifact content
    Server-->>Client: Artifact content

Signing Requirements:

  1. HMAC keyring must be configured: ARTIFACT_CONTENT_HMAC_KEYRING
  2. URL must include valid sig, expiry, and kid query parameters
  3. Signature is verified before returning artifact content

Sources: skyvern/forge/sdk/routes/agent_protocol.py

Configuration Options

Storage Configuration

| Environment Variable | Default | Description |
| --- | --- | --- |
| ARTIFACT_STORAGE_BACKEND | local | Storage backend type (local/s3/azure) |
| ARTIFACT_STORAGE_PATH | /tmp/skyvern/artifacts | Local storage path |
| ARTIFACT_CONTENT_HMAC_KEYRING | - | HMAC keyring for signed URLs |

S3 Configuration

| Environment Variable | Description |
| --- | --- |
| AWS_ACCESS_KEY_ID | AWS credentials |
| AWS_SECRET_ACCESS_KEY | AWS credentials |
| AWS_REGION | Region setting |
| S3_BUCKET_NAME | Target bucket |
| ARTIFACT_S3_ENDPOINT_URL | S3-compatible endpoint |

Azure Configuration

| Environment Variable | Description |
| --- | --- |
| AZURE_STORAGE_CONNECTION_STRING | Connection string |
| AZURE_STORAGE_CONTAINER_NAME | Container name |

File Extension Mapping

The storage layer maintains a mapping from artifact types to file extensions for consistent naming:

FILE_EXTENTSION_MAP: dict[ArtifactType, str] = {
    ArtifactType.SCREENSHOT_LLM: ".png",
    ArtifactType.SCREENSHOT_ACTION: ".png",
    ArtifactType.HTML_SCRAPE: ".html",
    ArtifactType.ELEMENT_TREE: ".json",
    ArtifactType.ELEMENT_TREE_VISIBLE: ".json",
    ArtifactType.LLM_PROMPT: ".txt",
    ArtifactType.LLM_RESPONSE_PARSED: ".json",
    ArtifactType.DOWNLOAD: ".bin",
    ArtifactType.SKYVERN_LOG: ".log",
}

Sources: skyvern/forge/sdk/artifact/storage/base.py

Usage Patterns

Storing an Artifact

# Via Artifact Manager
artifact = await artifact_manager.create_artifact(
    organization_id=org_id,
    artifact_type=ArtifactType.SCREENSHOT_LLM,
    content=image_bytes,
    task_id=task_id,
    step_id=step_id,
)

Retrieving an Artifact

# Get artifact metadata
artifact = await artifact_manager.get_artifact(artifact_id)

# Get presigned or signed URL
url = await artifact_manager.get_artifact_url(artifact)

Range Request for Large Files

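# `client` is assumed to be an async HTTP client already configured with the API base URL and credentials
headers = {"Range": "bytes=0-1023"}  # request the first 1024 bytes
response = await client.get(f"/api/v1/artifacts/{artifact_id}/content", headers=headers)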

Security Considerations

  1. Signed URLs: HMAC-signed URLs provide time-limited access without exposing storage credentials
  2. Attachment Disposition: Download artifacts use Content-Disposition: attachment to prevent browser rendering of potentially malicious content
  3. Organization Isolation: Artifacts are namespaced by organization ID to prevent cross-tenant access
  4. Content-Type Validation: Responses set appropriate content-types based on artifact type

Frontend Integration

The frontend displays artifacts through dedicated UI components:

| Component | Location | Purpose |
| --- | --- | --- |
| StepArtifacts.tsx | routes/tasks/detail/ | Task artifact viewer with tabbed interface |
| Artifact component | Shared | Renders different artifact types |
| ZoomableImage | Shared | Displays screenshots with zoom capability |

The artifact viewer supports multiple tabs for different artifact types:

  • Info
  • Annotated Screenshots
  • Action Screenshots
  • HTML Element Tree
  • Element Tree
  • Prompt
  • Action List
  • HTML (Raw)

Sources: skyvern-frontend/src/routes/tasks/detail/StepArtifacts.tsx

Sources: skyvern/forge/sdk/artifact/models.py

Credential Management

Related topics: Browser Automation Engine, Workflow System

Overview

Credential Management in Skyvern provides a secure, unified system for storing, retrieving, and managing authentication credentials across tasks and workflows. Skyvern supports multiple credential vault types, enabling integration with external password managers and custom credential services while maintaining a native internal vault.

Credentials in Skyvern can be of three primary types:

| Credential Type | Description |
| --- | --- |
| password | Username/password credential pairs for basic authentication |
| credit_card | Credit card information for payment forms |
| secret | Generic secret values for API keys, tokens, and other sensitive data |

Sources: skyvern-frontend/src/routes/workflows/components/CredentialSelector.tsx:1-100

Architecture

Skyvern's credential management system is designed with a multi-vault architecture that allows seamless integration with various credential providers while maintaining a consistent internal API.

graph TD
    subgraph "Client Layer"
        UI[Web UI]
        API[API Client]
        MCP[MCP Tools]
    end
    
    subgraph "Credential Services"
        SkyvernVault[Skyvern Internal Vault]
        Bitwarden[Bitwarden Service]
        Azure[Azure Key Vault Service]
        Custom[Custom Credential Service]
    end
    
    subgraph "Storage Layer"
        DB[(Database)]
    end
    
    UI --> API
    MCP --> API
    API --> SkyvernVault
    API --> Bitwarden
    API --> Azure
    API --> Custom
    SkyvernVault --> DB

Sources: skyvern-frontend/src/components/CustomCredentialServiceConfigForm.tsx:1-50

Credential Vault Types

Skyvern Internal Vault

The default vault type stores credentials directly in Skyvern's database. This is the simplest option for getting started and requires no external configuration.

Bitwarden Integration

Skyvern can integrate with Bitwarden to leverage existing credentials stored in your Bitwarden vault. This integration supports:

  • Reading existing credentials from Bitwarden
  • Writing new credentials back to Bitwarden
  • Automatic 2FA/TOTP handling

Sources: skyvern/cli/mcp_tools/README.md:1-50

Azure Key Vault

For enterprise environments, Skyvern supports Azure Key Vault integration, allowing credentials stored in Azure's secure key management system to be used in tasks and workflows.

Sources: skyvern-frontend/src/routes/workflows/editor/panels/WorkflowParameterEditPanel.tsx:1-80

Custom Credential Service

Organizations with proprietary credential management systems can implement a custom credential service. This requires:

  1. API Configuration: Set up API base URL and authentication token
  2. Service Implementation: Implement the credential service interface
  3. Vault Type Selection: Configure parameters to use vault_type="custom"

The custom credential service configuration includes:

  • api_base_url: The base URL of your credential service API
  • api_token: Authentication token for the service

Sources: skyvern-frontend/src/routes/workflows/editor/panels/WorkflowParameterEditPanel.tsx:60-75

Using Credentials in Workflows

Credential Parameter Types

Credentials can be referenced as workflow parameters, allowing secure injection of sensitive data into task execution. The system supports the following parameter types:

| Parameter Type | Usage | Example Reference |
| --- | --- | --- |
| credential | Credential objects from vault | {{ my_credential.username }} |
| context | Context parameters from previous steps | {{ context.source_param }} |
| custom | Custom credential service credentials | Uses vault_type selection |

Sources: skyvern-frontend/src/routes/workflows/editor/panels/WorkflowParameterEditPanel.tsx:40-65

Credential Reference Syntax

Within HTTP Request nodes, credentials are referenced using template syntax:

Password credential: {{ my_credential.username }} / {{ my_credential.password }}
Secret credential: {{ my_secret.secret_value }}

Sources: skyvern-frontend/src/routes/workflows/editor/nodes/HttpRequestNode/HttpRequestNode.tsx:1-50
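
To make the reference syntax concrete, here is a minimal sketch of how `{{ name.field }}` placeholders could be resolved against a set of credentials. The `render_credentials` function and the dictionary shape are hypothetical and exist only for this example; the source does not show which template engine Skyvern uses or how values are injected at run time.

```python
import re

# Matches references such as {{ my_credential.username }} with optional inner spaces.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_]\w*)\.([A-Za-z_]\w*)\s*\}\}")


def render_credentials(template: str, credentials: dict[str, dict[str, str]]) -> str:
    """Replace each {{ name.field }} reference with the matching credential value."""

    def substitute(match: re.Match) -> str:
        name, field = match.group(1), match.group(2)
        try:
            return credentials[name][field]
        except KeyError:
            # Fail loudly instead of sending a literal placeholder to the remote API.
            raise KeyError(f"unknown credential reference: {name}.{field}") from None

    return _PLACEHOLDER.sub(substitute, template)
```

Raising on an unknown reference mirrors the orphaned-credential concern described later in this section: a missing credential should stop the request rather than leak template text.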

Credential Parameter Validation

When running workflows, credential parameters are validated to ensure:

  1. Required Fields: Boolean and credential parameters must have values
  2. JSON Validation: JSON-type credential parameters must parse correctly
  3. Missing Credential Detection: The system detects orphaned credential parameters where the referenced credential no longer exists in the vault

// Validation example from workflow execution
if (parameter.workflow_parameter_type === "credential") {
    if (value === null || value === undefined) {
        return "This field is required";
    }
}

Sources: skyvern-frontend/src/routes/workflows/RunWorkflowForm.tsx:1-100

Orphaned Credential Detection

The system provides warnings when workflow parameters reference credentials that no longer exist in the vault:

⚠️ my_credential (missing credential)

This warning helps identify workflows that need to be updated after credential deletion or vault changes.

Sources: skyvern-frontend/src/routes/workflows/editor/nodes/TaskNode/ParametersMultiSelect.tsx:1-50

Two-Factor Authentication (TOTP)

Skyvern supports automated Two-Factor Authentication through TOTP (Time-based One-Time Password) handling. This is critical for automating workflows that require 2FA verification.

Push TOTP Code Flow

  1. Initiate Push: When a task encounters a TOTP challenge, Skyvern can push a verification code to the user
  2. Code Entry: User receives the verification message (SMS, email, or authenticator app)
  3. Code Extraction: Skyvern extracts the code from the verification message
  4. Attachment: The code is automatically attached to the relevant workflow run

interface TOTPConfig {
    totp_identifier: string;  // Email or phone for receiving codes
    totp_url?: string;        // Direct verification URL if available
    totp_type: 'totp' | 'magic_link';
}

Sources: skyvern-frontend/src/routes/credentials/CredentialsTotpTab.tsx:1-80

TOTP Parameter Filtering

The credential management interface supports filtering TOTP credentials by:

  • Identifier: Filter by email or phone number
  • OTP Type: Filter by numeric code or magic link

MCP Integration

Skyvern's Model Context Protocol (MCP) tools provide programmatic access to credential management:

{
  "mcpServers": {
    "skyvern": {
      "type": "streamable-http",
      "url": "https://api.skyvern.com/mcp/",
      "headers": { "x-api-key": "YOUR_API_KEY" }
    }
  }
}

Available MCP Credential Tools

| Tool | Description |
| --- | --- |
| skyvern_credential_list | List all credentials in the vault |
| skyvern_credential_get | Retrieve a specific credential |
| skyvern_credential_delete | Remove a credential from the vault |
| skyvern_login | Authenticate using stored credentials |

Supported vault integrations: Skyvern vault, Bitwarden, 1Password, and Azure Key Vault with automatic 2FA/TOTP support.

Sources: integrations/mcp/README.md:1-80

Security Considerations

Browser Tunneling Security

When exposing Skyvern through browser tunneling, ensure API key authentication is enabled:

WARNING: Always use --api-key when exposing your browser via a tunnel. Without it, anyone with the URL has full control of your browser.

Sources: README.md:1-100

Credential Masking

Sensitive credential data is masked in UI displays:

  • Tokens longer than 8 characters are truncated: sk_live_xxx...
  • Full values are never displayed in logs or error messages

Sources: skyvern-frontend/src/components/CustomCredentialServiceConfigForm.tsx:20-35
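
The truncation rule above can be sketched in a few lines. The exact output format is an assumption: this example keeps the first 8 characters and appends `...`, and it leaves tokens of 8 characters or fewer untouched because the rule as stated only covers longer tokens. The real component may differ.

```python
def mask_token(token: str, visible: int = 8) -> str:
    """Truncate tokens longer than `visible` characters for display."""
    if len(token) <= visible:
        return token
    return token[:visible] + "..."
```

For example, `mask_token("sk_live_abcdef123")` keeps only the `sk_live_` prefix followed by an ellipsis.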

External Vault Security

When using external credential services:

  1. Store API tokens securely (environment variables preferred)
  2. Use HTTPS for all credential service communications
  3. Implement IP allowlisting where supported
  4. Rotate credentials regularly

Configuration Reference

Environment Variables

| Variable | Description |
| --- | --- |
| SKYVERN_API_KEY | API key for Skyvern authentication |
| SKYVERN_BASE_URL | Base URL for self-hosted deployments |
| SKYVERN_TELEMETRY | Set to false to opt out of telemetry |

Credential Service Configuration

| Field | Required | Description |
| --- | --- | --- |
| api_base_url | Yes (custom) | Base URL of the credential service |
| api_token | Yes (custom) | Authentication token |
| token_type | No | Type of authentication token |
| tested_url | No | URL used to test credential validity |

Best Practices

  1. Use Type-Specific Credentials: Store credentials with appropriate types (password, credit_card, secret) for better organization and retrieval
  2. Implement Custom Services for Enterprise: For large-scale deployments, implement a custom credential service for centralized management
  3. Enable TOTP Automation: Configure TOTP handling for automated 2FA workflows
  4. Monitor Orphaned Parameters: Regularly check for and clean up orphaned credential references
  5. Rotate API Tokens: Periodically rotate API tokens for custom credential services
  6. Leverage Bitwarden for Existing Teams: If your team already uses Bitwarden, integrate it to avoid credential duplication

Sources: skyvern-frontend/src/routes/workflows/components/CredentialSelector.tsx:1-100

LLM Provider Configuration

Related topics: AI-Powered Commands

Skyvern leverages Large Language Models (LLMs) as the core intelligence engine for AI-powered browser automation. The LLM Provider Configuration system provides a flexible abstraction layer that enables Skyvern to connect with multiple LLM providers including OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, and Google Gemini. This architecture decouples the automation logic from specific LLM implementations, allowing users to select their preferred provider without modifying core application code.

Supported LLM Providers

Skyvern supports a comprehensive range of LLM providers to accommodate diverse enterprise requirements and budget considerations. The framework utilizes litellm as a unified transport layer, which normalizes API interactions across different providers through a consistent interface.

| Provider | Supported Models |
| --- | --- |
| OpenAI | GPT-5.5, GPT-5.4, GPT-5, GPT-4.1, o3, o4-mini |
| Anthropic | Claude 4.7 Opus, Claude 4.6 (Sonnet, Opus), Claude 4.5 (Haiku, Sonnet, Opus) |
| Azure OpenAI | Any GPT models deployed to your Azure subscription |
| AWS Bedrock | Claude 4.7, Claude 4.6 (Sonnet, Opus), Claude 4.5 (Sonnet, Opus) |
| Google Gemini | Gemini 3.1 Pro, Gemini 3 Flash |

Sources: README.md:1-20

Provider Selection Criteria

When selecting an LLM provider for Skyvern deployments, consider the following factors:

  • OpenAI: Strong general-purpose performance with the broadest model availability.
  • Anthropic: The Claude series excels in instruction following and extended reasoning, making it particularly suitable for complex multi-step browser automation workflows.
  • Azure OpenAI: Enterprise-grade security and compliance features, with the ability to use custom model deployments.
  • AWS Bedrock: Seamless integration with other AWS services and HIPAA-compliant deployments.
  • Google Gemini: Competitive pricing with strong multimodal capabilities.

Configuration Architecture

The LLM Provider Configuration system follows a layered architecture that separates provider selection, credential management, and runtime dispatch. This design enables runtime provider switching and supports fallback mechanisms for production deployments.

graph TD
    A[Task Request] --> B[LLM API Handler]
    B --> C{LLM Provider Selection}
    C -->|OpenAI| D[OpenAI Transport]
    C -->|Anthropic| E[Anthropic Transport]
    C -->|Azure| F[Azure OpenAI Transport]
    C -->|AWS| G[Bedrock Transport]
    C -->|Gemini| H[Gemini Transport]
    D --> I[litellm Unified Interface]
    E --> I
    F --> I
    G --> I
    H --> I
    I --> J[Provider API Endpoint]

Core Configuration Components

The configuration system comprises several interconnected components that manage provider selection, authentication, and request handling. The API handler serves as the primary entry point for LLM interactions, coordinating between the task execution engine and the underlying transport layer. Models define the data structures for requests, responses, and provider-specific configurations. The litellm transport provides the unified interface that normalizes differences between provider APIs.

Environment Configuration

Basic Setup

LLM provider credentials are configured through environment variables in the .env file. After running skyvern quickstart or skyvern init, the setup wizard will guide you through provider selection and credential configuration.

# Required for OpenAI
OPENAI_API_KEY=sk-...

# Required for Anthropic
ANTHROPIC_API_KEY=sk-ant-...

# Required for Azure OpenAI
AZURE_OPENAI_API_KEY=your-azure-key
AZURE_OPENAI_BASE_URL=https://your-resource.openai.azure.com

# Required for AWS Bedrock
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_REGION=us-east-1

# Required for Gemini
GOOGLE_GENERATIVE_AI_API_KEY=your-gemini-key

Sources: README.md:1-50
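
Before starting a local deployment it can help to check which providers have a complete set of credentials. The sketch below uses only the variable names listed above; the provider labels (`"openai"`, `"bedrock"`, and so on) and both helper functions are hypothetical names chosen for this example, not part of Skyvern.

```python
import os
from collections.abc import Mapping

# Variable names taken from the listing above; provider labels are illustrative.
REQUIRED_VARS: dict[str, list[str]] = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "azure_openai": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_BASE_URL"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "gemini": ["GOOGLE_GENERATIVE_AI_API_KEY"],
}


def missing_vars(provider: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """List the variables a provider still needs (unset or empty)."""
    return [name for name in REQUIRED_VARS[provider] if not env.get(name)]


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the providers whose required variables are all set."""
    return [provider for provider in REQUIRED_VARS if not missing_vars(provider, env)]
```

Passing the environment as a mapping keeps the check easy to exercise with a plain dictionary, without touching the real process environment.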

Provider-Specific Configuration

OpenAI Configuration

For OpenAI, Skyvern supports both standard OpenAI endpoints and custom base URLs for proxy or gateway scenarios. Model selection can be specified at the task level or configured as the default in the environment.

Anthropic Configuration

Anthropic Claude models require the ANTHROPIC_API_KEY environment variable. The setup wizard can automatically configure this during initialization. Claude models are particularly well-suited for Skyvern's browser automation tasks due to their strong instruction-following capabilities.

Azure OpenAI Configuration

Azure OpenAI deployments require additional configuration for deployment-specific endpoints. The AZURE_OPENAI_BASE_URL should point to your Azure OpenAI resource endpoint, and the system supports any GPT models deployed to your Azure subscription.

AWS Bedrock Configuration

AWS Bedrock integration uses standard AWS credential chain resolution, including environment variables, IAM roles, and AWS profile configurations. The AWS_REGION variable determines which AWS region your Bedrock endpoints are hosted in.

Google Gemini Configuration

Gemini models are configured using the GOOGLE_GENERATIVE_AI_API_KEY. The framework supports both Gemini 3.1 Pro for complex reasoning tasks and Gemini 3 Flash for faster, cost-effective operations.

Provider Selection in Code

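When using Skyvern programmatically through the SDK, you can specify the LLM provider at task creation time. The framework will use the configured credentials for the selected provider. The example below passes no provider-specific arguments, so the task runs with the configured default.

import asyncio

from skyvern import Skyvern


async def main() -> None:
    skyvern = Skyvern(api_key="your-api-key")
    # run_task is a coroutine, so it must be awaited inside an event loop
    task = await skyvern.run_task(
        prompt="Find the top post on hackernews today",
    )
    print(task)


asyncio.run(main())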

Sources: README.md:50-80

Cloud vs Local Configuration

Skyvern supports two operational modes for LLM configuration. In Skyvern Cloud mode, the platform manages provider configuration and billing. In local mode, you configure your own LLM provider credentials, and Skyvern routes requests through your specified provider.

For local deployments, the setup wizard configures credentials automatically during initialization. For custom configurations, you can manually edit the .env file with your provider-specific credentials.

Advanced Configuration Options

Custom Endpoint Configuration

For enterprise deployments requiring proxy servers or custom API gateways, Skyvern supports base URL customization through provider-specific environment variables. This enables integration with internal LLM deployments, specialized inference endpoints, or regional API endpoints.

Multi-Provider Fallback

Production deployments can implement multi-provider fallback strategies by configuring multiple provider credentials. When the primary provider is unavailable, Skyvern can automatically route requests to backup providers based on priority configuration.
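
The fallback idea can be sketched independently of any particular provider. In the example below each provider is represented by a plain callable that takes a prompt and returns text; these callables, the `complete_with_fallback` function, and the `AllProvidersFailed` exception are hypothetical stand-ins. The source does not describe how Skyvern expresses priority configuration, so list order is used as the priority here.

```python
from collections.abc import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the priority list has failed."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try providers in priority order; return (provider_name, response)."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:
            # Record the failure and move on to the next provider in the list.
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Collecting every error message before raising makes it clear whether the outage was limited to the primary provider or affected the backups as well.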

Model Selection Per Task

Individual tasks can specify model preferences that override the default configuration. This enables cost optimization by using lighter models for simple tasks while reserving more capable models for complex automation sequences.

Credential Security

Credential management follows security best practices by storing sensitive information exclusively in environment variables. The .env file should never be committed to version control. Skyvern's initialization process creates the .env file from .env.example if it does not exist, ensuring template credentials are never exposed.

Sources: README.md:1-30

Troubleshooting

Common LLM provider configuration issues include incorrect API keys, network connectivity problems, and quota exhaustion. The setup wizard validates credentials during configuration to catch most issues early. For runtime errors, Skyvern provides detailed error messages that identify the specific provider and error type.

If you encounter authentication errors, verify that your API keys are correctly set in the .env file and that the corresponding provider account has sufficient credits or quota available.

Sources: README.md:1-20

Model Context Protocol (MCP) Integration

Related topics: LLM Provider Configuration

Overview

Skyvern's Model Context Protocol (MCP) integration enables AI applications to connect to Skyvern's browser automation capabilities. This integration allows AI-powered applications to perform browser-based tasks such as filling out forms, downloading files, researching information on the web, and executing complex web automation workflows through natural language commands.

The MCP server implementation serves as a bridge between AI applications and Skyvern's browser engine, providing a standardized interface for browser automation tasks.

Sources: integrations/mcp/README.md

Architecture

The MCP integration supports multiple deployment models and connection methods:

Connection Modes

| Mode | Description | Use Case |
| --- | --- | --- |
| Skyvern Cloud | Connect to managed cloud service | Production without self-hosting |
| Local Skyvern Server | Self-hosted deployment | Development, privacy, custom infrastructure |

MCP Client Configuration

Cloud Configuration (streamable-http)

{
  "mcpServers": {
    "skyvern": {
      "type": "streamable-http",
      "url": "https://api.skyvern.com/mcp/",
      "headers": { "x-api-key": "YOUR_API_KEY" }
    }
  }
}

Local Configuration

{
  "mcpServers": {
    "skyvern": {
      "command": "python3",
      "args": ["-m", "skyvern", "run", "mcp"],
      "env": {
        "SKYVERN_BASE_URL": "http://localhost:8000",
        "SKYVERN_API_KEY": "YOUR_API_KEY"
      }
    }
  }
}

Sources: skyvern/cli/mcp_tools/README.md

Available MCP Tools

Browser Session Management

| Tool | Description |
| --- | --- |
| skyvern_browser_session_create | Create a new browser session |
| skyvern_browser_session_close | Close an existing browser session |
| skyvern_browser_session_list | List all active browser sessions |
| skyvern_browser_session_get | Get details of a specific session |
| skyvern_browser_session_connect | Connect to an existing session |

Browser Actions

| Tool | Description |
| --- | --- |
| skyvern_act | Execute natural language actions |
| skyvern_navigate | Navigate to a URL |
| skyvern_click | Click on an element |
| skyvern_type | Type text into a field |
| skyvern_hover | Hover over an element |
| skyvern_scroll | Scroll the page |
| skyvern_select_option | Select an option from dropdown |
| skyvern_press_key | Press a keyboard key |
| skyvern_drag | Drag an element |
| skyvern_file_upload | Upload a file |
| skyvern_wait | Wait for page to load |

Data Extraction & Validation

| Tool | Description |
| --- | --- |
| skyvern_extract | Extract structured JSON data from page |
| skyvern_screenshot | Take a screenshot |
| skyvern_find | Find elements on the page |
| skyvern_validate | Validate page content |
| skyvern_evaluate | Run JavaScript code |
| skyvern_get_html | Get page HTML |

Sources: skyvern/cli/mcp_tools/README.md

Quick Start Guide

Prerequisites

REQUIREMENT: Skyvern currently runs only in a Python 3.11 environment.

Installation Steps

  1. Install Skyvern

    pip install skyvern

  2. Configure Skyvern

Run the setup wizard, which will guide you through the configuration process:

    skyvern init

You can connect to either Skyvern Cloud or a local version of Skyvern.

  3. Launch Local Server (Optional)

Only required in local mode:

    skyvern run server

Sources: integrations/mcp/README.md

Claude Desktop Integration

Skyvern provides a downloadable .mcpb bundle that installs Skyvern Cloud into Claude Desktop without requiring the user to install Node.js.

Building the MCP Bundle

./scripts/package-mcpb.sh 1.0.23

Publishing to Releases

./scripts/package-mcpb.sh 1.0.23 skyvern-claude-desktop.mcpb \
  skyvern/cli/mcpb/releases/skyvern-claude-desktop.mcpb

Sources: skyvern/cli/mcpb/claude_desktop/README.md

Usage Patterns

Natural Language Actions

The skyvern_act tool allows you to describe actions in natural language, which Skyvern's AI interprets and executes:

"Click the login button"
"Fill in the email field with user@example.com"
"Select 'Premium' from the subscription dropdown"

Data Extraction

Use skyvern_extract to extract structured JSON data from web pages by describing the data you need:

"Extract all product names, prices, and ratings"

Screenshot and Validation Loops

For debugging and verification, use screenshot + validate loops:

# Take screenshot
screenshot = skyvern_screenshot()

# Validate content
validation = skyvern_validate("The login form is visible")

# If validation fails, take another screenshot for debugging
if not validation.success:
    screenshot = skyvern_screenshot()

Integration with AI Applications

The MCP integration enables AI applications to:

  • Automate form filling: Submit complex forms with AI-guided input
  • Research web content: Extract structured data from multiple sources
  • Download files: Navigate to and download files from websites
  • Execute workflows: Run browser automation workflows
  • Handle 2FA flows: Manage TOTP (Time-based One-Time Password) authentication

Credential Management

Skyvern's MCP tools support secure credential management for login flows:

| Credential Type | Usage Pattern |
| --- | --- |
| Password | {{ my_credential.username }} / {{ my_credential.password }} |
| Secret | {{ my_secret.secret_value }} |
| Custom Service | Configure via CustomCredentialServiceConfigForm |

API Reference

HTTP Request Block Tips

When using HTTP request blocks with MCP tools:

  • Use "Import cURL" to quickly convert API documentation examples
  • Use "Quick Headers" to add common authentication and content headers
  • The request will return response data including status, headers, and body
  • Reference response data in later blocks with parameters

Response Data

Each HTTP request block response includes:

Field | Description
status | HTTP status code
headers | Response headers
body | Response body content
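One way to work with the status, headers, and body fields listed above is to wrap them in a small typed container before passing values to later steps. The `HttpBlockResponse` class below is a hypothetical helper for illustration, not part of Skyvern.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class HttpBlockResponse:
    """Hypothetical container for the three documented response fields."""
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: Any = None

    @property
    def ok(self) -> bool:
        # Treat any 2xx status code as success.
        return 200 <= self.status < 300

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "HttpBlockResponse":
        return cls(
            status=int(data["status"]),
            headers=dict(data.get("headers", {})),
            body=data.get("body"),
        )


# Sample payload shaped like the documented fields (values are made up).
raw = {"status": 200, "headers": {"content-type": "application/json"},
       "body": {"order_id": 42}}
response = HttpBlockResponse.from_dict(raw)
if response.ok:
    print(response.body["order_id"])
```

Checking `ok` before reading the body avoids feeding an error payload into a later block.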

Workflow Integration

MCP tools can be integrated into Skyvern workflows for:

  • Browser automation blocks: Execute MCP actions as part of workflow steps
  • Conditional logic: Use validation results to control workflow branching
  • Data extraction: Feed extracted data into subsequent workflow blocks
  • Scheduled execution: Run MCP-powered workflows on cron schedules

Best Practices

  1. Session Management: Always close browser sessions when done to free resources
  2. Error Handling: Use validation tools to check page state before proceeding
  3. Screenshot Debugging: Take screenshots at key points for debugging failed automations
  4. Credential Security: Use environment variables and secure credential storage
  5. Rate Limiting: Be mindful of API rate limits when making frequent requests
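The session-management advice can be enforced with a context manager, so the browser session is closed even when a step raises. The sketch below assumes a hypothetical `call_tool(name, arguments)` callable, and the tool names `skyvern_session_create` and `skyvern_session_close` and the `session_id` key are assumptions; check the server's tool list for the actual names.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical signature for "send this MCP tool call and return its result".
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


@contextmanager
def browser_session(call_tool: ToolCaller) -> Iterator[Dict[str, Any]]:
    """Open a browser session and guarantee that it is closed afterwards."""
    session = call_tool("skyvern_session_create", {})  # assumed tool name
    try:
        yield session
    finally:
        # Runs on success and on failure, freeing the browser resources.
        call_tool("skyvern_session_close",  # assumed tool name
                  {"session_id": session["session_id"]})


# Stand-in tool caller so the sketch runs without a server.
calls: List[str] = []


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    calls.append(name)
    return {"session_id": "demo-session"}


try:
    with browser_session(fake_call_tool) as session:
        fake_call_tool("skyvern_act", {"prompt": "Click the login button"})
        raise RuntimeError("simulated automation failure")
except RuntimeError:
    pass

print(calls)
```

The close call is issued even though the body raised, which is the behavior the first best practice asks for.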

Sources: integrations/mcp/README.md

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

High: what ensures it’s the correct one in that context?

The project may affect permissions, credentials, data exposure, or host boundaries.

Medium: Release v1.0.29

First-time setup may fail or require extra isolation and rollback planning.

Medium: Task Execution Performance: Seeking guidance on optimizing execution speed

First-time setup may fail or require extra isolation and rollback planning.

Medium: [Feature Request] Multi-session VNC support for local/self-hosted deployments (Live view & Take Control)

First-time setup may fail or require extra isolation and rollback planning.

Doramagic extracted 16 source-linked risk signals. Review them before installing or handing real data to the project.

1. Security or permission risk: what ensures it’s the correct one in that context?

  • Severity: high
  • Finding: Security or permission risk is backed by a source signal: what ensures it’s the correct one in that context? Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/Skyvern-AI/skyvern/issues/5637

2. Installation risk: Release v1.0.29

  • Severity: medium
  • Finding: Installation risk is backed by a source signal: Release v1.0.29. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/Skyvern-AI/skyvern/releases/tag/v1.0.29

3. Installation risk: Task Execution Performance: Seeking guidance on optimizing execution speed

  • Severity: medium
  • Finding: Installation risk is backed by a source signal: Task Execution Performance: Seeking guidance on optimizing execution speed. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/Skyvern-AI/skyvern/issues/4375

4. Installation risk: [Feature Request] Multi-session VNC support for local/self-hosted deployments (Live view & Take Control)

  • Severity: medium
  • Finding: Installation risk is backed by a source signal: [Feature Request] Multi-session VNC support for local/self-hosted deployments (Live view & Take Control). Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/Skyvern-AI/skyvern/issues/4392

5. Configuration risk: Performance bottleneck: High latency for simple form-filling workflows

  • Severity: medium
  • Finding: Configuration risk is backed by a source signal: Performance bottleneck: High latency for simple form-filling workflows. Treat it as a review item until the current version is checked.
  • User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/Skyvern-AI/skyvern/issues/4439

6. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | art_9274907e6629499384a5a574e4caa877 | https://github.com/Skyvern-AI/skyvern#readme | README/documentation is current enough for a first validation pass.

7. Maintenance risk: Release v1.0.34

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Release v1.0.34. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/Skyvern-AI/skyvern/releases/tag/v1.0.34

8. Maintenance risk: Release v1.0.35

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Release v1.0.35. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/Skyvern-AI/skyvern/releases/tag/v1.0.35

9. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | art_9274907e6629499384a5a574e4caa877 | https://github.com/Skyvern-AI/skyvern#readme | last_activity_observed missing

10. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | art_9274907e6629499384a5a574e4caa877 | https://github.com/Skyvern-AI/skyvern#readme | no_demo; severity=medium

11. Security or permission risk: No sandbox install has been executed yet; downstream must verify before user use.

  • Severity: medium
  • Finding: No sandbox install has been executed yet; downstream must verify before user use.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.safety_notes | art_9274907e6629499384a5a574e4caa877 | https://github.com/Skyvern-AI/skyvern#readme | No sandbox install has been executed yet; downstream must verify before user use.

12. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.scoring_risks | art_9274907e6629499384a5a574e4caa877 | https://github.com/Skyvern-AI/skyvern#readme | no_demo; severity=medium

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12 (the count of project-level external discussion links exposed on this manual page).

Use: Review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using skyvern with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence