Agent SDK and Runtime · Preview

promptfoo

Agent SDK project for checking tool calls, state, handoffs, traces, evaluation, and permission boundaries.

Agent SDK · Tool calls · Handoffs · Tracing · Evaluation boundaries

Best fit: Developers building observable, testable, multi-tool agent applications.

Check whether this project matches your task before installing it.

What it can do: Agent runtime preflights, tool permissions, state/handoff boundaries, trace acceptance, and evaluation checks.

Review the portable capability path.

Before continuing: Verify in a sandbox

Do not treat a preview pack as a proven local install.

GitHub snapshot: 22k stars

2.0k forks · 299 contributors

Doramagic.ai · Last verification date: 2026-06-21 · Verification method: source evidence, semantic profile, public page gate, and static build acceptance.

Preview status · 2026-06-21

What is promptfoo?

promptfoo is listed here as an Agent SDK or runtime project covering tool calls, state, handoffs, tracing, and evaluation boundaries; its own manifest describes it as an "LLM eval & testing toolkit".
Best fit: Developers building observable, testable, multi-tool agent applications.
Not for: Single-prompt use, simple API calls, or environments that cannot isolate tool permissions.
Capability added to an AI workflow: Agent runtime preflights, tool permissions, state/handoff boundaries, trace acceptance, and evaluation checks.
First safe verification step: Verify one minimal agent loop with fake tools and temporary credentials first.
Verification state: source, Quick Start, and sandbox install checks are recorded as passed.
Top risk: Upgrading or migrating may change expected behavior (flagged release: 0.121.8).
Evidence base: https://github.com/promptfoo/promptfoo, https://github.com/promptfoo/promptfoo#readme, Human Manual, Pitfall Log
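
The first safe verification step listed above (one minimal agent loop with fake tools) could look roughly like the following. This is a sketch in plain Python and does not use promptfoo's API: every name in it (`Trace`, `run_agent`, `fake_search`, `fake_delete`, `ALLOWED`) is hypothetical, and the scripted plan stands in for whatever model or runtime you actually test. Temporary credentials are not modeled here, because the fake tools need none.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trace:
    """Records every attempted tool call so it can be reviewed afterwards."""
    events: list[tuple[str, str, str]] = field(default_factory=list)


def fake_search(query: str) -> str:
    # Fake tool: returns a canned answer, touches no network and no credentials.
    return f"stub result for {query!r}"


def fake_delete(path: str) -> str:
    # Fake destructive tool; it exists only to show the permission check blocking it.
    return f"deleted {path}"


# Hypothetical tool registry and permission boundary for this sketch.
TOOLS: dict[str, Callable[[str], str]] = {"search": fake_search, "delete": fake_delete}
ALLOWED = {"search"}


def run_agent(plan: list[tuple[str, str]], trace: Trace) -> list[str]:
    """Run a scripted plan of (tool, argument) steps, enforcing the allowlist."""
    outputs = []
    for tool_name, argument in plan:
        if tool_name not in ALLOWED:
            # Denied calls are recorded but never executed.
            trace.events.append((tool_name, argument, "denied"))
            continue
        outputs.append(TOOLS[tool_name](argument))
        trace.events.append((tool_name, argument, "ok"))
    return outputs


if __name__ == "__main__":
    trace = Trace()
    print(run_agent([("search", "weather"), ("delete", "/tmp/x")], trace))
    print(trace.events)
```

The point of the exercise is that the trace shows both the permitted call and the denied one, so the permission boundary is checked before any real tool or credential is attached.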

Quick decision

Use this section to decide whether the project is worth a deeper read.

Best for: Developers building observable, testable, multi-tool agent applications.

Match the project to your task before installing it.

Capability: Agent runtime preflights, tool permissions, state/handoff boundaries, trace acceptance, and evaluation checks.

Repository: promptfoo/promptfoo

22k stars · 2.0k forks

What it can do

The upstream project, translated into concrete capabilities you can judge before installing.

Core Evaluation Engine & Architecture

Related topics: LLM Provider Ecosystem & Custom Integrations, Web UI, Code Scanning, Server & Deployment

Source: https://github.com/promptfoo/promptfoo / Human Manual

LLM Provider Ecosystem & Custom Integrations

Related topics: Core Evaluation Engine & Architecture, Red Teaming & Adversarial Security Testing

Source: https://github.com/promptfoo/promptfoo / Human Manual

Red Teaming & Adversarial Security Testing

Related topics: LLM Provider Ecosystem & Custom Integrations, Web UI, Code Scanning, Server & Deployment

Source: https://github.com/promptfoo/promptfoo / Human Manual

Web UI, Code Scanning, Server & Deployment

Related topics: Core Evaluation Engine & Architecture, Red Teaming & Adversarial Security Testing

Source: https://github.com/promptfoo/promptfoo / Human Manual

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Source: Doramagic discovery, validation, and Project Pack records

Sources: https://github.com/promptfoo/promptfoo, Human Manual, Project Pack evidence, and downstream validation signals.

Community Discussion Evidence

Project-level external discussion stays visible on the detail page, not only inside the manual.

Stars: 22k
Forks: 2.0k
Contributors: 299
License: unknown

12 source-linked items

Review these external discussions before using promptfoo with real data or production workflows. They are review inputs, not standalone proof that the project is production-ready.

01 · Per-test-case `repeat` option to control how many times individual tests run · github / github_issue
02 · code-scan-action: 0.1.8 · github / github_release
03 · 0.121.17 · github / github_release
04 · 0.121.16 · github / github_release
05 · 0.121.15 · github / github_release
06 · 0.121.14 · github / github_release
07 · code-scan-action: 0.1.7 · github / github_release
08 · 0.121.13 · github / github_release
09 · code-scan-action: 0.1.6 · github / github_release
10 · 0.121.12 · github / github_release
11 · 0.121.11 · github / github_release
12 · 0.121.10 · github / github_release

How to start

Only source-backed commands are shown here. Verify them in an isolated environment first.

Try the prompt first

Test the workflow without installing the upstream project.

Read the Human Manual

Understand inputs, outputs, limits, and failure modes.

Take context to your AI host

Use the compiled assets in your preferred AI environment.

Run sandbox verification

Confirm install commands and rollback before using a primary environment.

npm install -g promptfoo

Official start command · https://github.com/promptfoo/promptfoo#readme · verified: yes
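
One way to run the sandbox verification step is to redirect the install command above into a throwaway directory and delete that directory afterwards as the rollback. The sketch below assumes npm is on your PATH and relies on npm's standard `--prefix` option; the helper names (`sandbox_install_command`, `verify_in_sandbox`) are hypothetical, and using a temporary prefix is an assumption about how you want to isolate the install, not an upstream instruction.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def sandbox_install_command(prefix: Path) -> list[str]:
    """Build the page's install command, redirected into a throwaway prefix."""
    # `--prefix` is a standard npm config option; its use here is this sketch's choice.
    return ["npm", "install", "-g", "promptfoo", "--prefix", str(prefix)]


def verify_in_sandbox(execute: bool = False) -> list[str]:
    """Create a temporary prefix, optionally install into it, then roll back."""
    prefix = Path(tempfile.mkdtemp(prefix="promptfoo-sandbox-"))
    command = sandbox_install_command(prefix)
    try:
        if execute:
            # Raises CalledProcessError if the install fails.
            subprocess.run(command, check=True)
    finally:
        # Rollback: remove everything the install wrote, even after a failure.
        shutil.rmtree(prefix, ignore_errors=True)
    return command


if __name__ == "__main__":
    # Dry run by default: prints the command without touching the network.
    print(" ".join(verify_in_sandbox(execute=False)))
```

Call `verify_in_sandbox(execute=True)` only inside an isolated environment. Any smoke check you want to run against the installed binary belongs between the install and the rollback.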

Human Manual

8+ sections · Human Manual

promptfoo Manual

Promptfoo is described in its manifest as an "LLM eval & testing toolkit" distributed as a Node.js ES module with dual entry points for import and require, and ships CLI binaries promptfoo...

https://github.com/promptfoo/promptfoo Project Manual
Table of Contents
Core Evaluation Engine & Architecture
Related Pages
Purpose and Scope
MCP Tool Surface
Provider and Assertion Architecture
Redteam Subsystem

AI Context Pack and portable assets

After deciding to continue, take the project context into your own AI host.

Complete pack plus user-owned assets

These files are planning and verification assets for Claude Code, Codex, Gemini, Cursor, ChatGPT, and other AI hosts.

Bundle: Complete Project Pack
Asset: AI Context Pack
Asset: Boundary & Risk Card
Asset: Human Manual
Asset: Pitfall Log
Asset: Prompt Preview
Asset: Quick Start Evidence
REPO_INSPECTION.json

Preflight checks

Treat this page as a planning asset, not proof that your local environment is ready.

The manual is generated from source-linked project files and Doramagic validation signals.
Community evidence warnings stay visible instead of being converted into marketing claims.
This preview remains noindex and excluded from sitemap/llms citation targets until English quality and index gates pass.
Use the upstream repository as the final authority for installation commands, license, and version-specific behavior.

Pitfall Log and verification risks

Doramagic surfaces high-risk items before users treat a candidate capability as verified.

medium · Installation risk requires verification · Upgrade or migration may change expected behavior: 0.121.8
medium · Installation risk requires verification · Upgrade or migration may change expected behavior: code-scan-action: 0.1.6
medium · Configuration risk requires verification · May increase setup, validation, or first-run risk for the user.
medium · Configuration risk requires verification · Upgrade or migration may change expected behavior: 0.121.15
medium · Configuration risk requires verification · Developers may misconfigure credentials, environment, or host setup: Per-test-case `repeat` option to control how many times individual tests run
medium · Configuration risk requires verification · May increase setup, validation, or first-run risk for the user.
medium · Capability evidence risk requires verification · May increase setup, validation, or first-run risk for the user.
medium · Runtime risk requires verification · Upgrade or migration may change expected behavior: 0.121.12
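
Because several of these risks are tied to specific release numbers, it can help to list which releases are newer than the version you last reviewed. The sketch below uses only version strings that appear on this page; treating 0.121.8 as the reviewed baseline is an assumption, and the helper names (`parse_version`, `releases_newer_than`) are hypothetical. Versions are parsed into integer tuples because plain string comparison would wrongly order "0.121.10" before "0.121.8".

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.121.8' into (0, 121, 8) so comparison is numeric, not lexical."""
    return tuple(int(part) for part in text.strip().split("."))


def releases_newer_than(reviewed: str, releases: list[str]) -> list[str]:
    """Return releases strictly newer than the reviewed version, oldest first."""
    baseline = parse_version(reviewed)
    newer = [r for r in releases if parse_version(r) > baseline]
    return sorted(newer, key=parse_version)


# Version strings copied from this page; choosing this baseline is an assumption.
REVIEWED = "0.121.8"
LISTED = ["0.121.17", "0.121.16", "0.121.15", "0.121.14",
          "0.121.13", "0.121.12", "0.121.11", "0.121.10"]

if __name__ == "__main__":
    for version in releases_newer_than(REVIEWED, LISTED):
        print(f"re-verify before upgrading to {version}")
```

The same check works for any pinned baseline: pass the version you actually validated in your sandbox as `reviewed`.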