Doramagic Project Pack · Human Manual

prompttools

Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).

Overview, Supported Integrations, and Quickstart

Related topics: Core Experiments API: LLMs, Vector Databases, and Frameworks, Playground, Widgets, and Visualization

1. Project Overview

prompttools is an open-source library created by Hegel AI that provides self-hostable tools for experimenting with, testing, and evaluating LLMs, vector databases, and prompts. The project positions itself around three familiar interfaces: code, notebooks, and a local playground, so that developers can rapidly compare prompts, models, and retrieval configurations with minimal boilerplate Source: [README.md].

A core design choice is that all executions and calls to LLM services happen locally on the user's machine. The library explicitly states it does not forward requests or log user information Source: [prompttools/playground/README.md]. By default, the package does emit error telemetry to Sentry to track its own reliability issues; this can be disabled with the SENTRY_OPT_OUT environment variable Source: [README.md].
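As a minimal sketch of the opt-out, the variable can be set from Python before the package is imported. The value "1" is an assumption (the README only names the variable), and setting it before the import is a conservative guess about when the check happens.

```python
import os

# Opt out of prompttools' Sentry error telemetry.
# Assumptions: the variable should be set before `import prompttools`,
# and "1" is an arbitrary placeholder value.
os.environ.setdefault("SENTRY_OPT_OUT", "1")

# import prompttools  # import only after the variable is set
```

Exporting the variable in the shell or in a CI configuration before starting Python achieves the same thing without touching code.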

A representative entry point is the OpenAIChatExperiment class, which can be instantiated with models, messages, and parameters, then executed and visualized in a few lines Source: [README.md].

2. Supported Integrations

The README maintains an explicit matrix of integrations grouped into three categories. The table below summarizes the supported surface as documented:

| Category | Component | Status |
| --- | --- | --- |
| LLMs | OpenAI (Completion, ChatCompletion, Fine-tuned models) | Supported |
| LLMs | LLaMA.Cpp (LLaMA 1, LLaMA 2) | Supported |
| LLMs | HuggingFace (Hub API, Inference Endpoints) | Supported |
| LLMs | Anthropic | Supported |
| LLMs | Mistral AI | Supported |
| LLMs | Google Gemini | Supported |
| LLMs | Google PaLM (legacy) | Supported |
| LLMs | Google Vertex AI | Supported |
| LLMs | Azure OpenAI Service | Supported |
| LLMs | Replicate | Supported |
| LLMs | Ollama | In Progress |
| Vector DBs | Chroma, Weaviate, Qdrant, LanceDB, Pinecone | Supported |
| Vector DBs | Milvus | Exploratory |
| Vector DBs | Epsilla | In Progress |
| Frameworks | LangChain, MindsDB | Supported |

Source: README.md

Release-Driven Additions

Versioned releases in the public changelog show how the supported surface has expanded:

  • v0.0.35 introduced Google Vertex AI, Azure OpenAI Service, Replicate, Stable Diffusion, Pinecone, Qdrant, and RAG experiments, alongside utility helpers chunk_text, autoeval_with_documents, and structural_similarity Source: [README.md].
  • v0.0.41 launched the hosted Playground (private beta), persisting experiments with version control and team collaboration features Source: [README.md].
  • v0.0.45 added observability features via import prompttools.logger, intended for monitoring production LLM usage Source: [README.md].

Community-Reported Integration Gaps

The community tracker surfaces requested integrations that are not yet shipped. Tracked feature requests include Ollama (Issue #39), Microsoft Semantic-Kernel (Issue #114), the OpenAI Image Generation API (Issue #113), MusicGen/audio model evaluation (Issue #82), and broader LangChain harnesses (Issue #5). Ollama is explicitly listed as "In Progress" in the official matrix Source: [README.md].

3. Quickstart

Installation

The package is distributed on PyPI and installed via pip Source: [README.md]:

pip install prompttools

To run the playground locally, the repo must be cloned and the Streamlit dependency installed separately because the playground is not bundled as a runtime dependency of the pip package. Community issue #126 reports that streamlit is missing from the main requirements, so the documented command sequence is:

git clone https://github.com/hegelai/prompttools.git
cd prompttools && pip install -r prompttools/playground/requirements.txt
streamlit run prompttools/playground/playground.py

Source: prompttools/playground/README.md

Minimal Code Example

The canonical first example uses OpenAIChatExperiment. The user supplies a list of message threads, a list of model identifiers, and a parameter grid (e.g., temperature values), then calls .run() followed by .visualize() Source: [README.md]:

from prompttools.experiment import OpenAIChatExperiment

messages = [
    [{"role": "user", "content": "Tell me a joke."}],
    [{"role": "user", "content": "Is 17077 a prime number?"}],
]

models = ["gpt-3.5-turbo", "gpt-4"]
temperatures = [0.0]
openai_experiment = OpenAIChatExperiment(models, messages, temperature=temperatures)
openai_experiment.run()
openai_experiment.visualize()

Notebook Examples

A curated set of runnable notebooks is available under examples/notebooks/. Coverage spans single-model LLM experiments (OpenAI Chat, Anthropic, PaLM 2, Vertex AI, LLaMA.Cpp, HuggingFace Hub), function calling, regression testing, human feedback, and multimodal cases such as Stable Diffusion Source: [examples/notebooks/README.md]. Vector database notebooks under vectordb_experiments/ exercise ChromaDB, Weaviate, LanceDB, Qdrant, and Pinecone, while framework notebooks demonstrate LangChain sequential chains, router chains, and MindsDB integration Source: [examples/notebooks/README.md].

Known Setup Pitfalls

Several community-reported issues affect first-run experience:

  • Streamlit deprecation warnings — Issue #124 and #127 report that the playground uses st.experimental_get_query_params/st.experimental_set_query_params, which were slated for removal after 2024-04-11. Users should migrate to st.query_params Source: [prompttools/playground/README.md].
  • LanceDB import breakage — Issue #132 documents a crash on from prompttools.experiment import LanceDBExperiment in the latest version, with a traceback screenshot attached.
  • OpenAI client compatibility — Issue #122 reports AttributeError: module 'openai' has no attribute 'types' after a clean clone, indicating a need to pin or upgrade the openai package.
  • Notebook dependency drift: Issues #121 and #116 document that notebook examples require additional packages such as pandas==1.5.3, fastapi, kaleido, uvicorn, cohere, and tiktoken, and that AzureOpenAIService notebooks must supply both model and prompt (optionally together with the stream flag).
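Before debugging any of the issues above, it can help to see which versions are actually installed. The following is a small standard-library sketch; the package list is illustrative and `installed_versions` is a hypothetical helper, not part of prompttools.

```python
from importlib import metadata

# Packages named in the issues above; the selection is illustrative.
PACKAGES = ["openai", "streamlit", "pandas", "lancedb"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:<10} {version or 'not installed'}")
```

Comparing this output against the pins reported in the issues makes it easier to tell a version mismatch from a genuine bug.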

4. Utility Layer

Beyond experiment drivers, prompttools.utils exposes a curated set of evaluation and text utilities, re-exported through prompttools/utils/__init__.py. The names surfaced in the public namespace include autoeval_binary_scoring, autoeval_with_documents, chunk_text, semantic_similarity, cos_similarity, validate_json_response, validate_python_response, and apply_moderation Source: [prompttools/utils/__init__.py].

Source: https://github.com/hegelai/prompttools / Human Manual

Core Experiments API: LLMs, Vector Databases, and Frameworks

Related topics: Overview, Supported Integrations, and Quickstart, Utilities, Harness, PromptTest, and Observability

Overview and Purpose

The Core Experiments API is the central public surface of prompttools. It exposes a uniform Experiment abstraction that lets developers sweep prompts, model parameters, and embedding/retrieval configurations across heterogeneous backends, then collect, evaluate, and visualize the results in a pandas DataFrame. Source: README.md.

The library targets three backend families that share the same call pattern (.run() + .visualize()):

| Family | Purpose | Example backends |
| --- | --- | --- |
| LLMs | Compare prompt/model/parameter combinations | OpenAI, Anthropic, LLaMA.Cpp, HuggingFace, Mistral, Google Gemini/PaLM/Vertex, Azure OpenAI, Replicate |
| Vector Databases | Evaluate retrieval quality and embedding functions | Chroma, Weaviate, Qdrant, LanceDB, Pinecone (Milvus exploratory, Epsilla in progress) |
| Frameworks | Test composed chains and agents | LangChain, MindsDB (LlamaIndex exploratory) |

Source: README.md (Supported Integrations section).

Architecture and Data Flow

All experiment classes inherit from a common Experiment base class that defines the run loop, the evaluation hooks, and the persistence methods (to_csv, to_json, to_lora_json, to_mongo_db). Source: README.md.

flowchart LR
    A[User code<br/>inputs + param grid] --> B[Experiment subclass<br/>e.g. OpenAIChatExperiment]
    B --> C[.run()]
    C --> D[Backend SDK<br/>OpenAI / Anthropic / Chroma / LangChain]
    D --> E[pandas DataFrame<br/>prompt, response, metadata]
    E --> F[Evaluation utilities<br/>similarity, autoeval, moderation]
    F --> G[.visualize() / .to_csv / .to_json]
    E --> H[Hosted Playground / Observability<br/>import prompttools.logger]

The pipeline is deliberately local-first: API calls originate from the user's machine and only the experiment framework code is loaded. Source: README.md (FAQ: "Will this library forward my LLM calls to a server...?").

LLM Experiments

LLM experiments are constructed from three orthogonal axes: the message/prompt list, the model identifier list, and a parameter dictionary (e.g., temperature). The OpenAIChatExperiment shown in the README demonstrates the canonical pattern. Source: README.md.

from prompttools.experiment import OpenAIChatExperiment

messages = [
    [{"role": "user", "content": "Tell me a joke."}],
    [{"role": "user", "content": "Is 17077 a prime number?"}],
]
models = ["gpt-3.5-turbo", "gpt-4"]
temperatures = [0.0]
exp = OpenAIChatExperiment(models, messages, temperature=temperatures)
exp.run()
exp.visualize()
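The three axes above can be read as a grid in which every combination is one call. The sketch below shows that expansion with the standard library only; it assumes the experiment evaluates the full Cartesian product of its list arguments, and `expand_grid` is a hypothetical helper rather than a prompttools function.

```python
from itertools import product

models = ["gpt-3.5-turbo", "gpt-4"]
messages = [
    [{"role": "user", "content": "Tell me a joke."}],
    [{"role": "user", "content": "Is 17077 a prime number?"}],
]
temperatures = [0.0]


def expand_grid(**axes):
    """Return one dict per combination of the supplied argument lists."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


grid = expand_grid(model=models, messages=messages, temperature=temperatures)
print(len(grid))  # 2 models x 2 message threads x 1 temperature = 4
```

Counting combinations this way before calling .run() gives a quick estimate of how many API requests a sweep will issue.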

Equivalent notebooks exist for Anthropic Claude, PaLM 2, Google Vertex chat, LLaMA.Cpp, and HuggingFace Hub, each parameterized to that provider's SDK. Source: examples/notebooks/README.md.

Community-known failure modes on this surface include a TypeError: Missing required arguments raised by the Azure OpenAI notebook when the call does not supply model and prompt (optionally together with stream) (Issue #116), and AttributeError: module 'openai' has no attribute 'types' when the local openai SDK is older than what the experiment expects (Issue #122). Both are tracked as separate issues rather than as defects in the experiment API itself.

Vector Database and Retrieval Experiments

Vector database experiments take a corpus plus an embedding configuration and measure retrieval accuracy, typically with ranking-correlation utilities against an expected ordering. Source: examples/notebooks/README.md.

The supported notebooks include Chroma, Weaviate, LanceDB, Qdrant, Pinecone, and a Retrieval-Augmented Generation (RAG) experiment that chains a vector store with an LLM. Source: examples/notebooks/README.md.

Two helper utilities make these experiments practical:

  • chunk_text(text, max_chunk_length) splits paragraphs without breaking words, enabling consistent ingestion across embedding configurations. Source: prompttools/utils/chunk_text.py.
  • autoeval_with_documents(row, documents, response_column_name) asks GPT-4 to grade whether a response is grounded in retrieved documents, returning an integer 0–10. Source: prompttools/utils/autoeval_with_docs.py.
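To make the word-preserving behaviour concrete, here is a minimal re-implementation of the idea. It is a sketch under assumptions, not the code in prompttools/utils/chunk_text.py: `chunk_words` is a hypothetical name, and keeping an over-long word whole in its own chunk is a design choice made for this example.

```python
def chunk_words(text: str, max_chunk_length: int) -> list[str]:
    """Greedily pack whitespace-separated words into chunks of at most
    max_chunk_length characters. A single word longer than the limit is
    kept whole in its own chunk rather than being split."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chunk_length or not current:
            current = candidate
        else:
            # The next word would overflow: close this chunk, start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


print(chunk_words("the quick brown fox jumps", 10))
# ['the quick', 'brown fox', 'jumps']
```

Because chunk boundaries depend only on the text and the length limit, the same corpus can be re-ingested identically under different embedding configurations.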

Community-known issue: from prompttools.experiment import LanceDBExperiment fails at import time on the latest published version (Issue #132), which suggests that optional backends are imported eagerly rather than lazily.

Framework Experiments and Evaluation Utilities

Framework experiments let users treat chains and routers as first-class experimentables. The currently documented notebooks are LangChainSequentialChainExperiment, LangChainRouterChainExperiment, and MindsDBExperiment. Source: examples/notebooks/README.md.

A request to add Microsoft Semantic-Kernel support (Issue #114) and ongoing interest in deeper LangChain support (Issue #5) reflect where the framework surface is expected to grow.

Evaluation utilities are re-exported from prompttools.utils and attach to experiment rows. Source: prompttools/utils/__init__.py.

| Utility | Purpose | Source |
| --- | --- | --- |
| semantic_similarity / cos_similarity | Embedding-based comparison of two strings (HuggingFace or Chroma) | prompttools/utils/similarity.py |
| structural_similarity | SSIM between images (cv2 + skimage) | prompttools/utils/similarity.py |
| autoeval_binary_scoring | GPT-4 judges whether the response follows the prompt | prompttools/utils/autoeval.py |
| autoeval_from_expected_response | GPT-4 grades ACTUAL against EXPECTED | prompttools/utils/autoeval_from_expected.py |
| autoeval_with_documents | GPT-4 grades RAG grounding in provided docs | prompttools/utils/autoeval_with_docs.py |
| apply_moderation | OpenAI moderation API on a response column | prompttools/utils/moderation.py |
| validate_json_response / validate_python_response | Structural validation of generated JSON and Python output | prompttools/utils/__init__.py |
| ranking_correlation | Compare vector DB ordering to an expected ordering | prompttools/utils/__init__.py |
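The ranking comparison in the last row can be illustrated with a rank-correlation statistic. This manual does not state which statistic ranking_correlation computes, so the sketch below uses Spearman's rho as a representative choice; `spearman_rho` is a hypothetical helper written with the standard library only.

```python
def spearman_rho(expected: list[str], actual: list[str]) -> float:
    """Spearman's rank correlation between two orderings of the same
    distinct items, via rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(expected)
    if n < 2:
        raise ValueError("need at least two items to correlate")
    if len(set(expected)) != n or sorted(expected) != sorted(actual):
        raise ValueError("orderings must contain the same distinct items")
    # Position of each document id in the retrieved ordering.
    position = {doc_id: rank for rank, doc_id in enumerate(actual)}
    d_squared = sum(
        (rank - position[doc_id]) ** 2 for rank, doc_id in enumerate(expected)
    )
    return 1 - 6 * d_squared / (n * (n * n - 1))


expected_order = ["doc_a", "doc_b", "doc_c", "doc_d"]
print(spearman_rho(expected_order, expected_order))        # 1.0
print(spearman_rho(expected_order, expected_order[::-1]))  # -1.0
```

A value near 1.0 means the vector database returned documents in roughly the expected order; values near -1.0 indicate a reversed ranking.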

Playground, Persistence, and Observability

Experiment objects can be persisted locally via to_csv, to_json, to_lora_json, or to_mongo_db. Source: README.md. For interactive exploration, the Streamlit playground is launched from the cloned repo and shares the same Experiment API. Source: prompttools/playground/README.md.

Two community-reported issues affect the playground specifically and are worth noting here: deprecation warnings from st.experimental_get_query_params (Issue #124, Issue #127) and a streamlit dependency that is missing from the top-level requirements, so it has to be installed from the playground's own requirements.txt (Issue #126).

For hosted workflows, the hosted Playground private beta (v0.0.41) persists experiments with version control, and import prompttools.logger (v0.0.45) adds a one-line observability hook to production LLM calls. Source: README.md.

See Also

  • Evaluation Utilities Reference — deeper documentation of the auto-eval, similarity, and moderation helpers.
  • Playground UI Guide — running and troubleshooting the Streamlit playground.
  • Notebook Examples Index — full catalog of runnable notebooks per backend.

Playground, Widgets, and Visualization

Related topics: Overview, Supported Integrations, and Quickstart, Core Experiments API: LLMs, Vector Databases, and Frameworks

Overview and Purpose

The Playground is prompttools' Streamlit-based graphical interface that lets users evaluate prompts, model parameters, and vector-database retrieval settings without writing code. It complements the notebook-driven workflow (OpenAIChatExperiment.ipynb, ChromaDBExperiment.ipynb, etc.) described in examples/notebooks/README.md and exposes the same experiment.run() + experiment.visualize() flow described in the top-level README.md. Per prompttools/playground/README.md, the playground can:

  • Evaluate different system instructions (system prompts)
  • Try different prompt templates
  • Compare responses across models (e.g., GPT-4 vs. local LLaMA 2)

All calls to LLM services and vector databases execute locally on the user's machine; the package does not forward requests or log responses, as stated in prompttools/playground/README.md.

flowchart LR
    A[User opens Playground] --> B[Select Model & API]
    B --> C[Configure Prompts / Templates]
    C --> D[Run Experiment]
    D --> E["Experiment.run()"]
    E --> F["Experiment.visualize()"]
    F --> G[Streamlit Widgets: Tables, Charts, Rankings]
    G --> H[Compare Responses Across Models]

Architecture and Module Layout

The playground is structured as a small package:

| File | Role |
| --- | --- |
| prompttools/playground/playground.py | Streamlit entry point that renders widgets and dispatches experiments. |
| prompttools/playground/constants.py | Defines supported model lists, experiment types, and reusable constants. |
| prompttools/playground/data_loader.py | Loads user-supplied CSV/data inputs into the experiment runtime. |
| prompttools/playground/__init__.py | Package marker; exposes playground helpers. |
| prompttools/playground/packages.txt | Lists system packages required by the hosted Streamlit deployment. |

Per the launch instructions in prompttools/playground/README.md, the application is started locally with:

git clone https://github.com/hegelai/prompttools.git
cd prompttools && pip install -r prompttools/playground/requirements.txt
streamlit run prompttools/playground/playground.py

The playground can also be reached through the hosted Streamlit Community Cloud deployment at https://prompttools.streamlit.app/, as documented in README.md. That hosted variant does not support the LlamaCpp experiment.

Widgets and Visualization Pipeline

Each widget in the playground corresponds to a stage of the experiment lifecycle declared in the underlying experiments package. After the user configures inputs and triggers a run, playground.py invokes the standard run() / visualize() contract, which is the same one exposed in the notebook examples summarized in examples/notebooks/README.md.

The typical visualization surfaces include:

  • Tabular response grids that pivot input prompts against model/parameter combinations.
  • Charts and ranking correlations for vector-database experiments (e.g., ChromaDBExperiment, LanceDBExperiment).
  • Side-by-side response comparison for chat and completion models.
  • Evaluation overlays powered by utility functions re-exported from prompttools/utils/__init__.py: semantic_similarity, cos_similarity, ranking_correlation, autoeval_scoring, autoeval_with_documents, validate_json_response, and validate_python_response.

The __init__.py re-exports ensure the playground can attach any of these evaluators to a response column without modifying experiment code:

from prompttools.utils import semantic_similarity, autoeval_with_documents

Configuration, Deployment, and Community-Reported Issues

Several community-reported issues directly shape how the playground should be configured and operated:

  1. Missing Streamlit dependency — Issue #126 reports that streamlit is not declared in the top-level requirements.txt. The workaround documented in prompttools/playground/README.md is to install prompttools/playground/requirements.txt before invoking streamlit run.
  2. Streamlit deprecations: Issues #124 and #127 report deprecation warnings stating that st.experimental_get_query_params and st.experimental_set_query_params are slated for removal after 2024-04-11. Users running the playground against recent Streamlit versions should migrate to st.query_params per the upstream Streamlit docs.
  3. Hosted Playground (0.0.41) — Release v0.0.41 introduced the hosted Playground as a private beta with experiment persistence and collaboration features.
  4. Observability overlay (0.0.45) — Release v0.0.45 added import prompttools.logger so that teams can monitor production LLM usage from inside the same UI surface.
  5. Integration gaps — Open community requests include Ollama (#39), Microsoft Semantic-Kernel (#114), OpenAI Assistants API (#111), MusicGen (#82), and OpenAI Image Generation (#113). Until those experiments are added under prompttools/experiment/experiments/, their constants are absent from prompttools/playground/constants.py and they cannot be selected in the playground UI.
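For the Streamlit deprecation in item 2, one way to bridge old and new releases is a small compatibility shim. The sketch below is an assumption about how such a shim could look, not code from playground.py: `read_query_params` is a hypothetical helper, and the stand-in objects only mimic the two Streamlit APIs so that the example runs without Streamlit installed.

```python
from types import SimpleNamespace


def read_query_params(st) -> dict:
    """Return query parameters as {name: [values]} on old and new Streamlit."""
    if hasattr(st, "query_params"):
        # Newer API: a dict-like object whose values are single strings.
        return {key: [value] for key, value in dict(st.query_params).items()}
    # Older API: already returns {name: [values]}.
    return st.experimental_get_query_params()


# Stand-ins that mimic the two API generations (not real Streamlit modules).
new_st = SimpleNamespace(query_params={"model": "gpt-4"})
old_st = SimpleNamespace(
    experimental_get_query_params=lambda: {"model": ["gpt-4"]}
)

print(read_query_params(new_st))  # {'model': ['gpt-4']}
print(read_query_params(old_st))  # {'model': ['gpt-4']}
```

In a real app the function would be called with the imported streamlit module; normalizing to lists keeps downstream code unchanged while the underlying API moves.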

The packages.txt file is consulted when deploying to Streamlit Community Cloud so that native dependencies (e.g., for cv2, librosa, image/audio experiments referenced in examples/notebooks/README.md) are present at runtime.

Failure Modes and Best Practices

  • CSV ingestion errors: data_loader.py requires well-formed inputs; malformed CSVs will surface as Streamlit exceptions rather than silent skips.
  • Missing API keys: Utility evaluators such as autoeval_binary_scoring, autoeval_from_expected_response, and apply_moderation (re-exported from prompttools/utils/__init__.py) raise PromptToolsUtilityError when OPENAI_API_KEY is unset.
  • Local model limitations — LlamaCpp experiments run only on the local playground, never on the hosted Streamlit deployment, per the note in README.md.
  • Dependency drift — Issue #121 documents a working pinned notebook dependency set (fastapi, kaleido, python-multipart, uvicorn, cohere, tiktoken, pandas==1.5.3) that users running playground-launched notebooks may need to mirror.
  • Vector-DB imports — Issue #132 reports a crash when importing LanceDBExperiment. This propagates to the playground whenever LanceDB is selected, so users should verify the experiment imports cleanly in isolation before relying on the UI surface.

Utilities, Harness, PromptTest, and Observability

Related topics: Core Experiments API: LLMs, Vector Databases, and Frameworks, Playground, Widgets, and Visualization


Overview

prompttools provides four cross-cutting capabilities that sit on top of its experiment suite: a Utilities module for evaluating, scoring, and chunking text/code, a Harness abstraction for embedding full applications (such as LangChain agents) into experiments, the PromptTest workflow for asserting behavioral expectations, and an Observability layer that ships call metadata to the hosted Hegel AI platform. The Utilities module is fully open-sourced and importable from the prompttools.utils namespace, while Harness, PromptTest, and Observability are referenced in the project's roadmap and release notes as the path toward higher-level prompt evaluation and production monitoring.

The README frames the project as a way to "test and experiment with prompts, LLMs, and vector databases" using familiar interfaces such as code, notebooks, and a local playground, and the Utilities module is the bridge that connects raw model responses to these interfaces Source: [README.md].

Utilities Module

The prompttools.utils package re-exports a curated set of helper functions used to score, compare, and post-process model outputs. The full public surface is declared in prompttools/utils/__init__.py and includes the following entry points Source: [prompttools/utils/__init__.py]:

| Function | Purpose |
| --- | --- |
| autoeval_binary_scoring | Judge a response as RIGHT/WRONG via GPT-4 |
| autoeval_from_expected_response | Compare actual vs. expected with a grader model |
| autoeval_scoring | Score a response on an integer scale |
| autoeval_with_documents | Grounded RAG scoring using supporting documents |
| chunk_text | Split a paragraph into word-preserving chunks |
| compute_similarity_against_model | Embedding-based similarity to a model output |
| apply_moderation | Run OpenAI moderation on a response |
| ranking_correlation | Rank-correlation metric for retrieval results |
| semantic_similarity, cos_similarity | HuggingFace/Chroma-backed similarity |
| validate_json_response, validate_python_response | Structural validators for generated code/data |

A custom exception, PromptToolsUtilityError, is defined in prompttools/utils/error.py and is raised by utilities when preconditions are not met (for example, a missing OPENAI_API_KEY environment variable) Source: [prompttools/utils/error.py].
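The precondition pattern described here can be sketched as a small guard that fails early with a clear message. The class and function names below are hypothetical stand-ins; the real exception lives in prompttools/utils/error.py and its exact message is not reproduced here.

```python
import os


class UtilityPreconditionError(RuntimeError):
    """Hypothetical stand-in for prompttools' PromptToolsUtilityError."""


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise UtilityPreconditionError(
            f"{name} must be set before running evaluators"
        )
    return value
```

Calling such a guard once at the start of a notebook or CI job, for example require_env("OPENAI_API_KEY"), surfaces a missing key before any experiment rows have been generated.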

Auto-evaluation Utilities

prompttools/utils/autoeval.py implements an LLM-as-judge pattern: it asks a chat model (defaulting to GPT-4) to classify a response as RIGHT or WRONG and returns 1.0 or 0.0 accordingly Source: [prompttools/utils/autoeval.py]. autoeval_from_expected.py extends the same idea to ground-truth comparisons, asking the judge to compare PROMPT, EXPECTED, and ACTUAL strings Source: [prompttools/utils/autoeval_from_expected.py]. autoeval_with_docs.py adds document-grounded evaluation, rendering retrieved contexts through a Jinja template and producing an integer rating between 0 and 10 Source: [prompttools/utils/autoeval_with_docs.py].
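The RIGHT/WRONG convention reduces to a small mapping from the judge's reply to a score. The following sketch shows only that final step; `verdict_to_score` is a hypothetical helper, and treating any reply without a clear RIGHT as 0.0 is an assumption made for this example.

```python
def verdict_to_score(judge_reply: str) -> float:
    """Map a judge model's RIGHT/WRONG reply to 1.0/0.0.

    Assumption: a reply that does not clearly contain RIGHT scores 0.0.
    """
    reply = judge_reply.strip().upper()
    # Check WRONG first so a reply such as "WRONG, not RIGHT" scores 0.0.
    if "WRONG" in reply:
        return 0.0
    return 1.0 if "RIGHT" in reply else 0.0


print(verdict_to_score("RIGHT"))    # 1.0
print(verdict_to_score(" wrong "))  # 0.0
```

Keeping the parsing separate from the model call makes the scoring rule easy to unit-test without spending API credits.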

Similarity and Structural Metrics

prompttools/utils/similarity.py lazily initializes a SentenceTransformer model (all-MiniLM-L6-v2) and an optional Chroma client so that similarity functions can run without forcing a heavy import at module load Source: [prompttools/utils/similarity.py]. Optional dependencies are imported defensively: if cv2 or skimage is missing, structural_similarity raises a ModuleNotFoundError directing the user to install opencv-python and scikit-image.
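The lazy-initialization idea can be shown without the heavy dependency. In this sketch a toy embedder stands in for the SentenceTransformer model, and functools.lru_cache is one possible way to build the object on first use; the actual mechanism in similarity.py may differ, and all names here are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_embedder():
    """Build the expensive object on first use and reuse it afterwards."""
    # Stand-in for an expensive constructor such as loading a
    # sentence-embedding model; returns a toy "embedder" instead.
    return lambda text: [float(len(word)) for word in text.split()]


def toy_similarity(a: str, b: str) -> float:
    """Return 1.0 when the toy embeddings match exactly, else 0.0."""
    embed = get_embedder()  # the first call pays the cost, later calls reuse it
    return 1.0 if embed(a) == embed(b) else 0.0
```

Importing a module written this way stays cheap, and the loading cost is only paid by callers that actually compute a similarity.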

Text and Code Validators

prompttools/utils/chunk_text.py exposes a single chunk_text(text, max_chunk_length) function that splits on whitespace and never breaks a word across chunks Source: [prompttools/utils/chunk_text.py]. validate_python.py writes the response to a temporary file and shells out to pylint via pylint.epylint; if pylint is not installed, it raises a RuntimeError asking the user to either install pylint<3.0 or supply a custom evaluator Source: [prompttools/utils/validate_python.py].
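When pylint is unavailable, a custom evaluator of the kind mentioned above can be quite small. The sketch below substitutes a syntax-only check via compile() for the pylint run and adds a JSON counterpart; both function names are hypothetical, and the 1.0/0.0 return convention is an assumption borrowed from the auto-evaluation utilities.

```python
import json


def json_is_valid(response: str) -> float:
    """Return 1.0 if the response parses as JSON, else 0.0."""
    try:
        json.loads(response)
    except (TypeError, ValueError):
        return 0.0
    return 1.0


def python_is_parsable(response: str) -> float:
    """Return 1.0 if the response compiles as Python source, else 0.0.

    This checks syntax only; it is a lighter substitute for a pylint run.
    """
    try:
        compile(response, "<llm-response>", "exec")
    except (SyntaxError, ValueError):
        return 0.0
    return 1.0
```

A syntax check will not catch undefined names or style problems the way a linter does, so it is best treated as a fast first filter.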

flowchart LR
    A[Experiment.run] --> B[DataFrame of responses]
    B --> C{Choose Utility}
    C --> D[autoeval_*]
    C --> E[semantic_similarity]
    C --> F[validate_*]
    C --> G[chunk_text]
    D --> H[Scored DataFrame]
    E --> H
    F --> H
    H --> I[visualize / export]

Harness, PromptTest, and Observability

The Harness concept is referenced in community requests such as issue #5 (LangChain Support), which proposes "Harnesses and Experiments to support testing LangChains natively" with low-level chain/agent experiments, step-by-step visualizations, and intermediate-output evaluation. The Utilities module is the natural plug-in point for the harness: scoring and validation functions can be applied per-step rather than only on the final response.

PromptTest is introduced in the 0.0.41 Hosted Playground release as part of the broader effort to persist experiments and add behavioral assertions that survive across runs (see GitHub release notes for v0.0.41). It is intended to be authored alongside the existing experiment workflow and evaluated using the same autoeval and similarity utilities documented above.

Observability was announced in the 0.0.45 release as a private beta on the hosted Hegel AI platform. The integration is a one-line opt-in via import prompttools.logger, which begins forwarding call-level telemetry to the hosted dashboard (GitHub release notes for v0.0.45). Separately, for users who do not wish to run the Streamlit app locally, a public deployment of the playground is available at prompttools.streamlit.app Source: [README.md].

Common Failure Modes

Several issues surfaced in the community align directly with this topic:

  • Missing streamlit dependency when running the playground via the documented streamlit run command (issue #126). The Playground README correctly lists a separate prompttools/playground/requirements.txt that should be installed first Source: [prompttools/playground/README.md].
  • Deprecation warnings from st.experimental_get_query_params / st.experimental_set_query_params printed at playground launch (issues #124 and #127) — these originate in the Streamlit version pinned to the playground requirements.
  • Optional-dependency errors: utilities like structural_similarity and validate_python raise ModuleNotFoundError/RuntimeError when cv2, skimage, or pylint<3.0 are absent, so callers should pre-install these or supply a custom evaluator Source: [prompttools/utils/similarity.py; prompttools/utils/validate_python.py].
  • Missing OPENAI_API_KEY: the autoeval utilities check os.environ["OPENAI_API_KEY"] and raise PromptToolsUtilityError if it is unset, so an evaluation run without the key fails with an exception instead of producing scores Source: [prompttools/utils/autoeval.py].

See Also

  • Getting Started (Quickstart & Integrations)
  • Experiment Reference
  • Vector Database & RAG Experiments
  • Notebook Examples Index

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 14 structured pitfall item(s), including 2 high/blocking item(s). Top priority: Maintenance risk - Maintenance risk requires verification.

1. Maintenance risk: Maintenance risk requires verification

  • Severity: high
  • Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/hegelai/prompttools/issues/132

2. Security or permission risk: Security or permission risk requires verification

  • Severity: high
  • Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/hegelai/prompttools/issues/121

3. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/hegelai/prompttools/issues/122

4. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/hegelai/prompttools/issues/116

5. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/hegelai/prompttools/issues/126

6. Capability evidence risk: Capability evidence risk requires verification

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.assumptions | https://github.com/hegelai/prompttools

7. Runtime risk: Runtime risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/hegelai/prompttools/issues/124

8. Runtime risk: Runtime risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/hegelai/prompttools/issues/127

9. Runtime risk: Runtime risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: packet_text.keyword_scan | https://github.com/hegelai/prompttools

10. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/hegelai/prompttools

11. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: downstream_validation.risk_items | https://github.com/hegelai/prompttools

12. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: risks.scoring_risks | https://github.com/hegelai/prompttools

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12 (count of project-level external discussion links exposed on this manual page).

Use: review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using prompttools with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence