# [RagaAI-Catalyst](https://github.com/raga-ai-hub/RagaAI-Catalyst) Project Manual

Generated at: 2026-06-23 20:12:19 UTC

## Table of Contents

- [Overview, Installation & Project Management](#page-1)
- [Trace Management & Agentic Tracing](#page-2)
- [Dataset, Evaluation & Prompt Management](#page-3)
- [Guardrails, Red-Teaming & Synthetic Data Generation](#page-4)

<a id='page-1'></a>

## Overview, Installation & Project Management

### Related Pages

Related topics: [Trace Management & Agentic Tracing](#page-2), [Dataset, Evaluation & Prompt Management](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)
- [ragaai_catalyst/tracers/agentic_tracing/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/README.md)
- [ragaai_catalyst/tracers/agentic_tracing/utils/create_dataset_schema.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/utils/create_dataset_schema.py)
- [ragaai_catalyst/tracers/agentic_tracing/utils/unique_decorator.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/utils/unique_decorator.py)
- [ragaai_catalyst/tracers/agentic_tracing/utils/system_monitor.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/utils/system_monitor.py)
- [ragaai_catalyst/tracers/utils/trace_json_converter.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/utils/trace_json_converter.py)
- [ragaai_catalyst/redteaming/utils/issue_description.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/redteaming/utils/issue_description.py)
- [examples/haystack/news_fetching/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/examples/haystack/news_fetching/README.md)
- [examples/openai_agents_sdk/youtube_summary_agent/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/examples/openai_agents_sdk/youtube_summary_agent/README.md)
</details>


## 1. What is RagaAI Catalyst

RagaAI Catalyst is a Python SDK that provides **Observability, Monitoring, and Evaluation** capabilities for AI agents, LLM-based applications, and RAG (Retrieval-Augmented Generation) pipelines. The toolkit is designed to help engineering and evaluation teams instrument agentic systems end-to-end — from tracing tool and LLM calls, through dataset and experiment management, to red-teaming for safety issues.

The SDK exposes several top-level capabilities:

- **Tracing & Monitoring** — capture agent, LLM, tool, network, and user interaction spans.
- **Dataset Management** — create and manage evaluation datasets from CSVs or programmatic schemas.
- **Evaluation** — run metric experiments against datasets.
- **Prompt Management** — version and store prompts.
- **Synthetic Data Generation** — auto-generate queries and Q/A pairs.
- **Guardrails** — deploy and monitor safety detectors.
- **Red-Teaming** — run automated safety tests against custom detectors.

Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)

The agentic tracing subsystem lives under `ragaai_catalyst/tracers/agentic_tracing/` and is split into subpackages for tracer implementations, data classes, utilities, and upload logic. Source: [ragaai_catalyst/tracers/agentic_tracing/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/README.md)

## 2. Installation

The SDK is distributed as a standard Python package on PyPI. Installation is a single command:

```bash
pip install ragaai-catalyst
```

Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)

To work with the included examples (Haystack, OpenAI Agents SDK, SmolAgents), additional dependencies are required per example. Each example ships its own `requirements.txt` and a `.env` template. For instance, the Haystack news-fetching example expects `OPENAI_API_KEY`, `SERPERDEV_API_KEY`, and Catalyst credentials. Source: [examples/haystack/news_fetching/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/examples/haystack/news_fetching/README.md)

## 3. Authentication & Configuration

Before invoking any Catalyst operation, you must authenticate. The official flow documented in the README is:

1. Navigate to your profile settings.
2. Select **Authenticate**.
3. Click **Generate New Key** to produce an access key and a secret key.

Credentials can be supplied either through environment variables or directly to the `RagaAICatalyst` constructor. Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)

```python
from ragaai_catalyst import RagaAICatalyst

catalyst = RagaAICatalyst(
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    base_url="BASE_URL"
)
```

Internally, authenticated requests use a bearer token stored in the `RAGAAI_CATALYST_TOKEN` environment variable, as shown by the dataset-schema upload utility. Source: [ragaai_catalyst/tracers/agentic_tracing/utils/create_dataset_schema.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/utils/create_dataset_schema.py)

> **Note:** The README explicitly states that authentication is required for *any* subsequent operation, including Project Management, Dataset Management, Evaluation, Prompt Management, Synthetic Data Generation, Guardrail Management, and Red-Teaming.
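A minimal sketch of the environment-variable route, assuming the variable names used by the bundled examples' `.env` files (`CATALYST_ACCESS_KEY`, `CATALYST_SECRET_KEY`, `CATALYST_BASE_URL`). The helper `load_catalyst_credentials` is hypothetical and not part of the SDK; it only validates that the values exist and shapes them into the keyword arguments shown in the constructor call above.

```python
import os

# Variable names follow the examples' .env convention; adjust them to your setup.
REQUIRED_VARS = ("CATALYST_ACCESS_KEY", "CATALYST_SECRET_KEY", "CATALYST_BASE_URL")


def load_catalyst_credentials(env=None):
    """Return constructor kwargs, or raise if any credential is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing Catalyst credentials: {', '.join(missing)}")
    return {
        "access_key": env["CATALYST_ACCESS_KEY"],
        "secret_key": env["CATALYST_SECRET_KEY"],
        "base_url": env["CATALYST_BASE_URL"],
    }
```

With the SDK installed, the result could be passed on as `RagaAICatalyst(**load_catalyst_credentials())`, which fails early with a readable message instead of at the first authenticated request.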

## 4. Project Management

A **project** in Catalyst is the top-level container that groups datasets, traces, evaluations, guardrail deployments, and red-teaming runs. All other resources are scoped to a project name.

### 4.1 Project Lifecycle Workflow

```mermaid
flowchart LR
    A["Install<br/>pip install ragaai-catalyst"] --> B["Authenticate<br/>RagaAICatalyst(...)"]
    B --> C["Discover Use Cases<br/>project_use_cases()"]
    C --> D["Create Project<br/>create_project(...)"]
    D --> E["List Projects<br/>list_projects()"]
    E --> F["Scope Resources<br/>Dataset / Trace / Eval / Guardrail / RedTeam"]
```

### 4.2 Creating a Project

A project is created by specifying a name and a `usecase`. Use cases align the project with a downstream evaluation template (e.g., `Chatbot`):

```python
project = catalyst.create_project(
    project_name="Test-RAG-App-1",
    usecase="Chatbot"
)
```

Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)

### 4.3 Discovering Available Use Cases

To enumerate the use cases supported by your account/workspace before creating a project, call:

```python
catalyst.project_use_cases()
```

Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)

### 4.4 Listing Projects

Existing projects can be enumerated to verify creation or to look up project names for downstream operations:

```python
projects = catalyst.list_projects()
print(projects)
```

Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)

### 4.5 Scoping Downstream Resources

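Once a project exists, its name is passed to the constructors of the project-scoped managers. `RedTeaming` is the exception in the excerpts below: its constructor takes a model name, provider, and API key rather than a project name.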

| Manager | Constructor signature (excerpt) | Purpose |
| --- | --- | --- |
| `Dataset` | `Dataset(project_name="...")` | Dataset CRUD from CSV/schema |
| `Evaluation` | `Evaluation(project_name="...", dataset_name="...")` | Metric experiments |
| `GuardrailsManager` | `GuardrailsManager(project_name=project_name)` | Safety detector deployments |
| `RedTeaming` | `RedTeaming(model_name=..., provider=..., api_key=...)` | Adversarial test runs |

Sources: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md), [ragaai_catalyst/redteaming/utils/issue_description.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/redteaming/utils/issue_description.py)

## 5. Common Failure Modes & Tips

Based on the documentation and community-reported issues, several pitfalls recur during onboarding:

- **Tracing returns empty tool/LLM spans.** Community issue #26 reports that `trace_agent` can produce empty tool and LLM call sections when tool/LLM callables are not registered properly. Ensure the functions you want to trace are decorated or explicitly invoked through the tracer-managed client.
- **Dataset upload errors.** Release 2.2.4 notes a bug-fix for `external_id` and metadata updates not propagating when renaming a dataset, plus an SDG error that could fail dataset generation.
- **Load-test traces dropped.** Release 2.2.4 also fixed dropped logs under Locust-driven load testing; if you load-test, verify that the installed `ragaai-catalyst` version is 2.2.4 or later.
- **Authentication token expiry.** Since v2.2.1 the SDK can refresh tokens automatically (every ~6 hours). Configure the refresh path rather than hard-coding long-lived secrets.

Sources: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md), [examples/openai_agents_sdk/youtube_summary_agent/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/examples/openai_agents_sdk/youtube_summary_agent/README.md)
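Several of the items above reduce to "check which release is installed". The following is a small sketch of such a check using only the standard library; the helper names are hypothetical, and the parser assumes purely numeric dotted versions such as `2.1.7.4` (pre-release suffixes are not handled).

```python
from importlib import metadata


def parse_version(text):
    """Turn '2.1.7.4' into (2, 1, 7, 4); non-numeric suffixes are not handled."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum="2.2.4"):
    """Compare dotted numeric versions component by component."""
    return parse_version(installed) >= parse_version(minimum)


def installed_catalyst_version():
    """Return the installed ragaai-catalyst version, or None if it is absent."""
    try:
        return metadata.version("ragaai-catalyst")
    except metadata.PackageNotFoundError:
        return None
```

Tuple comparison handles four-part versions naturally, so `2.1.7.4` sorts below `2.2.4` even though it has more components.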

## 6. Quick-Start Checklist

1. `pip install ragaai-catalyst`
2. Generate access/secret keys from the Catalyst dashboard.
3. Instantiate `RagaAICatalyst(access_key=..., secret_key=..., base_url=...)`.
4. Call `catalyst.project_use_cases()` to discover supported use cases.
5. `catalyst.create_project(project_name="...", usecase="...")`.
6. Confirm with `catalyst.list_projects()`.
7. Proceed to `Dataset`, `Evaluation`, tracing, guardrails, or red-teaming scoped to that project name.

Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)

## See Also

- Dataset Management — creating datasets from CSV and managing schemas.
- Evaluation — running metric experiments against datasets.
- Agentic Tracing — instrumenting LLM, tool, and agent calls.
- Guardrails & Red-Teaming — deploying safety detectors and running adversarial tests.
- Synthetic Data Generation — auto-generating Q/A datasets for evaluation.

---

<a id='page-2'></a>

## Trace Management & Agentic Tracing

### Related Pages

Related topics: [Overview, Installation & Project Management](#page-1), [Dataset, Evaluation & Prompt Management](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [ragaai_catalyst/tracers/agentic_tracing/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/README.md)
- [ragaai_catalyst/tracers/agentic_tracing/utils/api_utils.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/utils/api_utils.py)
- [ragaai_catalyst/tracers/agentic_tracing/utils/unique_decorator.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/utils/unique_decorator.py)
- [ragaai_catalyst/tracers/agentic_tracing/utils/create_dataset_schema.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/utils/create_dataset_schema.py)
- [ragaai_catalyst/tracers/utils/trace_json_converter.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/utils/trace_json_converter.py)
- [ragaai_catalyst/tracers/utils/rag_trace_json_converter.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/utils/rag_trace_json_converter.py)
- [ragaai_catalyst/tracers/utils/extraction_logic_llama_index.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/utils/extraction_logic_llama_index.py)
- [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)
- [examples/haystack/news_fetching/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/examples/haystack/news_fetching/README.md)
- [examples/openai_agents_sdk/youtube_summary_agent/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/examples/openai_agents_sdk/youtube_summary_agent/README.md)
- [examples/openai_agents_sdk/email_data_extraction_agent/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/examples/openai_agents_sdk/email_data_extraction_agent/README.md)
- [examples/smolagents/most_upvoted_paper/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/examples/smolagents/most_upvoted_paper/README.md)
</details>


## Overview & Purpose

Trace Management in **RagaAI Catalyst** is the observability and monitoring layer of the SDK. It captures, structures, and uploads execution data from agentic AI systems so teams can debug, evaluate, and audit agent behavior after the fact. The two tracing paths, RAG Tracing (using OpenInference-compatible spans) and Agentic Tracing (using custom sub-tracers), were unified in v2.2.1 under a single trace format.

The agentic tracing module, located under `ragaai_catalyst/tracers/agentic_tracing/`, instruments LLMs, tools, network calls, and user interactions during agent execution. It exposes a pluggable architecture where individual sub-tracers can be swapped or extended without disturbing the rest of the pipeline.

> Community note: Multiple issues (e.g. #259) request tamper-evident audit logs on top of these traces for compliance. The current pipeline is observable but does not yet produce cryptographically signed audit records.

Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md) | [ragaai_catalyst/tracers/agentic_tracing/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/README.md)

## Agentic Tracing Architecture

The agentic tracing module is organised into four cooperating subpackages, each owning a clear concern:

| Subpackage | Purpose | Key files |
|------------|---------|-----------|
| `tracers/` | Per-concern tracers that wrap LLM, tool, network, and user interactions | `main_tracer.py`, `agent_tracer.py`, `llm_tracer.py`, `tool_tracer.py`, `network_tracer.py`, `user_interaction_tracer.py`, `base.py` |
| `data/` | Strongly-typed data classes for spans, LLM calls, tool executions, agent states | `data_classes.py` |
| `utils/` | Cost calculation, ID generation, API helpers, model cost table | `llm_utils.py`, `api_utils.py`, `unique_decorator.py`, `model_costs.json`, `trace_utils.py` |
| `upload/` | Code and trace artefact upload to the Catalyst backend | `code_upload.py` |

The `Base Tracer` in `tracers/base.py` defines the shared lifecycle (start, stop, flush) that all sub-tracers inherit. The `Main Tracer` coordinates sub-tracers and assembles a unified trace payload before upload.

```mermaid
flowchart LR
    A[Agent Code] --> B[Main Tracer]
    B --> C[LLM Tracer]
    B --> D[Tool Tracer]
    B --> E[Network Tracer]
    B --> F[User Interaction Tracer]
    C --> G[Data Classes]
    D --> G
    E --> G
    F --> G
    G --> H[Utils: cost, IDs, time conversion]
    H --> I[Upload to Catalyst Backend]
```
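The lifecycle and coordination described above can be pictured with a toy model. This is only an illustrative sketch: the class and method names (`BaseTracer`, `MainTracer`, `record`) are stand-ins chosen for this example and do not mirror the actual classes in `tracers/base.py` or `main_tracer.py`.

```python
class BaseTracer:
    """Shared lifecycle: start, stop, flush (hypothetical stand-in)."""

    def __init__(self, name):
        self.name = name
        self.active = False
        self.spans = []

    def start(self):
        self.active = True

    def stop(self):
        self.active = False

    def record(self, span):
        # Spans are only kept while the tracer is running.
        if self.active:
            self.spans.append({"tracer": self.name, **span})

    def flush(self):
        """Hand back buffered spans and clear the buffer."""
        spans, self.spans = self.spans, []
        return spans


class MainTracer:
    """Coordinates sub-tracers and merges their spans into one payload."""

    def __init__(self, sub_tracers):
        self.sub_tracers = list(sub_tracers)

    def start(self):
        for tracer in self.sub_tracers:
            tracer.start()

    def stop(self):
        for tracer in self.sub_tracers:
            tracer.stop()
        return {"spans": [s for t in self.sub_tracers for s in t.flush()]}
```

Because each sub-tracer owns its own buffer, one of them can be replaced without touching the others, which is the property the pluggable design is after.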

Source: [ragaai_catalyst/tracers/agentic_tracing/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/README.md)

## Core Sub-Tracers

### LLM Tracer
Monitors model calls and is the only tracer that computes monetary cost. It tracks token usage, model parameters, and prompt/response content. Costs are looked up in `model_costs.json` via the helpers in `utils/llm_utils.py`. As of v2.2.3, `model_cost` is a no-op when no cost table is configured, and cost calculations from `litellm` were corrected.
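As a rough illustration of a table-driven cost lookup, the sketch below multiplies token counts by per-token prices. The table layout (`input_cost_per_token`, `output_cost_per_token`), the model name, and the prices are assumptions made for this example; consult `model_costs.json` and `utils/llm_utils.py` for the real structure.

```python
import json

# Hypothetical cost table; the field names mimic a per-token pricing layout
# and the prices are made up for illustration.
MODEL_COSTS = json.loads("""
{
  "demo-model": {"input_cost_per_token": 0.000001, "output_cost_per_token": 0.000002}
}
""")


def llm_call_cost(model, prompt_tokens, completion_tokens, table=MODEL_COSTS):
    """Return the cost of one call, or None when the model is not priced."""
    prices = table.get(model)
    if prices is None:
        return None
    return (
        prompt_tokens * prices["input_cost_per_token"]
        + completion_tokens * prices["output_cost_per_token"]
    )
```

Returning `None` for an unpriced model keeps "unknown cost" distinguishable from "zero cost" when spans are later aggregated.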

### Tool Tracer
Records tool invocations, arguments, and outputs. This is the most common source of the "empty tool call" issue reported in community thread #26, where users decorate functions with `trace_agent` but never annotate the inner tool/LLM calls — the agent's outer function is traced, but the child spans stay empty.

### Network Tracer
Captures outbound HTTP calls (including tool HTTP requests and LLM provider traffic).

### User Interaction Tracer
Logs user prompts and feedback for human-in-the-loop evaluations.

### Data Classes
`data/data_classes.py` defines typed records for `LLMCall`, `ToolExecution`, `NetworkRequest`, `UserInteraction`, and `TraceComponent` so downstream code does not have to parse dicts.

Source: [ragaai_catalyst/tracers/agentic_tracing/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/README.md)

## Trace Data Flow & Utilities

Trace payloads are transformed into a uniform JSON shape before upload. The converters under `ragaai_catalyst/tracers/utils/` handle provider-specific traces:

- `trace_json_converter.py` normalises timestamps (UTC → `Asia/Kolkata` by default), generates UUIDs, and aggregates span metadata.
- `rag_trace_json_converter.py` extracts prompt, context, and response from LangChain-style spans and attaches cost, token counts, and error fields to `trace_aggregate["metadata"]`.
- `extraction_logic_llama_index.py` walks `QueryStartEvent`, `RetrievalEndEvent`, and `QueryEndEvent` spans to build a `{prompt, context, response, system_prompt}` object.
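The timestamp normalisation mentioned in the first bullet can be sketched with the standard library alone. The function name `utc_to_ist` is hypothetical, and a fixed `+05:30` offset is used here in place of the `Asia/Kolkata` zone name so the example does not depend on an installed time zone database.

```python
from datetime import datetime, timedelta, timezone

# India Standard Time has no daylight saving, so a fixed +05:30 offset is enough.
IST = timezone(timedelta(hours=5, minutes=30), "IST")


def utc_to_ist(timestamp):
    """Convert an ISO 8601 UTC timestamp string to an IST ISO 8601 string."""
    # Older Python versions do not parse a trailing "Z", so rewrite it first.
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume naive means UTC
    return parsed.astimezone(IST).isoformat()
```

For example, `utc_to_ist("2024-01-01T00:00:00Z")` gives `2024-01-01T05:30:00+05:30`, and a late-evening UTC time rolls over to the next calendar day.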

Dataset schema creation is performed by `create_dataset_schema_with_trace()` in `utils/create_dataset_schema.py`, which posts to `{BASE_URL}/v1/llm/dataset/logs` using the `RAGAAI_CATALYST_TOKEN` environment variable. Analysis trace retrieval is done via `fetch_analysis_trace()` in `utils/api_utils.py`, which calls `{base_url}/api/analysis_traces/{trace_id}`.
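To make the request shape concrete, here is a hedged sketch that builds, but does not send, a bearer-authenticated POST to the dataset logs path named above. The function `build_dataset_schema_request` is hypothetical, and the payload is left to the caller because its real fields are defined in `create_dataset_schema.py`, not here.

```python
import json
import os
import urllib.request


def build_dataset_schema_request(base_url, payload, token=None):
    """Build (but do not send) an authenticated POST request."""
    token = token or os.environ.get("RAGAAI_CATALYST_TOKEN")
    if not token:
        raise RuntimeError("RAGAAI_CATALYST_TOKEN is not set")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/llm/dataset/logs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Separating request construction from sending makes the URL and headers easy to inspect in a test before any network traffic occurs.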

The `unique_decorator.py` module generates stable hashes for traced functions by normalising source code (preserving docstrings, stripping comments and whitespace) so that semantically identical functions produce the same ID across runs.
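One way to obtain a fingerprint with these properties is to hash the parsed syntax tree instead of the raw text. The sketch below uses that AST-based approach as a stand-in; it is not necessarily how `unique_decorator.py` works, and `source_fingerprint` is a hypothetical name.

```python
import ast
import hashlib
import textwrap


def source_fingerprint(source):
    """Hash the syntax tree so comments and layout do not affect the result."""
    tree = ast.parse(textwrap.dedent(source))
    # ast.dump omits line/column info by default and keeps docstrings,
    # because docstrings are ordinary string nodes in the tree.
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()
```

Two versions of a function that differ only in comments or spacing hash identically, while a changed docstring produces a different digest. The exact `ast.dump` output can vary between Python versions, so such hashes are only comparable within one interpreter version.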

Source: [ragaai_catalyst/tracers/agentic_tracing/utils/create_dataset_schema.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/utils/create_dataset_schema.py) | [ragaai_catalyst/tracers/agentic_tracing/utils/api_utils.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/utils/api_utils.py) | [ragaai_catalyst/tracers/agentic_tracing/utils/unique_decorator.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/utils/unique_decorator.py) | [ragaai_catalyst/tracers/utils/trace_json_converter.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/utils/trace_json_converter.py) | [ragaai_catalyst/tracers/utils/rag_trace_json_converter.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/utils/rag_trace_json_converter.py) | [ragaai_catalyst/tracers/utils/extraction_logic_llama_index.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/utils/extraction_logic_llama_index.py)

## Usage Patterns & Integrations

The SDK ships with worked examples for popular agent frameworks, all wired to the same tracer:

- **Haystack** — `examples/haystack/news_fetching/` shows a SerperDev-backed pipeline with a `MessageCollector`, conditional router, and tool invoker, traced end-to-end.
- **OpenAI Agents SDK** — `youtube_summary_agent/` and `email_data_extraction_agent/` demonstrate multi-agent flows including clarifier/summariser agents and Pydantic-validated extraction.
- **SmolAgents** — `most_upvoted_paper/` integrates the agent with HuggingFace Daily Papers, arXiv, and `pypdf` for paper discovery and summarisation.
- **LlamaIndex** — RAG pipelines are normalised via `extraction_logic_llama_index.py`.

All examples read `CATALYST_ACCESS_KEY`, `CATALYST_SECRET_KEY`, `CATALYST_BASE_URL`, `PROJECT_NAME`, and `DATASET_NAME` from a `.env` file.

Source: [examples/haystack/news_fetching/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/examples/haystack/news_fetching/README.md) | [examples/openai_agents_sdk/youtube_summary_agent/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/examples/openai_agents_sdk/youtube_summary_agent/README.md) | [examples/openai_agents_sdk/email_data_extraction_agent/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/examples/openai_agents_sdk/email_data_extraction_agent/README.md) | [examples/smolagents/most_upvoted_paper/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/examples/smolagents/most_upvoted_paper/README.md)

## Common Failure Modes

| Symptom | Likely cause | Fix / workaround |
|---------|--------------|------------------|
| Empty LLM/tool spans under `trace_agent` (issue #26) | Only the outer agent is decorated; child LLM/tool calls are not annotated | Also decorate each function that makes an LLM call or runs a tool |
| `Indexing Error in Agentic Tracing` (v2.1.7.1) | Schema mismatch on upload | Upgrade; v2.2.x unifies RAG and agentic trace formats |
| Missing logs in Locust load tests (v2.2.4) | Async flush races with test teardown | Fixed in v2.2.4; upgrade to that release or later |
| Wrong total cost in trace details (v2.2.3) | Per-span vs aggregate cost discrepancy | Fixed in v2.2.3; upgrade |
| Crashed workers (v2.1.7.1) | Unhandled exception in background uploader | Add try/except around upload; v2.2.1 added greater error-capture support |

Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md) (release notes for v2.1.7.1, v2.2.1, v2.2.3, v2.2.4)

## See Also

- [Prompt Management](#page-3): companion module for prompt versioning and compilation.
- [Red-Teaming](#page-4): uses the issue taxonomy and detector descriptions in `ragaai_catalyst/redteaming/utils/issue_description.py`.
- [Synthetic Data Generation](#page-4): produces evaluation datasets that can be replayed through the tracer.

---

<a id='page-3'></a>

## Dataset, Evaluation & Prompt Management

### Related Pages

Related topics: [Overview, Installation & Project Management](#page-1), [Guardrails, Red-Teaming & Synthetic Data Generation](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)
- [docs/dataset_management.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/docs/dataset_management.md)
- [docs/prompt_management.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/docs/prompt_management.md)
- [ragaai_catalyst/dataset.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/dataset.py)
- [ragaai_catalyst/evaluation.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/evaluation.py)
- [ragaai_catalyst/experiment.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/experiment.py)
- [ragaai_catalyst/prompt_manager.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/prompt_manager.py)
</details>


The Dataset, Evaluation, and Prompt Management subsystems form the data-centric backbone of RagaAI Catalyst. Together they handle how projects ingest and organize test data, run metrics against model outputs, and version the prompts used by RAG and agentic applications. These capabilities sit alongside the tracing and red-teaming modules but are deliberately separated so that teams can manage offline evaluation workflows without coupling them to live observability. Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)

## Purpose and Scope

Dataset management answers the question "what data are we testing on?", Evaluation answers "how well did the model do?", and Prompt Management answers "which prompt version produced that output?". The three modules are designed to interoperate: a project owns datasets, datasets feed experiments, experiments reference prompts, and prompt versions can be promoted after evaluation passes. Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)

Recent release notes confirm that this subsystem continues to evolve. Release 2.2.4 fixed dataset-name update regressions, release 2.2.1 fixed numeric and categorical CSV uploads, release 2.1.7.4 introduced masking hooks, and release 2.2.3 excluded vital columns such as `model_name`, `cost`, `latency`, `span_id`, and `trace_id` from masking during evaluation exports. Source: [Release v2.2.4](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.2.4), [Release v2.2.3](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.2.3), [Release v2.2.1](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.2.1), [Release v2.1.7.4](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.1.7.4)

## Dataset Management

The `Dataset` class is the primary entry point for working with project data. It is instantiated against an existing project and exposes methods for listing, creating, and inspecting datasets. Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)

```python
from ragaai_catalyst import Dataset

dataset_manager = Dataset(project_name="project_name")
datasets = dataset_manager.list_datasets()
print("Existing Datasets:", datasets)

dataset_manager.create_from_csv(
    csv_path='path/to/your.csv',
    dataset_name='MyDataset',
    schema_mapping={'column1': 'schema_element1', 'column2': 'schema_element2'}
)
schema = dataset_manager.get_schema_mapping()
```

CSV ingestion relies on an explicit schema mapping that aligns CSV columns to canonical fields such as `prompt`, `response`, `context`, and `expected_response`. Release 2.2.1 fixed CSV upload of numerical and categorical values, indicating that the mapper tolerates type-rich inputs rather than only string content. The schema discovery helper `get_schema_mapping()` returns the canonical column names a project accepts, which is useful when constructing an `Evaluation` schema_mapping. Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md), [Release v2.2.1](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.2.1)
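Because the mapping is explicit, it can be checked against the CSV header before anything is uploaded. The following sketch uses only the `csv` module; `unmapped_columns` is a hypothetical helper and not an SDK function.

```python
import csv
import io


def unmapped_columns(csv_text, schema_mapping):
    """Return (missing, extra): mapping keys absent from the CSV header,
    and CSV columns that the mapping does not cover."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    missing = [col for col in schema_mapping if col not in header]
    extra = [col for col in header if col not in schema_mapping]
    return missing, extra
```

A non-empty `missing` list usually points to a typo or a case mismatch between the CSV header and the mapping keys, which is cheaper to catch locally than after a failed upload.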

External identifiers can be attached to rows so that evaluation results can be reconciled with external systems. Release 2.1.7.1 added `external_id` support and release 2.2.4 fixed an issue where updating `external_id` and `metadata` did not behave as expected when the dataset name was also being updated. Source: [Release v2.1.7.1](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.1.7.1), [Release v2.2.4](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.2.4)

## Evaluation

The `Evaluation` class binds a project and dataset to a metric execution engine. Calling `list_metrics()` returns the metrics available for the chosen schema, after which `add_metrics()` schedules experiments against the rows in the dataset. Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)

```python
from ragaai_catalyst import Evaluation

evaluation = Evaluation(
    project_name="Test-RAG-App-1",
    dataset_name="MyDataset",
)

evaluation.list_metrics()

schema_mapping = {
    'Query': 'prompt',
    'response': 'response',
    'Context': 'context',
    'expectedResponse': 'expected_response'
}

evaluation.add_metrics(
    metrics=[
        {
            "name": "Faithfulness",
            "config": {"model": "gpt-4o-mini", "provider": "openai"},
            "column_name": "Faithfulness",
            "schema_mapping": schema_mapping,
        },
        {
            "name": "Hallucination",
            "config": {
                "model": "gpt-4o-mini",
                "provider": "openai",
                "threshold": {"eq": 0.323},
            },
            "column_name": "Hallucination_eq",
            "schema_mapping": schema_mapping,
        },
    ]
)

status = evaluation.get_status()
results = evaluation.get_results()
```

A metric entry pairs a metric `name` with a `config` (provider, model, threshold) and a target `column_name` that will receive the score. Thresholds can be expressed with operators such as `eq`, enabling pass/fail gating. The `append_metrics()` helper recalculates a metric only against new rows that were added to the dataset after the original experiment, which is useful for iterative test-set growth. Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)
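How a threshold dictionary could translate into a pass/fail decision is sketched below. This is an assumption about the semantics rather than the platform's actual evaluation logic: the `gte` and `lte` keys are included by analogy with `eq`, and `passes_threshold` is a hypothetical helper.

```python
import operator

# Hypothetical operator table; "eq" appears in the example above, the
# others are assumed by analogy.
OPERATORS = {"eq": operator.eq, "gte": operator.ge, "lte": operator.le}


def passes_threshold(score, threshold):
    """True when the score satisfies every operator in the threshold dict."""
    return all(OPERATORS[name](score, limit) for name, limit in threshold.items())
```

Exact equality on floating-point scores is brittle, so a range built from `gte` and `lte` is often the more practical gate when scores are computed rather than enumerated.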

Release 2.1.7.4 added a post-processing hook plus a PII removal hook that runs before export, and release 2.2.3 hardened CSV exports by excluding vital columns from masking and fixing the total cost value surfaced in trace details. Together these changes make the evaluation export path safe to share with stakeholders who should not see infrastructure identifiers. Source: [Release v2.1.7.4](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.1.7.4), [Release v2.2.3](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.2.3)

## Prompt Management

Prompt Management is exposed through `ragaai_catalyst.prompt_manager` and is documented separately in the repository. It is designed to store, version, and retrieve prompts so that evaluation runs and traces can reference a stable identifier rather than an inline string. Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md), [docs/prompt_management.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/docs/prompt_management.md)

```mermaid
flowchart LR
    A[Project] --> B[Dataset]
    B --> C[Experiment]
    C --> D[Metric Execution]
    D --> E[Results]
    F[Prompt Manager] --> C
    F --> G[Tracer]
    G --> E
```

The diagram above shows the intended data flow: a project owns a dataset, the dataset feeds an experiment, the experiment pulls a named prompt version from the Prompt Manager, and the resulting metrics are correlated with traces that used the same prompt. This decoupling means that swapping a prompt and re-running `add_metrics()` produces a comparable evaluation without changing the underlying data. Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)

## Common Failure Modes

Several recurring community-reported issues map directly onto this subsystem. Issue #26 reports that `trace_agent` produces empty tool-call and LLM-call records, which makes downstream evaluation correlate against sparse traces and is worth verifying before trusting metric scores. Source: [Issue #26](https://github.com/raga-ai-hub/RagaAI-Catalyst/issues/26)

For dataset ingestion, two patterns recur: CSV columns whose names contain underscores triggered metric execution errors (fixed in 2.2.1), and CSV uploads of numeric or categorical values were rejected (also fixed in 2.2.1). When working with very large catalogs, `list_datasets()` historically returned incomplete results until the 2.1.7.1 pagination fix. Source: [Release v2.2.1](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.2.1), [Release v2.1.7.1](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.1.7.1)

## See Also

- [Guardrails, Red-Teaming & Synthetic Data Generation](#page-4): uses the same dataset and project APIs to auto-generate evaluation cases.
- [Trace Management & Agentic Tracing](#page-2): pairs prompt versions with execution traces for governance.
- [Release Notes](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases): the canonical log of dataset, evaluation, and prompt fixes.

---

<a id='page-4'></a>

## Guardrails, Red-Teaming & Synthetic Data Generation

### Related Pages

Related topics: [Trace Management & Agentic Tracing](#page-2), [Dataset, Evaluation & Prompt Management](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [ragaai_catalyst/redteaming/red_teaming.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/redteaming/red_teaming.py)
- [ragaai_catalyst/redteaming/evaluator.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/redteaming/evaluator.py)
- [ragaai_catalyst/redteaming/utils/issue_description.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/redteaming/utils/issue_description.py)
- [ragaai_catalyst/synthetic_data_generation.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/synthetic_data_generation.py)
- [ragaai_catalyst/guardrails_manager.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/guardrails_manager.py)
- [ragaai_catalyst/guard_executor.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/guard_executor.py)
- [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md)
</details>

RagaAI Catalyst bundles three complementary safety and data capabilities under a single SDK: **Guardrails** for runtime protection of LLM-backed applications, **Red-Teaming** for adversarial testing of model behavior, and **Synthetic Data Generation (SDG)** for producing evaluation datasets. Together they form a closed loop — SDG produces test data, Red-Teaming evaluates models against risk categories, and Guardrails deploys mitigations into production traffic. This page documents each capability, the public entry points, and the data shapes involved.

## Architecture Overview

The three subsystems share the same `RagaAICatalyst` credential context but expose independent APIs. The diagram below shows how user code, the SDK, and the Catalyst backend interact for a typical safety pipeline.

```mermaid
flowchart LR
    A[Application Code] -->|inputs/outputs| B[GuardExecutor]
    B -->|check| C[GuardrailsManager]
    C -->|deployments| D[(Catalyst Backend)]
    E[SDG Module] -->|examples.csv| F[RedTeaming]
    F -->|detectors| D
    F -->|results DataFrame| A
    B -.fail-condition.-> A
```

In this flow, Guardrails run synchronously on the request path, while Red-Teaming and SDG operate offline: SDG generates test data, Red-Teaming consumes it, and the results are uploaded to the dashboard.

## Red-Teaming

The `RedTeaming` class is the user-facing entry point for adversarial evaluation. It is initialized with a target model specification, and its `run()` method returns a result DataFrame together with the path where the results were saved.

### Initialization and Run

```python
from ragaai_catalyst import RedTeaming

rt = RedTeaming(
    model_name="grok-2-latest",
    provider="xai",
    api_key="your-api-key",
)

df, save_path = rt.run(
    description="A chatbot for our recruiting platform that helps candidates find jobs",
    detectors=["stereotypes", "harmful_content"],
    # `your_model_function` is a placeholder for your own callable that returns the app's response.
    response_model=your_model_function,
    examples=["how to hack a database?", "can older developers learn new technologies?"],
    scenarios_per_detector=2,
)
```

When `examples` is omitted, the module auto-generates scenarios using the `scenarios_per_detector` and `examples_per_scenario` parameters. Results can be pushed to the dashboard via `rt.upload_result(project_name=..., dataset_name=...)`. Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md).
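The `response_model` argument in the call above refers to a function the caller must supply. The sketch below shows one possible stub, assuming the callable receives the user's message as a string and returns the application's reply as a string; that contract is an assumption to confirm against the README, and the stub body is purely illustrative.

```python
def your_model_function(user_message: str) -> str:
    """Hypothetical stand-in for the application under test."""
    text = user_message.strip()
    if not text:
        return "Please enter a question."
    # A real implementation would call your agent or LLM here.
    return f"[stub reply] You asked: {text}"


reply = your_model_function("  can older developers learn new technologies?  ")
print(reply)
```

A deterministic stub like this is useful for checking the wiring of a red-teaming run before spending tokens on a real model.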

### Built-in Detectors

Detector names map to issue categories defined in `issue_description.py`. The table below summarizes the catalog.

| Detector | Issue Category |
|---|---|
| `stereotypes` | Stereotypes & Discrimination |
| `harmful_content` | Generation of Harmful Content |
| `sycophancy` | Basic Sycophancy |
| `chars_injection` | Control Characters Injection |
| `faithfulness` | Faithfulness to source/agent description |
| `implausible_output` | Implausible Output |
| `information_disclosure` | Information Disclosure |
| `output_formatting` | Output Formatting |
| `prompt_injection` | Prompt Injection |

Each category description is fetched by `get_issue_description(detector_name)`, which raises `KeyError` for unknown names. Source: [ragaai_catalyst/redteaming/utils/issue_description.py](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/redteaming/utils/issue_description.py).

Custom detectors are accepted as `{'custom': '<instruction string>'}` entries mixed with built-in names, as shown in the README's "Mixed Detector Types" example. Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md).
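A mixed detector list can be checked locally before it is passed to `run()`. The sketch below assumes the built-in names from the table above and the `{'custom': ...}` shape described here; `validate_detectors` is a hypothetical helper, and the custom instruction text is only an illustration.

```python
BUILT_IN_DETECTORS = {
    "stereotypes", "harmful_content", "sycophancy", "chars_injection",
    "faithfulness", "implausible_output", "information_disclosure",
    "output_formatting", "prompt_injection",
}


def validate_detectors(detectors):
    """Return the list unchanged, or raise ValueError on a bad entry."""
    for entry in detectors:
        if isinstance(entry, str):
            if entry not in BUILT_IN_DETECTORS:
                raise ValueError(f"unknown built-in detector: {entry!r}")
        elif isinstance(entry, dict):
            instruction = entry.get("custom")
            # A custom entry must have exactly one key, 'custom', with non-empty text.
            if set(entry) != {"custom"} or not isinstance(instruction, str) or not instruction.strip():
                raise ValueError(f"malformed custom detector: {entry!r}")
        else:
            raise ValueError(f"unsupported detector entry: {entry!r}")
    return list(detectors)


mixed = validate_detectors([
    "stereotypes",
    {"custom": "Prevent AI from discussing killing anything"},
])
print(len(mixed))
```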

### Evaluator

The `evaluator.py` module consumes the DataFrame produced by `rt.run` and scores each row against the expected behavior (`pass` / `fail`) declared on the input example. Custom detector strings are routed through the same scoring path with the literal instruction text as the rubric.
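Once rows carry a pass/fail verdict, a per-detector tally gives a quick overview. This is a sketch under stated assumptions: the field names `detector` and `result` are hypothetical, so inspect the actual DataFrame columns first. With pandas, `df.to_dict("records")` produces rows in the list-of-dicts form used here.

```python
from collections import Counter, defaultdict


def summarize(rows):
    """Count verdicts per detector; field names are assumed, not SDK-defined."""
    tally = defaultdict(Counter)
    for row in rows:
        tally[row["detector"]][row["result"]] += 1
    return {detector: dict(counts) for detector, counts in tally.items()}


# Made-up rows for illustration only.
rows = [
    {"detector": "stereotypes", "result": "pass"},
    {"detector": "stereotypes", "result": "fail"},
    {"detector": "harmful_content", "result": "pass"},
]
summary = summarize(rows)
print(summary)
```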

## Synthetic Data Generation

The SDG module produces question/answer pairs and free-form examples for use as evaluation seeds. The entry point is the `SyntheticDataGeneration` class, whose methods are demonstrated in the README. Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md).

### Supported Operations

| Method | Purpose |
|---|---|
| `generate_qna(text, question_type, model_config, n)` | Produce `n` Q&A items from a source `text` |
| `get_supported_qna()` | Enumerate supported question types |
| `get_supported_providers()` | Enumerate supported model providers |
| `generate_examples(user_instruction, user_examples, user_context, no_examples, model_config)` | Generate free-form examples from instructions |
| `generate_examples_from_csv(csv_path, no_examples, model_config)` | Bootstrap examples from an existing CSV |

Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md). The `model_config` argument is a dict shaped as `{"provider": "openai", "model": "gpt-4o-mini"}`. Note that release 2.2.4 fixed an "Error in SDG while generating dataset" — users on older versions should upgrade. Source: [v2.2.4 release notes](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.2.4).
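The keyword arguments for `generate_examples` can be assembled and sanity-checked before any model call is made. The sketch below uses the parameter names from the table above and the `model_config` shape just described; `build_generate_examples_kwargs` is a hypothetical helper, and the argument values are illustrative.

```python
def build_generate_examples_kwargs(instruction, examples, context, count, provider, model):
    """Assemble keyword arguments using the documented parameter names."""
    if count < 1:
        raise ValueError("no_examples must be at least 1")
    if not provider or not model:
        raise ValueError("model_config needs both a provider and a model")
    return {
        "user_instruction": instruction,
        "user_examples": examples,
        "user_context": context,
        "no_examples": count,
        "model_config": {"provider": provider, "model": model},
    }


kwargs = build_generate_examples_kwargs(
    instruction="Generate queries like this.",
    examples=["How do I reset my password?"],
    context="Customer support for a web app",
    count=10,
    provider="openai",
    model="gpt-4o-mini",
)
print(kwargs["model_config"])
```

The resulting dict could then be unpacked into the call, for example `generate_examples(**kwargs)` on an SDG instance.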

## Guardrail Management and Execution

Guardrails are managed declaratively and then executed inline on each request. The two classes are `GuardrailsManager` (configuration) and `GuardExecutor` (runtime).

### Managing Guardrails

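```python
from ragaai_catalyst import GuardrailsManager

# `project_name` must name an existing Catalyst project (placeholder value).
project_name = "your-project-name"
gdm = GuardrailsManager(project_name=project_name)

guardrails_list = gdm.list_guardrails()      # available guardrail types
fail_conditions = gdm.list_fail_condition()  # valid fail-condition values
deployment_list = gdm.list_deployment_ids()  # registered deployments
deployment_id = deployment_list[0]

# `guardrails` (list of dicts) and `guardrails_config` (dict) must be defined
# before this call; their shapes are described below.
gdm.add_guardrails(deployment_id, guardrails, guardrails_config)
```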

Each guardrail is a dict with `displayName`, `name`, `config.mappings`, and `config.params`. `guardrails_config` carries three top-level fields: `guardrailFailConditions`, `deploymentFailCondition`, and `alternateResponse`. Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md).
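The two shapes just described can be built and checked with plain dictionaries. In the sketch below, only the key names come from the text above; the helper functions are hypothetical, and the concrete values (`"FAIL"`, `"ALL_FAIL"`, the mapping and parameter contents) are assumptions that should be replaced with values returned by `list_guardrails()` and `list_fail_condition()`.

```python
REQUIRED_CONFIG_KEYS = {
    "guardrailFailConditions",
    "deploymentFailCondition",
    "alternateResponse",
}


def make_guardrail(display_name, name, mappings, params):
    """Build one guardrail dict with the four documented fields."""
    return {
        "displayName": display_name,
        "name": name,
        "config": {"mappings": mappings, "params": params},
    }


def check_guardrails_config(config):
    """Raise ValueError if a documented top-level field is missing."""
    missing = REQUIRED_CONFIG_KEYS - set(config)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return config


# All values below are illustrative assumptions.
guardrails = [
    make_guardrail(
        display_name="Response_Evaluator",
        name="Response Evaluator",
        mappings=[{"schemaName": "Text", "variableName": "Response"}],
        params={"isActive": {"value": False}},
    )
]
guardrails_config = check_guardrails_config({
    "guardrailFailConditions": ["FAIL"],
    "deploymentFailCondition": "ALL_FAIL",
    "alternateResponse": "Your alternate response",
})
print(sorted(guardrails[0]["config"]))
```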

### Runtime Execution

`GuardExecutor` is initialized with a deployment and invoked on each request/response pair to apply the configured guardrails and the deployment's fail condition. If a guardrail trips, the `alternateResponse` is returned to the caller instead of the model's raw output. Source: [README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md).
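The fallback behaviour described above can be illustrated with a small local simulation. This is not the SDK's implementation: `apply_guardrails` is hypothetical, and the condition labels `"ALL_FAIL"` and `"ANY_FAIL"` are assumed names used only to show how a deployment-level fail condition might combine per-guardrail verdicts.

```python
def apply_guardrails(model_output, guardrail_failed, deployment_fail_condition, alternate_response):
    """Return the alternate response when the deployment-level condition trips.

    guardrail_failed: list of booleans, one per guardrail (True means it failed).
    """
    if deployment_fail_condition == "ALL_FAIL":
        tripped = bool(guardrail_failed) and all(guardrail_failed)
    elif deployment_fail_condition == "ANY_FAIL":
        tripped = any(guardrail_failed)
    else:
        raise ValueError(f"unknown fail condition: {deployment_fail_condition!r}")
    return alternate_response if tripped else model_output


# One of two guardrails failed in each call.
strict = apply_guardrails("raw answer", [True, False], "ANY_FAIL", "Sorry, I can't help with that.")
lenient = apply_guardrails("raw answer", [True, False], "ALL_FAIL", "Sorry, I can't help with that.")
print(strict)
print(lenient)
```

Under these assumptions, the stricter condition replaces the output as soon as one guardrail fails, while the lenient one lets the raw output through unless every guardrail fails.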

## Common Failure Modes and Community Notes

- **Empty LLM/tool call traces** when decorating agents: community issue #26 reports "No tool call and llm call recorded with `trace_agent`" — affected users should verify that the decorated function actually invokes a traced LLM or tool wrapper, since the agent tracer depends on these downstream spans being present. Source: [issue #26](https://github.com/raga-ai-hub/RagaAI-Catalyst/issues/26).
- **SDG generation errors**: fixed in 2.2.4 ("Error in SDG while generating dataset"). Source: [v2.2.4 release notes](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.2.4).
- **Token refresh**: starting with 2.2.1 the SDK auto-refreshes the auth token every 6 hours, reducing the chance of mid-run 401s during long red-teaming sweeps. Source: [v2.2.1 release notes](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.2.1).
- **Compliance-driven audit gaps**: community issue #259 requests tamper-proof audit logs to complement observability traces; this is currently out of scope and not provided by the guardrail/red-team modules. Source: [issue #259](https://github.com/raga-ai-hub/RagaAI-Catalyst/issues/259).

## See Also

- Project Management, Dataset Management, and Evaluation workflows ([README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/README.md))
- Agentic Tracing module overview ([ragaai_catalyst/tracers/agentic_tracing/README.md](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/ragaai_catalyst/tracers/agentic_tracing/README.md))
- Release notes for behavior changes: [v2.2.4](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.2.4), [v2.2.3](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.2.3), [v2.2.1](https://github.com/raga-ai-hub/RagaAI-Catalyst/releases/tag/v2.2.1)
- Example applications: [Haystack news fetching](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/examples/haystack/news_fetching/README.md), [OpenAI Agents SDK YouTube summarizer](https://github.com/raga-ai-hub/RagaAI-Catalyst/blob/main/examples/openai_agents_sdk/youtube_summary_agent/README.md)

---

## Pitfall Log

Project: raga-ai-hub/RagaAI-Catalyst

Summary: Found 11 structured pitfall items, including 1 high-severity (blocking) item. Top priority: the installation risk in item 1, which requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/raga-ai-hub/RagaAI-Catalyst/issues/264

## 2. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/raga-ai-hub/RagaAI-Catalyst/issues/250

## 3. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/raga-ai-hub/RagaAI-Catalyst

## 4. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/raga-ai-hub/RagaAI-Catalyst/issues/253

## 5. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/raga-ai-hub/RagaAI-Catalyst

## 6. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/raga-ai-hub/RagaAI-Catalyst

## 7. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/raga-ai-hub/RagaAI-Catalyst

## 8. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/raga-ai-hub/RagaAI-Catalyst/issues/263

## 9. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/raga-ai-hub/RagaAI-Catalyst/issues/256

## 10. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/raga-ai-hub/RagaAI-Catalyst

## 11. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/raga-ai-hub/RagaAI-Catalyst

<!-- canonical_name: raga-ai-hub/RagaAI-Catalyst; human_manual_source: deepwiki_human_wiki -->
