# https://github.com/deepset-ai/haystack 项目说明书

生成时间：2026-05-15 20:17:22 UTC

## 目录

- [Introduction to Haystack](#introduction)
- [Pipeline Architecture](#pipeline-architecture)
- [Core Concepts](#core-concepts)
- [Pipeline Component Types](#component-types)
- [Data Processing Components](#data-processing)
- [LLM and Embedder Integrations](#llm-integrations)
- [Document Stores and Retrievers](#document-stores)
- [Agent Systems](#agents)
- [Development Guide](#development-guide)
- [Deployment and Infrastructure](#deployment)

<a id='introduction'></a>

## Introduction to Haystack

### 相关页面

相关主题：[Pipeline Architecture](#pipeline-architecture), [Core Concepts](#core-concepts)

<details>
<summary>Relevant Source Files</summary>

以下源码文件用于生成本页说明：

- [README.md](https://github.com/deepset-ai/haystack/blob/main/README.md)
- [docs-website/README.md](https://github.com/deepset-ai/haystack/blob/main/docs-website/README.md)
- [docker/README.md](https://github.com/deepset-ai/haystack/blob/main/docker/README.md)
- [pydoc/README.md](https://github.com/deepset-ai/haystack/blob/main/pydoc/README.md)
- [examples/README.md](https://github.com/deepset-ai/haystack/blob/main/examples/README.md)
</details>

# Introduction to Haystack

Haystack is an end-to-end LLM framework that enables developers to build applications powered by Large Language Models (LLMs), Transformer models, vector search, and more. The framework provides a flexible architecture for orchestrating state-of-the-art embedding models and LLMs into pipelines to solve real-world NLP use cases.

## What is Haystack?

Haystack is designed to facilitate the development of production-ready AI applications with a focus on **context engineering**—giving developers explicit control over how information is retrieved, ranked, filtered, combined, structured, and routed before it reaches the language model.

资料来源：[README.md:1]()()

### Core Capabilities

| Capability | Description |
|------------|-------------|
| **Retrieval-Augmented Generation (RAG)** | Combine vector search with LLMs for accurate, context-grounded responses |
| **Document Search** | Full-featured document indexing and semantic search |
| **Question Answering** | Extract answers from large document collections |
| **Pipeline Orchestration** | Build complex workflows with customizable components |
| **Agent Integration** | Deploy autonomous agents with tool-use capabilities |

资料来源：[docker/README.md:4-6]()()

## Architecture Overview

Haystack follows a component-based architecture where pipelines serve as the foundational building blocks. Pipelines connect various components including document stores, retrievers, readers, generators, and custom tools.

```mermaid
graph TD
    A[User Query] --> B[Pipeline]
    B --> C[Retrievers]
    B --> D[Document Stores]
    C --> E[Rankers]
    E --> F[LLM / Generator]
    F --> G[Response]
    
    H[Documents] --> D
    
    style F fill:#e1f5fe
    style D fill:#fff3e0
    style C fill:#e8f5e9
```

### Pipeline Components

Pipelines in Haystack are composed of interconnected nodes that process data sequentially or in parallel. Each component handles a specific stage of the document processing or inference workflow.

| Component Type | Function |
|----------------|----------|
| **DocumentStore** | Stores and indexes documents for retrieval |
| **Retriever** | Finds relevant documents from the store |
| **Ranker** | Reorders retrieved documents by relevance |
| **Reader/Generator** | Extracts answers or generates responses |
| **Preprocessor** | Cleans and splits documents before indexing |
| **Custom Nodes** | User-defined processing logic |

资料来源：[README.md:54-58]()()

## Key Features

### Built for Context Engineering

Haystack provides fine-grained control over the entire retrieval and generation pipeline. Developers can:

- Define custom retrieval strategies
- Implement multi-stage ranking pipelines
- Route queries to specialized processing branches
- Control how context is assembled before reaching the LLM

### Flexible Pipeline Design

The framework supports both declarative and programmatic pipeline construction, allowing developers to define workflows through configuration files or Python code.

```mermaid
graph LR
    A[Query Input] --> B[Retriever Node]
    B --> C[Ranker Node]
    C --> D[LLM Node]
    D --> E[Output]
    
    F[Documents] --> G[Document Store]
    G --> B
```

### Production-Ready Architecture

Haystack includes enterprise features such as:

- **Telemetry**: Anonymous usage statistics collection for component initialization tracking (opt-out available)
- **Container Support**: Docker images for consistent deployment environments
- **CI/CD Integration**: Automated testing with GitHub Actions workflows
- **Type Checking**: Full MyPy type annotation support

资料来源：[README.md:60-62]()()

## Installation

### Package Installation

The primary method for installing Haystack is via pip:

```bash
pip install haystack-ai
```

For testing pre-release features:

```bash
pip install --pre haystack-ai
```

资料来源：[README.md:28-34]()()

### Docker Installation

Haystack provides Docker images for containerized deployments:

| Image | Description |
|-------|-------------|
| `haystack:base-<version>` | Base image with Haystack preinstalled for derivation |

Multi-platform builds are supported for various architectures including `linux/arm64` and `linux/amd64`.

```bash
docker buildx bake base
```

资料来源：[docker/README.md:8-14]()()

## Documentation Structure

The Haystack documentation is hosted at [docs.haystack.deepset.ai](https://docs.haystack.deepset.ai) and organized into several sections:

| Section | Content |
|---------|---------|
| **Overview/Intro** | Getting started guides and project introduction |
| **Get Started** | Quick-start guide for building first LLM applications |
| **Tutorials** | Step-by-step learning paths |
| **Cookbook** | Pre-built recipes and example implementations |
| **API Reference** | Auto-generated documentation from docstrings |
| **Concepts** | Core architectural concepts and design patterns |

资料来源：[docs-website/README.md:1-8]()()

### Documentation Versioning

The documentation site supports multiple versions:

- **Next (Unreleased)**: Documentation for upcoming features
- **Current (Stable)**: Documentation for the latest stable release
- **Past Versions**: Archived documentation for previous releases

资料来源：[docs-website/src/pages/versions.js:1-25]()()

### API Reference Generation

The API reference pages are automatically generated from docstrings using [haystack-pydoc-tools](https://github.com/deepset-ai/haystack-pydoc-tools). A GitHub workflow regenerates the API reference when code changes are merged.

资料来源：[pydoc/README.md:1-12]()()

## Project Structure

```
haystack/
├── haystack/                    # Main package source code
├── docs-website/                # Docusaurus documentation site
│   ├── docs/                    # Main documentation content
│   ├── reference/               # Auto-generated API reference
│   └── versioned_docs/           # Versioned documentation snapshots
├── docker/                      # Docker image configurations
├── pydoc/                       # PyDoc configuration files
└── examples/                    # Example implementations
```

> **Note**: Example implementations have been moved to the [haystack-cookbook](https://github.com/deepset-ai/haystack-cookbook/) repository.

资料来源：[examples/README.md:1-5]()()

## Community and Contributing

Haystack is open to contributions from developers of all skill levels. There are multiple ways to contribute:

| Contribution Area | Repository |
|-------------------|------------|
| Core Framework | `deepset-ai/haystack` |
| Integrations | `deepset-ai/haystack-core-integrations` |
| Documentation | `deepset-ai/haystack/tree/main/docs-website` |

### Community Resources

- **GitHub Issues**: Bug reports and feature requests
- **GitHub Discussions**: General questions and community support
- **Discord**: Real-time community engagement
- **Stack Overflow**: Tagged questions at `haystack`
- **Twitter/X**: Updates and announcements

资料来源：[README.md:89-95]()()

## Organizations Using Haystack

Haystack is trusted by thousands of production AI teams across industries:

| Industry | Organizations |
|----------|---------------|
| **Technology & AI** | Apple, Meta, Databricks, NVIDIA, Intel |
| **Public Sector** | European Commission |

资料来源：[README.md:78-85]()()

## Licensing and Compliance

- **License**: Apache 2.0
- **Type Checking**: MyPy validated
- **Coverage**: Automated test coverage tracking
- **License Compliance**: Automated workflow verification

资料来源：[README.md:10-11]()()

## Summary

Haystack provides a comprehensive framework for building production-ready LLM applications with emphasis on retrieval-augmented generation, flexible pipeline design, and context engineering. The framework's component-based architecture enables developers to customize every stage of the document processing and inference pipeline while maintaining production-grade reliability through integrated testing, documentation, and deployment tooling.

With support for Docker containerization, comprehensive documentation, and an active open-source community, Haystack serves as a robust foundation for teams implementing enterprise AI solutions across diverse industries.

---

<a id='pipeline-architecture'></a>

## Pipeline Architecture

### 相关页面

相关主题：[Introduction to Haystack](#introduction), [Pipeline Component Types](#component-types), [Core Concepts](#core-concepts)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [docs-website/docs/concepts/pipelines.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/pipelines.mdx)
- [docs-website/docs/concepts/pipelines/asyncpipeline.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/pipelines/asyncpipeline.mdx)
- [docs-website/docs/concepts/pipelines/serialization.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/pipelines/serialization.mdx)
- [docs-website/docs/concepts/pipelines/debugging-pipelines.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/pipelines/debugging-pipelines.mdx)
- [docs-website/docs/concepts/pipelines/pipeline-breakpoints.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/pipelines/pipeline-breakpoints.mdx)
</details>

# Pipeline Architecture

## Overview

The Pipeline architecture is the foundational component of the Haystack framework, enabling developers to construct flexible, modular workflows for building LLM-powered applications. Pipelines orchestrate the execution of various components—including retrievers, readers, generators, and custom processors—into cohesive data processing flows.

Pipelines in Haystack 2.x provide a declarative approach to defining application workflows, allowing developers to:

- Connect multiple components in directed acyclic graphs (DAGs)
- Route data between components with explicit connections
- Handle both synchronous and asynchronous execution models
- Debug and inspect execution through breakpoints and hooks
- Persist and share pipeline configurations through serialization

资料来源：[docs-website/docs/concepts/pipelines.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/pipelines.mdx)

## Core Concepts

### Component Connections

Components in a Haystack Pipeline are connected through named input/output connections. Each component exposes specific input and output slots that define how data flows through the pipeline.

```mermaid
graph LR
    A[Document Store] -->|query results| B[Retriever]
    B -->|retrieved docs| C[Reader]
    C -->|answers| D[Output]
    
    style A fill:#e1f5fe
    style B fill:#fff3e0
    style C fill:#e8f5e9
    style D fill:#fce4ec
```

The connection model requires that:
- Output types must be compatible with target input types
- Components can have multiple inputs and outputs
- Connections form a directed graph structure

资料来源：[docs-website/docs/concepts/pipelines.mdx:1-20](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/pipelines.mdx)

### Pipeline Types

Haystack provides multiple pipeline implementations optimized for different use cases:

| Pipeline Type | Use Case | Execution Model |
|---------------|----------|-----------------|
| Standard Pipeline | General-purpose workflows | Synchronous |
| AsyncPipeline | High-throughput I/O operations | Asynchronous with `async/await` |
| SearchPipeline | Retrieval-focused workflows | Optimized for search |
| GenerativePipeline | LLM-centric applications | Optimized for generation |

资料来源：[docs-website/docs/concepts/pipelines.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/pipelines.mdx)

## AsyncPipeline

The AsyncPipeline extends the standard Pipeline with asynchronous execution capabilities, making it suitable for applications requiring high concurrency and non-blocking I/O operations.

### Key Features

- **Non-blocking execution**: Components can execute concurrently when dependencies are satisfied
- **Streaming support**: Better handling of streaming responses from LLMs
- **Resource efficiency**: Improved CPU and memory utilization for I/O-bound workloads

```python
async def run_async_pipeline(pipeline, query):
    result = await pipeline.run_async(query=query)
    return result
```

资料来源：[docs-website/docs/concepts/pipelines/asyncpipeline.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/pipelines/asyncpipeline.mdx)

### Execution Flow

```mermaid
graph TD
    A[Start] --> B{AsyncPipeline.run_async}
    B --> C[Execute Independent Components]
    C --> D{Wait for Dependencies?}
    D -->|No| E[Collect Results]
    D -->|Yes| F[Await Dependency]
    F --> E
    E --> G[Return Unified Result]
    
    style B fill:#bbdefb
    style C fill:#c8e6c9
    style G fill:#ffe0b2
```

## Serialization

Pipeline configurations can be serialized to YAML format, enabling:

- Persistence of pipeline definitions
- Sharing configurations across environments
- Version control for pipeline definitions
- Reproducible deployments

### Serialization Format

```yaml
version: '2.0'
components:
  - name: MyRetriever
    type: BM25Retriever
    init_parameters:
      document_store: MyDocumentStore
  - name: MyReader
    type: FARMReader
    init_parameters:
      model_name_or_path: deepset/roberta-base-squad2
edges: []
```

资料来源：[docs-website/docs/concepts/pipelines/serialization.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/pipelines/serialization.mdx)

### Loading Serialized Pipelines

```python
from haystack import Pipeline

# Load from YAML
pipeline = Pipeline.load_from_yaml(path="pipeline_config.yaml")
```

## Debugging Pipelines

Haystack provides comprehensive debugging capabilities to inspect and troubleshoot pipeline execution.

### Execution Tracing

The debugging system tracks:
- Component execution order
- Input/output data at each stage
- Execution timing and performance metrics
- Error locations and stack traces

```python
from haystack import Pipeline

pipeline = Pipeline()
pipeline.debug = True  # Enable debug mode
result = pipeline.run(query="What is Haystack?")
```

资料来源：[docs-website/docs/concepts/pipelines/debugging-pipelines.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/pipelines/debugging-pipelines.mdx)

### Pipeline Inspector

The Pipeline Inspector provides detailed visibility into:

| Inspection Target | Information Provided |
|-------------------|---------------------|
| Component Graph | Node and edge relationships |
| Data Flow | Input/output shapes and types |
| Execution State | Runtime values at breakpoints |
| Performance | Timing and memory profiles |

## Pipeline Breakpoints

Breakpoints allow execution to pause at specific points, enabling detailed inspection of intermediate results.

```mermaid
graph LR
    A[Pipeline Run] --> B{Breakpoint 1?}
    B -->|Yes| C[Pause & Inspect]
    C --> D{Continue?}
    D -->|Yes| E{Breakpoint 2?}
    D -->|No| Z[Abort]
    E -->|Yes| F[Pause & Inspect]
    E -->|No| G[Continue to End]
    B -->|No| E
    
    style C fill:#fff9c4
    style F fill:#fff9c4
    style Z fill:#ffcdd2
```

### Breakpoint Configuration

Breakpoints can be configured at:

- **Component level**: Pause before or after specific component execution
- **Connection level**: Inspect data flowing through specific connections
- **Condition level**: Pause only when certain conditions are met

资料来源：[docs-website/docs/concepts/pipelines/pipeline-breakpoints.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/pipelines/pipeline-breakpoints.mdx)

## Best Practices

### Pipeline Design

1. **Modularity**: Keep components focused on single responsibilities
2. **Clear naming**: Use descriptive names for components and connections
3. **Error handling**: Implement proper error handling at component boundaries
4. **Testing**: Unit test individual components before integration

### Performance Optimization

| Strategy | Description |
|----------|-------------|
| Caching | Enable caching for expensive operations |
| Batching | Use batch processing for multiple queries |
| Async execution | Prefer AsyncPipeline for I/O-bound workflows |
| Resource limits | Set appropriate timeouts and memory limits |

## Architecture Summary

```mermaid
graph TD
    subgraph "Pipeline Layer"
        A[Pipeline] --> B[AsyncPipeline]
        A --> C[SearchPipeline]
        A --> D[GenerativePipeline]
    end
    
    subgraph "Component Layer"
        E[Retrievers] --> A
        F[Readers] --> A
        G[Generators] --> A
        H[Custom Processors] --> A
    end
    
    subgraph "Data Layer"
        I[Document Stores] --> E
        J[Models] --> F
        J --> G
    end
    
    subgraph "Infrastructure"
        K[Serialization] -.-> A
        L[Debugging] -.-> A
        M[Breakpoints] -.-> A
    end
```

## Related Documentation

- [Components Overview](https://docs.haystack.deepset.ai/docs/intro)
- [Pipeline Components](https://docs.haystack.deepset.ai/docs/pipeline-components)
- [API Reference](https://docs.haystack.deepset.ai/reference/pipeline)
- [Cookbook Examples](https://haystack.deepset.ai/cookbook)

---

<a id='core-concepts'></a>

## Core Concepts

### 相关页面

相关主题：[Pipeline Architecture](#pipeline-architecture), [Pipeline Component Types](#component-types), [Introduction to Haystack](#introduction)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [README.md](https://github.com/deepset-ai/haystack/blob/main/README.md)
- [docs-website/README.md](https://github.com/deepset-ai/haystack/blob/main/docs-website/README.md)
- [pydoc/README.md](https://github.com/deepset-ai/haystack/blob/main/pydoc/README.md)
- [docker/README.md](https://github.com/deepset-ai/haystack/blob/main/docker/README.md)
- [examples/README.md](https://github.com/deepset-ai/haystack/blob/main/examples/README.md)
- [docs-website/src/theme/SearchBar.js](https://github.com/deepset-ai/haystack/blob/main/docs-website/src/theme/SearchBar.js)
- [docs-website/src/components/CopyDropdown/index.tsx](https://github.com/deepset-ai/haystack/blob/main/docs-website/src/components/CopyDropdown/index.tsx)
</details>

# Core Concepts

Haystack is an end-to-end LLM (Large Language Model) framework that enables developers to build applications powered by LLMs, Transformer models, vector search, and more. The framework orchestrates state-of-the-art embedding models and LLMs into pipelines to solve use cases such as retrieval-augmented generation (RAG), document search, question answering, and answer generation.

## What is Haystack?

Haystack provides a flexible architecture for designing systems with explicit control over how information is retrieved, ranked, filtered, combined, structured, and routed before it reaches the model. The framework allows developers to define pipelines and agent workflows where retrieval, memory, tools, and other components work together seamlessly.

资料来源：[README.md](https://github.com/deepset-ai/haystack/blob/main/README.md)

## Architecture Overview

Haystack's architecture is built around the concept of **pipelines** that orchestrate various components. These pipelines provide explicit control over the data flow from input to output, enabling developers to build complex LLM applications with fine-grained control.

```mermaid
graph TD
    A[Input Query] --> B[Pipeline]
    B --> C[Components]
    C --> D[Retrievers]
    C --> E[Rankers]
    C --> F[Memory]
    C --> G[Tools]
    D --> H[Document Store]
    E --> I[LLM]
    H --> J[Context Engineering]
    I --> K[Generated Response]
    J --> I
```

资料来源：[README.md](https://github.com/deepset-ai/haystack/blob/main/README.md)

## Installation

Haystack can be installed via pip using the main package:

```sh
pip install haystack-ai
```

For trying newest features, install nightly pre-releases:

```sh
pip install --pre haystack-ai
```

资料来源：[README.md](https://github.com/deepset-ai/haystack/blob/main/README.md)

## Docker Support

Haystack provides Docker images for containerized deployments. The base image `haystack:base-<version>` contains a working Python environment with Haystack preinstalled and is designed to be derived `FROM`.

Images are built with BuildKit and orchestrated using `bake`:

```sh
docker buildx bake base
```

Custom images can be built by overriding variables defined in the `docker-bake.hcl` file:

```sh
HAYSTACK_VERSION=mybranch_or_tag BASE_IMAGE_TAG_SUFFIX=latest docker buildx bake base --no-cache
```

资料来源：[docker/README.md](https://github.com/deepset-ai/haystack/blob/main/docker/README.md)

## Documentation System

Haystack maintains comprehensive documentation at [docs.haystack.deepset.ai](https://docs.haystack.deepset.ai). The documentation is built with Docusaurus 3 and provides guides, tutorials, API references, and best practices.

### Documentation Structure

| Directory | Purpose |
|-----------|---------|
| `docs/` | Main documentation (guides, tutorials, concepts) |
| `docs/concepts/` | Core Haystack concepts |
| `docs/pipeline-components/` | Component documentation |
| `reference/` | API reference (auto-generated) |
| `versioned_docs/` | Versioned copies of docs |
| `src/` | React components and custom code |

资料来源：[docs-website/README.md](https://github.com/deepset-ai/haystack/blob/main/docs-website/README.md)

### Versioning

Documentation versions are released alongside Haystack releases and are fully automated through GitHub workflows. The versioning process includes:

- `promote_unstable_docs.yml` - Automatically triggered during Haystack releases
- `minor_version_release.yml` - Creates new version directories and updates version configuration

资料来源：[docs-website/README.md](https://github.com/deepset-ai/haystack/blob/main/docs-website/README.md)

## API Reference

The API reference is generated from docstrings in the codebase using [haystack-pydoc-tools](https://github.com/deepset-ai/haystack-pydoc-tools). A GitHub workflow regenerates the API reference when code changes.

To add documentation for a new module:

1. Create a `.yml` file in the `pydoc` directory
2. Configure how haystack-pydoc-tools will generate the page
3. Commit to main

All API reference updates are initially deployed to unstable docs and promoted to stable docs during releases.

资料来源：[pydoc/README.md](https://github.com/deepset-ai/haystack/blob/main/pydoc/README.md)

## Documentation Website Development

The documentation site can be run locally for development:

```bash
git clone https://github.com/deepset-ai/haystack.git
cd haystack/docs-website
npm install
npm start
```

The site opens at http://localhost:3000 with live reload functionality.

Common development tasks include:

- Edit a page: update files under `docs/` or `versioned_docs/`
- Add to sidebar: update `sidebars.js` with your doc ID
- Production check: `npm run build && npm run serve`

资料来源：[docs-website/README.md](https://github.com/deepset-ai/haystack/blob/main/docs-website/README.md)

## Search Functionality

The documentation website includes a custom search bar that groups results by page and sorts them by relevance score. The search system supports filtering by category and provides snippets from matching documents.

### Search Architecture

```mermaid
graph TD
    A[User Query] --> B[Search Input]
    B --> C[Debounced Search]
    C --> D[Search Algorithm]
    D --> E{Results Found?}
    E -->|Yes| F[Group by Page]
    E -->|No| G[No Results State]
    F --> H[Sort by Score]
    H --> I[Display Results]
    G --> J[Show Error/Message]
```

资料来源：[docs-website/src/theme/SearchBar.js](https://github.com/deepset-ai/haystack/blob/main/docs-website/src/theme/SearchBar.js)

## Documentation Export Features

The documentation site provides multiple ways to export and share content:

| Feature | Description |
|---------|-------------|
| Copy as Markdown | Copy page content in Markdown format for LLMs |
| View as Markdown | View page as plain text |
| Export as PDF | Save page as PDF file |
| Ask AI | Open page in external AI assistants |

资料来源：[docs-website/src/components/CopyDropdown/index.tsx](https://github.com/deepset-ai/haystack/blob/main/docs-website/src/components/CopyDropdown/index.tsx)

### Markdown Conversion Rules

The export feature uses custom Turndown rules:

- Code blocks: Wrapped in backticks
- Admonitions: Converted to blockquotes with type labels (NOTE, TIP, WARNING, etc.)
- Navigation elements: Removed from export
- Scripts and styles: Filtered out

资料来源：[docs-website/src/components/CopyDropdown/index.tsx](https://github.com/deepset-ai/haystack/blob/main/docs-website/src/components/CopyDropdown/index.tsx)

## Examples and Cookbooks

Example code and cookbooks have been moved to a dedicated repository: [haystack-cookbook](https://github.com/deepset-ai/haystack-cookbook/)

This separation allows for easier maintenance and discovery of example applications.

资料来源：[examples/README.md](https://github.com/deepset-ai/haystack/blob/main/examples/README.md)

## CI/CD and Quality Assurance

Haystack maintains high code quality through automated workflows:

| Workflow | Purpose |
|----------|---------|
| tests.yml | Run test suite |
| types (Mypy) | Type checking |
| Coverage | Code coverage tracking |
| Ruff | Linting |
| license_compliance.yml | License verification |

资料来源：[README.md](https://github.com/deepset-ai/haystack/blob/main/README.md)

## Contributing to Haystack

Haystack welcomes community contributions in various forms:

- **Main project**: Contribute to the core Haystack repository
- **Integrations**: Contribute on [haystack-core-integrations](https://github.com/deepset-ai/haystack-core-integrations)
- **Documentation**: Contribute to [haystack/docs-website](https://github.com/deepset-ai/haystack/tree/main/docs-website)

The project provides a [full list of issues open to contributions](https://github.com/orgs/deepset-ai/projects/14) for both new and experienced contributors.

资料来源：[README.md](https://github.com/deepset-ai/haystack/blob/main/README.md)

## Organizations Using Haystack

Haystack is used in production by numerous organizations across industries:

| Industry | Organizations |
|----------|---------------|
| Technology & AI | Apple, Meta, Databricks, NVIDIA, Intel |
| Public Sector | European Commission |
| Various | Thousands of teams building production AI systems |

资料来源：[README.md](https://github.com/deepset-ai/haystack/blob/main/README.md)

---

<a id='component-types'></a>

## Pipeline Component Types

### 相关页面

相关主题：[Pipeline Architecture](#pipeline-architecture), [Data Processing Components](#data-processing), [LLM and Embedder Integrations](#llm-integrations)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [docs-website/docs/pipeline-components/generators.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/generators.mdx)
- [docs-website/docs/pipeline-components/embedders.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/embedders.mdx)
- [docs-website/docs/pipeline-components/retrievers.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/retrievers.mdx)
- [docs-website/docs/pipeline-components/rankers.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/rankers.mdx)
- [docs-website/docs/pipeline-components/preprocessors.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/preprocessors.mdx)
- [docs-website/docs/pipeline-components/converters.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/converters.mdx)
- [docs-website/docs/pipeline-components/builders.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/builders.mdx)
- [docs-website/docs/pipeline-components/routers.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/routers.mdx)
- [docs-website/docs/pipeline-components/joiners.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/joiners.mdx)
</details>

# Pipeline Component Types

Pipeline components are the fundamental building blocks of Haystack pipelines. They are modular units that perform specific operations such as retrieving documents, converting file formats, generating responses, and routing data between pipeline stages. Each component follows a consistent interface that enables seamless integration into pipeline workflows, allowing developers to compose complex LLM applications from reusable, interchangeable parts.

## Overview

Haystack provides a comprehensive set of built-in pipeline components that cover the full lifecycle of LLM-powered applications. These components are designed to work together through a unified API, enabling developers to build retrieval-augmented generation (RAG) systems, question-answering pipelines, document processing workflows, and agent-based applications with minimal configuration.

The architecture follows a modular pattern where each component receives inputs, performs a specific transformation or operation, and produces outputs that can be consumed by subsequent components in the pipeline. This design philosophy ensures that components remain loosely coupled and highly reusable across different use cases.

Components in Haystack are categorized based on their primary function within the data flow. Some components handle input preparation (converters, preprocessors), others manage information retrieval (retrievers, embedders), some optimize result ordering (rankers), and others control program flow (routers, joiners). Understanding these categories is essential for designing effective pipelines that balance performance, accuracy, and resource utilization.

## Component Architecture

### Component Lifecycle

Components in Haystack follow a standardized lifecycle that includes initialization, execution, and optional teardown phases. During initialization, components receive their configuration parameters and prepare any required resources such as model weights, API connections, or index data. The execution phase processes input data through the component's core logic, while the teardown phase releases resources when the component is no longer needed.

```mermaid
graph TD
    A[Initialize Component] --> B[Load Resources]
    B --> C[Receive Input Data]
    C --> D[Process Data]
    D --> E[Produce Output]
    E --> F{Check Pipeline Status}
    F -->|Continue| C
    F -->|Complete| G[Release Resources]
    G --> H[Component Lifecycle End]
```

### Data Flow Patterns

Haystack pipelines support multiple data flow patterns that determine how information moves between components. Linear flow passes output directly to the next component, while branching flow sends data to multiple paths based on conditions. Parallel flow distributes work across multiple components simultaneously, and feedback flow allows outputs to influence earlier pipeline stages.

## Input Processing Components

Input processing components prepare raw data for use by downstream pipeline stages. These components handle the transformation of unstructured or heterogeneous data sources into standardized formats that can be processed consistently throughout the pipeline.

### Converters

Converters transform documents from various file formats into Haystack's internal document representation. They handle the extraction of text content from source files while preserving metadata that may be useful for subsequent processing or retrieval operations.

| Converter Type | Supported Formats | Primary Use Case |
|---------------|-------------------|------------------|
| PDF Converter | PDF | Extract text from PDF documents |
| Text Converter | TXT, MD | Plain text and markdown files |
| DOCX Converter | DOCX | Microsoft Word documents |
| HTML Converter | HTML | Web page content extraction |

Converters are typically placed at the beginning of indexing pipelines where they process source documents before the content is split, embedded, and stored. The output of converters feeds directly into preprocessors that further refine the content.

资料来源：[docs-website/docs/pipeline-components/converters.mdx]()

### Preprocessors

Preprocessors clean, normalize, and transform document content to improve retrieval quality and downstream processing. They apply transformations such as text cleaning, language detection, and content segmentation to prepare documents for embedding and storage.

```mermaid
graph LR
    A[Raw Document] --> B[Clean Text]
    B --> C[Detect Language]
    C --> D[Split Document]
    D --> E[Normalize Content]
    E --> F[Processed Document]
```

Key preprocessing operations include removing unnecessary whitespace, normalizing unicode characters, splitting long documents into manageable chunks, and filtering out low-quality content. These operations significantly impact the quality of retrieval results and should be configured based on the specific characteristics of your data.

Preprocessors work closely with converters to form the input preparation stage of indexing pipelines. The processed output is then passed to embedders or directly to storage depending on the pipeline configuration.

资料来源：[docs-website/docs/pipeline-components/preprocessors.mdx]()

### Builders

Builders construct specialized data structures or artifacts that support pipeline operations. Unlike converters that handle file formats, builders create complex objects such as prompt templates, search indexes, or custom data representations required by other components.

Builders enable the composition of reusable building blocks that can be shared across multiple pipelines. They abstract away the complexity of constructing complex objects, allowing pipeline developers to focus on workflow design rather than implementation details.

资料来源：[docs-website/docs/pipeline-components/builders.mdx]()

## Information Retrieval Components

Information retrieval components locate and retrieve relevant content from data stores. These components form the core of RAG systems and document search applications, enabling pipelines to find the most relevant information based on query semantics or keywords.

### Retrievers

Retrievers search document stores to find content relevant to a given query. Haystack supports multiple retrieval strategies ranging from keyword-based sparse retrieval to semantic dense retrieval, enabling developers to choose the approach that best fits their use case.

| Retrieval Type | Description | Best For |
|--------------|-------------|----------|
| Dense Retrieval | Uses neural embeddings for semantic matching | Conceptual queries, semantic similarity |
| Sparse Retrieval | Traditional keyword-based matching | Exact matches, specific terminology |
| Hybrid Retrieval | Combines dense and sparse methods | Balanced performance across query types |

Retrievers are fundamental to RAG pipelines where they identify the documents or passages most likely to contain information relevant to the user's question. The retrieved content is then passed to generators that synthesize the final response.

资料来源：[docs-website/docs/pipeline-components/retrievers.mdx]()

### Embedders

Embedders convert text content into vector representations that capture semantic meaning. These vectors enable semantic similarity searches where documents are matched based on meaning rather than exact keyword occurrence.

```mermaid
graph TD
    A[Text Input] --> B[Embedding Model]
    B --> C[Vector Representation]
    C --> D[Vector Store]
    
    E[Query] --> F[Same Embedding Model]
    F --> G[Query Vector]
    G --> D
    D --> H[Similarity Search]
    H --> I[Ranked Results]
```

Embedders are used both during indexing (to create document vectors) and at query time (to create query vectors). The choice of embedding model significantly impacts retrieval quality, and Haystack supports integration with various embedding providers including OpenAI, Hugging Face, and local models.

资料来源：[docs-website/docs/pipeline-components/embedders.mdx]()

### Rankers

Rankers improve retrieval results by reordering documents based on additional relevance signals. While retrievers perform the initial candidate selection, rankers apply more sophisticated scoring models to identify the most relevant results.

Rankers typically use cross-encoder models that jointly analyze query-document pairs to produce relevance scores. This approach is computationally more expensive than bi-encoder retrieval but provides higher accuracy for tasks where precision is critical.

The typical pipeline arrangement places rankers after retrievers, with retrievers performing the broad candidate selection and rankers performing the refined reordering. This two-stage approach balances computational efficiency with result quality.

资料来源：[docs-website/docs/pipeline-components/rankers.mdx]()

## Output Generation Components

Output generation components synthesize final responses or artifacts from the information retrieved and processed by earlier pipeline stages. These components transform raw retrieved content into user-facing outputs.

### Generators

Generators produce final outputs such as text responses, summaries, or structured data from retrieved context and user queries. In RAG systems, generators receive relevant documents and formulate answers that incorporate information from the retrieved content.

```mermaid
graph TD
    A[User Query] --> E[Generator]
    B[Retrieved Context] --> E
    E --> F[Generate Response]
    F --> G[Response Output]
    
    H[LLM Provider] <--> E
    H --> |API Key| E
```

Generators integrate with various LLM providers including OpenAI, Anthropic, Cohere, Hugging Face, and local models. Configuration options control parameters such as temperature, max tokens, and response format to customize generator behavior for specific applications.

资料来源：[docs-website/docs/pipeline-components/generators.mdx]()

## Flow Control Components

Flow control components manage how data moves through pipelines, enabling conditional logic, parallel processing, and result aggregation. These components add flexibility to pipeline design beyond simple linear data flow.

### Routers

Routers direct input data to different pipeline branches based on conditions or classifications. They enable conditional execution where different components handle different types of inputs or queries.

| Router Type | Decision Basis | Use Case |
|------------|---------------|----------|
| Conditional Router | User-defined rules | Route queries to appropriate handlers |
| Semantic Router | Query classification | Direct to specialized pipelines |
| Custom Router | Any Python logic | Flexible routing strategies |

Routers are essential for building multi-stage pipelines that handle diverse input types or implement complex query routing strategies. They enable pipelines to adapt their behavior based on the specific requirements of each input.

资料来源：[docs-website/docs/pipeline-components/routers.mdx]()

### Joiners

Joiners combine outputs from multiple pipeline branches into unified inputs for downstream components. They handle the aggregation of results from parallel processing paths or the merging of different data streams.

```mermaid
graph TD
    A[Input] --> B[Branch 1]
    A --> C[Branch 2]
    A --> D[Branch N]
    B --> E[Joiner]
    C --> E
    D --> E
    E --> F[Combined Output]
```

Joiners implement various combination strategies including concatenation, interleaving, and weighted merging. The appropriate strategy depends on the data types being combined and the requirements of downstream components.

资料来源：[docs-website/docs/pipeline-components/joiners.mdx]()

## Component Configuration Patterns

### Initialization Parameters

Components accept configuration during initialization that determines their behavior, resource connections, and operational parameters. Common configuration categories include model selection, connection settings, and behavioral parameters.

### Default Parameters

Components provide sensible defaults for most parameters, enabling quick pipeline construction while allowing customization when needed. Default values are documented in each component's reference documentation.

### Runtime Parameters

Some components accept parameters at runtime (during pipeline execution) in addition to initialization-time configuration. Runtime parameters enable dynamic behavior adjustment based on input characteristics or pipeline state.

## Building Custom Components

Haystack's component architecture supports extension through custom implementations. Custom components follow the same interface patterns as built-in components, ensuring compatibility with existing pipeline infrastructure.

### Component Interface Requirements

Custom components must implement the standard component methods including initialization, execution, and any component-specific lifecycle hooks. The exact interface depends on the component type, but all components must be serializable for pipeline persistence.

### Integration with Pipeline

Custom components integrate seamlessly with built-in components through the unified pipeline interface. They can receive inputs from and produce outputs for any other component type, enabling flexible composition of custom and built-in functionality.

## Best Practices

### Component Selection

Choose components based on your specific use case requirements including accuracy needs, latency constraints, and resource availability. Consider the trade-offs between different retrieval strategies, embedding models, and generation approaches.

### Pipeline Design

Design pipelines with clear separation of concerns between components. Input processing, retrieval, and generation should be logically separated to enable independent optimization and testing.

### Performance Optimization

Optimize component ordering based on computational cost. Place computationally expensive operations later in the pipeline where they operate on reduced candidate sets. Use rankers selectively based on the required result quality.

## Summary

Pipeline components form the foundation of Haystack's architecture, enabling modular construction of LLM-powered applications. The component taxonomy spans input processing (converters, preprocessors, builders), information retrieval (retrievers, embedders, rankers), output generation (generators), and flow control (routers, joiners). Each component category serves a distinct purpose in the pipeline data flow, and understanding these roles enables effective pipeline design and customization.

---

<a id='data-processing'></a>

## Data Processing Components

### 相关页面

相关主题：[Document Stores and Retrievers](#document-stores), [Pipeline Component Types](#component-types)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [docs-website/docs/pipeline-components/preprocessors/documentsplitter.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/preprocessors/documentsplitter.mdx)
- [docs-website/docs/pipeline-components/preprocessors/recursivesplitter.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/preprocessors/recursivesplitter.mdx)
- [docs-website/docs/pipeline-components/preprocessors/hierarchicaldocumentsplitter.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/preprocessors/hierarchicaldocumentsplitter.mdx)
- [docs-website/docs/pipeline-components/converters.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/converters.mdx)
- [docs-website/docs/pipeline-components/preprocessors/documentcleaner.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/preprocessors/documentcleaner.mdx)
</details>

# Data Processing Components

Data Processing Components are fundamental pipeline elements in Haystack that transform, clean, and prepare documents for downstream operations such as retrieval, indexing, and LLM processing. These components operate on `Document` objects, enabling structured manipulation of content while preserving metadata integrity throughout the processing pipeline.

## Overview

Data Processing Components in Haystack serve as the preprocessing layer that bridges raw document ingestion with semantic retrieval and generation tasks. They are designed to handle various document formats, split long content into manageable chunks, and ensure data quality through cleaning operations.

The architecture follows a modular design pattern where each component type specializes in a specific transformation task:

- **Document Splitters**: Divide documents into smaller, semantically coherent chunks
- **Document Cleaners**: Remove noise, normalize text, and enhance readability
- **Converters**: Transform external file formats into Haystack `Document` objects

资料来源：[docs-website/docs/pipeline-components/preprocessors/documentsplitter.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/preprocessors/documentsplitter.mdx)

## Architecture and Processing Flow

```mermaid
graph TD
    A[Raw Document Input] --> B[Converters]
    B --> C[Document Objects]
    C --> D[Document Cleaners]
    D --> E[Document Splitters]
    E --> F[Processed Chunks]
    F --> G[Embedding Stores]
    G --> H[Retrieval Pipelines]
    
    B -.->|File Types| I[TXT]
    B -.->|File Types| J[PDF]
    B -.->|File Types| K[Markdown]
    B -.->|File Types| L[HTML]
    B -.->|File Types| M[Docx]
    
    D -.->|Operations| N[Text Normalization]
    D -.->|Operations| O[Whitespace Cleaning]
    D -.->|Operations| P[Metadata Preservation]
    
    E -.->|Strategies| Q[Character Split]
    E -.->|Strategies| R[Recursive Split]
    E -.->|Strategies| S[Hierarchical Split]
```

## Document Splitters

Document splitters are preprocessors that divide long documents into smaller, manageable chunks while attempting to preserve semantic coherence. This is critical for effective retrieval since chunk size directly impacts retrieval precision and context window utilization.

资料来源：[docs-website/docs/pipeline-components/preprocessors/recursivesplitter.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/preprocessors/recursivesplitter.mdx)

### Splitter Types

| Splitter Type | Use Case | Splitting Strategy |
|---------------|----------|---------------------|
| `DocumentSplitter` | Basic character or token-based splitting | Fixed-length chunks |
| `RecursiveSplitter` | Hierarchical splitting by delimiters | Recursive character/separator traversal |
| `HierarchicalDocumentSplitter` | Multi-level document structure | Preserves headings and sections |

### DocumentSplitter

The base `DocumentSplitter` provides fundamental splitting capabilities using either character count or token count as the primary division criterion.

**Key Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `split_length` | `int` | Required | Target size of each chunk |
| `split_overlap` | `int` | `0` | Number of overlapping elements between chunks |
| `split_by` | `str` | `"word"` | Splitting criterion: `"word"`, `"sentence"`, `"passage"`, or `"token"` |

资料来源：[docs-website/docs/pipeline-components/preprocessors/documentsplitter.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/preprocessors/documentsplitter.mdx)

### RecursiveSplitter

The `RecursiveSplitter` implements an intelligent multi-level splitting strategy that attempts to split documents at natural boundaries before falling back to smaller units.

```python
from haystack.components.preprocessors import RecursiveSplitter

splitter = RecursiveSplitter(
    split_by="sentence",
    split_length=5,
    split_overlap=2,
    separators=["\n\n", "\n", ". ", " ", ""]
)
```

The splitter iterates through the `separators` list, attempting to split at each level. If a split produces chunks larger than `split_length`, it moves to the next (smaller) separator in the list.

资料来源：[docs-website/docs/pipeline-components/preprocessors/recursivesplitter.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/preprocessors/recursivesplitter.mdx)

**Separator Priority:**

| Priority | Separator | Context |
|----------|-----------|---------|
| 1 | `"\n\n"` | Paragraph breaks |
| 2 | `"\n"` | Line breaks |
| 3 | `". "` | Sentence endings |
| 4 | `" "` | Word boundaries |
| 5 | `""` | Character-level fallback |

### HierarchicalDocumentSplitter

The `HierarchicalDocumentSplitter` is designed for structured documents that contain hierarchical headings and section markers. It preserves document structure by splitting at heading boundaries first.

**Key Features:**

- Detects heading patterns (e.g., `#`, `##`, `###` in Markdown)
- Splits at the highest heading level available
- Maintains hierarchical relationships between sections and subsections
- Ideal for technical documentation and Markdown-based content

```python
from haystack.components.preprocessors import HierarchicalDocumentSplitter

splitter = HierarchicalDocumentSplitter(
    split_by="sentence",
    split_length=10,
    split_overlap=3
)
```

资料来源：[docs-website/docs/pipeline-components/preprocessors/hierarchicaldocumentsplitter.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/preprocessors/hierarchicaldocumentsplitter.mdx)

## Document Cleaners

Document cleaners are preprocessing components that normalize and sanitize text content while preserving essential structure and metadata. They remove unwanted artifacts, standardize formatting, and enhance downstream processing quality.

资料来源：[docs-website/docs/pipeline-components/preprocessors/documentcleaner.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/preprocessors/documentcleaner.mdx)

### Core Cleaning Operations

| Operation | Description | Example |
|-----------|-------------|---------|
| Whitespace normalization | Collapse multiple spaces, trim line breaks | `"  Hello\n\n  World  "` → `"Hello World"` |
| Character removal | Strip control characters and special symbols | Removes `\x00` to `\x1f` except `\n`, `\t` |
| Quote normalization | Standardize quote characters | Smart quotes → straight quotes |
| Heading normalization | Clean heading markers | Removes `#` from Markdown headings |

### Common Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `remove_empty_lines` | `bool` | `True` | Remove lines with no content |
| `remove_extra_whitespace` | `bool` | `True` | Normalize whitespace between words |
| `remove_repeated_substrings` | `bool` | `False` | Eliminate duplicate consecutive substrings |

## Converters

Converters are components that transform external file formats into Haystack `Document` objects. They handle the ingestion pipeline by parsing various document formats and extracting both content and metadata.

资料来源：[docs-website/docs/pipeline-components/converters.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/converters.mdx)

### Supported Formats

| Format | Converter Class | Features |
|--------|-----------------|----------|
| Plain Text | `TextConverter` | Direct text extraction |
| PDF | `PdfToDocumentConverter` | Text and table extraction |
| Markdown | `MarkdownToDocumentConverter` | Preserves structure and headings |
| HTML | `HtmlToDocumentConverter` | Extracts text from HTML elements |
| Microsoft Word | `DocxToDocumentConverter` | Document and paragraph parsing |

### Converter Architecture

```mermaid
graph LR
    A[Input File] --> B[Format Detection]
    B --> C[Format-Specific Parser]
    C --> D[Content Extraction]
    D --> E[Metadata Enrichment]
    E --> F[Haystack Document]
    
    G[File Path] -.->|Direct Input| D
    H[Binary Content] -.->|Raw Data| C
```

### Common Converter Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `encoding` | `str` | `"utf-8"` | Text encoding for file reading |
| `encoding_errors` | `str` | `"strict"` | How to handle encoding errors |
| `id_hash_keys` | `List[str]` | `["content"]` | Keys for document ID generation |
| `meta` | `Dict[str, Any]` | `{}` | Additional metadata to attach |

资料来源：[docs-website/docs/pipeline-components/converters.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/converters.mdx)

## Integration with Pipelines

Data Processing Components integrate seamlessly into Haystack pipelines as standard pipeline nodes. They can be composed in any order to create custom preprocessing workflows.

### Typical Pipeline Configuration

```python
from haystack import Pipeline
from haystack.components.preprocessors import DocumentCleaner, RecursiveSplitter
from haystack.components.converters import TextConverter

pipeline = Pipeline()
pipeline.add_component("converter", TextConverter())
pipeline.add_component("cleaner", DocumentCleaner())
pipeline.add_component("splitter", RecursiveSplitter(split_length=200, split_by="word"))

pipeline.connect("converter", "cleaner")
pipeline.connect("cleaner", "splitter")
```

### Processing Order Recommendation

While components can be connected in various orders, the recommended processing sequence is:

1. **Convert** - Transform source files into `Document` objects
2. **Clean** - Normalize and sanitize the text content
3. **Split** - Divide documents into retrieval-optimized chunks

This sequence ensures that cleaning operations apply to the complete document before splitting, maintaining consistency across chunks.

## Metadata Preservation

All Data Processing Components preserve and propagate document metadata throughout the processing pipeline. Metadata added during conversion is carried through cleaning and splitting operations.

**Automatic Metadata Fields:**

| Field | Source | Description |
|-------|--------|-------------|
| `source` | Converter | Original file path or URI |
| `file_type` | Converter | Document format (pdf, txt, etc.) |
| `page_number` | PDF Converter | Page number for page-level tracking |
| `split_id` | Splitter | Unique identifier for each chunk |
| `split_idx_start` | Splitter | Character offset where chunk begins |

## Best Practices

### Chunk Size Selection

| Chunk Size | Recommended Use Case |
|------------|---------------------|
| 50-100 tokens | High-precision queries, precise fact extraction |
| 200-300 tokens | Balanced retrieval, general Q&A |
| 500+ tokens | Complex reasoning, multi-document synthesis |

### Cleaning Configuration

- Enable `remove_extra_whitespace` for all text-based content
- Use `remove_empty_lines` when building dense indexes
- Disable cleaning for Markdown/HTML if structure preservation is critical

### Overlap Strategy

When configuring `split_overlap`, consider:

- **Low overlap (0-10%)**: Maximizes diversity, suitable for unique content
- **Medium overlap (10-20%)**: Balances context preservation and diversity
- **High overlap (20%+**: Essential for documents with continuous context

## Related Components

- **Embedding Generators**: Process chunks to create vector representations
- **Document Stores**: Store and index processed chunks for retrieval
- **Rankers**: Reorder retrieved chunks by relevance
- **Prompt Engineers**: Combine chunks for LLM context windows

---

<a id='llm-integrations'></a>

## LLM and Embedder Integrations

### 相关页面

相关主题：[Document Stores and Retrievers](#document-stores), [Pipeline Component Types](#component-types), [Development Guide](#development-guide)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [docs-website/docs/pipeline-components/generators/guides-to-generators/choosing-the-right-generator.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/generators/guides-to-generators/choosing-the-right-generator.mdx)
- [docs-website/docs/pipeline-components/generators/guides-to-generators/function-calling.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/generators/guides-to-generators/function-calling.mdx)
- [docs-website/docs/pipeline-components/embedders/choosing-the-right-embedder.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/embedders/choosing-the-right-embedder.mdx)
- [docs-website/docs/concepts/integrations.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/integrations.mdx)
</details>

# LLM and Embedder Integrations

## Overview

LLM and Embedder Integrations in Haystack provide the core components for interfacing with Large Language Models and embedding services. These integrations enable developers to build production-ready applications powered by LLMs, Transformer models, and vector search capabilities.

资料来源：[README.md:1-10]()

## Architecture

Haystack's integration architecture follows a modular pipeline design where Generators (LLMs) and Embedders serve as fundamental building blocks within the orchestration framework.

```mermaid
graph TD
    A[Haystack Pipeline] --> B[Retrieval Components]
    A --> C[Generator Components]
    A --> D[Embedder Components]
    C --> E[LLM Providers]
    D --> F[Embedding Models]
    B --> F
    E --> G[API Services]
    F --> G
```

## Generator Integration

### Purpose

Generators in Haystack are components that interact with Large Language Models to generate responses based on prompts and retrieved context. They serve as the core reasoning engine within RAG (Retrieval-Augmented Generation) pipelines.

资料来源：[docs-website/docs/pipeline-components/generators/guides-to-generators/choosing-the-right-generator.mdx:1-15]()

### Supported Providers

Haystack supports multiple LLM providers through its integration system. The framework provides standardized interfaces for:

| Provider | Integration Type | API Access |
|----------|------------------|------------|
| OpenAI | Chat Completions API | API Key |
| Anthropic | Claude API | API Key |
| Azure OpenAI | Azure OpenAI Service | Azure Credentials |
| Hugging Face | Inference API / Local | API Key / Local |
| Ollama | Local Models | Local Host |

### Component Configuration

Generator components in Haystack follow a consistent initialization pattern:

```python
from haystack import Pipeline
from haystack.components.generators import OpenAIChatGenerator

generator = OpenAIChatGenerator(
    api_key="your-api-key",
    model="gpt-4",
    streaming_callback=None,
    generation_kwargs={"temperature": 0.7, "max_tokens": 500}
)
```

## Embedder Integration

### Purpose

Embedders are components that convert text into vector representations (embeddings) suitable for semantic search and similarity comparisons. They are essential for the retrieval portion of RAG pipelines.

资料来源：[docs-website/docs/pipeline-components/embedders/choosing-the-right-embedder.mdx:1-20]()

### Embedder Types

| Type | Use Case | Deployment |
|------|----------|------------|
| Sentence Transformers | General text embeddings | Local / API |
| OpenAI Embeddings | API-based generation | Remote |
| Hugging Face | Transformer models | Local / Inference API |
| Cohere | Multi-lingual support | API |

### Integration with Retrievers

Embedders work in conjunction with document stores to enable semantic search:

```mermaid
graph LR
    A[Documents] --> B[Embedder]
    B --> C[Vector Store]
    C --> D[Retriever]
    E[Query] --> F[Query Embedder]
    F --> D
    D --> G[Retrieved Docs]
```

## Function Calling

Function calling extends LLM integrations to enable structured interactions between LLMs and external tools. This feature allows Generators to produce structured outputs that can trigger specific actions.

资料来源：[docs-website/docs/pipeline-components/generators/guides-to-generators/function-calling.mdx:1-30]()

### Workflow

```mermaid
sequenceDiagram
    participant User
    participant Pipeline
    participant LLM
    participant Tool
    
    User->>Pipeline: Query with function definitions
    Pipeline->>LLM: Send prompt + function specs
    LLM->>LLM: Analyze request
    LLM-->>Pipeline: Function call + parameters
    Pipeline->>Tool: Execute function
    Tool-->>Pipeline: Function result
    Pipeline->>LLM: Send result + original context
    LLM-->>Pipeline: Final response
    Pipeline-->>User: Return answer
```

## Integration Configuration

### Environment Setup

Integrations in Haystack typically require API credentials which can be configured via environment variables:

```bash
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export HUGGINGFACE_TOKEN="your-hf-token"
```

资料来源：[docs-website/docs/concepts/integrations.mdx:1-25]()

### Configuration Options

| Parameter | Description | Default |
|-----------|-------------|---------|
| `api_key` | Provider API key | Environment variable |
| `model` | Model identifier | Provider default |
| `timeout` | Request timeout in seconds | 60 |
| `max_retries` | Number of retry attempts | 3 |

## Pipeline Integration Example

```python
from haystack import Pipeline
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.components.generators import OpenAIChatGenerator
from haystack.document_stores import InMemoryDocumentStore

# Initialize components
document_store = InMemoryDocumentStore()
retriever = InMemoryBM25Retriever(document_store=document_store)
generator = OpenAIChatGenerator(model="gpt-4")

# Build pipeline
pipeline = Pipeline()
pipeline.add_component("retriever", retriever)
pipeline.add_component("generator", generator)
pipeline.connect("retriever", "generator")
```

## Installation

To use LLM and Embedder integrations, install the appropriate Haystack packages:

```sh
# Core package
pip install haystack-ai

# For specific integrations
pip install "haystack-ai[openai]"    # OpenAI models
pip install "haystack-ai[anthropic]"  # Anthropic Claude
pip install "haystack-ai[transformers]" # Hugging Face
```

## Additional Resources

- [Documentation Site](https://docs.haystack.deepset.ai)
- [GitHub Repository](https://github.com/deepset-ai/haystack)
- [Integration Guides](https://docs.haystack.deepset.ai/docs/integrations)

---

<a id='document-stores'></a>

## Document Stores and Retrievers

### 相关页面

相关主题：[LLM and Embedder Integrations](#llm-integrations), [Data Processing Components](#data-processing)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [docs-website/docs/concepts/document-store.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/document-store.mdx)
- [docs-website/docs/concepts/document-store/choosing-a-document-store.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/document-store/choosing-a-document-store.mdx)
- [docs-website/docs/document-stores/inmemorydocumentstore.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/document-stores/inmemorydocumentstore.mdx)
- [docs-website/docs/document-stores/elasticsearch-document-store.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/document-stores/elasticsearch-document-store.mdx)
- [docs-website/docs/document-stores/qdrant-document-store.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/document-stores/qdrant-document-store.mdx)
- [docs-website/docs/document-stores/pinecone-document-store.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/document-stores/pinecone-document-store.mdx)
</details>

# Document Stores and Retrievers

Document Stores and Retrievers are fundamental components in the Haystack framework that enable efficient storage, indexing, and retrieval of documents for LLM-powered applications. These components form the backbone of retrieval-augmented generation (RAG) pipelines and semantic search systems.

## Overview

Haystack provides a unified abstraction layer for document storage and retrieval, allowing developers to work with different backend technologies through a consistent interface. The framework supports multiple document store implementations, each optimized for different use cases, scales, and deployment requirements.

Document Stores in Haystack handle the persistence and indexing of documents, while Retrievers are specialized components that query these stores to find relevant documents based on user queries. This separation of concerns allows for flexible pipeline composition and easy swapping of storage backends.

## Architecture

```mermaid
graph TD
    A[User Query] --> B[Retriever]
    B --> C[Document Store]
    C --> D[(Vector Index)]
    C --> E[(Document DB)]
    F[Documents] --> C
    G[Embedding Model] --> D
    B --> H[Query Embedding]
    H --> D
    D --> I[Relevant Documents]
    I --> J[RAG Pipeline]
```

The architecture separates concerns between storage and retrieval, enabling optimized implementations for each layer.

## Document Store Types

Haystack supports multiple document store implementations, each with distinct characteristics:

| Document Store | Type | Use Case | Scalability |
|----------------|------|----------|--------------|
| InMemoryDocumentStore | In-memory | Development, testing, small datasets | Single machine, limited scale |
| ElasticsearchDocumentStore | Distributed search | Production, full-text search | Horizontal scaling |
| QdrantDocumentStore | Vector database | Semantic search, embeddings | High-dimensional vectors |
| PineconeDocumentStore | Managed vector DB | Cloud-native, managed infrastructure | Global distribution |

### InMemoryDocumentStore

The `InMemoryDocumentStore` is the simplest document store implementation, storing all data in memory. It is primarily used for development, testing, and prototyping scenarios where persistence is not required.

**Key Characteristics:**
- No external dependencies required
- Fast read/write operations for small datasets
- Data lost on application restart
- Not suitable for production deployments with large volumes

资料来源：[docs-website/docs/document-stores/inmemorydocumentstore.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/document-stores/inmemorydocumentstore.mdx)

### ElasticsearchDocumentStore

Elasticsearch provides a mature, production-ready document store with powerful full-text search capabilities. It is well-suited for applications requiring sophisticated text analysis, faceted search, and scalable infrastructure.

**Key Characteristics:**
- Distributed architecture for high availability
- Rich query DSL for complex search operations
- BM25 ranking algorithm for relevance scoring
- Supports millions of documents

资料来源：[docs-website/docs/document-stores/elasticsearch-document-store.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/document-stores/elasticsearch-document-store.mdx)

### QdrantDocumentStore

Qdrant is a vector database optimized for similarity search and high-dimensional embeddings. It provides efficient nearest neighbor search operations essential for semantic retrieval.

**Key Characteristics:**
- Optimized for vector similarity search
- Supports payload filtering
- Hybrid sparse-dense vector search
- gRPC-based API for performance

资料来源：[docs-website/docs/document-stores/qdrant-document-store.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/document-stores/qdrant-document-store.mdx)

### PineconeDocumentStore

Pinecone is a managed vector database service that eliminates infrastructure management overhead. It provides global distribution and automatic scaling for production deployments.

**Key Characteristics:**
- Fully managed cloud service
- Automatic scaling and sharding
- Multi-tenancy support
- Low-latency querying at scale

资料来源：[docs-website/docs/document-stores/pinecone-document-store.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/document-stores/pinecone-document-store.mdx)

## Choosing a Document Store

Selecting the appropriate document store depends on several factors including scale, performance requirements, deployment environment, and feature needs.

资料来源：[docs-website/docs/concepts/document-store/choosing-a-document-store.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/document-store/choosing-a-document-store.mdx)

### Decision Criteria

| Factor | InMemory | Elasticsearch | Qdrant | Pinecone |
|--------|----------|---------------|--------|----------|
| Dataset Size | < 100K docs | Unlimited | Unlimited | Unlimited |
| Latency | Very low | Medium | Low | Low |
| Persistence | None | Full | Full | Full |
| Full-text Search | Basic | Advanced | Limited | Limited |
| Vector Search | Basic | Plugin required | Native | Native |
| Managed Service | No | Self-hosted/Cloud | Self-hosted/Cloud | Yes (managed) |
| Cost | Free | Infrastructure | Infrastructure | Usage-based |

### Recommendations

**Development and Testing:**
Use `InMemoryDocumentStore` for rapid prototyping and unit testing. It requires no setup and provides immediate feedback.

**Production with Full-text Search:**
Choose `ElasticsearchDocumentStore` when your application requires complex text queries, aggregations, or you already have an Elasticsearch infrastructure.

**Semantic Search at Scale:**
Select `QdrantDocumentStore` or `PineconeDocumentStore` for applications primarily relying on embedding-based similarity search. Both provide native vector operations with efficient indexing.

## Document Model

Documents in Haystack follow a standardized data model that captures content, metadata, and embedding vectors.

```mermaid
classDiagram
    class Document {
        +str id
        +str content
        +dict meta
        +List[float] embedding
        +str blob
        +str blob_mime_type
    }
```

**Core Document Fields:**

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique identifier for the document |
| `content` | string | Main text content of the document |
| `meta` | dict | Arbitrary metadata (source, author, date, etc.) |
| `embedding` | list[float] | Vector representation for semantic search |

资料来源：[docs-website/docs/concepts/document-store.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/document-store.mdx)

## Retriever Types

Retrievers query document stores to find the most relevant documents for a given query. Haystack provides multiple retriever implementations optimized for different search strategies.

### Dense Retrievers

Dense retrievers use neural network models to encode queries and documents into dense vector representations. They excel at capturing semantic meaning and handling synonyms.

### Sparse Retrievers

Sparse retrievers use traditional information retrieval techniques like BM25 or TF-IDF. They are effective for exact term matching and keyword-based queries.

### Hybrid Retrievers

Hybrid retrievers combine both dense and sparse approaches, leveraging the strengths of each to provide robust retrieval across different query types.

## Pipeline Integration

```mermaid
graph LR
    A[Query] --> B[Retriever]
    B --> C[Document Store]
    C --> D[Top-K Documents]
    D --> E[Ranker]
    E --> F[Reader/Generator]
    F --> G[Answer]
```

Document Stores and Retrievers integrate seamlessly into Haystack pipelines, typically appearing early in the pipeline to fetch candidate documents before passing them to downstream components like Readers or Generators.

## Basic Usage Example

```python
from haystack import Document
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever

# Initialize document store
document_store = InMemoryDocumentStore()

# Write documents
documents = [
    Document(content="Haystack is an open-source NLP framework", meta={"source": "docs"}),
    Document(content="It supports retrieval-augmented generation", meta={"source": "blog"}),
]
document_store.write_documents(documents)

# Initialize retriever
retriever = BM25Retriever(document_store=document_store)

# Query
results = retriever.retrieve(query="What is Haystack?", top_k=10)
```

## Performance Considerations

### Indexing Performance

| Store | Indexing Speed | Memory Usage |
|-------|----------------|--------------|
| InMemory | Very Fast | Proportional to dataset |
| Elasticsearch | Medium | Distributed across nodes |
| Qdrant | Fast | Optimized for vectors |
| Pinecone | Fast | Managed externally |

### Query Performance

Query latency depends on the number of documents, vector dimensions, and the complexity of filters applied. Vector databases like Qdrant and Pinecone use specialized indexing structures (HNSW, IVF) to achieve sub-millisecond query times on large datasets.

## See Also

- [Document Store Concepts](docs/concepts/document-store.mdx) - Detailed conceptual overview
- [Choosing a Document Store](docs/concepts/document-store/choosing-a-document-store.mdx) - Selection guide
- [Pipeline Components](../pipeline-components/overview.mdx) - How retrievers fit into pipelines
- [Embedding Models](../components/embedder.mdx) - Generating document embeddings

---

<a id='agents'></a>

## Agent Systems

### 相关页面

相关主题：[Introduction to Haystack](#introduction), [Pipeline Architecture](#pipeline-architecture), [LLM and Embedder Integrations](#llm-integrations)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [docs-website/docs/concepts/agents.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/agents.mdx)
- [docs-website/docs/concepts/agents/multi-agent-systems.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/agents/multi-agent-systems.mdx)
- [docs-website/docs/pipeline-components/agents-1/agent.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/agents-1/agent.mdx)
- [docs-website/docs/pipeline-components/agents-1/state.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/agents-1/state.mdx)
- [docs-website/docs/pipeline-components/agents-1/human-in-the-loop.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/agents-1/human-in-the-loop.mdx)
</details>

# Agent Systems

Agent systems in Haystack represent a powerful paradigm for building autonomous and semi-autonomous AI applications that can perceive, reason, act, and interact with their environment. Haystack's agent framework enables developers to create sophisticated LLM-powered applications where agents can use tools, maintain state, collaborate with other agents, and incorporate human feedback into their decision-making processes.

## Overview

Haystack agents are designed to extend beyond simple prompt-response interactions by providing a structured mechanism for Large Language Models to take actions, make decisions, and execute multi-step workflows. The agent system in Haystack is built with flexibility and modularity in mind, allowing developers to customize every aspect of agent behavior from the underlying model to the specific tools available and the logic governing agent decisions.

The framework supports a variety of agent types and architectures, ranging from single-agent systems that handle specific tasks to complex multi-agent ecosystems where multiple specialized agents collaborate to solve problems. This flexibility makes Haystack suitable for a wide range of use cases, from simple question-answering applications to sophisticated autonomous systems that can browse the web, execute code, and coordinate with other agents to complete complex tasks.

## Core Architecture

The agent architecture in Haystack is built around a pipeline-based model that connects perception, reasoning, action selection, and execution into a cohesive workflow. At its core, an agent consists of several key components that work together to enable autonomous behavior.

### Agent Components

| Component | Purpose | Description |
|-----------|---------|-------------|
| LLM | Reasoning Engine | The underlying language model that drives decision-making |
| Tools | Action Interface | Capabilities that allow the agent to interact with external systems |
| Prompt Builder | Instruction Assembly | Constructs prompts that guide agent behavior |
| Output Handler | Response Processing | Interprets and executes agent decisions |
| Memory | State Management | Maintains conversation history and context |

资料来源：[docs-website/docs/pipeline-components/agents-1/agent.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/agents-1/agent.mdx)

### Execution Flow

```mermaid
graph TD
    A[User Input] --> B[Agent Receives Task]
    B --> C[LLM Reasoning]
    C --> D{Tool Selection?}
    D -->|Yes| E[Execute Tool]
    E --> F[Process Result]
    D -->|No| G[Generate Response]
    F --> C
    G --> H[Return to User]
    C --> I{Human Input Needed?}
    I -->|Yes| J[Pause for Human Feedback]
    J --> C
    I -->|No| D
```

The execution flow demonstrates how Haystack agents operate in a loop, continuously reasoning about the best course of action until the task is complete. The agent receives input, reasons about what to do, selects and executes tools as needed, and continues until it can provide a final response or requires additional input from the user or human overseer.

## State Management

State management is a critical aspect of agent systems, enabling agents to maintain context across multiple interactions and track the progress of complex, multi-step tasks. Haystack provides a flexible state management system that allows agents to store, retrieve, and update information throughout their execution lifecycle.

### State Structure

The state system in Haystack agents typically includes several key elements that together form a comprehensive view of the agent's current situation and history. These elements enable the agent to maintain awareness of what has happened previously, what actions have been taken, and what information has been gathered.

| State Element | Type | Description |
|--------------|------|-------------|
| Conversation History | List | Previous messages and interactions |
| Tool Usage Log | List | Record of tools called and results |
| Intermediate Results | Dict | Data collected during task execution |
| User Preferences | Dict | Learned user preferences and feedback |
| Task Progress | Dict | Current status of ongoing tasks |

资料来源：[docs-website/docs/pipeline-components/agents-1/state.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/agents-1/state.mdx)

### State Persistence

Agents in Haystack can maintain state across sessions, enabling persistent memory and long-term learning. This is particularly valuable for applications where the agent needs to build relationships with users over time or maintain knowledge about specific domains or tasks. The state management system supports various backends for persistence, from simple in-memory storage to distributed databases for production deployments.

## Multi-Agent Systems

Haystack supports the creation of sophisticated multi-agent systems where multiple specialized agents work together to solve problems. This architectural pattern enables the decomposition of complex tasks into smaller, manageable subtasks that can be handled by agents with specialized capabilities.

### Agent Collaboration Patterns

```mermaid
graph TD
    subgraph Coordinator Agent
        A[Task Received] --> B{Analyze Task}
        B --> C[Decompose into Subtasks]
    end
    
    subgraph Specialized Agents
        D[Agent A: Research]
        E[Agent B: Analysis]
        F[Agent C: Synthesis]
    end
    
    C --> D
    C --> E
    C --> F
    D --> G[Results Aggregation]
    E --> G
    F --> G
    G --> H[Final Response]
```

Multi-agent systems in Haystack can be configured with various collaboration patterns. In the supervisor pattern, a single coordinating agent directs the work of subordinate agents, assigning tasks and collecting results. In the collaborative pattern, agents work together as equals, sharing information and contributing their expertise to solve problems collectively.

### Communication Protocols

Agents in a multi-agent system communicate through well-defined interfaces that specify how messages are passed between agents, how responses are aggregated, and how conflicts are resolved. This structured approach to agent communication ensures reliable operation even in complex agent ecosystems with many participants.

资料来源：[docs-website/docs/concepts/agents/multi-agent-systems.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/agents/multi-agent-systems.mdx)

## Human-in-the-Loop

Haystack agents support human-in-the-loop workflows, enabling humans to provide guidance, approval, or corrections during agent execution. This capability is essential for applications where autonomous operation must be balanced with human oversight and control.

### Interaction Modes

| Mode | Description | Use Case |
|------|-------------|----------|
| Approval | Human approves agent actions before execution | High-stakes decisions |
| Feedback | Human provides corrective feedback during execution | Fine-tuning agent behavior |
| Escalation | Agent defers to human when uncertain | Handling edge cases |
| Validation | Human validates agent outputs before completion | Quality assurance |

资料来源：[docs-website/docs/pipeline-components/agents-1/human-in-the-loop.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/pipeline-components/agents-1/human-in-the-loop.mdx)

### Workflow Integration

```mermaid
graph TD
    A[Agent Task] --> B{Requires Human Input?}
    B -->|Yes| C[Pause Execution]
    C --> D[Notify Human]
    D --> E[Await Response]
    E --> F{Human Action}
    F -->|Approve| G[Continue Execution]
    F -->|Reject| H[Abort or Retry]
    F -->|Modify| I[Apply Modifications]
    B -->|No| G
    I --> G
    G --> J[Task Complete]
```

The human-in-the-loop system is designed to be non-intrusive, minimizing the cognitive load on human overseers while ensuring that critical decisions receive appropriate human review. Agents can be configured to automatically escalate certain types of decisions based on predefined rules, such as actions that affect sensitive data or exceed specified cost thresholds.

## Tool Integration

A defining characteristic of Haystack agents is their ability to use tools to interact with external systems and perform actions beyond text generation. The tool integration system provides a standardized interface for defining, registering, and invoking tools that extend agent capabilities.

### Available Tool Categories

| Category | Examples | Capabilities |
|----------|----------|--------------|
| Web Search | Google Search, Bing Search | Internet research, fact checking |
| API Clients | REST, GraphQL | External service integration |
| Code Execution | Python, Shell | Computation, automation |
| Document Processing | PDF, CSV parsers | Information extraction |
| Database | SQL, Vector DB | Data retrieval, storage |

Tools in Haystack follow a consistent interface that makes it easy to create custom tools for domain-specific applications. Each tool is defined with a name, description, input schema, and implementation, and the agent automatically learns when and how to use tools based on their descriptions.

## Configuration Options

Haystack agents expose a wide range of configuration options that allow developers to customize agent behavior for specific use cases. These options control aspects ranging from the underlying model selection to detailed parameters governing agent decision-making.

### Core Configuration Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | String | Required | The LLM to use for reasoning |
| `max_iterations` | Integer | 10 | Maximum tool-calling loops |
| `tools` | List | Empty | Available tools for the agent |
| `prompt_template` | String | Default | Custom instruction template |
| `verbose` | Boolean | False | Enable detailed logging |

Advanced configuration options allow developers to customize how the agent reasons, how it selects tools, and how it handles errors and edge cases. These options can be set at the agent level or overridden for specific use cases.

## Best Practices

When building agent systems with Haystack, several best practices can help ensure reliable and maintainable applications. Careful attention to prompt design, tool definitions, and error handling will significantly improve agent performance and user experience.

Clear and specific tool descriptions are essential for guiding agent behavior. Tools should have descriptive names and comprehensive descriptions that explain not just what the tool does, but when and why an agent should consider using it. This helps the underlying LLM make informed decisions about tool selection.

State management should be designed with the target use case in mind. For simple single-turn interactions, minimal state management is appropriate. For complex multi-step tasks, comprehensive state tracking ensures the agent maintains context and can recover from errors gracefully.

Human-in-the-loop integration should be thoughtfully designed to balance autonomy with oversight. Critical decisions should require human approval, while routine operations can proceed autonomously. The escalation criteria should be clearly defined and regularly reviewed.

## Summary

Haystack's agent systems provide a comprehensive framework for building LLM-powered applications that can perceive, reason, and act. The architecture supports everything from simple single-agent applications to complex multi-agent ecosystems with human oversight. Key features include flexible state management, extensive tool integration, human-in-the-loop workflows, and configurable agent behavior.

资料来源：[docs-website/docs/concepts/agents.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/concepts/agents.mdx)

---

<a id='development-guide'></a>

## Development Guide

### 相关页面

相关主题：[Deployment and Infrastructure](#deployment), [Introduction to Haystack](#introduction)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [README.md](https://github.com/deepset-ai/haystack/blob/main/README.md)
- [docs-website/README.md](https://github.com/deepset-ai/haystack/blob/main/docs-website/README.md)
- [docker/README.md](https://github.com/deepset-ai/haystack/blob/main/docker/README.md)
- [pydoc/README.md](https://github.com/deepset-ai/haystack/blob/main/pydoc/README.md)
- [examples/README.md](https://github.com/deepset-ai/haystack/blob/main/examples/README.md)
</details>

# Development Guide

This guide provides comprehensive information for developers who want to contribute to Haystack or extend its functionality. Haystack is an end-to-end LLM framework that enables building applications powered by Large Language Models, Transformer models, and vector search capabilities.

## Overview

Haystack is an open-source framework maintained by deepset that allows developers to build production-ready AI applications. The framework supports retrieval-augmented generation (RAG), document search, question answering, and answer generation by orchestrating state-of-the-art embedding models and LLMs into pipelines.

资料来源：[README.md:1-10]()

## Project Structure

The Haystack repository is organized into several main directories, each serving a specific purpose in the overall project ecosystem.

```mermaid
graph TD
    A[haystack/ root] --> B[Main Package]
    A --> C[docs-website/]
    A --> D[docker/]
    A --> E[pydoc/]
    A --> F[examples/]
    
    B --> G[Core Framework Code]
    C --> H[Documentation Site]
    D --> I[Docker Images]
    E --> J[API Reference Generation]
    F --> K[Example Cookbooks]
```

### Directory Breakdown

| Directory | Purpose |
|-----------|---------|
| `haystack/` | Main Python package containing core framework code |
| `docs-website/` | Docusaurus-powered documentation site |
| `docker/` | Docker image definitions and build configurations |
| `pydoc/` | YAML configurations for API reference generation |
| `examples/` | Example applications and cookbooks (moved to haystack-cookbook) |

资料来源：[docs-website/README.md:40-55]()

## Installation for Development

### Standard Installation

To set up Haystack for development, install the package via pip:

```bash
pip install haystack-ai
```

### Nightly Pre-releases

For trying the newest features before official releases:

```bash
pip install --pre haystack-ai
```

### Docker-based Development

Haystack provides Docker images for development environments. The base image contains a working Python environment with Haystack preinstalled and is designed to be derived `FROM`.

```bash
docker buildx bake base
```

To build custom images with specific branches or tags:

```sh
HAYSTACK_VERSION=mybranch_or_tag BASE_IMAGE_TAG_SUFFIX=latest docker buildx bake base --no-cache
```

资料来源：[docker/README.md:15-30]()

### Multi-Platform Docker Builds

Haystack images support multiple architectures. To limit builds to your local architecture:

```bash
# For Apple M1 (ARM)
docker buildx bake base --set "*.platform=linux/arm64"
```

资料来源：[docker/README.md:40-45]()

## Documentation Development

The documentation website is built with Docusaurus 3 and provides comprehensive guides, tutorials, API references, and best practices for using Haystack.

### Prerequisites

- **Node.js** 18 or higher
- **npm** (included with Node.js) or Yarn

### Setting Up the Documentation Site

```bash
# Clone the repository and navigate to docs-website
git clone https://github.com/deepset-ai/haystack.git
cd haystack/docs-website

# Install dependencies
npm install

# Start the development server
npm start

# The site opens at http://localhost:3000 with live reload
```

### Common Documentation Tasks

| Task | Command | Location |
|------|---------|----------|
| Edit a page | Update files under `docs/` or `versioned_docs/` | Preview at http://localhost:3000 |
| Add to sidebar | Update `sidebars.js` with doc ID | `docs-website/` |
| Production check | `npm run build && npm run serve` | `docs-website/` |

资料来源：[docs-website/README.md:20-35]()

### Documentation Project Structure

```
docs-website/
├── docs/                          # Main documentation (guides, tutorials, concepts)
│   ├── _templates/               # Authoring templates (excluded from build)
│   ├── concepts/                 # Core Haystack concepts
│   ├── pipeline-components/      # Component documentation
│   └── ...
├── reference/                     # API reference (auto-generated, do not edit manually)
├── versioned_docs/               # Versioned copies of docs/
├── reference_versioned_docs/     # Versioned copies of reference/
├── src/                          # React components and custom code
│   ├── components/              # Custom React components
│   ├── css/                     # Global styles
│   ├── pages/                   # Custom pages
│   ├── remark/                  # Remark plugins
│   └── theme/                   # Docusaurus theme customization
```

资料来源：[docs-website/README.md:45-60]()

## API Reference Development

The API reference is generated automatically from docstrings in the code using [haystack-pydoc-tools](https://github.com/deepset-ai/haystack-pydoc-tools). A GitHub workflow regenerates the API reference when code changes.

### How API Reference Works

1. Create a `.yml` file in the `pydoc` directory
2. Configure how haystack-pydoc-tools will generate the page
3. Commit the configuration to the main branch
4. The GitHub workflow automatically generates the Markdown files

### Version Management

All updates to API reference live in unstable docs version and are promoted to stable docs version when a new version is released.

资料来源：[pydoc/README.md:1-20]()

## Contributing to Haystack

Haystack welcomes community contributions ranging from quick fixes like typo corrections to entirely new features.

### Contribution Areas

| Area | Repository | Description |
|------|------------|-------------|
| Main Haystack | `deepset-ai/haystack` | Core framework development |
| Integrations | `deepset-ai/haystack-core-integrations` | Integration components |
| Documentation | `haystack/docs-website` | Documentation content |

### Getting Started

1. Review the Contributor Guidelines in [CONTRIBUTING.md](https://github.com/deepset-ai/haystack/blob/main/CONTRIBUTING.md)
2. Check the [full list of open issues](https://github.com/orgs/deepset-ai/projects/14) available for contributions
3. You don't need to be a Haystack expert to provide meaningful improvements

### CI/CD and Quality Standards

The project maintains high quality standards through automated checks:

| Check | Badge | Description |
|-------|-------|-------------|
| Tests | GitHub Actions | Automated test suite |
| Type Checking | Mypy | Static type analysis |
| Code Coverage | Coverage Badge | Test coverage reporting |
| Linting | Ruff | Code style enforcement |
| License Compliance | License Check | Dependency license verification |

资料来源：[README.md:30-55]()

## Development Workflow

```mermaid
graph TD
    A[Start Development] --> B[Clone Repository]
    B --> C[Set Up Environment]
    C --> D[Install Dependencies]
    D --> E[Make Changes]
    E --> F[Run Tests]
    F --> G{Tests Pass?}
    G -->|No| H[Fix Issues]
    H --> E
    G -->|Yes| I[Run Linters]
    I --> J{Code Quality OK?}
    J -->|No| K[Address Linter Issues]
    K --> E
    J -->|Yes| L[Submit Pull Request]
    L --> M[Review Process]
    M --> N[Merge to Main]
```

## Examples and Cookbooks

Example applications have been moved to a dedicated repository. All example cookbooks are now located at:

**Repository:** [https://github.com/deepset-ai/haystack-cookbook/](https://github.com/deepset-ai/haystack-cookbook/)

This separation allows for more focused development and easier discovery of example applications.

资料来源：[examples/README.md:1-10]()

## License and Compliance

All contributions must comply with the project's license. View license information at:

- [https://github.com/deepset-ai/haystack/blob/main/LICENSE](https://github.com/deepset-ai/haystack/blob/main/LICENSE)

The project includes automated license compliance checking through GitHub workflows.

资料来源：[docker/README.md:50-60]()

## Quick Reference Commands

| Command | Purpose |
|---------|---------|
| `pip install haystack-ai` | Install Haystack |
| `pip install --pre haystack-ai` | Install pre-release version |
| `npm install` | Install documentation dependencies |
| `npm start` | Start documentation dev server |
| `npm run build` | Build documentation site |
| `docker buildx bake base` | Build Docker base image |

## Additional Resources

- **Documentation Site:** [https://docs.haystack.deepset.ai](https://docs.haystack.deepset.ai)
- **GitHub Repository:** [https://github.com/deepset-ai/haystack](https://github.com/deepset-ai/haystack)
- **Community:** [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) and [Stack Overflow](https://stackoverflow.com/questions/tagged/haystack)
- **Discord:** Join the [Haystack Discord community](https://discord.gg/VBpFBDegHY)

---

<a id='deployment'></a>

## Deployment and Infrastructure

### 相关页面

相关主题：[Development Guide](#development-guide), [Introduction to Haystack](#introduction)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [docker/Dockerfile.base](https://github.com/deepset-ai/haystack/blob/main/docker/Dockerfile.base)
- [docker/README.md](https://github.com/deepset-ai/haystack/blob/main/docker/README.md)
- [docs-website/docs/development/deployment.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/development/deployment.mdx)
- [docs-website/docs/development/deployment/docker.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/development/deployment/docker.mdx)
- [docs-website/docs/development/deployment/kubernetes.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/development/deployment/kubernetes.mdx)
- [docs-website/docs/development/enabling-gpu-acceleration.mdx](https://github.com/deepset-ai/haystack/blob/main/docs-website/docs/development/enabling-gpu-acceleration.mdx)
</details>

# Deployment and Infrastructure

## Overview

Haystack provides a comprehensive deployment infrastructure designed for production-ready LLM applications. The framework supports multiple deployment strategies including Docker containers, Kubernetes orchestration, and cloud platform integrations. This documentation covers the core deployment mechanisms, containerization approach, GPU acceleration support, and production best practices.

The deployment system is built around Docker images using BuildKit for efficient multi-platform builds, enabling deployment across x86_64 and ARM64 architectures. The infrastructure supports both development environments and production-grade deployments with high availability requirements.

## Docker Containerization

### Base Images

Haystack provides pre-built Docker images that serve as the foundation for custom deployments. The base images contain a working Python environment with Haystack preinstalled and are intended to be extended with application-specific configurations.

The primary image variant available is:

| Image Tag | Description | Use Case |
|-----------|-------------|----------|
| `haystack:base-<version>` | Base Python environment with Haystack | Custom image derivation |

All images are published to Docker Hub and can be pulled directly for use in production environments. The images follow semantic versioning and align with Haystack releases.

### Building Custom Images

Custom images can be built using Docker BuildKit and the `bake` command orchestrator. This approach allows for:

- Custom Haystack versions or branches
- Pre-installed dependencies
- Application-specific configurations
- Multi-platform support

The build process uses the `docker-bake.hcl` configuration file which defines build targets, platforms, and variable substitutions.

#### Basic Build Command

```sh
docker buildx bake base
```

#### Building with Custom Variables

To build with a custom Haystack version or branch, override the `HAYSTACK_VERSION` variable:

```sh
HAYSTACK_VERSION=mybranch_or_tag BASE_IMAGE_TAG_SUFFIX=latest docker buildx bake base --no-cache
```

This mechanism enables CI/CD pipelines to build images from specific commits, branches, or release tags without modifying the underlying Dockerfile.

### Multi-Platform Builds

Haystack Docker images support multiple architectures including:

- `linux/amd64` (x86_64)
- `linux/arm64` (ARM64)

#### Platform Limitations

Depending on the operating system and Docker environment, building all platforms locally may not be possible. If encountering the following error:

```
multiple platforms feature is currently not supported for docker driver. Please switch to a different driver
(eg. "docker buildx create --use")
```

The platform option must be overridden to match the local architecture. For example, on Apple M1 (ARM64):

```sh
docker buildx bake base --set "*.platform=linux/arm64"
```

#### Cross-Platform Considerations

When deploying multi-platform images, consider the following:

- **CPU Compatibility**: Ensure target nodes match the built architecture
- **Performance**: Native architecture builds perform optimally
- **Registry Support**: Use registries that support multi-platform manifests

## GPU Acceleration

### Hardware Acceleration Support

Haystack supports GPU acceleration for compute-intensive operations including:

- Model inference
- Embedding generation
- Tokenization
- Custom model operations

GPU acceleration significantly improves throughput for LLM-based pipelines and embedding-heavy workloads.

### Enabling GPU Support

#### NVIDIA GPUs (CUDA)

For NVIDIA GPU support, use CUDA-enabled base images and ensure the nvidia-container-toolkit is installed on the host system.

**Docker Compose Example:**

```yaml
services:
  haystack:
    image: haystack:base-latest
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

#### AMD GPUs (ROCm)

AMD GPU support requires ROCm-enabled images and appropriate runtime configuration.

### GPU Memory Management

For production deployments, configure memory limits based on model size:

| Model Size | Recommended GPU Memory | Configuration |
|------------|------------------------|---------------|
| Small (<1B params) | 8 GB | `CUDA_VISIBLE_DEVICES=0` |
| Medium (1-7B params) | 16 GB | `CUDA_VISIBLE_DEVICES=0,1` |
| Large (7-70B params) | 32+ GB | Multi-GPU / quantization |

### Quantization Options

To reduce GPU memory requirements, consider model quantization:

- **4-bit quantization**: Reduces memory by ~75%
- **8-bit quantization**: Reduces memory by ~50%
- **Dynamic quantization**: Trade-off between speed and accuracy

## Kubernetes Deployment

### Container Orchestration

Haystack can be deployed on Kubernetes for production environments requiring:

- Horizontal scaling
- High availability
- Rolling updates
- Resource management
- Service discovery

### Resource Configuration

#### Resource Limits

Configure CPU and memory limits based on workload:

```yaml
resources:
  limits:
    cpu: "4"
    memory: "16Gi"
  requests:
    cpu: "2"
    memory: "8Gi"
```

#### GPU Resource Allocation

For GPU workloads, define accelerator resources:

```yaml
resources:
  limits:
    nvidia.com/gpu: "2"
  requests:
    nvidia.com/gpu: "1"
```

### High Availability Configuration

For production deployments, implement:

1. **Replica Sets**: Deploy multiple replicas for fault tolerance
2. **Health Checks**: Configure liveness and readiness probes
3. **Pod Disruption Budgets**: Ensure availability during updates
4. **Anti-Affinity Rules**: Distribute pods across nodes

```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
```

### Service Configuration

Expose Haystack services using Kubernetes Services:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: haystack-api
spec:
  selector:
    app: haystack
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  type: LoadBalancer
```

## Production Best Practices

### Security Considerations

| Practice | Implementation |
|----------|----------------|
| Non-root execution | Configure USER directive in Dockerfile |
| Secret management | Use Kubernetes Secrets or external secret stores |
| Network policies | Restrict pod-to-pod communication |
| Image scanning | Scan images for vulnerabilities before deployment |
| TLS termination | Configure ingress with TLS certificates |

### Monitoring and Observability

Implement monitoring using:

- **Metrics**: Prometheus exporter for pipeline metrics
- **Logging**: Centralized logging with ELK/Graylog
- **Tracing**: OpenTelemetry for request tracing
- **Alerts**: Configure alerts for error rates and latency

### Performance Optimization

1. **Connection Pooling**: Reuse database and API connections
2. **Caching**: Implement caching for frequently accessed data
3. **Batch Processing**: Process multiple requests in batches
4. **Async Processing**: Use async/await for I/O operations

## CI/CD Integration

### Automated Builds

Haystack supports automated Docker image builds through:

- GitHub Actions workflows
- BuildKit with bake files
- Multi-stage Docker builds

### Deployment Workflows

```mermaid
graph TD
    A[Code Change] --> B[Run Tests]
    B --> C[Build Docker Image]
    C --> D[Push to Registry]
    D --> E[Update Deployment]
    E --> F[Health Check]
    F --> G{Healthy?}
    G -->|Yes| H[Deployment Complete]
    G -->|No| I[Rollback]
```

### Registry Configuration

Popular registry options for Haystack images:

| Registry | Use Case | Authentication |
|----------|----------|----------------|
| Docker Hub | Public deployments | Optional |
| AWS ECR | AWS infrastructure | IAM roles |
| GCR | GCP infrastructure | Service accounts |
| Azure ACR | Azure infrastructure | Service principals |
| Private Registry | Enterprise deployments | Username/password |

## License and Compliance

The Haystack Docker images contain:

- Haystack framework code under the Apache 2.0 license
- Python runtime components
- Base distribution software with their respective licenses

Users are responsible for ensuring compliance with all software licenses contained within deployed images. For enterprise deployments, review the license implications of all included components.

## Related Documentation

- [Installation Guide](https://docs.haystack.deepset.ai/docs/installation)
- [Pipeline Components](https://docs.haystack.deepset.ai/docs/pipeline-components)
- [API Reference](https://docs.haystack.deepset.ai/reference)
- [Contributing Guide](https://github.com/deepset-ai/haystack/blob/main/CONTRIBUTING.md)

## Summary

Haystack provides a flexible and production-ready deployment infrastructure supporting Docker containerization, Kubernetes orchestration, and GPU acceleration. The multi-platform Docker images enable deployment across diverse infrastructure, while Kubernetes support facilitates enterprise-grade deployments with high availability and scalability requirements. GPU acceleration support enables high-performance inference for LLM-powered applications, with quantization options for resource-constrained environments.

---

---

## Doramagic Pitfall Log

Project: deepset-ai/haystack

Summary: Found 38 potential pitfall items; 7 are high/blocking. Highest priority: installation - 来源证据：RFC: Signed receipts for Haystack pipeline component calls.

## 1. installation · 来源证据：RFC: Signed receipts for Haystack pipeline component calls

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个安装相关的待验证问题：RFC: Signed receipts for Haystack pipeline component calls
- User impact: 可能增加新用户试用和生产接入成本。
- Suggested check: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_192c840953e54837869723f54ccfdd1a | https://github.com/deepset-ai/haystack/issues/11039 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

## 2. installation · 来源证据：feat: Add `run_async` to `MultiQueryEmbeddingRetriever`, `MultiQueryTextRetriever`, and `TextEmbeddingRetriever`

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个安装相关的待验证问题：feat: Add `run_async` to `MultiQueryEmbeddingRetriever`, `MultiQueryTextRetriever`, and `TextEmbeddingRetriever`
- User impact: 可能阻塞安装或首次运行。
- Suggested check: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_4b8f3323f54c4fd6b8de4e2d466cfe8b | https://github.com/deepset-ai/haystack/issues/11358 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

## 3. installation · 来源证据：feat: add INTERSECTION join mode to DocumentJoiner

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个安装相关的待验证问题：feat: add INTERSECTION join mode to DocumentJoiner
- User impact: 可能增加新用户试用和生产接入成本。
- Suggested check: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_00757f9859234e9cab8f8d4ce4f3e771 | https://github.com/deepset-ai/haystack/issues/11365 | 来源类型 github_issue 暴露的待验证使用条件。

## 4. maintenance · 来源证据：docs: Update Ragas docs

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个维护/版本相关的待验证问题：docs: Update Ragas docs
- User impact: 可能影响升级、迁移或版本选择。
- Suggested check: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_3204fffa09664d9f8553be2a3008f270 | https://github.com/deepset-ai/haystack/issues/11178 | 来源类型 github_issue 暴露的待验证使用条件。

## 5. security_permissions · 来源证据：EnvVarSecrets: add multi-tenant context support (ContextVar / pipeline-run context)

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：EnvVarSecrets: add multi-tenant context support (ContextVar / pipeline-run context)
- User impact: 可能影响升级、迁移或版本选择。
- Suggested check: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_8f72793700a1416891c2eedddc379129 | https://github.com/deepset-ai/haystack/issues/11366 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

## 6. security_permissions · 来源证据：Security: OWASP Agent Memory Guard for pipeline memory poisoning defense

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：Security: OWASP Agent Memory Guard for pipeline memory poisoning defense
- User impact: 可能阻塞安装或首次运行。
- Suggested check: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_4f0868673100472fb74d831b5a04735f | https://github.com/deepset-ai/haystack/issues/11311 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

## 7. security_permissions · 来源证据：feat: support token-based budget in LostInTheMiddleRanker

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：feat: support token-based budget in LostInTheMiddleRanker
- User impact: 可能影响授权、密钥配置或安全边界。
- Suggested check: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_7ad00787309c442eb497b10879fb3b28 | https://github.com/deepset-ai/haystack/issues/11351 | 来源类型 github_issue 暴露的待验证使用条件。

## 8. installation · 失败模式：installation: Proposal: Transaction Protocol for idempotent, auditable agent pipelines

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Proposal: Transaction Protocol for idempotent, auditable agent pipelines
- User impact: Developers may fail before the first successful local run: Proposal: Transaction Protocol for idempotent, auditable agent pipelines
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: Proposal: Transaction Protocol for idempotent, auditable agent pipelines. Context: Observed when using python
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_58038e9b6373edf9376049b42d4b7bb4 | https://github.com/deepset-ai/haystack/issues/11266 | Proposal: Transaction Protocol for idempotent, auditable agent pipelines

## 9. installation · 失败模式：installation: RFC: Signed receipts for Haystack pipeline component calls

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: RFC: Signed receipts for Haystack pipeline component calls
- User impact: Developers may fail before the first successful local run: RFC: Signed receipts for Haystack pipeline component calls
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: RFC: Signed receipts for Haystack pipeline component calls. Context: Observed when using node, python
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_ce0b9c65d21126dcf11ede12120e154f | https://github.com/deepset-ai/haystack/issues/11039 | RFC: Signed receipts for Haystack pipeline component calls

## 10. installation · 失败模式：installation: Security: OWASP Agent Memory Guard for pipeline memory poisoning defense

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Security: OWASP Agent Memory Guard for pipeline memory poisoning defense
- User impact: Developers may fail before the first successful local run: Security: OWASP Agent Memory Guard for pipeline memory poisoning defense
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: Security: OWASP Agent Memory Guard for pipeline memory poisoning defense. Context: Observed when using python
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_4d3276b6b9938595cb2dbb864a5509da | https://github.com/deepset-ai/haystack/issues/11311 | Security: OWASP Agent Memory Guard for pipeline memory poisoning defense

## 11. installation · 失败模式：installation: [FEATURE] Support for code syntax-aware Document Splitters

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: [FEATURE] Support for code syntax-aware Document Splitters
- User impact: Developers may fail before the first successful local run: [FEATURE] Support for code syntax-aware Document Splitters
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: [FEATURE] Support for code syntax-aware Document Splitters. Context: Observed when using python
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_997b84068ae32409b1d8d55daaddd984 | https://github.com/deepset-ai/haystack/issues/11354 | [FEATURE] Support for code syntax-aware Document Splitters

## 12. installation · 来源证据：MCP Server for Haystack docs

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个安装相关的待验证问题：MCP Server for Haystack docs
- User impact: 可能增加新用户试用和生产接入成本。
- Suggested check: 来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_398390cf2fcd41d589dd5614a3bc646d | https://github.com/deepset-ai/haystack/issues/11346 | 来源类型 github_issue 暴露的待验证使用条件。

## 13. installation · 来源证据：[FEATURE] Support for code syntax-aware Document Splitters

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个安装相关的待验证问题：[FEATURE] Support for code syntax-aware Document Splitters
- User impact: 可能阻塞安装或首次运行。
- Suggested check: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_76b3b1b8eae94593a2cd248d0ec55e2a | https://github.com/deepset-ai/haystack/issues/11354 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

## 14. installation · 来源证据：v2.25.2

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个安装相关的待验证问题：v2.25.2
- User impact: 可能增加新用户试用和生产接入成本。
- Suggested check: 来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_55d8aef5d1c3417ba9bdf05c0f5a3053 | https://github.com/deepset-ai/haystack/releases/tag/v2.25.2 | 来源类型 github_release 暴露的待验证使用条件。

## 15. installation · 来源证据：v2.26.0

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个安装相关的待验证问题：v2.26.0
- User impact: 可能增加新用户试用和生产接入成本。
- Suggested check: 来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_d73f121017b64b04a8ad885da241fc6f | https://github.com/deepset-ai/haystack/releases/tag/v2.26.0 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

## 16. installation · 来源证据：v2.28.0

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个安装相关的待验证问题：v2.28.0
- User impact: 可能影响升级、迁移或版本选择。
- Suggested check: 来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_d9746a9178f0445d853c95cbb4a5241b | https://github.com/deepset-ai/haystack/releases/tag/v2.28.0 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

## 17. configuration · 失败模式：configuration: MCP Server for Haystack docs

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: MCP Server for Haystack docs
- User impact: Developers may misconfigure credentials, environment, or host setup: MCP Server for Haystack docs
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: MCP Server for Haystack docs. Context: Observed when using python
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_e20d9655fcfaa20fd6aea7f45a938545 | https://github.com/deepset-ai/haystack/issues/11346 | MCP Server for Haystack docs, failure_mode_cluster:github_issue | fmev_a1eed7aea672a032017343738a09159f | https://github.com/deepset-ai/haystack/issues/11346 | MCP Server for Haystack docs

## 18. configuration · 失败模式：configuration: v2.26.0

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: v2.26.0
- User impact: Upgrade or migration may change expected behavior: v2.26.0
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: v2.26.0. Context: Observed when using python, windows
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_3b9fc694d24804c99a261297652bf3cf | https://github.com/deepset-ai/haystack/releases/tag/v2.26.0 | v2.26.0

## 19. configuration · 失败模式：configuration: v2.28.0

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: v2.28.0
- User impact: Upgrade or migration may change expected behavior: v2.28.0
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: v2.28.0. Context: Observed when using python
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_0c6c5701a51e86d2246a4919b45c2606 | https://github.com/deepset-ai/haystack/releases/tag/v2.28.0 | v2.28.0

## 20. configuration · 失败模式：configuration: v2.29.0

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: v2.29.0
- User impact: Upgrade or migration may change expected behavior: v2.29.0
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: v2.29.0. Context: Observed when using python
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_285696f6bc066dc6f42482171a097432 | https://github.com/deepset-ai/haystack/releases/tag/v2.29.0 | v2.29.0

## 21. capability · 能力判断依赖假设

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: 假设不成立时，用户拿不到承诺的能力。
- Suggested check: 将假设转成下游验证清单。
- Guardrail action: 假设必须转成验证项；没有验证结果前不能写成事实。
- Evidence: capability.assumptions | github_repo:221654678 | https://github.com/deepset-ai/haystack | README/documentation is current enough for a first validation pass.

## 22. runtime · 失败模式：runtime: v2.25.2

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this runtime risk before relying on the project: v2.25.2
- User impact: Upgrade or migration may change expected behavior: v2.25.2
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: v2.25.2. Context: Observed when using python
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_32dfb0f93116d56f30cc46cdab3a0751 | https://github.com/deepset-ai/haystack/releases/tag/v2.25.2 | v2.25.2

## 23. maintenance · 失败模式：migration: docs: Update Ragas docs

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this migration risk before relying on the project: docs: Update Ragas docs
- User impact: Developers may hit a documented source-backed failure mode: docs: Update Ragas docs
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: docs: Update Ragas docs. Context: Observed during version upgrade or migration.
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_57550d7e13c6f14ad00a030d3e3a20db | https://github.com/deepset-ai/haystack/issues/11178 | docs: Update Ragas docs, failure_mode_cluster:github_issue | fmev_c4773f63705049b6c2714f8a4517b847 | https://github.com/deepset-ai/haystack/issues/11178 | docs: Update Ragas docs

## 24. maintenance · 来源证据：DocumentJoiner concatenate mode incorrectly drops documents with score=0.0 during deduplication

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个维护/版本相关的待验证问题：DocumentJoiner concatenate mode incorrectly drops documents with score=0.0 during deduplication
- User impact: 可能增加新用户试用和生产接入成本。
- Suggested check: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_9e25887dd3694aa695807058e368f46c | https://github.com/deepset-ai/haystack/issues/11352 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

## 25. maintenance · 维护活跃度未知

- Severity: medium
- Evidence strength: source_linked
- Finding: 未记录 last_activity_observed。
- User impact: 新项目、停更项目和活跃项目会被混在一起，推荐信任度下降。
- Suggested check: 补 GitHub 最近 commit、release、issue/PR 响应信号。
- Guardrail action: 维护活跃度未知时，推荐强度不能标为高信任。
- Evidence: evidence.maintainer_signals | github_repo:221654678 | https://github.com/deepset-ai/haystack | last_activity_observed missing

## 26. security_permissions · 下游验证发现风险项

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: 下游已经要求复核，不能在页面中弱化。
- Suggested check: 进入安全/权限治理复核队列。
- Guardrail action: 下游风险存在时必须保持 review/recommendation 降级。
- Evidence: downstream_validation.risk_items | github_repo:221654678 | https://github.com/deepset-ai/haystack | no_demo; severity=medium

## 27. security_permissions · 存在评分风险

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: 风险会影响是否适合普通用户安装。
- Suggested check: 把风险写入边界卡，并确认是否需要人工复核。
- Guardrail action: 评分风险必须进入边界卡，不能只作为内部分数。
- Evidence: risks.scoring_risks | github_repo:221654678 | https://github.com/deepset-ai/haystack | no_demo; severity=medium

## 28. security_permissions · 来源证据：Proposal: Transaction Protocol for idempotent, auditable agent pipelines

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：Proposal: Transaction Protocol for idempotent, auditable agent pipelines
- User impact: 可能影响升级、迁移或版本选择。
- Suggested check: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_e0fcf29e18c5480baf59b94a464ecc85 | https://github.com/deepset-ai/haystack/issues/11266 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

## 29. security_permissions · 来源证据：v2.26.1

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：v2.26.1
- User impact: 可能增加新用户试用和生产接入成本。
- Suggested check: 来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_1520403ba7f24184b2c108c30e5d609f | https://github.com/deepset-ai/haystack/releases/tag/v2.26.1 | 来源类型 github_release 暴露的待验证使用条件。

## 30. security_permissions · 来源证据：v2.27.0

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：v2.27.0
- User impact: 可能增加新用户试用和生产接入成本。
- Suggested check: 来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- Guardrail action: 不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- Evidence: community_evidence:github | cevd_1dddbe7bf8094d669dd185a18844ef75 | https://github.com/deepset-ai/haystack/releases/tag/v2.27.0 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

## 31. capability · 失败模式：conceptual: feat: Add `run_async` to `MultiQueryEmbeddingRetriever`, `MultiQueryTextRetriever`, and `Text...

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this conceptual risk before relying on the project: feat: Add `run_async` to `MultiQueryEmbeddingRetriever`, `MultiQueryTextRetriever`, and `TextEmbeddingRetriever`
- User impact: Developers may hit a documented source-backed failure mode: feat: Add `run_async` to `MultiQueryEmbeddingRetriever`, `MultiQueryTextRetriever`, and `TextEmbeddingRetriever`
- Suggested check: 复核 source-backed failure mode cluster，并把适用版本和验证路径写入资产。
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_bf87ad8f610a525641ac857abffd6388 | https://github.com/deepset-ai/haystack/issues/11358 | feat: Add `run_async` to `MultiQueryEmbeddingRetriever`, `MultiQueryTextRetriever`, and `TextEmbeddingRetriever`, failure_mode_cluster:github_issue | fmev_315e3f2ec26809f7348a1892a9730a05 | https://github.com/deepset-ai/haystack/issues/11358 | feat: Add `run_async` to `MultiQueryEmbeddingRetriever`, `MultiQueryTextRetriever`, and `TextEmbeddingRetriever`

## 32. capability · 失败模式：conceptual: feat: add INTERSECTION join mode to DocumentJoiner

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this conceptual risk before relying on the project: feat: add INTERSECTION join mode to DocumentJoiner
- User impact: Developers may hit a documented source-backed failure mode: feat: add INTERSECTION join mode to DocumentJoiner
- Suggested check: 复核 source-backed failure mode cluster，并把适用版本和验证路径写入资产。
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_175e4485fffcc53c711d1fd504db9a38 | https://github.com/deepset-ai/haystack/issues/11365 | feat: add INTERSECTION join mode to DocumentJoiner

## 33. capability · 失败模式：conceptual: feat: support token-based budget in LostInTheMiddleRanker

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this conceptual risk before relying on the project: feat: support token-based budget in LostInTheMiddleRanker
- User impact: Developers may hit a documented source-backed failure mode: feat: support token-based budget in LostInTheMiddleRanker
- Suggested check: 复核 source-backed failure mode cluster，并把适用版本和验证路径写入资产。
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_eff234be9632dc6eb35cf59720b2c3f0 | https://github.com/deepset-ai/haystack/issues/11351 | feat: support token-based budget in LostInTheMiddleRanker

## 34. runtime · 失败模式：performance: DocumentJoiner concatenate mode incorrectly drops documents with score=0.0 during deduplication

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this performance risk before relying on the project: DocumentJoiner concatenate mode incorrectly drops documents with score=0.0 during deduplication
- User impact: Developers may hit a documented source-backed failure mode: DocumentJoiner concatenate mode incorrectly drops documents with score=0.0 during deduplication
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: DocumentJoiner concatenate mode incorrectly drops documents with score=0.0 during deduplication. Context: Observed when using python, macos, cuda
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_7f9bb8e374256d979ec52a0c96020977 | https://github.com/deepset-ai/haystack/issues/11352 | DocumentJoiner concatenate mode incorrectly drops documents with score=0.0 during deduplication, failure_mode_cluster:github_issue | fmev_21fc5a912bed31520bb91639ca4fa3b3 | https://github.com/deepset-ai/haystack/issues/11352 | DocumentJoiner concatenate mode incorrectly drops documents with score=0.0 during deduplication

## 35. runtime · 失败模式：performance: v2.27.0

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this performance risk before relying on the project: v2.27.0
- User impact: Upgrade or migration may change expected behavior: v2.27.0
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: v2.27.0. Context: Observed when using python
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_9757a305d020b89fd79c9dc31c6a9d1c | https://github.com/deepset-ai/haystack/releases/tag/v2.27.0 | v2.27.0

## 36. maintenance · issue/PR 响应质量未知

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown。
- User impact: 用户无法判断遇到问题后是否有人维护。
- Suggested check: 抽样最近 issue/PR，判断是否长期无人处理。
- Guardrail action: issue/PR 响应未知时，必须提示维护风险。
- Evidence: evidence.maintainer_signals | github_repo:221654678 | https://github.com/deepset-ai/haystack | issue_or_pr_quality=unknown

## 37. maintenance · 发布节奏不明确

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown。
- User impact: 安装命令和文档可能落后于代码，用户踩坑概率升高。
- Suggested check: 确认最近 release/tag 和 README 安装命令是否一致。
- Guardrail action: 发布节奏未知或过期时，安装说明必须标注可能漂移。
- Evidence: evidence.maintainer_signals | github_repo:221654678 | https://github.com/deepset-ai/haystack | release_recency=unknown

## 38. maintenance · 失败模式：maintenance: v2.26.1

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this maintenance risk before relying on the project: v2.26.1
- User impact: Upgrade or migration may change expected behavior: v2.26.1
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: v2.26.1. Context: Source discussion did not expose a precise runtime context.
- Guardrail action: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_29416bd44cdae3aebbb8d4bd245bc398 | https://github.com/deepset-ai/haystack/releases/tag/v2.26.1 | v2.26.1

<!-- canonical_name: deepset-ai/haystack; human_manual_source: deepwiki_human_wiki -->