# https://github.com/huggingface/diffusers Project Manual

Generated at: 2026-05-30 13:42:25 UTC

## Table of Contents

- [Getting Started with Diffusers](#getting-started)
- [System Architecture](#system-architecture)
- [Pipelines Overview](#pipelines-overview)
- [Modular Diffusers](#modular-pipelines)
- [Training Guide](#training-guide)
- [Optimization Guide](#optimization-guide)
- [Loaders & Adapters](#loaders-adapters)
- [Quantization Guide](#quantization-guide)

<a id='getting-started'></a>

## Getting Started with Diffusers

### Related Pages

Related topics: [System Architecture](#system-architecture), [Pipelines Overview](#pipelines-overview)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/huggingface/diffusers/blob/main/README.md)
- [PHILOSOPHY.md](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md)
- [examples/README.md](https://github.com/huggingface/diffusers/blob/main/examples/README.md)
- [examples/community/README_community_scripts.md](https://github.com/huggingface/diffusers/blob/main/examples/community/README_community_scripts.md)
- [src/diffusers/pipelines/pipeline_loading_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)
- [src/diffusers/models/auto_model.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/auto_model.py)
- [src/diffusers/loaders/single_file.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py)
- [src/diffusers/modular_pipelines/modular_pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline_utils.py)
- [src/diffusers/utils/hub_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/utils/hub_utils.py)
- [src/diffusers/utils/dynamic_modules_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/utils/dynamic_modules_utils.py)
</details>

# Getting Started with Diffusers

Diffusers is a library of pretrained diffusion models and modular tools for image, audio, and video generation. This page covers installation, core concepts, model loading, and common usage patterns.

## Overview

Diffusers serves as a modular toolbox for pretrained diffusion models. According to the project philosophy, the library embraces the following design principles (Source: [PHILOSOPHY.md](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md)):

- **Reusability**: Pipelines should be self-contained and reusable
- **Composability**: Smaller building blocks like `attention.py`, `resnet.py`, and `embeddings.py` should be composable
- **Flexibility**: Models should expose complexity and give clear error messages
- **Performance**: Models can be optimized without major code changes while maintaining backward compatibility

The library supports a wide range of tasks including text-to-image, image-to-image, inpainting, video generation, and more. Recent releases (v0.33.0 through v0.38.0) have introduced numerous new pipelines including Wan 2.1/2.2, Flux variants, LLaDA2, and specialized ControlNet implementations.

## Installation

### Basic Installation

To install the latest stable version of Diffusers:

```bash
pip install diffusers
```

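To also install PyTorch and Accelerate through the `torch` extra (quote the argument so shells such as zsh do not expand the brackets):

```bash
pip install "diffusers[torch]"
```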

### Installing from Source

For the latest features and example scripts, install from source:

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
```

Source: [examples/README.md](https://github.com/huggingface/diffusers/blob/main/examples/README.md)

### Example-Specific Dependencies

Training scripts and community examples may require additional dependencies:

```bash
cd examples/<example_folder>  # Navigate to the specific example folder, e.g. examples/dreambooth
pip install -r requirements.txt
```

> [!IMPORTANT]
> Example scripts frequently depend on the latest library version. Always install from source to ensure compatibility.
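
Because example scripts track the development version, it can help to confirm which version is actually installed before running one. The following is a minimal sketch using only the standard library; the helper name `installed_version` is an assumption made for illustration and is not part of Diffusers.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the version without importing the (potentially heavy) package itself.
version = installed_version("diffusers")
print(f"diffusers: {version or 'not installed'}")
```

If the reported version is older than the one an example expects, reinstalling from source as described above is the usual remedy.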

## Core Concepts

Understanding Diffusers requires familiarity with three fundamental building blocks: Pipelines, Models, and Schedulers.

### Architectural Overview

```mermaid
graph TD
    A[User Input] --> B[DiffusionPipeline]
    B --> C[Models]
    B --> D[Schedulers]
    B --> E[Tokenizers/Processors]
    C --> F[UNet2D / Transformer2D]
    C --> G[VAE]
    D --> H[Noise Schedule]
    F --> I[Latent Space]
    G --> J[Generated Output]
    style B fill:#e1f5fe
    style C fill:#fff3e0
    style D fill:#e8f5e8
```

### Pipelines

Pipelines are the high-level API that orchestrates the entire diffusion process. They combine models, schedulers, and optional components like tokenizers or control networks into a cohesive inference workflow.

Key pipeline characteristics (Source: [src/diffusers/pipelines/pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_utils.py)):

| Pipeline Type | Description | Typical Use Case |
|--------------|-------------|------------------|
| `DiffusionPipeline` | Base pipeline class | Custom implementations |
| `StableDiffusionPipeline` | SD 1.x text-to-image | General image generation |
| `StableDiffusionXLPipeline` | SDXL optimized | High-quality image generation |
| `StableDiffusionControlNetPipeline` | With ControlNet | Controlled generation |
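| `AutoPipelineForText2Image` (also `AutoPipelineForImage2Image`, `AutoPipelineForInpainting`) | Task-based selection of the concrete pipeline class | Flexible pipeline selection |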

### Models

Diffusers models are PyTorch modules that inherit from `ModelMixin` and `ConfigMixin`. They are designed to be:

- Composable from smaller building blocks
- Configurable with clear parameter handling
- Optimizable for memory and compute efficiency

Source: [PHILOSOPHY.md](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md)

Common model architectures include:

| Model | Description | Location |
|-------|-------------|----------|
| `UNet2DConditionModel` | Conditioning UNet for text-to-image | `src/diffusers/models/unets/` |
| `AutoencoderKL` | VAE for latent operations | `src/diffusers/models/autoencoders/` |
| `Transformer2DModel` | Transformer-based diffusion | `src/diffusers/models/transformers/` |
| `ControlNetModel` | ControlNet conditioning | `src/diffusers/models/controlnets/` |

### Schedulers

Schedulers implement various diffusion sampling strategies. The library supports numerous scheduling algorithms:

| Scheduler | A1111 Equivalent | Characteristics |
|-----------|------------------|-----------------|
| `DDPMScheduler` | DDPM | High-quality, many steps |
| `DDIMScheduler` | DDIM | Fast convergence |
| `DPMSolverMultistepScheduler` | DPM++ 2M | Fast, good quality |
| `EulerDiscreteScheduler` | Euler | Simple, fast |
| `EulerAncestralDiscreteScheduler` | Euler a | Ancestral sampling |
| `UniPCMultistepScheduler` | UniPC | Very fast convergence |

Source: [github.com/huggingface/diffusers/issues/4167](https://github.com/huggingface/diffusers/issues/4167)

## Loading Models and Pipelines

The library provides multiple ways to load models and pipelines, addressing common community needs around universal model loading.

### Using DiffusionPipeline (Recommended)

The `DiffusionPipeline` is the recommended entry point for loading pretrained models:

```python
import torch
from diffusers import DiffusionPipeline

# Load from Hugging Face Hub
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True
)

# Move to GPU
pipeline = pipeline.to("cuda")
```

Source: [src/diffusers/pipelines/pipeline_loading_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)

### Using AutoModel Classes

For loading individual model components, use the `AutoModel` classes:

```python
import torch
from diffusers import AutoModel

# Load a model from config automatically
model = AutoModel.from_pretrained(
    "path/to/model",
    torch_dtype=torch.float16,
    variant="fp16"
)
```

The `AutoModel` class determines the appropriate model class from the configuration:

```python
# Source: src/diffusers/models/auto_model.py
if "_class_name" in config:
    class_name = config["_class_name"]
    library = "diffusers"
elif "model_type" in config:
    class_name = "AutoModel"
    library = "transformers"
```

Source: [src/diffusers/models/auto_model.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/auto_model.py)

### Loading Single-File Checkpoints

For custom models stored in single checkpoint files (including GGUF formats in supported models):

```python
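from diffusers import UNet2DConditionModel  # example; any class that inherits FromOriginalModelMixin

# Load from a single checkpoint file
model = UNet2DConditionModel.from_single_file(
    "path/to/checkpoint.safetensors",
    config="path/to/config_dir"  # Optional: Hub repo id or local directory containing the config
)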
```

> [!NOTE]
> The `from_single_file` method is available on models that inherit from `FromOriginalModelMixin`. Source: [src/diffusers/loaders/single_file.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py)

The loading logic determines the appropriate method:

```python
# Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py
load_method = (
    getattr(self.type_hint, "from_single_file")
    if is_single_file
    else getattr(self.type_hint, "from_pretrained")
)
```

### Loading with Trust Remote Code

Some models require executing custom code from the repository:

```python
pipeline = DiffusionPipeline.from_pretrained(
    "some/model-with-custom-code",
    trust_remote_code=True
)
```

When `trust_remote_code=True` is not set and custom code is detected, the library raises:

```
ValueError: The repository for {pretrained_model_name_or_path} contains custom code 
which must be executed to correctly load the model.
```

Source: [src/diffusers/utils/dynamic_modules_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/utils/dynamic_modules_utils.py)
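
The opt-in behaviour described above can be sketched as a small guard. This is an illustrative reconstruction under stated assumptions, not the library's implementation: the function name `check_remote_code` and its `has_custom_code` flag are hypothetical, and only the error wording is taken from the message quoted above.

```python
def check_remote_code(repo_id: str, has_custom_code: bool, trust_remote_code: bool = False) -> bool:
    """Hypothetical guard: permit custom repository code only when the caller opted in.

    Returns True when custom code may be executed, False when there is none to run.
    """
    if has_custom_code and not trust_remote_code:
        raise ValueError(
            f"The repository for {repo_id} contains custom code "
            "which must be executed to correctly load the model."
        )
    return has_custom_code


# Without the opt-in, a repository that ships custom code is refused.
try:
    check_remote_code("some/model-with-custom-code", has_custom_code=True)
except ValueError as err:
    print(f"Refused: {err}")
```

Keeping the default at `False` means that code from a repository never runs unless the caller explicitly accepts it, so it is worth reviewing the repository before passing `trust_remote_code=True`.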

## Basic Usage Patterns

### Text-to-Image Generation

```python
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt).images[0]
image.save("output.png")
```

### Image-to-Image Generation

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

init_image = load_image("path/to/input.jpg").resize((768, 768))
image = pipe(prompt="modern art style", image=init_image).images[0]
```

### Controlled Generation with ControlNet

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# Load the ControlNet and attach it to the base pipeline
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny",
    torch_dtype=torch.float16
)
pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")

# The control image must already match the ControlNet type
# (for this checkpoint, a Canny edge map)
prompt = "your prompt"
control_image = load_image("path/to/control.jpg")

image = pipeline(prompt, image=control_image).images[0]
```

### Using Schedulers

Schedulers can be swapped for the same pipeline:

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)

# Replace default scheduler with DPM++ 2M Karras
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config,
    use_karras_sigmas=True,
    algorithm_type="dpmsolver++"
)
```

## Modular Pipelines

Introduced in v0.37.0, Modular Pipelines allow composing pipelines from reusable building blocks:

```mermaid
graph LR
    A[Transformer] --> B[ModularPipeline]
    C[VAE] --> B
    D[Scheduler] --> B
    E[Text Encoder] --> B
    F[Input] --> B
    B --> G[Output]
```

### Creating Modular Pipelines

Modular pipelines are defined with a `modular_model_index.json` that specifies component types and loading hints:

```python
# Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py
# Components can be loaded with or without type hints
if self.type_hint is None:
    component = AutoModel.from_pretrained(pretrained_model_name_or_path, **load_kwargs, **kwargs)
else:
    load_method = (
        getattr(self.type_hint, "from_single_file")
        if is_single_file
        else getattr(self.type_hint, "from_pretrained")
    )
    component = load_method(pretrained_model_name_or_path, **load_kwargs, **kwargs)
```

## Community Scripts

The community contributes additional pipeline implementations and utilities through community scripts:

| Example | Description | Code Example |
|---------|-------------|--------------|
| IP-Adapter Negative Noise | Using negative noise with IP-Adapter for better control | [Link](#ip-adapter-negative-noise) |
| Asymmetric Tiling | Configure seamless image tiling for X and Y axes independently | [Link](#Asymmetric-Tiling) |
| Prompt Scheduling Callback | Dynamic prompt modification during generation | [Link](#prompt-scheduling) |

Source: [examples/community/README_community_scripts.md](https://github.com/huggingface/diffusers/blob/main/examples/community/README_community_scripts.md)

### Using Community Scripts

```python
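# Load a community pipeline by passing its name to `custom_pipeline`
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",  # file name under examples/community
    use_safetensors=True
)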
```

> [!IMPORTANT]
> Community scripts are maintained by contributors. If a community script doesn't work as expected, please open an issue and ping the author.

## Training Scripts

Diffusers provides training scripts for various tasks:

| Script | Location | Use Case |
|--------|----------|----------|
| `train_unconditional.py` | `examples/unconditional_image_generation/` | Unconditional image generation |
| `train_controlnet.py` | `examples/controlnet/` | ControlNet training |
| `train_dreambooth.py` | `examples/dreambooth/` | DreamBooth personalization |
| `train_text_to_image_lora.py` | `examples/text_to_image/` | LoRA fine-tuning |

Source: [examples/README.md](https://github.com/huggingface/diffusers/blob/main/examples/README.md)

### ControlNet Training Example

```python
import torch
from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    DDPMScheduler,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)
from diffusers.optimization import get_scheduler

# Initialize models
controlnet = ControlNetModel.from_pretrained(
    "path/to/controlnet",
    torch_dtype=torch.float16
)

pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
```

Source: [examples/controlnet/train_controlnet.py](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/train_controlnet.py)

## Common Configuration Options

### Pipeline Loading Options

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `pretrained_model_name_or_path` | str | Required | Model identifier or local path |
| `torch_dtype` | torch.dtype | None | Data type for model weights |
| `variant` | str | None | Weight-file variant (e.g., "fp16", "non_ema") |
| `use_safetensors` | bool | None | Use SafeTensors format if available |
| `local_files_only` | bool | False | Only use local files |
| `force_download` | bool | False | Force download even if cached |
| `cache_dir` | str | None | Custom cache directory |
| `token` | str | None | Hugging Face API token |
| `revision` | str | None | Git revision |
| `trust_remote_code` | bool | False | Execute remote code |

### Device Placement

```python
# Move entire pipeline to device
pipeline = pipeline.to("cuda")

# Or move individual components
pipeline.unet = pipeline.unet.to("cuda")
pipeline.vae = pipeline.vae.to("cpu")  # Offload VAE to save memory
```

## Common Issues and Solutions

### Model Loading Failures

**Issue**: Models fail to load with config mismatch errors.

**Solution**: Check that model components are compatible. Use `use_safetensors=True` and verify the model card for requirements.

### Memory Optimization

**Issue**: Out of memory errors during inference.

**Solutions**:
```python
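# Option A: offload whole models to the CPU between uses
pipeline.enable_model_cpu_offload()

# Option B: offload at the submodule level instead (lower memory, slower).
# Use it in place of Option A, not together with it.
# pipeline.enable_sequential_cpu_offload()

# Compute attention in slices to lower peak memory
pipeline.enable_attention_slicing()

# Enable VAE tiling for large images
pipeline.enable_vae_tiling()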
```

### Custom Model Loading

**Issue**: Community request for universal model loading (see [Issue #13683](https://github.com/huggingface/diffusers/issues/13683)).

**Approach**: For custom models or GGUF files, check whether the model's class provides a `from_single_file` method. If it does not, consider using the base model class with an appropriate configuration.

```python
# Universal loading attempt pattern
from diffusers import AutoModel, UNet2DConditionModel

try:
    model = AutoModel.from_pretrained("path/to/model")
except (OSError, ValueError):
    # Fall back to single-file loading if the class supports it.
    # UNet2DConditionModel is only an example; use the class that matches your checkpoint.
    model = UNet2DConditionModel.from_single_file("path/to/checkpoint")
```

### Scheduler Compatibility

**Issue**: Scheduler mapping confusion between A1111 and Diffusers (see [Issue #4167](https://github.com/huggingface/diffusers/issues/4167)).

**Solution**: Use the scheduler mapping table to find equivalent schedulers. Karras variants have `use_karras_sigmas=True`.
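
The lookup described above can be written as a small table in code. This is a sketch under stated assumptions: the dictionary only restates rows from the scheduler table earlier in this guide, the helper name `lookup_scheduler` is hypothetical, and treating a trailing " Karras" as `use_karras_sigmas=True` follows the note above. Not every scheduler class accepts that option, so check the class before relying on it.

```python
# A1111 sampler name -> (Diffusers scheduler class name, default config overrides).
A1111_TO_DIFFUSERS = {
    "DDIM": ("DDIMScheduler", {}),
    "DPM++ 2M": ("DPMSolverMultistepScheduler", {}),
    "Euler": ("EulerDiscreteScheduler", {}),
    "Euler a": ("EulerAncestralDiscreteScheduler", {}),
    "UniPC": ("UniPCMultistepScheduler", {}),
}


def lookup_scheduler(sampler_name: str):
    """Resolve an A1111 sampler name to a (class name, overrides) pair."""
    overrides = {}
    base_name = sampler_name
    if sampler_name.endswith(" Karras"):
        # Karras variants reuse the base scheduler with Karras sigmas enabled.
        base_name = sampler_name[: -len(" Karras")]
        overrides["use_karras_sigmas"] = True
    if base_name not in A1111_TO_DIFFUSERS:
        raise KeyError(f"No known Diffusers equivalent for sampler {sampler_name!r}")
    class_name, defaults = A1111_TO_DIFFUSERS[base_name]
    return class_name, {**defaults, **overrides}


print(lookup_scheduler("DPM++ 2M Karras"))
```

The returned class name can then be resolved on the `diffusers` module and instantiated with `from_config(pipeline.scheduler.config, **overrides)`, as in the scheduler-swapping example earlier on this page.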

## See Also

- [Modular Pipelines](./modular_pipelines.md) - Composable pipeline architecture
- [Training Guide](./training/overview.md) - Fine-tuning diffusion models
- [Optimization](./optimization/fp16.md) - Memory and speed optimization
- [API Reference](./api/pipelines/overview.md) - Pipeline API documentation
- [Model Architecture](./model_doc/overview.md) - Underlying model architectures
- [Scheduler Reference](./api/schedulers/overview.md) - Available schedulers

---

<a id='system-architecture'></a>

## System Architecture

### Related Pages

Related topics: [Pipelines Overview](#pipelines-overview), [Loaders & Adapters](#loaders-adapters)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [PHILOSOPHY.md](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md)
- [src/diffusers/pipelines/pipeline_loading_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)
- [src/diffusers/pipelines/pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_utils.py)
- [src/diffusers/loaders/single_file.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py)
- [src/diffusers/models/auto_model.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/auto_model.py)
- [src/diffusers/modular_pipelines/modular_pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline_utils.py)
- [src/diffusers/utils/hub_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/utils/hub_utils.py)
- [src/diffusers/quantizers/gguf/gguf_quantizer.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/gguf/gguf_quantizer.py)
- [src/diffusers/schedulers/__init__.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/__init__.py)
- [src/diffusers/models/__init__.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/__init__.py)
</details>

# System Architecture

## Overview

The Hugging Face Diffusers library provides a modular, flexible architecture for diffusion-based generative models. The system is designed around composable building blocks that enable both inference and training across image, video, audio, and text generation tasks. The architecture emphasizes separation of concerns between models (the neural network weights), schedulers (the sampling algorithms), and pipelines (the orchestration layer that combines components).

Source: [PHILOSOPHY.md:1-50](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md)

## High-Level Architecture

The Diffusers library follows a layered architectural approach with three primary abstractions:

```mermaid
graph TD
    A[User Code] --> B[Pipeline Layer]
    B --> C[Model Layer]
    B --> D[Scheduler Layer]
    C --> E[Transformer/UNet]
    C --> F[VAE/Encoder-Decoder]
    C --> G[Text Encoder]
    D --> H[Scheduler Implementations]
    
    style B fill:#e1f5fe
    style C fill:#fff3e0
    style D fill:#e8f5e9
```

### Core Abstractions

| Layer | Purpose | Key Classes |
|-------|---------|-------------|
| **Pipeline** | Orchestration and end-to-end workflows | `DiffusionPipeline`, `StableDiffusionPipeline` |
| **Model** | Neural network architectures | `ModelMixin`, `ConfigMixin`, `AutoModel` |
| **Scheduler** | Diffusion sampling algorithms | `SchedulerMixin`, various scheduler implementations |

Source: [src/diffusers/pipelines/pipeline_utils.py:1-100](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_utils.py)

## Model Architecture

### Design Philosophy

Models in Diffusers are designed to expose complexity while providing clear error messages, following principles inspired by PyTorch's `Module` class. The architecture prioritizes modularity and extensibility, using smaller building blocks rather than monolithic model files.

Key principles from the project philosophy:

- Models make use of smaller building blocks such as `attention.py`, `resnet.py`, and `embeddings.py`
- Models do not follow the single-file policy used in Transformers
- All models inherit from `ModelMixin` and `ConfigMixin`
- Models should by default have the highest precision and lowest performance setting
- New model checkpoints should adapt existing architectures when possible

Source: [PHILOSOPHY.md:1-30](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md)

### ModelMixin and ConfigMixin

All Diffusers models inherit from two base classes:

```python
# From src/diffusers/models/modeling_utils.py (conceptual)
class ModelMixin:
    """Base class for all Diffusers models."""
    
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
        """Load a pretrained model."""
        pass
    
    def save_pretrained(self, save_directory):
        """Save a model to a directory."""
        pass

class ConfigMixin:
    """Base class for configuration classes."""
    
    @classmethod
    def from_config(cls, config, **kwargs):
        """Create a model from a configuration."""
        pass
    
    def save_config(self, save_directory):
        """Save configuration to a directory."""
        pass
```

These base classes provide consistent serialization and deserialization patterns across all model types.
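
To make the serialization pattern concrete, here is a toy sketch of a config round trip. It is an assumption-laden illustration, not the library's code: `ToyConfigMixin` and `ToyModel` are hypothetical classes, and the only detail borrowed from Diffusers is that the saved JSON records the class name under `_class_name`.

```python
import json
import tempfile
from pathlib import Path


class ToyConfigMixin:
    """Hypothetical stand-in for a config mixin: round-trips init arguments as JSON."""

    config_name = "config.json"

    def save_config(self, save_directory: str) -> None:
        path = Path(save_directory)
        path.mkdir(parents=True, exist_ok=True)
        # Record the class name next to the init arguments so a loader can pick the class.
        payload = {"_class_name": type(self).__name__, **self.config}
        (path / self.config_name).write_text(json.dumps(payload, indent=2))

    @classmethod
    def from_config(cls, config: dict, **overrides):
        # Drop bookkeeping keys such as "_class_name" before calling __init__.
        init_kwargs = {k: v for k, v in config.items() if not k.startswith("_")}
        init_kwargs.update(overrides)
        return cls(**init_kwargs)


class ToyModel(ToyConfigMixin):
    def __init__(self, in_channels: int = 4, hidden_size: int = 320):
        self.config = {"in_channels": in_channels, "hidden_size": hidden_size}


with tempfile.TemporaryDirectory() as tmp:
    ToyModel(in_channels=8).save_config(tmp)
    saved = json.loads((Path(tmp) / "config.json").read_text())

restored = ToyModel.from_config(saved)
print(saved["_class_name"], restored.config)
```

Storing the class name alongside the arguments is what lets a generic loader rebuild the right class from a directory without being told which class to use.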

### AutoModel System

The `AutoModel` system provides automatic model discovery and loading based on model configuration. It resolves model classes from configuration files and supports both Diffusers-native and Transformers models.

```python
# From src/diffusers/models/auto_model.py
class AutoModel:
    @classmethod
    def from_config(cls, config, **kwargs):
        # Determines the appropriate model class from config
        # Supports _class_name for Diffusers models
        # Supports model_type for Transformers models
        pass
    
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
        # Loads pretrained weights
        pass
```

The AutoModel system checks configuration for either `_class_name` (for Diffusers models) or `model_type` (for Transformers models) to determine the appropriate class to instantiate.

Source: [src/diffusers/models/auto_model.py:1-80](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/auto_model.py)
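
The class-resolution rule can be sketched as a standalone function. This is a simplified illustration of the two cases described above; the function name `resolve_model_class` is hypothetical, and raising `ValueError` when neither key is present is an assumption about how such a resolver would reasonably behave.

```python
def resolve_model_class(config: dict):
    """Return (library, class_name) for a config dict, following the two cases above."""
    if "_class_name" in config:
        # Diffusers-native configs name their class directly.
        return "diffusers", config["_class_name"]
    if "model_type" in config:
        # Transformers configs are delegated to that library's own AutoModel.
        return "transformers", "AutoModel"
    raise ValueError("Config has neither '_class_name' nor 'model_type'; cannot pick a model class.")


print(resolve_model_class({"_class_name": "UNet2DConditionModel"}))
print(resolve_model_class({"model_type": "clip_text_model"}))
```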

## Pipeline Architecture

### DiffusionPipeline

The `DiffusionPipeline` serves as the main entry point for inference. It orchestrates the loading and connection of multiple components:

```mermaid
graph LR
    A[Config/Index] --> B[DiffusionPipeline]
    B --> C[UNet2DConditionModel]
    B --> D[AutoencoderKL]
    B --> E[Text Encoder]
    B --> F[Tokenizer]
    B --> G[Scheduler]
```

The pipeline handles:
1. Component discovery from configuration files
2. Model loading with appropriate device placement
3. Scheduler integration and timestep management
4. End-to-end generation workflows

Source: [src/diffusers/pipelines/pipeline_utils.py:100-200](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_utils.py)

### Pipeline Loading Mechanisms

The library supports multiple model loading strategies:

| Loading Method | Use Case | Key Parameter |
|---------------|----------|---------------|
| `from_pretrained()` | Standard HuggingFace Hub models | `pretrained_model_name_or_path` |
| `from_single_file()` | Single checkpoint files (CKPT, Safetensors) | `pretrained_model_link_or_path` (pipelines) or `pretrained_model_link_or_path_or_dict` (models) |
| `AutoModel` | Auto-detection of model types | Configuration-based |

Source: [src/diffusers/pipelines/pipeline_loading_utils.py:1-80](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)

### Single File Loading

The `from_single_file` method enables loading models from single checkpoint files. This is particularly important for community models and custom checkpoints that may not follow the standard directory structure.

```python
# From src/diffusers/loaders/single_file.py
class FromOriginalModelMixin:
    @classmethod
    def from_single_file(
        cls,
        pretrained_model_link_or_path_or_dict,
        original_config=None,
        config=None,
        **kwargs
    ):
        """Load a model from a single checkpoint file."""
        pass
```

The single file loader:
- Detects model type from checkpoint structure
- Optionally applies original configuration files
- Supports GGUF quantized models

Source: [src/diffusers/loaders/single_file.py:1-100](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py)

### Model Type Detection

When loading models, Diffusers determines the appropriate loading strategy:

```python
# From src/diffusers/pipelines/pipeline_loading_utils.py
is_transformers_model = (
    is_transformers_available()
    and issubclass(class_obj, PreTrainedModel)
    and transformers_version >= version.parse("4.20.0")
)

is_diffusers_model = issubclass(class_obj, diffusers_module.ModelMixin)
is_diffusers_single_file_model = issubclass(class_obj, diffusers_module.FromOriginalModelMixin)
```

This detection determines whether to use Transformers-style loading, Diffusers-native loading, or single-file loading.

Source: [src/diffusers/pipelines/pipeline_loading_utils.py:20-50](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)
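
The `issubclass` checks above can be tried in isolation with stub classes. In this sketch every class is a local, hypothetical stand-in (the `Stub...` names replace the real `ModelMixin`, `FromOriginalModelMixin`, and `PreTrainedModel`), and the Transformers version check from the original snippet is omitted for brevity.

```python
class StubModelMixin:
    """Stand-in for the Diffusers model base class."""


class StubFromOriginalModelMixin:
    """Stand-in for the single-file loading mixin."""


class StubPreTrainedModel:
    """Stand-in for the Transformers model base class."""


class ToyUNet(StubModelMixin, StubFromOriginalModelMixin):
    pass


class ToyTextEncoder(StubPreTrainedModel):
    pass


def describe_loading_options(class_obj) -> dict:
    """Report which loading styles a component class supports, based on its base classes."""
    return {
        "transformers": issubclass(class_obj, StubPreTrainedModel),
        "diffusers": issubclass(class_obj, StubModelMixin),
        "single_file": issubclass(class_obj, StubFromOriginalModelMixin),
    }


print(describe_loading_options(ToyUNet))
print(describe_loading_options(ToyTextEncoder))
```

Because the flags are independent, a single class can qualify for more than one loading style, as `ToyUNet` does here.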

## Modular Diffusers

Introduced in Diffusers 0.37.0, Modular Diffusers provides a new way to build pipelines by composing reusable blocks. Instead of writing entire pipelines from scratch, developers can mix and match building blocks to create custom workflows.

Source: [Diffusers 0.37.0 Release Notes](https://github.com/huggingface/diffusers/releases/tag/v0.37.0)

### ModularPipeline Components

```mermaid
graph TD
    A[ModularPipeline] --> B[Transformer2DModel]
    A --> C[VAE]
    A --> D[TextEncoder]
    A --> E[Scheduler]
    B --> F[Attention]
    B --> G[ResNet]
    F --> H[Embeddings]
```

The modular system uses type hints to determine the correct loading method for each component:

```python
# From src/diffusers/modular_pipelines/modular_pipeline_utils.py
load_method = (
    getattr(self.type_hint, "from_single_file")
    if is_single_file
    else getattr(self.type_hint, "from_pretrained")
)
```

Source: [src/diffusers/modular_pipelines/modular_pipeline_utils.py:1-80](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline_utils.py)
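
The `getattr`-based dispatch can be exercised with a toy component. This is a sketch, not the library's loader: `ToyComponent` and `load_component` are hypothetical names, and the explicit error for a missing method is an added assumption.

```python
class ToyComponent:
    """Hypothetical component class exposing both loading entry points."""

    def __init__(self, source: str, loaded_via: str):
        self.source = source
        self.loaded_via = loaded_via

    @classmethod
    def from_pretrained(cls, path: str, **kwargs):
        return cls(path, "from_pretrained")

    @classmethod
    def from_single_file(cls, path: str, **kwargs):
        return cls(path, "from_single_file")


def load_component(type_hint, path: str, is_single_file: bool = False, **kwargs):
    """Pick the loading method by name from the hinted class, then call it."""
    method_name = "from_single_file" if is_single_file else "from_pretrained"
    load_method = getattr(type_hint, method_name, None)
    if load_method is None:
        raise AttributeError(f"{type_hint.__name__} does not support {method_name}")
    return load_method(path, **kwargs)


component = load_component(ToyComponent, "path/to/checkpoint.safetensors", is_single_file=True)
print(component.loaded_via)
```

Looking the method up by name keeps the loader independent of any particular model class: any type hint that implements the requested entry point works.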

## Scheduler System

### SchedulerMixin Base Class

All schedulers inherit from `SchedulerMixin`, which provides a common interface for:

- Setting timesteps
- Scaling model inputs
- Computing denoised images
- Stepping through the diffusion process

The scheduler system implements various diffusion sampling algorithms including:

| Scheduler | Description | Use Case |
|-----------|-------------|----------|
| DDPMScheduler | Denoising Diffusion Probabilistic Models | Training and sampling |
| DDIMScheduler | Denoising Diffusion Implicit Models | Fast sampling |
| PNDMScheduler | Pseudo Numerical Methods | Balanced speed/quality |
| LMSDiscreteScheduler | Linear Multistep Scheduler | Alternative timestepping |
| EulerDiscreteScheduler | Euler method | Simple, fast |
| EulerAncestralDiscreteScheduler | Euler with ancestral sampling | Diverse outputs |
| KarrasDiffusionSchedulers | Enum listing scheduler classes that are interchangeable with one another (not a scheduler itself) | Compatibility checks |

Source: [src/diffusers/schedulers/__init__.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/__init__.py)

### Scheduler-Pipeline Coupling

Schedulers are loosely coupled with pipelines, allowing users to swap schedulers to experiment with different sampling strategies:

```python
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
```
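
Swapping works because a scheduler's configuration is plain data that another scheduler class can be built from. The toy sketch below illustrates the idea under stated assumptions: `ToyScheduler` and `ToyKarrasScheduler` are hypothetical classes, and filtering the config down to the arguments the target class accepts is a simplification of what a real `from_config` does.

```python
import inspect


class ToyScheduler:
    """Hypothetical scheduler that keeps its init arguments in `config`."""

    def __init__(self, num_train_timesteps: int = 1000, beta_start: float = 0.0001):
        self.config = {"num_train_timesteps": num_train_timesteps, "beta_start": beta_start}

    @classmethod
    def from_config(cls, config: dict, **overrides):
        # Keep only the keys this class's __init__ accepts, then apply explicit overrides.
        accepted = set(inspect.signature(cls.__init__).parameters) - {"self"}
        merged = {**config, **overrides}
        return cls(**{k: v for k, v in merged.items() if k in accepted})


class ToyKarrasScheduler(ToyScheduler):
    def __init__(self, num_train_timesteps: int = 1000, beta_start: float = 0.0001,
                 use_karras_sigmas: bool = False):
        super().__init__(num_train_timesteps, beta_start)
        self.config["use_karras_sigmas"] = use_karras_sigmas


old = ToyScheduler(num_train_timesteps=500)
new = ToyKarrasScheduler.from_config(old.config, use_karras_sigmas=True)
back = ToyScheduler.from_config(new.config)  # the unknown key is dropped on the way back
print(new.config, back.config)
```

Shared settings such as the number of training timesteps carry over, while options specific to one class are either supplied as overrides or ignored.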

## Quantization Support

### GGUF Quantization

Diffusers supports loading GGUF-quantized models through the `GGUFQuantizer` class. This enables efficient inference on reduced precision models.

```python
# From src/diffusers/quantizers/gguf/gguf_quantizer.py
class GGUFQuantizer(DiffusersQuantizer):
    use_keep_in_fp32_modules = True
    
    def __init__(self, quantization_config, **kwargs):
        self.compute_dtype = quantization_config.compute_dtype
        self.pre_quantized = quantization_config.pre_quantized
        self.modules_to_not_convert = quantization_config.modules_to_not_convert or []
```

The GGUF quantizer:
- Supports pre-quantized models from community repositories
- Maintains FP32 precision for sensitive modules
- Requires `accelerate>=0.26.0`

Source: [src/diffusers/quantizers/gguf/gguf_quantizer.py:1-60](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/gguf/gguf_quantizer.py)

## Model Loading Flow

```mermaid
sequenceDiagram
    participant User
    participant Pipeline
    participant AutoModel
    participant HubUtils
    participant Model
    
    User->>Pipeline: from_pretrained(model_id)
    Pipeline->>HubUtils: hf_hub_download(config.json)
    HubUtils-->>Pipeline: config
    Pipeline->>AutoModel: from_config(config)
    AutoModel->>AutoModel: detect_model_type(config)
    AutoModel->>HubUtils: hf_hub_download(weights)
    HubUtils-->>AutoModel: weights
    AutoModel->>Model: __init__() + load_state_dict()
    Model-->>AutoModel: model
    AutoModel-->>Pipeline: component
```

The loading process follows these steps:

1. **Configuration Loading**: Download and parse the pipeline's `model_index.json` and each component's `config.json` from the Hub
2. **Model Type Detection**: Determine if model is Diffusers-native, Transformers, or single-file
3. **Weight Download**: Fetch model weights from the appropriate source
4. **Model Instantiation**: Create model with empty weights, then load state dict
5. **Device Placement**: Move model to appropriate device (CPU/CUDA)

Source: [src/diffusers/utils/hub_utils.py:1-100](https://github.com/huggingface/diffusers/blob/main/src/diffusers/utils/hub_utils.py)

## Common Component Patterns

### Model Components Table

| Component | File | Purpose |
|-----------|------|---------|
| Attention | `attention.py` | Self-attention and cross-attention mechanisms |
| ResNet | `resnet.py` | Residual connections for deep networks |
| Embeddings | `embeddings.py` | Timestep and text embeddings |
| UNet | `unet_2d_blocks.py` | U-Net architecture for image generation |
| VAE | `vae.py` | Variational Autoencoder for latent spaces |

Source: [PHILOSOPHY.md:5-15](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md)

### Lazy Import System

Diffusers uses lazy imports to minimize startup time and reduce memory footprint:

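```python
# Illustrative sketch only, not the library's code: Diffusers implements lazy
# imports with a `_LazyModule` wrapper in its package `__init__.py` files.
# The same idea, expressed with a module-level `__getattr__` (PEP 562):
import importlib

_LAZY_ATTRS = {"StableDiffusionPipeline": ".pipelines"}  # attribute -> submodule

def __getattr__(name):
    if name in _LAZY_ATTRS:
        # Import the submodule only when the attribute is first accessed
        module = importlib.import_module(_LAZY_ATTRS[name], __package__)
        return getattr(module, name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```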

## Configuration Options

### Common Pipeline Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `pretrained_model_name_or_path` | str | Required | Model identifier or local path |
| `torch_dtype` | torch.dtype | None | Data type for model weights |
| `variant` | str | None | Weight file variant (e.g., 'fp16', 'non_ema') |
| `use_safetensors` | bool | None | Use safetensors format |
| `local_files_only` | bool | False | Only use local files |
| `revision` | str | None | Git revision for Hub models |

### Model Loading Configuration

| File | Purpose | Location |
|------|---------|----------|
| `config.json` | Model architecture | Component folder (Hugging Face Hub or local) |
| `model_index.json` | Pipeline component mapping | Pipeline root |
| `config.yaml` | Additional metadata | Optional |
| `diffusion_pytorch_model.safetensors` (or legacy `.bin`) | Model weights | Component folder (primary weight file) |
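
To make the file layout concrete, the sketch below looks for a component's weight file in a local folder, preferring safetensors over the legacy `.bin` format and honoring an optional variant such as `fp16`. The helper `find_weight_file` is hypothetical and deliberately ignores sharded checkpoints; the library's own resolution logic handles more cases.

```python
import tempfile
from pathlib import Path

WEIGHTS_STEM = "diffusion_pytorch_model"


def find_weight_file(component_dir, variant=None):
    """Return the first matching weight file in `component_dir`, or None."""
    component_dir = Path(component_dir)
    # Variant files are named like `diffusion_pytorch_model.fp16.safetensors`
    stem = f"{WEIGHTS_STEM}.{variant}" if variant else WEIGHTS_STEM
    for suffix in (".safetensors", ".bin"):  # prefer safetensors
        candidate = component_dir / f"{stem}{suffix}"
        if candidate.is_file():
            return candidate
    return None


# Demonstration on a throwaway directory with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "diffusion_pytorch_model.fp16.safetensors").touch()
    (Path(tmp) / "diffusion_pytorch_model.bin").touch()
    fp16_name = find_weight_file(tmp, variant="fp16").name
    default_name = find_weight_file(tmp).name
    missing = find_weight_file(tmp, variant="non_ema")
```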

## Common Failure Modes

Based on community issues and documentation, users frequently encounter these architectural challenges:

### 1. Model Type Mismatch

**Issue**: Loading custom models fails with config mismatch errors.

**Cause**: The configuration file doesn't match expected structure.

**Solution**: Use `from_single_file()` with explicit configuration or provide a custom config.

Source: [Community Issue #13683](https://github.com/huggingface/diffusers/issues/13683)

### 2. Scheduler Compatibility

**Issue**: Swapping schedulers produces unexpected results.

**Cause**: Not all schedulers are compatible with all pipelines.

**Solution**: Use schedulers designed for the same discretization approach.

Source: [Community Issue #4167](https://github.com/huggingface/diffusers/issues/4167)

### 3. ModularPipeline Type Hints

**Issue**: `AutoModel` type hints in `modular_model_index.json` cause loading failures.

**Cause**: Type hint resolution fails for generic AutoModel classes.

**Solution**: Use specific model classes or provide explicit type hints.

Source: [Diffusers 0.37.1 Release Notes](https://github.com/huggingface/diffusers/releases/tag/v0.37.1)

### 4. Transformer/GGUF Version Requirements

**Issue**: GGUF loading fails with version compatibility errors.

**Cause**: Missing or incompatible `accelerate` version.

**Solution**: Ensure `accelerate>=0.26.0` is installed.

Source: [src/diffusers/quantizers/gguf/gguf_quantizer.py:20-30](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/gguf/gguf_quantizer.py)
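
A quick way to rule out this failure mode is to check the installed version before loading. The sketch below uses only the standard library; `parse_version`, `meets_minimum`, and `check_package` are hypothetical helpers, and the version parsing is simplified (pre-release suffixes such as `rc1` are ignored).

```python
from importlib import metadata


def parse_version(version):
    """Return the leading numeric segments, e.g. '0.27.0.dev0' -> (0, 27, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '0.26' and '0.26.0' compare as equal
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum` (numeric segments only)."""
    return parse_version(installed) >= parse_version(minimum)


def check_package(name, minimum):
    """Return (ok, installed_version_or_None) for an installed distribution."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return False, None
    return meets_minimum(installed, minimum), installed


ok, installed = check_package("accelerate", "0.26.0")
print(f"accelerate: installed={installed}, satisfies >=0.26.0: {ok}")
```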

## Extension Points

### Adding Custom Models

To integrate new model checkpoints:

1. Create or adapt an existing model architecture
2. Implement `ModelMixin` and `ConfigMixin` interfaces
3. Add configuration handling for the new checkpoint format
4. Register the model in `src/diffusers/models/__init__.py`

Source: [PHILOSOPHY.md:40-50](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md)

### Custom Pipelines

For fundamentally different architectures, create a new pipeline class:

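1. Inherit from `DiffusionPipeline`
2. Register the components in `__init__` with `self.register_modules(...)`
3. Implement the `__call__` method for generation
4. Rely on the registered modules for configuration handling, so that `save_pretrained()` and `from_pretrained()` can round-trip the pipeline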
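
The steps above can be sketched as a minimal unconditional pipeline. The class name `MyUnconditionalPipeline` and the factory `build_custom_pipeline_class` are illustrative assumptions; the factory only defers the `torch` and `diffusers` imports until it is called. The components are assumed to behave like `UNet2DModel` and a standard scheduler.

```python
def build_custom_pipeline_class():
    """Return a minimal custom pipeline class (imports deferred until called)."""
    import torch
    from diffusers import DiffusionPipeline

    class MyUnconditionalPipeline(DiffusionPipeline):
        def __init__(self, unet, scheduler):
            super().__init__()
            # Registering modules makes them attributes and records them
            # in the pipeline configuration.
            self.register_modules(unet=unet, scheduler=scheduler)

        @torch.no_grad()
        def __call__(self, batch_size=1, num_inference_steps=50):
            size = self.unet.config.sample_size
            # Start from Gaussian noise in the shape the UNet expects
            image = torch.randn(
                (batch_size, self.unet.config.in_channels, size, size),
                device=self.device,
            )
            self.scheduler.set_timesteps(num_inference_steps)
            for t in self.scheduler.timesteps:
                model_output = self.unet(image, t).sample
                image = self.scheduler.step(model_output, t, image).prev_sample
            return image

    return MyUnconditionalPipeline
```

Calling `build_custom_pipeline_class()` returns the class, which can then be instantiated with an already loaded UNet and scheduler.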

## Best Practices

### Performance Optimization

1. **Use `torch_dtype=torch.float16`** for faster inference on compatible hardware
2. **Enable `use_safetensors=True`** for faster model loading
3. **Use `variant='fp16'`** when available to download pre-converted weights
4. **Enable attention slicing** for reduced memory usage

### Model Selection

| Use Case | Recommended Approach |
|----------|----------------------|
| Standard models | `DiffusionPipeline.from_pretrained()` |
| Community models | `from_single_file()` |
| Custom architectures | `AutoModel.from_config()` |
| Quantized models | GGUF quantizer |

## See Also

- [Loading Diffusers Models](https://huggingface.co/docs/diffusers/using-diffusers/loading)
- [Pipelines Overview](https://huggingface.co/docs/diffusers/api/pipelines/overview)
- [Schedulers Reference](https://huggingface.co/docs/diffusers/api/schedulers/overview)
- [Modular Pipelines Guide](https://huggingface.co/docs/diffusers/en/api/modular_pipelines)
- [Training Diffusers Models](https://huggingface.co/docs/diffusers/training/overview)
- [Optimization Guide](https://huggingface.co/docs/diffusers/optimization/fp16)

---

<a id='pipelines-overview'></a>

## Pipelines Overview

### Related Pages

Related topics: [Modular Diffusers](#modular-pipelines), [System Architecture](#system-architecture)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/diffusers/pipelines/__init__.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/__init__.py)
- [src/diffusers/pipelines/auto_pipeline.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/auto_pipeline.py)
- [src/diffusers/pipelines/pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_utils.py)
- [src/diffusers/pipelines/pipeline_loading_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)
- [src/diffusers/pipelines/README.md](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/README.md)
- [src/diffusers/modular_pipelines/modular_pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline_utils.py)
- [src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py)
- [examples/community/README.md](https://github.com/huggingface/diffusers/blob/main/examples/community/README.md)
</details>

# Pipelines Overview

## Introduction

Pipelines are the primary high-level API in Diffusers for running diffusion models for inference. They provide a unified interface that orchestrates multiple components—including models, schedulers, tokenizers, and processors—to generate outputs from pretrained checkpoints. Pipelines abstract away the complexity of the diffusion process, allowing users to perform inference with just a few lines of code.

The Diffusers library ships with pipelines for diverse generation tasks including text-to-image, image-to-image, inpainting, video generation, audio generation, and text generation. Each pipeline is designed to be modular, allowing components to be swapped or customized as needed.

Source: [src/diffusers/pipelines/README.md](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/README.md)

## Architecture

### Core Components

A pipeline typically consists of several interconnected components that work together during the diffusion process:

```mermaid
graph TD
    A[Pipeline] --> B[UNet / Transformer]
    A --> C[Scheduler]
    A --> D[VAE / Encoder-Decoder]
    A --> E[Text Encoder / Tokenizer]
    A --> F[Safety Checker]
    
    B --> C
    C --> B
    
    G[Input] --> A
    A --> H[Output]
    
    G --> E
    E --> B
```

| Component | Purpose | Common Classes |
|-----------|---------|----------------|
| **UNet/Transformer** | Core denoising network that predicts noise in the latent space | `UNet2DConditionModel`, `FluxTransformer2DModel` |
| **Scheduler** | Controls the diffusion timestep schedule and noise addition/removal | `DDPMScheduler`, `DDIMScheduler`, `DPMSolverMultistepScheduler` |
| **VAE** | Encodes images to latent space and decodes latents back to images | `AutoencoderKL`, `AutoencoderTiny` |
| **Text Encoder** | Converts text prompts into embeddings understood by the model | `CLIPTextModel`, `T5EncoderModel` |
| **Safety Checker** | Filters potentially unsafe outputs | `StableDiffusionSafetyChecker` |

Source: [src/diffusers/pipelines/pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_utils.py)

### Pipeline Class Hierarchy

Diffusers uses a mixin-based architecture for pipelines, allowing for flexible composition of functionality:

```mermaid
graph TD
    A[DiffusionPipeline<br/>Base Class] --> E[StableDiffusionPipeline]
    A --> F[StableDiffusionImg2ImgPipeline]
    A --> G[StableDiffusionInpaintPipeline]

    B[StableDiffusionMixin] --> E
    B --> F
    B --> G
```

All pipelines inherit from `DiffusionPipeline`, which provides core functionality such as `from_pretrained()` and `save_pretrained()` methods.

Source: [src/diffusers/pipelines/pipeline_utils.py:90-139](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_utils.py#L90-L139)

## Loading Pipelines

### Standard Loading with `from_pretrained`

The primary method for loading a pipeline is through the `from_pretrained()` class method. This method accepts either a Hugging Face Hub repository ID or a local directory path.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load from Hugging Face Hub
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)

# Load from local directory
pipeline = StableDiffusionPipeline.from_pretrained(
    "./local/stable-diffusion-v1-5"
)
```

The method requires a `model_index.json` file in the repository or directory, which defines all components that should be loaded. Each component is specified in the format `<name>: ["<library>", "<class_name>"]`.

Source: [src/diffusers/pipelines/README.md](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/README.md)
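
To illustrate the `model_index.json` format described above, the following sketch parses such a file and lists its component entries. The helper `list_pipeline_components` and the sample contents are assumptions for illustration; a real file may contain more entries.

```python
import json


def list_pipeline_components(model_index_text):
    """Return {component_name: (library, class_name)} from model_index.json text."""
    index = json.loads(model_index_text)
    components = {}
    for name, value in index.items():
        # Metadata keys such as `_class_name` start with an underscore
        if name.startswith("_"):
            continue
        # Component entries are two-element lists: [library, class_name]
        if isinstance(value, list) and len(value) == 2:
            components[name] = tuple(value)
    return components


# Hypothetical, shortened example of a Stable Diffusion model_index.json
example = json.dumps({
    "_class_name": "StableDiffusionPipeline",
    "_diffusers_version": "0.30.0",
    "unet": ["diffusers", "UNet2DConditionModel"],
    "scheduler": ["diffusers", "PNDMScheduler"],
    "text_encoder": ["transformers", "CLIPTextModel"],
})
components = list_pipeline_components(example)
for name, (library, class_name) in components.items():
    print(f"{name}: {library}.{class_name}")
```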

### AutoPipeline

The `AutoPipelineFor*` classes (`AutoPipelineForText2Image`, `AutoPipelineForImage2Image`, and `AutoPipelineForInpainting`) select the appropriate pipeline class for a task based on the checkpoint's configuration. This partly addresses the community request for a "universal method to load any model" raised in issue #13683.

```python
import torch
from diffusers import AutoPipelineForText2Image

# The concrete pipeline class is chosen from the checkpoint's model_index.json
pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
```

Each auto class keeps a mapping of supported pipeline classes and uses the `_class_name` entry in `model_index.json` to pick the matching pipeline for its task.

Source: [src/diffusers/pipelines/auto_pipeline.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/auto_pipeline.py)

### Model Loading Internals

When loading a model, Diffusers follows a specific sequence to determine the appropriate loading mechanism:

```mermaid
graph TD
    A[from_pretrained called] --> B{Is Transformers model?}
    B -->|Yes| C[Use PreTrainedModel.from_pretrained]
    B -->|No| D{Is Diffusers model?}
    D -->|Yes| E[Load config, create empty model<br/>with init_empty_weights, then load]
    D -->|No| F[Try AutoModel]
    
    C --> G[Return model]
    E --> G
    F --> G
```

For Diffusers models, the library first loads the configuration, creates an empty model on the meta device (via `init_empty_weights`), and then loads the weights into it. For Transformers models, it delegates to the Transformers library's loading mechanism.

Source: [src/diffusers/pipelines/pipeline_loading_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)
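
Because each component is recorded as a library name plus a class name, a loader can turn that pair into a class object with a dynamic import before deciding how to load it. The sketch below shows the idea with a hypothetical helper, `resolve_component_class`. It uses a standard-library pair as a stand-in so that it runs without Diffusers installed; a pair such as `("diffusers", "UNet2DConditionModel")` would resolve the same way.

```python
import importlib


def resolve_component_class(library_name, class_name):
    """Import `library_name` and return its attribute `class_name`."""
    module = importlib.import_module(library_name)
    return getattr(module, class_name)


# Stand-in pair from the standard library, used only for demonstration
decoder_cls = resolve_component_class("json", "JSONDecoder")
print(decoder_cls.__name__)
```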

### Loading Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `pretrained_model_name_or_path` | `str` or `Path` | Required | Model identifier or local path |
| `torch_dtype` | `torch.dtype` | `None` | Data type for model weights |
| `variant` | `str` | `None` | Weight file variant (e.g., `fp16`, `non_ema`) |
| `use_safetensors` | `bool` | `None` | Prefer safetensors format |
| `cache_dir` | `str` | `None` | Custom cache directory |
| `local_files_only` | `bool` | `False` | Only use local files |
| `force_download` | `bool` | `False` | Force re-download |

Source: [src/diffusers/pipelines/pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_utils.py)

## Custom Pipelines

### Loading Custom Pipelines

Diffusers supports loading custom pipelines through the `custom_pipeline` parameter. This allows users to extend the library with community-contributed or self-developed pipeline implementations.

```python
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="hf-internal-testing/diffusers-dummy-pipeline",
    trust_remote_code=True
)
```

Custom pipelines can be loaded from:
- **Hugging Face Hub**: A repository ID containing a `pipeline.py` file
- **GitHub**: A community pipeline script name (loaded from `examples/community/`)
- **Local directory**: A directory containing a `pipeline.py` file

Source: [src/diffusers/pipelines/pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_utils.py)

### Community Pipelines

Community pipelines are hosted in the `examples/community/` directory and provide extended functionality not available in core pipelines. These include ControlNet integrations, IP-Adapter implementations, and specialized generation techniques.

Community pipelines are loaded by specifying the pipeline script name (without the `.py` extension) as the `custom_pipeline` argument:

```python
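from diffusers import DiffusionPipeline

# Note: some community pipelines expect extra components as keyword arguments;
# this one, for example, also takes `clip_model` and `feature_extractor`.
pipeline = DiffusionPipeline.from_pretrained(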
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="clip_guided_stable_diffusion"
)
```

Source: [examples/community/README.md](https://github.com/huggingface/diffusers/blob/main/examples/community/README.md)

## Modular Pipelines

Introduced in Diffusers v0.37.0, Modular Pipelines provide a compositional approach to building diffusion pipelines. Instead of monolithic pipeline classes, Modular Pipelines assemble reusable building blocks defined in `modular_model_index.json` files.

### How Modular Pipelines Work

```mermaid
graph LR
    A[modular_model_index.json] --> B[ModularPipeline]
    B --> C[Transformer Block 1]
    B --> D[Transformer Block 2]
    B --> E[Scheduler Component]
    B --> F[VAE Component]
    
    C --> G[Attention Module]
    D --> G
    G --> H[Model Output]
```

The `ModularPipeline` class uses `type_hint` annotations to determine the correct model class for each component, allowing flexible composition of different architectures.

Source: [src/diffusers/modular_pipelines/modular_pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline_utils.py)

### Loading Modular Pipelines

```python
import torch
from diffusers import ModularPipeline

pipeline = ModularPipeline.from_pretrained(
    "path/to/modular/model",
    torch_dtype=torch.float16
)
```

When loading, the pipeline:
1. Reads `modular_model_index.json` to identify components
2. Resolves `type_hint` annotations to determine model classes
3. Loads each component using appropriate `from_pretrained` or `from_single_file` methods

Source: [src/diffusers/modular_pipelines/modular_pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline_utils.py)

## Pipeline Execution Flow

### Standard Inference Flow

```mermaid
sequenceDiagram
    participant User
    participant Pipeline
    participant Scheduler
    participant UNet
    participant VAE
    
    User->>Pipeline: __call__(prompt)
    Pipeline->>Pipeline: Encode prompt with tokenizer & text encoder
    Pipeline->>Scheduler: Set timesteps
    loop Denoising loop
        Pipeline->>UNet: forward(latent, timestep, encoder_hidden_states)
        UNet-->>Pipeline: noise_pred
        Pipeline->>Scheduler: step(noise_pred, timestep, latent)
        Scheduler-->>Pipeline: denoised_latent
    end
    Pipeline->>VAE: decode(denoised_latent)
    VAE-->>Pipeline: decoded_image
    Pipeline->>Pipeline: Safety check
    Pipeline-->>User: Image
```
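
The denoising loop in the diagram can be sketched as a plain function over already loaded components. This is a simplified illustration, not the pipeline's actual implementation: it omits classifier-free guidance, the safety check, and image post-processing. It assumes that `unet`, `scheduler`, and `vae` follow the interfaces of `UNet2DConditionModel`, a Diffusers scheduler, and `AutoencoderKL`. The function name `run_denoising_loop` is hypothetical.

```python
def run_denoising_loop(unet, scheduler, vae, latents, prompt_embeds,
                       num_inference_steps=50):
    """Denoise `latents` step by step and decode the result with the VAE."""
    scheduler.set_timesteps(num_inference_steps)
    # Scale the initial noise to the range the scheduler expects
    latents = latents * scheduler.init_noise_sigma

    for t in scheduler.timesteps:
        # Some schedulers rescale the model input at each timestep
        model_input = scheduler.scale_model_input(latents, t)
        # Predict the noise residual, conditioned on the prompt embeddings
        noise_pred = unet(
            model_input, t, encoder_hidden_states=prompt_embeds
        ).sample
        # Compute the previous (less noisy) latent
        latents = scheduler.step(noise_pred, t, latents).prev_sample

    # Undo the latent scaling before decoding back to image space
    return vae.decode(latents / vae.config.scaling_factor).sample
```

In real use the call would normally run under `torch.no_grad()`, with `latents` sampled from a standard normal distribution in the UNet's latent shape.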

### Example: Text-to-Image Generation

```python
from diffusers import StableDiffusionPipeline
import torch

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
pipeline.to("cuda")

image = pipeline(
    prompt="a photo of an astronaut riding a horse on mars",
    num_inference_steps=50,
    guidance_scale=7.5
).images[0]
```

Source: [src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py)

## Scheduler Integration

Schedulers define the noise schedule and control how the diffusion process progresses from noise to sample. Different schedulers offer trade-offs between speed and quality:

| Scheduler | Speed | Quality | Notes |
|-----------|-------|---------|-------|
| `DDIMScheduler` | Fast | High | Good for few-step generation |
| `DDPMScheduler` | Slow | Very High | Best quality, many steps |
| `DPMSolverMultistepScheduler` | Medium | High | Fast convergence |
| `EulerDiscreteScheduler` | Variable | High | Configurable |
| `UniPCMultistepScheduler` | Fast | High | Few steps needed |

### Switching Schedulers

```python
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
)

# Replace the default scheduler
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config
)
```

For A1111/K-Diffusion to Diffusers scheduler mapping, refer to issue #4167 which documents the correspondence between common scheduler configurations.

Source: [src/diffusers/pipelines/pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_utils.py)

## Advanced Usage

### Single-File Model Loading

Some custom models or quantized models (including GGUF files) are distributed as single checkpoint files. Diffusers provides `from_single_file` methods for loading these:

```python
import torch
from diffusers import UNet2DConditionModel

model = UNet2DConditionModel.from_single_file(
    "https://example.com/model.safetensors",
    torch_dtype=torch.float16
)
```

GGUF checkpoints are also loaded through `from_single_file`, with a `GGUFQuantizationConfig` passed as `quantization_config` so that the GGUF quantizer handles the quantized weights.

Source: [src/diffusers/pipelines/pipeline_loading_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)

### Memory Optimization

For inference on limited-memory hardware, several optimization strategies are available:

```python
import torch
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True
)

# Enable attention slicing for lower memory usage
pipeline.enable_attention_slicing()

# Enable sequential CPU offloading
pipeline.enable_sequential_cpu_offload()

# Use xformers memory-efficient attention (requires the optional `xformers` package)
pipeline.enable_xformers_memory_efficient_attention()
```

Source: [src/diffusers/pipelines/pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_utils.py)

## Common Failure Modes and Troubleshooting

### Config Mismatch Issues

When loading custom models or third-party checkpoints, config mismatches are common. This is particularly relevant for community requests around universal model loading (issue #13683).

**Symptoms:**
- `ValueError` during model initialization
- Missing keys when loading state dict
- Type mismatch errors

**Solutions:**
1. Use `type_hint` parameter in modular pipelines to specify expected model class
2. Provide custom configuration files alongside checkpoint files
3. Use `ignore_mismatched_sizes=True` where applicable

### Trust Remote Code

Custom pipelines whose code is downloaded from a Hub repository require `trust_remote_code=True`, because loading them executes that code locally:

```python
pipeline = DiffusionPipeline.from_pretrained(
    "owner/custom-pipeline",
    custom_pipeline="pipeline_name",
    trust_remote_code=True
)
```

Without this flag, loading pipelines with custom code will raise a `ValueError`.

Source: [src/diffusers/pipelines/pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_utils.py)

### Flux Klein Configuration

Recent releases (v0.37.0+) have addressed specific issues with Flux Klein model loading, including proper handling of distilled and non-distilled versions. Users should ensure they are using the correct configuration variant when loading these models.

Source: [Diffusers v0.37.1 Release Notes](https://github.com/huggingface/diffusers/releases/tag/v0.37.1)

## See Also

- [Loading Pre-trained Models](https://huggingface.co/docs/diffusers/using-diffusers/loading) - Detailed guide on model loading
- [DiffusionPipeline Class Reference](https://huggingface.co/docs/diffusers/main/en/api/pipelines/overview) - API documentation
- [Custom Pipelines](https://huggingface.co/docs/diffusers/using-diffusers/custom_pipeline_overview) - Creating and loading custom pipelines
- [Modular Diffusers](https://github.com/huggingface/diffusers/releases/tag/v0.37.0) - Modular pipeline documentation
- [Schedulers](https://huggingface.co/docs/diffusers/main/en/api/schedulers/overview) - Scheduler configuration guide
- [Optimization](https://huggingface.co/docs/diffusers/optimization/fp16) - Memory and speed optimization
- [Training](https://huggingface.co/docs/diffusers/training/overview) - Training pipelines and techniques

---

<a id='modular-pipelines'></a>

## Modular Diffusers

### Related Pages

Related topics: [Pipelines Overview](#pipelines-overview), [Training Guide](#training-guide)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/diffusers/modular_pipelines/modular_pipeline.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline.py)
- [src/diffusers/modular_pipelines/components_manager.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/components_manager.py)
- [src/diffusers/modular_pipelines/modular_pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline_utils.py)
- [src/diffusers/modular_pipelines/flux/modular_pipeline.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/flux/modular_pipeline.py)
- [src/diffusers/pipelines/pipeline_loading_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)
- [src/diffusers/models/auto_model.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/auto_model.py)
- [src/diffusers/loaders/single_file.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py)
- [src/diffusers/quantizers/gguf/gguf_quantizer.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/gguf/gguf_quantizer.py)
- [examples/README.md](https://github.com/huggingface/diffusers/blob/main/examples/README.md)
- [docs/source/en/modular_diffusers/overview.md](https://github.com/huggingface/diffusers/blob/main/docs/source/en/modular_diffusers/overview.md)
</details>

# Modular Diffusers

## Overview

Modular Diffusers is a framework introduced in Diffusers v0.37.0 that enables building diffusion pipelines by composing reusable, modular building blocks. Instead of writing entire pipelines from scratch, developers can mix and match components to create custom workflows tailored to specific use cases.

The core philosophy behind Modular Diffusers is **composability**—allowing users to:
- Reuse existing pipeline components across different models
- Swap individual components (transformers, schedulers, guiders) without rewriting entire pipelines
- Create custom pipelines by combining standardized building blocks
- Share and distribute custom pipeline configurations through Hugging Face Hub

Source: [docs/source/en/modular_diffusers/overview.md](https://github.com/huggingface/diffusers/blob/main/docs/source/en/modular_diffusers/overview.md)

## Architecture

### Component Hierarchy

Modular Diffusers organizes pipeline components into a hierarchical structure. The main components include:

```mermaid
graph TD
    A[ModularPipeline] --> B[ComponentsManager]
    A --> C[PipelineConfig]
    B --> D[Transformer]
    B --> E[TextEncoder/TextEncoder 2]
    B --> F[VAE/AutoencoderKL]
    B --> G[Scheduler]
    B --> H[Guider]
    B --> I[Tokenizer]
    D --> J[Flux Transformer]
    D --> K[UNet2DConditionModel]
    H --> L[ClassifierFreeGuidance]
    H --> M[PerturbedAttentionGuidance]
```

### Core Components

| Component Type | Description | Base Class |
|----------------|-------------|------------|
| Transformer | The core diffusion model that performs the denoising process | `ModelMixin` |
| TextEncoder | Encodes text prompts into embeddings | `PreTrainedModel` |
| VAE/AutoencoderKL | Encodes images to latent space and decodes back | `ModelMixin` |
| Scheduler | Controls the diffusion sampling process | `SchedulerMixin` |
| Guider | Guides the generation process (for example, classifier-free guidance) | `BaseGuidance` |
| Tokenizer | Converts text to token IDs | `PreTrainedTokenizer` |

Source: [src/diffusers/modular_pipelines/components_manager.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/components_manager.py)

### Type Hints System

Modular Diffusers uses type hints to resolve which class should be loaded for each component. This allows flexible component substitution while maintaining type safety.

The system supports the following type hint sources:

| Source Type | Resolution Method |
|-------------|-------------------|
| Direct class reference | Uses the specified class directly |
| `AutoModel` | Uses `AutoModel.from_pretrained()` |
| Transformers models | Uses `transformers.AutoModel` |

Source: [src/diffusers/modular_pipelines/modular_pipeline_utils.py:1-100](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline_utils.py)

### Guider System

The Guider system abstracts guidance computation from individual pipelines, allowing different guidance strategies to be applied uniformly:

```mermaid
graph LR
    A[ClassifierFreeGuidance] --> B[BaseGuidance]
    C[PerturbedAttentionGuidance] --> B
    D[SkipLayerGuidance] --> B
    B --> E[ModularPipeline]
```

| Guider Type | Purpose | Configuration Key |
|-------------|---------|-------------------|
| `ClassifierFreeGuidance` | Classifier-free guidance (CFG) | `guider` config |
| `PerturbedAttentionGuidance` | Perturbed-attention guidance (PAG) | `guider` config |
| `SkipLayerGuidance` | Skip-layer guidance (SLG) | `guider` config |

Source: [src/diffusers/guiders/__init__.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/guiders/__init__.py)

## Loading Components

### From Pretrained Models

Modular pipelines automatically resolve and load components from the Hugging Face Hub:

```python
import torch
from diffusers.modular_pipelines import ModularPipeline

# Load a complete modular pipeline
pipeline = ModularPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
)
```

The loading process follows this sequence:

```mermaid
sequenceDiagram
    participant User
    participant ModularPipeline
    participant ComponentsManager
    participant AutoModel
    participant HuggingFace

    User->>ModularPipeline: from_pretrained(path)
    ModularPipeline->>HuggingFace: Download modular_model_index.json
    ModularPipeline->>ComponentsManager: Parse component configs
    ComponentsManager->>AutoModel: Resolve class from type_hint
    AutoModel->>HuggingFace: Download model weights
    ComponentsManager->>ComponentsManager: Instantiate components
    ModularPipeline->>User: Return assembled pipeline
```

Source: [src/diffusers/pipelines/pipeline_loading_utils.py:1-60](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)

### With Type Hints

When loading components that lack sufficient configuration, specify `type_hint` to guide the loader:

```python
from diffusers import AutoModel
from diffusers.modular_pipelines import ComponentsManager

manager = ComponentsManager()

# Specify type hint for component resolution
manager.add_component(
    name="transformer",
    pretrained_model_name_or_path="./my_custom_model",
    type_hint=AutoModel  # or specific class like FluxTransformer2DModel
)
```

Source: [src/diffusers/modular_pipelines/modular_pipeline_utils.py:50-80](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline_utils.py)

### Single File Model Loading

Modular Diffusers supports loading models from single checkpoint files using `from_single_file`:

```python
from diffusers.modular_pipelines import ModularPipeline

pipeline = ModularPipeline.from_single_file(
    pretrained_model_link_or_path="./checkpoint.safetensors",
    original_config="./config.yaml"
)
```

The system detects single-file models and routes them appropriately:

```python
# From src/diffusers/loaders/single_file.py
is_diffusers_single_file_model = issubclass(class_obj, diffusers_module.FromOriginalModelMixin)

if is_diffusers_single_file_model:
    load_method = getattr(class_obj, "from_single_file")
    loaded_sub_model = load_method(
        pretrained_model_link_or_path_or_dict=checkpoint,
        original_config=original_config,
        config=cached_model_config_path,
        subfolder=name,
        torch_dtype=torch_dtype,
        **kwargs,
    )
```

Source: [src/diffusers/loaders/single_file.py:1-60](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py)

## Flux Modular Pipeline

The Flux model family uses specialized modular pipeline implementations that handle both full and distilled model variants.

### FluxPipeline Structure

```mermaid
graph TD
    subgraph FluxPipeline
        A[Transformer] --> B[FluxTransformer2DModel]
        C[TextEncoder] --> D[CLIPTextModel/CLIPTextModelWithProjection]
        C --> E[T5EncoderModel]
        F[VAE] --> G[AutoencoderKL]
        H[Scheduler] --> I[FlowMatchEulerDiscreteScheduler]
        J[Guider] --> K[FlowMatcherGuider]
    end
```

### Configuration for Distilled Models

Flux models may use distilled versions that affect guidance configuration. The modular pipeline automatically detects and handles this:

```python
# Distilled model handling in modular_pipeline.py
if hasattr(config, "guidance_scale"):
    guider_config = {"guider": {"class_name": "FlowMatcherGuider"}}
else:
    guider_config = {"guider": {"class_name": "NoGuider"}}
```

Source: [src/diffusers/modular_pipelines/flux/modular_pipeline.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/flux/modular_pipeline.py)

## Configuration Schema

### Modular Model Index JSON

The `modular_model_index.json` file defines the pipeline configuration:

```json
{
  "_class_name": "ModularPipeline",
  "components": {
    "transformer": {
      "type_hint": "FluxTransformer2DModel",
      "pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev"
    },
    "text_encoder": {
      "type_hint": "CLIPTextModel",
      "pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev"
    },
    "text_encoder_2": {
      "type_hint": "T5EncoderModel",
      "pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev"
    }
  }
}
```

### Component Configuration Options

| Parameter | Description | Default |
|-----------|-------------|---------|
| `type_hint` | Class to use for loading | Auto-detected |
| `pretrained_model_name_or_path` | Model path or identifier | Required |
| `subfolder` | Subdirectory within model | None |
| `variant` | Model variant (e.g., "fp16") | None |
| `torch_dtype` | Data type for weights | None |
| `use_safetensors` | Use safe serialization | Auto |

Source: [src/diffusers/modular_pipelines/components_manager.py:1-80](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/components_manager.py)
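
To make the table concrete, the sketch below reads a `modular_model_index.json`-style document and fills in the table's defaults for any option a component omits. The helper `resolve_component_options` and the `DEFAULTS` mapping are hypothetical illustrations, not part of the Diffusers API, and the real loader may resolve these fields differently.

```python
import json

# Defaults mirror the table above; this mapping is illustrative only.
DEFAULTS = {
    "type_hint": None,        # None stands in for "auto-detected"
    "subfolder": None,
    "variant": None,
    "torch_dtype": None,
    "use_safetensors": None,  # None stands in for "auto"
}


def resolve_component_options(index_json: str) -> dict:
    """Return {component_name: options} with the table defaults applied."""
    index = json.loads(index_json)
    resolved = {}
    for name, spec in index.get("components", {}).items():
        if "pretrained_model_name_or_path" not in spec:
            raise ValueError(f"component '{name}' is missing pretrained_model_name_or_path")
        # Explicit values in the spec override the defaults.
        resolved[name] = {**DEFAULTS, **spec}
    return resolved


example = """
{
  "_class_name": "ModularPipeline",
  "components": {
    "transformer": {
      "type_hint": "FluxTransformer2DModel",
      "pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev",
      "subfolder": "transformer"
    }
  }
}
"""

options = resolve_component_options(example)
print(options["transformer"]["subfolder"])  # transformer
print(options["transformer"]["variant"])    # None
```

Merging with `{**DEFAULTS, **spec}` keeps the required field check separate from the optional fields, so a missing path fails early instead of surfacing later during loading.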

## Common Patterns

### Creating a Custom Pipeline

```python
from diffusers import (
    ModularPipeline,
    FluxTransformer2DModel,
    FlowMatchEulerDiscreteScheduler,
)

# Define custom configuration
custom_config = {
    "transformer": {
        "type_hint": FluxTransformer2DModel,
        "pretrained_model_name_or_path": "custom/model"
    },
    "scheduler": {
        "type_hint": FlowMatchEulerDiscreteScheduler
    }
}

# Create pipeline with custom config
pipeline = ModularPipeline.from_config(custom_config)
```

### Mixing Components from Different Pipelines

```python
from diffusers import AutoModel, ModularPipeline

# Load base pipeline
pipeline = ModularPipeline.from_pretrained("base/pipeline")

# Load a custom transformer; AutoModel reads the class from the checkpoint config
transformer = AutoModel.from_pretrained("custom/transformer")

# Swap it into the pipeline
pipeline.update_components(transformer=transformer)
```

### Using with LoRA Adapters

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the base pipeline
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "sdxl/pipeline",
    torch_dtype=torch.float16
)

# Load the LoRA weights under a name, then activate that adapter
pipeline.load_lora_weights("path/to/lora", adapter_name="my_adapter")
pipeline.set_adapters("my_adapter")
```

Source: [src/diffusers/models/auto_model.py:40-80](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/auto_model.py)

## GGUF Quantization Support

Modular Diffusers supports GGUF-quantized models for a reduced memory footprint. GGUF checkpoints are single files, so they are loaded with `from_single_file` and a `GGUFQuantizationConfig`:

```python
import torch
from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

# Configure GGUF loading; compute_dtype is the dtype used for computation
quantization_config = GGUFQuantizationConfig(compute_dtype=torch.bfloat16)

# Load the quantized transformer from a local or Hub .gguf file
model = FluxTransformer2DModel.from_single_file(
    "quantized/model.gguf",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
```

### GGUF Quantization Parameters

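`GGUFQuantizationConfig` accepts `compute_dtype`; the remaining attributes are set internally by `GGUFQuantizer`:

| Parameter | Type | Description |
|-----------|------|-------------|
| `compute_dtype` | `torch.dtype` | Computation data type |
| `pre_quantized` | `bool` | Model is pre-quantized (always the case for GGUF checkpoints) |
| `modules_to_not_convert` | `list` | Modules excluded from quantization |
| `use_keep_in_fp32_modules` | `bool` | Keep specified modules in FP32 |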

Source: [src/diffusers/quantizers/gguf/gguf_quantizer.py:1-50](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/gguf/gguf_quantizer.py)

## Common Failure Modes

### Type Hint Resolution Failures

When `type_hint` is missing and `AutoModel` cannot determine the correct class:

```
ValueError: Unable to load transformer without `type_hint`
```

**Solution**: Explicitly provide `type_hint` for the component.

```python
from diffusers import AutoModel

manager.add_component(
    name="transformer",
    pretrained_model_name_or_path="./custom_model",
    type_hint=AutoModel  # or specific class
)
```

Source: [src/diffusers/modular_pipelines/modular_pipeline_utils.py:60-70](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline_utils.py)

### Config Mismatch with Transformers Models

When loading models that mix Diffusers and Transformers components:

```
ValueError: `config_class` cannot be None. Please double-check the model.
```

**Solution**: Ensure the model's config includes proper `model_type` or `_class_name` fields.

### Single File Loading with Missing Config

When loading from single files without an original config:

```
ValueError: The repository contains custom code which must be executed
```

**Solution**: Pass `trust_remote_code=True` or provide `original_config` path.

```python
pipeline = ModularPipeline.from_single_file(
    "./checkpoint.safetensors",
    original_config="./config.yaml",
    trust_remote_code=True
)
```

Source: [src/diffusers/loaders/single_file.py:30-60](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py)

### Flux Klein LoRA Loading Issues

Community reports indicate issues with Flux Klein LoRA loading in some configurations. This was addressed in v0.37.1 with fixes for proper LoRA adapter handling with Flux models.

Reference: [GitHub Issue #13313](https://github.com/huggingface/diffusers/issues/13313)

## Examples and Usage

### Running Example Scripts

To use Modular Diffusers with example scripts:

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .

# Install example requirements
cd examples
pip install -r requirements.txt
```

Source: [examples/README.md](https://github.com/huggingface/diffusers/blob/main/examples/README.md)

### Community Scripts

The community maintains additional modular pipeline examples:

| Example | Description | Author |
|---------|-------------|--------|
| IP-Adapter Negative Noise | Advanced IP-Adapter control | Álvaro Somoza |
| Asymmetric Tiling | Seamless image tiling | alexisrolland |
| Prompt Scheduling | Dynamic prompt control | Community |

Reference: [examples/community/README_community_scripts.md](https://github.com/huggingface/diffusers/blob/main/examples/community/README_community_scripts.md)

## See Also

- [Pipeline Loading](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py) - General pipeline loading mechanisms
- [AutoModel](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/auto_model.py) - Automatic model loading
- [Guider System](https://github.com/huggingface/diffusers/blob/main/src/diffusers/guiders/__init__.py) - Guidance abstraction layer
- [Single File Loading](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py) - Loading from checkpoint files
- [GGUF Quantization](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/gguf/gguf_quantizer.py) - Quantized model support

---

<a id='training-guide'></a>

## Training Guide

### Related Pages

Related topics: [Loaders & Adapters](#loaders-adapters), [Optimization Guide](#optimization-guide)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [examples/dreambooth/train_dreambooth_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py)
- [examples/text_to_image/train_text_to_image_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py)
- [examples/controlnet/train_controlnet.py](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/train_controlnet.py)
- [examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py](https://github.com/huggingface/diffusers/blob/main/examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py)
- [examples/consistency_distillation/train_lcm_distill_lora_sdxl.py](https://github.com/huggingface/diffusers/blob/main/examples/consistency_distillation/train_lcm_distill_lora_sdxl.py)
- [src/diffusers/training_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/training_utils.py)
- [src/diffusers/pipelines/pipeline_loading_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)
- [src/diffusers/models/auto_model.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/auto_model.py)
- [examples/README.md](https://github.com/huggingface/diffusers/blob/main/examples/README.md)
- [examples/research_projects/scheduled_huber_loss_training/text_to_image/train_text_to_image_lora_sdxl.py](https://github.com/huggingface/diffusers/blob/main/examples/research_projects/scheduled_huber_loss_training/text_to_image/train_text_to_image_lora_sdxl.py)
</details>

# Training Guide

## Overview

The Hugging Face Diffusers library provides a comprehensive suite of training scripts and utilities for fine-tuning diffusion models. Training in Diffusers enables users to adapt pretrained models for custom tasks, create personalized outputs, and optimize models for specific domains or styles.

Training scripts in Diffusers are designed to be **easy-to-tweak**, **beginner-friendly**, and **one-purpose-only**. While they are not intended to provide state-of-the-art training methods for the newest models, they serve as excellent starting points for understanding diffusion model training and for adapting to specific use cases. Source: [examples/README.md](https://github.com/huggingface/diffusers/blob/main/examples/README.md)

### Key Training Objectives

Diffusers training supports several fundamental objectives:

| Objective | Description | Common Use Cases |
|-----------|-------------|------------------|
| **Personalization** | Fine-tune models to generate content in a specific style or about specific subjects | DreamBooth, LoRA fine-tuning |
| **Control** | Add conditioning mechanisms to guide generation | ControlNet, adapter training |
| **Efficiency** | Distill knowledge or compress models for faster inference | LCM distillation, quantization |
| **Domain Adaptation** | Adapt models to specific data distributions | Custom dataset fine-tuning |

## Architecture

### Training System Components

```mermaid
graph TD
    A[Training Pipeline] --> B[Model Loading]
    A --> C[Data Loading]
    A --> D[Optimizer Setup]
    A --> E[Training Loop]
    
    B --> B1[pretrained_model_name_or_path]
    B --> B2[variant]
    B --> B3[revision]
    
    C --> C1[dataset_name]
    C --> C2[pretrained_vae]
    C --> C3[image processing]
    
    D --> D1[Learning Rate]
    D --> D2[AdamW]
    D --> D3[lr_scheduler]
    
    E --> E1[Gradient Computation]
    E --> E2[Optimization Step]
    E --> E3[Checkpointing]
```

### Training Script Types

Diffusers organizes training scripts by task and complexity level:

| Directory | Purpose | Example Scripts |
|-----------|---------|-----------------|
| `examples/dreambooth/` | DreamBooth personalization | LoRA, Full fine-tuning |
| `examples/text_to_image/` | Text-to-image training | LoRA, custom datasets |
| `examples/controlnet/` | ControlNet training | ControlNet, Flux ControlNet |
| `examples/advanced_diffusion_training/` | Advanced techniques | Flux LoRA, Dreambooth advanced |
| `examples/consistency_distillation/` | Model distillation | LCM LoRA distillation |
| `examples/research_projects/` | Community research | Scheduled Huber loss |

## Common Training Patterns

### Model Loading

All training scripts follow a consistent pattern for loading pretrained models:

```python
from diffusers import AutoencoderKL, UNet2DConditionModel

# Load pretrained UNet/Transformer
unet = UNet2DConditionModel.from_pretrained(
    pretrained_model_name_or_path,
    subfolder="unet",
    variant=variant,
    revision=revision,
)

# Load the VAE: use the separately specified checkpoint if one is given
# (for example, one with better numerical stability), else the bundled VAE
if pretrained_vae_model_name_or_path:
    vae = AutoencoderKL.from_pretrained(pretrained_vae_model_name_or_path)
else:
    vae = AutoencoderKL.from_pretrained(
        pretrained_model_name_or_path,
        subfolder="vae",
        variant=variant,
        revision=revision,
    )
```

Source: [examples/controlnet/train_controlnet.py:100-140](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/train_controlnet.py)

### Core Training Arguments

Training scripts share common command-line arguments:

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--pretrained_model_name_or_path` | str | required | Model identifier from HuggingFace Hub |
| `--pretrained_vae_model_name_or_path` | str | None | Path to pretrained VAE with better numerical stability |
| `--variant` | str | None | Variant of model files (e.g., `fp16`) |
| `--revision` | str | None | Git revision of pretrained model |
| `--dataset_name` | str | None | Dataset name from HuggingFace Hub |
| `--output_dir` | str | required | Directory for checkpoints and outputs |
| `--cache_dir` | str | None | Cache directory for downloaded models |
| `--seed` | int | None | Random seed for reproducibility |

Source: [examples/text_to_image/train_text_to_image_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py)
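
As a rough illustration of how these shared arguments are declared, the sketch below builds an `argparse` parser that mirrors the table. The `build_parser` function is hypothetical and covers only the listed arguments; individual scripts define many more flags and may use different defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser covering only the shared arguments listed in the table above."""
    parser = argparse.ArgumentParser(description="Shared training arguments (sketch)")
    parser.add_argument("--pretrained_model_name_or_path", type=str, required=True)
    parser.add_argument("--pretrained_vae_model_name_or_path", type=str, default=None)
    parser.add_argument("--variant", type=str, default=None)
    parser.add_argument("--revision", type=str, default=None)
    parser.add_argument("--dataset_name", type=str, default=None)
    parser.add_argument("--output_dir", type=str, required=True)
    parser.add_argument("--cache_dir", type=str, default=None)
    parser.add_argument("--seed", type=int, default=None)
    return parser


# Parse an explicit list instead of sys.argv so the example runs anywhere.
args = build_parser().parse_args(
    [
        "--pretrained_model_name_or_path", "some-org/some-model",
        "--output_dir", "./output",
        "--seed", "42",
    ]
)
print(args.seed, args.variant)  # 42 None
```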

### Dataset Configuration

Training scripts support multiple dataset formats and configurations:

```bash
# From HuggingFace Hub
--dataset_name="dataset-name"

# From local directory
--train_data_dir="/path/to/local/data"

# Dataset configuration (when applicable)
--dataset_config_name="config-name"
```

The dataset must follow a specific structure, particularly for image datasets that need to work with HuggingFace Datasets' ImageFolder format. Source: [examples/research_projects/scheduled_huber_loss_training/text_to_image/train_text_to_image_lora_sdxl.py](https://github.com/huggingface/diffusers/blob/main/examples/research_projects/scheduled_huber_loss_training/text_to_image/train_text_to_image_lora_sdxl.py)
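
For a local dataset, the ImageFolder convention pairs image files with a `metadata.jsonl` file whose `file_name` column points at each image. The sketch below writes such a file into a temporary directory; the captions, the file names, and the `text` column name are assumptions chosen for illustration, and the image files themselves are not created here.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical captions keyed by image file name.
captions = {
    "0001.png": "a watercolor painting of a fox",
    "0002.png": "a watercolor painting of a lighthouse",
}

train_dir = Path(tempfile.mkdtemp()) / "train"
train_dir.mkdir(parents=True)

# One JSON object per line; `file_name` links each row to an image file
# that would sit in the same directory as metadata.jsonl.
metadata_path = train_dir / "metadata.jsonl"
with metadata_path.open("w", encoding="utf-8") as f:
    for file_name, text in captions.items():
        f.write(json.dumps({"file_name": file_name, "text": text}) + "\n")

# Read the file back to confirm the structure.
rows = [
    json.loads(line)
    for line in metadata_path.read_text(encoding="utf-8").splitlines()
]
print(rows[0]["file_name"], "->", rows[0]["text"])
```

A directory prepared this way would then be passed to the script through `--train_data_dir`.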

## Training Methods

### LoRA (Low-Rank Adaptation)

LoRA training adds trainable low-rank matrices to existing model layers, significantly reducing the number of trainable parameters while maintaining quality.

```python
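from peft import LoraConfig

# Freeze the base weights, then attach trainable low-rank adapters
# to the attention projections
unet.requires_grad_(False)
unet_lora_config = LoraConfig(
    r=rank,
    lora_alpha=rank,
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
)
unet.add_adapter(unet_lora_config)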
```

**Key benefits:**
- Reduced memory footprint
- Faster training times
- Easy to merge and unmerge
- Compatible with most model architectures

Source: [examples/text_to_image/train_text_to_image_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py)
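
The parameter savings can be estimated directly. For a dense weight of shape `(d_out, d_in)`, LoRA trains two matrices of shapes `(rank, d_in)` and `(d_out, rank)` instead. The sketch below compares the two counts; the layer size of 320 is an arbitrary example rather than a value taken from a specific model.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix W of shape (d_out, d_in)."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the low-rank pair A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


d_in = d_out = 320  # example projection size, chosen for illustration
rank = 4
full = full_param_count(d_in, d_out)
lora = lora_param_count(d_in, d_out, rank)
print(full, lora, f"{lora / full:.1%}")  # 102400 2560 2.5%
```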

### DreamBooth

DreamBooth enables subject-driven personalization by fine-tuning a diffusion model on a few images of a specific subject with a unique identifier.

```python
# Special identifier for the subject
instance_prompt = "a photo of a sks dog"  # "sks" is the unique identifier

# Class-specific preservation prompt
class_prompt = "a photo of a dog"

# Training with prior preservation loss
# Helps maintain the model's knowledge about the class
```

Source: [examples/dreambooth/train_dreambooth_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py)
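
The prior preservation loss mentioned above can be sketched in plain Python. The assumption here is that each batch holds instance samples in its first half and class samples in its second half, and that the total loss is the instance loss plus a weighted class loss. The function names are hypothetical, and the real script operates on tensors rather than lists.

```python
def mse(pred, target):
    """Mean squared error over two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(pred, target, prior_loss_weight=1.0):
    """Instance loss plus weighted prior-preservation loss.

    Assumes the first half of each sequence holds instance samples and the
    second half holds class (prior) samples.
    """
    half = len(pred) // 2
    instance_loss = mse(pred[:half], target[:half])
    prior_loss = mse(pred[half:], target[half:])
    return instance_loss + prior_loss_weight * prior_loss


pred = [1.0, 2.0, 3.0, 5.0]
target = [1.0, 0.0, 3.0, 4.0]
print(dreambooth_loss(pred, target, prior_loss_weight=0.5))  # 2.25
```

Setting `prior_loss_weight` to zero reduces the objective to plain fine-tuning on the instance images, which is where forgetting of the broader class tends to appear.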

### ControlNet Training

ControlNet trains additional conditioning branches that can control diffusion model outputs based on various input modalities (canny edges, poses, depth maps, etc.).

```python
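from diffusers import ControlNetModel

# Initialize ControlNet weights from the pretrained UNet
controlnet = ControlNetModel.from_unet(unet)

# Only the ControlNet is trained; the UNet stays frozen
unet.requires_grad_(False)
controlnet.train()

# Forward pass: ControlNet turns the conditioning image into residuals
down_block_res_samples, mid_block_res_sample = controlnet(
    noisy_latents,
    timesteps,
    encoder_hidden_states=encoder_hidden_states,
    controlnet_cond=controlnet_image,
    return_dict=False,
)
# The residuals are then passed to the UNet as
# down_block_additional_residuals / mid_block_additional_residual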
```

Source: [examples/controlnet/train_controlnet.py](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/train_controlnet.py)

### Consistency Distillation (LCM)

Latent Consistency Models (LCM) distill the iterative denoising process into fewer steps for fast inference.

```python
# Teacher model for distillation
teacher_unet = UNet2DConditionModel.from_pretrained(
    pretrained_model_name_or_path,
    subfolder="unet",
)

# LCM-specific command-line flags (passed to the script, not Python code):
#   --num_ddim_timesteps=50    number of DDIM solver timesteps
#   --w_min=3.0 --w_max=15.0   range the guidance scale is sampled from
#   --loss_type="huber"        distillation loss ("l2" or "huber")
```

Source: [examples/consistency_distillation/train_lcm_distill_lora_sdxl.py](https://github.com/huggingface/diffusers/blob/main/examples/consistency_distillation/train_lcm_distill_lora_sdxl.py)
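
During distillation the guidance scale is typically sampled per example rather than fixed, and the teacher's conditional and unconditional predictions are combined with that scale. The sketch below illustrates both steps with plain floats. The function names, the range of 3.0 to 15.0, and the `cond + w * (cond - uncond)` form are assumptions made for illustration; the script itself works on batched tensors.

```python
import random


def sample_guidance_scales(batch_size, w_min, w_max, rng):
    """Draw one guidance scale per example, uniformly from [w_min, w_max]."""
    return [(w_max - w_min) * rng.random() + w_min for _ in range(batch_size)]


def cfg_estimate(cond, uncond, w):
    """Classifier-free guidance in the form cond + w * (cond - uncond)."""
    return cond + w * (cond - uncond)


rng = random.Random(0)  # seeded so repeated runs draw the same scales
scales = sample_guidance_scales(4, w_min=3.0, w_max=15.0, rng=rng)
print(all(3.0 <= w <= 15.0 for w in scales))       # True
print(cfg_estimate(cond=1.0, uncond=0.5, w=2.0))   # 2.0
```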

## Advanced Training Configuration

### Flux Training

Flux models use a different architecture requiring specific training configurations:

```python
from diffusers import FluxTransformer2DModel

# Flux-specific model loading
transformer = FluxTransformer2DModel.from_pretrained(
    pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
)

# Flux-related command-line flags (passed to the script, not Python code):
#   --max_sequence_length=512   maximum T5 prompt length
#   --rank=4                    LoRA rank
```

Source: [examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py](https://github.com/huggingface/diffusers/blob/main/examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py)

### Training Utilities

The `training_utils.py` module provides core utilities for model training:

```python
from diffusers.training_utils import (
    EMAModel,
    cast_training_params,
    compute_snr,
    set_seed,
)
```

Key utilities include:
- **EMAModel**: Maintains an exponential moving average of model weights
- **compute_snr()**: Computes the signal-to-noise ratio per timestep, used for SNR-based loss weighting
- **cast_training_params()**: Casts trainable parameters to a given dtype (float32 by default) for mixed-precision training
- **set_seed()**: Seeds the Python, NumPy, and PyTorch random number generators

Source: [src/diffusers/training_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/training_utils.py)
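
To show what the signal-to-noise ratio from `compute_snr()` is used for, the sketch below computes the SNR from a cumulative alpha value and derives a Min-SNR loss weight for epsilon prediction. It is a scalar illustration with hypothetical function names and an example `gamma` of 5.0; the library function takes a noise scheduler and a tensor of timesteps instead.

```python
import math


def snr_from_alpha_bar(alpha_bar: float) -> float:
    """SNR = (alpha / sigma)^2 with alpha = sqrt(alpha_bar), sigma = sqrt(1 - alpha_bar)."""
    alpha = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return (alpha / sigma) ** 2


def min_snr_weight(alpha_bar: float, gamma: float = 5.0) -> float:
    """Min-SNR loss weight for epsilon prediction: min(SNR, gamma) / SNR."""
    snr = snr_from_alpha_bar(alpha_bar)
    return min(snr, gamma) / snr


print(round(snr_from_alpha_bar(0.9), 6))  # 9.0
print(round(min_snr_weight(0.9), 6))      # 0.555556
print(min_snr_weight(0.5))                # 1.0
```

Low-noise timesteps have a high SNR and are down-weighted, while timesteps with an SNR below `gamma` keep a weight of one.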

## Training Workflow

```mermaid
graph LR
    A[Setup Environment] --> B[Prepare Dataset]
    B --> C[Load Pretrained Models]
    C --> D[Initialize LoRA/Adapters]
    D --> E[Training Loop]
    E --> F{Epoch Complete?}
    F -->|Yes| G[Save Checkpoint]
    F -->|No| E
    G --> H{More Epochs?}
    H -->|Yes| E
    H -->|No| I[Export Final Model]
    I --> J["Merge LoRA (optional)"]
```

## Common Failure Modes and Troubleshooting

### Model Loading Issues

| Issue | Cause | Solution |
|-------|-------|----------|
| `Repository not found` | Invalid model identifier | Verify model name on HuggingFace Hub |
| `Revision not found` | Non-existent git revision | Use `revision="main"` or valid commit hash |
| `Variant not found` | Missing weight variant | Omit `--variant` or check available variants |
| Config mismatch | Model architecture changed | Update model reference or use specific revision |

Source: [src/diffusers/pipelines/pipeline_loading_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)

### Memory Issues

| Issue | Solution |
|-------|----------|
| OOM during training | Enable gradient checkpointing, reduce batch size, use 8-bit Adam optimizer |
| Slow training | Use mixed precision (`--mixed_precision="fp16"`), enable xformers |
| VAE memory | Use separate pretrained VAE with better numerical stability |

### LoRA Loading Problems

Recent releases (v0.37.x) have addressed several LoRA loading issues:

- **Flux Klein LoRA loading**: Fixed in v0.37.1
- **ModularPipelines with AutoModel type hints**: Fixed in v0.37.1

If encountering LoRA loading issues with custom models, ensure:
1. The LoRA rank matches the target model architecture
2. The `type_hint` is correctly specified for single-file models
3. The model was saved with compatible LoRA weights

Source: [src/diffusers/modular_pipelines/modular_pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline_utils.py)

### Configuration Mismatch

When training with custom models or GGUF files:

1. Verify model architecture matches the expected UNet/Transformer class
2. Check that config files are present in the model directory
3. For custom architectures, ensure proper registration with `ModelMixin` and `ConfigMixin`

Source: [src/diffusers/models/auto_model.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/auto_model.py)

## Best Practices

### Environment Setup

```bash
# Clone and install from source
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .

# Install example-specific dependencies
cd examples/dreambooth
pip install -r requirements.txt
```

Source: [examples/README.md](https://github.com/huggingface/diffusers/blob/main/examples/README.md)

### Reproducibility

Always specify a seed for reproducible training:

```bash
python train_dreambooth_lora.py \
    --seed=42 \
    --output_dir="./output" \
    ...
```

### Checkpointing Strategy

- Save checkpoints at regular intervals using `--checkpointing_steps`
- Keep track of best-performing checkpoint using validation metrics
- Use `--resume_from_checkpoint` to resume interrupted training
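
Resuming from the most recent checkpoint requires picking the directory with the highest step number. The sketch below assumes checkpoints are stored as `checkpoint-<step>` folders inside the output directory; the `latest_checkpoint` helper is a hypothetical illustration of that selection.

```python
def latest_checkpoint(dir_names):
    """Return the `checkpoint-<step>` name with the highest step, or None."""
    checkpoints = [d for d in dir_names if d.startswith("checkpoint-")]
    if not checkpoints:
        return None
    # Compare steps numerically; a plain string sort would rank
    # "checkpoint-500" above "checkpoint-1500".
    return max(checkpoints, key=lambda d: int(d.split("-")[1]))


names = ["checkpoint-500", "logs", "checkpoint-1500", "checkpoint-1000"]
print(latest_checkpoint(names))  # checkpoint-1500
```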

## Installation and Dependencies

Training scripts require specific dependencies. To ensure compatibility:

1. **Install from source** for the latest training features
2. **Check requirements.txt** in the specific example directory
3. **Verify PyTorch version** is compatible with your GPU drivers
4. **For JAX training**, ensure Flax is installed

Example installation:

```bash
pip install torch --index-url https://download.pytorch.org/whl/cu118
pip install accelerate transformers datasets peft
pip install -e ".[torch]"
```

## See Also

- [Loading Guide](https://github.com/huggingface/diffusers/blob/main/docs/source/en/using-diffusers/loading.md) - Understanding model loading mechanisms
- [Optimization Guide](https://github.com/huggingface/diffusers/blob/main/docs/source/en/optimization/fp16.md) - Memory and speed optimization techniques
- [Pipelines Overview](https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/pipelines/overview.md) - Using trained models for inference
- [Modular Diffusers](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline_utils.py) - Composable pipeline architecture
- [Model Architecture](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md) - Design philosophy for models in Diffusers

---

<a id='optimization-guide'></a>

## Optimization Guide

### Related Pages

Related topics: [Quantization Guide](#quantization-guide), [Loaders & Adapters](#loaders-adapters)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/diffusers/hooks/__init__.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/hooks/__init__.py)
- [src/diffusers/hooks/faster_cache.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/hooks/faster_cache.py)
- [src/diffusers/hooks/text_kv_cache.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/hooks/text_kv_cache.py)
- [src/diffusers/models/attention_processor.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py)
- [src/diffusers/quantizers/gguf/gguf_quantizer.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/gguf/gguf_quantizer.py)
- [src/diffusers/pipelines/pipeline_loading_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)
- [docs/source/en/optimization/memory.md](https://github.com/huggingface/diffusers/blob/main/docs/source/en/optimization/memory.md)
- [docs/source/en/optimization/attention_backends.md](https://github.com/huggingface/diffusers/blob/main/docs/source/en/optimization/attention_backends.md)
- [docs/source/en/optimization/cache.md](https://github.com/huggingface/diffusers/blob/main/docs/source/en/optimization/cache.md)
- [examples/controlnet/train_controlnet.py](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/train_controlnet.py)
</details>

# Optimization Guide

This page covers performance optimization techniques for the Diffusers library, including memory management, attention backends, caching strategies, and quantization options. These techniques enable efficient inference and training of diffusion models on various hardware configurations.

## Overview

Diffusers provides multiple optimization layers to improve inference speed and reduce memory consumption. The optimization system operates at several levels:

1. **Attention Level**: Alternative attention implementations (xformers, flash attention, scaled dot product attention)
2. **Cache Level**: Key-value caching for iterative generation
3. **Memory Level**: CPU offloading, gradient checkpointing, and memory-efficient attention
4. **Quantization Level**: GGUF and other quantization formats for reduced precision inference

```mermaid
graph TD
    A[Diffusion Pipeline] --> B[Attention Processors]
    A --> C[Caching System]
    A --> D[Quantization]
    B --> B1[xformers]
    B --> B2[Flash Attention]
    B --> B3[SDPA]
    C --> C1[FasterCache]
    C --> C2[TextKVCache]
    D --> D1[GGUF Quantization]
```

Source: [src/diffusers/models/attention_processor.py:1-50](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py)

---

## Memory Optimization

Memory optimization is critical for running large diffusion models on consumer hardware. Diffusers provides several strategies to reduce VRAM usage during inference and training.

### CPU Offloading

CPU offloading moves model components to system RAM when not in use, allowing larger models to run on limited GPU memory.

```python
from diffusers import StableDiffusionPipeline
import torch

pipeline = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
# Enable CPU offloading for memory-constrained environments
pipeline.enable_sequential_cpu_offload()
```

Source: [docs/source/en/optimization/memory.md](https://github.com/huggingface/diffusers/blob/main/docs/source/en/optimization/memory.md)

### Model Offloading

Model offloading moves entire model components (UNet, VAE, text encoder) between CPU and GPU as needed:

```python
# Enable model offloading - moves components to GPU only when needed
pipeline.enable_model_cpu_offload()
```

### Gradient Checkpointing

During training, gradient checkpointing reduces memory by recomputing activations during the backward pass instead of storing them:

```python
# In training scripts like examples/controlnet/train_controlnet.py
# Enable via Accelerate config or directly on model
unet.enable_gradient_checkpointing()
```

### Memory-Efficient Attention

The `enable_xformers_memory_efficient_attention()` method switches the pipeline to xFormers memory-efficient attention (requires the `xformers` package):

```python
pipeline.enable_xformers_memory_efficient_attention()
```

Source: [docs/source/en/optimization/memory.md](https://github.com/huggingface/diffusers/blob/main/docs/source/en/optimization/memory.md)

### torch.compile Support

PyTorch 2.0+ `torch.compile` provides significant speedups through JIT compilation:

```python
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)
```

Source: [Diffusers 0.34.0 Release Notes](https://github.com/huggingface/diffusers/releases/tag/v0.34.0)

---

## Attention Backends

Diffusers supports multiple attention implementations through pluggable attention processors. The appropriate backend depends on your hardware and PyTorch version.

### Available Backends

| Backend | Hardware | Requirements | Use Case |
|---------|----------|--------------|----------|
| **xformers** | NVIDIA GPUs | `xformers` package | General purpose, wide compatibility |
| **Flash Attention** | NVIDIA GPUs (Ampere+) | PyTorch 2.0+ | Fast, memory-efficient |
| **SDPA** | All | PyTorch 2.0+ | Default, automatic fallback |
| **SageAttention** | NVIDIA GPUs | `sageattention` package | High performance |

Source: [src/diffusers/models/attention_processor.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py)

### Attention Processor Architecture

```mermaid
classDiagram
    class AttentionProcessor {
        <<abstract>>
    }
    class AttnProcessor {}
    class XFormersAttnProcessor {}
    class AttnProcessor2_0 {}

    AttentionProcessor <|-- AttnProcessor
    AttentionProcessor <|-- XFormersAttnProcessor
    AttentionProcessor <|-- AttnProcessor2_0
```

### Using xformers

Install xformers and enable it:

```bash
pip install xformers
```

```python
import torch
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)

# Enable xformers attention
pipeline.enable_xformers_memory_efficient_attention()
```

Source: [docs/source/en/optimization/attention_backends.md](https://github.com/huggingface/diffusers/blob/main/docs/source/en/optimization/attention_backends.md)

### Using Flash Attention

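With PyTorch 2.0+, scaled dot product attention dispatches to a FlashAttention kernel automatically on a compatible GPU (Ampere architecture or newer) with half-precision inputs, so no pipeline-level call is needed. To restrict SDPA to that kernel, use PyTorch's `sdpa_kernel` context manager (PyTorch 2.3+):

```python
from torch.nn.attention import SDPBackend, sdpa_kernel

# Restrict SDPA to the FlashAttention kernel for this call
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    image = pipeline("an astronaut riding a horse").images[0]
```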

### Scaled Dot Product Attention (SDPA)

SDPA is the default attention backend when using PyTorch 2.0+:

```python
from diffusers.models.attention_processor import AttnProcessor2_0

# Explicitly set SDPA attention
pipeline.unet.set_attn_processor(AttnProcessor2_0())
```

---

## Caching Strategies

Caching reduces redundant computation during iterative generation, particularly for text conditioning in image-to-image and inpainting pipelines.

### FasterCache

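FasterCache speeds up multi-step generation by reusing attention outputs across denoising steps and skipping part of the unconditional branch computation. It is configured with `FasterCacheConfig` and enabled on the denoiser:

```python
import torch
from diffusers import CogVideoXPipeline, FasterCacheConfig

pipeline = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
).to("cuda")

# Example values; tune the skip ranges and weights per model
config = FasterCacheConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(-1, 681),
    current_timestep_callback=lambda: pipeline.current_timestep,
    attention_weight_callback=lambda _: 0.3,
    unconditional_batch_skip_range=5,
    unconditional_batch_timestep_skip_range=(-1, 781),
    tensor_format="BFCHW",
)
pipeline.transformer.enable_cache(config)
```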

Source: [src/diffusers/hooks/faster_cache.py:1-100](https://github.com/huggingface/diffusers/blob/main/src/diffusers/hooks/faster_cache.py)

### TextKVCache

TextKVCache caches key-value tensors for text conditioning across denoising steps:

```python
from diffusers.hooks import TextKVCacheAttnProcessor, TextKVCacheHook

# Enable text KV cache for reduced recomputation
pipeline.unet.set_attn_processor(TextKVCacheAttnProcessor())

# Attach hook for cache management
text_hook = TextKVCacheHook()
text_hook.register_hook(pipeline.unet)
```

Source: [src/diffusers/hooks/text_kv_cache.py:1-100](https://github.com/huggingface/diffusers/blob/main/src/diffusers/hooks/text_kv_cache.py)

### Cache Hook System

```mermaid
graph LR
    A[Pipeline Forward] --> B[Hook Manager]
    B --> C{FasterCacheHook}
    B --> D{TextKVCacheHook}
    C --> E[Cache Key-Value Tensors]
    D --> F[Cache Text Embeddings]
    E --> G[Reduced Recomputation]
    F --> G
```

Source: [src/diffusers/hooks/__init__.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/hooks/__init__.py)

---

## Quantization

Quantization reduces model size and memory requirements by using lower-precision data types.
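
A back-of-the-envelope estimate shows why precision matters: the memory needed for weights alone is the parameter count times the bits per weight. The sketch below uses a hypothetical 12-billion-parameter model and nominal bit widths. Real quantized formats store extra scale data per block, so actual file sizes are somewhat larger, and activations are not counted.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 1024**3


num_params = 12e9  # hypothetical 12-billion-parameter model
for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>10}: {weight_memory_gib(num_params, bits):.1f} GiB")
```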

### GGUF Quantization

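GGUF (GPT-Generated Unified Format) checkpoints store pre-quantized weights in a single file. This is particularly useful for community models that may not have standard safetensors variants. A GGUF file is loaded into a model class with `from_single_file` and a `GGUFQuantizationConfig`, and the resulting model is then passed to the pipeline:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# GGUF checkpoints are loaded per model with `from_single_file`
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"
transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# Pass the quantized transformer into the pipeline
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
```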

Source: [src/diffusers/quantizers/gguf/gguf_quantizer.py:1-50](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/gguf/gguf_quantizer.py)

#### GGUF Quantizer Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `compute_dtype` | `torch.dtype` | `torch.float32` | Computation precision; the only argument accepted by `GGUFQuantizationConfig` |
| `pre_quantized` | `bool` | `True` | Set internally; GGUF checkpoints are always pre-quantized |
| `modules_to_not_convert` | `List[str]` | `None` | Set internally; layers to exclude from quantization |

Source: [src/diffusers/quantizers/gguf/gguf_quantizer.py:40-45](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/gguf/gguf_quantizer.py)

#### Requirements

GGUF loading requires:
- `accelerate>=0.26.0`
- `gguf>=0.10.0`

```bash
pip install "accelerate>=0.26.0" "gguf>=0.10.0"
```
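
Before loading a GGUF checkpoint it can help to confirm that the installed packages satisfy these minimums. The following is a minimal sketch using only the standard library; `parse_version`, `meets_minimum`, and `installed_version` are hypothetical helpers, and the parsing deliberately ignores pre-release suffixes, so it is not a substitute for a full version parser.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.26.1' into (0, 26, 1); parsing stops at the first non-numeric part."""
    numbers = []
    for chunk in text.split("."):
        digits = ""
        for ch in chunk:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so '0.10.0' is newer than '0.9.5'."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Check the two requirements listed above
for package, minimum in [("accelerate", "0.26.0"), ("gguf", "0.10.0")]:
    found = installed_version(package)
    ok = found is not None and meets_minimum(found, minimum)
    print(f"{package}: installed={found}, satisfies >={minimum}: {ok}")
```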

### fp16 Variants

Many models offer fp16 (half-precision) variants optimized for inference:

```python
import torch
from diffusers import DiffusionPipeline

# Load fp16 variant explicitly
pipeline = DiffusionPipeline.from_pretrained(
    "model/name",
    variant="fp16",
    torch_dtype=torch.float16,
)
```

### Remote VAE

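Diffusers 0.33.0 introduced Remote VAE support, which offloads VAE decoding to a remote inference endpoint so that the VAE does not need to be held in local memory:

```python
from diffusers.utils.remote_utils import remote_decode

# `latent` comes from a pipeline run with output_type="latent"; the URL is a placeholder
image = remote_decode(
    endpoint="https://<your-endpoint>.endpoints.huggingface.cloud/",
    tensor=latent,
    scaling_factor=0.18215,
)
```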

Source: [Diffusers 0.33.0 Release Notes](https://github.com/huggingface/diffusers/releases/tag/v0.33.0)

---

## Training Optimizations

### Mixed Precision Training

Use bfloat16 for training stability with reduced memory:

```python
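# In training scripts
from accelerate import Accelerator

# Request bf16 autocast when the accelerator is created
accelerator = Accelerator(mixed_precision="bf16")

# `unet`, `latents`, `timesteps`, and `encoder_hidden_states` are defined
# elsewhere in the training script
with accelerator.autocast():
    noise_pred = unet(latents, timesteps, encoder_hidden_states).sample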
```

### Optimizer Choice

The library examples use `AdamW` with weight decay as the default:

```python
# From examples/controlnet/train_controlnet.py
optimizer = torch.optim.AdamW(
    controlnet.parameters(),
    lr=args.learning_rate,
    weight_decay=args.adam_weight_decay,
)
```

### Learning Rate Scheduling

```python
from diffusers.optimization import get_scheduler

lr_scheduler = get_scheduler(
    args.lr_scheduler,
    optimizer=optimizer,
    num_warmup_steps=args.lr_warmup_steps,
    num_training_steps=args.max_train_steps,
)
```
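
To make the effect of `num_warmup_steps` concrete, the following is a minimal sketch of a "constant with warmup" schedule: the multiplier ramps linearly from 0 to 1 during warmup and stays at 1 afterwards. The function name and the base learning rate are illustrative assumptions; this is not the library's implementation.

```python
def constant_with_warmup(step: int, num_warmup_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given step."""
    if step < num_warmup_steps:
        # Linear ramp from 0 to 1 over the warmup phase.
        return step / max(1, num_warmup_steps)
    return 1.0


base_lr = 1e-4  # hypothetical base learning rate
schedule = [base_lr * constant_with_warmup(s, num_warmup_steps=4) for s in range(6)]
print(schedule)
```

Warmup keeps the first updates small while optimizer statistics are still noisy, which is why the example scripts expose it as a command-line argument.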

### Exponential Moving Average (EMA)

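EMA keeps a smoothed copy of the model weights, which is often more stable than the raw weights and is typically the copy used for evaluation and export:

```python
from diffusers.training_utils import EMAModel

ema_model = EMAModel(
    controlnet.parameters(),
    decay=0.9999,
    use_ema_warmup=True,
)

# In the training loop, after each optimizer step
ema_model.step(controlnet.parameters())

# Before evaluation or saving, copy the averaged weights into the model
ema_model.copy_to(controlnet.parameters())
```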
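
The update rule behind EMA is short enough to show on plain Python lists. This is a minimal sketch assuming a fixed decay and no warmup; `ema_update` is a hypothetical helper, and a real implementation operates on tensors and typically adjusts the decay over time.

```python
from typing import List


def ema_update(shadow: List[float], params: List[float], decay: float) -> List[float]:
    """One EMA step: shadow <- decay * shadow + (1 - decay) * params."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


shadow = [0.0, 0.0]
for _ in range(3):
    # The "trained" weights are held constant here to make the drift visible.
    shadow = ema_update(shadow, params=[1.0, 2.0], decay=0.5)

print(shadow)
```

Each step moves the shadow weights a fraction `1 - decay` of the way toward the current weights, so a decay close to 1 (such as 0.9999) averages over many thousands of steps.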

---

## Model Loading Optimization

### Lazy Loading

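Diffusers uses lazy imports to reduce initial import time: submodules are imported only when one of their classes is first accessed. Loading itself is eager, so `from_pretrained` instantiates every component listed in the pipeline's `model_index.json`.

```python
# Importing a class triggers only the submodules it needs
from diffusers import DiffusionPipeline

# All pipeline components are loaded by this call
pipeline = DiffusionPipeline.from_pretrained("model/path")
```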

Source: [src/diffusers/pipelines/pipeline_loading_utils.py:50-80](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)

### Variant Loading

Load specific model variants (fp16, bf16, etc.) using the `variant` parameter:

```python
import torch
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    variant="fp16",  # Load fp16 weights
    torch_dtype=torch.float16,
)
```

### Subfolder Loading

Load components from specific subfolders:

```python
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "model/path",
    subfolder="unet",
    torch_dtype=torch.float16,
)
```

---

## Best Practices and Common Issues

### A1111 Scheduler Mapping

For users migrating from Automatic1111 WebUI, the following scheduler mappings apply:

| A1111 / K-Diffusion | Diffusers Scheduler | Notes |
|---------------------|---------------------|-------|
| DPM++ 2M | `DPMSolverMultistepScheduler` | Default settings (`algorithm_type="dpmsolver++"`, `solver_order=2`) |
| DPM++ 2M Karras | `DPMSolverMultistepScheduler` | Initialize with `use_karras_sigmas=True` |
| Euler | `EulerDiscreteScheduler` | Simple, fast |
| Euler A | `EulerAncestralDiscreteScheduler` | Ancestral sampling |

Source: [Issue #4167](https://github.com/huggingface/diffusers/issues/4167)
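
The mapping can be kept in code as a small lookup table. The sketch below uses the scheduler class names as exported by Diffusers (`DPMSolverMultistepScheduler`, `EulerDiscreteScheduler`, `EulerAncestralDiscreteScheduler`); the dictionary and the `resolve_scheduler` helper are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, Tuple

# A1111 sampler name -> (Diffusers scheduler class name, extra init kwargs)
A1111_TO_DIFFUSERS: Dict[str, Tuple[str, Dict[str, Any]]] = {
    "DPM++ 2M": ("DPMSolverMultistepScheduler", {}),
    "DPM++ 2M Karras": ("DPMSolverMultistepScheduler", {"use_karras_sigmas": True}),
    "Euler": ("EulerDiscreteScheduler", {}),
    "Euler A": ("EulerAncestralDiscreteScheduler", {}),
}


def resolve_scheduler(a1111_name: str) -> Tuple[str, Dict[str, Any]]:
    """Return the scheduler class name and kwargs for an A1111 sampler name."""
    try:
        return A1111_TO_DIFFUSERS[a1111_name]
    except KeyError:
        raise ValueError(f"No known mapping for sampler {a1111_name!r}") from None


class_name, extra_kwargs = resolve_scheduler("DPM++ 2M Karras")
# With diffusers installed, the result can be applied to a loaded pipeline:
#   scheduler_cls = getattr(diffusers, class_name)
#   pipeline.scheduler = scheduler_cls.from_config(pipeline.scheduler.config, **extra_kwargs)
```

Building the new scheduler with `from_config` from the existing scheduler's config keeps model-specific settings such as the beta schedule, while the extra keyword arguments override individual options.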

### Memory Troubleshooting

**Out of Memory Errors**

1. Enable CPU offloading:
```python
pipeline.enable_sequential_cpu_offload()
```

2. Reduce the output resolution (fewer inference steps save time, not memory)
3. Use smaller batch sizes
4. Enable memory-efficient attention

**Slow Generation**

1. Check attention backend selection
2. Enable `torch.compile` for PyTorch 2.0+
3. Use FasterCache on supported transformer-based pipelines

### GGUF Loading Failures

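Community issue #13683 reports that GGUF files sometimes fail to load. Ensure:

1. `accelerate>=0.26.0` is installed
2. `gguf>=0.10.0` is installed
3. The checkpoint is loaded with `from_single_file` on a model class that supports it, together with a `GGUFQuantizationConfig`; `DiffusionPipeline.from_pretrained` does not read `.gguf` files directly

The model class below is an example; use the class that matches the checkpoint:

```python
import torch
from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

transformer = FluxTransformer2DModel.from_single_file(
    "path/to/model.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
```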

---

## Performance Comparison

### Attention Backend Comparison

| Backend | Memory Usage | Speed | Quality |
|---------|--------------|-------|---------|
| Default AttnProcessor | High | Baseline | Reference |
| AttnProcessor2_0 (SDPA) | Medium | ~1.5x faster | Identical |
| XFormers | Low | ~2x faster | Identical |
| Flash Attention | Low | ~2-3x faster | Identical |

### Precision Comparison

| Precision | Memory | Speed | Quality Loss |
|-----------|--------|-------|--------------|
| float32 | 100% | Baseline | None |
| float16 | 50% | ~1.5x faster | Minimal |
| bfloat16 | 50% | ~1.5x faster | Minimal |
| int8 (GGUF Q8) | ~25% | Varies; weights are dequantized at compute time | Small |
| int4 (GGUF Q4) | ~12.5% | Varies; weights are dequantized at compute time | Noticeable |
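
The memory column follows directly from the number of bytes stored per parameter. The following is a small worked example, assuming a hypothetical 1-billion-parameter model and counting weight storage only (no activations, no quantization metadata such as scales):

```python
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params: int, precision: str) -> float:
    """Approximate weight storage in GiB, ignoring activations and overhead."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


def relative_to_float32(precision: str) -> float:
    """Fraction of float32 storage needed at the given precision."""
    return BYTES_PER_PARAM[precision] / BYTES_PER_PARAM["float32"]


# Hypothetical 1-billion-parameter model
sizes = {p: round(weight_memory_gib(1_000_000_000, p), 2) for p in BYTES_PER_PARAM}
print(sizes)
```

Real quantized checkpoints are slightly larger than this estimate because they also store per-block scales, which is why the table marks the int8 and int4 rows as approximate.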

---

## See Also

- [Loading Diffusion Models](https://huggingface.co/docs/diffusers/using-diffusers/loading) - Comprehensive guide to loading pipelines and models
- [Attention Processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py) - Attention processor implementations
- [Pipeline Architecture](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines) - Pipeline component documentation
- [Training Overview](https://huggingface.co/docs/diffusers/training/overview) - Training guides and examples
- [Memory Optimization Documentation](https://github.com/huggingface/diffusers/blob/main/docs/source/en/optimization/memory.md) - Official memory optimization guide

---

<a id='loaders-adapters'></a>

## Loaders & Adapters

### Related Pages

Related topics: [Quantization Guide](#quantization-guide), [System Architecture](#system-architecture)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/diffusers/loaders/__init__.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/__init__.py)
- [src/diffusers/loaders/lora_base.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/lora_base.py)
- [src/diffusers/loaders/lora_pipeline.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/lora_pipeline.py)
- [src/diffusers/loaders/peft.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/peft.py)
- [src/diffusers/loaders/single_file.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py)
- [src/diffusers/loaders/single_file_model.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file_model.py)
- [src/diffusers/loaders/textual_inversion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/textual_inversion.py)
- [src/diffusers/loaders/ip_adapter.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/ip_adapter.py)
- [src/diffusers/pipelines/pipeline_loading_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)
- [src/diffusers/modular_pipelines/modular_pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline_utils.py)
- [docs/source/en/using-diffusers/loading.md](https://github.com/huggingface/diffusers/blob/main/docs/source/en/using-diffusers/loading.md)
</details>

# Loaders & Adapters

This page documents the loading mechanisms and adapter systems in the Diffusers library. These components are responsible for importing pretrained models, checkpoints, and adapter weights into pipelines and model architectures.

## Overview

The Diffusers library provides a unified loading architecture that supports multiple model formats, checkpoint types, and adapter mechanisms. The `loaders` module (`src/diffusers/loaders/`) centralizes all loading functionality, enabling pipelines to dynamically import and configure model components at runtime.

```mermaid
graph TD
    A[Pipeline Loading Request] --> B{Model Type Detection}
    B -->|Standard HuggingFace| C[from_pretrained]
    B -->|Single File Checkpoint| D[from_single_file]
    B -->|LoRA Adapter| E[load_lora_weights]
    B -->|Textual Inversion| F[load_textual_inversion]
    B -->|IP Adapter| G[load_ip_adapter]
    B -->|PEFT Adapter| H[load_lora_adapter]
    
    C --> I[ModelMixin / PreTrainedModel]
    D --> J[FromOriginalModelMixin]
    E --> K[StableDiffusionLoraLoaderMixin]
    F --> L[TextualInversionLoaderMixin]
    G --> M[IPAdapterMixin]
    H --> N[PeftAdapterMixin]
    
    I --> O[Loaded Model / Pipeline]
    J --> O
    K --> O
    L --> O
    M --> O
    N --> O
```

## Loading Architecture

### Core Loading Components

The loading system is built on several key abstractions:

| Component | File | Purpose |
|-----------|------|---------|
| `FromOriginalModelMixin` | [single_file_model.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file_model.py) | Model-level mixin for loading checkpoints from original (single-file) formats |
| `LoraBaseMixin` | [lora_base.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/lora_base.py) | Shared LoRA utilities for pipelines (fusing, unfusing, activating, and deleting adapters) |
| `StableDiffusionLoraLoaderMixin` | [lora_pipeline.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/lora_pipeline.py) | LoRA weight loading for Stable Diffusion pipelines |
| `PeftAdapterMixin` | [peft.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/peft.py) | Model-level PEFT adapter loading and management |
| `TextualInversionLoaderMixin` | [textual_inversion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/textual_inversion.py) | Textual inversion embedding loading |
| `IPAdapterMixin` | [ip_adapter.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/ip_adapter.py) | Image Prompt adapter loading |
| `FromSingleFileMixin` | [single_file.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py) | Pipeline-level `from_single_file` loading |

Source: [src/diffusers/loaders/__init__.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/__init__.py)

### Model Type Detection

During loading, the system detects model types to determine the appropriate loading strategy:

```python
is_transformers_model = (
    is_transformers_available()
    and issubclass(class_obj, PreTrainedModel)
    and transformers_version >= version.parse("4.20.0")
)
is_diffusers_model = issubclass(class_obj, diffusers_module.ModelMixin)
is_diffusers_single_file_model = issubclass(class_obj, diffusers_module.FromOriginalModelMixin)
```

Source: [src/diffusers/loaders/single_file.py:1-100](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py)
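
The detection logic relies on `issubclass` checks against mixin classes. The following self-contained sketch reproduces that pattern with stand-in classes; `pick_loading_method` and the stand-ins are hypothetical and only mirror the idea, not the library's full decision tree (which also handles tokenizers, schedulers, and Transformers models).

```python
class ModelMixin:  # stand-in for diffusers.ModelMixin
    pass


class FromOriginalModelMixin:  # stand-in for the single-file mixin
    pass


class PlainModel(ModelMixin):
    pass


class SingleFileModel(ModelMixin, FromOriginalModelMixin):
    pass


def pick_loading_method(class_obj: type) -> str:
    """Choose a loader name based on the mixins a class inherits from."""
    # Check the more specific capability first.
    if issubclass(class_obj, FromOriginalModelMixin):
        return "from_single_file"
    if issubclass(class_obj, ModelMixin):
        return "from_pretrained"
    raise TypeError(f"{class_obj.__name__} is not a recognized model class")


print(pick_loading_method(SingleFileModel))
print(pick_loading_method(PlainModel))
```

Because capabilities are expressed as mixins, adding single-file support to a model class is a matter of inheritance; the dispatch code does not need a per-class table.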

## Single File Loading

Single file loading enables the import of pretrained checkpoints in formats other than the native Diffusers format. This is essential for loading models from other ecosystems or custom checkpoints.

### FromOriginalModelMixin

Models implementing `FromOriginalModelMixin` support loading from original checkpoint formats:

```python
if is_diffusers_single_file_model:
    load_method = getattr(class_obj, "from_single_file")
    
    loaded_sub_model = load_method(
        pretrained_model_link_or_path_or_dict=checkpoint,
        original_config=original_config,
        config=cached_model_config_path,
        subfolder=name,
        torch_dtype=torch_dtype,
        local_files_only=local_files_only,
        disable_mmap=disable_mmap,
        **kwargs,
    )
```

Source: [src/diffusers/loaders/single_file.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py)

### Supported Single File Formats

The single file loader supports multiple checkpoint formats:

| Format | Description | Notes |
|--------|-------------|-------|
| `.safetensors` | Safe tensors format | Memory-efficient, secure |
| `.bin` / `.pt` | PyTorch pickle format | Legacy compatibility |
| `.ckpt` | Generic checkpoint | Common for Stable Diffusion |

### Single File Loading Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `pretrained_model_link_or_path_or_dict` | `str \| dict` | Path or URL to checkpoint, or state dict |
| `original_config` | `str \| dict \| None` | Original model configuration |
| `config` | `str \| None` | Diffusers config path |
| `subfolder` | `str` | Subfolder path within checkpoint |
| `torch_dtype` | `torch.dtype` | Target data type |
| `local_files_only` | `bool` | Only load from local cache |
| `disable_mmap` | `bool` | Disable memory-mapped loading |

## LoRA (Low-Rank Adaptation)

LoRA enables efficient fine-tuning by adding small trainable matrices to existing model weights without modifying the base model.

### LoRA Loading Architecture

```mermaid
graph LR
    A[LoRA Checkpoint] --> B{LoraLoaderMixin}
    B --> C[State Dict Extraction]
    C --> D[Target Module Mapping]
    D --> E[Weight Fusion]
    E --> F[Adapted Model]
```

### Loading LoRA Weights

The `StableDiffusionLoraLoaderMixin` provides the `load_lora_weights` method, which is called on a pipeline instance:

```python
def load_lora_weights(self, pretrained_model_name_or_path_or_dict, adapter_name=None, **kwargs):
    """
    Load LoRA weights into pipeline components.
    
    Args:
        pretrained_model_name_or_path_or_dict: Path, Hugging Face model ID, or state dict
        adapter_name: Optional name for the adapter (for multiple LoRAs)
    """
```

Source: [src/diffusers/loaders/lora_pipeline.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/lora_pipeline.py)

### LoRA Pipeline Integration

Adapter management shared by the pipeline-level LoRA loaders lives in `LoraBaseMixin`; pipeline-specific mixins such as `StableDiffusionLoraLoaderMixin` implement `load_lora_weights` on top of it:

```python
class LoraBaseMixin:
    """Utility class for handling LoRAs in diffusion pipelines."""
    
    def load_lora_weights(self, **kwargs):
        """Implemented by the pipeline-specific loader mixins."""
        
    def unload_lora_weights(self):
        """Remove LoRA weights and restore original weights."""
        
    def set_adapters(self, adapter_names, adapter_weights=None):
        """Set active adapters with optional weighting."""
```

Source: [src/diffusers/loaders/lora_base.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/lora_base.py)

### Multiple LoRA Support

Diffusers supports loading multiple LoRA adapters simultaneously:

| Method | Description |
|--------|-------------|
| `load_lora_weights()` | Load with optional adapter name |
| `set_adapters()` | Activate specific adapters |
| `fuse_lora()` | Fuse adapters with custom weights |
| `unfuse_lora()` | Unfuse previously fused adapters |
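
What combining adapters with per-adapter weights means numerically can be shown with a small worked example. This is a minimal sketch assuming the plain LoRA update `W' = W + scale * (B @ A)` on nested Python lists. The matrices, adapter names, and `apply_lora` helper are hypothetical, and the rank-dependent `alpha / r` scaling used by real LoRA implementations is omitted.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight: Matrix, lora_a: Matrix, lora_b: Matrix, scale: float) -> Matrix:
    """Return weight + scale * (lora_b @ lora_a)."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# 2x2 base weight with two rank-1 adapters (hypothetical values)
base = [[1.0, 0.0], [0.0, 1.0]]
style_1 = {"A": [[1.0, 1.0]], "B": [[1.0], [0.0]]}
style_2 = {"A": [[0.0, 2.0]], "B": [[0.0], [1.0]]}

adapted = base
for adapter, weight in [(style_1, 0.5), (style_2, 0.25)]:
    adapted = apply_lora(adapted, adapter["A"], adapter["B"], scale=weight)

print(adapted)
```

Fusing corresponds to storing `adapted` in place of `base`; unfusing subtracts the same deltas again, which is why the base weights can be restored.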

### Flux Klein LoRA Loading

> **Note**: Diffusers v0.37.1 included fixes specifically for Flux Klein LoRA loading, addressing issues with type hints and model compatibility.
> 
> Source: [Release v0.37.1 - Fix Flux Klein LoRA loading #13313](https://github.com/huggingface/diffusers/releases/tag/v0.37.1)

## PEFT Integration

The `PeftAdapterMixin` adds PEFT (Parameter-Efficient Fine-Tuning) adapter loading and management to individual models:

```python
class PeftAdapterMixin:
    """Mixin for loading and managing PEFT adapters on a model."""
    
    def load_lora_adapter(
        self,
        pretrained_model_name_or_path_or_dict,
        prefix: str = "transformer",
        hotswap: bool = False,
        **kwargs,
    ):
        """Load a LoRA adapter into the model."""
```

Source: [src/diffusers/loaders/peft.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/peft.py)

### Supported PEFT Adapter Types

| Adapter Type | Description |
|--------------|-------------|
| `LORA` | Low-Rank Adaptation |
| `IA3` | Infused Adapter by Inhibiting and Amplifying Inner Activations |
| `LoHa` | Low-Rank Hadamard Product |
| `AdaLoRA` | Adaptive LoRA |
| `DoRA` | Weight-Decomposed Low-Rank Adaptation |

## Textual Inversion

Textual Inversion enables customizing the model's vocabulary through learned embeddings without modifying the base model.

### Loading Textual Inversion Embeddings

```python
class TextualInversionLoaderMixin:
    """Mixin for textual inversion embedding loading."""
    
    def load_textual_inversion(
        self,
        pretrained_model_name_or_path,
        token: Optional[str] = None,
        tokenizer=None,
        text_encoder=None,
        **kwargs
    ):
        """
        Load textual inversion embeddings.
        
        Args:
            pretrained_model_name_or_path: Path or model ID
            token: Optional token name for the embedding
            tokenizer: Tokenizer to extend; defaults to the pipeline's tokenizer
            text_encoder: Text encoder to extend; defaults to the pipeline's text encoder
        """
```

Source: [src/diffusers/loaders/textual_inversion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/textual_inversion.py)

### Textual Inversion File Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| SafeTensors | `.safetensors` | Recommended, secure |
| PyTorch | `.bin`, `.pt` | Legacy format |
| Automatic1111 | `.pt` | State dict with a `string_to_param` entry |
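
Conceptually, loading a textual inversion embedding adds one new token to the tokenizer vocabulary and appends one row to the text encoder's embedding table. The following is a minimal sketch of that bookkeeping on plain Python structures; `add_concept_token` and the toy vocabulary are hypothetical, and real embeddings have hundreds of dimensions.

```python
from typing import Dict, List


def add_concept_token(
    vocab: Dict[str, int],
    embeddings: List[List[float]],
    token: str,
    learned_vector: List[float],
) -> int:
    """Register `token` and append its learned embedding; return the new id."""
    if token in vocab:
        raise ValueError(f"Token {token!r} already exists in the vocabulary")
    if embeddings and len(learned_vector) != len(embeddings[0]):
        raise ValueError("Embedding dimension does not match the existing table")
    token_id = len(embeddings)
    vocab[token] = token_id
    embeddings.append(list(learned_vector))
    return token_id


vocab = {"a": 0, "photo": 1, "of": 2}
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
new_id = add_concept_token(vocab, embeddings, "<my-concept>", [0.9, -0.9])
print(new_id, vocab)
```

The base model's weights are untouched; only the vocabulary and the embedding table grow, which is why the same base checkpoint can host many independent concepts.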

## IP Adapter

IP Adapter enables image-based conditioning for generation, allowing reference images to guide the generation process.

### IP Adapter Loading

```python
class IPAdapterMixin:
    """Mixin for IP-Adapter loading."""
    
    def load_ip_adapter(
        self,
        pretrained_model_name_or_path_or_dict: Union[str, List[str], Dict[str, torch.Tensor]],
        subfolder: Union[str, List[str]],
        weight_name: Union[str, List[str]],
        image_encoder_folder: Optional[str] = "image_encoder",
        **kwargs
    ):
        """Load IP-Adapter weights and image encoders."""
```

Source: [src/diffusers/loaders/ip_adapter.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/ip_adapter.py)

### IP Adapter Components

| Component | Description |
|-----------|-------------|
| Image Encoder | Processes reference images |
| Image Projection | Maps encoded features to cross-attention space |
| Adapter Weights | Fine-tuned weights for image conditioning |

## Pipeline Loading Utilities

### Loading Process Flow

```mermaid
graph TD
    A[Pipeline.from_pretrained] --> B[Load model_index.json]
    B --> C{Component Type Detection}
    C -->|Diffusers Model| D[ModelMixin.from_pretrained]
    C -->|Transformers Model| E[PreTrainedModel.from_pretrained]
    C -->|Scheduler| F[SchedulerMixin.from_pretrained]
    C -->|Tokenizer| G[AutoTokenizer.from_pretrained]
    
    D --> H[Load config.json]
    E --> I[Load config.json]
    H --> J[Create model on meta device]
    I --> J
    
    J --> K[Load weights with accelerate]
    K --> L[Offload if needed]
    L --> M[Pipeline Ready]
```
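
The first step of this flow, reading `model_index.json`, can be sketched with the standard library. Each non-metadata entry maps a component name to a `[library, class_name]` pair, and keys starting with an underscore carry metadata. The JSON below is an illustrative example (including the version string), and `list_components` is a hypothetical helper.

```python
import json
from typing import Dict, List, Tuple

MODEL_INDEX = """
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.30.0",
  "scheduler": ["diffusers", "PNDMScheduler"],
  "text_encoder": ["transformers", "CLIPTextModel"],
  "tokenizer": ["transformers", "CLIPTokenizer"],
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
"""


def list_components(index_text: str) -> Dict[str, Tuple[str, str]]:
    """Map component name -> (library, class name), skipping metadata keys."""
    index = json.loads(index_text)
    return {
        name: (value[0], value[1])
        for name, value in index.items()
        if not name.startswith("_")
    }


components = list_components(MODEL_INDEX)

# Group component names by the library that provides their class
by_library: Dict[str, List[str]] = {}
for name, (library, _class_name) in components.items():
    by_library.setdefault(library, []).append(name)

print(by_library)
```

The library name decides which loading path a component takes, which corresponds to the "Component Type Detection" branch in the diagram.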

### Loading with Quantization

The pipeline loading system integrates with quantization configurations:

```python
if (
    quantization_config is not None
    and isinstance(quantization_config, PipelineQuantizationConfig)
    and issubclass(class_obj, torch.nn.Module)
):
    model_quant_config = quantization_config._resolve_quant_config(
        is_diffusers=is_diffusers_model, module_name=name
    )
    if model_quant_config is not None:
        loading_kwargs["quantization_config"] = model_quant_config
```

Source: [src/diffusers/pipelines/pipeline_loading_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)
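
The per-component resolution step can be pictured as follows. This is a minimal sketch under the assumption that a pipeline-level config holds one set of quantization settings plus an optional list of component names to quantize. `SketchPipelineQuantConfig` and its `resolve` method are hypothetical stand-ins and do not reproduce the library's `PipelineQuantizationConfig`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchPipelineQuantConfig:
    """Hypothetical stand-in; not the library's PipelineQuantizationConfig."""

    quant_kwargs: Dict[str, Any] = field(default_factory=dict)
    components_to_quantize: Optional[List[str]] = None

    def resolve(self, module_name: str) -> Optional[Dict[str, Any]]:
        # None means "load this component without quantization".
        if self.components_to_quantize is None:
            return dict(self.quant_kwargs)
        if module_name in self.components_to_quantize:
            return dict(self.quant_kwargs)
        return None


config = SketchPipelineQuantConfig(
    quant_kwargs={"load_in_4bit": True},
    components_to_quantize=["transformer"],
)

# Only components with a resolved config receive a `quantization_config` kwarg
loading_kwargs: Dict[str, Dict[str, Any]] = {}
for name in ["transformer", "vae"]:
    resolved = config.resolve(name)
    if resolved is not None:
        loading_kwargs[name] = {"quantization_config": resolved}

print(loading_kwargs)
```

Resolving per component keeps sensitive parts of a pipeline, such as the VAE, in full precision while the largest model is quantized.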

## Modular Pipeline Loading

Modular Pipelines (introduced in v0.37.0) provide a composable approach to pipeline construction using reusable blocks.

### Component Specification

Modular Pipelines use `ComponentSpec` to define loading parameters:

```python
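from dataclasses import dataclass
from typing import Optional, Type


@dataclass
class ComponentSpec:
    # Simplified excerpt; the library class defines additional fields
    name: Optional[str] = None
    type_hint: Optional[Type] = None  # component class, e.g. UNet2DConditionModel
    pretrained_model_name_or_path: Optional[str] = None
    subfolder: Optional[str] = None
    variant: Optional[str] = None
    revision: Optional[str] = None
```

Source: [src/diffusers/modular_pipelines/modular_pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline_utils.py)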

### Loading with AutoModel Type Hints

> **Note**: Diffusers v0.37.1 fixed loading issues with `ModularPipelines` that use `AutoModel` type hints in their `modular_model_index.json`.
>
> Source: [Release v0.37.1 - Fix for loading ModularPipelines with AutoModel type hints #13271](https://github.com/huggingface/diffusers/releases/tag/v0.37.1)

The loading process attempts `AutoModel.from_pretrained` when `type_hint` is `None`:

```python
if self.type_hint is None:
    try:
        component = AutoModel.from_pretrained(
            pretrained_model_name_or_path, **load_kwargs, **kwargs
        )
    except Exception as e:
        raise ValueError(f"Unable to load {self.name} without `type_hint`: {e}")
    self.type_hint = component.__class__
```

Source: [src/diffusers/modular_pipelines/modular_pipeline_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline_utils.py)
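
The fallback pattern in this excerpt (try an automatic loader, wrap any failure in a `ValueError`, then remember the resolved class) can be exercised in isolation. The following sketch uses hypothetical stand-ins (`SketchSpec`, `DummyUNet`, `failing_loader`) and does not call any Diffusers API.

```python
from typing import Callable, Optional


class SketchSpec:
    """Hypothetical stand-in for a component spec with an optional type hint."""

    def __init__(self, name: str, type_hint: Optional[type] = None) -> None:
        self.name = name
        self.type_hint = type_hint

    def load(self, auto_loader: Callable[[], object]) -> object:
        if self.type_hint is not None:
            # A known class is instantiated directly.
            return self.type_hint()
        try:
            component = auto_loader()
        except Exception as e:
            raise ValueError(f"Unable to load {self.name} without `type_hint`: {e}") from e
        # Remember the resolved class so later loads skip the fallback.
        self.type_hint = component.__class__
        return component


class DummyUNet:
    pass


def failing_loader() -> object:
    raise RuntimeError("class could not be resolved")


spec = SketchSpec("unet")
component = spec.load(auto_loader=DummyUNet)
print(type(component).__name__, spec.type_hint.__name__)
```

Wrapping the original exception keeps the component name in the error message, which makes it clear which entry needs an explicit `type_hint`.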

## Common Usage Patterns

### Loading a Standard Pipeline

```python
import torch
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True,
)
```

### Loading with LoRA

```python
from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"
)

pipeline.load_lora_weights("path/to/lora_weights")

# Generate with LoRA
prompt = "a photo of an astronaut riding a horse"
image = pipeline(prompt).images[0]
```

### Loading Multiple Adapters

```python
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5"
)

# Load multiple LoRA adapters
pipeline.load_lora_weights("adapter_1", adapter_name="style_1")
pipeline.load_lora_weights("adapter_2", adapter_name="style_2")

# Activate both adapters with different weights
pipeline.set_adapters(["style_1", "style_2"], adapter_weights=[0.7, 0.3])
```

### Loading Textual Inversion

```python
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5"
)

pipeline.load_textual_inversion(
    "path/to/textual_inversion",
    token="my-concept"
)

image = pipeline("a photo of my-concept").images[0]
```

## Configuration Options

### Loading Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `cache_dir` | `str` | `~/.cache/huggingface/` | Cache directory for downloaded models |
| `torch_dtype` | `torch.dtype` | `None` | Override default dtype |
| `use_safetensors` | `bool` | `None` | If `None`, `.safetensors` weights are used when available; `True` requires them |
| `variant` | `str` | `None` | Model variant (e.g., "fp16") |
| `revision` | `str` | `None` | Git revision to load |
| `device_map` | `str \| dict` | `None` | Device mapping strategy |
| `max_memory` | `dict` | `None` | Memory limits per device |
| `offload_folder` | `str` | `None` | Folder for offloaded weights |
| `local_files_only` | `bool` | `False` | Only use local files |

### LoRA-Specific Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `adapter_name` | `str` | Name for the loaded adapter |
| `weight_name` | `str` | File name of the LoRA weights inside the repository or folder |
| `lora_scale` | `float` | Scaling factor applied when fusing with `fuse_lora()` |

## Common Issues and Troubleshooting

### Single File Loading Failures

**Issue**: Custom models or GGUF files fail to load

> Community discussion: [Issue #13683 - Universal method or class to load any model locally](https://github.com/huggingface/diffusers/issues/13683)
>
> Many custom models fail to load due to limited `.from_single_file` availability across model classes.

**Solutions**:
1. Verify the model class implements `FromOriginalModelMixin`
2. Provide an original config file when available
3. Consider converting to standard Diffusers format

### Type Hint Requirements

When using Modular Pipelines:
- Ensure `modular_model_index.json` includes proper `type_hint` fields
- For unknown types, provide `type_hint` explicitly or ensure AutoModel can resolve the class

### Version Compatibility

| Feature | Minimum Diffusers Version |
|---------|---------------------------|
| Modular Pipelines | 0.37.0 |
| Flux Klein LoRA fixes | 0.37.1 |
| PEFT integration | 0.33.0+ |
| IP Adapter | 0.31.0+ |

## Architecture Principles

According to the Diffusers philosophy ([PHILOSOPHY.md](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md)):

1. **Extensibility**: Loaders should be designed to be easily extendable to future changes
2. **Composability**: Adapter systems should support mixing multiple techniques
3. **Backward Compatibility**: Loading mechanisms maintain compatibility across versions
4. **Clear Error Messages**: Loading failures provide actionable error information

## See Also

- [Loading Documentation](https://huggingface.co/docs/diffusers/using-diffusers/loading) - Official guide on loading models
- [LoRA Training](https://huggingface.co/docs/diffusers/training/lora) - Training with LoRA adapters
- [Textual Inversion](https://huggingface.co/docs/diffusers/training/text_inversion) - Custom concept training
- [Modular Pipelines](https://huggingface.co/docs/diffusers/using-diffusers/modular_pipelines) - Composable pipeline blocks
- [Optimization Guide](https://huggingface.co/docs/diffusers/optimization/fp16) - Memory and performance optimization

---

<a id='quantization-guide'></a>

## Quantization Guide

### Related Pages

Related topics: [Optimization Guide](#optimization-guide), [Loaders & Adapters](#loaders-adapters)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/diffusers/quantizers/__init__.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/__init__.py)
- [src/diffusers/quantizers/gguf/gguf_quantizer.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/gguf/gguf_quantizer.py)
- [src/diffusers/quantizers/bitsandbytes/bnb_quantizer.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/bitsandbytes/bnb_quantizer.py)
- [src/diffusers/quantizers/quanto/quanto_quantizer.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/quanto/quanto_quantizer.py)
- [src/diffusers/quantizers/torchao/torchao_quantizer.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/torchao/torchao_quantizer.py)
- [src/diffusers/quantizers/pipe_quant_config.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/pipe_quant_config.py)
- [src/diffusers/pipelines/pipeline_loading_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)
- [src/diffusers/loaders/single_file.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py)
</details>

# Quantization Guide

This page provides comprehensive documentation on quantization support in the Diffusers library. Quantization reduces model memory footprint and computational requirements by representing model weights in lower precision formats, enabling deployment of large diffusion models on resource-constrained hardware.

## Overview

The Diffusers library implements a modular quantization framework that supports multiple quantization backends. This architecture allows users to load quantized models from the Hugging Face Hub or quantize models on-the-fly during loading. The quantization system is designed to be backend-agnostic while providing backend-specific optimizations.

Quantization in Diffusers serves two primary purposes:

1. **Memory Reduction**: Reduce VRAM requirements for loading and running diffusion models
2. **Runtime Optimization**: Accelerate inference through optimized low-precision computations

The library currently supports four major quantization backends: GGUF, BitsAndBytes, Quanto, and TorchAO. Each backend offers different trade-offs between compression ratio, inference speed, and quality preservation.

## Architecture

### Quantization System Components

The quantization framework follows a modular architecture with a base class hierarchy and backend-specific implementations:

```mermaid
graph TD
    A[DiffusionPipeline] --> B[PipelineQuantizationConfig]
    B --> C[DiffusersQuantizer Base Class]
    C --> D[GGUFQuantizer]
    C --> E[BnB4BitDiffusersQuantizer / BnB8BitDiffusersQuantizer]
    C --> F[QuantoQuantizer]
    C --> G[TorchAoHfQuantizer]
    
    H[Model Loading] --> I[ModelMixin]
    I --> C
    J[Single File Loading] --> K[FromOriginalModelMixin]
    K --> C
```

### Quantization Flow

```mermaid
sequenceDiagram
    participant User
    participant Pipeline
    participant QuantConfig
    participant Quantizer
    participant Model
    
    User->>Pipeline: from_pretrained(quantization_config)
    Pipeline->>QuantConfig: Validate quantization config
    QuantConfig->>Quantizer: Create backend-specific quantizer
    Pipeline->>Model: Load with quantizer
    Model->>Quantizer: Apply quantization to weights
    Quantizer-->>Model: Quantized model ready
    Model-->>Pipeline: Pipeline ready for inference
```

## Supported Quantization Backends

### GGUF Quantization

GGUF (GPT-Generated Unified Format) is designed for loading pre-quantized models, particularly those from the llama.cpp ecosystem. The GGUF quantizer handles models that have been quantized externally and stored in the GGUF format.

**Key Characteristics:**

- Supports various quantization types (Q4_K, Q5_K, Q8_0, etc.)
- Memory-mapped file loading for efficient memory usage
- Compatible with models converted from original formats

**Source:** [src/diffusers/quantizers/gguf/gguf_quantizer.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/gguf/gguf_quantizer.py)

The GGUF quantizer is constructed from a `GGUFQuantizationConfig` and exposes the following attributes:

| Parameter | Type | Description |
|-----------|------|-------------|
| `quantization_config` | `GGUFQuantizationConfig` | Configuration for GGUF quantization |
| `modules_to_not_convert` | `List[str]` | Module names to exclude from quantization |
| `compute_dtype` | `torch.dtype` | Computation data type |
| `pre_quantized` | `bool` | Whether the model is pre-quantized |

**Important Dependencies:**

GGUF loading requires `accelerate>=0.26.0` and the `gguf` package. These are validated during environment checks in `validate_environment()`.

```python
def validate_environment(self, *args, **kwargs):
    if not is_accelerate_available() or is_accelerate_version("<", "0.26.0"):
        raise ImportError(
            "Loading GGUF Parameters requires `accelerate` installed in your environment: "
            "`pip install 'accelerate>=0.26.0'`"
        )
```

**Source:** [src/diffusers/quantizers/gguf/gguf_quantizer.py:30-37](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/gguf/gguf_quantizer.py)
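
The same kind of gate can be reproduced in your own scripts when you want to fail early, before any weights are downloaded. The sketch below is illustrative only: `numeric_parts`, `meets_minimum`, and `check_requirement` are hypothetical helpers, not Diffusers functions, and the version parsing deliberately ignores pre-release suffixes.

```python
import re
from importlib import metadata


def numeric_parts(version: str) -> tuple:
    """Turn '0.26.0' into (0, 26, 0); stop at the first non-numeric chunk."""
    parts = []
    for chunk in version.split("."):
        match = re.match(r"\d+", chunk)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when `installed` is at least `minimum`."""
    a, b = numeric_parts(installed), numeric_parts(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.26' and '0.26.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirement(package: str, minimum: str) -> None:
    """Raise ImportError if `package` is missing or older than `minimum`."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        raise ImportError(
            f"`{package}` is not installed: pip install '{package}>={minimum}'"
        ) from None
    if not meets_minimum(installed, minimum):
        raise ImportError(
            f"`{package}` {installed} is too old: pip install '{package}>={minimum}'"
        )
```

Calling `check_requirement("accelerate", "0.26.0")` at the top of a script would then raise the same kind of `ImportError` the quantizer does, just earlier.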

### BitsAndBytes Quantization

BitsAndBytes (bnb) quantizes weights on the fly as the model loads. It offers 8-bit and 4-bit modes; in 4-bit mode the quantization data type can be FP4 or NF4 (4-bit NormalFloat).

**Key Characteristics:**

- On-the-fly quantization during loading
- 4-bit (NF4 or FP4) and 8-bit (int8) modes
- Supports `keep_in_fp32_modules` for sensitive layers
- Compatible with QLoRA fine-tuning workflows

**Source:** [src/diffusers/quantizers/bitsandbytes/bnb_quantizer.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/bitsandbytes/bnb_quantizer.py)

### Quanto Quantization

Quanto (distributed as the `optimum-quanto` package) provides a PyTorch-native quantization backend with support for several weight schemes, including int2, int4, and int8.

**Key Characteristics:**

- Pure PyTorch implementation
- Supports int2, int4, int8 quantization
- Good compatibility with existing PyTorch workflows
- No additional C++ dependencies required

**Source:** [src/diffusers/quantizers/quanto/quanto_quantizer.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/quanto/quanto_quantizer.py)

### TorchAO Quantization

TorchAO (`torchao`) is a PyTorch-native quantization library that provides hardware-optimized quantization kernels.

**Key Characteristics:**

- PyTorch native backend
- Optimized kernel support
- Integration with torch.compile for additional speedups
- Supports both dynamic and static quantization

**Source:** [src/diffusers/quantizers/torchao/torchao_quantizer.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/torchao/torchao_quantizer.py)

## Configuration

### PipelineQuantizationConfig

The `PipelineQuantizationConfig` class provides a unified interface for configuring quantization across different backends. It handles backend-specific configuration resolution and validation.

**Source:** [src/diffusers/quantizers/pipe_quant_config.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/quantizers/pipe_quant_config.py)

### Quantization Configuration Parameters

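| Parameter | Type | Backend | Description |
|-----------|------|---------|-------------|
| `quant_backend` | `str` | all | Backend selected in `PipelineQuantizationConfig`, e.g. `bitsandbytes_4bit`, `bitsandbytes_8bit`, `quanto`, `torchao` |
| `quant_kwargs` | `dict` | all | Keyword arguments forwarded to the selected backend's config class |
| `components_to_quantize` | `List[str]` | all | Pipeline components to quantize with `quant_backend` |
| `quant_mapping` | `dict` | all | Per-component config objects, used instead of `quant_backend` |
| `load_in_4bit` | `bool` | bnb | Load model weights in 4-bit precision |
| `load_in_8bit` | `bool` | bnb | Load model weights in 8-bit precision |
| `bnb_4bit_compute_dtype` | `torch.dtype` | bnb | Computation dtype for 4-bit BitsAndBytes layers |
| `bnb_4bit_quant_type` | `str` | bnb | 4-bit quantization type (`fp4` or `nf4`) |
| `bnb_4bit_use_double_quant` | `bool` | bnb | Enable double (nested) quantization |
| `compute_dtype` | `torch.dtype` | gguf | Compute dtype for GGUF weights; the quantization type itself is read from the file |
| `modules_to_not_convert` | `List[str]` | quanto, torchao | Modules to exclude from quantization |
| `torch_dtype` | `torch.dtype` | all | Loading argument rather than a config field: dtype for non-quantized weights |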

### Loading Quantized Models

#### Loading GGUF Models

```python
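import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# GGUF checkpoints are loaded per model with `from_single_file`; the quantization
# type (Q4_K, Q8_0, ...) is read from the file, so only `compute_dtype` is set here.
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"
transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# Pass the quantized transformer to the pipeline that supplies the other components.
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)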
```

#### Loading with BitsAndBytes

```python
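import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Pipelines take a `PipelineQuantizationConfig`; the bitsandbytes options go in `quant_kwargs`.
quantization_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_compute_dtype": torch.float16,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,
    },
    components_to_quantize=["transformer"],  # adjust to the pipeline's component names
)

pipeline = DiffusionPipeline.from_pretrained(
    "model/path",
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
)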
```

**Source:** [src/diffusers/pipelines/pipeline_loading_utils.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)

## Pipeline Integration

### Model Loading with Quantization

When a pipeline is loaded with a quantization configuration, the helper functions in `pipeline_loading_utils.py` pass the matching per-component config to each sub-model loader. The loading flow follows these steps:

```mermaid
graph LR
    A[from_pretrained] --> B{Is Quantized?}
    B -->|Yes| C[Get Quantizer]
    B -->|No| D[Load Normal]
    C --> E{Quantizer Type?}
    E -->|GGUF| F[Use from_single_file]
    E -->|Other| G[Use from_pretrained]
    F --> H[Apply Quantization]
    G --> H
    H --> I[Return Quantized Model]
    D --> I
```

**Source:** [src/diffusers/loaders/single_file.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py)

The loading process determines the appropriate loading method based on the model type:

```python
is_diffusers_single_file_model = issubclass(class_obj, diffusers_module.FromOriginalModelMixin)
is_diffusers_model = issubclass(class_obj, diffusers_module.ModelMixin)

if is_diffusers_single_file_model:
    load_method = getattr(class_obj, "from_single_file")
    # ...
    loaded_sub_model = load_method(
        pretrained_model_link_or_path_or_dict=checkpoint,
        original_config=original_config,
        config=cached_model_config_path,
        subfolder=name,
        torch_dtype=torch_dtype,
        local_files_only=local_files_only,
        disable_mmap=disable_mmap,
        **kwargs,
    )
```

**Source:** [src/diffusers/loaders/single_file.py:40-55](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py)
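
The `issubclass` dispatch in the excerpt can be illustrated without the library. In this sketch `SingleFileMixin`, `BaseModel`, `Transformer`, `Scheduler`, and `pick_load_method` are stand-ins invented for the example; only the pattern of checking for the mixin first mirrors the excerpt, and the fallback to `from_pretrained` is an assumption made to keep the example small.

```python
class SingleFileMixin:
    """Stand-in for a mixin that adds single-file loading."""

    @classmethod
    def from_single_file(cls, path: str) -> str:
        return f"{cls.__name__} loaded from single file {path}"


class BaseModel:
    """Stand-in for a base model class with folder-style loading."""

    @classmethod
    def from_pretrained(cls, path: str) -> str:
        return f"{cls.__name__} loaded from pretrained folder {path}"


class Transformer(SingleFileMixin, BaseModel):
    """Supports both loading styles."""


class Scheduler(BaseModel):
    """Supports folder-style loading only."""


def pick_load_method(class_obj):
    """Prefer `from_single_file` when the class inherits the mixin."""
    if issubclass(class_obj, SingleFileMixin):
        return getattr(class_obj, "from_single_file")
    return getattr(class_obj, "from_pretrained")


print(pick_load_method(Transformer)("model.gguf"))
print(pick_load_method(Scheduler)("model/path"))
```

Checking the class rather than the file extension keeps the decision tied to what each component can actually load.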

### Single File Loading

For GGUF and other single-file model formats, the `from_single_file` method handles the complete loading process. This is particularly important for quantized models that bundle all weights in a single file.

**Source:** [src/diffusers/loaders/single_file.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/single_file.py)

### Quantization Resolution in Pipelines

The pipeline quantization configuration is resolved at load time:

```python
if (
    quantization_config is not None
    and isinstance(quantization_config, PipelineQuantizationConfig)
    and issubclass(class_obj, torch.nn.Module)
):
    model_quant_config = quantization_config._resolve_quant_config(
        is_diffusers=is_diffusers_model, module_name=name
    )
    if model_quant_config is not None:
        loading_kwargs["quantization_config"] = model_quant_config
```

**Source:** [src/diffusers/pipelines/pipeline_loading_utils.py:120-129](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py)
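
To make the per-component resolution concrete, here is a simplified stand-in. `ToyPipelineQuantConfig` and its `resolve` method are hypothetical and only mimic the idea that each component name is looked up and receives either a config or `None`; the real `_resolve_quant_config` also takes `is_diffusers` into account, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToyPipelineQuantConfig:
    """Hypothetical, simplified stand-in for a pipeline-level quantization config."""

    # Explicit per-component configs, keyed by component name.
    quant_mapping: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # Shared options applied to every component listed in `components_to_quantize`.
    quant_kwargs: Dict[str, Any] = field(default_factory=dict)
    components_to_quantize: List[str] = field(default_factory=list)

    def resolve(self, module_name: str) -> Optional[Dict[str, Any]]:
        """Return the config for one component, or None to load it unquantized."""
        if module_name in self.quant_mapping:
            return self.quant_mapping[module_name]
        if module_name in self.components_to_quantize:
            return dict(self.quant_kwargs)
        return None


config = ToyPipelineQuantConfig(
    quant_kwargs={"load_in_4bit": True},
    components_to_quantize=["transformer"],
)

# Mirror the excerpt: only components with a resolved config get the extra kwarg.
loading_kwargs = {}
for name in ["transformer", "vae"]:
    resolved = config.resolve(name)
    if resolved is not None:
        loading_kwargs[name] = {"quantization_config": resolved}

print(loading_kwargs)
```

Components that resolve to `None`, such as `vae` here, are loaded without a quantization config.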

## Common Usage Patterns

### Memory-Constrained Inference

For running large models on GPUs with limited VRAM:

```python
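import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Quantize the UNet, the largest SDXL component, to 4-bit with bitsandbytes.
quantization_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={"load_in_4bit": True, "bnb_4bit_compute_dtype": torch.float16},
    components_to_quantize=["unet"],
)

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
).to("cuda")

# Generate an image
image = pipeline(prompt="a beautiful landscape").images[0]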
```
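
A rough back-of-the-envelope estimate helps decide which bit width to try. The sketch below counts weight storage only; it ignores activations, quantization metadata such as scales, and any layers kept in higher precision, so treat the results as lower bounds. The `weight_memory_gib` helper is hypothetical, and the 2.6 billion parameter figure is an illustrative assumption, not a measured model size.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in GiB for a given bit width."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


num_params = 2_600_000_000  # illustrative assumption

for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gib(num_params, bits):.2f} GiB")
```

Halving the bit width halves the weight footprint, which is why 4-bit loading needs roughly a quarter of the fp16 weight memory.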

### Loading Pre-Quantized GGUF Models

```python
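import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Load a pre-quantized GGUF transformer from a local file (placeholder path);
# the model class must match the architecture stored in the checkpoint.
transformer = FluxTransformer2DModel.from_single_file(
    "quantized/model/path/model-Q4_K_M.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipeline.enable_model_cpu_offload()  # keep idle components on the CPU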
```

### Mixed Quantization

Apply different quantization levels to different components:

```python
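import torch
from diffusers import BitsAndBytesConfig, DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Quantize only the UNet to 4-bit; components missing from `quant_mapping`
# (such as the VAE) are loaded unquantized.
quantization_config = PipelineQuantizationConfig(
    quant_mapping={
        "unet": BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16),
    }
)

pipeline = DiffusionPipeline.from_pretrained(
    "model/path",
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
)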
```

## Troubleshooting

### Common Issues and Solutions

| Issue | Cause | Solution |
|-------|-------|----------|
| ImportError for `accelerate` | Missing dependency for GGUF | `pip install 'accelerate>=0.26.0'` |
| Memory errors during loading | Model too large for GPU | Use 4-bit quantization or CPU offloading |
| Slow inference with quantized model | Dequantization overhead at runtime | Try `torch.compile` or a backend with optimized kernels, such as TorchAO |
| Config mismatch errors | Incompatible quantization config | Verify backend-specific requirements |
| MMAP errors | Memory-mapped file issues | Pass `disable_mmap=True` to the loading call |

### Environment Requirements

Different quantization backends have specific dependencies:

| Backend | Minimum Dependencies |
|---------|----------------------|
| GGUF | `accelerate>=0.26.0`, `gguf` |
| BitsAndBytes | `bitsandbytes>=0.43.3`, `accelerate>=0.26.0` |
| Quanto | `optimum-quanto`, `accelerate` |
| TorchAO | `torchao`, `torch` |
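
Before choosing a backend, it can be handy to see which ones are usable in the current environment. The following is a minimal sketch using only the standard library; `BACKEND_MODULES`, `is_importable`, and `missing_modules` are hypothetical names, and the import names in the mapping (for example `optimum.quanto`) are assumptions based on the usual package layouts. It checks presence only, not versions.

```python
from importlib.util import find_spec

# Import names are assumptions; adjust them if a package is laid out differently.
BACKEND_MODULES = {
    "GGUF": ["accelerate", "gguf"],
    "BitsAndBytes": ["bitsandbytes"],
    "Quanto": ["optimum.quanto"],
    "TorchAO": ["torch", "torchao"],
}


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located without fully importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. `optimum`) is missing.
        return False


def missing_modules(backend: str) -> list:
    """List the modules a backend needs that are not installed."""
    return [name for name in BACKEND_MODULES[backend] if not is_importable(name)]


for backend in BACKEND_MODULES:
    missing = missing_modules(backend)
    status = "ready" if not missing else f"missing {', '.join(missing)}"
    print(f"{backend}: {status}")
```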

### Version Compatibility

Recent releases that can affect loading and running quantized models:

- **v0.37.0+**: Modular Diffusers and multiple core library improvements
- **v0.35.2+**: Fixes for `transformers` models and imports
- **v0.33.0+**: Memory optimizations and caching methods

**Source:** [README.md](https://github.com/huggingface/diffusers/blob/main/README.md)

## Design Philosophy

The quantization system in Diffusers follows the library's core design principles:

1. **Modularity**: Each quantizer is a self-contained class inheriting from `DiffusersQuantizer`
2. **Composability**: Quantization configs can be applied at pipeline or individual component level
3. **Backward Compatibility**: Default settings preserve maximum precision
4. **Extensibility**: New backends can be added by implementing the base quantizer interface

**Source:** [PHILOSOPHY.md](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md)

As with PyTorch's `Module` class, models expose their complexity rather than hiding it, and they raise clear error messages when a quantization configuration is invalid. Defaults keep full precision; quantization is applied only when explicitly requested.
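
The modularity and extensibility points can be pictured with a small abstract base class and a registry. Everything in this sketch is hypothetical: `ToyQuantizer`, `quantize_weight`, `register`, and `REGISTRY` are illustration names and do not reflect the actual `DiffusersQuantizer` interface, which should be checked in the source.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class ToyQuantizer(ABC):
    """Hypothetical base interface; not the real DiffusersQuantizer."""

    @abstractmethod
    def quantize_weight(self, values: List[float]) -> List[int]:
        """Map floating-point weights to integers."""


REGISTRY: Dict[str, Type[ToyQuantizer]] = {}


def register(name: str):
    """Class decorator that makes a backend discoverable by name."""

    def decorator(cls: Type[ToyQuantizer]) -> Type[ToyQuantizer]:
        REGISTRY[name] = cls
        return cls

    return decorator


@register("toy_int8")
class ToyInt8Quantizer(ToyQuantizer):
    """Symmetric rounding to the int8 range with one scale per tensor."""

    def quantize_weight(self, values: List[float]) -> List[int]:
        max_abs = max(abs(v) for v in values) or 1.0
        return [round(v * 127 / max_abs) for v in values]


quantizer = REGISTRY["toy_int8"]()
print(quantizer.quantize_weight([0.5, -1.0, 0.25]))
```

Adding another backend in this toy setup means writing one more subclass and registering it, without touching the existing ones.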

## See Also

- [Loading Diffusion Models](https://huggingface.co/docs/diffusers/using-diffusers/loading) - General model loading documentation
- [Optimization Guide](https://huggingface.co/docs/diffusers/optimization/fp16) - Memory and speed optimization techniques
- [Modular Pipelines](https://huggingface.co/docs/diffusers/using-diffusers/modular_pipelines) - Composable pipeline architecture
- [GGUF Quantization](https://huggingface.co/docs/diffusers/quantization/gguf) - Detailed GGUF format documentation
- [Quantization Overview](https://huggingface.co/docs/diffusers/quantization/overview) - Complete quantization documentation
- [Training with Quantization](https://huggingface.co/docs/diffusers/training/overview) - Training quantized models

---

## Pitfall Log

Project: huggingface/diffusers

Summary: Found 24 structured pitfall item(s), including 4 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_a9d989818ab840c6985e6c0c41830e87 | https://github.com/huggingface/diffusers/issues/13401

## 2. Installation risk - Installation risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_190402547a6a441bb4f046b278c04a7f | https://github.com/huggingface/diffusers/issues/13683

## 3. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_fedc9c5b4dc2486aa7ed13053f2050af | https://github.com/huggingface/diffusers/issues/13772

## 4. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_d70cffdb7188481fb8e1e7e5a84539bb | https://github.com/huggingface/diffusers/issues/13844

## 5. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_e2c183459b644dfe88a28ce288693dc1 | https://github.com/huggingface/diffusers/issues/13762

## 6. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more
- User impact: Upgrade or migration may change expected behavior: Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more. Context: Observed when using python
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_e8d17ffbe5fa1785fea2871516925453 | https://github.com/huggingface/diffusers/releases/tag/v0.35.0

## 7. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: llada2 model/pipeline review
- User impact: Developers may misconfigure credentials, environment, or host setup: llada2 model/pipeline review
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: llada2 model/pipeline review. Context: Observed when using python
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_b0fdcc0ebf367379b87fcad2dd642011 | https://github.com/huggingface/diffusers/issues/13598

## 8. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: universal method or class to load any model locally
- User impact: Developers may misconfigure credentials, environment, or host setup: universal method or class to load any model locally
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: universal method or class to load any model locally. Context: Observed when using python
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_8132f9310793351811bea343d379b680 | https://github.com/huggingface/diffusers/issues/13683

## 9. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | github_repo:498011141 | https://github.com/huggingface/diffusers

## 10. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this migration risk before relying on the project: Diffusers 0.36.0: Pipelines galore, new caching method, training scripts, and more 🎄
- User impact: Upgrade or migration may change expected behavior: Diffusers 0.36.0: Pipelines galore, new caching method, training scripts, and more 🎄
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: Diffusers 0.36.0: Pipelines galore, new caching method, training scripts, and more 🎄. Context: Observed when using python, cuda
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_fa85fd2586df0265d3c51e0547f8f9a5 | https://github.com/huggingface/diffusers/releases/tag/v0.36.0

## 11. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:498011141 | https://github.com/huggingface/diffusers

## 12. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | github_repo:498011141 | https://github.com/huggingface/diffusers

## 13. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | github_repo:498011141 | https://github.com/huggingface/diffusers

## 14. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_7d913ba503ef40d4b21e8c1333da45e7 | https://github.com/huggingface/diffusers/issues/13598

## 15. Capability evidence risk - Capability evidence risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this capability risk before relying on the project: FluxKlein Training Scripts - CFG issue
- User impact: Developers may hit a documented source-backed failure mode: FluxKlein Training Scripts - CFG issue
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: FluxKlein Training Scripts - CFG issue. Context: Observed when using python
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_b470a15abb735667f2a2922021e011c5 | https://github.com/huggingface/diffusers/issues/13762

## 16. Runtime risk - Runtime risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this performance risk before relying on the project: Diffusers 0.33.0: New Image and Video Models, Memory Optimizations, Caching Methods, Remote VAEs, New Training Scripts, and more
- User impact: Upgrade or migration may change expected behavior: Diffusers 0.33.0: New Image and Video Models, Memory Optimizations, Caching Methods, Remote VAEs, New Training Scripts, and more
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: Diffusers 0.33.0: New Image and Video Models, Memory Optimizations, Caching Methods, Remote VAEs, New Training Scripts, and more. Context: Observed when using python
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_d763adcd6d4ede5242447044a7574865 | https://github.com/huggingface/diffusers/releases/tag/v0.33.0

## 17. Runtime risk - Runtime risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this performance risk before relying on the project: 🐞 fixes for `transformers` models, imports,
- User impact: Upgrade or migration may change expected behavior: 🐞 fixes for `transformers` models, imports,
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: 🐞 fixes for `transformers` models, imports,. Context: Source discussion did not expose a precise runtime context.
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_7f4820a78f24615780e1268ed8d8635c | https://github.com/huggingface/diffusers/releases/tag/v0.35.2

## 18. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:498011141 | https://github.com/huggingface/diffusers

## 19. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Suggested check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:498011141 | https://github.com/huggingface/diffusers

## 20. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this maintenance risk before relying on the project: Diffusers 0.34.0: New Image and Video Models, Better torch.compile Support, and more
- User impact: Upgrade or migration may change expected behavior: Diffusers 0.34.0: New Image and Video Models, Better torch.compile Support, and more
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: Diffusers 0.34.0: New Image and Video Models, Better torch.compile Support, and more. Context: Observed when using python, cuda
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_60f82a49d02d4d6d625125e6e84d4870 | https://github.com/huggingface/diffusers/releases/tag/v0.34.0

## 21. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this maintenance risk before relying on the project: Diffusers 0.37.0: Modular Diffusers, New image and video pipelines, multiple core library improvements, and more 🔥
- User impact: Upgrade or migration may change expected behavior: Diffusers 0.37.0: Modular Diffusers, New image and video pipelines, multiple core library improvements, and more 🔥
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: Diffusers 0.37.0: Modular Diffusers, New image and video pipelines, multiple core library improvements, and more 🔥. Context: Observed when using python
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_695dcddd384eac023272e66b39460d68 | https://github.com/huggingface/diffusers/releases/tag/v0.37.0

## 22. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this maintenance risk before relying on the project: Diffusers 0.38.0: New image and audio pipelines, Core library improvements, and more
- User impact: Upgrade or migration may change expected behavior: Diffusers 0.38.0: New image and audio pipelines, Core library improvements, and more
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: Diffusers 0.38.0: New image and audio pipelines, Core library improvements, and more. Context: Observed when using python
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_2b3387e7ee4b4da3756de41a18fa3917 | https://github.com/huggingface/diffusers/releases/tag/v0.38.0

## 23. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this maintenance risk before relying on the project: Fixes for AutoModel type hints in Modular Pipelines and Flux Klein LoRA loading
- User impact: Upgrade or migration may change expected behavior: Fixes for AutoModel type hints in Modular Pipelines and Flux Klein LoRA loading
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: Fixes for AutoModel type hints in Modular Pipelines and Flux Klein LoRA loading. Context: Observed when using python
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_74385080afed2c71332330aef25a47cf | https://github.com/huggingface/diffusers/releases/tag/v0.37.1

## 24. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: Developers should check this maintenance risk before relying on the project: v0.33.1: fix ftfy import
- User impact: Upgrade or migration may change expected behavior: v0.33.1: fix ftfy import
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: v0.33.1: fix ftfy import. Context: Observed when using python
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_f5f4ba2237878ce924ac3697471a109e | https://github.com/huggingface/diffusers/releases/tag/v0.33.1

<!-- canonical_name: huggingface/diffusers; human_manual_source: deepwiki_human_wiki -->