Doramagic Project Pack · Human Manual

diffusers

Getting Started with Diffusers

Related topics: System Architecture, Pipelines Overview

Diffusers is a state-of-the-art library for diffusion models, providing researchers and practitioners with modular, flexible, and efficient tools for image, audio, and video generation. This page serves as a comprehensive guide for getting started with Diffusers, covering installation, core concepts, model loading, and common usage patterns.

Overview

Diffusers serves as a modular toolbox for pretrained diffusion models. According to the project philosophy, the library embraces the following design principles (Source: PHILOSOPHY.md):

  • Reusability: Pipelines should be self-contained and reusable
  • Composability: Smaller building blocks like attention.py, resnet.py, and embeddings.py should be composable
  • Flexibility: Models should expose complexity and give clear error messages
  • Performance: Models can be optimized without major code changes while maintaining backward compatibility

The library supports a wide range of tasks including text-to-image, image-to-image, inpainting, video generation, and more. Recent releases (v0.33.0 through v0.38.0) have introduced numerous new pipelines including Wan 2.1/2.2, Flux variants, LLaDA2, and specialized ControlNet implementations.

Installation

Basic Installation

To install the latest stable version of Diffusers:

pip install diffusers

To install Diffusers together with PyTorch (recommended):

pip install "diffusers[torch]"

Installing from Source

For the latest features and example scripts, install from source:

git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .

Source: examples/README.md

Example-Specific Dependencies

Training scripts and community examples may require additional dependencies:

cd examples  # Navigate to the specific example folder
pip install -r requirements.txt

Important: Example scripts frequently depend on the latest library version. Always install from source to ensure compatibility.

Core Concepts

Understanding Diffusers requires familiarity with three fundamental building blocks: Pipelines, Models, and Schedulers.

Architectural Overview

graph TD
    A[User Input] --> B[DiffusionPipeline]
    B --> C[Models]
    B --> D[Schedulers]
    B --> E[Tokenizers/Processors]
    C --> F[UNet2D / Transformer2D]
    C --> G[VAE]
    D --> H[Noise Schedule]
    F --> I[Latent Space]
    G --> J[Generated Output]
    style B fill:#e1f5fe
    style C fill:#fff3e0
    style D fill:#e8f5e8

Pipelines

Pipelines are the high-level API that orchestrates the entire diffusion process. They combine models, schedulers, and optional components like tokenizers or control networks into a cohesive inference workflow.

Key pipeline characteristics (Source: src/diffusers/pipelines/pipeline_utils.py):

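| Pipeline Type | Description | Typical Use Case |
| --- | --- | --- |
| DiffusionPipeline | Base pipeline class | Custom implementations |
| StableDiffusionPipeline | SD 1.x text-to-image | General image generation |
| StableDiffusionXLPipeline | SDXL optimized | High-quality image generation |
| StableDiffusionControlNetPipeline | With ControlNet | Controlled generation |
| AutoPipelineForText2Image (and the other AutoPipelineFor... classes) | Selects the pipeline class for a task from the checkpoint | Flexible pipeline selection |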

Models

Diffusers models are PyTorch modules that inherit from ModelMixin and ConfigMixin. They are designed to be:

  • Composable from smaller building blocks
  • Configurable with clear parameter handling
  • Optimizable for memory and compute efficiency

Source: PHILOSOPHY.md

Common model architectures include:

| Model | Description | Location |
| --- | --- | --- |
| UNet2DConditionModel | Conditional UNet for text-to-image | src/diffusers/models/unets/ |
| AutoencoderKL | VAE for latent operations | src/diffusers/models/autoencoders/ |
| Transformer2DModel | Transformer-based diffusion | src/diffusers/models/transformers/ |
| ControlNetModel | ControlNet conditioning | src/diffusers/models/controlnets/ |

Schedulers

Schedulers implement various diffusion sampling strategies. The library supports numerous scheduling algorithms:

| Scheduler | A1111 Equivalent | Characteristics |
| --- | --- | --- |
| DDPMScheduler | DDPM | High-quality, many steps |
| DDIMScheduler | DDIM | Fast convergence |
| DPMSolverMultistepScheduler | DPM++ 2M | Fast, good quality |
| EulerDiscreteScheduler | Euler | Simple, fast |
| EulerAncestralDiscreteScheduler | Euler a | Ancestral sampling |
| UniPCMultistepScheduler | UniPC | Very fast convergence |

Source: github.com/huggingface/diffusers/issues/4167
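
The mapping above lends itself to a small lookup table. The sketch below is illustrative only: the helper name `resolve_scheduler` and the convention that a trailing " Karras" in the sampler name turns on `use_karras_sigmas=True` are assumptions made for this example, not part of the library. It returns the Diffusers class name plus any extra keyword arguments to pass along when the scheduler is built; whether a given scheduler accepts `use_karras_sigmas` still has to be checked against that scheduler's documentation.

```python
# Rows copied from the scheduler table above: A1111 name -> Diffusers class name.
A1111_TO_DIFFUSERS = {
    "DDPM": "DDPMScheduler",
    "DDIM": "DDIMScheduler",
    "DPM++ 2M": "DPMSolverMultistepScheduler",
    "Euler": "EulerDiscreteScheduler",
    "Euler a": "EulerAncestralDiscreteScheduler",
    "UniPC": "UniPCMultistepScheduler",
}

KARRAS_SUFFIX = " Karras"  # assumed naming convention for Karras variants


def resolve_scheduler(a1111_name):
    """Hypothetical helper: return (class_name, extra_kwargs) for an A1111 sampler name."""
    kwargs = {}
    base_name = a1111_name
    if a1111_name.endswith(KARRAS_SUFFIX):
        base_name = a1111_name[: -len(KARRAS_SUFFIX)]
        kwargs["use_karras_sigmas"] = True
    if base_name not in A1111_TO_DIFFUSERS:
        raise KeyError(f"No known Diffusers equivalent for {a1111_name!r}")
    return A1111_TO_DIFFUSERS[base_name], kwargs


print(resolve_scheduler("DPM++ 2M Karras"))
```

The returned class name can then be looked up on the `diffusers` module and instantiated with `from_config`, as in the scheduler-swapping example later in this page.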

Loading Models and Pipelines

The library provides multiple ways to load models and pipelines, addressing common community needs around universal model loading.

The DiffusionPipeline is the recommended entry point for loading pretrained models:

import torch
from diffusers import DiffusionPipeline

# Load from Hugging Face Hub
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True
)

# Move to GPU
pipeline = pipeline.to("cuda")

Source: src/diffusers/pipelines/pipeline_loading_utils.py

Using AutoModel Classes

For loading individual model components, use the AutoModel classes:

import torch
from diffusers import AutoModel

# Load a model; the concrete class is resolved from the checkpoint's config
model = AutoModel.from_pretrained(
    "path/to/model",
    torch_dtype=torch.float16,
    variant="fp16"
)

The AutoModel class determines the appropriate model class from the configuration:

# Source: src/diffusers/models/auto_model.py
if "_class_name" in config:
    class_name = config["_class_name"]
    library = "diffusers"
elif "model_type" in config:
    class_name = "AutoModel"
    library = "transformers"

Source: src/diffusers/models/auto_model.py
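
To make the branch logic concrete, here is a minimal standalone sketch that applies the same two checks to a plain dictionary. The function name `resolve_model_class` is hypothetical, and the error raised for a config with neither key is an assumption, since the quoted excerpt does not show that case.

```python
def resolve_model_class(config):
    """Return (library, class_name) using the same key checks as the excerpt above."""
    if "_class_name" in config:
        # Diffusers configs name the concrete class directly.
        return "diffusers", config["_class_name"]
    if "model_type" in config:
        # Transformers configs are delegated to that library's AutoModel.
        return "transformers", "AutoModel"
    # Assumption: the excerpt does not cover this case, so the sketch fails loudly.
    raise ValueError("config has neither '_class_name' nor 'model_type'")


print(resolve_model_class({"_class_name": "UNet2DConditionModel"}))
print(resolve_model_class({"model_type": "clip_text_model"}))
```

Note that `_class_name` takes precedence when both keys are present, because it is checked first.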

Loading Single-File Checkpoints

For custom models stored in single checkpoint files (including GGUF formats in supported models):

# SomeModelClass is a placeholder: substitute a model class that supports from_single_file
from diffusers import SomeModelClass

# Load from a single checkpoint file
model = SomeModelClass.from_single_file(
    "path/to/checkpoint.safetensors",
    config="path/to/config.json"  # Optional: provide config
)

Note: The from_single_file method is available on models that inherit from FromOriginalModelMixin. Source: src/diffusers/loaders/single_file.py

The loading logic determines the appropriate method:

# Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py
load_method = (
    getattr(self.type_hint, "from_single_file")
    if is_single_file
    else getattr(self.type_hint, "from_pretrained")
)

Loading with Trust Remote Code

Some models require executing custom code from the repository:

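from diffusers import DiffusionPipeline

# Only enable this for repositories whose code you have reviewed and trust
pipeline = DiffusionPipeline.from_pretrained(
    "some/model-with-custom-code",
    trust_remote_code=True
)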

When trust_remote_code=True is not set and custom code is detected, the library raises:

ValueError: The repository for {pretrained_model_name_or_path} contains custom code 
which must be executed to correctly load the model.

Source: src/diffusers/utils/dynamic_modules_utils.py
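
The gate described above can be sketched as a small check that runs before any repository code is imported. This is an illustration of the behaviour, not the library's implementation: the function name `check_custom_code` and its `has_custom_code` argument are hypothetical, and only the error message is taken from the text above.

```python
def check_custom_code(repo_id, has_custom_code, trust_remote_code=False):
    """Hypothetical gate: return True if custom code may run, raise if it is blocked."""
    if has_custom_code and not trust_remote_code:
        raise ValueError(
            f"The repository for {repo_id} contains custom code "
            "which must be executed to correctly load the model."
        )
    # Custom code runs only when it exists and the caller opted in.
    return has_custom_code and trust_remote_code


try:
    check_custom_code("some/model-with-custom-code", has_custom_code=True)
except ValueError as err:
    print(f"blocked: {err}")
```

Keeping the default at `False` means a caller has to opt in explicitly before code from a repository is executed.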

Basic Usage Patterns

Text-to-Image Generation

import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt).images[0]
image.save("output.png")

Image-to-Image Generation

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# The input image is the starting point that the prompt then modifies
init_image = load_image("path/to/input.jpg").resize((768, 768))
image = pipe(prompt="modern art style", image=init_image).images[0]

Controlled Generation with ControlNet

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# Load controlnet and pipeline
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny",
    torch_dtype=torch.float16
)
pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")

# Prepare control image; it must match the ControlNet type
# (for this checkpoint, a Canny edge map)
prompt = "your prompt"
control_image = load_image("path/to/control.jpg")

image = pipeline(prompt, image=control_image).images[0]

Using Schedulers

Schedulers can be swapped for the same pipeline:

import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)

# Replace default scheduler with DPM++ 2M Karras
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config,
    use_karras_sigmas=True,
    algorithm_type="dpmsolver++"
)

Modular Pipelines

Introduced in v0.37.0, Modular Pipelines allow composing pipelines from reusable building blocks:

graph LR
    A[Transformer] --> B[ModularPipeline]
    C[VAE] --> B
    D[Scheduler] --> B
    E[Text Encoder] --> B
    F[Input] --> B
    B --> G[Output]

Creating Modular Pipelines

Modular pipelines are defined with a modular_model_index.json that specifies component types and loading hints:

# Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py
# Components can be loaded with or without type hints
if self.type_hint is None:
    component = AutoModel.from_pretrained(pretrained_model_name_or_path, **load_kwargs, **kwargs)
else:
    load_method = (
        getattr(self.type_hint, "from_single_file")
        if is_single_file
        else getattr(self.type_hint, "from_pretrained")
    )
    component = load_method(pretrained_model_name_or_path, **load_kwargs, **kwargs)
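
The dispatch in the excerpt above can be exercised without any model weights by using stand-in classes. In the sketch below, `load_component`, `FakeModel`, and `FakeAutoModel` are hypothetical names introduced for illustration; the real classes load weights, whereas these stand-ins only report which method was called.

```python
class FakeModel:
    """Stand-in for a concrete model class that supports both loading paths."""

    @classmethod
    def from_pretrained(cls, path, **kwargs):
        return (cls.__name__, "from_pretrained", path)

    @classmethod
    def from_single_file(cls, path, **kwargs):
        return (cls.__name__, "from_single_file", path)


class FakeAutoModel(FakeModel):
    """Stand-in for the fallback used when no type hint is given."""


def load_component(path, type_hint=None, is_single_file=False, **kwargs):
    """Pick the loader the same way as the excerpt: by type hint, then by file layout."""
    if type_hint is None:
        return FakeAutoModel.from_pretrained(path, **kwargs)
    method_name = "from_single_file" if is_single_file else "from_pretrained"
    load_method = getattr(type_hint, method_name)
    return load_method(path, **kwargs)


print(load_component("repo/unet"))
print(load_component("repo/unet", type_hint=FakeModel))
print(load_component("model.safetensors", type_hint=FakeModel, is_single_file=True))
```

Looking the method up with `getattr` keeps the caller independent of the concrete class, at the cost of an `AttributeError` if the hinted class lacks the requested loader.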

Community Scripts

The community contributes additional pipeline implementations and utilities through community scripts:

| Example | Description |
| --- | --- |
| IP-Adapter Negative Noise | Using negative noise with IP-Adapter for better control |
| Asymmetric Tiling | Configure seamless image tiling for X and Y axes independently |
| Prompt Scheduling Callback | Dynamic prompt modification during generation |

Source: examples/community/README_community_scripts.md

Using Community Scripts

# Load a community pipeline
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",  # name of a pipeline file in examples/community
    use_safetensors=True
)

Important: Community scripts are maintained by contributors. If a community script doesn't work as expected, please open an issue and ping the author.

Training Scripts

Diffusers provides training scripts for various tasks:

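| Script | Location | Use Case |
| --- | --- | --- |
| train_unconditional.py | examples/unconditional_image_generation/ | Unconditional image generation |
| train_controlnet.py | examples/controlnet/ | ControlNet training |
| train_dreambooth.py | examples/dreambooth/ | DreamBooth personalization |
| train_text_to_image_lora.py | examples/text_to_image/ | LoRA fine-tuning |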

Source: examples/README.md

ControlNet Training Example

import torch
from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    DDPMScheduler,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)
from diffusers.optimization import get_scheduler

# Initialize models
controlnet = ControlNetModel.from_pretrained(
    "path/to/controlnet",
    torch_dtype=torch.float16
)

# Build a pipeline around the ControlNet (for example, to generate validation images)
pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
)

Source: examples/controlnet/train_controlnet.py

Common Configuration Options

Pipeline Loading Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pretrained_model_name_or_path | str | Required | Model identifier or local path |
| torch_dtype | torch.dtype | None | Data type for model weights |
| variant | str | None | Model variant (e.g., "fp16", "onnx") |
| use_safetensors | bool | None | Use SafeTensors format if available |
| local_files_only | bool | False | Only use local files |
| force_download | bool | False | Force download even if cached |
| cache_dir | str | None | Custom cache directory |
| token | str | None | Hugging Face API token |
| revision | str | None | Git revision |
| trust_remote_code | bool | False | Execute remote code |

Device Placement

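# Move entire pipeline to device
pipeline = pipeline.to("cuda")

# Or move individual components
pipeline.unet = pipeline.unet.to("cuda")
# Caution: a component and its inputs must be on the same device when it runs,
# so a VAE kept on the CPU needs its latents moved to the CPU before decoding
pipeline.vae = pipeline.vae.to("cpu")  # Keep the VAE off the GPU to save memory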

Common Issues and Solutions

Model Loading Failures

Issue: Models fail to load with config mismatch errors.

Solution: Check that model components are compatible. Use use_safetensors=True and verify the model card for requirements.

Memory Optimization

Issue: Out of memory errors during inference.

Solutions:

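# Enable model CPU offloading (whole components move to the GPU only while in use)
pipeline.enable_model_cpu_offload()

# Or, as an alternative that saves more memory but runs slower,
# enable sequential CPU offloading
pipeline.enable_sequential_cpu_offload()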

# Use attention slicing
pipeline.enable_attention_slicing()

# Enable VAE tiling for large images
pipeline.enable_vae_tiling()
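
One way to keep these switches in a single place is a small helper that applies a chosen combination. The sketch below is an assumption-laden illustration: `apply_memory_savers` and `RecordingPipeline` are hypothetical names, the stand-in class exists only so the example runs without downloading a model, and the two offloading modes are treated as alternatives on the assumption that only one is wanted at a time. Only the four `enable_*` method names come from the list above.

```python
def apply_memory_savers(pipeline, offload="model", attention_slicing=True, vae_tiling=False):
    """Hypothetical helper: call the chosen memory-saving switches and report them."""
    applied = []
    if offload == "model":
        pipeline.enable_model_cpu_offload()
        applied.append("enable_model_cpu_offload")
    elif offload == "sequential":
        pipeline.enable_sequential_cpu_offload()
        applied.append("enable_sequential_cpu_offload")
    elif offload is not None:
        raise ValueError(f"unknown offload mode: {offload!r}")
    if attention_slicing:
        pipeline.enable_attention_slicing()
        applied.append("enable_attention_slicing")
    if vae_tiling:
        pipeline.enable_vae_tiling()
        applied.append("enable_vae_tiling")
    return applied


class RecordingPipeline:
    """Stand-in that records calls instead of changing a real pipeline."""

    def __init__(self):
        self.calls = []

    def enable_model_cpu_offload(self):
        self.calls.append("enable_model_cpu_offload")

    def enable_sequential_cpu_offload(self):
        self.calls.append("enable_sequential_cpu_offload")

    def enable_attention_slicing(self):
        self.calls.append("enable_attention_slicing")

    def enable_vae_tiling(self):
        self.calls.append("enable_vae_tiling")


demo = RecordingPipeline()
print(apply_memory_savers(demo, offload="sequential", vae_tiling=True))
```

With a real pipeline, the same helper would be called with the loaded pipeline object in place of `RecordingPipeline()`.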

Custom Model Loading

Issue: Community request for universal model loading (see Issue #13683).

Approach: For custom models or GGUF files, verify if from_single_file method is available on the model's class. If not, consider using the base model class with appropriate configuration.

# Universal loading attempt pattern
# SomeModelClass is a placeholder for a model class that supports from_single_file
from diffusers import AutoModel

try:
    model = AutoModel.from_pretrained("path/to/model")
except Exception:
    # Fallback to single file loading if supported
    model = SomeModelClass.from_single_file("path/to/checkpoint")

Scheduler Compatibility

Issue: Scheduler mapping confusion between A1111 and Diffusers (see Issue #4167).

Solution: Use the scheduler mapping table to find equivalent schedulers. Karras variants have use_karras_sigmas=True.

See Also

Source: https://github.com/huggingface/diffusers / Human Manual

System Architecture

Related topics: Pipelines Overview, Loaders & Adapters

Overview

The Hugging Face Diffusers library provides a modular, flexible architecture for diffusion-based generative models. The system is designed around composable building blocks that enable both inference and training across image, video, audio, and text generation tasks. The architecture emphasizes separation of concerns between models (the neural network weights), schedulers (the sampling algorithms), and pipelines (the orchestration layer that combines components).

Source: PHILOSOPHY.md:1-50

High-Level Architecture

The Diffusers library follows a layered architectural approach with three primary abstractions:

graph TD
    A[User Code] --> B[Pipeline Layer]
    B --> C[Model Layer]
    B --> D[Scheduler Layer]
    C --> E[Transformer/UNet]
    C --> F[VAE/Encoder-Decoder]
    C --> G[Text Encoder]
    D --> H[Scheduler Implementations]
    
    style B fill:#e1f5fe
    style C fill:#fff3e0
    style D fill:#e8f5e9

Core Abstractions

| Layer | Purpose | Key Classes |
| --- | --- | --- |
| Pipeline | Orchestration and end-to-end workflows | DiffusionPipeline, StableDiffusionPipeline |
| Model | Neural network architectures | ModelMixin, ConfigMixin, AutoModel |
| Scheduler | Diffusion sampling algorithms | SchedulerMixin, various scheduler implementations |

Source: src/diffusers/pipelines/pipeline_utils.py:1-100

Model Architecture

Design Philosophy

Models in Diffusers are designed to expose complexity while providing clear error messages, following principles inspired by PyTorch's Module class. The architecture prioritizes modularity and extensibility, using smaller building blocks rather than monolithic model files.

Key principles from the project philosophy:

  • Models make use of smaller building blocks such as attention.py, resnet.py, and embeddings.py
  • Models do not follow the single-file policy used in Transformers
  • All models inherit from ModelMixin and ConfigMixin
  • Models should by default have the highest precision and lowest performance setting
  • New model checkpoints should adapt existing architectures when possible

Source: PHILOSOPHY.md:1-30

ModelMixin and ConfigMixin

All Diffusers models inherit from two base classes:

# From src/diffusers/models/modeling_utils.py (conceptual)
class ModelMixin:
    """Base class for all Diffusers models."""
    
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
        """Load a pretrained model."""
        pass
    
    def save_pretrained(self, save_directory):
        """Save a model to a directory."""
        pass

class ConfigMixin:
    """Base class for configuration classes."""
    
    @classmethod
    def from_config(cls, config, **kwargs):
        """Create a model from a configuration."""
        pass
    
    def save_config(self, save_directory):
        """Save configuration to a directory."""
        pass

These base classes provide consistent serialization and deserialization patterns across all model types.

AutoModel System

The AutoModel system provides automatic model discovery and loading based on model configuration. It resolves model classes from configuration files and supports both Diffusers-native and Transformers models.

# From src/diffusers/models/auto_model.py
class AutoModel:
    @classmethod
    def from_config(cls, config, **kwargs):
        # Determines the appropriate model class from config
        # Supports _class_name for Diffusers models
        # Supports model_type for Transformers models
        pass
    
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
        # Loads pretrained weights
        pass

The AutoModel system checks configuration for either _class_name (for Diffusers models) or model_type (for Transformers models) to determine the appropriate class to instantiate.

Source: src/diffusers/models/auto_model.py:1-80

Pipeline Architecture

DiffusionPipeline

The DiffusionPipeline serves as the main entry point for inference. It orchestrates the loading and connection of multiple components:

graph LR
    A[Config/Index] --> B[DiffusionPipeline]
    B --> C[UNet2DConditionModel]
    B --> D[AutoencoderKL]
    B --> E[Text Encoder]
    B --> F[Tokenizer]
    B --> G[Scheduler]

The pipeline handles:

  1. Component discovery from configuration files
  2. Model loading with appropriate device placement
  3. Scheduler integration and timestep management
  4. End-to-end generation workflows

Source: src/diffusers/pipelines/pipeline_utils.py:100-200

Pipeline Loading Mechanisms

The library supports multiple model loading strategies:

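| Loading Method | Use Case | Key Parameter |
| --- | --- | --- |
| from_pretrained() | Standard Hugging Face Hub models | pretrained_model_name_or_path |
| from_single_file() | Single checkpoint files (CKPT, Safetensors) | pretrained_model_link_or_path (pipelines), pretrained_model_link_or_path_or_dict (models) |
| AutoModel | Auto-detection of model types | Configuration-based |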

Source: src/diffusers/pipelines/pipeline_loading_utils.py:1-80

Single File Loading

The from_single_file method enables loading models from single checkpoint files. This is particularly important for community models and custom checkpoints that may not follow the standard directory structure.

# From src/diffusers/loaders/single_file.py
class FromOriginalModelMixin:
    @classmethod
    def from_single_file(
        cls,
        pretrained_model_link_or_path_or_dict,
        original_config=None,
        config=None,
        **kwargs
    ):
        """Load a model from a single checkpoint file."""
        pass

The single file loader:

  • Detects model type from checkpoint structure
  • Optionally applies original configuration files
  • Supports GGUF quantized models

Source: src/diffusers/loaders/single_file.py:1-100

Model Type Detection

When loading models, Diffusers determines the appropriate loading strategy:

# From src/diffusers/pipelines/pipeline_loading_utils.py
is_transformers_model = (
    is_transformers_available()
    and issubclass(class_obj, PreTrainedModel)
    and transformers_version >= version.parse("4.20.0")
)

is_diffusers_model = issubclass(class_obj, diffusers_module.ModelMixin)
is_diffusers_single_file_model = issubclass(class_obj, diffusers_module.FromOriginalModelMixin)

This detection determines whether to use Transformers-style loading, Diffusers-native loading, or single-file loading.

Source: src/diffusers/pipelines/pipeline_loading_utils.py:20-50
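
The three flags are plain `issubclass` checks, so their behaviour can be shown with stand-in base classes. In this sketch the classes `PreTrainedModel`, `ModelMixin`, and `FromOriginalModelMixin` are empty local placeholders for the real ones, `describe_class` is a hypothetical helper, and the Transformers version check from the excerpt is left out for brevity.

```python
class PreTrainedModel:
    """Placeholder for transformers.PreTrainedModel."""


class ModelMixin:
    """Placeholder for diffusers.ModelMixin."""


class FromOriginalModelMixin:
    """Placeholder for diffusers' single-file loading mixin."""


def describe_class(class_obj, transformers_available=True):
    """Compute the same three flags as the excerpt (version check omitted)."""
    return {
        "is_transformers_model": transformers_available
        and issubclass(class_obj, PreTrainedModel),
        "is_diffusers_model": issubclass(class_obj, ModelMixin),
        "is_diffusers_single_file_model": issubclass(class_obj, FromOriginalModelMixin),
    }


class ToyUNet(ModelMixin, FromOriginalModelMixin):
    """Toy Diffusers-style model that also supports single-file loading."""


class ToyTextEncoder(PreTrainedModel):
    """Toy Transformers-style model."""


print(describe_class(ToyUNet))
print(describe_class(ToyTextEncoder))
```

A class can set more than one flag at once, as `ToyUNet` does, which is why the flags are computed independently rather than as a single category.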

Modular Diffusers

Introduced in Diffusers 0.37.0, Modular Diffusers provides a new way to build pipelines by composing reusable blocks. Instead of writing entire pipelines from scratch, developers can mix and match building blocks to create custom workflows.

Source: Diffusers 0.37.0 Release Notes

ModularPipeline Components

graph TD
    A[ModularPipeline] --> B[Transformer2DModel]
    A --> C[VAE]
    A --> D[TextEncoder]
    A --> E[Scheduler]
    B --> F[Attention]
    B --> G[ResNet]
    F --> H[Embeddings]

The modular system uses type hints to determine the correct loading method for each component:

# From src/diffusers/modular_pipelines/modular_pipeline_utils.py
load_method = (
    getattr(self.type_hint, "from_single_file")
    if is_single_file
    else getattr(self.type_hint, "from_pretrained")
)

Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py:1-80

Scheduler System

SchedulerMixin Base Class

All schedulers inherit from SchedulerMixin, which provides a common interface for:

  • Setting timesteps
  • Scaling model inputs
  • Computing denoised images
  • Stepping through the diffusion process
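
The call order implied by this interface can be sketched with a toy scheduler. Everything below is a stand-in written for illustration: `ToyScheduler` and `toy_denoiser` are hypothetical, the update rule is deliberately simplistic and is not a real sampling algorithm, and samples are plain floats rather than tensors. One known difference from the library is that a real scheduler's `step` returns an output object whose `prev_sample` attribute holds the updated sample, whereas the toy returns the value directly.

```python
class ToyScheduler:
    """Toy stand-in that shows the call order only; the math is NOT a real sampler."""

    def __init__(self, num_train_timesteps=1000):
        self.num_train_timesteps = num_train_timesteps
        self.timesteps = []

    def set_timesteps(self, num_inference_steps):
        # Evenly spaced, descending timesteps (1000 train steps, 4 steps -> 750, 500, 250, 0).
        stride = self.num_train_timesteps // num_inference_steps
        self.timesteps = list(range(self.num_train_timesteps - stride, -1, -stride))

    def scale_model_input(self, sample, timestep):
        # Identity in this toy; some real schedulers rescale the input per timestep.
        return sample

    def step(self, model_output, timestep, sample):
        # Toy update: remove an equal share of the predicted noise at every step.
        return sample - model_output / len(self.timesteps)


def toy_denoiser(sample, timestep):
    """Hypothetical model that treats the whole sample as noise."""
    return sample


scheduler = ToyScheduler()
scheduler.set_timesteps(4)

sample = 1.0
for t in scheduler.timesteps:
    model_input = scheduler.scale_model_input(sample, t)
    noise_pred = toy_denoiser(model_input, t)
    sample = scheduler.step(noise_pred, t, sample)

print(scheduler.timesteps, sample)
```

The loop body is the part that carries over to real pipelines: scale the input, run the model, then let the scheduler compute the next sample.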

The scheduler system implements various diffusion sampling algorithms including:

| Scheduler | Description | Use Case |
| --- | --- | --- |
| DDPMScheduler | Denoising Diffusion Probabilistic Models | Training and sampling |
| DDIMScheduler | Denoising Diffusion Implicit Models | Fast sampling |
| PNDMScheduler | Pseudo Numerical Methods | Balanced speed/quality |
| LMSDiscreteScheduler | Linear Multistep Scheduler | Alternative timestepping |
| EulerDiscreteScheduler | Euler method | Simple, fast |
| EulerAncestralDiscreteScheduler | Euler with ancestral sampling | Diverse outputs |
| KarrasDiffusionSchedulers | Enum listing the schedulers that can be swapped for one another (not itself a scheduler) | Checking scheduler compatibility |

Source: src/diffusers/schedulers/__init__.py

Scheduler-Pipeline Coupling

Schedulers are loosely coupled with pipelines, allowing users to swap schedulers to experiment with different sampling strategies:

from diffusers import StableDiffusionPipeline, DDIMScheduler

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

Quantization Support

GGUF Quantization

Diffusers supports loading GGUF-quantized models through the GGUFQuantizer class. This enables efficient inference on reduced precision models.

# From src/diffusers/quantizers/gguf/gguf_quantizer.py
class GGUFQuantizer(DiffusersQuantizer):
    use_keep_in_fp32_modules = True
    
    def __init__(self, quantization_config, **kwargs):
        self.compute_dtype = quantization_config.compute_dtype
        self.pre_quantized = quantization_config.pre_quantized
        self.modules_to_not_convert = quantization_config.modules_to_not_convert or []

The GGUF quantizer:

  • Supports pre-quantized models from community repositories
  • Maintains FP32 precision for sensitive modules
  • Requires accelerate>=0.26.0

Source: src/diffusers/quantizers/gguf/gguf_quantizer.py:1-60

Model Loading Flow

sequenceDiagram
    participant User
    participant Pipeline
    participant AutoModel
    participant HubUtils
    participant Model
    
    User->>Pipeline: from_pretrained(model_id)
    Pipeline->>HubUtils: hf_hub_download(config.json)
    HubUtils-->>Pipeline: config
    Pipeline->>AutoModel: from_config(config)
    AutoModel->>AutoModel: detect_model_type(config)
    AutoModel->>HubUtils: hf_hub_download(weights)
    HubUtils-->>AutoModel: weights
    AutoModel->>Model: __init__() + load_state_dict()
    Model-->>AutoModel: model
    AutoModel-->>Pipeline: component

The loading process follows these steps:

  1. Configuration Loading: Download and parse config.json from the hub
  2. Model Type Detection: Determine if model is Diffusers-native, Transformers, or single-file
  3. Weight Download: Fetch model weights from the appropriate source
  4. Model Instantiation: Create model with empty weights, then load state dict
  5. Device Placement: Move model to appropriate device (CPU/CUDA)

Source: src/diffusers/utils/hub_utils.py:1-100

Common Component Patterns

Model Components Table

| Component | File | Purpose |
| --- | --- | --- |
| Attention | attention.py | Self-attention and cross-attention mechanisms |
| ResNet | resnet.py | Residual connections for deep networks |
| Embeddings | embeddings.py | Timestep and text embeddings |
| UNet | unet_2d_blocks.py | U-Net architecture for image generation |
| VAE | vae.py | Variational Autoencoder for latent spaces |

Source: PHILOSOPHY.md:5-15

Lazy Import System

Diffusers uses lazy imports to minimize startup time and reduce memory footprint:

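# Conceptual sketch of the lazy-import idea, not the library's actual code
# (the library uses a _LazyModule helper in its __init__.py files)
import importlib

# Maps each submodule to the public names it provides
_import_structure = {"pipeline_utils": ["DiffusionPipeline"]}

def __getattr__(name):
    # Module-level __getattr__: import the owning submodule only on first access
    for module_name, names in _import_structure.items():
        if name in names:
            module = importlib.import_module(f".{module_name}", __name__)
            return getattr(module, name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")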

Configuration Options

Common Pipeline Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pretrained_model_name_or_path | str | Required | Model identifier or local path |
| torch_dtype | torch.dtype | None | Data type for model weights |
| variant | str | None | Model variant (e.g., 'fp16', 'fp32') |
| use_safetensors | bool | None | Use safetensors format |
| local_files_only | bool | False | Only use local files |
| revision | str | None | Git revision for Hub models |

Model Loading Configuration

| File | Purpose | Notes |
| --- | --- | --- |
| config.json | Model architecture | Hugging Face Hub |
| model_index.json | Pipeline component mapping | Pipeline root |
| config.yaml | Additional metadata | Optional |
| diffusion_pytorch_model.bin | Model weights | Primary weight file |

Common Failure Modes

Based on community issues and documentation, users frequently encounter these architectural challenges:

1. Model Type Mismatch

Issue: Loading custom models fails with config mismatch errors.

Cause: The configuration file doesn't match expected structure.

Solution: Use from_single_file() with explicit configuration or provide a custom config.

Source: Community Issue #13683

2. Scheduler Compatibility

Issue: Swapping schedulers produces unexpected results.

Cause: Not all schedulers are compatible with all pipelines.

Solution: Use schedulers designed for the same discretization approach.

Source: Community Issue #4167

3. ModularPipeline Type Hints

Issue: AutoModel type hints in modular_model_index.json cause loading failures.

Cause: Type hint resolution fails for generic AutoModel classes.

Solution: Use specific model classes or provide explicit type hints.

Source: Diffusers 0.37.1 Release Notes

4. Transformer/GGUF Version Requirements

Issue: GGUF loading fails with version compatibility errors.

Cause: Missing or incompatible accelerate version.

Solution: Ensure accelerate>=0.26.0 is installed.

Source: src/diffusers/quantizers/gguf/gguf_quantizer.py:20-30

Extension Points

Adding Custom Models

To integrate new model checkpoints:

  1. Create or adapt an existing model architecture
  2. Implement ModelMixin and ConfigMixin interfaces
  3. Add configuration handling for the new checkpoint format
  4. Register the model in src/diffusers/models/__init__.py

Source: PHILOSOPHY.md:40-50

Custom Pipelines

For fundamentally different architectures, create a new pipeline class:

  1. Inherit from DiffusionPipeline
  2. Define components as class attributes
  3. Implement the __call__ method for generation
  4. Add configuration parsing

Best Practices

Performance Optimization

  1. Use torch_dtype=torch.float16 for faster inference on compatible hardware
  2. Enable use_safetensors=True for faster model loading
  3. Use variant='fp16' when available to download pre-converted weights
  4. Enable attention slicing for reduced memory usage

Model Selection

| Use Case | Recommended Approach |
| --- | --- |
| Standard models | DiffusionPipeline.from_pretrained() |
| Community models | from_single_file() |
| Custom architectures | AutoModel.from_config() |
| Quantized models | GGUF quantizer |

Pipelines Overview

Related topics: Modular Diffusers, System Architecture

Introduction

Pipelines are the primary high-level API in Diffusers for running diffusion models for inference. They provide a unified interface that orchestrates multiple components (including models, schedulers, tokenizers, and processors) to generate outputs from pretrained checkpoints. Pipelines abstract away the complexity of the diffusion process, allowing users to perform inference with just a few lines of code.

The Diffusers library ships with pipelines for diverse generation tasks including text-to-image, image-to-image, inpainting, video generation, audio generation, and text generation. Each pipeline is designed to be modular, allowing components to be swapped or customized as needed.

Source: src/diffusers/pipelines/README.md

Architecture

Core Components

A pipeline typically consists of several interconnected components that work together during the diffusion process:

graph TD
    A[Pipeline] --> B[UNet / Transformer]
    A --> C[Scheduler]
    A --> D[VAE / Encoder-Decoder]
    A --> E[Text Encoder / Tokenizer]
    A --> F[Safety Checker]
    
    B --> C
    C --> B
    
    G[Input] --> A
    A --> H[Output]
    
    G --> E
    E --> B

| Component | Purpose | Common Classes |
| --- | --- | --- |
| UNet/Transformer | Core denoising network that predicts noise in the latent space | UNet2DConditionModel, FluxTransformer2DModel |
| Scheduler | Controls the diffusion timestep schedule and noise addition/removal | DDPMScheduler, DDIMScheduler, DPMSolverMultistepScheduler |
| VAE | Encodes images to latent space and decodes latents back to images | AutoencoderKL, AutoencoderTiny |
| Text Encoder | Converts text prompts into embeddings understood by the model | CLIPTextModel, T5EncoderModel |
| Safety Checker | Filters potentially unsafe outputs | StableDiffusionSafetyChecker |

Source: src/diffusers/pipelines/pipeline_utils.py

Pipeline Class Hierarchy

Diffusers uses a mixin-based architecture for pipelines, allowing for flexible composition of functionality:

graph TD
    A[DiffusionPipeline<br/>Base Class] --> B[StableDiffusionMixin]
    A --> C[StableDiffusionLuminaMixin]
    A --> D[AutoPipelineMixin]
    
    B --> E[StableDiffusionPipeline]
    B --> F[StableDiffusionImg2ImgPipeline]
    B --> G[StableDiffusionInpaintPipeline]
    
    D --> H[AutoPipeline]
    D --> I[AutoEncoderPipeline]

All pipelines inherit from DiffusionPipeline, which provides core functionality such as from_pretrained() and save_pretrained() methods.

Source: src/diffusers/pipelines/pipeline_utils.py:90-139

Loading Pipelines

Standard Loading with `from_pretrained`

The primary method for loading a pipeline is through the from_pretrained() class method. This method accepts either a Hugging Face Hub repository ID or a local directory path.

import torch
from diffusers import StableDiffusionPipeline

# Load from Hugging Face Hub
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)

# Load from local directory
pipeline = StableDiffusionPipeline.from_pretrained(
    "./local/stable-diffusion-v1-5"
)

The method requires a model_index.json file in the repository or directory, which defines all components that should be loaded. Each component is specified in the format <name>: ["<library>", "<class_name>"].

Source: src/diffusers/pipelines/README.md
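To make the `<name>: ["<library>", "<class_name>"]` format concrete, the following is a minimal sketch that parses a model_index.json-style document with the standard library only. The JSON content is a hypothetical, abbreviated example (including the placeholder version string), and `list_components` is an illustrative helper, not a Diffusers function.

```python
import json

# Hypothetical, abbreviated model_index.json content; real files typically
# list more components and metadata keys.
MODEL_INDEX = """
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.0.0",
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"],
  "text_encoder": ["transformers", "CLIPTextModel"],
  "scheduler": ["diffusers", "PNDMScheduler"]
}
"""


def list_components(index_text):
    """Return {component_name: (library, class_name)} for component entries."""
    index = json.loads(index_text)
    components = {}
    for name, value in index.items():
        # Metadata keys start with an underscore; components are 2-item lists.
        if name.startswith("_"):
            continue
        if isinstance(value, list) and len(value) == 2:
            components[name] = (value[0], value[1])
    return components


components = list_components(MODEL_INDEX)
for name, (library, class_name) in components.items():
    print(f"{name}: {class_name} from {library}")
```

The library name in each entry is what lets a loader decide whether a component belongs to Diffusers or to Transformers.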

AutoPipeline

AutoPipeline is a family of task-specific loader classes (AutoPipelineForText2Image, AutoPipelineForImage2Image, and AutoPipelineForInpainting) that detect and load the appropriate pipeline class based on the model configuration. This addresses the community need for a "universal method to load any model" mentioned in issue #13683.

import torch
from diffusers import AutoPipelineForText2Image

# Resolves the concrete text-to-image pipeline class from the checkpoint's config
pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)

The AutoPipeline class maintains a registry of supported pipeline types and uses type hints to determine the correct pipeline class when loading from modular_model_index.json files introduced in v0.37.0.

Source: src/diffusers/pipelines/auto_pipeline.py

Model Loading Internals

When loading a model, Diffusers follows a specific sequence to determine the appropriate loading mechanism:

graph TD
    A[from_pretrained called] --> B{Is Transformers model?}
    B -->|Yes| C[Use PreTrainedModel.from_pretrained]
    B -->|No| D{Is Diffusers model?}
    D -->|Yes| E[Load config, create empty model<br/>with init_empty_weights, then load]
    D -->|No| F[Try AutoModel]
    
    C --> G[Return model]
    E --> G
    F --> G

For Diffusers models, the library first loads the configuration, creates an empty model on the meta device (so no memory is allocated for the weights yet), then loads the weights into it. For Transformers models, it delegates to the Transformers library's loading mechanism.

Source: src/diffusers/pipelines/pipeline_loading_utils.py

Loading Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pretrained_model_name_or_path | str or Path | Required | Model identifier or local path |
| torch_dtype | torch.dtype | None | Data type for model weights |
| variant | str | None | Model variant (e.g., fp16, fp32) |
| use_safetensors | bool | None | Prefer safetensors format |
| cache_dir | str | None | Custom cache directory |
| local_files_only | bool | False | Only use local files |
| force_download | bool | False | Force re-download |

Source: src/diffusers/pipelines/pipeline_utils.py
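As an illustration of how `variant` and `use_safetensors` interact, the sketch below builds the weight filename a loader might look for. The naming pattern (`<base>.<variant>.<extension>`) and the default base name are assumptions made for illustration, and `weights_filename` is a hypothetical helper, not part of the Diffusers API.

```python
def weights_filename(base="diffusion_pytorch_model", variant=None, use_safetensors=True):
    """Build a candidate weights filename from the loading parameters.

    Assumed pattern: "<base>.<extension>" without a variant and
    "<base>.<variant>.<extension>" with one.
    """
    extension = "safetensors" if use_safetensors else "bin"
    if variant is None:
        return f"{base}.{extension}"
    return f"{base}.{variant}.{extension}"


default_name = weights_filename()
fp16_name = weights_filename(variant="fp16")
legacy_name = weights_filename(variant="fp16", use_safetensors=False)
print(default_name, fp16_name, legacy_name)
```

Under this assumption, requesting a variant that a repository does not ship simply means the expected file is absent, which is one way a "variant not found" error can arise.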

Custom Pipelines

Loading Custom Pipelines

Diffusers supports loading custom pipelines through the custom_pipeline parameter. This allows users to extend the library with community-contributed or self-developed pipeline implementations.

from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="hf-internal-testing/diffusers-dummy-pipeline",
    trust_remote_code=True
)

Custom pipelines can be loaded from:

  • Hugging Face Hub: A repository ID containing a pipeline.py file
  • GitHub: A community pipeline script name (loaded from examples/community/)
  • Local directory: A directory containing a pipeline.py file

Source: src/diffusers/pipelines/pipeline_utils.py

Community Pipelines

Community pipelines are hosted in the examples/community/ directory and provide extended functionality not available in core pipelines. These include ControlNet integrations, IP-Adapter implementations, and specialized generation techniques.

Community pipelines are loaded by specifying the pipeline script name (without the .py extension) as the custom_pipeline argument:

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="clip_guided_stable_diffusion"
)

Source: examples/community/README.md

Modular Pipelines

Introduced in Diffusers v0.37.0, Modular Pipelines provide a compositional approach to building diffusion pipelines. Instead of monolithic pipeline classes, Modular Pipelines assemble reusable building blocks defined in modular_model_index.json files.

How Modular Pipelines Work

graph LR
    A[modular_model_index.json] --> B[ModularPipeline]
    B --> C[Transformer Block 1]
    B --> D[Transformer Block 2]
    B --> E[Scheduler Component]
    B --> F[VAE Component]
    
    C --> G[Attention Module]
    D --> G
    G --> H[Model Output]

The ModularPipeline class uses type_hint annotations to determine the correct model class for each component, allowing flexible composition of different architectures.

Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py

Loading Modular Pipelines

import torch
from diffusers import ModularPipeline

pipeline = ModularPipeline.from_pretrained(
    "path/to/modular/model",
    torch_dtype=torch.float16
)

When loading, the pipeline:

  1. Reads modular_model_index.json to identify components
  2. Resolves type_hint annotations to determine model classes
  3. Loads each component using appropriate from_pretrained or from_single_file methods

Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py
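The three loading steps can be sketched in plain Python. Everything here is a stand-in: `ToyTransformer`, `ToyTextEncoder`, `TYPE_HINT_REGISTRY`, and `load_components` are hypothetical names used to illustrate type-hint resolution, and the index dictionary only mimics the shape of a modular_model_index.json file.

```python
class ToyTransformer:
    """Hypothetical stand-in for a model class with a from_pretrained method."""

    @classmethod
    def from_pretrained(cls, path):
        return f"{cls.__name__} loaded from {path}"


class ToyTextEncoder(ToyTransformer):
    """Second hypothetical component class; inherits the toy loader."""


# Hypothetical registry mapping type_hint strings to classes.
TYPE_HINT_REGISTRY = {
    "ToyTransformer": ToyTransformer,
    "ToyTextEncoder": ToyTextEncoder,
}


def load_components(index):
    """Resolve each component's type_hint and load it with the resolved class."""
    loaded = {}
    for name, spec in index["components"].items():
        type_hint = spec.get("type_hint")
        if type_hint not in TYPE_HINT_REGISTRY:
            raise ValueError(f"Unable to load {name} without a known `type_hint`")
        component_cls = TYPE_HINT_REGISTRY[type_hint]
        loaded[name] = component_cls.from_pretrained(spec["pretrained_model_name_or_path"])
    return loaded


index = {
    "components": {
        "transformer": {
            "type_hint": "ToyTransformer",
            "pretrained_model_name_or_path": "./weights/transformer",
        },
        "text_encoder": {
            "type_hint": "ToyTextEncoder",
            "pretrained_model_name_or_path": "./weights/text_encoder",
        },
    }
}
loaded = load_components(index)
print(loaded)
```

A component whose `type_hint` is missing or unknown fails early with a clear error instead of being loaded with a guessed class.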

Pipeline Execution Flow

Standard Inference Flow

sequenceDiagram
    participant User
    participant Pipeline
    participant Scheduler
    participant UNet
    participant VAE
    
    User->>Pipeline: __call__(prompt)
    Pipeline->>Pipeline: Encode prompt with tokenizer & text encoder
    Pipeline->>Scheduler: Set timesteps
    loop Denoising loop
        Pipeline->>UNet: forward(latent, timestep, encoder_hidden_states)
        UNet-->>Pipeline: noise_pred
        Pipeline->>Scheduler: step(noise_pred, timestep, latent)
        Scheduler-->>Pipeline: denoised_latent
    end
    Pipeline->>VAE: decode(denoised_latent)
    VAE-->>Pipeline: decoded_image
    Pipeline->>Pipeline: Safety check
    Pipeline-->>User: Image
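
The denoising loop in the diagram can be illustrated with a toy, dependency-free sketch. The "model" and "scheduler" below are deliberately simplistic stand-ins (the noise predictor cheats by knowing the target), so this shows only the control flow of predict-then-step, not how a real UNet or scheduler computes its outputs. All function names are hypothetical.

```python
def toy_noise_predictor(latent, target):
    """Stand-in for the UNet: 'predicts' noise as the offset from a known target."""
    return [x - t for x, t in zip(latent, target)]


def toy_scheduler_step(noise_pred, latent, step_size):
    """Stand-in for scheduler.step: remove a fraction of the predicted noise."""
    return [x - step_size * n for x, n in zip(latent, noise_pred)]


def run_denoising_loop(initial_latent, target, num_inference_steps=50, step_size=0.2):
    """Alternate between noise prediction and a scheduler step, as in the diagram."""
    latent = list(initial_latent)
    for _ in range(num_inference_steps):
        noise_pred = toy_noise_predictor(latent, target)
        latent = toy_scheduler_step(noise_pred, latent, step_size)
    return latent


target = [0.5, -1.0, 2.0]
start = [3.0, 3.0, -3.0]
final_latent = run_denoising_loop(start, target)
max_error = max(abs(x - t) for x, t in zip(final_latent, target))
print(final_latent, max_error)
```

Each iteration shrinks the remaining error by a constant factor, which mirrors why more inference steps generally bring the latent closer to a clean sample.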

Example: Text-to-Image Generation

from diffusers import StableDiffusionPipeline
import torch

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
pipeline.to("cuda")

image = pipeline(
    prompt="a photo of an astronaut riding a horse on mars",
    num_inference_steps=50,
    guidance_scale=7.5
).images[0]

Source: src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py

Scheduler Integration

Schedulers define the noise schedule and control how the diffusion process progresses from noise to sample. Different schedulers offer trade-offs between speed and quality:

| Scheduler | Speed | Quality | Notes |
| --- | --- | --- | --- |
| DDIMScheduler | Fast | High | Good for few-step generation |
| DDPMScheduler | Slow | Very High | Best quality, many steps |
| DPMSolverMultistepScheduler | Medium | High | Fast convergence |
| EulerDiscreteScheduler | Variable | High | Configurable |
| UniPCMultistepScheduler | Fast | High | Few steps needed |

Switching Schedulers

from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
)

# Replace the default scheduler
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config
)

For A1111/K-Diffusion to Diffusers scheduler mapping, refer to issue #4167 which documents the correspondence between common scheduler configurations.

Source: src/diffusers/pipelines/pipeline_utils.py
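The `from_config` pattern used when switching schedulers can be pictured with a small sketch: settings shared by both schedulers carry over, and settings the new scheduler does not understand are dropped. The two classes below are hypothetical toys, and the filtering behaviour is an assumption about the general idea rather than a description of the library's exact implementation.

```python
from dataclasses import dataclass, fields


@dataclass
class ToySchedulerA:
    num_train_timesteps: int = 1000
    beta_start: float = 0.0001
    beta_end: float = 0.02
    skip_prk_steps: bool = True  # option only this toy scheduler understands

    @property
    def config(self):
        """Expose the constructor arguments as a plain dictionary."""
        return {f.name: getattr(self, f.name) for f in fields(self)}


@dataclass
class ToySchedulerB:
    num_train_timesteps: int = 1000
    beta_start: float = 0.0001
    beta_end: float = 0.02
    solver_order: int = 2  # option only this toy scheduler understands

    @classmethod
    def from_config(cls, config):
        """Build an instance from another scheduler's config, ignoring unknown keys."""
        accepted = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in config.items() if k in accepted})


old_scheduler = ToySchedulerA(beta_start=0.00085, beta_end=0.012)
new_scheduler = ToySchedulerB.from_config(old_scheduler.config)
print(new_scheduler)
```

Reusing the existing config keeps the noise schedule the model was trained with, while the new scheduler's own options fall back to their defaults.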

Advanced Usage

Single-File Model Loading

Some custom models or quantized models (including GGUF files) are distributed as single checkpoint files. Diffusers provides from_single_file methods for loading these:

import torch
from diffusers import UNet2DConditionModel

model = UNet2DConditionModel.from_single_file(
    "https://example.com/model.safetensors",
    torch_dtype=torch.float16
)

The GGUF quantizer, introduced in recent versions, handles quantized GGUF checkpoint files with special loading requirements.

Source: src/diffusers/pipelines/pipeline_loading_utils.py

Memory Optimization

For inference on limited-memory hardware, several optimization strategies are available:

import torch
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True
)

# Enable attention slicing for lower memory usage
pipeline.enable_attention_slicing()

# Enable sequential CPU offloading
pipeline.enable_sequential_cpu_offload()

# Use xformers memory-efficient attention
pipeline.enable_xformers_memory_efficient_attention()

Source: src/diffusers/pipelines/pipeline_utils.py

Common Failure Modes and Troubleshooting

Config Mismatch Issues

When loading custom models or third-party checkpoints, config mismatches are common. This is particularly relevant for community requests around universal model loading (issue #13683).

Symptoms:

  • ValueError during model initialization
  • Missing keys when loading state dict
  • Type mismatch errors

Solutions:

  1. Use type_hint parameter in modular pipelines to specify expected model class
  2. Provide custom configuration files alongside checkpoint files
  3. Use ignore_mismatched_sizes=True where applicable

Trust Remote Code

Custom pipelines require trust_remote_code=True to execute:

pipeline = DiffusionPipeline.from_pretrained(
    "owner/custom-pipeline",
    custom_pipeline="pipeline_name",
    trust_remote_code=True
)

Without this flag, loading pipelines with custom code will raise a ValueError.

Source: src/diffusers/pipelines/pipeline_utils.py

Flux Klein Configuration

Recent releases (v0.37.0+) have addressed specific issues with Flux Klein model loading, including proper handling of distilled and non-distilled versions. Users should ensure they are using the correct configuration variant when loading these models.

Source: Diffusers v0.37.1 Release Notes

Source: https://github.com/huggingface/diffusers / Human Manual

Modular Diffusers

Related topics: Pipelines Overview, Training Guide

Overview

Modular Diffusers is a framework introduced in Diffusers v0.37.0 that enables building diffusion pipelines by composing reusable, modular building blocks. Instead of writing entire pipelines from scratch, developers can mix and match components to create custom workflows tailored to specific use cases.

The core philosophy behind Modular Diffusers is composability, which allows users to:

  • Reuse existing pipeline components across different models
  • Swap individual components (transformers, schedulers, guiders) without rewriting entire pipelines
  • Create custom pipelines by combining standardized building blocks
  • Share and distribute custom pipeline configurations through Hugging Face Hub

Source: docs/source/en/modular_diffusers/overview.md

Architecture

Component Hierarchy

Modular Diffusers organizes pipeline components into a hierarchical structure. The main components include:

graph TD
    A[ModularPipeline] --> B[ComponentsManager]
    A --> C[PipelineConfig]
    B --> D[Transformer]
    B --> E[TextEncoder/TextEncoder 2]
    B --> F[VAE/AutoencoderKL]
    B --> G[Scheduler]
    B --> H[Guider]
    B --> I[Tokenizer]
    D --> J[Flux Transformer]
    D --> K[UNet2DConditionModel]
    H --> L[FlowMatcherGuider]
    H --> M[DPMSolverMultistepGuider]

Core Components

| Component Type | Description | Base Class |
| --- | --- | --- |
| Transformer | The core diffusion model that performs the denoising process | ModelMixin |
| TextEncoder | Encodes text prompts into embeddings | PreTrainedModel |
| VAE/AutoencoderKL | Encodes images to latent space and decodes back | ModelMixin |
| Scheduler | Controls the diffusion sampling process | SchedulerMixin |
| Guider | Guides the generation process (CFG, flow matching) | Guider |
| Tokenizer | Converts text to token IDs | PreTrainedTokenizer |

Source: src/diffusers/modular_pipelines/components_manager.py

Type Hints System

Modular Diffusers uses type hints to resolve which class should be loaded for each component. This allows flexible component substitution while maintaining type safety.

The system supports the following type hint sources:

| Source Type | Resolution Method |
| --- | --- |
| Direct class reference | Uses the specified class directly |
| AutoModel | Uses AutoModel.from_pretrained() |
| AutoModelForClassDiffusion | Uses class-specific auto model |
| Transformers models | Uses transformers.AutoModel |

Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py:1-100

Guider System

The Guider system abstracts guidance computation from individual pipelines, allowing different guidance strategies to be applied uniformly:

graph LR
    A[NoGuider] --> B[Base Guider Interface]
    C[FlowMatcherGuider] --> B
    D[DPMSolverMultistepGuider] --> B
    B --> E[ModularPipeline]

| Guider Type | Purpose | Configuration Key |
| --- | --- | --- |
| NoGuider | No guidance applied | Default |
| FlowMatcherGuider | Flow matching guidance for Flux models | guider config |
| DPMSolverMultistepGuider | DPM-Solver guidance | guider config |

Source: src/diffusers/guiders/__init__.py

Loading Components

From Pretrained Models

Modular pipelines automatically resolve and load components from the Hugging Face Hub:

import torch
from diffusers.modular_pipelines import ModularPipeline

# Load a complete modular pipeline
pipeline = ModularPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
)

The loading process follows this sequence:

sequenceDiagram
    participant User
    participant ModularPipeline
    participant ComponentsManager
    participant AutoModel
    participant HuggingFace

    User->>ModularPipeline: from_pretrained(path)
    ModularPipeline->>HuggingFace: Download modular_model_index.json
    ModularPipeline->>ComponentsManager: Parse component configs
    ComponentsManager->>AutoModel: Resolve class from type_hint
    AutoModel->>HuggingFace: Download model weights
    ComponentsManager->>ComponentsManager: Instantiate components
    ModularPipeline->>User: Return assembled pipeline

Source: src/diffusers/pipelines/pipeline_loading_utils.py:1-60

With Type Hints

When loading components that lack sufficient configuration, specify type_hint to guide the loader:

from diffusers import AutoModel
from diffusers.modular_pipelines import ComponentsManager

manager = ComponentsManager()

# Specify type hint for component resolution
manager.add_component(
    name="transformer",
    pretrained_model_name_or_path="./my_custom_model",
    type_hint=AutoModel  # or specific class like FluxTransformer2DModel
)

Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py:50-80

Single File Model Loading

Modular Diffusers supports loading models from single checkpoint files using from_single_file:

from diffusers.modular_pipelines import ModularPipeline

pipeline = ModularPipeline.from_single_file(
    pretrained_model_link_or_path="./checkpoint.safetensors",
    original_config="./config.yaml"
)

The system detects single-file models and routes them appropriately:

# From src/diffusers/loaders/single_file.py
is_diffusers_single_file_model = issubclass(class_obj, diffusers_module.FromOriginalModelMixin)

if is_diffusers_single_file_model:
    load_method = getattr(class_obj, "from_single_file")
    loaded_sub_model = load_method(
        pretrained_model_link_or_path_or_dict=checkpoint,
        original_config=original_config,
        config=cached_model_config_path,
        subfolder=name,
        torch_dtype=torch_dtype,
        **kwargs,
    )

Source: src/diffusers/loaders/single_file.py:1-60

Flux Modular Pipeline

The Flux model family uses specialized modular pipeline implementations that handle both full and distilled model variants.

FluxPipeline Structure

graph TD
    subgraph FluxPipeline
        A[Transformer] --> B[FluxTransformer2DModel]
        C[TextEncoder] --> D[CLIPTextModel/CLIPTextModelWithProjection]
        C --> E[T5TextEncoder]
        F[VAE] --> G[AutoencoderKL]
        H[Scheduler] --> I[FlowMatchEulerDiscreteScheduler]
        J[Guider] --> K[FlowMatcherGuider]
    end

Configuration for Distilled Models

Flux models may use distilled versions that affect guidance configuration. The modular pipeline automatically detects and handles this:

# Distilled model handling in modular_pipeline.py
if hasattr(config, "guidance_scale"):
    guider_config = {"guider": {"class_name": "FlowMatcherGuider"}}
else:
    guider_config = {"guider": {"class_name": "NoGuider"}}

Source: src/diffusers/modular_pipelines/flux/modular_pipeline.py

Configuration Schema

Modular Model Index JSON

The modular_model_index.json file defines the pipeline configuration:

{
  "_class_name": "ModularPipeline",
  "components": {
    "transformer": {
      "type_hint": "FluxTransformer2DModel",
      "pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev"
    },
    "text_encoder": {
      "type_hint": "CLIPTextModel",
      "pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev"
    },
    "text_encoder_2": {
      "type_hint": "T5EncoderModel",
      "pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev"
    }
  }
}

Component Configuration Options

| Parameter | Description | Default |
| --- | --- | --- |
| type_hint | Class to use for loading | Auto-detected |
| pretrained_model_name_or_path | Model path or identifier | Required |
| subfolder | Subdirectory within model | None |
| variant | Model variant (e.g., "fp16") | None |
| torch_dtype | Data type for weights | None |
| use_safetensors | Use safe serialization | Auto |

Source: src/diffusers/modular_pipelines/components_manager.py:1-80

Common Patterns

Creating a Custom Pipeline

from diffusers import (
    ModularPipeline,
    FluxTransformer2DModel,
    FlowMatchEulerDiscreteScheduler,
    FlowMatcherGuider
)

# Define custom configuration
custom_config = {
    "transformer": {
        "type_hint": FluxTransformer2DModel,
        "pretrained_model_name_or_path": "custom/model"
    },
    "scheduler": {
        "type_hint": FlowMatchEulerDiscreteScheduler
    }
}

# Create pipeline with custom config
pipeline = ModularPipeline.from_config(custom_config)

Mixing Components from Different Pipelines

from diffusers import AutoModel, ModularPipeline

# Load base pipeline
pipeline = ModularPipeline.from_pretrained("base/pipeline")

# Replace transformer with a custom variant
pipeline.transformer = AutoModel.from_pretrained(
    "custom/transformer",
    type_hint=type(pipeline.transformer)
)

Using with LoRA Adapters

import torch
from diffusers import StableDiffusionXLPipeline

# Load pipeline with LoRA support
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "sdxl/pipeline",
    torch_dtype=torch.float16
)

# Load the LoRA weights and activate the adapter
pipeline.load_lora_weights("path/to/lora", adapter_name="my_adapter")
pipeline.set_adapters("my_adapter")

Source: src/diffusers/models/auto_model.py:40-80

GGUF Quantization Support

Modular Diffusers supports GGUF-quantized models for reduced memory footprint:

import torch
from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

# GGUFQuantizationConfig is the user-facing config; the quantizer itself is
# created internally during loading.
quantization_config = GGUFQuantizationConfig(compute_dtype=torch.float16)

# GGUF checkpoints are single files, so they are loaded with from_single_file.
# FluxTransformer2DModel is only an example model class; the path is a placeholder.
model = FluxTransformer2DModel.from_single_file(
    "quantized/model.gguf",
    quantization_config=quantization_config,
    torch_dtype=torch.float16
)

GGUF Quantization Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| compute_dtype | torch.dtype | Computation data type |
| pre_quantized | bool | Model is pre-quantized |
| modules_to_not_convert | list | Modules to keep in FP32 |
| use_keep_in_fp32_modules | bool | Keep specified modules in FP32 |

Source: src/diffusers/quantizers/gguf/gguf_quantizer.py:1-50

Common Failure Modes

Type Hint Resolution Failures

When type_hint is missing and AutoModel cannot determine the correct class:

ValueError: Unable to load transformer without `type_hint`

Solution: Explicitly provide type_hint for the component.

from diffusers import AutoModel

manager.add_component(
    name="transformer",
    pretrained_model_name_or_path="./custom_model",
    type_hint=AutoModel  # or specific class
)

Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py:60-70

Config Mismatch with Transformers Models

When loading models that mix Diffusers and Transformers components:

ValueError: `config_class` cannot be None. Please double-check the model.

Solution: Ensure the model's config includes proper model_type or _class_name fields.

Single File Loading with Missing Config

When loading from single files without an original config:

ValueError: The repository contains custom code which must be executed

Solution: Pass trust_remote_code=True or provide original_config path.

pipeline = ModularPipeline.from_single_file(
    "./checkpoint.safetensors",
    original_config="./config.yaml",
    trust_remote_code=True
)

Source: src/diffusers/loaders/single_file.py:30-60

Flux Klein LoRA Loading Issues

Community reports indicate issues with Flux Klein LoRA loading in some configurations. This was addressed in v0.37.1 with fixes for proper LoRA adapter handling with Flux models.

Reference: GitHub Issue #13313

Examples and Usage

Running Example Scripts

To use Modular Diffusers with example scripts:

git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .

# Install the requirements of the example you plan to run
cd examples/dreambooth
pip install -r requirements.txt

Source: examples/README.md

Community Scripts

The community maintains additional modular pipeline examples:

| Example | Description | Author |
| --- | --- | --- |
| IP-Adapter Negative Noise | Advanced IP-Adapter control | Álvaro Somoza |
| Asymmetric Tiling | Seamless image tiling | alexisrolland |
| Prompt Scheduling | Dynamic prompt control | Community |

Reference: examples/community/README_community_scripts.md


Training Guide

Related topics: Loaders & Adapters, Optimization Guide

Overview

The Hugging Face Diffusers library provides a comprehensive suite of training scripts and utilities for fine-tuning diffusion models. Training in Diffusers enables users to adapt pretrained models for custom tasks, create personalized outputs, and optimize models for specific domains or styles.

Training scripts in Diffusers are designed to be easy-to-tweak, beginner-friendly, and one-purpose-only. While they are not intended to provide state-of-the-art training methods for the newest models, they serve as excellent starting points for understanding diffusion model training and for adapting to specific use cases. Source: examples/README.md

Key Training Objectives

Diffusers training supports several fundamental objectives:

| Objective | Description | Common Use Cases |
| --- | --- | --- |
| Personalization | Fine-tune models to generate content in a specific style or about specific subjects | DreamBooth, LoRA fine-tuning |
| Control | Add conditioning mechanisms to guide generation | ControlNet, adapter training |
| Efficiency | Distill knowledge or compress models for faster inference | LCM distillation, quantization |
| Domain Adaptation | Adapt models to specific data distributions | Custom dataset fine-tuning |

Architecture

Training System Components

graph TD
    A[Training Pipeline] --> B[Model Loading]
    A --> C[Data Loading]
    A --> D[Optimizer Setup]
    A --> E[Training Loop]
    
    B --> B1[pretrained_model_name_or_path]
    B --> B2[variant]
    B --> B3[revision]
    
    C --> C1[dataset_name]
    C --> C2[pretrained_vae]
    C --> C3[image processing]
    
    D --> D1[Learning Rate]
    D --> D2[AdamW]
    D --> D3[lr_scheduler]
    
    E --> E1[Gradient Computation]
    E --> E2[Optimization Step]
    E --> E3[Checkpointing]

Training Script Types

Diffusers organizes training scripts by task and complexity level:

| Directory | Purpose | Example Scripts |
| --- | --- | --- |
| examples/dreambooth/ | DreamBooth personalization | LoRA, Full fine-tuning |
| examples/text_to_image/ | Text-to-image training | LoRA, custom datasets |
| examples/controlnet/ | ControlNet training | ControlNet, Flux ControlNet |
| examples/advanced_diffusion_training/ | Advanced techniques | Flux LoRA, Dreambooth advanced |
| examples/consistency_distillation/ | Model distillation | LCM LoRA distillation |
| examples/research_projects/ | Community research | Scheduled Huber loss |

Common Training Patterns

Model Loading

All training scripts follow a consistent pattern for loading pretrained models:

from diffusers import AutoencoderKL, UNet2DConditionModel

# Load the pretrained UNet/Transformer
unet = UNet2DConditionModel.from_pretrained(
    pretrained_model_name_or_path,
    subfolder="unet",
    variant=variant,
    revision=revision,
)

# Load the VAE bundled with the base checkpoint
vae = AutoencoderKL.from_pretrained(
    pretrained_model_name_or_path,
    subfolder="vae",
    variant=variant,
    revision=revision,
)

# Optionally replace it with a separately specified VAE
# (for example, one with better numerical stability)
if pretrained_vae_model_name_or_path:
    vae = AutoencoderKL.from_pretrained(pretrained_vae_model_name_or_path)

Source: examples/controlnet/train_controlnet.py:100-140

Core Training Arguments

Training scripts share common command-line arguments:

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| --pretrained_model_name_or_path | str | required | Model identifier from HuggingFace Hub |
| --pretrained_vae_model_name_or_path | str | None | Path to pretrained VAE with better numerical stability |
| --variant | str | None | Variant of model files (e.g., fp16) |
| --revision | str | None | Git revision of pretrained model |
| --dataset_name | str | None | Dataset name from HuggingFace Hub |
| --output_dir | str | required | Directory for checkpoints and outputs |
| --cache_dir | str | None | Cache directory for downloaded models |
| --seed | int | None | Random seed for reproducibility |

Source: examples/text_to_image/train_text_to_image_lora.py

Dataset Configuration

Training scripts support multiple dataset formats and configurations:

# From HuggingFace Hub
--dataset_name="dataset-name"

# From local directory
--train_data_dir="/path/to/local/data"

# Dataset configuration (when applicable)
--dataset_config_name="config-name"

The dataset must follow a specific structure, particularly for image datasets that need to work with HuggingFace Datasets' ImageFolder format. Source: examples/research_projects/scheduled_huber_loss_training/text_to_image/train_text_to_image_lora_sdxl.py

Training Methods

LoRA (Low-Rank Adaptation)

LoRA training adds trainable low-rank matrices to existing model layers, significantly reducing the number of trainable parameters while maintaining quality.

# Enable LoRA training
lora_attn_procs = {}
for name, attn_processor in unet.attn_processors.items():
    # Initialize LoRA attention processors
    ...
unet.set_attn_processor(lora_attn_procs)
unet.train()

Key benefits:

  • Reduced memory footprint
  • Faster training times
  • Easy to merge and unmerge
  • Compatible with most model architectures
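
The parameter savings and the merge step can be made concrete with a small, dependency-free sketch. It assumes the usual LoRA formulation, where a weight matrix W of shape (d_out, d_in) is updated as W + scale * (B @ A) with B of shape (d_out, r) and A of shape (r, d_in). The function names and the 320-dimensional layer size are illustrative choices, not values taken from the training scripts.

```python
def full_update_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_update_params(d_out, d_in, rank):
    """Trainable parameters for the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


def merge_lora(weight, lora_b, lora_a, scale=1.0):
    """Return W + scale * (B @ A) using plain nested lists."""
    rank = len(lora_a)
    merged = []
    for i, row in enumerate(weight):
        merged_row = []
        for j, w in enumerate(row):
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            merged_row.append(w + scale * delta)
        merged.append(merged_row)
    return merged


full = full_update_params(320, 320)
low_rank = lora_update_params(320, 320, rank=4)
print(f"full: {full}, LoRA rank 4: {low_rank}")

weight = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # d_out=2, d_in=3
lora_b = [[1.0], [2.0]]                      # d_out x r, with r=1
lora_a = [[0.5, 0.0, -1.0]]                  # r x d_in
merged = merge_lora(weight, lora_b, lora_a)
print(merged)
```

Unmerging is the same operation with the sign of `scale` flipped, which is why LoRA weights are easy to attach and detach.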

Source: examples/text_to_image/train_text_to_image_lora.py

DreamBooth

DreamBooth enables subject-driven personalization by fine-tuning a diffusion model on a few images of a specific subject with a unique identifier.

# Special identifier for the subject
instance_prompt = "a photo of a sks dog"  # "sks" is the unique identifier

# Class-specific preservation prompt
class_prompt = "a photo of a dog"

# Training with prior preservation loss
# Helps maintain the model's knowledge about the class

Source: examples/dreambooth/train_dreambooth_lora.py
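
Prior preservation can be sketched as a weighted sum of two reconstruction losses: one on the subject images and one on generic class images. The sketch below uses plain Python lists and mean squared error; the parameter name `prior_loss_weight` follows the flag name used in the DreamBooth example scripts as far as I know, while the function names and the sample values are illustrative assumptions.

```python
def mse(pred, target):
    """Mean squared error between two equally sized lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_target, class_pred, class_target,
                    prior_loss_weight=1.0):
    """Combine the subject (instance) loss with the class prior-preservation loss."""
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss


loss = dreambooth_loss(
    instance_pred=[1.0, 2.0], instance_target=[0.0, 2.0],  # instance MSE = 0.5
    class_pred=[0.0, 0.0], class_target=[2.0, 0.0],        # prior MSE = 2.0
    prior_loss_weight=0.5,
)
print(loss)
```

A larger weight pushes the model to keep generating ordinary members of the class, at the cost of fitting the specific subject less tightly.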

ControlNet Training

ControlNet trains additional conditioning branches that can control diffusion model outputs based on various input modalities (canny edges, poses, depth maps, etc.).

# Initialize ControlNet
controlnet = ControlNetModel.from_unet(unet)

# Prepare ControlNet conditions
control_image = load_control_image(control_image_path)
control_image = controlnet_image_processor.preprocess(control_image)

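# Frozen components (for example, the VAE and text encoder) can run under no_grad;
# the ControlNet forward pass itself must track gradients so it can be trained
with torch.no_grad():
    # Encode inputs with the frozen components
    ...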

Source: examples/controlnet/train_controlnet.py

Consistency Distillation (LCM)

Latent Consistency Models (LCM) distill the iterative denoising process into fewer steps for fast inference.

# Teacher model for distillation
teacher_unet = UNet2DConditionModel.from_pretrained(
    pretrained_model_name_or_path,
    subfolder="unet",
)

# LCM-specific training parameters
--num_train_timesteps=1000
--GuidanceScale=0.0  # CFG disabled for LCM
--sigma_min=0.002
--sigma_max=14.61

Source: examples/consistency_distillation/train_lcm_distill_lora_sdxl.py

Advanced Training Configuration

Flux Training

Flux models use a different architecture requiring specific training configurations:

# Flux-specific model loading
transformer = FluxTransformer2DModel.from_pretrained(
    pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
)

# Flux training arguments
--flux=True
--max_sequence_length=512
--rank=4
--lambda_lora=1.0

Source: examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py

Training Utilities

The training_utils.py module provides core utilities for model training:

from diffusers.training_utils import (
    FreeKLScheduler,
    compute_snr,
    scale_lora,
    unet_lora_state_dict,
)

Key utility functions include:

  • FreeKLScheduler: Implements FreeBIT-style scheduling for knowledge distillation
  • compute_snr(): Computes Signal-to-Noise Ratio for advanced scheduling
  • scale_lora(): Scales LoRA weights for merging
  • unet_lora_state_dict(): Extracts LoRA state dict for saving
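
To show what a signal-to-noise ratio computation involves, the sketch below derives SNR per timestep from a linear beta schedule using only the standard library, with SNR(t) = alpha_bar_t / (1 - alpha_bar_t). It also shows Min-SNR loss weighting, min(SNR, gamma) / SNR, as one example of SNR-based weighting. The function names, the schedule values, and gamma = 5.0 are assumptions for illustration; this is not the library's `compute_snr` implementation.

```python
def linear_betas(num_train_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances, one per training timestep."""
    span = beta_end - beta_start
    return [beta_start + span * i / (num_train_timesteps - 1)
            for i in range(num_train_timesteps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t."""
    result, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        result.append(running)
    return result


def snr_per_timestep(alphas_cumprod):
    """SNR(t) = alpha_bar_t / (1 - alpha_bar_t)."""
    return [a / (1.0 - a) for a in alphas_cumprod]


def min_snr_weights(snr, gamma=5.0):
    """Min-SNR weighting: clip the SNR at gamma, then normalise by the SNR."""
    return [min(s, gamma) / s for s in snr]


snr = snr_per_timestep(cumulative_alphas(linear_betas()))
weights = min_snr_weights(snr)
print(snr[0], snr[-1], weights[0], weights[-1])
```

Early (low-noise) timesteps have a very high SNR and are down-weighted, while late (high-noise) timesteps keep a weight of 1.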

Source: src/diffusers/training_utils.py

Training Workflow

graph LR
    A[Setup Environment] --> B[Prepare Dataset]
    B --> C[Load Pretrained Models]
    C --> D[Initialize LoRA/Adapters]
    D --> E[Training Loop]
    E --> F{Epoch Complete?}
    F -->|Yes| G[Save Checkpoint]
    F -->|No| E
    G --> H{More Epochs?}
    H -->|Yes| E
    H -->|No| I[Export Final Model]
    I --> J["Merge LoRA (optional)"]

Common Failure Modes and Troubleshooting

Model Loading Issues

| Issue | Cause | Solution |
| --- | --- | --- |
| Repository not found | Invalid model identifier | Verify model name on HuggingFace Hub |
| Revision not found | Non-existent git revision | Use revision="main" or valid commit hash |
| Variant not found | Missing weight variant | Omit --variant or check available variants |
| Config mismatch | Model architecture changed | Update model reference or use specific revision |

Source: src/diffusers/pipelines/pipeline_loading_utils.py

Memory Issues

| Issue | Solution |
| --- | --- |
| OOM during training | Enable gradient checkpointing, reduce batch size, use 8-bit Adam optimizer |
| Slow training | Use mixed precision (--mixed_precision="fp16"), enable xformers |
| VAE memory | Use separate pretrained VAE with better numerical stability |

LoRA Loading Problems

Recent releases (v0.37.x) have addressed several LoRA loading issues:

  • Flux Klein LoRA loading: Fixed in v0.37.1
  • ModularPipelines with AutoModel type hints: Fixed in v0.37.1

If encountering LoRA loading issues with custom models, ensure:

  1. The LoRA rank matches the target model architecture
  2. The type_hint is correctly specified for single-file models
  3. The model was saved with compatible LoRA weights

Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py

Configuration Mismatch

When training with custom models or GGUF files:

  1. Verify model architecture matches the expected UNet/Transformer class
  2. Check that config files are present in the model directory
  3. For custom architectures, ensure proper registration with ModelMixin and ConfigMixin

Source: src/diffusers/models/auto_model.py

Best Practices

Environment Setup

# Clone and install from source
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .

# Install example-specific dependencies
cd examples/dreambooth
pip install -r requirements.txt

Source: examples/README.md

Reproducibility

Always specify a seed for reproducible training:

python train_dreambooth_lora.py \
    --seed=42 \
    --output_dir="./output" \
    ...

Checkpointing Strategy

  • Save checkpoints at regular intervals using --checkpointing_steps
  • Keep track of best-performing checkpoint using validation metrics
  • Use --resume_from_checkpoint to resume interrupted training
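The checkpointing steps above can be sketched with the standard library alone. This sketch assumes checkpoint folders are named `checkpoint-<step>` inside the output directory; the helper names `should_save` and `latest_checkpoint` are hypothetical and only illustrate the bookkeeping, not the training scripts' actual code.

```python
import os
import tempfile


def should_save(global_step, checkpointing_steps):
    """True when a checkpoint is due at this optimizer step."""
    return global_step > 0 and global_step % checkpointing_steps == 0


def latest_checkpoint(output_dir):
    """Return the checkpoint folder with the highest step, or None.

    Assumes folders are named 'checkpoint-<step>' (an assumption of this sketch).
    """
    steps = []
    for name in os.listdir(output_dir):
        prefix, _, suffix = name.partition("-")
        if prefix == "checkpoint" and suffix.isdigit():
            steps.append(int(suffix))
    # Compare numerically: as strings, 'checkpoint-500' sorts after 'checkpoint-1000'.
    return f"checkpoint-{max(steps)}" if steps else None


with tempfile.TemporaryDirectory() as out:
    for step in range(1, 1201):
        if should_save(step, checkpointing_steps=500):
            os.makedirs(os.path.join(out, f"checkpoint-{step}"))
    resume_from = latest_checkpoint(out)
    print(resume_from)
```

Comparing step numbers rather than folder names matters once a run passes 1000 steps, because string ordering would pick an older checkpoint.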

Installation and Dependencies

Training scripts require specific dependencies. To ensure compatibility:

  1. Install from source for the latest training features
  2. Check requirements.txt in the specific example directory
  3. Verify PyTorch version is compatible with your GPU drivers
  4. For JAX training, ensure Flax is installed

Example installation:

pip install torch --index-url https://download.pytorch.org/whl/cu118
pip install accelerate transformers datasets peft
pip install -e ".[torch]"

Source: https://github.com/huggingface/diffusers / Human Manual

Optimization Guide

Related topics: Quantization Guide, Loaders & Adapters

This page covers performance optimization techniques for the Diffusers library, including memory management, attention backends, caching strategies, and quantization options. These techniques enable efficient inference and training of diffusion models on various hardware configurations.

Overview

Diffusers provides multiple optimization layers to improve inference speed and reduce memory consumption. The optimization system operates at several levels:

  1. Attention Level: Alternative attention implementations (xformers, flash attention, scaled dot product attention)
  2. Cache Level: Key-value caching for iterative generation
  3. Memory Level: CPU offloading, gradient checkpointing, and memory-efficient attention
  4. Quantization Level: GGUF and other quantization formats for reduced precision inference

graph TD
    A[Diffusion Pipeline] --> B[Attention Processors]
    A --> C[Caching System]
    A --> D[Quantization]
    B --> B1[xformers]
    B --> B2[Flash Attention]
    B --> B3[SDPA]
    C --> C1[FasterCache]
    C --> C2[TextKVCache]
    D --> D1[GGUF Quantization]

Source: src/diffusers/models/attention_processor.py:1-50

Loaders & Adapters

Related topics: Quantization Guide, System Architecture

This page documents the loading mechanisms and adapter systems in the Diffusers library. These components are responsible for importing pretrained models, checkpoints, and adapter weights into pipelines and model architectures.

Overview

The Diffusers library provides a unified loading architecture that supports multiple model formats, checkpoint types, and adapter mechanisms. The loaders module (src/diffusers/loaders/) centralizes all loading functionality, enabling pipelines to dynamically import and configure model components at runtime.

graph TD
    A[Pipeline Loading Request] --> B{Model Type Detection}
    B -->|Standard HuggingFace| C[from_pretrained]
    B -->|Single File Checkpoint| D[from_single_file]
    B -->|LoRA Adapter| E[load_lora_weights]
    B -->|Textual Inversion| F[load_textual_inversion]
    B -->|IP Adapter| G[load_ip_adapter]
    B -->|PEFT Format| H[load_peft_weights]
    
    C --> I[ModelMixin / PreTrainedModel]
    D --> J[FromOriginalModelMixin]
    E --> K[StableDiffusionLoraLoaderMixin]
    F --> L[TextualInversionLoaderMixin]
    G --> M[IPAdapterMixin]
    H --> N[PeftMixin]
    
    I --> O[Loaded Model / Pipeline]
    J --> O
    K --> O
    L --> O
    M --> O
    N --> O

Loading Architecture

Core Loading Components

The loading system is built on several key abstractions:

| Component | File | Purpose |
| --- | --- | --- |
| FromOriginalModelMixin | single_file_model.py | Base mixin for loading checkpoints from original model formats |
| StableDiffusionLoraLoaderMixin | lora_base.py | LoRA weight loading and fusion for Stable Diffusion models |
| LoraLoaderMixin | lora_pipeline.py | Generic LoRA loading support for pipeline components |
| PeftMixin | peft.py | PEFT-format adapter loading (LoRA, IA³, LoHa, etc.) |
| TextualInversionLoaderMixin | textual_inversion.py | Textual inversion embedding loading |
| IPAdapterMixin | ip_adapter.py | Image Prompt adapter loading |
| SingleFileLoader | single_file.py | Utilities for single-file checkpoint loading |

Source: src/diffusers/loaders/__init__.py

Model Type Detection

During loading, the system detects model types to determine the appropriate loading strategy:

is_transformers_model = (
    is_transformers_available()
    and issubclass(class_obj, PreTrainedModel)
    and transformers_version >= version.parse("4.20.0")
)
is_diffusers_model = issubclass(class_obj, diffusers_module.ModelMixin)
is_diffusers_single_file_model = issubclass(class_obj, diffusers_module.FromOriginalModelMixin)

Source: src/diffusers/loaders/single_file.py:1-100
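The `issubclass` checks above amount to dispatching on the class hierarchy. The following self-contained sketch shows the idea with stand-in classes; `ModelMixin`, `FromOriginalModelMixin`, and `PreTrainedModel` here are empty placeholders rather than the real library classes, and `pick_load_method` is a hypothetical helper.

```python
class ModelMixin:
    """Stand-in for diffusers.ModelMixin (no real loading logic)."""


class FromOriginalModelMixin:
    """Stand-in for the single-file loading mixin."""


class PreTrainedModel:
    """Stand-in for transformers.PreTrainedModel."""


class ToyUNet(ModelMixin, FromOriginalModelMixin):
    pass


class ToyVAE(ModelMixin):
    pass


class ToyTextEncoder(PreTrainedModel):
    pass


def pick_load_method(class_obj):
    """Choose a loader name from the class hierarchy (hypothetical helper)."""
    if issubclass(class_obj, FromOriginalModelMixin):
        return "from_single_file"
    if issubclass(class_obj, (ModelMixin, PreTrainedModel)):
        return "from_pretrained"
    return "unsupported"


for cls in (ToyUNet, ToyVAE, ToyTextEncoder, dict):
    print(cls.__name__, "->", pick_load_method(cls))
```

Checking the single-file mixin first mirrors the order of the snippet above, since a class can inherit from both mixins.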

Single File Loading

Single file loading enables the import of pretrained checkpoints in formats other than the native Diffusers format. This is essential for loading models from other ecosystems or custom checkpoints.

FromOriginalModelMixin

Models implementing FromOriginalModelMixin support loading from original checkpoint formats:

if is_diffusers_single_file_model:
    load_method = getattr(class_obj, "from_single_file")
    
    loaded_sub_model = load_method(
        pretrained_model_link_or_path_or_dict=checkpoint,
        original_config=original_config,
        config=cached_model_config_path,
        subfolder=name,
        torch_dtype=torch_dtype,
        local_files_only=local_files_only,
        disable_mmap=disable_mmap,
        **kwargs,
    )

Source: src/diffusers/loaders/single_file.py

Supported Single File Formats

The single file loader supports multiple checkpoint formats:

| Format | Description | Notes |
| --- | --- | --- |
| .safetensors | Safe tensors format | Memory-efficient, secure |
| .bin / .pt | PyTorch pickle format | Legacy compatibility |
| .ckpt | Generic checkpoint | Common for Stable Diffusion |
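Because pickle-based checkpoints can execute code when loaded, it can help to classify a file before opening it. This is a small sketch that only inspects the file extension; `describe_checkpoint` is a hypothetical helper, not part of the single-file loader.

```python
from pathlib import Path

# Extensions from the table above that are typically pickle-based.
PICKLE_SUFFIXES = {".bin", ".pt", ".ckpt"}


def describe_checkpoint(path):
    """Classify a checkpoint path by extension only (no file access)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".safetensors":
        return "safetensors"
    if suffix in PICKLE_SUFFIXES:
        return "pickle"
    return "unknown"


for name in ("model.safetensors", "v1-5-pruned.ckpt", "weights.PT", "notes.txt"):
    print(name, "->", describe_checkpoint(name))
```

An extension check is only a first filter; it says nothing about whether the file contents match the expected model architecture.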

Single File Loading Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| pretrained_model_link_or_path_or_dict | `str \| dict` | Path or URL to checkpoint, or state dict |
| original_config | `str \| dict \| None` | Original model configuration |
| config | `str \| None` | Diffusers config path |
| subfolder | str | Subfolder path within checkpoint |
| torch_dtype | torch.dtype | Target data type |
| local_files_only | bool | Only load from local cache |
| disable_mmap | bool | Disable memory-mapped loading |

LoRA (Low-Rank Adaptation)

LoRA enables efficient fine-tuning by adding small trainable matrices to existing model weights without modifying the base model.
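As a worked example of the "small trainable matrices", the usual LoRA update is W' = W + scale * (B A), where A has shape (rank, d_in) and B has shape (d_out, rank). The sketch below uses plain Python lists and hypothetical helper names (`matmul`, `apply_lora`); it illustrates the arithmetic only and is not how Diffusers stores or applies adapters.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_down, lora_up, scale=1.0):
    """Return weight + scale * (lora_up @ lora_down) without mutating weight."""
    delta = matmul(lora_up, lora_down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # base weight, shape (2, 3)
lora_down = [[1.0, 2.0, 3.0]]                # A, shape (rank=1, 3)
lora_up = [[0.5], [-1.0]]                    # B, shape (2, rank=1)

adapted = apply_lora(weight, lora_down, lora_up, scale=0.5)
print(adapted)
```

With rank 1 the adapter stores 5 numbers instead of the 6 in the full weight; the saving grows quickly for realistic layer sizes, and setting `scale=0.0` recovers the base weight.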

LoRA Loading Architecture

graph LR
    A[LoRA Checkpoint] --> B{LoraLoaderMixin}
    B --> C[State Dict Extraction]
    C --> D[Target Module Mapping]
    D --> E[Weight Fusion]
    E --> F[Adapted Model]

Loading LoRA Weights

The StableDiffusionLoraLoaderMixin provides the load_lora_weights method:

def load_lora_weights(self, pretrained_model_name_or_path, adapter_name=None, **kwargs):
    """
    Load LoRA weights into pipeline components.
    
    Args:
        pretrained_model_name_or_path: Path or HuggingFace model ID
        adapter_name: Optional name for the adapter (for multiple LoRAs)
    """

Source: src/diffusers/loaders/lora_base.py

LoRA Pipeline Integration

The LoraLoaderMixin extends pipeline support for LoRA adapters:

class LoraLoaderMixin:
    """Mixin class for LoRA loading in diffusion pipelines."""

    def load_lora_weights(self, pretrained_model_name_or_path, **kwargs):
        """Load LoRA weights into pipeline components."""

    def unload_lora_weights(self):
        """Remove LoRA weights and restore original weights."""

    def set_adapters(self, adapter_names, adapter_weights=None):
        """Set active adapters with optional weighting."""

Source: src/diffusers/loaders/lora_pipeline.py

Multiple LoRA Support

Diffusers supports loading multiple LoRA adapters simultaneously:

| Method | Description |
| --- | --- |
| load_lora_weights() | Load with optional adapter name |
| set_adapters() | Activate specific adapters |
| fuse_lora() | Fuse adapters with custom weights |
| unfuse_lora() | Unfuse previously fused adapters |
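When several adapters are active, each one needs a weight. The following sketch shows one plausible way to normalize the inputs: a missing weight defaults to 1.0 and a single number is broadcast to every adapter. `expand_adapter_weights` is a hypothetical helper written for illustration and is not a Diffusers function.

```python
def expand_adapter_weights(adapter_names, adapter_weights=None):
    """Pair each adapter name with a float weight (illustrative helper)."""
    if isinstance(adapter_names, str):
        adapter_names = [adapter_names]
    if adapter_weights is None:
        adapter_weights = 1.0
    if not isinstance(adapter_weights, (list, tuple)):
        # Broadcast a single number to every adapter.
        adapter_weights = [adapter_weights] * len(adapter_names)
    if len(adapter_weights) != len(adapter_names):
        raise ValueError(
            f"Got {len(adapter_names)} adapter names but {len(adapter_weights)} weights."
        )
    return {name: float(w) for name, w in zip(adapter_names, adapter_weights)}


print(expand_adapter_weights(["style_1", "style_2"], [0.7, 0.3]))
print(expand_adapter_weights("style_1"))
```

Failing early on a length mismatch avoids silently dropping an adapter when the two lists drift apart.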

Flux Klein LoRA Loading

Note: Diffusers v0.37.1 included fixes specifically for Flux Klein LoRA loading, addressing issues with type hints and model compatibility.
Source: Release v0.37.1 - Fix Flux Klein LoRA loading #13313

PEFT Integration

The PeftMixin enables loading adapters in the PEFT (Parameter-Efficient Fine-Tuning) format:

class PeftMixin:
    """Mixin for loading PEFT-format adapters."""
    
    def load_peft_weights(
        self,
        pretrained_model_name_or_path,
        adapter_name: str = "default",
        layer_selection: Optional[List[int]] = None,
        scale_weight: Optional[float] = None,
    ):
        """Load PEFT-format adapter weights."""

Source: src/diffusers/loaders/peft.py

Supported PEFT Adapter Types

| Adapter Type | Description |
| --- | --- |
| LORA | Low-Rank Adaptation |
| IA3 | Infused Adapter by Inhibiting and Amplifying Inner Layers |
| LoHa | Low-Rank Hadamard Product |
| AdaLoRA | Adaptive LoRA |
| DoRA | Weight-Decomposed Linear Adaptation |

Textual Inversion

Textual Inversion enables customizing the model's vocabulary through learned embeddings without modifying the base model.

Loading Textual Inversion Embeddings

class TextualInversionLoaderMixin:
    """Mixin for textual inversion embedding loading."""
    
    def load_textual_inversion(
        self,
        pretrained_model_name_or_path,
        token: Optional[str] = None,
        file_extension: str = "safetensors",
        **kwargs
    ):
        """
        Load textual inversion embeddings.
        
        Args:
            pretrained_model_name_or_path: Path or model ID
            token: Optional token name for the embedding
            file_extension: File format for embeddings
        """

Source: src/diffusers/loaders/textual_inversion.py

Textual Inversion File Formats

| Format | Extension | Notes |
| --- | --- | --- |
| SafeTensors | .safetensors | Recommended, secure |
| PyTorch | .bin, .pt | Legacy format |
| Diffusers | .json + vectors | Native format |

IP Adapter

IP Adapter enables image-based conditioning for generation, allowing reference images to guide the generation process.

IP Adapter Loading

class IPAdapterMixin:
    """Mixin for IP-Adapter loading."""
    
    def load_ip_adapter(
        self,
        model_id_or_path: Union[str, List[str]],
        subfolder: Union[str, List[str], None] = None,
        weight_name: Union[str, List[str], None] = None,
        image_encoder_folder: Union[str, List[str], None] = "image_encoder",
        **kwargs
    ):
        """Load IP-Adapter weights and image encoders."""

Source: src/diffusers/loaders/ip_adapter.py

IP Adapter Components

| Component | Description |
| --- | --- |
| Image Encoder | Processes reference images |
| Image Projection | Maps encoded features to cross-attention space |
| Adapter Weights | Fine-tuned weights for image conditioning |

Pipeline Loading Utilities

Loading Process Flow

graph TD
    A[Pipeline.from_pretrained] --> B[Load model_index.json]
    B --> C{Component Type Detection}
    C -->|Diffusers Model| D[ModelMixin.from_config]
    C -->|Transformers Model| E[PreTrainedModel.from_pretrained]
    C -->|Scheduler| F[SchedulerMixin.from_config]
    C -->|Tokenizer| G[AutoTokenizer.from_pretrained]
    
    D --> H[Load config.json]
    E --> I[Load config.json]
    H --> J[Create model on meta device]
    I --> J
    
    J --> K[Load weights with accelerate]
    K --> L[Offload if needed]
    L --> M[Pipeline Ready]

Loading with Quantization

The pipeline loading system integrates with quantization configurations:

if (
    quantization_config is not None
    and isinstance(quantization_config, PipelineQuantizationConfig)
    and issubclass(class_obj, torch.nn.Module)
):
    model_quant_config = quantization_config._resolve_quant_config(
        is_diffusers=is_diffusers_model, module_name=name
    )
    if model_quant_config is not None:
        loading_kwargs["quantization_config"] = model_quant_config

Source: src/diffusers/pipelines/pipeline_loading_utils.py
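The resolution step above can be pictured as a per-component lookup: only components with a registered quantization config receive a `quantization_config` loading argument. The sketch below uses plain dictionaries as placeholder config objects; `resolve_quant_config` and `build_loading_kwargs` are hypothetical names and do not reproduce the logic of `_resolve_quant_config`.

```python
def resolve_quant_config(quant_mapping, module_name):
    """Return the config registered for module_name, or None if it stays unquantized."""
    return quant_mapping.get(module_name)


def build_loading_kwargs(component_names, quant_mapping):
    """Build per-component loading kwargs, adding quantization only where configured."""
    all_kwargs = {}
    for name in component_names:
        loading_kwargs = {}
        model_quant_config = resolve_quant_config(quant_mapping, name)
        if model_quant_config is not None:
            loading_kwargs["quantization_config"] = model_quant_config
        all_kwargs[name] = loading_kwargs
    return all_kwargs


quant_mapping = {"transformer": {"load_in_4bit": True}}  # placeholder config object
kwargs = build_loading_kwargs(["transformer", "vae", "text_encoder"], quant_mapping)
print(kwargs)
```

Leaving unlisted components untouched is what allows a VAE or text encoder to stay in full precision while a larger component is quantized.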

Modular Pipeline Loading

Modular Pipelines (introduced in v0.37.0) provide a composable approach to pipeline construction using reusable blocks.

Component Specification

Modular Pipelines use ComponentSpec to define loading parameters:

@dataclass
class ComponentSpec:
    name: str
    type_hint: tuple[str, str]  # (library, class_name)
    pretrained_model_name_or_path: Optional[str]
    subfolder: Optional[str]
    variant: Optional[str]
    revision: Optional[str]

Source: src/diffusers/modular_pipelines/modular_pipeline.py

Loading with AutoModel Type Hints

Note: Diffusers v0.37.1 fixed loading issues with ModularPipelines that use AutoModel type hints in their modular_model_index.json.
Source: Release v0.37.1 - Fix for loading ModularPipelines with AutoModel type hints #13271

The loading process attempts AutoModel.from_pretrained when type_hint is None:

if self.type_hint is None:
    try:
        component = AutoModel.from_pretrained(
            pretrained_model_name_or_path, **load_kwargs, **kwargs
        )
    except Exception as e:
        raise ValueError(f"Unable to load {self.name} without `type_hint`: {e}")
    self.type_hint = component.__class__

Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py

Common Usage Patterns

Loading a Standard Pipeline

import torch
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True,
)

Loading with LoRA

from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"
)

# Placeholder path: point this at a LoRA checkpoint trained for SDXL
pipeline.load_lora_weights("path/to/lora_weights")

# Generate with LoRA
prompt = "a watercolor painting of a lighthouse"
image = pipeline(prompt).images[0]

Loading Multiple Adapters

from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
)

# Load multiple LoRA adapters under distinct names
pipeline.load_lora_weights("adapter_1", adapter_name="style_1")
pipeline.load_lora_weights("adapter_2", adapter_name="style_2")

# Activate both adapters and blend them with different weights
pipeline.set_adapters(["style_1", "style_2"], adapter_weights=[0.7, 0.3])

Loading Textual Inversion

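from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
)

# Register the learned embedding under the token used in prompts
pipeline.load_textual_inversion(
    "path/to/textual_inversion",
    token="my-concept"
)

image = pipeline("a photo of my-concept").images[0]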

Configuration Options

Loading Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| cache_dir | str | ~/.cache/huggingface/ | Cache directory for downloaded models |
| torch_dtype | torch.dtype | None | Override default dtype |
| use_safetensors | bool | True | Prefer .safetensors format |
| variant | str | None | Model variant (e.g., "fp16") |
| revision | str | None | Git revision to load |
| use_flash_attention_2 | bool | False | Enable Flash Attention 2 |
| device_map | `str \| dict` | None | Device mapping strategy |
| max_memory | dict | None | Memory limits per device |
| offload_folder | str | None | Folder for offloaded weights |
| local_files_only | bool | False | Only use local files |

LoRA-Specific Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| adapter_name | str | Name for the loaded adapter |
| scale_weight | float | Scaling factor for LoRA weights |
| layer_selection | List[int] | Apply only to specific layers |

Common Issues and Troubleshooting

Single File Loading Failures

Issue: Custom models or GGUF files fail to load

Community discussion: Issue #13683 - Universal method or class to load any model locally
Many custom models fail to load due to limited .from_single_file availability across model classes.

Solutions:

  1. Verify the model class implements FromOriginalModelMixin
  2. Provide an original config file when available
  3. Consider converting to standard Diffusers format

Type Hint Requirements

When using Modular Pipelines:

  • Ensure modular_model_index.json includes proper type_hint fields
  • For unknown types, provide type_hint explicitly or ensure AutoModel can resolve the class

Version Compatibility

| Feature | Minimum Diffusers Version |
| --- | --- |
| Modular Pipelines | 0.37.0 |
| Flux Klein LoRA fixes | 0.37.1 |
| PEFT integration | 0.33.0+ |
| IP Adapter | 0.31.0+ |

Architecture Principles

According to the Diffusers philosophy (PHILOSOPHY.md):

  1. Extensibility: Loaders should be designed to be easily extendable to future changes
  2. Composability: Adapter systems should support mixing multiple techniques
  3. Backward Compatibility: Loading mechanisms maintain compatibility across versions
  4. Clear Error Messages: Loading failures provide actionable error information

Quantization Guide

Related topics: Optimization Guide, Loaders & Adapters

This page provides comprehensive documentation on quantization support in the Diffusers library. Quantization reduces model memory footprint and computational requirements by representing model weights in lower precision formats, enabling deployment of large diffusion models on resource-constrained hardware.

Overview

The Diffusers library implements a modular quantization framework that supports multiple quantization backends. This architecture allows users to load quantized models from the Hugging Face Hub or quantize models on-the-fly during loading. The quantization system is designed to be backend-agnostic while providing backend-specific optimizations.

Quantization in Diffusers serves two primary purposes:

  1. Memory Reduction: Reduce VRAM requirements for loading and running diffusion models
  2. Runtime Optimization: Accelerate inference through optimized low-precision computations

The library currently supports four major quantization backends: GGUF, BitsAndBytes, Quanto, and TorchAO. Each backend offers different trade-offs between compression ratio, inference speed, and quality preservation.
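To give a rough sense of the memory reduction, weight storage scales with bits per parameter. The sketch below is back-of-the-envelope arithmetic only: the 12-billion parameter count is a hypothetical example, `weight_memory_gib` is an illustrative helper, and the estimate ignores quantization metadata (such as scales), activations, and framework overhead.

```python
BITS_PER_WEIGHT = {"fp32": 32, "fp16": 16, "int8": 8, "4bit": 4}


def weight_memory_gib(num_params, precision):
    """Approximate weight storage in GiB, ignoring quantization metadata."""
    return num_params * BITS_PER_WEIGHT[precision] / 8 / 1024**3


num_params = 12_000_000_000  # hypothetical model size
for precision in BITS_PER_WEIGHT:
    print(f"{precision:>5}: {weight_memory_gib(num_params, precision):6.2f} GiB")
```

Real savings are somewhat smaller than this estimate suggests, because backends keep some layers in higher precision and store extra per-block information.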

Architecture

Quantization System Components

The quantization framework follows a modular architecture with a base class hierarchy and backend-specific implementations:

graph TD
    A[DiffusionPipeline] --> B[PipelineQuantizationConfig]
    B --> C[DiffusersQuantizer Base Class]
    C --> D[GGUFQuantizer]
    C --> E[BitsAndBytesQuantizer]
    C --> F[QuantoQuantizer]
    C --> G[TorchAOQuantizer]
    
    H[Model Loading] --> I[ModelMixin]
    I --> C
    J[Single File Loading] --> K[FromOriginalModelMixin]
    K --> C

Quantization Flow

sequenceDiagram
    participant User
    participant Pipeline
    participant QuantConfig
    participant Quantizer
    participant Model
    
    User->>Pipeline: from_pretrained(quantization_config)
    Pipeline->>QuantConfig: Validate quantization config
    QuantConfig->>Quantizer: Create backend-specific quantizer
    Pipeline->>Model: Load with quantizer
    Model->>Quantizer: Apply quantization to weights
    Quantizer-->>Model: Quantized model ready
    Model-->>Pipeline: Pipeline ready for inference

Supported Quantization Backends

GGUF Quantization

GGUF (GPT-Generated Unified Format) is designed for loading pre-quantized models, particularly those from the llama.cpp ecosystem. The GGUF quantizer handles models that have been quantized externally and stored in the GGUF format.

Key Characteristics:

  • Supports various quantization types (Q4_K, Q5_K, Q8_0, etc.)
  • Memory-mapped file loading for efficient memory usage
  • Compatible with models converted from original formats

Source: src/diffusers/quantizers/gguf/gguf_quantizer.py

The GGUF quantizer class initializes with the following parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| quantization_config | GGUFQuantizationConfig | Configuration for GGUF quantization |
| modules_to_not_convert | List[str] | Module names to exclude from quantization |
| compute_dtype | torch.dtype | Computation data type |
| pre_quantized | bool | Whether the model is pre-quantized |

Important Dependencies:

GGUF loading requires accelerate>=0.26.0 and the gguf package. These are validated during environment checks in validate_environment().

def validate_environment(self, *args, **kwargs):
    if not is_accelerate_available() or is_accelerate_version("<", "0.26.0"):
        raise ImportError(
            "Loading GGUF Parameters requires `accelerate` installed in your environment: "
            "`pip install 'accelerate>=0.26.0'`"
        )

Source: src/diffusers/quantizers/gguf/gguf_quantizer.py:30-37

BitsAndBytes Quantization

BitsAndBytes (bnb) provides on-the-fly quantization during model loading. It supports 4-bit and 8-bit quantization modes with optional NF4 (Normal Float 4) data type.

Key Characteristics:

  • On-the-fly quantization during loading
  • 4-bit (NF4) and 8-bit (Int8) modes
  • Supports keep_in_fp32_modules for sensitive layers
  • Compatible with QLoRA fine-tuning workflows

Source: src/diffusers/quantizers/bitsandbytes/bnb_quantizer.py

Quanto Quantization

Quanto provides a PyTorch-native quantization backend with support for several quantization schemes, including int2, int4, and int8.

Key Characteristics:

  • Pure PyTorch implementation
  • Supports int2, int4, int8 quantization
  • Good compatibility with existing PyTorch workflows
  • No additional C++ dependencies required

Source: src/diffusers/quantizers/quanto/quanto_quantizer.py

TorchAO Quantization

TorchAO is the PyTorch native quantization backend that provides hardware-optimized quantization kernels.

Key Characteristics:

  • PyTorch native backend
  • Optimized kernel support
  • Integration with torch.compile for additional speedups
  • Supports both dynamic and static quantization

Source: src/diffusers/quantizers/torchao/torchao_quantizer.py

Configuration

PipelineQuantizationConfig

The PipelineQuantizationConfig class provides a unified interface for configuring quantization across different backends. It handles backend-specific configuration resolution and validation.

Source: src/diffusers/quantizers/pipe_quant_config.py

Quantization Configuration Parameters

| Parameter | Type | Backend | Description |
| --- | --- | --- | --- |
| quantization_method | str | all | Quantization backend: gguf, bitsandbytes, quanto, torchao |
| load_in_4bit | bool | bnb | Load model weights in 4-bit precision |
| load_in_8bit | bool | bnb | Load model weights in 8-bit precision |
| bnb_4bit_compute_dtype | torch.dtype | bnb | Computation dtype for BitsAndBytes |
| bnb_4bit_quant_type | str | bnb | Quantization type (fp4, nf4) |
| bnb_4bit_use_double_quant | bool | bnb | Enable double quantization |
| gguf_format | str | gguf | GGUF file format version |
| compute_dtype | torch.dtype | gguf | Target compute data type |
| modules_to_not_convert | List[str] | gguf | Modules to exclude from quantization |
| torch_dtype | torch.dtype | all | Default torch data type |

Loading Quantized Models

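Loading GGUF Models

import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Placeholder path or URL to a pre-quantized .gguf checkpoint.
# This example uses a Flux transformer; substitute the model class
# and base repository that match your checkpoint.
ckpt_path = "path/to/model.gguf"

# GGUF checkpoints are loaded per component with from_single_file
transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)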

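Loading with BitsAndBytes

import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Pipeline-level loading expects a PipelineQuantizationConfig.
# Component names depend on the pipeline; "unet" is used as an example.
quantization_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_compute_dtype": torch.float16,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,
    },
    components_to_quantize=["unet"],
)

pipeline = DiffusionPipeline.from_pretrained(
    "model/path",
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
)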

Source: src/diffusers/pipelines/pipeline_loading_utils.py

Pipeline Integration

Model Loading with Quantization

When a pipeline loads with a quantization configuration, the helper functions in pipeline_loading_utils.py handle the quantization process. The loading flow follows these steps:

graph LR
    A[from_pretrained] --> B{Is Quantized?}
    B -->|Yes| C[Get Quantizer]
    B -->|No| D[Load Normal]
    C --> E{Quantizer Type?}
    E -->|GGUF| F[Use from_single_file]
    E -->|Other| G[Use from_config]
    F --> H[Apply Quantization]
    G --> H
    H --> I[Return Quantized Model]
    D --> I

Source: src/diffusers/loaders/single_file.py

The loading process determines the appropriate loading method based on the model type:

is_diffusers_single_file_model = issubclass(class_obj, diffusers_module.FromOriginalModelMixin)
is_diffusers_model = issubclass(class_obj, diffusers_module.ModelMixin)

if is_diffusers_single_file_model:
    load_method = getattr(class_obj, "from_single_file")
    # ...
    loaded_sub_model = load_method(
        pretrained_model_link_or_path_or_dict=checkpoint,
        original_config=original_config,
        config=cached_model_config_path,
        subfolder=name,
        torch_dtype=torch_dtype,
        local_files_only=local_files_only,
        disable_mmap=disable_mmap,
        **kwargs,
    )

Source: src/diffusers/loaders/single_file.py:40-55

Single File Loading

For GGUF and other single-file model formats, the from_single_file method handles the complete loading process. This is particularly important for quantized models that bundle all weights in a single file.

Source: src/diffusers/loaders/single_file.py

Quantization Resolution in Pipelines

The pipeline quantization configuration is resolved at load time:

if (
    quantization_config is not None
    and isinstance(quantization_config, PipelineQuantizationConfig)
    and issubclass(class_obj, torch.nn.Module)
):
    model_quant_config = quantization_config._resolve_quant_config(
        is_diffusers=is_diffusers_model, module_name=name
    )
    if model_quant_config is not None:
        loading_kwargs["quantization_config"] = model_quant_config

Source: src/diffusers/pipelines/pipeline_loading_utils.py:120-129

Common Usage Patterns

Memory-Constrained Inference

For running large models on GPUs with limited VRAM:

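import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Quantize the UNet to 4-bit with bitsandbytes; other components stay in fp16
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    quantization_config=PipelineQuantizationConfig(
        quant_backend="bitsandbytes_4bit",
        quant_kwargs={"load_in_4bit": True},
        components_to_quantize=["unet"],
    ),
    torch_dtype=torch.float16,
)

# Generate image
result = pipeline(prompt="a beautiful landscape")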

Loading Pre-Quantized GGUF Models

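import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Load a GGUF quantized model component from a single file (placeholder path).
# The quantization type (for example Q4_K) is read from the file itself,
# so only the compute dtype is configured here.
transformer = FluxTransformer2DModel.from_single_file(
    "quantized/model/path/model.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# Example base repository; use the one that matches your checkpoint
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)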

Mixed Quantization

Apply different quantization levels to different components:

import torch
from diffusers import BitsAndBytesConfig, DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Quantize the UNet with 4-bit; components missing from quant_mapping
# (such as the VAE) are loaded without quantization.
pipeline = DiffusionPipeline.from_pretrained(
    "model/path",
    quantization_config=PipelineQuantizationConfig(
        quant_mapping={"unet": BitsAndBytesConfig(load_in_4bit=True)}
    ),
    torch_dtype=torch.float16,
)

Troubleshooting

Common Issues and Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| ImportError for accelerate | Missing dependency for GGUF | pip install 'accelerate>=0.26.0' |
| Memory errors during loading | Model too large for GPU | Use 4-bit quantization or CPU offloading |
| Slow inference with quantized model | Quantization not optimized | Enable torch.compile or use faster backends |
| Config mismatch errors | Incompatible quantization config | Verify backend-specific requirements |
| MMAP errors | Memory-mapped file issues | Set disable_mmap=True in loading config |

Environment Requirements

Different quantization backends have specific dependencies:

| Backend | Minimum Dependencies |
| --- | --- |
| GGUF | accelerate>=0.26.0, gguf |
| BitsAndBytes | bitsandbytes>=0.41.0 |
| Quanto | optimum-quanto |
| TorchAO | torchao, PyTorch 2.0+ |
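Before choosing a backend, it can be useful to check which dependencies are importable in the current environment. The sketch below uses only the standard library; the `BACKEND_MODULES` mapping lists import names that are assumptions of this example, and `is_importable` and `missing_modules` are hypothetical helpers rather than Diffusers utilities. It checks availability only, not versions.

```python
import importlib.util

# Assumed import names per backend (illustrative, not an official list).
BACKEND_MODULES = {
    "gguf": ["accelerate", "gguf"],
    "bitsandbytes": ["bitsandbytes"],
    "quanto": ["optimum.quanto"],
    "torchao": ["torchao"],
}


def is_importable(module_name):
    """True if the module can be located without importing it fully."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or a spec is unset.
        return False


def missing_modules(module_names):
    """Return the subset of module_names that cannot be found."""
    return [name for name in module_names if not is_importable(name)]


for backend, modules in BACKEND_MODULES.items():
    print(f"{backend}: missing {missing_modules(modules) or 'nothing'}")
```

A check like this reports every missing package at once, instead of failing on the first `ImportError` during model loading.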

Version Compatibility

The quantization system was enhanced in recent releases:

  • v0.37.0+: Improved modular pipelines and quantization integration
  • v0.35.2+: Better transformers compatibility for quantized models
  • v0.33.0+: Enhanced memory optimizations and caching for quantized models

Source: README.md

Design Philosophy

The quantization system in Diffusers follows the library's core design principles:

  1. Modularity: Each quantizer is a self-contained class inheriting from DiffusersQuantizer
  2. Composability: Quantization configs can be applied at pipeline or individual component level
  3. Backward Compatibility: Default settings preserve maximum precision
  4. Extensibility: New backends can be added by implementing the base quantizer interface

Source: PHILOSOPHY.md

Models are designed to expose complexity similar to PyTorch's Module class, providing clear error messages when quantization configuration issues occur. The system maintains high precision defaults while allowing optimization when explicitly requested.

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 24 structured pitfall item(s), including 4 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

1. Installation risk: Installation risk requires verification

  • Severity: high
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_a9d989818ab840c6985e6c0c41830e87 | https://github.com/huggingface/diffusers/issues/13401

2. Installation risk: Installation risk requires verification

  • Severity: high
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_190402547a6a441bb4f046b278c04a7f | https://github.com/huggingface/diffusers/issues/13683

3. Security or permission risk: Security or permission risk requires verification

  • Severity: high
  • Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_fedc9c5b4dc2486aa7ed13053f2050af | https://github.com/huggingface/diffusers/issues/13772

4. Security or permission risk: Security or permission risk requires verification

  • Severity: high
  • Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_d70cffdb7188481fb8e1e7e5a84539bb | https://github.com/huggingface/diffusers/issues/13844

5. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_e2c183459b644dfe88a28ce288693dc1 | https://github.com/huggingface/diffusers/issues/13762

6. Configuration risk: Configuration risk requires verification

  • Severity: medium
  • Finding: Developers should check this configuration risk before relying on the project: Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more
  • User impact: Upgrade or migration may change expected behavior: Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more. Context: Observed when using Python
  • Evidence: failure_mode_cluster:github_release | fmev_e8d17ffbe5fa1785fea2871516925453 | https://github.com/huggingface/diffusers/releases/tag/v0.35.0

7. Configuration risk: Configuration risk requires verification

  • Severity: medium
  • Finding: Developers should check this configuration risk before relying on the project: llada2 model/pipeline review
  • User impact: Developers may misconfigure credentials, environment, or host setup: llada2 model/pipeline review
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: llada2 model/pipeline review. Context: Observed when using Python
  • Evidence: failure_mode_cluster:github_issue | fmev_b0fdcc0ebf367379b87fcad2dd642011 | https://github.com/huggingface/diffusers/issues/13598

8. Configuration risk: Configuration risk requires verification

  • Severity: medium
  • Finding: Developers should check this configuration risk before relying on the project: universal method or class to load any model locally
  • User impact: Developers may misconfigure credentials, environment, or host setup: universal method or class to load any model locally
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: universal method or class to load any model locally. Context: Observed when using Python
  • Evidence: failure_mode_cluster:github_issue | fmev_8132f9310793351811bea343d379b680 | https://github.com/huggingface/diffusers/issues/13683

9. Capability evidence risk: Capability evidence risk requires verification

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.assumptions | github_repo:498011141 | https://github.com/huggingface/diffusers

10. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Developers should check this migration risk before relying on the project: Diffusers 0.36.0: Pipelines galore, new caching method, training scripts, and more 🎄
  • User impact: Upgrade or migration may change expected behavior: Diffusers 0.36.0: Pipelines galore, new caching method, training scripts, and more 🎄
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Diffusers 0.36.0: Pipelines galore, new caching method, training scripts, and more 🎄. Context: Observed when using Python, CUDA
  • Evidence: failure_mode_cluster:github_release | fmev_fa85fd2586df0265d3c51e0547f8f9a5 | https://github.com/huggingface/diffusers/releases/tag/v0.36.0

11. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | github_repo:498011141 | https://github.com/huggingface/diffusers

12. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: No demo available (flag: no_demo).
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: downstream_validation.risk_items | github_repo:498011141 | https://github.com/huggingface/diffusers

Source: Doramagic discovery, validation, and Project Pack records
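
The check recommended most often above, reproducing the install and quickstart path in an isolated environment, can be sketched with Python's standard venv module. The helper name isolated_install_plan and the example directory name are hypothetical, and installing the diffusers package with pip is assumed to match the official install path; confirm the exact command against the project's installation instructions.

```python
import os
import sys
from pathlib import Path
from typing import List


def isolated_install_plan(env_dir: str, package: str = "diffusers") -> List[List[str]]:
    """Build the commands that create a fresh venv and install one package.

    Hypothetical helper: it only returns the commands, so nothing is
    created or downloaded until the caller runs them.
    """
    env_path = Path(env_dir)
    # venv places executables in "Scripts" on Windows and "bin" elsewhere.
    bin_dir = env_path / ("Scripts" if os.name == "nt" else "bin")
    env_python = str(bin_dir / ("python.exe" if os.name == "nt" else "python"))
    return [
        # 1. Create the isolated environment with the current interpreter.
        [sys.executable, "-m", "venv", str(env_path)],
        # 2. Install the package using the environment's own interpreter.
        [env_python, "-m", "pip", "install", package],
        # 3. Smoke test: the import must succeed inside the environment.
        [env_python, "-c", f"import {package}"],
    ]
```

Each returned command can be passed to subprocess.run(command, check=True) in order, for example over isolated_install_plan("diffusers-check-env"). Keeping plan construction separate from execution makes it possible to review the commands before anything touches the machine.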

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12 (count of project-level external discussion links exposed on this manual page).

Use: Review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using diffusers with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence