Doramagic Project Pack · Human Manual
diffusers
Getting Started with Diffusers
Related topics: System Architecture, Pipelines Overview
Diffusers is a state-of-the-art library for diffusion models, providing researchers and practitioners with modular, flexible, and efficient tools for image, audio, and video generation. This page serves as a comprehensive guide for getting started with Diffusers, covering installation, core concepts, model loading, and common usage patterns.
Overview
Diffusers serves as a modular toolbox for pretrained diffusion models. According to the project philosophy, the library embraces the following design principles (Source: PHILOSOPHY.md):
- Reusability: Pipelines should be self-contained and reusable
- Composability: Smaller building blocks like attention.py, resnet.py, and embeddings.py should be composable
- Flexibility: Models should expose complexity and give clear error messages
- Performance: Models can be optimized without major code changes while maintaining backward compatibility
The library supports a wide range of tasks including text-to-image, image-to-image, inpainting, video generation, and more. Recent releases (v0.33.0 through v0.38.0) have introduced numerous new pipelines including Wan 2.1/2.2, Flux variants, LLaDA2, and specialized ControlNet implementations.
Installation
Basic Installation
To install the latest stable version of Diffusers:
pip install diffusers
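To also install PyTorch and Accelerate alongside Diffusers (recommended; quote the extra so shells such as zsh do not expand the brackets):
pip install "diffusers[torch]"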
Installing from Source
For the latest features and example scripts, install from source:
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
Source: examples/README.md
Example-Specific Dependencies
Training scripts and community examples may require additional dependencies:
cd examples # Navigate to the specific example folder
pip install -r requirements.txt
[!IMPORTANT]
Example scripts frequently depend on the latest library version. Always install from source to ensure compatibility.
Core Concepts
Understanding Diffusers requires familiarity with three fundamental building blocks: Pipelines, Models, and Schedulers.
Architectural Overview
graph TD
A[User Input] --> B[DiffusionPipeline]
B --> C[Models]
B --> D[Schedulers]
B --> E[Tokenizers/Processors]
C --> F[UNet2D / Transformer2D]
C --> G[VAE]
D --> H[Noise Schedule]
F --> I[Latent Space]
G --> J[Generated Output]
style B fill:#e1f5fe
style C fill:#fff3e0
style D fill:#e8f5e8
Pipelines
Pipelines are the high-level API that orchestrates the entire diffusion process. They combine models, schedulers, and optional components like tokenizers or control networks into a cohesive inference workflow.
Key pipeline characteristics (Source: src/diffusers/pipelines/pipeline_utils.py):
| Pipeline Type | Description | Typical Use Case |
|---|---|---|
| DiffusionPipeline | Base pipeline class | Custom implementations |
| StableDiffusionPipeline | SD 1.x text-to-image | General image generation |
| StableDiffusionXLPipeline | SDXL optimized | High-quality image generation |
| StableDiffusionControlNetPipeline | With ControlNet | Controlled generation |
| AutoPipelineForText2Image (and the other AutoPipelineFor* classes) | Task-based selection | Flexible pipeline selection |
Models
Diffusers models are PyTorch modules that inherit from ModelMixin and ConfigMixin. They are designed to be:
- Composable from smaller building blocks
- Configurable with clear parameter handling
- Optimizable for memory and compute efficiency
Source: PHILOSOPHY.md
Common model architectures include:
| Model | Description | Location |
|---|---|---|
| UNet2DConditionModel | Conditioning UNet for text-to-image | src/diffusers/models/unets/ |
| AutoencoderKL | VAE for latent operations | src/diffusers/models/autoencoders/ |
| Transformer2DModel | Transformer-based diffusion | src/diffusers/models/transformers/ |
| ControlNetModel | ControlNet conditioning | src/diffusers/models/controlnet/ |
Schedulers
Schedulers implement various diffusion sampling strategies. The library supports numerous scheduling algorithms:
| Scheduler | A1111 Equivalent | Characteristics |
|---|---|---|
| DDPMScheduler | DDPM | High-quality, many steps |
| DDIMScheduler | DDIM | Fast convergence |
| DPMSolverMultistepScheduler | DPM++ 2M | Fast, good quality |
| EulerDiscreteScheduler | Euler | Simple, fast |
| EulerAncestralDiscreteScheduler | Euler a | Ancestral sampling |
| UniPCMultistepScheduler | UniPC | Very fast convergence |
Source: github.com/huggingface/diffusers/issues/4167
Loading Models and Pipelines
The library provides multiple ways to load models and pipelines, addressing common community needs around universal model loading.
Using DiffusionPipeline (Recommended)
The DiffusionPipeline is the recommended entry point for loading pretrained models:
import torch
from diffusers import DiffusionPipeline
# Load from Hugging Face Hub
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True
)
# Move to GPU
pipeline = pipeline.to("cuda")
Source: src/diffusers/pipelines/pipeline_loading_utils.py
Using AutoModel Classes
For loading individual model components, use the AutoModel classes:
import torch
from diffusers import AutoModel
# Load a model from config automatically
model = AutoModel.from_pretrained(
    "path/to/model",
    torch_dtype=torch.float16,
    variant="fp16"
)
The AutoModel class determines the appropriate model class from the configuration:
# Source: src/diffusers/models/auto_model.py
if "_class_name" in config:
    class_name = config["_class_name"]
    library = "diffusers"
elif "model_type" in config:
    class_name = "AutoModel"
    library = "transformers"
Source: src/diffusers/models/auto_model.py
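The branch above can be exercised in isolation. The following is a minimal sketch, not the library's implementation: `resolve_model_class` is a hypothetical helper, and raising `ValueError` when neither key is present is an assumption made for the example.

```python
from typing import Any, Dict, Tuple


def resolve_model_class(config: Dict[str, Any]) -> Tuple[str, str]:
    """Return (library, class_name) for an already-loaded config dictionary.

    Hypothetical helper mirroring the branch shown above; it is not part
    of the Diffusers API.
    """
    if "_class_name" in config:
        # Diffusers-native configs record the concrete class name.
        return "diffusers", config["_class_name"]
    if "model_type" in config:
        # Transformers configs carry model_type; defer to its AutoModel.
        return "transformers", "AutoModel"
    # Assumption of this sketch: an unrecognized config is an error.
    raise ValueError("config has neither '_class_name' nor 'model_type'")


print(resolve_model_class({"_class_name": "UNet2DConditionModel"}))
print(resolve_model_class({"model_type": "clip_text_model"}))
```

Keeping the resolution step as a pure function of the config dictionary makes it easy to test without downloading any weights.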
Loading Single-File Checkpoints
For custom models stored in single checkpoint files (including GGUF formats in supported models):
# AutoencoderKL is used as an example; substitute any model class that provides from_single_file
from diffusers import AutoencoderKL
# Load from a single checkpoint file
model = AutoencoderKL.from_single_file(
    "path/to/checkpoint.safetensors",
    config="path/to/config.json"  # Optional: provide config
)
[!NOTE]
The from_single_file method is available on models that inherit from FromOriginalModelMixin. Source: src/diffusers/loaders/single_file.py
The loading logic determines the appropriate method:
# Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py
load_method = (
getattr(self.type_hint, "from_single_file")
if is_single_file
else getattr(self.type_hint, "from_pretrained")
)
Loading with Trust Remote Code
Some models require executing custom code from the repository:
pipeline = DiffusionPipeline.from_pretrained(
"some/model-with-custom-code",
trust_remote_code=True
)
When trust_remote_code=True is not set and custom code is detected, the library raises:
ValueError: The repository for {pretrained_model_name_or_path} contains custom code
which must be executed to correctly load the model.
Source: src/diffusers/utils/dynamic_modules_utils.py
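The opt-in behaviour described above amounts to a guard that refuses to run repository code unless the caller has explicitly agreed. A minimal sketch, assuming a hypothetical `check_remote_code` helper and a precomputed `has_custom_code` flag (neither is taken from the library):

```python
def check_remote_code(repo_id: str, has_custom_code: bool, trust_remote_code: bool = False) -> bool:
    """Return True when custom repository code may be executed.

    Hypothetical gate for illustration; the message follows the error
    quoted above.
    """
    if has_custom_code and not trust_remote_code:
        raise ValueError(
            f"The repository for {repo_id} contains custom code which must be "
            "executed to correctly load the model."
        )
    # No custom code means there is nothing to execute.
    return has_custom_code


print(check_remote_code("some/model-with-custom-code", True, trust_remote_code=True))
```

Defaulting the flag to False means untrusted code only runs after a deliberate choice by the caller.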
Basic Usage Patterns
Text-to-Image Generation
import torch
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt).images[0]
image.save("output.png")
Image-to-Image Generation
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
# Load the starting image and resize it before passing it to the pipeline
init_image = load_image("path/to/input.jpg").resize((768, 768))
image = pipe(prompt="modern art style", image=init_image).images[0]
Controlled Generation with ControlNet
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
# Load controlnet and pipeline
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny",
    torch_dtype=torch.float16
)
pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")
# Prepare control image (for this ControlNet, a Canny edge map)
prompt = "your prompt"
control_image = load_image("path/to/control.jpg")
image = pipeline(prompt, image=control_image).images[0]
Using Schedulers
Schedulers can be swapped for the same pipeline:
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
# Replace default scheduler with DPM++ 2M Karras
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config,
    use_karras_sigmas=True,
    algorithm_type="dpmsolver++"
)
Modular Pipelines
Introduced in v0.37.0, Modular Pipelines allow composing pipelines from reusable building blocks:
graph LR
A[Transformer] --> B[ModularPipeline]
C[VAE] --> B
D[Scheduler] --> B
E[Text Encoder] --> B
F[Input] --> B
B --> G[Output]
Creating Modular Pipelines
Modular pipelines are defined with a modular_model_index.json that specifies component types and loading hints:
# Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py
# Components can be loaded with or without type hints
if self.type_hint is None:
    component = AutoModel.from_pretrained(pretrained_model_name_or_path, **load_kwargs, **kwargs)
else:
    load_method = (
        getattr(self.type_hint, "from_single_file")
        if is_single_file
        else getattr(self.type_hint, "from_pretrained")
    )
    component = load_method(pretrained_model_name_or_path, **load_kwargs, **kwargs)
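The dispatch in the excerpt above can be tried with stand-in classes. This is a sketch under stated assumptions: `ToyModel` and `ToySpec` are hypothetical, `ToyModel` also plays the role of the AutoModel fallback, and detecting a single file by its extension is an assumption of the example rather than the library's rule.

```python
from dataclasses import dataclass
from typing import Any, Optional


class ToyModel:
    """Stand-in for a model class exposing both loaders (illustrative only)."""

    def __init__(self, source: str, loader: str):
        self.source, self.loader = source, loader

    @classmethod
    def from_pretrained(cls, path: str, **kwargs: Any) -> "ToyModel":
        return cls(path, "from_pretrained")

    @classmethod
    def from_single_file(cls, path: str, **kwargs: Any) -> "ToyModel":
        return cls(path, "from_single_file")


@dataclass
class ToySpec:
    """Hypothetical component spec holding an optional type hint."""

    type_hint: Optional[type] = None

    def load(self, path: str, **kwargs: Any) -> ToyModel:
        if self.type_hint is None:
            # No hint: fall back to the generic loader (AutoModel in the excerpt).
            return ToyModel.from_pretrained(path, **kwargs)
        # Assumption of this sketch: single files are recognized by extension.
        is_single_file = path.endswith((".safetensors", ".ckpt", ".gguf"))
        method = "from_single_file" if is_single_file else "from_pretrained"
        return getattr(self.type_hint, method)(path, **kwargs)


print(ToySpec(ToyModel).load("weights.safetensors").loader)
print(ToySpec().load("repo/unet").loader)
```

Looking the method up with `getattr` keeps the spec independent of any particular model class, as long as the hinted class provides both loaders.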
Community Scripts
The community contributes additional pipeline implementations and utilities through community scripts:
| Example | Description | Code Example |
|---|---|---|
| IP-Adapter Negative Noise | Using negative noise with IP-Adapter for better control | Link |
| Asymmetric Tiling | Configure seamless image tiling for X and Y axes independently | Link |
| Prompt Scheduling Callback | Dynamic prompt modification during generation | Link |
Source: examples/community/README_community_scripts.md
Using Community Scripts
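# Load a community pipeline by name through the custom_pipeline argument
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",  # example community pipeline name
    use_safetensors=True
)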
[!IMPORTANT]
Community scripts are maintained by contributors. If a community script doesn't work as expected, please open an issue and ping the author.
Training Scripts
Diffusers provides training scripts for various tasks:
| Script | Location | Use Case |
|---|---|---|
| train_unconditional.py | examples/unconditional_image_generation/ | Unconditional image generation |
| train_controlnet.py | examples/controlnet/ | ControlNet training |
| train_dreambooth.py | examples/dreambooth/ | DreamBooth personalization |
| train_text_to_image_lora.py | examples/text_to_image/ | LoRA fine-tuning |
Source: examples/README.md
ControlNet Training Example
import torch
from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    DDPMScheduler,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)
from diffusers.optimization import get_scheduler
# Initialize models
controlnet = ControlNetModel.from_pretrained(
    "path/to/controlnet",
    torch_dtype=torch.float16
)
pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
Source: examples/controlnet/train_controlnet.py
Common Configuration Options
Pipeline Loading Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| pretrained_model_name_or_path | str | Required | Model identifier or local path |
| torch_dtype | torch.dtype | None | Data type for model weights |
| variant | str | None | Model variant (e.g., "fp16", "onnx") |
| use_safetensors | bool | None | Use SafeTensors format if available |
| local_files_only | bool | False | Only use local files |
| force_download | bool | False | Force download even if cached |
| cache_dir | str | None | Custom cache directory |
| token | str | None | Hugging Face API token |
| revision | str | None | Git revision |
| trust_remote_code | bool | False | Execute remote code |
Device Placement
# Move entire pipeline to device
pipeline = pipeline.to("cuda")
# Or move individual components
pipeline.unet = pipeline.unet.to("cuda")
pipeline.vae = pipeline.vae.to("cpu") # Offload VAE to save memory
Common Issues and Solutions
Model Loading Failures
Issue: Models fail to load with config mismatch errors.
Solution: Check that model components are compatible. Use use_safetensors=True and verify the model card for requirements.
Memory Optimization
Issue: Out of memory errors during inference.
Solutions:
# Enable model-level CPU offloading (moves whole components between CPU and GPU)
pipeline.enable_model_cpu_offload()
# Or enable sequential CPU offloading (lower memory, slower); pick one offloading mode, not both
pipeline.enable_sequential_cpu_offload()
# Use attention slicing
pipeline.enable_attention_slicing()
# Enable VAE tiling for large images
pipeline.enable_vae_tiling()
Custom Model Loading
Issue: Community request for universal model loading (see Issue #13683).
Approach: For custom models or GGUF files, verify if from_single_file method is available on the model's class. If not, consider using the base model class with appropriate configuration.
# Universal loading attempt pattern
from diffusers import AutoModel
try:
    model = AutoModel.from_pretrained("path/to/model")
except Exception:
    # Fallback to single file loading if supported.
    # SomeModelClass is a placeholder for a class that provides from_single_file.
    model = SomeModelClass.from_single_file("path/to/checkpoint")
Scheduler Compatibility
Issue: Scheduler mapping confusion between A1111 and Diffusers (see Issue #4167).
Solution: Use the scheduler mapping table to find equivalent schedulers. Karras variants have use_karras_sigmas=True.
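The mapping can be kept as a small lookup table. The sketch below is built only from the scheduler table in this guide plus the Karras rule stated above; `lookup_scheduler` is a hypothetical helper, and it returns class names as strings so that it runs without Diffusers installed. Whether a given scheduler accepts `use_karras_sigmas` should be checked against its documentation.

```python
from typing import Any, Dict, Tuple

# A1111 sampler name -> Diffusers scheduler class name (from the table above).
A1111_TO_DIFFUSERS: Dict[str, str] = {
    "DDPM": "DDPMScheduler",
    "DDIM": "DDIMScheduler",
    "DPM++ 2M": "DPMSolverMultistepScheduler",
    "Euler": "EulerDiscreteScheduler",
    "Euler a": "EulerAncestralDiscreteScheduler",
    "UniPC": "UniPCMultistepScheduler",
}

KARRAS_SUFFIX = " Karras"


def lookup_scheduler(a1111_name: str) -> Tuple[str, Dict[str, Any]]:
    """Return (scheduler class name, extra config kwargs) for an A1111 name."""
    kwargs: Dict[str, Any] = {}
    base = a1111_name
    if a1111_name.endswith(KARRAS_SUFFIX):
        # Karras variants reuse the base scheduler with use_karras_sigmas=True.
        base = a1111_name[: -len(KARRAS_SUFFIX)]
        kwargs["use_karras_sigmas"] = True
    return A1111_TO_DIFFUSERS[base], kwargs


print(lookup_scheduler("DPM++ 2M Karras"))
print(lookup_scheduler("Euler a"))
```

The returned kwargs are meant to be passed to the scheduler's `from_config` call alongside the pipeline's existing scheduler config.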
See Also
- Modular Pipelines - Composable pipeline architecture
- Training Guide - Fine-tuning diffusion models
- Optimization - Memory and speed optimization
- API Reference - Pipeline API documentation
- Model Architecture - Underlying model architectures
- Scheduler Reference - Available schedulers
Source: https://github.com/huggingface/diffusers / Human Manual
System Architecture
Related topics: Pipelines Overview, Loaders & Adapters
Overview
The Hugging Face Diffusers library provides a modular, flexible architecture for diffusion-based generative models. The system is designed around composable building blocks that enable both inference and training across image, video, audio, and text generation tasks. The architecture emphasizes separation of concerns between models (the neural network weights), schedulers (the sampling algorithms), and pipelines (the orchestration layer that combines components).
Source: PHILOSOPHY.md:1-50
High-Level Architecture
The Diffusers library follows a layered architectural approach with three primary abstractions:
graph TD
A[User Code] --> B[Pipeline Layer]
B --> C[Model Layer]
B --> D[Scheduler Layer]
C --> E[Transformer/UNet]
C --> F[VAE/Encoder-Decoder]
C --> G[Text Encoder]
D --> H[Scheduler Implementations]
style B fill:#e1f5fe
style C fill:#fff3e0
style D fill:#e8f5e9
Core Abstractions
| Layer | Purpose | Key Classes |
|---|---|---|
| Pipeline | Orchestration and end-to-end workflows | DiffusionPipeline, StableDiffusionPipeline |
| Model | Neural network architectures | ModelMixin, ConfigMixin, AutoModel |
| Scheduler | Diffusion sampling algorithms | SchedulerMixin, various scheduler implementations |
Source: src/diffusers/pipelines/pipeline_utils.py:1-100
Model Architecture
Design Philosophy
Models in Diffusers are designed to expose complexity while providing clear error messages, following principles inspired by PyTorch's Module class. The architecture prioritizes modularity and extensibility, using smaller building blocks rather than monolithic model files.
Key principles from the project philosophy:
- Models make use of smaller building blocks such as attention.py, resnet.py, and embeddings.py
- Models do not follow the single-file policy used in Transformers
- All models inherit from ModelMixin and ConfigMixin
- Models should by default have the highest precision and lowest performance setting
- New model checkpoints should adapt existing architectures when possible
Source: PHILOSOPHY.md:1-30
ModelMixin and ConfigMixin
All Diffusers models inherit from two base classes:
# From src/diffusers/models/modeling_utils.py (conceptual)
class ModelMixin:
    """Base class for all Diffusers models."""

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
        """Load a pretrained model."""
        pass

    def save_pretrained(self, save_directory):
        """Save a model to a directory."""
        pass


class ConfigMixin:
    """Base class for configuration classes."""

    @classmethod
    def from_config(cls, config, **kwargs):
        """Create a model from a configuration."""
        pass

    def save_config(self, save_directory):
        """Save configuration to a directory."""
        pass
These base classes provide consistent serialization and deserialization patterns across all model types.
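To make the serialization pattern concrete, here is a toy round trip in plain Python. It is a sketch, not the library's code: `ToyConfigMixin` and `ToyUNet` are hypothetical, and the `config.json` file name and the `_class_name` key written to it are assumptions borrowed from elsewhere in this guide.

```python
import json
import os
import tempfile
from typing import Any, Dict


class ToyConfigMixin:
    """Illustrative mixin that stores constructor arguments as a config."""

    config_name = "config.json"  # assumed file name for this sketch

    def __init__(self, **config: Any):
        self.config: Dict[str, Any] = dict(config)

    def save_config(self, save_directory: str) -> str:
        """Write the config, tagged with the class name, and return the path."""
        os.makedirs(save_directory, exist_ok=True)
        path = os.path.join(save_directory, self.config_name)
        payload = {"_class_name": type(self).__name__, **self.config}
        with open(path, "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=2, sort_keys=True)
        return path

    @classmethod
    def load_config(cls, save_directory: str) -> Dict[str, Any]:
        """Read the config dictionary back from disk."""
        with open(os.path.join(save_directory, cls.config_name), encoding="utf-8") as f:
            return json.load(f)

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "ToyConfigMixin":
        """Build an instance from a config dictionary, ignoring metadata keys."""
        params = {k: v for k, v in config.items() if not k.startswith("_")}
        return cls(**params)


class ToyUNet(ToyConfigMixin):
    pass


with tempfile.TemporaryDirectory() as tmp:
    ToyUNet(in_channels=4, sample_size=64).save_config(tmp)
    restored = ToyUNet.from_config(ToyUNet.load_config(tmp))

print(restored.config)
```

Because every class shares the same save and load path, a pipeline can persist heterogeneous components without knowing their constructor signatures.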
AutoModel System
The AutoModel system provides automatic model discovery and loading based on model configuration. It resolves model classes from configuration files and supports both Diffusers-native and Transformers models.
# From src/diffusers/models/auto_model.py
class AutoModel:
    @classmethod
    def from_config(cls, config, **kwargs):
        # Determines the appropriate model class from config
        # Supports _class_name for Diffusers models
        # Supports model_type for Transformers models
        pass

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
        # Loads pretrained weights
        pass
The AutoModel system checks configuration for either _class_name (for Diffusers models) or model_type (for Transformers models) to determine the appropriate class to instantiate.
Source: src/diffusers/models/auto_model.py:1-80
Pipeline Architecture
DiffusionPipeline
The DiffusionPipeline serves as the main entry point for inference. It orchestrates the loading and connection of multiple components:
graph LR
A[Config/Index] --> B[DiffusionPipeline]
B --> C[UNet2DConditionModel]
B --> D[AutoencoderKL]
B --> E[Text Encoder]
B --> F[Tokenizer]
B --> G[Scheduler]
The pipeline handles:
- Component discovery from configuration files
- Model loading with appropriate device placement
- Scheduler integration and timestep management
- End-to-end generation workflows
Source: src/diffusers/pipelines/pipeline_utils.py:100-200
Pipeline Loading Mechanisms
The library supports multiple model loading strategies:
| Loading Method | Use Case | Key Parameter |
|---|---|---|
| from_pretrained() | Standard Hugging Face Hub models | pretrained_model_name_or_path |
| from_single_file() | Single checkpoint files (CKPT, Safetensors) | pretrained_model_link_or_path_or_dict |
| AutoModel | Auto-detection of model types | Configuration-based |
Source: src/diffusers/pipelines/pipeline_loading_utils.py:1-80
Single File Loading
The from_single_file method enables loading models from single checkpoint files. This is particularly important for community models and custom checkpoints that may not follow the standard directory structure.
# From src/diffusers/loaders/single_file.py
class FromOriginalModelMixin:
    @classmethod
    def from_single_file(
        cls,
        pretrained_model_link_or_path_or_dict,
        original_config=None,
        config=None,
        **kwargs
    ):
        """Load a model from a single checkpoint file."""
        pass
The single file loader:
- Detects model type from checkpoint structure
- Optionally applies original configuration files
- Supports GGUF quantized models
Source: src/diffusers/loaders/single_file.py:1-100
Model Type Detection
When loading models, Diffusers determines the appropriate loading strategy:
# From src/diffusers/pipelines/pipeline_loading_utils.py
is_transformers_model = (
is_transformers_available()
and issubclass(class_obj, PreTrainedModel)
and transformers_version >= version.parse("4.20.0")
)
is_diffusers_model = issubclass(class_obj, diffusers_module.ModelMixin)
is_diffusers_single_file_model = issubclass(class_obj, diffusers_module.FromOriginalModelMixin)
This detection determines whether to use Transformers-style loading, Diffusers-native loading, or single-file loading.
Source: src/diffusers/pipelines/pipeline_loading_utils.py:20-50
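The `issubclass` checks above can be demonstrated with stand-in base classes. In this sketch the three base classes are empty placeholders named after the real ones, `ToyTextEncoder` and `ToyVAE` are hypothetical, and the Transformers version check from the excerpt is omitted for brevity.

```python
from typing import Dict


class PreTrainedModel:
    """Placeholder for transformers.PreTrainedModel."""


class ModelMixin:
    """Placeholder for diffusers.ModelMixin."""


class FromOriginalModelMixin:
    """Placeholder for diffusers.loaders.FromOriginalModelMixin."""


class ToyTextEncoder(PreTrainedModel):
    pass


class ToyVAE(ModelMixin, FromOriginalModelMixin):
    pass


def classify(class_obj: type) -> Dict[str, bool]:
    """Report which loading strategies apply to a component class."""
    return {
        "is_transformers_model": issubclass(class_obj, PreTrainedModel),
        "is_diffusers_model": issubclass(class_obj, ModelMixin),
        "is_diffusers_single_file_model": issubclass(class_obj, FromOriginalModelMixin),
    }


print(classify(ToyTextEncoder))
print(classify(ToyVAE))
```

Note that the flags are not mutually exclusive: a Diffusers model that also inherits the single-file mixin reports both.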
Modular Diffusers
Introduced in Diffusers 0.37.0, Modular Diffusers provides a new way to build pipelines by composing reusable blocks. Instead of writing entire pipelines from scratch, developers can mix and match building blocks to create custom workflows.
Source: Diffusers 0.37.0 Release Notes
ModularPipeline Components
graph TD
A[ModularPipeline] --> B[Transformer2DModel]
A --> C[VAE]
A --> D[TextEncoder]
A --> E[Scheduler]
B --> F[Attention]
B --> G[ResNet]
F --> H[Embeddings]
The modular system uses type hints to determine the correct loading method for each component:
# From src/diffusers/modular_pipelines/modular_pipeline_utils.py
load_method = (
getattr(self.type_hint, "from_single_file")
if is_single_file
else getattr(self.type_hint, "from_pretrained")
)
Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py:1-80
Scheduler System
SchedulerMixin Base Class
All schedulers inherit from SchedulerMixin, which provides a common interface for:
- Setting timesteps
- Scaling model inputs
- Computing denoised images
- Stepping through the diffusion process
The scheduler system implements various diffusion sampling algorithms including:
| Scheduler | Description | Use Case |
|---|---|---|
| DDPMScheduler | Denoising Diffusion Probabilistic Models | Training and sampling |
| DDIMScheduler | Denoising Diffusion Implicit Models | Fast sampling |
| PNDMScheduler | Pseudo Numerical Methods | Balanced speed/quality |
| LMSDiscreteScheduler | Linear Multistep Scheduler | Alternative timestepping |
| EulerDiscreteScheduler | Euler method | Simple, fast |
| EulerAncestralDiscreteScheduler | Euler with ancestral sampling | Diverse outputs |
| KarrasDiffusionSchedulers | Enum listing schedulers that are interchangeable in Karras-style pipelines (not a scheduler itself) | Compatibility checks |
Source: src/diffusers/schedulers/__init__.py
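As an illustration of the "setting timesteps" responsibility, the sketch below computes an evenly spaced, descending list of inference timesteps from a training schedule. It shows one common spacing only and is an assumption of this example; `leading_timesteps` is a hypothetical function, and real schedulers offer other spacings and offsets.

```python
from typing import List


def leading_timesteps(num_train_timesteps: int, num_inference_steps: int) -> List[int]:
    """Pick num_inference_steps timesteps, highest first, at a fixed stride."""
    if not 0 < num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be in 1..num_train_timesteps")
    # Integer stride between consecutive inference timesteps.
    step_ratio = num_train_timesteps // num_inference_steps
    # Denoising runs from the noisiest timestep down to 0.
    return [i * step_ratio for i in reversed(range(num_inference_steps))]


print(leading_timesteps(1000, 4))  # [750, 500, 250, 0]
```

A pipeline would then iterate over this list, calling the denoising model and the scheduler's step function once per timestep.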
Scheduler-Pipeline Coupling
Schedulers are loosely coupled with pipelines, allowing users to swap schedulers to experiment with different sampling strategies:
from diffusers import StableDiffusionPipeline, DDIMScheduler
pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
Quantization Support
GGUF Quantization
Diffusers supports loading GGUF-quantized models through the GGUFQuantizer class. This enables efficient inference on reduced precision models.
# From src/diffusers/quantizers/gguf/gguf_quantizer.py
class GGUFQuantizer(DiffusersQuantizer):
    use_keep_in_fp32_modules = True

    def __init__(self, quantization_config, **kwargs):
        self.compute_dtype = quantization_config.compute_dtype
        self.pre_quantized = quantization_config.pre_quantized
        self.modules_to_not_convert = quantization_config.modules_to_not_convert or []
The GGUF quantizer:
- Supports pre-quantized models from community repositories
- Maintains FP32 precision for sensitive modules
- Requires accelerate>=0.26.0
Source: src/diffusers/quantizers/gguf/gguf_quantizer.py:1-60
Model Loading Flow
sequenceDiagram
participant User
participant Pipeline
participant AutoModel
participant HubUtils
participant Model
User->>Pipeline: from_pretrained(model_id)
Pipeline->>HubUtils: hf_hub_download(config.json)
HubUtils-->>Pipeline: config
Pipeline->>AutoModel: from_config(config)
AutoModel->>AutoModel: detect_model_type(config)
AutoModel->>HubUtils: hf_hub_download(weights)
HubUtils-->>AutoModel: weights
AutoModel->>Model: __init__() + load_state_dict()
Model-->>AutoModel: model
AutoModel-->>Pipeline: component
The loading process follows these steps:
1. Configuration Loading: Download and parse config.json from the hub
2. Model Type Detection: Determine if model is Diffusers-native, Transformers, or single-file
3. Weight Download: Fetch model weights from the appropriate source
4. Model Instantiation: Create model with empty weights, then load state dict
5. Device Placement: Move model to appropriate device (CPU/CUDA)
Source: src/diffusers/utils/hub_utils.py:1-100
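The loading steps listed above can be traced end to end with a toy model and an in-memory stand-in for a Hub repository. Everything here is hypothetical (`FAKE_REPO`, `ToyModel`, `REGISTRY`, `load`); it is a sketch of the control flow under those assumptions, not of the library's internals.

```python
from typing import Any, Dict

# In-memory stand-in for a Hub repository (hypothetical contents).
FAKE_REPO: Dict[str, Any] = {
    "config.json": {"_class_name": "ToyModel", "hidden_size": 2},
    "weights": {"weight": [1.0, 2.0], "bias": [0.5]},
}


class ToyModel:
    def __init__(self, hidden_size: int):
        self.hidden_size = hidden_size
        # Parameters exist by name but hold no data yet ("empty weights").
        self.state: Dict[str, Any] = {"weight": None, "bias": None}
        self.device = "meta"

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        missing = set(self.state) - set(state_dict)
        unexpected = set(state_dict) - set(self.state)
        if missing or unexpected:
            raise KeyError(f"missing={sorted(missing)} unexpected={sorted(unexpected)}")
        self.state.update(state_dict)

    def to(self, device: str) -> "ToyModel":
        self.device = device
        return self


REGISTRY = {"ToyModel": ToyModel}


def load(repo: Dict[str, Any], device: str = "cpu") -> ToyModel:
    config = dict(repo["config.json"])               # 1. configuration loading
    model_cls = REGISTRY[config.pop("_class_name")]  # 2. model type detection
    weights = repo["weights"]                        # 3. weight download
    model = model_cls(**config)                      # 4. instantiate with empty weights...
    model.load_state_dict(weights)                   #    ...then load the state dict
    return model.to(device)                          # 5. device placement


model = load(FAKE_REPO)
print(model.device, model.state)
```

Checking for missing and unexpected keys before updating mirrors why config mismatches surface as loading errors rather than silent failures.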
Common Component Patterns
Model Components Table
| Component | File | Purpose |
|---|---|---|
| Attention | attention.py | Self-attention and cross-attention mechanisms |
| ResNet | resnet.py | Residual connections for deep networks |
| Embeddings | embeddings.py | Timestep and text embeddings |
| UNet | unet_2d_blocks.py | U-Net architecture for image generation |
| VAE | vae.py | Variational Autoencoder for latent spaces |
Source: PHILOSOPHY.md:5-15
Lazy Import System
Diffusers uses lazy imports to minimize startup time and reduce memory footprint:
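# Pipelines defer loading of heavy dependencies until first use
# From src/diffusers/pipelines/pipeline_utils.py (conceptual; optional_module is a placeholder)
def __getattr__(self, name):
    if name in self._optional_components:
        # Import only when accessed
        import optional_module
        return getattr(optional_module, name)
    # Unknown attributes must raise instead of silently returning None
    raise AttributeError(name)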
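A runnable version of the lazy-import idea, using only the standard library, might look like the following. `LazyNamespace` is a hypothetical class written for this example, and the library's own lazy-loading machinery may differ in its details.

```python
import importlib
from types import ModuleType
from typing import Dict


class LazyNamespace(ModuleType):
    """Resolve attributes to modules on first access (illustrative only)."""

    def __init__(self, name: str, targets: Dict[str, str]):
        super().__init__(name)
        self._targets = targets  # attribute name -> importable module path

    def __getattr__(self, name: str):
        # Python calls __getattr__ only when normal attribute lookup fails,
        # so this runs once per attribute, before it has been cached.
        try:
            module_path = self._targets[name]
        except KeyError:
            raise AttributeError(f"{self.__name__!r} has no attribute {name!r}") from None
        module = importlib.import_module(module_path)
        setattr(self, name, module)  # cache so later lookups bypass __getattr__
        return module


tools = LazyNamespace("tools", {"fractions": "fractions"})
print("fractions" in vars(tools))   # not imported into the namespace yet
print(tools.fractions.Fraction(1, 2))
print("fractions" in vars(tools))   # cached after first access
```

Caching the resolved module on the instance keeps repeated attribute access as cheap as a normal lookup.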
Configuration Options
Common Pipeline Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| pretrained_model_name_or_path | str | Required | Model identifier or local path |
| torch_dtype | torch.dtype | None | Data type for model weights |
| variant | str | None | Model variant (e.g., 'fp16', 'fp32') |
| use_safetensors | bool | None | Use safetensors format |
| local_files_only | bool | False | Only use local files |
| revision | str | None | Git revision for Hub models |
Model Loading Configuration
| Parameter | Purpose | Source |
|---|---|---|
| config.json | Model architecture | HuggingFace Hub |
| model_index.json | Pipeline component mapping | Pipeline root |
| config.yaml | Additional metadata | Optional |
| diffusion_pytorch_model.bin | Model weights | Primary weight file |
Common Failure Modes
Based on community issues and documentation, users frequently encounter these architectural challenges:
1. Model Type Mismatch
Issue: Loading custom models fails with config mismatch errors.
Cause: The configuration file doesn't match expected structure.
Solution: Use from_single_file() with explicit configuration or provide a custom config.
Source: Community Issue #13683
2. Scheduler Compatibility
Issue: Swapping schedulers produces unexpected results.
Cause: Not all schedulers are compatible with all pipelines.
Solution: Use schedulers designed for the same discretization approach.
Source: Community Issue #4167
3. ModularPipeline Type Hints
Issue: AutoModel type hints in modular_model_index.json cause loading failures.
Cause: Type hint resolution fails for generic AutoModel classes.
Solution: Use specific model classes or provide explicit type hints.
Source: Diffusers 0.37.1 Release Notes
4. Transformer/GGUF Version Requirements
Issue: GGUF loading fails with version compatibility errors.
Cause: Missing or incompatible accelerate version.
Solution: Ensure accelerate>=0.26.0 is installed.
Source: src/diffusers/quantizers/gguf/gguf_quantizer.py:20-30
Extension Points
Adding Custom Models
To integrate new model checkpoints:
- Create or adapt an existing model architecture
- Implement ModelMixin and ConfigMixin interfaces
- Add configuration handling for the new checkpoint format
- Register the model in src/diffusers/models/__init__.py
Source: PHILOSOPHY.md:40-50
Custom Pipelines
For fundamentally different architectures, create a new pipeline class:
- Inherit from DiffusionPipeline
- Define components as class attributes
- Implement the __call__ method for generation
- Add configuration parsing
Best Practices
Performance Optimization
- Use torch_dtype=torch.float16 for faster inference on compatible hardware
- Enable use_safetensors=True for faster model loading
- Use variant='fp16' when available to download pre-converted weights
- Enable attention slicing for reduced memory usage
Model Selection
| Use Case | Recommended Approach |
|---|---|
| Standard models | DiffusionPipeline.from_pretrained() |
| Community models | from_single_file() |
| Custom architectures | AutoModel.from_config() |
| Quantized models | GGUF quantizer |
Pipelines Overview
Related topics: Modular Diffusers, System Architecture
Introduction
Pipelines are the primary high-level API in Diffusers for running diffusion models for inference. They provide a unified interface that orchestrates multiple components, including models, schedulers, tokenizers, and processors, to generate outputs from pretrained checkpoints. Pipelines abstract away the complexity of the diffusion process, allowing users to perform inference with just a few lines of code.
The Diffusers library ships with pipelines for diverse generation tasks including text-to-image, image-to-image, inpainting, video generation, audio generation, and text generation. Each pipeline is designed to be modular, allowing components to be swapped or customized as needed.
Source: src/diffusers/pipelines/README.md
Architecture
Core Components
A pipeline typically consists of several interconnected components that work together during the diffusion process:
graph TD
A[Pipeline] --> B[UNet / Transformer]
A --> C[Scheduler]
A --> D[VAE / Encoder-Decoder]
A --> E[Text Encoder / Tokenizer]
A --> F[Safety Checker]
B --> C
C --> B
G[Input] --> A
A --> H[Output]
G --> E
E --> B
| Component | Purpose | Common Classes |
|---|---|---|
| UNet/Transformer | Core denoising network that predicts noise in the latent space | UNet2DConditionModel, FluxTransformer2DModel |
| Scheduler | Controls the diffusion timestep schedule and noise addition/removal | DDPMScheduler, DDIMScheduler, DPMSolverMultistepScheduler |
| VAE | Encodes images to latent space and decodes latents back to images | AutoencoderKL, AutoencoderTiny |
| Text Encoder | Converts text prompts into embeddings understood by the model | CLIPTextModel, T5EncoderModel |
| Safety Checker | Filters potentially unsafe outputs | StableDiffusionSafetyChecker |
Source: src/diffusers/pipelines/pipeline_utils.py
Pipeline Class Hierarchy
Diffusers uses a mixin-based architecture for pipelines, allowing for flexible composition of functionality:
graph TD
A[DiffusionPipeline<br/>Base Class] --> B[StableDiffusionMixin]
A --> C[StableDiffusionLuminaMixin]
A --> D[AutoPipelineMixin]
B --> E[StableDiffusionPipeline]
B --> F[StableDiffusionImg2ImgPipeline]
B --> G[StableDiffusionInpaintPipeline]
D --> H[AutoPipeline]
D --> I[AutoEncoder倩ε Pipeline]
All pipelines inherit from DiffusionPipeline, which provides core functionality such as from_pretrained() and save_pretrained() methods.
Source: src/diffusers/pipelines/pipeline_utils.py:90-139
Loading Pipelines
Standard Loading with `from_pretrained`
The primary method for loading a pipeline is through the from_pretrained() class method. This method accepts either a Hugging Face Hub repository ID or a local directory path.
import torch
from diffusers import StableDiffusionPipeline
# Load from Hugging Face Hub
pipeline = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16
)
# Load from local directory
pipeline = StableDiffusionPipeline.from_pretrained(
"./local/stable-diffusion-v1-5"
)
The method requires a model_index.json file in the repository or directory, which defines all components that should be loaded. Each component is specified in the format <name>: ["<library>", "<class_name>"].
Source: src/diffusers/pipelines/README.md
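To make the `<name>: ["<library>", "<class_name>"]` format concrete, here is a minimal sketch that reads a `model_index.json`-style document and lists its components. The sample content is a trimmed, hypothetical example written for illustration, and `list_components` is a helper name invented here, not a Diffusers function.

```python
import json

# Hypothetical model_index.json content, trimmed for illustration.
SAMPLE_INDEX = """
{
  "_class_name": "StableDiffusionPipeline",
  "scheduler": ["diffusers", "PNDMScheduler"],
  "text_encoder": ["transformers", "CLIPTextModel"],
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
"""


def list_components(index_text):
    """Return {component_name: (library, class_name)} from a model_index.json string."""
    index = json.loads(index_text)
    components = {}
    for name, value in index.items():
        # Keys starting with "_" hold metadata, not components.
        if name.startswith("_"):
            continue
        # Components are two-item lists: [library, class_name].
        if isinstance(value, list) and len(value) == 2:
            components[name] = (value[0], value[1])
    return components


components = list_components(SAMPLE_INDEX)
for name, (library, class_name) in sorted(components.items()):
    print(f"{name}: {library}.{class_name}")
```

The library name in each entry indicates which package is expected to provide the class, which is why a single index file can mix Diffusers and Transformers components.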
AutoPipeline
AutoPipeline is a family of task-oriented loaders (AutoPipelineForText2Image, AutoPipelineForImage2Image, AutoPipelineForInpainting) that detect and load the appropriate pipeline class from the model configuration. This addresses the community need for a "universal method to load any model" mentioned in issue #13683.
import torch
from diffusers import AutoPipelineForText2Image
# Automatically selects the matching text-to-image pipeline class
pipeline = AutoPipelineForText2Image.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16
)
The AutoPipeline class maintains a registry of supported pipeline types and uses type hints to determine the correct pipeline class when loading from modular_model_index.json files introduced in v0.37.0.
Source: src/diffusers/pipelines/auto_pipeline.py
Model Loading Internals
When loading a model, Diffusers follows a specific sequence to determine the appropriate loading mechanism:
graph TD
A[from_pretrained called] --> B{Is Transformers model?}
B -->|Yes| C[Use PreTrainedModel.from_pretrained]
B -->|No| D{Is Diffusers model?}
D -->|Yes| E[Load config, create empty model<br/>with init_empty_weights, then load]
D -->|No| F[Try AutoModel]
C --> G[Return model]
E --> G
F --> G
For Diffusers models, the library first loads the configuration, creates an empty model on meta devices, then loads the weights. For Transformers models, it delegates to the Transformers library's loading mechanism.
Source: src/diffusers/pipelines/pipeline_loading_utils.py
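The decision flow above can be sketched as a small dispatch function. The base classes and return labels below are hypothetical stand-ins chosen for illustration; they are not the library's actual classes or code paths.

```python
class ToyTransformersBase:
    """Hypothetical stand-in for a Transformers model base class."""


class ToyDiffusersBase:
    """Hypothetical stand-in for a Diffusers model base class."""


class ToyTextEncoder(ToyTransformersBase):
    pass


class ToyUNet(ToyDiffusersBase):
    pass


class ToyUnknownModel:
    pass


def choose_loading_path(model_class):
    """Mirror the diagram: Transformers first, then Diffusers, then a fallback."""
    if issubclass(model_class, ToyTransformersBase):
        return "delegate to Transformers from_pretrained"
    if issubclass(model_class, ToyDiffusersBase):
        return "load config, create empty model, load weights"
    return "try AutoModel"


for cls in (ToyTextEncoder, ToyUNet, ToyUnknownModel):
    print(f"{cls.__name__}: {choose_loading_path(cls)}")
```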
Loading Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| pretrained_model_name_or_path | str or Path | Required | Model identifier or local path |
| torch_dtype | torch.dtype | None | Data type for model weights |
| variant | str | None | Model variant (e.g., fp16, fp32) |
| use_safetensors | bool | None | Prefer safetensors format |
| cache_dir | str | None | Custom cache directory |
| local_files_only | bool | False | Only use local files |
| force_download | bool | False | Force re-download |
Source: src/diffusers/pipelines/pipeline_utils.py
Custom Pipelines
Loading Custom Pipelines
Diffusers supports loading custom pipelines through the custom_pipeline parameter. This allows users to extend the library with community-contributed or self-developed pipeline implementations.
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
custom_pipeline="hf-internal-testing/diffusers-dummy-pipeline",
trust_remote_code=True
)
Custom pipelines can be loaded from:
- Hugging Face Hub: A repository ID containing a pipeline.py file
- GitHub: A community pipeline script name (loaded from examples/community/)
- Local directory: A directory containing a pipeline.py file
Source: src/diffusers/pipelines/pipeline_utils.py
Community Pipelines
Community pipelines are hosted in the examples/community/ directory and provide extended functionality not available in core pipelines. These include ControlNet integrations, IP-Adapter implementations, and specialized generation techniques.
Community pipelines are loaded by specifying the pipeline script name (without the .py extension) as the custom_pipeline argument:
pipeline = DiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
custom_pipeline="clip_guided_stable_diffusion"
)
Source: examples/community/README.md
Modular Pipelines
Introduced in Diffusers v0.37.0, Modular Pipelines provide a compositional approach to building diffusion pipelines. Instead of monolithic pipeline classes, Modular Pipelines assemble reusable building blocks defined in modular_model_index.json files.
How Modular Pipelines Work
graph LR
A[modular_model_index.json] --> B[ModularPipeline]
B --> C[Transformer Block 1]
B --> D[Transformer Block 2]
B --> E[Scheduler Component]
B --> F[VAE Component]
C --> G[Attention Module]
D --> G
G --> H[Model Output]
The ModularPipeline class uses type_hint annotations to determine the correct model class for each component, allowing flexible composition of different architectures.
Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py
Loading Modular Pipelines
import torch
from diffusers import ModularPipeline
pipeline = ModularPipeline.from_pretrained(
"path/to/modular/model",
torch_dtype=torch.float16
)
When loading, the pipeline:
- Reads modular_model_index.json to identify components
- Resolves type_hint annotations to determine model classes
- Loads each component using the appropriate from_pretrained or from_single_file method
Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py
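The three loading steps can be illustrated with a small registry-based sketch. Everything here is hypothetical: `ToyTransformer`, `REGISTRY`, and `load_component` are names invented for this example, and the file-extension check is an assumed heuristic for choosing between the two loading methods, not the library's actual rule.

```python
class ToyTransformer:
    """Hypothetical model class exposing both loading entry points."""

    def __init__(self, source):
        self.source = source

    @classmethod
    def from_pretrained(cls, path):
        return cls(f"pretrained:{path}")

    @classmethod
    def from_single_file(cls, path):
        return cls(f"single_file:{path}")


# Hypothetical registry mapping type_hint strings to classes.
REGISTRY = {"ToyTransformer": ToyTransformer}


def load_component(name, spec):
    """Resolve the class from type_hint, then pick a loading method by path."""
    model_class = REGISTRY.get(spec.get("type_hint"))
    if model_class is None:
        raise ValueError(f"Unable to load {name} without `type_hint`")
    path = spec["pretrained_model_name_or_path"]
    # Assumed heuristic: a checkpoint file means single-file loading.
    if path.endswith((".safetensors", ".ckpt")):
        return model_class.from_single_file(path)
    return model_class.from_pretrained(path)


repo_model = load_component(
    "transformer",
    {"type_hint": "ToyTransformer", "pretrained_model_name_or_path": "org/model"},
)
file_model = load_component(
    "transformer",
    {"type_hint": "ToyTransformer", "pretrained_model_name_or_path": "./model.safetensors"},
)
print(repo_model.source)
print(file_model.source)
```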
Pipeline Execution Flow
Standard Inference Flow
sequenceDiagram
participant User
participant Pipeline
participant Scheduler
participant UNet
participant VAE
User->>Pipeline: __call__(prompt)
Pipeline->>Pipeline: Encode prompt with tokenizer & text encoder
Pipeline->>Scheduler: Set timesteps
loop Denoising loop
Pipeline->>UNet: forward(latent, timestep, encoder_hidden_states)
UNet-->>Pipeline: noise_pred
Pipeline->>Scheduler: step(noise_pred, timestep, latent)
Scheduler-->>Pipeline: denoised_latent
end
Pipeline->>VAE: decode(denoised_latent)
VAE-->>Pipeline: decoded_image
Pipeline->>Pipeline: Safety check
Pipeline-->>User: Image
Example: Text-to-Image Generation
from diffusers import StableDiffusionPipeline
import torch
pipeline = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16
)
pipeline.to("cuda")
image = pipeline(
prompt="a photo of an astronaut riding a horse on mars",
num_inference_steps=50,
guidance_scale=7.5
).images[0]
Source: src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py
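The control flow of the denoising loop in the sequence diagram can be shown with a toy example that needs no model weights. The scheduler and denoiser below are hypothetical stand-ins operating on plain lists of floats; they only mimic the call pattern (set timesteps, predict noise, step) and do not reproduce any real diffusion math.

```python
class ToyScheduler:
    """Hypothetical scheduler: counts timesteps down and subtracts predicted noise."""

    def __init__(self):
        self.timesteps = []

    def set_timesteps(self, num_inference_steps):
        # Timesteps run from high noise to low noise.
        self.timesteps = list(range(num_inference_steps - 1, -1, -1))

    def step(self, noise_pred, timestep, latent):
        # Remove the predicted noise from the current latent.
        return [x - n for x, n in zip(latent, noise_pred)]


def toy_denoiser(latent, timestep):
    """Hypothetical model: treats an equal share of the latent as noise per remaining step."""
    return [x / (timestep + 1) for x in latent]


def run_denoising_loop(latent, num_inference_steps):
    scheduler = ToyScheduler()
    scheduler.set_timesteps(num_inference_steps)
    for t in scheduler.timesteps:
        noise_pred = toy_denoiser(latent, t)
        latent = scheduler.step(noise_pred, t, latent)
    return latent


final_latent = run_denoising_loop([8.0, -4.0], num_inference_steps=4)
print(final_latent)
```

In a real pipeline the denoiser is the UNet or transformer, the step rule comes from the chosen scheduler, and the final latent is passed to the VAE decoder.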
Scheduler Integration
Schedulers define the noise schedule and control how the diffusion process progresses from noise to sample. Different schedulers offer trade-offs between speed and quality:
| Scheduler | Speed | Quality | Notes |
|---|---|---|---|
| DDIMScheduler | Fast | High | Good for few-step generation |
| DDPMScheduler | Slow | Very High | Best quality, many steps |
| DPMSolverMultistepScheduler | Medium | High | Fast convergence |
| EulerDiscreteScheduler | Variable | High | Configurable |
| UniPCMultistepScheduler | Fast | High | Few steps needed |
Switching Schedulers
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
pipeline = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5"
)
# Replace the default scheduler
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
pipeline.scheduler.config
)
For A1111/K-Diffusion to Diffusers scheduler mapping, refer to issue #4167 which documents the correspondence between common scheduler configurations.
Source: src/diffusers/pipelines/pipeline_utils.py
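Swapping schedulers through `from_config` relies on the new class accepting the configuration keys it understands from the old one. The sketch below illustrates that pattern with hypothetical toy classes; the parameter names are illustrative assumptions, and the key-filtering shown is a simplification rather than the library's actual implementation.

```python
import inspect


class ToyConfigMixin:
    """Hypothetical stand-in for a class that can be rebuilt from a config dict."""

    @classmethod
    def from_config(cls, config):
        # Keep only the keys that this class's __init__ accepts.
        accepted = set(inspect.signature(cls.__init__).parameters) - {"self"}
        kwargs = {key: value for key, value in config.items() if key in accepted}
        return cls(**kwargs)


class ToySchedulerA(ToyConfigMixin):
    def __init__(self, num_train_timesteps=1000, beta_start=0.0001, skip_prk_steps=False):
        self.config = {
            "num_train_timesteps": num_train_timesteps,
            "beta_start": beta_start,
            "skip_prk_steps": skip_prk_steps,
        }


class ToySchedulerB(ToyConfigMixin):
    def __init__(self, num_train_timesteps=1000, beta_start=0.0001, solver_order=2):
        self.config = {
            "num_train_timesteps": num_train_timesteps,
            "beta_start": beta_start,
            "solver_order": solver_order,
        }


old_scheduler = ToySchedulerA(num_train_timesteps=500, skip_prk_steps=True)
new_scheduler = ToySchedulerB.from_config(old_scheduler.config)
print(new_scheduler.config)
```

Shared settings such as the number of training timesteps carry over, while keys unknown to the new class are dropped and its own extra parameters fall back to their defaults.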
Advanced Usage
Single-File Model Loading
Some custom models or quantized models (including GGUF files) are distributed as single checkpoint files. Diffusers provides from_single_file methods for loading these:
import torch
from diffusers import UNet2DConditionModel
model = UNet2DConditionModel.from_single_file(
"https://example.com/model.safetensors",
torch_dtype=torch.float16
)
The GGUF quantizer, introduced in recent versions, handles quantized GGUF checkpoint files with special loading requirements.
Source: src/diffusers/pipelines/pipeline_loading_utils.py
Memory Optimization
For inference on limited-memory hardware, several optimization strategies are available:
import torch
from diffusers import StableDiffusionPipeline
pipeline = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16,
use_safetensors=True
)
# Enable attention slicing for lower memory usage
pipeline.enable_attention_slicing()
# Enable sequential CPU offloading
pipeline.enable_sequential_cpu_offload()
# Use xformers memory-efficient attention
pipeline.enable_xformers_memory_efficient_attention()
Source: src/diffusers/pipelines/pipeline_utils.py
Common Failure Modes and Troubleshooting
Config Mismatch Issues
When loading custom models or third-party checkpoints, config mismatches are common. This is particularly relevant for community requests around universal model loading (issue #13683).
Symptoms:
- ValueError during model initialization
- Missing keys when loading state dict
- Type mismatch errors
Solutions:
- Use the type_hint parameter in modular pipelines to specify the expected model class
- Provide custom configuration files alongside checkpoint files
- Use ignore_mismatched_sizes=True where applicable
Trust Remote Code
Custom pipelines require trust_remote_code=True to execute:
pipeline = DiffusionPipeline.from_pretrained(
"owner/custom-pipeline",
custom_pipeline="pipeline_name",
trust_remote_code=True
)
Without this flag, loading pipelines with custom code will raise a ValueError.
Source: src/diffusers/pipelines/pipeline_utils.py
Flux Klein Configuration
Recent releases (v0.37.0+) have addressed specific issues with Flux Klein model loading, including proper handling of distilled and non-distilled versions. Users should ensure they are using the correct configuration variant when loading these models.
Source: Diffusers v0.37.1 Release Notes
See Also
- Loading Pre-trained Models - Detailed guide on model loading
- DiffusionPipeline Class Reference - API documentation
- Custom Pipelines - Creating and loading custom pipelines
- Modular Diffusers - Modular pipeline documentation
- Schedulers - Scheduler configuration guide
- Optimization - Memory and speed optimization
- Training - Training pipelines and techniques
Source: https://github.com/huggingface/diffusers / Human Manual
Modular Diffusers
Related topics: Pipelines Overview, Training Guide
Overview
Modular Diffusers is a framework introduced in Diffusers v0.37.0 that enables building diffusion pipelines by composing reusable, modular building blocks. Instead of writing entire pipelines from scratch, developers can mix and match components to create custom workflows tailored to specific use cases.
The core philosophy behind Modular Diffusers is composability, allowing users to:
- Reuse existing pipeline components across different models
- Swap individual components (transformers, schedulers, guiders) without rewriting entire pipelines
- Create custom pipelines by combining standardized building blocks
- Share and distribute custom pipeline configurations through Hugging Face Hub
Source: docs/source/en/modular_diffusers/overview.md
Architecture
Component Hierarchy
Modular Diffusers organizes pipeline components into a hierarchical structure. The main components include:
graph TD
A[ModularPipeline] --> B[ComponentsManager]
A --> C[PipelineConfig]
B --> D[Transformer]
B --> E[TextEncoder/TextEncoder 2]
B --> F[VAE/AutoencoderKL]
B --> G[Scheduler]
B --> H[Guider]
B --> I[Tokenizer]
D --> J[Flux Transformer]
D --> K[UNet2DConditionModel]
H --> L[FlowMatcherGuider]
H --> M[DPMSolverMultistepGuider]
Core Components
| Component Type | Description | Base Class |
|---|---|---|
| Transformer | The core diffusion model that performs the denoising process | ModelMixin |
| TextEncoder | Encodes text prompts into embeddings | PreTrainedModel |
| VAE/AutoencoderKL | Encodes images to latent space and decodes back | ModelMixin |
| Scheduler | Controls the diffusion sampling process | SchedulerMixin |
| Guider | Guides the generation process (CFG, flow matching) | Guider |
| Tokenizer | Converts text to token IDs | PreTrainedTokenizer |
Source: src/diffusers/modular_pipelines/components_manager.py
Type Hints System
Modular Diffusers uses type hints to resolve which class should be loaded for each component. This allows flexible component substitution while maintaining type safety.
The system supports the following type hint sources:
| Source Type | Resolution Method |
|---|---|
| Direct class reference | Uses the specified class directly |
| AutoModel | Uses AutoModel.from_pretrained() |
| AutoModelForClassDiffusion | Uses class-specific auto model |
| Transformers models | Uses transformers.AutoModel |
Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py:1-100
Guider System
The Guider system abstracts guidance computation from individual pipelines, allowing different guidance strategies to be applied uniformly:
graph LR
A[NoGuider] --> B[Base Guider Interface]
C[FlowMatcherGuider] --> B
D[DPMSolverMultistepGuider] --> B
B --> E[ModularPipeline]
| Guider Type | Purpose | Configuration Key |
|---|---|---|
| NoGuider | No guidance applied | Default |
| FlowMatcherGuider | Flow matching guidance for Flux models | guider config |
| DPMSolverMultistepGuider | DPM-Solver guidance | guider config |
Source: src/diffusers/guiders/__init__.py
Loading Components
From Pretrained Models
Modular pipelines automatically resolve and load components from the Hugging Face Hub:
import torch
from diffusers.modular_pipelines import ModularPipeline
# Load a complete modular pipeline
pipeline = ModularPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16
)
The loading process follows this sequence:
sequenceDiagram
participant User
participant ModularPipeline
participant ComponentsManager
participant AutoModel
participant HuggingFace
User->>ModularPipeline: from_pretrained(path)
ModularPipeline->>HuggingFace: Download modular_model_index.json
ModularPipeline->>ComponentsManager: Parse component configs
ComponentsManager->>AutoModel: Resolve class from type_hint
AutoModel->>HuggingFace: Download model weights
ComponentsManager->>ComponentsManager: Instantiate components
ModularPipeline->>User: Return assembled pipeline
Source: src/diffusers/pipelines/pipeline_loading_utils.py:1-60
With Type Hints
When loading components that lack sufficient configuration, specify type_hint to guide the loader:
from diffusers import AutoModel
from diffusers.modular_pipelines import ComponentsManager
manager = ComponentsManager()
# Specify type hint for component resolution
manager.add_component(
name="transformer",
pretrained_model_name_or_path="./my_custom_model",
type_hint=AutoModel # or specific class like FluxTransformer2DModel
)
Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py:50-80
Single File Model Loading
Modular Diffusers supports loading models from single checkpoint files using from_single_file:
from diffusers.modular_pipelines import ModularPipeline
pipeline = ModularPipeline.from_single_file(
pretrained_model_link_or_path="./checkpoint.safetensors",
original_config="./config.yaml"
)
The system detects single-file models and routes them appropriately:
# From src/diffusers/loaders/single_file.py
is_diffusers_single_file_model = issubclass(class_obj, diffusers_module.FromOriginalModelMixin)
if is_diffusers_single_file_model:
load_method = getattr(class_obj, "from_single_file")
loaded_sub_model = load_method(
pretrained_model_link_or_path_or_dict=checkpoint,
original_config=original_config,
config=cached_model_config_path,
subfolder=name,
torch_dtype=torch_dtype,
**kwargs,
)
Source: src/diffusers/loaders/single_file.py:1-60
Flux Modular Pipeline
The Flux model family uses specialized modular pipeline implementations that handle both full and distilled model variants.
FluxPipeline Structure
graph TD
subgraph FluxPipeline
A[Transformer] --> B[FluxTransformer2DModel]
C[TextEncoder] --> D[CLIPTextModel/CLIPTextModelWithProjection]
C --> E[T5TextEncoder]
F[VAE] --> G[AutoencoderKL]
H[Scheduler] --> I[FlowMatchEulerDiscreteScheduler]
J[Guider] --> K[FlowMatcherGuider]
end
Configuration for Distilled Models
Flux models may use distilled versions that affect guidance configuration. The modular pipeline automatically detects and handles this:
# Distilled model handling in modular_pipeline.py
if hasattr(config, "guidance_scale"):
guider_config = {"guider": {"class_name": "FlowMatcherGuider"}}
else:
guider_config = {"guider": {"class_name": "NoGuider"}}
Source: src/diffusers/modular_pipelines/flux/modular_pipeline.py
Configuration Schema
Modular Model Index JSON
The modular_model_index.json file defines the pipeline configuration:
{
"_class_name": "ModularPipeline",
"components": {
"transformer": {
"type_hint": "FluxTransformer2DModel",
"pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev"
},
"text_encoder": {
"type_hint": "CLIPTextModel",
"pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev"
},
"text_encoder_2": {
"type_hint": "T5EncoderModel",
"pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev"
}
}
}
Component Configuration Options
| Parameter | Description | Default |
|---|---|---|
| type_hint | Class to use for loading | Auto-detected |
| pretrained_model_name_or_path | Model path or identifier | Required |
| subfolder | Subdirectory within model | None |
| variant | Model variant (e.g., "fp16") | None |
| torch_dtype | Data type for weights | None |
| use_safetensors | Use safe serialization | Auto |
Source: src/diffusers/modular_pipelines/components_manager.py:1-80
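As a rough illustration of how the options table applies to an index file, the sketch below parses a `modular_model_index.json`-style document, enforces the one required field, and fills in the `None` defaults from the table. `normalize_components` and `OPTIONAL_DEFAULTS` are hypothetical names for this example; the library's own parsing may differ.

```python
import json

# Optional fields and their defaults, taken from the table above.
OPTIONAL_DEFAULTS = {"subfolder": None, "variant": None, "torch_dtype": None}

SAMPLE_INDEX = """
{
  "_class_name": "ModularPipeline",
  "components": {
    "transformer": {
      "type_hint": "FluxTransformer2DModel",
      "pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev"
    },
    "text_encoder": {
      "type_hint": "CLIPTextModel",
      "pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev",
      "variant": "fp16"
    }
  }
}
"""


def normalize_components(index_text):
    """Return each component spec with required fields checked and defaults applied."""
    index = json.loads(index_text)
    normalized = {}
    for name, spec in index.get("components", {}).items():
        if "pretrained_model_name_or_path" not in spec:
            raise ValueError(f"component '{name}' is missing 'pretrained_model_name_or_path'")
        entry = dict(OPTIONAL_DEFAULTS)
        entry.update(spec)  # explicit values override the defaults
        normalized[name] = entry
    return normalized


normalized = normalize_components(SAMPLE_INDEX)
for name, entry in normalized.items():
    print(name, entry["type_hint"], entry["variant"])
```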
Common Patterns
Creating a Custom Pipeline
from diffusers import (
ModularPipeline,
FluxTransformer2DModel,
FlowMatchEulerDiscreteScheduler,
FlowMatcherGuider
)
# Define custom configuration
custom_config = {
"transformer": {
"type_hint": FluxTransformer2DModel,
"pretrained_model_name_or_path": "custom/model"
},
"scheduler": {
"type_hint": FlowMatchEulerDiscreteScheduler
}
}
# Create pipeline with custom config
pipeline = ModularPipeline.from_config(custom_config)
Mixing Components from Different Pipelines
from diffusers import AutoModel, ModularPipeline
# Load base pipeline
pipeline = ModularPipeline.from_pretrained("base/pipeline")
# Replace transformer with a custom variant
pipeline.transformer = AutoModel.from_pretrained(
"custom/transformer",
type_hint=type(pipeline.transformer)
)
Using with LoRA Adapters
import torch
from diffusers import StableDiffusionXLPipeline
# Load pipeline with LoRA support
pipeline = StableDiffusionXLPipeline.from_pretrained(
"sdxl/pipeline",
torch_dtype=torch.float16
)
# Load and apply LoRA adapter
pipeline.load_lora_weights("path/to/lora", adapter_name="my_adapter")
pipeline.set_adapters("my_adapter")
Source: src/diffusers/models/auto_model.py:40-80
GGUF Quantization Support
Modular Diffusers supports GGUF-quantized models for reduced memory footprint:
import torch
from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig
# Configure GGUF quantization
quantization_config = GGUFQuantizationConfig(compute_dtype=torch.float16)
# Load the quantized model from a single GGUF checkpoint file
# (FluxTransformer2DModel is used here as an example model class)
model = FluxTransformer2DModel.from_single_file(
"quantized/model.gguf",
quantization_config=quantization_config,
torch_dtype=torch.float16
)
GGUF Quantization Parameters
| Parameter | Type | Description |
|---|---|---|
| compute_dtype | torch.dtype | Computation data type |
| pre_quantized | bool | Model is pre-quantized |
| modules_to_not_convert | list | Modules to keep in FP32 |
| use_keep_in_fp32_modules | bool | Keep specified modules in FP32 |
Source: src/diffusers/quantizers/gguf/gguf_quantizer.py:1-50
Common Failure Modes
Type Hint Resolution Failures
When type_hint is missing and AutoModel cannot determine the correct class:
ValueError: Unable to load transformer without `type_hint`
Solution: Explicitly provide type_hint for the component.
from diffusers import AutoModel
manager.add_component(
name="transformer",
pretrained_model_name_or_path="./custom_model",
type_hint=AutoModel # or specific class
)
Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py:60-70
Config Mismatch with Transformers Models
When loading models that mix Diffusers and Transformers components:
ValueError: `config_class` cannot be None. Please double-check the model.
Solution: Ensure the model's config includes proper model_type or _class_name fields.
Single File Loading with Missing Config
When loading from single files without an original config:
ValueError: The repository contains custom code which must be executed
Solution: Pass trust_remote_code=True or provide original_config path.
pipeline = ModularPipeline.from_single_file(
"./checkpoint.safetensors",
original_config="./config.yaml",
trust_remote_code=True
)
Source: src/diffusers/loaders/single_file.py:30-60
Flux Klein LoRA Loading Issues
Community reports indicate issues with Flux Klein LoRA loading in some configurations. This was addressed in v0.37.1 with fixes for proper LoRA adapter handling with Flux models.
Reference: GitHub Issue #13313
Examples and Usage
Running Example Scripts
To use Modular Diffusers with example scripts:
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
# Install example requirements
cd examples
pip install -r requirements.txt
Source: examples/README.md
Community Scripts
The community maintains additional modular pipeline examples:
| Example | Description | Author |
|---|---|---|
| IP-Adapter Negative Noise | Advanced IP-Adapter control | Álvaro Somoza |
| Asymmetric Tiling | Seamless image tiling | alexisrolland |
| Prompt Scheduling | Dynamic prompt control | Community |
Reference: examples/community/README_community_scripts.md
See Also
- Pipeline Loading - General pipeline loading mechanisms
- AutoModel - Automatic model loading
- Guider System - Guidance abstraction layer
- Single File Loading - Loading from checkpoint files
- GGUF Quantization - Quantized model support
Training Guide
Related topics: Loaders & Adapters, Optimization Guide
Overview
The Hugging Face Diffusers library provides a comprehensive suite of training scripts and utilities for fine-tuning diffusion models. Training in Diffusers enables users to adapt pretrained models for custom tasks, create personalized outputs, and optimize models for specific domains or styles.
Training scripts in Diffusers are designed to be easy-to-tweak, beginner-friendly, and one-purpose-only. While they are not intended to provide state-of-the-art training methods for the newest models, they serve as excellent starting points for understanding diffusion model training and for adapting to specific use cases. Source: examples/README.md
Key Training Objectives
Diffusers training supports several fundamental objectives:
| Objective | Description | Common Use Cases |
|---|---|---|
| Personalization | Fine-tune models to generate content in a specific style or about specific subjects | DreamBooth, LoRA fine-tuning |
| Control | Add conditioning mechanisms to guide generation | ControlNet, adapter training |
| Efficiency | Distill knowledge or compress models for faster inference | LCM distillation, quantization |
| Domain Adaptation | Adapt models to specific data distributions | Custom dataset fine-tuning |
Architecture
Training System Components
graph TD
A[Training Pipeline] --> B[Model Loading]
A --> C[Data Loading]
A --> D[Optimizer Setup]
A --> E[Training Loop]
B --> B1[pretrained_model_name_or_path]
B --> B2[variant]
B --> B3[revision]
C --> C1[dataset_name]
C --> C2[pretrained_vae]
C --> C3[image processing]
D --> D1[Learning Rate]
D --> D2[AdamW]
D --> D3[lr_scheduler]
E --> E1[Gradient Computation]
E --> E2[Optimization Step]
E --> E3[Checkpointing]
Training Script Types
Diffusers organizes training scripts by task and complexity level:
| Directory | Purpose | Example Scripts |
|---|---|---|
| examples/dreambooth/ | DreamBooth personalization | LoRA, Full fine-tuning |
| examples/text_to_image/ | Text-to-image training | LoRA, custom datasets |
| examples/controlnet/ | ControlNet training | ControlNet, Flux ControlNet |
| examples/advanced_diffusion_training/ | Advanced techniques | Flux LoRA, Dreambooth advanced |
| examples/consistency_distillation/ | Model distillation | LCM LoRA distillation |
| examples/research_projects/ | Community research | Scheduled Huber loss |
Common Training Patterns
Model Loading
All training scripts follow a consistent pattern for loading pretrained models:
# Load pretrained UNet/Transformer
unet = UNet2DConditionModel.from_pretrained(
pretrained_model_name_or_path,
subfolder="unet",
variant=variant,
revision=revision,
)
# Load the VAE bundled with the pretrained model
vae = AutoencoderKL.from_pretrained(
pretrained_model_name_or_path,
subfolder="vae",
variant=variant,
revision=revision,
)
# Load pretrained VAE separately if specified
if pretrained_vae_model_name_or_path:
vae = AutoencoderKL.from_pretrained(pretrained_vae_model_name_or_path)
Source: examples/controlnet/train_controlnet.py:100-140
Core Training Arguments
Training scripts share common command-line arguments:
| Argument | Type | Default | Description |
|---|---|---|---|
| --pretrained_model_name_or_path | str | required | Model identifier from HuggingFace Hub |
| --pretrained_vae_model_name_or_path | str | None | Path to pretrained VAE with better numerical stability |
| --variant | str | None | Variant of model files (e.g., fp16) |
| --revision | str | None | Git revision of pretrained model |
| --dataset_name | str | None | Dataset name from HuggingFace Hub |
| --output_dir | str | required | Directory for checkpoints and outputs |
| --cache_dir | str | None | Cache directory for downloaded models |
| --seed | int | None | Random seed for reproducibility |
Source: examples/text_to_image/train_text_to_image_lora.py
Dataset Configuration
Training scripts support multiple dataset formats and configurations:
# From HuggingFace Hub
--dataset_name="dataset-name"
# From local directory
--train_data_dir="/path/to/local/data"
# Dataset configuration (when applicable)
--dataset_config_name="config-name"
The dataset must follow a specific structure, particularly for image datasets that need to work with HuggingFace Datasets' ImageFolder format. Source: examples/research_projects/scheduled_huber_loss_training/text_to_image/train_text_to_image_lora_sdxl.py
Training Methods
LoRA (Low-Rank Adaptation)
LoRA training adds trainable low-rank matrices to existing model layers, significantly reducing the number of trainable parameters while maintaining quality.
# Enable LoRA training
lora_attn_procs = {}
for name, attn_processor in unet.attn_processors.items():
# Initialize LoRA attention processors
...
unet.set_attn_processor(lora_attn_procs)
unet.train()
Key benefits:
- Reduced memory footprint
- Faster training times
- Easy to merge and unmerge
- Compatible with most model architectures
Source: examples/text_to_image/train_text_to_image_lora.py
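The parameter savings from low-rank matrices are easy to check with arithmetic. The sketch below is a pure-Python illustration of the general LoRA idea (a frozen weight W plus a scaled low-rank update B·A); the layer size, rank, and scale are assumptions picked for the example, and the helper names are not part of Diffusers.

```python
def full_param_count(d_in, d_out):
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_param_count(d_in, d_out, rank):
    """Parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, scale):
    """Compute y = W x + scale * B (A x), with W frozen and A, B trainable."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


# Assumed example: a 768 x 768 projection adapted with rank 4.
full = full_param_count(768, 768)
lora = lora_param_count(768, 768, rank=4)
print(f"full: {full}, LoRA: {lora}, reduction: {full // lora}x")

# Tiny forward pass with rank 1 to show the update being added.
y = lora_forward(
    x=[3.0, 4.0],
    W=[[1.0, 0.0], [0.0, 1.0]],
    A=[[1.0, 1.0]],
    B=[[1.0], [2.0]],
    scale=0.5,
)
print(y)
```

Because the update is a separate additive term, it can be folded into W for inference or dropped again, which is what makes merging and unmerging cheap.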
DreamBooth
DreamBooth enables subject-driven personalization by fine-tuning a diffusion model on a few images of a specific subject with a unique identifier.
# Special identifier for the subject
instance_prompt = "a photo of a sks dog" # "sks" is the unique identifier
# Class-specific preservation prompt
class_prompt = "a photo of a dog"
# Training with prior preservation loss
# Helps maintain the model's knowledge about the class
Source: examples/dreambooth/train_dreambooth_lora.py
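Prior preservation combines two terms: the loss on the subject images and a weighted loss on generic class images. The following is a minimal pure-Python sketch of that combination, assuming a mean-squared-error objective and a weight called `prior_loss_weight`; the numbers are made up, and a real training script computes these terms on model predictions rather than short lists.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_target, class_pred, class_target,
                    prior_loss_weight=1.0):
    """Instance loss plus weighted prior-preservation loss."""
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss


# Made-up predictions and targets for the "sks dog" and generic "dog" batches.
total = dreambooth_loss(
    instance_pred=[1.0, 2.0], instance_target=[0.0, 2.0],
    class_pred=[0.5, 0.5], class_target=[0.0, 0.0],
)
print(total)
```

Setting the weight to zero reduces this to plain fine-tuning on the subject images, which is the case where the model is most likely to forget what the broader class looks like.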
ControlNet Training
ControlNet trains additional conditioning branches that can control diffusion model outputs based on various input modalities (canny edges, poses, depth maps, etc.).
# Initialize ControlNet
controlnet = ControlNetModel.from_unet(unet)
# Prepare ControlNet conditions
control_image = load_control_image(control_image_path)
control_image = controlnet_image_processor.preprocess(control_image)
# Training with ControlNet conditions
with torch.no_grad():
# Forward pass with ControlNet conditioning
...
Source: examples/controlnet/train_controlnet.py
Consistency Distillation (LCM)
Latent Consistency Models (LCM) distill the iterative denoising process into fewer steps for fast inference.
# Teacher model for distillation
teacher_unet = UNet2DConditionModel.from_pretrained(
pretrained_model_name_or_path,
subfolder="unet",
)
# LCM-specific training parameters
--num_train_timesteps=1000
--GuidanceScale=0.0 # CFG disabled for LCM
--sigma_min=0.002
--sigma_max=14.61
Source: examples/consistency_distillation/train_lcm_distill_lora_sdxl.py
Advanced Training Configuration
Flux Training
Flux models use a different architecture requiring specific training configurations:
# Flux-specific model loading
transformer = FluxTransformer2DModel.from_pretrained(
pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev",
subfolder="transformer",
)
# Flux training arguments
--flux=True
--max_sequence_length=512
--rank=4
--lambda_lora=1.0
Source: examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py
Training Utilities
The training_utils.py module provides core utilities for model training:
from diffusers.training_utils import (
FreeKLScheduler,
compute_snr,
scale_lora,
unet_lora_state_dict,
)
Key utility functions include:
- FreeKLScheduler: Implements FreeBIT-style scheduling for knowledge distillation
- compute_snr(): Computes Signal-to-Noise Ratio for advanced scheduling
- scale_lora(): Scales LoRA weights for merging
- unet_lora_state_dict(): Extracts LoRA state dict for saving
Source: src/diffusers/training_utils.py
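To give a feel for what a signal-to-noise ratio per timestep looks like, here is a pure-Python sketch using the standard DDPM relation SNR(t) = ᾱ_t / (1 − ᾱ_t), where ᾱ_t is the cumulative product of (1 − β). The linear beta schedule and its endpoints are assumptions for illustration; this is not the library's `compute_snr` code, which operates on a scheduler and tensors.

```python
def linear_betas(num_train_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear beta schedule; real models may use a different one."""
    step = (beta_end - beta_start) / (num_train_timesteps - 1)
    return [beta_start + i * step for i in range(num_train_timesteps)]


def snr_per_timestep(betas):
    """SNR(t) = alpha_bar_t / (1 - alpha_bar_t), alpha_bar_t = prod(1 - beta)."""
    snr = []
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
        snr.append(alpha_bar / (1.0 - alpha_bar))
    return snr


snr = snr_per_timestep(linear_betas())
print(f"SNR at first timestep: {snr[0]:.1f}")
print(f"SNR at last timestep:  {snr[-1]:.2e}")
```

The ratio falls steadily as the timestep grows, which is why SNR-based loss weighting treats early (clean) and late (noisy) timesteps differently.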
Training Workflow
graph LR
A[Setup Environment] --> B[Prepare Dataset]
B --> C[Load Pretrained Models]
C --> D[Initialize LoRA/Adapters]
D --> E[Training Loop]
E --> F{Epoch Complete?}
F -->|Yes| G[Save Checkpoint]
F -->|No| E
G --> H{More Epochs?}
H -->|Yes| E
H -->|No| I[Export Final Model]
I --> J[Merge LoRA (optional)]
Common Failure Modes and Troubleshooting
Model Loading Issues
| Issue | Cause | Solution |
|---|---|---|
| Repository not found | Invalid model identifier | Verify model name on HuggingFace Hub |
| Revision not found | Non-existent git revision | Use revision="main" or valid commit hash |
| Variant not found | Missing weight variant | Omit --variant or check available variants |
| Config mismatch | Model architecture changed | Update model reference or use specific revision |
Source: src/diffusers/pipelines/pipeline_loading_utils.py
Memory Issues
| Issue | Solution |
|---|---|
| OOM during training | Enable gradient checkpointing, reduce batch size, use 8-bit Adam optimizer |
| Slow training | Use mixed precision (--mixed_precision="fp16"), enable xformers |
| VAE memory | Use separate pretrained VAE with better numerical stability |
LoRA Loading Problems
Recent releases (v0.37.x) have addressed several LoRA loading issues:
- Flux Klein LoRA loading: Fixed in v0.37.1
- ModularPipelines with AutoModel type hints: Fixed in v0.37.1
If encountering LoRA loading issues with custom models, ensure:
- The LoRA rank matches the target model architecture
- The type_hint is correctly specified for single-file models
- The model was saved with compatible LoRA weights
Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py
Configuration Mismatch
When training with custom models or GGUF files:
- Verify model architecture matches the expected UNet/Transformer class
- Check that config files are present in the model directory
- For custom architectures, ensure proper registration with ModelMixin and ConfigMixin
Source: src/diffusers/models/auto_model.py
Best Practices
Environment Setup
# Clone and install from source
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
# Install example-specific dependencies
cd examples/dreambooth
pip install -r requirements.txt
Source: examples/README.md
Reproducibility
Always specify a seed for reproducible training:
python train_dreambooth_lora.py \
--seed=42 \
--output_dir="./output" \
...
Checkpointing Strategy
- Save checkpoints at regular intervals using --checkpointing_steps
- Keep track of best-performing checkpoint using validation metrics
- Use --resume_from_checkpoint to resume interrupted training
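Resuming usually starts by locating the most recent checkpoint on disk. The sketch below shows one way to do that with the standard library only. It assumes checkpoints are written as subdirectories named `checkpoint-<step>` inside the output directory; that naming scheme and the helper name `find_latest_checkpoint` are assumptions made for illustration and are not confirmed by this page.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Assumption: checkpoints are saved as "<output_dir>/checkpoint-<step>".
CHECKPOINT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint directory with the highest step number, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    candidates = []
    for child in root.iterdir():
        match = CHECKPOINT_PATTERN.match(child.name)
        if child.is_dir() and match:
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    # Compare by numeric step so that checkpoint-1000 sorts after checkpoint-500.
    return max(candidates, key=lambda item: item[0])[1]


# Usage: build a throwaway output directory and pick the newest checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    for step in (200, 1000, 500):
        (Path(tmp) / f"checkpoint-{step}").mkdir()
    (Path(tmp) / "logs").mkdir()
    latest_name = find_latest_checkpoint(tmp).name
    empty_result = find_latest_checkpoint(str(Path(tmp) / "logs"))
```

Sorting by the parsed integer rather than by name avoids the lexicographic trap where "checkpoint-500" would rank above "checkpoint-1000".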
Installation and Dependencies
Training scripts require specific dependencies. To ensure compatibility:
- Install from source for the latest training features
- Check requirements.txt in the specific example directory
- Verify PyTorch version is compatible with your GPU drivers
- For JAX training, ensure Flax is installed
Example installation:
pip install torch --index-url https://download.pytorch.org/whl/cu118
pip install accelerate transformers datasets peft
pip install -e ".[torch]"
See Also
- Loading Guide - Understanding model loading mechanisms
- Optimization Guide - Memory and speed optimization techniques
- Pipelines Overview - Using trained models for inference
- Modular Diffusers - Composable pipeline architecture
- Model Architecture - Design philosophy for models in Diffusers
Source: https://github.com/huggingface/diffusers / Human Manual
Optimization Guide
Related topics: Quantization Guide, Loaders & Adapters
This page covers performance optimization techniques for the Diffusers library, including memory management, attention backends, caching strategies, and quantization options. These techniques enable efficient inference and training of diffusion models on various hardware configurations.
Overview
Diffusers provides multiple optimization layers to improve inference speed and reduce memory consumption. The optimization system operates at several levels:
- Attention Level: Alternative attention implementations (xformers, flash attention, scaled dot product attention)
- Cache Level: Key-value caching for iterative generation
- Memory Level: CPU offloading, gradient checkpointing, and memory-efficient attention
- Quantization Level: GGUF and other quantization formats for reduced precision inference
graph TD
A[Diffusion Pipeline] --> B[Attention Processors]
A --> C[Caching System]
A --> D[Quantization]
B --> B1[xformers]
B --> B2[Flash Attention]
B --> B3[SDPA]
C --> C1[FasterCache]
C --> C2[TextKVCache]
D --> D1[GGUF Quantization]
Source: https://github.com/huggingface/diffusers / Human Manual
Loaders & Adapters
Related topics: Quantization Guide, System Architecture
This page documents the loading mechanisms and adapter systems in the Diffusers library. These components are responsible for importing pretrained models, checkpoints, and adapter weights into pipelines and model architectures.
Overview
The Diffusers library provides a unified loading architecture that supports multiple model formats, checkpoint types, and adapter mechanisms. The loaders module (src/diffusers/loaders/) centralizes all loading functionality, enabling pipelines to dynamically import and configure model components at runtime.
graph TD
A[Pipeline Loading Request] --> B{Model Type Detection}
B -->|Standard HuggingFace| C[from_pretrained]
B -->|Single File Checkpoint| D[from_single_file]
B -->|LoRA Adapter| E[load_lora_weights]
B -->|Textual Inversion| F[load_textual_inversion]
B -->|IP Adapter| G[load_ip_adapter]
B -->|PEFT Format| H[load_peft_weights]
C --> I[ModelMixin / PreTrainedModel]
D --> J[FromOriginalModelMixin]
E --> K[StableDiffusionLoraLoaderMixin]
F --> L[TextualInversionLoaderMixin]
G --> M[IPAdapterMixin]
H --> N[PeftMixin]
I --> O[Loaded Model / Pipeline]
J --> O
K --> O
L --> O
M --> O
N --> O
Loading Architecture
Core Loading Components
The loading system is built on several key abstractions:
| Component | File | Purpose |
|---|---|---|
| FromOriginalModelMixin | single_file_model.py | Base mixin for loading checkpoints from original model formats |
| StableDiffusionLoraLoaderMixin | lora_base.py | LoRA weight loading and fusion for Stable Diffusion models |
| LoraLoaderMixin | lora_pipeline.py | Generic LoRA loading support for pipeline components |
| PeftMixin | peft.py | PEFT-format adapter loading (LoRA, IA³, LoHa, etc.) |
| TextualInversionLoaderMixin | textual_inversion.py | Textual inversion embedding loading |
| IPAdapterMixin | ip_adapter.py | Image Prompt adapter loading |
| SingleFileLoader | single_file.py | Utilities for single-file checkpoint loading |
Source: src/diffusers/loaders/__init__.py
Model Type Detection
During loading, the system detects model types to determine the appropriate loading strategy:
is_transformers_model = (
    is_transformers_available()
    and issubclass(class_obj, PreTrainedModel)
    and transformers_version >= version.parse("4.20.0")
)
is_diffusers_model = issubclass(class_obj, diffusers_module.ModelMixin)
is_diffusers_single_file_model = issubclass(class_obj, diffusers_module.FromOriginalModelMixin)
Source: src/diffusers/loaders/single_file.py:1-100
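To make the detection step concrete, here is a minimal, self-contained sketch of `issubclass`-based dispatch. The classes are empty stand-ins that only borrow the names used above; they are not the real Diffusers or Transformers classes, and the order of the checks and the helper `pick_load_method` are assumptions chosen for illustration.

```python
# Stand-in base classes (NOT the real library classes).
class ModelMixin:
    pass


class FromOriginalModelMixin:
    pass


class PreTrainedModel:
    pass


# Hypothetical component classes used only for this example.
class ToyUNet(ModelMixin, FromOriginalModelMixin):
    pass


class ToyVAE(ModelMixin):
    pass


class ToyTextEncoder(PreTrainedModel):
    pass


def pick_load_method(class_obj: type) -> str:
    """Choose a loading entry point from the class hierarchy alone."""
    # Single-file capable models are checked first, since they also
    # inherit from ModelMixin in this sketch.
    if issubclass(class_obj, FromOriginalModelMixin):
        return "from_single_file"
    if issubclass(class_obj, (ModelMixin, PreTrainedModel)):
        return "from_pretrained"
    raise TypeError(f"No loading strategy for {class_obj.__name__}")


methods = {cls.__name__: pick_load_method(cls) for cls in (ToyUNet, ToyVAE, ToyTextEncoder)}
```

Because the decision depends only on inheritance, a new model class opts into a loading path simply by adding the corresponding mixin.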
Single File Loading
Single file loading enables the import of pretrained checkpoints in formats other than the native Diffusers format. This is essential for loading models from other ecosystems or custom checkpoints.
FromOriginalModelMixin
Models implementing FromOriginalModelMixin support loading from original checkpoint formats:
if is_diffusers_single_file_model:
    load_method = getattr(class_obj, "from_single_file")
    loaded_sub_model = load_method(
        pretrained_model_link_or_path_or_dict=checkpoint,
        original_config=original_config,
        config=cached_model_config_path,
        subfolder=name,
        torch_dtype=torch_dtype,
        local_files_only=local_files_only,
        disable_mmap=disable_mmap,
        **kwargs,
    )
Source: src/diffusers/loaders/single_file.py
Supported Single File Formats
The single file loader supports multiple checkpoint formats:
| Format | Description | Notes |
|---|---|---|
| .safetensors | Safe tensors format | Memory-efficient, secure |
| .bin / .pt | PyTorch pickle format | Legacy compatibility |
| .ckpt | Generic checkpoint | Common for Stable Diffusion |
Single File Loading Parameters
| Parameter | Type | Description |
|---|---|---|
| pretrained_model_link_or_path_or_dict | `str \| dict` | Path or URL to checkpoint, or state dict |
| original_config | `str \| dict \| None` | Original model configuration |
| config | `str \| None` | Diffusers config path |
| subfolder | str | Subfolder path within checkpoint |
| torch_dtype | torch.dtype | Target data type |
| local_files_only | bool | Only load from local cache |
| disable_mmap | bool | Disable memory-mapped loading |
LoRA (Low-Rank Adaptation)
LoRA enables efficient fine-tuning by adding small trainable matrices to existing model weights without modifying the base model.
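The "small trainable matrices" can be illustrated with a little arithmetic. In the standard LoRA formulation, a frozen weight W of shape (d_out, d_in) is adapted as W' = W + s * (B A), where B has shape (d_out, r), A has shape (r, d_in), and r is small. The pure-Python sketch below works through a toy case; the function names and the tiny matrices are made up for illustration and say nothing about how Diffusers stores or applies adapters internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(weight, lora_a, lora_b, scale=1.0):
    """Return weight + scale * (lora_b @ lora_a) without mutating the inputs."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: d_out=2, d_in=3, rank r=1.
base_weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lora_b = [[1.0], [2.0]]        # shape (d_out, r)
lora_a = [[1.0, 2.0, 3.0]]     # shape (r, d_in)

merged = merge_lora(base_weight, lora_a, lora_b, scale=0.5)

# Trainable parameter counts: full matrix vs. low-rank factors.
full_params = 2 * 3
lora_params = 1 * (2 + 3)
```

For realistic layer sizes the saving is far larger than in this toy case, since the low-rank factors grow with r * (d_in + d_out) rather than d_in * d_out.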
LoRA Loading Architecture
graph LR
A[LoRA Checkpoint] --> B{LoraLoaderMixin}
B --> C[State Dict Extraction]
C --> D[Target Module Mapping]
D --> E[Weight Fusion]
E --> F[Adapted Model]
Loading LoRA Weights
The StableDiffusionLoraLoaderMixin provides the load_lora_weights method:
def load_lora_weights(self, pretrained_model_name_or_path, adapter_name=None, **kwargs):
    """
    Load LoRA weights into pipeline components.

    Args:
        pretrained_model_name_or_path: Path or HuggingFace model ID
        adapter_name: Optional name for the adapter (for multiple LoRAs)
    """
Source: src/diffusers/loaders/lora_base.py
LoRA Pipeline Integration
The LoraLoaderMixin extends pipeline support for LoRA adapters:
class LoraLoaderMixin:
    """Mixin class for LoRA loading in diffusion pipelines."""

    def load_lora_weights(self, pretrained_model_name_or_path, **kwargs):
        """Load and fuse LoRA weights into pipeline components."""

    def unload_lora_weights(self):
        """Remove LoRA weights and restore original weights."""

    def set_adapters(self, adapter_names, adapter_weights=None):
        """Set active adapters with optional weighting."""
Source: src/diffusers/loaders/lora_pipeline.py
Multiple LoRA Support
Diffusers supports loading multiple LoRA adapters simultaneously:
| Method | Description |
|---|---|
| load_lora_weights() | Load with optional adapter name |
| set_adapters() | Activate specific adapters |
| fuse_lora() | Fuse adapters with custom weights |
| unfuse_lora() | Unfuse previously fused adapters |
Flux Klein LoRA Loading
Note: Diffusers v0.37.1 included fixes specifically for Flux Klein LoRA loading, addressing issues with type hints and model compatibility.
Source: Release v0.37.1 - Fix Flux Klein LoRA loading #13313
PEFT Integration
The PeftMixin enables loading adapters in the PEFT (Parameter-Efficient Fine-Tuning) format:
from typing import List, Optional

class PeftMixin:
    """Mixin for loading PEFT-format adapters."""

    def load_peft_weights(
        self,
        pretrained_model_name_or_path,
        adapter_name: str = "default",
        layer_selection: Optional[List[int]] = None,
        scale_weight: Optional[float] = None,
    ):
        """Load PEFT-format adapter weights."""
Source: src/diffusers/loaders/peft.py
Supported PEFT Adapter Types
| Adapter Type | Description |
|---|---|
| LORA | Low-Rank Adaptation |
| IA3 | Infused Adapter by Inhibiting and Amplifying Inner Activations |
| LoHa | Low-Rank Hadamard Product |
| AdaLoRA | Adaptive LoRA |
| DoRA | Weight-Decomposed Low-Rank Adaptation |
Textual Inversion
Textual Inversion enables customizing the model's vocabulary through learned embeddings without modifying the base model.
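Conceptually, this amounts to registering a new token and appending one learned vector to the text encoder's embedding table, leaving every existing row untouched. The sketch below models that with plain dictionaries and lists. The helper `add_concept_token`, the three-dimensional vectors, and the toy vocabulary are all hypothetical; a real tokenizer and embedding layer are considerably more involved.

```python
def add_concept_token(vocab, embeddings, token, vector):
    """Return a new (vocab, embeddings) pair extended with one learned token."""
    if token in vocab:
        raise ValueError(f"Token {token!r} already exists in the vocabulary")
    if embeddings and len(vector) != len(embeddings[0]):
        raise ValueError("Embedding dimension does not match the existing table")
    new_vocab = dict(vocab)
    new_vocab[token] = len(embeddings)  # the new row index becomes the token id
    new_embeddings = [list(row) for row in embeddings] + [list(vector)]
    return new_vocab, new_embeddings


# Toy vocabulary with 3-dimensional embeddings.
vocab = {"a": 0, "photo": 1, "of": 2}
embeddings = [[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.3]]

new_vocab, new_embeddings = add_concept_token(
    vocab, embeddings, "my-concept", [0.5, 0.5, 0.5]
)
```

The base rows are copied unchanged, which mirrors the point above: only the added embedding carries the new concept.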
Loading Textual Inversion Embeddings
from typing import Optional

class TextualInversionLoaderMixin:
    """Mixin for textual inversion embedding loading."""

    def load_textual_inversion(
        self,
        pretrained_model_name_or_path,
        token: Optional[str] = None,
        file_extension: str = "safetensors",
        **kwargs
    ):
        """
        Load textual inversion embeddings.

        Args:
            pretrained_model_name_or_path: Path or model ID
            token: Optional token name for the embedding
            file_extension: File format for embeddings
        """
Source: src/diffusers/loaders/textual_inversion.py
Textual Inversion File Formats
| Format | Extension | Notes |
|---|---|---|
| SafeTensors | .safetensors | Recommended, secure |
| PyTorch | .bin, .pt | Legacy format |
| Diffusers | .json + vectors | Native format |
IP Adapter
IP Adapter enables image-based conditioning for generation, allowing reference images to guide the generation process.
IP Adapter Loading
from typing import List, Union

class IPAdapterMixin:
    """Mixin for IP-Adapter loading."""

    def load_ip_adapter(
        self,
        model_id_or_path: Union[str, List[str]],
        subfolder: Union[str, List[str], None] = None,
        weight_name: Union[str, List[str], None] = None,
        image_encoder_folder: Union[str, List[str], None] = "image_encoder",
        **kwargs
    ):
        """Load IP-Adapter weights and image encoders."""
Source: src/diffusers/loaders/ip_adapter.py
IP Adapter Components
| Component | Description |
|---|---|
| Image Encoder | Processes reference images |
| Image Projection | Maps encoded features to cross-attention space |
| Adapter Weights | Fine-tuned weights for image conditioning |
Pipeline Loading Utilities
Loading Process Flow
graph TD
A[Pipeline.from_pretrained] --> B[Load model_index.json]
B --> C{Component Type Detection}
C -->|Diffusers Model| D[ModelMixin.from_config]
C -->|Transformers Model| E[PreTrainedModel.from_pretrained]
C -->|Scheduler| F[SchedulerMixin.from_config]
C -->|Tokenizer| G[AutoTokenizer.from_pretrained]
D --> H[Load config.json]
E --> I[Load config.json]
H --> J[Create model on meta device]
I --> J
J --> K[Load weights with accelerate]
K --> L[Offload if needed]
L --> M[Pipeline Ready]
Loading with Quantization
The pipeline loading system integrates with quantization configurations:
if (
    quantization_config is not None
    and isinstance(quantization_config, PipelineQuantizationConfig)
    and issubclass(class_obj, torch.nn.Module)
):
    model_quant_config = quantization_config._resolve_quant_config(
        is_diffusers=is_diffusers_model, module_name=name
    )
    if model_quant_config is not None:
        loading_kwargs["quantization_config"] = model_quant_config
Source: src/diffusers/pipelines/pipeline_loading_utils.py
Modular Pipeline Loading
Modular Pipelines (introduced in v0.37.0) provide a composable approach to pipeline construction using reusable blocks.
Component Specification
Modular Pipelines use ComponentSpec to define loading parameters:
from dataclasses import dataclass
from typing import Optional

@dataclass
class ComponentSpec:
    name: str
    type_hint: tuple[str, str]  # (library, class_name)
    pretrained_model_name_or_path: Optional[str]
    subfolder: Optional[str]
    variant: Optional[str]
    revision: Optional[str]
Source: src/diffusers/modular_pipelines/modular_pipeline.py
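As an illustration of how such a specification can drive loading, the sketch below defines a stand-in dataclass with the same field names and turns it into keyword arguments for a loader. `ToyComponentSpec`, its default values, and the `to_load_kwargs` helper are hypothetical and are not part of the Diffusers API; the repository id is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyComponentSpec:
    """Stand-in for a component specification (not the real class)."""

    name: str
    type_hint: Optional[Tuple[str, str]] = None  # (library, class_name)
    pretrained_model_name_or_path: Optional[str] = None
    subfolder: Optional[str] = None
    variant: Optional[str] = None
    revision: Optional[str] = None

    def to_load_kwargs(self) -> dict:
        """Collect only the optional loading fields that were actually set."""
        keys = ("subfolder", "variant", "revision")
        return {k: getattr(self, k) for k in keys if getattr(self, k) is not None}


unet_spec = ToyComponentSpec(
    name="unet",
    type_hint=("diffusers", "UNet2DConditionModel"),
    pretrained_model_name_or_path="some-org/some-model",  # placeholder repo id
    subfolder="unet",
    variant="fp16",
)
load_kwargs = unet_spec.to_load_kwargs()
```

Dropping unset fields keeps the loader's own defaults in effect instead of overriding them with None.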
Loading with AutoModel Type Hints
Note: Diffusers v0.37.1 fixed loading issues with ModularPipelines that use AutoModel type hints in their modular_model_index.json.
Source: Release v0.37.1 - Fix for loading ModularPipelines with AutoModel type hints #13271
The loading process attempts AutoModel.from_pretrained when type_hint is None:
if self.type_hint is None:
    try:
        component = AutoModel.from_pretrained(
            pretrained_model_name_or_path, **load_kwargs, **kwargs
        )
    except Exception as e:
        raise ValueError(f"Unable to load {self.name} without `type_hint`: {e}")
    self.type_hint = component.__class__
Source: src/diffusers/modular_pipelines/modular_pipeline_utils.py
Common Usage Patterns
Loading a Standard Pipeline
import torch
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True,
)
Loading with LoRA
from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"
)
pipeline.load_lora_weights("path/to/lora_weights")

# Generate with LoRA (placeholder prompt)
prompt = "a photo of an astronaut riding a horse"
image = pipeline(prompt).images[0]
Loading Multiple Adapters
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
)

# Load multiple LoRA adapters
pipeline.load_lora_weights("adapter_1", adapter_name="style_1")
pipeline.load_lora_weights("adapter_2", adapter_name="style_2")

# Use with different weights
pipeline.set_adapters(["style_1", "style_2"], adapter_weights=[0.7, 0.3])
Loading Textual Inversion
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
)
pipeline.load_textual_inversion(
    "path/to/textual_inversion",
    token="my-concept"
)
image = pipeline("a photo of my-concept").images[0]
Configuration Options
Loading Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| cache_dir | str | ~/.cache/huggingface/ | Cache directory for downloaded models |
| torch_dtype | torch.dtype | None | Override default dtype |
| use_safetensors | bool | True | Prefer .safetensors format |
| variant | str | None | Model variant (e.g., "fp16") |
| revision | str | None | Git revision to load |
| use_flash_attention_2 | bool | False | Enable Flash Attention 2 |
| device_map | `str \| dict` | None | Device mapping strategy |
| max_memory | dict | None | Memory limits per device |
| offload_folder | str | None | Folder for offloaded weights |
| local_files_only | bool | False | Only use local files |
LoRA-Specific Parameters
| Parameter | Type | Description |
|---|---|---|
| adapter_name | str | Name for the loaded adapter |
| scale_weight | float | Scaling factor for LoRA weights |
| layer_selection | List[int] | Apply only to specific layers |
Common Issues and Troubleshooting
Single File Loading Failures
Issue: Custom models or GGUF files fail to load
Community discussion: Issue #13683 - Universal method or class to load any model locally
Many custom models fail to load due to limited .from_single_file availability across model classes.
Solutions:
- Verify the model class implements FromOriginalModelMixin
- Provide an original config file when available
- Consider converting to standard Diffusers format
Type Hint Requirements
When using Modular Pipelines:
- Ensure modular_model_index.json includes proper type_hint fields
- For unknown types, provide type_hint explicitly or ensure AutoModel can resolve the class
Version Compatibility
| Feature | Minimum Diffusers Version |
|---|---|
| Modular Pipelines | 0.37.0 |
| Flux Klein LoRA fixes | 0.37.1 |
| PEFT integration | 0.33.0+ |
| IP Adapter | 0.31.0+ |
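A script can guard against an outdated installation by comparing version strings against the table above. The sketch below does this with a deliberately simple parser; the feature keys, `parse_version`, and `supports` are names invented for this example, and the handling of pre-release suffixes is simplified (the third-party `packaging` library is more robust when it is available).

```python
# Minimum versions copied from the table above; the keys are made-up labels.
MINIMUM_VERSIONS = {
    "modular_pipelines": "0.37.0",
    "flux_klein_lora_fixes": "0.37.1",
    "peft_integration": "0.33.0",
    "ip_adapter": "0.31.0",
}


def parse_version(text: str) -> tuple:
    """Parse 'major.minor.patch' into a tuple of ints, ignoring any suffix."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports(feature: str, installed_version: str) -> bool:
    """Return True if the installed version meets the feature's minimum."""
    return parse_version(installed_version) >= parse_version(MINIMUM_VERSIONS[feature])


checks = {
    "modular on 0.37.0": supports("modular_pipelines", "0.37.0"),
    "klein fix on 0.37.0": supports("flux_klein_lora_fixes", "0.37.0"),
    "ip adapter on 0.38.0.dev0": supports("ip_adapter", "0.38.0.dev0"),
}
```

In practice the installed version string would come from `importlib.metadata.version("diffusers")` rather than being hard-coded.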
Architecture Principles
According to the Diffusers philosophy (PHILOSOPHY.md):
- Extensibility: Loaders should be designed to be easily extendable to future changes
- Composability: Adapter systems should support mixing multiple techniques
- Backward Compatibility: Loading mechanisms maintain compatibility across versions
- Clear Error Messages: Loading failures provide actionable error information
See Also
- Loading Documentation - Official guide on loading models
- LoRA Training - Training with LoRA adapters
- Textual Inversion - Custom concept training
- Modular Pipelines - Composable pipeline blocks
- Optimization Guide - Memory and performance optimization
Source: https://github.com/huggingface/diffusers / Human Manual
Quantization Guide
Related topics: Optimization Guide, Loaders & Adapters
This page provides comprehensive documentation on quantization support in the Diffusers library. Quantization reduces model memory footprint and computational requirements by representing model weights in lower precision formats, enabling deployment of large diffusion models on resource-constrained hardware.
Overview
The Diffusers library implements a modular quantization framework that supports multiple quantization backends. This architecture allows users to load quantized models from the Hugging Face Hub or quantize models on-the-fly during loading. The quantization system is designed to be backend-agnostic while providing backend-specific optimizations.
Quantization in Diffusers serves two primary purposes:
- Memory Reduction: Reduce VRAM requirements for loading and running diffusion models
- Runtime Optimization: Accelerate inference through optimized low-precision computations
The library currently supports four major quantization backends: GGUF, BitsAndBytes, Quanto, and TorchAO. Each backend offers different trade-offs between compression ratio, inference speed, and quality preservation.
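The memory side of the trade-off is easy to estimate: weight storage is roughly the parameter count times the bits per parameter. The sketch below applies that to a hypothetical 12-billion-parameter model. The parameter count is an assumed figure, not a measurement of any specific checkpoint, and the estimate ignores quantization metadata (scales, zero points), activations, and any components left in higher precision.

```python
def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Approximate weight storage in GiB for a given precision."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1024**3


NUM_PARAMS = 12_000_000_000  # hypothetical model size

estimates = {
    "fp16": weight_memory_gib(NUM_PARAMS, 16),
    "int8": weight_memory_gib(NUM_PARAMS, 8),
    "4-bit": weight_memory_gib(NUM_PARAMS, 4),
}

for name, gib in estimates.items():
    print(f"{name}: {gib:.2f} GiB")
```

Halving the bit width halves the weight footprint, which is why 4-bit loading is often what makes a large model fit on a consumer GPU at all.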
Architecture
Quantization System Components
The quantization framework follows a modular architecture with a base class hierarchy and backend-specific implementations:
graph TD
A[DiffusionPipeline] --> B[PipelineQuantizationConfig]
B --> C[DiffusersQuantizer Base Class]
C --> D[GGUFQuantizer]
C --> E[BitsAndBytesQuantizer]
C --> F[QuantoQuantizer]
C --> G[TorchAOQuantizer]
H[Model Loading] --> I[ModelMixin]
I --> C
J[Single File Loading] --> K[FromOriginalModelMixin]
K --> C
Quantization Flow
sequenceDiagram
participant User
participant Pipeline
participant QuantConfig
participant Quantizer
participant Model
User->>Pipeline: from_pretrained(quantization_config)
Pipeline->>QuantConfig: Validate quantization config
QuantConfig->>Quantizer: Create backend-specific quantizer
Pipeline->>Model: Load with quantizer
Model->>Quantizer: Apply quantization to weights
Quantizer-->>Model: Quantized model ready
Model-->>Pipeline: Pipeline ready for inference
Supported Quantization Backends
GGUF Quantization
GGUF (GPT-Generated Unified Format) is designed for loading pre-quantized models, particularly those from the llama.cpp ecosystem. The GGUF quantizer handles models that have been quantized externally and stored in the GGUF format.
Key Characteristics:
- Supports various quantization types (Q4_K, Q5_K, Q8_0, etc.)
- Memory-mapped file loading for efficient memory usage
- Compatible with models converted from original formats
Source: src/diffusers/quantizers/gguf/gguf_quantizer.py
The GGUF quantizer class initializes with the following parameters:
| Parameter | Type | Description |
|---|---|---|
| quantization_config | GGUFQuantizationConfig | Configuration for GGUF quantization |
| modules_to_not_convert | List[str] | Module names to exclude from quantization |
| compute_dtype | torch.dtype | Computation data type |
| pre_quantized | bool | Whether the model is pre-quantized |
Important Dependencies:
GGUF loading requires accelerate>=0.26.0 and the gguf package. These are validated during environment checks in validate_environment().
def validate_environment(self, *args, **kwargs):
    if not is_accelerate_available() or is_accelerate_version("<", "0.26.0"):
        raise ImportError(
            "Loading GGUF Parameters requires `accelerate` installed in your environment: "
            "`pip install 'accelerate>=0.26.0'`"
        )
Source: src/diffusers/quantizers/gguf/gguf_quantizer.py:30-37
BitsAndBytes Quantization
BitsAndBytes (bnb) provides on-the-fly quantization during model loading. It supports 4-bit and 8-bit quantization modes with optional NF4 (Normal Float 4) data type.
Key Characteristics:
- On-the-fly quantization during loading
- 4-bit (NF4) and 8-bit (Int8) modes
- Supports keep_in_fp32_modules for sensitive layers
- Compatible with QLoRA fine-tuning workflows
Source: src/diffusers/quantizers/bitsandbytes/bnb_quantizer.py
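To show what low-bit weight quantization does numerically, the sketch below implements a simplified per-tensor absmax scheme for 8-bit integers in pure Python. This is a teaching example only: it is not the algorithm BitsAndBytes uses (which relies on block-wise schemes and, for 4-bit, the NF4 data type), and the function names are invented here.

```python
def quantize_absmax_int8(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    absmax = max(abs(v) for v in values)
    scale = absmax / 127 if absmax > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_absmax_int8(weights)
restored = dequantize(codes, scale)

# Worst-case rounding error is half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored as a small integer plus one shared scale, so the reconstruction error is bounded by half the step size; finer-grained (block-wise) scales shrink that step for tensors with outliers.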
Quanto Quantization
Quanto provides a PyTorch-native quantization backend with support for various quantization schemes including int8 and int4.
Key Characteristics:
- Pure PyTorch implementation
- Supports int2, int4, int8 quantization
- Good compatibility with existing PyTorch workflows
- No additional C++ dependencies required
Source: src/diffusers/quantizers/quanto/quanto_quantizer.py
TorchAO Quantization
TorchAO is the PyTorch native quantization backend that provides hardware-optimized quantization kernels.
Key Characteristics:
- PyTorch native backend
- Optimized kernel support
- Integration with torch.compile for additional speedups
- Supports both dynamic and static quantization
Source: src/diffusers/quantizers/torchao/torchao_quantizer.py
Configuration
PipelineQuantizationConfig
The PipelineQuantizationConfig class provides a unified interface for configuring quantization across different backends. It handles backend-specific configuration resolution and validation.
Source: src/diffusers/quantizers/pipe_quant_config.py
Quantization Configuration Parameters
| Parameter | Type | Backend | Description |
|---|---|---|---|
| quantization_method | str | all | Quantization backend: gguf, bitsandbytes, quanto, torchao |
| load_in_4bit | bool | bnb | Load model weights in 4-bit precision |
| load_in_8bit | bool | bnb | Load model weights in 8-bit precision |
| bnb_4bit_compute_dtype | torch.dtype | bnb | Computation dtype for BitsAndBytes |
| bnb_4bit_quant_type | str | bnb | Quantization type (fp4, nf4) |
| bnb_4bit_use_double_quant | bool | bnb | Enable double quantization |
| gguf_format | str | gguf | GGUF file format version |
| compute_dtype | torch.dtype | gguf | Target compute data type |
| modules_to_not_convert | List[str] | gguf | Modules to exclude from quantization |
| torch_dtype | torch.dtype | all | Default torch data type |
Loading Quantized Models
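Loading GGUF Models
import torch
from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

# GGUF checkpoints are single files, so they are loaded with from_single_file
# on a model class. The path below is a placeholder; the quantization type
# (Q4_K, Q5_K, Q8_0, etc.) is determined by the file itself.
transformer = FluxTransformer2DModel.from_single_file(
    "path/to/model-Q4_K.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.float16),
    torch_dtype=torch.float16,
)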
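Loading with BitsAndBytes
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Pipeline-level loading expects a PipelineQuantizationConfig;
# quant_kwargs are forwarded to the BitsAndBytes configuration.
quantization_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_compute_dtype": torch.float16,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,
    },
)
pipeline = DiffusionPipeline.from_pretrained(
    "model/path",
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
)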
Source: src/diffusers/pipelines/pipeline_loading_utils.py
Pipeline Integration
Model Loading with Quantization
When a pipeline is loaded with a quantization configuration, the helpers in pipeline_loading_utils.py resolve that configuration for each component. The loading flow follows these steps:
graph LR
A[from_pretrained] --> B{Is Quantized?}
B -->|Yes| C[Get Quantizer]
B -->|No| D[Load Normal]
C --> E{Quantizer Type?}
E -->|GGUF| F[Use from_single_file]
E -->|Other| G[Use from_config]
F --> H[Apply Quantization]
G --> H
H --> I[Return Quantized Model]
D --> I
Source: src/diffusers/loaders/single_file.py
The loading process determines the appropriate loading method based on the model type:
is_diffusers_single_file_model = issubclass(class_obj, diffusers_module.FromOriginalModelMixin)
is_diffusers_model = issubclass(class_obj, diffusers_module.ModelMixin)
if is_diffusers_single_file_model:
    load_method = getattr(class_obj, "from_single_file")
    # ...
    loaded_sub_model = load_method(
        pretrained_model_link_or_path_or_dict=checkpoint,
        original_config=original_config,
        config=cached_model_config_path,
        subfolder=name,
        torch_dtype=torch_dtype,
        local_files_only=local_files_only,
        disable_mmap=disable_mmap,
        **kwargs,
    )
Source: src/diffusers/loaders/single_file.py:40-55
Single File Loading
For GGUF and other single-file model formats, the from_single_file method handles the complete loading process. This is particularly important for quantized models that bundle all weights in a single file.
Source: src/diffusers/loaders/single_file.py
Quantization Resolution in Pipelines
The pipeline quantization configuration is resolved at load time:
if (
    quantization_config is not None
    and isinstance(quantization_config, PipelineQuantizationConfig)
    and issubclass(class_obj, torch.nn.Module)
):
    model_quant_config = quantization_config._resolve_quant_config(
        is_diffusers=is_diffusers_model, module_name=name
    )
    if model_quant_config is not None:
        loading_kwargs["quantization_config"] = model_quant_config
Source: src/diffusers/pipelines/pipeline_loading_utils.py:120-129
Common Usage Patterns
Memory-Constrained Inference
For running large models on GPUs with limited VRAM:
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    quantization_config=PipelineQuantizationConfig(
        quant_backend="bitsandbytes_4bit",
        quant_kwargs={"load_in_4bit": True},
    ),
    torch_dtype=torch.float16,
)
pipeline.to("cuda")

# Generate image
result = pipeline(prompt="a beautiful landscape")
Loading Pre-Quantized GGUF Models
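import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Load a GGUF quantized transformer from a single file (placeholder path)
transformer = FluxTransformer2DModel.from_single_file(
    "quantized/model/path/model-Q4_K_M.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.float16),
    torch_dtype=torch.float16,
)

# Pass the quantized component to a pipeline that supplies the remaining parts
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.float16,
)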
Mixed Quantization
Apply different quantization levels to different components:
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Quantize the UNet with 4-bit; components that are not listed
# (such as the VAE) stay in full precision
pipeline = DiffusionPipeline.from_pretrained(
    "model/path",
    quantization_config=PipelineQuantizationConfig(
        quant_backend="bitsandbytes_4bit",
        quant_kwargs={"load_in_4bit": True},
        components_to_quantize=["unet"],
    ),
)
Troubleshooting
Common Issues and Solutions
| Issue | Cause | Solution |
|---|---|---|
| ImportError for accelerate | Missing dependency for GGUF | pip install 'accelerate>=0.26.0' |
| Memory errors during loading | Model too large for GPU | Use 4-bit quantization or CPU offloading |
| Slow inference with quantized model | Quantization not optimized | Enable torch.compile or use faster backends |
| Config mismatch errors | Incompatible quantization config | Verify backend-specific requirements |
| MMAP errors | Memory-mapped file issues | Set disable_mmap=True in loading config |
Environment Requirements
Different quantization backends have specific dependencies:
| Backend | Minimum Dependencies |
|---|---|
| GGUF | accelerate>=0.26.0, gguf |
| BitsAndBytes | bitsandbytes>=0.41.0 |
| Quanto | optimum-quanto |
| TorchAO | torchao, PyTorch 2.0+ |
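Before choosing a backend it can help to check which dependencies are importable in the current environment. The sketch below does that with `importlib` from the standard library. The helper names are invented for this example, and the mapping from backend to import name is supplied by the caller because a package's install name and import name can differ; the demo call uses standard-library modules so that it runs anywhere.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def missing_modules(requirements: dict) -> dict:
    """Map each backend label to the list of its modules that are not importable."""
    return {
        backend: [name for name in modules if not is_importable(name)]
        for backend, modules in requirements.items()
    }


# Demo with stand-in module names; replace with the import names your
# chosen backend actually needs.
demo_status = missing_modules(
    {
        "always_available": ["json", "math"],
        "never_available": ["surely_not_installed_module_xyz"],
    }
)
```

An empty list for a backend means every module it needs was found; a non-empty list names exactly what to install.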
Version Compatibility
The quantization system was enhanced in recent releases:
- v0.37.0+: Improved modular pipelines and quantization integration
- v0.35.2+: Better transformers compatibility for quantized models
- v0.33.0+: Enhanced memory optimizations and caching for quantized models
Source: README.md
Design Philosophy
The quantization system in Diffusers follows the library's core design principles:
- Modularity: Each quantizer is a self-contained class inheriting from DiffusersQuantizer
- Composability: Quantization configs can be applied at pipeline or individual component level
- Backward Compatibility: Default settings preserve maximum precision
- Extensibility: New backends can be added by implementing the base quantizer interface
Source: PHILOSOPHY.md
Models are designed to expose complexity similar to PyTorch's Module class, providing clear error messages when quantization configuration issues occur. The system maintains high precision defaults while allowing optimization when explicitly requested.
See Also
- Loading Diffusion Models - General model loading documentation
- Optimization Guide - Memory and speed optimization techniques
- Modular Pipelines - Composable pipeline architecture
- GGUF Quantization - Detailed GGUF format documentation
- Quantization Overview - Complete quantization documentation
- Training with Quantization - Training quantized models
Source: https://github.com/huggingface/diffusers / Human Manual
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 24 structured pitfall item(s), including 4 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.
1. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_a9d989818ab840c6985e6c0c41830e87 | https://github.com/huggingface/diffusers/issues/13401
2. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_190402547a6a441bb4f046b278c04a7f | https://github.com/huggingface/diffusers/issues/13683
3. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_fedc9c5b4dc2486aa7ed13053f2050af | https://github.com/huggingface/diffusers/issues/13772
4. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_d70cffdb7188481fb8e1e7e5a84539bb | https://github.com/huggingface/diffusers/issues/13844
5. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_e2c183459b644dfe88a28ce288693dc1 | https://github.com/huggingface/diffusers/issues/13762
6. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Developers should check this configuration risk before relying on the project: Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more
- User impact: Upgrade or migration may change expected behavior: Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more. Context: Observed when using Python.
- Evidence: failure_mode_cluster:github_release | fmev_e8d17ffbe5fa1785fea2871516925453 | https://github.com/huggingface/diffusers/releases/tag/v0.35.0
7. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Developers should check this configuration risk before relying on the project: llada2 model/pipeline review
- User impact: Developers may misconfigure credentials, environment, or host setup: llada2 model/pipeline review
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: llada2 model/pipeline review. Context: Observed when using Python.
- Evidence: failure_mode_cluster:github_issue | fmev_b0fdcc0ebf367379b87fcad2dd642011 | https://github.com/huggingface/diffusers/issues/13598
8. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Developers should check this configuration risk before relying on the project: universal method or class to load any model locally
- User impact: Developers may misconfigure credentials, environment, or host setup: universal method or class to load any model locally
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: universal method or class to load any model locally. Context: Observed when using Python.
- Evidence: failure_mode_cluster:github_issue | fmev_8132f9310793351811bea343d379b680 | https://github.com/huggingface/diffusers/issues/13683
9. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | github_repo:498011141 | https://github.com/huggingface/diffusers
10. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Developers should check this migration risk before relying on the project: Diffusers 0.36.0: Pipelines galore, new caching method, training scripts, and more
- User impact: Upgrade or migration may change expected behavior: Diffusers 0.36.0: Pipelines galore, new caching method, training scripts, and more
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Diffusers 0.36.0: Pipelines galore, new caching method, training scripts, and more. Context: Observed when using Python, CUDA.
- Evidence: failure_mode_cluster:github_release | fmev_fa85fd2586df0265d3c51e0547f8f9a5 | https://github.com/huggingface/diffusers/releases/tag/v0.36.0
11. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:498011141 | https://github.com/huggingface/diffusers
12. Security or permission risk: Security or permission risk requires verification
- Severity: medium
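- Finding: The downstream validation record carries the flag no_demo, which most likely means no runnable demo was found for this project.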
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | github_repo:498011141 | https://github.com/huggingface/diffusers
Source: Doramagic discovery, validation, and Project Pack records
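The recommended check that recurs in the log above ("reproduce the official install and quickstart path in an isolated environment") can be sketched as a small Python script. This is a minimal illustration, not an official Diffusers or Doramagic tool: the helper names `venv_python`, `build_check_commands`, and `run_isolated_check`, and the environment directory name, are hypothetical. It only confirms that the package installs and imports; it does not exercise any pipeline.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def build_check_commands(env_dir: Path, package: str = "diffusers") -> list[list[str]]:
    """Build the commands that install the package and confirm it imports.

    `package` is passed to pip unchanged, so a pinned requirement such as
    "diffusers==0.35.0" can be used to test a specific release.
    """
    py = str(venv_python(env_dir))
    return [
        [py, "-m", "pip", "install", package],
        [py, "-c", "import diffusers; print(diffusers.__version__)"],
    ]


def run_isolated_check(env_dir: Path, package: str = "diffusers") -> None:
    """Create a fresh virtual environment and run the install check in it."""
    # clear=True wipes any previous environment so each run starts clean.
    venv.EnvBuilder(with_pip=True, clear=True).create(env_dir)
    for cmd in build_check_commands(env_dir, package):
        # check=True raises CalledProcessError if a step fails.
        subprocess.run(cmd, check=True)
```

Calling `run_isolated_check(Path(".venv-diffusers-check"))` needs network access and downloads the package. Passing a pinned requirement, for example `"diffusers==0.35.0"`, is one way to compare behavior across the releases flagged in the upgrade and migration items.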
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using diffusers with real data or production workflows.
- Bad image output for Flux.2-dev, rocm, quantization and separate prompt - github / github_issue
- [[Community Support] Integrating visual generative foundation models in d](https://github.com/huggingface/diffusers/issues/13844) - github / github_issue
- Help us profile important pipelines and improve if needed - github / github_issue
- [[Feature] Add support for Anima](https://github.com/huggingface/diffusers/issues/13067) - github / github_issue
- universal method or class to load any model locally - github / github_issue
- FluxKlein Training Scripts - CFG issue - github / github_issue
- llada2 model/pipeline review - github / github_issue
- Diffusers 0.38.0: New image and audio pipelines, Core library improvemen - github / github_release
- Fixes for AutoModel type hints in Modular Pipelines and Flux Klein LoRA - github / github_release
- Diffusers 0.37.0: Modular Diffusers, New image and video pipelines, mult - github / github_release
- Diffusers 0.36.0: Pipelines galore, new caching method, training scripts - github / github_release
- fixes for transformers models, imports, - github / github_release
Source: Project Pack community evidence and pitfall evidence