Doramagic Project Pack · Human Manual
peft
Introduction to PEFT
Related topics: Installation Guide, System Architecture, LoRA and LoRA Variants
Overview
PEFT (Parameter-Efficient Fine-Tuning) is a Python library developed by Hugging Face that provides efficient methods for fine-tuning pre-trained models while keeping most model parameters frozen. This approach significantly reduces computational cost and memory requirements compared to full fine-tuning, making it practical to work with large language models on limited hardware.
The library supports multiple fine-tuning techniques including LoRA, Prefix Tuning, Prompt Tuning, AdaLoRA, QLoRA, and many other parameter-efficient methods. PEFT is designed to integrate seamlessly with the Hugging Face Transformers ecosystem, allowing users to apply adapter-based fine-tuning with minimal code changes.
Sources: src/peft/tuners/lora/model.py:1-50
Core Architecture
Design Philosophy
PEFT follows an adapter-based architecture where lightweight trainable modules are added to pre-trained models. These adapters contain a small fraction of the total model parameters, typically ranging from 0.1% to 5% of the original model size, depending on the configuration.
The core principles of PEFT's architecture include:
- Modularity: Each fine-tuning method is implemented as a separate "tuner" with its own configuration class
- Composability: Multiple adapters can be loaded and used simultaneously
- Compatibility: Full integration with Hugging Face Transformers and Diffusers
- Memory Efficiency: Support for quantization and CPU offloading strategies
Sources: src/peft/tuners/tuners_utils.py:1-30
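As a small illustration of this modularity, every method ships its own configuration class that can be created independently of any model. The snippet below is a minimal sketch using two of the configuration classes described later on this page.

```python
from peft import LoraConfig, PromptTuningConfig, TaskType

# Each tuner has its own configuration class with method-specific fields
lora_config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16)
prompt_config = PromptTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20)
```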
Component Hierarchy
graph TD
A[PeftModel] --> B[BaseTuner]
B --> C[Model Specific Tuners]
C --> D[LoraModel]
C --> E[PrefixTuningModel]
C --> F[PromptTuningModel]
C --> G[AdaLoRAModel]
C --> H[QLoRAModel]
C --> I[XLoraModel]
C --> J[HiraModel]
C --> K[GraloraModel]
C --> L[AdamssModel]
Supported Fine-Tuning Methods
PEFT provides implementations for various parameter-efficient fine-tuning techniques. Each method has its own configuration class and model wrapper.
| Method | Configuration Class | Description |
|---|---|---|
| LoRA | LoraConfig | Low-Rank Adaptation using rank-decomposition matrices |
| Prefix Tuning | PrefixTuningConfig | Optimizes continuous prompts prepended to layer inputs |
| Prompt Tuning | PromptTuningConfig | Trains soft prompts embedded in the input layer |
| P-Tuning | PromptEncoderConfig | Uses trainable prompt embeddings with an optional LSTM/MLP encoder |
| AdaLoRA | AdaLoraConfig | Adaptive LoRA with dynamic rank allocation |
| QLoRA | QLoRAConfig | LoRA with quantized base models |
| IA³ | IA³Config | Infused Adapter by Inhibiting and Amplifying Activations |
| Multi Adapter | MultiAdapterConfig | Combines multiple adapters |
| LoHa | LoHaConfig | Low-Rank Hadamard Product adaptation |
| LoKr | LoKrConfig | Low-rank Kronecker product adaptation |
| AdaLoKr | AdaLoKrConfig | Adaptive LoKr with dynamic rank allocation |
| OFT | OFTConfig | Orthogonal Fine-Tuning |
| BOFT | BOFTConfig | Block-diagonal OFT |
| Vera | VeraConfig | Vector-based Random Matrix Adaptation |
| XLora | XLoraConfig | Cross-Layer LoRA with hierarchical structure |
| Hira | HiraConfig | Hierarchical Rank Adaptation |
| Gralora | GraloraConfig | Gradient-Routed LoRA |
| Adamss | AdamssConfig | Adaptive subspace efficient fine-tuning |
| SHiRA | ShiraConfig | Sparse High Rank Adapters |
| LND | LNDConfig | Layer-wise Normalization Distribution |
| Loralite | LoraliteConfig | Lightweight LoRA variant |
Sources: src/peft/tuners/lora/model.py:1-80
Task Types
PEFT supports various NLP task types through specialized model classes. Each task type is designed for specific downstream applications.
graph LR
A[Base Model] --> B[PeftModel]
B --> C{Task Type}
C --> D[CAUSAL_LM]
C --> E[SEQ_2_SEQ_LM]
C --> F[FEATURE_EXTRACTION]
C --> G[QUESTION_ANS]
C --> H[SEQ_CLS]
C --> I[TOKEN_CLS]
C --> J[IMAGE_CLS]
Task-Specific Models
| Task Type | Model Class | Use Case |
|---|---|---|
CAUSAL_LM | PeftModelForCausalLM | Autoregressive text generation |
SEQ_2_SEQ_LM | PeftModelForSeq2SeqLM | Encoder-decoder tasks (translation, summarization) |
FEATURE_EXTRACTION | PeftModelForFeatureExtraction | Embedding extraction |
QUESTION_ANS | PeftModelForQuestionAnswering | Question answering tasks |
SEQ_CLS | PeftModelForSequenceClassification | Text classification |
TOKEN_CLS | PeftModelForTokenClassification | Named entity recognition, POS tagging |
Sources: src/peft/peft_model.py:1-100
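As an example of how a task type maps to a wrapper class, the hedged sketch below adapts a classification model (the checkpoint name and label count are placeholders); get_peft_model selects the sequence-classification wrapper because of the SEQ_CLS task type.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

# Placeholder checkpoint; any sequence-classification model works the same way
base = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
config = LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16, target_modules=["query", "value"])

# Dispatches to the task-specific wrapper based on task_type
peft_model = get_peft_model(base, config)
peft_model.print_trainable_parameters()
```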
Core API
PeftModel Class
The PeftModel is the base class for all PEFT models. It wraps a pre-trained model and manages adapter injection, loading, and merging.
#### Key Methods
| Method | Description |
|---|---|
from_pretrained(model, model_id, adapter_name, ...) | Load PEFT model from pretrained weights |
get_peft_config(adapter_name) | Get configuration for a specific adapter |
print_trainable_parameters() | Display trainable vs total parameter counts |
merge_and_unload(progressbar, safe_merge, adapter_names) | Merge adapters into base model |
unload() | Return base model without PEFT modules |
set_adapter(adapter_name) | Activate a specific adapter |
add_weighted_adapter(adapter_names, weights, combination_type) | Combine multiple adapters |
Sources: src/peft/peft_model.py:100-200
Loading Pre-trained Adapters
The from_pretrained class method loads PEFT adapters from the Hugging Face Hub or local storage:
from transformers import AutoModelForCausalLM
from peft import PeftModel, PeftConfig
# Load configuration
config = PeftConfig.from_pretrained("user/peft-model")
# Load base model
base_model = AutoModelForCausalLM.from_pretrained("base-model")
# Create PEFT model with loaded adapter
peft_model = PeftModel.from_pretrained(
base_model,
"user/peft-model",
adapter_name="default",
is_trainable=False,
autocast_adapter_dtype=True
)
Sources: src/peft/peft_model.py:200-280
Merging and Unloading
PEFT models support merging adapters back into the base model for inference:
# Merge and unload to get a standalone model
merged_model = peft_model.merge_and_unload()
# Safe merge with weight averaging
merged_model = peft_model.merge_and_unload(safe_merge=True)
# Merge specific adapters only
merged_model = peft_model.merge_and_unload(adapter_names=["adapter1", "adapter2"])
# Unload without merging
base_model = peft_model.unload()
Sources: src/peft/tuners/tuners_utils.py:50-100
Adapter Management
Multi-Adapter Support
PEFT supports loading and managing multiple adapters simultaneously. This is useful for ensemble methods or when combining adapters trained on different tasks.
# Load multiple adapters
config = {
"adapter_1": "./path/to/adapter-1",
"adapter_2": "./path/to/adapter-2",
}
xlora_config = XLoraConfig(adapter_dict=config)
model = get_peft_model(base_model, xlora_config)
Sources: src/peft/tuners/xlora/model.py:1-50
Hotswap Adapter
The hotswap functionality allows replacing loaded adapters without reloading the entire model:
from peft.utils.hotswap import hotswap_adapter
# Replace the default adapter with a new one
hotswap_adapter(
model,
"path-to-new-adapter",
adapter_name="default",
torch_device="cuda:0"
)
This operation validates the new adapter configuration and swaps the weights while maintaining the model structure.
Sources: src/peft/utils/hotswap.py:1-80
Configuration Options
Common Parameters
Most PEFT configuration classes share common parameters that control the fine-tuning behavior:
| Parameter | Type | Default | Description |
|---|---|---|---|
r | int | 8 | LoRA rank dimension |
lora_alpha | int | 8 | LoRA scaling factor |
lora_dropout | float | 0.0 | Dropout probability for LoRA layers |
target_modules | List[str] | None | Names of modules to apply adaptation |
bias | str | "none" | Bias handling: "none", "all", "lora_only" |
modules_to_save | List[str] | None | Additional trainable modules |
fan_in_fan_out | bool | False | Transpose weights for certain architectures |
Method-Specific Parameters
#### LoRA Configuration
from peft import LoraConfig
config = LoraConfig(
r=16,
lora_alpha=32,
target_modules=["q_proj", "v_proj", "k_proj", "out_proj"],
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM"
)
#### Prefix Tuning Configuration
from peft import PrefixTuningConfig
config = PrefixTuningConfig(
num_virtual_tokens=20,
token_dim=768,
num_transformer_submodules=1,
num_attention_heads=12,
num_layers=12,
encoder_hidden_size=768,
prefix_projection=False
)
Sources: src/peft/tuners/lora/model.py:50-150
Advanced Features
Dynamic Rank Allocation
Some PEFT methods support adaptive rank allocation, where the importance of different layers is evaluated during training:
# Adaptive LoRA with dynamic rank allocation
from peft import AdaLoraConfig
config = AdaLoraConfig(
r=16,
lora_alpha=32,
target_r=8,
tinit=200,
tfinal=1000,
deltaT=10,
lora_dropout=0.1
)
Sources: src/peft/tuners/adamss/model.py:1-60
Hierarchical Adaptation
Methods like Hira and Gralora implement hierarchical rank adaptation for better parameter efficiency:
from peft import HiraConfig
config = HiraConfig(
r=32,
target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
hira_dropout=0.01,
task_type="SEQ_2_SEQ_LM"
)
Sources: src/peft/tuners/hira/model.py:1-60
Quantization Support
PEFT integrates with BitsAndBytes for 8-bit and 4-bit quantization:
from peft import prepare_model_for_kbit_training, get_peft_model, LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
"model-name",
quantization_config=quantization_config
)
model = prepare_model_for_kbit_training(model)
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, config)
Helper Functions
Signature Updates
The helpers module provides utility functions for updating model signatures:
from peft import update_forward_signature, update_generate_signature, update_signature
# Update forward signature only
update_forward_signature(peft_model)
# Update generate signature only
update_generate_signature(peft_model)
# Update both
update_signature(peft_model, method="all")
Model Validation
from peft.helpers import check_if_peft_model
# Check if a model ID corresponds to a PEFT model
is_peft = check_if_peft_model("user/peft-model")
# Works with both Hub and local paths
is_peft_local = check_if_peft_model("./local/peft-model")
Adapter Scale Rescaling
from peft.helpers import rescale_adapter_scale
with rescale_adapter_scale(model, multiplier=0.5):
output = model(inputs)
Memory Optimization
Low CPU Memory Usage
Loading adapters can be optimized for memory-constrained environments:
# Create adapter weights on meta device for faster loading
peft_model = PeftModel.from_pretrained(
base_model,
adapter_path,
low_cpu_mem_usage=True
)
Training with Quantized Models
PEFT supports full training workflows with quantized base models:
from peft import get_peft_model, LoraConfig, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
model = AutoModelForCausalLM.from_pretrained(
"mistralai/Mistral-7B-Instruct-v0.1",
quantization_config=BitsAndBytesConfig(load_in_4bit=True)
)
model = prepare_model_for_kbit_training(model)
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, config)
Integration Patterns
With Diffusers
PEFT works with Stable Diffusion and other diffusion models:
from diffusers import StableDiffusionPipeline
from peft import MissModel, MissConfig
config_unet = MissConfig(
r=8,
target_modules=["proj_in", "proj_out", "to_k", "to_q", "to_v"],
init_weights=True
)
pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.unet = MissModel(pipeline.unet, config_unet, "default")
Sources: src/peft/tuners/miss/model.py:1-60
Cross-Modal Applications
Some PEFT methods like XLora are designed for multi-modal models with complex architecture support:
from transformers import AutoModelForCausalLM
from peft import XLoraConfig, get_peft_model
config = XLoraConfig(
adapter_dict={
"adapter_1": "./path/to/adapter-1",
"adapter_2": "./path/to/adapter-2"
}
)
model = AutoModelForCausalLM.from_pretrained("model-name", trust_remote_code=True)
xlora_model = get_peft_model(model, config)
Workflow Diagram
graph TD
A[Pre-trained Model] --> B[Choose Fine-tuning Method]
B --> C[Create PEFT Config]
C --> D[Initialize Adapter]
D --> E[Train Adapter]
E --> F{Save or Load?}
F -->|Save| G[save_pretrained]
F -->|Load| H[from_pretrained]
G --> I[Hub or Local]
H --> J[Merge or Inference]
J --> K[merge_and_unload]
J --> L[Direct Inference]
K --> M[Final Model]
L --> M
Best Practices
- Start with Default Ranks: Begin with r=8 for LoRA and increase based on performance
- Target Specific Modules: Prefer targeting attention projection layers (q_proj, v_proj) over all linear layers
- Use Quantization for Large Models: Apply 4-bit quantization (QLoRA) for models larger than 7B parameters
- Save Checkpoints Regularly: Use PEFT's built-in checkpoint saving to avoid losing training progress
- Evaluate Before Merging: Always evaluate adapter quality before merging into the base model
Conclusion
PEFT provides a comprehensive framework for parameter-efficient fine-tuning that enables training large models on limited hardware. Its modular architecture supports various adaptation methods while maintaining compatibility with the broader Hugging Face ecosystem. Whether working with language models, vision models, or multi-modal architectures, PEFT offers consistent APIs and significant memory savings compared to full fine-tuning approaches.
Sources: src/peft/tuners/lora/model.py:1-100, src/peft/tuners/tuners_utils.py:1-50
Installation Guide
Related topics: Introduction to PEFT, Quantization Integration
This guide covers all methods for installing the PEFT (Parameter-Efficient Fine-Tuning) library, including dependencies management, optional feature installations, and verification procedures.
Overview
The PEFT library provides state-of-the-art parameter-efficient fine-tuning methods including LoRA, AdaLoRA, Prefix Tuning, Prompt Tuning, and many other advanced techniques. Proper installation ensures access to all functionality including GPU acceleration, quantization support, and integration with Hugging Face Transformers and Diffusers.
Key Installation Features:
- Core library installation via pip, conda, or from source
- Optional dependencies for specific tuners and features
- GPU/CUDA support for accelerated training
- BitsAndBytes integration for quantization
- Diffusers integration for image generation models
System Requirements
Hardware Requirements
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB+ |
| GPU VRAM | 4 GB | 8-24 GB (depending on model size) |
| Storage | 5 GB | 10 GB+ |
| CUDA | 11.6 | 11.8+ or CUDA 12.x |
Software Requirements
| Requirement | Version |
|---|---|
| Python | ≥ 3.8 |
| PyTorch | ≥ 1.11.0 |
| Transformers | ≥ 4.20.0 |
| Diffusers | ≥ 0.13.0 |
| Accelerate | ≥ 0.20.0 |
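A quick way to confirm that the installed versions meet these minimums is to print them; the following sketch only reads version strings.

```python
import torch
import transformers
import accelerate
import peft

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)
print("peft:", peft.__version__)
```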
Installation Methods
Standard Installation via pip
The simplest method to install PEFT is using pip:
pip install peft
This installs the core library with all base dependencies.
Installing Specific Versions
To install a specific version of PEFT:
pip install peft==0.13.0
To install the latest development version from GitHub:
pip install git+https://github.com/huggingface/peft.git
Installation from Source
For developers contributing to PEFT or needing the latest features:
git clone https://github.com/huggingface/peft.git
cd peft
pip install -e .
The editable installation (-e .) allows modifications to the source code while keeping the package importable.
Dependencies Structure
Core Dependencies
The core dependencies are defined in pyproject.toml and requirements.txt:
# Core runtime dependencies
torch>=1.11.0
transformers>=4.20.0
accelerate>=0.20.0
Sources: pyproject.toml
Optional Dependencies by Feature
PEFT provides optional dependencies for specific use cases:
| Feature | Installation Command | Purpose |
|---|---|---|
| Quantization | pip install peft[quantization] | BitsAndBytes 4-bit/8-bit quantization |
| GPU Training | pip install peft[gpu] | CUDA-optimized operations |
| Diffusers | pip install peft[diffusers] | Stable Diffusion model support |
| Dev Tools | pip install peft[dev] | Testing and linting |
| All Extras | pip install peft[all] | Complete installation |
Advanced Installation with Quantization
For models requiring quantized weights (e.g., using 4-bit or 8-bit precision):
pip install peft bitsandbytes scipy accelerate
This combination enables:
- 4-bit quantization via BitsAndBytes
- 8-bit quantization for extreme memory reduction
- Mixed-precision training optimization
- Efficient loading of large models on limited hardware
Sources: src/peft/tuners/lora/model.py
Environment Setup
Using Virtual Environments
Using venv:
python -m venv peft-env
source peft-env/bin/activate # Linux/macOS
peft-env\Scripts\activate # Windows
pip install peft
Using conda:
conda create -n peft-env python=3.10
conda activate peft-env
pip install peft
CUDA Configuration
For GPU acceleration, ensure CUDA is properly configured:
import torch
print(torch.cuda.is_available()) # Should return True
print(torch.cuda.device_count()) # Number of available GPUs
The PEFT library automatically detects and utilizes available CUDA devices during training.
Verification and Testing
Basic Installation Verification
Verify your installation by importing PEFT and checking the version:
import peft
print(peft.__version__) # Should print the installed version
Quick Functionality Test
Test basic LoRA functionality:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import get_peft_model, LoraConfig
# Load a small model for testing
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
# Configure LoRA
lora_config = LoraConfig(
task_type="CAUSAL_LM",
r=8,
lora_alpha=16,
target_modules=["c_attn", "c_proj"],
lora_dropout=0.05
)
# Apply PEFT
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()
Signature Update Utilities
After installation, you may want to update method signatures for better IDE support:
from peft import update_forward_signature, update_generate_signature
# Update forward signature
update_forward_signature(peft_model)
# Update generate signature (for generative models)
update_generate_signature(peft_model)
Sources: src/peft/helpers.py:1-100
Tuner-Specific Installation Notes
LoRA and QLoRA
Standard LoRA requires no additional dependencies beyond core installation. QLoRA requires:
pip install peft bitsandbytes>=0.40.0 trl>=0.4.0
Sources: src/peft/tuners/lora/model.py
Prefix Tuning and Prompt Tuning
These methods require only core dependencies:
pip install peft
Diffusion Model Support (LoRA for Images)
For Stable Diffusion and similar models:
pip install peft diffusers
Example configuration for Stable Diffusion:
from diffusers import StableDiffusionPipeline
from peft import MissModel, MissConfig
config_unet = MissConfig(
r=8,
target_modules=["proj_in", "proj_out", "to_k", "to_q", "to_v", "to_out.0"],
init_weights=True
)
pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.unet = MissModel(pipeline.unet, config_unet, "default")
Sources: src/peft/tuners/miss/model.py
X-LoRA Installation
X-LoRA requires specific dependencies for multi-adapter support:
pip install peft transformers accelerate bitsandbytes
Sources: src/peft/tuners/xlora/model.py
Troubleshooting
Common Installation Issues
| Issue | Solution |
|---|---|
ImportError: No module named peft | Reinstall: pip uninstall peft && pip install peft |
| CUDA out of memory | Use quantization or smaller batch sizes |
| BitsAndBytes import failure | Install: pip install bitsandbytes |
| Old PyTorch version | Update: pip install torch>=1.11.0 |
Version Compatibility
Check compatibility matrix:
| PEFT Version | Min Python | Min PyTorch | Min Transformers |
|---|---|---|---|
| 0.13.x | 3.8+ | 1.11.0 | 4.20.0 |
| 0.12.x | 3.8+ | 1.11.0 | 4.20.0 |
| 0.11.x | 3.7+ | 1.11.0 | 4.20.0 |
Verifying Adapter Loading
Test adapter functionality after installation:
from peft.helpers import check_if_peft_model
is_peft = check_if_peft_model("path/to/model")
print(f"Is PEFT model: {is_peft}")
Sources: src/peft/helpers.py:51-65
Adapter Hotswap Installation
For runtime adapter switching functionality:
pip install peft
The hotswap capability is built into PEFT's core functionality:
from peft.utils.hotswap import hotswap_adapter
# Load and swap adapters at runtime
hotswap_adapter(model, "path-to-new-adapter", adapter_name="default")
Sources: src/peft/utils/hotswap.py
Next Steps
After successful installation:
- Quick Start: Follow the Quickstart Guide for first-time users
- Tuner Selection: Review available tuners to choose the right method
- Configuration: Learn about PeftConfig options
- Examples: Explore example notebooks for your use case
Summary
The PEFT library offers flexible installation options to accommodate various use cases from basic fine-tuning to advanced quantized training. Core installation via pip provides immediate access to all major functionality, while optional dependencies enable specialized features like 4-bit quantization and diffusion model support.
Sources: [pyproject.toml](https://github.com/huggingface/peft/blob/main/pyproject.toml)
System Architecture
Related topics: Core Components, Introduction to PEFT, Configuration System
Overview
The PEFT (Parameter-Efficient Fine-Tuning) library implements a modular architecture designed to enable efficient model adaptation without modifying the entire parameter set of pre-trained models. The system architecture is built around three core pillars: the PeftModel base class hierarchy, tuner abstractions, and configuration management.
PEFT supports multiple fine-tuning techniques including LoRA, IA³, Adapters, Prefix Tuning, Prompt Learning, and various specialized methods like SHiRA, GraLoRA, X-LoRA, and others. Each technique is implemented as a separate "tuner" that follows a common interface defined in the base tuner utilities.
High-Level Architecture Diagram
graph TD
User[User Code] --> PeftAPI[PeftModel API]
PeftAPI --> PeftModel[PeftModel Base Class]
PeftModel --> BaseTuner[BaseTuner]
BaseTuner --> TunerRegistry[Tuner Registry]
subgraph Tuners
LoRA[LoRA Tuner]
IA3[IA³ Tuner]
PrefixTuning[Prefix Tuning]
PromptLearning[Prompt Learning]
SHiRA[SHiRA Tuner]
GraLoRA[GraLoRA Tuner]
XLoRA[X-LoRA Tuner]
Hira[Hira Tuner]
DeLoRA[DeLoRA Tuner]
Miss[MiSS Tuner]
Adamss[Adamss Tuner]
end
BaseTuner --> LoRA
BaseTuner --> IA3
BaseTuner --> PrefixTuning
BaseTuner --> PromptLearning
BaseTuner --> SHiRA
BaseTuner --> GraLoRA
BaseTuner --> XLoRA
BaseTuner --> Hira
BaseTuner --> DeLoRA
BaseTuner --> Miss
BaseTuner --> Adamss
PeftModel --> Config[PeftConfig]
Config --> ConfigMapping[PEFT_TYPE_TO_CONFIG_MAPPING]
TunerRegistry --> TargetMapping[TRANSFORMERS_MODELS_TO_*_TARGET_MODULES_MAPPING]
Core Components
1. PeftModel Base Class
The PeftModel class serves as the central entry point for all PEFT operations. It wraps a base model and manages adapter lifecycle, injection, and merging.
Location: src/peft/peft_model.py
#### Class Hierarchy
graph TD
PyTorchModule[torch.nn.Module] --> PeftModel
PeftModel --> PeftModelForCausalLM[PeftModelForCausalLM]
PeftModel --> PeftModelForSeq2SeqLM[PeftModelForSeq2SeqLM]
PeftModel --> PeftModelForSequenceClassification[PeftModelForSequenceClassification]
PeftModel --> PeftModelForQuestionAnswering[PeftModelForQuestionAnswering]
PeftModel --> PeftModelForTokenClassification[PeftModelForTokenClassification]
PeftModel --> PeftModelForFeatureExtraction[PeftModelForFeatureExtraction]
#### Key Responsibilities
| Responsibility | Description |
|---|---|
| Adapter Management | Loading, activating, and switching between multiple adapters |
| Module Injection | Replacing target modules with tuner layers |
| Forward Pass | Intercepting and modifying forward pass with adapter weights |
| Weight Merging | Combining adapter weights with base model weights |
| Model Saving/Loading | Serialization and deserialization of PEFT configurations |
#### Constructor Signature
def __init__(self, model: torch.nn.Module, peft_config: PeftConfig, adapter_name: str = "default", **kwargs)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
model | torch.nn.Module | Required | The base model to be adapted |
peft_config | PeftConfig | Required | Configuration for the PEFT method |
adapter_name | str | "default" | Name identifier for the adapter |
**kwargs | Any | - | Additional arguments passed to specific tuners |
Sources: src/peft/peft_model.py:1-100
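Although get_peft_model and from_pretrained are the usual entry points, the constructor can also be called directly. The sketch below uses a small placeholder model and is meant only to show the constructor arguments in context.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel, LoraConfig

base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model
peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, target_modules=["c_attn"])

# Wrap the base model directly; adapter layers are injected under the name "default"
peft_model = PeftModel(base_model, peft_config, adapter_name="default")
```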
2. BaseTuner Class
The BaseTuner class defines the abstract interface that all tuner implementations must follow. It handles the core logic for module injection and adapter management.
Location: src/peft/tuners/tuners_utils.py
#### Core Attributes
prefix: str = "" # Prefix for PEFT module names
tuner_layer_cls = None # The tuner layer class
target_module_mapping = {} # Maps model types to target modules
#### Key Methods
| Method | Purpose |
|---|---|
inject_adapter() | Creates adapter layers and replaces target modules |
_create_and_replace() | Creates or updates adapter modules for specific targets |
_replace_module() | Performs the actual module replacement |
_check_target_module_compatiblity() | Validates module compatibility (e.g., for Mamba) |
merge_and_unload() | Merges adapter weights into base model |
_unload_and_optionally_merge() | Core logic for weight merging |
#### Adapter Injection Flow
sequenceDiagram
participant User
participant PeftModel
participant BaseTuner
participant Model as Base Model
User->>PeftModel: inject_adapter(model, adapter_name)
PeftModel->>BaseTuner: inject_adapter(...)
BaseTuner->>BaseTuner: _create_and_replace(...)
BaseTuner->>Model: Walk modules recursively
Model-->>BaseTuner: Find matching targets
BaseTuner->>BaseTuner: Create adapter layer
BaseTuner->>Model: _replace_module(parent, name, new_module)
Note over Model: Target module replaced with adapter
Sources: src/peft/tuners/tuners_utils.py:1-200
3. Configuration System
The configuration system uses a factory pattern to map PEFT types to their corresponding configuration classes.
Location: src/peft/mapping.py
#### Configuration Mapping Table
| PEFT Type | Config Class | Tuner Layer Class |
|---|---|---|
LORA | LoraConfig | LoraLayer |
IA3 | IA3Config | IA3Layer |
ADALORA | AdaLoraConfig | AdaLoraLayer |
ADAPTER | AdapterConfig | AdapterLayer |
PREFIX_TUNING | PrefixTuningConfig | PrefixTuningLayer |
P_TUNING | PromptEncoderConfig | PromptEncoder |
LORA_CONFIG | LoraConfig | LoraLayer |
LOHA | LoHaConfig | LoHaLayer |
OFT | OFTConfig | OFTLayer |
XLORA | XLoraConfig | XLoraLayer |
HIRA | HiraConfig | HiraLayer |
SHIRA | ShiraConfig | ShiraLayer |
GRALORA | GraloraConfig | GraloraLayer |
DELORA | DeloraConfig | DeloraLayer |
MISS | MissConfig | MissLayer |
ADAMSS | AdamssConfig | AdamssLayer |
#### Auto Configuration Loading
def check_if_peft_model(model_name_or_path: str) -> bool:
"""Check if the model is a PEFT model."""
Sources: src/peft/mapping.py:1-100, src/peft/auto.py:1-50
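This mapping is what lets PeftConfig.from_pretrained hand back the correct subclass without the caller naming it. A hedged sketch (the adapter id is a placeholder):

```python
from peft import PeftConfig

# The factory reads the stored peft_type and instantiates the matching subclass,
# e.g. LoraConfig for a LoRA adapter
config = PeftConfig.from_pretrained("user/some-lora-adapter")  # placeholder id
print(type(config).__name__, config.peft_type)
```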
Task-Specific Model Classes
PEFT provides specialized model classes optimized for different transformer tasks.
PeftModelForSeq2SeqLM
For sequence-to-sequence tasks (translation, summarization).
class PeftModelForSeq2SeqLM(PeftModel):
def __init__(self, model, peft_config, adapter_name="default", **kwargs):
super().__init__(model, peft_config, adapter_name, **kwargs)
self.base_model_prepare_inputs_for_generation = self.base_model.prepare_inputs_for_generation
self.base_model_prepare_encoder_decoder_kwargs_for_generation = (
self.base_model._prepare_encoder_decoder_kwargs_for_generation
)
Features:
- Customizes prepare_inputs_for_generation for decoder input preparation
- Handles encoder-decoder kwargs for generation
Sources: src/peft/peft_model.py:200-400
PeftModelForSequenceClassification
For text classification tasks.
class PeftModelForSequenceClassification(PeftModel):
def __init__(self, model, peft_config, adapter_name="default", **kwargs):
super().__init__(model, peft_config, adapter_name, **kwargs)
classifier_module_names = ["classifier", "score"]
Target Modules: ["classifier", "score"] Sources: src/peft/peft_model.py:100-200
PeftModelForQuestionAnswering
For QA tasks.
class PeftModelForQuestionAnswering(PeftModel):
def __init__(self, model, peft_config, adapter_name="default", **kwargs):
super().__init__(model, peft_config, adapter_name, **kwargs)
qa_module_names = ["qa_outputs"]
Target Modules: ["qa_outputs"] Sources: src/peft/peft_model.py:250-350
PeftModelForTokenClassification
For named entity recognition and token-level tasks.
class PeftModelForTokenClassification(PeftModel):
def __init__(self, model, peft_config=None, adapter_name="default", **kwargs):
super().__init__(model, peft_config, adapter_name, **kwargs)
classifier_module_names = ["classifier", "score"]
Sources: src/peft/peft_model.py:300-400
Tuner Implementations
Common Tuner Structure
All tuners follow a consistent pattern:
class SomeTuner(BaseTuner):
prefix: str = "tuner_"
tuner_layer_cls = SomeLayerClass
target_module_mapping = TRANSFORMERS_MODELS_TO_SOME_TARGET_MODULES_MAPPING
def _create_and_replace(self, config, adapter_name, target, target_name, parent, current_key, **kwargs):
# Implementation
Target Module Mapping
Each tuner defines which modules can be targeted for adaptation based on the model architecture.
TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING = {
"t5": ["q", "v"],
"llama": ["q_proj", "v_proj"],
"bert": ["query", "value"],
# ... more mappings
}
Example: SHiRA Tuner
class ShiraModel(BaseTuner):
prefix: str = "shira_"
tuner_layer_cls = ShiraLayer
target_module_mapping = TRANSFORMERS_MODELS_TO_SHIRA_TARGET_MODULES_MAPPING
Key Features:
- Supports random mask generation with mask_type == "random" and a configurable random_seed
- Wraps Linear layers with SHiRA adapter logic
Sources: src/peft/tuners/shira/model.py:1-80
Example: GraLoRA Tuner
class GraloraModel(BaseTuner):
prefix: str = "gralora_"
tuner_layer_cls = GraloraLayer
target_module_mapping = TRANSFORMERS_MODELS_TO_GRALORA_TARGET_MODULES_MAPPING
Sources: src/peft/tuners/gralora/model.py:1-80
Example: X-LoRA Tuner
X-LoRA supports multiple adapter loading with device placement:
def __init__(
self,
model: nn.Module,
config: Union[dict[str, XLoraConfig], XLoraConfig],
adapter_name: str,
torch_device: Optional[str] = None,
ephemeral_gpu_offload: bool = False,
autocast_adapter_dtype: bool = True,
**kwargs,
)
Sources: src/peft/tuners/xlora/model.py:1-100
Model Loading and Serialization
From Pretrained
@classmethod
def from_pretrained(
cls,
model: torch.nn.Module,
model_id: str,
adapter_name: str = "default",
is_trainable: bool = False,
config: Optional[PeftConfig] = None,
autocast_adapter_dtype: bool = True,
**kwargs
) -> PeftModel:
Parameters:
| Parameter | Type | Description |
|---|---|---|
model | torch.nn.Module | The base model to adapt |
model_id | str | Path or HuggingFace Hub identifier |
adapter_name | str | Adapter name (default: "default") |
is_trainable | bool | Whether adapter is trainable |
config | PeftConfig | Pre-loaded configuration |
autocast_adapter_dtype | bool | Auto-cast adapter dtype |
Sources: src/peft/peft_model.py:400-600
Hotswap Adapter
For runtime adapter replacement without full model reload:
def hotswap_adapter(
model,
model_name_or_path,
adapter_name="default",
torch_device=None,
**kwargs
):
Sources: src/peft/utils/hotswap.py:1-100
Helper Utilities
Signature Updates
For model compatibility, PEFT provides utilities to update method signatures:
def update_forward_signature(model: PeftModel) -> None:
"""Updates forward signature to include parent's signature."""
def update_generate_signature(model: PeftModel) -> None:
"""Updates generate signature to include parent's signature."""
def update_signature(model: PeftModel, method: str = "all") -> None:
"""Updates forward and/or generate signature."""
Logic: signatures are updated only when the current signature contains just *args and **kwargs:
current_signature = inspect.signature(model.forward)
if (
len(current_signature.parameters) == 2
and "args" in current_signature.parameters
and "kwargs" in current_signature.parameters
):
# Update with parent's signature
Sources: src/peft/helpers.py:1-150
Adapter Scale Rescaling
Context manager for temporary adapter scaling:
@contextmanager
def rescale_adapter_scale(model, multiplier):
"""Context manager to temporarily rescale adapter scaling."""
Data Flow Diagram
graph LR
subgraph Input
InputIDs[input_ids]
Attention[attention_mask]
Embeds[inputs_embeds]
end
subgraph Processing
PEFTConfig[PeftConfig]
BaseModel[Base Model]
Adapters[Adapter Layers]
end
subgraph Output
OutputLogits[Output Logits]
HiddenStates[Hidden States]
AttentionWeights[Attention Weights]
end
InputIDs --> BaseModel
Attention --> BaseModel
Embeds --> BaseModel
PEFTConfig --> Adapters
BaseModel <--> Adapters
Adapters --> OutputLogits
Adapters --> HiddenStates
Adapters --> AttentionWeights
Configuration Classes
Each tuner type has a corresponding configuration class:
| Tuner | Config Class | Key Parameters |
|---|---|---|
| LoRA | LoraConfig | r, lora_alpha, lora_dropout, target_modules |
| IA³ | IA3Config | target_modules, feedforward_modules |
| Prefix Tuning | PrefixTuningConfig | num_virtual_tokens, num_transformer_submodules |
| Prompt Learning | PromptEncoderConfig | num_virtual_tokens, encoder_hidden_size |
| SHiRA | ShiraConfig | r, mask_type, random_seed |
| GraLoRA | GraloraConfig | r |
| X-LoRA | XLoraConfig | Multiple adapter configs |
| Hira | HiraConfig | r, hira_dropout |
| DeLoRA | DeloraConfig | rank_pattern, lambda_pattern |
| MiSS | MissConfig | r, target_modules, init_weights |
| Adamss | AdamssConfig | r, num_subspaces, target_modules |
Multiple Adapter Support
PEFT supports loading and managing multiple adapters simultaneously:
graph TD
BaseModel[Base Model] --> Adapter1[Adapter 1: default]
BaseModel --> Adapter2[Adapter 2: adapter_v2]
BaseModel --> AdapterN[Adapter N: custom_name]
ActiveAdapter[Active Adapter] --> Selection[Selection]
Selection --> Adapter1
Selection --> Adapter2
Selection --> AdapterN
Key Operations (illustrated in the sketch below):
- Add adapters via inject_adapter() with unique names
- Activate a specific adapter via set_adapter()
- Merge single or multiple adapters via merge_and_unload(adapter_names=[...])
- Hotswap adapters at runtime via hotswap_adapter()
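A short sketch of these operations, with placeholder paths standing in for previously saved adapter checkpoints:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

# Adapter paths below are placeholders for saved adapter checkpoints
model = PeftModel.from_pretrained(base_model, "./adapters/task_a", adapter_name="task_a")
model.load_adapter("./adapters/task_b", adapter_name="task_b")

model.set_adapter("task_b")                                # switch the active adapter
merged = model.merge_and_unload(adapter_names=["task_b"])  # fold it into the base weights
```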
Class Inheritance Diagram
classDiagram
class PeftModel {
+model
+peft_config
+active_adapters
+inject_adapter()
+merge_and_unload()
+unload()
+get_prompt()
}
class PeftModelForCausalLM {
+forward()
}
class PeftModelForSeq2SeqLM {
+forward()
+prepare_inputs_for_generation()
}
class PeftModelForSequenceClassification {
+forward()
}
class PeftModelForQuestionAnswering {
+forward()
}
class PeftModelForTokenClassification {
+forward()
}
class PeftModelForFeatureExtraction {
+forward()
}
PeftModel <|-- PeftModelForCausalLM
PeftModel <|-- PeftModelForSeq2SeqLM
PeftModel <|-- PeftModelForSequenceClassification
PeftModel <|-- PeftModelForQuestionAnswering
PeftModel <|-- PeftModelForTokenClassification
PeftModel <|-- PeftModelForFeatureExtraction
Summary
The PEFT system architecture provides a flexible, extensible framework for parameter-efficient fine-tuning through:
- Centralized Model Management: PeftModel base class handles adapter lifecycle
- Modular Tuner System: Each technique (LoRA, IA³, etc.) implements the BaseTuner interface
- Configuration-Driven Design: Factory pattern maps PEFT types to configs
- Task-Specific Optimizations: Specialized model classes for different downstream tasks
- Multi-Adapter Support: Runtime switching and hotswapping of adapters
- Seamless Integration: Auto-loading and signature updates for transformer compatibility
This architecture enables researchers and practitioners to easily extend PEFT with new fine-tuning methods while maintaining backward compatibility and performance optimizations.
Sources: src/peft/peft_model.py:1-100
Core Components
Related topics: System Architecture, Configuration System, Model Loading and Saving
Overview
The PEFT (Parameter-Efficient Fine-Tuning) library provides a modular architecture for adapting pre-trained models with minimal computational overhead. The Core Components form the foundational layer that enables all PEFT methods—including LoRA, IA³, Prefix Tuning, and custom tuners—to inject trainable parameters into base models efficiently.
The core architecture consists of:
- PeftModel: The primary wrapper class that encapsulates base models with adapter layers
- PeftConfig: Configuration objects that define adapter-specific parameters
- BaseTunerLayer: Base class for all adapter layer implementations
- inject_adapter: Core mechanism for attaching adapters to target modules
- Mapping System: Registry connecting PEFT types to their implementations
Sources: src/peft/peft_model.py:1-50
Architecture Overview
graph TD
A[Pre-trained Model] --> B[PeftModel]
B --> C{PEFT Type}
C -->|LORA| D[LoRA Layers]
C -->|IA3| E[IA³ Layers]
C -->|PREFIX_TUNING| F[Prefix Layers]
C -->|CUSTOM| G[Custom Tuners]
H[PeftConfig] --> B
I[Adapter Registry] --> B
J[Target Modules] --> K[inject_adapter]
K --> B
L[from_pretrained] --> B
M[get_peft_model] --> B
PeftModel Base Class
The PeftModel class serves as the central abstraction for all PEFT-adapted models. It wraps a base model and manages one or more adapters, each containing trainable parameters.
Key Responsibilities
| Responsibility | Description |
|---|---|
| Adapter Management | Load, activate, and switch between multiple adapters |
| Forward Pass | Intercept forward calls to route through active adapters |
| Parameter Tracking | Report trainable vs. total parameter counts |
| Serialization | Save and load adapter weights and configurations |
Task-Specific Model Classes
PEFT provides specialized model classes for different transformer tasks:
| Model Class | Task Type | Use Case |
|---|---|---|
PeftModel | Generic | Base wrapper for any model |
PeftModelForSequenceClassification | SEQ_CLS | Text classification |
PeftModelForTokenClassification | TOKEN_CLS | Named entity recognition |
PeftModelForQuestionAnswering | QUESTION_ANS | Extractive QA |
PeftModelForSeq2SeqLM | SEQ_2_SEQ_LM | Translation, summarization |
PeftModelForCausalLM | CAUSAL_LM | Text generation |
PeftModelForFeatureExtraction | FEATURE_EXTRACTION | Embedding extraction |
Sources: src/peft/peft_model.py:50-150
Key Methods
def from_pretrained(
model: torch.nn.Module,
model_id: str | os.PathLike,
adapter_name: str = "default",
is_trainable: bool = False,
config: PeftConfig = None,
autocast_adapter_dtype: bool = True,
**kwargs
) -> PeftModel
This factory method instantiates a PEFT model from a pretrained configuration and optionally loads adapter weights.
Sources: src/peft/peft_model.py:150-200
PeftConfig System
The PeftConfig class hierarchy defines adapter-specific hyperparameters. Each PEFT method has its own configuration class that inherits from the base PeftConfig.
Configuration Class Hierarchy
graph TD
A[PeftConfig] --> B[LoraConfig]
A --> C[PromptLearningConfig]
C --> D[PrefixTuningConfig]
C --> E[PromptEncoderConfig]
A --> F[IA3Config]
A --> G[LoHaConfig]
A --> H[OFTConfig]
A --> I[TinyLoRAConfig]
A --> J[AdamssConfig]
Common Configuration Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
peft_type | PeftType | Required | The PEFT method being used |
task_type | TaskType | Required | The downstream task type |
inference_mode | bool | False | Whether model is in inference mode |
target_modules | List[str] | None | Module names to apply adapters to |
r | int | 8 | LoRA rank dimension |
lora_alpha | int | 8 | LoRA scaling factor |
lora_dropout | float | 0.0 | Dropout probability for LoRA layers |
Sources: src/peft/config.py, src/peft/mapping.py
Tuner Layer Base Classes
BaseTunerLayer
The BaseTunerLayer class provides the interface that all adapter layer implementations must follow. It defines methods for layer initialization, adapter updating, and merging.
classDiagram
class BaseTunerLayer {
+base_layer: nn.Module
+active_adapters: List[str]
+adapter_list: List[str]
+update_layer(adapter_name, ...)
+merge()
+unmerge()
}
Key Methods
| Method | Description |
|---|---|
update_layer(adapter_name, **kwargs) | Initialize or update adapter weights |
merge() | Merge adapter weights into base layer |
unmerge() | Restore original base layer weights |
scale_layer(scale) | Apply scaling factor to adapter output |
Sources: src/peft/tuners/tuners_utils.py:100-150
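Because every adapter layer implements this interface, generic operations can be written against BaseTunerLayer. The sketch below assumes peft_model is an already-adapted model and simply merges, then unmerges, every adapter layer it finds.

```python
from peft.tuners.tuners_utils import BaseTunerLayer

# Walk the wrapped model and merge adapter weights layer by layer
for module in peft_model.modules():
    if isinstance(module, BaseTunerLayer):
        module.merge()

# ... run inference with merged weights, then restore the original base weights
for module in peft_model.modules():
    if isinstance(module, BaseTunerLayer):
        module.unmerge()
```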
Method-Specific Tuner Layers
Each PEFT method implements its own tuner layer class:
| Tuner | Layer Class | Key Parameters |
|---|---|---|
| LoRA | LoraLayer | r, lora_alpha, lora_dropout, lora_A, lora_B |
| IA³ | IA3Layer | inn_factor, key_value_dim |
| OFT | OFTLayer | oft_r, oft_diag_blocks |
| SHiRA | ShiraLayer | mask_fn, random_seed |
| Gralora | GraloraLayer | r (SVD rank) |
Sources: src/peft/tuners/ia3/model.py, src/peft/tuners/oft/model.py, src/peft/tuners/shira/model.py, src/peft/tuners/gralora/model.py
Adapter Injection Mechanism
The inject_adapter method is the core mechanism that replaces target modules with adapter layers. This process traverses the model and substitutes compatible modules.
graph TD
A[inject_adapter called] --> B{module.is_target_module?}
B -->|Yes| C{Create New Module?}
C -->|New adapter| D[_create_new_module]
C -->|Existing adapter| E[update_layer]
D --> F[_replace_module]
E --> G[Set requires_grad False]
F --> H[Module replaced]
B -->|No| I[Skip module]
G --> I
Injection Flow
def inject_adapter(
model: nn.Module,
adapter_name: str,
autocast_adapter_dtype: bool = True,
low_cpu_mem_usage: bool = False,
state_dict: Optional[dict] = None,
) -> None
The method performs the following steps:
- Identifies target modules based on peft_config.target_modules
- For each target, either creates a new adapter module or updates an existing one
- Replaces the original module in the parent model
- Sets appropriate requires_grad flags based on is_trainable
_create_and_replace Pattern
Each tuner implements _create_and_replace to handle the specific module creation logic:
def _create_and_replace(
self,
config,
adapter_name,
target,
target_name,
parent,
current_key,
**optional_kwargs,
) -> None
Sources: src/peft/tuners/shira/model.py:40-80, src/peft/tuners/gralora/model.py:40-70, src/peft/tuners/miss/model.py:30-70
Mixed Model Support
The PeftMixedModel class extends PeftModel to support heterogeneous adapters—models with different PEFT methods simultaneously.
graph LR
A[Base Model] --> B[PeftMixedModel]
B --> C[LoRA Adapter]
B --> D[IA³ Adapter]
B --> E[Prefix Adapter]
Loading Mixed Models
@classmethod
def from_pretrained(
cls,
model: nn.Module,
model_id: str | os.PathLike,
adapter_name: str = "default",
is_trainable: bool = False,
config: PeftConfig = None,
low_cpu_mem_usage: bool = False,
**kwargs,
) -> PeftMixedModel
Sources: src/peft/mixed_model.py:50-100
Helper Functions
The helpers.py module provides utility functions for working with PEFT models.
Signature Update Functions
These functions update the forward and generate signatures of PEFT models to expose parameters from the underlying base model.
| Function | Purpose |
|---|---|
update_forward_signature(model) | Update model.forward signature to include base model parameters |
update_generate_signature(model) | Update model.generate signature to include base model parameters |
update_signature(model, method) | Update both signatures or specify 'forward'/'generate'/'all' |
def update_forward_signature(model: PeftModel) -> None:
"""Update the forward signature to include base model parameters."""
current_signature = inspect.signature(model.forward)
if (
len(current_signature.parameters) == 2
and "args" in current_signature.parameters
and "kwargs" in current_signature.parameters
):
# Copy signature from base model
...
Sources: src/peft/helpers.py:50-100
Model Validation
def check_if_peft_model(model_name_or_path: str) -> bool:
"""
Check if the model is a PEFT model.
Returns:
bool: True if the model is a PEFT model, False otherwise.
"""
This function attempts to load a PeftConfig from the given path and returns True if successful.
Sources: src/peft/helpers.py:100-130
Adapter Rescaling Context Manager
@contextmanager
def rescale_adapter_scale(model, multiplier):
"""Temporarily rescale the scaling of the LoRA adapter."""
This context manager temporarily rescales adapter weights during inference, useful for ablation studies.
Sources: src/peft/helpers.py:130-160
Hotswap Adapter
The hotswap_adapter function enables runtime replacement of loaded adapters without reloading the entire model.
graph TD
A[hotswap_adapter called] --> B[Load new config]
B --> C[Validate PEFT type]
C --> D[Load state dict]
D --> E[Transfer to device]
E --> F[Replace adapter weights]
F --> G[Success]
def hotswap_adapter(
model: PeftModel,
model_name_or_path: str,
adapter_name: str = "default",
torch_device: str = None,
**kwargs,
) -> None
Sources: src/peft/utils/hotswap.py:30-80
Unload and Merge Operations
Base tuners provide methods to unload or merge adapter weights.
merge_and_unload
def merge_and_unload(progressbar: bool = False, safe_merge: bool = False, adapter_names = None) -> nn.Module
Merges adapter weights into the base model and returns the resulting model with adapter modules removed.
unload
def unload() -> nn.Module
Returns the base model by removing all PEFT modules without merging weights. This is useful when you need the original model but want to preserve the option to reload adapters later.
_unload_and_optionally_merge
def _unload_and_optionally_merge(
progressbar: bool = False,
safe_merge: bool = False,
adapter_names = None,
merge: bool = True,
) -> nn.Module
Sources: src/peft/tuners/tuners_utils.py:80-120
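A typical deployment sketch combining these calls, assuming peft_model is an adapted model and the output directory is a placeholder:

```python
# Fold the active adapter into the base weights for standalone inference
merged_model = peft_model.merge_and_unload(safe_merge=True)

# The merged model is a regular model again and can be saved as such
merged_model.save_pretrained("./merged-model")  # placeholder output path

# Alternatively, drop the adapter modules without merging
base_model = peft_model.unload()
```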
Target Module Mapping
Each tuner defines a target_module_mapping that specifies which modules should be replaced for different model architectures.
# Example: SHiRA target module mapping
target_module_mapping = TRANSFORMERS_MODELS_TO_SHIRA_TARGET_MODULES_MAPPING
# Example: GraLoRA target module mapping
target_module_mapping = TRANSFORMERS_MODELS_TO_GRALORA_TARGET_MODULES_MAPPING
These mappings allow PEFT methods to automatically identify compatible layers (e.g., q_proj, v_proj, k_proj) across different transformer architectures.
BitsAndBytes Integration
PEFT supports quantized models through BitsAndBytes integration. The tuners detect quantized base layers and wrap them appropriately:
if loaded_in_8bit and isinstance(target_base_layer, bnb.nn.Linear8bitLt):
eightbit_kwargs = kwargs.copy()
eightbit_kwargs.update({
"has_fp16_weights": target_base_layer.state.has_fp16_weights,
"threshold": target_base_layer.state.threshold,
"index": target_base_layer.index,
})
new_module = Linear8bitLt(...)
Sources: src/peft/tuners/ia3/model.py:40-70
Summary
The Core Components of PEFT provide a flexible, extensible architecture for parameter-efficient fine-tuning:
- PeftModel wraps base models and manages adapters with a unified interface
- PeftConfig classes define method-specific hyperparameters
- BaseTunerLayer establishes the contract for all adapter implementations
- inject_adapter replaces target modules with adapter layers
- Helper functions provide utilities for signature updates, validation, and runtime operations
- Hotswap support enables dynamic adapter replacement
This architecture allows developers to implement new PEFT methods by subclassing existing base classes while reusing the core model management infrastructure.
Sources: src/peft/peft_model.py:1-50
LoRA and LoRA Variants
Related topics: Other PEFT Methods, Quantization Integration, Configuration System
Overview
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that reduces trainable parameters by representing weight updates as low-rank decompositions. The PEFT library implements LoRA and numerous variants that extend this foundational approach with different architectural innovations, training strategies, and optimization techniques.
The LoRA system in PEFT serves as both a standalone fine-tuning method and a framework upon which variants like DoRA, AdaLoRA, LoHa, LoKr, and others are built. These variants share a common plugin architecture but differ in how they decompose and apply trainable adapters to base model layers.
Architecture
Core LoRA Architecture
LoRA modifies pre-trained neural network layers by adding trainable low-rank decomposition matrices alongside frozen original weights. For a linear layer with weight matrix $W \in \mathbb{R}^{d \times k}$, LoRA represents the update as:
$$\Delta W = BA$$
where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ with rank $r \ll \min(d, k)$.
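To make the savings concrete, the sketch below compares parameter counts for a single 4096×4096 projection (a typical size in 7B-parameter models) updated fully versus through a rank-8 LoRA decomposition.

```python
d, k, r = 4096, 4096, 8

full_update = d * k        # parameters touched by fully fine-tuning one weight matrix
lora_update = r * (d + k)  # parameters in the B (d x r) and A (r x k) factors

print(full_update)                # 16777216
print(lora_update)                # 65536
print(lora_update / full_update)  # ~0.0039, i.e. roughly 0.4% of the original matrix
```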
graph TD
A[Base Model Layer: Weight W] --> B[Original Forward Pass<br/>y = Wx]
C[LoRA Adapter: BA Decomposition] --> D[Modified Forward Pass<br/>y = Wx + BAz]
B --> D
A --> C
E[Input x] --> A
E --> B
F[Adapter Input z<br/>Same as x or modified] --> C
LoRA Module Hierarchy
graph TD
A[PeftModel] --> B[BaseModel Class]
A --> C[LoraModel / VariantModel]
C --> D[TunerLayerCls]
C --> E[target_module_mapping]
C --> F[prefix attribute]
D --> G[LoraLayer / Conv2d / Conv1d]
G --> H[Linear wrapper]
H --> I[Forward with BA decomposition]
Sources: src/peft/tuners/lora/model.py:1-100
LoRA Implementation
Model Class
The LoraModel class serves as the base implementation for LoRA adapters. It extends the generic tuner base class and implements the core adapter creation logic.
class LoraModel(BaseTuner):
prefix: str = "lora_"
tuner_layer_cls = LoraLayer
target_module_mapping = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING
Sources: src/peft/tuners/lora/model.py:90-95
Layer Replacement Mechanism
The _create_and_replace method handles the injection of LoRA adapters into target modules:
def _create_and_replace(
self,
lora_config,
adapter_name,
target,
target_name,
parent,
current_key,
*,
parameter_name: Optional[str] = None,
) -> None:
Sources: src/peft/tuners/lora/model.py:105-120
Forward Pass Computation
The LoRA forward pass adds the scaled low-rank update of each active adapter to the frozen base layer output (shown here in simplified form):
def forward(self, x: torch.Tensor) -> torch.Tensor:
    # Simplified: frozen base output plus the scaled update of every active adapter
    result = self.base_layer(x)
    for adapter in self.active_adapters:
        lora_A = self.lora_A[adapter]
        lora_B = self.lora_B[adapter]
        dropout = self.lora_dropout[adapter]
        scaling = self.scaling[adapter]
        result = result + lora_B(lora_A(dropout(x))) * scaling
    return result
LoRA Configuration
LoraConfig Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
r | int | 8 | Rank of decomposition |
lora_alpha | int | 8 | Scaling factor (often set to 2×r) |
lora_dropout | float | 0.0 | Dropout probability for LoRA layers |
target_modules | Optional[List[str]] | None | Module names to apply LoRA |
bias | str | "none" | Bias training mode: "none", "all", "lora_only" |
fan_in_fan_out | bool | False | Transpose weights for certain architectures |
init_weights | bool | True | Initialize LoRA weights on creation |
Advanced Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
target_modules_bd_a | Optional[List[str]] | None | Modules for block-diagonal LoRA-A |
target_modules_bd_b | Optional[List[str]] | None | Modules for block-diagonal LoRA-B |
nblocks | int | 1 | Number of blocks in block-diagonal matrices |
match_strict | bool | True | Require strict matching for all target modules |
Sources: src/peft/tuners/lora/config.py:1-200
LoRA Variants
DoRA (Weight-Decomposed LoRA)
DoRA extends standard LoRA by decomposing weights into magnitude and direction components. This variant often achieves better performance with comparable parameter counts.
# DoRA configuration example
lora_config = LoraConfig(
use_dora=True,
r=32,
lora_alpha=32,
target_modules=["q_proj", "v_proj"]
)
Sources: examples/dora_finetuning/README.md
AdaLoRA (Adaptive LoRA)
AdaLoRA dynamically adjusts the rank of LoRA blocks during training, allocating more parameters to important layers. This adaptive approach optimizes the parameter budget.
python examples/alora_finetuning/alora_finetuning.py \
--base_model meta-llama/Llama-3.2-3B-Instruct \
--data_path Lots-of-LoRAs/task1660_super_glue_question_generation \
--invocation_string "<|start_header_id|>assistant<|end_header_id|>"
Sources: examples/alora_finetuning/README.md
LoHa (Low-Rank Hadamard Product)
LoHa replaces the standard AB decomposition with a Hadamard product of low-rank matrices, potentially capturing more expressive updates.
config_te = LoHaConfig(
r=8,
alpha=32,
target_modules=["k_proj", "q_proj", "v_proj", "out_proj", "fc1", "fc2"],
rank_dropout=0.0,
module_dropout=0.0,
)
Sources: src/peft/tuners/loha/__init__.py
LoKr (Low-Kronecker Product)
LoKr applies Kronecker product decomposition to weight matrices, offering different trade-offs between rank and expressiveness.
config_unet = LoKrConfig(
r=8,
alpha=32,
target_modules=["proj_in", "proj_out", "to_k", "to_q", "to_v"],
rank_dropout=0.0,
module_dropout=0.0,
use_effective_conv2d=True,
)
Sources: src/peft/tuners/lokr/__init__.py
Block-Diagonal LoRA
Block-diagonal LoRA constrains the LoRA matrices to be block-diagonal, enabling efficient multi-adapter serving with different sharding degrees.
config = LoraConfig(
r=16,
target_modules_bd_a=["q_proj", "v_proj"], # Block-diagonal A
target_modules_bd_b=["out_proj"], # Block-diagonal B
nblocks=4, # Sharding degree
)
Variant Comparison
| Variant | Key Innovation | Target Use Case | Complexity |
|---|---|---|---|
| LoRA | Low-rank decomposition | General fine-tuning | Low |
| DoRA | Magnitude + direction decomposition | High-quality adaptation | Low |
| AdaLoRA | Adaptive rank allocation | Resource-constrained tuning | Medium |
| LoHa | Hadamard product decomposition | Image generation | Medium |
| LoKr | Kronecker product decomposition | Diffusion models | Medium |
| Block-Diagonal | Constrained structure | Multi-adapter serving | Medium |
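Because the variants share the same plugin interface, switching between them is largely a matter of swapping the configuration class. The hedged sketch below contrasts LoRA and LoHa configurations on the same target modules; base_model is assumed to be a loaded transformer.

```python
from peft import LoraConfig, LoHaConfig, get_peft_model

targets = ["q_proj", "v_proj"]

lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=targets, task_type="CAUSAL_LM")
loha_config = LoHaConfig(r=8, alpha=16, target_modules=targets, task_type="CAUSAL_LM")

# Either config can be applied through the same entry point
peft_model = get_peft_model(base_model, lora_config)  # base_model loaded elsewhere
```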
Usage Patterns
Basic LoRA Setup
from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b")
peft_config = LoraConfig(
task_type="CAUSAL_LM",
r=16,
lora_alpha=32,
target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
lora_dropout=0.05,
bias="none",
)
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()
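From here, training proceeds like fine-tuning any transformers model. The sketch below is a minimal, hedged example using the Trainer API; train_dataset is assumed to be an already tokenized dataset and the output directory is a placeholder.

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./lora-out",          # placeholder output directory
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=2e-4,
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,      # assumed: a pre-tokenized dataset
)
trainer.train()

# Saving the PeftModel writes out only the adapter weights
peft_model.save_pretrained("./lora-out")
```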
Multi-Adapter Configuration
from peft import PeftModel
# Load the first adapter, then register a second one under its own name
peft_model = PeftModel.from_pretrained(
base_model,
"./path/to/adapter_1",
adapter_name="adapter_1",
)
peft_model.load_adapter("./path/to/adapter_2", adapter_name="adapter_2")
Quantization with LoRA
from peft import get_peft_model, LoraConfig, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-2-7b",
quantization_config=quantization_config,
)
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
peft_model = get_peft_model(model, lora_config)
Integration with PeftModel
All LoRA variants integrate with the base PeftModel architecture through the tuner pattern:
graph LR
A[Base Transformers Model] --> B[PeftModel]
B --> C[BaseModel Class]
C --> D[LoraModel / VariantModel]
D --> E[Adapter Injection]
E --> F[Modified Forward]
The PeftModel class provides unified interfaces for:
- Forward pass handling
- Adapter switching
- Save/load operations
- Parameter printing
Sources: src/peft/peft_model.py:1-100
Design Patterns
Tuner Layer Class Structure
Each LoRA variant implements a tuner_layer_cls attribute that defines the layer wrapper class:
class LoraModel(BaseTuner):
tuner_layer_cls = LoraLayer
class LoHaModel(BaseTuner):
prefix: str = "hada_"
tuner_layer_cls = LoHaLayer
layers_mapping: dict[type[torch.nn.Module], type[LoHaLayer]] = {
torch.nn.Conv2d: Conv2d,
torch.nn.Conv1d: Conv1d,
torch.nn.Linear: Linear,
}
Target Module Mapping
Variants define target module mappings for automatic module detection:
class LoraModel(BaseTuner):
target_module_mapping = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING
class ShiraModel(BaseTuner):
prefix: str = "shira_"
tuner_layer_cls = ShiraLayer
target_module_mapping = TRANSFORMERS_MODELS_TO_SHIRA_TARGET_MODULES_MAPPING
Sources: src/peft/tuners/shira/model.py:40-45
Conclusion
LoRA and its variants in the PEFT library provide a comprehensive suite of parameter-efficient fine-tuning techniques. The shared plugin architecture enables consistent APIs across variants while allowing each method to implement its unique adaptation strategy. From basic low-rank decomposition to advanced block-diagonal structures, PEFT supports a wide range of fine-tuning scenarios with minimal computational overhead.
Sources: [src/peft/tuners/lora/model.py:1-100]()
Other PEFT Methods
Related topics: LoRA and LoRA Variants, Configuration System
PEFT (Parameter-Efficient Fine-Tuning) encompasses a diverse collection of techniques beyond LoRA and QLoRA. These methods offer alternative approaches to adapting pre-trained models with minimal parameter updates, each with distinct mechanisms, trade-offs, and optimal use cases. This page provides a comprehensive overview of the "Other PEFT Methods" available in the Hugging Face PEFT library.
Overview of PEFT Method Categories
The PEFT library organizes fine-tuning methods into several categories based on their core adaptation mechanism. Understanding these categories helps practitioners select the appropriate method for their specific requirements.
graph TD
A[PEFT Methods] --> B[Prompt-Based Methods]
A --> C[Additive Methods]
A --> D[Reparameterization Methods]
A --> E[Multiplicative Methods]
A --> F[Subspace Methods]
B --> B1[Prompt Tuning]
B --> B2[Prefix Tuning]
B --> B3[P-Tuning]
B --> B4[MultiTask Prompt Tuning]
C --> C1[IA³]
D --> D1[LoRA Variants<br/>AdaLoRA, Gralora, HiRA]
E --> E1[OFT]
F --> F1[FourierFT]
Prompt-Based Methods
Prompt-based methods modify the model's input or activation space without changing the underlying model weights. These methods add trainable parameters as virtual tokens or prefix embeddings that guide the model's behavior.
Prompt Tuning
Prompt Tuning introduces trainable "soft prompts" (embedding vectors) that are prepended to the input tokens. Unlike discrete text prompts, these are continuous vectors learned through backpropagation during fine-tuning.
Key Characteristics:
- Only the prompt embeddings are trainable
- No architectural changes to the base model
- Requires relatively few parameters compared to full fine-tuning
- Works well with larger models
Configuration Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| num_virtual_tokens | int | 20 | Number of virtual tokens in the prompt |
| prompt_tuning_init | str | "RANDOM" | Initialization method for prompts ("RANDOM" or "TEXT") |
| prompt_tuning_init_text | str | None | Text used for TEXT initialization |
| token_dim | int | Model hidden dim | Dimension of model embeddings |
| num_transformer_submodules | int | 1 | Number of transformer submodules that receive prompts |
| num_attention_heads | int | Model heads | Number of attention heads |
| num_layers | int | Model layers | Number of transformer layers |
| encoder_hidden_size | int | Same as token_dim | Hidden size for the encoder |
Sources: src/peft/tuners/prompt_tuning/__init__.py
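A hedged sketch of a text-initialized prompt tuning configuration follows; the initialization text, token count, and tokenizer path are illustrative.
from peft import PromptTuningConfig, PromptTuningInit
prompt_config = PromptTuningConfig(
    task_type="CAUSAL_LM",
    num_virtual_tokens=20,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Classify if the tweet is a complaint or not:",
    tokenizer_name_or_path="bigscience/bloomz-560m",  # required when using TEXT initialization
)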
Prefix Tuning
Prefix Tuning adds trainable parameters to the attention mechanism by prepending learnable prefix vectors to the keys and values in every attention layer. Unlike Prompt Tuning, this affects all transformer layers directly.
Architecture:
graph LR
A[Input Tokens] --> B[Embedding Layer]
B --> C[Prefix P<sub>k</sub>, P<sub>v</sub>]
B --> D[Standard K, V]
C --> E[Multi-Head Attention]
D --> E
E --> F[Output]
Key Differences from Prompt Tuning:
- Affects hidden states at every transformer layer
- Typically uses more trainable parameters than prompt tuning, since prefixes are injected at every layer
- Supports an optional prefix_projection MLP for more stable optimization of the prefix
Sources: src/peft/tuners/prefix_tuning/__init__.py
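As a minimal sketch (values are illustrative), a prefix tuning configuration is declared as follows and passed to get_peft_model like any other PEFT config.
from peft import PrefixTuningConfig
prefix_config = PrefixTuningConfig(
    task_type="SEQ_2_SEQ_LM",
    num_virtual_tokens=30,
    prefix_projection=False,  # set True to reparameterize the prefix through a projection MLP
)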
P-Tuning
P-Tuning uses trainable continuous embeddings combined with a prompt encoder (typically an LSTM or MLP) to generate prompts. The encoder processes anchor tokens and produces virtual token embeddings.
Unique Features:
- Uses a small LSTM/MLP encoder to generate prompt embeddings
- Supports "anchor" tokens that provide natural language hints
- More flexible than pure continuous prompts
Sources: src/peft/tuners/p_tuning/__init__.py
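In PEFT, P-Tuning is configured through PromptEncoderConfig; a minimal sketch with illustrative values:
from peft import PromptEncoderConfig
ptuning_config = PromptEncoderConfig(
    task_type="SEQ_CLS",
    num_virtual_tokens=20,
    encoder_reparameterization_type="MLP",  # "MLP" or "LSTM" prompt encoder
    encoder_hidden_size=128,
)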
MultiTask Prompt Tuning (MPT)
MultiTask Prompt Tuning extends standard prompt tuning by learning a shared prompt across multiple related tasks. This enables knowledge transfer and typically improves generalization.
Use Cases:
- Multi-task learning scenarios
- Domain adaptation with related tasks
- Few-shot learning with task similarity
Sources: src/peft/tuners/multitask_prompt_tuning/__init__.py
(IA)³ - Infused Adapter by Inhibiting and Amplifying Inner Activations
(IA)³ is a multiplicative adapter method that scales activations by learned vectors. It introduces trainable vectors that multiply with hidden states at specific positions in the transformer architecture.
Mechanism
graph TD
A[Hidden Activation h] --> B[Learned Vector l<sub>i</sub>]
B --> C[Element-wise Multiplication]
A --> C
C --> D[h<sub>modified</sub> = l<sub>i</sub> ⊙ h]
D --> E[Feed-Forward<br/>or Attention]
Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
r | int | 8 | Rank (not used in IA³ but kept for compatibility) |
target_modules | list | None | Modules to apply IA³ to |
fan_in_fan_out | bool | False | Transpose weights |
init_weights | bool | True | Initialize adapter weights |
Supported Target Modules
The IA³ method typically targets attention-related and feed-forward layers:
- q_proj, k_proj, v_proj, o_proj (attention projections)
- fc1, fc2 (feed-forward layers)
- gate_proj, up_proj, down_proj (for modern architectures like Llama)
Sources: src/peft/tuners/ia3/__init__.py Sources: docs/source/conceptual_guides/ia3.md
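A minimal IA3Config sketch is shown below; the module names assume a Llama-style decoder and are illustrative. Note that feedforward_modules must be a subset of target_modules.
from peft import IA3Config, get_peft_model
ia3_config = IA3Config(
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "v_proj", "down_proj"],
    feedforward_modules=["down_proj"],  # scaling is applied to the input of feed-forward modules
)
# peft_model = get_peft_model(base_model, ia3_config)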
OFT - Orthogonal Fine-Tuning
OFT constrains the fine-tuning updates to an orthogonal subspace, ensuring that the learned adapters do not interfere with each other. This method is particularly useful for multi-adapter scenarios.
Key Principle
OFT learns an orthogonal transformation R that is applied multiplicatively to the pretrained weights:
W_new = R · W_original
Because R is constrained to be (block-)orthogonal, the update preserves the angular structure of the pretrained weights and limits interference between adapters.
Configuration Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
r | int | 4 | Rank of the OFT transformation |
target_modules | list | ["q_proj", "v_proj"] | Layers to adapt |
module_dropout | float | 0.0 | Dropout probability for modules |
init_weights | bool | True | Initialize with pretrained weights |
Use Cases
- Stable diffusion model adaptation (text encoder, UNet)
- Multi-task learning with non-interfering adapters
- Computer vision models requiring structured updates
Sources: src/peft/tuners/oft/__init__.py
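A minimal OFTConfig sketch, assuming a transformer with q_proj/v_proj projections; the exact meaning of r (number of blocks versus block size) has shifted across PEFT releases, so check the current OFTConfig docstring before relying on these values.
from peft import OFTConfig
oft_config = OFTConfig(
    r=8,                                  # number of OFT blocks per adapted layer
    target_modules=["q_proj", "v_proj"],
    module_dropout=0.0,
)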
FourierFT - Fourier Transform-Based Fine-Tuning
FourierFT operates in the frequency domain, learning adapters in Fourier space rather than the original weight space. This approach can capture different aspects of the model's behavior compared to spatial-domain methods.
Advantages
- May capture global patterns more efficiently
- Different inductive bias compared to spatial methods
- Potential for more compact representations
Sources: src/peft/tuners/fourierft/__init__.py
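A hedged FourierFTConfig sketch follows; the n_frequency and scaling values are illustrative and should be tuned per task.
from peft import FourierFTConfig
fourierft_config = FourierFTConfig(
    task_type="CAUSAL_LM",
    n_frequency=1000,     # number of learnable spectral coefficients per adapted layer
    scaling=300.0,        # scale applied to the recovered weight update
    target_modules=["q_proj", "v_proj"],
)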
Advanced LoRA Variants
AdaLoRA - Adaptive LoRA
AdaLoRA dynamically adjusts the rank of LoRA adaptations based on the importance of different weight matrices. It uses a budget allocation mechanism to invest more parameters in important layers.
Key Method: update_and_allocate
# Called during training loop
model.base_model.update_and_allocate(global_step)
This method updates importance scores and reallocates the rank budget based on the current training step.
Sources: src/peft/tuners/adalora/model.py
HiRA - Hierarchical Rank Adaptation
HiRA extends LoRA with hierarchical rank adaptation, allowing for more nuanced parameter allocation across different model layers.
Sources: src/peft/tuners/hira/model.py
GraLoRA - Granular Low-Rank Adaptation
GraLoRA partitions each weight matrix into sub-blocks and assigns every block its own low-rank adapter pair, increasing expressivity at a comparable parameter budget.
Sources: src/peft/tuners/gralora/model.py
Special-Purpose Methods
SHiRA - Sparse High Rank Adapters
SHiRA fine-tunes a small, sparse subset of the base weights directly (a sparse high-rank update rather than a low-rank one), which keeps adapter switching cheap and adds no inference latency.
Sources: src/peft/tuners/shira/model.py
MiSS - Mixed Subspace Adaptation
MiSS adapts models in a mixed subspace, combining multiple adaptation strategies for enhanced flexibility.
Sources: src/peft/tuners/miss/model.py
Adamss - Adaptive Subspace Selection
Adamss uses adaptive subspace selection for fine-tuning, choosing the most relevant subspaces based on the task at hand.
| Parameter | Type | Default | Description |
|---|---|---|---|
r | int | 500 | Rank dimension |
num_subspaces | int | 5 | Number of subspaces |
target_modules | list | ["q_proj", "v_proj"] | Target layers |
Sources: src/peft/tuners/adamss/model.py
X-LoRA
X-LoRA supports multiple LoRA adapters with dynamic routing, allowing for sophisticated multi-adapter architectures.
Sources: src/peft/tuners/xlora/model.py
Comparison of Methods
| Method | Category | Trainable Parameters | Best For | Supports Multi-Adapter |
|---|---|---|---|---|
| Prompt Tuning | Prompt-Based | Very Low | Large models, text tasks | Yes |
| Prefix Tuning | Prompt-Based | Low | Text generation | Yes |
| P-Tuning | Prompt-Based | Low-Medium | NLU tasks | Yes |
| MPT | Prompt-Based | Medium | Multi-task learning | Yes |
| (IA)³ | Multiplicative | Low | Efficient scaling | Yes |
| OFT | Multiplicative | Low-Medium | Stable diffusion, CV | Yes |
| FourierFT | Frequency-Domain | Low | Global patterns | Yes |
| AdaLoRA | Reparameterization | Variable | Dynamic budgets | Yes |
| X-LoRA | Reparameterization | Medium-High | Complex routing | Yes |
Unified API Usage
All PEFT methods follow a consistent API pattern through get_peft_model:
from transformers import AutoModelForSequenceClassification
from peft import get_peft_model, PromptTuningConfig
config = PromptTuningConfig(
task_type="SEQ_CLS",
num_virtual_tokens=20,
prompt_tuning_init="TEXT",
prompt_tuning_init_text="Classify the sentiment:"
)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased")
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
Sources: docs/source/conceptual_guides/prompting.md
Best Practices
Method Selection Guidelines
- For Large Language Models (>7B parameters): Prompt Tuning, Prefix Tuning, or LoRA variants
- For Image Models: OFT, (IA)³
- For Multi-Task Scenarios: MultiTask Prompt Tuning, X-LoRA
- For Limited Compute: (IA)³, standard Prompt Tuning
- For Maximum Flexibility: AdaLoRA (dynamic rank allocation)
Common Configuration Patterns
from peft import LoraConfig, PromptTuningConfig
# Efficient configuration for most cases
config = LoraConfig(
r=8,
lora_alpha=16,
target_modules=["q_proj", "v_proj"],
lora_dropout=0.05,
task_type="CAUSAL_LM"
)
# For prompt-based methods
config = PromptTuningConfig(
num_virtual_tokens=50,
task_type="SEQ_CLS"
)
Summary
The PEFT library provides a comprehensive suite of fine-tuning methods beyond LoRA and QLoRA. These methods offer diverse trade-offs in terms of parameter efficiency, task performance, and computational requirements. By understanding the mechanisms and use cases of each method, practitioners can select the most appropriate technique for their specific model adaptation needs.
Key takeaways:
- Prompt-based methods modify input representations without changing model weights
- Multiplicative methods such as (IA)³ and OFT scale or rotate existing weights
- Advanced LoRA variants provide dynamic rank and budget optimization
- Most methods support multi-adapter workflows and are exposed through the unified PEFT API
Sources: [src/peft/tuners/prompt_tuning/__init__.py](https://github.com/huggingface/peft/blob/main/src/peft/tuners/prompt_tuning/__init__.py)
Configuration System
Related topics: Core Components, Model Loading and Saving, LoRA and LoRA Variants
Overview
The PEFT (Parameter-Efficient Fine-Tuning) library implements a comprehensive configuration system that enables flexible and modular adapter integration across various transformer architectures. This system decouples adapter-specific parameters from model architecture, allowing users to define fine-tuning strategies through declarative configuration objects.
The configuration system serves as the foundational layer for all PEFT adapters, providing:
- Unified configuration interface across different fine-tuning methods
- Automatic model patching based on target module specifications
- Serialization and deserialization support for model saving/loading
- Multi-adapter management capabilities
graph TD
A[User Configuration] --> B[PeftConfig Subclass]
B --> C{Adapter Type}
C -->|LoRA| D[LoraConfig]
C -->|Prefix| E[PrefixTuningConfig]
C -->|Prompt| F[PromptEncoderConfig]
C -->|IA³| G[Ia3Config]
C -->|Others| H[Tuner-Specific Config]
D --> I[get_peft_model]
E --> I
F --> I
G --> I
H --> I
I --> J[PeftModel Base]
J --> K[BaseTuner.inject_adapter]
K --> L[Model Patching]
Core Components
PeftConfig Base Class
The PeftConfig class is the foundational configuration object in PEFT. It is implemented as a dataclass (building on PeftConfigMixin) and provides the base interface that every adapter configuration extends.
Key Attributes:
| Attribute | Type | Description |
|---|---|---|
peft_type | PeftType | Enum specifying the adapter method |
task_type | TaskType | Enum specifying the ML task type |
inference_mode | bool | Whether model is in inference mode |
auto_mapping | Optional[dict] | Custom auto-mapping for loading |
base_model_name_or_path | str | Path/identifier of base model |
revision | str | Model revision for Hub models |
pad_token_id | Optional[int] | Padding token ID |
Source: src/peft/config.py
PeftType Enumeration
The PeftType enum defines all supported parameter-efficient fine-tuning methods:
| Value | Description |
|---|---|
| LORA | Low-Rank Adaptation |
| PROMPT_TUNING | Soft prompt tuning |
| PREFIX_TUNING | Prefix tuning |
| P_TUNING | P-tuning (prompt encoder) |
| IA3 | Infused Adapter by Inhibiting and Amplifying Inner Activations |
| ADALORA | Adaptive LoRA |
| ADAPTION_PROMPT | Adaption prompt (LLaMA-Adapter style) |
| POLY | Poly (Polytropon) |
| LNTYPOLY | Linear typographic polynomial |
| HRA | Householder Reflection Adaptation |
| GRALORA | Granular Low-Rank Adaptation |
| SHIRA | Sparse High Rank Adapters |
| XLORA | X-LoRA (mixture of LoRA experts) |
| MISS | Multi-Adapter Sparse Structure |
| HIRA | Hierarchical Reattention |
| ADAMSS | Adaptive Subspaces Selection |
Source: src/peft/utils/peft_types.py:1-50
TaskType Enumeration
The TaskType enum specifies the machine learning task type:
| Value | Description |
|---|---|
SEQ_CLS | Sequence Classification |
SEQ_2_SEQ_LM | Sequence-to-Sequence Language Modeling |
CAUSAL_LM | Causal Language Modeling |
TOKEN_CLS | Token Classification |
QUESTION_ANS | Question Answering |
FEATURE_EXTRACTION | Feature Extraction / Embeddings |
MULTIPLE_CHOICE | Multiple Choice |
IMAGE_CLASSIFICATION | Image Classification |
Source: src/peft/utils/peft_types.py:50-80
Tuner-Specific Configurations
LoraConfig
The LoraConfig class configures LoRA (Low-Rank Adaptation) adapters:
| Parameter | Type | Default | Description |
|---|---|---|---|
r | int | 8 | LoRA attention dimension (rank) |
target_modules | Optional[Union[List[str], str]] | None | Modules to apply LoRA to |
lora_alpha | int | 8 | LoRA alpha scaling parameter |
lora_dropout | float | 0.0 | Dropout probability for LoRA layers |
fan_in_fan_out | bool | False | Set to transpose weight (for conv layers) |
bias | str | "none" | Bias type: "none", "all", "lora_only" |
modules_to_save | Optional[List[str]] | None | Modules to make trainable |
init_lora_weights | Union[bool, str] | True | Initialization strategy |
Example Configuration:
from peft import get_peft_config
config = {
"peft_type": "LORA",
"task_type": "CAUSAL_LM",
"r": 16,
"target_modules": ["q_proj", "v_proj"],
"lora_alpha": 32,
"lora_dropout": 0.05,
}
peft_config = get_peft_config(config)
Source: src/peft/tuners/lora/model.py
PrefixTuningConfig
Configuration for prefix-based prompt learning:
| Parameter | Type | Default | Description |
|---|---|---|---|
num_virtual_tokens | int | None | Number of virtual tokens |
token_dim | int | None | Dimensionality of token embeddings |
num_transformer_submodules | int | 1 | Number of transformer modules |
num_attention_heads | int | 12 | Number of attention heads |
num_layers | int | 12 | Number of layers |
encoder_hidden_size | int | None | Encoder hidden size |
prefix_projection | bool | False | Whether to project prefix |
Source: src/peft/peft_model.py
Configuration Loading and Saving
Loading Configurations
The configuration system supports loading from both local paths and Hugging Face Hub:
# From Hub
peft_config = PeftConfig.from_pretrained("user/peft-model")
# From dictionary
peft_config = get_peft_config(config_dict)
# Via mapping
config = PeftConfig.from_pretrained(
model_name_or_path,
**hf_kwargs
)
The from_pretrained method handles:
- Subfolder paths via the subfolder parameter
- Model revisions via the revision parameter
- Authentication tokens via the token or use_auth_token parameters
Source: src/peft/config.py, src/peft/mixed_model.py
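A sketch of loading a configuration from a repository subfolder; the repository id and subfolder name are hypothetical, and the keyword names follow the Hugging Face Hub conventions listed above (exact support may vary by version).
from peft import PeftConfig
peft_config = PeftConfig.from_pretrained(
    "some-user/some-peft-repo",   # hypothetical Hub id
    subfolder="checkpoint-500",   # adapter stored in a subfolder of the repo
    revision="main",
    token=None,                   # pass a token string for private repositories
)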
Saving Configurations
Configurations can be serialized using the standard Hugging Face save_pretrained method:
peft_config.save_pretrained("output-directory")
Auto-Mapping
The auto_mapping parameter enables custom configuration-to-model mappings, particularly useful for custom adapters or third-party integrations:
peft_config = PeftConfig.from_pretrained(
"model-id",
auto_mapping={"custom_key": CustomAdapterClass}
)
Adapter Injection Workflow
sequenceDiagram
participant User
participant PeftModel
participant BaseTuner
participant Config
participant TargetModule
User->>PeftModel: __init__(model, peft_config)
PeftModel->>BaseTuner: inject_adapter(model, adapter_name)
BaseTuner->>Config: Validate peft_config
Config->>Config: Check target_module_compatibility
loop For each target module
BaseTuner->>TargetModule: Identify target layer
BaseTuner->>BaseTuner: _create_and_replace(...)
BaseTuner->>TargetModule: Replace with adapter layer
end
PeftModel-->>User: Ready model
The injection process:
- Validates configuration compatibility with target modules
- Identifies modules matching target_modules patterns
- Creates adapter layers via the _create_and_replace method
- Replaces original modules with adapter wrappers
- Marks appropriate parameters as trainable
Source: src/peft/tuners/tuners_utils.py
Multi-Adapter Configuration
PEFT supports multiple adapters through the adapter naming system:
# Load multiple adapters
peft_model = PeftModel.from_pretrained(
base_model,
"adapter-1-path",
adapter_name="adapter_1"
)
peft_model.load_adapter("adapter-2-path", adapter_name="adapter_2")
# Set active adapter
peft_model.set_adapter("adapter_1")
Each adapter maintains its own configuration accessible via:
peft_model.peft_config["adapter_name"]
Source: src/peft/tuners/tuners_utils.py, src/peft/helpers.py
Integration with Model Types
Model-Specific Configurations
Different model architectures require specific configuration handling:
| Model Type | PeftModel Class | Special Config Parameters |
|---|---|---|
| Causal LM | PeftModelForCausalLM | Standard LoRA/Prefix |
| Seq2Seq | PeftModelForSeq2SeqLM | prepare_inputs_for_generation |
| Seq Classification | PeftModelForSequenceClassification | classifier_module_names |
| Token Classification | PeftModelForTokenClassification | classifier_module_names |
| Question Answering | PeftModelForQuestionAnswering | qa_module_names |
| Feature Extraction | PeftModelForFeatureExtraction | Standard config |
Source: src/peft/peft_model.py
Target Module Mapping
Each tuner type defines a target_module_mapping that specifies compatible layers for different model architectures:
# Example structure in tuners
target_module_mapping = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING
This mapping ensures adapters are only applied to compatible modules (e.g., preventing LoRA application to incompatible modules in Mamba architectures).
Source: src/peft/tuners/lora/model.py, src/peft/tuners/tuners_utils.py
Advanced Configuration Features
Mixed Model Configuration
For models requiring multiple adapter types:
# Load mixed configuration
mixed_model = PeftMixedModel.from_pretrained(
model,
peft_model_id="mixed-peft-model",
config=mixed_config
)
Hotswap Adapters
The hotswap functionality allows runtime adapter replacement:
from peft import hotswap_adapter
hotswap_adapter(
model,
"path-to-new-adapter",
adapter_name="default",
torch_device="cuda:0"
)
Source: src/peft/utils/hotswap.py
Context Manager for Adapter Scaling
Temporarily rescale adapter scaling:
from peft import rescale_adapter_scale
with rescale_adapter_scale(model, multiplier=0.5):
output = model(inputs)
Source: src/peft/helpers.py
Configuration Validation
Target Module Compatibility
The configuration system validates target modules against model architecture:
def _check_target_module_compatiblity(self, peft_config, model, target_name):
_check_lora_target_modules_mamba(peft_config, model, target_name)
This prevents applying adapters to incompatible modules in specific architectures.
Source: src/peft/tuners/tuners_utils.py
PEFT Type Detection
Automatic PEFT type detection from model paths:
peft_type = PeftConfig._get_peft_type(model_name_or_path, **hf_kwargs)
config_cls = PEFT_TYPE_TO_CONFIG_MAPPING[peft_type]
Best Practices
- Always specify task_type: helps PEFT apply the correct model wrapper
- Use target_modules wisely: restricting to key layers reduces memory
- Set inference_mode=False for training: required for gradient computation
- Save the adapter config alongside the weights: ensures reproducibility
- Use modules_to_save sparingly: only for task-specific heads
See Also
Source: https://github.com/huggingface/peft / Human Manual
Model Loading and Saving
Related topics: Core Components, Configuration System, Quantization Integration
Overview
The PEFT (Parameter-Efficient Fine-Tuning) library provides a comprehensive system for loading, saving, and managing adapter-based model configurations. This system enables users to efficiently fine-tune large language models by training only a small subset of parameters while maintaining the ability to save, load, and merge adapters with the base model.
The loading and saving architecture in PEFT is designed to be:
- Interoperable: Adapters can be shared via Hugging Face Hub
- Flexible: Multiple adapters can coexist and be switched dynamically
- Memory-efficient: Supports low CPU memory usage during loading
- Non-destructive: Original base models remain unmodified
Sources: src/peft/tuners/tuners_utils.py:1-50
Architecture
graph TD
A[Base Model] --> B[PeftModel]
B --> C[Adapter 1]
B --> D[Adapter 2]
B --> N[Adapter N]
E[save_pretrained] --> F[adapter_config.json]
E --> G[adapter_model.safetensors]
H[from_pretrained] --> I[Load Base Model]
H --> J[Load Adapter Config]
H --> K[Inject Adapters]
L[merge_and_unload] --> M[Merged Base Model]
L --> N[No Adapters]
O[unload] --> P[Original Base Model]
O --> Q[Adapters Removed]
Loading PEFT Models
Loading from Pretrained
The PeftModel.from_pretrained() class method loads a PEFT model configuration and applies it to a base model:
from transformers import AutoModelForCausalLM
from peft import PeftModel, PeftConfig
# Load PEFT configuration
peft_config = PeftConfig.from_pretrained("path/to/peft_model")
# Load base model
base_model = AutoModelForCausalLM.from_pretrained("base_model_name")
# Create PEFT model with loaded adapters
peft_model = PeftModel.from_pretrained(base_model, "path/to/peft_model")
Using get_peft_model
For creating new PEFT models from scratch:
from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig, TaskType
config = LoraConfig(
task_type=TaskType.CAUSAL_LM,
r=16,
lora_alpha=32,
target_modules=["q_proj", "v_proj"],
lora_dropout=0.05,
)
model = AutoModelForCausalLM.from_pretrained("base_model")
peft_model = get_peft_model(model, config)
Sources: src/peft/peft_model.py:1-100
Loading Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
model | torch.nn.Module | Required | The base model to apply PEFT to |
model_id | str | Required | Path or HF Hub identifier for PEFT checkpoint |
adapter_name | str | "default" | Name for the loaded adapter |
is_trainable | bool | False | Whether adapter should be trainable |
low_cpu_mem_usage | bool | False | Create weights on meta device for faster loading |
torch_dtype | torch.dtype | None | Data type for loaded weights |
device_map | str/dict | None | Device placement strategy |
Sources: src/peft/peft_model.py:100-200
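Putting the parameters above together, a hedged loading example (model names and paths are placeholders):
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("base_model_name", torch_dtype=torch.bfloat16)
peft_model = PeftModel.from_pretrained(
    base_model,
    "path/to/peft_model",
    adapter_name="default",
    is_trainable=True,        # keep the adapter trainable for continued fine-tuning
    low_cpu_mem_usage=True,   # materialize adapter weights lazily for faster loading
)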
Saving PEFT Models
Saving to Disk
The save_pretrained() method saves the PEFT adapter weights and configuration:
peft_model.save_pretrained("output/path")
This creates:
- adapter_config.json - the adapter configuration
- adapter_model.safetensors - the adapter weights
Save Configuration Options
| Parameter | Type | Description |
|---|---|---|
| selected_adapters | List[str] | Specific adapters to save (default: all loaded adapters) |
| safe_serialization | bool | Use the safetensors format (default: True) |
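A short sketch of saving only a subset of adapters; selected_adapters is the keyword used here, and the adapter names are placeholders.
# Assumes adapters named "adapter1" and "adapter2" are loaded on peft_model
peft_model.save_pretrained(
    "output/path",
    safe_serialization=True,         # writes adapter_model.safetensors
    selected_adapters=["adapter1"],  # omit to save every loaded adapter
)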
Merging and Unloading
Merge and Unload
The merge_and_unload() method merges all adapter weights into the base model and returns the combined model:
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("base_model")
peft_model = PeftModel.from_pretrained(base_model, "path/to/peft_model")
# Merge adapters into base model
merged_model = peft_model.merge_and_unload()
This operation:
- Combines adapter weights with base model weights
- Removes PEFT wrapper layers
- Returns a standard HuggingFace model
Sources: src/peft/tuners/tuners_utils.py:1-100
Safe Merge
For secure merging with validation:
merged_model = peft_model.merge_and_unload(safe_merge=True)
Safe merge checks tensor shapes and dtypes before merging to prevent corruption.
Unload
The unload() method removes all PEFT adapters and returns the original base model:
base_model = peft_model.unload()
Unlike merge_and_unload(), this operation:
- Does not modify model weights
- Simply removes PEFT wrapper layers
- Returns the original base model unchanged
graph LR
A[PeftModel] -->|merge_and_unload| B[Merged Base Model]
A -->|unload| C[Original Base Model]
B --> D[Combined Weights]
C --> E[Original Weights Intact]
Sources: src/peft/tuners/tuners_utils.py:100-200
Merge Utilities
The merge_utils.py module provides low-level merging functions:
| Function | Description |
|---|---|
merge_linear_weights | Merges LoRA weights into linear layers |
merge_qkv_weights | Merges QKV attention weights |
merge叠加 | Generic merge operation |
Multi-Adapter Management
Adding Multiple Adapters
PEFT supports loading multiple adapters onto a single base model:
peft_model.load_adapter("path/to/adapter1", adapter_name="adapter1")
peft_model.load_adapter("path/to/adapter2", adapter_name="adapter2")
Switching Active Adapters
# Set active adapter
peft_model.set_adapter("adapter1")
# Merge the active adapter into the base weights for faster inference
peft_model.merge_adapter()
Merging Specific Adapters
# Merge only specific adapters
merged_model = peft_model.merge_and_unload(adapter_names=["adapter1"])
Signature Updates
When using PEFT models with adapters, the model signatures may differ from the base model. PEFT provides utility functions to update signatures:
Update Forward Signature
from peft import update_forward_signature
update_forward_signature(peft_model)
This allows help(peft_model.forward) to show the full signature including parameters from parent classes.
Update Generate Signature
from peft import update_generate_signature
update_generate_signature(peft_model)
Enables help(peft_model.generate) to display the complete generation parameters.
Sources: src/peft/helpers.py:1-100
Checking PEFT Models
Use check_if_peft_model() to verify if a model path contains a PEFT configuration:
from peft import check_if_peft_model
is_peft = check_if_peft_model("path/to/model")
This function:
- Attempts to load an adapter_config.json
- Returns True if a valid PEFT config is found
- Returns False otherwise
Sources: src/peft/helpers.py:100-200
Loading with Quantization
PEFT models can be loaded with quantized base models using BitsAndBytes:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
base_model = AutoModelForCausalLM.from_pretrained(
"model_name",
quantization_config=quantization_config,
)
base_model = prepare_model_for_kbit_training(base_model)
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(base_model, lora_config)
Sources: src/peft/tuners/lora/model.py:1-100
Rescaling Adapter Scale
The rescale_adapter_scale() context manager temporarily adjusts adapter scaling:
from peft import rescale_adapter_scale
with rescale_adapter_scale(model, multiplier=0.5):
output = model(inputs) # Scaled by 0.5
Sources: src/peft/helpers.py:200-300
Workflow Diagram
graph TD
A[Start] --> B{Load Base Model}
B --> C[Load PEFT Config]
C --> D{Existing Adapter?}
D -->|Yes| E[from_pretrained]
D -->|No| F[get_peft_model]
E --> G[PeftModel with Adapters]
F --> H[PeftModel with New Config]
G --> I{Training}
H --> I
I --> J[Train Adapters]
J --> K[save_pretrained]
K --> L[Share via Hub]
I --> M{Inference}
M --> N{Use Merged?}
N -->|Yes| O[merge_and_unload]
N -->|No| P[Use with Adapters]
O --> Q[Merged Model]
P --> R[Forward with Adapters]
Best Practices
- Memory Optimization: Use low_cpu_mem_usage=True when loading large adapters to speed up the process
- Safe Serialization: Always use save_pretrained() with safe_serialization=True (the default) for secure model sharing
- Multiple Adapters: Load adapters with distinct names and switch between them using set_adapter()
- Signature Updates: Call update_forward_signature() and update_generate_signature() for better IDE support
- Quantization: Prepare quantized models with prepare_model_for_kbit_training() before applying PEFT
Sources: [src/peft/tuners/tuners_utils.py:1-50]()
Quantization Integration
Related topics: LoRA and LoRA Variants, Model Loading and Saving, Advanced Features
PEFT (Parameter-Efficient Fine-Tuning) provides comprehensive support for integrating quantized base models with various parameter-efficient fine-tuning methods. This integration enables training large models that would otherwise require prohibitive amounts of memory by combining quantization techniques with PEFT adapters.
Overview
Quantization integration in PEFT allows users to:
- Load base models in quantized form (8-bit, 4-bit, or other formats) to reduce memory footprint
- Apply PEFT adapters (LoRA, IA³, LoHa, LoKr, etc.) on top of quantized layers
- Fine-tune the adapters while keeping the quantized base model frozen
- Maintain model quality while significantly reducing GPU memory requirements
Sources: src/peft/tuners/lora/model.py
Supported Quantization Methods
PEFT supports multiple quantization backends through integration with popular quantization libraries.
| Quantization Method | Backend Library | Precision Options | Status |
|---|---|---|---|
| BitsAndBytes | bitsandbytes | 8-bit, 4-bit | Fully Supported |
| GPTQ | auto-gptq | 4-bit | Fully Supported |
| AWQ | awq | 4-bit | Fully Supported |
| AQLM | aqlm | Mixed-bit | Fully Supported |
| EETQ | eetq | 8-bit | Fully Supported |
| HQQ | hqq | Configurable | Fully Supported |
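As a concrete example of the BitsAndBytes path, the sketch below loads a base model in 4-bit NF4 and attaches a LoRA adapter; the model name and LoRA hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained("model_name", quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)
peft_model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]))
peft_model.print_trainable_parameters()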
Architecture
Quantization Integration Flow
graph TD
A[Base Model Loading] --> B{Quantization Backend}
B -->|bitsandbytes| C[BitsAndBytes 8-bit/4-bit]
B -->|GPTQ| D[GPTQ 4-bit]
B -->|AWQ| E[AWQ 4-bit]
B -->|AQLM| F[AQLM]
B -->|EETQ| G[EETQ 8-bit]
B -->|HQQ| H[HQQ]
C --> I[PEFT Adapter Injection]
D --> I
E --> I
F --> I
G --> I
H --> I
I --> J[LoRA / IA³ / LoHa / LoKr Layers]
J --> K[Fine-tuning with Frozen Quantized Base]
Module Replacement Strategy
When applying PEFT adapters to quantized models, the system replaces specific linear layers with quantized-aware versions that preserve quantization state.
graph LR
A[Original Linear / Quantized Linear] --> B{Is Quantized?}
B -->|Yes - 8-bit| C[Linear8bitLt + Adapter]
B -->|Yes - 4-bit| D[Linear4bit + Adapter]
B -->|No| E[Linear + Adapter]
C --> F[Forward with Quantization]
D --> F
E --> F
BitsAndBytes Integration
The BitsAndBytes integration provides 8-bit and 4-bit quantization support through the bitsandbytes library.
Configuration
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import get_peft_model, LoraConfig
quantization_config = BitsAndBytesConfig(
load_in_8bit=True # or load_in_4bit=True
)
model = AutoModelForCausalLM.from_pretrained(
"model_name",
quantization_config=quantization_config,
)
peft_config = LoraConfig(
r=16,
lora_alpha=32,
target_modules=["q_proj", "v_proj"],
)
peft_model = get_peft_model(model, peft_config)
8-bit Layer Implementation
When loading an 8-bit model, PEFT replaces standard linear layers with Linear8bitLt that inherits quantization state from the base layer:
# From src/peft/tuners/ia3/model.py
if loaded_in_8bit and isinstance(target_base_layer, bnb.nn.Linear8bitLt):
eightbit_kwargs = kwargs.copy()
eightbit_kwargs.update(
{
"has_fp16_weights": target_base_layer.state.has_fp16_weights,
"threshold": target_base_layer.state.threshold,
"index": target_base_layer.index,
}
)
Sources: src/peft/tuners/ia3/model.py:40-49
4-bit Layer Implementation
Similarly, 4-bit quantized layers are handled with Linear4bit:
if loaded_in_4bit and isinstance(target_base_layer, bnb.nn.Linear4bit):
fourbit_kwargs = kwargs.copy()
fourbit_kwargs.update(
{
"quant_type": target_base_layer.quant_type,
"compute_dtype": target_base_layer.compute_dtype,
"compress_statistics": target_base_layer.weight._quantize_state,
}
)
Sources: src/peft/tuners/ia3/model.py:50-56
Preparing Quantized Models for Training
PEFT provides the prepare_model_for_kbit_training utility function to prepare quantized models for training with PEFT adapters.
Function Signature
def prepare_model_for_kbit_training(
model,
use_gradient_checkpointing: bool = True,
layer_replication: Optional[List[Tuple[int, int]]] = None,
):
Sources: src/peft/helpers.py
Key Operations
- Gradient Checkpointing: Enables gradient checkpointing to save memory during backpropagation
- Layer Replication: Optionally replicates layers for certain architectures
- Cast Forward Parameters: Ensures proper dtype handling for training
Usage Example
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training
int8_config = BitsAndBytesConfig(load_in_8bit=True)
# After loading quantized model
model = AutoModelForCausalLM.from_pretrained(
"mistralai/Mistral-7B-Instruct-v0.1",
quantization_config=int8_config,
device_map="cuda:0",
)
# Prepare for k-bit training
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
Supported Tuners with Quantization
All major PEFT tuners support integration with quantized base models:
| Tuner | 8-bit Support | 4-bit Support | File Location |
|---|---|---|---|
| LoRA | ✅ | ✅ | src/peft/tuners/lora/ |
| IA³ | ✅ | ✅ | src/peft/tuners/ia3/ |
| LoHa | ✅ | ✅ | src/peft/tuners/loha/ |
| LoKr | ✅ | ✅ | src/peft/tuners/lokr/ |
| AdaLoRA | ✅ | ✅ | src/peft/tuners/adalora/ |
| OALoRA | ✅ | ✅ | src/peft/tuners/oaloora/ |
Layer Class Mappings
Each tuner defines specific layer mappings for different layer types:
# From src/peft/tuners/lokr/model.py
layers_mapping: dict[type[torch.nn.Module], type[LoKrLayer]] = {
torch.nn.Conv2d: Conv2d,
torch.nn.Conv1d: Conv1d,
torch.nn.Linear: Linear,
}
# From src/peft/tuners/loha/model.py
layers_mapping: dict[type[torch.nn.Module], type[LoHaLayer]] = {
torch.nn.Conv2d: Conv2d,
torch.nn.Conv1d: Conv1d,
torch.nn.Linear: Linear,
}
Sources: src/peft/tuners/lokr/model.py:87-90 Sources: src/peft/tuners/loha/model.py:79-82
Base Tuner Layer Properties
All quantized-aware tuner layers inherit from BaseTunerLayer which provides key functionality:
Key Methods
| Method | Purpose |
|---|---|
get_base_layer() | Retrieves the underlying base layer (quantized or not) |
update_layer() | Updates adapter weights for existing layers |
merge() | Merges adapter weights into base layer |
unmerge() | Separates merged adapter weights |
if isinstance(target, BaseTunerLayer):
target_base_layer = target.get_base_layer()
else:
target_base_layer = target
Sources: src/peft/tuners/ia3/model.py:34-37
Adapter Management with Quantization
Creating New Modules
When creating new adapter modules for quantized layers:
- Detect the quantization state from the base layer
- Preserve quantization parameters (thresholds, compute dtype, etc.)
- Create appropriate quantized-aware adapter layer
sequenceDiagram
participant Base as Base Model (Quantized)
participant PEFT as PEFT System
participant Adapter as Adapter Layer
Base->>PEFT: Target Linear Layer
PEFT->>PEFT: Detect 8-bit / 4-bit quantization
PEFT->>Adapter: Create with quantization state
Adapter->>Base: Store reference + quantization params
Multiple Adapters
PEFT supports multiple adapters on quantized models through the active_adapters mechanism:
# Adding additional adapters to quantized model
if adapter_name not in self.active_adapters:
# adding an additional adapter: it is not automatically trainable
new_module.requires_grad_(False)
Sources: src/peft/tuners/loha/model.py:1 Sources: src/peft/tuners/lokr/model.py:1
Memory Efficiency Considerations
Memory Breakdown
| Component | Full Precision | 8-bit | 4-bit |
|---|---|---|---|
| Base Model | ~70GB | ~35GB | ~18GB |
| Gradients | ~70GB | ~70GB | ~70GB |
| Activations | Variable | Variable | Variable |
| Optimizer | ~280GB | ~280GB | ~280GB |
Best Practices
- Use Gradient Checkpointing: Reduces activation memory at cost of extra compute
- Target Specific Modules: Only apply adapters to key layers (q_proj, v_proj)
- Batch Size: Start with small batch sizes and scale based on available memory
- Mixed Precision: Use bfloat16 for gradients when possible
Context Manager for Adapter Scaling
PEFT provides rescale_adapter_scale for temporarily adjusting adapter scaling:
@contextmanager
def rescale_adapter_scale(model, multiplier):
"""
Context manager to temporarily rescale the scaling of the LoRA adapter.
The original scaling values are restored when the context manager exits.
"""
Sources: src/peft/helpers.py:80-90
Error Handling
Common Issues
| Error | Cause | Solution |
|---|---|---|
| TypeError on forward | Quantization state not preserved | Ensure proper layer replacement |
| OOM during forward | Batch size too large | Reduce batch size, use gradient checkpointing |
| Mismatched dtypes | Mixed precision issues | Cast to consistent dtype before training |
Verification Steps
- Verify quantization config is properly set
- Confirm adapter layers are correctly injected
- Check that gradient checkpointing is enabled for large models
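A minimal sanity-check sketch for the steps above, assuming a LoRA adapter has already been applied to peft_model:
# Should report a small trainable fraction (typically well under 1%)
peft_model.print_trainable_parameters()

# Count injected LoRA sub-modules by name
lora_modules = [name for name, _ in peft_model.named_modules() if "lora_" in name]
print(f"injected LoRA sub-modules: {len(lora_modules)}")

# Confirm gradient checkpointing on the underlying transformers model
print("gradient checkpointing:", peft_model.get_base_model().is_gradient_checkpointing)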
Configuration Reference
BitsAndBytesConfig Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| load_in_8bit | bool | False | Load the model in 8-bit |
| load_in_4bit | bool | False | Load the model in 4-bit |
| llm_int8_threshold | float | 6.0 | Outlier threshold for 8-bit |
| llm_int8_skip_modules | List | None | Modules to skip during 8-bit conversion |
| llm_int8_enable_fp32_cpu_offload | bool | False | Enable CPU offload for 32-bit tensors |
See Also
Sources: [src/peft/tuners/lora/model.py]()
Advanced Features
Related topics: Quantization Integration
PEFT (Parameter-Efficient Fine-Tuning) provides a comprehensive suite of advanced features that extend beyond basic adapter-based fine-tuning. These features enable sophisticated model adaptation strategies, including mixed adapter configurations, runtime adapter switching, distributed training support, and advanced optimization techniques.
Mixed Adapter Models
Mixed adapter models allow multiple adapter types to coexist within a single base model. This powerful feature enables combining different fine-tuning techniques to leverage their respective strengths.
Overview
The mixed model architecture in PEFT allows a base model to have multiple adapters of different types applied simultaneously. This is particularly useful when different adapters excel at different aspects of a task, or when you want to experiment with combining adapter strengths.
The mixed model functionality is implemented across two primary modules:
| Module | File Path | Purpose |
|---|---|---|
PeftMixedModel | src/peft/mixed_model.py | Base mixed model class |
MixedModel | src/peft/tuners/mixed/model.py | Tuner-specific mixed model implementation |
Architecture
graph TD
A[Base Model] --> B[Mixed Adapter Layer]
B --> C[LoRA Adapter]
B --> D[IA³ Adapter]
B --> E[AdaLoRA Adapter]
B --> N[Additional Adapters]
F[Adapter Config 1] --> C
G[Adapter Config 2] --> D
H[Adapter Config 3] --> E
I[Active Adapter Selection] --> B
J[Multi-Adapter Inference] --> B
Supported Adapter Combinations
PEFT supports multiple tuner types that can be combined in mixed configurations:
| Tuner Type | Prefix | Description |
|---|---|---|
| LoRA | lora_ | Low-Rank Adaptation |
| AdaLoRA | adalora_ | Adaptive LoRA with budget allocation |
| IA³ | ia3_ | (IA)³ - Learnable input/output/residual scaling |
| OFT | oft_ | Orthogonal Fine-Tuning |
| HRA | hra_ | Householder Reflection Adaptation |
| HiRA | hira_ | Hierarchical Rank Adaptation |
| SHiRA | shira_ | Sparse High Rank Adapters |
| GraLoRA | gralora_ | Granular Low-Rank Adaptation |
| MiSS | miss_ | Multi-adapter Image-to-Image Spatial Shift |
| AdaMSS | adamss_ | Adaptive Multi-subspace Schur Complement |
| X-LoRA | xlora_ | Mixture of LoRA experts with dynamic adapter routing |
| Poly | poly_ | Polynomial projection-based adaptation |
Key Implementation Details
Each tuner in PEFT defines specific attributes that enable mixed adapter support:
# Common tuner model attributes
prefix: str # Unique prefix for the tuner (e.g., "lora_", "ia3_")
tuner_layer_cls = SpecificLayerClass # The layer class for this tuner
target_module_mapping = {...} # Mapping of model types to target modules
The mixed model implementation handles adapter creation through the _create_and_replace method, which validates the current key and delegates to appropriate adapter-specific logic.
Sources: src/peft/tuners/shira/model.py:1-50 Sources: src/peft/tuners/mixed/model.py
Adapter Hotswap
The hotswap feature enables runtime replacement of adapters without requiring full model reload. This is essential for production environments where model availability must be maintained during adapter updates.
Purpose
Adapter hotswapping allows you to:
- Replace a deployed adapter with an updated version
- Switch between different fine-tuned adapters for different tasks
- Update model capabilities without downtime
- A/B test different adapter versions in production
Implementation
The hotswap functionality is implemented in src/peft/utils/hotswap.py and provides the hotswap_adapter function for runtime adapter replacement.
def hotswap_adapter(
model: "PeftModel",
model_name_or_path: str,
adapter_name: str,
torch_device: Optional[str] = None,
**kwargs,
) -> None:
Parameters
| Parameter | Type | Description |
|---|---|---|
| model | PeftModel | The PEFT model with the loaded adapter |
| model_name_or_path | str | Path or identifier for the new adapter |
| adapter_name | str | Name of the adapter to replace (e.g., "default") |
| torch_device | str, optional | Target device for adapter weights |
| **kwargs | — | Additional arguments for config/weight loading |
Workflow
graph TD
A[Load New Adapter Config] --> B[Validate Adapter Type]
B --> C[Load Adapter Weights to Device]
C --> D[Validate Weight Compatibility]
D --> E[Replace Adapter Weights in Model]
E --> F[Update Model State]
F --> G[Model Ready for Inference]
H[Inference with New Adapter] -.-> G
Usage Example
from peft import hotswap_adapter
# Replace the "default" lora adapter with a new one
hotswap_adapter(model, "path-to-new-adapter", adapter_name="default", torch_device="cuda:0")
# Use the updated model
with torch.inference_mode():
output = model(inputs).logits
Configuration Validation
During hotswap, the system performs several validations:
- Config Loading: Loads the new adapter configuration using config_cls.from_pretrained()
- Type Matching: Ensures the new adapter type is compatible with existing adapters
- Weight Loading: Loads weights onto the specified device with the appropriate quantization settings
Sources: src/peft/utils/hotswap.py:1-80 Sources: docs/source/developer_guides/checkpoint.md
Incremental PCA Utilities
PEFT includes incremental PCA utilities for advanced analysis and optimization of adapter matrices. Incremental PCA is particularly useful for:
- Analyzing the rank structure of trained adapters
- Identifying redundant parameters in low-rank adaptations
- Computing principal components in a memory-efficient manner
Implementation
The incremental PCA implementation is located in src/peft/utils/incremental_pca.py. This utility supports processing large matrices in batches to avoid memory constraints.
Key Features
| Feature | Description |
|---|---|
| Batch Processing | Process large matrices incrementally |
| Memory Efficiency | Avoid loading entire matrices into memory |
| Rank Analysis | Determine effective rank of adapter matrices |
| Component Extraction | Extract principal components for analysis |
Use Cases
- Adapter Analysis: Understand the dimensionality requirements of trained adapters
- Compression: Identify opportunities for matrix rank reduction
- Quality Assessment: Verify that low-rank approximations maintain sufficient information
Sources: src/peft/utils/incremental_pca.py
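The sketch below illustrates the kind of rank analysis described above using plain torch SVD on a synthetic LoRA update; it does not use PEFT's incremental_pca utility (whose exact API is not documented here), and the tensors are random stand-ins.
import torch

lora_A = torch.randn(8, 4096)   # stand-in for a trained lora_A weight (r x in_features)
lora_B = torch.randn(4096, 8)   # stand-in for a trained lora_B weight (out_features x r)
delta_w = lora_B @ lora_A       # effective weight update applied by the adapter

singular_values = torch.linalg.svdvals(delta_w)
energy = torch.cumsum(singular_values**2, dim=0) / torch.sum(singular_values**2)
effective_rank = int(torch.searchsorted(energy, torch.tensor(0.99))) + 1
print(f"components covering 99% of the update's energy: {effective_rank}")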
Distributed Training Support
PEFT provides comprehensive support for distributed training frameworks, enabling efficient fine-tuning of large models across multiple devices and nodes.
DeepSpeed Integration
PEFT integrates with DeepSpeed ZeRO optimizations for memory-efficient distributed training.
#### Features
| Feature | Description |
|---|---|
| ZeRO Stage 2/3 | Partition optimizer states across devices |
| CPU Offload | Offload parameters/optimizer states to CPU |
| Activation Checkpointing | Reduce memory for activations |
| Mixed Precision | FP16/BF16 training support |
#### Configuration
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
peft_config = LoraConfig(
r=16,
lora_alpha=32,
target_modules=["q_proj", "v_proj"],
lora_dropout=0.05,
)
peft_model = get_peft_model(model, peft_config)
# Train with DeepSpeed ZeRO-3 config
#### Key Considerations
- Only non-trainable weights should remain on the original device when using PEFT with DeepSpeed
- Trainable adapter weights are managed by DeepSpeed's optimizer partitioning
- Offloading should be configured at the DeepSpeed level, not within PEFT configs
Sources: docs/source/accelerate/deepspeed.md
FSDP Integration
Fully Sharded Data Parallel (FSDP) support enables sharding model parameters, gradients, and optimizer states across GPUs.
#### Features
| Feature | Description |
|---|---|
| Parameter Sharding | Distribute model parameters across GPUs |
| Gradient Sharding | Partition gradients during backward pass |
| Optimizer Sharding | Distribute optimizer states |
| Mixed Precision | Automatic FP16/BF16 handling |
#### Configuration with Accelerate
# accelerate config.yaml
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
fsdp_sharding_strategy: FULL_SHARD
fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
fsdp_backward_prefetch: BACKWARD_PRE
fsdp_state_dict_type: FULL_STATE_DICT
#### Compatibility Notes
- FSDP support requires
transformers>=4.36.0 - Auto-wrap policies should wrap transformer layers containing PEFT adapters
- State dict type should be
FULL_STATE_DICTfor checkpoint saving
Sources: docs/source/accelerate/fsdp.md
Advanced Tuner Configurations
AdaLoRA - Adaptive Budget Allocation
AdaLoRA implements an intelligent budget allocation strategy that dynamically adjusts the rank of different adapter matrices during training.
#### Training Workflow
graph TD
A[Initialize with Uniform Rank] --> B[Forward Pass]
B --> C[Calculate Importance Scores]
C --> D{Global Step < Total - T_final?}
D -->|Yes| E[Update Rank Pattern]
E --> B
D -->|No| F{Mask Unimportant Weights}
F --> G[Finalize Adapter]
#### Key Parameters
| Parameter | Description |
|---|---|
| r | Initial rank for all adapters |
| total_step | Total training steps |
| tinit | Steps for the initial warmup |
| tfinal | Steps for final budget freezing |
| deltaT | Interval between rank adjustments |
Sources: src/peft/tuners/adalora/model.py:1-100
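A schematic training loop showing where update_and_allocate fits; the base model, dataloader, and optimizer are assumed to exist, and the configuration values are illustrative.
from peft import AdaLoraConfig, get_peft_model

config = AdaLoraConfig(
    init_r=12, target_r=4, tinit=200, tfinal=500, deltaT=10,
    total_step=3000, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base_model, config)   # base_model assumed to be loaded already

for step, batch in enumerate(dataloader):         # dataloader/optimizer assumed to exist
    loss = peft_model(**batch).loss
    loss.backward()
    optimizer.step()
    peft_model.base_model.update_and_allocate(step)  # reallocate the rank budget
    optimizer.zero_grad()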
X-LoRA - Extended LoRA with Quantization
X-LoRA supports advanced configurations including quantization-aware training and multi-adapter loading.
#### Features
| Feature | Description |
|---|---|
| 8-bit Quantization | Load base models in int8 format |
| 4-bit Quantization | Load base models in int4 format |
| Flash Attention | Integration with flash_attention_2 |
| Ephemeral GPU Offload | Temporary GPU memory management |
| Multiple Adapter Loading | Load multiple adapters simultaneously |
#### Configuration
from peft import XLoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
"mistralai/Mistral-7B-Instruct-v0.1",
quantization_config=quantization_config,
device_map="cuda:0",
)
# config is an XLoraConfig instance (construction omitted) that lists the adapters to route between
xlora_model = get_peft_model(model, config)
Sources: src/peft/tuners/xlora/model.py:1-80
IA³ - Infused Adapter by Inhibiting and Amplifying Inner Activations
The (IA)³ method applies learnable scaling vectors to key components of transformer models.
#### Target Modules
| Model Type | Target Modules |
|---|---|
| Encoder-only | q_proj, v_proj, k_proj, output_proj |
| Decoder-only | q_proj, v_proj, k_proj, output_proj, fc1 |
| Seq2Seq | q_proj, v_proj, k_proj, output_proj, fc1, fc2 |
#### Implementation Details
The IA³ implementation creates scaling vectors that are multiplied with the hidden states at specific positions in the forward pass. The scaling vectors are initialized to ones (neutral) and learned during training.
Sources: src/peft/tuners/ia3/model.py:1-80
Helper Functions
PEFT provides utility functions for common operations that enhance the developer experience.
Signature Management
#### update_forward_signature
Updates the forward signature of a PeftModel to include the base model's signature, enabling proper IDE autocompletion and documentation.
from peft import update_forward_signature
update_forward_signature(peft_model)
help(peft_model.forward) # Now shows complete signature
#### update_generate_signature
Similar to forward signature update but for the generate method, essential for seq2seq models.
from peft import update_generate_signature
update_generate_signature(peft_model)
help(peft_model.generate) # Now shows complete signature
Model Validation
#### check_if_peft_model
Validates whether a model path or identifier corresponds to a PEFT model by attempting to load its configuration.
from peft import check_if_peft_model
is_peft = check_if_peft_model("meta-llama/Llama-2-7b-adapter")
# Returns: True or False
Adapter Scale Context Manager
The rescale_adapter_scale context manager temporarily adjusts adapter scaling factors, useful for controlled inference experiments.
from peft.utils import rescale_adapter_scale
with rescale_adapter_scale(model, multiplier=0.5):
output = model(inputs) # Scaled by 0.5
# Original scaling restored after context exit
Sources: src/peft/helpers.py:1-150
Task-Specific Models
PEFT provides specialized model classes optimized for different task types.
| Task Type | Model Class | Use Case |
|---|---|---|
| Feature Extraction | PeftModelForFeatureExtraction | Extracting embeddings |
| Question Answering | PeftModelForQuestionAnswering | QA tasks |
| Sequence Classification | PeftModelForSequenceClassification | Text classification |
| Token Classification | PeftModelForTokenClassification | NER, POS tagging |
| Seq2Seq LM | PeftModelForSeq2SeqLM | Translation, summarization |
Common Initialization Pattern
All task-specific models follow a consistent initialization pattern:
def __init__(
self,
model: torch.nn.Module,
peft_config: PeftConfig,
adapter_name: str = "default",
**kwargs,
) -> None:
super().__init__(model, peft_config, adapter_name, **kwargs)
Each model class may add task-specific module name patterns for modules to save (e.g., classifier layers in sequence classification models).
Sources: src/peft/peft_model.py:1-200
Summary
PEFT's advanced features provide a comprehensive toolkit for parameter-efficient model adaptation:
| Category | Features |
|---|---|
| Mixed Adapters | Multiple adapter types per model |
| Runtime Switching | Adapter hotswap without reload |
| Analysis Tools | Incremental PCA for matrix analysis |
| Distributed Training | DeepSpeed ZeRO, FSDP support |
| Advanced Tuners | AdaLoRA, X-LoRA, IA³, OFT, and more |
| Developer Utilities | Signature management, validation helpers |
These features enable both research experimentation and production deployment of efficient fine-tuning solutions across a wide range of model architectures and training configurations.
Sources: [src/peft/tuners/shira/model.py:1-50](https://github.com/huggingface/peft/blob/main/src/peft/tuners/shira/model.py)
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Doramagic extracted 16 source-linked risk signals. Review them before installing or handing real data to the project.
1. Configuration risk: [BUG] peft 0.19 target_modules (str) use `set`
- Severity: high
- Finding: Configuration risk is backed by a source signal: [BUG] peft 0.19 target_modules (str) use `set`. Treat it as a review item until the current version is checked.
- User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/peft/issues/3229
2. Security or permission risk: Comparison of Different Fine-Tuning Techniques for Conversational AI
- Severity: high
- Finding: Security or permission risk is backed by a source signal: Comparison of Different Fine-Tuning Techniques for Conversational AI. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/peft/issues/2310
3. Installation risk: Feature Request: Improve offline support for custom architectures in get_peft_model_state_dict
- Severity: medium
- Finding: Installation risk is backed by a source signal: Feature Request: Improve offline support for custom architectures in get_peft_model_state_dict. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/peft/issues/3211
4. Configuration risk: 0.17.0: SHiRA, MiSS, LoRA for MoE, and more
- Severity: medium
- Finding: Configuration risk is backed by a source signal: 0.17.0: SHiRA, MiSS, LoRA for MoE, and more. Treat it as a review item until the current version is checked.
- User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/peft/releases/tag/v0.17.0
5. Configuration risk: Applying Dora to o_proj of Meta-Llama-3.1-8B results in NaN
- Severity: medium
- Finding: Configuration risk is backed by a source signal: Applying Dora to o_proj of Meta-Llama-3.1-8B results in NaN. Treat it as a review item until the current version is checked.
- User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/peft/issues/2049
6. Capability assumption: README/documentation is current enough for a first validation pass.
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: capability.assumptions | github_repo:570384908 | https://github.com/huggingface/peft | README/documentation is current enough for a first validation pass.
7. Project risk: 0.17.1
- Severity: medium
- Finding: Project risk is backed by a source signal: 0.17.1. Treat it as a review item until the current version is checked.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/peft/releases/tag/v0.17.1
8. Project risk: v0.15.1
- Severity: medium
- Finding: Project risk is backed by a source signal: v0.15.1. Treat it as a review item until the current version is checked.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/peft/releases/tag/v0.15.1
9. Project risk: v0.15.2
- Severity: medium
- Finding: Project risk is backed by a source signal: v0.15.2. Treat it as a review item until the current version is checked.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/peft/releases/tag/v0.15.2
10. Maintenance risk: 0.16.0: LoRA-FA, RandLoRA, C³A, and much more
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: 0.16.0: LoRA-FA, RandLoRA, C³A, and much more. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/peft/releases/tag/v0.16.0
11. Maintenance risk: Maintainer activity is unknown
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:570384908 | https://github.com/huggingface/peft | last_activity_observed missing
12. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: downstream_validation.risk_items | github_repo:570384908 | https://github.com/huggingface/peft | no_demo; severity=medium
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using peft with real data or production workflows.
- Feature Request: Improve offline support for custom architectures in get_peft_model_state_dict - github / github_issue
- Applying Dora to o_proj of Meta-Llama-3.1-8B results in NaN - github / github_issue
- Comparison of Different Fine-Tuning Techniques for Conversational AI - github / github_issue
- [[BUG] peft 0.19 target_modules (str) use `set`](https://github.com/huggingface/peft/issues/3229) - github / github_issue
- v0.19.1 - github / github_release
- v0.19.0 - github / github_release
- 0.18.1 - github / github_release
- 0.18.0: RoAd, ALoRA, Arrow, WaveFT, DeLoRA, OSF, and more - github / github_release
- 0.17.1 - github / github_release
- 0.17.0: SHiRA, MiSS, LoRA for MoE, and more - github / github_release
- 0.16.0: LoRA-FA, RandLoRA, C³A, and much more - github / github_release
- v0.15.2 - github / github_release
Source: Project Pack community evidence and pitfall evidence