Doramagic Project Pack · Human Manual

Introduction to PEFT

Related topics: Installation Guide, System Architecture, LoRA and LoRA Variants

Overview

PEFT (Parameter-Efficient Fine-Tuning) is a Python library developed by Hugging Face that provides efficient methods for fine-tuning pre-trained models while keeping most model parameters frozen. This approach significantly reduces computational costs and memory requirements compared to full fine-tuning, making it possible to work with large language models on limited hardware resources.

The library supports multiple fine-tuning techniques including LoRA, Prefix Tuning, Prompt Tuning, AdaLoRA, QLoRA, and many other parameter-efficient methods. PEFT is designed to integrate seamlessly with the Hugging Face Transformers ecosystem, allowing users to apply adapter-based fine-tuning with minimal code changes.

Sources: src/peft/tuners/lora/model.py:1-50

Core Architecture

Design Philosophy

PEFT follows an adapter-based architecture where lightweight trainable modules are added to pre-trained models. These adapters contain a small fraction of the total model parameters, typically ranging from 0.1% to 5% of the original model size, depending on the configuration.

The core principles of PEFT's architecture include:

  • Modularity: Each fine-tuning method is implemented as a separate "tuner" with its own configuration class
  • Composability: Multiple adapters can be loaded and used simultaneously
  • Compatibility: Full integration with Hugging Face Transformers and Diffusers
  • Memory Efficiency: Support for quantization and CPU offloading strategies

Sources: src/peft/tuners/tuners_utils.py:1-30
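
As a rough illustration of that parameter ratio, a minimal sketch (gpt2 is used purely as an example base model; the exact percentage depends on the architecture and rank):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")
peft_model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=8))

# Reports trainable vs. total parameters; for gpt2 with r=8 this is on the order of 0.2%
peft_model.print_trainable_parameters()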

Component Hierarchy

graph TD
    A[PeftModel] --> B[BaseTuner]
    B --> C[Model Specific Tuners]
    C --> D[LoraModel]
    C --> E[PrefixTuningModel]
    C --> F[PromptTuningModel]
    C --> G[AdaLoRAModel]
    C --> H[QLoRAModel]
    C --> I[XLoraModel]
    C --> J[HiraModel]
    C --> K[GraloraModel]
    C --> L[AdamssModel]

Supported Fine-Tuning Methods

PEFT provides implementations for various parameter-efficient fine-tuning techniques. Each method has its own configuration class and model wrapper.

| Method | Configuration Class | Description |
|---|---|---|
| LoRA | LoraConfig | Low-Rank Adaptation using rank-decomposition matrices |
| Prefix Tuning | PrefixTuningConfig | Optimizes continuous prompts prepended to layer inputs |
| Prompt Tuning | PromptTuningConfig | Trains soft prompts embedded in the input layer |
| P-Tuning | PromptEncoderConfig | Uses trainable prompt embeddings with optional LSTM/MLP |
| AdaLoRA | AdaLoraConfig | Adaptive LoRA with dynamic rank allocation |
| QLoRA | LoraConfig | LoRA applied on top of a quantized base model |
| IA³ | IA3Config | Infused Adapter by Inhibiting and Amplifying Inner Activations |
| Multi Adapter | MultiAdapterConfig | Combines multiple adapters |
| LoHa | LoHaConfig | Low-rank Hadamard product adaptation |
| LoKr | LoKrConfig | Low-rank Kronecker product adaptation |
| AdaLoKr | AdaLoKrConfig | Adaptive LoKr with dynamic rank allocation |
| OFT | OFTConfig | Orthogonal Fine-Tuning |
| BOFT | BOFTConfig | OFT with butterfly factorization |
| VeRA | VeraConfig | Vector-based Random Matrix Adaptation |
| X-LoRA | XLoraConfig | Mixture of LoRA experts with learned gating |
| HiRA | HiraConfig | High-rank adaptation via Hadamard products |
| GraLoRA | GraloraConfig | Granular low-rank adaptation |
| Adamss | AdamssConfig | Adaptive subspace-efficient fine-tuning |
| SHiRA | ShiraConfig | Sparse high-rank adaptation |
| LND | LNDConfig | Layer-wise Normalization Distribution |
| Loralite | LoraliteConfig | Lightweight LoRA variant |

Sources: src/peft/tuners/lora/model.py:1-80

Task Types

PEFT supports various NLP task types through specialized model classes. Each task type is designed for specific downstream applications.

graph LR
    A[Base Model] --> B[PeftModel]
    B --> C{Task Type}
    C --> D[CAUSAL_LM]
    C --> E[SEQ_2_SEQ_LM]
    C --> F[FEATURE_EXTRACTION]
    C --> G[QUESTION_ANS]
    C --> H[SEQ_CLS]
    C --> I[TOKEN_CLS]
    C --> J[IMAGE_CLS]

Task-Specific Models

| Task Type | Model Class | Use Case |
|---|---|---|
| CAUSAL_LM | PeftModelForCausalLM | Autoregressive text generation |
| SEQ_2_SEQ_LM | PeftModelForSeq2SeqLM | Encoder-decoder tasks (translation, summarization) |
| FEATURE_EXTRACTION | PeftModelForFeatureExtraction | Embedding extraction |
| QUESTION_ANS | PeftModelForQuestionAnswering | Question answering tasks |
| SEQ_CLS | PeftModelForSequenceClassification | Text classification |
| TOKEN_CLS | PeftModelForTokenClassification | Named entity recognition, POS tagging |

Sources: src/peft/peft_model.py:1-100
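
For example, get_peft_model selects the matching task-specific class from the config's task_type field; a minimal sketch using bert-base-uncased as a stand-in classifier:

from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
config = LoraConfig(task_type="SEQ_CLS", r=8, target_modules=["query", "value"])
peft_model = get_peft_model(base, config)
print(type(peft_model).__name__)  # PeftModelForSequenceClassification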

Core API

PeftModel Class

The PeftModel is the base class for all PEFT models. It wraps a pre-trained model and manages adapter injection, loading, and merging.

#### Key Methods

| Method | Description |
|---|---|
| from_pretrained(model, model_id, adapter_name, ...) | Load a PEFT model from pretrained weights |
| get_peft_config(adapter_name) | Get the configuration for a specific adapter |
| print_trainable_parameters() | Display trainable vs. total parameter counts |
| merge_and_unload(progressbar, safe_merge, adapter_names) | Merge adapters into the base model |
| unload() | Return the base model without PEFT modules |
| set_adapter(adapter_name) | Activate a specific adapter |
| add_weighted_adapter(adapter_names, weights, combination_type) | Combine multiple adapters |

Sources: src/peft/peft_model.py:100-200
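
A short sketch exercising set_adapter and add_weighted_adapter on a model with two LoRA adapters already loaded (the adapter names are placeholders):

# Activate a single adapter
peft_model.set_adapter("adapter1")

# Combine two LoRA adapters into a new one by linear interpolation
peft_model.add_weighted_adapter(
    adapters=["adapter1", "adapter2"],
    weights=[0.7, 0.3],
    adapter_name="combined",
    combination_type="linear",
)
peft_model.set_adapter("combined")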

Loading Pre-trained Adapters

The from_pretrained class method loads PEFT adapters from the Hugging Face Hub or local storage:

from transformers import AutoModelForCausalLM
from peft import PeftModel, PeftConfig

# Load configuration
config = PeftConfig.from_pretrained("user/peft-model")

# Load the base model referenced by the adapter configuration
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)

# Create PEFT model with loaded adapter
peft_model = PeftModel.from_pretrained(
    base_model, 
    "user/peft-model",
    adapter_name="default",
    is_trainable=False,
    autocast_adapter_dtype=True
)

Sources: src/peft/peft_model.py:200-280

Merging and Unloading

PEFT models support merging adapters back into the base model for inference:

# Merge and unload to get a standalone model
merged_model = peft_model.merge_and_unload()

# Safe merge: check the merged weights for NaNs before committing
merged_model = peft_model.merge_and_unload(safe_merge=True)

# Merge specific adapters only
merged_model = peft_model.merge_and_unload(adapter_names=["adapter1", "adapter2"])

# Unload without merging
base_model = peft_model.unload()

Sources: src/peft/tuners/tuners_utils.py:50-100

Adapter Management

Multi-Adapter Support

PEFT supports loading and managing multiple adapters simultaneously. This is useful for ensemble methods or when combining adapters trained on different tasks.

# Load multiple adapters via X-LoRA
from peft import XLoraConfig, get_peft_model

adapters = {
    "adapter_1": "./path/to/adapter-1",
    "adapter_2": "./path/to/adapter-2",
}

xlora_config = XLoraConfig(adapters=adapters)
model = get_peft_model(base_model, xlora_config)

Sources: src/peft/tuners/xlora/model.py:1-50

Hotswap Adapter

The hotswap functionality allows replacing loaded adapters without reloading the entire model:

from peft.utils.hotswap import hotswap_adapter

# Replace the default adapter with a new one
hotswap_adapter(
    model, 
    "path-to-new-adapter", 
    adapter_name="default",
    torch_device="cuda:0"
)

This operation validates the new adapter configuration and swaps the weights while maintaining the model structure.

Sources: src/peft/utils/hotswap.py:1-80

Configuration Options

Common Parameters

Most PEFT configuration classes share common parameters that control the fine-tuning behavior:

| Parameter | Type | Default | Description |
|---|---|---|---|
| r | int | 8 | LoRA rank dimension |
| lora_alpha | int | 8 | LoRA scaling factor |
| lora_dropout | float | 0.0 | Dropout probability for LoRA layers |
| target_modules | List[str] | None | Names of modules to apply adaptation to |
| bias | str | "none" | Bias handling: "none", "all", "lora_only" |
| modules_to_save | List[str] | None | Additional trainable modules |
| fan_in_fan_out | bool | False | Transpose weights for certain architectures |

Method-Specific Parameters

#### LoRA Configuration

from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "out_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

#### Prefix Tuning Configuration

from peft import PrefixTuningConfig

config = PrefixTuningConfig(
    num_virtual_tokens=20,
    token_dim=768,
    num_transformer_submodules=1,
    num_attention_heads=12,
    num_layers=12,
    encoder_hidden_size=768,
    prefix_projection=False
)

Sources: src/peft/tuners/lora/model.py:50-150

Advanced Features

Dynamic Rank Allocation

Some PEFT methods support adaptive rank allocation, where the importance of different layers is evaluated during training:

# Adaptive LoRA with dynamic rank allocation
from peft import AdaLoraConfig

config = AdaLoraConfig(
    r=16,
    lora_alpha=32,
    target_r=8,
    tinit=200,
    tfinal=1000,
    deltaT=10,
    lora_dropout=0.1
)

Sources: src/peft/tuners/adamss/model.py:1-60

Hierarchical Adaptation

Methods like Hira and Gralora implement hierarchical rank adaptation for better parameter efficiency:

from peft import HiraConfig

config = HiraConfig(
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
    hira_dropout=0.01,
    task_type="SEQ_2_SEQ_LM"
)

Sources: src/peft/tuners/hira/model.py:1-60

Quantization Support

PEFT integrates with BitsAndBytes for 8-bit and 4-bit quantization:

from peft import prepare_model_for_kbit_training, get_peft_model, LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "model-name",
    quantization_config=quantization_config
)
model = prepare_model_for_kbit_training(model)

config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, config)

Helper Functions

Signature Updates

The helpers module provides utility functions for updating model signatures:

from peft import update_forward_signature, update_generate_signature, update_signature

# Update forward signature only
update_forward_signature(peft_model)

# Update generate signature only
update_generate_signature(peft_model)

# Update both
update_signature(peft_model, method="all")

Model Validation

from peft.helpers import check_if_peft_model

# Check if a model ID corresponds to a PEFT model
is_peft = check_if_peft_model("user/peft-model")

# Works with both Hub and local paths
is_peft_local = check_if_peft_model("./local/peft-model")

Adapter Scale Rescaling

from peft.helpers import rescale_adapter_scale

with rescale_adapter_scale(model, multiplier=0.5):
    output = model(inputs)

Memory Optimization

Low CPU Memory Usage

Loading adapters can be optimized for memory-constrained environments:

# Create adapter weights on meta device for faster loading
peft_model = PeftModel.from_pretrained(
    base_model,
    adapter_path,
    low_cpu_mem_usage=True
)

Training with Quantized Models

PEFT supports full training workflows with quantized base models:

from peft import get_peft_model, LoraConfig, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True)
)
model = prepare_model_for_kbit_training(model)

config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, config)

Integration Patterns

With Diffusers

PEFT works with Stable Diffusion and other diffusion models:

from diffusers import StableDiffusionPipeline
from peft import MissModel, MissConfig

config_unet = MissConfig(
    r=8,
    target_modules=["proj_in", "proj_out", "to_k", "to_q", "to_v"],
    init_weights=True
)

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.unet = MissModel(pipeline.unet, config_unet, "default")

Sources: src/peft/tuners/miss/model.py:1-60

Cross-Modal Applications

Some PEFT methods like XLora are designed for multi-modal models with complex architecture support:

from peft import XLoraConfig, get_peft_model

config = XLoraConfig(
    adapters={
        "adapter_1": "./path/to/adapter-1",
        "adapter_2": "./path/to/adapter-2"
    }
)

model = AutoModelForCausalLM.from_pretrained("model-name", trust_remote_code=True)
xlora_model = get_peft_model(model, config)

Workflow Diagram

graph TD
    A[Pre-trained Model] --> B[Choose Fine-tuning Method]
    B --> C[Create PEFT Config]
    C --> D[Initialize Adapter]
    D --> E[Train Adapter]
    E --> F{Save or Load?}
    F -->|Save| G[save_pretrained]
    F -->|Load| H[from_pretrained]
    G --> I[Hub or Local]
    H --> J[Merge or Inference]
    J --> K[merge_and_unload]
    J --> L[Direct Inference]
    K --> M[Final Model]
    L --> M

Best Practices

  1. Start with Default Ranks: Begin with r=8 for LoRA and increase based on performance
  2. Target Specific Modules: Prefer targeting attention projection layers (q_proj, v_proj) over all linear layers
  3. Use Quantization for Large Models: Apply 4-bit quantization (QLoRA) for models larger than 7B parameters
  4. Save Checkpoints Regularly: Use PEFT's built-in checkpoint saving to avoid losing training progress
  5. Evaluate Before Merging: Always evaluate adapter quality before merging into the base model

Conclusion

PEFT provides a comprehensive framework for parameter-efficient fine-tuning that enables training large models on limited hardware. Its modular architecture supports various adaptation methods while maintaining compatibility with the broader Hugging Face ecosystem. Whether working with language models, vision models, or multi-modal architectures, PEFT offers consistent APIs and significant memory savings compared to full fine-tuning approaches.

Sources: src/peft/tuners/lora/model.py:1-100, src/peft/tuners/tuners_utils.py:1-50


Installation Guide

Related topics: Introduction to PEFT, Quantization Integration


This guide covers all methods for installing the PEFT (Parameter-Efficient Fine-Tuning) library, including dependencies management, optional feature installations, and verification procedures.

Overview

The PEFT library provides state-of-the-art parameter-efficient fine-tuning methods including LoRA, AdaLoRA, Prefix Tuning, Prompt Tuning, and many other advanced techniques. Proper installation ensures access to all functionality including GPU acceleration, quantization support, and integration with Hugging Face Transformers and Diffusers.

Key Installation Features:

  • Core library installation via pip, conda, or from source
  • Optional dependencies for specific tuners and features
  • GPU/CUDA support for accelerated training
  • BitsAndBytes integration for quantization
  • Diffusers integration for image generation models

System Requirements

Hardware Requirements

| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB+ |
| GPU VRAM | 4 GB | 8-24 GB (depending on model size) |
| Storage | 5 GB | 10 GB+ |
| CUDA | 11.6 | 11.8+ or CUDA 12.x |

Software Requirements

| Requirement | Version |
|---|---|
| Python | ≥ 3.8 |
| PyTorch | ≥ 1.11.0 |
| Transformers | ≥ 4.20.0 |
| Diffusers | ≥ 0.13.0 |
| Accelerate | ≥ 0.20.0 |

Installation Methods

Standard Installation via pip

The simplest method to install PEFT is using pip:

pip install peft

This installs the core library with all base dependencies.

Installing Specific Versions

To install a specific version of PEFT:

pip install peft==0.13.0

To install the latest development version from GitHub:

pip install git+https://github.com/huggingface/peft.git

Installation from Source

For developers contributing to PEFT or needing the latest features:

git clone https://github.com/huggingface/peft.git
cd peft
pip install -e .

The editable installation (-e .) allows modifications to the source code while keeping the package importable.

Dependencies Structure

Core Dependencies

The core dependencies are defined in pyproject.toml and requirements.txt:

# Core runtime dependencies
torch>=1.11.0
transformers>=4.20.0
accelerate>=0.20.0

Sources: pyproject.toml

Optional Dependencies by Feature

PEFT provides optional dependencies for specific use cases:

| Feature | Installation Command | Purpose |
|---|---|---|
| Quantization | pip install peft[quantization] | BitsAndBytes 4-bit/8-bit quantization |
| GPU Training | pip install peft[gpu] | CUDA-optimized operations |
| Diffusers | pip install peft[diffusers] | Stable Diffusion model support |
| Dev Tools | pip install peft[dev] | Testing and linting |
| All Extras | pip install peft[all] | Complete installation |

Advanced Installation with Quantization

For models requiring quantized weights (e.g., using 4-bit or 8-bit precision):

pip install peft bitsandbytes scipy accelerate

This combination enables:

  • 4-bit quantization via BitsAndBytes
  • 8-bit quantization for extreme memory reduction
  • Mixed-precision training optimization
  • Efficient loading of large models on limited hardware

Sources: src/peft/tuners/lora/model.py

Environment Setup

Using Virtual Environments

Using venv:

python -m venv peft-env
source peft-env/bin/activate  # Linux/macOS
peft-env\Scripts\activate     # Windows
pip install peft

Using conda:

conda create -n peft-env python=3.10
conda activate peft-env
pip install peft

CUDA Configuration

For GPU acceleration, ensure CUDA is properly configured:

import torch
print(torch.cuda.is_available())  # Should return True
print(torch.cuda.device_count())  # Number of available GPUs

The PEFT library automatically detects and utilizes available CUDA devices during training.

Verification and Testing

Basic Installation Verification

Verify your installation by importing PEFT and checking the version:

import peft
print(peft.__version__)  # Should print the installed version

Quick Functionality Test

Test basic LoRA functionality:

from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig

# Load a small model for testing
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)

# Configure LoRA
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=16,
    target_modules=["c_attn", "c_proj"],
    lora_dropout=0.05
)

# Apply PEFT
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()

Signature Update Utilities

After installation, you may want to update method signatures for better IDE support:

from peft import update_forward_signature, update_generate_signature

# Update forward signature
update_forward_signature(peft_model)

# Update generate signature (for generative models)
update_generate_signature(peft_model)

Sources: src/peft/helpers.py:1-100

Tuner-Specific Installation Notes

LoRA and QLoRA

Standard LoRA requires no additional dependencies beyond core installation. QLoRA requires:

pip install peft "bitsandbytes>=0.40.0" "trl>=0.4.0"

Sources: src/peft/tuners/lora/model.py

Prefix Tuning and Prompt Tuning

These methods require only core dependencies:

pip install peft

Diffusion Model Support (LoRA for Images)

For Stable Diffusion and similar models:

pip install peft diffusers

Example configuration for Stable Diffusion:

from diffusers import StableDiffusionPipeline
from peft import MissModel, MissConfig

config_unet = MissConfig(
    r=8,
    target_modules=["proj_in", "proj_out", "to_k", "to_q", "to_v", "to_out.0"],
    init_weights=True
)

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.unet = MissModel(pipeline.unet, config_unet, "default")

Sources: src/peft/tuners/miss/model.py

X-LoRA Installation

X-LoRA requires specific dependencies for multi-adapter support:

pip install peft transformers accelerate bitsandbytes

Sources: src/peft/tuners/xlora/model.py

Troubleshooting

Common Installation Issues

| Issue | Solution |
|---|---|
| ImportError: No module named peft | Reinstall: pip uninstall peft && pip install peft |
| CUDA out of memory | Use quantization or smaller batch sizes |
| BitsAndBytes import failure | Install: pip install bitsandbytes |
| Old PyTorch version | Update: pip install "torch>=1.11.0" |

Version Compatibility

Check compatibility matrix:

| PEFT Version | Min Python | Min PyTorch | Min Transformers |
|---|---|---|---|
| 0.13.x | 3.8+ | 1.11.0 | 4.20.0 |
| 0.12.x | 3.8+ | 1.11.0 | 4.20.0 |
| 0.11.x | 3.7+ | 1.11.0 | 4.20.0 |
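
To compare your environment against this matrix:

import peft
import torch
import transformers

print("peft:", peft.__version__)
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)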

Verifying Adapter Loading

Test adapter functionality after installation:

from peft.helpers import check_if_peft_model

is_peft = check_if_peft_model("path/to/model")
print(f"Is PEFT model: {is_peft}")

Sources: src/peft/helpers.py:51-65

Adapter Hotswap Installation

For runtime adapter switching functionality:

pip install peft

The hotswap capability is built into PEFT's core functionality:

from peft.utils.hotswap import hotswap_adapter

# Load and swap adapters at runtime
hotswap_adapter(model, "path-to-new-adapter", adapter_name="default")

Sources: src/peft/utils/hotswap.py

Next Steps

After successful installation:

  1. Quick Start: Follow the Quickstart Guide for first-time users
  2. Tuner Selection: Review available tuners to choose the right method
  3. Configuration: Learn about PeftConfig options
  4. Examples: Explore example notebooks for your use case

Summary

The PEFT library offers flexible installation options to accommodate various use cases from basic fine-tuning to advanced quantized training. Core installation via pip provides immediate access to all major functionality, while optional dependencies enable specialized features like 4-bit quantization and diffusion model support.

Sources: [pyproject.toml](https://github.com/huggingface/peft/blob/main/pyproject.toml)

System Architecture

Related topics: Core Components, Introduction to PEFT, Configuration System


Overview

The PEFT (Parameter-Efficient Fine-Tuning) library implements a modular architecture designed to enable efficient model adaptation without modifying the entire parameter set of pre-trained models. The system architecture is built around three core pillars: the PeftModel base class hierarchy, tuner abstractions, and configuration management.

PEFT supports multiple fine-tuning techniques including LoRA, IA³, Adapters, Prefix Tuning, Prompt Learning, and various specialized methods like SHiRA, GraLoRA, X-LoRA, and others. Each technique is implemented as a separate "tuner" that follows a common interface defined in the base tuner utilities.

High-Level Architecture Diagram

graph TD
    User[User Code] --> PeftAPI[PeftModel API]
    PeftAPI --> PeftModel[PeftModel Base Class]
    PeftModel --> BaseTuner[BaseTuner]
    BaseTuner --> TunerRegistry[Tuner Registry]
    
    subgraph Tuners
        LoRA[LoRA Tuner]
        IA3[IA³ Tuner]
        PrefixTuning[Prefix Tuning]
        PromptLearning[Prompt Learning]
        SHiRA[SHiRA Tuner]
        GraLoRA[GraLoRA Tuner]
        XLoRA[X-LoRA Tuner]
        Hira[Hira Tuner]
        DeLoRA[DeLoRA Tuner]
        Miss[MiSS Tuner]
        Adamss[Adamss Tuner]
    end
    
    BaseTuner --> LoRA
    BaseTuner --> IA3
    BaseTuner --> PrefixTuning
    BaseTuner --> PromptLearning
    BaseTuner --> SHiRA
    BaseTuner --> GraLoRA
    BaseTuner --> XLoRA
    BaseTuner --> Hira
    BaseTuner --> DeLoRA
    BaseTuner --> Miss
    BaseTuner --> Adamss
    
    PeftModel --> Config[PeftConfig]
    Config --> ConfigMapping[PEFT_TYPE_TO_CONFIG_MAPPING]
    
    TunerRegistry --> TargetMapping[TRANSFORMERS_MODELS_TO_*_TARGET_MODULES_MAPPING]

Core Components

1. PeftModel Base Class

The PeftModel class serves as the central entry point for all PEFT operations. It wraps a base model and manages adapter lifecycle, injection, and merging.

Location: src/peft/peft_model.py

#### Class Hierarchy

graph TD
    PyTorchModule[torch.nn.Module] --> PeftModel
    PeftModel --> PeftModelForCausalLM[PeftModelForCausalLM]
    PeftModel --> PeftModelForSeq2SeqLM[PeftModelForSeq2SeqLM]
    PeftModel --> PeftModelForSequenceClassification[PeftModelForSequenceClassification]
    PeftModel --> PeftModelForQuestionAnswering[PeftModelForQuestionAnswering]
    PeftModel --> PeftModelForTokenClassification[PeftModelForTokenClassification]
    PeftModel --> PeftModelForFeatureExtraction[PeftModelForFeatureExtraction]

#### Key Responsibilities

| Responsibility | Description |
|---|---|
| Adapter Management | Loading, activating, and switching between multiple adapters |
| Module Injection | Replacing target modules with tuner layers |
| Forward Pass | Intercepting and modifying the forward pass with adapter weights |
| Weight Merging | Combining adapter weights with base model weights |
| Model Saving/Loading | Serialization and deserialization of PEFT configurations |

#### Constructor Signature

def __init__(self, model: torch.nn.Module, peft_config: PeftConfig, adapter_name: str = "default", **kwargs)

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | torch.nn.Module | Required | The base model to be adapted |
| peft_config | PeftConfig | Required | Configuration for the PEFT method |
| adapter_name | str | "default" | Name identifier for the adapter |
| **kwargs | Any | - | Additional arguments passed to specific tuners |

Sources: src/peft/peft_model.py:1-100
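
The constructor is rarely called directly; get_peft_model is the usual entry point and picks the task-specific subclass from the config. A sketch of both paths (gpt2 is a placeholder base model):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model

config = LoraConfig(task_type="CAUSAL_LM", r=8)

# Recommended: factory function, returns PeftModelForCausalLM here
model = get_peft_model(AutoModelForCausalLM.from_pretrained("gpt2"), config)

# Low-level: direct construction, returns a generic PeftModel
model2 = PeftModel(AutoModelForCausalLM.from_pretrained("gpt2"), config, adapter_name="default")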

2. BaseTuner Class

The BaseTuner class defines the abstract interface that all tuner implementations must follow. It handles the core logic for module injection and adapter management.

Location: src/peft/tuners/tuners_utils.py

#### Core Attributes

prefix: str = ""                    # Prefix for PEFT module names
tuner_layer_cls = None              # The tuner layer class
target_module_mapping = {}          # Maps model types to target modules

#### Key Methods

| Method | Purpose |
|---|---|
| inject_adapter() | Creates adapter layers and replaces target modules |
| _create_and_replace() | Creates or updates adapter modules for specific targets |
| _replace_module() | Performs the actual module replacement |
| _check_target_module_compatibility() | Validates module compatibility (e.g., for Mamba) |
| merge_and_unload() | Merges adapter weights into the base model |
| _unload_and_optionally_merge() | Core logic for weight merging |
#### Adapter Injection Flow

sequenceDiagram
    participant User
    participant PeftModel
    participant BaseTuner
    participant Model as Base Model
    
    User->>PeftModel: inject_adapter(model, adapter_name)
    PeftModel->>BaseTuner: inject_adapter(...)
    BaseTuner->>BaseTuner: _create_and_replace(...)
    BaseTuner->>Model: Walk modules recursively
    Model-->>BaseTuner: Find matching targets
    BaseTuner->>BaseTuner: Create adapter layer
    BaseTuner->>Model: _replace_module(parent, name, new_module)
    Note over Model: Target module replaced with adapter

Sources: src/peft/tuners/tuners_utils.py:1-200

3. Configuration System

The configuration system uses a factory pattern to map PEFT types to their corresponding configuration classes.

Location: src/peft/mapping.py

#### Configuration Mapping Table

| PEFT Type | Config Class | Tuner Layer Class |
|---|---|---|
| LORA | LoraConfig | LoraLayer |
| IA3 | IA3Config | IA3Layer |
| ADALORA | AdaLoraConfig | AdaLoraLayer |
| ADAPTER | AdapterConfig | AdapterLayer |
| PREFIX_TUNING | PrefixTuningConfig | PrefixTuningLayer |
| P_TUNING | PromptEncoderConfig | PromptEncoder |
| LOHA | LoHaConfig | LoHaLayer |
| OFT | OFTConfig | OFTLayer |
| XLORA | XLoraConfig | XLoraLayer |
| HIRA | HiraConfig | HiraLayer |
| SHIRA | ShiraConfig | ShiraLayer |
| GRALORA | GraloraConfig | GraloraLayer |
| DELORA | DeloraConfig | DeloraLayer |
| MISS | MissConfig | MissLayer |
| ADAMSS | AdamssConfig | AdamssLayer |

#### Auto Configuration Loading

def check_if_peft_model(model_name_or_path: str) -> bool:
    """Check if the model is a PEFT model."""

Sources: src/peft/mapping.py:1-100, src/peft/auto.py:1-50
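
A sketch of resolving a config class through the factory mapping (assuming the mapping lives in src/peft/mapping.py as cited above):

from peft import PeftType
from peft.mapping import PEFT_TYPE_TO_CONFIG_MAPPING

config_cls = PEFT_TYPE_TO_CONFIG_MAPPING[PeftType.LORA]
print(config_cls.__name__)  # LoraConfig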

Task-Specific Model Classes

PEFT provides specialized model classes optimized for different transformer tasks.

PeftModelForSeq2SeqLM

For sequence-to-sequence tasks (translation, summarization).

class PeftModelForSeq2SeqLM(PeftModel):
    def __init__(self, model, peft_config, adapter_name="default", **kwargs):
        super().__init__(model, peft_config, adapter_name, **kwargs)
        self.base_model_prepare_inputs_for_generation = self.base_model.prepare_inputs_for_generation
        self.base_model_prepare_encoder_decoder_kwargs_for_generation = (
            self.base_model._prepare_encoder_decoder_kwargs_for_generation
        )

Features:

  • Customizes prepare_inputs_for_generation for decoder input preparation
  • Handles encoder-decoder kwargs for generation

Sources: src/peft/peft_model.py:200-400

PeftModelForSequenceClassification

For text classification tasks.

class PeftModelForSequenceClassification(PeftModel):
    def __init__(self, model, peft_config, adapter_name="default", **kwargs):
        super().__init__(model, peft_config, adapter_name, **kwargs)
        classifier_module_names = ["classifier", "score"]

Target Modules: ["classifier", "score"]

Sources: src/peft/peft_model.py:100-200

PeftModelForQuestionAnswering

For QA tasks.

class PeftModelForQuestionAnswering(PeftModel):
    def __init__(self, model, peft_config, adapter_name="default", **kwargs):
        super().__init__(model, peft_config, adapter_name, **kwargs)
        qa_module_names = ["qa_outputs"]

Target Modules: ["qa_outputs"]

Sources: src/peft/peft_model.py:250-350

PeftModelForTokenClassification

For named entity recognition and token-level tasks.

class PeftModelForTokenClassification(PeftModel):
    def __init__(self, model, peft_config=None, adapter_name="default", **kwargs):
        super().__init__(model, peft_config, adapter_name, **kwargs)
        classifier_module_names = ["classifier", "score"]

Sources: src/peft/peft_model.py:300-400

Tuner Implementations

Common Tuner Structure

All tuners follow a consistent pattern:

class SomeTuner(BaseTuner):
    prefix: str = "tuner_"
    tuner_layer_cls = SomeLayerClass
    target_module_mapping = TRANSFORMERS_MODELS_TO_SOME_TARGET_MODULES_MAPPING
    
    def _create_and_replace(self, config, adapter_name, target, target_name, parent, current_key, **kwargs):
        # Implementation

Target Module Mapping

Each tuner defines which modules can be targeted for adaptation based on the model architecture.

TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING = {
    "t5": ["q", "v"],
    "llama": ["q_proj", "v_proj"],
    "bert": ["query", "value"],
    # ... more mappings
}

Example: SHiRA Tuner

class ShiraModel(BaseTuner):
    prefix: str = "shira_"
    tuner_layer_cls = ShiraLayer
    target_module_mapping = TRANSFORMERS_MODELS_TO_SHIRA_TARGET_MODULES_MAPPING

Key Features:

  • Supports random mask generation with mask_type == "random" and configurable random_seed
  • Wraps Linear layers with SHiRA adapter logic

Sources: src/peft/tuners/shira/model.py:1-80

Example: GraLoRA Tuner

class GraloraModel(BaseTuner):
    prefix: str = "gralora_"
    tuner_layer_cls = GraloraLayer
    target_module_mapping = TRANSFORMERS_MODELS_TO_GRALORA_TARGET_MODULES_MAPPING

Sources: src/peft/tuners/gralora/model.py:1-80

Example: X-LoRA Tuner

X-LoRA supports multiple adapter loading with device placement:

def __init__(
    self,
    model: nn.Module,
    config: Union[dict[str, XLoraConfig], XLoraConfig],
    adapter_name: str,
    torch_device: Optional[str] = None,
    ephemeral_gpu_offload: bool = False,
    autocast_adapter_dtype: bool = True,
    **kwargs,
)

Sources: src/peft/tuners/xlora/model.py:1-100

Model Loading and Serialization

From Pretrained

@classmethod
def from_pretrained(
    cls,
    model: torch.nn.Module,
    model_id: str,
    adapter_name: str = "default",
    is_trainable: bool = False,
    config: Optional[PeftConfig] = None,
    autocast_adapter_dtype: bool = True,
    **kwargs
) -> PeftModel:

Parameters:

| Parameter | Type | Description |
|---|---|---|
| model | torch.nn.Module | The base model to adapt |
| model_id | str | Path or Hugging Face Hub identifier |
| adapter_name | str | Adapter name (default: "default") |
| is_trainable | bool | Whether the adapter is trainable |
| config | PeftConfig | Pre-loaded configuration |
| autocast_adapter_dtype | bool | Auto-cast adapter dtype |

Sources: src/peft/peft_model.py:400-600

Hotswap Adapter

For runtime adapter replacement without full model reload:

def hotswap_adapter(
    model,
    model_name_or_path,
    adapter_name="default",
    torch_device=None,
    **kwargs
):

Sources: src/peft/utils/hotswap.py:1-100

Helper Utilities

Signature Updates

For model compatibility, PEFT provides utilities to update method signatures:

def update_forward_signature(model: PeftModel) -> None:
    """Updates forward signature to include parent's signature."""

def update_generate_signature(model: PeftModel) -> None:
    """Updates generate signature to include parent's signature."""

def update_signature(model: PeftModel, method: str = "all") -> None:
    """Updates forward and/or generate signature."""

Logic: The signature is updated only when the current signature consists solely of *args and **kwargs:

current_signature = inspect.signature(model.forward)
if (
    len(current_signature.parameters) == 2
    and "args" in current_signature.parameters
    and "kwargs" in current_signature.parameters
):
    # Update with parent's signature

Sources: src/peft/helpers.py:1-150

Adapter Scale Rescaling

Context manager for temporary adapter scaling:

@contextmanager
def rescale_adapter_scale(model, multiplier):
    """Context manager to temporarily rescale adapter scaling."""

Data Flow Diagram

graph LR
    subgraph Input
        InputIDs[input_ids]
        Attention[attention_mask]
        Embeds[inputs_embeds]
    end
    
    subgraph Processing
        PEFTConfig[PeftConfig]
        BaseModel[Base Model]
        Adapters[Adapter Layers]
    end
    
    subgraph Output
        OutputLogits[Output Logits]
        HiddenStates[Hidden States]
        AttentionWeights[Attention Weights]
    end
    
    InputIDs --> BaseModel
    Attention --> BaseModel
    Embeds --> BaseModel
    PEFTConfig --> Adapters
    BaseModel <--> Adapters
    Adapters --> OutputLogits
    Adapters --> HiddenStates
    Adapters --> AttentionWeights

Configuration Classes

Each tuner type has a corresponding configuration class:

| Tuner | Config Class | Key Parameters |
|---|---|---|
| LoRA | LoraConfig | r, lora_alpha, lora_dropout, target_modules |
| IA³ | IA3Config | target_modules, feedforward_modules |
| Prefix Tuning | PrefixTuningConfig | num_virtual_tokens, num_transformer_submodules |
| Prompt Learning | PromptEncoderConfig | num_virtual_tokens, encoder_hidden_size |
| SHiRA | ShiraConfig | r, mask_type, random_seed |
| GraLoRA | GraloraConfig | r |
| X-LoRA | XLoraConfig | Multiple adapter configs |
| HiRA | HiraConfig | r, hira_dropout |
| DeLoRA | DeloraConfig | rank_pattern, lambda_pattern |
| MiSS | MissConfig | r, target_modules, init_weights |
| Adamss | AdamssConfig | r, num_subspaces, target_modules |

Multiple Adapter Support

PEFT supports loading and managing multiple adapters simultaneously:

graph TD
    BaseModel[Base Model] --> Adapter1[Adapter 1: default]
    BaseModel --> Adapter2[Adapter 2: adapter_v2]
    BaseModel --> AdapterN[Adapter N: custom_name]
    
    ActiveAdapter[Active Adapter] --> Selection[Selection]
    Selection --> Adapter1
    Selection --> Adapter2
    Selection --> AdapterN

Key Operations (see the sketch after this list):

  • Add adapters via inject_adapter() with unique names
  • Activate specific adapter via set_adapter()
  • Merge single or multiple adapters via merge_and_unload(adapter_names=[...])
  • Hotswap adapters at runtime via hotswap_adapter()
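
A sketch exercising these operations (paths and adapter names are placeholders):

from peft import PeftModel

model = PeftModel.from_pretrained(base_model, "./adapters/default", adapter_name="default")
model.load_adapter("./adapters/task_b", adapter_name="task_b")

model.set_adapter("task_b")  # route forward passes through task_b

# Merge only the "default" adapter into the base weights
merged = model.merge_and_unload(adapter_names=["default"])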

Class Inheritance Diagram

classDiagram
    class PeftModel {
        +model
        +peft_config
        +active_adapters
        +inject_adapter()
        +merge_and_unload()
        +unload()
        +get_prompt()
    }
    
    class PeftModelForCausalLM {
        +forward()
    }
    
    class PeftModelForSeq2SeqLM {
        +forward()
        +prepare_inputs_for_generation()
    }
    
    class PeftModelForSequenceClassification {
        +forward()
    }
    
    class PeftModelForQuestionAnswering {
        +forward()
    }
    
    class PeftModelForTokenClassification {
        +forward()
    }
    
    class PeftModelForFeatureExtraction {
        +forward()
    }
    
    PeftModel <|-- PeftModelForCausalLM
    PeftModel <|-- PeftModelForSeq2SeqLM
    PeftModel <|-- PeftModelForSequenceClassification
    PeftModel <|-- PeftModelForQuestionAnswering
    PeftModel <|-- PeftModelForTokenClassification
    PeftModel <|-- PeftModelForFeatureExtraction

Summary

The PEFT system architecture provides a flexible, extensible framework for parameter-efficient fine-tuning through:

  1. Centralized Model Management: PeftModel base class handles adapter lifecycle
  2. Modular Tuner System: Each technique (LoRA, IA³, etc.) implements the BaseTuner interface
  3. Configuration-Driven Design: Factory pattern maps PEFT types to configs
  4. Task-Specific Optimizations: Specialized model classes for different downstream tasks
  5. Multi-Adapter Support: Runtime switching and hotswapping of adapters
  6. Seamless Integration: Auto-loading and signature updates for transformer compatibility

This architecture enables researchers and practitioners to easily extend PEFT with new fine-tuning methods while maintaining backward compatibility and performance optimizations.

Sources: src/peft/peft_model.py:1-100

Core Components

Related topics: System Architecture, Configuration System, Model Loading and Saving


Overview

The PEFT (Parameter-Efficient Fine-Tuning) library provides a modular architecture for adapting pre-trained models with minimal computational overhead. The Core Components form the foundational layer that enables all PEFT methods—including LoRA, IA³, Prefix Tuning, and custom tuners—to inject trainable parameters into base models efficiently.

The core architecture consists of:

  • PeftModel: The primary wrapper class that encapsulates base models with adapter layers
  • PeftConfig: Configuration objects that define adapter-specific parameters
  • BaseTunerLayer: Base class for all adapter layer implementations
  • inject_adapter: Core mechanism for attaching adapters to target modules
  • Mapping System: Registry connecting PEFT types to their implementations

Sources: src/peft/peft_model.py:1-50

Architecture Overview

graph TD
    A[Pre-trained Model] --> B[PeftModel]
    B --> C{PEFT Type}
    C -->|LORA| D[LoRA Layers]
    C -->|IA3| E[IA³ Layers]
    C -->|PREFIX_TUNING| F[Prefix Layers]
    C -->|CUSTOM| G[Custom Tuners]
    
    H[PeftConfig] --> B
    I[Adapter Registry] --> B
    
    J[Target Modules] --> K[inject_adapter]
    K --> B
    
    L[from_pretrained] --> B
    M[get_peft_model] --> B

PeftModel Base Class

The PeftModel class serves as the central abstraction for all PEFT-adapted models. It wraps a base model and manages one or more adapters, each containing trainable parameters.

Key Responsibilities

| Responsibility | Description |
|---|---|
| Adapter Management | Load, activate, and switch between multiple adapters |
| Forward Pass | Intercept forward calls to route through active adapters |
| Parameter Tracking | Report trainable vs. total parameter counts |
| Serialization | Save and load adapter weights and configurations |

Task-Specific Model Classes

PEFT provides specialized model classes for different transformer tasks:

| Model Class | Task Type | Use Case |
|---|---|---|
| PeftModel | Generic | Base wrapper for any model |
| PeftModelForSequenceClassification | SEQ_CLS | Text classification |
| PeftModelForTokenClassification | TOKEN_CLS | Named entity recognition |
| PeftModelForQuestionAnswering | QUESTION_ANS | Extractive QA |
| PeftModelForSeq2SeqLM | SEQ_2_SEQ_LM | Translation, summarization |
| PeftModelForCausalLM | CAUSAL_LM | Text generation |
| PeftModelForFeatureExtraction | FEATURE_EXTRACTION | Embedding extraction |

Sources: src/peft/peft_model.py:50-150

Key Methods

def from_pretrained(
    model: torch.nn.Module,
    model_id: str | os.PathLike,
    adapter_name: str = "default",
    is_trainable: bool = False,
    config: PeftConfig = None,
    autocast_adapter_dtype: bool = True,
    **kwargs
) -> PeftModel

This factory method instantiates a PEFT model from a pretrained configuration and optionally loads adapter weights.

Sources: src/peft/peft_model.py:150-200

PeftConfig System

The PeftConfig class hierarchy defines adapter-specific hyperparameters. Each PEFT method has its own configuration class that inherits from the base PeftConfig.

Configuration Class Hierarchy

graph TD
    A[PeftConfig] --> B[LoraConfig]
    A --> C[PromptLearningConfig]
    C --> D[PrefixTuningConfig]
    C --> E[PromptEncoderConfig]
    A --> F[IA3Config]
    A --> G[LoHaConfig]
    A --> H[OFTConfig]
    A --> I[TinyLoRAConfig]
    A --> J[AdamssConfig]

Common Configuration Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| peft_type | PeftType | Required | The PEFT method being used |
| task_type | TaskType | Required | The downstream task type |
| inference_mode | bool | False | Whether the model is in inference mode |
| target_modules | List[str] | None | Module names to apply adapters to |
| r | int | 8 | LoRA rank dimension |
| lora_alpha | int | 8 | LoRA scaling factor |
| lora_dropout | float | 0.0 | Dropout probability for LoRA layers |

Sources: src/peft/config.py, src/peft/mapping.py

Tuner Layer Base Classes

BaseTunerLayer

The BaseTunerLayer class provides the interface that all adapter layer implementations must follow. It defines methods for layer initialization, adapter updating, and merging.

classDiagram
    class BaseTunerLayer {
        +base_layer: nn.Module
        +active_adapters: List[str]
        +adapter_list: List[str]
        +update_layer(adapter_name, ...)
        +merge()
        +unmerge()
    }

Key Methods

| Method | Description |
|---|---|
| update_layer(adapter_name, **kwargs) | Initialize or update adapter weights |
| merge() | Merge adapter weights into the base layer |
| unmerge() | Restore the original base layer weights |
| scale_layer(scale) | Apply a scaling factor to the adapter output |

Sources: src/peft/tuners/tuners_utils.py:100-150
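
A sketch of driving these methods directly on a model's tuner layers; normally merge_and_unload() does this for you:

from peft.tuners.tuners_utils import BaseTunerLayer

for module in peft_model.modules():
    if isinstance(module, BaseTunerLayer):
        module.merge()    # fold active adapter weights into the base layer
        module.unmerge()  # restore the original base weights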

Method-Specific Tuner Layers

Each PEFT method implements its own tuner layer class:

| Tuner | Layer Class | Key Parameters |
|---|---|---|
| LoRA | LoraLayer | r, lora_alpha, lora_dropout, lora_A, lora_B |
| IA³ | IA3Layer | inn_factor, key_value_dim |
| OFT | OFTLayer | oft_r, oft_diag_blocks |
| SHiRA | ShiraLayer | mask_fn, random_seed |
| Gralora | GraloraLayer | r (SVD rank) |

Sources: src/peft/tuners/ia3/model.py, src/peft/tuners/oft/model.py, src/peft/tuners/shira/model.py, src/peft/tuners/gralora/model.py

Adapter Injection Mechanism

The inject_adapter method is the core mechanism that replaces target modules with adapter layers. This process traverses the model and substitutes compatible modules.

graph TD
    A[inject_adapter called] --> B{module.is_target_module?}
    B -->|Yes| C{Create New Module?}
    C -->|New adapter| D[_create_new_module]
    C -->|Existing adapter| E[update_layer]
    D --> F[_replace_module]
    E --> G[Set requires_grad False]
    F --> H[Module replaced]
    B -->|No| I[Skip module]
    G --> I

Injection Flow

def inject_adapter(
    model: nn.Module,
    adapter_name: str,
    autocast_adapter_dtype: bool = True,
    low_cpu_mem_usage: bool = False,
    state_dict: Optional[dict] = None,
) -> None

The method performs the following steps:

  1. Identifies target modules based on peft_config.target_modules
  2. For each target, either creates a new adapter module or updates an existing one
  3. Replaces the original module in the parent model
  4. Sets appropriate requires_grad flags based on is_trainable

Sources: src/peft/tuners/tuners_utils.py:150-250

_create_and_replace Pattern

Each tuner implements _create_and_replace to handle the specific module creation logic:

def _create_and_replace(
    self,
    config,
    adapter_name,
    target,
    target_name,
    parent,
    current_key,
    **optional_kwargs,
) -> None

Sources: src/peft/tuners/shira/model.py:40-80, src/peft/tuners/gralora/model.py:40-70, src/peft/tuners/miss/model.py:30-70

Mixed Model Support

The PeftMixedModel class extends PeftModel to support heterogeneous adapters—models with different PEFT methods simultaneously.

graph LR
    A[Base Model] --> B[PeftMixedModel]
    B --> C[LoRA Adapter]
    B --> D[IA³ Adapter]
    B --> E[Prefix Adapter]

Loading Mixed Models

@classmethod
def from_pretrained(
    cls,
    model: nn.Module,
    model_id: str | os.PathLike,
    adapter_name: str = "default",
    is_trainable: bool = False,
    config: PeftConfig = None,
    low_cpu_mem_usage: bool = False,
    **kwargs,
) -> PeftMixedModel

Sources: src/peft/mixed_model.py:50-100
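
A sketch of building a mixed model, assuming get_peft_model's mixed=True flag and two different adapter types (module names are placeholders):

from peft import LoHaConfig, LoraConfig, get_peft_model

lora_cfg = LoraConfig(r=8, target_modules=["q_proj"])
loha_cfg = LoHaConfig(r=8, target_modules=["v_proj"])

mixed = get_peft_model(base_model, lora_cfg, adapter_name="lora_a", mixed=True)
mixed.add_adapter("loha_b", loha_cfg)
mixed.set_adapter(["lora_a", "loha_b"])  # activate both adapters together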

Helper Functions

The helpers.py module provides utility functions for working with PEFT models.

Signature Update Functions

These functions update the forward and generate signatures of PEFT models to expose parameters from the underlying base model.

| Function | Purpose |
|---|---|
| update_forward_signature(model) | Update model.forward signature to include base model parameters |
| update_generate_signature(model) | Update model.generate signature to include base model parameters |
| update_signature(model, method) | Update both signatures, or specify "forward" / "generate" / "all" |

def update_forward_signature(model: PeftModel) -> None:
    """Update the forward signature to include base model parameters."""
    current_signature = inspect.signature(model.forward)
    if (
        len(current_signature.parameters) == 2
        and "args" in current_signature.parameters
        and "kwargs" in current_signature.parameters
    ):
        # Copy signature from base model
        ...

Sources: src/peft/helpers.py:50-100

Model Validation

def check_if_peft_model(model_name_or_path: str) -> bool:
    """
    Check if the model is a PEFT model.
    
    Returns:
        bool: True if the model is a PEFT model, False otherwise.
    """

This function attempts to load a PeftConfig from the given path and returns True if successful.

Sources: src/peft/helpers.py:100-130

Adapter Rescaling Context Manager

@contextmanager
def rescale_adapter_scale(model, multiplier):
    """Temporarily rescale the scaling of the LoRA adapter."""

This context manager temporarily rescales adapter weights during inference, useful for ablation studies.

Sources: src/peft/helpers.py:130-160

Hotswap Adapter

The hotswap_adapter function enables runtime replacement of loaded adapters without reloading the entire model.

graph TD
    A[hotswap_adapter called] --> B[Load new config]
    B --> C[Validate PEFT type]
    C --> D[Load state dict]
    D --> E[Transfer to device]
    E --> F[Replace adapter weights]
    F --> G[Success]

def hotswap_adapter(
    model: PeftModel,
    model_name_or_path: str,
    adapter_name: str = "default",
    torch_device: str = None,
    **kwargs,
) -> None

Sources: src/peft/utils/hotswap.py:30-80

Unload and Merge Operations

Base tuners provide methods to unload or merge adapter weights.

merge_and_unload

def merge_and_unload(progressbar: bool = False, safe_merge: bool = False, adapter_names = None) -> nn.Module

Merges adapter weights into the base model and returns the resulting model with adapter modules removed.

unload

def unload() -> nn.Module

Returns the base model by removing all PEFT modules without merging weights. This is useful when you need the original model but want to preserve the option to reload adapters later.

_unload_and_optionally_merge

def _unload_and_optionally_merge(
    progressbar: bool = False,
    safe_merge: bool = False,
    adapter_names = None,
    merge: bool = True,
) -> nn.Module

Sources: src/peft/tuners/tuners_utils.py:80-120

Target Module Mapping

Each tuner defines a target_module_mapping that specifies which modules should be replaced for different model architectures.

# Example: SHiRA target module mapping
target_module_mapping = TRANSFORMERS_MODELS_TO_SHIRA_TARGET_MODULES_MAPPING

# Example: GraLoRA target module mapping
target_module_mapping = TRANSFORMERS_MODELS_TO_GRALORA_TARGET_MODULES_MAPPING

These mappings allow PEFT methods to automatically identify compatible layers (e.g., q_proj, v_proj, k_proj) across different transformer architectures.

BitsAndBytes Integration

PEFT supports quantized models through BitsAndBytes integration. The tuners detect quantized base layers and wrap them appropriately:

if loaded_in_8bit and isinstance(target_base_layer, bnb.nn.Linear8bitLt):
    eightbit_kwargs = kwargs.copy()
    eightbit_kwargs.update({
        "has_fp16_weights": target_base_layer.state.has_fp16_weights,
        "threshold": target_base_layer.state.threshold,
        "index": target_base_layer.index,
    })
    new_module = Linear8bitLt(...)

Sources: src/peft/tuners/ia3/model.py:40-70

Summary

The Core Components of PEFT provide a flexible, extensible architecture for parameter-efficient fine-tuning:

  1. PeftModel wraps base models and manages adapters with a unified interface
  2. PeftConfig classes define method-specific hyperparameters
  3. BaseTunerLayer establishes the contract for all adapter implementations
  4. inject_adapter replaces target modules with adapter layers
  5. Helper functions provide utilities for signature updates, validation, and runtime operations
  6. Hotswap support enables dynamic adapter replacement

This architecture allows developers to implement new PEFT methods by subclassing existing base classes while reusing the core model management infrastructure.

Sources: src/peft/peft_model.py:1-50

LoRA and LoRA Variants

Related topics: Other PEFT Methods, Quantization Integration, Configuration System


Overview

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that reduces trainable parameters by representing weight updates as low-rank decompositions. The PEFT library implements LoRA and numerous variants that extend this foundational approach with different architectural innovations, training strategies, and optimization techniques.

The LoRA system in PEFT serves as both a standalone fine-tuning method and a framework upon which variants like DoRA, AdaLoRA, LoHa, LoKr, and others are built. These variants share a common plugin architecture but differ in how they decompose and apply trainable adapters to base model layers.

Architecture

Core LoRA Architecture

LoRA modifies pre-trained neural network layers by adding trainable low-rank decomposition matrices alongside frozen original weights. For a linear layer with weight matrix $W \in \mathbb{R}^{d \times k}$, LoRA represents the update as:

$$\Delta W = BA$$

where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ with rank $r \ll \min(d, k)$. In PEFT, the update is additionally scaled by $\alpha / r$, where $\alpha$ is the lora_alpha hyperparameter.
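
As a worked example of the savings: with $d = k = 4096$ and $r = 16$, a full update $\Delta W$ would train $4096 \times 4096 \approx 16.8$M parameters, while LoRA trains only $r(d + k) = 16 \times 8192 = 131{,}072$ parameters, roughly 0.8% of the full update.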

graph TD
    A[Base Model Layer: Weight W] --> B[Original Forward Pass<br/>y = Wx]
    C[LoRA Adapter: BA Decomposition] --> D[Modified Forward Pass<br/>y = Wx + BAz]
    B --> D
    A --> C
    E[Input x] --> A
    E --> B
    F[Adapter Input z<br/>Same as x or modified] --> C

LoRA Module Hierarchy

graph TD
    A[PeftModel] --> B[BaseModel Class]
    A --> C[LoraModel / VariantModel]
    C --> D[TunerLayerCls]
    C --> E[target_module_mapping]
    C --> F[prefix attribute]
    D --> G[LoraLayer / Conv2d / Conv1d]
    G --> H[Linear wrapper]
    H --> I[Forward with BA decomposition]

Sources: src/peft/tuners/lora/model.py:1-100

LoRA Implementation

Model Class

The LoraModel class serves as the base implementation for LoRA adapters. It extends the generic tuner base class and implements the core adapter creation logic.

class LoraModel(BaseTuner):
    prefix: str = "lora_"
    tuner_layer_cls = LoraLayer
    target_module_mapping = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING

Sources: src/peft/tuners/lora/model.py:90-95

Layer Replacement Mechanism

The _create_and_replace method handles the injection of LoRA adapters into target modules:

def _create_and_replace(
    self,
    lora_config,
    adapter_name,
    target,
    target_name,
    parent,
    current_key,
    *,
    parameter_name: Optional[str] = None,
) -> None:

Sources: src/peft/tuners/lora/model.py:105-120

Forward Pass Computation

The LoRA forward pass adds the scaled low-rank update of each active adapter to the frozen base layer's output. A simplified sketch of the Linear layer's forward:

def forward(self, x: torch.Tensor) -> torch.Tensor:
    # Frozen base layer output
    result = self.base_layer(x)
    # Add the scaled low-rank update of each active adapter
    for adapter in self.active_adapters:
        lora_A = self.lora_A[adapter]      # projects d_in -> r
        lora_B = self.lora_B[adapter]      # projects r -> d_out
        dropout = self.lora_dropout[adapter]
        scaling = self.scaling[adapter]    # lora_alpha / r
        result = result + lora_B(lora_A(dropout(x))) * scaling
    return result

LoRA Configuration

LoraConfig Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| r | int | 8 | Rank of the decomposition |
| lora_alpha | int | 8 | Scaling factor (often set to 2×r) |
| lora_dropout | float | 0.0 | Dropout probability for LoRA layers |
| target_modules | Optional[List[str]] | None | Module names to apply LoRA to |
| bias | str | "none" | Bias training mode: "none", "all", "lora_only" |
| fan_in_fan_out | bool | False | Transpose weights for certain architectures |
| init_weights | bool | True | Initialize LoRA weights on creation |

Advanced Configuration Options

| Parameter | Type | Default | Description |
|---|---|---|---|
| target_modules_bd_a | Optional[List[str]] | None | Modules for block-diagonal LoRA-A |
| target_modules_bd_b | Optional[List[str]] | None | Modules for block-diagonal LoRA-B |
| nblocks | int | 1 | Number of blocks in block-diagonal matrices |
| match_strict | bool | True | Require strict matching for all target modules |

Sources: src/peft/tuners/lora/config.py:1-200

LoRA Variants

DoRA (Weight-Decomposed LoRA)

DoRA extends standard LoRA by decomposing weights into magnitude and direction components. This variant often achieves better performance with comparable parameter counts.

# DoRA configuration example
lora_config = LoraConfig(
    use_dora=True,
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"]
)

Sources: examples/dora_finetuning/README.md

AdaLoRA (Adaptive LoRA)

AdaLoRA dynamically adjusts the rank of LoRA blocks during training, allocating more parameters to important layers. This adaptive approach optimizes the parameter budget.

# AdaLoRA configuration sketch; the rank is pruned from init_r toward target_r during training
from peft import AdaLoraConfig

config = AdaLoraConfig(
    init_r=12,
    target_r=8,
    tinit=200,
    tfinal=1000,
    deltaT=10,
    target_modules=["q_proj", "v_proj"],
)

Sources: src/peft/tuners/adalora

LoHa (Low-Rank Hadamard Product)

LoHa replaces the standard AB decomposition with a Hadamard product of low-rank matrices, potentially capturing more expressive updates.

config_te = LoHaConfig(
    r=8,
    lora_alpha=32,
    target_modules=["k_proj", "q_proj", "v_proj", "out_proj", "fc1", "fc2"],
    rank_dropout=0.0,
    module_dropout=0.0,
)

Sources: src/peft/tuners/loha/__init__.py

LoKr (Low-Kronecker Product)

LoKr applies Kronecker product decomposition to weight matrices, offering different trade-offs between rank and expressiveness.

config_unet = LoKrConfig(
    r=8,
    lora_alpha=32,
    target_modules=["proj_in", "proj_out", "to_k", "to_q", "to_v"],
    rank_dropout=0.0,
    module_dropout=0.0,
    use_effective_conv2d=True,
)

Sources: src/peft/tuners/lokr/__init__.py

Block-Diagonal LoRA

Block-diagonal LoRA constrains the LoRA matrices to be block-diagonal, enabling efficient multi-adapter serving with different sharding degrees.

config = LoraConfig(
    r=16,
    target_modules_bd_a=["q_proj", "v_proj"],  # Block-diagonal A
    target_modules_bd_b=["out_proj"],            # Block-diagonal B
    nblocks=4,                                    # Sharding degree
)

Variant Comparison

| Variant | Key Innovation | Target Use Case | Complexity |
|---|---|---|---|
| LoRA | Low-rank decomposition | General fine-tuning | Low |
| DoRA | Magnitude + direction decomposition | High-quality adaptation | Low |
| AdaLoRA | Adaptive rank allocation | Resource-constrained tuning | Medium |
| LoHa | Hadamard product decomposition | Image generation | Medium |
| LoKr | Kronecker product decomposition | Diffusion models | Medium |
| Block-Diagonal | Constrained structure | Multi-adapter serving | Medium |

Usage Patterns

Basic LoRA Setup

from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b")

peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
)

peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()

Multi-Adapter Configuration

from peft import PeftModel

# Load the first adapter, then attach additional adapters by name
peft_model = PeftModel.from_pretrained(
    base_model,
    "./path/to/adapter_1",
    adapter_name="adapter_1",
)
peft_model.load_adapter("./path/to/adapter_2", adapter_name="adapter_2")

Quantization with LoRA

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import get_peft_model, LoraConfig, prepare_model_for_kbit_training

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b",
    quantization_config=quantization_config,
)

model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, lora_config)

Integration with PeftModel

All LoRA variants integrate with the base PeftModel architecture through the tuner pattern:

graph LR
    A[Base Transformers Model] --> B[PeftModel]
    B --> C[BaseModel Class]
    C --> D[LoraModel / VariantModel]
    D --> E[Adapter Injection]
    E --> F[Modified Forward]

The PeftModel class provides unified interfaces for:

  • Forward pass handling
  • Adapter switching
  • Save/load operations
  • Parameter printing

Sources: src/peft/peft_model.py:1-100

Design Patterns

Tuner Layer Class Structure

Each LoRA variant implements a tuner_layer_cls attribute that defines the layer wrapper class:

class LoraModel(BaseTuner):
    tuner_layer_cls = LoraLayer
    
class LoHaModel(BaseTuner):
    prefix: str = "hada_"
    tuner_layer_cls = LoHaLayer
    layers_mapping: dict[type[torch.nn.Module], type[LoHaLayer]] = {
        torch.nn.Conv2d: Conv2d,
        torch.nn.Conv1d: Conv1d,
        torch.nn.Linear: Linear,
    }

Target Module Mapping

Variants define target module mappings for automatic module detection:

class LoraModel(BaseTuner):
    target_module_mapping = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING

class ShiraModel(BaseTuner):
    prefix: str = "shira_"
    tuner_layer_cls = ShiraLayer
    target_module_mapping = TRANSFORMERS_MODELS_TO_SHIRA_TARGET_MODULES_MAPPING

Sources: src/peft/tuners/shira/model.py:40-45

Conclusion

LoRA and its variants in the PEFT library provide a comprehensive suite of parameter-efficient fine-tuning techniques. The shared plugin architecture enables consistent APIs across variants while allowing each method to implement its unique adaptation strategy. From basic low-rank decomposition to advanced block-diagonal structures, PEFT supports a wide range of fine-tuning scenarios with minimal computational overhead.

Sources: src/peft/tuners/lora/model.py:1-100

Other PEFT Methods

Related topics: LoRA and LoRA Variants, Configuration System


PEFT (Parameter-Efficient Fine-Tuning) encompasses a diverse collection of techniques beyond LoRA and QLoRA. These methods offer alternative approaches to adapting pre-trained models with minimal parameter updates, each with distinct mechanisms, trade-offs, and optimal use cases. This page provides a comprehensive overview of the "Other PEFT Methods" available in the Hugging Face PEFT library.

Overview of PEFT Method Categories

The PEFT library organizes fine-tuning methods into several categories based on their core adaptation mechanism. Understanding these categories helps practitioners select the appropriate method for their specific requirements.

graph TD
    A[PEFT Methods] --> B[Prompt-Based Methods]
    A --> C[Additive Methods]
    A --> D[Reparameterization Methods]
    A --> E[Multiplicative Methods]
    A --> F[Subspace Methods]
    
    B --> B1[Prompt Tuning]
    B --> B2[Prefix Tuning]
    B --> B3[P-Tuning]
    B --> B4[MultiTask Prompt Tuning]
    
    C --> C1[IA³]
    
    D --> D1[LoRA Variants<br/>AdaLoRA, Gralora, HiRA]
    
    E --> E1[OFT]
    
    F --> F1[FourierFT]

Prompt-Based Methods

Prompt-based methods modify the model's input or activation space without changing the underlying model weights. These methods add trainable parameters as virtual tokens or prefix embeddings that guide the model's behavior.

Prompt Tuning

Prompt Tuning introduces trainable "soft prompts" (embedding vectors) that are prepended to the input tokens. Unlike discrete text prompts, these are continuous vectors learned through backpropagation during fine-tuning.

Key Characteristics:

  • Only the prompt embeddings are trainable
  • No architectural changes to the base model
  • Requires relatively few parameters compared to full fine-tuning
  • Works well with larger models

Configuration Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| num_virtual_tokens | int | 20 | Number of virtual tokens in the prompt |
| prompt_tuning_init | str | "TEXT" | Initialization method for prompts |
| prompt_tuning_init_text | str | None | Text for TEXT initialization |
| token_dim | int | Model hidden dim | Dimension of model embeddings |
| num_transformer_submodules | int | 1 | Number of transformer submodules with prompts |
| num_attention_heads | int | Model heads | Number of attention heads |
| num_layers | int | Model layers | Number of transformer layers |
| encoder_hidden_size | int | Same as token_dim | Hidden size for encoder |

Sources: src/peft/tuners/prompt_tuning/__init__.py

Prefix Tuning

Prefix Tuning adds trainable parameters to the attention mechanism by prepending learnable prefix vectors to the keys and values in every attention layer. Unlike Prompt Tuning, this affects all transformer layers directly.

Architecture:

graph LR
    A[Input Tokens] --> B[Embedding Layer]
    B --> C[Prefix P<sub>k</sub>, P<sub>v</sub>]
    B --> D[Standard K, V]
    C --> E[Multi-Head Attention]
    D --> E
    E --> F[Output]

Key Differences from Prompt Tuning:

  • Affects hidden states at every transformer layer
  • More parameter-efficient than full prompt tuning in some scenarios
  • Requires specification of prefix projection for deeper integration
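
A minimal configuration sketch (values are illustrative; prefix_projection reparameterizes the prefix through an MLP when enabled):

from peft import PrefixTuningConfig

peft_config = PrefixTuningConfig(
    task_type="CAUSAL_LM",
    num_virtual_tokens=30,
    prefix_projection=False,
)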

Sources: src/peft/tuners/prefix_tuning/__init__.py

P-Tuning

P-Tuning uses trainable continuous embeddings combined with a prompt encoder (typically an LSTM or MLP) to generate prompts. The encoder processes anchor tokens and produces virtual token embeddings.

Unique Features:

  • Uses a small LSTM/MLP encoder to generate prompt embeddings
  • Supports "anchor" tokens that provide natural language hints
  • More flexible than pure continuous prompts
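
A minimal configuration sketch (values are illustrative), using the default MLP-based prompt encoder:

from peft import PromptEncoderConfig

peft_config = PromptEncoderConfig(
    task_type="SEQ_CLS",
    num_virtual_tokens=20,
    encoder_hidden_size=128,
)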

Sources: src/peft/tuners/p_tuning/__init__.py

MultiTask Prompt Tuning (MPT)

MultiTask Prompt Tuning extends standard prompt tuning by learning a shared prompt across multiple related tasks. This enables knowledge transfer and typically improves generalization.

Use Cases:

  • Multi-task learning scenarios
  • Domain adaptation with related tasks
  • Few-shot learning with task similarity
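
A hedged configuration sketch (num_tasks and the other values are illustrative):

from peft import MultitaskPromptTuningConfig

peft_config = MultitaskPromptTuningConfig(
    task_type="SEQ_2_SEQ_LM",
    num_virtual_tokens=50,
    num_tasks=4,
)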

Sources: src/peft/tuners/multitask_prompt_tuning/__init__.py

(IA)³ - Infused Adapter by Inhibiting and Amplifying Inner Activations

(IA)³ is a multiplicative adapter method that scales activations by learned vectors. It introduces trainable vectors that multiply with hidden states at specific positions in the transformer architecture.

Mechanism

graph TD
    A[Hidden Activation h] --> B[Learned Vector l<sub>i</sub>]
    B --> C[Element-wise Multiplication]
    A --> C
    C --> D[h<sub>modified</sub> = l<sub>i</sub> ⊙ h]
    D --> E[Feed-Forward<br/>or Attention]

Configuration Options

| Parameter | Type | Default | Description |
|---|---|---|---|
| r | int | 8 | Rank (not used in IA³ but kept for compatibility) |
| target_modules | list | None | Modules to apply IA³ to |
| fan_in_fan_out | bool | False | Transpose weights |
| init_weights | bool | True | Initialize adapter weights |

Supported Target Modules

The IA³ method typically targets attention-related and feed-forward layers:

  • q_proj, k_proj, v_proj, o_proj (attention projections)
  • fc1, fc2 (feed-forward layers)
  • gate_proj, up_proj, down_proj (for modern architectures like Llama)
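
A minimal configuration sketch (module names are illustrative for a Llama-style model; feedforward_modules must be a subset of target_modules):

from peft import IA3Config

peft_config = IA3Config(
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "v_proj", "down_proj"],
    feedforward_modules=["down_proj"],
)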

Sources: src/peft/tuners/ia3/__init__.py Sources: docs/source/conceptual_guides/ia3.md

OFT - Orthogonal Fine-Tuning

OFT constrains the fine-tuning updates to an orthogonal subspace, ensuring that the learned adapters do not interfere with each other. This method is particularly useful for multi-adapter scenarios.

Key Principle

OFT learns an orthogonal rotation matrix R that is applied multiplicatively to the pretrained weight:

W_new = R · W_original

Where R is constrained to be orthogonal, preserving the angular relationships between neurons and preventing interference with pretrained knowledge.

Configuration Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| r | int | 4 | Rank of the OFT transformation |
| target_modules | list | ["q_proj", "v_proj"] | Layers to adapt |
| module_dropout | float | 0.0 | Dropout probability for modules |
| init_weights | bool | True | Initialize with pretrained weights |

Use Cases

  • Stable diffusion model adaptation (text encoder, UNet)
  • Multi-task learning with non-interfering adapters
  • Computer vision models requiring structured updates
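
A minimal configuration sketch (values are illustrative):

from peft import OFTConfig

peft_config = OFTConfig(
    r=4,
    target_modules=["q_proj", "v_proj"],
    module_dropout=0.0,
)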

Sources: src/peft/tuners/oft/__init__.py

FourierFT - Fourier Transform-Based Fine-Tuning

FourierFT operates in the frequency domain, learning adapters in Fourier space rather than the original weight space. This approach can capture different aspects of the model's behavior compared to spatial-domain methods.

Advantages

  • May capture global patterns more efficiently
  • Different inductive bias compared to spatial methods
  • Potential for more compact representations
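
A hedged configuration sketch; n_frequency sets how many spectral coefficients are trained per target layer (value illustrative):

from peft import FourierFTConfig

peft_config = FourierFTConfig(
    n_frequency=1000,
    target_modules=["q_proj", "v_proj"],
)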

Sources: src/peft/tuners/fourierft/__init__.py

Advanced LoRA Variants

AdaLoRA - Adaptive LoRA

AdaLoRA dynamically adjusts the rank of LoRA adaptations based on the importance of different weight matrices. It uses a budget allocation mechanism to invest more parameters in important layers.

Key Method: update_and_allocate

# Called during training loop
model.base_model.update_and_allocate(global_step)

This method updates importance scores and reallocates the rank budget based on the current training step.
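
A hedged sketch of how this fits into a training loop; base_model, dataloader, and optimizer are assumed to exist, and the AdaLoraConfig values are illustrative:

from peft import AdaLoraConfig, get_peft_model

config = AdaLoraConfig(
    init_r=12,        # starting rank per matrix
    target_r=4,       # average target rank after budget pruning
    tinit=200,        # warmup steps before pruning begins
    tfinal=500,       # steps with a frozen final budget
    deltaT=10,        # steps between rank reallocations
    total_step=3000,  # should match the planned training length
    target_modules=["q_proj", "v_proj"],
)
peft_model = get_peft_model(base_model, config)

for step, batch in enumerate(dataloader):
    loss = peft_model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    # Reallocate the rank budget from current importance scores
    peft_model.base_model.update_and_allocate(step)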

Sources: src/peft/tuners/adalora/model.py

HiRA - Hadamard High-Rank Adaptation

HiRA modulates the frozen pretrained weight with a low-rank matrix through an element-wise (Hadamard) product, so the effective update can be high-rank while keeping a LoRA-sized trainable parameter count.

Sources: src/peft/tuners/hira/model.py

GraLoRA - Granular Low-Rank Adaptation

GraLoRA partitions each targeted weight matrix into sub-blocks and gives each block its own small low-rank adapter, increasing expressivity at a comparable parameter budget.

Sources: src/peft/tuners/gralora/model.py

Special-Purpose Methods

SHiRA - Sparse High Rank Adapters

SHiRA directly trains a small, sparse subset of the base weights, yielding high-rank updates from a very small trainable footprint and enabling fast adapter switching.

Sources: src/peft/tuners/shira/model.py

MiSS - Matrix Shard Sharing

MiSS shares shards of the weight matrix during adaptation, aiming to balance LoRA-level quality with improved parameter and memory efficiency.

Sources: src/peft/tuners/miss/model.py

Adamss - Adaptive Subspace Selection

Adamss uses adaptive subspace selection for fine-tuning, choosing the most relevant subspaces based on the task at hand.

| Parameter | Type | Default | Description |
|---|---|---|---|
| r | int | 500 | Rank dimension |
| num_subspaces | int | 5 | Number of subspaces |
| target_modules | list | ["q_proj", "v_proj"] | Target layers |

Sources: src/peft/tuners/adamss/model.py

X-LoRA

X-LoRA supports multiple LoRA adapters with dynamic routing, allowing for sophisticated multi-adapter architectures.

Sources: src/peft/tuners/xlora/model.py

Comparison of Methods

| Method | Category | Trainable Parameters | Best For | Supports Multi-Adapter |
|---|---|---|---|---|
| Prompt Tuning | Prompt-Based | Very Low | Large models, text tasks | Yes |
| Prefix Tuning | Prompt-Based | Low | Text generation | Yes |
| P-Tuning | Prompt-Based | Low-Medium | NLU tasks | Yes |
| MPT | Prompt-Based | Medium | Multi-task learning | Yes |
| (IA)³ | Multiplicative | Low | Efficient scaling | Yes |
| OFT | Multiplicative | Low-Medium | Stable diffusion, CV | Yes |
| FourierFT | Frequency-Domain | Low | Global patterns | Yes |
| AdaLoRA | Reparameterization | Variable | Dynamic budgets | Yes |
| X-LoRA | Reparameterization | Medium-High | Complex routing | Yes |

Unified API Usage

All PEFT methods follow a consistent API pattern through get_peft_model:

from transformers import AutoModelForSequenceClassification
from peft import get_peft_model, PromptTuningConfig

config = PromptTuningConfig(
    task_type="SEQ_CLS",
    num_virtual_tokens=20,
    prompt_tuning_init="TEXT",
    prompt_tuning_init_text="Classify the sentiment:",
    tokenizer_name_or_path="bert-base-cased",  # required for TEXT initialization
)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased")
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()

Sources: docs/source/conceptual_guides/prompting.md

Best Practices

Method Selection Guidelines

  1. For Large Language Models (>7B parameters): Prompt Tuning, Prefix Tuning, or LoRA variants
  2. For Image Models: OFT, (IA)³
  3. For Multi-Task Scenarios: MultiTask Prompt Tuning, X-LoRA
  4. For Limited Compute: (IA)³, standard Prompt Tuning
  5. For Maximum Flexibility: AdaLoRA (dynamic rank allocation)

Common Configuration Patterns

from peft import LoraConfig, PromptTuningConfig

# Efficient configuration for most cases
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# For prompt-based methods
config = PromptTuningConfig(
    num_virtual_tokens=50,
    task_type="SEQ_CLS",
)

Summary

The PEFT library provides a comprehensive suite of fine-tuning methods beyond LoRA and QLoRA. These methods offer diverse trade-offs in terms of parameter efficiency, task performance, and computational requirements. By understanding the mechanisms and use cases of each method, practitioners can select the most appropriate technique for their specific model adaptation needs.

Key takeaways:

  • Prompt-based methods modify input representations without changing model weights
  • Multiplicative methods such as (IA)³ and OFT scale or rotate weights
  • Advanced LoRA variants provide dynamic optimization capabilities
  • All methods support multi-adapter scenarios and can be combined through the unified PEFT API

Sources: [src/peft/tuners/prompt_tuning/__init__.py](https://github.com/huggingface/peft/blob/main/src/peft/tuners/prompt_tuning/__init__.py)

Configuration System

Related topics: Core Components, Model Loading and Saving, LoRA and LoRA Variants


Overview

The PEFT (Parameter-Efficient Fine-Tuning) library implements a comprehensive configuration system that enables flexible and modular adapter integration across various transformer architectures. This system decouples adapter-specific parameters from model architecture, allowing users to define fine-tuning strategies through declarative configuration objects.

The configuration system serves as the foundational layer for all PEFT adapters, providing:

  • Unified configuration interface across different fine-tuning methods
  • Automatic model patching based on target module specifications
  • Serialization and deserialization support for model saving/loading
  • Multi-adapter management capabilities
graph TD
    A[User Configuration] --> B[PeftConfig Subclass]
    B --> C{Adapter Type}
    C -->|LoRA| D[LoraConfig]
    C -->|Prefix| E[PrefixTuningConfig]
    C -->|Prompt| F[PromptEncoderConfig]
    C -->|IA³| G[IA3Config]
    C -->|Others| H[Tuner-Specific Config]
    
    D --> I[get_peft_model]
    E --> I
    F --> I
    G --> I
    H --> I
    
    I --> J[PeftModel Base]
    J --> K[BaseTuner.inject_adapter]
    K --> L[Model Patching]

Core Components

PeftConfig Base Class

The PeftConfig class is the foundational configuration object in PEFT. It is implemented as a dataclass with Hugging Face Hub save/load support and provides the base interface for all adapter configurations.

Key Attributes:

| Attribute | Type | Description |
|---|---|---|
| peft_type | PeftType | Enum specifying the adapter method |
| task_type | TaskType | Enum specifying the ML task type |
| inference_mode | bool | Whether model is in inference mode |
| auto_mapping | Optional[dict] | Custom auto-mapping for loading |
| base_model_name_or_path | str | Path/identifier of base model |
| revision | str | Model revision for Hub models |
| pad_token_id | Optional[int] | Padding token ID |

Source: src/peft/config.py

PeftType Enumeration

The PeftType enum defines all supported parameter-efficient fine-tuning methods:

| Value | Description |
|---|---|
| LORA | Low-Rank Adaptation |
| PROMPT_TUNING | Soft prompt tuning |
| PREFIX_TUNING | Prefix tuning |
| P_TUNING | P-tuning (prompt encoder) |
| IA3 | Infused Adapter by Inhibiting and Amplifying Inner Activations |
| ADALORA | Adaptive LoRA |
| ADAPTION_PROMPT | Adaption prompt (LLaMA-Adapter style) |
| POLY | Poly (Polytropon) |
| LN_TUNING | LayerNorm tuning |
| HRA | Householder Reflection Adaptation |
| GRALORA | Granular Low-Rank Adaptation |
| SHIRA | Sparse High Rank Adaptation |
| XLORA | X-LoRA (mixture of LoRA experts) |
| MISS | Matrix Shard Sharing |
| HIRA | Hadamard High-Rank Adaptation |
| ADAMSS | Adaptive Subspace Selection |
Source: src/peft/utils/peft_types.py:1-50

TaskType Enumeration

The TaskType enum specifies the machine learning task type:

| Value | Description |
|---|---|
| SEQ_CLS | Sequence Classification |
| SEQ_2_SEQ_LM | Sequence-to-Sequence Language Modeling |
| CAUSAL_LM | Causal Language Modeling |
| TOKEN_CLS | Token Classification |
| QUESTION_ANS | Question Answering |
| FEATURE_EXTRACTION | Feature Extraction / Embeddings |
| MULTIPLE_CHOICE | Multiple Choice |
| IMAGE_CLASSIFICATION | Image Classification |
Source: src/peft/utils/peft_types.py:50-80

Tuner-Specific Configurations

LoraConfig

The LoraConfig class configures LoRA (Low-Rank Adaptation) adapters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| r | int | 8 | LoRA attention dimension (rank) |
| target_modules | Optional[Union[List[str], str]] | None | Modules to apply LoRA to |
| lora_alpha | int | 8 | LoRA alpha scaling parameter |
| lora_dropout | float | 0.0 | Dropout probability for LoRA layers |
| fan_in_fan_out | bool | False | Set to transpose weight (for conv layers) |
| bias | str | "none" | Bias type: "none", "all", "lora_only" |
| modules_to_save | Optional[List[str]] | None | Modules to make trainable |
| init_lora_weights | Union[bool, str] | True | Initialization strategy |

Example Configuration:

from peft import get_peft_config

config = {
    "peft_type": "LORA",
    "task_type": "CAUSAL_LM",
    "r": 16,
    "target_modules": ["q_proj", "v_proj"],
    "lora_alpha": 32,
    "lora_dropout": 0.05,
}
peft_config = get_peft_config(config)

Source: src/peft/tuners/lora/model.py

PrefixTuningConfig

Configuration for prefix-based prompt learning:

| Parameter | Type | Default | Description |
|---|---|---|---|
| num_virtual_tokens | int | None | Number of virtual tokens |
| token_dim | int | None | Dimensionality of token embeddings |
| num_transformer_submodules | int | 1 | Number of transformer submodules |
| num_attention_heads | int | 12 | Number of attention heads |
| num_layers | int | 12 | Number of layers |
| encoder_hidden_size | int | None | Encoder hidden size |
| prefix_projection | bool | False | Whether to project prefix |

Source: src/peft/peft_model.py

Configuration Loading and Saving

Loading Configurations

The configuration system supports loading from both local paths and Hugging Face Hub:

# From Hub
peft_config = PeftConfig.from_pretrained("user/peft-model")

# From dictionary
peft_config = get_peft_config(config_dict)

# Via mapping
config = PeftConfig.from_pretrained(
    model_name_or_path,
    **hf_kwargs
)

The from_pretrained method handles:

  • Subfolder paths via subfolder parameter
  • Model revisions via revision parameter
  • Authentication tokens via token or use_auth_token parameters

Source: src/peft/config.py, src/peft/mixed_model.py

Saving Configurations

Configurations can be serialized using the standard Hugging Face save_pretrained method:

peft_config.save_pretrained("output-directory")

Auto-Mapping

The auto_mapping parameter enables custom configuration-to-model mappings, particularly useful for custom adapters or third-party integrations:

peft_config = PeftConfig.from_pretrained(
    "model-id",
    auto_mapping={"custom_key": CustomAdapterClass}
)

Adapter Injection Workflow

sequenceDiagram
    participant User
    participant PeftModel
    participant BaseTuner
    participant Config
    participant TargetModule
    
    User->>PeftModel: __init__(model, peft_config)
    PeftModel->>BaseTuner: inject_adapter(model, adapter_name)
    BaseTuner->>Config: Validate peft_config
    Config->>Config: Check target_module_compatibility
    
    loop For each target module
        BaseTuner->>TargetModule: Identify target layer
        BaseTuner->>BaseTuner: _create_and_replace(...)
        BaseTuner->>TargetModule: Replace with adapter layer
    end
    
    PeftModel-->>User: Ready model

The injection process:

  1. Validates configuration compatibility with target modules
  2. Identifies modules matching target_modules patterns
  3. Creates adapter layers via _create_and_replace method
  4. Replaces original modules with adapter wrappers
  5. Marks appropriate parameters as trainable
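
For intuition, a hypothetical sketch of the name-matching idea in step 2; the real logic lives in BaseTuner.inject_adapter and additionally supports regex patterns (model is assumed to be any loaded transformers model):

# Hypothetical illustration of suffix-based target matching
target_modules = ["q_proj", "v_proj"]

for name, module in model.named_modules():
    if any(name == t or name.endswith("." + t) for t in target_modules):
        print("would wrap:", name)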

Source: src/peft/tuners/tuners_utils.py

Multi-Adapter Configuration

PEFT supports multiple adapters through the adapter naming system:

# Load multiple adapters
peft_model = PeftModel.from_pretrained(
    base_model, 
    "adapter-1-path",
    adapter_name="adapter_1"
)
peft_model.load_adapter("adapter-2-path", adapter_name="adapter_2")

# Set active adapter
peft_model.set_adapter("adapter_1")

Each adapter maintains its own configuration accessible via:

peft_model.peft_config["adapter_name"]

Source: src/peft/tuners/tuners_utils.py, src/peft/helpers.py

Integration with Model Types

Model-Specific Configurations

Different model architectures require specific configuration handling:

| Model Type | PeftModel Class | Special Config Parameters |
|---|---|---|
| Causal LM | PeftModelForCausalLM | Standard LoRA/Prefix |
| Seq2Seq | PeftModelForSeq2SeqLM | prepare_inputs_for_generation |
| Seq Classification | PeftModelForSequenceClassification | classifier_module_names |
| Token Classification | PeftModelForTokenClassification | classifier_module_names |
| Question Answering | PeftModelForQuestionAnswering | qa_module_names |
| Feature Extraction | PeftModelForFeatureExtraction | Standard config |

Source: src/peft/peft_model.py

Target Module Mapping

Each tuner type defines a target_module_mapping that specifies compatible layers for different model architectures:

# Example structure in tuners
target_module_mapping = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING

This mapping ensures adapters are only applied to compatible modules (e.g., preventing LoRA application to incompatible modules in Mamba architectures).

Source: src/peft/tuners/lora/model.py, src/peft/tuners/tuners_utils.py

Advanced Configuration Features

Mixed Model Configuration

For models requiring multiple adapter types:

# Load mixed configuration
mixed_model = PeftMixedModel.from_pretrained(
    model,
    peft_model_id="mixed-peft-model",
    config=mixed_config
)

Hotswap Adapters

The hotswap functionality allows runtime adapter replacement:

from peft import hotswap_adapter

hotswap_adapter(
    model, 
    "path-to-new-adapter", 
    adapter_name="default",
    torch_device="cuda:0"
)

Source: src/peft/utils/hotswap.py

Context Manager for Adapter Scaling

Temporarily rescale adapter scaling:

from peft import rescale_adapter_scale

with rescale_adapter_scale(model, multiplier=0.5):
    output = model(inputs)

Source: src/peft/helpers.py

Configuration Validation

Target Module Compatibility

The configuration system validates target modules against model architecture:

def _check_target_module_compatiblity(self, peft_config, model, target_name):
    _check_lora_target_modules_mamba(peft_config, model, target_name)

This prevents applying adapters to incompatible modules in specific architectures.

Source: src/peft/tuners/tuners_utils.py

PEFT Type Detection

Automatic PEFT type detection from model paths:

peft_type = PeftConfig._get_peft_type(model_name_or_path, **hf_kwargs)
config_cls = PEFT_TYPE_TO_CONFIG_MAPPING[peft_type]

Best Practices

  1. Always specify task_type: Helps PEFT apply correct model wrapper
  2. Use target_modules wisely: Restricting to key layers reduces memory
  3. Set inference_mode=False for training: Required for gradient computation
  4. Save adapter config alongside weights: Ensures reproducibility
  5. Use modules_to_save sparingly: Only for task-specific heads

See Also

Source: https://github.com/huggingface/peft / Human Manual

Model Loading and Saving

Related topics: Core Components, Configuration System, Quantization Integration


Overview

The PEFT (Parameter-Efficient Fine-Tuning) library provides a comprehensive system for loading, saving, and managing adapter-based model configurations. This system enables users to efficiently fine-tune large language models by training only a small subset of parameters while maintaining the ability to save, load, and merge adapters with the base model.

The loading and saving architecture in PEFT is designed to be:

  • Interoperable: Adapters can be shared via Hugging Face Hub
  • Flexible: Multiple adapters can coexist and be switched dynamically
  • Memory-efficient: Supports low CPU memory usage during loading
  • Non-destructive: Original base models remain unmodified

Sources: src/peft/tuners/tuners_utils.py:1-50

Architecture

graph TD
    A[Base Model] --> B[PeftModel]
    B --> C[Adapter 1]
    B --> D[Adapter 2]
    B --> N[Adapter N]
    
    E[save_pretrained] --> F[adapter_config.json]
    E --> G[adapter_model.safetensors]
    
    H[from_pretrained] --> I[Load Base Model]
    H --> J[Load Adapter Config]
    H --> K[Inject Adapters]
    
    L[merge_and_unload] --> M[Merged Base Model]
    L --> N2[No Adapters]
    
    O[unload] --> P[Original Base Model]
    O --> Q[Adapters Removed]

Loading PEFT Models

Loading from Pretrained

The PeftModel.from_pretrained() class method loads a PEFT model configuration and applies it to a base model:

from transformers import AutoModelForCausalLM
from peft import PeftModel, PeftConfig

# Load PEFT configuration
peft_config = PeftConfig.from_pretrained("path/to/peft_model")

# Load base model
base_model = AutoModelForCausalLM.from_pretrained("base_model_name")

# Create PEFT model with loaded adapters
peft_model = PeftModel.from_pretrained(base_model, "path/to/peft_model")

Using get_peft_model

For creating new PEFT models from scratch:

from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig, TaskType

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)

model = AutoModelForCausalLM.from_pretrained("base_model")
peft_model = get_peft_model(model, config)

Sources: src/peft/peft_model.py:1-100

Loading Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | torch.nn.Module | Required | The base model to apply PEFT to |
| model_id | str | Required | Path or HF Hub identifier for PEFT checkpoint |
| adapter_name | str | "default" | Name for the loaded adapter |
| is_trainable | bool | False | Whether adapter should be trainable |
| low_cpu_mem_usage | bool | False | Create weights on meta device for faster loading |
| torch_dtype | torch.dtype | None | Data type for loaded weights |
| device_map | str/dict | None | Device placement strategy |

Sources: src/peft/peft_model.py:100-200

Saving PEFT Models

Saving to Disk

The save_pretrained() method saves the PEFT adapter weights and configuration:

peft_model.save_pretrained("output/path")

This creates adapter_config.json (the adapter configuration) and adapter_model.safetensors (the adapter weights) in the output directory.

Save Configuration Options

| Parameter | Type | Description |
|---|---|---|
| save_adapters | bool | Whether to save all adapters (default: True) |
| adapter_names | List[str] | Specific adapters to save (default: all active) |
| safe_serialization | bool | Use safetensors format (default: True) |

Merging and Unloading

Merge and Unload

The merge_and_unload() method merges all adapter weights into the base model and returns the combined model:

from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("base_model")
peft_model = PeftModel.from_pretrained(base_model, "path/to/peft_model")

# Merge adapters into base model
merged_model = peft_model.merge_and_unload()

This operation:

  • Combines adapter weights with base model weights
  • Removes PEFT wrapper layers
  • Returns a standard HuggingFace model

Sources: src/peft/tuners/tuners_utils.py:1-100

Safe Merge

For secure merging with validation:

merged_model = peft_model.merge_and_unload(safe_merge=True)

Safe merge checks tensor shapes and dtypes before merging to prevent corruption.

Unload

The unload() method removes all PEFT adapters and returns the original base model:

base_model = peft_model.unload()

Unlike merge_and_unload(), this operation:

  • Does not modify model weights
  • Simply removes PEFT wrapper layers
  • Returns the original base model unchanged
graph LR
    A[PeftModel] -->|merge_and_unload| B[Merged Base Model]
    A -->|unload| C[Original Base Model]
    
    B --> D[Combined Weights]
    C --> E[Original Weights Intact]

Sources: src/peft/tuners/tuners_utils.py:100-200

Merge Utilities

The merge_utils.py module provides low-level merging functions:

| Function | Description |
|---|---|
| merge_linear_weights | Merges LoRA weights into linear layers |
| merge_qkv_weights | Merges QKV attention weights |

Multi-Adapter Management

Adding Multiple Adapters

PEFT supports loading multiple adapters onto a single base model:

peft_model.load_adapter("path/to/adapter1", adapter_name="adapter1")
peft_model.load_adapter("path/to/adapter2", adapter_name="adapter2")

Switching Active Adapters

# Set active adapter
peft_model.set_adapter("adapter1")

# Inspect which adapter is currently active
print(peft_model.active_adapter)

Merging Specific Adapters

# Merge only specific adapters
merged_model = peft_model.merge_and_unload(adapter_names=["adapter1"])

Signature Updates

When using PEFT models with adapters, the model signatures may differ from the base model. PEFT provides utility functions to update signatures:

Update Forward Signature

from peft import update_forward_signature

update_forward_signature(peft_model)

This allows help(peft_model.forward) to show the full signature including parameters from parent classes.

Update Generate Signature

from peft import update_generate_signature

update_generate_signature(peft_model)

Enables help(peft_model.generate) to display the complete generation parameters.

Sources: src/peft/helpers.py:1-100

Checking PEFT Models

Use check_if_peft_model() to verify if a model path contains a PEFT configuration:

from peft import check_if_peft_model

is_peft = check_if_peft_model("path/to/model")

This function:

  • Attempts to load an adapter_config.json
  • Returns True if valid PEFT config found
  • Returns False otherwise

Sources: src/peft/helpers.py:100-200

Loading with Quantization

PEFT models can be loaded with quantized base models using BitsAndBytes:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
base_model = AutoModelForCausalLM.from_pretrained(
    "model_name",
    quantization_config=quantization_config,
)

base_model = prepare_model_for_kbit_training(base_model)
lora_config = LoraConfig(r=16, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(base_model, lora_config)

Sources: src/peft/tuners/lora/model.py:1-100

Rescaling Adapter Scale

The rescale_adapter_scale() context manager temporarily adjusts adapter scaling:

from peft import rescale_adapter_scale

with rescale_adapter_scale(model, multiplier=0.5):
    output = model(inputs)  # Scaled by 0.5

Sources: src/peft/helpers.py:200-300

Workflow Diagram

graph TD
    A[Start] --> B{Load Base Model}
    B --> C[Load PEFT Config]
    C --> D{Existing Adapter?}
    
    D -->|Yes| E[from_pretrained]
    D -->|No| F[get_peft_model]
    
    E --> G[PeftModel with Adapters]
    F --> H[PeftModel with New Config]
    
    G --> I{Training}
    H --> I
    
    I --> J[Train Adapters]
    J --> K[save_pretrained]
    
    K --> L[Share via Hub]
    
    I --> M{Inference}
    M --> N{Use Merged?}
    
    N -->|Yes| O[merge_and_unload]
    N -->|No| P[Use with Adapters]
    
    O --> Q[Merged Model]
    P --> R[Forward with Adapters]

Best Practices

  1. Memory Optimization: Use low_cpu_mem_usage=True when loading large adapters to speed up the process
  2. Safe Serialization: Always use save_pretrained() with safe_serialization=True (default) for secure model sharing
  3. Multiple Adapters: Load adapters with distinct names and switch between them using set_adapter()
  4. Signature Updates: Call update_forward_signature() and update_generate_signature() for better IDE support
  5. Quantization: Prepare quantized models with prepare_model_for_kbit_training() before applying PEFT

Sources: src/peft/tuners/tuners_utils.py:1-50

Quantization Integration

Related topics: LoRA and LoRA Variants, Model Loading and Saving, Advanced Features


PEFT (Parameter-Efficient Fine-Tuning) provides comprehensive support for integrating quantized base models with various parameter-efficient fine-tuning methods. This integration enables training large models that would otherwise require prohibitive amounts of memory by combining quantization techniques with PEFT adapters.

Overview

Quantization integration in PEFT allows users to:

  • Load base models in quantized form (8-bit, 4-bit, or other formats) to reduce memory footprint
  • Apply PEFT adapters (LoRA, IA³, LoHa, LoKr, etc.) on top of quantized layers
  • Fine-tune the adapters while keeping the quantized base model frozen
  • Maintain model quality while significantly reducing GPU memory requirements

Sources: src/peft/tuners/lora/model.py

Supported Quantization Methods

PEFT supports multiple quantization backends through integration with popular quantization libraries.

| Quantization Method | Backend Library | Precision Options | Status |
|---|---|---|---|
| BitsAndBytes | bitsandbytes | 8-bit, 4-bit | Fully Supported |
| GPTQ | auto-gptq | 4-bit | Fully Supported |
| AWQ | awq | 4-bit | Fully Supported |
| AQLM | aqlm | Mixed-bit | Fully Supported |
| EETQ | eetq | 8-bit | Fully Supported |
| HQQ | hqq | Configurable | Fully Supported |

Architecture

Quantization Integration Flow

graph TD
    A[Base Model Loading] --> B{Quantization Backend}
    B -->|bitsandbytes| C[BitsAndBytes 8-bit/4-bit]
    B -->|GPTQ| D[GPTQ 4-bit]
    B -->|AWQ| E[AWQ 4-bit]
    B -->|AQLM| F[AQLM]
    B -->|EETQ| G[EETQ 8-bit]
    B -->|HQQ| H[HQQ]
    
    C --> I[PEFT Adapter Injection]
    D --> I
    E --> I
    F --> I
    G --> I
    H --> I
    
    I --> J[LoRA / IA³ / LoHa / LoKr Layers]
    J --> K[Fine-tuning with Frozen Quantized Base]

Module Replacement Strategy

When applying PEFT adapters to quantized models, the system replaces specific linear layers with quantized-aware versions that preserve quantization state.

graph LR
    A[Original Linear / Quantized Linear] --> B{Is Quantized?}
    B -->|Yes - 8-bit| C[Linear8bitLt + Adapter]
    B -->|Yes - 4-bit| D[Linear4bit + Adapter]
    B -->|No| E[Linear + Adapter]
    
    C --> F[Forward with Quantization]
    D --> F
    E --> F

BitsAndBytes Integration

The BitsAndBytes integration provides 8-bit and 4-bit quantization support through the bitsandbytes library.

Configuration

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import get_peft_model, LoraConfig

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True  # or load_in_4bit=True
)

model = AutoModelForCausalLM.from_pretrained(
    "model_name",
    quantization_config=quantization_config,
)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
)

peft_model = get_peft_model(model, peft_config)

8-bit Layer Implementation

When loading an 8-bit model, PEFT replaces standard linear layers with Linear8bitLt that inherits quantization state from the base layer:

# From src/peft/tuners/ia3/model.py
if loaded_in_8bit and isinstance(target_base_layer, bnb.nn.Linear8bitLt):
    eightbit_kwargs = kwargs.copy()
    eightbit_kwargs.update(
        {
            "has_fp16_weights": target_base_layer.state.has_fp16_weights,
            "threshold": target_base_layer.state.threshold,
            "index": target_base_layer.index,
        }
    )

Sources: src/peft/tuners/ia3/model.py:40-49

4-bit Layer Implementation

Similarly, 4-bit quantized layers are handled with Linear4bit:

if loaded_in_4bit and isinstance(target_base_layer, bnb.nn.Linear4bit):
    fourbit_kwargs = kwargs.copy()
    fourbit_kwargs.update(
        {
            "compute_dtype": target_base_layer.compute_dtype,
            "compress_statistics": target_base_layer.weight.compress_statistics,
            "quant_type": target_base_layer.weight.quant_type,
        }
    )

Sources: src/peft/tuners/ia3/model.py:50-56

Preparing Quantized Models for Training

PEFT provides the prepare_model_for_kbit_training utility function to prepare quantized models for training with PEFT adapters.

Function Signature

def prepare_model_for_kbit_training(
    model,
    use_gradient_checkpointing: bool = True,
    gradient_checkpointing_kwargs: Optional[dict] = None,
):

Sources: src/peft/utils/other.py

Key Operations

  1. Gradient Checkpointing: Enables gradient checkpointing to save memory during backpropagation
  2. Parameter Freezing: Freezes the base model's parameters so that only adapter weights receive gradients
  3. Dtype Casting: Casts selected parameters (such as layer norms) for numerically stable training

Usage Example

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

int8_config = BitsAndBytesConfig(load_in_8bit=True)

# After loading quantized model
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    quantization_config=int8_config,
    device_map="cuda:0",
)

# Prepare for k-bit training
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

Supported Tuners with Quantization

All major PEFT tuners support integration with quantized base models:

| Tuner | 8-bit Support | 4-bit Support | File Location |
|---|---|---|---|
| LoRA | ✓ | ✓ | src/peft/tuners/lora/ |
| IA³ | ✓ | ✓ | src/peft/tuners/ia3/ |
| LoHa | ✓ | ✓ | src/peft/tuners/loha/ |
| LoKr | ✓ | ✓ | src/peft/tuners/lokr/ |
| AdaLoRA | ✓ | ✓ | src/peft/tuners/adalora/ |
| OALoRA | ✓ | ✓ | src/peft/tuners/oaloora/ |

Layer Class Mappings

Each tuner defines specific layer mappings for different layer types:

# From src/peft/tuners/lokr/model.py
layers_mapping: dict[type[torch.nn.Module], type[LoKrLayer]] = {
    torch.nn.Conv2d: Conv2d,
    torch.nn.Conv1d: Conv1d,
    torch.nn.Linear: Linear,
}

# From src/peft/tuners/loha/model.py  
layers_mapping: dict[type[torch.nn.Module], type[LoHaLayer]] = {
    torch.nn.Conv2d: Conv2d,
    torch.nn.Conv1d: Conv1d,
    torch.nn.Linear: Linear,
}

Sources: src/peft/tuners/lokr/model.py:87-90 Sources: src/peft/tuners/loha/model.py:79-82

Base Tuner Layer Properties

All quantized-aware tuner layers inherit from BaseTunerLayer which provides key functionality:

Key Methods

| Method | Purpose |
|---|---|
| get_base_layer() | Retrieves the underlying base layer (quantized or not) |
| update_layer() | Updates adapter weights for existing layers |
| merge() | Merges adapter weights into base layer |
| unmerge() | Separates merged adapter weights |

if isinstance(target, BaseTunerLayer):
    target_base_layer = target.get_base_layer()
else:
    target_base_layer = target

Sources: src/peft/tuners/ia3/model.py:34-37

Adapter Management with Quantization

Creating New Modules

When creating new adapter modules for quantized layers:

  1. Detect the quantization state from the base layer
  2. Preserve quantization parameters (thresholds, compute dtype, etc.)
  3. Create appropriate quantized-aware adapter layer
sequenceDiagram
    participant Base as Base Model (Quantized)
    participant PEFT as PEFT System
    participant Adapter as Adapter Layer
    
    Base->>PEFT: Target Linear Layer
    PEFT->>PEFT: Detect 8-bit / 4-bit quantization
    PEFT->>Adapter: Create with quantization state
    Adapter->>Base: Store reference + quantization params

Multiple Adapters

PEFT supports multiple adapters on quantized models through the active_adapters mechanism:

# Adding additional adapters to quantized model
if adapter_name not in self.active_adapters:
    # adding an additional adapter: it is not automatically trainable
    new_module.requires_grad_(False)

Sources: src/peft/tuners/loha/model.py:1 Sources: src/peft/tuners/lokr/model.py:1

Memory Efficiency Considerations

Memory Breakdown

| Component | Full Precision | 8-bit | 4-bit |
|---|---|---|---|
| Base Model | ~70GB | ~35GB | ~18GB |
| Gradients | ~70GB | ~70GB | ~70GB |
| Activations | Variable | Variable | Variable |
| Optimizer | ~280GB | ~280GB | ~280GB |

Note that the gradient and optimizer figures above assume full fine-tuning; with PEFT adapters, gradients and optimizer states are kept only for the small trainable adapter subset, which is the main source of memory savings.

Best Practices

  1. Use Gradient Checkpointing: Reduces activation memory at cost of extra compute
  2. Target Specific Modules: Only apply adapters to key layers (q_proj, v_proj)
  3. Batch Size: Start with small batch sizes and scale based on available memory
  4. Mixed Precision: Use bfloat16 for gradients when possible
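
As a concrete illustration of these practices, a common 4-bit setup (QLoRA-style; values are illustrative) pairs nf4 quantization with bfloat16 compute:

import torch
from transformers import BitsAndBytesConfig

# 4-bit nf4 base weights with bf16 compute and nested quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)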

Context Manager for Adapter Scaling

PEFT provides rescale_adapter_scale for temporarily adjusting adapter scaling:

@contextmanager
def rescale_adapter_scale(model, multiplier):
    """
    Context manager to temporarily rescale the scaling of the LoRA adapter.
    
    The original scaling values are restored when the context manager exits.
    """

Sources: src/peft/helpers.py:80-90

Error Handling

Common Issues

| Error | Cause | Solution |
|---|---|---|
| TypeError on forward | Quantization state not preserved | Ensure proper layer replacement |
| OOM during forward | Batch size too large | Reduce batch size, use gradient checkpointing |
| Mismatched dtypes | Mixed precision issues | Cast to consistent dtype before training |

Verification Steps

  1. Verify quantization config is properly set
  2. Confirm adapter layers are correctly injected
  3. Check that gradient checkpointing is enabled for large models

Configuration Reference

BitsAndBytesConfig Options

| Parameter | Type | Default | Description |
|---|---|---|---|
| load_in_8bit | bool | False | Load model in 8-bit |
| load_in_4bit | bool | False | Load model in 4-bit |
| llm_int8_threshold | float | 6.0 | Outlier threshold for 8-bit |
| llm_int8_skip_modules | List | None | Modules to skip 8-bit conversion |
| llm_int8_enable_fp32_cpu_offload | bool | False | Enable CPU offload for 32-bit tensors |

See Also

Sources: src/peft/tuners/lora/model.py

Advanced Features

Related topics: Quantization Integration


PEFT (Parameter-Efficient Fine-Tuning) provides a comprehensive suite of advanced features that extend beyond basic adapter-based fine-tuning. These features enable sophisticated model adaptation strategies, including mixed adapter configurations, runtime adapter switching, distributed training support, and advanced optimization techniques.

Mixed Adapter Models

Mixed adapter models allow multiple adapter types to coexist within a single base model. This powerful feature enables combining different fine-tuning techniques to leverage their respective strengths.

Overview

The mixed model architecture in PEFT allows a base model to have multiple adapters of different types applied simultaneously. This is particularly useful when different adapters excel at different aspects of a task, or when you want to experiment with combining adapter strengths.

The mixed model functionality is implemented across two primary modules:

| Module | File Path | Purpose |
|---|---|---|
| PeftMixedModel | src/peft/mixed_model.py | Base mixed model class |
| MixedModel | src/peft/tuners/mixed/model.py | Tuner-specific mixed model implementation |

Architecture

graph TD
    A[Base Model] --> B[Mixed Adapter Layer]
    B --> C[LoRA Adapter]
    B --> D[IA³ Adapter]
    B --> E[AdaLoRA Adapter]
    B --> N[Additional Adapters]
    
    F[Adapter Config 1] --> C
    G[Adapter Config 2] --> D
    H[Adapter Config 3] --> E
    
    I[Active Adapter Selection] --> B
    J[Multi-Adapter Inference] --> B

Supported Adapter Combinations

PEFT supports multiple tuner types that can be combined in mixed configurations:

| Tuner Type | Prefix | Description |
|---|---|---|
| LoRA | lora_ | Low-Rank Adaptation |
| AdaLoRA | adalora_ | Adaptive LoRA with budget allocation |
| IA³ | ia3_ | (IA)³ - learnable input/output/residual scaling |
| OFT | oft_ | Orthogonal Fine-Tuning |
| HRA | hra_ | Householder Reflection Adaptation |
| HiRA | hira_ | Hadamard High-Rank Adaptation |
| SHiRA | shira_ | Sparse High Rank Adaptation |
| GraLoRA | gralora_ | Granular Low-Rank Adaptation |
| MiSS | miss_ | Matrix Shard Sharing |
| AdaMSS | adamss_ | Adaptive Subspace Selection |
| X-LoRA | xlora_ | Mixture of LoRA experts with dynamic routing |
| Poly | poly_ | Polytropon multi-task adaptation |

Key Implementation Details

Each tuner in PEFT defines specific attributes that enable mixed adapter support:

# Common tuner model attributes
prefix: str  # Unique prefix for the tuner (e.g., "lora_", "ia3_")
tuner_layer_cls = SpecificLayerClass  # The layer class for this tuner
target_module_mapping = {...}  # Mapping of model types to target modules

The mixed model implementation handles adapter creation through the _create_and_replace method, which validates the current key and delegates to appropriate adapter-specific logic.
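
A hedged sketch of creating a mixed-adapter model (module names and values are illustrative; mixed=True returns a PeftMixedModel):

from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig, LoHaConfig

base = AutoModelForCausalLM.from_pretrained("gpt2")
model = get_peft_model(
    base,
    LoraConfig(r=8, target_modules=["c_attn"]),
    adapter_name="lora_1",
    mixed=True,
)
model.add_adapter("loha_1", LoHaConfig(r=8, target_modules=["c_attn"]))
model.set_adapter(["lora_1", "loha_1"])  # activate both adapters together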

Sources: src/peft/tuners/shira/model.py:1-50 Sources: src/peft/tuners/mixed/model.py

Adapter Hotswap

The hotswap feature enables runtime replacement of adapters without requiring full model reload. This is essential for production environments where model availability must be maintained during adapter updates.

Purpose

Adapter hotswapping allows you to:

  • Replace a deployed adapter with an updated version
  • Switch between different fine-tuned adapters for different tasks
  • Update model capabilities without downtime
  • A/B test different adapter versions in production

Implementation

The hotswap functionality is implemented in src/peft/utils/hotswap.py and provides the hotswap_adapter function for runtime adapter replacement.

def hotswap_adapter(
    model: "PeftModel",
    model_name_or_path: str,
    adapter_name: str,
    torch_device: Optional[str] = None,
    **kwargs,
) -> None:

Parameters

| Parameter | Type | Description |
|---|---|---|
| model | PeftModel | The PEFT model with the loaded adapter |
| model_name_or_path | str | Path or identifier for the new adapter |
| adapter_name | str | Name of the adapter to replace (e.g., "default") |
| torch_device | str, optional | Target device for adapter weights |
| **kwargs | — | Additional arguments for config/weight loading |

Workflow

graph TD
    A[Load New Adapter Config] --> B[Validate Adapter Type]
    B --> C[Load Adapter Weights to Device]
    C --> D[Validate Weight Compatibility]
    D --> E[Replace Adapter Weights in Model]
    E --> F[Update Model State]
    F --> G[Model Ready for Inference]
    
    H[Inference with New Adapter] -.-> G

Usage Example

import torch

from peft import hotswap_adapter

# Replace the "default" lora adapter with a new one
hotswap_adapter(model, "path-to-new-adapter", adapter_name="default", torch_device="cuda:0")

# Use the updated model
with torch.inference_mode():
    output = model(inputs).logits

Configuration Validation

During hotswap, the system performs several validations:

  1. Config Loading: Loads the new adapter configuration using config_cls.from_pretrained()
  2. Type Matching: Ensures the new adapter type is compatible with existing adapters
  3. Weight Loading: Loads weights onto the specified device with appropriate quantization settings

Sources: src/peft/utils/hotswap.py:1-80 Sources: docs/source/developer_guides/checkpoint.md

Incremental PCA Utilities

PEFT includes incremental PCA utilities for advanced analysis and optimization of adapter matrices. Incremental PCA is particularly useful for:

  • Analyzing the rank structure of trained adapters
  • Identifying redundant parameters in low-rank adaptations
  • Computing principal components in a memory-efficient manner

Implementation

The incremental PCA implementation is located in src/peft/utils/incremental_pca.py. This utility supports processing large matrices in batches to avoid memory constraints.

Key Features

| Feature | Description |
|---|---|
| Batch Processing | Process large matrices incrementally |
| Memory Efficiency | Avoid loading entire matrices into memory |
| Rank Analysis | Determine effective rank of adapter matrices |
| Component Extraction | Extract principal components for analysis |

Use Cases

  1. Adapter Analysis: Understand the dimensionality requirements of trained adapters
  2. Compression: Identify opportunities for matrix rank reduction
  3. Quality Assessment: Verify that low-rank approximations maintain sufficient information

Sources: src/peft/utils/incremental_pca.py

Distributed Training Support

PEFT provides comprehensive support for distributed training frameworks, enabling efficient fine-tuning of large models across multiple devices and nodes.

DeepSpeed Integration

PEFT integrates with DeepSpeed ZeRO optimizations for memory-efficient distributed training.

#### Features

| Feature | Description |
|---|---|
| ZeRO Stage 2/3 | Partition optimizer states across devices |
| CPU Offload | Offload parameters/optimizer states to CPU |
| Activation Checkpointing | Reduce memory for activations |
| Mixed Precision | FP16/BF16 training support |

#### Configuration

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)

peft_model = get_peft_model(model, peft_config)
# Train with DeepSpeed ZeRO-3 config

#### Key Considerations

  • Only non-trainable weights should remain on the original device when using PEFT with DeepSpeed
  • Trainable adapter weights are managed by DeepSpeed's optimizer partitioning
  • Offloading should be configured at the DeepSpeed level, not within PEFT configs

Sources: docs/source/accelerate/deepspeed.md

FSDP Integration

Fully Sharded Data Parallel (FSDP) support enables sharding model parameters, gradients, and optimizer states across GPUs.

#### Features

| Feature | Description |
|---|---|
| Parameter Sharding | Distribute model parameters across GPUs |
| Gradient Sharding | Partition gradients during backward pass |
| Optimizer Sharding | Distribute optimizer states |
| Mixed Precision | Automatic FP16/BF16 handling |

#### Configuration with Accelerate

# accelerate config.yaml
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_state_dict_type: FULL_STATE_DICT

#### Compatibility Notes

  • FSDP support requires transformers>=4.36.0
  • Auto-wrap policies should wrap transformer layers containing PEFT adapters
  • State dict type should be FULL_STATE_DICT for checkpoint saving

Sources: docs/source/accelerate/fsdp.md

Advanced Tuner Configurations

AdaLoRA - Adaptive Budget Allocation

AdaLoRA implements an intelligent budget allocation strategy that dynamically adjusts the rank of different adapter matrices during training.

#### Training Workflow

graph TD
    A[Initialize with Uniform Rank] --> B[Forward Pass]
    B --> C[Calculate Importance Scores]
    C --> D{Global Step < Total - T_final?}
    D -->|Yes| E[Update Rank Pattern]
    E --> B
    D -->|No| F{Mask Unimportant Weights}
    F --> G[Finalize Adapter]

#### Key Parameters

| Parameter | Description |
|---|---|
| r | Initial rank for all adapters |
| total_step | Total training steps |
| tinit | Steps for initial warmup |
| tfinal | Steps for final budget freezing |
| deltaT | Interval between rank adjustments |

Sources: src/peft/tuners/adalora/model.py:1-100

X-LoRA - Mixture of LoRA Experts

X-LoRA routes dynamically among multiple LoRA adapters and supports advanced configurations, including quantized base models and multi-adapter loading.

#### Features

| Feature | Description |
|---|---|
| 8-bit Quantization | Load base models in int8 format |
| 4-bit Quantization | Load base models in int4 format |
| Flash Attention | Integration with flash_attention_2 |
| Ephemeral GPU Offload | Temporary GPU memory management |
| Multiple Adapter Loading | Load multiple adapters simultaneously |

#### Configuration

from peft import XLoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    quantization_config=quantization_config,
    device_map="cuda:0",
)
# `adapters` maps adapter names to checkpoint paths (paths illustrative)
config = XLoraConfig(
    task_type="CAUSAL_LM",
    hidden_size=model.config.hidden_size,
    adapters={
        "adapter_1": "./path/to/adapter_1",
        "adapter_2": "./path/to/adapter_2",
    },
)
xlora_model = get_peft_model(model, config)

Sources: src/peft/tuners/xlora/model.py:1-80

(IA)³ - Infused Adapter by Inhibiting and Amplifying Inner Activations

The (IA)³ method applies learnable scaling vectors to key components of transformer models.

#### Target Modules

| Model Type | Target Modules |
|---|---|
| Encoder-only | q_proj, v_proj, k_proj, output_proj |
| Decoder-only | q_proj, v_proj, k_proj, output_proj, fc1 |
| Seq2Seq | q_proj, v_proj, k_proj, output_proj, fc1, fc2 |

#### Implementation Details

The IA³ implementation creates scaling vectors that are multiplied with the hidden states at specific positions in the forward pass. The scaling vectors are initialized to ones (neutral) and learned during training.

Sources: src/peft/tuners/ia3/model.py:1-80

Helper Functions

PEFT provides utility functions for common operations that enhance the developer experience.

Signature Management

#### update_forward_signature

Updates the forward signature of a PeftModel to include the base model's signature, enabling proper IDE autocompletion and documentation.

from peft import update_forward_signature

update_forward_signature(peft_model)
help(peft_model.forward)  # Now shows complete signature

#### update_generate_signature

Similar to forward signature update but for the generate method, essential for seq2seq models.

from peft import update_generate_signature

update_generate_signature(peft_model)
help(peft_model.generate)  # Now shows complete signature

Model Validation

#### check_if_peft_model

Validates whether a model path or identifier corresponds to a PEFT model by attempting to load its configuration.

from peft import check_if_peft_model

is_peft = check_if_peft_model("meta-llama/Llama-2-7b-adapter")
# Returns: True or False

Adapter Scale Context Manager

The rescale_adapter_scale context manager temporarily adjusts adapter scaling factors, useful for controlled inference experiments.

from peft import rescale_adapter_scale

with rescale_adapter_scale(model, multiplier=0.5):
    output = model(inputs)  # Scaled by 0.5
# Original scaling restored after context exit

Sources: src/peft/helpers.py:1-150

Task-Specific Models

PEFT provides specialized model classes optimized for different task types.

| Task Type | Model Class | Use Case |
|---|---|---|
| Feature Extraction | `PeftModelForFeatureExtraction` | Extracting embeddings |
| Question Answering | `PeftModelForQuestionAnswering` | QA tasks |
| Sequence Classification | `PeftModelForSequenceClassification` | Text classification |
| Token Classification | `PeftModelForTokenClassification` | NER, POS tagging |
| Seq2Seq LM | `PeftModelForSeq2SeqLM` | Translation, summarization |

Common Initialization Pattern

All task-specific models follow a consistent initialization pattern:

# Shared constructor signature implemented by each task-specific PeftModel subclass.
def __init__(
    self,
    model: torch.nn.Module,
    peft_config: PeftConfig,
    adapter_name: str = "default",
    **kwargs,
) -> None:
    super().__init__(model, peft_config, adapter_name, **kwargs)

Each model class may add task-specific module name patterns for modules to save (e.g., classifier layers in sequence classification models).
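
In practice these classes are rarely instantiated directly; `get_peft_model` selects the right subclass from the config's `task_type`. A minimal sketch, assuming a BERT sequence-classification checkpoint and illustrative LoRA hyperparameters:

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

base_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # illustrative checkpoint and label count
)
config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16)
# Returns a PeftModelForSequenceClassification; the classifier head is
# registered in modules_to_save so it stays trainable alongside the adapter.
model = get_peft_model(base_model, config)
model.print_trainable_parameters()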

Sources: src/peft/peft_model.py:1-200

Summary

PEFT's advanced features provide a comprehensive toolkit for parameter-efficient model adaptation:

| Category | Features |
|---|---|
| Mixed Adapters | Multiple adapter types per model |
| Runtime Switching | Adapter hotswap without reload |
| Analysis Tools | Incremental PCA for matrix analysis |
| Distributed Training | DeepSpeed ZeRO, FSDP support |
| Advanced Tuners | AdaLoRA, X-LoRA, IA³, OFT, and more |
| Developer Utilities | Signature management, validation helpers |

These features enable both research experimentation and production deployment of efficient fine-tuning solutions across a wide range of model architectures and training configurations.

Sources: [src/peft/tuners/shira/model.py:1-50](https://github.com/huggingface/peft/blob/main/src/peft/tuners/shira/model.py)

Doramagic Pitfall Log

Doramagic extracted 16 source-linked risk signals; 12 are listed below. Review them before installing or handing real data to the project.

1. Configuration risk: [BUG] peft 0.19 target_modules (str) use `set`

  • Severity: high
  • Finding: Configuration risk is backed by a source signal: [BUG] peft 0.19 target_modules (str) use set. Treat it as a review item until the current version is checked.
  • User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/issues/3229

2. Security or permission risk: Comparison of Different Fine-Tuning Techniques for Conversational AI

  • Severity: high
  • Finding: Security or permission risk is backed by a source signal: Comparison of Different Fine-Tuning Techniques for Conversational AI. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/issues/2310

3. Installation risk: Feature Request: Improve offline support for custom architectures in get_peft_model_state_dict

  • Severity: medium
  • Finding: Installation risk is backed by a source signal: Feature Request: Improve offline support for custom architectures in get_peft_model_state_dict. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/issues/3211

4. Configuration risk: 0.17.0: SHiRA, MiSS, LoRA for MoE, and more

  • Severity: medium
  • Finding: Configuration risk is backed by a source signal: 0.17.0: SHiRA, MiSS, LoRA for MoE, and more. Treat it as a review item until the current version is checked.
  • User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/releases/tag/v0.17.0

5. Configuration risk: Applying Dora to o_proj of Meta-Llama-3.1-8B results in NaN

  • Severity: medium
  • Finding: Configuration risk is backed by a source signal: Applying Dora to o_proj of Meta-Llama-3.1-8B results in NaN. Treat it as a review item until the current version is checked.
  • User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/issues/2049

6. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | github_repo:570384908 | https://github.com/huggingface/peft | README/documentation is current enough for a first validation pass.

7. Project risk: 0.17.1

  • Severity: medium
  • Finding: Project risk is backed by a source signal: 0.17.1. Treat it as a review item until the current version is checked.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/releases/tag/v0.17.1

8. Project risk: v0.15.1

  • Severity: medium
  • Finding: Project risk is backed by a source signal: v0.15.1. Treat it as a review item until the current version is checked.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/releases/tag/v0.15.1

9. Project risk: v0.15.2

  • Severity: medium
  • Finding: Project risk is backed by a source signal: v0.15.2. Treat it as a review item until the current version is checked.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/releases/tag/v0.15.2

10. Maintenance risk: 0.16.0: LoRA-FA, RandLoRA, C³A, and much more

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: 0.16.0: LoRA-FA, RandLoRA, C³A, and much more. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/releases/tag/v0.16.0

11. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:570384908 | https://github.com/huggingface/peft | last_activity_observed missing

12. Security or permission risk: no_demo

  • Severity: medium
  • Finding: The validation record carries the raw signal `no_demo` (no runnable demo detected). Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | github_repo:570384908 | https://github.com/huggingface/peft | no_demo; severity=medium

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

Doramagic exposes project-level community discussion separately from official documentation. These external discussion links are review inputs, not standalone proof that the project is production-ready.

  • Sources: 12 project-level external discussion links are exposed on this manual page.
  • Recommended use: open the linked issues or discussions before treating the pack as ready for your environment, and review them before using peft with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence