Doramagic Project Pack · Human Manual

Introduction to PEFT

Related topics: Installation Guide, System Architecture, LoRA and LoRA Variants

Overview

PEFT (Parameter-Efficient Fine-Tuning) is a Python library developed by Hugging Face that provides efficient methods for fine-tuning pre-trained models while keeping most model parameters frozen. This approach significantly reduces computational costs and memory requirements compared to full fine-tuning, making it possible to work with large language models on limited hardware resources.

The library supports multiple fine-tuning techniques including LoRA, Prefix Tuning, Prompt Tuning, AdaLoRA, QLoRA, and many other parameter-efficient methods. PEFT is designed to integrate seamlessly with the Hugging Face Transformers ecosystem, allowing users to apply adapter-based fine-tuning with minimal code changes.

Sources: src/peft/tuners/lora/model.py:1-50

Core Architecture

Design Philosophy

PEFT follows an adapter-based architecture where lightweight trainable modules are added to pre-trained models. These adapters contain a small fraction of the total model parameters, typically ranging from 0.1% to 5% of the original model size, depending on the configuration.

The core principles of PEFT's architecture include:

  • Modularity: Each fine-tuning method is implemented as a separate "tuner" with its own configuration class
  • Composability: Multiple adapters can be loaded and used simultaneously
  • Compatibility: Full integration with Hugging Face Transformers and Diffusers
  • Memory Efficiency: Support for quantization and CPU offloading strategies

Sources: src/peft/tuners/tuners_utils.py:1-30
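
As a rough illustration of that parameter ratio, a minimal sketch (gpt2 is used purely as an example base model; the exact percentage depends on the architecture and rank):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")
peft_model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=8))

# Reports trainable vs. total parameters; for gpt2 with r=8 this is on the order of 0.2%
peft_model.print_trainable_parameters()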

Component Hierarchy

graph TD
    A[PeftModel] --> B[BaseTuner]
    B --> C[Model Specific Tuners]
    C --> D[LoraModel]
    C --> E[PrefixTuningModel]
    C --> F[PromptTuningModel]
    C --> G[AdaLoRAModel]
    C --> H[QLoRAModel]
    C --> I[XLoraModel]
    C --> J[HiraModel]
    C --> K[GraloraModel]
    C --> L[AdamssModel]

Supported Fine-Tuning Methods

PEFT provides implementations for various parameter-efficient fine-tuning techniques. Each method has its own configuration class and model wrapper.

| Method | Configuration Class | Description |
|---|---|---|
| LoRA | LoraConfig | Low-Rank Adaptation using rank-decomposition matrices |
| Prefix Tuning | PrefixTuningConfig | Optimizes continuous prompts prepended to layer inputs |
| Prompt Tuning | PromptTuningConfig | Trains soft prompts embedded in the input layer |
| P-Tuning | PromptEncoderConfig | Uses trainable prompt embeddings with optional LSTM/MLP |
| AdaLoRA | AdaLoraConfig | Adaptive LoRA with dynamic rank allocation |
| QLoRA | LoraConfig | LoRA applied on top of a quantized base model |
| IA³ | IA3Config | Infused Adapter by Inhibiting and Amplifying Inner Activations |
| Multi Adapter | MultiAdapterConfig | Combines multiple adapters |
| LoHa | LoHaConfig | Low-rank Hadamard product adaptation |
| LoKr | LoKrConfig | Low-rank Kronecker product adaptation |
| AdaLoKr | AdaLoKrConfig | Adaptive LoKr with dynamic rank allocation |
| OFT | OFTConfig | Orthogonal Fine-Tuning |
| BOFT | BOFTConfig | OFT with butterfly factorization |
| VeRA | VeraConfig | Vector-based Random Matrix Adaptation |
| X-LoRA | XLoraConfig | Mixture of LoRA experts with learned gating |
| HiRA | HiraConfig | High-rank adaptation via Hadamard products |
| GraLoRA | GraloraConfig | Granular low-rank adaptation |
| Adamss | AdamssConfig | Adaptive subspace-efficient fine-tuning |
| SHiRA | ShiraConfig | Sparse high-rank adaptation |
| LND | LNDConfig | Layer-wise Normalization Distribution |
| Loralite | LoraliteConfig | Lightweight LoRA variant |

Sources: src/peft/tuners/lora/model.py:1-80

Task Types

PEFT supports various NLP task types through specialized model classes. Each task type is designed for specific downstream applications.

graph LR
    A[Base Model] --> B[PeftModel]
    B --> C{Task Type}
    C --> D[CAUSAL_LM]
    C --> E[SEQ_2_SEQ_LM]
    C --> F[FEATURE_EXTRACTION]
    C --> G[QUESTION_ANS]
    C --> H[SEQ_CLS]
    C --> I[TOKEN_CLS]
    C --> J[IMAGE_CLS]

Task-Specific Models

| Task Type | Model Class | Use Case |
|---|---|---|
| CAUSAL_LM | PeftModelForCausalLM | Autoregressive text generation |
| SEQ_2_SEQ_LM | PeftModelForSeq2SeqLM | Encoder-decoder tasks (translation, summarization) |
| FEATURE_EXTRACTION | PeftModelForFeatureExtraction | Embedding extraction |
| QUESTION_ANS | PeftModelForQuestionAnswering | Question answering tasks |
| SEQ_CLS | PeftModelForSequenceClassification | Text classification |
| TOKEN_CLS | PeftModelForTokenClassification | Named entity recognition, POS tagging |

Sources: src/peft/peft_model.py:1-100
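
For example, get_peft_model selects the matching task-specific class from the config's task_type field; a minimal sketch using bert-base-uncased as a stand-in classifier:

from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
config = LoraConfig(task_type="SEQ_CLS", r=8, target_modules=["query", "value"])
peft_model = get_peft_model(base, config)
print(type(peft_model).__name__)  # PeftModelForSequenceClassification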

Core API

PeftModel Class

The PeftModel is the base class for all PEFT models. It wraps a pre-trained model and manages adapter injection, loading, and merging.

#### Key Methods

| Method | Description |
|---|---|
| from_pretrained(model, model_id, adapter_name, ...) | Load a PEFT model from pretrained weights |
| get_peft_config(adapter_name) | Get the configuration for a specific adapter |
| print_trainable_parameters() | Display trainable vs. total parameter counts |
| merge_and_unload(progressbar, safe_merge, adapter_names) | Merge adapters into the base model |
| unload() | Return the base model without PEFT modules |
| set_adapter(adapter_name) | Activate a specific adapter |
| add_weighted_adapter(adapter_names, weights, combination_type) | Combine multiple adapters |

Sources: src/peft/peft_model.py:100-200
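
A short sketch exercising set_adapter and add_weighted_adapter on a model with two LoRA adapters already loaded (the adapter names are placeholders):

# Activate a single adapter
peft_model.set_adapter("adapter1")

# Combine two LoRA adapters into a new one by linear interpolation
peft_model.add_weighted_adapter(
    adapters=["adapter1", "adapter2"],
    weights=[0.7, 0.3],
    adapter_name="combined",
    combination_type="linear",
)
peft_model.set_adapter("combined")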

Loading Pre-trained Adapters

The from_pretrained class method loads PEFT adapters from the Hugging Face Hub or local storage:

from transformers import AutoModelForCausalLM
from peft import PeftModel, PeftConfig

# Load configuration
config = PeftConfig.from_pretrained("user/peft-model")

# Load the base model referenced by the adapter configuration
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)

# Create PEFT model with loaded adapter
peft_model = PeftModel.from_pretrained(
    base_model, 
    "user/peft-model",
    adapter_name="default",
    is_trainable=False,
    autocast_adapter_dtype=True
)

Sources: src/peft/peft_model.py:200-280

Merging and Unloading

PEFT models support merging adapters back into the base model for inference:

# Merge and unload to get a standalone model
merged_model = peft_model.merge_and_unload()

# Safe merge: check the merged weights for NaNs before committing
merged_model = peft_model.merge_and_unload(safe_merge=True)

# Merge specific adapters only
merged_model = peft_model.merge_and_unload(adapter_names=["adapter1", "adapter2"])

# Unload without merging
base_model = peft_model.unload()

Sources: src/peft/tuners/tuners_utils.py:50-100

Adapter Management

Multi-Adapter Support

PEFT supports loading and managing multiple adapters simultaneously. This is useful for ensemble methods or when combining adapters trained on different tasks.

# Load multiple adapters via X-LoRA
from peft import XLoraConfig, get_peft_model

adapters = {
    "adapter_1": "./path/to/adapter-1",
    "adapter_2": "./path/to/adapter-2",
}

xlora_config = XLoraConfig(adapters=adapters)
model = get_peft_model(base_model, xlora_config)

Sources: src/peft/tuners/xlora/model.py:1-50

Hotswap Adapter

The hotswap functionality allows replacing loaded adapters without reloading the entire model:

from peft.utils.hotswap import hotswap_adapter

# Replace the default adapter with a new one
hotswap_adapter(
    model, 
    "path-to-new-adapter", 
    adapter_name="default",
    torch_device="cuda:0"
)

This operation validates the new adapter configuration and swaps the weights while maintaining the model structure.

Sources: src/peft/utils/hotswap.py:1-80

Configuration Options

Common Parameters

Most PEFT configuration classes share common parameters that control the fine-tuning behavior:

| Parameter | Type | Default | Description |
|---|---|---|---|
| r | int | 8 | LoRA rank dimension |
| lora_alpha | int | 8 | LoRA scaling factor |
| lora_dropout | float | 0.0 | Dropout probability for LoRA layers |
| target_modules | List[str] | None | Names of modules to apply adaptation to |
| bias | str | "none" | Bias handling: "none", "all", "lora_only" |
| modules_to_save | List[str] | None | Additional trainable modules |
| fan_in_fan_out | bool | False | Transpose weights for certain architectures |

Method-Specific Parameters

#### LoRA Configuration

from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "out_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

#### Prefix Tuning Configuration

from peft import PrefixTuningConfig

config = PrefixTuningConfig(
    num_virtual_tokens=20,
    token_dim=768,
    num_transformer_submodules=1,
    num_attention_heads=12,
    num_layers=12,
    encoder_hidden_size=768,
    prefix_projection=False
)

Sources: src/peft/tuners/lora/model.py:50-150

Advanced Features

Dynamic Rank Allocation

Some PEFT methods support adaptive rank allocation, where the importance of different layers is evaluated during training:

# Adaptive LoRA with dynamic rank allocation
from peft import AdaLoraConfig

config = AdaLoraConfig(
    r=16,
    lora_alpha=32,
    target_r=8,
    tinit=200,
    tfinal=1000,
    deltaT=10,
    lora_dropout=0.1
)

Sources: src/peft/tuners/adamss/model.py:1-60

Hierarchical Adaptation

Methods like Hira and Gralora implement hierarchical rank adaptation for better parameter efficiency:

from peft import HiraConfig

config = HiraConfig(
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
    hira_dropout=0.01,
    task_type="SEQ_2_SEQ_LM"
)

Sources: src/peft/tuners/hira/model.py:1-60

Quantization Support

PEFT integrates with BitsAndBytes for 8-bit and 4-bit quantization:

from peft import prepare_model_for_kbit_training, get_peft_model, LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "model-name",
    quantization_config=quantization_config
)
model = prepare_model_for_kbit_training(model)

config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, config)

Helper Functions

Signature Updates

The helpers module provides utility functions for updating model signatures:

from peft import update_forward_signature, update_generate_signature, update_signature

# Update forward signature only
update_forward_signature(peft_model)

# Update generate signature only
update_generate_signature(peft_model)

# Update both
update_signature(peft_model, method="all")

Model Validation

from peft.helpers import check_if_peft_model

# Check if a model ID corresponds to a PEFT model
is_peft = check_if_peft_model("user/peft-model")

# Works with both Hub and local paths
is_peft_local = check_if_peft_model("./local/peft-model")

Adapter Scale Rescaling

from peft.helpers import rescale_adapter_scale

with rescale_adapter_scale(model, multiplier=0.5):
    output = model(inputs)

Memory Optimization

Low CPU Memory Usage

Loading adapters can be optimized for memory-constrained environments:

# Create adapter weights on meta device for faster loading
peft_model = PeftModel.from_pretrained(
    base_model,
    adapter_path,
    low_cpu_mem_usage=True
)

Training with Quantized Models

PEFT supports full training workflows with quantized base models:

from peft import get_peft_model, LoraConfig, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True)
)
model = prepare_model_for_kbit_training(model)

config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, config)

Integration Patterns

With Diffusers

PEFT works with Stable Diffusion and other diffusion models:

from diffusers import StableDiffusionPipeline
from peft import MissModel, MissConfig

config_unet = MissConfig(
    r=8,
    target_modules=["proj_in", "proj_out", "to_k", "to_q", "to_v"],
    init_weights=True
)

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.unet = MissModel(pipeline.unet, config_unet, "default")

Sources: src/peft/tuners/miss/model.py:1-60

Cross-Modal Applications

Some PEFT methods like XLora are designed for multi-modal models with complex architecture support:

from peft import XLoraConfig, get_peft_model

config = XLoraConfig(
    adapters={
        "adapter_1": "./path/to/adapter-1",
        "adapter_2": "./path/to/adapter-2"
    }
)

model = AutoModelForCausalLM.from_pretrained("model-name", trust_remote_code=True)
xlora_model = get_peft_model(model, config)

Workflow Diagram

graph TD
    A[Pre-trained Model] --> B[Choose Fine-tuning Method]
    B --> C[Create PEFT Config]
    C --> D[Initialize Adapter]
    D --> E[Train Adapter]
    E --> F{Save or Load?}
    F -->|Save| G[save_pretrained]
    F -->|Load| H[from_pretrained]
    G --> I[Hub or Local]
    H --> J[Merge or Inference]
    J --> K[merge_and_unload]
    J --> L[Direct Inference]
    K --> M[Final Model]
    L --> M

Best Practices

  1. Start with Default Ranks: Begin with r=8 for LoRA and increase based on performance
  2. Target Specific Modules: Prefer targeting attention projection layers (q_proj, v_proj) over all linear layers
  3. Use Quantization for Large Models: Apply 4-bit quantization (QLoRA) for models larger than 7B parameters
  4. Save Checkpoints Regularly: Use PEFT's built-in checkpoint saving to avoid losing training progress
  5. Evaluate Before Merging: Always evaluate adapter quality before merging into the base model

Conclusion

PEFT provides a comprehensive framework for parameter-efficient fine-tuning that enables training large models on limited hardware. Its modular architecture supports various adaptation methods while maintaining compatibility with the broader Hugging Face ecosystem. Whether working with language models, vision models, or multi-modal architectures, PEFT offers consistent APIs and significant memory savings compared to full fine-tuning approaches.

Sources: src/peft/tuners/lora/model.py:1-100, src/peft/tuners/tuners_utils.py:1-50


Installation Guide

Related topics: Introduction to PEFT, Quantization Integration


This guide covers all methods for installing the PEFT (Parameter-Efficient Fine-Tuning) library, including dependencies management, optional feature installations, and verification procedures.

Overview

The PEFT library provides state-of-the-art parameter-efficient fine-tuning methods including LoRA, AdaLoRA, Prefix Tuning, Prompt Tuning, and many other advanced techniques. Proper installation ensures access to all functionality including GPU acceleration, quantization support, and integration with Hugging Face Transformers and Diffusers.

Key Installation Features:

  • Core library installation via pip, conda, or from source
  • Optional dependencies for specific tuners and features
  • GPU/CUDA support for accelerated training
  • BitsAndBytes integration for quantization
  • Diffusers integration for image generation models

System Requirements

Hardware Requirements

| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB+ |
| GPU VRAM | 4 GB | 8-24 GB (depending on model size) |
| Storage | 5 GB | 10 GB+ |
| CUDA | 11.6 | 11.8+ or CUDA 12.x |

Software Requirements

| Requirement | Version |
|---|---|
| Python | ≥ 3.8 |
| PyTorch | ≥ 1.11.0 |
| Transformers | ≥ 4.20.0 |
| Diffusers | ≥ 0.13.0 |
| Accelerate | ≥ 0.20.0 |

Installation Methods

Standard Installation via pip

The simplest method to install PEFT is using pip:

pip install peft

This installs the core library with all base dependencies.

Installing Specific Versions

To install a specific version of PEFT:

pip install peft==0.13.0

To install the latest development version from GitHub:

pip install git+https://github.com/huggingface/peft.git

Installation from Source

For developers contributing to PEFT or needing the latest features:

git clone https://github.com/huggingface/peft.git
cd peft
pip install -e .

The editable installation (-e .) allows modifications to the source code while keeping the package importable.

Dependencies Structure

Core Dependencies

The core dependencies are defined in pyproject.toml and requirements.txt:

# Core runtime dependencies
torch>=1.11.0
transformers>=4.20.0
accelerate>=0.20.0

Sources: pyproject.toml

Optional Dependencies by Feature

PEFT provides optional dependencies for specific use cases:

| Feature | Installation Command | Purpose |
|---|---|---|
| Quantization | pip install peft[quantization] | BitsAndBytes 4-bit/8-bit quantization |
| GPU Training | pip install peft[gpu] | CUDA-optimized operations |
| Diffusers | pip install peft[diffusers] | Stable Diffusion model support |
| Dev Tools | pip install peft[dev] | Testing and linting |
| All Extras | pip install peft[all] | Complete installation |

Advanced Installation with Quantization

For models requiring quantized weights (e.g., using 4-bit or 8-bit precision):

pip install peft bitsandbytes scipy accelerate

This combination enables:

  • 4-bit quantization via BitsAndBytes
  • 8-bit quantization for extreme memory reduction
  • Mixed-precision training optimization
  • Efficient loading of large models on limited hardware

Sources: src/peft/tuners/lora/model.py

Environment Setup

Using Virtual Environments

Using venv:

python -m venv peft-env
source peft-env/bin/activate  # Linux/macOS
peft-env\Scripts\activate     # Windows
pip install peft

Using conda:

conda create -n peft-env python=3.10
conda activate peft-env
pip install peft

CUDA Configuration

For GPU acceleration, ensure CUDA is properly configured:

import torch
print(torch.cuda.is_available())  # Should return True
print(torch.cuda.device_count())  # Number of available GPUs

The PEFT library automatically detects and utilizes available CUDA devices during training.

Verification and Testing

Basic Installation Verification

Verify your installation by importing PEFT and checking the version:

import peft
print(peft.__version__)  # Should print the installed version

Quick Functionality Test

Test basic LoRA functionality:

from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig

# Load a small model for testing
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)

# Configure LoRA
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=16,
    target_modules=["c_attn", "c_proj"],
    lora_dropout=0.05
)

# Apply PEFT
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()

Signature Update Utilities

After installation, you may want to update method signatures for better IDE support:

from peft import update_forward_signature, update_generate_signature

# Update forward signature
update_forward_signature(peft_model)

# Update generate signature (for generative models)
update_generate_signature(peft_model)

Sources: src/peft/helpers.py:1-100

Tuner-Specific Installation Notes

LoRA and QLoRA

Standard LoRA requires no additional dependencies beyond core installation. QLoRA requires:

pip install peft "bitsandbytes>=0.40.0" "trl>=0.4.0"

Sources: src/peft/tuners/lora/model.py

Prefix Tuning and Prompt Tuning

These methods require only core dependencies:

pip install peft

Diffusion Model Support (LoRA for Images)

For Stable Diffusion and similar models:

pip install peft diffusers

Example configuration for Stable Diffusion:

from diffusers import StableDiffusionPipeline
from peft import MissModel, MissConfig

config_unet = MissConfig(
    r=8,
    target_modules=["proj_in", "proj_out", "to_k", "to_q", "to_v", "to_out.0"],
    init_weights=True
)

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.unet = MissModel(pipeline.unet, config_unet, "default")

Sources: src/peft/tuners/miss/model.py

X-LoRA Installation

X-LoRA requires specific dependencies for multi-adapter support:

pip install peft transformers accelerate bitsandbytes

Sources: src/peft/tuners/xlora/model.py

Troubleshooting

Common Installation Issues

| Issue | Solution |
|---|---|
| ImportError: No module named peft | Reinstall: pip uninstall peft && pip install peft |
| CUDA out of memory | Use quantization or smaller batch sizes |
| BitsAndBytes import failure | Install: pip install bitsandbytes |
| Old PyTorch version | Update: pip install "torch>=1.11.0" |

Version Compatibility

Check compatibility matrix:

| PEFT Version | Min Python | Min PyTorch | Min Transformers |
|---|---|---|---|
| 0.13.x | 3.8+ | 1.11.0 | 4.20.0 |
| 0.12.x | 3.8+ | 1.11.0 | 4.20.0 |
| 0.11.x | 3.7+ | 1.11.0 | 4.20.0 |
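
To compare your environment against this matrix:

import peft
import torch
import transformers

print("peft:", peft.__version__)
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)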

Verifying Adapter Loading

Test adapter functionality after installation:

from peft.helpers import check_if_peft_model

is_peft = check_if_peft_model("path/to/model")
print(f"Is PEFT model: {is_peft}")

Sources: src/peft/helpers.py:51-65

Adapter Hotswap Installation

For runtime adapter switching functionality:

pip install peft

The hotswap capability is built into PEFT's core functionality:

from peft.utils.hotswap import hotswap_adapter

# Load and swap adapters at runtime
hotswap_adapter(model, "path-to-new-adapter", adapter_name="default")

Sources: src/peft/utils/hotswap.py

Next Steps

After successful installation:

  1. Quick Start: Follow the Quickstart Guide for first-time users
  2. Tuner Selection: Review available tuners to choose the right method
  3. Configuration: Learn about PeftConfig options
  4. Examples: Explore example notebooks for your use case

Summary

The PEFT library offers flexible installation options to accommodate various use cases from basic fine-tuning to advanced quantized training. Core installation via pip provides immediate access to all major functionality, while optional dependencies enable specialized features like 4-bit quantization and diffusion model support.

Sources: [pyproject.toml](https://github.com/huggingface/peft/blob/main/pyproject.toml)

System Architecture

Related topics: Core Components, Introduction to PEFT, Configuration System


Overview

The PEFT (Parameter-Efficient Fine-Tuning) library implements a modular architecture designed to enable efficient model adaptation without modifying the entire parameter set of pre-trained models. The system architecture is built around three core pillars: the PeftModel base class hierarchy, tuner abstractions, and configuration management.

PEFT supports multiple fine-tuning techniques including LoRA, IA³, Adapters, Prefix Tuning, Prompt Learning, and various specialized methods like SHiRA, GraLoRA, X-LoRA, and others. Each technique is implemented as a separate "tuner" that follows a common interface defined in the base tuner utilities.

High-Level Architecture Diagram

graph TD
    User[User Code] --> PeftAPI[PeftModel API]
    PeftAPI --> PeftModel[PeftModel Base Class]
    PeftModel --> BaseTuner[BaseTuner]
    BaseTuner --> TunerRegistry[Tuner Registry]
    
    subgraph Tuners
        LoRA[LoRA Tuner]
        IA3[IA³ Tuner]
        PrefixTuning[Prefix Tuning]
        PromptLearning[Prompt Learning]
        SHiRA[SHiRA Tuner]
        GraLoRA[GraLoRA Tuner]
        XLoRA[X-LoRA Tuner]
        Hira[Hira Tuner]
        DeLoRA[DeLoRA Tuner]
        Miss[MiSS Tuner]
        Adamss[Adamss Tuner]
    end
    
    BaseTuner --> LoRA
    BaseTuner --> IA3
    BaseTuner --> PrefixTuning
    BaseTuner --> PromptLearning
    BaseTuner --> SHiRA
    BaseTuner --> GraLoRA
    BaseTuner --> XLoRA
    BaseTuner --> Hira
    BaseTuner --> DeLoRA
    BaseTuner --> Miss
    BaseTuner --> Adamss
    
    PeftModel --> Config[PeftConfig]
    Config --> ConfigMapping[PEFT_TYPE_TO_CONFIG_MAPPING]
    
    TunerRegistry --> TargetMapping[TRANSFORMERS_MODELS_TO_*_TARGET_MODULES_MAPPING]

Core Components

1. PeftModel Base Class

The PeftModel class serves as the central entry point for all PEFT operations. It wraps a base model and manages adapter lifecycle, injection, and merging.

Location: src/peft/peft_model.py

#### Class Hierarchy

graph TD
    PyTorchModule[torch.nn.Module] --> PeftModel
    PeftModel --> PeftModelForCausalLM[PeftModelForCausalLM]
    PeftModel --> PeftModelForSeq2SeqLM[PeftModelForSeq2SeqLM]
    PeftModel --> PeftModelForSequenceClassification[PeftModelForSequenceClassification]
    PeftModel --> PeftModelForQuestionAnswering[PeftModelForQuestionAnswering]
    PeftModel --> PeftModelForTokenClassification[PeftModelForTokenClassification]
    PeftModel --> PeftModelForFeatureExtraction[PeftModelForFeatureExtraction]

#### Key Responsibilities

| Responsibility | Description |
|---|---|
| Adapter Management | Loading, activating, and switching between multiple adapters |
| Module Injection | Replacing target modules with tuner layers |
| Forward Pass | Intercepting and modifying the forward pass with adapter weights |
| Weight Merging | Combining adapter weights with base model weights |
| Model Saving/Loading | Serialization and deserialization of PEFT configurations |

#### Constructor Signature

def __init__(self, model: torch.nn.Module, peft_config: PeftConfig, adapter_name: str = "default", **kwargs)

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | torch.nn.Module | Required | The base model to be adapted |
| peft_config | PeftConfig | Required | Configuration for the PEFT method |
| adapter_name | str | "default" | Name identifier for the adapter |
| **kwargs | Any | - | Additional arguments passed to specific tuners |

Sources: src/peft/peft_model.py:1-100
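
The constructor is rarely called directly; get_peft_model is the usual entry point and picks the task-specific subclass from the config. A sketch of both paths (gpt2 is a placeholder base model):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model

config = LoraConfig(task_type="CAUSAL_LM", r=8)

# Recommended: factory function, returns PeftModelForCausalLM here
model = get_peft_model(AutoModelForCausalLM.from_pretrained("gpt2"), config)

# Low-level: direct construction, returns a generic PeftModel
model2 = PeftModel(AutoModelForCausalLM.from_pretrained("gpt2"), config, adapter_name="default")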

2. BaseTuner Class

The BaseTuner class defines the abstract interface that all tuner implementations must follow. It handles the core logic for module injection and adapter management.

Location: src/peft/tuners/tuners_utils.py

#### Core Attributes

prefix: str = ""                    # Prefix for PEFT module names
tuner_layer_cls = None              # The tuner layer class
target_module_mapping = {}          # Maps model types to target modules

#### Key Methods

| Method | Purpose |
|---|---|
| inject_adapter() | Creates adapter layers and replaces target modules |
| _create_and_replace() | Creates or updates adapter modules for specific targets |
| _replace_module() | Performs the actual module replacement |
| _check_target_module_compatibility() | Validates module compatibility (e.g., for Mamba) |
| merge_and_unload() | Merges adapter weights into the base model |
| _unload_and_optionally_merge() | Core logic for weight merging |
#### Adapter Injection Flow

sequenceDiagram
    participant User
    participant PeftModel
    participant BaseTuner
    participant Model as Base Model
    
    User->>PeftModel: inject_adapter(model, adapter_name)
    PeftModel->>BaseTuner: inject_adapter(...)
    BaseTuner->>BaseTuner: _create_and_replace(...)
    BaseTuner->>Model: Walk modules recursively
    Model-->>BaseTuner: Find matching targets
    BaseTuner->>BaseTuner: Create adapter layer
    BaseTuner->>Model: _replace_module(parent, name, new_module)
    Note over Model: Target module replaced with adapter

Sources: src/peft/tuners/tuners_utils.py:1-200

3. Configuration System

The configuration system uses a factory pattern to map PEFT types to their corresponding configuration classes.

Location: src/peft/mapping.py

#### Configuration Mapping Table

| PEFT Type | Config Class | Tuner Layer Class |
|---|---|---|
| LORA | LoraConfig | LoraLayer |
| IA3 | IA3Config | IA3Layer |
| ADALORA | AdaLoraConfig | AdaLoraLayer |
| ADAPTER | AdapterConfig | AdapterLayer |
| PREFIX_TUNING | PrefixTuningConfig | PrefixTuningLayer |
| P_TUNING | PromptEncoderConfig | PromptEncoder |
| LOHA | LoHaConfig | LoHaLayer |
| OFT | OFTConfig | OFTLayer |
| XLORA | XLoraConfig | XLoraLayer |
| HIRA | HiraConfig | HiraLayer |
| SHIRA | ShiraConfig | ShiraLayer |
| GRALORA | GraloraConfig | GraloraLayer |
| DELORA | DeloraConfig | DeloraLayer |
| MISS | MissConfig | MissLayer |
| ADAMSS | AdamssConfig | AdamssLayer |

#### Auto Configuration Loading

def check_if_peft_model(model_name_or_path: str) -> bool:
    """Check if the model is a PEFT model."""

Sources: src/peft/mapping.py:1-100, src/peft/auto.py:1-50
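
A sketch of resolving a config class through the factory mapping (assuming the mapping lives in src/peft/mapping.py as cited above):

from peft import PeftType
from peft.mapping import PEFT_TYPE_TO_CONFIG_MAPPING

config_cls = PEFT_TYPE_TO_CONFIG_MAPPING[PeftType.LORA]
print(config_cls.__name__)  # LoraConfig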

Task-Specific Model Classes

PEFT provides specialized model classes optimized for different transformer tasks.

PeftModelForSeq2SeqLM

For sequence-to-sequence tasks (translation, summarization).

class PeftModelForSeq2SeqLM(PeftModel):
    def __init__(self, model, peft_config, adapter_name="default", **kwargs):
        super().__init__(model, peft_config, adapter_name, **kwargs)
        self.base_model_prepare_inputs_for_generation = self.base_model.prepare_inputs_for_generation
        self.base_model_prepare_encoder_decoder_kwargs_for_generation = (
            self.base_model._prepare_encoder_decoder_kwargs_for_generation
        )

Features:

  • Customizes prepare_inputs_for_generation for decoder input preparation
  • Handles encoder-decoder kwargs for generation

Sources: src/peft/peft_model.py:200-400

PeftModelForSequenceClassification

For text classification tasks.

class PeftModelForSequenceClassification(PeftModel):
    def __init__(self, model, peft_config, adapter_name="default", **kwargs):
        super().__init__(model, peft_config, adapter_name, **kwargs)
        classifier_module_names = ["classifier", "score"]

Target Modules: ["classifier", "score"]

Sources: src/peft/peft_model.py:100-200

PeftModelForQuestionAnswering

For QA tasks.

class PeftModelForQuestionAnswering(PeftModel):
    def __init__(self, model, peft_config, adapter_name="default", **kwargs):
        super().__init__(model, peft_config, adapter_name, **kwargs)
        qa_module_names = ["qa_outputs"]

Target Modules: ["qa_outputs"]

Sources: src/peft/peft_model.py:250-350

PeftModelForTokenClassification

For named entity recognition and token-level tasks.

class PeftModelForTokenClassification(PeftModel):
    def __init__(self, model, peft_config=None, adapter_name="default", **kwargs):
        super().__init__(model, peft_config, adapter_name, **kwargs)
        classifier_module_names = ["classifier", "score"]

Sources: src/peft/peft_model.py:300-400

Tuner Implementations

Common Tuner Structure

All tuners follow a consistent pattern:

class SomeTuner(BaseTuner):
    prefix: str = "tuner_"
    tuner_layer_cls = SomeLayerClass
    target_module_mapping = TRANSFORMERS_MODELS_TO_SOME_TARGET_MODULES_MAPPING
    
    def _create_and_replace(self, config, adapter_name, target, target_name, parent, current_key, **kwargs):
        # Implementation

Target Module Mapping

Each tuner defines which modules can be targeted for adaptation based on the model architecture.

TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING = {
    "t5": ["q", "v"],
    "llama": ["q_proj", "v_proj"],
    "bert": ["query", "value"],
    # ... more mappings
}

Example: SHiRA Tuner

class ShiraModel(BaseTuner):
    prefix: str = "shira_"
    tuner_layer_cls = ShiraLayer
    target_module_mapping = TRANSFORMERS_MODELS_TO_SHIRA_TARGET_MODULES_MAPPING

Key Features:

  • Supports random mask generation with mask_type == "random" and configurable random_seed
  • Wraps Linear layers with SHiRA adapter logic

Sources: src/peft/tuners/shira/model.py:1-80

Example: GraLoRA Tuner

class GraloraModel(BaseTuner):
    prefix: str = "gralora_"
    tuner_layer_cls = GraloraLayer
    target_module_mapping = TRANSFORMERS_MODELS_TO_GRALORA_TARGET_MODULES_MAPPING

Sources: src/peft/tuners/gralora/model.py:1-80

Example: X-LoRA Tuner

X-LoRA supports multiple adapter loading with device placement:

def __init__(
    self,
    model: nn.Module,
    config: Union[dict[str, XLoraConfig], XLoraConfig],
    adapter_name: str,
    torch_device: Optional[str] = None,
    ephemeral_gpu_offload: bool = False,
    autocast_adapter_dtype: bool = True,
    **kwargs,
)

Sources: src/peft/tuners/xlora/model.py:1-100

Model Loading and Serialization

From Pretrained

@classmethod
def from_pretrained(
    cls,
    model: torch.nn.Module,
    model_id: str,
    adapter_name: str = "default",
    is_trainable: bool = False,
    config: Optional[PeftConfig] = None,
    autocast_adapter_dtype: bool = True,
    **kwargs
) -> PeftModel:

Parameters:

| Parameter | Type | Description |
|---|---|---|
| model | torch.nn.Module | The base model to adapt |
| model_id | str | Path or Hugging Face Hub identifier |
| adapter_name | str | Adapter name (default: "default") |
| is_trainable | bool | Whether the adapter is trainable |
| config | PeftConfig | Pre-loaded configuration |
| autocast_adapter_dtype | bool | Auto-cast adapter dtype |

Sources: src/peft/peft_model.py:400-600

Hotswap Adapter

For runtime adapter replacement without full model reload:

def hotswap_adapter(
    model,
    model_name_or_path,
    adapter_name="default",
    torch_device=None,
    **kwargs
):

Sources: src/peft/utils/hotswap.py:1-100

Helper Utilities

Signature Updates

For model compatibility, PEFT provides utilities to update method signatures:

def update_forward_signature(model: PeftModel) -> None:
    """Updates forward signature to include parent's signature."""

def update_generate_signature(model: PeftModel) -> None:
    """Updates generate signature to include parent's signature."""

def update_signature(model: PeftModel, method: str = "all") -> None:
    """Updates forward and/or generate signature."""

Logic: The signature is updated only when the current signature consists solely of *args and **kwargs:

current_signature = inspect.signature(model.forward)
if (
    len(current_signature.parameters) == 2
    and "args" in current_signature.parameters
    and "kwargs" in current_signature.parameters
):
    # Update with parent's signature

Sources: src/peft/helpers.py:1-150

Adapter Scale Rescaling

Context manager for temporary adapter scaling:

@contextmanager
def rescale_adapter_scale(model, multiplier):
    """Context manager to temporarily rescale adapter scaling."""

Data Flow Diagram

graph LR
    subgraph Input
        InputIDs[input_ids]
        Attention[attention_mask]
        Embeds[inputs_embeds]
    end
    
    subgraph Processing
        PEFTConfig[PeftConfig]
        BaseModel[Base Model]
        Adapters[Adapter Layers]
    end
    
    subgraph Output
        OutputLogits[Output Logits]
        HiddenStates[Hidden States]
        AttentionWeights[Attention Weights]
    end
    
    InputIDs --> BaseModel
    Attention --> BaseModel
    Embeds --> BaseModel
    PEFTConfig --> Adapters
    BaseModel <--> Adapters
    Adapters --> OutputLogits
    Adapters --> HiddenStates
    Adapters --> AttentionWeights

Configuration Classes

Each tuner type has a corresponding configuration class:

| Tuner | Config Class | Key Parameters |
|---|---|---|
| LoRA | LoraConfig | r, lora_alpha, lora_dropout, target_modules |
| IA³ | IA3Config | target_modules, feedforward_modules |
| Prefix Tuning | PrefixTuningConfig | num_virtual_tokens, num_transformer_submodules |
| Prompt Learning | PromptEncoderConfig | num_virtual_tokens, encoder_hidden_size |
| SHiRA | ShiraConfig | r, mask_type, random_seed |
| GraLoRA | GraloraConfig | r |
| X-LoRA | XLoraConfig | Multiple adapter configs |
| HiRA | HiraConfig | r, hira_dropout |
| DeLoRA | DeloraConfig | rank_pattern, lambda_pattern |
| MiSS | MissConfig | r, target_modules, init_weights |
| Adamss | AdamssConfig | r, num_subspaces, target_modules |

Multiple Adapter Support

PEFT supports loading and managing multiple adapters simultaneously:

graph TD
    BaseModel[Base Model] --> Adapter1[Adapter 1: default]
    BaseModel --> Adapter2[Adapter 2: adapter_v2]
    BaseModel --> AdapterN[Adapter N: custom_name]
    
    ActiveAdapter[Active Adapter] --> Selection[Selection]
    Selection --> Adapter1
    Selection --> Adapter2
    Selection --> AdapterN

Key Operations (see the sketch after this list):

  • Add adapters via inject_adapter() with unique names
  • Activate specific adapter via set_adapter()
  • Merge single or multiple adapters via merge_and_unload(adapter_names=[...])
  • Hotswap adapters at runtime via hotswap_adapter()
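
A sketch exercising these operations (paths and adapter names are placeholders):

from peft import PeftModel

model = PeftModel.from_pretrained(base_model, "./adapters/default", adapter_name="default")
model.load_adapter("./adapters/task_b", adapter_name="task_b")

model.set_adapter("task_b")  # route forward passes through task_b

# Merge only the "default" adapter into the base weights
merged = model.merge_and_unload(adapter_names=["default"])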

Class Inheritance Diagram

classDiagram
    class PeftModel {
        +model
        +peft_config
        +active_adapters
        +inject_adapter()
        +merge_and_unload()
        +unload()
        +get_prompt()
    }
    
    class PeftModelForCausalLM {
        +forward()
    }
    
    class PeftModelForSeq2SeqLM {
        +forward()
        +prepare_inputs_for_generation()
    }
    
    class PeftModelForSequenceClassification {
        +forward()
    }
    
    class PeftModelForQuestionAnswering {
        +forward()
    }
    
    class PeftModelForTokenClassification {
        +forward()
    }
    
    class PeftModelForFeatureExtraction {
        +forward()
    }
    
    PeftModel <|-- PeftModelForCausalLM
    PeftModel <|-- PeftModelForSeq2SeqLM
    PeftModel <|-- PeftModelForSequenceClassification
    PeftModel <|-- PeftModelForQuestionAnswering
    PeftModel <|-- PeftModelForTokenClassification
    PeftModel <|-- PeftModelForFeatureExtraction

Summary

The PEFT system architecture provides a flexible, extensible framework for parameter-efficient fine-tuning through:

  1. Centralized Model Management: PeftModel base class handles adapter lifecycle
  2. Modular Tuner System: Each technique (LoRA, IA³, etc.) implements the BaseTuner interface
  3. Configuration-Driven Design: Factory pattern maps PEFT types to configs
  4. Task-Specific Optimizations: Specialized model classes for different downstream tasks
  5. Multi-Adapter Support: Runtime switching and hotswapping of adapters
  6. Seamless Integration: Auto-loading and signature updates for transformer compatibility

This architecture enables researchers and practitioners to easily extend PEFT with new fine-tuning methods while maintaining backward compatibility and performance optimizations.

Sources: src/peft/peft_model.py:1-100

Core Components

Related topics: System Architecture, Configuration System, Model Loading and Saving


Overview

The PEFT (Parameter-Efficient Fine-Tuning) library provides a modular architecture for adapting pre-trained models with minimal computational overhead. The Core Components form the foundational layer that enables all PEFT methods—including LoRA, IA³, Prefix Tuning, and custom tuners—to inject trainable parameters into base models efficiently.

The core architecture consists of:

  • PeftModel: The primary wrapper class that encapsulates base models with adapter layers
  • PeftConfig: Configuration objects that define adapter-specific parameters
  • BaseTunerLayer: Base class for all adapter layer implementations
  • inject_adapter: Core mechanism for attaching adapters to target modules
  • Mapping System: Registry connecting PEFT types to their implementations

Sources: src/peft/peft_model.py:1-50

Architecture Overview

graph TD
    A[Pre-trained Model] --> B[PeftModel]
    B --> C{PEFT Type}
    C -->|LORA| D[LoRA Layers]
    C -->|IA3| E[IA³ Layers]
    C -->|PREFIX_TUNING| F[Prefix Layers]
    C -->|CUSTOM| G[Custom Tuners]
    
    H[PeftConfig] --> B
    I[Adapter Registry] --> B
    
    J[Target Modules] --> K[inject_adapter]
    K --> B
    
    L[from_pretrained] --> B
    M[get_peft_model] --> B

PeftModel Base Class

The PeftModel class serves as the central abstraction for all PEFT-adapted models. It wraps a base model and manages one or more adapters, each containing trainable parameters.

Key Responsibilities

| Responsibility | Description |
|---|---|
| Adapter Management | Load, activate, and switch between multiple adapters |
| Forward Pass | Intercept forward calls to route through active adapters |
| Parameter Tracking | Report trainable vs. total parameter counts |
| Serialization | Save and load adapter weights and configurations |

Task-Specific Model Classes

PEFT provides specialized model classes for different transformer tasks:

| Model Class | Task Type | Use Case |
|---|---|---|
| PeftModel | Generic | Base wrapper for any model |
| PeftModelForSequenceClassification | SEQ_CLS | Text classification |
| PeftModelForTokenClassification | TOKEN_CLS | Named entity recognition |
| PeftModelForQuestionAnswering | QUESTION_ANS | Extractive QA |
| PeftModelForSeq2SeqLM | SEQ_2_SEQ_LM | Translation, summarization |
| PeftModelForCausalLM | CAUSAL_LM | Text generation |
| PeftModelForFeatureExtraction | FEATURE_EXTRACTION | Embedding extraction |

Sources: src/peft/peft_model.py:50-150

Key Methods

def from_pretrained(
    model: torch.nn.Module,
    model_id: str | os.PathLike,
    adapter_name: str = "default",
    is_trainable: bool = False,
    config: PeftConfig = None,
    autocast_adapter_dtype: bool = True,
    **kwargs
) -> PeftModel

This factory method instantiates a PEFT model from a pretrained configuration and optionally loads adapter weights.

Sources: src/peft/peft_model.py:150-200

PeftConfig System

The PeftConfig class hierarchy defines adapter-specific hyperparameters. Each PEFT method has its own configuration class that inherits from the base PeftConfig.

Configuration Class Hierarchy

graph TD
    A[PeftConfig] --> B[LoraConfig]
    A --> C[PromptLearningConfig]
    C --> D[PrefixTuningConfig]
    C --> E[PromptEncoderConfig]
    A --> F[IA3Config]
    A --> G[LoHaConfig]
    A --> H[OFTConfig]
    A --> I[TinyLoRAConfig]
    A --> J[AdamssConfig]

Common Configuration Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| peft_type | PeftType | Required | The PEFT method being used |
| task_type | TaskType | Required | The downstream task type |
| inference_mode | bool | False | Whether the model is in inference mode |
| target_modules | List[str] | None | Module names to apply adapters to |
| r | int | 8 | LoRA rank dimension |
| lora_alpha | int | 8 | LoRA scaling factor |
| lora_dropout | float | 0.0 | Dropout probability for LoRA layers |

Sources: src/peft/config.py, src/peft/mapping.py

Tuner Layer Base Classes

BaseTunerLayer

The BaseTunerLayer class provides the interface that all adapter layer implementations must follow. It defines methods for layer initialization, adapter updating, and merging.

classDiagram
    class BaseTunerLayer {
        +base_layer: nn.Module
        +active_adapters: List[str]
        +adapter_list: List[str]
        +update_layer(adapter_name, ...)
        +merge()
        +unmerge()
    }

Key Methods

| Method | Description |
|---|---|
| update_layer(adapter_name, **kwargs) | Initialize or update adapter weights |
| merge() | Merge adapter weights into the base layer |
| unmerge() | Restore the original base layer weights |
| scale_layer(scale) | Apply a scaling factor to the adapter output |

Sources: src/peft/tuners/tuners_utils.py:100-150
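
A sketch of driving these methods directly on a model's tuner layers; normally merge_and_unload() does this for you:

from peft.tuners.tuners_utils import BaseTunerLayer

for module in peft_model.modules():
    if isinstance(module, BaseTunerLayer):
        module.merge()    # fold active adapter weights into the base layer
        module.unmerge()  # restore the original base weights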

Method-Specific Tuner Layers

Each PEFT method implements its own tuner layer class:

| Tuner | Layer Class | Key Parameters |
|---|---|---|
| LoRA | LoraLayer | r, lora_alpha, lora_dropout, lora_A, lora_B |
| IA³ | IA3Layer | inn_factor, key_value_dim |
| OFT | OFTLayer | oft_r, oft_diag_blocks |
| SHiRA | ShiraLayer | mask_fn, random_seed |
| Gralora | GraloraLayer | r (SVD rank) |

Sources: src/peft/tuners/ia3/model.py, src/peft/tuners/oft/model.py, src/peft/tuners/shira/model.py, src/peft/tuners/gralora/model.py

Adapter Injection Mechanism

The inject_adapter method is the core mechanism that replaces target modules with adapter layers. This process traverses the model and substitutes compatible modules.

graph TD
    A[inject_adapter called] --> B{module.is_target_module?}
    B -->|Yes| C{Create New Module?}
    C -->|New adapter| D[_create_new_module]
    C -->|Existing adapter| E[update_layer]
    D --> F[_replace_module]
    E --> G[Set requires_grad False]
    F --> H[Module replaced]
    B -->|No| I[Skip module]
    G --> I

Injection Flow

def inject_adapter(
    model: nn.Module,
    adapter_name: str,
    autocast_adapter_dtype: bool = True,
    low_cpu_mem_usage: bool = False,
    state_dict: Optional[dict] = None,
) -> None

The method performs the following steps:

  1. Identifies target modules based on peft_config.target_modules
  2. For each target, either creates a new adapter module or updates an existing one
  3. Replaces the original module in the parent model
  4. Sets appropriate requires_grad flags based on is_trainable

Sources: src/peft/tuners/tuners_utils.py:150-250

_create_and_replace Pattern

Each tuner implements _create_and_replace to handle the specific module creation logic:

def _create_and_replace(
    self,
    config,
    adapter_name,
    target,
    target_name,
    parent,
    current_key,
    **optional_kwargs,
) -> None

Sources: src/peft/tuners/shira/model.py:40-80, src/peft/tuners/gralora/model.py:40-70, src/peft/tuners/miss/model.py:30-70

Mixed Model Support

The PeftMixedModel class extends PeftModel to support heterogeneous adapters—models with different PEFT methods simultaneously.

graph LR
    A[Base Model] --> B[PeftMixedModel]
    B --> C[LoRA Adapter]
    B --> D[IA³ Adapter]
    B --> E[Prefix Adapter]

Loading Mixed Models

@classmethod
def from_pretrained(
    cls,
    model: nn.Module,
    model_id: str | os.PathLike,
    adapter_name: str = "default",
    is_trainable: bool = False,
    config: PeftConfig = None,
    low_cpu_mem_usage: bool = False,
    **kwargs,
) -> PeftMixedModel

Sources: src/peft/mixed_model.py:50-100
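
A sketch of building a mixed model, assuming get_peft_model's mixed=True flag and two different adapter types (module names are placeholders):

from peft import LoHaConfig, LoraConfig, get_peft_model

lora_cfg = LoraConfig(r=8, target_modules=["q_proj"])
loha_cfg = LoHaConfig(r=8, target_modules=["v_proj"])

mixed = get_peft_model(base_model, lora_cfg, adapter_name="lora_a", mixed=True)
mixed.add_adapter("loha_b", loha_cfg)
mixed.set_adapter(["lora_a", "loha_b"])  # activate both adapters together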

Helper Functions

The helpers.py module provides utility functions for working with PEFT models.

Signature Update Functions

These functions update the forward and generate signatures of PEFT models to expose parameters from the underlying base model.

| Function | Purpose |
|---|---|
| update_forward_signature(model) | Update model.forward signature to include base model parameters |
| update_generate_signature(model) | Update model.generate signature to include base model parameters |
| update_signature(model, method) | Update both signatures, or specify "forward" / "generate" / "all" |

def update_forward_signature(model: PeftModel) -> None:
    """Update the forward signature to include base model parameters."""
    current_signature = inspect.signature(model.forward)
    if (
        len(current_signature.parameters) == 2
        and "args" in current_signature.parameters
        and "kwargs" in current_signature.parameters
    ):
        # Copy signature from base model
        ...

Sources: src/peft/helpers.py:50-100

Model Validation

def check_if_peft_model(model_name_or_path: str) -> bool:
    """
    Check if the model is a PEFT model.
    
    Returns:
        bool: True if the model is a PEFT model, False otherwise.
    """

This function attempts to load a PeftConfig from the given path and returns True if successful.

Sources: src/peft/helpers.py:100-130

Adapter Rescaling Context Manager

@contextmanager
def rescale_adapter_scale(model, multiplier):
    """Temporarily rescale the scaling of the LoRA adapter."""

This context manager temporarily rescales adapter weights during inference, useful for ablation studies.

Sources: src/peft/helpers.py:130-160

Hotswap Adapter

The hotswap_adapter function enables runtime replacement of loaded adapters without reloading the entire model.

graph TD
    A[hotswap_adapter called] --> B[Load new config]
    B --> C[Validate PEFT type]
    C --> D[Load state dict]
    D --> E[Transfer to device]
    E --> F[Replace adapter weights]
    F --> G[Success]

def hotswap_adapter(
    model: PeftModel,
    model_name_or_path: str,
    adapter_name: str = "default",
    torch_device: str = None,
    **kwargs,
) -> None

Sources: src/peft/utils/hotswap.py:30-80

Unload and Merge Operations

Base tuners provide methods to unload or merge adapter weights.

merge_and_unload

def merge_and_unload(progressbar: bool = False, safe_merge: bool = False, adapter_names = None) -> nn.Module

Merges adapter weights into the base model and returns the resulting model with adapter modules removed.

unload

def unload() -> nn.Module

Returns the base model by removing all PEFT modules without merging weights. This is useful when you need the original model but want to preserve the option to reload adapters later.

_unload_and_optionally_merge

def _unload_and_optionally_merge(
    progressbar: bool = False,
    safe_merge: bool = False,
    adapter_names = None,
    merge: bool = True,
) -> nn.Module

Sources: src/peft/tuners/tuners_utils.py:80-120

Target Module Mapping

Each tuner defines a target_module_mapping that specifies which modules should be replaced for different model architectures.

# Example: SHiRA target module mapping
target_module_mapping = TRANSFORMERS_MODELS_TO_SHIRA_TARGET_MODULES_MAPPING

# Example: GraLoRA target module mapping
target_module_mapping = TRANSFORMERS_MODELS_TO_GRALORA_TARGET_MODULES_MAPPING

These mappings allow PEFT methods to automatically identify compatible layers (e.g., q_proj, v_proj, k_proj) across different transformer architectures.

BitsAndBytes Integration

PEFT supports quantized models through BitsAndBytes integration. The tuners detect quantized base layers and wrap them appropriately:

if loaded_in_8bit and isinstance(target_base_layer, bnb.nn.Linear8bitLt):
    eightbit_kwargs = kwargs.copy()
    eightbit_kwargs.update({
        "has_fp16_weights": target_base_layer.state.has_fp16_weights,
        "threshold": target_base_layer.state.threshold,
        "index": target_base_layer.index,
    })
    new_module = Linear8bitLt(...)

Sources: src/peft/tuners/ia3/model.py:40-70

Summary

The Core Components of PEFT provide a flexible, extensible architecture for parameter-efficient fine-tuning:

  1. PeftModel wraps base models and manages adapters with a unified interface
  2. PeftConfig classes define method-specific hyperparameters
  3. BaseTunerLayer establishes the contract for all adapter implementations
  4. inject_adapter replaces target modules with adapter layers
  5. Helper functions provide utilities for signature updates, validation, and runtime operations
  6. Hotswap support enables dynamic adapter replacement

This architecture allows developers to implement new PEFT methods by subclassing existing base classes while reusing the core model management infrastructure.

Sources: src/peft/peft_model.py:1-50

LoRA and LoRA Variants

Related topics: Other PEFT Methods, Quantization Integration, Configuration System


Overview

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that reduces trainable parameters by representing weight updates as low-rank decompositions. The PEFT library implements LoRA and numerous variants that extend this foundational approach with different architectural innovations, training strategies, and optimization techniques.

The LoRA system in PEFT serves as both a standalone fine-tuning method and a framework upon which variants like DoRA, AdaLoRA, LoHa, LoKr, and others are built. These variants share a common plugin architecture but differ in how they decompose and apply trainable adapters to base model layers.

Architecture

Core LoRA Architecture

LoRA modifies pre-trained neural network layers by adding trainable low-rank decomposition matrices alongside frozen original weights. For a linear layer with weight matrix $W \in \mathbb{R}^{d \times k}$, LoRA represents the update as:

$$\Delta W = BA$$

where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ with rank $r \ll \min(d, k)$. In PEFT, the update is additionally scaled by $\alpha / r$, where $\alpha$ is the lora_alpha hyperparameter.
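
As a worked example of the savings: with $d = k = 4096$ and $r = 16$, a full update $\Delta W$ would train $4096 \times 4096 \approx 16.8$M parameters, while LoRA trains only $r(d + k) = 16 \times 8192 = 131{,}072$ parameters, roughly 0.8% of the full update.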

graph TD
    A[Base Model Layer: Weight W] --> B[Original Forward Pass<br/>y = Wx]
    C[LoRA Adapter: BA Decomposition] --> D[Modified Forward Pass<br/>y = Wx + BAz]
    B --> D
    A --> C
    E[Input x] --> A
    E --> B
    F[Adapter Input z<br/>Same as x or modified] --> C

LoRA Module Hierarchy

graph TD
    A[PeftModel] --> B[BaseModel Class]
    A --> C[LoraModel / VariantModel]
    C --> D[TunerLayerCls]
    C --> E[target_module_mapping]
    C --> F[prefix attribute]
    D --> G[LoraLayer / Conv2d / Conv1d]
    G --> H[Linear wrapper]
    H --> I[Forward with BA decomposition]

Sources: src/peft/tuners/lora/model.py:1-100

LoRA Implementation

Model Class

The LoraModel class serves as the base implementation for LoRA adapters. It extends the generic tuner base class and implements the core adapter creation logic.

class LoraModel(BaseTuner):
    prefix: str = "lora_"
    tuner_layer_cls = LoraLayer
    target_module_mapping = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING

Sources: src/peft/tuners/lora/model.py:90-95

Layer Replacement Mechanism

The _create_and_replace method handles the injection of LoRA adapters into target modules:

def _create_and_replace(
    self,
    lora_config,
    adapter_name,
    target,
    target_name,
    parent,
    current_key,
    *,
    parameter_name: Optional[str] = None,
) -> None:

Sources: src/peft/tuners/lora/model.py:105-120

Forward Pass Computation

The LoRA forward pass adds the scaled low-rank update of each active adapter to the frozen base layer's output. A simplified sketch of the Linear layer's forward:

def forward(self, x: torch.Tensor) -> torch.Tensor:
    # Frozen base layer output
    result = self.base_layer(x)
    # Add the scaled low-rank update of each active adapter
    for adapter in self.active_adapters:
        lora_A = self.lora_A[adapter]      # projects d_in -> r
        lora_B = self.lora_B[adapter]      # projects r -> d_out
        dropout = self.lora_dropout[adapter]
        scaling = self.scaling[adapter]    # lora_alpha / r
        result = result + lora_B(lora_A(dropout(x))) * scaling
    return result

LoRA Configuration

LoraConfig Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| r | int | 8 | Rank of the decomposition |
| lora_alpha | int | 8 | Scaling factor (often set to 2×r) |
| lora_dropout | float | 0.0 | Dropout probability for LoRA layers |
| target_modules | Optional[List[str]] | None | Module names to apply LoRA to |
| bias | str | "none" | Bias training mode: "none", "all", "lora_only" |
| fan_in_fan_out | bool | False | Transpose weights for certain architectures |
| init_weights | bool | True | Initialize LoRA weights on creation |

Advanced Configuration Options

| Parameter | Type | Default | Description |
|---|---|---|---|
| target_modules_bd_a | Optional[List[str]] | None | Modules for block-diagonal LoRA-A |
| target_modules_bd_b | Optional[List[str]] | None | Modules for block-diagonal LoRA-B |
| nblocks | int | 1 | Number of blocks in block-diagonal matrices |
| match_strict | bool | True | Require strict matching for all target modules |

Sources: src/peft/tuners/lora/config.py:1-200

LoRA Variants

DoRA (Weight-Decomposed LoRA)

DoRA extends standard LoRA by decomposing weights into magnitude and direction components. This variant often achieves better performance with comparable parameter counts.

# DoRA configuration example
lora_config = LoraConfig(
    use_dora=True,
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"]
)

Sources: examples/dora_finetuning/README.md

AdaLoRA (Adaptive LoRA)

AdaLoRA dynamically adjusts the rank of LoRA blocks during training, allocating more parameters to important layers. This adaptive approach optimizes the parameter budget.

# AdaLoRA configuration sketch; the rank is pruned from init_r toward target_r during training
from peft import AdaLoraConfig

config = AdaLoraConfig(
    init_r=12,
    target_r=8,
    tinit=200,
    tfinal=1000,
    deltaT=10,
    target_modules=["q_proj", "v_proj"],
)

Sources: src/peft/tuners/adalora

LoHa (Low-Rank Hadamard Product)

LoHa replaces the standard AB decomposition with a Hadamard product of low-rank matrices, potentially capturing more expressive updates.

config_te = LoHaConfig(
    r=8,
    lora_alpha=32,
    target_modules=["k_proj", "q_proj", "v_proj", "out_proj", "fc1", "fc2"],
    rank_dropout=0.0,
    module_dropout=0.0,
)

Sources: src/peft/tuners/loha/__init__.py

LoKr (Low-Kronecker Product)

LoKr applies Kronecker product decomposition to weight matrices, offering different trade-offs between rank and expressiveness.

config_unet = LoKrConfig(
    r=8,
    lora_alpha=32,
    target_modules=["proj_in", "proj_out", "to_k", "to_q", "to_v"],
    rank_dropout=0.0,
    module_dropout=0.0,
    use_effective_conv2d=True,
)

Sources: src/peft/tuners/lokr/__init__.py

Block-Diagonal LoRA

Block-diagonal LoRA constrains the LoRA matrices to be block-diagonal, enabling efficient multi-adapter serving with different sharding degrees.

config = LoraConfig(
    r=16,
    target_modules_bd_a=["q_proj", "v_proj"],  # Block-diagonal A
    target_modules_bd_b=["out_proj"],            # Block-diagonal B
    nblocks=4,                                    # Sharding degree
)

Variant Comparison

| Variant | Key Innovation | Target Use Case | Complexity |
|---|---|---|---|
| LoRA | Low-rank decomposition | General fine-tuning | Low |
| DoRA | Magnitude + direction decomposition | High-quality adaptation | Low |
| AdaLoRA | Adaptive rank allocation | Resource-constrained tuning | Medium |
| LoHa | Hadamard product decomposition | Image generation | Medium |
| LoKr | Kronecker product decomposition | Diffusion models | Medium |
| Block-Diagonal | Constrained structure | Multi-adapter serving | Medium |

Usage Patterns

Basic LoRA Setup

from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b")

peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
)

peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()

Multi-Adapter Configuration

from peft import PeftModel

# Load the first adapter, then attach additional adapters by name
peft_model = PeftModel.from_pretrained(
    base_model,
    "./path/to/adapter_1",
    adapter_name="adapter_1",
)
peft_model.load_adapter("./path/to/adapter_2", adapter_name="adapter_2")

Quantization with LoRA

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import get_peft_model, LoraConfig, prepare_model_for_kbit_training

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b",
    quantization_config=quantization_config,
)

model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(model, lora_config)

Integration with PeftModel

All LoRA variants integrate with the base PeftModel architecture through the tuner pattern:

graph LR
    A[Base Transformers Model] --> B[PeftModel]
    B --> C[BaseModel Class]
    C --> D[LoraModel / VariantModel]
    D --> E[Adapter Injection]
    E --> F[Modified Forward]

The PeftModel class provides unified interfaces for:

  • Forward pass handling
  • Adapter switching
  • Save/load operations
  • Parameter printing

Sources: src/peft/peft_model.py:1-100

Design Patterns

Tuner Layer Class Structure

Each LoRA variant implements a tuner_layer_cls attribute that defines the layer wrapper class:

class LoraModel(BaseTuner):
    tuner_layer_cls = LoraLayer
    
class LoHaModel(BaseTuner):
    prefix: str = "hada_"
    tuner_layer_cls = LoHaLayer
    layers_mapping: dict[type[torch.nn.Module], type[LoHaLayer]] = {
        torch.nn.Conv2d: Conv2d,
        torch.nn.Conv1d: Conv1d,
        torch.nn.Linear: Linear,
    }

Target Module Mapping

Variants define target module mappings for automatic module detection:

class LoraModel(BaseTuner):
    target_module_mapping = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING

class ShiraModel(BaseTuner):
    prefix: str = "shira_"
    tuner_layer_cls = ShiraLayer
    target_module_mapping = TRANSFORMERS_MODELS_TO_SHIRA_TARGET_MODULES_MAPPING

Sources: src/peft/tuners/shira/model.py:40-45

Conclusion

LoRA and its variants in the PEFT library provide a comprehensive suite of parameter-efficient fine-tuning techniques. The shared plugin architecture enables consistent APIs across variants while allowing each method to implement its unique adaptation strategy. From basic low-rank decomposition to advanced block-diagonal structures, PEFT supports a wide range of fine-tuning scenarios with minimal computational overhead.

Sources: src/peft/tuners/lora/model.py:1-100

Other PEFT Methods

Related topics: LoRA and LoRA Variants, Configuration System


PEFT (Parameter-Efficient Fine-Tuning) encompasses a diverse collection of techniques beyond LoRA and QLoRA. These methods offer alternative approaches to adapting pre-trained models with minimal parameter updates, each with distinct mechanisms, trade-offs, and optimal use cases. This page provides a comprehensive overview of the "Other PEFT Methods" available in the Hugging Face PEFT library.

Overview of PEFT Method Categories

The PEFT library organizes fine-tuning methods into several categories based on their core adaptation mechanism. Understanding these categories helps practitioners select the appropriate method for their specific requirements.

graph TD
    A[PEFT Methods] --> B[Prompt-Based Methods]
    A --> C[Additive Methods]
    A --> D[Reparameterization Methods]
    A --> E[Multiplicative Methods]
    A --> F[Subspace Methods]
    
    B --> B1[Prompt Tuning]
    B --> B2[Prefix Tuning]
    B --> B3[P-Tuning]
    B --> B4[MultiTask Prompt Tuning]
    
    C --> C1[IA³]
    
    D --> D1[LoRA Variants<br/>AdaLoRA, Gralora, HiRA]
    
    E --> E1[OFT]
    
    F --> F1[FourierFT]

Prompt-Based Methods

Prompt-based methods modify the model's input or activation space without changing the underlying model weights. These methods add trainable parameters as virtual tokens or prefix embeddings that guide the model's behavior.

Prompt Tuning

Prompt Tuning introduces trainable "soft prompts" (embedding vectors) that are prepended to the input tokens. Unlike discrete text prompts, these are continuous vectors learned through backpropagation during fine-tuning.

Key Characteristics:

  • Only the prompt embeddings are trainable
  • No architectural changes to the base model
  • Requires relatively few parameters compared to full fine-tuning
  • Works well with larger models

Configuration Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| num_virtual_tokens | int | 20 | Number of virtual tokens in the prompt |
| prompt_tuning_init | str | "TEXT" | Initialization method for prompts |
| prompt_tuning_init_text | str | None | Text for TEXT initialization |
| token_dim | int | Model hidden dim | Dimension of model embeddings |
| num_transformer_submodules | int | 1 | Number of transformer submodules with prompts |
| num_attention_heads | int | Model heads | Number of attention heads |
| num_layers | int | Model layers | Number of transformer layers |
| encoder_hidden_size | int | Same as token_dim | Hidden size for encoder |

Sources: src/peft/tuners/prompt_tuning/__init__.py

Prefix Tuning

Prefix Tuning adds trainable parameters to the attention mechanism by prepending learnable prefix vectors to the keys and values in every attention layer. Unlike Prompt Tuning, this affects all transformer layers directly.

Architecture:

graph LR
    A[Input Tokens] --> B[Embedding Layer]
    B --> C[Prefix P<sub>k</sub>, P<sub>v</sub>]
    B --> D[Standard K, V]
    C --> E[Multi-Head Attention]
    D --> E
    E --> F[Output]

Key Differences from Prompt Tuning:

  • Affects hidden states at every transformer layer
  • More parameter-efficient than full prompt tuning in some scenarios
  • Requires specification of prefix projection for deeper integration
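
A minimal configuration sketch (values are illustrative; prefix_projection reparameterizes the prefix through an MLP when enabled):

from peft import PrefixTuningConfig

peft_config = PrefixTuningConfig(
    task_type="CAUSAL_LM",
    num_virtual_tokens=30,
    prefix_projection=False,
)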

Sources: src/peft/tuners/prefix_tuning/__init__.py

P-Tuning

P-Tuning uses trainable continuous embeddings combined with a prompt encoder (typically an LSTM or MLP) to generate prompts. The encoder processes anchor tokens and produces virtual token embeddings.

Unique Features:

  • Uses a small LSTM/MLP encoder to generate prompt embeddings
  • Supports "anchor" tokens that provide natural language hints
  • More flexible than pure continuous prompts
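
A minimal configuration sketch (values are illustrative), using the default MLP-based prompt encoder:

from peft import PromptEncoderConfig

peft_config = PromptEncoderConfig(
    task_type="SEQ_CLS",
    num_virtual_tokens=20,
    encoder_hidden_size=128,
)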

Sources: src/peft/tuners/p_tuning/__init__.py

MultiTask Prompt Tuning (MPT)

MultiTask Prompt Tuning extends standard prompt tuning by learning a shared prompt across multiple related tasks. This enables knowledge transfer and typically improves generalization.

Use Cases:

  • Multi-task learning scenarios
  • Domain adaptation with related tasks
  • Few-shot learning with task similarity
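
A hedged configuration sketch (num_tasks and the other values are illustrative):

from peft import MultitaskPromptTuningConfig

peft_config = MultitaskPromptTuningConfig(
    task_type="SEQ_2_SEQ_LM",
    num_virtual_tokens=50,
    num_tasks=4,
)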

Sources: src/peft/tuners/multitask_prompt_tuning/__init__.py

(IA)³ - Infused Adapter by Inhibiting and Amplifying Inner Activations

(IA)³ is a multiplicative adapter method that scales activations by learned vectors. It introduces trainable vectors that multiply with hidden states at specific positions in the transformer architecture.

Mechanism

graph TD
    A[Hidden Activation h] --> B[Learned Vector l<sub>i</sub>]
    B --> C[Element-wise Multiplication]
    A --> C
    C --> D[h<sub>modified</sub> = l<sub>i</sub> ⊙ h]
    D --> E[Feed-Forward<br/>or Attention]

Configuration Options

| Parameter | Type | Default | Description |
|---|---|---|---|
| r | int | 8 | Rank (not used in IA³ but kept for compatibility) |
| target_modules | list | None | Modules to apply IA³ to |
| fan_in_fan_out | bool | False | Transpose weights |
| init_weights | bool | True | Initialize adapter weights |

Supported Target Modules

The IA³ method typically targets attention-related and feed-forward layers:

  • q_proj, k_proj, v_proj, o_proj (attention projections)
  • fc1, fc2 (feed-forward layers)
  • gate_proj, up_proj, down_proj (for modern architectures like Llama)
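
A minimal configuration sketch (module names are illustrative for a Llama-style model; feedforward_modules must be a subset of target_modules):

from peft import IA3Config

peft_config = IA3Config(
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "v_proj", "down_proj"],
    feedforward_modules=["down_proj"],
)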

Sources: src/peft/tuners/ia3/__init__.py Sources: docs/source/conceptual_guides/ia3.md

OFT - Orthogonal Fine-Tuning

OFT constrains the fine-tuning updates to an orthogonal subspace, ensuring that the learned adapters do not interfere with each other. This method is particularly useful for multi-adapter scenarios.

Key Principle

OFT learns an orthogonal rotation matrix R that is applied multiplicatively to the pretrained weight:

W_new = R · W_original

Where R is constrained to be orthogonal, preserving the angular relationships between neurons and preventing interference with pretrained knowledge.

Configuration Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| r | int | 4 | Rank of the OFT transformation |
| target_modules | list | ["q_proj", "v_proj"] | Layers to adapt |
| module_dropout | float | 0.0 | Dropout probability for modules |
| init_weights | bool | True | Initialize with pretrained weights |

Use Cases

  • Stable diffusion model adaptation (text encoder, UNet)
  • Multi-task learning with non-interfering adapters
  • Computer vision models requiring structured updates
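
A minimal configuration sketch (values are illustrative):

from peft import OFTConfig

peft_config = OFTConfig(
    r=4,
    target_modules=["q_proj", "v_proj"],
    module_dropout=0.0,
)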

Sources: src/peft/tuners/oft/__init__.py

FourierFT - Fourier Transform-Based Fine-Tuning

FourierFT operates in the frequency domain, learning adapters in Fourier space rather than the original weight space. This approach can capture different aspects of the model's behavior compared to spatial-domain methods.

Advantages

  • May capture global patterns more efficiently
  • Different inductive bias compared to spatial methods
  • Potential for more compact representations
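
A hedged configuration sketch; n_frequency sets how many spectral coefficients are trained per target layer (value illustrative):

from peft import FourierFTConfig

peft_config = FourierFTConfig(
    n_frequency=1000,
    target_modules=["q_proj", "v_proj"],
)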

Sources: src/peft/tuners/fourierft/__init__.py

Advanced LoRA Variants

AdaLoRA - Adaptive LoRA

AdaLoRA dynamically adjusts the rank of LoRA adaptations based on the importance of different weight matrices. It uses a budget allocation mechanism to invest more parameters in important layers.

Key Method: update_and_allocate

# Called during training loop
model.base_model.update_and_allocate(global_step)

This method updates importance scores and reallocates the rank budget based on the current training step.
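
A hedged sketch of how this fits into a training loop; base_model, dataloader, and optimizer are assumed to exist, and the AdaLoraConfig values are illustrative:

from peft import AdaLoraConfig, get_peft_model

config = AdaLoraConfig(
    init_r=12,        # starting rank per matrix
    target_r=4,       # average target rank after budget pruning
    tinit=200,        # warmup steps before pruning begins
    tfinal=500,       # steps with a frozen final budget
    deltaT=10,        # steps between rank reallocations
    total_step=3000,  # should match the planned training length
    target_modules=["q_proj", "v_proj"],
)
peft_model = get_peft_model(base_model, config)

for step, batch in enumerate(dataloader):
    loss = peft_model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    # Reallocate the rank budget from current importance scores
    peft_model.base_model.update_and_allocate(step)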

Sources: src/peft/tuners/adalora/model.py

HiRA - Hadamard High-Rank Adaptation

HiRA modulates the frozen pretrained weight with a low-rank matrix through an element-wise (Hadamard) product, so the effective update can be high-rank while keeping a LoRA-sized trainable parameter count.

Sources: src/peft/tuners/hira/model.py

GraLoRA - Granular Low-Rank Adaptation

GraLoRA partitions each targeted weight matrix into sub-blocks and gives each block its own small low-rank adapter, increasing expressivity at a comparable parameter budget.

Sources: src/peft/tuners/gralora/model.py

Special-Purpose Methods

SHiRA - Sparse High Rank Adapters

SHiRA directly trains a small, sparse subset of the base weights, yielding high-rank updates from a very small trainable footprint and enabling fast adapter switching.

Sources: src/peft/tuners/shira/model.py

MiSS - Matrix Shard Sharing

MiSS shares shards of the weight matrix during adaptation, aiming to balance LoRA-level quality with improved parameter and memory efficiency.

Sources: src/peft/tuners/miss/model.py

Adamss - Adaptive Subspace Selection

Adamss uses adaptive subspace selection for fine-tuning, choosing the most relevant subspaces based on the task at hand.

| Parameter | Type | Default | Description |
|---|---|---|---|
| r | int | 500 | Rank dimension |
| num_subspaces | int | 5 | Number of subspaces |
| target_modules | list | ["q_proj", "v_proj"] | Target layers |

Sources: src/peft/tuners/adamss/model.py

X-LoRA

X-LoRA supports multiple LoRA adapters with dynamic routing, allowing for sophisticated multi-adapter architectures.

Sources: src/peft/tuners/xlora/model.py

Comparison of Methods

| Method | Category | Trainable Parameters | Best For | Supports Multi-Adapter |
|---|---|---|---|---|
| Prompt Tuning | Prompt-Based | Very Low | Large models, text tasks | Yes |
| Prefix Tuning | Prompt-Based | Low | Text generation | Yes |
| P-Tuning | Prompt-Based | Low-Medium | NLU tasks | Yes |
| MPT | Prompt-Based | Medium | Multi-task learning | Yes |
| (IA)³ | Multiplicative | Low | Efficient scaling | Yes |
| OFT | Multiplicative | Low-Medium | Stable diffusion, CV | Yes |
| FourierFT | Frequency-Domain | Low | Global patterns | Yes |
| AdaLoRA | Reparameterization | Variable | Dynamic budgets | Yes |
| X-LoRA | Reparameterization | Medium-High | Complex routing | Yes |

Unified API Usage

All PEFT methods follow a consistent API pattern through get_peft_model:

from transformers import AutoModelForSequenceClassification
from peft import get_peft_model, PromptTuningConfig

config = PromptTuningConfig(
    task_type="SEQ_CLS",
    num_virtual_tokens=20,
    prompt_tuning_init="TEXT",
    prompt_tuning_init_text="Classify the sentiment:",
    tokenizer_name_or_path="bert-base-cased",  # required for TEXT initialization
)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased")
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()

Sources: docs/source/conceptual_guides/prompting.md

Best Practices

Method Selection Guidelines

  1. For Large Language Models (>7B parameters): Prompt Tuning, Prefix Tuning, or LoRA variants
  2. For Image Models: OFT, (IA)³
  3. For Multi-Task Scenarios: MultiTask Prompt Tuning, X-LoRA
  4. For Limited Compute: (IA)³, standard Prompt Tuning
  5. For Maximum Flexibility: AdaLoRA (dynamic rank allocation)

Common Configuration Patterns

from peft import LoraConfig, PromptTuningConfig

# Efficient configuration for most cases
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# For prompt-based methods
config = PromptTuningConfig(
    num_virtual_tokens=50,
    task_type="SEQ_CLS",
)

Summary

The PEFT library provides a comprehensive suite of fine-tuning methods beyond LoRA and QLoRA. These methods offer diverse trade-offs in terms of parameter efficiency, task performance, and computational requirements. By understanding the mechanisms and use cases of each method, practitioners can select the most appropriate technique for their specific model adaptation needs.

Key takeaways:

  • Prompt-based methods modify input representations without changing model weights
  • Multiplicative methods such as (IA)³ and OFT scale or rotate weights
  • Advanced LoRA variants provide dynamic optimization capabilities
  • All methods support multi-adapter scenarios and can be combined through the unified PEFT API

Sources: [src/peft/tuners/prompt_tuning/__init__.py](https://github.com/huggingface/peft/blob/main/src/peft/tuners/prompt_tuning/__init__.py)

Configuration System

Related topics: Core Components, Model Loading and Saving, LoRA and LoRA Variants


Overview

The PEFT (Parameter-Efficient Fine-Tuning) library implements a comprehensive configuration system that enables flexible and modular adapter integration across various transformer architectures. This system decouples adapter-specific parameters from model architecture, allowing users to define fine-tuning strategies through declarative configuration objects.

The configuration system serves as the foundational layer for all PEFT adapters, providing:

  • Unified configuration interface across different fine-tuning methods
  • Automatic model patching based on target module specifications
  • Serialization and deserialization support for model saving/loading
  • Multi-adapter management capabilities
graph TD
    A[User Configuration] --> B[PeftConfig Subclass]
    B --> C{Adapter Type}
    C -->|LoRA| D[LoraConfig]
    C -->|Prefix| E[PrefixTuningConfig]
    C -->|Prompt| F[PromptEncoderConfig]
    C -->|IA³| G[IA3Config]
    C -->|Others| H[Tuner-Specific Config]
    
    D --> I[get_peft_model]
    E --> I
    F --> I
    G --> I
    H --> I
    
    I --> J[PeftModel Base]
    J --> K[BaseTuner.inject_adapter]
    K --> L[Model Patching]

Core Components

PeftConfig Base Class

The PeftConfig class is the foundational configuration object in PEFT. It is implemented as a dataclass with Hugging Face Hub save/load support and provides the base interface for all adapter configurations.

Key Attributes:

| Attribute | Type | Description |
|---|---|---|
| peft_type | PeftType | Enum specifying the adapter method |
| task_type | TaskType | Enum specifying the ML task type |
| inference_mode | bool | Whether model is in inference mode |
| auto_mapping | Optional[dict] | Custom auto-mapping for loading |
| base_model_name_or_path | str | Path/identifier of base model |
| revision | str | Model revision for Hub models |
| pad_token_id | Optional[int] | Padding token ID |

Source: src/peft/config.py

PeftType Enumeration

The PeftType enum defines all supported parameter-efficient fine-tuning methods:

| Value | Description |
|---|---|
| LORA | Low-Rank Adaptation |
| PROMPT_TUNING | Soft prompt tuning |
| PREFIX_TUNING | Prefix tuning |
| P_TUNING | P-tuning (prompt encoder) |
| IA3 | Infused Adapter by Inhibiting and Amplifying Inner Activations |
| ADALORA | Adaptive LoRA |
| ADAPTION_PROMPT | Adaption prompt (LLaMA-Adapter style) |
| POLY | Poly (Polytropon) |
| LN_TUNING | LayerNorm tuning |
| HRA | Householder Reflection Adaptation |
| GRALORA | Granular Low-Rank Adaptation |
| SHIRA | Sparse High Rank Adaptation |
| XLORA | X-LoRA (mixture of LoRA experts) |
| MISS | Matrix Shard Sharing |
| HIRA | Hadamard High-Rank Adaptation |
| ADAMSS | Adaptive Subspace Selection |
Source: src/peft/utils/peft_types.py:1-50

TaskType Enumeration

The TaskType enum specifies the machine learning task type:

| Value | Description |
|---|---|
| SEQ_CLS | Sequence Classification |
| SEQ_2_SEQ_LM | Sequence-to-Sequence Language Modeling |
| CAUSAL_LM | Causal Language Modeling |
| TOKEN_CLS | Token Classification |
| QUESTION_ANS | Question Answering |
| FEATURE_EXTRACTION | Feature Extraction / Embeddings |
| MULTIPLE_CHOICE | Multiple Choice |
| IMAGE_CLASSIFICATION | Image Classification |
Source: src/peft/utils/peft_types.py:50-80

Tuner-Specific Configurations

LoraConfig

The LoraConfig class configures LoRA (Low-Rank Adaptation) adapters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| r | int | 8 | LoRA attention dimension (rank) |
| target_modules | Optional[Union[List[str], str]] | None | Modules to apply LoRA to |
| lora_alpha | int | 8 | LoRA alpha scaling parameter |
| lora_dropout | float | 0.0 | Dropout probability for LoRA layers |
| fan_in_fan_out | bool | False | Set to transpose weight (for conv layers) |
| bias | str | "none" | Bias type: "none", "all", "lora_only" |
| modules_to_save | Optional[List[str]] | None | Modules to make trainable |
| init_lora_weights | Union[bool, str] | True | Initialization strategy |

Example Configuration:

from peft import get_peft_config

config = {
    "peft_type": "LORA",
    "task_type": "CAUSAL_LM",
    "r": 16,
    "target_modules": ["q_proj", "v_proj"],
    "lora_alpha": 32,
    "lora_dropout": 0.05,
}
peft_config = get_peft_config(config)

Source: src/peft/tuners/lora/model.py

PrefixTuningConfig

Configuration for prefix-based prompt learning:

| Parameter | Type | Default | Description |
|---|---|---|---|
| num_virtual_tokens | int | None | Number of virtual tokens |
| token_dim | int | None | Dimensionality of token embeddings |
| num_transformer_submodules | int | 1 | Number of transformer submodules |
| num_attention_heads | int | 12 | Number of attention heads |
| num_layers | int | 12 | Number of layers |
| encoder_hidden_size | int | None | Encoder hidden size |
| prefix_projection | bool | False | Whether to project prefix |

Source: src/peft/peft_model.py

Configuration Loading and Saving

Loading Configurations

The configuration system supports loading from both local paths and Hugging Face Hub:

# From Hub
peft_config = PeftConfig.from_pretrained("user/peft-model")

# From dictionary
peft_config = get_peft_config(config_dict)

# Via mapping
config = PeftConfig.from_pretrained(
    model_name_or_path,
    **hf_kwargs
)

The from_pretrained method handles:

  • Subfolder paths via subfolder parameter
  • Model revisions via revision parameter
  • Authentication tokens via token or use_auth_token parameters

Source: src/peft/config.py, src/peft/mixed_model.py

Saving Configurations

Configurations can be serialized using the standard Hugging Face save_pretrained method:

peft_config.save_pretrained("output-directory")

Auto-Mapping

The auto_mapping parameter enables custom configuration-to-model mappings, particularly useful for custom adapters or third-party integrations:

peft_config = PeftConfig.from_pretrained(
    "model-id",
    auto_mapping={"custom_key": CustomAdapterClass}
)

Adapter Injection Workflow

sequenceDiagram
    participant User
    participant PeftModel
    participant BaseTuner
    participant Config
    participant TargetModule
    
    User->>PeftModel: __init__(model, peft_config)
    PeftModel->>BaseTuner: inject_adapter(model, adapter_name)
    BaseTuner->>Config: Validate peft_config
    Config->>Config: Check target_module_compatibility
    
    loop For each target module
        BaseTuner->>TargetModule: Identify target layer
        BaseTuner->>BaseTuner: _create_and_replace(...)
        BaseTuner->>TargetModule: Replace with adapter layer
    end
    
    PeftModel-->>User: Ready model

The injection process:

  1. Validates configuration compatibility with target modules
  2. Identifies modules matching target_modules patterns
  3. Creates adapter layers via _create_and_replace method
  4. Replaces original modules with adapter wrappers
  5. Marks appropriate parameters as trainable
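
For intuition, a hypothetical sketch of the name-matching idea in step 2; the real logic lives in BaseTuner.inject_adapter and additionally supports regex patterns (model is assumed to be any loaded transformers model):

# Hypothetical illustration of suffix-based target matching
target_modules = ["q_proj", "v_proj"]

for name, module in model.named_modules():
    if any(name == t or name.endswith("." + t) for t in target_modules):
        print("would wrap:", name)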

Source: src/peft/tuners/tuners_utils.py

Multi-Adapter Configuration

PEFT supports multiple adapters through the adapter naming system:

# Load multiple adapters
peft_model = PeftModel.from_pretrained(
    base_model, 
    "adapter-1-path",
    adapter_name="adapter_1"
)
peft_model.load_adapter("adapter-2-path", adapter_name="adapter_2")

# Set active adapter
peft_model.set_adapter("adapter_1")

Each adapter maintains its own configuration accessible via:

peft_model.peft_config["adapter_name"]

Source: src/peft/tuners/tuners_utils.py, src/peft/helpers.py

Integration with Model Types

Model-Specific Configurations

Different model architectures require specific configuration handling:

| Model Type | PeftModel Class | Special Config Parameters |
|---|---|---|
| Causal LM | PeftModelForCausalLM | Standard LoRA/Prefix |
| Seq2Seq | PeftModelForSeq2SeqLM | prepare_inputs_for_generation |
| Seq Classification | PeftModelForSequenceClassification | classifier_module_names |
| Token Classification | PeftModelForTokenClassification | classifier_module_names |
| Question Answering | PeftModelForQuestionAnswering | qa_module_names |
| Feature Extraction | PeftModelForFeatureExtraction | Standard config |

Source: src/peft/peft_model.py

Target Module Mapping

Each tuner type defines a target_module_mapping that specifies compatible layers for different model architectures:

# Example structure in tuners
target_module_mapping = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING

This mapping ensures adapters are only applied to compatible modules (e.g., preventing LoRA application to incompatible modules in Mamba architectures).

Source: src/peft/tuners/lora/model.py, src/peft/tuners/tuners_utils.py

Advanced Configuration Features

Mixed Model Configuration

For models requiring multiple adapter types:

# Load mixed configuration
mixed_model = PeftMixedModel.from_pretrained(
    model,
    peft_model_id="mixed-peft-model",
    config=mixed_config
)

Hotswap Adapters

The hotswap functionality allows runtime adapter replacement:

from peft import hotswap_adapter

hotswap_adapter(
    model, 
    "path-to-new-adapter", 
    adapter_name="default",
    torch_device="cuda:0"
)

Source: src/peft/utils/hotswap.py

Context Manager for Adapter Scaling

Temporarily rescale adapter scaling:

from peft import rescale_adapter_scale

with rescale_adapter_scale(model, multiplier=0.5):
    output = model(inputs)

Source: src/peft/helpers.py

Configuration Validation

Target Module Compatibility

The configuration system validates target modules against model architecture:

def _check_target_module_compatiblity(self, peft_config, model, target_name):
    _check_lora_target_modules_mamba(peft_config, model, target_name)

This prevents applying adapters to incompatible modules in specific architectures.

Source: src/peft/tuners/tuners_utils.py

PEFT Type Detection

Automatic PEFT type detection from model paths:

peft_type = PeftConfig._get_peft_type(model_name_or_path, **hf_kwargs)
config_cls = PEFT_TYPE_TO_CONFIG_MAPPING[peft_type]

Best Practices

  1. Always specify task_type: Helps PEFT apply correct model wrapper
  2. Use target_modules wisely: Restricting to key layers reduces memory
  3. Set inference_mode=False for training: Required for gradient computation
  4. Save adapter config alongside weights: Ensures reproducibility
  5. Use modules_to_save sparingly: Only for task-specific heads

See Also

Source: https://github.com/huggingface/peft / Human Manual

Model Loading and Saving

Related topics: Core Components, Configuration System, Quantization Integration


Overview

The PEFT (Parameter-Efficient Fine-Tuning) library provides a comprehensive system for loading, saving, and managing adapter-based model configurations. This system enables users to efficiently fine-tune large language models by training only a small subset of parameters while maintaining the ability to save, load, and merge adapters with the base model.

The loading and saving architecture in PEFT is designed to be:

  • Interoperable: Adapters can be shared via Hugging Face Hub
  • Flexible: Multiple adapters can coexist and be switched dynamically
  • Memory-efficient: Supports low CPU memory usage during loading
  • Non-destructive: Original base models remain unmodified

Sources: src/peft/tuners/tuners_utils.py:1-50

Architecture

graph TD
    A[Base Model] --> B[PeftModel]
    B --> C[Adapter 1]
    B --> D[Adapter 2]
    B --> N[Adapter N]
    
    E[save_pretrained] --> F[adapter_config.json]
    E --> G[adapter_model.safetensors]
    
    H[from_pretrained] --> I[Load Base Model]
    H --> J[Load Adapter Config]
    H --> K[Inject Adapters]
    
    L[merge_and_unload] --> M[Merged Base Model]
    L --> N2[No Adapters]
    
    O[unload] --> P[Original Base Model]
    O --> Q[Adapters Removed]

Loading PEFT Models

Loading from Pretrained

The PeftModel.from_pretrained() class method loads a PEFT model configuration and applies it to a base model:

from transformers import AutoModelForCausalLM
from peft import PeftModel, PeftConfig

# Load PEFT configuration
peft_config = PeftConfig.from_pretrained("path/to/peft_model")

# Load base model
base_model = AutoModelForCausalLM.from_pretrained("base_model_name")

# Create PEFT model with loaded adapters
peft_model = PeftModel.from_pretrained(base_model, "path/to/peft_model")

Using get_peft_model

For creating new PEFT models from scratch:

from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig, TaskType

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)

model = AutoModelForCausalLM.from_pretrained("base_model")
peft_model = get_peft_model(model, config)

Sources: src/peft/peft_model.py:1-100

Loading Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | torch.nn.Module | Required | The base model to apply PEFT to |
| model_id | str | Required | Path or HF Hub identifier for PEFT checkpoint |
| adapter_name | str | "default" | Name for the loaded adapter |
| is_trainable | bool | False | Whether adapter should be trainable |
| low_cpu_mem_usage | bool | False | Create weights on meta device for faster loading |
| torch_dtype | torch.dtype | None | Data type for loaded weights |
| device_map | str/dict | None | Device placement strategy |

Sources: src/peft/peft_model.py:100-200

Saving PEFT Models

Saving to Disk

The save_pretrained() method saves the PEFT adapter weights and configuration:

peft_model.save_pretrained("output/path")

This creates adapter_config.json (the adapter configuration) and adapter_model.safetensors (the adapter weights) in the output directory.

Save Configuration Options

| Parameter | Type | Description |
|---|---|---|
| save_adapters | bool | Whether to save all adapters (default: True) |
| adapter_names | List[str] | Specific adapters to save (default: all active) |
| safe_serialization | bool | Use safetensors format (default: True) |

Merging and Unloading

Merge and Unload

The merge_and_unload() method merges all adapter weights into the base model and returns the combined model:

from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("base_model")
peft_model = PeftModel.from_pretrained(base_model, "path/to/peft_model")

# Merge adapters into base model
merged_model = peft_model.merge_and_unload()

This operation:

  • Combines adapter weights with base model weights
  • Removes PEFT wrapper layers
  • Returns a standard HuggingFace model

Sources: src/peft/tuners/tuners_utils.py:1-100

Safe Merge

For secure merging with validation:

merged_model = peft_model.merge_and_unload(safe_merge=True)

Safe merge checks tensor shapes and dtypes before merging to prevent corruption.

Unload

The unload() method removes all PEFT adapters and returns the original base model:

base_model = peft_model.unload()

Unlike merge_and_unload(), this operation:

  • Does not modify model weights
  • Simply removes PEFT wrapper layers
  • Returns the original base model unchanged
graph LR
    A[PeftModel] -->|merge_and_unload| B[Merged Base Model]
    A -->|unload| C[Original Base Model]
    
    B --> D[Combined Weights]
    C --> E[Original Weights Intact]

Sources: src/peft/tuners/tuners_utils.py:100-200

Merge Utilities

The merge_utils.py module provides low-level merging functions:

| Function | Description |
|---|---|
| merge_linear_weights | Merges LoRA weights into linear layers |
| merge_qkv_weights | Merges QKV attention weights |

Multi-Adapter Management

Adding Multiple Adapters

PEFT supports loading multiple adapters onto a single base model:

peft_model.load_adapter("path/to/adapter1", adapter_name="adapter1")
peft_model.load_adapter("path/to/adapter2", adapter_name="adapter2")

Switching Active Adapters

# Set active adapter
peft_model.set_adapter("adapter1")

# Inspect which adapter is currently active
print(peft_model.active_adapter)

Merging Specific Adapters

# Merge only specific adapters
merged_model = peft_model.merge_and_unload(adapter_names=["adapter1"])

Signature Updates

When using PEFT models with adapters, the model signatures may differ from the base model. PEFT provides utility functions to update signatures:

Update Forward Signature

from peft import update_forward_signature

update_forward_signature(peft_model)

This allows help(peft_model.forward) to show the full signature including parameters from parent classes.

Update Generate Signature

from peft import update_generate_signature

update_generate_signature(peft_model)

Enables help(peft_model.generate) to display the complete generation parameters.

Sources: src/peft/helpers.py:1-100

Checking PEFT Models

Use check_if_peft_model() to verify if a model path contains a PEFT configuration:

from peft import check_if_peft_model

is_peft = check_if_peft_model("path/to/model")

This function:

  • Attempts to load an adapter_config.json
  • Returns True if valid PEFT config found
  • Returns False otherwise

Sources: src/peft/helpers.py:100-200

Loading with Quantization

PEFT models can be loaded with quantized base models using BitsAndBytes:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
base_model = AutoModelForCausalLM.from_pretrained(
    "model_name",
    quantization_config=quantization_config,
)

base_model = prepare_model_for_kbit_training(base_model)
lora_config = LoraConfig(r=16, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(base_model, lora_config)

Sources: src/peft/tuners/lora/model.py:1-100

Rescaling Adapter Scale

The rescale_adapter_scale() context manager temporarily adjusts adapter scaling:

from peft import rescale_adapter_scale

with rescale_adapter_scale(model, multiplier=0.5):
    output = model(inputs)  # Scaled by 0.5

Sources: src/peft/helpers.py:200-300

Workflow Diagram

graph TD
    A[Start] --> B{Load Base Model}
    B --> C[Load PEFT Config]
    C --> D{Existing Adapter?}
    
    D -->|Yes| E[from_pretrained]
    D -->|No| F[get_peft_model]
    
    E --> G[PeftModel with Adapters]
    F --> H[PeftModel with New Config]
    
    G --> I{Training}
    H --> I
    
    I --> J[Train Adapters]
    J --> K[save_pretrained]
    
    K --> L[Share via Hub]
    
    I --> M{Inference}
    M --> N{Use Merged?}
    
    N -->|Yes| O[merge_and_unload]
    N -->|No| P[Use with Adapters]
    
    O --> Q[Merged Model]
    P --> R[Forward with Adapters]

Best Practices

  1. Memory Optimization: Use low_cpu_mem_usage=True when loading large adapters to speed up the process
  2. Safe Serialization: Always use save_pretrained() with safe_serialization=True (default) for secure model sharing
  3. Multiple Adapters: Load adapters with distinct names and switch between them using set_adapter()
  4. Signature Updates: Call update_forward_signature() and update_generate_signature() for better IDE support
  5. Quantization: Prepare quantized models with prepare_model_for_kbit_training() before applying PEFT

Sources: src/peft/tuners/tuners_utils.py:1-50

Quantization Integration

Related topics: LoRA and LoRA Variants, Model Loading and Saving, Advanced Features


PEFT (Parameter-Efficient Fine-Tuning) provides comprehensive support for integrating quantized base models with various parameter-efficient fine-tuning methods. This integration enables training large models that would otherwise require prohibitive amounts of memory by combining quantization techniques with PEFT adapters.

Overview

Quantization integration in PEFT allows users to:

  • Load base models in quantized form (8-bit, 4-bit, or other formats) to reduce memory footprint
  • Apply PEFT adapters (LoRA, IA³, LoHa, LoKr, etc.) on top of quantized layers
  • Fine-tune the adapters while keeping the quantized base model frozen
  • Maintain model quality while significantly reducing GPU memory requirements

Sources: src/peft/tuners/lora/model.py

Supported Quantization Methods

PEFT supports multiple quantization backends through integration with popular quantization libraries.

| Quantization Method | Backend Library | Precision Options | Status |
|---|---|---|---|
| BitsAndBytes | bitsandbytes | 8-bit, 4-bit | Fully Supported |
| GPTQ | auto-gptq | 4-bit | Fully Supported |
| AWQ | awq | 4-bit | Fully Supported |
| AQLM | aqlm | Mixed-bit | Fully Supported |
| EETQ | eetq | 8-bit | Fully Supported |
| HQQ | hqq | Configurable | Fully Supported |

Architecture

Quantization Integration Flow

graph TD
    A[Base Model Loading] --> B{Quantization Backend}
    B -->|bitsandbytes| C[BitsAndBytes 8-bit/4-bit]
    B -->|GPTQ| D[GPTQ 4-bit]
    B -->|AWQ| E[AWQ 4-bit]
    B -->|AQLM| F[AQLM]
    B -->|EETQ| G[EETQ 8-bit]
    B -->|HQQ| H[HQQ]
    
    C --> I[PEFT Adapter Injection]
    D --> I
    E --> I
    F --> I
    G --> I
    H --> I
    
    I --> J[LoRA / IA³ / LoHa / LoKr Layers]
    J --> K[Fine-tuning with Frozen Quantized Base]

Module Replacement Strategy

When applying PEFT adapters to quantized models, the system replaces specific linear layers with quantized-aware versions that preserve quantization state.

graph LR
    A[Original Linear / Quantized Linear] --> B{Is Quantized?}
    B -->|Yes - 8-bit| C[Linear8bitLt + Adapter]
    B -->|Yes - 4-bit| D[Linear4bit + Adapter]
    B -->|No| E[Linear + Adapter]
    
    C --> F[Forward with Quantization]
    D --> F
    E --> F

BitsAndBytes Integration

The BitsAndBytes integration provides 8-bit and 4-bit quantization support through the bitsandbytes library.

Configuration

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import get_peft_model, LoraConfig

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True  # or load_in_4bit=True
)

model = AutoModelForCausalLM.from_pretrained(
    "model_name",
    quantization_config=quantization_config,
)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
)

peft_model = get_peft_model(model, peft_config)

8-bit Layer Implementation

When loading an 8-bit model, PEFT replaces standard linear layers with Linear8bitLt that inherits quantization state from the base layer:

# From src/peft/tuners/ia3/model.py
if loaded_in_8bit and isinstance(target_base_layer, bnb.nn.Linear8bitLt):
    eightbit_kwargs = kwargs.copy()
    eightbit_kwargs.update(
        {
            "has_fp16_weights": target_base_layer.state.has_fp16_weights,
            "threshold": target_base_layer.state.threshold,
            "index": target_base_layer.index,
        }
    )

Sources: src/peft/tuners/ia3/model.py:40-49

4-bit Layer Implementation

Similarly, 4-bit quantized layers are handled with Linear4bit:

if loaded_in_4bit and isinstance(target_base_layer, bnb.nn.Linear4bit):
    fourbit_kwargs = kwargs.copy()
    fourbit_kwargs.update(
        {
            "compute_dtype": target_base_layer.compute_dtype,
            "compress_statistics": target_base_layer.weight.compress_statistics,
            "quant_type": target_base_layer.weight.quant_type,
        }
    )

Sources: src/peft/tuners/ia3/model.py:50-56

Preparing Quantized Models for Training

PEFT provides the prepare_model_for_kbit_training utility function to prepare quantized models for training with PEFT adapters.

Function Signature

def prepare_model_for_kbit_training(
    model,
    use_gradient_checkpointing: bool = True,
    gradient_checkpointing_kwargs: Optional[dict] = None,
):

Sources: src/peft/utils/other.py

Key Operations

  1. Gradient Checkpointing: Enables gradient checkpointing to save memory during backpropagation
  2. Parameter Freezing: Freezes the base model's parameters so that only adapter weights receive gradients
  3. Dtype Casting: Casts selected parameters (such as layer norms) for numerically stable training

Usage Example

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

int8_config = BitsAndBytesConfig(load_in_8bit=True)

# After loading quantized model
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    quantization_config=int8_config,
    device_map="cuda:0",
)

# Prepare for k-bit training
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

Supported Tuners with Quantization

All major PEFT tuners support integration with quantized base models:

| Tuner | 8-bit Support | 4-bit Support | File Location |
|---|---|---|---|
| LoRA | ✓ | ✓ | src/peft/tuners/lora/ |
| IA³ | ✓ | ✓ | src/peft/tuners/ia3/ |
| LoHa | ✓ | ✓ | src/peft/tuners/loha/ |
| LoKr | ✓ | ✓ | src/peft/tuners/lokr/ |
| AdaLoRA | ✓ | ✓ | src/peft/tuners/adalora/ |
| OALoRA | ✓ | ✓ | src/peft/tuners/oaloora/ |

Layer Class Mappings

Each tuner defines specific layer mappings for different layer types:

# From src/peft/tuners/lokr/model.py
layers_mapping: dict[type[torch.nn.Module], type[LoKrLayer]] = {
    torch.nn.Conv2d: Conv2d,
    torch.nn.Conv1d: Conv1d,
    torch.nn.Linear: Linear,
}

# From src/peft/tuners/loha/model.py  
layers_mapping: dict[type[torch.nn.Module], type[LoHaLayer]] = {
    torch.nn.Conv2d: Conv2d,
    torch.nn.Conv1d: Conv1d,
    torch.nn.Linear: Linear,
}

Sources: src/peft/tuners/lokr/model.py:87-90 Sources: src/peft/tuners/loha/model.py:79-82

Base Tuner Layer Properties

All quantized-aware tuner layers inherit from BaseTunerLayer which provides key functionality:

Key Methods

| Method | Purpose |
|---|---|
| get_base_layer() | Retrieves the underlying base layer (quantized or not) |
| update_layer() | Updates adapter weights for existing layers |
| merge() | Merges adapter weights into base layer |
| unmerge() | Separates merged adapter weights |

if isinstance(target, BaseTunerLayer):
    target_base_layer = target.get_base_layer()
else:
    target_base_layer = target

Sources: src/peft/tuners/ia3/model.py:34-37

Adapter Management with Quantization

Creating New Modules

When creating new adapter modules for quantized layers:

  1. Detect the quantization state from the base layer
  2. Preserve quantization parameters (thresholds, compute dtype, etc.)
  3. Create appropriate quantized-aware adapter layer
sequenceDiagram
    participant Base as Base Model (Quantized)
    participant PEFT as PEFT System
    participant Adapter as Adapter Layer
    
    Base->>PEFT: Target Linear Layer
    PEFT->>PEFT: Detect 8-bit / 4-bit quantization
    PEFT->>Adapter: Create with quantization state
    Adapter->>Base: Store reference + quantization params

Multiple Adapters

PEFT supports multiple adapters on quantized models through the active_adapters mechanism:

# Adding additional adapters to quantized model
if adapter_name not in self.active_adapters:
    # adding an additional adapter: it is not automatically trainable
    new_module.requires_grad_(False)

Sources: src/peft/tuners/loha/model.py:1 Sources: src/peft/tuners/lokr/model.py:1

Memory Efficiency Considerations

Memory Breakdown

| Component | Full Precision | 8-bit | 4-bit |
|---|---|---|---|
| Base Model | ~70GB | ~35GB | ~18GB |
| Gradients | ~70GB | ~70GB | ~70GB |
| Activations | Variable | Variable | Variable |
| Optimizer | ~280GB | ~280GB | ~280GB |

Note that the gradient and optimizer figures above assume full fine-tuning; with PEFT adapters, gradients and optimizer states are kept only for the small trainable adapter subset, which is the main source of memory savings.

Best Practices

  1. Use Gradient Checkpointing: Reduces activation memory at cost of extra compute
  2. Target Specific Modules: Only apply adapters to key layers (q_proj, v_proj)
  3. Batch Size: Start with small batch sizes and scale based on available memory
  4. Mixed Precision: Use bfloat16 for gradients when possible
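
As a concrete illustration of these practices, a common 4-bit setup (QLoRA-style; values are illustrative) pairs nf4 quantization with bfloat16 compute:

import torch
from transformers import BitsAndBytesConfig

# 4-bit nf4 base weights with bf16 compute and nested quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)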

Context Manager for Adapter Scaling

PEFT provides rescale_adapter_scale for temporarily adjusting adapter scaling:

@contextmanager
def rescale_adapter_scale(model, multiplier):
    """
    Context manager to temporarily rescale the scaling of the LoRA adapter.
    
    The original scaling values are restored when the context manager exits.
    """

Sources: src/peft/helpers.py:80-90

Error Handling

Common Issues

| Error | Cause | Solution |
|---|---|---|
| TypeError on forward | Quantization state not preserved | Ensure proper layer replacement |
| OOM during forward | Batch size too large | Reduce batch size, use gradient checkpointing |
| Mismatched dtypes | Mixed precision issues | Cast to consistent dtype before training |

Verification Steps

  1. Verify quantization config is properly set
  2. Confirm adapter layers are correctly injected
  3. Check that gradient checkpointing is enabled for large models

Configuration Reference

BitsAndBytesConfig Options

| Parameter | Type | Default | Description |
|---|---|---|---|
| load_in_8bit | bool | False | Load model in 8-bit |
| load_in_4bit | bool | False | Load model in 4-bit |
| llm_int8_threshold | float | 6.0 | Outlier threshold for 8-bit |
| llm_int8_skip_modules | List | None | Modules to skip 8-bit conversion |
| llm_int8_enable_fp32_cpu_offload | bool | False | Enable CPU offload for 32-bit tensors |

See Also

Sources: src/peft/tuners/lora/model.py

Advanced Features

Related topics: Quantization Integration


PEFT (Parameter-Efficient Fine-Tuning) provides a comprehensive suite of advanced features that extend beyond basic adapter-based fine-tuning. These features enable sophisticated model adaptation strategies, including mixed adapter configurations, runtime adapter switching, distributed training support, and advanced optimization techniques.

Mixed Adapter Models

Mixed adapter models allow multiple adapter types to coexist within a single base model. This powerful feature enables combining different fine-tuning techniques to leverage their respective strengths.

Overview

The mixed model architecture in PEFT allows a base model to have multiple adapters of different types applied simultaneously. This is particularly useful when different adapters excel at different aspects of a task, or when you want to experiment with combining adapter strengths.

The mixed model functionality is implemented across two primary modules:

| Module | File Path | Purpose |
|---|---|---|
| PeftMixedModel | src/peft/mixed_model.py | Base mixed model class |
| MixedModel | src/peft/tuners/mixed/model.py | Tuner-specific mixed model implementation |

Architecture

graph TD
    A[Base Model] --> B[Mixed Adapter Layer]
    B --> C[LoRA Adapter]
    B --> D[IA³ Adapter]
    B --> E[AdaLoRA Adapter]
    B --> N[Additional Adapters]
    
    F[Adapter Config 1] --> C
    G[Adapter Config 2] --> D
    H[Adapter Config 3] --> E
    
    I[Active Adapter Selection] --> B
    J[Multi-Adapter Inference] --> B

Supported Adapter Combinations

PEFT supports multiple tuner types that can be combined in mixed configurations:

| Tuner Type | Prefix | Description |
|---|---|---|
| LoRA | lora_ | Low-Rank Adaptation |
| AdaLoRA | adalora_ | Adaptive LoRA with budget allocation |
| IA³ | ia3_ | (IA)³ - learnable input/output/residual scaling |
| OFT | oft_ | Orthogonal Fine-Tuning |
| HRA | hra_ | Householder Reflection Adaptation |
| HiRA | hira_ | Hadamard High-Rank Adaptation |
| SHiRA | shira_ | Sparse High Rank Adaptation |
| GraLoRA | gralora_ | Granular Low-Rank Adaptation |
| MiSS | miss_ | Matrix Shard Sharing |
| AdaMSS | adamss_ | Adaptive Subspace Selection |
| X-LoRA | xlora_ | Mixture of LoRA experts with dynamic routing |
| Poly | poly_ | Polytropon multi-task adaptation |

Key Implementation Details

Each tuner in PEFT defines specific attributes that enable mixed adapter support:

# Common tuner model attributes
prefix: str  # Unique prefix for the tuner (e.g., "lora_", "ia3_")
tuner_layer_cls = SpecificLayerClass  # The layer class for this tuner
target_module_mapping = {...}  # Mapping of model types to target modules

The mixed model implementation handles adapter creation through the _create_and_replace method, which validates the current key and delegates to appropriate adapter-specific logic.
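
A hedged sketch of creating a mixed-adapter model (module names and values are illustrative; mixed=True returns a PeftMixedModel):

from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig, LoHaConfig

base = AutoModelForCausalLM.from_pretrained("gpt2")
model = get_peft_model(
    base,
    LoraConfig(r=8, target_modules=["c_attn"]),
    adapter_name="lora_1",
    mixed=True,
)
model.add_adapter("loha_1", LoHaConfig(r=8, target_modules=["c_attn"]))
model.set_adapter(["lora_1", "loha_1"])  # activate both adapters together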

Sources: src/peft/tuners/shira/model.py:1-50 Sources: src/peft/tuners/mixed/model.py

Adapter Hotswap

The hotswap feature enables runtime replacement of adapters without requiring full model reload. This is essential for production environments where model availability must be maintained during adapter updates.

Purpose

Adapter hotswapping allows you to:

  • Replace a deployed adapter with an updated version
  • Switch between different fine-tuned adapters for different tasks
  • Update model capabilities without downtime
  • A/B test different adapter versions in production

Implementation

The hotswap functionality is implemented in src/peft/utils/hotswap.py and provides the hotswap_adapter function for runtime adapter replacement.

def hotswap_adapter(
    model: "PeftModel",
    model_name_or_path: str,
    adapter_name: str,
    torch_device: Optional[str] = None,
    **kwargs,
) -> None:

Parameters

| Parameter | Type | Description |
|---|---|---|
| model | PeftModel | The PEFT model with the loaded adapter |
| model_name_or_path | str | Path or identifier for the new adapter |
| adapter_name | str | Name of the adapter to replace (e.g., "default") |
| torch_device | str, optional | Target device for adapter weights |
| **kwargs | — | Additional arguments for config/weight loading |

Workflow

graph TD
    A[Load New Adapter Config] --> B[Validate Adapter Type]
    B --> C[Load Adapter Weights to Device]
    C --> D[Validate Weight Compatibility]
    D --> E[Replace Adapter Weights in Model]
    E --> F[Update Model State]
    F --> G[Model Ready for Inference]
    
    H[Inference with New Adapter] -.-> G

Usage Example

import torch

from peft import hotswap_adapter

# Replace the "default" lora adapter with a new one
hotswap_adapter(model, "path-to-new-adapter", adapter_name="default", torch_device="cuda:0")

# Use the updated model
with torch.inference_mode():
    output = model(inputs).logits

Configuration Validation

During hotswap, the system performs several validations:

  1. Config Loading: Loads the new adapter configuration using config_cls.from_pretrained()
  2. Type Matching: Ensures the new adapter type is compatible with existing adapters
  3. Weight Loading: Loads weights onto the specified device with appropriate quantization settings

Sources: src/peft/utils/hotswap.py:1-80 Sources: docs/source/developer_guides/checkpoint.md

Incremental PCA Utilities

PEFT includes incremental PCA utilities for advanced analysis and optimization of adapter matrices. Incremental PCA is particularly useful for:

  • Analyzing the rank structure of trained adapters
  • Identifying redundant parameters in low-rank adaptations
  • Computing principal components in a memory-efficient manner

Implementation

The incremental PCA implementation is located in src/peft/utils/incremental_pca.py. This utility supports processing large matrices in batches to avoid memory constraints.

Key Features

| Feature | Description |
|---|---|
| Batch Processing | Process large matrices incrementally |
| Memory Efficiency | Avoid loading entire matrices into memory |
| Rank Analysis | Determine effective rank of adapter matrices |
| Component Extraction | Extract principal components for analysis |

Use Cases

  1. Adapter Analysis: Understand the dimensionality requirements of trained adapters
  2. Compression: Identify opportunities for matrix rank reduction
  3. Quality Assessment: Verify that low-rank approximations maintain sufficient information

Sources: src/peft/utils/incremental_pca.py

Distributed Training Support

PEFT provides comprehensive support for distributed training frameworks, enabling efficient fine-tuning of large models across multiple devices and nodes.

DeepSpeed Integration

PEFT integrates with DeepSpeed ZeRO optimizations for memory-efficient distributed training.

#### Features

| Feature | Description |
|---|---|
| ZeRO Stage 2/3 | Partition optimizer states across devices |
| CPU Offload | Offload parameters/optimizer states to CPU |
| Activation Checkpointing | Reduce memory for activations |
| Mixed Precision | FP16/BF16 training support |

#### Configuration

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)

peft_model = get_peft_model(model, peft_config)
# Train with DeepSpeed ZeRO-3 config

#### Key Considerations

  • Only non-trainable weights should remain on the original device when using PEFT with DeepSpeed
  • Trainable adapter weights are managed by DeepSpeed's optimizer partitioning
  • Offloading should be configured at the DeepSpeed level, not within PEFT configs

Sources: docs/source/accelerate/deepspeed.md

FSDP Integration

Fully Sharded Data Parallel (FSDP) support enables sharding model parameters, gradients, and optimizer states across GPUs.

#### Features

| Feature | Description |
|---|---|
| Parameter Sharding | Distribute model parameters across GPUs |
| Gradient Sharding | Partition gradients during backward pass |
| Optimizer Sharding | Distribute optimizer states |
| Mixed Precision | Automatic FP16/BF16 handling |

#### Configuration with Accelerate

# accelerate config.yaml
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_state_dict_type: FULL_STATE_DICT

#### Compatibility Notes

  • FSDP support requires transformers>=4.36.0
  • Auto-wrap policies should wrap transformer layers containing PEFT adapters
  • State dict type should be FULL_STATE_DICT for checkpoint saving

Sources: docs/source/accelerate/fsdp.md

Advanced Tuner Configurations

AdaLoRA - Adaptive Budget Allocation

AdaLoRA implements an intelligent budget allocation strategy that dynamically adjusts the rank of different adapter matrices during training.

#### Training Workflow

graph TD
    A[Initialize with Uniform Rank] --> B[Forward Pass]
    B --> C[Calculate Importance Scores]
    C --> D{Global Step < Total - T_final?}
    D -->|Yes| E[Update Rank Pattern]
    E --> B
    D -->|No| F{Mask Unimportant Weights}
    F --> G[Finalize Adapter]

#### Key Parameters

| Parameter | Description |
|---|---|
| r | Initial rank for all adapters |
| total_step | Total training steps |
| tinit | Steps for initial warmup |
| tfinal | Steps for final budget freezing |
| deltaT | Interval between rank adjustments |

Sources: src/peft/tuners/adalora/model.py:1-100

X-LoRA - Mixture of LoRA Experts

X-LoRA routes dynamically among multiple LoRA adapters and supports advanced configurations, including quantized base models and multi-adapter loading.

#### Features

| Feature | Description |
|---|---|
| 8-bit Quantization | Load base models in int8 format |
| 4-bit Quantization | Load base models in int4 format |
| Flash Attention | Integration with flash_attention_2 |
| Ephemeral GPU Offload | Temporary GPU memory management |
| Multiple Adapter Loading | Load multiple adapters simultaneously |

#### Configuration

from peft import XLoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    quantization_config=quantization_config,
    device_map="cuda:0",
)
# `adapters` maps adapter names to checkpoint paths (paths illustrative)
config = XLoraConfig(
    task_type="CAUSAL_LM",
    hidden_size=model.config.hidden_size,
    adapters={
        "adapter_1": "./path/to/adapter_1",
        "adapter_2": "./path/to/adapter_2",
    },
)
xlora_model = get_peft_model(model, config)

Sources: src/peft/tuners/xlora/model.py:1-80

(IA)³ - Infused Adapter by Inhibiting and Amplifying Inner Activations

The (IA)³ method applies learnable scaling vectors to key components of transformer models.

#### Target Modules

| Model Type | Target Modules |
|---|---|
| Encoder-only | q_proj, v_proj, k_proj, output_proj |
| Decoder-only | q_proj, v_proj, k_proj, output_proj, fc1 |
| Seq2Seq | q_proj, v_proj, k_proj, output_proj, fc1, fc2 |

#### Implementation Details

The IA³ implementation creates scaling vectors that are multiplied with the hidden states at specific positions in the forward pass. The scaling vectors are initialized to ones (neutral) and learned during training.

Sources: src/peft/tuners/ia3/model.py:1-80

Helper Functions

PEFT provides utility functions for common operations that enhance the developer experience.

Signature Management

#### update_forward_signature

Updates the forward signature of a PeftModel to include the base model's signature, enabling proper IDE autocompletion and documentation.

from peft import update_forward_signature

update_forward_signature(peft_model)
help(peft_model.forward)  # Now shows complete signature

#### update_generate_signature

Similar to forward signature update but for the generate method, essential for seq2seq models.

from peft import update_generate_signature

update_generate_signature(peft_model)
help(peft_model.generate)  # Now shows complete signature

Model Validation

#### check_if_peft_model

Validates whether a model path or identifier corresponds to a PEFT model by attempting to load its configuration.

from peft import check_if_peft_model

is_peft = check_if_peft_model("meta-llama/Llama-2-7b-adapter")
# Returns: True or False

Adapter Scale Context Manager

The rescale_adapter_scale context manager temporarily adjusts adapter scaling factors, useful for controlled inference experiments.

from peft import rescale_adapter_scale

with rescale_adapter_scale(model, multiplier=0.5):
    output = model(inputs)  # Scaled by 0.5
# Original scaling restored after context exit

Sources: src/peft/helpers.py:1-150

Task-Specific Models

PEFT provides specialized model classes optimized for different task types.

| Task Type | Model Class | Use Case |
|---|---|---|
| Feature Extraction | `PeftModelForFeatureExtraction` | Extracting embeddings |
| Question Answering | `PeftModelForQuestionAnswering` | QA tasks |
| Sequence Classification | `PeftModelForSequenceClassification` | Text classification |
| Token Classification | `PeftModelForTokenClassification` | NER, POS tagging |
| Seq2Seq LM | `PeftModelForSeq2SeqLM` | Translation, summarization |

Common Initialization Pattern

All task-specific models follow a consistent initialization pattern:

# Shared constructor signature implemented by each task-specific PeftModel subclass.
def __init__(
    self,
    model: torch.nn.Module,
    peft_config: PeftConfig,
    adapter_name: str = "default",
    **kwargs,
) -> None:
    super().__init__(model, peft_config, adapter_name, **kwargs)

Each model class may add task-specific module name patterns for modules to save (e.g., classifier layers in sequence classification models).
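
In practice these classes are rarely instantiated directly; `get_peft_model` selects the right subclass from the config's `task_type`. A minimal sketch, assuming a BERT sequence-classification checkpoint and illustrative LoRA hyperparameters:

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

base_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # illustrative checkpoint and label count
)
config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16)
# Returns a PeftModelForSequenceClassification; the classifier head is
# registered in modules_to_save so it stays trainable alongside the adapter.
model = get_peft_model(base_model, config)
model.print_trainable_parameters()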

Sources: src/peft/peft_model.py:1-200

Summary

PEFT's advanced features provide a comprehensive toolkit for parameter-efficient model adaptation:

| Category | Features |
|---|---|
| Mixed Adapters | Multiple adapter types per model |
| Runtime Switching | Adapter hotswap without reload |
| Analysis Tools | Incremental PCA for matrix analysis |
| Distributed Training | DeepSpeed ZeRO, FSDP support |
| Advanced Tuners | AdaLoRA, X-LoRA, IA³, OFT, and more |
| Developer Utilities | Signature management, validation helpers |

These features enable both research experimentation and production deployment of efficient fine-tuning solutions across a wide range of model architectures and training configurations.

Sources: [src/peft/tuners/shira/model.py:1-50](https://github.com/huggingface/peft/blob/main/src/peft/tuners/shira/model.py)

Doramagic Pitfall Log

Doramagic extracted 16 source-linked risk signals; 12 are listed below. Review them before installing or handing real data to the project.

1. Configuration risk: [BUG] peft 0.19 target_modules (str) use `set`

  • Severity: high
  • Finding: Configuration risk is backed by a source signal: [BUG] peft 0.19 target_modules (str) use set. Treat it as a review item until the current version is checked.
  • User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/issues/3229

2. Security or permission risk: Comparison of Different Fine-Tuning Techniques for Conversational AI

  • Severity: high
  • Finding: Security or permission risk is backed by a source signal: Comparison of Different Fine-Tuning Techniques for Conversational AI. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/issues/2310

3. Installation risk: Feature Request: Improve offline support for custom architectures in get_peft_model_state_dict

  • Severity: medium
  • Finding: Installation risk is backed by a source signal: Feature Request: Improve offline support for custom architectures in get_peft_model_state_dict. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/issues/3211

4. Configuration risk: 0.17.0: SHiRA, MiSS, LoRA for MoE, and more

  • Severity: medium
  • Finding: Configuration risk is backed by a source signal: 0.17.0: SHiRA, MiSS, LoRA for MoE, and more. Treat it as a review item until the current version is checked.
  • User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/releases/tag/v0.17.0

5. Configuration risk: Applying Dora to o_proj of Meta-Llama-3.1-8B results in NaN

  • Severity: medium
  • Finding: Configuration risk is backed by a source signal: Applying Dora to o_proj of Meta-Llama-3.1-8B results in NaN. Treat it as a review item until the current version is checked.
  • User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/issues/2049

6. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | github_repo:570384908 | https://github.com/huggingface/peft | README/documentation is current enough for a first validation pass.

7. Project risk: 0.17.1

  • Severity: medium
  • Finding: Project risk is backed by a source signal: 0.17.1. Treat it as a review item until the current version is checked.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/releases/tag/v0.17.1

8. Project risk: v0.15.1

  • Severity: medium
  • Finding: Project risk is backed by a source signal: v0.15.1. Treat it as a review item until the current version is checked.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/releases/tag/v0.15.1

9. Project risk: v0.15.2

  • Severity: medium
  • Finding: Project risk is backed by a source signal: v0.15.2. Treat it as a review item until the current version is checked.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/releases/tag/v0.15.2

10. Maintenance risk: 0.16.0: LoRA-FA, RandLoRA, C³A, and much more

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: 0.16.0: LoRA-FA, RandLoRA, C³A, and much more. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/peft/releases/tag/v0.16.0

11. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:570384908 | https://github.com/huggingface/peft | last_activity_observed missing

12. Security or permission risk: no_demo

  • Severity: medium
  • Finding: The validation record carries the raw signal `no_demo` (no runnable demo detected). Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | github_repo:570384908 | https://github.com/huggingface/peft | no_demo; severity=medium

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

Doramagic exposes project-level community discussion separately from official documentation. These external discussion links are review inputs, not standalone proof that the project is production-ready.

  • Sources: 12 project-level external discussion links are exposed on this manual page.
  • Recommended use: open the linked issues or discussions before treating the pack as ready for your environment, and review them before using peft with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence