huggingface_hub Manual Preview

Doramagic Project Pack · Human Manual

huggingface_hub

Related topics: Installation and Setup, File Download Operations, File Upload Operations

Overview and Architecture

Introduction

huggingface_hub is a Python client library developed by Hugging Face for interacting with the Hugging Face Hub. It lets developers download, upload, and manage machine learning models, datasets, and other repositories programmatically, through a single interface to the Hub's model hosting, version control, and collaboration infrastructure.

Primary Purpose:

Download models, datasets, and Spaces from the Hub
Upload files and folders to the Hub
Manage repository metadata and model cards
Execute inference on deployed models
Handle authentication and access control

Sources: README.md

Source: https://github.com/huggingface/huggingface_hub / Human Manual

Installation and Setup

Related topics: Overview and Architecture, Authentication System

Overview

The huggingface_hub package is a Python client library that enables interaction with the Hugging Face Hub, providing functionality to download and publish models, datasets, and other repositories. This page covers all aspects of installing and setting up the library across different environments and use cases.

System Requirements

Python Version

Requirement	Version
Minimum Python	3.10.0
Package Manager	pip, conda

Sources: setup.py:52

Supported Platforms

The library supports installation on all major operating systems including Linux, macOS, and Windows.

Installation Methods

Standard Installation (pip)

The primary installation method uses pip:

pip install huggingface_hub

Sources: README.md:30

Installation with Optional Dependencies

The library provides extras that install optional dependencies for specific use cases:

Extra	Description	Command
`inference`	Inference-related functionality	`pip install huggingface_hub[inference]`
`mcp`	MCP (Model Context Protocol) module	`pip install huggingface_hub[mcp]`

Sources: README.md:36-42

Development Installation

For contributing to the project or testing the latest features:

pip install -e ".[dev]"

This installs the package in editable mode with all development dependencies.

Sources: CONTRIBUTING.md:24-26

Conda Installation

For conda environments:

conda install -c conda-forge huggingface_hub

Sources: README.md:22-24

Dependency Architecture

graph TD
    A[huggingface_hub] --> B[Core Dependencies]
    A --> C[Optional: inference]
    A --> D[Optional: mcp]
    A --> E[Dev Dependencies]
    
    B --> B1[requests]
    B --> B2[fsspec]
    B --> B3[httpx]
    B --> B4[tqdm]
    B --> B5[packaging]
    B --> B6[filelock]
    B --> B7[pyyaml]
    
    C --> C1[inference-client]
    C --> C2[pillow]
    
    D --> D1[mcp]
    
    E --> E1[pytest]
    E --> E2[pytest-asyncio]
    E --> E3[pytest-cov]
    E --> E4[ruff]
    E --> E5[mypy]
    E --> E6[ty]

Core Dependencies

The following table lists the required dependencies installed by default:

Package	Purpose
`requests`	HTTP client for API calls
`fsspec`	Filesystem specification
`httpx`	Async HTTP client
`tqdm`	Progress bars
`packaging`	Package version handling
`filelock`	File locking mechanism
`pyyaml`	YAML parsing
`typing-extensions`	Type hint support

Sources: setup.py:1-16

Optional Dependency Groups

Testing Dependencies

extras["testing"] = [
    "pytest",
    "pytest-asyncio",
    "pytest-cov",
    "pytest-xdist",
    "DianaEye",
    "aiohttp",
    "asynctest",
    "Paramiko",
]

Quality Assurance Dependencies

extras["quality"] = [
    "ruff",
]

Type Checking Dependencies

extras["typing"] = [
    "mypy==1.15.0",
    "libcst>=1.4.0",
    "ty",
]

All-Inclusive Meta-Group

extras["all"] = extras["testing"] + extras["quality"] + extras["typing"]
extras["dev"] = extras["all"]

Sources: setup.py:36-51

Installation Workflow

graph TD
    A[Start Installation] --> B{Installation Method}
    
    B -->|pip| C[Basic Install]
    B -->|conda| D[Conda Forge Install]
    B -->|editable| E[Development Install]
    
    C --> F{Use Case}
    F -->|Minimal| G[Core Only]
    F -->|Inference| H[Add inference extra]
    F -->|MCP| I[Add mcp extra]
    
    G --> J[Installation Complete]
    H --> J
    I --> J
    
    D --> J
    E --> J
    
    J --> K[Verify Installation]
    K --> L[Import huggingface_hub]

Verification

After installation, verify the package is correctly installed:

from huggingface_hub import hf_hub_download

# Test basic functionality
hf_hub_download(repo_id="tiiuae/falcon-7b-instruct", filename="config.json")

Sources: README.md:48-52
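
A lighter check that needs no network access is to read the installed version from the package metadata. The sketch below uses only the standard library. The helper name `check_environment` is an illustrative assumption, not part of huggingface_hub, and the minimum Python version is taken from the System Requirements table above.

```python
import sys
from importlib import metadata


def check_environment(package="huggingface_hub", min_python=(3, 10)):
    """Report whether the interpreter and the installed package look usable.

    Hypothetical helper for illustration; not part of huggingface_hub.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # the package is not installed in this environment
    return {
        "python_ok": sys.version_info >= min_python,
        "package_version": version,
    }


if __name__ == "__main__":
    print(check_environment())
```

A `package_version` of `None` means pip or conda did not install into the interpreter that is currently running, which is a common cause of the ImportError listed under Troubleshooting.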

Post-Installation Configuration

Authentication Setup

To authenticate with the Hugging Face Hub:

# Interactive login
hf auth login

# Non-interactive with token
hf auth login --token $HUGGINGFACE_TOKEN

Sources: README.md:61-65

Cache Configuration

Files are downloaded to a local cache folder. See the cache management guide for configuration options.

Entry Points

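The installation registers the following entry points. The first three are console scripts; the last is an fsspec protocol registration rather than a command: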

Command	Module	Purpose
`hf`	`huggingface_hub.cli.hf`	Main CLI interface
`huggingface-cli`	`huggingface_hub.cli.deprecated_cli`	Legacy CLI (deprecated)
`tiny-agents`	`huggingface_hub.inference._mcp.cli`	MCP CLI application
`hf` (fsspec)	`huggingface_hub.HfFileSystem`	Filesystem specification

Sources: setup.py:53-60

Troubleshooting

Common Issues

Issue	Solution
ImportError	Ensure Python >= 3.10
Authentication failed	Run `hf auth login`
Download timeout	Check network connection
Permission denied	Use virtual environment

Development Setup Issues

If installing in development mode:

pip uninstall huggingface_hub
pip install -e ".[dev]"

Sources: CONTRIBUTING.md:24

Package Metadata

Property	Value
Name	`huggingface_hub`
License	Apache-2.0
Author	Hugging Face, Inc.
Author Email	[email protected]
URL	https://github.com/huggingface/huggingface_hub

Sources: setup.py:18-22

Sources: [setup.py:52](https://github.com/huggingface/huggingface_hub/blob/main/setup.py)

Authentication System

Related topics: Installation and Setup, Repository Management API

Overview

The huggingface_hub library provides a comprehensive authentication system that enables secure access to Hugging Face Hub resources including models, datasets, and Spaces. The authentication system supports multiple authentication methods including token-based authentication and OAuth 2.0, with seamless integration into both CLI environments and Jupyter notebooks.

The authentication infrastructure consists of four primary modules that handle different aspects of the authentication lifecycle:

Module	Purpose
`_login.py`	User login operations and token management
`_oauth.py`	OAuth 2.0 authentication flow
`_auth.py`	Core authentication utilities and token refresh
`_git_credential.py`	Git credential handling for repository operations

Architecture

graph TD
    A[User] --> B[Login Methods]
    B --> C[Token-based Auth]
    B --> D[OAuth 2.0 Auth]
    C --> E[hf_hub_download]
    C --> F[upload_file]
    D --> E
    D --> F
    E --> G[Token Cache]
    F --> G
    G --> H[Hugging Face Hub API]
    H --> I[Model/Dataset/Space]
    
    C --> J[CLI: hf auth login]
    C --> K[Python: login function]
    C --> L[Notebook: notebook_login]

Token-Based Authentication

The library provides three primary interfaces for user authentication:

CLI Login

Users can authenticate via the command-line interface using the hf command:

hf auth login
# or with environment variable
hf auth login --token $HUGGINGFACE_TOKEN

Python API Login

The login() function provides programmatic authentication within Python scripts:

from huggingface_hub import login

# Direct token login
login(token="hf_xxxxx")

# No token argument: prompts for one interactively
login()

Notebook Login Widget

For Jupyter notebook environments, notebook_login() displays an interactive widget for token entry:

from huggingface_hub import notebook_login

notebook_login()

The notebook login function accepts the following parameters:

Parameter	Type	Default	Description
`skip_if_logged_in`	`bool`	`False`	If `True`, skip the prompt when the user is already logged in

# Skip the widget if a token is already stored
notebook_login(skip_if_logged_in=True)

Token Storage and Management

Tokens are securely stored in the local configuration directory. The system automatically retrieves stored tokens when making API requests, eliminating the need for repeated authentication. Token validation occurs automatically before any authenticated operation, ensuring expired or invalid tokens are detected early.
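
The lookup described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: it assumes the `HF_TOKEN` environment variable is consulted first and that the stored token is a plain-text file named `token` inside the configuration directory. The function name `resolve_token` is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_token(env: Mapping[str, str], hf_home: Path) -> Optional[str]:
    """Return the first non-empty token found, or None.

    Assumed lookup order (a simplification): the HF_TOKEN environment
    variable first, then a plain-text file named `token` inside hf_home.
    """
    token = env.get("HF_TOKEN", "").strip()
    if token:
        return token
    token_file = hf_home / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None


if __name__ == "__main__":
    # Demonstrate both sources using a throwaway configuration directory.
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        (home / "token").write_text("hf_from_file\n")
        print(resolve_token({}, home))
        print(resolve_token({"HF_TOKEN": "hf_from_env"}, home))
```

In application code, prefer the library's own `get_token()` helper over re-implementing the lookup; the sketch only makes the precedence easier to reason about.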

OAuth 2.0 Authentication

The OAuth 2.0 authentication flow provides an alternative to token-based authentication, enabling more sophisticated authorization scenarios. This is particularly useful for applications that need to access resources on behalf of users with specific permission scopes.

OAuth tokens are automatically refreshed when they expire, maintaining continuous access without requiring user intervention. The system handles token revocation and supports scopes that limit access to specific resources or operations.

Git Credential Integration

The authentication system integrates with Git's credential infrastructure to provide seamless authentication for Git operations such as cloning and pushing to repositories. This integration ensures that Git operations respect the same authentication state as the Python API.

graph LR
    A[Git Operation] --> B[Git Credential Helper]
    B --> C{huggingface_hub _git_credential}
    C --> D{Cached Token?}
    D -->|Yes| E[Use Cached Token]
    D -->|No| F[Prompt for Token]
    E --> G[Execute Git Operation]
    F --> G

The Git credential helper manages:

Secure storage of credentials
Credential retrieval for specific hosts
Credential cleanup after operations

Authentication Workflow

sequenceDiagram
    participant User
    participant Application
    participant AuthSystem
    participant HubAPI
    participant TokenStore

    User->>Application: Initiate request
    Application->>AuthSystem: Authenticate
    AuthSystem->>TokenStore: Check stored token
    TokenStore-->>AuthSystem: Token found
    AuthSystem->>HubAPI: Authenticated request
    HubAPI-->>Application: Response
    Note over AuthSystem,TokenStore: Token expired or invalid
    AuthSystem->>AuthSystem: Refresh token
    AuthSystem->>TokenStore: Update token
    AuthSystem->>HubAPI: Retry with new token

Configuration

Authentication behavior can be configured through environment variables and configuration files:

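Variable	Description
`HF_TOKEN`	Authentication token read by the library; takes precedence over the token stored on disk
`HF_HOME`	Configuration directory location (default `~/.cache/huggingface`)
`HUGGINGFACE_TOKEN`	Shell variable used in the README's `hf auth login --token` example; not read by the library itself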

Security Considerations

The authentication system implements several security best practices:

Secure Token Storage: Tokens are stored with appropriate file permissions to prevent unauthorized access
Token Validation: All tokens are validated before use in API requests
Automatic Refresh: OAuth tokens are automatically refreshed to maintain session continuity
Notebook Security Warning: The notebook_login widget displays a warning about token exposure in notebook files

The authentication system interacts with several other library components:

Component	Interaction
`InferenceClient`	Uses authentication for inference API calls
`HfFileSystem`	Uses authentication for file system operations
`snapshot_download`	Uses authentication for repository downloads
`upload_file`	Uses authentication for repository uploads

Quick Reference

# CLI
hf auth login --token hf_xxxxx

# Python script
from huggingface_hub import login
login(token="hf_xxxxx")

# Jupyter notebook
from huggingface_hub import notebook_login
notebook_login()

# Check if logged in
from huggingface_hub import whoami
user = whoami()

File Download Operations

Related topics: Cache Management System, Git LFS Large File Handling, Overview and Architecture

The huggingface_hub library provides a comprehensive file download system that enables clients to fetch models, datasets, and other artifacts from the Hugging Face Hub. This document covers the architecture, API, caching mechanisms, and usage patterns for download operations.

Overview

File download operations in huggingface_hub handle the retrieval of individual files or entire repository snapshots from Hugging Face's infrastructure. The system implements intelligent caching, supports offline mode, provides progress tracking, and handles authentication seamlessly.

Key responsibilities:

Download files with proper caching and deduplication
Support partial content retrieval for LFS (Large File Storage) files
Manage metadata for cache validation and freshness checks
Handle authentication tokens transparently
Support offline scenarios with local-only file access
Provide dry-run capabilities for previewing downloads

Sources: src/huggingface_hub/file_download.py:1-100

Architecture

Component Overview

graph TD
    A[Public API: hf_hub_download] --> B[Route Decision]
    B --> C{single file?}
    C -->|Yes| D[_hf_hub_download_to_cache_dir]
    C -->|No| E[snapshot_download]
    
    D --> F[Get Metadata / ETag]
    F --> G{Cached?}
    G -->|Yes, valid| H[Return cached path]
    G -->|No| I[Download from remote]
    I --> J[Write metadata]
    J --> H
    
    E --> K[Iterate files]
    K --> L[Download each file]
    L --> F
    
    H --> M[Local file path]
    I --> M

Module Structure

Module	Purpose
`file_download.py`	Core download functions (`hf_hub_download`, `_hf_hub_download_to_cache_dir`)
`_local_folder.py`	Local cache and metadata management
`_snapshot_download.py`	Full repository snapshot downloads
`cli/download.py`	Command-line interface for downloads
`errors.py`	Exception hierarchy for download failures

Sources: src/huggingface_hub/file_download.py:1-50

Core API Functions

hf_hub_download

The primary function for downloading a single file from the Hub.

from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bert-base-cased",
    filename="config.json",
    repo_type="model",
    revision="main",
    cache_dir="./hf_cache",
    token=True,
)

Parameters:

Parameter	Type	Default	Description
`repo_id`	`str`	Required	Repository identifier (e.g., "bert-base-cased")
`filename`	`str`	Required	Path to the file within the repository
`repo_type`	`str`	`"model"`	Type of repository: "model", "dataset", or "space"
`revision`	`str`	`"main"`	Git revision (branch, tag, or commit hash)
`cache_dir`	`str | Path`	`None`	Custom cache directory location
`local_dir`	`str | Path`	`None`	Directory to place the file without caching structure
`force_download`	`bool`	`False`	Force re-download even if cached
`local_files_only`	`bool`	`False`	Only return local files, fail if not cached
`token`	`str | bool`	`None`	Authentication token (`True` reads from config)
`etag_timeout`	`float`	`10`	Timeout in seconds for ETag fetch
`tqdm_class`	`type`	`None`	Custom tqdm class for progress bars

Sources: src/huggingface_hub/file_download.py:100-200

snapshot_download

Downloads an entire repository to a local cache.

from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="stabilityai/stable-diffusion-2-1",
    repo_type="model",
    cache_dir="./models",
    ignore_patterns=["*.md", ".gitattributes"],
)

Parameters:

Parameter	Type	Default	Description
`repo_id`	`str`	Required	Repository identifier
`repo_type`	`str`	`"model"`	Type of repository
`revision`	`str`	`None`	Git revision to download
`cache_dir`	`str | Path`	`None`	Cache directory location
`local_dir`	`str | Path`	`None`	Mirror directory without cache structure
`allow_patterns`	`list[str]`	`None`	Glob patterns to include
`ignore_patterns`	`list[str]`	`None`	Glob patterns to exclude
`force_download`	`bool`	`False`	Force re-download of all files
`local_files_only`	`bool`	`False`	Only use local cache
`token`	`str | bool`	`None`	Authentication token

Sources: src/huggingface_hub/_snapshot_download.py:1-150
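
How `allow_patterns` and `ignore_patterns` combine can be illustrated with the standard library. The sketch below assumes shell-style matching as provided by Python's `fnmatch`; the library's own filtering may differ in edge cases, and `filter_paths` is a hypothetical name used only here.

```python
from fnmatch import fnmatch
from typing import Iterable, Iterator, Optional, Sequence


def filter_paths(
    paths: Iterable[str],
    allow_patterns: Optional[Sequence[str]] = None,
    ignore_patterns: Optional[Sequence[str]] = None,
) -> Iterator[str]:
    """Yield paths that match an allow pattern (if any are given) and no ignore pattern."""
    for path in paths:
        # With an allow list, a path must match at least one of its patterns.
        if allow_patterns is not None and not any(fnmatch(path, p) for p in allow_patterns):
            continue
        # An ignore pattern always wins over an allow pattern.
        if ignore_patterns is not None and any(fnmatch(path, p) for p in ignore_patterns):
            continue
        yield path


if __name__ == "__main__":
    repo_files = ["README.md", ".gitattributes", "config.json", "unet/model.safetensors"]
    print(list(filter_paths(repo_files, ignore_patterns=["*.md", ".gitattributes"])))
```

Note that with `fnmatch` the `*` wildcard also matches path separators, so `"*.safetensors"` selects files in subfolders as well.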

Caching Mechanism

Cache Directory Structure

cache_dir/
├── .locks/                                  # Lock files for concurrent access
│   └── {repo_type}s--{namespace}--{repo_name}/
│       └── {blob_id}.lock
└── {repo_type}s--{namespace}--{repo_name}/
    ├── blobs/
    │   └── {blob_id}                        # File contents, stored once per unique blob
    ├── refs/
    │   └── {branch}                         # Maps a branch or tag to a commit hash
    └── snapshots/
        └── {commit_hash}/
            └── {filename}                   # Link to the corresponding blob

Download Metadata

When a file is downloaded into a local_dir instead of the shared cache, metadata is stored next to it to track freshness:

# Stored in: {local_dir}/.cache/huggingface/download/{filename}.metadata
{commit_hash}
{etag}
{timestamp}

The system validates cached files by:

Comparing local ETag with remote ETag
Checking commit hash consistency
Verifying file modification timestamps

Sources: src/huggingface_hub/_local_folder.py:50-120
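
The three-line metadata record and the freshness checks listed above can be sketched with the standard library. This is an illustration under assumptions, not the library's code: the names `DownloadMetadata`, `write_metadata`, `read_metadata`, and `is_fresh` are hypothetical, and the one-second tolerance on the modification time is an assumed value.

```python
import tempfile
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DownloadMetadata:
    commit_hash: str
    etag: str
    timestamp: float  # when the metadata was saved, in seconds since the epoch


def write_metadata(path: Path, commit_hash: str, etag: str) -> None:
    """Store commit hash, ETag, and the current time, one value per line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"{commit_hash}\n{etag}\n{time.time()}\n")


def read_metadata(path: Path) -> Optional[DownloadMetadata]:
    """Return the parsed metadata, or None if the file is missing or malformed."""
    try:
        commit_hash, etag, timestamp = path.read_text().splitlines()[:3]
        return DownloadMetadata(commit_hash, etag, float(timestamp))
    except (OSError, ValueError):
        return None


def is_fresh(meta: DownloadMetadata, remote_etag: str, file_mtime: float, tolerance: float = 1.0) -> bool:
    """A local file is reusable if the ETag matches and it was not modified after the metadata was saved."""
    return meta.etag == remote_etag and file_mtime - tolerance <= meta.timestamp


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        meta_path = Path(tmp) / "download" / "config.json.metadata"
        write_metadata(meta_path, "abc123", "etag-1")
        print(read_metadata(meta_path))
```

Treating a malformed metadata file the same as a missing one keeps the caller simple: in both cases the safe fallback is to ask the server again.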

Lock File Management

The library uses WeakFileLock to handle concurrent downloads safely. The abridged excerpt below shows the pattern; elided arguments are left as in the source:

locks_dir = os.path.join(cache_dir, ".locks")
storage_folder = os.path.join(cache_dir, repo_folder_name(...))
paths = RepoFileDownloadPaths(...)
# Lock acquired before writing to cache
with WeakFileLock(paths.lock_path):
    ...  # Critical section: write file or metadata

Sources: src/huggingface_hub/file_download.py:300-350

Download Workflow

Sequence Diagram

sequenceDiagram
    participant Client
    participant hf_hub_download
    participant Cache
    participant Server
    participant Metadata

    Client->>hf_hub_download: Call with repo_id, filename
    hf_hub_download->>Cache: Check cached file + metadata
    Cache-->>hf_hub_download: metadata (if exists)
    
    alt Cached file exists
        hf_hub_download->>Metadata: Validate ETag
        Metadata-->>hf_hub_download: is_valid
        alt Valid ETag
            hf_hub_download-->>Client: Return cached path
        else Invalid ETag
            hf_hub_download->>Server: HEAD request for ETag
        end
    else No cache
        hf_hub_download->>Server: HEAD request for ETag
    end
    
    Server-->>hf_hub_download: ETag, commit_hash, size
    hf_hub_download->>Cache: Check if file exists
    
    alt File not in cache
        hf_hub_download->>Server: GET request
        Server-->>hf_hub_download: File content
        hf_hub_download->>Cache: Write file + metadata
    end
    
    hf_hub_download-->>Client: Return file path

ETag Validation Process

The download system implements a three-tier validation strategy:

ETag Match: Compare server ETag with local metadata
SHA256 Hash: For LFS files, compute and compare SHA256
Timestamp Check: Verify file hasn't been modified since metadata save

# ETag-based validation
if local_metadata is not None and local_metadata.etag == etag:
    write_download_metadata(...)
    return str(paths.file_path)

# SHA256-based validation (for LFS files)
if local_metadata is None and REGEX_SHA256.match(etag) is not None:
    with open(paths.file_path, "rb") as f:
        file_hash = sha_fileobj(f).hex()
    if file_hash == etag:
        write_download_metadata(...)
        return str(paths.file_path)

Sources: src/huggingface_hub/file_download.py:400-480
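
The excerpt above depends on library internals. A self-contained sketch of the same two checks, using only the standard library, might look like this. The names `sha256_of_file` and `can_reuse_local_file` are hypothetical, and the assumption that an LFS ETag is a 64-character lowercase hex SHA-256 digest is inferred from the regex check in the excerpt.

```python
import hashlib
import re
import tempfile
from pathlib import Path
from typing import Optional

# Assumption: an LFS ETag is the lowercase hex SHA-256 digest of the file content.
SHA256_PATTERN = re.compile(r"^[0-9a-f]{64}$")


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def can_reuse_local_file(path: Path, remote_etag: str, local_etag: Optional[str]) -> bool:
    """Decide whether an existing local file can be returned without downloading."""
    if not path.is_file():
        return False
    if local_etag is not None:
        # Metadata exists: trust it only if it matches the server's ETag.
        return local_etag == remote_etag
    if SHA256_PATTERN.match(remote_etag):
        # No metadata, but the ETag looks like a content hash: verify the bytes.
        return sha256_of_file(path) == remote_etag
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        weights = Path(tmp) / "weights.bin"
        weights.write_bytes(b"hello")
        print(can_reuse_local_file(weights, sha256_of_file(weights), None))
```

The cheap metadata comparison runs first so that hashing, which reads the whole file, is needed only when no metadata is available.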

Error Handling

Exception Hierarchy

graph TD
    A[Exception]
    A --> B[HfHubHTTPError]
    B --> C[RevisionNotFoundError]
    B --> D[EntryNotFoundError]
    B --> E[LocalEntryNotFoundError]
    D --> F[RemoteEntryNotFoundError]
    
    A --> G[EntryNotFoundError]
    G --> H[LocalEntryNotFoundError]

Common Errors

Exception	Trigger Condition
`RevisionNotFoundError`	Invalid Git revision (branch, tag, commit)
`RemoteEntryNotFoundError`	File not found on remote server
`LocalEntryNotFoundError`	File not in cache with `local_files_only=True`
`HfHubHTTPError`	Generic HTTP errors (401, 403, 404, 500, etc.)

# Example: Handling download errors
from huggingface_hub import hf_hub_download
from huggingface_hub.errors import (
    LocalEntryNotFoundError,
    RemoteEntryNotFoundError,
    RevisionNotFoundError,
)

try:
    path = hf_hub_download("bert-base-cased", "config.json")
except RevisionNotFoundError as e:
    print(f"Revision not found: {e}")
except RemoteEntryNotFoundError as e:
    print(f"File not on server: {e}")
except LocalEntryNotFoundError:
    print("File not in cache. Download it once with local_files_only=False.")

Sources: src/huggingface_hub/errors.py:100-180

Command-Line Interface

CLI Download Command

The hf CLI provides download functionality (the legacy huggingface-cli command is deprecated):

# Download single file
hf download bert-base-cased config.json

# Download entire repo
hf download stabilityai/stable-diffusion-2-1

# With patterns
hf download meta-llama/Llama-2-7b --include "*.safetensors"

# Dry run
hf download bigscience/bloom-7b1 --dry-run

CLI Implementation

The CLI wraps the core download functions and adds:

Pretty-printed output formatting
Dry-run mode for previewing downloads
Pattern-based file selection
Progress indication

# From cli/download.py (abridged)
def run(self):
    if len(regular_filenames) == 1:
        # Single file: use hf_hub_download
        return hf_hub_download(
            repo_id=repo_id,
            filename=regular_filenames[0],
            # ... remaining arguments elided
        )
    else:
        # Multiple files or patterns: use snapshot_download
        return snapshot_download(
            repo_id=repo_id,
            allow_patterns=allow_patterns,
            # ... remaining arguments elided
        )

Sources: src/huggingface_hub/cli/download.py:50-120
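
The routing decision can be isolated into a small pure function, which makes it easy to test without touching the network. The sketch below follows the rule shown in the excerpt (one filename goes to `hf_hub_download`, everything else to `snapshot_download`). The function name `choose_download_call` and the way explicit filenames are turned into `allow_patterns` are assumptions for illustration.

```python
from typing import Any, Dict, Optional, Sequence, Tuple


def choose_download_call(
    filenames: Sequence[str],
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Return the name of the download function to call and its file-selection arguments."""
    if len(filenames) == 1:
        # Exactly one file requested: a single-file download is enough.
        return "hf_hub_download", {"filename": filenames[0]}
    # Several explicit filenames act as allow patterns; otherwise use --include.
    allow_patterns = list(filenames) if filenames else include
    return "snapshot_download", {"allow_patterns": allow_patterns, "ignore_patterns": exclude}


if __name__ == "__main__":
    print(choose_download_call(["config.json"]))
    print(choose_download_call([], include=["*.safetensors"]))
```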

Advanced Usage

Dry Run Mode

Preview what would be downloaded without actually downloading:

from huggingface_hub import hf_hub_download, DryRunFileInfo

result = hf_hub_download(
    repo_id="bert-base-cased",
    filename="config.json",
    dry_run=True,
)

if isinstance(result, DryRunFileInfo):
    print(f"Will download: {result.filename}")
    print(f"Size: {result.file_size} bytes")
    print(f"Cached: {result.is_cached}")
    print(f"Commit: {result.commit_hash}")

Progress Tracking

Customize progress bar behavior:

from tqdm import tqdm
from huggingface_hub import hf_hub_download

class CustomProgress(tqdm):
    def set_postfix(self, **kwargs):
        self.set_postfix_str(f"ETA: {kwargs.get('eta', 'N/A')}")

hf_hub_download(
    repo_id="bigscience/bloom-7b1",
    filename="pytorch_model.bin",
    tqdm_class=CustomProgress,
)

Offline Mode

Work exclusively with cached files:

from huggingface_hub import hf_hub_download

# Will fail if file not cached
path = hf_hub_download(
    repo_id="bert-base-cased",
    filename="config.json",
    local_files_only=True,
)

Sources: src/huggingface_hub/file_download.py:450-500

Repository Types

The download system supports multiple repository types:

Repo Type	Description	Typical Contents
`model`	Model repositories	PyTorch/TensorFlow models, configs, tokenizer files
`dataset`	Dataset repositories	Data files, dataset card, scripts
`space`	Spaces (hosted apps, e.g. Gradio)	Application code, models, requirements

Repository types affect URL construction:

# URL prefixes from constants
REPO_TYPES_URL_PREFIXES = {
    "model": "",
    "dataset": "datasets/",
    "space": "spaces/",
}

Sources: src/huggingface_hub/lfs.py:30-60
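
As a rough illustration of how the prefix enters a download URL, the sketch below builds a file URL from its parts. It assumes the `/resolve/{revision}/{filename}` path layout and the default endpoint `https://huggingface.co`; `build_resolve_url` is a hypothetical helper, and the library's own `hf_hub_url` function should be preferred in real code.

```python
from urllib.parse import quote

REPO_TYPES_URL_PREFIXES = {
    "model": "",
    "dataset": "datasets/",
    "space": "spaces/",
}


def build_resolve_url(
    repo_id: str,
    filename: str,
    repo_type: str = "model",
    revision: str = "main",
    endpoint: str = "https://huggingface.co",
) -> str:
    """Assemble a file URL; the path layout is an assumption for illustration."""
    if repo_type not in REPO_TYPES_URL_PREFIXES:
        raise ValueError(f"Unknown repo_type: {repo_type!r}")
    prefix = REPO_TYPES_URL_PREFIXES[repo_type]
    # The revision is fully escaped so that names such as "refs/pr/1" stay one path segment.
    return f"{endpoint}/{prefix}{repo_id}/resolve/{quote(revision, safe='')}/{quote(filename)}"


if __name__ == "__main__":
    print(build_resolve_url("squad", "README.md", repo_type="dataset"))
```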

Best Practices

Use caching: Files are cached automatically; reuse cached files for subsequent runs
Specify revisions: Pin specific commits for reproducible downloads
Handle authentication: Use token=True to auto-read from config, or pass explicit tokens
Prefer single file downloads: Use hf_hub_download for specific files instead of full snapshots
Use patterns wisely: Combine allow_patterns and ignore_patterns for selective downloads

Summary

The file download system in huggingface_hub provides a robust, cached, and authenticated mechanism for retrieving files from the Hugging Face Hub. Key functions include:

hf_hub_download: Single file downloads with full validation
snapshot_download: Complete repository downloads with pattern filtering
CLI integration via hf download

The system handles caching, metadata validation, concurrent access, and error recovery transparently, making it suitable for production workloads requiring reliable artifact retrieval.

Sources: src/huggingface_hub/file_download.py:1-100

File Upload Operations

Related topics: Git LFS Large File Handling, File Download Operations, Repository Management API

Overview

File upload operations in huggingface_hub enable developers to publish and manage files on the Hugging Face Hub. The library provides a comprehensive set of tools for uploading individual files, entire folders, and handling large files through Git Large File Storage (LFS) integration.

The upload system is built on top of the Hub's git-based infrastructure, ensuring file versioning and integrity for all uploaded content. This architecture supports repositories of type model, dataset, and space. Sources: CLAUDE.md

Architecture Overview

graph TD
    A[User Code] --> B[upload_file / upload_folder]
    B --> C[CommitOperation Classes]
    C --> D{Hub API}
    D --> E[Regular Files<br/>Direct Upload]
    D --> F[Large Files<br/>LFS Required]
    F --> G[lfs.py<br/>Batch Operations]
    G --> H[LFS Server]
    E --> I[Regular Git Server]
    H --> I

Core Components

CommitOperation Classes

The foundation of all upload operations is built on three operation classes defined in _commit_api.py:

Class	Purpose	Key Attributes
`CommitOperationAdd`	Add a file to a commit	`path_in_repo`, `path_or_fileobj`
`CommitOperationDelete`	Delete a file from a repository	`path_in_repo`
`CommitOperationCopy`	Copy a file within a repository	`src_path_in_repo`, `path_in_repo`

Sources: src/huggingface_hub/_commit_api.py:1-100

CommitOperationAdd Details

@dataclass
class CommitOperationAdd:
    path_in_repo: str
    path_or_fileobj: Union[str, Path, bytes, BinaryIO]
    # Computed from the content after construction; not passed by the caller
    upload_info: UploadInfo = field(init=False, repr=False)

The CommitOperationAdd class supports multiple input types:

Input Type	Behavior
`str` / `Path`	File path - reads file content for upload
`bytes`	Raw byte content
`BinaryIO`	File-like object with `read()` method

The class provides an as_file() context manager that yields a binary file object for the content, with optional progress bar support:

@contextmanager
def as_file(self, with_tqdm: bool = False) -> Iterator[BinaryIO]:
    if isinstance(self.path_or_fileobj, str) or isinstance(self.path_or_fileobj, Path):
        if with_tqdm:
            with tqdm_stream_file(self.path_or_fileobj) as file:
                yield file
        else:
            with open(self.path_or_fileobj, "rb") as file:
                yield file
    elif isinstance(self.path_or_fileobj, bytes):
        yield io.BytesIO(self.path_or_fileobj)
    elif isinstance(self.path_or_fileobj, io.BufferedIOBase):
        # Restore the caller's read position once the block exits
        prev_pos = self.path_or_fileobj.tell()
        yield self.path_or_fileobj
        self.path_or_fileobj.seek(prev_pos, io.SEEK_SET)

Sources: src/huggingface_hub/_commit_api.py:200-280
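
The same idea can be tried outside the library. The sketch below is a standalone function that normalizes the three supported input types into a readable binary stream. `open_as_binary` is a hypothetical name, and progress-bar support is left out to keep the example dependency-free.

```python
import io
from contextlib import contextmanager
from pathlib import Path
from typing import BinaryIO, Iterator, Union


@contextmanager
def open_as_binary(source: Union[str, Path, bytes, BinaryIO]) -> Iterator[BinaryIO]:
    """Yield a readable binary stream for a path, raw bytes, or an open file object."""
    if isinstance(source, (str, Path)):
        # Paths are opened here and closed automatically when the block exits.
        with open(source, "rb") as f:
            yield f
    elif isinstance(source, bytes):
        yield io.BytesIO(source)
    elif isinstance(source, io.BufferedIOBase):
        # The caller owns this object: restore its position instead of closing it.
        start = source.tell()
        try:
            yield source
        finally:
            source.seek(start, io.SEEK_SET)
    else:
        raise TypeError(f"Unsupported source type: {type(source).__name__}")


if __name__ == "__main__":
    buffer = io.BytesIO(b"0123456789")
    buffer.seek(4)
    with open_as_binary(buffer) as stream:
        print(stream.read())
    print(buffer.tell())
```

Restoring the position matters because the same content may be read more than once, for example once to hash it and once to upload it.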

Upload Functions

upload_file

Uploads a single file to a repository on the Hub.

from huggingface_hub import upload_file

upload_file(
    path_or_fileobj="/home/lysandre/dummy-test/README.md",
    path_in_repo="README.md",
    repo_id="lysandre/test-model",
)

Sources: README.md

upload_folder

Uploads an entire folder to a repository. Handles nested directory structures and file filtering.

from huggingface_hub import upload_folder

upload_folder(
    folder_path="/path/to/local/space",
    repo_id="username/my-cool-space",
    repo_type="space",
)

Sources: README.md

LFS Integration

Git LFS Overview

Large files (typically files larger than 10MB) are handled through Git LFS. The library provides batch upload utilities in lfs.py for efficient LFS operations.

sequenceDiagram
    participant Client
    participant Hub API
    participant LFS Server
    
    Client->>Hub API: POST /lfs/objects/batch
    Note over Hub API: Check file sizes
    Hub API->>Client: Upload instructions
    alt Large Files
        Client->>LFS Server: Upload LFS objects
        LFS Server-->>Client: Success
    end
    Client->>Hub API: Complete commit
    Hub API-->>Client: Commit SHA

LFS Batch Upload Process

The lfs.py module provides upload_files_lfs_instances() which handles the LFS batch protocol:

Parameter	Type	Description
`commit_operations`	List[CommitOperationAdd]	Files to upload
`repo_type`	str	Repository type: "model", "dataset", "space"
`repo_id`	str	Repository identifier
`revision`	str	Git revision (default: "main")
`endpoint`	str	API endpoint URL
`transfer_adapters`	List[str]	Transfer methods: "basic", "multipart", "xet"

Sources: src/huggingface_hub/lfs.py:50-150
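
For orientation, a batch request in the Git LFS protocol identifies each object by the SHA-256 digest of its content and by its size in bytes. The sketch below builds such a request body from in-memory content using the field names of the Git LFS batch API. The function name `build_lfs_batch_payload` is hypothetical, and the payload the library actually sends may carry additional fields.

```python
import hashlib
import json
from typing import Dict, Sequence


def build_lfs_batch_payload(
    files: Dict[str, bytes],
    transfers: Sequence[str] = ("basic", "multipart"),
) -> dict:
    """Build the JSON body of an LFS batch upload request for the given contents."""
    objects = [
        # LFS addresses objects by content hash, so the file name is not part of the request.
        {"oid": hashlib.sha256(content).hexdigest(), "size": len(content)}
        for content in files.values()
    ]
    return {
        "operation": "upload",
        "transfers": list(transfers),
        "objects": objects,
        "hash_algo": "sha256",
    }


if __name__ == "__main__":
    payload = build_lfs_batch_payload({"model.bin": b"hello"})
    print(json.dumps(payload, indent=2))
```

Because objects are addressed by hash, uploading identical content under two different paths needs only one LFS transfer.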

LFS Batch Info Response

The LfsBatchInfo dataclass contains three elements:

@dataclass
class LfsBatchInfo:
    instructions: List["LfsUploadInfo"]
    errors: List["LfsError"]
    transfer_mode: "TransferMethod"

The upload process automatically determines which files require LFS handling based on file size thresholds configured by the Hub.

Large Folder Upload

For repositories with many files or very large folder structures, _upload_large_folder.py provides chunked upload capabilities, exposed through upload_large_folder:

from huggingface_hub import upload_large_folder

# Chunked upload for large repositories; repo_type must be given explicitly
upload_large_folder(
    repo_id="user/large-model",
    folder_path="/path/to/large/repo",
    repo_type="model",
    allow_patterns=["*.bin", "*.safetensors", "config.json"],
    ignore_patterns=["*.git*", "__pycache__/*"],
)

Sources: src/huggingface_hub/_upload_large_folder.py

Upload Workflow

graph TD
    A[Start Upload] --> B{File Size Check}
    B -->|Small File| C[Direct Git Upload]
    B -->|Large File| D[LFS Upload Required]
    C --> E[Create Commit]
    D --> F[Batch Request to LFS]
    F --> G[Get Upload Instructions]
    G --> H[Upload to LFS Server]
    H --> E
    E --> I[Commit to Hub]
    I --> J[Return Commit SHA]

Configuration Options

Repository Types

Type	Description	Typical Use
`model`	Model repositories	Trained weights, configs
`dataset`	Dataset repositories	Data files, metadata
`space`	Space repositories	Demo applications

Common Parameters

Parameter	Required	Default	Description
`repo_id`	Yes	-	Namespace/repo name
`repo_type`	No	"model"	Type of repository
`revision`	No	"main"	Git branch/tag
`token`	No	None	HF token for auth
`create_pr`	No	False	Create PR instead of commit

Error Handling

CommitOperationAdd Error Handling

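CommitOperationAdd validates its inputs when it is constructed. If path_or_fileobj is a string that does not point to an existing local file, a ValueError is raised immediately, before any network request is made:

from huggingface_hub import CommitOperationAdd

try:
    operation = CommitOperationAdd(path_or_fileobj="file.bin", path_in_repo="model.bin")
except ValueError as err:
    # Raised when "file.bin" is not a file on the local file system
    print(f"Invalid operation: {err}")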

Upload Errors

Error Type	Cause	Resolution
`HfHubHTTPError`	Server rejection	Check token permissions
`ValueError`	Invalid parameters	Validate repo_id, path_in_repo
`LocalUploadNotImplementedError`	Unsupported local upload	Use file path instead

Best Practices

Atomic Commits: Use upload_folder for multiple files so that they land in a single commit
Token Authentication: Always authenticate before uploading to private repositories
File Filtering: Use allow_patterns and ignore_patterns for large repos
Progress Tracking: Keep tqdm progress bars enabled for long uploads

from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="./model",
    repo_id="username/my-model",
    repo_type="model",
    token=True,  # Use the locally saved token; an error is raised if none is found
)

Upload Guide - Detailed upload instructions
Repository Management - Repository operations
Manage Cache - Cache configuration

Module Structure Summary

File	Responsibility
`_commit_api.py`	Core commit operations and operation classes
`_upload_large_folder.py`	Chunked folder uploads
`lfs.py`	Git LFS batch upload protocol implementation
`_local_folder.py`	Local folder scanning and filtering
`hf_api.py`	High-level HfApi methods for upload

Sources: src/huggingface_hub/_commit_api.py:1-100

Git LFS Large File Handling

Related topics: File Upload Operations, File Download Operations

Overview

Git LFS (Large File Storage) is a Git extension that handles large files by storing binary content outside the Git repository while maintaining a lightweight pointer file within it. The huggingface_hub library implements comprehensive LFS support to manage large model weights, datasets, and other binary assets on the Hugging Face Hub.

In the huggingface_hub ecosystem, LFS files are distinguished from regular Git-tracked files through their content addressing:

File Type	Storage Method	Identifier	Location in Cache
Regular Git Blob	Git blob SHA-1	40-char hex string	`blobs/`
LFS File	SHA256 hash	64-char hex string	`blobs/`

Sources: src/huggingface_hub/lfs.py:1-50

Architecture

Cache Directory Structure

When files are downloaded from the Hub, they are stored in a hierarchical cache structure:

graph TD
    A["Cache Root<br/>~/.cache/huggingface/hub/"] --> B["models--{repo_id}"]
    A --> C["datasets--{repo_id}"]
    A --> D["spaces--{repo_id}"]
    
    B --> E["blobs/"]
    B --> F["refs/"]
    B --> G["snapshots/"]
    
    E --> H["git-sha<br/>40-char"]
    E --> I["sha256<br/>64-char (LFS)"]
    
    G --> J["{commit_hash}/"]
    J --> K["filename → symlink → blob"]

Sources: src/huggingface_hub/file_download.py:1-30

LFS Upload Workflow

The upload process follows a batch-oriented approach using the LFS Batch API:

sequenceDiagram
    participant Client
    participant Hub as HF Hub API
    participant LFS as LFS Server
    
    Client->>Hub: POST /{repo_type}/{repo_id}.git/info/lfs/objects/batch
    Note over Hub,LFS: Batch request includes<br/>upload instructions request
    Hub->>LFS: Check upload eligibility
    LFS-->>Hub: Upload instructions (presigned URLs)
    Hub-->>Client: LfsBatchInfo with actions
    
    alt basic/multipart transfer
        Client->>LFS: PUT file content directly
        LFS-->>Client: 200 OK
    else xet transfer
        Client->>Hub: Use custom xet protocol
    end
    
    Client->>Hub: POST to the verify URL from the batch response (oid, size)
    Note over Client: Confirm upload completion
    Hub->>LFS: Verify file content
    LFS-->>Hub: Verification result
    Hub-->>Client: Commit ready

Sources: src/huggingface_hub/lfs.py:60-120

Core Components

LFS Module (`src/huggingface_hub/lfs.py`)

The main LFS module provides type definitions and utilities for handling LFS operations.

#### Constants

Constant	Value	Purpose
`LFS_MULTIPART_UPLOAD_COMMAND`	`"lfs-multipart-upload"`	Identifier for multipart upload operations
`OID_REGEX`	`^[0-9a-f]{40}$`	Pattern for validating Git object identifiers
`LFS_HEADERS`	Dict	Accept and content type headers for LFS API

Sources: src/huggingface_hub/lfs.py:40-55

#### LFS Headers

LFS_HEADERS = {
    "Accept": "application/vnd.git-lfs+json",
    "Content-Type": "application/vnd.git-lfs+json",
}

These headers indicate that all LFS API communications use JSON with the vnd.git-lfs+json media type, following the LFS specification.

Sources: src/huggingface_hub/lfs.py:50-55

LFS Utilities (`src/huggingface_hub/utils/_lfs.py`)

Utility functions for LFS operations include:

Function	Purpose
`SliceFileObj`	Context manager for slicing file objects during multipart uploads
SHA utilities	Calculate SHA256 for LFS file content verification
Content range handling	Manage byte ranges for resumable uploads

Sources: src/huggingface_hub/utils/_lfs.py

API Reference

LfsBatchInfo

The LfsBatchInfo dataclass encapsulates the server response from the LFS Batch API:

@dataclass
class LfsBatchInfo:
    """Information returned by the LFS batch API."""
    
    actions: dict
    """Dictionary of available actions (upload, verify)."""
    
    objects: list[dict]
    """List of objects with their metadata."""
    
    transfers: list[str]
    """Supported transfer adapters (e.g., 'basic', 'multipart', 'xet')."""

Sources: src/huggingface_hub/lfs.py:55-80
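
For orientation, the Git LFS Batch API specification defines the raw JSON response as a `transfer` string plus an `objects` list, where each object carries either `actions` (such as `upload` and `verify`) or an `error`. The sketch below splits such a payload into usable parts. The function name `split_batch_response` and the sample URL are illustrative assumptions, not part of huggingface_hub.

```python
from typing import Dict, List, Optional, Tuple


def split_batch_response(response: Dict) -> Tuple[List[Dict], List[Dict], Optional[str]]:
    """Split a batch response into (accepted objects, failed objects, chosen transfer)."""
    objects = response.get("objects", [])
    # An object carrying "error" was rejected. Every other object was accepted;
    # an accepted object without "actions" is already stored on the server.
    instructions = [obj for obj in objects if "error" not in obj]
    errors = [obj for obj in objects if "error" in obj]
    return instructions, errors, response.get("transfer")


sample_response = {
    "transfer": "basic",
    "objects": [
        {
            "oid": "a" * 64,
            "size": 123,
            "actions": {"upload": {"href": "https://example.invalid/upload", "header": {}}},
        },
        {"oid": "b" * 64, "size": 456, "error": {"code": 422, "message": "invalid object"}},
    ],
}
instructions, errors, transfer = split_batch_response(sample_response)
```

Separating failed objects early lets a client report every rejected file at once instead of stopping at the first one.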

Upload Information Classes

The library uses dataclasses to represent different types of upload information:

Class	Inheritance	Purpose
`UploadInfo`	Base	Abstract base for all upload info types
`LfsUploadFileInfo`	`UploadInfo`	Standard LFS file upload with size and SHA256
`LfsUploadTtHubInfo`	`UploadInfo`	TtHub-specific upload info

Sources: src/huggingface_hub/_commit_api.py:1-100

Transfer Adapters

The Hugging Face Hub supports multiple LFS transfer methods, negotiated during the batch API handshake:

Supported Transfer Methods

Transfer Method	Description	Use Case
`basic`	Direct HTTP PUT upload	Small to medium files
`multipart`	Chunked upload for very large files	Files > 100MB
`xet`	Custom xet protocol for optimized transfers	High-performance scenarios

Sources: src/huggingface_hub/lfs.py:60-100

Transfer Method Selection

The client sends supported transfer methods in the batch request:

payload: dict = {
    "operation": "upload",
    "transfers": transfers if transfers is not None else ["basic", "multipart"],
    ...
}

The server responds with the transfer adapter it will use, which the client then employs for the actual upload.

Sources: src/huggingface_hub/lfs.py:85-95
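
To make the request shape concrete, here is a minimal sketch that assembles a complete batch request body. The `objects` and `hash_algo` fields follow the public Git LFS Batch API specification and are an assumption about what the elided part of the payload contains; `build_batch_payload` is a hypothetical helper, not a library function.

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple


def build_batch_payload(
    files: List[Tuple[str, bytes]],
    transfers: Optional[List[str]] = None,
) -> Dict:
    """Assemble an LFS batch 'upload' request body for in-memory files."""
    return {
        "operation": "upload",
        # Fall back to the default adapters quoted above when none are given.
        "transfers": transfers if transfers is not None else ["basic", "multipart"],
        # Each object is identified by the SHA-256 of its content and its size in bytes.
        "objects": [
            {"oid": hashlib.sha256(content).hexdigest(), "size": len(content)}
            for _name, content in files
        ],
        "hash_algo": "sha256",
    }


payload = build_batch_payload([("a.bin", b"abc"), ("b.bin", b"")])
body = json.dumps(payload)  # JSON text that would be sent with the LFS headers
```

Passing `transfers=["basic"]` would restrict negotiation to single-request uploads.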

Large File Identification

Size Thresholds

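The Hub decides, per file, whether content is stored as a regular Git blob or through LFS. Size is one of the inputs to that decision:

Threshold	Action
Up to 10MB	Stored as regular Git blob (unless the file type is LFS-tracked)
Over 10MB	Redirected to LFS storage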

OID (Object Identifier) Format

LFS files are identified by their SHA256 hash, represented as a 64-character hexadecimal string:

Pattern: ^[0-9a-f]{64}$
Example: 403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd

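Regular Git blobs use 40-character SHA-1 identifiers, while LFS files use 64-character SHA-256 identifiers. The OID_REGEX constant listed under Constants matches the 40-character form, so it validates Git object identifiers rather than LFS OIDs.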

Sources: src/huggingface_hub/lfs.py:45
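
The two identifier formats can be reproduced with the standard library alone. The sketch below is illustrative and does not show the library's own code: `lfs_oid` hashes the raw content with SHA-256, while `git_blob_oid` follows Git's convention of hashing a `blob <size>\0` header followed by the content with SHA-1. Both function names are hypothetical.

```python
import hashlib


def lfs_oid(content: bytes) -> str:
    """SHA-256 hex digest of the raw content (64 hex characters)."""
    return hashlib.sha256(content).hexdigest()


def git_blob_oid(content: bytes) -> str:
    """SHA-1 that Git assigns to a blob: header 'blob <size>' + NUL byte + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


data = b"hello\n"
sha256_id = lfs_oid(data)      # 64-character identifier, as used for LFS files
sha1_id = git_blob_oid(data)   # 40-character identifier, as used for regular blobs
```

Because the two digests differ in length, the length of a blob's file name in the cache is enough to tell which kind of file it holds.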

Multipart Upload for Large Files

Upload Process

graph LR
    A[File] --> B{Split into chunks}
    B --> C[Chunk 1]
    B --> D[Chunk 2]
    B --> E[Chunk N]
    
    C --> F[Upload Part 1]
    D --> G[Upload Part 2]
    E --> H[Upload Part N]
    
    F --> I{All parts<br/>complete?}
    G --> I
    H --> I
    
    I -->|Yes| J[Complete multipart<br/>upload]

Chunk Size Calculation

The library calculates optimal chunk sizes based on file size:

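from math import ceil

file_size = 250 * 1024 * 1024  # Example: a 250 MiB file
total_parts = 3                # Example: number of parts to upload

chunk_size = ceil(file_size / total_parts)  # Bytes per part (the last part may be smaller)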

This ensures even distribution of work across all chunks.

Sources: src/huggingface_hub/lfs.py:30-35
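
The same relationship can be read in the other direction: given a chunk size, the number of parts is the ceiling of the file size divided by that chunk size. The following sketch derives the byte range of each part under that assumption. `part_ranges` is a hypothetical helper written for illustration.

```python
from math import ceil
from typing import List, Tuple


def part_ranges(file_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) byte offsets, end exclusive, for each part."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total_parts = ceil(file_size / chunk_size)
    return [
        # The last part is clipped to the end of the file.
        (index * chunk_size, min((index + 1) * chunk_size, file_size))
        for index in range(total_parts)
    ]


ranges = part_ranges(file_size=250, chunk_size=100)
```

Each `(start, end)` pair can be used to seek into the file and read exactly the bytes belonging to one part.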

Integration with Commit API

CommitOperationAdd with LFS

The CommitOperationAdd class handles both regular and LFS file uploads:

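@dataclass
class CommitOperationAdd:
    path_in_repo: str  # Destination path inside the repository
    path_or_fileobj: Union[str, Path, bytes, BinaryIO]  # Local path, raw bytes, or file object
    upload_info: UploadInfo = field(init=False, repr=False)  # Computed from the content
    # Internal bookkeeping fields omitted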

The upload_info attribute contains the LFS-specific upload metadata, which determines whether the file goes through LFS or regular Git upload.

Sources: src/huggingface_hub/_commit_api.py:100-150

Upload Flow

flowchart TD
    A[Create CommitOperationAdd] --> B{File size<br/>> threshold?}
    
    B -->|Yes| C[Create LfsUploadFileInfo]
    B -->|No| D[Create UploadInfo for Git]
    
    C --> E[Upload via LFS Batch API]
    D --> F[Upload via Git HTTP API]
    
    E --> G{LFS transfer<br/>method}
    G -->|basic| H[Single PUT request]
    G -->|multipart| I[Chunked upload]
    G -->|xet| J[Custom xet protocol]
    
    H --> K[Verify upload]
    I --> K
    J --> K
    
    K --> L[Commit confirmation]
    F --> L

Error Handling

Validation Errors

Error	Condition	Handling
Invalid OID	Not matching `OID_REGEX`	Raise `ValueError`
Missing upload info	`upload_info` not set	Raise `ValueError`
Malformed batch response	Missing required fields	Raise `HfHubHTTPError`

Network Errors

The library implements automatic retry with exponential backoff for failed LFS operations:

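from huggingface_hub.utils import hf_raise_for_status, http_backoff

# http_backoff retries transient failures with exponential backoff.
# upload_url and data stand for values obtained earlier in the upload flow.
response = http_backoff("PUT", upload_url, data=data)
hf_raise_for_status(response)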

Sources: src/huggingface_hub/lfs.py:50-80
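
The retry pattern itself is simple to express. The sketch below is a generic, standard-library illustration of exponential backoff, not the library's implementation. The name `retry_with_backoff`, its default values, and the injectable `sleep` argument are all assumptions made for the example.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 5,
    base_wait: float = 1.0,
    max_wait: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on the given exceptions with doubling waits."""
    wait = base_wait
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_retries:
                raise  # Out of retries: surface the last error
            sleep(wait)
            wait = min(wait * 2, max_wait)  # Double the wait, up to the cap
    raise RuntimeError("max_retries must be non-negative")


attempts = {"count": 0}
waits = []


def flaky_upload() -> str:
    """Fail twice, then succeed (stands in for a network call)."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


# Record the waits instead of sleeping so the example runs instantly.
result = retry_with_backoff(flaky_upload, sleep=waits.append)
```

Capping the wait keeps a long outage from stretching individual pauses indefinitely, while the retry limit bounds the total time spent.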

Configuration

Environment Variables

Variable	Effect
`HF_ENDPOINT`	Override default `https://huggingface.co`
`HF_TOKEN`	Authentication token for private repos

Upload Options

When uploading files, the following options control LFS behavior:

Parameter	Type	Default	Description
`transfers`	`list[str]`	`["basic", "multipart"]`	Allowed transfer methods
`endpoint`	`str`	Hub endpoint	LFS server endpoint
`repo_type`	`str`	`"model"`	Repository type
`repo_id`	`str`	Required	Repository identifier

Sources: src/huggingface_hub/lfs.py:60-110

Best Practices

File Organization

Group large binary files - Store model weights and dataset files separately from code
Keep LFS for genuinely large files - Avoid storing many very small files through LFS, since each LFS object adds per-file overhead
Leverage symlinks - The snapshot directory uses symlinks to avoid duplication

Upload Optimization

Prefer multipart for files > 100MB - Enables parallel uploads and resumability
Enable xet for frequent transfers - Custom protocol reduces bandwidth
Batch operations - Use CommitOperationAdd batching to minimize round trips

Caching Strategy

~/.cache/huggingface/hub/
└── models--{repo_id}/
    ├── blobs/          # Physical file storage (SHA256 for LFS)
    ├── refs/           # Revision pointers
    └── snapshots/      # Virtual files pointing to blobs

The snapshot layer provides deduplication - the same blob referenced by multiple commits is stored only once.

Sources: src/huggingface_hub/file_download.py:10-30
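
The deduplication effect can be demonstrated on a throwaway directory. This sketch builds a miniature version of the layout above and links two snapshots to a single blob. It assumes a platform where symlinks are available; `add_snapshot_file`, the commit names, and the blob name are hypothetical and chosen only for the demonstration.

```python
import os
import tempfile
from pathlib import Path


def add_snapshot_file(repo_dir: Path, commit: str, filename: str, blob_id: str, content: bytes) -> Path:
    """Store content once under blobs/ and link it from snapshots/<commit>/<filename>."""
    blob_path = repo_dir / "blobs" / blob_id
    blob_path.parent.mkdir(parents=True, exist_ok=True)
    if not blob_path.exists():
        blob_path.write_bytes(content)  # Written only the first time this blob is seen

    link_path = repo_dir / "snapshots" / commit / filename
    link_path.parent.mkdir(parents=True, exist_ok=True)
    # Relative link: snapshots/<commit>/<file> -> ../../blobs/<blob_id>
    os.symlink(os.path.relpath(blob_path, link_path.parent), link_path)
    return link_path


root = Path(tempfile.mkdtemp())
repo = root / "models--user--demo"
first = add_snapshot_file(repo, "commit_a", "config.json", "blob123", b"{}")
second = add_snapshot_file(repo, "commit_b", "config.json", "blob123", b"{}")
stored_blobs = sorted(p.name for p in (repo / "blobs").iterdir())
```

Both snapshot entries resolve to the same file on disk, so an unchanged file costs no extra space when a new revision is cached.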

Repository Management API

Related topics: File Upload Operations, Authentication System, Cache Management System

The Repository Management API in huggingface_hub provides a comprehensive interface for creating, configuring, and managing Hugging Face repositories (models, datasets, and Spaces) directly from Python code or via the command-line interface.

Overview

The Repository Management API serves as the core layer for all Hub repository operations, enabling developers to programmatically:

Create and delete repositories
Configure repository settings
Upload and download files
Manage repository metadata
Handle commit operations with Git LFS support

Sources: src/huggingface_hub/README.md

Architecture

The repository management functionality is distributed across multiple modules:

graph TD
    A[Repository Management API] --> B[hf_api.py]
    A --> C[_commit_api.py]
    A --> D[_buckets.py]
    B --> E[REST API Client]
    C --> E
    B --> F[CLI Interface]
    F --> G[hf CLI Command]

Core Components

Module	Purpose
`hf_api.py`	Main `HfApi` class with all CRUD operations
`_commit_api.py`	Low-level commit operations, LFS handling
`_buckets.py`	Bucket/S3-compatible storage management
`cli/`	Command-line interface implementation

Sources: CLAUDE.md

Repository Types

The API supports three primary repository types:

Type	Description	Use Case
`model`	Model repositories	Storing and sharing ML model weights
`dataset`	Dataset repositories	Hosting and versioning datasets
`space`	Space repositories	Hosting Gradio/Streamlit demos

Core Operations

Repository Lifecycle

graph LR
    A[create_repo] --> B[Update Settings]
    B --> C[Upload Files]
    C --> D[Manage Commits]
    D --> E[delete_repo]

#### Creating a Repository

from huggingface_hub import create_repo, HfApi

# Using HfApi class
api = HfApi()
api.create_repo(
    repo_id="username/my-model",
    repo_type="model",
    exist_ok=False
)

# Using convenience function
create_repo(
    repo_id="super-cool-model",
    token="hf_xxxxx"
)

#### Deleting a Repository

api.delete_repo(repo_id="username/my-model", repo_type="model")

Sources: src/huggingface_hub/hf_api.py

Repository Settings

Update repository configuration after creation:

api.update_repo_settings(
    repo_id="username/my-model",
    private=True,
    repo_type="model",
    gated="auto"  # Enable gated access: "auto", "manual", or False to disable
)

Listing Repository Contents

# List all files in a repository
files = api.list_repo_files(repo_id="tiiuae/falcon-7b-instruct")

# List files and folders as objects (returns an iterator)
objects = api.list_repo_tree(
    repo_id="my-org/my-dataset",
    repo_type="dataset"
)

Commit Operations

Commit Operation Classes

The _commit_api.py module provides low-level commit primitives:

Class	Purpose
`CommitOperationAdd`	Add a file to the repository
`CommitOperationDelete`	Remove a file from the repository
`CommitOperationCopy`	Copy a file within the repository

from huggingface_hub import CommitOperationAdd, HfApi

operations = [
    CommitOperationAdd(
        path_in_repo="config.json",
        path_or_fileobj="/local/path/config.json"
    ),
]

api.create_commit(
    repo_id="username/my-model",
    operations=operations,
    commit_message="Add config file"
)

Sources: src/huggingface_hub/_commit_api.py

Large File Upload (Git LFS)

Large files are automatically handled via Git LFS:

graph TD
    A[File > 10MB] --> B{LFS Required?}
    B -->|Yes| C[Upload to LFS Storage]
    B -->|No| D[Upload as Regular File]
    C --> E[Create LFS Pointer]
    E --> F[Commit Pointer to Repo]
    D --> F
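
The pointer committed in place of a large file is a small text file. Its layout is defined by the Git LFS specification: a version line, the SHA-256 OID, and the size in bytes. The sketch below builds and parses such a pointer with the standard library. The helper names are hypothetical, and the code is shown for illustration rather than as the library's implementation.

```python
import hashlib


def make_lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS pointer for the given content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a pointer into a dict (size as int)."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    fields["size"] = int(fields["size"])
    return fields


pointer = make_lfs_pointer(b"model weights")
info = parse_lfs_pointer(pointer)
```

The pointer stays a few lines long regardless of how large the real file is, which is what keeps the Git history lightweight.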

File Upload Operations

Single File Upload

from huggingface_hub import upload_file

upload_file(
    path_or_fileobj="/home/user/model.bin",
    path_in_repo="pytorch_model.bin",
    repo_id="username/my-model",
)

Folder Upload

from huggingface_hub import upload_folder

upload_folder(
    folder_path="/path/to/local/space",
    repo_id="username/my-cool-space",
    repo_type="space",
    commit_message="Update space files"
)

For very large folders, the library provides a resumable, chunked upload:

from huggingface_hub import upload_large_folder

upload_large_folder(
    repo_id="username/large-dataset",
    folder_path="/data/large-folder",
    repo_type="dataset"
)

Sources: src/huggingface_hub/README.md

File Deletion Operations

# Delete a single file
api.delete_file(
    path_in_repo="old-model.bin",
    repo_id="username/my-model",
    commit_message="Remove deprecated file"
)

# Delete a folder
api.delete_folder(
    path_in_repo="old-folder/",
    repo_id="username/my-model",
    commit_message="Clean up old directory"
)

CLI Interface

The repository management features are exposed through the hf CLI:

# Authentication
hf auth login
hf auth logout
hf auth whoami

# Repository operations
hf repos create username/my-model --type model
hf repos create username/my-dataset --type dataset

Sources: setup.py

Configuration Parameters

Repository Creation Parameters

Parameter	Type	Default	Description
`repo_id`	str	Required	Repository identifier (user/name or org/name)
`repo_type`	str	"model"	Type: "model", "dataset", or "space"
`exist_ok`	bool	False	Do not raise an error if the repository already exists (nothing is overwritten)
`private`	bool	False	Make repository private
`token`	str	None	Hugging Face authentication token
`space_sdk`	str	None	Space SDK: "gradio", "streamlit", "docker", or "static"
`space_hardware`	str	None	Space hardware tier

Commit Operation Parameters

Parameter	Type	Default	Description
`operations`	list[CommitOperation]	Required	List of file operations
`commit_message`	str	Required	Description of changes
`commit_description`	str	None	Extended commit description
`parent_commit`	str	None	Expected parent commit SHA; the commit is rejected if the branch head has moved since
`create_pr`	bool	False	Create a Pull Request instead of committing to main
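
The role of parent_commit is easiest to see as an optimistic concurrency check. The toy model below is entirely in memory. `ToyBranch` is a hypothetical stand-in and not a Hub API; it only illustrates why passing the expected parent lets a second writer detect that the branch moved.

```python
class ToyBranch:
    """In-memory stand-in for a branch head (not a Hub API)."""

    def __init__(self, head: str) -> None:
        self.head = head
        self.log = []

    def commit(self, message: str, parent_commit=None) -> str:
        """Advance the head, refusing if the caller's expected parent is stale."""
        if parent_commit is not None and parent_commit != self.head:
            raise RuntimeError(f"Stale parent {parent_commit!r}; head is {self.head!r}")
        self.log.append(message)
        self.head = f"commit{len(self.log)}"
        return self.head


branch = ToyBranch(head="commit0")
seen_by_writer_a = branch.head  # Both writers read the same head
seen_by_writer_b = branch.head

branch.commit("writer A update", parent_commit=seen_by_writer_a)
try:
    # Writer B still believes the head is commit0, so the commit is refused
    branch.commit("writer B update", parent_commit=seen_by_writer_b)
    conflict_detected = False
except RuntimeError:
    conflict_detected = True
```

Without the parent check, writer B's commit would silently land on top of changes it never saw.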

Error Handling

The repository management API raises specific exception types:

Exception	Cause
`RepositoryNotFoundError`	Repository does not exist or user lacks access
`RevisionNotFoundError`	Specified git revision not found
`EntryNotFoundError`	File or folder not found in repository
`HfHubHTTPError`	HTTP error from the Hub API

from huggingface_hub import hf_hub_download
from huggingface_hub.errors import RevisionNotFoundError

try:
    hf_hub_download(
        repo_id="bert-base-cased",
        filename="config.json",
        revision="<non-existent-revision>"
    )
except RevisionNotFoundError as e:
    print(f"Revision not found: {e}")

Sources: src/huggingface_hub/errors.py

Common Usage Patterns

Model Publishing Workflow

graph TD
    A[Create Repo] --> B[Upload Model Files]
    B --> C[Create Model Card]
    C --> D[Set Metadata/Tags]
    D --> E[Publish to Hub]

from huggingface_hub import HfApi, ModelCard, ModelCardData, upload_file

api = HfApi()

# 1. Create repository
api.create_repo(repo_id="my-org/my-model", exist_ok=True)

# 2. Upload model files
upload_file(
    path_or_fileobj="./model.bin",
    path_in_repo="pytorch_model.bin",
    repo_id="my-org/my-model"
)

# 3. Create and upload model card
card_data = ModelCardData(
    language="en",
    license="apache-2.0",
    model_name="My Custom Model",
    tags=["pytorch", "image-classification"]
)
card = ModelCard.from_template(
    card_data,
    model_description="This is a custom model trained on..."
)
card.save("README.md")  # Model cards live in the repository's README.md
card.push_to_hub("my-org/my-model")

Dataset Versioning Workflow

from huggingface_hub import CommitOperationAdd, HfApi

api = HfApi()

operations = [
    CommitOperationAdd(
        path_in_repo="data/train.parquet",
        path_or_fileobj="./train.parquet"
    ),
    CommitOperationAdd(
        path_in_repo="data/validation.parquet", 
        path_or_fileobj="./validation.parquet"
    ),
]

api.create_commit(
    repo_id="username/my-dataset",
    operations=operations,
    commit_message="Add training and validation splits",
    commit_description="Initial dataset release with train/validation split"
)

Best Practices

Use exist_ok=True when creating repositories in automated pipelines
Include commit messages for better version control history
Use parent_commit parameter when making sequential updates to prevent race conditions
Rely on automatic LFS handling: files larger than 10MB are stored through LFS without extra configuration
Use create_pr=True for reviewing changes before merging to main branch

Cache Management System

Related topics: File Download Operations, Repository Management API

Overview

The Cache Management System in huggingface_hub provides comprehensive utilities for managing locally cached models, datasets, and Spaces downloaded from the Hugging Face Hub. The system handles automatic caching of downloaded content, tracks cache metadata, and offers programmatic and CLI interfaces for inspecting and managing cached resources.

The cache system is designed to:

Store downloaded files efficiently with deduplication via blob storage
Track repository metadata including revisions, commit hashes, and file information
Provide safe deletion strategies that don't corrupt cache state
Handle corrupted cache entries gracefully with warnings

Cache Directory Structure

The Hugging Face cache follows a specific directory structure to organize cached content:

HF_HUB_CACHE/
├── .locks/                    # Lock files for concurrent access
│   └── ...
├── CACHEDIR.TAG               # OS-native cache directory marker
├── models--owner--repo_name/
│   ├── .no_exist/            # Markers for files known to be missing at a revision
│   ├── blobs/                # Actual file content (deduplicated)
│   ├── refs/                 # Branch and tag references
│   ├── snapshots/            # Symlinks to blobs
│   └── ...
├── datasets--org--dataset_name/
│   └── ...
└── spaces--user--space_name/
    └── ...

Cache directories follow the naming convention type--repo_id where:

type is plural (models, datasets, spaces)
repo_id slashes are converted to double hyphens (e.g., google/fleurs becomes google--fleurs, giving datasets--google--fleurs)

Sources: src/huggingface_hub/utils/_cache_manager.py:1-100
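
As a small illustration of the naming rule, the sketch below maps a repository identifier to its cache folder name, using a plural type prefix and `--` as the separator. `cache_folder_name` is a hypothetical helper written for this example, not a function exported by the library.

```python
def cache_folder_name(repo_id: str, repo_type: str = "model") -> str:
    """Build the cache folder name for a repository (hypothetical helper)."""
    if repo_type not in {"model", "dataset", "space"}:
        raise ValueError(f"Unknown repo_type: {repo_type!r}")
    # Plural type prefix, then every '/' in the repo id becomes '--'
    return "--".join([f"{repo_type}s", *repo_id.split("/")])


dataset_folder = cache_folder_name("google/fleurs", repo_type="dataset")
model_folder = cache_folder_name("bert-base-cased")
```

Flattening the identifier this way keeps every repository in a single directory level under the cache root.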

Core Data Models

HFCacheInfo

The main container class for cache information returned by scan operations:

Attribute	Type	Description
`repos`	`frozenset[CachedRepoInfo]`	All cached repositories
`size_on_disk`	`int`	Total size of all cached content in bytes
`warnings`	`list[CorruptedCacheException]`	Issues encountered during scanning

CachedRepoInfo

Represents a single cached repository:

Attribute	Type	Description
`repo_id`	`str`	Repository identifier (e.g., `google/gemma-3-4b-it`)
`repo_type`	`str`	Type: `model`, `dataset`, or `space`
`size_on_disk`	`int`	Total size of cached revisions
`revisions`	`frozenset[CachedRevisionInfo]`	All cached revisions
`repo_path`	`Path`	Path to the cached repository directory

CachedRevisionInfo

Represents a specific revision within a cached repository:

Attribute	Type	Description
`commit_hash`	`str`	Git commit hash (40-character hex string)
`size_on_disk`	`int`	Size of this specific revision
`files`	`frozenset[CachedFileInfo]`	Files in this revision
`last_modified`	`float`	Last modification time (Unix timestamp)

CachedFileInfo

Represents an individual cached file:

Attribute	Type	Description
`file_name`	`str`	Name of the file
`size_on_disk`	`int`	Size of the file in bytes
`file_path`	`Path`	Path to the symlinked file in snapshots
`blob_path`	`Path`	Path to the actual blob storage

Sources: src/huggingface_hub/utils/_cache_manager.py:1-100

Key API Functions

scan_cache_dir()

Scans the cache directory and returns information about all cached repositories.

from huggingface_hub import scan_cache_dir

cache_info = scan_cache_dir()
print(f"Total size: {cache_info.size_on_disk / 1024 / 1024:.2f} MB")
for repo in cache_info.repos:
    print(f"{repo.repo_type}/{repo.repo_id}")

Parameters:

Parameter	Type	Default	Description
`cache_dir`	`str` or `Path`	`HF_HUB_CACHE` env var	Cache directory to scan

Returns: HFCacheInfo object containing repository information

Raises:

CacheNotFound if the cache directory doesn't exist
ValueError if cache_dir is a file instead of a directory

try_to_load_from_cache()

Checks if a file exists in the local cache without downloading.

from huggingface_hub import try_to_load_from_cache, _CACHED_NO_EXIST

filepath = try_to_load_from_cache(
    repo_id="tiiuae/falcon-7b-instruct",
    filename="config.json",
    revision="main",
    repo_type="model"
)

if isinstance(filepath, str):
    print(f"File cached at: {filepath}")
elif filepath is _CACHED_NO_EXIST:
    print("File confirmed to not exist at this revision")
else:
    print("File not in cache")

Parameters:

Parameter	Type	Default	Description
`cache_dir`	`str` or `Path`	None	Cache directory path
`repo_id`	`str`	Required	Repository identifier
`filename`	`str`	Required	Filename to look for
`revision`	`str`	`"main"`	Specific revision to check
`repo_type`	`str`	`"model"`	Type of repository

Returns: str (file path), _CACHED_NO_EXIST, or None

Sources: src/huggingface_hub/file_download.py:1-100, src/huggingface_hub/utils/_cache_manager.py:1-100

DeleteCacheStrategy

The deletion system uses a two-phase approach: create a strategy, then execute it. This prevents accidental data loss and allows for dry-run validation.

from huggingface_hub import scan_cache_dir

cache_info = scan_cache_dir()

# Create deletion strategy (doesn't delete yet)
delete_strategy = cache_info.delete_revisions(
    "81fd1d6e7847c99f5862c9fb81387956d99ec7aa",
    "e2983b237dccf3ab4937c97fa717319a9ca1a96d",
)

# Preview what will be deleted
print(f"Will free: {delete_strategy.expected_freed_size / 1024 / 1024:.2f} MB")

# Execute the deletion
delete_strategy.execute()

Deletion Workflow

graph TD
    A[scan_cache_dir] --> B[Get HFCacheInfo]
    B --> C[Call delete_revisions with commit hashes]
    C --> D[Create DeleteCacheStrategy]
    D --> E{Preview/Dry Run}
    E -->|Inspect| F[Review expected_freed_size]
    E -->|Confirm| G[execute]
    G --> H[Delete blobs and refs]
    H --> I[Cache deletion done]
    F --> G

DeleteCacheStrategy Properties

Property	Type	Description
`expected_freed_size`	`int`	Estimated bytes to be freed
`blobs`	`frozenset[Path]`	Blob file paths to remove
`refs`	`frozenset[Path]`	Ref file paths to remove
`repos`	`frozenset[Path]`	Repository directories to remove entirely
`snapshots`	`frozenset[Path]`	Snapshot directories to remove

Sources: src/huggingface_hub/utils/_cache_manager.py:1-100
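
Planning a deletion mostly comes down to one question: which blobs are referenced only by the revisions being removed? The sketch below answers it for a simplified, hypothetical data model, a mapping from commit hash to a set of blob names. It illustrates the idea and is not the library's implementation.

```python
from typing import Dict, Set


def blobs_safe_to_delete(
    revisions: Dict[str, Set[str]],
    revisions_to_delete: Set[str],
) -> Set[str]:
    """Return blobs referenced only by the revisions being deleted."""
    unknown = revisions_to_delete - set(revisions)
    if unknown:
        raise ValueError(f"Unknown revisions: {sorted(unknown)}")
    doomed: Set[str] = set()
    kept: Set[str] = set()
    for commit, blobs in revisions.items():
        (doomed if commit in revisions_to_delete else kept).update(blobs)
    # A blob still used by a surviving revision must stay on disk
    return doomed - kept


cached = {
    "rev_old": {"blob_config", "blob_weights_v1"},
    "rev_new": {"blob_config", "blob_weights_v2"},
}
plan = blobs_safe_to_delete(cached, {"rev_old"})
```

Computing the plan before touching the disk is what makes a preview step possible: the result can be inspected and discarded without side effects.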

CLI Interface

The hf command provides cache management through the cache subcommand.

List Cached Repositories

hf cache ls

Output format:

ID                          SIZE     LAST_ACCESSED LAST_MODIFIED REFS
--------------------------- -------- ------------- ------------- -----------
dataset/nyu-mll/glue          157.4M 2 days ago    2 days ago    main script
model/LiquidAI/LFM2-VL-1.6B     3.2G 4 days ago    4 days ago    main
model/microsoft/UserLM-8b      32.1G 4 days ago    4 days ago    main

Done in 0.0s. Scanned 6 repo(s) for a total of 3.4G.

Filtering Options

Filter Key	Operators	Example
`type`	`==`, `!=`	`--filter type==model`
`size`	`>`, `<`, `>=`, `<=`, `=`	`--filter size>=1G`
`accessed`	`>`, `<`, `>=`, `<=`	`--filter accessed<7d`
`modified`	`>`, `<`, `>=`, `<=`	`--filter modified>30d`
`refs`	`==`, `!=`	`--filter refs==main`

Examples:

# Filter large models (quote filters so the shell does not treat > and < as redirections)
hf cache ls --filter "type==model" --filter "size>=5G"

# Find recently accessed datasets
hf cache ls --filter "type==dataset" --filter "accessed<7d"

# Filter by modification time
hf cache ls --filter "modified>30d"

Sorting Options

Sort Key	Default Order	Ascending Option
`name`	`asc`	`name:asc`
`size`	`desc`	`size:asc`
`accessed`	`desc`	`accessed:asc`
`modified`	`desc`	`modified:asc`

Examples:

# Sort by size descending (largest first)
hf cache ls --sort size

# Sort by name ascending, then size descending
hf cache ls --sort name:asc --sort size:desc

Delete Specific Revisions

hf cache rm <revision_hash> [<revision_hash>...]

The CLI will prompt for confirmation before deletion.

Sources: src/huggingface_hub/cli/cache.py:1-100

Cache Scanning Process

The cache scanning process validates the cache directory structure and handles corrupted entries gracefully.

graph TD
    A[Start scan_cache_dir] --> B{Is cache_dir set?}
    B -->|No| C[Use HF_HUB_CACHE env var]
    B -->|Yes| D[Use provided path]
    C --> E{Does directory exist?}
    D --> E
    E -->|No| F[Raise CacheNotFound]
    E -->|Yes| G{Is it a file?}
    G -->|Yes| H[Raise ValueError]
    G -->|No| I[Iterate subdirectories]
    I --> J{Skip .locks and CACHEDIR.TAG?}
    J -->|Yes| K[Next directory]
    J -->|No| L[_scan_cached_repo]
    L --> M{Valid format?}
    M -->|No| N[Log CorruptedCacheException]
    M -->|Yes| O[Create CachedRepoInfo]
    N --> P[Add to warnings list]
    O --> Q[Add to repos set]
    P --> K
    Q --> K
    K --> R{More directories?}
    R -->|Yes| I
    R -->|No| S[Return HFCacheInfo]

Validation Rules

Each subdirectory must follow the type--repo_id naming convention
The type prefix must be one of: models, datasets, spaces (mapped to the repo types model, dataset, space)
Directories must contain expected subdirectories (snapshots, blobs, refs)

Sources: src/huggingface_hub/utils/_cache_manager.py:100-200
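
The first two validation rules can be sketched as a small parser that turns a folder name back into a repository type and identifier. `parse_cache_folder` is a hypothetical helper. It assumes plural type prefixes and `--` as the separator, and it is not the scanner's actual code.

```python
from typing import Tuple


def parse_cache_folder(name: str) -> Tuple[str, str]:
    """Split '<types>--<namespace>--<name>' into (repo_type, repo_id)."""
    if "--" not in name:
        raise ValueError(f"Not a '<type>--<repo_id>' folder name: {name!r}")
    prefix, rest = name.split("--", 1)
    repo_type = prefix[:-1] if prefix.endswith("s") else prefix  # "models" becomes "model"
    if repo_type not in {"model", "dataset", "space"}:
        raise ValueError(f"Unexpected repo type prefix: {prefix!r}")
    # Remaining '--' separators map back to '/' in the repo id
    return repo_type, rest.replace("--", "/")


parsed = parse_cache_folder("datasets--google--fleurs")
```

A scanner built on such a parser can record a warning for each folder that raises and continue with the rest, which matches the tolerant behavior described above.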

HubMixin Integration

The ModelHubMixin class (and its PyTorch variant, PyTorchModelHubMixin) integrates with the cache system for model loading:

import torch
from huggingface_hub import PyTorchModelHubMixin

class MyModel(torch.nn.Module, PyTorchModelHubMixin):
    pass

# Load model - uses cache automatically
model = MyModel.from_pretrained("username/my-model")

# Cache behavior:
# 1. Check if model exists locally
# 2. If local_files_only=True, use cached version or raise error
# 3. Otherwise, download and cache from Hub
# 4. Store in cache_dir or default HF_HUB_CACHE location

HubMixin Parameters Related to Caching:

Parameter	Type	Default	Description
`cache_dir`	`str` or `Path`	None	Custom cache location
`force_download`	`bool`	`False`	Force re-download
`local_files_only`	`bool`	`False`	Only use cached files
`token`	`str` or `bool`	None	HuggingFace token

Sources: src/huggingface_hub/hub_mixin.py:1-100

Exception Handling

CorruptedCacheException

Raised when cache directory structure is invalid or expected files are missing.

class CorruptedCacheException(Exception):
    """Exception raised when a cache entry is corrupted."""
    def __init__(self, message: str):
        self.message = message
        super().__init__(self.message)

Common corruption scenarios:

Snapshots directory doesn't exist
Invalid repository directory naming
Missing expected cache metadata

CacheNotFound

Raised when the cache directory cannot be located.

raise CacheNotFound(
    f"Cache directory not found: {cache_dir}. "
    "Please use `cache_dir` argument or set `HF_HUB_CACHE` environment variable.",
    cache_dir=cache_dir,
)

Environment Variables

Variable	Description	Default
`HF_HUB_CACHE`	Primary cache directory	`~/.cache/huggingface/hub`
`HF_HUB_DOWNLOAD_TIMEOUT`	Download timeout in seconds	10

Sources: src/huggingface_hub/constants.py

Best Practices

Efficient Cache Usage

Reuse cached content: Multiple revisions of the same repository reference the same blobs for files that did not change
Use revision pinning: Specify exact commit hashes for reproducible builds
Monitor cache size: Regularly run hf cache ls to identify large repositories

Safe Deletion

Always use scan_cache_dir() to inspect before deletion
Check warnings in HFCacheInfo for corrupted entries
Use expected_freed_size property to estimate space recovery
Execute deletion only after confirming the strategy

Troubleshooting

Issue	Solution
CacheNotFound error	Set `HF_HUB_CACHE` or use `cache_dir` parameter
CorruptedCacheException	Manually delete the corrupted cache entry
Large cache size	Use `delete_revisions()` to remove old/unused revisions
Permission denied	Check file permissions on cache directory

Complete Usage Example

from huggingface_hub import scan_cache_dir

# Scan cache and get overview
cache_info = scan_cache_dir()

print(f"Total cached repos: {len(cache_info.repos)}")
print(f"Total size: {cache_info.size_on_disk / 1024**3:.2f} GB")

# Find specific repo
target_repo = "stabilityai/stable-diffusion-2-1"
for repo in cache_info.repos:
    if repo.repo_id == target_repo:
        print(f"\nFound {target_repo}:")
        print(f"  Type: {repo.repo_type}")
        print(f"  Revisions: {len(repo.revisions)}")
        for revision in repo.revisions:
            print(f"    - {revision.commit_hash[:8]}")
            print(f"      Size: {revision.size_on_disk / 1024**2:.2f} MB")
            print(f"      Files: {len(revision.files)}")

# Clean up old revisions
if cache_info.repos:
    first_repo = next(iter(cache_info.repos))
    if len(first_repo.revisions) > 1:
        # repo.revisions is an unordered frozenset, so sort by modification
        # time (newest first) and keep only the most recent revision
        by_recency = sorted(first_repo.revisions, key=lambda rev: rev.last_modified, reverse=True)
        revisions_to_delete = [rev.commit_hash for rev in by_recency[1:]]
        strategy = cache_info.delete_revisions(*revisions_to_delete)
        print(f"\nWould free: {strategy.expected_freed_size / 1024**2:.2f} MB")
        # strategy.execute()  # Uncomment to actually delete

Sources: src/huggingface_hub/utils/_cache_manager.py:1-100

Inference Client and Providers

Related topics: HuggingFace File System (HfFileSystem), Overview and Architecture

Overview

The Inference Client and Providers system provides a unified interface for performing inference with machine learning models hosted on Hugging Face or third-party inference providers. This system abstracts the complexity of interacting with various inference backends, allowing developers to make inference calls through a consistent Python API.

The InferenceClient class serves as the primary entry point for synchronous inference operations, while AsyncInferenceClient provides asynchronous alternatives for non-blocking workflows. Both clients rely on a provider system that normalizes API differences between inference services such as Replicate, Together AI, Fal.ai, and SambaNova.

Sources: src/huggingface_hub/inference/_client.py:1-100

Architecture

The inference system follows a layered architecture where the client exposes a high-level API while delegating provider-specific details to helper classes.

graph TD
    User[User Code] --> Client[InferenceClient]
    Client --> ProviderHelper[Provider Helper]
    ProviderHelper --> ProviderAPI[Third-party Provider API]
    ProviderHelper --> HFRouting[Hugging Face Routing]
    
    subgraph "InferenceClient"
        Methods[text_generation, chat_completion,<br/>text_to_image, etc.]
    end
    
    subgraph "Provider Layer"
        get_provider_helper[get_provider_helper]
        prepare_request[prepare_request]
        get_response[get_response]
    end

Core Components

Component	File Location	Purpose
`InferenceClient`	`inference/_client.py`	Synchronous inference operations
`AsyncInferenceClient`	`inference/_generated/_async_client.py`	Asynchronous inference operations
Provider Helpers	`inference/_providers/*.py`	Provider-specific request/response handling
Provider Registry	`inference/_providers/__init__.py`	Provider discovery and initialization

Sources: src/huggingface_hub/inference/_client.py:1-50

Supported Inference Tasks

The InferenceClient supports a comprehensive set of inference tasks through method-based API calls.

Task Categories and Methods

Category	Method	Description
Text Generation	`text_generation()`	Generate text from prompts with streaming support
Chat	`chat_completion()`	Multi-turn conversation with message history
Image Generation	`text_to_image()`	Generate images from text prompts
Video Generation	`text_to_video()`	Generate videos from text descriptions
Text Analysis	`summarization()`	Summarize long text documents
Text Analysis	`fill_mask()`	Fill masked tokens in text
Text Analysis	`zero_shot_classification()`	Classify text with arbitrary labels
Table Operations	`table_question_answering()`	Answer questions from tabular data
Table Operations	`tabular_classification()`	Classify tabular data rows
Embeddings	`sentence_similarity()`	Compute semantic similarity between sentences
Vision	`image_classification()`	Classify images into categories

Sources: src/huggingface_hub/inference/_client.py:200-500

Client Configuration

Initialization Parameters

Parameter	Type	Default	Description
`model`	`str | None`	`None`	Default model identifier for all requests
`provider`	`str | None`	`None`	Inference provider to use (replicate, together, fal-ai, etc.)
`api_key`	`str | None`	`None`	API key for authentication (alias for `token`)
`token`	`str | bool | None`	`None`	Hugging Face token for authentication
`timeout`	`float | None`	`None`	Request timeout in seconds
`headers`	`dict[str, str] | None`	`None`	Additional HTTP headers

from huggingface_hub import InferenceClient

# Basic usage with default provider
client = InferenceClient()

# Using a specific provider
client = InferenceClient(
    provider="replicate",
    api_key="hf_...",
    model="meta-llama/Meta-Llama-3-8B-Instruct"
)

Sources: src/huggingface_hub/inference/_client.py:50-150

Provider System

Provider Architecture

The provider system normalizes differences between inference services by abstracting request preparation and response parsing.

graph LR
    A[InferenceClient] -->|task + model| B[get_provider_helper]
    B --> C{Provider Type}
    C -->|Built-in| D[Internal Provider Helper]
    C -->|Third-party| E[Provider API Helper]
    
    D --> F[Provider.prepare_request]
    E --> G[External API Call]
    
    F --> H[Normalized Response]
    G --> H

Supported Providers

Provider	Description	Authentication
`replicate`	Replicate hosted models	API key
`together`	Together AI inference	API key
`fal-ai`	Fal.ai generation services	API key
`sambanova`	SambaNova Cloud	API key
`hf-inference`	Hugging Face Inference API	HF token

Sources: src/huggingface_hub/inference/_providers/__init__.py

Provider Helper Functions

Each provider helper implements two key methods:

prepare_request(): Transforms inputs and parameters into provider-specific API format
get_response(): Parses provider response into normalized output format

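from huggingface_hub import InferenceClient
# Internal module, not part of the public API; it may change between releases
from huggingface_hub.inference._providers import get_provider_helper

client = InferenceClient(provider="replicate", api_key="hf_...")
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
prompt = "The capital of France is"

# Select the helper for this provider/task combination
provider_helper = get_provider_helper(
    provider="replicate",
    task="text-generation",
    model=model_id,
)

# Build the provider-specific request from the normalized inputs
request_parameters = provider_helper.prepare_request(
    inputs=prompt,
    parameters={"max_new_tokens": 100},
    headers=client.headers,
    model=model_id,
    api_key=client.token,
)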

Sources: src/huggingface_hub/inference/_client.py:150-200
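
The division of labor between `prepare_request()` and `get_response()` can be sketched without any network access. Everything below is hypothetical and only mirrors the two-method shape described above: `ToyProviderHelper`, `RequestParameters`, `lookup_helper`, the `https://toy.example` URL, and the response layout are illustration names, not the library's classes or any provider's real API.

```python
from dataclasses import dataclass


@dataclass
class RequestParameters:
    """Hypothetical container for a prepared HTTP request."""
    url: str
    headers: dict
    json: dict


class ToyProviderHelper:
    """Hypothetical helper that adapts normalized inputs to one provider's format."""

    def __init__(self, base_url: str):
        self.base_url = base_url

    def prepare_request(self, *, inputs, parameters, headers, model, api_key):
        # Merge caller headers with the provider's auth header
        merged = {**headers, "authorization": f"Bearer {api_key}"}
        # Map normalized inputs and parameters onto the provider payload
        payload = {"model": model, "prompt": inputs, **parameters}
        return RequestParameters(f"{self.base_url}/v1/generate", merged, payload)

    def get_response(self, response: dict) -> str:
        # Extract the generated text from the provider-specific response
        return response["choices"][0]["text"]


_HELPERS = {"toy": ToyProviderHelper("https://toy.example")}


def lookup_helper(provider: str) -> ToyProviderHelper:
    """Hypothetical registry lookup, standing in for provider discovery."""
    try:
        return _HELPERS[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider}") from None


helper = lookup_helper("toy")
request = helper.prepare_request(
    inputs="The capital of France is",
    parameters={"max_new_tokens": 100},
    headers={"user-agent": "demo"},
    model="toy-model",
    api_key="hf_xxx",
)
# A canned response replaces the real HTTP call in this sketch
output = helper.get_response({"choices": [{"text": "Paris"}]})
```

Keeping request building and response parsing in one helper per provider means the client code that sends the HTTP request does not need to know which provider it is talking to.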

Usage Examples

Text Generation

from huggingface_hub import InferenceClient

client = InferenceClient()

# Basic text generation
output = client.text_generation(
    prompt="The capital of France is",
    model="gpt2"
)

Sources: src/huggingface_hub/inference/_client.py:300-400

Chat Completion

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="sambanova",
    api_key="hf_..."
)

output = client.chat_completion(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

Sources: src/huggingface_hub/inference/_client.py:400-500

Image Generation

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="replicate",
    api_key="hf_..."
)

image = client.text_to_image(
    "An astronaut riding a horse on the moon.",
    model="black-forest-labs/FLUX.1-schnell",
    extra_body={"output_quality": 100}
)
image.save("astronaut.png")

Sources: src/huggingface_hub/inference/_client.py:500-600

Text-to-Video

from huggingface_hub import InferenceClient

client = InferenceClient()

video = client.text_to_video(
    prompt="A cat playing piano",
    num_inference_steps=50,
    guidance_scale=7.5
)

Sources: src/huggingface_hub/inference/_client.py:600-700

Sentence Similarity

from huggingface_hub import InferenceClient

client = InferenceClient()

similarities = client.sentence_similarity(
    "Machine learning is so easy.",
    other_sentences=[
        "Deep learning is so straightforward.",
        "This is so difficult, like rocket science.",
    ]
)
# Output: [0.7785726189613342, 0.45876261591911316]

Sources: src/huggingface_hub/inference/_client.py:700-800

Zero-Shot Classification

from huggingface_hub import InferenceClient

client = InferenceClient()

text = "A new model offers an explanation for how the Galilean satellites formed."
labels = ["space & cosmos", "scientific discovery", "microbiology", "robots"]

result = client.zero_shot_classification(text, labels)

Sources: src/huggingface_hub/inference/_client.py:350-450
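
The result is a list of elements that each carry a label and a score, so post-processing is usually a sort and a threshold. The sketch below uses a local `LabelScore` dataclass as a stand-in for the library's output element, and the scores are made up for illustration rather than taken from a real model.

```python
from dataclasses import dataclass


@dataclass
class LabelScore:
    """Stand-in for an output element with `label` and `score` fields."""
    label: str
    score: float


def top_labels(result, threshold: float = 0.5) -> list:
    """Return labels scoring at or above the threshold, best first."""
    ranked = sorted(result, key=lambda el: el.score, reverse=True)
    return [el.label for el in ranked if el.score >= threshold]


# Made-up scores, for illustration only
result = [
    LabelScore("space & cosmos", 0.62),
    LabelScore("scientific discovery", 0.78),
    LabelScore("microbiology", 0.03),
    LabelScore("robots", 0.01),
]

print(top_labels(result))
```

The threshold of 0.5 is an arbitrary choice for this example; a suitable value depends on the model and on whether multi-label classification is enabled.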

AsyncInferenceClient

For asynchronous workflows, the AsyncInferenceClient provides non-blocking equivalents of all synchronous methods.

import asyncio

from huggingface_hub import AsyncInferenceClient

async def main():
    client = AsyncInferenceClient()

    # Async chat completion
    output = await client.chat_completion(
        model="meta-llama/Meta-Llama-3-70B-Instruct",
        messages=[
            {"role": "user", "content": "Hello!"}
        ]
    )

    # Async image generation
    image = await client.text_to_image(
        prompt="A beautiful sunset over mountains",
        model="black-forest-labs/FLUX.1-schnell"
    )

# Run the coroutine from synchronous code
asyncio.run(main())

Sources: src/huggingface_hub/inference/_generated/_async_client.py:1-200
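
The main benefit of the async client is that several requests can be in flight at once. The sketch below shows that pattern with `asyncio.gather`; `fake_chat_completion` is a hypothetical stand-in that sleeps instead of calling a model, so the example runs without network access or credentials.

```python
import asyncio


async def fake_chat_completion(prompt: str, delay: float = 0.01) -> str:
    """Hypothetical stand-in for an awaited client call."""
    await asyncio.sleep(delay)  # simulates waiting on the network
    return f"reply to: {prompt}"


async def run_all(prompts: list) -> list:
    """Start all requests concurrently and wait for every result."""
    tasks = [fake_chat_completion(p) for p in prompts]
    # gather returns results in the same order as the input awaitables
    return await asyncio.gather(*tasks)


replies = asyncio.run(run_all(["Hello!", "Bonjour!", "Hola!"]))
print(replies)
```

With a real client, each `fake_chat_completion(...)` call would be replaced by an awaited client method; rate limits on the provider side may still cap useful concurrency.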

Error Handling

The inference system defines specific exception types for common error conditions:

Exception	Description
`InferenceTimeoutError`	Request exceeded timeout threshold
`HfHubHTTPError`	HTTP error from the inference provider

from huggingface_hub import InferenceClient, InferenceTimeoutError
from huggingface_hub.errors import HfHubHTTPError

client = InferenceClient(timeout=30)

try:
    result = client.text_generation("Hello world")
except InferenceTimeoutError:
    print("Request timed out")
except HfHubHTTPError as e:
    print(f"HTTP error: {e}")

Sources: src/huggingface_hub/inference/_client.py:250-300
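
Timeouts are often transient, so a caller may want to retry with a growing delay. This is a minimal sketch assuming the timeout error can be caught as the built-in `TimeoutError` (worth confirming for the installed version); `call_with_retries` and `flaky` are hypothetical names, and `flaky` simulates a call that times out twice before succeeding.

```python
import time


def call_with_retries(fn, *, retries: int = 3, base_delay: float = 0.01, sleep=time.sleep):
    """Call fn(), retrying on TimeoutError with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == retries:
                raise  # out of retries: surface the last timeout
            sleep(base_delay * 2 ** attempt)


attempts = {"count": 0}


def flaky() -> str:
    """Simulated inference call: times out on the first two attempts."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


result = call_with_retries(flaky, retries=3, base_delay=0.001)
print(result, attempts["count"])
```

HTTP errors are deliberately not retried here, since many of them (for example authentication failures) will not succeed on a second attempt.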

Request Flow

sequenceDiagram
    participant User
    participant Client
    participant ProviderHelper
    participant API
    
    User->>Client: text_generation(prompt, model)
    Client->>ProviderHelper: get_provider_helper(provider, task, model)
    Client->>ProviderHelper: prepare_request(inputs, parameters)
    ProviderHelper-->>Client: request_parameters
    Client->>Client: _inner_post(request_parameters)
    Client->>API: HTTP POST
    API-->>Client: response
    Client->>ProviderHelper: get_response(response)
    ProviderHelper-->>Client: normalized_output
    Client-->>User: InferenceOutput

Output Models

The inference client returns typed output objects for each task:

Task	Output Type
Text Generation	`str` by default; `TextGenerationOutput` with `details=True`; an iterator of `str` or `TextGenerationStreamOutput` with `stream=True`
Chat Completion	`ChatCompletionOutput`
Image Generation	`PIL.Image.Image`
Video Generation	`bytes`
Summarization	`SummarizationOutput`
Fill Mask	`list[FillMaskOutputElement]`
Zero-Shot Classification	`list[ZeroShotClassificationOutputElement]`
Table Question Answering	`TableQuestionAnsweringOutputElement`
Tabular Classification	`list[str]`
Sentence Similarity	`list[float]`
Image Classification	`list[ImageClassificationOutputElement]`

Sources: src/huggingface_hub/inference/_client.py:200-600

CLI Integration

The CLI provides command-line access to inference functionality:

# Install inference dependencies
pip install "huggingface_hub[inference]"

# Run inference via CLI
hf inference --model gpt2 --text "The capital of France is"

Sources: setup.py:1-30

Advanced Configuration

Extra Body Parameters

Many inference methods accept extra_body for provider-specific parameters:

client = InferenceClient(provider="replicate", api_key="hf_...")

image = client.text_to_image(
    "A majestic lion",
    model="black-forest-labs/FLUX.1-dev",
    extra_body={
        "output_quality": 100,
        "guidance_scale": 3.5
    }
)

Generation Parameters

text_generation() accepts generation parameters directly as keyword arguments:

client.text_generation(
    prompt="Write a story",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
)

Summary

The Inference Client and Providers system provides:

Unified API: Consistent interface across all inference tasks
Multi-Provider Support: Integration with Replicate, Together AI, Fal.ai, and SambaNova
Type-Safe Outputs: Well-defined output models for each task
Async Support: Full async/await compatibility via AsyncInferenceClient
Error Handling: Specific exceptions for timeout and HTTP errors
Extensible Design: Provider helper system for adding new inference backends

This architecture lets developers switch between providers and models by changing the provider and model arguments rather than rewriting call sites, while keeping a consistent Python API.

Sources: src/huggingface_hub/inference/_client.py:1-50

HuggingFace File System (HfFileSystem)

Related topics: File Download Operations, File Upload Operations

Overview

The HuggingFace File System (HfFileSystem) is an fsspec-based POSIX-like filesystem implementation that provides seamless access to Hugging Face Hub repositories. It enables developers to interact with models, datasets, and Spaces using familiar filesystem operations, abstracting away the complexity of HTTP API calls and caching mechanisms.

Key Characteristics:

Property	Value
Base Class	`fsspec.spec.AbstractFileSystem`
Protocol	`hf://`
Python Version	>= 3.10.0
Entry Point	`hf=huggingface_hub.HfFileSystem`

Sources: setup.py:48
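
Paths under the `hf://` protocol encode the repository type, repository ID, optional revision, and file path. The parser below is a simplified sketch assuming the layout `hf://[datasets/|spaces/]<namespace>/<name>[@<revision>]/<path>`; `parse_hf_path` and `HfPath` are hypothetical names, and cases such as repositories without a namespace or revisions containing slashes are not handled.

```python
from typing import NamedTuple, Optional


class HfPath(NamedTuple):
    repo_type: str
    repo_id: str
    revision: Optional[str]
    path_in_repo: str


# Assumption: models have no prefix, other repo types use a plural prefix
PREFIX_TO_TYPE = {"datasets": "dataset", "spaces": "space"}


def parse_hf_path(url: str) -> HfPath:
    """Split an hf:// URL into repo type, repo ID, revision, and file path."""
    if not url.startswith("hf://"):
        raise ValueError(f"Not an hf:// URL: {url}")
    parts = url[len("hf://"):].split("/")
    repo_type = "model"
    if parts[0] in PREFIX_TO_TYPE:
        repo_type = PREFIX_TO_TYPE[parts[0]]
        parts = parts[1:]
    if len(parts) < 2:
        raise ValueError("Expected <namespace>/<name> after the optional type prefix")
    # The revision, if any, is attached to the repository name with "@"
    name, _, revision = parts[1].partition("@")
    return HfPath(repo_type, f"{parts[0]}/{name}", revision or None, "/".join(parts[2:]))


print(parse_hf_path("hf://datasets/my-user/my-data@main/data/train.csv"))
print(parse_hf_path("hf://my-user/my-model/config.json"))
```

In practice the filesystem class resolves these paths itself; the sketch is only meant to make the path components visible.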

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Doramagic extracted 13 source-linked risk signals. Review them before installing or handing real data to the project.

1. Security or permission risk: How to stop hf models ls from truncating the results in the table?

Severity: high
Finding: Security or permission risk is backed by a source signal: How to stop hf models ls from truncating the results in the table?. Treat it as a review item until the current version is checked.
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/issues/4207

2. Installation risk: [v1.13.0] new CLI commands and formatting, and HF URI parsing

Severity: medium
Finding: Installation risk is backed by a source signal: [v1.13.0] new CLI commands and formatting, and HF URI parsing. Treat it as a review item until the current version is checked.
User impact: First-time setup may fail or require extra isolation and rollback planning.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.13.0

3. Installation risk: [v1.15.0] Region-aware buckets & repos, `hf skills list`, polished CLI help and more

Severity: medium
Finding: Installation risk is backed by a source signal: [v1.15.0] Region-aware buckets & repos, hf skills list, polished CLI help and more. Treat it as a review item until the current version is checked.
User impact: First-time setup may fail or require extra isolation and rollback planning.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.15.0

4. Capability assumption: README/documentation is current enough for a first validation pass.

Severity: medium
Finding: README/documentation is current enough for a first validation pass.
User impact: The project should not be treated as fully validated until this signal is reviewed.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: capability.assumptions | github_repo:323591830 | https://github.com/huggingface/huggingface_hub | README/documentation is current enough for a first validation pass.

5. Maintenance risk: Maintainer activity is unknown

Severity: medium
Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: evidence.maintainer_signals | github_repo:323591830 | https://github.com/huggingface/huggingface_hub | last_activity_observed missing

6. Security or permission risk: no_demo

Severity: medium
Finding: no_demo
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: downstream_validation.risk_items | github_repo:323591830 | https://github.com/huggingface/huggingface_hub | no_demo; severity=medium

7. Security or permission risk: no_demo

Severity: medium
Finding: no_demo
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: risks.scoring_risks | github_repo:323591830 | https://github.com/huggingface/huggingface_hub | no_demo; severity=medium

8. Security or permission risk: [v1.10.0] Instant file copy and new Kernel repo type

Severity: medium
Finding: Security or permission risk is backed by a source signal: [v1.10.0] Instant file copy and new Kernel repo type. Treat it as a review item until the current version is checked.
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.10.0

9. Security or permission risk: [v1.11.0] Semantic Spaces search, Space logs, and more

Severity: medium
Finding: Security or permission risk is backed by a source signal: [v1.11.0] Semantic Spaces search, Space logs, and more. Treat it as a review item until the current version is checked.
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.11.0

10. Security or permission risk: [v1.12.0] Unified CLI output, bucket search, and more

Severity: medium
Finding: Security or permission risk is backed by a source signal: [v1.12.0] Unified CLI output, bucket search, and more. Treat it as a review item until the current version is checked.
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.12.0

11. Security or permission risk: [v1.14.0] Handle Spaces secrets & variables from CLI and other improvements

Severity: medium
Finding: Security or permission risk is backed by a source signal: [v1.14.0] Handle Spaces secrets & variables from CLI and other improvements. Treat it as a review item until the current version is checked.
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.14.0

12. Maintenance risk: issue_or_pr_quality=unknown

Severity: low
Finding: issue_or_pr_quality=unknown.
User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: evidence.maintainer_signals | github_repo:323591830 | https://github.com/huggingface/huggingface_hub | issue_or_pr_quality=unknown

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using huggingface_hub with real data or production workflows.

How to stop hf models ls from truncating the results in the table? - github / github_issue
[[v1.15.0] Region-aware buckets & repos, hf skills list, polished CLI h](https://github.com/huggingface/huggingface_hub/releases/tag/v1.15.0) - github / github_release
[[v1.14.0] Handle Spaces secrets & variables from CLI and other improveme](https://github.com/huggingface/huggingface_hub/releases/tag/v1.14.0) - github / github_release
[[v1.13.0] new CLI commands and formatting, and HF URI parsing](https://github.com/huggingface/huggingface_hub/releases/tag/v1.13.0) - github / github_release
[[v1.12.0] Unified CLI output, bucket search, and more](https://github.com/huggingface/huggingface_hub/releases/tag/v1.12.0) - github / github_release
[[v1.11.0] Semantic Spaces search, Space logs, and more](https://github.com/huggingface/huggingface_hub/releases/tag/v1.11.0) - github / github_release
[[v1.10.2] Fix reference cycle in hf_raise_for_status](https://github.com/huggingface/huggingface_hub/releases/tag/v1.10.2) - github / github_release
[[v1.10.1] Fix copy file to folder](https://github.com/huggingface/huggingface_hub/releases/tag/v1.10.1) - github / github_release
[[v1.10.0] Instant file copy and new Kernel repo type](https://github.com/huggingface/huggingface_hub/releases/tag/v1.10.0) - github / github_release
[[v1.9.2] Fix set_space_volume / delete_space_volume return types](https://github.com/huggingface/huggingface_hub/releases/tag/v1.9.2) - github / github_release
README/documentation is current enough for a first validation pass. - GitHub / issue

Source: Project Pack community evidence and pitfall evidence
