Doramagic Project Pack · Human Manual

huggingface_hub

Related topics: Installation and Setup, File Download Operations, File Upload Operations

Overview and Architecture

Introduction

huggingface_hub is a Python client library developed by Hugging Face for interacting with the Hugging Face Hub. It lets developers download, upload, and manage machine learning models, datasets, and other repositories programmatically, through a unified interface to Hugging Face's model hosting, version control, and collaboration infrastructure.

Primary Purpose:

  • Download models, datasets, and Spaces from the Hub
  • Upload files and folders to the Hub
  • Manage repository metadata and model cards
  • Execute inference on deployed models
  • Handle authentication and access control

Sources: README.md

Source: https://github.com/huggingface/huggingface_hub / Human Manual

Installation and Setup

Related topics: Overview and Architecture, Authentication System

Overview

The huggingface_hub package is a Python client library that enables interaction with the Hugging Face Hub, providing functionality to download and publish models, datasets, and other repositories. This page covers all aspects of installing and setting up the library across different environments and use cases.

System Requirements

Python Version

| Requirement | Value |
| --- | --- |
| Minimum Python | 3.10.0 |
| Package Manager | pip, conda |

Sources: setup.py:52

Supported Platforms

The library supports installation on all major operating systems including Linux, macOS, and Windows.

Installation Methods

Standard Installation (pip)

The primary installation method uses pip:

pip install huggingface_hub

Sources: README.md:30

Installation with Optional Dependencies

The library provides extras that install optional dependencies for specific use cases:

| Extra | Description | Command |
| --- | --- | --- |
| inference | Inference-related functionality | pip install "huggingface_hub[inference]" |
| mcp | MCP (Model Context Protocol) module | pip install "huggingface_hub[mcp]" |

Sources: README.md:36-42

Development Installation

For contributing to the project or testing the latest features:

pip install -e ".[dev]"

This installs the package in editable mode with all development dependencies.

Sources: CONTRIBUTING.md:24-26

Conda Installation

For conda environments:

conda install -c conda-forge huggingface_hub

Sources: README.md:22-24

Dependency Architecture

graph TD
    A[huggingface_hub] --> B[Core Dependencies]
    A --> C[Optional: inference]
    A --> D[Optional: mcp]
    A --> E[Dev Dependencies]
    
    B --> B1[requests]
    B --> B2[fsspec]
    B --> B3[httpx]
    B --> B4[tqdm]
    B --> B5[packaging]
    B --> B6[filelock]
    B --> B7[pyyaml]
    
    C --> C1[inference-client]
    C --> C2[pillow]
    
    D --> D1[mcp]
    
    E --> E1[pytest]
    E --> E2[pytest-asyncio]
    E --> E3[pytest-cov]
    E --> E4[ruff]
    E --> E5[mypy]
    E --> E6[ty]

Core Dependencies

The following table lists the required dependencies installed by default:

| Package | Purpose |
| --- | --- |
| requests | HTTP client for API calls |
| fsspec | Filesystem specification |
| httpx | Async HTTP client |
| tqdm | Progress bars |
| packaging | Package version handling |
| filelock | File locking mechanism |
| pyyaml | YAML parsing |
| typing-extensions | Type hint support |

Sources: setup.py:1-16

Optional Dependency Groups

Testing Dependencies

extras["testing"] = [
    "pytest",
    "pytest-asyncio",
    "pytest-cov",
    "pytest-xdist",
    "DianaEye",
    "aiohttp",
    "asynctest",
    "Paramiko",
]

Quality Assurance Dependencies

extras["quality"] = [
    "ruff",
]

Type Checking Dependencies

extras["typing"] = [
    "mypy==1.15.0",
    "libcst>=1.4.0",
    "ty",
]

All-Inclusive Meta-Group

extras["all"] = extras["testing"] + extras["quality"] + extras["typing"]
extras["dev"] = extras["all"]

Sources: setup.py:36-51

Installation Workflow

graph TD
    A[Start Installation] --> B{Installation Method}
    
    B -->|pip| C[Basic Install]
    B -->|conda| D[Conda Forge Install]
    B -->|editable| E[Development Install]
    
    C --> F{Use Case}
    F -->|Minimal| G[Core Only]
    F -->|Inference| H[Add inference extra]
    F -->|MCP| I[Add mcp extra]
    
    G --> J[Installation Complete]
    H --> J
    I --> J
    
    D --> J
    E --> J
    
    J --> K[Verify Installation]
    K --> L[Import huggingface_hub]

Verification

After installation, verify the package is correctly installed:

from huggingface_hub import hf_hub_download

# Test basic functionality
hf_hub_download(repo_id="tiiuae/falcon-7b-instruct", filename="config.json")

Sources: README.md:48-52
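The call above needs network access. To confirm the install offline, a quick standard-library check of the interpreter and the installed distribution is enough. This is a minimal sketch: installed_version and python_supported are hypothetical helper names, and the (3, 10) minimum mirrors the requirement table above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def python_supported(minimum: Tuple[int, int] = (3, 10)) -> bool:
    """Compare the running interpreter's (major, minor) against a minimum."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python OK:", python_supported())
    print("huggingface_hub:", installed_version("huggingface_hub") or "not installed")
```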

Post-Installation Configuration

Authentication Setup

To authenticate with the Hugging Face Hub:

# Interactive login
hf auth login

# Non-interactive with token
hf auth login --token $HUGGINGFACE_TOKEN

Sources: README.md:61-65

Cache Configuration

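Files are downloaded to a local cache folder, by default ~/.cache/huggingface/hub. Set HF_HUB_CACHE (or HF_HOME, whose hub/ subfolder is then used) to move it, or pass cache_dir on individual calls. See the cache management guide for further configuration options.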
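The default location is derived from environment variables. Below is a simplified sketch of that precedence, assuming HF_HUB_CACHE wins over HF_HOME, and HF_HOME falls back to $XDG_CACHE_HOME/huggingface or ~/.cache/huggingface. The function name is hypothetical, and legacy variables such as HUGGINGFACE_HUB_CACHE are deliberately ignored.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_hub_cache(env: Optional[Mapping[str, str]] = None) -> Path:
    """Approximate how the default hub cache directory is chosen (simplified)."""
    env = os.environ if env is None else env
    # 1. An explicit hub cache location wins.
    if "HF_HUB_CACHE" in env:
        return Path(env["HF_HUB_CACHE"]).expanduser()
    # 2. Otherwise use HF_HOME, which itself defaults to an XDG-style cache folder.
    if "HF_HOME" in env:
        hf_home = Path(env["HF_HOME"]).expanduser()
    else:
        xdg_cache = env.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))
        hf_home = Path(xdg_cache).expanduser() / "huggingface"
    return hf_home / "hub"
```

Passing a plain dict instead of os.environ keeps the function easy to test.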

Entry Points

The installation registers the following console scripts and entry points:

| Command | Module | Purpose |
| --- | --- | --- |
| hf | huggingface_hub.cli.hf | Main CLI interface |
| huggingface-cli | huggingface_hub.cli.deprecated_cli | Legacy CLI (deprecated) |
| tiny-agents | huggingface_hub.inference._mcp.cli | MCP CLI application |
| hf (fsspec protocol) | huggingface_hub.HfFileSystem | fsspec filesystem implementation (an entry point, not a console script) |

Sources: setup.py:53-60

Troubleshooting

Common Issues

| Issue | Solution |
| --- | --- |
| ImportError | Ensure Python >= 3.10 |
| Authentication failed | Run hf auth login |
| Download timeout | Check network connection |
| Permission denied | Use a virtual environment |

Development Setup Issues

If installing in development mode:

pip uninstall huggingface_hub
pip install -e ".[dev]"

Sources: CONTRIBUTING.md:24

Package Metadata

| Property | Value |
| --- | --- |
| Name | huggingface_hub |
| License | Apache-2.0 |
| Author | Hugging Face, Inc. |
| Author Email | [email protected] |
| URL | https://github.com/huggingface/huggingface_hub |

Sources: setup.py:18-22

Authentication System

Related topics: Installation and Setup, Repository Management API

Overview

The huggingface_hub library provides a comprehensive authentication system that enables secure access to Hugging Face Hub resources including models, datasets, and Spaces. The authentication system supports multiple authentication methods including token-based authentication and OAuth 2.0, with seamless integration into both CLI environments and Jupyter notebooks.

The authentication infrastructure consists of four primary modules that handle different aspects of the authentication lifecycle:

| Module | Purpose |
| --- | --- |
| _login.py | User login operations and token management |
| _oauth.py | OAuth 2.0 authentication flow |
| _auth.py | Core authentication utilities and token refresh |
| _git_credential.py | Git credential handling for repository operations |

Architecture

graph TD
    A[User] --> B[Login Methods]
    B --> C[Token-based Auth]
    B --> D[OAuth 2.0 Auth]
    C --> E[hf_hub_download]
    C --> F[upload_file]
    D --> E
    D --> F
    E --> G[Token Cache]
    F --> G
    G --> H[Hugging Face Hub API]
    H --> I[Model/Dataset/Space]
    
    C --> J[CLI: hf auth login]
    C --> K[Python: login function]
    C --> L[Notebook: notebook_login]

Token-Based Authentication

Login Functionality

The library provides three primary interfaces for user authentication:

CLI Login

Users can authenticate via the command-line interface using the hf command:

hf auth login
# or with environment variable
hf auth login --token $HUGGINGFACE_TOKEN

Python API Login

The login() function provides programmatic authentication within Python scripts:

from huggingface_hub import login

# Direct token login
login(token="hf_xxxxx")

# No token passed: prompts interactively (shows a widget in notebooks)
login()

Notebook Login Widget

For Jupyter notebook environments, notebook_login() displays an interactive widget for token entry:

from huggingface_hub import notebook_login

notebook_login()

The notebook login function accepts the following parameter:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| skip_if_logged_in | bool | False | Skip the prompt if a token is already saved locally |

# Skip the prompt when a token is already saved
notebook_login(skip_if_logged_in=True)

Token Storage and Management

Tokens are saved as plain text in the local configuration directory, by default ~/.cache/huggingface/token (under HF_HOME). The library reads the stored token automatically when making API requests, so you do not need to log in again for each session. The token is checked against the Hub when you log in; if a stored token is later revoked or becomes invalid, authenticated requests fail with an HTTP 401 error.

OAuth 2.0 Authentication

The OAuth 2.0 authentication flow provides an alternative to token-based authentication, enabling more sophisticated authorization scenarios. This is particularly useful for applications that need to access resources on behalf of users with specific permission scopes.

OAuth tokens are automatically refreshed when they expire, maintaining continuous access without requiring user intervention. The system handles token revocation and supports scopes that limit access to specific resources or operations.

Git Credential Integration

The authentication system integrates with Git's credential infrastructure to provide seamless authentication for Git operations such as cloning and pushing to repositories. This integration ensures that Git operations respect the same authentication state as the Python API.

graph LR
    A[Git Operation] --> B[Git Credential Helper]
    B --> C{huggingface_hub _git_credential}
    C --> D{Cached Token?}
    D -->|Yes| E[Use Cached Token]
    D -->|No| F[Prompt for Token]
    E --> G[Execute Git Operation]
    F --> G

The Git credential helper manages:

  • Secure storage of credentials
  • Credential retrieval for specific hosts
  • Credential cleanup after operations
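The credential-helper protocol itself is small: git writes key=value lines (protocol, host, and so on) terminated by a blank line, and for a get request a helper replies with username= and password= lines, or with nothing so that git tries the next helper. The sketch below illustrates that exchange with a cached Hub token. It is not the library's code: parse_credential_request, answer_get and the hf_user username are assumptions for illustration.

```python
from typing import Dict, Optional


def parse_credential_request(text: str) -> Dict[str, str]:
    """Parse git's credential-helper input: key=value lines ending at a blank line."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line:
            break
        key, _, value = line.partition("=")
        fields[key] = value
    return fields


def answer_get(request: str, cached_token: Optional[str], hub_host: str = "huggingface.co") -> str:
    """Answer a 'get' request for the Hub host; an empty reply lets git try other helpers."""
    fields = parse_credential_request(request)
    if fields.get("host") != hub_host or not cached_token:
        return ""
    # "hf_user" is a placeholder username; the token is sent as the password.
    return f"username=hf_user\npassword={cached_token}\n"
```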

Authentication Workflow

sequenceDiagram
    participant User
    participant Application
    participant AuthSystem
    participant HubAPI
    participant TokenStore

    User->>Application: Initiate request
    Application->>AuthSystem: Authenticate
    AuthSystem->>TokenStore: Check stored token
    TokenStore-->>AuthSystem: Token found
    AuthSystem->>HubAPI: Authenticated request
    HubAPI-->>Application: Response
    Note over AuthSystem,TokenStore: Token expired or invalid
    AuthSystem->>AuthSystem: Refresh token
    AuthSystem->>TokenStore: Update token
    AuthSystem->>HubAPI: Retry with new token

Configuration

Authentication behavior can be configured through environment variables and configuration files:

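| Variable | Description |
| --- | --- |
| HF_TOKEN | Authentication token; takes precedence over a token saved by login |
| HUGGING_FACE_HUB_TOKEN | Legacy name for HF_TOKEN, still read as a fallback |
| HF_HOME | Configuration directory location (default ~/.cache/huggingface); the saved token lives in HF_HOME/token |

In the CLI examples, $HUGGINGFACE_TOKEN is only a shell variable passed through --token; the library itself reads HF_TOKEN.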
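The lookup order can be pictured as a short function. This is a simplified sketch, assuming HF_TOKEN is checked before the legacy HUGGING_FACE_HUB_TOKEN and both before the token file written by login; the real library also consults Google Colab secrets, which this sketch omits, and resolve_token is a hypothetical name.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_token(env: Mapping[str, str], token_path: Path) -> Optional[str]:
    """Simplified lookup: environment variables first, then the token file saved by login."""
    for var in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = env.get(var, "").strip()
        if value:  # blank values are treated as unset
            return value
    try:
        saved = token_path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
    return saved or None


# Usage with the real environment and the default token location:
# import os
# resolve_token(os.environ, Path.home() / ".cache" / "huggingface" / "token")
```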

Security Considerations

The authentication system implements several security best practices:

  1. Secure Token Storage: Tokens are stored with appropriate file permissions to prevent unauthorized access
  2. Token Validation: All tokens are validated before use in API requests
  3. Automatic Refresh: OAuth tokens are automatically refreshed to maintain session continuity
  4. Notebook Security Warning: The notebook_login widget displays a warning about token exposure in notebook files

The authentication system interacts with several other library components:

| Component | Interaction |
| --- | --- |
| InferenceClient | Uses authentication for inference API calls |
| HfFileSystem | Uses authentication for file system operations |
| snapshot_download | Uses authentication for repository downloads |
| upload_file | Uses authentication for repository uploads |

Quick Reference

# CLI
hf auth login --token hf_xxxxx

# Python script
from huggingface_hub import login
login(token="hf_xxxxx")

# Jupyter notebook
from huggingface_hub import notebook_login
notebook_login()

# Check if logged in
from huggingface_hub import whoami
user = whoami()

File Download Operations

Related topics: Cache Management System, Git LFS Large File Handling, Overview and Architecture

The huggingface_hub library provides a comprehensive file download system that enables clients to fetch models, datasets, and other artifacts from the Hugging Face Hub. This document covers the architecture, API, caching mechanisms, and usage patterns for download operations.

Overview

File download operations in huggingface_hub handle the retrieval of individual files or entire repository snapshots from Hugging Face's infrastructure. The system implements intelligent caching, supports offline mode, provides progress tracking, and handles authentication seamlessly.

Key responsibilities:

  • Download files with proper caching and deduplication
  • Support partial content retrieval for LFS (Large File Storage) files
  • Manage metadata for cache validation and freshness checks
  • Handle authentication tokens transparently
  • Support offline scenarios with local-only file access
  • Provide dry-run capabilities for previewing downloads

Sources: src/huggingface_hub/file_download.py:1-100

Architecture

Component Overview

graph TD
    A[Download request: CLI or Python API] --> B[Route Decision]
    B --> C{single file?}
    C -->|Yes| D[_hf_hub_download_to_cache_dir]
    C -->|No| E[snapshot_download]
    
    D --> F[Get Metadata / ETag]
    F --> G{Cached?}
    G -->|Yes, valid| H[Return cached path]
    G -->|No| I[Download from remote]
    I --> J[Write metadata]
    J --> H
    
    E --> K[Iterate files]
    K --> L[Download each file]
    L --> F
    
    H --> M[Local file path]
    I --> M

Module Structure

| Module | Purpose |
| --- | --- |
| file_download.py | Core download functions (hf_hub_download, _hf_hub_download_to_cache_dir) |
| _local_folder.py | Local cache and metadata management |
| _snapshot_download.py | Full repository snapshot downloads |
| cli/download.py | Command-line interface for downloads |
| errors.py | Exception hierarchy for download failures |

Sources: src/huggingface_hub/file_download.py:1-50

Core API Functions

hf_hub_download

The primary function for downloading a single file from the Hub.

from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bert-base-cased",
    filename="config.json",
    repo_type="model",
    revision="main",
    cache_dir="./hf_cache",
    token=True,
)

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| repo_id | str | Required | Repository identifier (e.g., "bert-base-cased") |
| filename | str | Required | Path to the file within the repository |
| repo_type | str | "model" | Type of repository: "model", "dataset", or "space" |
| revision | str | "main" | Git revision (branch, tag, or commit hash) |
| cache_dir | str or Path | None | Custom cache directory location |
| local_dir | str or Path | None | Directory to place the file without caching structure |
| force_download | bool | False | Force re-download even if cached |
| local_files_only | bool | False | Only return local files, fail if not cached |
| token | str or bool | None | Authentication token (True reads from config) |
| etag_timeout | float | 10 | Timeout in seconds for ETag fetch |
| tqdm_class | type | None | Custom tqdm class for progress bars |

Sources: src/huggingface_hub/file_download.py:100-200

snapshot_download

Downloads an entire repository to a local cache.

from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="stabilityai/stable-diffusion-2-1",
    repo_type="model",
    cache_dir="./models",
    ignore_patterns=["*.md", ".gitattributes"],
)

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| repo_id | str | Required | Repository identifier |
| repo_type | str | "model" | Type of repository |
| revision | str | None | Git revision to download |
| cache_dir | str or Path | None | Cache directory location |
| local_dir | str or Path | None | Mirror directory without cache structure |
| allow_patterns | list[str] | None | Glob patterns to include |
| ignore_patterns | list[str] | None | Glob patterns to exclude |
| force_download | bool | False | Force re-download of all files |
| local_files_only | bool | False | Only use local cache |
| token | str or bool | None | Authentication token |

Sources: src/huggingface_hub/_snapshot_download.py:1-150
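allow_patterns and ignore_patterns are glob patterns matched against each file's path in the repository. As we understand it, the library matches them with Python's fnmatch and expands a trailing "/" to mean "everything in this folder". The sketch below reproduces that behavior under those assumptions; filter_paths is a hypothetical name. Note that in fnmatch, "*" also matches "/", so "*.md" excludes Markdown files in subfolders too.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional, Sequence


def _with_folder_wildcard(patterns: Optional[Sequence[str]]) -> Optional[List[str]]:
    """Treat a trailing '/' as 'everything inside this folder' ('unet/' -> 'unet/*')."""
    if patterns is None:
        return None
    return [p + "*" if p.endswith("/") else p for p in patterns]


def filter_paths(
    paths: Iterable[str],
    allow_patterns: Optional[Sequence[str]] = None,
    ignore_patterns: Optional[Sequence[str]] = None,
) -> List[str]:
    """Keep paths matching at least one allow pattern (if given) and no ignore pattern."""
    allow = _with_folder_wildcard(allow_patterns)
    ignore = _with_folder_wildcard(ignore_patterns)
    kept: List[str] = []
    for path in paths:
        if allow is not None and not any(fnmatch(path, p) for p in allow):
            continue
        if ignore is not None and any(fnmatch(path, p) for p in ignore):
            continue
        kept.append(path)
    return kept
```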

Caching Mechanism

Cache Directory Structure

cache_dir/                                   # default: ~/.cache/huggingface/hub
├── .locks/                                  # Lock files for concurrent access
│   └── {repo_type}s--{namespace}--{repo_name}/
│       └── {etag}.lock
└── {repo_type}s--{namespace}--{repo_name}/
    ├── blobs/
    │   └── {etag}                           # File contents, stored once per version
    ├── refs/
    │   └── {branch}                         # Commit hash the branch last resolved to
    └── snapshots/
        └── {commit_hash}/
            └── {filename}                   # Symlink to ../../blobs/{etag}
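In the standard hub cache, each repository lives in a single folder whose name flattens the repo type and id (for example models--google-bert--bert-base-cased). Inside it, refs/{branch} files hold commit hashes and snapshots/{commit_hash}/ exposes the files. The sketch below re-implements that naming for illustration, assuming the "--" separator. It does not import the library, and the helper names other than repo_folder_name are hypothetical.

```python
from pathlib import Path


def repo_folder_name(repo_id: str, repo_type: str = "model") -> str:
    """Flatten a repo id into one folder name: 'org/name' -> 'models--org--name'."""
    return "--".join([f"{repo_type}s", *repo_id.split("/")])


def snapshot_file_path(
    cache_dir: Path, repo_id: str, commit_hash: str, filename: str, repo_type: str = "model"
) -> Path:
    """Location of a file as seen through a given snapshot (commit) of the repo."""
    return cache_dir / repo_folder_name(repo_id, repo_type) / "snapshots" / commit_hash / filename


def read_ref(cache_dir: Path, repo_id: str, ref: str = "main", repo_type: str = "model") -> str:
    """Commit hash recorded for a branch the last time it was resolved."""
    ref_file = cache_dir / repo_folder_name(repo_id, repo_type) / "refs" / ref
    return ref_file.read_text(encoding="utf-8").strip()
```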

Download Metadata

When a file is downloaded with local_dir (no cache structure), a small metadata file is written next to it so its freshness can be checked later:

# Stored in: {local_dir}/.cache/huggingface/download/{filename}.metadata
{commit_hash}
{etag}
{timestamp}

The system validates cached files by:

  1. Comparing local ETag with remote ETag
  2. Checking commit hash consistency
  3. Verifying file modification timestamps

Sources: src/huggingface_hub/_local_folder.py:50-120
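The three-line metadata format (commit hash, ETag, timestamp) can be read and written with a few lines of code. This is a sketch, not the library's implementation. It assumes a file counts as fresh only if its modification time is not later than the recorded timestamp. The helper names and the 1-second tolerance are our assumptions.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DownloadMetadata:
    commit_hash: str
    etag: str
    timestamp: float


def write_metadata(metadata_path: Path, commit_hash: str, etag: str) -> None:
    """Record which commit/ETag a local file came from, plus when it was recorded."""
    metadata_path.parent.mkdir(parents=True, exist_ok=True)
    metadata_path.write_text(f"{commit_hash}\n{etag}\n{time.time()}\n", encoding="utf-8")


def read_metadata(metadata_path: Path, file_path: Path) -> Optional[DownloadMetadata]:
    """Return the metadata only if it exists, parses, and the file is unmodified since."""
    try:
        lines = metadata_path.read_text(encoding="utf-8").splitlines()
        metadata = DownloadMetadata(lines[0], lines[1], float(lines[2]))
    except (FileNotFoundError, IndexError, ValueError):
        return None
    if not file_path.is_file():
        return None
    # 1 s of slack: filesystem mtimes can be coarser than time.time()
    if file_path.stat().st_mtime - 1 <= metadata.timestamp:
        return metadata
    return None
```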

Lock File Management

The library uses WeakFileLock to handle concurrent downloads safely:

# Simplified excerpt (arguments elided)
locks_dir = os.path.join(cache_dir, ".locks")
storage_folder = os.path.join(cache_dir, repo_folder_name(...))
paths = RepoFileDownloadPaths(...)
# Lock acquired before writing to cache
with WeakFileLock(paths.lock_path):
    ...  # Critical section: write file or metadata

Sources: src/huggingface_hub/file_download.py:300-350
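The lock makes sure only one process writes a given file. Inside the critical section, the usual pattern is to stream into a temporary file and then move it into place, so readers never see a half-written file. The sketch below shows that pattern with the standard library only. The lock itself is omitted, write_atomically is a hypothetical name, and the random temporary name means that, unlike a real downloader, it cannot resume an interrupted transfer.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_atomically(target: Path, chunks: Iterable[bytes]) -> Path:
    """Stream chunks into a temp file beside `target`, then move it into place.

    os.replace is atomic when source and destination are on the same filesystem,
    which is why the temp file is created in the target's own folder.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".incomplete")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the partial file on any failure, including KeyboardInterrupt.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target
```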

Download Workflow

Sequence Diagram

sequenceDiagram
    participant Client
    participant hf_hub_download
    participant Cache
    participant Server
    participant Metadata

    Client->>hf_hub_download: Call with repo_id, filename
    hf_hub_download->>Cache: Check cached file + metadata
    Cache-->>hf_hub_download: metadata (if exists)
    
    alt Cached file exists
        hf_hub_download->>Metadata: Validate ETag
        Metadata-->>hf_hub_download: is_valid
        alt Valid ETag
            hf_hub_download-->>Client: Return cached path
        else Invalid ETag
            hf_hub_download->>Server: HEAD request for ETag
        end
    else No cache
        hf_hub_download->>Server: HEAD request for ETag
    end
    
    Server-->>hf_hub_download: ETag, commit_hash, size
    hf_hub_download->>Cache: Check if file exists
    
    alt File not in cache
        hf_hub_download->>Server: GET request
        Server-->>hf_hub_download: File content
        hf_hub_download->>Cache: Write file + metadata
    end
    
    hf_hub_download-->>Client: Return file path

ETag Validation Process

The download system implements a three-tier validation strategy:

  1. ETag Match: Compare server ETag with local metadata
  2. SHA256 Hash: For LFS files, compute and compare SHA256
  3. Timestamp Check: Verify the file hasn't been modified since the metadata was saved

# ETag-based validation
if local_metadata is not None and local_metadata.etag == etag:
    write_download_metadata(...)
    return str(paths.file_path)

# SHA256-based validation (for LFS files)
if local_metadata is None and REGEX_SHA256.match(etag) is not None:
    with open(paths.file_path, "rb") as f:
        file_hash = sha_fileobj(f).hex()
    if file_hash == etag:
        write_download_metadata(...)
        return str(paths.file_path)

Sources: src/huggingface_hub/file_download.py:400-480
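The SHA256 branch above works because an LFS file's ETag is the SHA256 of its content, a 64-character hex string. Regular git-tracked files are, to our understanding, identified by a shorter git SHA1 instead, so the regex tells the two apart. The sketch below shows the same check with hashlib, hashing in chunks so large files are never loaded fully into memory. sha_fileobj in the excerpt is the library's own hasher; the names here are hypothetical.

```python
import hashlib
import re
from pathlib import Path

# A 64-character lowercase hex string: the shape of an LFS file's ETag (its SHA256).
SHA256_PATTERN = re.compile(r"^[0-9a-f]{64}$")


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_etag(path: Path, etag: str) -> bool:
    """True when the ETag looks like a SHA256 and equals the local file's hash."""
    return SHA256_PATTERN.match(etag) is not None and sha256_of_file(path) == etag
```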

Error Handling

Exception Hierarchy

graph TD
    A[Exception]
    A --> B[HfHubHTTPError]
    B --> C[RevisionNotFoundError]
    B --> F[RemoteEntryNotFoundError]
    A --> G[EntryNotFoundError]
    G --> F
    G --> H[LocalEntryNotFoundError]

Common Errors

| Exception | Trigger Condition |
| --- | --- |
| RevisionNotFoundError | Invalid Git revision (branch, tag, commit) |
| RemoteEntryNotFoundError | File not found on remote server |
| LocalEntryNotFoundError | File not in cache and the Hub cannot be used (e.g. local_files_only=True or no network) |
| HfHubHTTPError | Generic HTTP errors (401, 403, 404, 500, etc.) |

# Example: Handling download errors
from huggingface_hub import hf_hub_download
from huggingface_hub.errors import (
    LocalEntryNotFoundError,
    RemoteEntryNotFoundError,
    RevisionNotFoundError,
)

try:
    path = hf_hub_download("bert-base-cased", "config.json")
except RevisionNotFoundError as e:
    print(f"Revision not found: {e}")
except RemoteEntryNotFoundError as e:
    print(f"File not on server: {e}")
except LocalEntryNotFoundError as e:
    print(f"File not in cache and the Hub is unreachable: {e}")

Sources: src/huggingface_hub/errors.py:100-180

Command-Line Interface

CLI Download Command

The hf command-line tool provides download functionality (the legacy huggingface-cli command is deprecated):

# Download single file
hf download bert-base-cased config.json

# Download entire repo
hf download stabilityai/stable-diffusion-2-1

# With patterns
hf download meta-llama/Llama-2-7b --include "*.safetensors"

# Dry run
hf download bigscience/bloom-7b1 --dry-run

CLI Implementation

The CLI wraps the core download functions and adds:

  • Pretty-printed output formatting
  • Dry-run mode for previewing downloads
  • Pattern-based file selection
  • Progress indication

# From cli/download.py (simplified excerpt; other arguments omitted)
def run(self):
    if len(regular_filenames) == 1:
        # Single file: use hf_hub_download
        return hf_hub_download(
            repo_id=repo_id,
            filename=regular_filenames[0],
            # ...other options (revision, cache_dir, token, ...)
        )
    else:
        # Multiple files or patterns: use snapshot_download
        return snapshot_download(
            repo_id=repo_id,
            allow_patterns=allow_patterns,
            # ...other options
        )

Sources: src/huggingface_hub/cli/download.py:50-120

Advanced Usage

Dry Run Mode

Preview what would be downloaded without actually downloading:

from huggingface_hub import hf_hub_download, DryRunFileInfo

result = hf_hub_download(
    repo_id="bert-base-cased",
    filename="config.json",
    dry_run=True,
)

if isinstance(result, DryRunFileInfo):
    print(f"Will download: {result.filename}")
    print(f"Size: {result.file_size} bytes")
    print(f"Cached: {result.is_cached}")
    print(f"Commit: {result.commit_hash}")

Progress Tracking

Customize progress bar behavior:

from tqdm import tqdm
from huggingface_hub import hf_hub_download

class CustomProgress(tqdm):
    def set_postfix(self, **kwargs):
        self.set_postfix_str(f"ETA: {kwargs.get('eta', 'N/A')}")

hf_hub_download(
    repo_id="bigscience/bloom-7b1",
    filename="pytorch_model.bin",
    tqdm_class=CustomProgress,
)

Offline Mode

Work exclusively with cached files:

from huggingface_hub import hf_hub_download

# Will fail if file not cached
path = hf_hub_download(
    repo_id="bert-base-cased",
    filename="config.json",
    local_files_only=True,
)

Sources: src/huggingface_hub/file_download.py:450-500

Repository Types

The download system supports multiple repository types:

| Repo Type | Description | Typical Contents |
| --- | --- | --- |
| model | Model repositories | PyTorch/TensorFlow models, configs, tokenizer files |
| dataset | Dataset repositories | Data files, dataset card, scripts |
| space | Space repositories (Gradio, Docker, or static apps) | Application code, models, requirements |

Repository types affect URL construction:

# URL prefixes from constants
REPO_TYPES_URL_PREFIXES = {
    "model": "",
    "dataset": "datasets/",
    "space": "spaces/",
}

Sources: src/huggingface_hub/lfs.py:30-60
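Those prefixes end up in the download URL. The library exposes hf_hub_url for building it. The sketch below re-implements the idea for illustration, assuming the "{endpoint}/{prefix}{repo_id}/resolve/{revision}/{filename}" shape, with the revision fully URL-encoded (so "refs/pr/1" stays one path segment) and the filename encoded but keeping its slashes. resolve_url is a hypothetical name.

```python
from typing import Optional
from urllib.parse import quote

REPO_TYPES_URL_PREFIXES = {"dataset": "datasets/", "space": "spaces/"}


def resolve_url(
    repo_id: str,
    filename: str,
    repo_type: Optional[str] = None,
    revision: Optional[str] = None,
    endpoint: str = "https://huggingface.co",
) -> str:
    """Build a '/resolve/' download URL for one file (models get no prefix)."""
    prefix = REPO_TYPES_URL_PREFIXES.get(repo_type or "model", "")
    revision = revision or "main"
    return f"{endpoint}/{prefix}{repo_id}/resolve/{quote(revision, safe='')}/{quote(filename)}"
```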

Best Practices

  1. Use caching: Files are cached automatically; reuse cached files for subsequent runs
  2. Specify revisions: Pin specific commits for reproducible downloads
  3. Handle authentication: Use token=True to auto-read from config, or pass explicit tokens
  4. Prefer single file downloads: Use hf_hub_download for specific files instead of full snapshots
  5. Use patterns wisely: Combine allow_patterns and ignore_patterns for selective downloads

Summary

The file download system in huggingface_hub provides a robust, cached, and authenticated mechanism for retrieving files from the Hugging Face Hub. Key functions include:

  • hf_hub_download: Single file downloads with full validation
  • snapshot_download: Complete repository downloads with pattern filtering
  • CLI integration via hf download

The system handles caching, metadata validation, concurrent access, and error recovery transparently, making it suitable for production workloads requiring reliable artifact retrieval.

File Upload Operations

Related topics: Git LFS Large File Handling, File Download Operations, Repository Management API

Overview

File upload operations in huggingface_hub enable developers to publish and manage files on the Hugging Face Hub. The library provides a comprehensive set of tools for uploading individual files, entire folders, and handling large files through Git Large File Storage (LFS) integration.

The upload system is built on top of the Hub's git-based infrastructure, ensuring file versioning and integrity for all uploaded content. This architecture supports repositories of type model, dataset, and space.

Sources: CLAUDE.md

Architecture Overview

graph TD
    A[User Code] --> B[upload_file / upload_folder]
    B --> C[CommitOperation Classes]
    C --> D{Hub API}
    D --> E[Regular Files<br/>Direct Upload]
    D --> F[Large Files<br/>LFS Required]
    F --> G[lfs.py<br/>Batch Operations]
    G --> H[LFS Server]
    E --> I[Regular Git Server]
    H --> I

Core Components

CommitOperation Classes

The foundation of all upload operations is built on three operation classes defined in _commit_api.py:

| Class | Purpose | Key Attributes |
| --- | --- | --- |
| CommitOperationAdd | Add a file to a commit | path_or_fileobj, path_in_repo |
| CommitOperationDelete | Delete a file from a repository | path_in_repo |
| CommitOperationCopy | Copy a file within a repository | src_path_in_repo, path_in_repo |

Sources: src/huggingface_hub/_commit_api.py:1-100

CommitOperationAdd Details

@dataclass
class CommitOperationAdd:
    path_in_repo: str
    path_or_fileobj: Union[str, Path, bytes, BinaryIO]
    # Computed in __post_init__ (size, SHA256, sample); not passed by the caller
    upload_info: UploadInfo = field(init=False, repr=False)

The CommitOperationAdd class supports multiple input types:

| Input Type | Behavior |
| --- | --- |
| str / Path | File path; the file content is read for upload |
| bytes | Raw byte content |
| BinaryIO | File-like object with a read() method |

The class provides an as_file() context manager that yields a readable binary file object whatever the input type, with optional progress bar support:

# Excerpt from CommitOperationAdd (io, Path, contextmanager and tqdm_stream_file come from module-level imports)
@contextmanager
def as_file(self, with_tqdm: bool = False) -> Iterator[BinaryIO]:
    if isinstance(self.path_or_fileobj, str) or isinstance(self.path_or_fileobj, Path):
        if with_tqdm:
            # Wrap the file so that reads advance a progress bar
            with tqdm_stream_file(self.path_or_fileobj) as file:
                yield file
        else:
            with open(self.path_or_fileobj, "rb") as file:
                yield file
    elif isinstance(self.path_or_fileobj, bytes):
        yield io.BytesIO(self.path_or_fileobj)
    elif isinstance(self.path_or_fileobj, io.BufferedIOBase):
        # Restore the caller's read position once the upload is done
        prev_pos = self.path_or_fileobj.tell()
        yield self.path_or_fileobj
        self.path_or_fileobj.seek(prev_pos, io.SEEK_SET)

Sources: src/huggingface_hub/_commit_api.py:200-280
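Before uploading, the library records per-file information that lets the Hub decide how each file should be sent. As we understand UploadInfo in lfs.py, this consists of the SHA256 digest, the size, and a 512-byte sample from the start of the file. The sketch below computes the same three values for any of the supported input types. UploadInfoSketch and compute_upload_info are hypothetical names.

```python
import hashlib
import io
from dataclasses import dataclass
from pathlib import Path
from typing import BinaryIO, Union


@dataclass
class UploadInfoSketch:
    sha256: bytes  # raw 32-byte digest
    size: int
    sample: bytes  # first 512 bytes of the content


def compute_upload_info(
    source: Union[str, Path, bytes, BinaryIO], chunk_size: int = 1024 * 1024
) -> UploadInfoSketch:
    """Hash and measure a file path, raw bytes, or binary file object in one pass."""
    if isinstance(source, (str, Path)):
        with open(source, "rb") as f:
            return compute_upload_info(f, chunk_size)
    fileobj = io.BytesIO(source) if isinstance(source, bytes) else source
    start = fileobj.tell()
    digest, size, sample = hashlib.sha256(), 0, b""
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        if len(sample) < 512:
            sample += chunk[: 512 - len(sample)]
        digest.update(chunk)
        size += len(chunk)
    fileobj.seek(start)  # leave caller-provided file objects where we found them
    return UploadInfoSketch(sha256=digest.digest(), size=size, sample=sample)
```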

Upload Functions

upload_file

Uploads a single file to a repository on the Hub.

from huggingface_hub import upload_file

upload_file(
    path_or_fileobj="/home/lysandre/dummy-test/README.md",
    path_in_repo="README.md",
    repo_id="lysandre/test-model",
)

Sources: README.md

upload_folder

Uploads an entire folder to a repository. Handles nested directory structures and file filtering.

from huggingface_hub import upload_folder

upload_folder(
    folder_path="/path/to/local/space",
    repo_id="username/my-cool-space",
    repo_type="space",
)

Sources: README.md
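upload_folder walks the local folder and turns each file's relative path into a path inside the repository. The sketch below shows that mapping with POSIX-style separators and an optional prefix, similar in spirit to upload_folder's path_in_repo argument. plan_folder_upload is a hypothetical name. The sketch does not apply the library's default exclusions (such as the .git folder) or any allow/ignore patterns.

```python
from pathlib import Path
from typing import List, Tuple


def plan_folder_upload(folder: Path, path_in_repo: str = "") -> List[Tuple[Path, str]]:
    """Map every file under `folder` to a POSIX-style path inside the repository."""
    prefix = path_in_repo.strip("/")
    plan: List[Tuple[Path, str]] = []
    for local_path in sorted(p for p in folder.rglob("*") if p.is_file()):
        relative = local_path.relative_to(folder).as_posix()
        plan.append((local_path, f"{prefix}/{relative}" if prefix else relative))
    return plan
```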

LFS Integration

Git LFS Overview

Large files (typically files larger than 10MB) are handled through Git LFS. The library provides batch upload utilities in lfs.py for efficient LFS operations.

sequenceDiagram
    participant Client
    participant Hub API
    participant LFS Server
    
    Client->>Hub API: POST /lfs/objects/batch
    Note over Hub API: Check file sizes
    Hub API->>Client: Upload instructions
    alt Large Files
        Client->>LFS Server: Upload LFS objects
        LFS Server-->>Client: Success
    end
    Client->>Hub API: Complete commit
    Hub API-->>Client: Commit SHA

LFS Batch Upload Process

The lfs.py module provides upload_files_lfs_instances() which handles the LFS batch protocol:

| Parameter | Type | Description |
| --- | --- | --- |
| commit_operations | List[CommitOperationAdd] | Files to upload |
| repo_type | str | Repository type: "model", "dataset", "space" |
| repo_id | str | Repository identifier |
| revision | str | Git revision (default: "main") |
| endpoint | str | API endpoint URL |
| transfer_adapters | List[str] | Transfer methods: "basic", "multipart", "xet" |

Sources: src/huggingface_hub/lfs.py:50-150
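On the wire, the batch step is a POST that follows the Git LFS Batch API. The body lists the operation, the accepted transfer adapters, and one {oid, size} entry per object, using the JSON media type application/vnd.git-lfs+json. The sketch below only builds the request; it does not send it. The function name is hypothetical, the "{endpoint}/{prefix}{repo_id}.git/info/lfs/objects/batch" URL is our understanding of the Hub's endpoint, and the "xet" transfer is left out.

```python
import json
from typing import Dict, Iterable, Optional, Sequence, Tuple

LFS_HEADERS = {
    "Accept": "application/vnd.git-lfs+json",
    "Content-Type": "application/vnd.git-lfs+json",
}


def build_lfs_batch_request(
    repo_id: str,
    objects: Iterable[Tuple[str, int]],
    *,
    url_prefix: str = "",
    endpoint: str = "https://huggingface.co",
    transfers: Sequence[str] = ("basic", "multipart"),
    ref_name: Optional[str] = None,
) -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, JSON body) for a Git LFS batch 'upload' request.

    `objects` holds (sha256 hex oid, size in bytes) pairs; `url_prefix` is
    "datasets/" or "spaces/" for non-model repositories.
    """
    payload: Dict[str, object] = {
        "operation": "upload",
        "transfers": list(transfers),
        "objects": [{"oid": oid, "size": size} for oid, size in objects],
        "hash_algo": "sha256",
    }
    if ref_name is not None:
        payload["ref"] = {"name": ref_name}
    url = f"{endpoint}/{url_prefix}{repo_id}.git/info/lfs/objects/batch"
    return url, dict(LFS_HEADERS), json.dumps(payload)
```

The server's reply lists, per object, whether an upload is needed and where to send it, which is what the LfsBatchInfo structure below captures.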

LFS Batch Info Response

The LfsBatchInfo dataclass contains three elements:

@dataclass
class LfsBatchInfo:
    instructions: List["LfsUploadInfo"]
    errors: List["LfsError"]
    transfer_mode: "TransferMethod"

The upload process automatically determines which files require LFS handling based on file size thresholds configured by the Hub.
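The real decision is made by the Hub, which may also route small binary files or files matched by .gitattributes to LFS. The sketch below only illustrates the size rule, using the 10 MB figure from the LFS overview above as an assumed threshold. split_by_upload_mode is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed threshold taken from the "larger than 10MB" rule of thumb; the Hub decides per file.
LFS_SIZE_THRESHOLD = 10 * 1024 * 1024


def split_by_upload_mode(
    files: Iterable[Tuple[str, int]], threshold: int = LFS_SIZE_THRESHOLD
) -> Dict[str, List[str]]:
    """Group (path_in_repo, size_in_bytes) pairs into 'regular' and 'lfs' buckets by size alone."""
    modes: Dict[str, List[str]] = {"regular": [], "lfs": []}
    for path, size in files:
        modes["lfs" if size > threshold else "regular"].append(path)
    return modes
```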

Large Folder Upload

For repositories with many files or very large folder structures, _upload_large_folder.py backs HfApi.upload_large_folder, which uploads in resumable chunks spread over several commits:

# Resumable, multi-worker upload for very large folders
from huggingface_hub import HfApi

HfApi().upload_large_folder(
    folder_path="/path/to/large/repo",
    repo_id="user/large-model",
    repo_type="model",  # required by upload_large_folder
    allow_patterns=["*.bin", "*.safetensors", "config.json"],
    ignore_patterns=["*.git*", "__pycache__/*"],
)

Sources: src/huggingface_hub/_upload_large_folder.py

Upload Workflow

graph TD
    A[Start Upload] --> B{File Size Check}
    B -->|Small File| C[Direct Git Upload]
    B -->|Large File| D[LFS Upload Required]
    C --> E[Create Commit]
    D --> F[Batch Request to LFS]
    F --> G[Get Upload Instructions]
    G --> H[Upload to LFS Server]
    H --> E
    E --> I[Commit to Hub]
    I --> J[Return Commit SHA]

Configuration Options

Repository Types

| Type | Description | Typical Use |
| --- | --- | --- |
| model | Model repositories | Trained weights, configs |
| dataset | Dataset repositories | Data files, metadata |
| space | Space repositories | Demo applications |

Common Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| repo_id | Yes | - | Namespace/repo name |
| repo_type | No | "model" | Type of repository |
| revision | No | "main" | Git branch/tag |
| token | No | None | HF token for auth |
| create_pr | No | False | Create PR instead of commit |

Error Handling

CommitOperationAdd Error Handling

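CommitOperationAdd has no option for suppressing errors. It validates its input as soon as it is created: if path_or_fileobj is a string that does not point to an existing local file, a ValueError is raised immediately.

from huggingface_hub import CommitOperationAdd

try:
    operation = CommitOperationAdd(path_or_fileobj="file.bin", path_in_repo="model.bin")
except ValueError as err:
    # "file.bin" is missing or is not a regular file
    print(f"Cannot add file: {err}")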

Upload Errors

| Error Type | Cause | Resolution |
|---|---|---|
| HfHubHTTPError | Server rejection | Check token permissions |
| ValueError | Invalid parameters | Validate repo_id, path_in_repo |
| LocalUploadNotImplementedError | Unsupported local upload | Use file path instead |

Best Practices

  1. Atomic commits: Use upload_folder for multiple files so they land in a single commit
  2. Token Authentication: Uploads require a token with write access; log in with hf auth login or pass token explicitly
  3. File Filtering: Use allow_patterns and ignore_patterns for large repos
  4. Progress Tracking: tqdm progress bars are shown by default for long uploads; set HF_HUB_DISABLE_PROGRESS_BARS=1 to hide them

from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="./model",
    repo_id="username/my-model",
    repo_type="model",
    token=True,  # Read the token saved locally by `hf auth login`
)

Module Structure Summary

| File | Responsibility |
|---|---|
| _commit_api.py | Core commit operations and operation classes |
| _upload_large_folder.py | Chunked folder uploads |
| lfs.py | Git LFS batch upload protocol implementation |
| _local_folder.py | Local folder scanning and filtering |
| hf_api.py | High-level HfApi methods for upload |

Sources: src/huggingface_hub/_commit_api.py:1-100

Git LFS Large File Handling

Related topics: File Upload Operations, File Download Operations


Overview

Git LFS (Large File Storage) is a Git extension that handles large files by storing binary content outside the Git repository while maintaining a lightweight pointer file within it. The huggingface_hub library implements comprehensive LFS support to manage large model weights, datasets, and other binary assets on the Hugging Face Hub.
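A pointer file, as defined by the Git LFS specification (v1), is a small text file recording the spec version, the SHA256 object ID, and the content size. The sketch below builds one from raw bytes. lfs_pointer is a hypothetical helper, not a huggingface_hub function.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build a Git LFS v1 pointer file for `content` (hypothetical helper)."""
    oid = hashlib.sha256(content).hexdigest()  # 64-char hex SHA256
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


print(lfs_pointer(b"hello"))
```

Git stores only this text in the repository, while the actual bytes live in LFS storage under the same SHA256 OID.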

In the huggingface_hub ecosystem, LFS files are distinguished from regular Git-tracked files through their content addressing:

| File Type | Storage Method | Identifier | Location in Cache |
|---|---|---|---|
| Regular Git file | Git blob SHA-1 | 40-char hex string | blobs/ |
| LFS file | SHA256 hash | 64-char hex string | blobs/ |

Sources: src/huggingface_hub/lfs.py:1-50

Architecture

Cache Directory Structure

When files are downloaded from the Hub, they are stored in a hierarchical cache structure:

graph TD
    A["Cache Root<br/>~/.cache/huggingface/hub/"] --> B["models--{repo_id}"]
    A --> C["datasets--{repo_id}"]
    A --> D["spaces--{repo_id}"]
    
    B --> E["blobs/"]
    B --> F["refs/"]
    B --> G["snapshots/"]
    
    E --> H["git-sha<br/>40-char"]
    E --> I["sha256<br/>64-char (LFS)"]
    
    G --> J["{commit_hash}/"]
    J --> K["filename → symlink → blob"]

Sources: src/huggingface_hub/file_download.py:1-30

LFS Upload Workflow

The upload process follows a batch-oriented approach using the LFS Batch API:

sequenceDiagram
    participant Client
    participant Hub as HF Hub API
    participant LFS as LFS Server
    
    Client->>Hub: POST /{repo_type}/{repo_id}.git/info/lfs/objects/batch
    Note over Hub,LFS: Batch request includes<br/>upload instructions request
    Hub->>LFS: Check upload eligibility
    LFS-->>Hub: Upload instructions (presigned URLs)
    Hub-->>Client: LfsBatchInfo with actions
    
    alt basic/multipart transfer
        Client->>LFS: PUT file content directly
        LFS-->>Client: 200 OK
    else xet transfer
        Client->>Hub: Use custom xet protocol
    end
    
    Client->>Hub: POST to the verify URL returned in the batch response
    Note over Client: Confirm upload completion
    Hub->>LFS: Verify file content
    LFS-->>Hub: Verification result
    Hub-->>Client: Commit ready

Sources: src/huggingface_hub/lfs.py:60-120

Core Components

LFS Module (`src/huggingface_hub/lfs.py`)

The main LFS module provides type definitions and utilities for handling LFS operations.

#### Constants

| Constant | Value | Purpose |
|---|---|---|
| LFS_MULTIPART_UPLOAD_COMMAND | "lfs-multipart-upload" | Identifier for multipart upload operations |
| OID_REGEX | ^[0-9a-f]{40}$ | Pattern for validating Git object identifiers |
| LFS_HEADERS | Dict | Accept and content type headers for LFS API |

Sources: src/huggingface_hub/lfs.py:40-55

#### LFS Headers

LFS_HEADERS = {
    "Accept": "application/vnd.git-lfs+json",
    "Content-Type": "application/vnd.git-lfs+json",
}

These headers indicate that all LFS API communications use JSON with the vnd.git-lfs+json media type, following the LFS specification.

Sources: src/huggingface_hub/lfs.py:50-55
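The request posted to the batch endpoint is a JSON document sent with the LFS_HEADERS shown above. This minimal sketch uses the field names from the Git LFS Batch API specification (operation, transfers, objects, hash_algo, ref). build_batch_request is a hypothetical helper, and the library's actual payload may carry additional fields.

```python
import hashlib
import json
from typing import Optional

LFS_HEADERS = {
    "Accept": "application/vnd.git-lfs+json",
    "Content-Type": "application/vnd.git-lfs+json",
}


def build_batch_request(
    files: list[bytes],
    transfers: Optional[list[str]] = None,
    revision: Optional[str] = None,
) -> tuple[dict, str]:
    """Return (headers, JSON body) for an LFS batch upload request (hypothetical helper)."""
    payload: dict = {
        "operation": "upload",
        # Offer the default adapters unless the caller narrows or extends the list
        "transfers": transfers if transfers is not None else ["basic", "multipart"],
        # One entry per object: SHA256 OID plus size in bytes
        "objects": [{"oid": hashlib.sha256(data).hexdigest(), "size": len(data)} for data in files],
        "hash_algo": "sha256",
    }
    if revision is not None:
        # Optional ref the objects belong to
        payload["ref"] = {"name": revision}
    return LFS_HEADERS, json.dumps(payload)


headers, body = build_batch_request([b"weights"], revision="main")
print(headers["Content-Type"], json.loads(body)["objects"][0]["size"])
```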

LFS Utilities (`src/huggingface_hub/utils/_lfs.py`)

Utility functions for LFS operations include:

| Utility | Purpose |
|---|---|
| SliceFileObj | Context manager for slicing file objects during multipart uploads |
| SHA utilities | Calculate SHA256 for LFS file content verification |
| Content range handling | Manage byte ranges for resumable uploads |

Sources: src/huggingface_hub/utils/_lfs.py

API Reference

LfsBatchInfo

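Per the Git LFS Batch API specification, the server response contains a transfer field naming the adapter it selected, and an objects list in which each entry carries either its own actions (such as upload and verify) or an error. Modeled as a dataclass:

from dataclasses import dataclass
from typing import Optional


@dataclass
class LfsBatchInfo:
    """Information returned by the LFS batch API (illustrative model)."""

    objects: list[dict]
    """One entry per file: oid, size, and either `actions` (upload, verify) or `error`."""

    transfer: Optional[str] = None
    """Adapter selected by the server (e.g., 'basic', 'multipart', 'xet'); basic if omitted."""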

Sources: src/huggingface_hub/lfs.py:55-80

Upload Information Classes

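Each file to upload is described by an UploadInfo dataclass, which holds what the batch API needs:

| Field | Type | Purpose |
|---|---|---|
| sha256 | bytes | SHA256 digest of the content (its hex form is the LFS OID) |
| size | int | File size in bytes |
| sample | bytes | First 512 bytes of the content, sent to the Hub to help decide the upload mode |

Instances are built with UploadInfo.from_path(), UploadInfo.from_bytes(), or UploadInfo.from_fileobj().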

Sources: src/huggingface_hub/_commit_api.py:1-100
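The SHA256 digest, size, and content sample that an upload needs can be computed with hashlib alone. The attribute names below follow huggingface_hub's UploadInfo (sha256, size, sample), but UploadInfoSketch is a hypothetical stand-in written for illustration.

```python
import hashlib
import io
from dataclasses import dataclass
from typing import BinaryIO


@dataclass
class UploadInfoSketch:
    sha256: bytes  # raw 32-byte digest (hex-encode it to get the LFS OID)
    size: int  # total size in bytes
    sample: bytes  # first 512 bytes of content

    @classmethod
    def from_fileobj(cls, fileobj: BinaryIO, chunk_size: int = 1024 * 1024) -> "UploadInfoSketch":
        sample = fileobj.read(512)
        hasher = hashlib.sha256(sample)
        size = len(sample)
        # Stream the rest so large files never sit fully in memory
        while chunk := fileobj.read(chunk_size):
            hasher.update(chunk)
            size += len(chunk)
        return cls(sha256=hasher.digest(), size=size, sample=sample)


info = UploadInfoSketch.from_fileobj(io.BytesIO(b"x" * 1000))
print(info.size, len(info.sample), info.sha256.hex() == hashlib.sha256(b"x" * 1000).hexdigest())
```

Hashing in chunks yields the same digest as hashing the whole content at once, which is why the file can be streamed.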

Transfer Adapters

The Hugging Face Hub supports multiple LFS transfer methods, negotiated during the batch API handshake:

Supported Transfer Methods

| Transfer Method | Description | Use Case |
|---|---|---|
| basic | Direct HTTP PUT upload | Small to medium files |
| multipart | Chunked upload for very large files | Files > 100MB |
| xet | Custom xet protocol for optimized transfers | High-performance scenarios |

Sources: src/huggingface_hub/lfs.py:60-100

Transfer Method Selection

The client sends supported transfer methods in the batch request:

payload: dict = {
    "operation": "upload",
    "transfers": transfers if transfers is not None else ["basic", "multipart"],
    ...
}

The server responds with the transfer adapter it will use, which the client then employs for the actual upload.

Sources: src/huggingface_hub/lfs.py:85-95
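On the client side, handling the negotiated adapter amounts to checking the server's choice against what was offered. This minimal sketch follows the Git LFS specification, under which the client assumes the basic adapter when the response omits transfer. choose_transfer is hypothetical.

```python
from typing import Optional


def choose_transfer(offered: list[str], response: dict) -> str:
    """Return the adapter to use for this batch (hypothetical helper)."""
    chosen: Optional[str] = response.get("transfer")
    if chosen is None:
        # Per the Git LFS spec, an omitted `transfer` means the basic adapter
        return "basic"
    if chosen not in offered:
        raise ValueError(f"Server selected transfer {chosen!r}, which was not offered: {offered}")
    return chosen


print(choose_transfer(["basic", "multipart"], {"transfer": "multipart"}))
```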

Large File Identification

Size Thresholds

Files are treated as LFS content when they exceed certain thresholds:

| File Size | Action |
|---|---|
| < 5MB | Stored as regular Git blob |
| >= 5MB | Redirected to LFS storage |

OID (Object Identifier) Format

LFS files are identified by their SHA256 hash, represented as a 64-character hexadecimal string:

Pattern: ^[0-9a-f]{64}$
Example: 403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd

Regular Git blobs use 40-character SHA-1 identifiers (the format matched by OID_REGEX), while LFS files use 64-character SHA256 identifiers.

Sources: src/huggingface_hub/lfs.py:45
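Both identifier formats can be reproduced with hashlib. Git hashes a blob as SHA-1 over a "blob <size>\0" header followed by the content, while LFS uses a plain SHA256 of the content. This is a minimal sketch; the function names are hypothetical.

```python
import hashlib
import re

GIT_OID = re.compile(r"^[0-9a-f]{40}$")  # same pattern as OID_REGEX
LFS_OID = re.compile(r"^[0-9a-f]{64}$")


def git_blob_oid(content: bytes) -> str:
    """SHA-1 of the Git object header plus content, as `git hash-object` computes it."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def lfs_oid(content: bytes) -> str:
    """SHA256 of the raw content, as used for LFS objects."""
    return hashlib.sha256(content).hexdigest()


print(git_blob_oid(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (Git's empty blob)
print(lfs_oid(b""))
```

Because the two hashes have different lengths, the identifier alone tells whether a cached blob came from a regular file or from LFS.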

Multipart Upload for Large Files

Upload Process

graph LR
    A[File] --> B{Split into chunks}
    B --> C[Chunk 1]
    B --> D[Chunk 2]
    B --> E[Chunk N]
    
    C --> F[Upload Part 1]
    D --> G[Upload Part 2]
    E --> H[Upload Part N]
    
    F --> I{All parts<br/>complete?}
    G --> I
    H --> I
    
    I -->|Yes| J[Complete multipart<br/>upload]

Chunk Size Calculation

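The chunk size is set by the server in its upload instructions, and the number of parts (and of presigned part URLs) follows from it:

from math import ceil

# file_size and chunk_size are in bytes; the last part may be shorter than chunk_size
num_parts = ceil(file_size / chunk_size)

Every byte falls in exactly one part, and only the final part can be smaller than chunk_size.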

Sources: src/huggingface_hub/lfs.py:30-35
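For a fixed chunk_size, each part covers a known byte range. This sketch computes those ranges as (offset, length) pairs. part_ranges is a hypothetical helper; a utility such as SliceFileObj can then read each range without loading the whole file.

```python
from math import ceil


def part_ranges(file_size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return (offset, length) for each multipart part (hypothetical helper)."""
    num_parts = ceil(file_size / chunk_size)
    return [
        # The last part gets whatever bytes remain
        (i * chunk_size, min(chunk_size, file_size - i * chunk_size))
        for i in range(num_parts)
    ]


print(part_ranges(file_size=10, chunk_size=4))  # [(0, 4), (4, 4), (8, 2)]
```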

Integration with Commit API

CommitOperationAdd with LFS

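The CommitOperationAdd dataclass handles both regular and LFS file uploads. A simplified view (private bookkeeping fields omitted):

from dataclasses import dataclass, field
from pathlib import Path
from typing import BinaryIO, Union


@dataclass
class CommitOperationAdd:
    path_in_repo: str  # destination path inside the repository
    path_or_fileobj: Union[str, Path, bytes, BinaryIO]  # local path, raw bytes, or open binary file
    upload_info: "UploadInfo" = field(init=False, repr=False)  # sha256, size, sample; computed on creation

The upload_info attribute is computed from the content when the operation is created. The Hub uses it during the preupload step to decide whether the file goes through LFS or is committed as a regular file.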

Sources: src/huggingface_hub/_commit_api.py:100-150

Upload Flow

flowchart TD
    A[Create CommitOperationAdd] --> B{File size<br/>> threshold?}
    
    B -->|Yes| C[Create LfsUploadFileInfo]
    B -->|No| D[Create UploadInfo for Git]
    
    C --> E[Upload via LFS Batch API]
    D --> F[Upload via Git HTTP API]
    
    E --> G{LFS transfer<br/>method}
    G -->|basic| H[Single PUT request]
    G -->|multipart| I[Chunked upload]
    G -->|xet| J[Custom xet protocol]
    
    H --> K[Verify upload]
    I --> K
    J --> K
    
    K --> L[Commit confirmation]
    F --> L

Error Handling

Validation Errors

| Error | Condition | Handling |
|---|---|---|
| Invalid OID | Not matching OID_REGEX | Raise ValueError |
| Missing upload info | upload_info not set | Raise ValueError |
| Malformed batch response | Missing required fields | Raise HfHubHTTPError |

Network Errors

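The library implements automatic retry with exponential backoff for failed LFS operations:

from huggingface_hub.utils import hf_raise_for_status, http_backoff

# upload_url and data come from the batch response and the file being uploaded
response = http_backoff("PUT", upload_url, data=data)  # retried with exponential backoff on transient errors
hf_raise_for_status(response)  # raise an HfHubHTTPError on an HTTP error status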

Sources: src/huggingface_hub/lfs.py:50-80
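Exponential backoff doubles the wait after each failed attempt, up to a cap. This sketch shows that schedule. The parameter names mirror those of http_backoff (max_retries, base_wait_time, max_wait_time), but backoff_schedule itself is hypothetical and its default values are assumptions.

```python
def backoff_schedule(max_retries: int = 5, base_wait_time: float = 1, max_wait_time: float = 8) -> list[float]:
    """Seconds to sleep before each retry: doubling from base_wait_time, capped at max_wait_time."""
    delays = []
    wait = base_wait_time
    for _ in range(max_retries):
        delays.append(min(wait, max_wait_time))
        wait *= 2
    return delays


print(backoff_schedule())  # [1, 2, 4, 8, 8]
```

Capping the wait keeps the total retry time bounded even when max_retries is raised.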

Configuration

Environment Variables

| Variable | Effect |
|---|---|
| HF_ENDPOINT | Override the default endpoint https://huggingface.co |
| HF_TOKEN | Authentication token for private repos |

Upload Options

When uploading files, the following options control LFS behavior:

| Parameter | Type | Default | Description |
|---|---|---|---|
| transfers | list[str] | ["basic", "multipart"] | Allowed transfer methods |
| endpoint | str | Hub endpoint | LFS server endpoint |
| repo_type | str | "model" | Repository type |
| repo_id | str | Required | Repository identifier |

Sources: src/huggingface_hub/lfs.py:60-110

Best Practices

File Organization

  1. Group large binary files - Store model weights and dataset files separately from code
  2. Avoid tiny LFS files - Files under about 5MB gain little from LFS and add per-file overhead
  3. Leverage symlinks - The snapshot directory uses symlinks to avoid duplication

Upload Optimization

  1. Prefer multipart for files > 100MB - Enables parallel uploads and resumability
  2. Enable xet for frequent transfers - The xet protocol reduces bandwidth
  3. Batch operations - Pass many CommitOperationAdd objects to a single create_commit call to minimize round trips

Caching Strategy

~/.cache/huggingface/hub/
└── models--{repo_id}/
    ├── blobs/          # Physical file storage (SHA256 for LFS)
    ├── refs/           # Revision pointers
    └── snapshots/      # Virtual files pointing to blobs

The snapshot layer provides deduplication - the same blob referenced by multiple commits is stored only once.

Sources: src/huggingface_hub/file_download.py:10-30

See Also

Sources: src/huggingface_hub/lfs.py:1-50

Repository Management API

Related topics: File Upload Operations, Authentication System, Cache Management System


The Repository Management API in huggingface_hub provides a comprehensive interface for creating, configuring, and managing Hugging Face repositories (models, datasets, and Spaces) directly from Python code or via the command-line interface.

Overview

The Repository Management API serves as the core layer for all Hub repository operations, enabling developers to programmatically:

  • Create and delete repositories
  • Configure repository settings
  • Upload and download files
  • Manage repository metadata
  • Handle commit operations with Git LFS support

Sources: src/huggingface_hub/README.md

Architecture

The repository management functionality is distributed across multiple modules:

graph TD
    A[Repository Management API] --> B[hf_api.py]
    A --> C[_commit_api.py]
    A --> D[_buckets.py]
    B --> E[REST API Client]
    C --> E
    B --> F[CLI Interface]
    F --> G[hf CLI Command]

Core Components

| Module | Purpose |
|---|---|
| hf_api.py | Main HfApi class with all CRUD operations |
| _commit_api.py | Low-level commit operations, LFS handling |
| _buckets.py | Bucket/S3-compatible storage management |
| cli/ | Command-line interface implementation |

Sources: CLAUDE.md

Repository Types

The API supports three primary repository types:

| Type | Description | Use Case |
|---|---|---|
| model | Model repositories | Storing and sharing ML model weights |
| dataset | Dataset repositories | Hosting and versioning datasets |
| space | Space repositories | Hosting Gradio/Streamlit demos |

Core Operations

Repository Lifecycle

graph LR
    A[create_repo] --> B[Update Settings]
    B --> C[Upload Files]
    C --> D[Manage Commits]
    D --> E[delete_repo]

#### Creating a Repository

from huggingface_hub import create_repo, HfApi

# Using HfApi class
api = HfApi()
api.create_repo(
    repo_id="username/my-model",
    repo_type="model",
    exist_ok=False
)

# Using convenience function
create_repo(
    repo_id="super-cool-model",
    token="hf_xxxxx"
)

#### Deleting a Repository

api.delete_repo(repo_id="username/my-model", repo_type="model")

Sources: src/huggingface_hub/hf_api.py

Repository Settings

Update repository configuration after creation:

api.update_repo_settings(
    repo_id="username/my-model",
    private=True,
    repo_type="model",
    gated="auto",  # Enable gated access: "auto" or "manual" approval (False disables it)
)

Listing Repository Contents

# List all files in a repository
files = api.list_repo_files(repo_id="tiiuae/falcon-7b-instruct")

# Iterate over files and folders (results are fetched page by page)
objects = api.list_repo_tree(
    repo_id="my-org/my-dataset",
    repo_type="dataset",
    recursive=True,
)

Commit Operations

Commit Operation Classes

The _commit_api.py module provides low-level commit primitives:

| Class | Purpose |
|---|---|
| CommitOperationAdd | Add a file to the repository |
| CommitOperationDelete | Remove a file from the repository |
| CommitOperationCopy | Copy a file within the repository |

from huggingface_hub import CommitOperationAdd, HfApi

api = HfApi()
operations = [
    CommitOperationAdd(
        path_in_repo="config.json",
        path_or_fileobj="/local/path/config.json"
    ),
]

api.create_commit(
    repo_id="username/my-model",
    operations=operations,
    commit_message="Add config file"
)

Sources: src/huggingface_hub/_commit_api.py

Large File Upload (Git LFS)

Large files are automatically handled via Git LFS:

graph TD
    A[File > 10MB] --> B{LFS Required?}
    B -->|Yes| C[Upload to LFS Storage]
    B -->|No| D[Upload as Regular File]
    C --> E[Create LFS Pointer]
    E --> F[Commit Pointer to Repo]
    D --> F

File Upload Operations

Single File Upload

from huggingface_hub import upload_file

upload_file(
    path_or_fileobj="/home/user/model.bin",
    path_in_repo="pytorch_model.bin",
    repo_id="username/my-model",
)

Folder Upload

from huggingface_hub import upload_folder

upload_folder(
    folder_path="/path/to/local/space",
    repo_id="username/my-cool-space",
    repo_type="space",
    commit_message="Update space files"
)

For very large folders, the library provides a resumable upload that spreads the files over several commits:

from huggingface_hub import HfApi

api = HfApi()
api.upload_large_folder(
    repo_id="username/large-dataset",
    folder_path="/data/large-folder",
    repo_type="dataset",
)

Sources: src/huggingface_hub/README.md

File Deletion Operations

# Delete a single file
api.delete_file(
    path_in_repo="old-model.bin",
    repo_id="username/my-model",
    commit_message="Remove deprecated file"
)

# Delete a folder
api.delete_folder(
    path_in_repo="old-folder/",
    repo_id="username/my-model",
    commit_message="Clean up old directory"
)

CLI Interface

The repository management features are exposed through the hf CLI:

# Authentication
hf auth login
hf auth logout
hf auth whoami

# Repository operations
hf repos create username/my-model --type model
hf repos create username/my-dataset --type dataset

Sources: setup.py

Configuration Parameters

Repository Creation Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| repo_id | str | Required | Repository identifier (user/name or org/name) |
| repo_type | str | "model" | Type: "model", "dataset", or "space" |
| exist_ok | bool | False | Do not raise an error if the repository already exists (nothing is overwritten) |
| private | bool | False | Make repository private |
| token | str | None | Hugging Face authentication token |
| space_sdk | str | None | Space SDK: "gradio", "streamlit", "docker", "docker_leaf", "static", "nextjs" |
| space_hardware | str | None | Space hardware tier |

Commit Operation Parameters

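| Parameter | Type | Default | Description |
|---|---|---|---|
| operations | list[CommitOperation] | Required | List of file operations |
| commit_message | str | Required | Summary of the changes |
| commit_description | str | None | Extended commit description |
| parent_commit | str | None | SHA of the commit the change is based on; the commit fails if the branch head has moved since (guards against concurrent updates) |
| create_pr | bool | False | Open a Pull Request instead of committing directly to the target branch |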

Error Handling

The repository management API raises specific exception types:

| Exception | Cause |
|---|---|
| RepositoryNotFoundError | Repository does not exist or user lacks access |
| RevisionNotFoundError | Specified git revision not found |
| EntryNotFoundError | File or folder not found in repository |
| HfHubHTTPError | HTTP error from the Hub API |

from huggingface_hub import hf_hub_download
from huggingface_hub.errors import RevisionNotFoundError

try:
    hf_hub_download(
        repo_id="bert-base-cased",
        filename="config.json",
        revision="<non-existent-revision>"
    )
except RevisionNotFoundError as e:
    print(f"Revision not found: {e}")

Sources: src/huggingface_hub/errors.py

Common Usage Patterns

Model Publishing Workflow

graph TD
    A[Create Repo] --> B[Upload Model Files]
    B --> C[Create Model Card]
    C --> D[Set Metadata/Tags]
    D --> E[Publish to Hub]
from huggingface_hub import HfApi, ModelCard, ModelCardData, upload_file

api = HfApi()

# 1. Create repository
api.create_repo(repo_id="my-org/my-model", exist_ok=True)

# 2. Upload model files
upload_file(
    path_or_fileobj="./model.bin",
    path_in_repo="pytorch_model.bin",
    repo_id="my-org/my-model"
)

# 3. Create the model card and push it as the repo's README.md
card = ModelCard.from_template(
    ModelCardData(
        language="en",
        license="apache-2.0",
        model_name="My Custom Model",
        tags=["pytorch", "image-classification"]
    ),
    model_description="This is a custom model trained on...",
)
card.push_to_hub("my-org/my-model")

Dataset Versioning Workflow

from huggingface_hub import CommitOperationAdd, create_commit

operations = [
    CommitOperationAdd(
        path_in_repo="data/train.parquet",
        path_or_fileobj="./train.parquet"
    ),
    CommitOperationAdd(
        path_in_repo="data/validation.parquet",
        path_or_fileobj="./validation.parquet"
    ),
]

create_commit(
    repo_id="username/my-dataset",
    repo_type="dataset",  # required for dataset repos; the default is "model"
    operations=operations,
    commit_message="Add training and validation splits",
    commit_description="Initial dataset release with train/validation split"
)

Best Practices

  1. Use exist_ok=True when creating repositories in automated pipelines
  2. Write descriptive commit messages for a readable version history
  3. Use the parent_commit parameter when making sequential updates to prevent race conditions
  4. Rely on automatic LFS handling: large files (over 10MB) are routed through LFS without any manual Git LFS setup
  5. Use create_pr=True to review changes before merging them into the main branch

See Also

Sources: [src/huggingface_hub/README.md](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/README.md)

Cache Management System

Related topics: File Download Operations, Repository Management API


Overview

The Cache Management System in huggingface_hub provides comprehensive utilities for managing locally cached models, datasets, and Spaces downloaded from the Hugging Face Hub. The system handles automatic caching of downloaded content, tracks cache metadata, and offers programmatic and CLI interfaces for inspecting and managing cached resources.

The cache system is designed to:

  • Store downloaded files efficiently with deduplication via blob storage
  • Track repository metadata including revisions, commit hashes, and file information
  • Provide safe deletion strategies that don't corrupt cache state
  • Handle corrupted cache entries gracefully with warnings

Cache Directory Structure

The Hugging Face cache follows a specific directory structure to organize cached content:

HF_HUB_CACHE/
├── .locks/                         # Lock files for concurrent access
│   └── ...
├── CACHEDIR.TAG                    # OS-native cache directory marker
├── models--owner--repo_name/
│   ├── .no_exist/                  # Markers for files known not to exist at a revision
│   ├── blobs/                      # Actual file content (deduplicated)
│   ├── refs/                       # Branch and tag references
│   └── snapshots/                  # Symlinks to blobs
├── datasets--org--dataset_name/
│   └── ...
└── spaces--user--space_name/
    └── ...

Cache directories follow the naming convention {type}s--{repo_id} where:

  • the type prefix is plural (models, datasets, spaces)
  • slashes in repo_id are replaced by double hyphens (e.g., the dataset google/fleurs is stored in datasets--google--fleurs)

Sources: src/huggingface_hub/utils/_cache_manager.py:1-100
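The folder naming rule fits in a few lines. This is a sketch, and cache_folder_name is a hypothetical helper name.

```python
def cache_folder_name(repo_id: str, repo_type: str = "model") -> str:
    """Map a repo to its cache folder, e.g. ('google/fleurs', 'dataset') -> 'datasets--google--fleurs'."""
    if repo_type not in ("model", "dataset", "space"):
        raise ValueError(f"Invalid repo_type: {repo_type}")
    # Plural type prefix, then the repo_id with '/' replaced by '--'
    return f"{repo_type}s--" + repo_id.replace("/", "--")


print(cache_folder_name("google/fleurs", "dataset"))  # datasets--google--fleurs
```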

Core Data Models

HFCacheInfo

The main container class for cache information returned by scan operations:

| Attribute | Type | Description |
|---|---|---|
| repos | frozenset[CachedRepoInfo] | All cached repositories |
| size_on_disk | int | Total size of all cached content in bytes |
| warnings | list[CorruptedCacheException] | Issues encountered during scanning |

CachedRepoInfo

Represents a single cached repository:

| Attribute | Type | Description |
|---|---|---|
| repo_id | str | Repository identifier (e.g., google/gemma-3-4b-it) |
| repo_type | str | Type: model, dataset, or space |
| size_on_disk | int | Total size of cached revisions |
| revisions | frozenset[CachedRevisionInfo] | All cached revisions |
| repo_path | Path | Path to the cached repository folder |

CachedRevisionInfo

Represents a specific revision within a cached repository:

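| Attribute | Type | Description |
|---|---|---|
| commit_hash | str | Git commit hash (40-character hex string) |
| snapshot_path | Path | Path to this revision's snapshot folder |
| size_on_disk | int | Size of this specific revision |
| files | frozenset[CachedFileInfo] | Files in this revision |
| last_modified | float | Last modification time (Unix timestamp) |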

CachedFileInfo

Represents an individual cached file:

| Attribute | Type | Description |
|---|---|---|
| file_name | str | Name of the file |
| size_on_disk | int | Size of the file in bytes |
| file_path | Path | Path to the symlinked file in snapshots |
| blob_path | Path | Path to the actual blob storage |

Sources: src/huggingface_hub/utils/_cache_manager.py:1-100

Key API Functions

scan_cache_dir()

Scans the cache directory and returns information about all cached repositories.

from huggingface_hub import scan_cache_dir

cache_info = scan_cache_dir()
print(f"Total size: {cache_info.size_on_disk / 1024 / 1024:.2f} MB")
for repo in cache_info.repos:
    print(f"{repo.repo_type}/{repo.repo_id}")

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| cache_dir | str or Path | HF_HUB_CACHE env var | Cache directory to scan |

Returns: HFCacheInfo object containing repository information

Raises:

  • CacheNotFound if the cache directory doesn't exist
  • ValueError if cache_dir is a file instead of a directory

try_to_load_from_cache()

Checks if a file exists in the local cache without downloading.

from huggingface_hub import try_to_load_from_cache, _CACHED_NO_EXIST

filepath = try_to_load_from_cache(
    repo_id="tiiuae/falcon-7b-instruct",
    filename="config.json",
    revision="main",
    repo_type="model"
)

if isinstance(filepath, str):
    print(f"File cached at: {filepath}")
elif filepath is _CACHED_NO_EXIST:
    print("File confirmed to not exist at this revision")
else:
    print("File not in cache")

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| cache_dir | str or Path | None | Cache directory path |
| repo_id | str | Required | Repository identifier |
| filename | str | Required | Filename to look for |
| revision | str | "main" | Specific revision to check |
| repo_type | str | "model" | Type of repository |

Returns: str (file path), _CACHED_NO_EXIST, or None

Sources: src/huggingface_hub/file_download.py:1-100 Sources: src/huggingface_hub/utils/_cache_manager.py:1-100
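Resolving a cached file follows the folder layout: a branch or tag name in refs/ maps to a commit hash, and the file is then looked up under snapshots/<commit>/. This simplified sketch of that lookup assumes the layout described in this section. lookup_cached_file is hypothetical, and the real function also handles cases this sketch skips, such as files recorded as non-existent.

```python
import os
import tempfile
from typing import Optional


def lookup_cached_file(cache_dir: str, repo_id: str, filename: str,
                       revision: str = "main", repo_type: str = "model") -> Optional[str]:
    """Return the cached path of `filename`, or None if it is not cached (hypothetical helper)."""
    repo_dir = os.path.join(cache_dir, f"{repo_type}s--" + repo_id.replace("/", "--"))
    ref_file = os.path.join(repo_dir, "refs", revision)
    # A branch/tag name is resolved through refs/; a commit hash is used as-is
    if os.path.isfile(ref_file):
        with open(ref_file) as f:
            revision = f.read().strip()
    candidate = os.path.join(repo_dir, "snapshots", revision, filename)
    return candidate if os.path.isfile(candidate) else None


# Build a tiny fake cache to exercise the lookup
with tempfile.TemporaryDirectory() as cache:
    repo = os.path.join(cache, "models--user--demo")
    os.makedirs(os.path.join(repo, "refs"))
    os.makedirs(os.path.join(repo, "snapshots", "abc123"))
    with open(os.path.join(repo, "refs", "main"), "w") as f:
        f.write("abc123")
    with open(os.path.join(repo, "snapshots", "abc123", "config.json"), "w") as f:
        f.write("{}")
    print(lookup_cached_file(cache, "user/demo", "config.json") is not None)  # True
    print(lookup_cached_file(cache, "user/demo", "missing.json"))  # None
```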

DeleteCacheStrategy

The deletion system uses a two-phase approach: create a strategy, then execute it. This prevents accidental data loss and allows for dry-run validation.

from huggingface_hub import scan_cache_dir

cache_info = scan_cache_dir()

# Create deletion strategy (doesn't delete yet)
delete_strategy = cache_info.delete_revisions(
    "81fd1d6e7847c99f5862c9fb81387956d99ec7aa",
    "e2983b237dccf3ab4937c97fa717319a9ca1a96d",
)

# Preview what will be deleted
print(f"Will free: {delete_strategy.expected_freed_size / 1024 / 1024:.2f} MB")

# Execute the deletion
delete_strategy.execute()

Deletion Workflow

graph TD
    A[scan_cache_dir] --> B[Get HFCacheInfo]
    B --> C[Call delete_revisions with commit hashes]
    C --> D[Create DeleteCacheStrategy]
    D --> E{Preview/Dry Run}
    E -->|Inspect| F[Review expected_freed_size]
    E -->|Confirm| G[execute]
    G --> H[Delete blobs and refs]
    H --> I[Cache deletion done]
    F --> G

DeleteCacheStrategy Properties

| Property | Type | Description |
|---|---|---|
| expected_freed_size | int | Estimated bytes to be freed |
| blobs | frozenset[Path] | Blob files to remove |
| refs | frozenset[Path] | Ref files to remove |
| repos | frozenset[Path] | Whole repository folders to remove |
| snapshots | frozenset[Path] | Snapshot folders to remove |

Sources: src/huggingface_hub/utils/_cache_manager.py:1-100
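Because blobs are shared between revisions of the same repository, deleting a revision only frees the blobs that no remaining revision still references. This simplified sketch of that accounting uses plain dictionaries. freed_bytes is hypothetical, and the data layout is an assumption chosen for illustration.

```python
def freed_bytes(revisions: dict[str, set[str]], blob_sizes: dict[str, int], to_delete: set[str]) -> int:
    """Bytes freed by deleting `to_delete` revisions.

    revisions: commit hash -> set of blob ids referenced by that snapshot
    blob_sizes: blob id -> size in bytes
    """
    kept_blobs = set().union(*(blobs for rev, blobs in revisions.items() if rev not in to_delete))
    deleted_blobs = set().union(*(revisions[rev] for rev in to_delete))
    # Only blobs no longer referenced by any kept revision can be removed
    return sum(blob_sizes[b] for b in deleted_blobs - kept_blobs)


revisions = {"rev_a": {"b1", "b2"}, "rev_b": {"b2", "b3"}}
sizes = {"b1": 100, "b2": 500, "b3": 50}
print(freed_bytes(revisions, sizes, {"rev_a"}))  # 100: b2 is still used by rev_b
```

This is why the estimated freed space can be much smaller than the combined size of the deleted revisions.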

CLI Interface

The hf command provides cache management through the cache subcommand.

List Cached Repositories

hf cache ls

Output format:

ID                          SIZE     LAST_ACCESSED LAST_MODIFIED REFS
--------------------------- -------- ------------- ------------- -----------
dataset/nyu-mll/glue          157.4M 2 days ago    2 days ago    main script
model/LiquidAI/LFM2-VL-1.6B     3.2G 4 days ago    4 days ago    main
model/microsoft/UserLM-8b      32.1G 4 days ago    4 days ago    main

Done in 0.0s. Scanned 6 repo(s) for a total of 3.4G.
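The SIZE column uses compact human-readable units. This sketch shows that kind of formatting, assuming decimal (1000-based) units with one decimal place. format_size is hypothetical, and the CLI's exact rounding rules may differ.

```python
def format_size(num_bytes: int) -> str:
    """Format a byte count with decimal units, e.g. 157_400_000 -> '157.4M'."""
    size = float(num_bytes)
    for unit in ["", "K", "M", "G", "T", "P"]:
        if abs(size) < 1000.0:
            return f"{size:.1f}{unit}"
        size /= 1000.0
    return f"{size:.1f}E"


print(format_size(157_400_000))  # 157.4M
print(format_size(3_200_000_000))  # 3.2G
```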

Filtering Options

| Filter Key | Operators | Example |
|---|---|---|
| type | ==, != | --filter type==model |
| size | >, <, >=, <=, = | --filter size>=1G |
| accessed | >, <, >=, <= | --filter accessed<7d |
| modified | >, <, >=, <= | --filter modified>30d |
| refs | ==, != | --filter refs==main |

Examples:

# Filter large models
hf cache ls --filter type==model --filter size>=5G

# Find recently accessed datasets
hf cache ls --filter type==dataset --filter accessed<7d

# Filter by modification time
hf cache ls --filter modified>30d
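Each --filter argument combines a key, a comparison operator, and a value. This sketch parses such expressions and applies a size filter, assuming the operators listed in the table and decimal size suffixes. parse_filter, parse_size, and matches_size are hypothetical and are not taken from the hf CLI's code.

```python
import operator
import re

OPS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
       "<=": operator.le, ">": operator.gt, "<": operator.lt, "=": operator.eq}
# Two-character operators are tried first so ">=" is not read as ">"
FILTER_RE = re.compile(r"^(\w+)(==|!=|>=|<=|>|<|=)(.+)$")
UNITS = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_filter(expr: str) -> tuple[str, str, str]:
    match = FILTER_RE.match(expr)
    if match is None:
        raise ValueError(f"Invalid filter: {expr!r}")
    return match.group(1), match.group(2), match.group(3)


def parse_size(value: str) -> int:
    match = re.fullmatch(r"([\d.]+)([KMGT]?)", value.upper())
    if match is None:
        raise ValueError(f"Invalid size: {value!r}")
    number, unit = match.groups()
    return int(float(number) * UNITS[unit])


def matches_size(expr: str, size_on_disk: int) -> bool:
    key, op, value = parse_filter(expr)
    if key != "size":
        raise ValueError("This sketch only handles size filters")
    return OPS[op](size_on_disk, parse_size(value))


print(parse_filter("size>=1G"))  # ('size', '>=', '1G')
print(matches_size("size>=1G", 3_200_000_000))  # True
```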

Sorting Options

| Sort Key | Default Order | Ascending Option |
|---|---|---|
| name | asc | name:asc |
| size | desc | size:asc |
| accessed | desc | accessed:asc |
| modified | desc | modified:asc |

Examples:

# Sort by size descending (largest first)
hf cache ls --sort size

# Sort by name ascending, then size descending
hf cache ls --sort name:asc --sort size:desc
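A sort specification is a key with an optional :asc or :desc suffix that falls back to the per-key default order from the table. This sketch applies several such keys, assuming the first key has the highest priority. sort_repos and the dictionary rows are hypothetical.

```python
DEFAULT_ORDER = {"name": "asc", "size": "desc", "accessed": "desc", "modified": "desc"}


def sort_repos(rows: list[dict], specs: list[str]) -> list[dict]:
    """Sort rows by 'key[:asc|desc]' specs; the first spec has the highest priority."""
    result = list(rows)
    # Python's sort is stable, so applying keys from last to first yields a multi-key sort
    for spec in reversed(specs):
        key, _, order = spec.partition(":")
        order = order or DEFAULT_ORDER[key]
        if order not in ("asc", "desc"):
            raise ValueError(f"Invalid sort order: {order!r}")
        result.sort(key=lambda row: row[key], reverse=(order == "desc"))
    return result


rows = [{"name": "b", "size": 1}, {"name": "a", "size": 5}, {"name": "a", "size": 9}]
print([(r["name"], r["size"]) for r in sort_repos(rows, ["name:asc", "size:desc"])])
# [('a', 9), ('a', 5), ('b', 1)]
```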

Delete Specific Revisions

hf cache delete <revision_hash> [<revision_hash>...]

The CLI will prompt for confirmation before deletion.

Sources: src/huggingface_hub/cli/cache.py:1-100

Cache Scanning Process

The cache scanning process validates the cache directory structure and handles corrupted entries gracefully.

graph TD
    A[Start scan_cache_dir] --> B{Is cache_dir set?}
    B -->|No| C[Use HF_HUB_CACHE env var]
    B -->|Yes| D[Use provided path]
    C --> E{Does directory exist?}
    D --> E
    E -->|No| F[Raise CacheNotFound]
    E -->|Yes| G{Is it a file?}
    G -->|Yes| H[Raise ValueError]
    G -->|No| I[Iterate subdirectories]
    I --> J{Skip .locks and CACHEDIR.TAG?}
    J -->|Yes| K[Next directory]
    J -->|No| L[_scan_cached_repo]
    L --> M{Valid format?}
    M -->|No| N[Log CorruptedCacheException]
    M -->|Yes| O[Create CachedRepoInfo]
    N --> P[Add to warnings list]
    O --> Q[Add to repos set]
    P --> K
    Q --> K
    K --> R{More directories?}
    R -->|Yes| I
    R -->|No| S[Return HFCacheInfo]

Validation Rules

  1. Each subdirectory must follow the type--repo_id naming convention
  2. The type must be one of: model, dataset, space
  3. Directories must contain expected subdirectories (snapshots, blobs, refs)

Sources: src/huggingface_hub/utils/_cache_manager.py:100-200
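The first two validation rules can be checked by parsing a folder name back into a repo type and ID. This minimal sketch assumes the plural prefixes used on disk (models--, datasets--, spaces--). parse_cache_folder is hypothetical.

```python
def parse_cache_folder(folder_name: str) -> tuple[str, str]:
    """Split e.g. 'datasets--google--fleurs' into ('dataset', 'google/fleurs')."""
    type_plural, sep, rest = folder_name.partition("--")
    if not sep or not rest:
        raise ValueError(f"Not a repo folder: {folder_name!r}")
    if type_plural not in ("models", "datasets", "spaces"):
        raise ValueError(f"Unknown repo type in {folder_name!r}")
    # Undo the '/' -> '--' substitution applied to the repo_id
    return type_plural[:-1], rest.replace("--", "/")


print(parse_cache_folder("datasets--google--fleurs"))  # ('dataset', 'google/fleurs')
```

Entries such as .locks or CACHEDIR.TAG fail this parse, which matches the scan skipping them.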

HubMixin Integration

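The ModelHubMixin family (for example PyTorchModelHubMixin) integrates with the cache system for model loading:

import torch
from huggingface_hub import PyTorchModelHubMixin


class MyModel(PyTorchModelHubMixin, torch.nn.Module):
    def __init__(self, hidden_size: int = 16):
        super().__init__()
        self.linear = torch.nn.Linear(hidden_size, hidden_size)


# Load model - uses cache automatically ("username/my-model" stands for a repo saved with this mixin)
model = MyModel.from_pretrained("username/my-model")

# Cache behavior:
# 1. Check if model exists locally
# 2. If local_files_only=True, use cached version or raise error
# 3. Otherwise, download and cache from Hub
# 4. Store in cache_dir or default HF_HUB_CACHE location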

HubMixin Parameters Related to Caching:

| Parameter | Type | Default | Description |
|---|---|---|---|
| cache_dir | str or Path | None | Custom cache location |
| force_download | bool | False | Force re-download |
| local_files_only | bool | False | Only use cached files |
| token | str or bool | None | Hugging Face token |

Sources: src/huggingface_hub/hub_mixin.py:1-100

Exception Handling

CorruptedCacheException

Raised when cache directory structure is invalid or expected files are missing.

class CorruptedCacheException(Exception):
    """Exception raised when a cache entry is corrupted."""
    def __init__(self, message: str):
        self.message = message
        super().__init__(self.message)

Common corruption scenarios:

  • Snapshots directory doesn't exist
  • Invalid repository directory naming
  • Missing expected cache metadata

CacheNotFound

Raised when the cache directory cannot be located.

raise CacheNotFound(
    f"Cache directory not found: {cache_dir}. "
    "Please use `cache_dir` argument or set `HF_HUB_CACHE` environment variable.",
    cache_dir=cache_dir,
)

Environment Variables

| Variable | Description | Default |
|---|---|---|
| HF_HUB_CACHE | Primary cache directory | ~/.cache/huggingface/hub |
| HF_HUB_DOWNLOAD_TIMEOUT | Download timeout in seconds | 10 |

Sources: src/huggingface_hub/constants.py
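When HF_HUB_CACHE is unset, the default location is derived from other variables. This simplified sketch assumes the following precedence: HF_HUB_CACHE, then HF_HOME/hub, then XDG_CACHE_HOME/huggingface/hub, then ~/.cache/huggingface/hub. resolve_hub_cache is hypothetical and ignores legacy variable names.

```python
import os
from typing import Mapping


def resolve_hub_cache(env: Mapping[str, str]) -> str:
    """Pick the hub cache directory from environment variables (simplified sketch)."""
    if "HF_HUB_CACHE" in env:
        return os.path.expanduser(env["HF_HUB_CACHE"])
    # HF_HOME defaults to <XDG_CACHE_HOME or ~/.cache>/huggingface
    default_home = os.path.join(env.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache")), "huggingface")
    hf_home = os.path.expanduser(env.get("HF_HOME", default_home))
    return os.path.join(hf_home, "hub")


print(resolve_hub_cache({"HF_HOME": "/data/hf"}))
```

Passing the environment as a mapping (for example os.environ) keeps the function easy to test.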

Best Practices

Efficient Cache Usage

  1. Reuse cached content: Revisions of the same repository share its blobs/ folder, so a file that is unchanged between commits is stored only once (blobs are not shared across different repositories)
  2. Use revision pinning: Specify exact commit hashes for reproducible builds
  3. Monitor cache size: Regularly run hf cache ls to identify large repositories

Safe Deletion

  1. Always use scan_cache_dir() to inspect before deletion
  2. Check warnings in HFCacheInfo for corrupted entries
  3. Use the expected_freed_size property to estimate space recovery
  4. Execute deletion only after confirming the strategy

Troubleshooting

| Issue | Solution |
|---|---|
| CacheNotFound error | Set HF_HUB_CACHE or use cache_dir parameter |
| CorruptedCacheException | Manually delete the corrupted cache entry |
| Large cache size | Use delete_revisions() to remove old/unused revisions |
| Permission denied | Check file permissions on cache directory |

Complete Usage Example

from huggingface_hub import scan_cache_dir

# Scan cache and get overview
cache_info = scan_cache_dir()

print(f"Total cached repos: {len(cache_info.repos)}")
print(f"Total size: {cache_info.size_on_disk / 1024**3:.2f} GB")

# Find specific repo
target_repo = "stabilityai/stable-diffusion-2-1"
for repo in cache_info.repos:
    if repo.repo_id == target_repo:
        print(f"\nFound {target_repo}:")
        print(f"  Type: {repo.repo_type}")
        print(f"  Revisions: {len(repo.revisions)}")
        for revision in repo.revisions:
            print(f"    - {revision.commit_hash[:8]}")
            print(f"      Size: {revision.size_on_disk / 1024**2:.2f} MB")
            print(f"      Files: {len(revision.files)}")

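# Clean up old revisions of one cached repo (dry run)
if cache_info.repos:
    first_repo = next(iter(cache_info.repos))  # an arbitrary repo: repos is a frozenset
    if len(first_repo.revisions) > 1:
        # revisions is an unordered frozenset: sort newest first by last_modified
        # so that the most recently modified revision is the one kept
        revisions = sorted(first_repo.revisions, key=lambda rev: rev.last_modified, reverse=True)
        revisions_to_delete = [rev.commit_hash for rev in revisions[1:]]
        strategy = cache_info.delete_revisions(*revisions_to_delete)
        print(f"\nWould free: {strategy.expected_free_space / 1024**2:.2f} MB")
        # strategy.execute()  # Uncomment to actually delete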

Sources: src/huggingface_hub/utils/_cache_manager.py:1-100

Inference Client and Providers

Related topics: HuggingFace File System (HfFileSystem), Overview and Architecture

Overview

The Inference Client and Providers system provides a unified interface for performing inference with machine learning models hosted on Hugging Face or third-party inference providers. This system abstracts the complexity of interacting with various inference backends, allowing developers to make inference calls through a consistent Python API.

The InferenceClient class is the primary entry point for synchronous inference, while AsyncInferenceClient provides asynchronous equivalents for non-blocking workflows. Both clients rely on a provider system that normalizes API differences between inference services such as Replicate, Together AI, Fal.ai, and SambaNova.

Sources: src/huggingface_hub/inference/_client.py:1-100

Architecture

The inference system follows a layered architecture where the client exposes a high-level API while delegating provider-specific details to helper classes.

graph TD
    User[User Code] --> Client[InferenceClient]
    Client --> ProviderHelper[Provider Helper]
    ProviderHelper --> ProviderAPI[Third-party Provider API]
    ProviderHelper --> HFRouting[Hugging Face Routing]
    
    subgraph "InferenceClient"
        Methods[text_generation, chat_completion,<br/>text_to_image, etc.]
    end
    
    subgraph "Provider Layer"
        get_provider_helper[get_provider_helper]
        prepare_request[prepare_request]
        get_response[get_response]
    end

Core Components

Component | File Location | Purpose
InferenceClient | inference/_client.py | Synchronous inference operations
AsyncInferenceClient | inference/_generated/_async_client.py | Asynchronous inference operations
Provider Helpers | inference/_providers/*.py | Provider-specific request/response handling
Provider Registry | inference/_providers/__init__.py | Provider discovery and initialization

Sources: src/huggingface_hub/inference/_client.py:1-50

Supported Inference Tasks

The InferenceClient exposes one method per inference task. The table below lists a representative subset.

Task Categories and Methods

Category | Method | Description
Text Generation | text_generation() | Generate text from prompts, with streaming support
Chat | chat_completion() | Multi-turn conversation with message history
Image Generation | text_to_image() | Generate images from text prompts
Video Generation | text_to_video() | Generate videos from text descriptions
Text Analysis | summarization() | Summarize long text documents
Text Analysis | fill_mask() | Fill masked tokens in text
Text Analysis | zero_shot_classification() | Classify text with arbitrary labels
Table Operations | table_question_answering() | Answer questions from tabular data
Table Operations | tabular_classification() | Classify tabular data rows
Embeddings | sentence_similarity() | Compute semantic similarity between sentences
Vision | image_classification() | Classify images into categories

Sources: src/huggingface_hub/inference/_client.py:200-500

Client Configuration

Initialization Parameters

Parameter | Type | Default | Description
model | str or None | None | Default model identifier for all requests
provider | str or None | None | Inference provider to use (replicate, together, fal-ai, etc.)
api_key | str or None | None | Alias of token that mirrors the openai.OpenAI client; cannot be combined with token
token | str, bool, or None | None | Hugging Face token for authentication (falls back to the locally saved token)
timeout | float or None | None | Request timeout in seconds
headers | dict[str, str] or None | None | Additional HTTP headers

from huggingface_hub import InferenceClient

# Basic usage with default provider
client = InferenceClient()

# Using a specific provider
client = InferenceClient(
    provider="replicate",
    api_key="hf_...",
    model="meta-llama/Meta-Llama-3-8B-Instruct"
)

Sources: src/huggingface_hub/inference/_client.py:50-150

Provider System

Provider Architecture

The provider system normalizes differences between inference services by abstracting request preparation and response parsing.

graph LR
    A[InferenceClient] -->|task + model| B[get_provider_helper]
    B --> C{Provider Type}
    C -->|Built-in| D[Internal Provider Helper]
    C -->|Third-party| E[Provider API Helper]
    
    D --> F[Provider.prepare_request]
    E --> G[External API Call]
    
    F --> H[Normalized Response]
    G --> H

Supported Providers

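Provider | Description | Authentication
replicate | Replicate hosted models | Provider API key or Hugging Face token
together | Together AI inference | Provider API key or Hugging Face token
fal-ai | Fal.ai generation services | Provider API key or Hugging Face token
sambanova | SambaNova Cloud | Provider API key or Hugging Face token
hf-inference | Hugging Face Inference API | Hugging Face token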

Sources: src/huggingface_hub/inference/_providers/__init__.py
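In recent library versions, a third-party provider can be reached in two ways. You can call it directly with the provider's own API key, or you can route the request through Hugging Face with a Hugging Face token. The sketch below models that decision under stated assumptions. It assumes that keys starting with `hf_` are Hugging Face tokens and that any other key is a provider key. Both base URLs and the `choose_base_url` function are hypothetical placeholders, not the library's constants.

```python
from typing import Dict

# Hypothetical placeholder URLs; the real endpoints live inside the library.
HF_ROUTER_URL = "https://router.example/{provider}"
PROVIDER_URLS: Dict[str, str] = {
    "replicate": "https://replicate.example",
    "together": "https://together.example",
}


def choose_base_url(provider: str, api_key: str) -> str:
    """Route through Hugging Face for HF tokens, else call the provider directly."""
    if api_key.startswith("hf_"):
        return HF_ROUTER_URL.format(provider=provider)
    return PROVIDER_URLS[provider]


print(choose_base_url("replicate", "hf_xxx"))
print(choose_base_url("together", "provider-key"))
```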

Provider Helper Functions

Each provider helper implements two key methods:

  • prepare_request(): Transforms inputs and parameters into provider-specific API format
  • get_response(): Parses provider response into normalized output format
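
from huggingface_hub import InferenceClient
from huggingface_hub.inference._providers import get_provider_helper  # private module

client = InferenceClient(api_key="hf_...")
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
prompt = "The capital of France is"

provider_helper = get_provider_helper(
    provider="replicate",
    task="text-generation",
    model=model_id,
)

# Build the provider-specific request (URL, headers, payload)
request_parameters = provider_helper.prepare_request(
    inputs=prompt,
    parameters={"max_new_tokens": 100},
    headers=client.headers,
    model=model_id,
    api_key=client.token,
)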

Sources: src/huggingface_hub/inference/_client.py:150-200
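The two-method contract and the request flow it supports (prepare_request, then an HTTP POST, then get_response) can be sketched offline with a fake transport. Every name here is hypothetical: `ProviderHelper`, `ToyTextGenerationHelper`, `REGISTRY`, `get_helper`, `run`, and `fake_post`. The library's real helpers live in inference/_providers and take more arguments, such as headers and api_key.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict


class ProviderHelper(ABC):
    """Hypothetical base class: one subclass per (provider, task) pair."""

    @abstractmethod
    def prepare_request(self, inputs: Any, parameters: Dict[str, Any], model: str) -> Dict[str, Any]:
        """Translate generic inputs into the provider's payload format."""

    @abstractmethod
    def get_response(self, raw: Dict[str, Any]) -> str:
        """Extract a normalized output from the provider's raw response."""


class ToyTextGenerationHelper(ProviderHelper):
    def prepare_request(self, inputs, parameters, model):
        # Example provider format: everything nested under "input"
        return {"model": model, "input": {"prompt": inputs, **parameters}}

    def get_response(self, raw):
        return raw["output"]


REGISTRY: Dict[tuple, ProviderHelper] = {("toy", "text-generation"): ToyTextGenerationHelper()}


def get_helper(provider: str, task: str) -> ProviderHelper:
    try:
        return REGISTRY[(provider, task)]
    except KeyError:
        raise ValueError(f"Provider {provider!r} does not support task {task!r}") from None


def run(provider, task, model, inputs, parameters, post: Callable[[Dict[str, Any]], Dict[str, Any]]):
    helper = get_helper(provider, task)
    payload = helper.prepare_request(inputs, parameters, model)
    raw = post(payload)  # stands in for the client's HTTP POST
    return helper.get_response(raw)


def fake_post(payload):
    # Fake provider: echoes the prompt and appends a fixed completion
    return {"output": payload["input"]["prompt"] + " Paris."}


print(run("toy", "text-generation", "some/model", "The capital of France is", {"max_new_tokens": 5}, fake_post))
```

With this design, adding a backend means registering one new helper subclass. The client code that calls `run` does not change.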

Usage Examples

Text Generation

from huggingface_hub import InferenceClient

client = InferenceClient()

# Basic text generation
output = client.text_generation(
    prompt="The capital of France is",
    model="gpt2"
)

Sources: src/huggingface_hub/inference/_client.py:300-400

Chat Completion

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="sambanova",
    api_key="hf_..."
)

output = client.chat_completion(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

Sources: src/huggingface_hub/inference/_client.py:400-500

Image Generation

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="replicate",
    api_key="hf_..."
)

image = client.text_to_image(
    "An astronaut riding a horse on the moon.",
    model="black-forest-labs/FLUX.1-schnell",
    extra_body={"output_quality": 100}
)
image.save("astronaut.png")

Sources: src/huggingface_hub/inference/_client.py:500-600

Text-to-Video

from huggingface_hub import InferenceClient

client = InferenceClient()

video = client.text_to_video(
    prompt="A cat playing piano",
    num_inference_steps=50,
    guidance_scale=7.5
)

Sources: src/huggingface_hub/inference/_client.py:600-700

Sentence Similarity

from huggingface_hub import InferenceClient

client = InferenceClient()

similarities = client.sentence_similarity(
    "Machine learning is so easy.",
    other_sentences=[
        "Deep learning is so straightforward.",
        "This is so difficult, like rocket science.",
    ]
)
# Output: [0.7785726189613342, 0.45876261591911316]

Sources: src/huggingface_hub/inference/_client.py:700-800

Zero-Shot Classification

from huggingface_hub import InferenceClient

client = InferenceClient()

text = "A new model offers an explanation for how the Galilean satellites formed."
labels = ["space & cosmos", "scientific discovery", "microbiology", "robots"]

result = client.zero_shot_classification(text, labels)

Sources: src/huggingface_hub/inference/_client.py:350-450

AsyncInferenceClient

For asynchronous workflows, the AsyncInferenceClient provides non-blocking equivalents of all synchronous methods.

import asyncio

from huggingface_hub import AsyncInferenceClient


async def main():
    client = AsyncInferenceClient()

    # Async chat completion
    output = await client.chat_completion(
        model="meta-llama/Meta-Llama-3-70B-Instruct",
        messages=[
            {"role": "user", "content": "Hello!"}
        ]
    )

    # Async image generation
    image = await client.text_to_image(
        prompt="A beautiful sunset over mountains",
        model="black-forest-labs/FLUX.1-schnell"
    )


asyncio.run(main())

Sources: src/huggingface_hub/inference/_generated/_async_client.py:1-200
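The main benefit of the async client is that independent requests can overlap. The sketch below uses a hypothetical stand-in coroutine, `fake_inference`, instead of real network calls so that it runs offline. With AsyncInferenceClient, you would await the client's methods inside the same asyncio.gather pattern.

```python
import asyncio
import time


async def fake_inference(prompt: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for an awaited call such as chat_completion."""
    await asyncio.sleep(delay)  # simulates network latency without blocking the event loop
    return f"answer to {prompt!r}"


async def main():
    prompts = ["Hello!", "What is 2 + 2?", "Name a color."]
    start = time.perf_counter()
    # gather runs the three calls concurrently and returns results in input order
    results = await asyncio.gather(*(fake_inference(p) for p in prompts))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results)
print(f"{elapsed:.2f}s")  # close to one delay rather than three
```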

Error Handling

The inference system defines specific exception types for common error conditions:

Exception | Description
InferenceTimeoutError | Request exceeded the timeout threshold
HfHubHTTPError | HTTP error returned by the inference provider

from huggingface_hub import InferenceClient, InferenceTimeoutError
from huggingface_hub.errors import HfHubHTTPError

client = InferenceClient(timeout=30)

try:
    result = client.text_generation("Hello world")
except InferenceTimeoutError:
    print("Request timed out")
except HfHubHTTPError as e:
    print(f"HTTP error: {e}")

Sources: src/huggingface_hub/inference/_client.py:250-300

Request Flow

sequenceDiagram
    participant User
    participant Client
    participant ProviderHelper
    participant API
    
    User->>Client: text_generation(prompt, model)
    Client->>ProviderHelper: get_provider_helper(provider, task, model)
    Client->>ProviderHelper: prepare_request(inputs, parameters)
    ProviderHelper-->>Client: request_parameters
    Client->>Client: _inner_post(request_parameters)
    Client->>API: HTTP POST
    API-->>Client: response
    Client->>ProviderHelper: get_response(response)
    ProviderHelper-->>Client: normalized_output
    Client-->>User: InferenceOutput

Output Models

The inference client returns typed output objects for each task:

Task | Output Type
Text Generation | str by default; TextGenerationOutput with details=True; an iterable of str or TextGenerationStreamOutput with stream=True
Chat Completion | ChatCompletionOutput (an iterable of ChatCompletionStreamOutput with stream=True)
Image Generation | PIL.Image.Image
Video Generation | bytes
Summarization | SummarizationOutput
Fill Mask | list[FillMaskOutputElement]
Zero-Shot Classification | list[ZeroShotClassificationOutputElement]
Table Question Answering | TableQuestionAnsweringOutputElement
Tabular Classification | list[str]
Sentence Similarity | list[float]
Image Classification | list[ImageClassificationOutputElement]

Sources: src/huggingface_hub/inference/_client.py:200-600

CLI Integration

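Optional dependencies for the inference client are installed through the `inference` extra. The `hf` CLI subcommands vary between releases, so run `hf --help` to confirm that an inference command exists in your installed version before relying on the second command.

# Install optional inference dependencies (the quotes stop shells such as zsh from expanding the brackets)
pip install "huggingface_hub[inference]"

# Run inference via CLI
hf inference --model gpt2 --text "The capital of France is"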

Sources: setup.py:1-30

Advanced Configuration

Extra Body Parameters

Many inference methods accept extra_body for provider-specific parameters:

from huggingface_hub import InferenceClient

client = InferenceClient(provider="replicate", api_key="hf_...")

image = client.text_to_image(
    "A majestic lion",
    model="black-forest-labs/FLUX.1-dev",
    extra_body={
        "output_quality": 100,
        "guidance_scale": 3.5
    }
)
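Conceptually, extra_body adds provider-only fields to the request payload alongside the standard parameters. The sketch below is a hypothetical illustration of such a merge, not the library's code. It assumes that extra fields sit at the top level of the payload and override standard parameters with the same name. Check the relevant provider helper for the actual placement. `build_payload` is a hypothetical function.

```python
from typing import Any, Dict, Optional


def build_payload(
    prompt: str,
    parameters: Dict[str, Any],
    extra_body: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical: standard fields first, then provider-specific extras on top."""
    payload: Dict[str, Any] = {"prompt": prompt, **parameters}
    if extra_body:
        payload.update(extra_body)  # extras win on key collisions (an assumption)
    return payload


print(build_payload("A majestic lion", {"guidance_scale": 3.5}, {"output_quality": 100}))
```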

Generation Parameters

Sampling settings such as temperature, top_p, and repetition_penalty are passed as keyword arguments to text_generation():

from huggingface_hub import InferenceClient

client = InferenceClient()

client.text_generation(
    prompt="Write a story",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
)
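Generation settings that are left unset can be omitted from the request so that the server applies its own defaults. The sketch below is a hypothetical illustration of that filtering step. It assumes unset values are represented as None, and `generation_parameters` is not a library function. It tests `is not None` rather than truthiness so that an explicit 0.0 is still sent.

```python
from typing import Any, Dict, Optional


def generation_parameters(
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    max_new_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: keep only the parameters the caller actually set."""
    candidates = {
        "temperature": temperature,
        "top_p": top_p,
        "repetition_penalty": repetition_penalty,
        "max_new_tokens": max_new_tokens,
    }
    return {name: value for name, value in candidates.items() if value is not None}


print(generation_parameters(temperature=0.7, top_p=0.9))
```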

Summary

The Inference Client and Providers system provides:

  1. Unified API: Consistent interface across all inference tasks
  2. Multi-Provider Support: Integration with Replicate, Together AI, Fal.ai, SambaNova, and other providers
  3. Type-Safe Outputs: Well-defined output models for each task
  4. Async Support: Full async/await compatibility via AsyncInferenceClient
  5. Error Handling: Specific exceptions for timeout and HTTP errors
  6. Extensible Design: Provider helper system for adding new inference backends

This architecture lets developers switch providers or models by changing the provider and model arguments rather than rewriting application code, while keeping a clean, Pythonic API.

Sources: src/huggingface_hub/inference/_client.py:1-50

HuggingFace File System (HfFileSystem)

Related topics: File Download Operations, File Upload Operations

Overview

The HuggingFace File System (HfFileSystem) is an fsspec-based POSIX-like filesystem implementation that provides seamless access to Hugging Face Hub repositories. It enables developers to interact with models, datasets, and Spaces using familiar filesystem operations, abstracting away the complexity of HTTP API calls and caching mechanisms.

Key Characteristics:

Property | Value
Base Class | fsspec.spec.AbstractFileSystem
Protocol | hf://
Python Version | >= 3.10.0
Entry Point | hf=huggingface_hub.HfFileSystem

Sources: setup.py:48
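Paths under the hf:// protocol encode the repository type, repository id, optional revision, and path inside the repository. The parser below is a hypothetical sketch of that convention for illustration; the library has its own path-resolution logic. It assumes a namespace/name repo id and `datasets/` or `spaces/` prefixes for non-model repos. It also assumes URL-encoded revisions, such as refs%2Fpr%2F1. `HfPath` and `parse_hf_path` are not library names.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote

# Assumed prefixes for non-model repos; model paths carry no prefix.
TYPE_PREFIXES = {"datasets": "dataset", "spaces": "space"}


@dataclass
class HfPath:
    repo_type: str
    repo_id: str
    revision: Optional[str]
    path_in_repo: str


def parse_hf_path(url: str) -> HfPath:
    """Hypothetical parser for hf://[datasets/|spaces/]namespace/name[@revision]/path."""
    path = url[len("hf://"):] if url.startswith("hf://") else url
    parts = path.split("/")
    repo_type = "model"
    if parts[0] in TYPE_PREFIXES:
        repo_type = TYPE_PREFIXES[parts[0]]
        parts = parts[1:]
    if len(parts) < 2:
        raise ValueError(f"Expected namespace/name in {url!r}")
    namespace, name_and_rev = parts[0], parts[1]
    name, _, revision = name_and_rev.partition("@")
    return HfPath(
        repo_type=repo_type,
        repo_id=f"{namespace}/{name}",
        revision=unquote(revision) if revision else None,  # refs%2Fpr%2F1 -> refs/pr/1
        path_in_repo="/".join(parts[2:]),
    )


print(parse_hf_path("hf://datasets/org/data@main/train/0000.parquet"))
```

With the real filesystem, you pass paths of this shape to HfFileSystem methods such as ls or open.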

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Doramagic extracted 13 source-linked risk signals. Review them before installing or handing real data to the project.

1. Security or permission risk: How to stop hf models ls from truncating the results in the table?

  • Severity: high
  • Finding: Security or permission risk is backed by a source signal: How to stop hf models ls from truncating the results in the table?. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/issues/4207

2. Installation risk: [v1.13.0] new CLI commands and formatting, and HF URI parsing

  • Severity: medium
  • Finding: Installation risk is backed by a source signal: [v1.13.0] new CLI commands and formatting, and HF URI parsing. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.13.0

3. Installation risk: [v1.15.0] Region-aware buckets & repos, `hf skills list`, polished CLI help and more

  • Severity: medium
  • Finding: Installation risk is backed by a source signal: [v1.15.0] Region-aware buckets & repos, hf skills list, polished CLI help and more. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.15.0

4. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | github_repo:323591830 | https://github.com/huggingface/huggingface_hub | README/documentation is current enough for a first validation pass.

5. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:323591830 | https://github.com/huggingface/huggingface_hub | last_activity_observed missing

6. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | github_repo:323591830 | https://github.com/huggingface/huggingface_hub | no_demo; severity=medium

7. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.scoring_risks | github_repo:323591830 | https://github.com/huggingface/huggingface_hub | no_demo; severity=medium

8. Security or permission risk: [v1.10.0] Instant file copy and new Kernel repo type

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: [v1.10.0] Instant file copy and new Kernel repo type. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.10.0

9. Security or permission risk: [v1.11.0] Semantic Spaces search, Space logs, and more

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: [v1.11.0] Semantic Spaces search, Space logs, and more. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.11.0

10. Security or permission risk: [v1.12.0] Unified CLI output, bucket search, and more

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: [v1.12.0] Unified CLI output, bucket search, and more. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.12.0

11. Security or permission risk: [v1.14.0] Handle Spaces secrets & variables from CLI and other improvements

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: [v1.14.0] Handle Spaces secrets & variables from CLI and other improvements. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.14.0

12. Maintenance risk: issue_or_pr_quality=unknown

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:323591830 | https://github.com/huggingface/huggingface_hub | issue_or_pr_quality=unknown

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using huggingface_hub with real data or production workflows.

  • How to stop hf models ls from truncating the results in the table? (github / github_issue): https://github.com/huggingface/huggingface_hub/issues/4207
  • [v1.15.0] Region-aware buckets & repos, `hf skills list`, polished CLI help and more (github / github_release): https://github.com/huggingface/huggingface_hub/releases/tag/v1.15.0
  • [v1.14.0] Handle Spaces secrets & variables from CLI and other improvements (github / github_release): https://github.com/huggingface/huggingface_hub/releases/tag/v1.14.0
  • [v1.13.0] new CLI commands and formatting, and HF URI parsing (github / github_release): https://github.com/huggingface/huggingface_hub/releases/tag/v1.13.0
  • [v1.12.0] Unified CLI output, bucket search, and more (github / github_release): https://github.com/huggingface/huggingface_hub/releases/tag/v1.12.0
  • [v1.11.0] Semantic Spaces search, Space logs, and more (github / github_release): https://github.com/huggingface/huggingface_hub/releases/tag/v1.11.0
  • [v1.10.2] Fix reference cycle in hf_raise_for_status (github / github_release): https://github.com/huggingface/huggingface_hub/releases/tag/v1.10.2
  • [v1.10.1] Fix copy file to folder (github / github_release): https://github.com/huggingface/huggingface_hub/releases/tag/v1.10.1
  • [v1.10.0] Instant file copy and new Kernel repo type (github / github_release): https://github.com/huggingface/huggingface_hub/releases/tag/v1.10.0
  • [v1.9.2] Fix set_space_volume / delete_space_volume return types (github / github_release): https://github.com/huggingface/huggingface_hub/releases/tag/v1.9.2
  • README/documentation is current enough for a first validation pass. (GitHub / issue)

Source: Project Pack community evidence and pitfall evidence