Doramagic Project Pack · Human Manual
huggingface_hub
Related topics: Installation and Setup, File Download Operations, File Upload Operations
Overview and Architecture
Introduction
huggingface_hub is a Python client library developed by Hugging Face for interacting with the Hugging Face Hub. It lets developers download, upload, and manage machine learning models, datasets, and other repositories programmatically, through a single interface to the Hub's hosting, version control, and collaboration infrastructure.
Primary Purpose:
- Download models, datasets, and Spaces from the Hub
- Upload files and folders to the Hub
- Manage repository metadata and model cards
- Execute inference on deployed models
- Handle authentication and access control
Sources: README.md
Source: https://github.com/huggingface/huggingface_hub / Human Manual
Installation and Setup
Related topics: Overview and Architecture, Authentication System
Overview
The huggingface_hub package is a Python client library that enables interaction with the Hugging Face Hub, providing functionality to download and publish models, datasets, and other repositories. This page covers all aspects of installing and setting up the library across different environments and use cases.
System Requirements
Python Version
| Requirement | Version |
|---|---|
| Minimum Python | 3.10.0 |
| Package Manager | pip, conda |
Sources: setup.py:52
Supported Platforms
The library supports installation on all major operating systems including Linux, macOS, and Windows.
Installation Methods
Standard Installation (pip)
The primary installation method uses pip:
pip install huggingface_hub
Sources: README.md:30
Installation with Optional Dependencies
The library provides extras that install optional dependencies for specific use cases:
| Extra | Description | Command |
|---|---|---|
| inference | Inference-related functionality | pip install huggingface_hub[inference] |
| mcp | MCP (Model Context Protocol) module | pip install huggingface_hub[mcp] |
Sources: README.md:36-42
Development Installation
For contributing to the project or testing the latest features:
pip install -e ".[dev]"
This installs the package in editable mode with all development dependencies.
Sources: CONTRIBUTING.md:24-26
Conda Installation
For conda environments:
conda install -c conda-forge huggingface_hub
Sources: README.md:22-24
Dependency Architecture
graph TD
A[huggingface_hub] --> B[Core Dependencies]
A --> C[Optional: inference]
A --> D[Optional: mcp]
A --> E[Dev Dependencies]
B --> B1[requests]
B --> B2[fsspec]
B --> B3[httpx]
B --> B4[tqdm]
B --> B5[packaging]
B --> B6[filelock]
B --> B7[pyyaml]
C --> C1[inference-client]
C --> C2[pillow]
D --> D1[mcp]
E --> E1[pytest]
E --> E2[pytest-asyncio]
E --> E3[pytest-cov]
E --> E4[ruff]
E --> E5[mypy]
E --> E6[ty]
Core Dependencies
The following table lists the required dependencies installed by default:
| Package | Purpose |
|---|---|
| requests | HTTP client for API calls |
| fsspec | Filesystem specification |
| httpx | Async HTTP client |
| tqdm | Progress bars |
| packaging | Package version handling |
| filelock | File locking mechanism |
| pyyaml | YAML parsing |
| typing-extensions | Type hint support |
Sources: setup.py:1-16
Optional Dependency Groups
Testing Dependencies
extras["testing"] = [
"pytest",
"pytest-asyncio",
"pytest-cov",
"pytest-xdist",
"DianaEye",
"aiohttp",
"asynctest",
"Paramiko",
]
Quality Assurance Dependencies
extras["quality"] = [
"ruff",
"踩",
]
Type Checking Dependencies
extras["typing"] = [
"mypy==1.15.0",
"libcst>=1.4.0",
"ty",
]
All-Inclusive Meta-Group
extras["all"] = extras["testing"] + extras["quality"] + extras["typing"]
extras["dev"] = extras["all"]
Sources: setup.py:36-51
Installation Workflow
graph TD
A[Start Installation] --> B{Installation Method}
B -->|pip| C[Basic Install]
B -->|conda| D[Conda Forge Install]
B -->|editable| E[Development Install]
C --> F{Use Case}
F -->|Minimal| G[Core Only]
F -->|Inference| H[Add inference extra]
F -->|MCP| I[Add mcp extra]
G --> J[Installation Complete]
H --> J
I --> J
D --> J
E --> J
J --> K[Verify Installation]
K --> L[Import huggingface_hub]
Verification
After installation, verify the package is correctly installed:
from huggingface_hub import hf_hub_download
# Test basic functionality
hf_hub_download(repo_id="tiiuae/falcon-7b-instruct", filename="config.json")
Sources: README.md:48-52
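The download check above needs network access. As a lighter first step, the following sketch confirms the install offline using only the standard library. The helper name `check_install` is hypothetical (it is not part of huggingface_hub), and the 3.10 minimum is taken from the requirements table earlier in this page.

```python
import sys
from importlib import metadata


def check_install(package: str = "huggingface_hub", min_python: tuple = (3, 10)) -> dict:
    """Report whether the interpreter is new enough and whether `package` is installed."""
    try:
        # Version string read from the installed package metadata
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment
        installed = None
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "installed_version": installed,
    }


print(check_install())
```

If `installed_version` is `None`, the package is missing from the active environment, which usually means pip installed it into a different interpreter.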
Post-Installation Configuration
Authentication Setup
To authenticate with the Hugging Face Hub:
# Interactive login
hf auth login
# Non-interactive with token
hf auth login --token $HUGGINGFACE_TOKEN
Sources: README.md:61-65
Cache Configuration
Files are downloaded to a local cache folder. See the cache management guide for configuration options.
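As a rough guide to where that cache folder ends up, the sketch below resolves it from environment variables. It assumes the commonly documented defaults: `HF_HUB_CACHE` if set, otherwise `$HF_HOME/hub`, with `HF_HOME` defaulting to `~/.cache/huggingface`. The helper `default_cache_dir` is hypothetical, and it deliberately ignores other settings (such as `XDG_CACHE_HOME`) that the library may also consult.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Best-effort guess at the Hub cache directory (assumed defaults, see lead-in)."""
    env = os.environ if env is None else env
    explicit = env.get("HF_HUB_CACHE")
    if explicit:
        # An explicit cache location wins over everything else
        return Path(explicit).expanduser()
    # Otherwise fall back to the "hub" folder under the configuration directory
    hf_home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(hf_home).expanduser() / "hub"


print(default_cache_dir())
```

Passing a plain dictionary as `env` makes the function easy to exercise without touching the real environment.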
Entry Points
The installation registers the following entry points (three console scripts and one fsspec filesystem):
| Command | Module | Purpose |
|---|---|---|
| hf | huggingface_hub.cli.hf | Main CLI interface |
| huggingface-cli | huggingface_hub.cli.deprecated_cli | Legacy CLI (deprecated) |
| tiny-agents | huggingface_hub.inference._mcp.cli | MCP CLI application |
| hf (fsspec) | huggingface_hub.HfFileSystem | Filesystem specification |
Sources: setup.py:53-60
Troubleshooting
Common Issues
| Issue | Solution |
|---|---|
| ImportError | Ensure Python >= 3.10 |
| Authentication failed | Run hf auth login |
| Download timeout | Check network connection |
| Permission denied | Use virtual environment |
Development Setup Issues
If installing in development mode:
pip uninstall huggingface_hub
pip install -e ".[dev]"
Sources: CONTRIBUTING.md:24
Package Metadata
| Property | Value |
|---|---|
| Name | huggingface_hub |
| License | Apache-2.0 |
| Author | Hugging Face, Inc. |
| Author Email | [email protected] |
| URL | https://github.com/huggingface/huggingface_hub |
Sources: setup.py:18-22
Sources: [setup.py:52](https://github.com/huggingface/huggingface_hub/blob/main/setup.py)
Authentication System
Related topics: Installation and Setup, Repository Management API
Overview
The huggingface_hub library provides a comprehensive authentication system that enables secure access to Hugging Face Hub resources including models, datasets, and Spaces. The authentication system supports multiple authentication methods including token-based authentication and OAuth 2.0, with seamless integration into both CLI environments and Jupyter notebooks.
The authentication infrastructure consists of four primary modules that handle different aspects of the authentication lifecycle:
| Module | Purpose |
|---|---|
| _login.py | User login operations and token management |
| _oauth.py | OAuth 2.0 authentication flow |
| _auth.py | Core authentication utilities and token refresh |
| _git_credential.py | Git credential handling for repository operations |
Architecture
graph TD
A[User] --> B[Login Methods]
B --> C[Token-based Auth]
B --> D[OAuth 2.0 Auth]
C --> E[hf_hub_download]
C --> F[upload_file]
D --> E
D --> F
E --> G[Token Cache]
F --> G
G --> H[Hugging Face Hub API]
H --> I[Model/Dataset/Space]
C --> J[CLI: hf auth login]
C --> K[Python: login function]
C --> L[Notebook: notebook_login]
Token-Based Authentication
Login Functionality
The library provides three primary interfaces for user authentication:
#### CLI Login
Users can authenticate via the command-line interface using the hf command:
hf auth login
# or with environment variable
hf auth login --token $HUGGINGFACE_TOKEN
#### Python API Login
The login() function provides programmatic authentication within Python scripts:
from huggingface_hub import login
# Direct token login
login(token="hf_xxxxx")
# No token argument: prompts for a token interactively
login()
#### Notebook Login Widget
For Jupyter notebook environments, notebook_login() displays an interactive widget for token entry:
from huggingface_hub import notebook_login
notebook_login()
The notebook login function accepts the following parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| skip_if_logged_in | bool | False | If True, skip the prompt when the user is already logged in |
# Force re-login even if already authenticated
notebook_login(skip_if_logged_in=False)
Token Storage and Management
Tokens are securely stored in the local configuration directory. The system automatically retrieves stored tokens when making API requests, eliminating the need for repeated authentication. Token validation occurs automatically before any authenticated operation, ensuring expired or invalid tokens are detected early.
OAuth 2.0 Authentication
The OAuth 2.0 authentication flow provides an alternative to token-based authentication, enabling more sophisticated authorization scenarios. This is particularly useful for applications that need to access resources on behalf of users with specific permission scopes.
OAuth tokens are automatically refreshed when they expire, maintaining continuous access without requiring user intervention. The system handles token revocation and supports scopes that limit access to specific resources or operations.
Git Credential Integration
The authentication system integrates with Git's credential infrastructure to provide seamless authentication for Git operations such as cloning and pushing to repositories. This integration ensures that Git operations respect the same authentication state as the Python API.
graph LR
A[Git Operation] --> B[Git Credential Helper]
B --> C{huggingface_hub _git_credential}
C --> D{Cached Token?}
D -->|Yes| E[Use Cached Token]
D -->|No| F[Prompt for Token]
E --> G[Execute Git Operation]
F --> G
The Git credential helper manages:
- Secure storage of credentials
- Credential retrieval for specific hosts
- Credential cleanup after operations
Authentication Workflow
sequenceDiagram
participant User
participant Application
participant AuthSystem
participant HubAPI
participant TokenStore
User->>Application: Initiate request
Application->>AuthSystem: Authenticate
AuthSystem->>TokenStore: Check stored token
TokenStore-->>AuthSystem: Token found
AuthSystem->>HubAPI: Authenticated request
HubAPI-->>Application: Response
Note over AuthSystem,TokenStore: Token expired or invalid
AuthSystem->>AuthSystem: Refresh token
AuthSystem->>TokenStore: Update token
AuthSystem->>HubAPI: Retry with new token
Configuration
Authentication behavior can be configured through environment variables and configuration files:
| Variable | Description |
|---|---|
| HF_TOKEN | Authentication token read from the environment |
| HF_HOME | Configuration directory location |
| HUGGINGFACE_TOKEN | Shell variable used in the CLI examples above to pass a token via --token |
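To make the lookup order concrete, here is a minimal sketch of how a token could be resolved from these settings. It is not the library's implementation: the helper `resolve_token` is hypothetical, the precedence (explicit argument, then `HF_TOKEN`, then a stored file) is an assumption, and so is the stored-token location `$HF_HOME/token` with `HF_HOME` defaulting to `~/.cache/huggingface`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_token(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first token found: explicit argument, HF_TOKEN, then the token file."""
    env = os.environ if env is None else env
    if explicit:
        return explicit
    from_env = env.get("HF_TOKEN")
    if from_env:
        # Strip stray whitespace or newlines picked up from shell exports
        return from_env.strip()
    # Assumed location of the token saved by a previous login
    token_file = Path(env.get("HF_HOME") or "~/.cache/huggingface").expanduser() / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None
```

Returning `None` rather than raising lets callers decide whether an anonymous request is acceptable, which is the case for public repositories.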
Security Considerations
The authentication system implements several security best practices:
- Secure Token Storage: Tokens are stored with appropriate file permissions to prevent unauthorized access
- Token Validation: All tokens are validated before use in API requests
- Automatic Refresh: OAuth tokens are automatically refreshed to maintain session continuity
- Notebook Security Warning: The notebook_login widget displays a warning about token exposure in notebook files
Related Components
The authentication system interacts with several other library components:
| Component | Interaction |
|---|---|
| InferenceClient | Uses authentication for inference API calls |
| HfFileSystem | Uses authentication for file system operations |
| snapshot_download | Uses authentication for repository downloads |
| upload_file | Uses authentication for repository uploads |
Quick Reference
# CLI
hf auth login --token hf_xxxxx
# Python script
from huggingface_hub import login
login(token="hf_xxxxx")
# Jupyter notebook
from huggingface_hub import notebook_login
notebook_login()
# Check if logged in
from huggingface_hub import whoami
user = whoami()
File Download Operations
Related topics: Cache Management System, Git LFS Large File Handling, Overview and Architecture
The huggingface_hub library provides a comprehensive file download system that enables clients to fetch models, datasets, and other artifacts from the Hugging Face Hub. This document covers the architecture, API, caching mechanisms, and usage patterns for download operations.
Overview
File download operations in huggingface_hub handle the retrieval of individual files or entire repository snapshots from Hugging Face's infrastructure. The system implements intelligent caching, supports offline mode, provides progress tracking, and handles authentication seamlessly.
Key responsibilities:
- Download files with proper caching and deduplication
- Support partial content retrieval for LFS (Large File Storage) files
- Manage metadata for cache validation and freshness checks
- Handle authentication tokens transparently
- Support offline scenarios with local-only file access
- Provide dry-run capabilities for previewing downloads
Sources: src/huggingface_hub/file_download.py:1-100
Architecture
Component Overview
graph TD
A[Public API: hf_hub_download] --> B[Route Decision]
B --> C{single file?}
C -->|Yes| D[_hf_hub_download_to_cache_dir]
C -->|No| E[snapshot_download]
D --> F[Get Metadata / ETag]
F --> G{Cached?}
G -->|Yes, valid| H[Return cached path]
G -->|No| I[Download from remote]
I --> J[Write metadata]
J --> H
E --> K[Iterate files]
K --> L[Download each file]
L --> F
H --> M[Local file path]
I --> M
Module Structure
| Module | Purpose |
|---|---|
| file_download.py | Core download functions (hf_hub_download, _hf_hub_download_to_cache_dir) |
| _local_folder.py | Local cache and metadata management |
| _snapshot_download.py | Full repository snapshot downloads |
| cli/download.py | Command-line interface for downloads |
| errors.py | Exception hierarchy for download failures |
Sources: src/huggingface_hub/file_download.py:1-50
Core API Functions
hf_hub_download
The primary function for downloading a single file from the Hub.
from huggingface_hub import hf_hub_download
path = hf_hub_download(
repo_id="bert-base-cased",
filename="config.json",
repo_type="model",
revision="main",
cache_dir="./hf_cache",
token=True,
)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| repo_id | str | Required | Repository identifier (e.g., "bert-base-cased") |
| filename | str | Required | Path to the file within the repository |
| repo_type | str | "model" | Type of repository: "model", "dataset", or "space" |
| revision | str | "main" | Git revision (branch, tag, or commit hash) |
| cache_dir | `str \| Path` | None | Custom cache directory location |
| local_dir | `str \| Path` | None | Directory to place the file without caching structure |
| force_download | bool | False | Force re-download even if cached |
| local_files_only | bool | False | Only return local files, fail if not cached |
| token | `str \| bool` | None | Authentication token (True reads from config) |
| etag_timeout | float | 10 | Timeout in seconds for ETag fetch |
| tqdm_class | type | None | Custom tqdm class for progress bars |
Sources: src/huggingface_hub/file_download.py:100-200
snapshot_download
Downloads an entire repository to a local cache.
from huggingface_hub import snapshot_download
local_path = snapshot_download(
repo_id="stabilityai/stable-diffusion-2-1",
repo_type="model",
cache_dir="./models",
ignore_patterns=["*.md", ".gitattributes"],
)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| repo_id | str | Required | Repository identifier |
| repo_type | str | "model" | Type of repository |
| revision | str | None | Git revision to download |
| cache_dir | `str \| Path` | None | Cache directory location |
| local_dir | `str \| Path` | None | Mirror directory without cache structure |
| allow_patterns | list[str] | None | Glob patterns to include |
| ignore_patterns | list[str] | None | Glob patterns to exclude |
| force_download | bool | False | Force re-download of all files |
| local_files_only | bool | False | Only use local cache |
| token | `str \| bool` | None | Authentication token |
Sources: src/huggingface_hub/_snapshot_download.py:1-150
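The interaction between allow_patterns and ignore_patterns can be sketched with the standard library's `fnmatch`. This is an illustration of the include-then-exclude idea, not the library's own filtering code; the helper name `filter_paths` is hypothetical.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional


def filter_paths(
    paths: Iterable[str],
    allow_patterns: Optional[List[str]] = None,
    ignore_patterns: Optional[List[str]] = None,
) -> List[str]:
    """Keep paths matching any allow pattern (if given) and no ignore pattern."""
    kept = []
    for path in paths:
        # When allow patterns are given, a path must match at least one of them
        if allow_patterns is not None and not any(fnmatch(path, p) for p in allow_patterns):
            continue
        # A path matching any ignore pattern is dropped
        if ignore_patterns is not None and any(fnmatch(path, p) for p in ignore_patterns):
            continue
        kept.append(path)
    return kept


files = ["README.md", ".gitattributes", "unet/model.safetensors", "model_index.json"]
print(filter_paths(files, ignore_patterns=["*.md", ".gitattributes"]))
# → ['unet/model.safetensors', 'model_index.json']
```

Note that `fnmatch` lets `*` match across `/`, so a pattern such as `*.safetensors` also selects files in subfolders.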
Caching Mechanism
Cache Directory Structure
cache_dir/
├── .locks/                          # Lock files for concurrent access
│   └── {repo_id}/
│       └── {filename}.lock
└── {repo_type}s/
    └── {namespace}/
        └── {repo_name}/
            ├── .cache/              # Metadata
            │   └── huggingface/
            │       └── info/
            │           └── files/   # Download metadata
            ├── {revision}/
            │   └── {filename}       # Actual downloaded files
            └── refs/
                └── {branch}         # Git references
Download Metadata
Metadata is stored alongside cached files to track freshness:
# Stored in: {cache_dir}/.cache/huggingface/info/files/{filename}
{commit_hash}
{etag}
{timestamp}
The system validates cached files by:
- Comparing local ETag with remote ETag
- Checking commit hash consistency
- Verifying file modification timestamps
Sources: src/huggingface_hub/_local_folder.py:50-120
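The three-line metadata layout and the freshness checks listed above can be sketched as follows. The names `DownloadMetadata`, `parse_metadata`, and `is_fresh` are hypothetical, and the one-second tolerance on the timestamp comparison is an assumption made to absorb filesystem timestamp imprecision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DownloadMetadata:
    commit_hash: str
    etag: str
    timestamp: float  # time at which the metadata was saved


def parse_metadata(text: str) -> Optional[DownloadMetadata]:
    """Parse the commit hash / ETag / timestamp lines; return None if malformed."""
    lines = text.strip().splitlines()
    if len(lines) != 3:
        return None
    try:
        return DownloadMetadata(lines[0].strip(), lines[1].strip(), float(lines[2]))
    except ValueError:
        # The timestamp line was not a number
        return None


def is_fresh(meta: Optional[DownloadMetadata], remote_etag: str, file_mtime: float) -> bool:
    """Reuse the cached file only if the ETag matches and the file was not
    modified after the metadata was written (1 s tolerance, an assumption)."""
    if meta is None:
        return False
    return meta.etag == remote_etag and file_mtime <= meta.timestamp + 1.0
```

Treating malformed metadata as "not fresh" is the conservative choice: the worst outcome is a redundant download rather than a stale file.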
Lock File Management
The library uses WeakFileLock to handle concurrent downloads safely:
locks_dir = os.path.join(cache_dir, ".locks")
storage_folder = os.path.join(cache_dir, repo_folder_name(...))
paths = RepoFileDownloadPaths(...)
# Lock acquired before writing to cache
with WeakFileLock(paths.lock_path):
    # Critical section: write file or metadata
    ...
Sources: src/huggingface_hub/file_download.py:300-350
Download Workflow
Sequence Diagram
sequenceDiagram
participant Client
participant hf_hub_download
participant Cache
participant Server
participant Metadata
Client->>hf_hub_download: Call with repo_id, filename
hf_hub_download->>Cache: Check cached file + metadata
Cache-->>hf_hub_download: metadata (if exists)
alt Cached file exists
hf_hub_download->>Metadata: Validate ETag
Metadata-->>hf_hub_download: is_valid
alt Valid ETag
hf_hub_download-->>Client: Return cached path
else Invalid ETag
hf_hub_download->>Server: HEAD request for ETag
end
else No cache
hf_hub_download->>Server: HEAD request for ETag
end
Server-->>hf_hub_download: ETag, commit_hash, size
hf_hub_download->>Cache: Check if file exists
alt File not in cache
hf_hub_download->>Server: GET request
Server-->>hf_hub_download: File content
hf_hub_download->>Cache: Write file + metadata
end
hf_hub_download-->>Client: Return file path
ETag Validation Process
The download system implements a three-tier validation strategy:
- ETag Match: Compare server ETag with local metadata
- SHA256 Hash: For LFS files, compute and compare SHA256
- Timestamp Check: Verify file hasn't been modified since metadata save
# ETag-based validation
if local_metadata is not None and local_metadata.etag == etag:
    write_download_metadata(...)
    return str(paths.file_path)
# SHA256-based validation (for LFS files)
if local_metadata is None and REGEX_SHA256.match(etag) is not None:
    with open(paths.file_path, "rb") as f:
        file_hash = sha_fileobj(f).hex()
    if file_hash == etag:
        write_download_metadata(...)
        return str(paths.file_path)
Sources: src/huggingface_hub/file_download.py:400-480
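The excerpt above relies on internal helpers. A self-contained sketch of the same SHA256 comparison, using only `hashlib`, might look like this. The names `sha256_of_file` and `matches_lfs_etag` are hypothetical stand-ins, and the sketch assumes the ETag of an LFS file is its lowercase hex SHA256 digest.

```python
import hashlib
import re

# A SHA256 digest is exactly 64 lowercase hexadecimal characters
SHA256_RE = re.compile(r"^[0-9a-f]{64}$")


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in chunks so large weights never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_etag(path: str, etag: str) -> bool:
    """True only if the ETag looks like a SHA256 digest and equals the file's hash."""
    return SHA256_RE.match(etag) is not None and sha256_of_file(path) == etag
```

Checking the ETag format first avoids hashing a multi-gigabyte file when the ETag is not a SHA256 digest at all.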
Error Handling
Exception Hierarchy
graph TD
A[Exception]
A --> B[HfHubHTTPError]
B --> C[RevisionNotFoundError]
B --> D[EntryNotFoundError]
B --> E[LocalEntryNotFoundError]
D --> F[RemoteEntryNotFoundError]
A --> G[EntryNotFoundError]
G --> H[LocalEntryNotFoundError]
Common Errors
| Exception | Trigger Condition |
|---|---|
| RevisionNotFoundError | Invalid Git revision (branch, tag, commit) |
| RemoteEntryNotFoundError | File not found on remote server |
| LocalEntryNotFoundError | File not in cache with local_files_only=True |
| HfHubHTTPError | Generic HTTP errors (401, 403, 404, 500, etc.) |
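# Example: Handling download errors
from huggingface_hub import hf_hub_download
from huggingface_hub.errors import (
    LocalEntryNotFoundError,
    RemoteEntryNotFoundError,
    RevisionNotFoundError,
)

try:
    path = hf_hub_download("bert-base-cased", "config.json")
except RevisionNotFoundError as e:
    print(f"Revision not found: {e}")
except RemoteEntryNotFoundError as e:
    print(f"File not on server: {e}")
except LocalEntryNotFoundError as e:
    # The file is not cached; download it once with local_files_only=False
    print(f"File not in local cache: {e}")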
Sources: src/huggingface_hub/errors.py:100-180
Command-Line Interface
CLI Download Command
The hf CLI provides download functionality (huggingface-cli is the deprecated legacy entry point):
# Download single file
hf download bert-base-cased config.json
# Download entire repo
hf download stabilityai/stable-diffusion-2-1
# With patterns
hf download meta-llama/Llama-2-7b --include "*.safetensors"
# Dry run
hf download bigscience/bloom-7b1 --dry-run
CLI Implementation
The CLI wraps the core download functions and adds:
- Pretty-printed output formatting
- Dry-run mode for previewing downloads
- Pattern-based file selection
- Progress indication
# From cli/download.py
def run(self):
    if len(regular_filenames) == 1:
        # Single file: use hf_hub_download
        return hf_hub_download(
            repo_id=repo_id,
            filename=regular_filenames[0],
            ...
        )
    else:
        # Multiple files or patterns: use snapshot_download
        return snapshot_download(
            repo_id=repo_id,
            allow_patterns=allow_patterns,
            ...
        )
Sources: src/huggingface_hub/cli/download.py:50-120
Advanced Usage
Dry Run Mode
Preview what would be downloaded without actually downloading:
from huggingface_hub import hf_hub_download, DryRunFileInfo
result = hf_hub_download(
repo_id="bert-base-cased",
filename="config.json",
dry_run=True,
)
if isinstance(result, DryRunFileInfo):
    print(f"Will download: {result.filename}")
    print(f"Size: {result.file_size} bytes")
    print(f"Cached: {result.is_cached}")
    print(f"Commit: {result.commit_hash}")
Progress Tracking
Customize progress bar behavior:
from tqdm import tqdm
from huggingface_hub import hf_hub_download
class CustomProgress(tqdm):
    def set_postfix(self, **kwargs):
        self.set_postfix_str(f"ETA: {kwargs.get('eta', 'N/A')}")
hf_hub_download(
repo_id="bigscience/bloom-7b1",
filename="pytorch_model.bin",
tqdm_class=CustomProgress,
)
Offline Mode
Work exclusively with cached files:
from huggingface_hub import hf_hub_download
# Will fail if file not cached
path = hf_hub_download(
repo_id="bert-base-cased",
filename="config.json",
local_files_only=True,
)
Sources: src/huggingface_hub/file_download.py:450-500
Repository Types
The download system supports multiple repository types:
| Repo Type | Description | Typical Contents |
|---|---|---|
| model | Model repositories | PyTorch/TensorFlow models, configs, tokenizer files |
| dataset | Dataset repositories | Data files, dataset card, scripts |
| space | Space repositories | Application code, models, requirements |
Repository types affect URL construction:
# URL prefixes from constants
REPO_TYPES_URL_PREFIXES = {
"model": "",
"dataset": "datasets/",
"space": "spaces/",
}
Sources: src/huggingface_hub/lfs.py:30-60
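To show how the prefix feeds into a download URL, here is a small sketch. It assumes the commonly seen `{endpoint}/{prefix}{repo_id}/resolve/{revision}/{filename}` layout and the default `https://huggingface.co` endpoint; `build_resolve_url` is a hypothetical helper written for illustration, not a library function.

```python
from urllib.parse import quote

ENDPOINT = "https://huggingface.co"  # assumed default endpoint
REPO_TYPES_URL_PREFIXES = {
    "model": "",
    "dataset": "datasets/",
    "space": "spaces/",
}


def build_resolve_url(
    repo_id: str,
    filename: str,
    repo_type: str = "model",
    revision: str = "main",
) -> str:
    """Assemble a file URL; the prefix depends on the repository type."""
    if repo_type not in REPO_TYPES_URL_PREFIXES:
        raise ValueError(f"Unknown repo_type: {repo_type!r}")
    prefix = REPO_TYPES_URL_PREFIXES[repo_type]
    # Encode the revision fully (branch names may contain "/") and the
    # filename while keeping "/" as a path separator
    safe_revision = quote(revision, safe="")
    safe_filename = quote(filename)
    return f"{ENDPOINT}/{prefix}{repo_id}/resolve/{safe_revision}/{safe_filename}"


print(build_resolve_url("bert-base-cased", "config.json"))
# → https://huggingface.co/bert-base-cased/resolve/main/config.json
```

Model repositories have an empty prefix, which is why model URLs sit directly under the endpoint while datasets and Spaces are namespaced.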
Best Practices
- Use caching: Files are cached automatically; reuse cached files for subsequent runs
- Specify revisions: Pin specific commits for reproducible downloads
- Handle authentication: Use token=True to auto-read from config, or pass explicit tokens
- Prefer single file downloads: Use hf_hub_download for specific files instead of full snapshots
- Use patterns wisely: Combine allow_patterns and ignore_patterns for selective downloads
Summary
The file download system in huggingface_hub provides a robust, cached, and authenticated mechanism for retrieving files from the Hugging Face Hub. Key functions include:
- hf_hub_download: Single file downloads with full validation
- snapshot_download: Complete repository downloads with pattern filtering
- CLI integration via the download command (hf download, or the legacy huggingface-cli download)
The system handles caching, metadata validation, concurrent access, and error recovery transparently, making it suitable for production workloads requiring reliable artifact retrieval.
Sources: src/huggingface_hub/file_download.py:1-100
File Upload Operations
Related topics: Git LFS Large File Handling, File Download Operations, Repository Management API
Overview
File upload operations in huggingface_hub enable developers to publish and manage files on the Hugging Face Hub. The library provides a comprehensive set of tools for uploading individual files, entire folders, and handling large files through Git Large File Storage (LFS) integration.
The upload system is built on top of the Hub's git-based infrastructure, ensuring file versioning and integrity for all uploaded content. This architecture supports repositories of type model, dataset, and space. Sources: CLAUDE.md
Architecture Overview
graph TD
A[User Code] --> B[upload_file / upload_folder]
B --> C[CommitOperation Classes]
C --> D{Hub API}
D --> E[Regular Files<br/>Direct Upload]
D --> F[Large Files<br/>LFS Required]
F --> G[lfs.py<br/>Batch Operations]
G --> H[LFS Server]
E --> I[Regular Git Server]
H --> I
Core Components
CommitOperation Classes
The foundation of all upload operations is built on three operation classes defined in _commit_api.py:
| Class | Purpose | Key Attributes |
|---|---|---|
| CommitOperationAdd | Add a file to a commit | path_or_fileobj, path_in_repo, rethrow |
| CommitOperationDelete | Delete a file from a repository | path_in_repo |
| CommitOperationCopy | Copy a file within a repository | src_path_in_repo, path_in_repo |
Sources: src/huggingface_hub/_commit_api.py:1-100
CommitOperationAdd Details
class CommitOperationAdd:
    def __init__(
        self,
        path_or_fileobj: Union[str, Path, bytes, BinaryIO],
        path_in_repo: str,
        *,
        rfilename: Optional[str] = None,
        rethrow: bool = True,
        upload_info: Optional["CommitOperationAdd.UploadInfo"] = None,
    ):
        ...
The CommitOperationAdd class supports multiple input types:
| Input Type | Behavior |
|---|---|
| str / Path | File path - reads file content for upload |
| bytes | Raw byte content |
| BinaryIO | File-like object with read() method |
The class provides an as_file() context manager that yields a binary file object for the content, with optional progress bar support:
@contextmanager
def as_file(self, with_tqdm: bool = False) -> Iterator[BinaryIO]:
    if isinstance(self.path_or_fileobj, str) or isinstance(self.path_or_fileobj, Path):
        # Local path: open the file (optionally wrapped in a progress bar)
        if with_tqdm:
            with tqdm_stream_file(self.path_or_fileobj) as file:
                yield file
        else:
            with open(self.path_or_fileobj, "rb") as file:
                yield file
    elif isinstance(self.path_or_fileobj, bytes):
        # Raw bytes: wrap in an in-memory buffer
        yield io.BytesIO(self.path_or_fileobj)
    elif isinstance(self.path_or_fileobj, io.BufferedIOBase):
        # File object: restore the original position after use
        prev_pos = self.path_or_fileobj.tell()
        yield self.path_or_fileobj
        self.path_or_fileobj.seek(prev_pos, io.SEEK_SET)
Sources: src/huggingface_hub/_commit_api.py:200-280
Upload Functions
upload_file
Uploads a single file to a repository on the Hub.
from huggingface_hub import upload_file
upload_file(
path_or_fileobj="/home/lysandre/dummy-test/README.md",
path_in_repo="README.md",
repo_id="lysandre/test-model",
)
Sources: README.md
upload_folder
Uploads an entire folder to a repository. Handles nested directory structures and file filtering.
from huggingface_hub import upload_folder
upload_folder(
folder_path="/path/to/local/space",
repo_id="username/my-cool-space",
repo_type="space",
)
Sources: README.md
LFS Integration
Git LFS Overview
Large files (typically files larger than 10MB) are handled through Git LFS. The library provides batch upload utilities in lfs.py for efficient LFS operations.
sequenceDiagram
participant Client
participant Hub API
participant LFS Server
Client->>Hub API: POST /lfs/objects/batch
Note over Hub API: Check file sizes
Hub API->>Client: Upload instructions
alt Large Files
Client->>LFS Server: Upload LFS objects
LFS Server-->>Client: Success
end
Client->>Hub API: Complete commit
Hub API-->>Client: Commit SHA
LFS Batch Upload Process
The lfs.py module provides upload_files_lfs_instances() which handles the LFS batch protocol:
| Parameter | Type | Description |
|---|---|---|
| commit_operations | List[CommitOperationAdd] | Files to upload |
| repo_type | str | Repository type: "model", "dataset", "space" |
| repo_id | str | Repository identifier |
| revision | str | Git revision (default: "main") |
| endpoint | str | API endpoint URL |
| transfer_adapters | List[str] | Transfer methods: "basic", "multipart", "xet" |
Sources: src/huggingface_hub/lfs.py:50-150
LFS Batch Info Response
The LfsBatchInfo dataclass contains three elements:
@dataclass
class LfsBatchInfo:
    instructions: List["LfsUploadInfo"]
    errors: List["LfsError"]
    transfer_mode: "TransferMethod"
The upload process automatically determines which files require LFS handling based on file size thresholds configured by the Hub.
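The size-based split can be pictured with a short sketch. The 10 MB constant below only mirrors the "typically" figure quoted earlier and is an assumption, since the actual threshold is decided by the Hub; `split_by_size` is a hypothetical helper, not part of lfs.py.

```python
from typing import Dict, List, Tuple

# Assumed threshold for illustration only; the Hub decides the real value
ASSUMED_LFS_THRESHOLD = 10 * 1024 * 1024


def split_by_size(
    sizes: Dict[str, int],
    threshold: int = ASSUMED_LFS_THRESHOLD,
) -> Tuple[List[str], List[str]]:
    """Partition paths into (regular, lfs) by comparing byte sizes to the threshold."""
    regular: List[str] = []
    lfs: List[str] = []
    for path, size in sorted(sizes.items()):
        # Files strictly larger than the threshold go through LFS
        (lfs if size > threshold else regular).append(path)
    return regular, lfs


regular, lfs = split_by_size({"config.json": 512, "model.safetensors": 500 * 1024 * 1024})
print(regular, lfs)
# → ['config.json'] ['model.safetensors']
```

In practice the client does not choose this split on its own; the sketch only illustrates why small configuration files and large weights take different upload paths.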
Large Folder Upload
For repositories with many files or very large folder structures, _upload_large_folder.py provides chunked upload capabilities:
# Internal chunked upload for large repositories
from huggingface_hub import upload_folder

upload_folder(
    folder_path="/path/to/large/repo",
    repo_id="user/large-model",
    allow_patterns=["*.bin", "*.safetensors", "config.json"],
    ignore_patterns=["*.git*", "__pycache__/*"],
)
Sources: src/huggingface_hub/_upload_large_folder.py
Upload Workflow
graph TD
A[Start Upload] --> B{File Size Check}
B -->|Small File| C[Direct Git Upload]
B -->|Large File| D[LFS Upload Required]
C --> E[Create Commit]
D --> F[Batch Request to LFS]
F --> G[Get Upload Instructions]
G --> H[Upload to LFS Server]
H --> E
E --> I[Commit to Hub]
I --> J[Return Commit SHA]
Configuration Options
Repository Types
| Type | Description | Typical Use |
|---|---|---|
| model | Model repositories | Trained weights, configs |
| dataset | Dataset repositories | Data files, metadata |
| space | Space repositories | Demo applications |
Common Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| repo_id | Yes | - | Namespace/repo name |
| repo_type | No | "model" | Type of repository |
| revision | No | "main" | Git branch/tag |
| token | No | None | HF token for auth |
| create_pr | No | False | Create PR instead of commit |
Error Handling
CommitOperationAdd Error Handling
The rethrow parameter controls error behavior:
# Default: raises exception on failure
operation = CommitOperationAdd(path_or_fileobj="file.bin", path_in_repo="model.bin")
# With error suppression
operation = CommitOperationAdd(path_or_fileobj="file.bin", path_in_repo="model.bin", rethrow=False)
Upload Errors
| Error Type | Cause | Resolution |
|---|---|---|
| HfHubHTTPError | Server rejection | Check token permissions |
| ValueError | Invalid parameters | Validate repo_id, path_in_repo |
| LocalUploadNotImplementedError | Unsupported local upload | Use file path instead |
Best Practices
- Use upload_folder for multiple files to ensure atomic commits
- Token Authentication: Always authenticate before uploading to private repositories
- File Filtering: Use allow_patterns and ignore_patterns for large repos
- Progress Tracking: Enable tqdm for long uploads
from huggingface_hub import HfApi
api = HfApi()
api.upload_folder(
folder_path="./model",
repo_id="username/my-model",
repo_type="model",
token=True,  # Use the locally saved token (e.g. from `hf auth login`); raises if none is found
)
Related Documentation
- Upload Guide - Detailed upload instructions
- Repository Management - Repository operations
- Manage Cache - Cache configuration
Module Structure Summary
| File | Responsibility |
|---|---|
| `_commit_api.py` | Core commit operations and operation classes |
| `_upload_large_folder.py` | Chunked folder uploads |
| `lfs.py` | Git LFS batch upload protocol implementation |
| `_local_folder.py` | Local folder scanning and filtering |
| `hf_api.py` | High-level HfApi methods for upload |
Sources: src/huggingface_hub/_commit_api.py:1-100
Git LFS Large File Handling
Related topics: File Upload Operations, File Download Operations
Overview
Git LFS (Large File Storage) is a Git extension that handles large files by storing binary content outside the Git repository while maintaining a lightweight pointer file within it. The huggingface_hub library implements comprehensive LFS support to manage large model weights, datasets, and other binary assets on the Hugging Face Hub.
In the huggingface_hub ecosystem, LFS files are distinguished from regular Git-tracked files through their content addressing:
| File Type | Storage Method | Identifier | Location in Cache |
|---|---|---|---|
| Regular Git Blob | Git blob SHA-1 | 40-char hex string | blobs/ |
| LFS File | SHA256 hash | 64-char hex string | blobs/ |
Sources: src/huggingface_hub/lfs.py:1-50
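The distinction in the table above can be checked from the identifier alone. The following is a minimal sketch, not code from the library: the helper name `classify_blob_id` and both regular expressions are illustrative assumptions based only on the 40-character and 64-character lengths stated above.

```python
import re

# Assumed patterns, derived from the lengths given in the table above.
GIT_SHA1_RE = re.compile(r"[0-9a-f]{40}")  # regular Git blob
SHA256_RE = re.compile(r"[0-9a-f]{64}")    # LFS object


def classify_blob_id(blob_id: str) -> str:
    """Return 'git', 'lfs', or 'unknown' for a blob file name (hypothetical helper)."""
    if GIT_SHA1_RE.fullmatch(blob_id):
        return "git"
    if SHA256_RE.fullmatch(blob_id):
        return "lfs"
    return "unknown"


print(classify_blob_id("a" * 40))
print(classify_blob_id("403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd"))
```

Because both kinds of blob live side by side in `blobs/`, a length check like this is one simple way to tell them apart when inspecting a cache by hand.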
Architecture
Cache Directory Structure
When files are downloaded from the Hub, they are stored in a hierarchical cache structure:
graph TD
A["Cache Root<br/>~/.cache/huggingface/hub/"] --> B["models--{repo_id}"]
A --> C["datasets--{repo_id}"]
A --> D["spaces--{repo_id}"]
B --> E["blobs/"]
B --> F["refs/"]
B --> G["snapshots/"]
E --> H["git-sha<br/>40-char"]
E --> I["sha256<br/>64-char (LFS)"]
G --> J["{commit_hash}/"]
J --> K["filename → symlink → blob"]
Sources: src/huggingface_hub/file_download.py:1-30
LFS Upload Workflow
The upload process follows a batch-oriented approach using the LFS Batch API:
sequenceDiagram
participant Client
participant Hub as HF Hub API
participant LFS as LFS Server
Client->>Hub: POST /{repo_type}/{repo_id}.git/info/lfs/objects/batch
Note over Hub,LFS: Batch request includes<br/>upload instructions request
Hub->>LFS: Check upload eligibility
LFS-->>Hub: Upload instructions (presigned URLs)
Hub-->>Client: LfsBatchInfo with actions
alt basic/multipart transfer
Client->>LFS: PUT file content directly
LFS-->>Client: 200 OK
else xet transfer
Client->>Hub: Use custom xet protocol
end
Client->>Hub: POST /{repo_type}/{repo_id}.git/info/lfs/objects/batch
Note over Client: Confirm upload completion
Hub->>LFS: Verify file content
LFS-->>Hub: Verification result
Hub-->>Client: Commit ready
Sources: src/huggingface_hub/lfs.py:60-120
Core Components
LFS Module (`src/huggingface_hub/lfs.py`)
The main LFS module provides type definitions and utilities for handling LFS operations.
#### Constants
| Constant | Value | Purpose |
|---|---|---|
| `LFS_MULTIPART_UPLOAD_COMMAND` | "lfs-multipart-upload" | Identifier for multipart upload operations |
| `OID_REGEX` | `^[0-9a-f]{40}$` | Pattern for validating Git object identifiers |
| `LFS_HEADERS` | Dict | Accept and content type headers for LFS API |
Sources: src/huggingface_hub/lfs.py:40-55
#### LFS Headers
LFS_HEADERS = {
"Accept": "application/vnd.git-lfs+json",
"Content-Type": "application/vnd.git-lfs+json",
}
These headers indicate that all LFS API communications use JSON with the vnd.git-lfs+json media type, following the LFS specification.
Sources: src/huggingface_hub/lfs.py:50-55
LFS Utilities (`src/huggingface_hub/utils/_lfs.py`)
Utility functions for LFS operations include:
| Function | Purpose |
|---|---|
| `SliceFileObj` | Context manager for slicing file objects during multipart uploads |
| SHA utilities | Calculate SHA256 for LFS file content verification |
| Content range handling | Manage byte ranges for resumable uploads |
Sources: src/huggingface_hub/utils/_lfs.py
API Reference
LfsBatchInfo
The LfsBatchInfo dataclass encapsulates the server response from the LFS Batch API:
from dataclasses import dataclass
@dataclass
class LfsBatchInfo:
    """Information returned by the LFS batch API."""
    actions: dict
    """Dictionary of available actions (upload, verify)."""
    objects: list[dict]
    """List of objects with their metadata."""
    transfers: list[str]
    """Supported transfer adapters (e.g., 'basic', 'multipart', 'xet')."""
Sources: src/huggingface_hub/lfs.py:55-80
Upload Information Classes
The library uses dataclasses to represent different types of upload information:
| Class | Inheritance | Purpose |
|---|---|---|
| `UploadInfo` | Base | Abstract base for all upload info types |
| `LfsUploadFileInfo` | UploadInfo | Standard LFS file upload with size and SHA256 |
| `LfsUploadTtHubInfo` | UploadInfo | TtHub-specific upload info |
Sources: src/huggingface_hub/_commit_api.py:1-100
Transfer Adapters
The Hugging Face Hub supports multiple LFS transfer methods, negotiated during the batch API handshake:
Supported Transfer Methods
| Transfer Method | Description | Use Case |
|---|---|---|
| `basic` | Direct HTTP PUT upload | Small to medium files |
| `multipart` | Chunked upload for very large files | Files > 100MB |
| `xet` | Custom xet protocol for optimized transfers | High-performance scenarios |
Sources: src/huggingface_hub/lfs.py:60-100
Transfer Method Selection
The client sends supported transfer methods in the batch request:
payload: dict = {
"operation": "upload",
"transfers": transfers if transfers is not None else ["basic", "multipart"],
...
}
The server responds with the transfer adapter it will use, which the client then employs for the actual upload.
Sources: src/huggingface_hub/lfs.py:85-95
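To make the negotiation concrete, here is a minimal sketch of how such a batch request body could be assembled. The `operation` and `transfers` fields mirror the excerpt above; the `objects` entries (`oid`, `size`) and the `hash_algo` field follow the public Git LFS batch API and are assumptions about what the elided part of the payload contains. The function name `build_lfs_batch_payload` is hypothetical.

```python
import json
from typing import List, Optional, Tuple


def build_lfs_batch_payload(
    objects: List[Tuple[str, int]],
    transfers: Optional[List[str]] = None,
) -> dict:
    """Build the JSON body of an LFS batch upload request (illustrative only).

    `objects` is a list of (sha256_hex, size_in_bytes) pairs.
    """
    return {
        "operation": "upload",
        # Same default as in the excerpt above.
        "transfers": transfers if transfers is not None else ["basic", "multipart"],
        # Assumed shape, taken from the Git LFS batch API specification.
        "objects": [{"oid": oid, "size": size} for oid, size in objects],
        "hash_algo": "sha256",
    }


payload = build_lfs_batch_payload([("0" * 64, 1024)])
print(json.dumps(payload, indent=2))
```

Passing an explicit `transfers` list (for example `["basic"]`) restricts the adapters the server may pick from, which can help when debugging a failing multipart upload.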
Large File Identification
Size Thresholds
Files are treated as LFS content when they exceed certain thresholds:
| Threshold | Action |
|---|---|
| < 5MB | Stored as regular Git blob |
| >= 5MB | Redirected to LFS storage |
OID (Object Identifier) Format
LFS files are identified by their SHA256 hash, represented as a 64-character hexadecimal string:
Pattern: ^[0-9a-f]{64}$
Example: 403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
Regular Git blobs use 40-character SHA1 identifiers, while LFS files use 64-character SHA256 identifiers.
Sources: src/huggingface_hub/lfs.py:45
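A 64-character OID of this kind can be computed with the standard library alone. The sketch below is illustrative and is not the library's own implementation; the helper name `sha256_oid` and the 1 MiB read size are assumptions. It hashes the content in fixed-size reads so that a large file never has to fit in memory.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_oid(fileobj: BinaryIO, read_size: int = 1024 * 1024) -> str:
    """Return the SHA256 hex digest of a binary stream, reading it in pieces."""
    digest = hashlib.sha256()
    # iter() with a sentinel stops as soon as read() returns empty bytes.
    for piece in iter(lambda: fileobj.read(read_size), b""):
        digest.update(piece)
    return digest.hexdigest()


oid = sha256_oid(io.BytesIO(b"example model weights"))
print(len(oid), oid)
```

For a file on disk, the same function can be called with `open(path, "rb")` as the stream.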
Multipart Upload for Large Files
Upload Process
graph LR
A[File] --> B{Split into chunks}
B --> C[Chunk 1]
B --> D[Chunk 2]
B --> E[Chunk N]
C --> F[Upload Part 1]
D --> G[Upload Part 2]
E --> H[Upload Part N]
F --> I{All parts<br/>complete?}
G --> I
H --> I
I -->|Yes| J[Complete multipart<br/>upload]
Chunk Size Calculation
The library calculates optimal chunk sizes based on file size:
from math import ceil
chunk_size = ceil(file_size / total_parts)
This ensures even distribution of work across all chunks.
Sources: src/huggingface_hub/lfs.py:30-35
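As a worked example of the formula above, the sketch below derives the chunk size from a given number of parts and then lists the byte range each part would cover. The helper name `plan_parts` is hypothetical, and the way ranges are derived is an assumption for illustration rather than the library's actual multipart logic.

```python
from math import ceil
from typing import List, Tuple


def plan_parts(file_size: int, total_parts: int) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (chunk_size, [(start, end), ...]) with `end` exclusive."""
    if file_size <= 0 or total_parts <= 0:
        raise ValueError("file_size and total_parts must be positive")
    chunk_size = ceil(file_size / total_parts)
    # The last range is clipped so it never runs past the end of the file.
    ranges = [
        (start, min(start + chunk_size, file_size))
        for start in range(0, file_size, chunk_size)
    ]
    return chunk_size, ranges


chunk_size, ranges = plan_parts(file_size=10, total_parts=3)
print(chunk_size, ranges)
```

With 10 bytes and 3 parts the chunk size is 4, so the final part is shorter than the others. Because of the rounding, the number of ranges can be smaller than `total_parts` for some inputs (10 bytes in 6 parts gives 5 ranges of 2 bytes).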
Integration with Commit API
CommitOperationAdd with LFS
The CommitOperationAdd class handles both regular and LFS file uploads:
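# Simplified excerpt: CommitOperationAdd is a dataclass, not a TypedDict
from dataclasses import dataclass, field
from pathlib import Path
from typing import BinaryIO, Union
from huggingface_hub.lfs import UploadInfo
@dataclass
class CommitOperationAdd:
    path_in_repo: str  # destination path inside the repository
    path_or_fileobj: Union[str, Path, bytes, BinaryIO]  # local path, raw bytes, or open binary file
    upload_info: UploadInfo = field(init=False, repr=False)  # computed from the content
The upload_info attribute is computed from the file content when the operation is created and holds the size and SHA256 used for LFS uploads. Whether the file then goes through LFS or the regular Git upload path is decided afterwards, during the commit.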
Sources: src/huggingface_hub/_commit_api.py:100-150
Upload Flow
flowchart TD
A[Create CommitOperationAdd] --> B{File size<br/>> threshold?}
B -->|Yes| C[Create LfsUploadFileInfo]
B -->|No| D[Create UploadInfo for Git]
C --> E[Upload via LFS Batch API]
D --> F[Upload via Git HTTP API]
E --> G{LFS transfer<br/>method}
G -->|basic| H[Single PUT request]
G -->|multipart| I[Chunked upload]
G -->|xet| J[Custom xet protocol]
H --> K[Verify upload]
I --> K
J --> K
K --> L[Commit confirmation]
F --> L
Error Handling
Validation Errors
| Error | Condition | Handling |
|---|---|---|
| Invalid OID | Not matching OID_REGEX | Raise ValueError |
| Missing upload info | upload_info not set | Raise ValueError |
| Malformed batch response | Missing required fields | Raise HfHubHTTPError |
Network Errors
The library implements automatic retry with exponential backoff for failed LFS operations:
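from huggingface_hub.utils import hf_raise_for_status, http_backoff
# Transient failures are retried with exponential backoff.
# upload_url and data are placeholders for the presigned URL and the file content.
response = http_backoff("PUT", upload_url, data=data)
hf_raise_for_status(response)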
Sources: src/huggingface_hub/lfs.py:50-80
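The retry behaviour described above can be illustrated without any network access. The following is a generic sketch of exponential backoff, not the implementation of `http_backoff`: the function name `retry_with_backoff`, its parameters, and the choice of `ConnectionError` as the retryable failure are all assumptions made for the example.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_retries: int = 5,
    base_wait: float = 1.0,
    max_wait: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, doubling the wait after each failure up to `max_wait`."""
    wait = base_wait
    for attempt in range(max_retries + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(wait)
            wait = min(wait * 2, max_wait)
    raise AssertionError("unreachable when max_retries >= 0")


# Demo: an operation that fails twice before succeeding.
waits: List[float] = []
calls = {"count": 0}


def flaky_upload() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "uploaded"


# waits.append stands in for time.sleep so the demo runs instantly.
result = retry_with_backoff(flaky_upload, sleep=waits.append)
print(result, waits)
```

Injecting the `sleep` callable keeps the sketch easy to test; in real use the default `time.sleep` would apply the delays.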
Configuration
Environment Variables
| Variable | Effect |
|---|---|
| `HF_ENDPOINT` | Override default https://huggingface.co |
| `HF_TOKEN` | Authentication token for private repos |
Upload Options
When uploading files, the following options control LFS behavior:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `transfers` | list[str] | ["basic", "multipart"] | Allowed transfer methods |
| `endpoint` | str | Hub endpoint | LFS server endpoint |
| `repo_type` | str | "model" | Repository type |
| `repo_id` | str | Required | Repository identifier |
Sources: src/huggingface_hub/lfs.py:60-110
Best Practices
File Organization
- Group large binary files - Store model weights and dataset files separately from code
- Use consistent file sizes - Avoid extremely small LFS files (< 5MB overhead)
- Leverage symlinks - The snapshot directory uses symlinks to avoid duplication
Upload Optimization
- Prefer multipart for files > 100MB - Enables parallel uploads and resumability
- Enable xet for frequent transfers - Custom protocol reduces bandwidth
- Batch operations - Use `CommitOperationAdd` batching to minimize round trips
Caching Strategy
~/.cache/huggingface/hub/
└── models--{repo_id}/
    ├── blobs/      # Physical file storage (SHA256 for LFS)
    ├── refs/       # Revision pointers
    └── snapshots/  # Virtual files pointing to blobs
The snapshot layer provides deduplication - the same blob referenced by multiple commits is stored only once.
Sources: src/huggingface_hub/file_download.py:10-30
See Also
- Git vs HTTP Protocol - Understanding when LFS is used
- hf_hub_download - File downloading with LFS support
- Commit API Reference - Uploading with commits
Sources: src/huggingface_hub/lfs.py:1-50
Repository Management API
Related topics: File Upload Operations, Authentication System, Cache Management System
The Repository Management API in huggingface_hub provides a comprehensive interface for creating, configuring, and managing Hugging Face repositories (models, datasets, and Spaces) directly from Python code or via the command-line interface.
Overview
The Repository Management API serves as the core layer for all Hub repository operations, enabling developers to programmatically:
- Create and delete repositories
- Configure repository settings
- Upload and download files
- Manage repository metadata
- Handle commit operations with Git LFS support
Sources: src/huggingface_hub/README.md
Architecture
The repository management functionality is distributed across multiple modules:
graph TD
A[Repository Management API] --> B[hf_api.py]
A --> C[_commit_api.py]
A --> D[_buckets.py]
B --> E[REST API Client]
C --> E
B --> F[CLI Interface]
F --> G[hf CLI Command]
Core Components
| Module | Purpose |
|---|---|
| `hf_api.py` | Main HfApi class with all CRUD operations |
| `_commit_api.py` | Low-level commit operations, LFS handling |
| `_buckets.py` | Bucket/S3-compatible storage management |
| `cli/` | Command-line interface implementation |
Sources: CLAUDE.md
Repository Types
The API supports three primary repository types:
| Type | Description | Use Case |
|---|---|---|
| `model` | Model repositories | Storing and sharing ML model weights |
| `dataset` | Dataset repositories | Hosting and versioning datasets |
| `space` | Space repositories | Hosting Gradio/Streamlit demos |
Core Operations
Repository Lifecycle
graph LR
A[create_repo] --> B[Update Settings]
B --> C[Upload Files]
C --> D[Manage Commits]
D --> E[delete_repo]
#### Creating a Repository
from huggingface_hub import create_repo, HfApi
# Using HfApi class
api = HfApi()
api.create_repo(
repo_id="username/my-model",
repo_type="model",
exist_ok=False
)
# Using convenience function
create_repo(
repo_id="super-cool-model",
token="hf_xxxxx"
)
#### Deleting a Repository
api.delete_repo(repo_id="username/my-model", repo_type="model")
Sources: src/huggingface_hub/hf_api.py
Repository Settings
Update repository configuration after creation:
api.update_repo_settings(
    repo_id="username/my-model",
    private=True,
    repo_type="model",
    gated="auto",  # Enable gated access: "auto", "manual", or False to disable
)
Listing Repository Contents
# List all files in a repository
files = api.list_repo_files(repo_id="tiiuae/falcon-7b-instruct")
# List files and folders in the repo tree (results are fetched page by page)
objects = api.list_repo_tree(
    repo_id="my-org/my-dataset",
    repo_type="dataset",
)
Commit Operations
Commit Operation Classes
The _commit_api.py module provides low-level commit primitives:
| Class | Purpose |
|---|---|
| `CommitOperationAdd` | Add a file to the repository |
| `CommitOperationDelete` | Remove a file from the repository |
| `CommitOperationCopy` | Copy a file within the repository |
from huggingface_hub import CommitOperationAdd, HfApi
api = HfApi()
operations = [
CommitOperationAdd(
path_in_repo="config.json",
path_or_fileobj="/local/path/config.json"
),
]
api.create_commit(
repo_id="username/my-model",
operations=operations,
commit_message="Add config file"
)
Sources: src/huggingface_hub/_commit_api.py
Large File Upload (Git LFS)
Large files are automatically handled via Git LFS:
graph TD
A[File > 10MB] --> B{LFS Required?}
B -->|Yes| C[Upload to LFS Storage]
B -->|No| D[Upload as Regular File]
C --> E[Create LFS Pointer]
E --> F[Commit Pointer to Repo]
D --> F
File Upload Operations
Single File Upload
from huggingface_hub import upload_file
upload_file(
path_or_fileobj="/home/user/model.bin",
path_in_repo="pytorch_model.bin",
repo_id="username/my-model",
)
Folder Upload
from huggingface_hub import upload_folder
upload_folder(
folder_path="/path/to/local/space",
repo_id="username/my-cool-space",
repo_type="space",
commit_message="Update space files"
)
For very large folders, the library provides a resumable, chunked upload:
from huggingface_hub import upload_large_folder
upload_large_folder(
    repo_id="username/large-dataset",
    folder_path="/data/large-folder",
    repo_type="dataset",  # must be passed explicitly
)
Sources: src/huggingface_hub/README.md
File Deletion Operations
# Delete a single file
api.delete_file(
path_in_repo="old-model.bin",
repo_id="username/my-model",
commit_message="Remove deprecated file"
)
# Delete a folder
api.delete_folder(
path_in_repo="old-folder/",
repo_id="username/my-model",
commit_message="Clean up old directory"
)
CLI Interface
The repository management features are exposed through the hf CLI:
# Authentication
hf auth login
hf auth logout
hf auth whoami
# Repository operations
hf repos create username/my-model --type model
hf repos create username/my-dataset --type dataset
Sources: setup.py
Configuration Parameters
Repository Creation Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `repo_id` | str | Required | Repository identifier (user/name or org/name) |
| `repo_type` | str | "model" | Type: "model", "dataset", or "space" |
| `exist_ok` | bool | False | Do not raise an error if the repo already exists |
| `private` | bool | False | Make repository private |
| `token` | str | None | Hugging Face authentication token |
| `space_sdk` | str | None | Space SDK: "gradio", "streamlit", "docker", "docker_leaf", "static", "nextjs" |
| `space_hardware` | str | None | Space hardware tier |
Commit Operation Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `operations` | list[CommitOperation] | Required | List of file operations |
| `commit_message` | str | Required | Description of changes |
| `commit_description` | str | None | Extended commit description |
| `parent_commit` | str | None | Parent commit SHA for incremental updates |
| `create_pr` | bool | False | Create a Pull Request instead of committing to main |
Error Handling
The repository management API raises specific exception types:
| Exception | Cause |
|---|---|
| `RepositoryNotFoundError` | Repository does not exist or user lacks access |
| `RevisionNotFoundError` | Specified git revision not found |
| `EntryNotFoundError` | File or folder not found in repository |
| `HfHubHTTPError` | HTTP error from the Hub API |
from huggingface_hub import hf_hub_download
from huggingface_hub.errors import RevisionNotFoundError
try:
    hf_hub_download(
        repo_id="bert-base-cased",
        filename="config.json",
        revision="<non-existent-revision>",
    )
except RevisionNotFoundError as e:
    print(f"Revision not found: {e}")
Sources: src/huggingface_hub/errors.py
Common Usage Patterns
Model Publishing Workflow
graph TD
A[Create Repo] --> B[Upload Model Files]
B --> C[Create Model Card]
C --> D[Set Metadata/Tags]
D --> E[Publish to Hub]
from huggingface_hub import HfApi, ModelCard, ModelCardData, upload_file
api = HfApi()
# 1. Create repository
api.create_repo(repo_id="my-org/my-model", exist_ok=True)
# 2. Upload model files
upload_file(
    path_or_fileobj="./model.bin",
    path_in_repo="pytorch_model.bin",
    repo_id="my-org/my-model",
)
# 3. Create and upload model card (stored as README.md in the repository)
card = ModelCard.from_template(
    ModelCardData(
        language="en",
        license="apache-2.0",
        model_name="My Custom Model",
        tags=["pytorch", "image-classification"],
    ),
    model_description="This is a custom model trained on...",
)
card.push_to_hub("my-org/my-model")
Dataset Versioning Workflow
from huggingface_hub import CommitOperationAdd, HfApi
api = HfApi()
operations = [
CommitOperationAdd(
path_in_repo="data/train.parquet",
path_or_fileobj="./train.parquet"
),
CommitOperationAdd(
path_in_repo="data/validation.parquet",
path_or_fileobj="./validation.parquet"
),
]
api.create_commit(
repo_id="username/my-dataset",
operations=operations,
commit_message="Add training and validation splits",
commit_description="Initial dataset release with train/validation split"
)
Best Practices
- Use `exist_ok=True` when creating repositories in automated pipelines
- Include commit messages for better version control history
- Use `parent_commit` parameter when making sequential updates to prevent race conditions
- Enable LFS automatically for files larger than 10MB
- Use `create_pr=True` for reviewing changes before merging to main branch
See Also
- Upload Guide
- Download Guide
- Manage Cache
- HfFileSystem for filesystem-style access
Sources: [src/huggingface_hub/README.md](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/README.md)
Cache Management System
Related topics: File Download Operations, Repository Management API
Overview
The Cache Management System in huggingface_hub provides comprehensive utilities for managing locally cached models, datasets, and Spaces downloaded from the Hugging Face Hub. The system handles automatic caching of downloaded content, tracks cache metadata, and offers programmatic and CLI interfaces for inspecting and managing cached resources.
The cache system is designed to:
- Store downloaded files efficiently with deduplication via blob storage
- Track repository metadata including revisions, commit hashes, and file information
- Provide safe deletion strategies that don't corrupt cache state
- Handle corrupted cache entries gracefully with warnings
Cache Directory Structure
The Hugging Face cache follows a specific directory structure to organize cached content:
HF_HUB_CACHE/
├── .locks/                      # Lock files for concurrent access
│   └── ...
├── CACHEDIR.TAG                 # OS-native cache directory marker
├── models--owner--repo_name/
│   ├── .cache/                  # Metadata and tracking
│   ├── blobs/                   # Actual file content (deduplicated)
│   ├── refs/                    # Branch and tag references
│   ├── snapshots/               # Symlinks to blobs
│   └── ...
├── datasets--org--dataset_name/
│   └── ...
└── spaces--user--space_name/
    └── ...
Cache directories follow the naming convention {type}s--{repo_id} where:
- The directory prefix is the plural of the repo type (`models`, `datasets`, `spaces`); the scanner reports the type in singular form (`model`, `dataset`, `space`)
- Slashes in `repo_id` are converted to double hyphens (e.g., `google/fleurs` becomes `google--fleurs`, stored as `datasets--google--fleurs`)
Sources: src/huggingface_hub/utils/_cache_manager.py:1-100
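The mapping between a repository and its cache folder name can be sketched in a few lines. This is an illustration under the assumption that the on-disk folder uses a plural type prefix joined with double hyphens (for example `datasets--google--fleurs`); the helper names `cache_folder_name` and `parse_cache_folder_name` are hypothetical and are not the library's own functions.

```python
from typing import Tuple

VALID_TYPES = {"model", "dataset", "space"}


def cache_folder_name(repo_id: str, repo_type: str = "model") -> str:
    """Build the cache folder name for a repo (assumed convention)."""
    if repo_type not in VALID_TYPES:
        raise ValueError(f"Unknown repo type: {repo_type!r}")
    # Plural prefix, then every path segment of the repo id, joined by "--".
    return "--".join([f"{repo_type}s", *repo_id.split("/")])


def parse_cache_folder_name(folder_name: str) -> Tuple[str, str]:
    """Inverse of cache_folder_name: return (repo_type, repo_id)."""
    prefix, _, rest = folder_name.partition("--")
    repo_type = prefix[:-1]  # "models" -> "model"
    if not rest or not prefix.endswith("s") or repo_type not in VALID_TYPES:
        raise ValueError(f"Not a valid cache folder name: {folder_name!r}")
    return repo_type, rest.replace("--", "/")


name = cache_folder_name("google/fleurs", "dataset")
print(name, parse_cache_folder_name(name))
```

A round trip through both helpers returns the original type and repo id, which is a quick sanity check when writing scripts that walk the cache directory directly.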
Core Data Models
HFCacheInfo
The main container class for cache information returned by scan operations:
| Attribute | Type | Description |
|---|---|---|
| `repos` | frozenset[CachedRepoInfo] | All cached repositories |
| `size_on_disk` | int | Total size of all cached content in bytes |
| `warnings` | list[CorruptedCacheException] | Issues encountered during scanning |
CachedRepoInfo
Represents a single cached repository:
| Attribute | Type | Description |
|---|---|---|
| `repo_id` | str | Repository identifier (e.g., google/gemma-3-4b-it) |
| `repo_type` | str | Type: model, dataset, or space |
| `size_on_disk` | int | Total size of cached revisions |
| `revisions` | frozenset[CachedRevisionInfo] | All cached revisions |
| `repo_path` | Path | Path to the cached repository directory |
CachedRevisionInfo
Represents a specific revision within a cached repository:
| Attribute | Type | Description |
|---|---|---|
| `commit_hash` | str | Git commit hash (40-character hex string) |
| `size_on_disk` | int | Size of this specific revision |
| `files` | frozenset[CachedFileInfo] | Files in this revision |
| `last_modified` | float | Last modification time, as a Unix timestamp |
CachedFileInfo
Represents an individual cached file:
| Attribute | Type | Description |
|---|---|---|
| `file_name` | str | Name of the file |
| `size_on_disk` | int | Size of the file in bytes |
| `file_path` | Path | Path to the symlinked file in snapshots |
| `blob_path` | Path | Path to the actual blob storage |
Sources: src/huggingface_hub/utils/_cache_manager.py:1-100
Key API Functions
scan_cache_dir()
Scans the cache directory and returns information about all cached repositories.
from huggingface_hub import scan_cache_dir
cache_info = scan_cache_dir()
print(f"Total size: {cache_info.size_on_disk / 1024 / 1024:.2f} MB")
for repo in cache_info.repos:
    print(f"{repo.repo_type}/{repo.repo_id}")
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `cache_dir` | str or Path | HF_HUB_CACHE env var | Cache directory to scan |
Returns: HFCacheInfo object containing repository information
Raises:
- `CacheNotFound` if the cache directory doesn't exist
- `ValueError` if cache_dir is a file instead of a directory
try_to_load_from_cache()
Checks if a file exists in the local cache without downloading.
from huggingface_hub import try_to_load_from_cache, _CACHED_NO_EXIST
filepath = try_to_load_from_cache(
repo_id="tiiuae/falcon-7b-instruct",
filename="config.json",
revision="main",
repo_type="model"
)
if isinstance(filepath, str):
    print(f"File cached at: {filepath}")
elif filepath is _CACHED_NO_EXIST:
    print("File confirmed to not exist at this revision")
else:
    print("File not in cache")
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `cache_dir` | str or Path | None | Cache directory path |
| `repo_id` | str | Required | Repository identifier |
| `filename` | str | Required | Filename to look for |
| `revision` | str | "main" | Specific revision to check |
| `repo_type` | str | "model" | Type of repository |
Returns: str (file path), _CACHED_NO_EXIST, or None
Sources: src/huggingface_hub/file_download.py:1-100
Sources: src/huggingface_hub/utils/_cache_manager.py:1-100
DeleteCacheStrategy
The deletion system uses a two-phase approach: create a strategy, then execute it. This prevents accidental data loss and allows for dry-run validation.
from huggingface_hub import scan_cache_dir
cache_info = scan_cache_dir()
# Create deletion strategy (doesn't delete yet)
delete_strategy = cache_info.delete_revisions(
"81fd1d6e7847c99f5862c9fb81387956d99ec7aa",
"e2983b237dccf3ab4937c97fa717319a9ca1a96d",
)
# Preview what will be deleted
print(f"Will free: {delete_strategy.expected_freed_size / 1024 / 1024:.2f} MB")
# Execute the deletion
delete_strategy.execute()
Deletion Workflow
graph TD
A[scan_cache_dir] --> B[Get HFCacheInfo]
B --> C[Call delete_revisions with commit hashes]
C --> D[Create DeleteCacheStrategy]
D --> E{Preview/Dry Run}
E -->|Inspect| F[Review expected_freed_size]
E -->|Confirm| G[execute]
G --> H[Delete blobs and refs]
H --> I[Cache deletion done]
F --> G
DeleteCacheStrategy Properties
| Property | Type | Description |
|---|---|---|
| `repos` | frozenset[Path] | Cached repository folders to remove entirely |
| `snapshots` | frozenset[Path] | Snapshot folders to remove |
| `blobs` | frozenset[Path] | Blob file paths to remove |
| `refs` | frozenset[Path] | Ref files to remove |
| `expected_freed_size` | int | Estimated bytes to be freed |
Sources: src/huggingface_hub/utils/_cache_manager.py:1-100
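Because blobs are shared between revisions, deleting a revision only frees the blobs that no remaining revision still references. The sketch below illustrates that rule with plain dictionaries; it is a simplified model of the idea, not the library's algorithm, and the function name `blobs_safe_to_delete` and the data layout are assumptions.

```python
from typing import Dict, Set


def blobs_safe_to_delete(
    revisions: Dict[str, Set[str]],
    to_delete: Set[str],
) -> Set[str]:
    """Return blobs referenced only by revisions that are being deleted.

    `revisions` maps a commit hash to the set of blob names it points to.
    """
    still_needed: Set[str] = set()
    candidates: Set[str] = set()
    for commit_hash, blobs in revisions.items():
        if commit_hash in to_delete:
            candidates |= blobs
        else:
            still_needed |= blobs
    return candidates - still_needed


# Two revisions of one repo sharing an unchanged file ("weights").
revisions = {
    "rev_old": {"config_v1", "weights"},
    "rev_new": {"config_v2", "weights"},
}
print(blobs_safe_to_delete(revisions, {"rev_old"}))
```

Here only `config_v1` can be removed, since `weights` is still used by `rev_new`. This is also why the space actually freed can be smaller than the reported size of the revision being deleted.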
CLI Interface
The hf command provides cache management through the cache subcommand.
List Cached Repositories
hf cache ls
Output format:
ID SIZE LAST_ACCESSED LAST_MODIFIED REFS
--------------------------- -------- ------------- ------------- -----------
dataset/nyu-mll/glue 157.4M 2 days ago 2 days ago main script
model/LiquidAI/LFM2-VL-1.6B 3.2G 4 days ago 4 days ago main
model/microsoft/UserLM-8b 32.1G 4 days ago 4 days ago main
Done in 0.0s. Scanned 6 repo(s) for a total of 3.4G.
Filtering Options
| Filter Key | Operators | Example |
|---|---|---|
| `type` | ==, != | `--filter type==model` |
| `size` | >, <, >=, <=, = | `--filter "size>=1G"` |
| `accessed` | >, <, >=, <= | `--filter "accessed<7d"` |
| `modified` | >, <, >=, <= | `--filter "modified>30d"` |
| `refs` | ==, != | `--filter refs==main` |
Examples:
# Filter large models (quote expressions containing < or > so the shell does not treat them as redirections)
hf cache ls --filter type==model --filter "size>=5G"
# Find recently accessed datasets
hf cache ls --filter type==dataset --filter "accessed<7d"
# Filter by modification time
hf cache ls --filter "modified>30d"
Sorting Options
| Sort Key | Default Order | Ascending Option |
|---|---|---|
| `name` | asc | `name:asc` |
| `size` | desc | `size:asc` |
| `accessed` | desc | `accessed:asc` |
| `modified` | desc | `modified:asc` |
Examples:
# Sort by size descending (largest first)
hf cache ls --sort size
# Sort by name ascending, then size descending
hf cache ls --sort name:asc --sort size:desc
Delete Specific Revisions
hf cache delete <revision_hash> [<revision_hash>...]
The CLI will prompt for confirmation before deletion.
Sources: src/huggingface_hub/cli/cache.py:1-100
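To show how a size filter such as `size>=5G` can be interpreted, here is a small standalone sketch. It is not the CLI's parser: the function name `parse_size_filter`, the accepted grammar, and the decimal (1000-based) unit multipliers are all assumptions made for illustration.

```python
import operator
import re
from typing import Callable

# Longer operators come first so ">=" is not read as ">" followed by "=".
_FILTER_RE = re.compile(r"size(>=|<=|>|<|=)(\d+(?:\.\d+)?)([KMGT]?)")
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}
# Assumed decimal units; the real CLI may use different multipliers.
_UNITS = {"": 1, "K": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4}


def parse_size_filter(expr: str) -> Callable[[int], bool]:
    """Turn an expression like 'size>=5G' into a predicate on a byte count."""
    match = _FILTER_RE.fullmatch(expr)
    if match is None:
        raise ValueError(f"Unsupported size filter: {expr!r}")
    op, number, unit = match.groups()
    threshold = int(float(number) * _UNITS[unit])
    compare = _OPS[op]
    return lambda size_in_bytes: compare(size_in_bytes, threshold)


is_large = parse_size_filter("size>=5G")
print(is_large(32 * 1000**3), is_large(157 * 1000**2))
```

The same predicate style could be combined with `scan_cache_dir()` results to reproduce a filter programmatically, for instance by testing each repository's `size_on_disk`.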
Cache Scanning Process
The cache scanning process validates the cache directory structure and handles corrupted entries gracefully.
graph TD
A[Start scan_cache_dir] --> B{Is cache_dir set?}
B -->|No| C[Use HF_HUB_CACHE env var]
B -->|Yes| D[Use provided path]
C --> E{Does directory exist?}
D --> E
E -->|No| F[Raise CacheNotFound]
E -->|Yes| G{Is it a file?}
G -->|Yes| H[Raise ValueError]
G -->|No| I[Iterate subdirectories]
I --> J{Skip .locks and CACHEDIR.TAG?}
J -->|Yes| K[Next directory]
J -->|No| L[_scan_cached_repo]
L --> M{Valid format?}
M -->|No| N[Log CorruptedCacheException]
M -->|Yes| O[Create CachedRepoInfo]
N --> P[Add to warnings list]
O --> Q[Add to repos set]
P --> K
Q --> K
K --> R{More directories?}
R -->|Yes| I
R -->|No| S[Return HFCacheInfo]
Validation Rules
- Each subdirectory must follow the `type--repo_id` naming convention
- The `type` must be one of: `model`, `dataset`, `space`
- Directories must contain expected subdirectories (snapshots, blobs, refs)
Sources: src/huggingface_hub/utils/_cache_manager.py:100-200
HubMixin Integration
The ModelHubMixin class and its PyTorch variant PyTorchModelHubMixin integrate with the cache system for model loading:
import torch
from huggingface_hub import PyTorchModelHubMixin
class MyModel(torch.nn.Module, PyTorchModelHubMixin):
    pass
# Load model - uses cache automatically
model = MyModel.from_pretrained("username/my-model")
# Cache behavior:
# 1. Check if model exists locally
# 2. If local_files_only=True, use cached version or raise error
# 3. Otherwise, download and cache from Hub
# 4. Store in cache_dir or default HF_HUB_CACHE location
HubMixin Parameters Related to Caching:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `cache_dir` | str or Path | None | Custom cache location |
| `force_download` | bool | False | Force re-download |
| `local_files_only` | bool | False | Only use cached files |
| `token` | str or bool | None | HuggingFace token |
Sources: src/huggingface_hub/hub_mixin.py:1-100
Exception Handling
CorruptedCacheException
Raised when cache directory structure is invalid or expected files are missing.
class CorruptedCacheException(Exception):
    """Exception raised when a cache entry is corrupted."""
    def __init__(self, message: str):
        self.message = message
        super().__init__(self.message)
Common corruption scenarios:
- Snapshots directory doesn't exist
- Invalid repository directory naming
- Missing expected cache metadata
CacheNotFound
Raised when the cache directory cannot be located.
raise CacheNotFound(
f"Cache directory not found: {cache_dir}. "
"Please use `cache_dir` argument or set `HF_HUB_CACHE` environment variable.",
cache_dir=cache_dir,
)
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `HF_HUB_CACHE` | Primary cache directory | ~/.cache/huggingface/hub |
| `HF_HUB_DOWNLOAD_TIMEOUT` | Download timeout in seconds | 10 |
Sources: src/huggingface_hub/constants.py
Best Practices
Efficient Cache Usage
- Reuse cached content: Revisions of the same repository that share unchanged files reference the same blobs
- Use revision pinning: Specify exact commit hashes for reproducible builds
- Monitor cache size: Regularly run `hf cache ls` to identify large repositories
Safe Deletion
- Always use `scan_cache_dir()` to inspect before deletion
- Check `warnings` in `HFCacheInfo` for corrupted entries
- Use the `expected_freed_size` property to estimate space recovery
- Execute deletion only after confirming the strategy
Troubleshooting
| Issue | Solution |
|---|---|
| CacheNotFound error | Set HF_HUB_CACHE or use cache_dir parameter |
| CorruptedCacheException | Manually delete the corrupted cache entry |
| Large cache size | Use delete_revisions() to remove old/unused revisions |
| Permission denied | Check file permissions on cache directory |
Complete Usage Example
from huggingface_hub import scan_cache_dir
# Scan cache and get overview
cache_info = scan_cache_dir()
print(f"Total cached repos: {len(cache_info.repos)}")
print(f"Total size: {cache_info.size_on_disk / 1024**3:.2f} GB")
# Find specific repo
target_repo = "stabilityai/stable-diffusion-2-1"
for repo in cache_info.repos:
    if repo.repo_id == target_repo:
        print(f"\nFound {target_repo}:")
        print(f"  Type: {repo.repo_type}")
        print(f"  Revisions: {len(repo.revisions)}")
        for revision in repo.revisions:
            print(f"  - {revision.commit_hash[:8]}")
            print(f"    Size: {revision.size_on_disk / 1024**2:.2f} MB")
            print(f"    Files: {len(revision.files)}")
# Clean up old revisions
if cache_info.repos:
    first_repo = next(iter(cache_info.repos))
    if len(first_repo.revisions) > 1:
        # revisions is an unordered frozenset, so sort by modification time
        # and keep only the most recently modified revision
        newest_first = sorted(
            first_repo.revisions, key=lambda rev: rev.last_modified, reverse=True
        )
        revisions_to_delete = [rev.commit_hash for rev in newest_first[1:]]
        strategy = cache_info.delete_revisions(*revisions_to_delete)
        print(f"\nWould free: {strategy.expected_freed_size / 1024**2:.2f} MB")
        # strategy.execute()  # Uncomment to actually delete
Sources: src/huggingface_hub/utils/_cache_manager.py:1-100
Inference Client and Providers
Related topics: HuggingFace File System (HfFileSystem), Overview and Architecture
Overview
The Inference Client and Providers system provides a unified interface for performing inference with machine learning models hosted on Hugging Face or third-party inference providers. This system abstracts the complexity of interacting with various inference backends, allowing developers to make inference calls through a consistent Python API.
The InferenceClient class serves as the primary entry point for synchronous inference operations, while AsyncInferenceClient provides asynchronous alternatives for non-blocking workflows. Both clients rely on a provider system that normalizes API differences between inference services such as Replicate, Together AI, Fal.ai, and SambaNova.
Sources: src/huggingface_hub/inference/_client.py:1-100
Architecture
The inference system follows a layered architecture where the client exposes a high-level API while delegating provider-specific details to helper classes.
graph TD
User[User Code] --> Client[InferenceClient]
Client --> ProviderHelper[Provider Helper]
ProviderHelper --> ProviderAPI[Third-party Provider API]
ProviderHelper --> HFRouting[Hugging Face Routing]
subgraph "InferenceClient"
Methods[text_generation, chat_completion,<br/>text_to_image, etc.]
end
subgraph "Provider Layer"
get_provider_helper[get_provider_helper]
prepare_request[prepare_request]
get_response[get_response]
end
Core Components
| Component | File Location | Purpose |
|---|---|---|
| InferenceClient | inference/_client.py | Synchronous inference operations |
| AsyncInferenceClient | inference/_generated/_async_client.py | Asynchronous inference operations |
| Provider Helpers | inference/_providers/*.py | Provider-specific request/response handling |
| Provider Registry | inference/_providers/__init__.py | Provider discovery and initialization |
Sources: src/huggingface_hub/inference/_client.py:1-50
Supported Inference Tasks
The InferenceClient supports a comprehensive set of inference tasks through method-based API calls.
Task Categories and Methods
| Category | Method | Description |
|---|---|---|
| Text Generation | text_generation() | Generate text from prompts with streaming support |
| Chat | chat_completion() | Multi-turn conversation with message history |
| Image Generation | text_to_image() | Generate images from text prompts |
| Video Generation | text_to_video() | Generate videos from text descriptions |
| Text Analysis | summarization() | Summarize long text documents |
| Text Analysis | fill_mask() | Fill masked tokens in text |
| Text Analysis | zero_shot_classification() | Classify text with arbitrary labels |
| Table Operations | table_question_answering() | Answer questions from tabular data |
| Table Operations | tabular_classification() | Classify tabular data rows |
| Embeddings | sentence_similarity() | Compute semantic similarity between sentences |
| Vision | image_classification() | Classify images into categories |
Sources: src/huggingface_hub/inference/_client.py:200-500
Client Configuration
Initialization Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | `str \| None` | None | Default model identifier for all requests |
| provider | `str \| None` | None | Inference provider to use (replicate, together, fal-ai, etc.) |
| api_key | `str \| None` | None | API key for authentication |
| token | `str \| bool \| None` | None | Hugging Face token for authentication |
| timeout | `float \| None` | None | Request timeout in seconds |
| headers | `dict[str, str] \| None` | None | Additional HTTP headers |
from huggingface_hub import InferenceClient

# Basic usage with default provider
client = InferenceClient()

# Using a specific provider
client = InferenceClient(
    provider="replicate",
    api_key="hf_...",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
)
Sources: src/huggingface_hub/inference/_client.py:50-150
Provider System
Provider Architecture
The provider system normalizes differences between inference services by abstracting request preparation and response parsing.
graph LR
A[InferenceClient] -->|task + model| B[get_provider_helper]
B --> C{Provider Type}
C -->|Built-in| D[Internal Provider Helper]
C -->|Third-party| E[Provider API Helper]
D --> F[Provider.prepare_request]
E --> G[External API Call]
F --> H[Normalized Response]
G --> H
Supported Providers
| Provider | Description | Authentication |
|---|---|---|
| replicate | Replicate hosted models | API key |
| together | Together AI inference | API key |
| fal-ai | Fal.ai generation services | API key |
| sambanova | SambaNova Cloud | API key |
| hf-inference | Hugging Face inference API | HF token |
Sources: src/huggingface_hub/inference/_providers/__init__.py
Provider Helper Functions
Each provider helper implements two key methods:
- prepare_request(): Transforms inputs and parameters into the provider-specific API format
- get_response(): Parses the provider response into a normalized output format
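from huggingface_hub import InferenceClient
# Internal module: names and signatures may change between releases
from huggingface_hub.inference._providers import get_provider_helper

client = InferenceClient(provider="replicate")
prompt = "The capital of France is"
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

provider_helper = get_provider_helper(
    provider="replicate",
    task="text-generation",
    model=model_id,
)
request_parameters = provider_helper.prepare_request(
    inputs=prompt,
    parameters={"max_new_tokens": 100},
    headers=client.headers,
    model=model_id,
    api_key=client.token,
)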
Sources: src/huggingface_hub/inference/_client.py:150-200
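The two-method contract can be illustrated with a self-contained sketch. Everything here is hypothetical: `EchoHelper`, the `echo` provider, the `echo.example` URL, and the `get_helper` registry are invented for illustration and are not the library's classes. The point is only the shape of the pattern: one method builds the provider-specific request, the other normalizes the raw response.

```python
import json
from typing import Any, Dict


class EchoHelper:
    """Hypothetical helper for an imaginary 'echo' provider."""

    def prepare_request(
        self, *, inputs: str, parameters: Dict[str, Any], model: str, api_key: str
    ) -> Dict[str, Any]:
        # Translate generic inputs into this provider's wire format.
        return {
            "url": f"https://echo.example/v1/{model}",
            "headers": {"Authorization": f"Bearer {api_key}"},
            "json": {"prompt": inputs, **parameters},
        }

    def get_response(self, raw: bytes) -> str:
        # Normalize the provider payload into a plain string.
        return json.loads(raw)["output"]


# Registry keyed by (provider, task), mirroring a lookup by provider and task.
_HELPERS = {("echo", "text-generation"): EchoHelper()}


def get_helper(provider: str, task: str) -> EchoHelper:
    """Look up the helper for a provider/task pair or fail with a clear error."""
    try:
        return _HELPERS[(provider, task)]
    except KeyError:
        raise ValueError(f"Provider {provider!r} does not support task {task!r}") from None
```

Keeping request building and response parsing behind one small interface is what lets a client stay unchanged when a backend is added.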
Usage Examples
Text Generation
from huggingface_hub import InferenceClient

client = InferenceClient()

# Basic text generation
output = client.text_generation(
    prompt="The capital of France is",
    model="gpt2",
)
Sources: src/huggingface_hub/inference/_client.py:300-400
Chat Completion
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="sambanova",
    api_key="hf_...",
)
output = client.chat_completion(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)
Sources: src/huggingface_hub/inference/_client.py:400-500
Image Generation
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="replicate",
    api_key="hf_...",
)
image = client.text_to_image(
    "An astronaut riding a horse on the moon.",
    model="black-forest-labs/FLUX.1-schnell",
    extra_body={"output_quality": 100},
)
image.save("astronaut.png")
Sources: src/huggingface_hub/inference/_client.py:500-600
Text-to-Video
from huggingface_hub import InferenceClient

client = InferenceClient()
video = client.text_to_video(
    prompt="A cat playing piano",
    num_inference_steps=50,
    guidance_scale=7.5,
)
Sources: src/huggingface_hub/inference/_client.py:600-700
Sentence Similarity
from huggingface_hub import InferenceClient

client = InferenceClient()
similarities = client.sentence_similarity(
    "Machine learning is so easy.",
    other_sentences=[
        "Deep learning is so straightforward.",
        "This is so difficult, like rocket science.",
    ],
)
# Output: [0.7785726189613342, 0.45876261591911316]
Sources: src/huggingface_hub/inference/_client.py:700-800
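Because the scores come back as a plain list in the same order as `other_sentences`, a common follow-up is to pair and rank them. The sketch below uses a hypothetical helper, `rank_by_similarity`, and reuses the scores shown in the example output rather than calling the API.

```python
from typing import List, Sequence, Tuple


def rank_by_similarity(
    sentences: Sequence[str], scores: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each sentence with its score and sort from most to least similar."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    return sorted(zip(sentences, scores), key=lambda pair: pair[1], reverse=True)


candidates = [
    "This is so difficult, like rocket science.",
    "Deep learning is so straightforward.",
]
# Scores listed in the same order as `candidates`, taken from the example output above.
scores = [0.45876261591911316, 0.7785726189613342]
best_sentence, best_score = rank_by_similarity(candidates, scores)[0]
```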
Zero-Shot Classification
from huggingface_hub import InferenceClient
client = InferenceClient()
text = "A new model offers an explanation for how the Galilean satellites formed."
labels = ["space & cosmos", "scientific discovery", "microbiology", "robots"]
result = client.zero_shot_classification(text, labels)
Sources: src/huggingface_hub/inference/_client.py:350-450
AsyncInferenceClient
For asynchronous workflows, the AsyncInferenceClient provides non-blocking equivalents of all synchronous methods.
import asyncio

from huggingface_hub import AsyncInferenceClient

async def main():
    client = AsyncInferenceClient()

    # Async chat completion
    output = await client.chat_completion(
        model="meta-llama/Meta-Llama-3-70B-Instruct",
        messages=[
            {"role": "user", "content": "Hello!"}
        ],
    )

    # Async image generation
    image = await client.text_to_image(
        prompt="A beautiful sunset over mountains",
        model="black-forest-labs/FLUX.1-schnell",
    )

asyncio.run(main())
Sources: src/huggingface_hub/inference/_generated/_async_client.py:1-200
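The main benefit of the async client is issuing several requests concurrently. The following is a minimal sketch of that pattern, not part of the library: `ask_all` and `fake_ask` are hypothetical names, and `fake_ask` stands in for a real network call so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable, List


async def ask_all(
    ask: Callable[[str], Awaitable[str]], prompts: List[str], limit: int = 4
) -> List[str]:
    """Run `ask` for every prompt concurrently, with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt: str) -> str:
        async with semaphore:
            return await ask(prompt)

    # gather() returns results in the same order as the prompts.
    return await asyncio.gather(*(guarded(p) for p in prompts))


async def fake_ask(prompt: str) -> str:
    """Offline stand-in for a real inference call."""
    await asyncio.sleep(0)
    return prompt.upper()


answers = asyncio.run(ask_all(fake_ask, ["a", "b", "c"], limit=2))
```

With a real client, `ask` could be a small coroutine that awaits `client.chat_completion(...)` and returns `output.choices[0].message.content`. The semaphore keeps the number of simultaneous requests bounded, which helps with provider rate limits.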
Error Handling
The inference system defines specific exception types for common error conditions:
| Exception | Description |
|---|---|
| InferenceTimeoutError | Request exceeded timeout threshold |
| HfHubHTTPError | HTTP error from the inference provider |
from huggingface_hub import InferenceClient, InferenceTimeoutError
from huggingface_hub.errors import HfHubHTTPError

client = InferenceClient(timeout=30)
try:
    result = client.text_generation("Hello world")
except InferenceTimeoutError:
    print("Request timed out")
except HfHubHTTPError as e:
    print(f"HTTP error: {e}")
Sources: src/huggingface_hub/inference/_client.py:250-300
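Timeouts are often transient, so callers may want to retry them with a growing delay. The sketch below is one possible approach, not library behavior: `call_with_retries` is a hypothetical helper, and it assumes the timeout exception derives from Python's built-in `TimeoutError` (if it does not in your installed version, catch `InferenceTimeoutError` explicitly instead).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on TimeoutError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last timeout
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A call site could look like `call_with_retries(lambda: client.text_generation("Hello world"))`. HTTP errors are deliberately not retried here, since many of them (for example authentication failures) will not succeed on a second attempt.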
Request Flow
sequenceDiagram
participant User
participant Client
participant ProviderHelper
participant API
User->>Client: text_generation(prompt, model)
Client->>ProviderHelper: get_provider_helper(provider, task, model)
Client->>ProviderHelper: prepare_request(inputs, parameters)
ProviderHelper-->>Client: request_parameters
Client->>Client: _inner_post(request_parameters)
Client->>API: HTTP POST
API-->>Client: response
Client->>ProviderHelper: get_response(response)
ProviderHelper-->>Client: normalized_output
Client-->>User: InferenceOutput
Output Models
The inference client returns typed output objects for each task:
| Task | Output Type |
|---|---|
| Text Generation | str by default; TextGenerationOutput with details=True; an iterable of str or TextGenerationStreamOutput with stream=True |
| Chat Completion | ChatCompletionOutput |
| Image Generation | PIL.Image.Image |
| Video Generation | bytes |
| Summarization | SummarizationOutput |
| Fill Mask | list[FillMaskOutputElement] |
| Zero-Shot Classification | list[ZeroShotClassificationOutputElement] |
| Table Question Answering | TableQuestionAnsweringOutputElement |
| Tabular Classification | list[str] |
| Sentence Similarity | list[float] |
| Image Classification | list[ImageClassificationOutputElement] |
Sources: src/huggingface_hub/inference/_client.py:200-600
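Several of these outputs are lists of elements carrying a label and a score. A small sketch of consuming such a list follows; `LabelScore` is a hypothetical stand-in for elements such as `ZeroShotClassificationOutputElement` (assumed here to expose `label` and `score`), and `top_label` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class LabelScore:
    """Stand-in for a classification output element with `label` and `score`."""
    label: str
    score: float


def top_label(elements: Iterable, threshold: float = 0.0) -> Optional[str]:
    """Return the highest-scoring label, or None if nothing reaches `threshold`."""
    best = max(elements, key=lambda element: element.score, default=None)
    if best is None or best.score < threshold:
        return None
    return best.label


result = [
    LabelScore("space & cosmos", 0.72),
    LabelScore("scientific discovery", 0.21),
    LabelScore("robots", 0.07),
]
winner = top_label(result, threshold=0.5)
```

The scores in `result` are made up for the example; real values depend on the model.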
CLI Integration
The CLI provides command-line access to inference functionality:
# Install inference dependencies
pip install "huggingface_hub[inference]"
# Run inference via CLI
hf inference --model gpt2 --text "The capital of France is"
Sources: setup.py:1-30
Advanced Configuration
Extra Body Parameters
Many inference methods accept extra_body for provider-specific parameters:
from huggingface_hub import InferenceClient

client = InferenceClient(provider="replicate", api_key="hf_...")
image = client.text_to_image(
    "A majestic lion",
    model="black-forest-labs/FLUX.1-dev",
    extra_body={
        "output_quality": 100,
        "guidance_scale": 3.5,
    },
)
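Generate Parameters
text_generation() accepts sampling options such as temperature, top_p, and repetition_penalty directly as keyword arguments; it has no generate_parameters argument. A generate_parameters dict is instead accepted by other methods, such as summarization().
client.text_generation(
    prompt="Write a story",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
)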
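When sampling options come from configuration, it can help to validate them and drop unset values before they reach the client. The sketch below is illustrative only: `sampling_kwargs` is a hypothetical helper, and the accepted ranges (temperature above 0, top_p in (0, 1]) are assumptions about sensible values rather than limits taken from the library.

```python
from typing import Any, Dict, Optional


def sampling_kwargs(
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
) -> Dict[str, Any]:
    """Validate sampling options and return only those that were set."""
    if temperature is not None and temperature <= 0:
        raise ValueError("temperature must be > 0")
    if top_p is not None and not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    candidates = {
        "temperature": temperature,
        "top_p": top_p,
        "repetition_penalty": repetition_penalty,
    }
    # Leaving unset options out lets the server-side defaults apply.
    return {name: value for name, value in candidates.items() if value is not None}


options = sampling_kwargs(temperature=0.7, top_p=0.9)
```

The resulting dict can then be passed on to the generation call in whichever form that call expects.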
Summary
The Inference Client and Providers system provides:
- Unified API: Consistent interface across all inference tasks
- Multi-Provider Support: Integration with Replicate, Together AI, Fal.ai, and SambaNova
- Type-Safe Outputs: Well-defined output models for each task
- Async Support: Full async/await compatibility via AsyncInferenceClient
- Error Handling: Specific exceptions for timeout and HTTP errors
- Extensible Design: Provider helper system for adding new inference backends
This architecture lets developers switch between providers and models by changing client configuration (the provider and model arguments) rather than rewriting call sites, although provider-specific options such as extra_body may still need adjusting.
Sources: src/huggingface_hub/inference/_client.py:1-50
HuggingFace File System (HfFileSystem)
Related topics: File Download Operations, File Upload Operations
Overview
The HuggingFace File System (HfFileSystem) is an fsspec-based POSIX-like filesystem implementation that provides seamless access to Hugging Face Hub repositories. It enables developers to interact with models, datasets, and Spaces using familiar filesystem operations, abstracting away the complexity of HTTP API calls and caching mechanisms.
Key Characteristics:
| Property | Value |
|---|---|
| Base Class | fsspec.spec.AbstractFileSystem |
| Protocol | hf:// |
| Python Version | >= 3.10.0 |
| Entry Point | hf=huggingface_hub.HfFileSystem |
Sources: [setup.py:48](https://github.com/huggingface/huggingface_hub/blob/main/setup.py#L48)
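Paths under the hf:// protocol follow the pattern `hf://[<repo_type_prefix>]<repo_id>[@<revision>]/<path_in_repo>`, where datasets and Spaces use the `datasets/` and `spaces/` prefixes and models use none. A small sketch of building such paths is shown below; `hf_path` is a hypothetical helper written for this example, and the repository names are placeholders.

```python
from typing import Optional

# Prefix used in hf:// paths for each repository type (models have no prefix).
_PREFIXES = {"model": "", "dataset": "datasets/", "space": "spaces/"}


def hf_path(
    repo_id: str,
    path_in_repo: str = "",
    repo_type: str = "model",
    revision: Optional[str] = None,
) -> str:
    """Build an hf:// path for a file or folder in a Hub repository."""
    if repo_type not in _PREFIXES:
        raise ValueError(f"Unknown repo_type: {repo_type!r}")
    ref = f"@{revision}" if revision else ""
    suffix = f"/{path_in_repo.lstrip('/')}" if path_in_repo else ""
    return f"hf://{_PREFIXES[repo_type]}{repo_id}{ref}{suffix}"


dataset_file = hf_path("user/data", "train.csv", repo_type="dataset", revision="main")
```

A path built this way can be handed to `HfFileSystem` methods such as `ls()` or `open()`, or to any fsspec-aware library once `huggingface_hub` is installed.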
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Doramagic extracted 13 source-linked risk signals. Review them before installing or handing real data to the project.
1. Security or permission risk: How to stop hf models ls from truncating the results in the table?
- Severity: high
- Finding: Security or permission risk is backed by a source signal: How to stop hf models ls from truncating the results in the table?. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/issues/4207
2. Installation risk: [v1.13.0] new CLI commands and formatting, and HF URI parsing
- Severity: medium
- Finding: Installation risk is backed by a source signal: [v1.13.0] new CLI commands and formatting, and HF URI parsing. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.13.0
3. Installation risk: [v1.15.0] Region-aware buckets & repos, `hf skills list`, polished CLI help and more
- Severity: medium
- Finding: Installation risk is backed by a source signal: [v1.15.0] Region-aware buckets & repos, `hf skills list`, polished CLI help and more. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.15.0
4. Capability assumption: README/documentation is current enough for a first validation pass.
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: capability.assumptions | github_repo:323591830 | https://github.com/huggingface/huggingface_hub | README/documentation is current enough for a first validation pass.
5. Maintenance risk: Maintainer activity is unknown
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:323591830 | https://github.com/huggingface/huggingface_hub | last_activity_observed missing
6. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: downstream_validation.risk_items | github_repo:323591830 | https://github.com/huggingface/huggingface_hub | no_demo; severity=medium
7. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: risks.scoring_risks | github_repo:323591830 | https://github.com/huggingface/huggingface_hub | no_demo; severity=medium
8. Security or permission risk: [v1.10.0] Instant file copy and new Kernel repo type
- Severity: medium
- Finding: Security or permission risk is backed by a source signal: [v1.10.0] Instant file copy and new Kernel repo type. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.10.0
9. Security or permission risk: [v1.11.0] Semantic Spaces search, Space logs, and more
- Severity: medium
- Finding: Security or permission risk is backed by a source signal: [v1.11.0] Semantic Spaces search, Space logs, and more. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.11.0
10. Security or permission risk: [v1.12.0] Unified CLI output, bucket search, and more
- Severity: medium
- Finding: Security or permission risk is backed by a source signal: [v1.12.0] Unified CLI output, bucket search, and more. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.12.0
11. Security or permission risk: [v1.14.0] Handle Spaces secrets & variables from CLI and other improvements
- Severity: medium
- Finding: Security or permission risk is backed by a source signal: [v1.14.0] Handle Spaces secrets & variables from CLI and other improvements. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/huggingface/huggingface_hub/releases/tag/v1.14.0
12. Maintenance risk: issue_or_pr_quality=unknown
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:323591830 | https://github.com/huggingface/huggingface_hub | issue_or_pr_quality=unknown
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using huggingface_hub with real data or production workflows.
- How to stop hf models ls from truncating the results in the table? - github / github_issue
- [[v1.15.0] Region-aware buckets & repos, `hf skills list`, polished CLI h](https://github.com/huggingface/huggingface_hub/releases/tag/v1.15.0) - github / github_release
- [[v1.14.0] Handle Spaces secrets & variables from CLI and other improveme](https://github.com/huggingface/huggingface_hub/releases/tag/v1.14.0) - github / github_release
- [[v1.13.0] new CLI commands and formatting, and HF URI parsing](https://github.com/huggingface/huggingface_hub/releases/tag/v1.13.0) - github / github_release
- [[v1.12.0] Unified CLI output, bucket search, and more](https://github.com/huggingface/huggingface_hub/releases/tag/v1.12.0) - github / github_release
- [[v1.11.0] Semantic Spaces search, Space logs, and more](https://github.com/huggingface/huggingface_hub/releases/tag/v1.11.0) - github / github_release
- [[v1.10.2] Fix reference cycle in hf_raise_for_status](https://github.com/huggingface/huggingface_hub/releases/tag/v1.10.2) - github / github_release
- [[v1.10.1] Fix copy file to folder](https://github.com/huggingface/huggingface_hub/releases/tag/v1.10.1) - github / github_release
- [[v1.10.0] Instant file copy and new Kernel repo type](https://github.com/huggingface/huggingface_hub/releases/tag/v1.10.0) - github / github_release
- [[v1.9.2] Fix set_space_volume / delete_space_volume return types](https://github.com/huggingface/huggingface_hub/releases/tag/v1.9.2) - github / github_release
- README/documentation is current enough for a first validation pass. - GitHub / issue
Source: Project Pack community evidence and pitfall evidence