Hugging Face Introduces Mutable Storage Buckets for ML Artifacts
8
What is the Viqus Verdict?
We evaluate each news story based on its real impact versus its media hype to offer a clear and objective perspective.
AI Analysis:
While the launch has generated considerable buzz and pre-release interest, its core impact is a tangible improvement in developer workflows and resource utilization, a crucial step for wider ML adoption. The focus is on operational efficiency: this is not a revolutionary new model but a better way to handle a pervasive problem.
Article Summary
Hugging Face is introducing Storage Buckets, a new approach to object storage tailored to the dynamic nature of modern machine learning workflows. Traditionally, the Hugging Face Hub has served primarily as a repository for final, immutable artifacts such as trained models and datasets. ML development, however, also produces a continuous stream of intermediate files (checkpoints, optimizer states, processed shards, logs, traces, and more) that change frequently and fit poorly into a repository model built for finished releases. Storage Buckets address this gap by providing mutable, S3-like object storage directly accessible from the Hub, allowing developers to manage these transient artifacts.
Buckets are built on Hugging Face’s Xet backend, which uses chunk-based storage and deduplication to reduce bandwidth consumption, speed up transfers, and improve storage utilization. This matters most for large-scale training pipelines and distributed workloads. Buckets also offer global storage by default, with pre-warming to bring frequently accessed data closer to compute resources and reduce latency. The launch is supported by a private beta program with key launch partners, and Buckets can be accessed through the Hub, the CLI, the Python client, fsspec, and the JavaScript client, which eases integration with existing workflows and tools. The feature builds on the model and dataset workflow already offered by the Hub, aiming to simplify the entire ML artifact lifecycle.
Key Points
- Storage Buckets are a new, mutable object storage solution on the Hugging Face Hub designed for managing intermediate ML artifacts.
- They are built on Hugging Face’s Xet backend, utilizing chunk-based storage and deduplication to improve efficiency.
- Buckets offer global storage with pre-warming capabilities for optimal performance in distributed training pipelines.
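To make the chunk-based deduplication idea concrete, here is a minimal Python sketch. It is not Xet's actual algorithm or API: the ChunkStore class, its method names, and the fixed 4-byte chunk size are all hypothetical, chosen only to keep the example small. It shows why overwriting a checkpoint that differs in one region only needs to transfer the changed chunk.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny and fixed-size, an assumption for illustration


class ChunkStore:
    """Toy content-addressed store (hypothetical): each unique chunk is kept once."""

    def __init__(self):
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name, data):
        """Store data under name and return how many bytes were actually new."""
        uploaded = 0
        digests = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # Only chunks the store has never seen cost bandwidth and space.
                self.chunks[digest] = chunk
                uploaded += len(chunk)
            digests.append(digest)
        # Overwriting an existing name is allowed: the store is mutable.
        self.files[name] = digests
        return uploaded

    def get(self, name):
        """Reassemble a file from its chunk list."""
        return b"".join(self.chunks[d] for d in self.files[name])


store = ChunkStore()
first = store.put("checkpoint.bin", b"AAAABBBBCCCCDDDD")   # every chunk is new
second = store.put("checkpoint.bin", b"AAAABBBBXXXXDDDD")  # one chunk changed
print(first, second)
```

In this sketch the second write transfers only the one changed chunk rather than the whole file. A production system would typically use much larger, content-defined chunk boundaries so that insertions do not shift every later chunk, but the bookkeeping follows the same pattern.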

