The official Python client for the Hugging Face Hub.
Apache-2.0 License
Published by LysandreJik about 3 years ago
Version v0.0.18 of huggingface_hub includes tools to manage repository metadata. The following example reads metadata from a repository:
from huggingface_hub import Repository
repo = Repository("xxx", clone_from="yyy")
data = repo.repocard_metadata_load()
The following example updates that metadata before writing it back to the local repository:
data["license"] = "apache-2.0"
repo.repocard_metadata_save(data)
Tag management is now available! Add, check, and delete tags locally or remotely, directly from the Repository utility.
The Keras mixin has been revisited: it now saves models as SavedModel objects rather than .h5 files.
Published by LysandreJik about 3 years ago
The pushing methods now have access to a blocking boolean parameter, indicating whether the push should happen asynchronously. To see whether the push has finished, or to inspect its status code (to spot a failure), use the command_queue property on the Repository object.
For example:
from huggingface_hub import Repository
repo = Repository("<local_folder>", clone_from="<user>/<model_name>")
with repo.commit("Commit message", blocking=False):
    # Save data here; the push will happen asynchronously
    ...
last_command = repo.command_queue[-1]
# Status of the push command
last_command.status
# Will return the status code
# -> -1 will indicate the push is still ongoing
# -> 0 will indicate the push has completed successfully
# -> non-zero code indicates the error code if there was an error
# if there was an error, the stderr may be inspected
last_command.stderr
# Whether the command finished or if it is still ongoing
last_command.is_done
# Whether the command errored out
last_command.failed
When using blocking=False, the commands will be tracked and your script will exit only when all pushes are done, even if other errors happen in your script (a failed push counts as done).
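Under the assumption that the status codes behave as described above (-1 ongoing, 0 success, non-zero failure), a small hypothetical helper for interpreting a command's status could look like this (describe_status is not part of huggingface_hub):

```python
def describe_status(status: int) -> str:
    """Map a Repository command status code to a human-readable state.

    Hypothetical helper; follows the semantics documented for
    command_queue entries: -1 ongoing, 0 success, non-zero failure.
    """
    if status == -1:
        return "ongoing"
    if status == 0:
        return "success"
    return f"failed (exit code {status})"


print(describe_status(-1))   # ongoing
print(describe_status(0))    # success
print(describe_status(128))  # failed (exit code 128)
```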
The huggingface_hub library now has a notebook_login method, which can be used to log in from notebooks with no access to the shell. In a notebook, log in with the following:
from huggingface_hub import notebook_login
notebook_login()
Published by LysandreJik about 3 years ago
The huggingface_hub version v0.0.16 introduces several quality-of-life improvements.
Repository
Progress bars are now visible with many git operations, such as pulling, cloning and pushing:
>>> from huggingface_hub import Repository
>>> repo = Repository("local_folder", clone_from="huggingface/CodeBERTa-small-v1")
Cloning https://huggingface.co/huggingface/CodeBERTa-small-v1 into local empty directory.
Download file pytorch_model.bin: 45%|████████████████████████████▋ | 144M/321M [00:13<00:12, 14.7MB/s]
Download file flax_model.msgpack: 42%|██████████████████████████▌ | 134M/319M [00:13<00:13, 14.4MB/s]
There is now branching support in Repository. The following will clone the xxx repository and check out the new-branch revision. If it is an existing branch on the remote, that branch will be checked out; if it is another revision, such as a commit or a tag, that revision will be checked out.
If the revision does not exist, a branch will be created from the latest commit on the main branch.
>>> from huggingface_hub import Repository
>>> repo = Repository("local", clone_from="xxx", revision="new-branch")
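The checkout decision described above can be sketched as a pure function (hypothetical; not part of the library): given the requested revision and the set of revisions that exist on the remote, it either checks out the existing revision or creates a new branch from main.

```python
def resolve_revision(revision, existing):
    """Sketch of the checkout decision (hypothetical helper).

    - existing branch/commit/tag -> check it out
    - unknown revision -> create a branch from the latest commit on main
    """
    if revision in existing:
        return f"checkout {revision}"
    return f"create branch {revision} from main"


print(resolve_revision("main", {"main", "v1.0"}))        # checkout main
print(resolve_revision("new-branch", {"main", "v1.0"}))  # create branch new-branch from main
```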
Once the repository is instantiated, it is possible to manually check out revisions using the git_checkout method. If the revision already exists:
>>> repo.git_checkout("main")
If a branch should be created from the current head in the case that it does not exist:
>>> repo.git_checkout("brand-new-branch", create_branch_ok=True)
Revision `brand-new-branch` does not exist. Created and checked out branch `brand-new-branch`
Finally, the commit context manager has a new branch parameter to specify the branch to which the utility should push:
>>> with repo.commit("New commit on branch brand-new-branch", branch="brand-new-branch"):
...     # Save any file or model here, it will be committed to that branch.
...     torch.save(model.state_dict(), "model.pt")
The login system has been redesigned to leverage git-credential instead of a token-based authentication system. It leverages the git credential store helper. If you're unfamiliar with it, you may see the following when logging in with huggingface_hub:
_| _| _| _| _|_|_| _|_|_| _|_|_| _| _| _|_|_| _|_|_|_| _|_| _|_|_| _|_|_|_|
_| _| _| _| _| _| _| _|_| _| _| _| _| _| _| _|
_|_|_|_| _| _| _| _|_| _| _|_| _| _| _| _| _| _|_| _|_|_| _|_|_|_| _| _|_|_|
_| _| _| _| _| _| _| _| _| _| _|_| _| _| _| _| _| _| _|
_| _| _|_| _|_|_| _|_|_| _|_|_| _| _| _|_|_| _| _| _| _|_|_| _|_|_|_|
Username:
Password:
Login successful
Your token has been saved to /root/.huggingface/token
Authenticated through git-credential store but this isn't the helper defined on your machine.
You will have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal to set it as the default
git config --global credential.helper store
Running the command git config --global credential.helper store will set this as the default way to handle credentials for git authentication. All repositories instantiated with the Repository utility will have this helper set by default, so no action is required on your part when leveraging it.
The logging system is now similar to the existing logging systems in transformers and datasets, based on a logging module that controls the entire library's logging level:
>>> from huggingface_hub import logging
>>> logging.set_verbosity_error()
>>> logging.set_verbosity_info()
- Repository #219 (@LysandreJik)
- model-index, and pipeline/task types #265 (@julien-c)

Published by osanseviero about 3 years ago

- filename option to lfs_track #212 (@LysandreJik)
- interfaces -> widgets/lib/interfaces #227 (@mishig25)

Published by LysandreJik over 3 years ago
dataset_info and list_datasets, documentation

Datasets repositories get better support, first by enabling full usage of the Repository class for datasets repositories:
from huggingface_hub import Repository
repo = Repository("local_directory", clone_from="<user>/<model_id>", repo_type="dataset")
Datasets can now be retrieved from the Python runtime using the list_datasets method from the HfApi class:
from huggingface_hub import HfApi
api = HfApi()
datasets = api.list_datasets()
len(datasets)
# 1048 publicly available dataset repositories at the time of writing
Information can be retrieved on specific datasets using the dataset_info method from the HfApi class:
from huggingface_hub import HfApi
api = HfApi()
api.dataset_info("squad")
# DatasetInfo: {
# id: squad
# lastModified: 2021-07-07T13:18:53.595Z
# tags: ['pretty_name:SQuAD', 'annotations_creators:crowdsourced', 'language_creators:crowdsourced', 'language_creators:found',
# [...]
Version v0.0.14 introduces a wrapper client for the Inference API. No need to use custom-made requests anymore. See below for an example.
from huggingface_hub import InferenceApi
api = InferenceApi("bert-base-uncased")
api(inputs="The [MASK] is great")
# [
# {'sequence': 'the music is great', 'score': 0.03599703311920166, 'token': 2189, 'token_str': 'music'},
# {'sequence': 'the price is great', 'score': 0.02146693877875805, 'token': 3976, 'token_str': 'price'},
# {'sequence': 'the money is great', 'score': 0.01866752654314041, 'token': 2769, 'token_str': 'money'},
# {'sequence': 'the fun is great', 'score': 0.01654735580086708, 'token': 4569, 'token_str': 'fun'},
# {'sequence': 'the effect is great', 'score': 0.015102624893188477, 'token': 3466, 'token_str': 'effect'}
# ]
Version v0.0.14 introduces an auto-tracking mechanism with git-lfs for large files. Files that are larger than 10MB can be automatically tracked by using the auto_track_large_files method:
from huggingface_hub import Repository
repo = Repository("local_directory", clone_from="<user>/<model_id>")
# save large files in `local_directory`
repo.git_add()
repo.auto_track_large_files()
repo.git_commit("Add large files")
repo.git_push()
# No push rejected error anymore!
It is automatically used when leveraging the commit context manager:
from huggingface_hub import Repository
repo = Repository("local_directory", clone_from="<user>/<model_id>")
with repo.commit("Add large files"):
    # add large files here; they are tracked with git-lfs automatically
    ...
# No push rejected error anymore!
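As an illustration of the 10MB threshold mentioned above, a script can list which local files would qualify by size. The helper below is hypothetical and not the library's implementation:

```python
import os
import tempfile

LARGE_FILE_THRESHOLD = 10 * 1024 * 1024  # 10MB, the threshold described above


def files_over_threshold(directory):
    """Return relative paths of files larger than the threshold (hypothetical helper)."""
    large = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            if os.path.getsize(path) > LARGE_FILE_THRESHOLD:
                large.append(os.path.relpath(path, directory))
    return large


# Demonstrate with a temporary directory containing one small and one large file
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "small.txt"), "wb") as f:
        f.write(b"x" * 1024)              # 1KB: under the threshold
    with open(os.path.join(tmp, "big.bin"), "wb") as f:
        f.seek(LARGE_FILE_THRESHOLD)      # sparse file just over 10MB
        f.write(b"x")
    print(files_over_threshold(tmp))  # ['big.bin']
```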
Reminder: the huggingface_hub library follows semantic versioning and is undergoing active development. Until the first major version (v1.0.0) is released, you should expect breaking changes, and we strongly recommend pinning the library to a specific version.
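For example, the dependency can be pinned in a requirements.txt entry (the exact version shown is illustrative; pin whichever release you have tested against):

```
huggingface_hub==0.0.18
```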
Two breaking changes are introduced with version v0.0.14.
whoami return type changes from a tuple to a dictionary

The whoami method changes its return value from a tuple of (<user>, [<organisations>]) to a dictionary containing much more information:
In versions v0.0.13 and below, this was the behavior of the whoami method from the HfApi class:
from huggingface_hub import HfFolder, HfApi
api = HfApi()
api.whoami(HfFolder.get_token())
# ('<user>', ['<org_0>', '<org_1>'])
In version v0.0.14, this is updated to the following:
from huggingface_hub import HfFolder, HfApi
api = HfApi()
api.whoami(HfFolder.get_token())
# {
# 'type': str,
# 'name': str,
# 'fullname': str,
# 'email': str,
# 'emailVerified': bool,
# 'apiToken': str,
# 'plan': str,
# 'avatarUrl': str,
# 'orgs': List[str]
# }
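Code written against the old return type can be migrated with a small compatibility shim (hypothetical; not part of the library) that rebuilds the old tuple shape from the new dictionary:

```python
def whoami_as_tuple(info):
    """Rebuild the pre-v0.0.14 (<user>, [<organisations>]) shape
    from the v0.0.14 dictionary (hypothetical migration helper)."""
    return info["name"], info["orgs"]


new_style = {
    "type": "user",
    "name": "<user>",
    "fullname": "Some User",
    "email": "user@example.com",
    "emailVerified": True,
    "apiToken": "<token>",
    "plan": "free",
    "avatarUrl": "<url>",
    "orgs": ["<org_0>", "<org_1>"],
}
print(whoami_as_tuple(new_style))  # ('<user>', ['<org_0>', '<org_1>'])
```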
Repository's use_auth_token initialization parameter now defaults to True

The use_auth_token initialization parameter of the Repository class now defaults to True. The behavior is unchanged if users are not logged in, in which case Repository remains agnostic to huggingface_hub.
- audio-to-audio. #94 (@Narsil)
- rmdir api-inference-community/src/sentence-transformers #188 (@Pierrci)
- --no_renames argument to list deleted files. #205 (@LysandreJik)

Published by LysandreJik over 3 years ago
Version 0.0.13 introduces a context manager to save files directly to the Hub. See below for some examples.
import json

from huggingface_hub import Repository

repo = Repository("text-files", clone_from="<user>/text-files", use_auth_token=True)
with repo.commit("My first file."):
    with open("file.txt", "w+") as f:
        f.write(json.dumps({"key": "value"}))
The context manager works with a torch.save statement as well:

import torch
from huggingface_hub import Repository

model = torch.nn.Transformer()
repo = Repository("torch-files", clone_from="<user>/torch-files", use_auth_token=True)
with repo.commit("Adding my cool model!"):
    torch.save(model.state_dict(), "model.pt")
And the same works for a Flax model:

from flax import serialization
from flax import linen as nn
from jax import random
from huggingface_hub import Repository

model = nn.Dense(features=5)
key1, key2 = random.split(random.PRNGKey(0))
x = random.normal(key1, (10,))
params = model.init(key2, x)
bytes_output = serialization.to_bytes(params)

repo = Repository("flax-model", clone_from="<user>/flax-model", use_auth_token=True)
with repo.commit("Adding my cool Flax model!"):
    with open("flax_model.msgpack", "wb") as f:
        f.write(bytes_output)
Published by LysandreJik over 3 years ago
Patches an issue when cloning a repository twice.
Published by LysandreJik over 3 years ago
hf_hub_download and Repository power-up

The huggingface_hub documentation is now available on hf.co/docs! Additionally, a new step-by-step guide to adding libraries is available.

hf_hub_download

A new method is introduced: hf_hub_download. It is the equivalent of doing cached_download(hf_hub_url()) in a single method.
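The two pieces compose as follows: hf_hub_url builds the resolve URL for a file at a revision, and cached_download fetches and caches it. A sketch of the URL construction, assuming the hub's resolve URL scheme (build_resolve_url is a hypothetical stand-in, not the library function):

```python
def build_resolve_url(repo_id, filename, revision="main"):
    """Hypothetical stand-in for hf_hub_url, assuming the hub's
    https://huggingface.co/<repo_id>/resolve/<revision>/<filename> scheme."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


print(build_resolve_url("lysandre/dummy-hf-hub", "README.md"))
# https://huggingface.co/lysandre/dummy-hf-hub/resolve/main/README.md
```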
Repository power-up

The Repository class is updated to behave more similarly to git. It is now impossible to clone a repository into a folder that already contains files.
The PyTorch Mixin contributed by @vasudevgupta7 is slightly updated to have the push_to_hub method manage a repository as one would from the command line.
- audio-to-audio task. #93 (@Narsil)
- rmtree issue on windows #105 (@SBrandeis)
- subprocess.run #104 (@SBrandeis)
- tags can be undefined #107 (@Pierrci)
- upload_file docs #136 (@LysandreJik)

Published by LysandreJik over 3 years ago
huggingface_hub with api-inference-community and hub interfaces

v0.0.10 signs the merging of three components of the Hugging Face stack: the huggingface_hub repository is now the central platform to contribute new libraries to be supported on the hub.
It regroups three previously separated components:

- the huggingface_hub Python library, as the Python library to download, upload, and retrieve information from the hub.
- api-inference-community, as the platform where libraries wishing for hub support may be added.
- interfaces, as the definition for pipeline types as well as default widget inputs and definitions/UI elements for third-party libraries.

Future efforts will be focused on further easing contributing third-party libraries to the Hugging Face Hub.
- widgets-server #50 (@julien-c)
- api-inference-community to huggingface_hub. #48 (@Narsil)

Published by LysandreJik over 3 years ago
Implementation of an endpoint to programmatically upload (large) files to any repo on the hub, without the need for git, using HTTP POST requests.
The HfApi.list_models method now allows multiple filters

Models may now be filtered using several filters:
Example usage:
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> # List all models
>>> api.list_models()
>>> # List only the text classification models
>>> api.list_models(filter="text-classification")
>>> # List only the russian models compatible with pytorch
>>> api.list_models(filter=("ru", "pytorch"))
>>> # List only the models trained on the "common_voice" dataset
>>> api.list_models(filter="dataset:common_voice")
>>> # List only the models from the AllenNLP library
>>> api.list_models(filter="allennlp")
- filter argument #41 (@LysandreJik)

ModelInfo now has a readable representation

Improvement of the ModelInfo class so that it displays information about the object.
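A readable representation of this kind can be sketched with a __repr__ that prints the object's fields. The class below is illustrative only, not the actual ModelInfo implementation:

```python
class InfoCard:
    """Minimal sketch of an info object with a readable representation
    (hypothetical; not the real ModelInfo class)."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def __repr__(self):
        fields = ", ".join(f"{k}={v!r}" for k, v in self.__dict__.items())
        return f"{type(self).__name__}({fields})"


print(InfoCard(modelId="lysandre/dummy-hf-hub", downloads=42))
# InfoCard(modelId='lysandre/dummy-hf-hub', downloads=42)
```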
- library_name and library_version in snapshot_download #38 (@LysandreJik)

Published by LysandreJik over 3 years ago
- HfApi.model_info method to retrieve information about a repo at a given revision.
- snapshot_download utility to download to the cache all files stored in that repo at that given revision.

Example usage of HfApi.model_info:
from huggingface_hub import HfApi
hf_api = HfApi()
model_info = hf_api.model_info("lysandre/dummy-hf-hub")
print("Model ID:", model_info.modelId)
for file in model_info.siblings:
    print("file:", file.rfilename)
outputs:
Model ID: lysandre/dummy-hf-hub
file: .gitattributes
file: README.md
Example usage of snapshot_download
:
from huggingface_hub import snapshot_download
import os
repo_path = snapshot_download("lysandre/dummy-hf-hub")
print(os.listdir(repo_path))
outputs:
['.gitattributes', 'README.md']
Published by julien-c over 3 years ago
Networking improvements by @Pierrci and @lhoestq (#21 and #22)
Adding a mixin class to ease saving, uploading, and downloading a PyTorch model. See PR #11 by @vasudevgupta7.
Example usage:
import torch.nn as nn

from huggingface_hub import ModelHubMixin


class MyModel(nn.Module, ModelHubMixin):
    def __init__(self, **kwargs):
        super().__init__()
        self.config = kwargs.pop("config", None)
        self.layer = nn.Linear(4, 4)

    def forward(self, x):
        return self.layer(x)


model = MyModel()
# saving model to local directory & pushing to hub
model.save_pretrained("mymodel", push_to_hub=True, config={"act": "gelu"})
# initializing model & loading it from trained weights
model = MyModel.from_pretrained("username/mymodel@main")
Thanks a ton for your contributions ♥️
Published by julien-c over 3 years ago
Published by julien-c over 3 years ago
Published by julien-c over 3 years ago
- [Repository] more forgiving lfs_track
- added stderr and exceptions (#19)
- added log to push and check for commit error
- fixed quality
- doc improvements + small tweaks
- [git_add] actually take input pattern
- less brittle test if local_dir is an actual git repo

Co-authored-by: Julien Chaumond [email protected]
v0.0.3
Published by julien-c over 3 years ago
Published by julien-c over 3 years ago
Published by julien-c almost 4 years ago