learn-huggingface

Repo designed to help learn the Hugging Face ecosystem (transformers, datasets, accelerate + more).

APACHE-2.0 License

Learn Hugging Face 🤗 (work in progress)

I'd like to learn the Hugging Face ecosystem better (transformers, datasets, accelerate + more).

So this repo is to help me learn it and simultaneously teach others.

Each example will take an end-to-end approach: starting with a dataset (custom or existing), building and evaluating a model, and creating a demo to share.

Teaching style:

A machine learning cooking show! 👨‍🍳

Mottos:

  • If in doubt, run the code. - Machine learning is very experimental, so it's good to get in the habit of continually trying things (even if you think they won't work).
  • Visualize, visualize, visualize! - If you're not sure of some dataset or some operation or some predictions, visualize it/them.
  • Experiment, experiment, experiment! - Again, machine learning is very experimental. So keep trying different things!
  • Data, model, demo! - Create/get a dataset, build/train/evaluate a model, create a demo to share.

Project style:

Data, model, demo!

  • Create a new/reuse an existing dataset.
  • Train/evaluate a model.
  • Build a demo to share.
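
As a rough illustration of the data, model, demo steps, here is a minimal sketch in plain Python. It is a toy stand-in, not a Hugging Face model: the captions are made up for this example, the "model" is a simple word-count baseline, and the names `train`, `predict`, `evaluate` and `demo` are hypothetical. The point is only the shape of the workflow.

```python
from collections import Counter

# 1. Data: a tiny, made-up dataset of (caption, label) pairs.
TRAIN = [
    ("a plate of pasta with tomato sauce", "food"),
    ("fresh sushi rolls on a wooden board", "food"),
    ("a bowl of fruit salad on the table", "food"),
    ("a red car parked on the street", "not_food"),
    ("two dogs running in the park", "not_food"),
    ("a laptop on a wooden desk", "not_food"),
]
TEST = [
    ("a bowl of pasta salad", "food"),
    ("a dog sitting in a car", "not_food"),
]


# 2. Model: count how often each word appears under each label.
def train(examples):
    counts = {"food": Counter(), "not_food": Counter()}
    for caption, label in examples:
        counts[label].update(caption.lower().split())
    return counts


def predict(model, caption):
    # Score each label by how often the caption's words were seen under it.
    words = caption.lower().split()
    scores = {label: sum(seen[w] for w in words) for label, seen in model.items()}
    return max(scores, key=scores.get)


def evaluate(model, examples):
    # Fraction of examples where the predicted label matches the true label.
    correct = sum(predict(model, caption) == label for caption, label in examples)
    return correct / len(examples)


model = train(TRAIN)
accuracy = evaluate(model, TEST)


# 3. Demo: a small function someone else could call to try the model.
def demo(caption):
    return f"{caption!r} -> {predict(model, caption)}"
```

Calling `demo("a plate of sushi")` returns a string with the caption and its predicted label. In a real project, each step would be swapped for its Hugging Face counterpart (a dataset from the Hub, a fine-tuned model, a hosted demo).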

These three steps will be our (rough) workflow.

Contents

All code and text will be free and open-source. Step-by-step video walkthroughs are available as a paid upgrade.

| Project | Description | Dataset | Model | Demo | Video Course |
|---|---|---|---|---|---|
| Text classification | Build project "Food Not Food", a text classification model to classify image captions into "food" if they're about food or "not_food" if they're not about food. This is the ideal place to get started if you've never used the Hugging Face ecosystem. | Dataset | Model | Demo | Video Course |

More to come soon! Let me know if you'd like to see anything specific by leaving an issue.
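
To give a feel for what using a "food"/"not_food" text classifier might look like, here is a hedged sketch built on the `pipeline` function from the transformers library. The helper names `load_classifier` and `top_label` are hypothetical, and `model_id` is a placeholder you would fill in with the ID of a text classification model on the Hugging Face Hub. The label names are assumed to match the ones described above.

```python
from typing import Callable, Dict, List


def load_classifier(model_id: str) -> Callable:
    """Create a text-classification pipeline for the given Hub model ID.

    The import is done lazily so this file can be loaded without transformers
    installed. Running it needs `pip install transformers` plus a backend such
    as PyTorch, and the model is downloaded on first use.
    """
    from transformers import pipeline

    return pipeline(task="text-classification", model=model_id)


def top_label(predictions: List[Dict]) -> str:
    """Return the label with the highest score.

    Expects a list of dicts with "label" and "score" keys, which is the shape
    a text-classification pipeline returns for a single input string.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"]


# Example usage (commented out because it downloads a model):
# classifier = load_classifier(model_id="<your-username>/<your-model-name>")
# predictions = classifier("A photo of a plate of pasta with tomato sauce")
# print(top_label(predictions))
```

Keeping `top_label` separate from model loading means the post-processing can be tried on hand-written predictions before any model is downloaded.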

Who is it for?

Ideal for:

  • Beginners who love things explained in detail.
  • Someone who wants to create more of their own end-to-end machine learning projects.

Not ideal for:

  • People with 2-3+ years of machine learning projects & experience^.

^Note: This being said, you may actually find some things helpful along the way. Best to explore and see!

Prerequisites

What is Hugging Face?

Hugging Face is a platform that offers access to many different kinds of open-source machine learning models and datasets.

They're also the creators of the popular transformers library, a Python-based library for working with pre-trained models as well as custom models, along with many more helpful libraries.

If you're getting into the world of AI and machine learning, you're going to come across Hugging Face.

Why Hugging Face?

Many of the biggest companies in the world use Hugging Face for their open-source machine learning projects including Apple, Google, Facebook (Meta), Microsoft, OpenAI, ByteDance and more.

Not only does Hugging Face make it easy to use state-of-the-art machine learning models such as Stable Diffusion (for image generation) and Whisper (for audio transcription), it also lets you share your own models, datasets and resources.

Aside from your own website, consider Hugging Face the homepage of your AI/machine learning profile.

TODO

  • Prerequisites
  • Ecosystem overview (transformers, datasets, accelerate, tokenizers, Spaces, demos, models, hub etc.)
  • Text classification
  • Object detection
  • Named entity recognition
  • LLM fine-tuning
  • VLM fine-tuning
  • RAG workflow
  • Zero-shot image classification/multi-modal workflows (CLIP)

Setup

See setup.

Resources

FAQ

Is this an official Hugging Face website?

No, it's a personal project by myself (Daniel Bourke) to learn and help others learn the Hugging Face ecosystem.