You will learn how to train and fine-tune a Llama 2 model from scratch.
Throughout the series you will learn about the transformer architecture, different attention mechanisms (MHA, MQA, and GQA), the KV cache, RoPE, and the Hugging Face Trainer in detail.
By the end, you will have created and trained a 100M-parameter LLaMA 2 model from scratch in PyTorch for code completion.
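As a taste of the attention variants covered in the series, here is a minimal sketch of grouped-query attention (GQA) in PyTorch. The shapes and head counts are illustrative, not the series' actual configuration; MHA is the special case where the number of KV heads equals the number of query heads, and MQA the case where it is 1.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    """Minimal GQA sketch (causal mask omitted for brevity).
    q: (batch, n_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim),
    where n_kv_heads divides n_heads."""
    n_heads, head_dim = q.shape[1], q.shape[-1]
    n_kv_heads = k.shape[1]
    group = n_heads // n_kv_heads
    # Repeat each KV head so it is shared by a group of query heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
    return F.softmax(scores, dim=-1) @ v

# Illustrative shapes: 8 query heads sharing 2 KV heads
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v)  # -> (1, 8, 16, 64)
```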
🎥 YT Video Playlist:
You will learn how to train and fine-tune a Llama 3 model from scratch.
The goal is to code LLaMA 3 from scratch in PyTorch and create models with 3B, 6B, 35B, and 45B parameters.
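Targeting those sizes mostly comes down to choosing hidden size, layer count, and head counts so the total parameter count lands near the goal. The rough estimator below is a sketch under assumed LLaMA-style shapes (SwiGLU MLP, untied embeddings, no biases); the example values are illustrative, not the configs used in the series.

```python
def approx_llama_params(vocab_size, hidden, n_layers, intermediate, n_heads, n_kv_heads):
    """Rough parameter count for a LLaMA-style decoder (norms and biases ignored)."""
    head_dim = hidden // n_heads
    # Attention: Q and O are hidden x hidden; K and V are hidden x (n_kv_heads * head_dim)
    attn = 2 * hidden * hidden + 2 * hidden * n_kv_heads * head_dim
    # SwiGLU MLP: gate, up, and down projections
    mlp = 3 * hidden * intermediate
    # Untied input embeddings plus output head
    embed = 2 * vocab_size * hidden
    return n_layers * (attn + mlp) + embed

# Illustrative ~6B-scale shape (assumed values, not an official config)
print(approx_llama_params(vocab_size=128_256, hidden=4096, n_layers=24,
                          intermediate=14_336, n_heads=32, n_kv_heads=8))  # ≈ 6.3B
```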
🎥 YT Video Playlist:
📚 Papers:
Introducing the world's first Llama-3 base model with 6B parameters. This model is a pretrained version of prince-canuma/Llama-3-6B-v0, which was created from Meta-Llama-3-8B using a technique called downcycling. The model was continually pretrained on 1 billion tokens of English-only text from FineWeb, achieving strong results on the evaluation set:
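For reference, one common downcycling recipe initializes the smaller model by copying a subset of transformer layers (plus embeddings, final norm, and LM head) from the larger checkpoint, then continues pretraining. The sketch below illustrates the idea with Hugging Face Transformers; the layer count and choice of which layers to keep are assumptions for illustration, not necessarily the exact recipe used for this model.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

src_name = "meta-llama/Meta-Llama-3-8B"
src = AutoModelForCausalLM.from_pretrained(src_name, torch_dtype=torch.bfloat16)

# Build a smaller config: fewer layers than the 32 in the 8B source (assumed count)
cfg = AutoConfig.from_pretrained(src_name)
cfg.num_hidden_layers = 24
small = AutoModelForCausalLM.from_config(cfg)

# Copy embeddings, final norm, and LM head, then a contiguous block of layers
small.model.embed_tokens.load_state_dict(src.model.embed_tokens.state_dict())
small.model.norm.load_state_dict(src.model.norm.state_dict())
small.lm_head.load_state_dict(src.lm_head.state_dict())
keep = list(range(cfg.num_hidden_layers))  # e.g. keep the first 24 layers (assumed)
for dst_idx, src_idx in enumerate(keep):
    small.model.layers[dst_idx].load_state_dict(src.model.layers[src_idx].state_dict())

small.save_pretrained("llama-3-6b-downcycled")  # then continue pretraining on new tokens
```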
You can use this model to create instruct and chat versions for various use cases such as coding assistants, RAG, function calling, and more.
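For example, loading the base checkpoint for plain completion (or as the starting point for instruction tuning) might look like the sketch below; the model id is the base checkpoint named above, and the prompt and generation settings are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prince-canuma/Llama-3-6B-v0"  # base checkpoint named above; adjust as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "def fibonacci(n):"  # base model, so a plain completion prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```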
This model inherits some of the base model's limitations, as well as additional ones from its creation process, such as: