Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Apache-2.0 license
Some preliminary explorations of Mamba's context scaling.
[EMNLP 2024] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Mixture-of-Experts for Large Vision-Language Models
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.
⚡LLM Zoo is a project that provides data, models, and evaluation benchmarks for large language models.
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
YaRN: Efficient Context Window Extension of Large Language Models
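The YaRN entry above concerns extending the context window of RoPE-based models beyond their trained length. As a rough illustration only, the sketch below shows plain linear position interpolation, the baseline idea that YaRN refines with per-dimension interpolation ramps and an attention temperature; the function names and the `scale` parameter are illustrative, not YaRN's actual API.

```python
# A minimal sketch of RoPE position interpolation, the baseline that YaRN builds on.
# This is NOT YaRN itself: YaRN additionally blends interpolated and original
# frequencies per dimension and rescales attention logits. Names here are made up.
import numpy as np

def rope_angles(position, dim, base=10000.0, scale=1.0):
    """Rotary angles for one position; scale > 1 squeezes positions back into the trained range."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # per-pair rotation frequencies
    return (position / scale) * inv_freq               # linear position interpolation

def apply_rope(x, position, base=10000.0, scale=1.0):
    """Apply rotary embedding to a 1-D vector x of even dimension at the given position."""
    pairs = x.reshape(-1, 2)
    theta = rope_angles(position, x.shape[0], base, scale)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = pairs[:, 0], pairs[:, 1]
    rotated = np.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
    return rotated.reshape(-1)
```

With `scale` set to the ratio of the target context length to the original one, positions beyond the trained range map back into it, which is the starting point the YaRN paper improves on.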
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models
Fast and memory-efficient exact attention
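The entry above refers to exact attention computed without materializing the full score matrix. The sketch below is a plain NumPy illustration of the underlying blockwise online-softmax idea for a single query, not the library's fused CUDA kernels or API; `blockwise_attention` and its arguments are assumptions made for the example.

```python
# A minimal sketch of the memory-efficient exact-attention idea: process keys/values
# in blocks and maintain a running (online) softmax, so the N x N score matrix is
# never materialized. Illustrative only; not FlashAttention's implementation.
import numpy as np

def blockwise_attention(q, K, V, block_size=128):
    """Exact attention for one query q (d,) against K (n, d) and V (n, d_v), computed blockwise."""
    scale = 1.0 / np.sqrt(q.shape[0])
    m = -np.inf                      # running max of scores seen so far
    l = 0.0                          # running sum of exp(score - m)
    acc = np.zeros(V.shape[1])       # running weighted sum of values
    for start in range(0, K.shape[0], block_size):
        k_blk = K[start:start + block_size]
        v_blk = V[start:start + block_size]
        s = (k_blk @ q) * scale              # scores for this block
        m_new = max(m, s.max())
        correction = np.exp(m - m_new)       # rescale previous accumulators to the new max
        p = np.exp(s - m_new)
        l = l * correction + p.sum()
        acc = acc * correction + p @ v_blk
        m = m_new
    return acc / l

# Quick check against the naive formulation softmax(qK^T / sqrt(d)) V.
rng = np.random.default_rng(0)
q = rng.normal(size=64)
K = rng.normal(size=(1000, 64))
V = rng.normal(size=(1000, 64))
s = (K @ q) / np.sqrt(64)
naive = (np.exp(s - s.max()) / np.exp(s - s.max()).sum()) @ V
assert np.allclose(blockwise_attention(q, K, V), naive)
```

The result matches the naive computation exactly, while peak memory depends on the block size rather than the sequence length, which is the property that makes this style of attention practical at long context.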
Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining.
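The last entry describes keeping memory constant while generating far beyond the trained length. A common way to do this, as in the attention-sinks / StreamingLLM line of work, is to keep the first few "sink" tokens plus a sliding window of recent tokens in the KV cache and evict everything in between. The sketch below shows only that eviction policy under assumed names (`SinkKVCache`, `num_sink_tokens`, `window_size`); it is not the repository's actual implementation.

```python
# A minimal sketch of a constant-memory KV-cache policy with attention sinks:
# retain the first few tokens plus a sliding window of recent tokens.
# Illustrative only; names and sizes are assumptions, not the library's API.
from collections import deque

class SinkKVCache:
    def __init__(self, num_sink_tokens=4, window_size=1020):
        self.num_sink = num_sink_tokens
        self.sinks = []                           # first few (key, value) pairs, never evicted
        self.window = deque(maxlen=window_size)   # recent (key, value) pairs, oldest evicted first

    def append(self, key, value):
        if len(self.sinks) < self.num_sink:
            self.sinks.append((key, value))
        else:
            self.window.append((key, value))      # deque drops the oldest entry automatically

    def kv(self):
        # Cache size is bounded by num_sink_tokens + window_size regardless of sequence length.
        return self.sinks + list(self.window)
```

Because the cache never grows past `num_sink_tokens + window_size` entries, memory stays constant no matter how long generation runs, which is the property the entry above advertises.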