LLaMA-Omni is a low-latency, high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
OpenChat: Advancing Open-source Language Models with Imperfect Data
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
WhisperPlus: Advancing Speech-to-Text Processing 🚀
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
⚡LLM Zoo is a project that provides data, models, and an evaluation benchmark for large language models.
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in audio, music, and speech generation research and development.
Dromedary: towards helpful, ethical and reliable LLMs.
VILA - a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops).
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
An Open-source Toolkit for LLM Development
Mixture-of-Experts for Large Vision-Language Models
LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.