Efficient AI Inference & Serving
A recipe for online RLHF and online iterative DPO.
Easy and efficient fine-tuning of LLMs. (Supports LLaMA, LLaMA2, LLaMA3, Qwen, Baichuan, GLM, Fal...
AirLLM: 70B inference with a single 4GB GPU
A high-performance inference system for large language models, designed for production environments.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and b...
LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
TextGen: Implementation of text generation models, including LLaMA, BLOOM, GPT2, BART, T5, SongNet ...
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable fo...
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qw...
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
A high-throughput and memory-efficient inference and serving engine for LLMs
KoAlpaca: an open-source language model that understands Korean instructions
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs