Llama is an open-source toolkit for training and fine-tuning large language models (LLMs). It provides tools for data preprocessing, training, and model evaluation. It targets both research and production use, supports multiple model architectures, and scales across different hardware setups.
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs)
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
An innovative library for efficient LLM inference via low-bit quantization