LLMCompiler: An LLM Compiler for Parallel Function Calling
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qw...
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3....
Python bindings for llama.cpp
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
LLM-Inference-Bench
LLM inference in C/C++
A lightweight library that leverages Language Models (LLMs) to enable natural language interactio...
LLaVA-MORE: Enhancing Visual Instruction Tuning with LLaMA 3.1
Query LLM with Chain-of-Thought
LLM inference in Fortran
`llm-chain` is a powerful rust crate for building chains in large language models allowing you to...
A high-throughput and memory-efficient inference and serving engine for LLMs
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable fo...
Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A