LLM Inference benchmark
Access 14k+ open source AI models across 30+ tasks with the Bytez inference API ✨
Efficient, scalable, and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer ...
LLM as a Chatbot Service
4-bit quantization of LLaMA using GPTQ
Explore training for quantized models
AutoAWQ implements the AWQ algorithm for 4-bit quantization, with a 2x speedup during inference. D...
OpenAI-style, fast & lightweight local language model inference with documents