LLM Inference benchmark
Access 14k+ open source AI models across 30+ tasks with the Bytez inference API ✨
Efficient, scalable, and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer ...
LLM as a Chatbot Service
4-bit quantization of LLaMA using GPTQ
Explore training for quantized models
AutoAWQ implements the AWQ algorithm for 4-bit quantization, with a 2x speedup during inference. D...
OpenAI-style, fast & lightweight local language model inference with documents