Speed Benchmarking 7B LLM on different gcloud VMs
An innovative library for efficient LLM inference via low-bit quantization
LLM-Inference-Bench
An open-source, cloud-native serving framework for large multi-modal models (LMMs).
A Chinese-language tutorial on deploying large language models locally
Access 14k+ open source AI models across 30+ tasks with the Bytez inference API ✨
A high-throughput and memory-efficient inference and serving engine for LLMs
This repository demonstrates how to do inference with llama-2-7b-chat using llama.cpp on a machine...
Run any Large Language Model behind a unified API
Practical Llama 3 inference in Java
Deploying Qwen2 (or any other GGUF models) into AWS Lambda
The open-source community's first downloadable, runnable Chinese LLaMA2 model!
Train Llama 2 & 3 on the SQuAD v2 task as an example of how to specialize a generalized (foundational) model.
LLM Benchmark for Throughput via Ollama (Local LLMs)
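The throughput such a benchmark reports is typically generated tokens divided by generation time. As a minimal sketch (not the repository's actual code), Ollama's `/api/generate` endpoint returns `eval_count` (tokens generated) and `eval_duration` (nanoseconds), from which tokens/sec follows directly; the model name and localhost endpoint below are assumptions:

```python
import json
import urllib.request

# Default local Ollama endpoint (assumption: a server is running here).
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Throughput: tokens generated divided by generation time in seconds."""
    return eval_count / (eval_duration_ns / 1e9)


def benchmark(model: str, prompt: str) -> float:
    """Request one non-streamed completion from a local Ollama server and
    compute throughput from the eval_count / eval_duration response fields."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return tokens_per_second(body["eval_count"], body["eval_duration"])


# Example (requires a running server): benchmark("llama3", "Why is the sky blue?")
```

Averaging over several prompts and discarding the first (cold-start) run gives a more stable number.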
Chatbot Builds
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.