Deploying Qwen2 (or any other GGUF model) on AWS Lambda
Speed Benchmarking 7B LLM on different gcloud VMs
AirLLM 70B inference on a single 4GB GPU
Run any Large Language Model behind a unified API
Running Llama 2 and other open-source LLMs locally on CPU for document Q&A
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable fo...
Finetune llama2-70b and codellama on MacBook Air without quantization
Ampere optimized llama.cpp
A Chinese-language tutorial on deploying large language models locally
WebAssembly binding for llama.cpp - Enabling in-browser LLM inference
♾️ toolkit for air-gapped LLMs on consumer-grade hardware
An open-source, cloud-native serving framework for large multi-modal models (LMMs).
Practical Llama 3 inference in Java
DeveloperGPT is an LLM-powered command line tool that enables natural language to terminal command...
Bootstrap a server from llama-cpp in a few lines of python
A lightweight library that leverages Large Language Models (LLMs) to enable natural language interactio...