This repository demonstrates how to run inference with llama-2-7b-chat using llama.cpp on a machine with minimal specs (a minimal usage sketch follows this list).
Running Llama 2 and Other Open-Source LLMs Locally on CPU for Document Q&A
Chatbot built from a pretrained LLaMA-2 model, fine-tuned on medical research papers using RAG (Retrieval-Augmented Generation); see the retrieval sketch after this list.
♾️ toolkit for air-gapped LLMs on consumer-grade hardware
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
llama.go is like llama.cpp in pure Golang!
Telegram bot for self-hosted local inference of Stable Diffusion, text-to-speech, and large language models
Run any Large Language Model behind a unified API
Inference code for CodeLlama models
🚀 this project aims to develop an app using an existing open-source LLM with data collected for d...
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
LLM inference in C/C++
Chat with your favourite LLaMA models in a native macOS app
Training the LLaMA language model with MMEngine! It supports LoRA fine-tuning!
AirLLM 70B inference with a single 4GB GPU
Inference code for Llama models
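
Several entries above (the llama-2-7b-chat demo, the CPU-inference guide, and llama.cpp itself) revolve around the same workflow: load a quantized GGUF model and generate text locally on modest hardware. The sketch below illustrates that workflow under assumptions not taken from any listed repository: it uses the llama-cpp-python bindings, an illustrative model path, and arbitrary generation settings.

```python
# Minimal sketch: local CPU inference with a quantized Llama 2 chat model via
# the llama-cpp-python bindings (pip install llama-cpp-python).
# The model path and generation settings are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_ctx=2048,   # context window size
    n_threads=4,  # CPU threads; tune for the machine
)

out = llm(
    "Q: What does llama.cpp do? A:",
    max_tokens=128,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```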
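
The document Q&A and medical RAG chatbot entries rest on the same retrieval step: embed document chunks, embed the question, and hand the best-matching chunk to the model as context. The sketch below shows that step generically; the sentence-transformers library, the all-MiniLM-L6-v2 embedding model, and the sample chunks are assumptions, not code from those repositories.

```python
# Minimal sketch of the retrieval step behind RAG-style document Q&A.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# Illustrative document chunks; a real pipeline would split actual papers.
chunks = [
    "Metformin is a common first-line treatment for type 2 diabetes.",
    "Retrieval-augmented generation grounds answers in retrieved documents.",
]
chunk_vecs = encoder.encode(chunks, normalize_embeddings=True)

question = "What is a first-line treatment for type 2 diabetes?"
q_vec = encoder.encode([question], normalize_embeddings=True)[0]

# With normalized embeddings, cosine similarity is a plain dot product.
scores = chunk_vecs @ q_vec
context = chunks[int(np.argmax(scores))]

prompt = (
    "Answer using only the context.\n"
    f"Context: {context}\n"
    f"Question: {question}\n"
    "Answer:"
)
print(prompt)  # this prompt would then be fed to a local LLM, e.g. the sketch above
```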