Ampere-optimized llama.cpp
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A
llama.go is like llama.cpp in pure Golang!
The open-source community's first downloadable, runnable Chinese LLaMA2 model!
An innovative library for efficient LLM inference via low-bit quantization
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llam...
🚀 This project aims to develop an app using an existing open-source LLM with data collected for d...
Run any Large Language Model behind a unified API
This repository contains a web application designed to execute relatively compact, locally-operat...
🏗️ Fine-tune, build, and deploy open-source LLMs easily!
LLM inference in Fortran
AirLLM: 70B inference with a single 4GB GPU
A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leav...
Practical Llama 3 inference in Java
A high-performance inference system for large language models, designed for production environments.