Inference of Vision Transformer (ViT) in plain C/C++ with ggml
An open-source, cloud-native serving framework for large multi-modal models (LMMs).
Chinese-Vicuna: A Chinese instruction-following LLaMA-based model, a low-resource Chinese llama+lora approach with architecture based on alpaca
A high-performance inference system for large language models, designed for production environments.
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
AirLLM: 70B inference with a single 4GB GPU
A quick and optimized solution to manage llama-based GGUF-quantized models, download GGUF files, ...
An innovative library for efficient LLM inference via low-bit quantization
A lightweight library that leverages Large Language Models (LLMs) to enable natural language interactio...
♾️ toolkit for air-gapped LLMs on consumer-grade hardware
Finetune llama2-70b and codellama on MacBook Air without quantization
Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 ...
Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llam...
glai - GGUF LLAMA AI - Package for simplified model handling and text generation with Llama model...