TensorZero

TensorZero enables LLM applications that learn from real-world experience.

Integrate our model gateway
Send metrics or feedback
Optimize prompts, models, and inference-time strategies
Unlock compounding improvements in quality, cost, and latency

It provides a data & learning flywheel for LLMs by unifying:

Inference: one API for all LLMs, with <1ms P99 overhead
Observability: inference & feedback → your database
Optimization: from prompts to fine-tuning and RL (& even 🍓?)
Experimentation: built-in A/B testing, routing, fallbacks
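
To make the "inference & feedback → your database" idea concrete, here is a minimal sketch of how feedback can be tied back to the inference that produced it. It is not TensorZero's schema or API: it uses Python's built-in sqlite3 as a stand-in for the database, and the table names, column names, and helper functions (open_store, log_inference, log_feedback, mean_metric_by_variant) are all hypothetical.

```python
import sqlite3
import uuid
from typing import Dict


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inference (id TEXT PRIMARY KEY, variant TEXT, output TEXT)"
    )
    conn.execute(
        "CREATE TABLE feedback (inference_id TEXT, metric TEXT, value REAL)"
    )
    return conn


def log_inference(conn: sqlite3.Connection, variant: str, output: str) -> str:
    """Store one inference and return its ID so feedback can reference it later."""
    inference_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO inference VALUES (?, ?, ?)", (inference_id, variant, output)
    )
    return inference_id


def log_feedback(
    conn: sqlite3.Connection, inference_id: str, metric: str, value: float
) -> None:
    """Attach a downstream metric value to a previously logged inference."""
    conn.execute(
        "INSERT INTO feedback VALUES (?, ?, ?)", (inference_id, metric, value)
    )


def mean_metric_by_variant(conn: sqlite3.Connection, metric: str) -> Dict[str, float]:
    """Join feedback to inferences and average one metric per variant."""
    rows = conn.execute(
        "SELECT i.variant, AVG(f.value) "
        "FROM inference i JOIN feedback f ON f.inference_id = i.id "
        "WHERE f.metric = ? GROUP BY i.variant ORDER BY i.variant",
        (metric,),
    ).fetchall()
    return dict(rows)
```

Because every feedback row carries the inference ID, the same join can later select training examples for an optimization step, which is the "flywheel" part of the idea.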

Overview

The TensorZero Gateway is a high-performance model gateway written in Rust 🦀 that provides a unified API interface for all major LLM providers, allowing for seamless cross-platform integration and fallbacks.
It handles structured schema-based inference with <1ms P99 latency overhead (see Benchmarks) and built-in observability, experimentation, and inference-time optimizations.
It also collects downstream metrics and feedback associated with these inferences, with first-class support for multi-step LLM systems.
Everything is stored in a ClickHouse data warehouse that you control for real-time, scalable, and developer-friendly analytics.
Over time, TensorZero Recipes leverage this structured dataset to optimize your prompts and models: run pre-built recipes for common workflows like fine-tuning, or create your own with complete flexibility using any language and platform.
Finally, the gateway's experimentation features and GitOps orchestration enable you to iterate and deploy with confidence, be it a single LLM or thousands of LLMs.
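
The experimentation and fallback behavior described above can be illustrated with a short sketch. This is not the gateway's implementation or its API; pick_variant and infer_with_fallbacks are hypothetical helpers, and each provider is assumed to be a plain Python callable that takes a prompt and returns text.

```python
import random
from typing import Callable, Dict, List


def pick_variant(weights: Dict[str, float], rng: random.Random) -> str:
    """Sample one variant name in proportion to its weight (a simple A/B split)."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def infer_with_fallbacks(prompt: str, providers: List[Callable[[str], str]]) -> str:
    """Try each provider in order and return the first successful response."""
    errors = []
    for call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(exc)
    raise RuntimeError(f"all providers failed: {errors}")
```

Keeping variant selection separate from provider fallback means an experiment can change which prompt or model is tried without touching the retry logic.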

Our goal is to help engineers build, manage, and optimize the next generation of LLM applications: systems that learn from real-world experience. Read more about our Vision & Roadmap.

Get Started

Next steps? The Quick Start shows it's easy to set up an LLM application with TensorZero. If you want to dive deeper, the Tutorial teaches how to build a simple chatbot, an email copilot, a weather RAG system, and a structured data extraction pipeline.

Questions? Ask us on Slack or Discord.

Using TensorZero at work? Email us at [email protected] to set up a Slack or Teams channel with your team (free).

Examples

We are working on a series of complete runnable examples illustrating TensorZero's data & learning flywheel.

Writing Haikus to Satisfy a Judge with Hidden Preferences

This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste. You'll see TensorZero's "data flywheel in a box" in action: better variants lead to better data, and better data leads to better variants. You'll see progress by fine-tuning the LLM multiple times.

Improving Data Extraction (NER) by Fine-Tuning a Llama 3 Model

This example shows that an optimized Llama 3.1 8B model can be trained to outperform GPT-4o on a Named Entity Recognition (NER) task using a small amount of training data, and can be served by Fireworks at a fraction of the cost and latency.

Improving LLM Chess Ability with Best-of-N Sampling

This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.
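
As a rough illustration of the technique (not the code from the example itself), best-of-N sampling draws several candidates and keeps the one an evaluator scores highest. The best_of_n helper below is hypothetical; generate and score are assumed to be user-supplied callables, for instance an LLM call and a judge or engine evaluation.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def best_of_n(generate: Callable[[], T], score: Callable[[T], float], n: int) -> T:
    """Generate n candidates and return the one with the highest score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate() for _ in range(n)]
    # max() returns the first candidate among ties, so ordering is deterministic.
    return max(candidates, key=score)
```

The quality of the result depends on the scorer: a larger n only helps when the score correlates with the outcome you care about.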

Improving Data Extraction (NER) with Dynamic In-Context Learning

This example demonstrates how Dynamic In-Context Learning (DICL) can enhance Named Entity Recognition (NER) performance by leveraging relevant historical examples to improve data extraction accuracy and consistency without having to fine-tune a model.
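
The core idea can be sketched in a few lines: embed the new input, retrieve the most similar historical examples, and place them in the prompt as demonstrations. The sketch below is an assumption-laden illustration, not TensorZero's DICL implementation; cosine, top_k_examples, and build_prompt are hypothetical helpers, and the embeddings are assumed to be precomputed vectors.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str, str]  # (embedding, input text, output text)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_examples(
    query_vec: Sequence[float], store: List[Example], k: int
) -> List[Tuple[str, str]]:
    """Return the k stored (input, output) pairs most similar to the query."""
    ranked = sorted(store, key=lambda ex: cosine(query_vec, ex[0]), reverse=True)
    return [(inp, out) for _, inp, out in ranked[:k]]


def build_prompt(examples: List[Tuple[str, str]], new_input: str) -> str:
    """Format retrieved examples as few-shot demonstrations before the new input."""
    blocks = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    blocks.append(f"Input: {new_input}\nOutput:")
    return "\n\n".join(blocks)
```

Since the demonstrations are chosen at inference time, adding good examples to the store improves results without retraining the model.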

Improving Math Reasoning with a Custom Recipe for Automated Prompt Engineering (DSPy)

TensorZero provides a number of pre-built optimization recipes covering common LLM engineering workflows. But you can also easily create your own recipes and workflows! This example shows how to optimize a TensorZero function using an arbitrary tool — here, DSPy.

& many more on the way!
