ovai - ollama-vertex-ai

HTTP proxy for accessing Vertex AI through the REST API interface of ollama. It can optionally forward requests for other models to ollama. Written in Go.

Synopsis

Get embeddings for a text:

❯ curl localhost:22434/api/embeddings -d '{
  "model": "textembedding-gecko@003",
  "prompt": "Half-orc is the best race for a barbarian."
}'

{ "embedding": [0.05424513295292854, -0.023687424138188362, ...] }

Setup


1. Download an archive with the executable for your hardware and operating system from GitHub Releases.
2. Download a JSON file with your Google account key from Google Project Console and save it to the current directory under the name google-account.json.
3. Optionally create a file model-defaults.json in the current directory to change the default model parameters.
4. Run the server:

❯ ovai

Listening on http://localhost:22434 ...

Configuring

The following properties from google-account.json are used:

{
  "project_id": "...",
  "private_key_id": "...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "...",
  "scope": "https://www.googleapis.com/auth/cloud-platform", // optional, can be missing
  "auth_uri": "https://www.googleapis.com/oauth2/v4/token"   // optional, can be missing
}
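As a rough illustration, the key file could be read in Go as sketched below. The names Account and parseAccount are hypothetical and not taken from the ovai sources. The sketch assumes that the values shown for scope and auth_uri above are the fallbacks used when those properties are missing, and that the remaining four properties are required.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Account mirrors the properties listed above. The type and function names
// are hypothetical; they are not taken from the ovai source code.
type Account struct {
	ProjectID    string `json:"project_id"`
	PrivateKeyID string `json:"private_key_id"`
	PrivateKey   string `json:"private_key"`
	ClientEmail  string `json:"client_email"`
	Scope        string `json:"scope"`
	AuthURI      string `json:"auth_uri"`
}

// Assumed fallbacks: the values shown for the optional properties above.
const (
	defaultScope   = "https://www.googleapis.com/auth/cloud-platform"
	defaultAuthURI = "https://www.googleapis.com/oauth2/v4/token"
)

// parseAccount decodes the key file and fills in the optional properties.
func parseAccount(data []byte) (Account, error) {
	var acc Account
	if err := json.Unmarshal(data, &acc); err != nil {
		return acc, fmt.Errorf("parsing account key: %w", err)
	}
	// Assumption: every property not marked optional must be present.
	if acc.ProjectID == "" || acc.PrivateKeyID == "" || acc.PrivateKey == "" || acc.ClientEmail == "" {
		return acc, errors.New("account key lacks a required property")
	}
	if acc.Scope == "" {
		acc.Scope = defaultScope
	}
	if acc.AuthURI == "" {
		acc.AuthURI = defaultAuthURI
	}
	return acc, nil
}

func main() {
	// Placeholder values only; a real key file comes from Google Project Console.
	data := []byte(`{"project_id":"p","private_key_id":"k","private_key":"pk","client_email":"e@example.com"}`)
	acc, err := parseAccount(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(acc.Scope)
	fmt.Println(acc.AuthURI)
}
```

Note that the comments in the JSON listing above are for documentation only; encoding/json rejects comments, so a real key file must not contain them.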

Set the environment variable PORT to override the default port 22434.

Set the environment variable DEBUG to one or more strings separated by commas to customise logging on stderr. The default value is ovai when run on the command line and ovai:srv inside the Docker container.

`DEBUG` value	What will be logged
`ovai`	important information about the bodies of requests and responses
`ovai:srv`	methods and URLs of requests and status codes of responses
`ovai:net`	requests forwarded to Vertex AI and received responses
`ovai,ovai:*`	all information above
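The table above suggests that a DEBUG value is a comma-separated list of namespaces in which a trailing * acts as a wildcard. The following sketch shows one way such matching could work. The function enabled is hypothetical and the matching rules are an assumption; ovai's logging library may behave differently in details.

```go
package main

import (
	"fmt"
	"strings"
)

// enabled reports whether namespace is selected by a DEBUG value such as
// "ovai,ovai:*". Hypothetical helper; the matching rules are assumed.
func enabled(debugValue, namespace string) bool {
	for _, pattern := range strings.Split(debugValue, ",") {
		pattern = strings.TrimSpace(pattern)
		if pattern == "" {
			continue
		}
		if strings.HasSuffix(pattern, "*") {
			// A trailing wildcard matches every namespace with the same prefix.
			if strings.HasPrefix(namespace, strings.TrimSuffix(pattern, "*")) {
				return true
			}
		} else if pattern == namespace {
			return true
		}
	}
	return false
}

func main() {
	for _, ns := range []string{"ovai", "ovai:srv", "ovai:net"} {
		fmt.Printf("DEBUG=ovai,ovai:* logs %s: %v\n", ns, enabled("ovai,ovai:*", ns))
		fmt.Printf("DEBUG=ovai:srv logs %s: %v\n", ns, enabled("ovai:srv", ns))
	}
}
```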

Set the environment variable OLLAMA_ORIGIN to the origin of the ollama service to enable forwarding to ollama. If the name of the requested model doesn't start with gemini, chat-bison, text-bison or textembedding-gecko, the request will be forwarded to the ollama service. This lets ovai act as a single service with the ollama interface that recognises both Vertex AI and ollama models.
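The routing rule described above can be sketched as a simple prefix check. The names vertexPrefixes and isVertexModel are hypothetical; this illustrates the documented rule and is not the code used by ovai.

```go
package main

import (
	"fmt"
	"strings"
)

// vertexPrefixes lists the model name prefixes handled by Vertex AI,
// as named in the text above.
var vertexPrefixes = []string{"gemini", "chat-bison", "text-bison", "textembedding-gecko"}

// isVertexModel reports whether a request for model would stay with
// Vertex AI instead of being forwarded to ollama. Hypothetical helper.
func isVertexModel(model string) bool {
	for _, prefix := range vertexPrefixes {
		if strings.HasPrefix(model, prefix) {
			return true
		}
	}
	return false
}

func main() {
	for _, model := range []string{"gemini-1.0-pro", "textembedding-gecko@003", "moondream"} {
		if isVertexModel(model) {
			fmt.Println(model, "-> Vertex AI")
		} else {
			fmt.Println(model, "-> ollama")
		}
	}
}
```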

Set the environment variable NETWORK to enforce IPV4 or IPV6. By default, ovai relies on the Happy Eyeballs implementation in Go and in the underlying OS. Valid values:

`NETWORK` value	What will be used
`IPV4`	enforce the network connection via IPV4 only
`IPV6`	enforce the network connection via IPV6 only
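In Go, restricting outgoing connections to one IP version is usually done by passing tcp4 or tcp6 to the dialer instead of tcp. The sketch below shows how a NETWORK value could be mapped that way. The functions dialNetwork and newClient are hypothetical, and treating an unknown value as an error is an assumption; ovai may handle it differently.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"os"
	"time"
)

// dialNetwork maps a NETWORK value to a network name understood by
// net.Dialer. Hypothetical helper; rejecting unknown values is assumed.
func dialNetwork(value string) (string, error) {
	switch value {
	case "":
		return "tcp", nil // let Go and the OS choose (Happy Eyeballs)
	case "IPV4":
		return "tcp4", nil
	case "IPV6":
		return "tcp6", nil
	default:
		return "", fmt.Errorf("invalid NETWORK value %q", value)
	}
}

// newClient returns an HTTP client whose connections always use the given
// network, regardless of what the transport would pick on its own.
func newClient(network string) *http.Client {
	dialer := &net.Dialer{Timeout: 30 * time.Second}
	transport := &http.Transport{
		DialContext: func(ctx context.Context, _, addr string) (net.Conn, error) {
			return dialer.DialContext(ctx, network, addr)
		},
	}
	return &http.Client{Transport: transport}
}

func main() {
	network, err := dialNetwork(os.Getenv("NETWORK"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	client := newClient(network)
	fmt.Printf("%T will dial using %s\n", client, network)
}
```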

Docker

For example, run a container for testing purposes with verbose logging, deleted on exit, exposing the port 22434:

docker run --rm -it -p 22434:22434 -e DEBUG=ovai,ovai:* \
  -v ${PWD}/google-account.json:/usr/src/app/google-account.json \
  ghcr.io/prantlf/ovai

For example, run a container named ovai in the background with custom defaults, forwarding to ollama, exposing the port 22434:

docker run --rm -dt -p 22434:22434 --name ovai \
  --add-host host.docker.internal:host-gateway \
  -e OLLAMA_ORIGIN=http://host.docker.internal:11434 \
  -v ${PWD}/google-account.json:/usr/src/app/google-account.json \
  -v ${PWD}/model-defaults.json:/usr/src/app/model-defaults.json \
  prantlf/ovai

The same task as above, only using Docker Compose to make it easier (place docker-compose.yml in the current directory):

docker-compose up -d

The image is available as both ghcr.io/prantlf/ovai (GitHub) and prantlf/ovai (DockerHub).

Building

Make sure that you have installed Go 1.22 or newer.

git clone https://github.com/prantlf/ovai.git
cd ovai
make

Executing ./ovai, make docker-start or make docker-up requires the google-account.json file in the current directory.

API

See the original REST API documentation for details about the interface. See also the lifecycle of the Vertex AI models.

Embeddings

Creates a vector from the specified prompt. See the available embedding models.

❯ curl localhost:22434/api/embeddings -d '{
  "model": "textembedding-gecko@003",
  "prompt": "Half-orc is the best race for a barbarian."
}'

{ "embedding": [0.05424513295292854, -0.023687424138188362, ...] }

The returned vector of floats has 768 dimensions.
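The same request can be made from Go with the standard library only. This is a minimal sketch; the names embed and parseEmbedding are hypothetical, and the origin assumes the default port 22434.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
)

type embeddingsRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

type embeddingsResponse struct {
	Embedding []float64 `json:"embedding"`
}

// parseEmbedding extracts the vector from a response body.
func parseEmbedding(body []byte) ([]float64, error) {
	var res embeddingsResponse
	if err := json.Unmarshal(body, &res); err != nil {
		return nil, err
	}
	if len(res.Embedding) == 0 {
		return nil, errors.New("response contains no embedding")
	}
	return res.Embedding, nil
}

// embed posts the prompt to /api/embeddings and returns the vector.
func embed(origin, model, prompt string) ([]float64, error) {
	payload, err := json.Marshal(embeddingsRequest{Model: model, Prompt: prompt})
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(origin+"/api/embeddings", "application/json", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s: %s", resp.Status, body)
	}
	return parseEmbedding(body)
}

func main() {
	// Requires a running ovai server; otherwise the error is printed.
	vec, err := embed("http://localhost:22434", "textembedding-gecko@003",
		"Half-orc is the best race for a barbarian.")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("dimensions:", len(vec))
}
```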

Text

Generates text using the specified prompt. See the available bison text models and gemini chat models.

❯ curl localhost:22434/api/generate -d '{
  "model": "gemini-1.5-pro-preview-0409",
  "prompt": "Describe guilds from Dungeons and Dragons.",
  "images": [],
  "stream": false
}'

{
  "model": "gemini-1.5-pro-preview-0409",
  "created_at": "2024-05-10T14:10:54.885Z",
  "response": "Guilds serve as organizations that bring together individuals with ...",
  "done": true,
  "total_duration": 13884049373,
  "load_duration": 0,
  "prompt_eval_count": 7,
  "prompt_eval_duration": 3471012343,
  "eval_count": 557,
  "eval_duration": 10413037030
}

The property stream must always be set to false, because the streaming mode isn't supported. The property options is optional and has the following defaults:

"options": {
  "num_predict": 8192,
  "temperature": 1,
  "top_p": 0.95,
  "top_k": 40
}

Chat

Replies to a chat with the specified message history. See the available bison chat models and gemini chat models.

❯ curl localhost:22434/api/chat -d '{
  "model": "gemini-1.0-pro",
  "messages": [
    {
      "role": "system",
      "content": "You are an expert on Dungeons and Dragons."
    },
    {
      "role": "user",
      "content": "What race is the best for a barbarian?",
      "images": []
    }
  ],
  "stream": false
}'

{
  "model": "gemini-1.0-pro",
  "created_at": "2024-05-06T23:32:05.219Z",
  "message": {
    "role": "assistant",
    "content": "Half-Orcs are a strong and resilient race, making them ideal for barbarians. ..."
  },
  "done": true,
  "total_duration": 2325524053,
  "load_duration": 0,
  "prompt_eval_count": 9,
  "prompt_eval_duration": 581381013,
  "eval_count": 292,
  "eval_duration": 1744143040
}

The property stream must always be set to false, because the streaming mode isn't supported. The property options is optional and has the following defaults:

"options": {
  "num_predict": 8192,
  "temperature": 1,
  "top_p": 0.95,
  "top_k": 40
}
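The chat response shown earlier in this section can be decoded in Go to get the reply and a rough generation speed. The sketch assumes that the duration properties are in nanoseconds, as in the ollama REST API; the sample values are consistent with that, since prompt_eval_duration and eval_duration add up to total_duration. The names chatResponse and summarize are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// chatResponse covers only the properties used below.
type chatResponse struct {
	Message struct {
		Role    string `json:"role"`
		Content string `json:"content"`
	} `json:"message"`
	EvalCount    int   `json:"eval_count"`
	EvalDuration int64 `json:"eval_duration"` // assumed to be nanoseconds
}

// summarize returns the reply text and the generated tokens per second.
func summarize(body []byte) (string, float64, error) {
	var res chatResponse
	if err := json.Unmarshal(body, &res); err != nil {
		return "", 0, err
	}
	seconds := time.Duration(res.EvalDuration).Seconds()
	if seconds <= 0 {
		// Avoid dividing by zero when no duration was reported.
		return res.Message.Content, 0, nil
	}
	return res.Message.Content, float64(res.EvalCount) / seconds, nil
}

func main() {
	// Shortened copy of the sample response from this section.
	sample := []byte(`{
  "message": {"role": "assistant", "content": "Half-Orcs are a strong and resilient race, ..."},
  "eval_count": 292,
  "eval_duration": 1744143040
}`)
	content, rate, err := summarize(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s\n%.1f tokens/s\n", content, rate)
}
```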

Show

Show information about a model.

❯ curl localhost:22434/api/show -d '{"name":"moondream"}'

{
  "license": "....",
  "modelfile": "...",
  "parameters": "temperature 0\nstop \"\u003c|endoftext|\u003e\"\nstop \"Question:\"",
  "template": "{{ if .Prompt }} Question: {{ .Prompt }}\n\n{{ end }} Answer: {{ .Response }}\n\n",
  "details": {
    "parent_model": "",
    "format": "gguf",
    "family": "phi2",
    "families": [
      "phi2",
      "clip"
    ],
    "parameter_size": "1B",
    "quantization_level": "Q4_0"
  }
}

Ping

Checks that the server is running.

❯ curl -f localhost:22434/api/ping -X HEAD
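A comparable check from Go might look like the sketch below. The functions healthy and ping are hypothetical. Like curl -f, the sketch treats any status of 400 or above as a failure, and it assumes the default port 22434.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// healthy mirrors curl -f, which fails on HTTP status 400 and above.
func healthy(status int) bool {
	return status < 400
}

// ping sends a HEAD request to /api/ping and reports any failure.
func ping(client *http.Client, origin string) error {
	resp, err := client.Head(origin + "/api/ping")
	if err != nil {
		return err
	}
	resp.Body.Close()
	if !healthy(resp.StatusCode) {
		return fmt.Errorf("ping failed: %s", resp.Status)
	}
	return nil
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	if err := ping(client, "http://localhost:22434"); err != nil {
		fmt.Println("not running:", err)
		return
	}
	fmt.Println("running")
}
```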

Shutdown

Gracefully shuts down the HTTP server and exits the process.

❯ curl localhost:22434/api/shutdown -X POST

Contributing

In lieu of a formal styleguide, take care to maintain the existing coding style. Lint and test your code.

License

Licensed under the MIT License.
