Why do ML/AI in Postgres?

Data for ML & AI systems is inherently larger and more dynamic than the models. It's more efficient, manageable and reliable to move models to the database, rather than constantly moving data to the models.

Features at a glance

In-Database ML/AI: Run machine learning and AI operations directly within PostgreSQL
GPU Acceleration: Leverage GPU power for faster computations and model inference
Large Language Models: Integrate and use state-of-the-art LLMs from Hugging Face
RAG Pipeline: Built-in functions for chunking, embedding, ranking, and transforming text
Vector Search: Efficient similarity search using pgvector integration
Diverse ML Algorithms: 47+ classification and regression algorithms available
High Performance: 8-40X faster inference compared to HTTP-based model serving
Scalability: Support for millions of transactions per second and horizontal scaling
NLP Tasks: Wide range of natural language processing capabilities
Security: Enhanced data privacy by keeping models and data together
Seamless Integration: Works with existing PostgreSQL tools and client libraries

Getting started

The only prerequisite for using PostgresML is a Postgres database with our open-source pgml extension installed.

PostgresML Cloud

Our serverless cloud is the easiest and recommended way to get started.

Sign up for a free PostgresML account. You'll get a free database in seconds, with access to GPUs and state-of-the-art LLMs.

Self-hosted

If you don't want to use our cloud, you can self-host PostgresML with Docker:

docker run \
    -it \
    -v postgresml_data:/var/lib/postgresql \
    -p 5433:5432 \
    -p 8000:8000 \
    ghcr.io/postgresml/postgresml:2.7.12 \
    sudo -u postgresml psql -d postgresml

For more details, take a look at our Quick Start with Docker documentation.
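
Once connected, a quick sanity check is to confirm that the extension is present. This is a minimal sketch, assuming a psql session on a database where the pgml extension is available; pgml.version() is the version helper described in the PostgresML documentation.

```sql
-- Enable the extension in the current database (a no-op if it already exists).
CREATE EXTENSION IF NOT EXISTS pgml;

-- Report the installed pgml version.
SELECT pgml.version();
```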

Ecosystem

We have a number of other tools and libraries that are specifically designed to work with PostgresML. Remember that PostgresML is a Postgres extension running inside Postgres, so you can connect with psql and use any of your favorite tooling and client libraries, such as psycopg, to run queries.

PostgresML Specific Client Libraries:

Korvus - Korvus is a Python, JavaScript, Rust and C search SDK that unifies the entire RAG pipeline in a single database query.
postgresml-django - postgresml-django is a Python module that integrates PostgresML with Django ORM.

Large language models

PostgresML brings models directly to your data, eliminating the need for costly and time-consuming data transfers. This approach significantly enhances performance, security, and scalability for AI-driven applications.

By running models within the database, PostgresML enables:

Reduced latency and improved query performance
Enhanced data privacy and security
Simplified infrastructure management
Seamless integration with existing database operations

Hugging Face

PostgresML supports a wide range of state-of-the-art deep learning architectures available on the Hugging Face model hub. This integration allows you to:

Access thousands of pre-trained models
Utilize cutting-edge NLP, computer vision, and other AI models
Easily experiment with different architectures

OpenAI and other providers

While cloud-based LLM providers offer powerful capabilities, making API calls from within the database can introduce latency, security risks, and potential compliance issues. Currently, PostgresML does not directly support integration with remote LLM providers like OpenAI.

RAG

PostgresML transforms your PostgreSQL database into a powerful vector database for Retrieval-Augmented Generation (RAG) applications. It leverages pgvector for efficient storage and retrieval of embeddings.

Our RAG implementation is built on four key SQL functions:

Chunk: Splits text into manageable segments
Embed: Generates vector embeddings from text using pre-trained models
Rank: Scores candidate documents against a query with a cross-encoder, typically to re-rank search results
Transform: Applies language models for text generation or transformation

For more information on using RAG with PostgresML see our guide on Unified RAG.
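
As a rough illustration of the vector database role described above, the sketch below stores embeddings in a pgvector column and retrieves the nearest rows by cosine distance. The documents table and its rows are hypothetical, the model name intfloat/e5-small-v2 (384 dimensions) is an illustrative choice rather than a recommendation from this README, and the pgvector extension is assumed to be installed alongside pgml.

```sql
-- Assumption: both extensions are available in this database.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pgml;

-- Hypothetical table; 384 matches the output size of intfloat/e5-small-v2.
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    body TEXT NOT NULL,
    embedding vector(384)
);

INSERT INTO documents (body) VALUES
    ('PostgresML runs models inside the database.'),
    ('pgvector adds a vector type and distance operators to Postgres.'),
    ('Bananas are rich in potassium.');

-- Embed every row in place; pgml.embed returns an array that casts to vector.
UPDATE documents
SET embedding = pgml.embed('intfloat/e5-small-v2', body)::vector;

-- Retrieve the rows closest to the query; <=> is pgvector's cosine distance operator.
SELECT id, body
FROM documents
ORDER BY embedding <=> pgml.embed('intfloat/e5-small-v2', 'Where do the models run?')::vector
LIMIT 2;
```

For larger tables, an approximate index on the embedding column would normally be added; that is left out here to keep the sketch short.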

Chunk

The pgml.chunk function chunks documents using the specified splitter. This is typically done before embedding.

pgml.chunk(
    splitter TEXT,    -- splitter name
    text TEXT,        -- text to chunk
    kwargs JSON       -- optional arguments passed to the splitter
)

See pgml.chunk docs for more information.
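
A minimal usage sketch follows, assuming the recursive_character splitter and the chunk_size and chunk_overlap options shown in the pgml.chunk documentation. The input text is made up for illustration, and the options are passed as an untyped literal so that Postgres coerces them to the declared argument type.

```sql
-- Split a short passage into overlapping chunks; one row is returned per chunk.
SELECT *
FROM pgml.chunk(
    'recursive_character',
    'PostgresML brings models to your data. It runs inside Postgres. Chunks are usually embedded next.',
    '{"chunk_size": 40, "chunk_overlap": 10}'
);
```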

Embed

The pgml.embed function generates embeddings from text using in-database models.

pgml.embed(
    transformer TEXT,
    "text" TEXT,
    kwargs JSONB
)

See pgml.embed docs for more information.
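
As a sketch of a typical call, the query below embeds a single sentence. The model name intfloat/e5-small-v2 is an assumption chosen for illustration; any embedding model supported by your installation could be substituted.

```sql
-- Generate one embedding for one piece of text.
SELECT pgml.embed(
    'intfloat/e5-small-v2',
    'PostgresML generates embeddings inside the database.'
) AS embedding;
```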

Rank

The pgml.rank function uses Cross-Encoders to score sentence pairs.

This is typically used as a re-ranking step when performing search.

pgml.rank(
    transformer TEXT,
    query TEXT,
    documents TEXT[],
    kwargs JSONB
)

Docs coming soon.
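
Until those docs land, here is a hedged sketch of a re-ranking call based on the signature above. The cross-encoder name mixedbread-ai/mxbai-rerank-base-v1 is an assumption picked for illustration, and the query and documents are made up.

```sql
-- Score each candidate document against the query with a cross-encoder.
SELECT *
FROM pgml.rank(
    'mixedbread-ai/mxbai-rerank-base-v1',
    'Where do the models run?',
    ARRAY[
        'PostgresML runs models inside the database.',
        'Bananas are rich in potassium.'
    ]
);
```

In a search pipeline, the documents array would usually hold the top candidates returned by a cheaper vector search.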

Transform

The pgml.transform function can be used to generate text.

SELECT pgml.transform(
    task   => TEXT OR JSONB,     -- Pipeline initializer arguments
    inputs => TEXT[] OR BYTEA[], -- inputs for inference
    args   => JSONB              -- (optional) arguments to the pipeline.
)

See pgml.transform docs for more information.

See our Text Generation guide for a walkthrough of generating text.
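
A minimal sketch of a text generation call, following the named-argument form shown above. The model gpt2 is only an illustrative assumption (a small model from the Hugging Face hub), and max_new_tokens is a standard generation argument passed through to the pipeline.

```sql
-- Generate a short continuation for a single prompt.
SELECT pgml.transform(
    task   => '{"task": "text-generation", "model": "gpt2"}'::JSONB,
    inputs => ARRAY['PostgresML makes it possible to'],
    args   => '{"max_new_tokens": 32}'::JSONB
) AS generated;
```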

Machine learning

Some highlights:

Training a classification model

Training

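SELECT * FROM pgml.train(
    'Handwritten Digit Image Classifier', -- project name
    'classification',                     -- task
    'pgml.digits',                        -- relation containing the training data
    'target',                             -- label column
    algorithm => 'xgboost'                -- named arguments must follow positional ones
);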
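
The training call above reads from pgml.digits, so that table has to exist first. A hedged sketch, assuming the toy dataset loader pgml.load_dataset described in the PostgresML documentation:

```sql
-- Load the handwritten digits toy dataset into the pgml.digits table.
SELECT * FROM pgml.load_dataset('digits');

-- Confirm that rows were loaded before training.
SELECT count(*) FROM pgml.digits;
```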

Inference

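-- The project name must match the one passed to pgml.train,
-- and the features must match the columns the model was trained on.
SELECT
    target,
    pgml.predict('Handwritten Digit Image Classifier', image) AS prediction
FROM pgml.digits
LIMIT 10;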

NLP

The pgml.transform function exposes a number of available NLP tasks.

Available tasks are:
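
For instance, text-classification can be passed as the task. This is a minimal sketch that assumes the default model for the task is acceptable; the input sentences are made up for illustration.

```sql
-- Classify the sentiment of two sentences with the task's default model.
SELECT pgml.transform(
    task   => 'text-classification',
    inputs => ARRAY[
        'I love how simple in-database ML has become.',
        'I dislike moving data between services.'
    ]
) AS sentiment;
```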
