uform

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and πŸ”œ video, up to 5x faster than OpenAI CLIP and LLaVA πŸ–ΌοΈ & πŸ–‹οΈ

License: Apache-2.0 · Downloads: 1.3K · Stars: 1K · Committers: 14

uform - v3.0.2 Latest Release

Published by ashvardanian 6 months ago

3.0.2 (2024-04-25)

Make

uform - v3.0.1

Published by ashvardanian 6 months ago

3.0.1 (2024-04-25)

Make

uform - UForm v3 for 3 platforms πŸ•ΈοΈπŸπŸ

Published by ashvardanian 6 months ago

Multimodal Embeddings for JavaScript, Swift, and Python

How many AI models can run on-device out of the box? UForm multimodal embeddings can πŸ₯³

| Model                                | Parameters | Languages | Architecture                                 |
| :----------------------------------- | ---------: | --------: | :------------------------------------------- |
| uform3-image-text-english-large πŸ†•    | 365M       | 1         | 6 text layers, ViT-L/14, 6 multimodal layers |
| uform3-image-text-english-base       | 143M       | 1         | 2 text layers, ViT-B/16, 2 multimodal layers |
| uform3-image-text-english-small πŸ†•    | 79M        | 1         | 2 text layers, ViT-S/16, 2 multimodal layers |
| uform3-image-text-multilingual-base  | 206M       | 21        | 8 text layers, ViT-B/16, 4 multimodal layers |

JavaScript

Load the models and preprocessors for different modalities:

import assert from 'node:assert';
import { getModel, Modality, TextProcessor, TextEncoder, ImageEncoder, ImageProcessor } from '@unum-cloud/uform';

const { configPath, modalityPaths, tokenizerPath } = await getModel({
    modelId: 'unum-cloud/uform3-image-text-english-small',
    modalities: [Modality.TextEncoder, Modality.ImageEncoder],
});

Embed images:

const imageProcessor = new ImageProcessor(configPath);
await imageProcessor.init();
const processedImages = await imageProcessor.process("path/to/image.png");

const imageEncoder = new ImageEncoder(modalityPaths.image_encoder, imageProcessor);
await imageEncoder.init();
const imageOutput = await imageEncoder.encode(processedImages);
assert(imageOutput.embeddings.dims.length === 2, "Output should be 2D");

Embed queries:

const textProcessor = new TextProcessor(configPath, tokenizerPath);
await textProcessor.init();
const processedTexts = await textProcessor.process("a small red panda in a zoo");

const textEncoder = new TextEncoder(modalityPaths.text_encoder, textProcessor);
await textEncoder.init();
const textOutput = await textEncoder.encode(processedTexts);
assert(textOutput.embeddings.dims.length === 2, "Output should be 2D");
await textEncoder.dispose();

Swift

Embed images:

let imageModel = try await ImageEncoder(modelName: "unum-cloud/uform3-image-text-english-small")
let imageURL = "https://github.com/ashvardanian/ashvardanian/blob/master/demos/bbq-on-beach.jpg?raw=true"
guard let url = URL(string: imageURL),
    let imageSource = CGImageSourceCreateWithURL(url as CFURL, nil),
    let cgImage = CGImageSourceCreateImageAtIndex(imageSource, 0, nil) else {
    fatalError("Could not load image from URL: \(imageURL)")
}

let imageEmbedding: Embedding = try imageModel.encode(cgImage)
let imageVector: [Float32] = imageEmbedding.asFloats()

Embed queries:

let textModel = try await TextEncoder(modelName: "unum-cloud/uform3-image-text-english-small")
let text = "A group of friends enjoy a barbecue on a sandy beach, with one person grilling over a large black grill, while the other sits nearby, laughing and enjoying the camaraderie."
let textEmbedding: Embedding = try textModel.encode(text)
let textVector: [Float32] = textEmbedding.asFloats()

Python

Load model:

from uform import get_model, Modality

model_name = 'unum-cloud/uform3-image-text-english-small'
modalities = [Modality.TEXT_ENCODER, Modality.IMAGE_ENCODER]
processors, models = get_model(model_name, modalities=modalities)

Embed images:

import requests
from io import BytesIO
from PIL import Image

image_url = 'https://media-cdn.tripadvisor.com/media/photo-s/1b/28/6b/53/lovely-armenia.jpg'
image = Image.open(BytesIO(requests.get(image_url).content))

processor_image = processors[Modality.IMAGE_ENCODER]
model_image = models[Modality.IMAGE_ENCODER]
image_data = processor_image(image)
image_features, image_embedding = model_image.encode(image_data, return_features=True)

Embed queries:

text = 'a cityscape bathed in the warm glow of the sun, with varied architecture and a towering, snow-capped mountain rising majestically in the background'

model_text = models[Modality.TEXT_ENCODER]
processor_text = processors[Modality.TEXT_ENCODER]

text_data = processor_text(text)
text_features, text_embedding = model_text.encode(text_data, return_features=True)
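
To put both modalities to use, the typical next step is a cosine similarity between the two embeddings. A minimal sketch, assuming the default PyTorch backend where encode returns 2D tensors of shape (1, embedding_dim); with the ONNX backend the outputs would be NumPy arrays instead:

import torch.nn.functional as F

# Cosine similarity in [-1, 1]; higher means the caption matches the image better.
# Assumes `image_embedding` and `text_embedding` are torch tensors from the code above.
similarity = F.cosine_similarity(image_embedding, text_embedding)
print(f"image-text similarity: {similarity.item():.3f}")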

Thanks to @xenova and @sroussey for help with JavaScript!
Thanks to @vmanot and @pcuenca for their work on Swift!

uform - v2.1.1

Published by ashvardanian 6 months ago

2.1.1 (2024-04-16)

Fix

  • Importing ViT in gen_model.py (#80) (21f49ba), closes #80
uform - v2.1.0

Published by ashvardanian 6 months ago

2.1.0 (2024-04-14)

Add

Fix

  • Image preprocessing in Swift (f2772d0)

Improve

  • Fetching nested configs (729b9d9)

Make

uform - v2.0.2

Published by ashvardanian 7 months ago

2.0.2 (2024-03-28)

Make

  • Fix PyPi CI version with hash (364afe6)
uform - v2.0.1

Published by ashvardanian 7 months ago

2.0.1 (2024-03-28)

Make

uform - Multimodal Matryoshka, Multimodal DPO, and ONNX πŸŽ‰

Published by ashvardanian 7 months ago

DPO Preview

Today we are releasing a new batch of multimodal models trained with Nebius and already available on HuggingFace πŸ€—

  1. Matryoshka-style multimodal embeddings that can be truncated to 64, 256, or the full 768 dimensions πŸ–ΌοΈ (see the sketch after this list)
  2. Improved multimodal chat in 1.2B parameters, tuned with Direct Preference Optimization πŸ’¬
  3. ONNX backend, making PyTorch dependency optional for lightning fast deployments ⚑
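
Matryoshka-style training makes the leading coordinates of the full 768-dimensional embedding usable on their own. A minimal sketch of how such truncation is typically applied, assuming a NumPy array of full-size embeddings; the slice sizes 64 and 256 come from the list above, and the re-normalization step is standard practice rather than a UForm-specific API:

import numpy as np

def truncate_matryoshka(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

full = np.random.rand(8, 768).astype(np.float32)  # stand-in for real embeddings
tiny = truncate_matryoshka(full, 64)    # 12x smaller vectors for coarse retrieval
small = truncate_matryoshka(full, 256)  # 3x smaller, closer to full quality
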
uform - v1.1.1: Polishing the Repo

Published by ashvardanian 8 months ago

Great thanks to @lmmx, @blackforestboi, and @kapulkin for their patches to the project!


  • Performance observations for M2 CPUs (#56) (8374ef6), closes #56
  • Passing labels to text_decoder to compute loss. (#65) (f445a8b), closes #65
  • Larger batch benchmarks (fdc8587)
  • pre-commit config and linters (#62) (0a3efac), closes #62
uform - v1.1.0

Published by ashvardanian 8 months ago

1.1.0 (2024-02-15)

Add

uform - v1.0.3

Published by ashvardanian 10 months ago

1.0.3 (2023-12-29)

Improve

uform - v1.0.2

Published by ashvardanian 10 months ago

1.0.2 (2023-12-28)

Make

uform - UForm v1: Multimodal Chat in 1.5 Billion Parameters

Published by ashvardanian 10 months ago

UForm v1: Multimodal Chat in 1.5 Billion Parameters

The UForm family of tiny multimodal transformer models just got bigger! In addition to the existing CLIP-like embedding models, we now have a generative model useful for image captioning, visual question answering, and multimodal chats. All that in just 1.5 billion parameters, small enough to fit even on mobile devices πŸŽ‰

Repository: https://github.com/unum-cloud/uform
Generative model: https://huggingface.co/unum-cloud/uform-gen
Chat model: https://huggingface.co/unum-cloud/uform-gen-chat
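
A minimal captioning sketch in Python, based on the gen-model API advertised around this release; the class names (VLMForCausalLM, VLMProcessor in uform.gen_model), the "[cap]" prompt prefix, and the generation flags are assumptions that may differ in your installed version:

import torch
from PIL import Image
# Assumed import path and class names from the uform-gen model card of this era.
from uform.gen_model import VLMForCausalLM, VLMProcessor

model = VLMForCausalLM.from_pretrained("unum-cloud/uform-gen")
processor = VLMProcessor.from_pretrained("unum-cloud/uform-gen")

prompt = "[cap] Summarize the visual content of the image."  # assumed captioning prefix
image = Image.open("photo.jpg")  # any local image

inputs = processor(texts=[prompt], images=[image], return_tensors="pt")
with torch.inference_mode():
    output = model.generate(**inputs, do_sample=False, max_new_tokens=128)

# Strip the prompt tokens and decode only the newly generated caption.
caption = processor.batch_decode(output[:, inputs["input_ids"].shape[1]:])[0]
print(caption)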

Evaluation Metrics

Being the smallest model of its kind, unum-cloud/uform-gen is hard to compare to others. Next in size are the 5x larger LLaVAs and InstructBLIP, with 7 billion parameters. LLaVA performs noticeably better on VQAv2: 78.5 vs 66.5. On captioning, CLIPScore and RefCLIPScore are relatively close across all models.

| Model                             | Size | Caption Length | CLIPScore | RefCLIPScore |
| :-------------------------------- | ---: | :------------- | --------: | -----------: |
| llava-hf/llava-1.5-7b-hf          | 7B   | Long           | 0.878     | 0.529        |
| llava-hf/llava-1.5-7b-hf          | 7B   | Short          | 0.886     | 0.531        |
| Salesforce/instructblip-vicuna-7b | 7B   | Long           | 0.902     | 0.534        |
| Salesforce/instructblip-vicuna-7b | 7B   | Short          | 0.848     | 0.523        |
| unum-cloud/uform-gen              | 1.5B | Long           | 0.847     | 0.523        |
| unum-cloud/uform-gen              | 1.5B | Short          | 0.842     | 0.522        |
| unum-cloud/uform-gen-chat         | 1.5B | Long           | 0.860     | 0.525        |
| unum-cloud/uform-gen-chat         | 1.5B | Short          | 0.858     | 0.525        |

Throughput

On an RTX 3090, using vanilla PyTorch for inference with bfloat16 arithmetic and greedy decoding, you can expect roughly the following throughput; a measurement sketch follows the table.

| Model                             | Size | Speed               | Speedup |
| :-------------------------------- | ---: | :------------------ | ------: |
| llava-hf/llava-1.5-7b-hf          | 7B   | ~ 40 tokens/second  |         |
| Salesforce/instructblip-vicuna-7b | 7B   | ~ 40 tokens/second  |         |
| unum-cloud/uform-gen              | 1.5B | ~ 140 tokens/second | x 3.5   |
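
A rough way to reproduce a tokens-per-second figure like the ones above on your own hardware, using a plain Hugging Face text model as a stand-in; the model name and prompt are placeholders, while bfloat16 and greedy decoding mirror the setup described before the table:

import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; swap in whichever generative pipeline you are benchmarking.
name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16).to("cuda")

inputs = tokenizer("Describe the image in detail.", return_tensors="pt").to("cuda")
torch.cuda.synchronize()
start = time.perf_counter()
with torch.inference_mode():
    output = model.generate(**inputs, do_sample=False, max_new_tokens=256)  # greedy decoding
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/second")
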
uform - v0.4.8

Published by ashvardanian about 1 year ago

0.4.8 (2023-10-13)

Make

  • pass ANACONDA_API_TOKEN as env. var. (ed020d3)
uform - v0.4.7

Published by ashvardanian about 1 year ago

0.4.7 (2023-10-13)

Make

  • urllib3 after v2 breaks Anaconda pipeline (05ed238)
uform - v0.4.6

Published by ashvardanian about 1 year ago

0.4.6 (2023-10-13)

Make

uform - v0.4.5

Published by ashvardanian about 1 year ago

0.4.5 (2023-10-13)

Make

uform - v0.4.4

Published by ashvardanian about 1 year ago

0.4.4 (2023-09-20)

Docs

Improve

  • Expose TextEncoder and other model classes (47d969b)
uform - v0.4.3

Published by ashvardanian about 1 year ago

0.4.3 (2023-09-01)

Docs

Make

uform - v0.4.2

Published by ashvardanian about 1 year ago

0.4.2 (2023-08-17)

Docs

Fix

  • Latest Sphinx version not working (c3a0cc7)