visual-exploration-vectors

MIT License

A visual exploration of vectors

A vector embedding encodes an input as a list of floating-point numbers.

"dog" → [0.017198, -0.007493, -0.057982, 0.054051, -0.028336, 0.019245,…]

Different models produce different embeddings for the same input, and the vector length (number of dimensions) depends on the model.

| Model | Encodes | Vector length |
| --- | --- | --- |
| word2vec | words | 300 |
| SBERT (Sentence-Transformers) | text (up to ~400 words) | 768 |
| OpenAI ada-002 | text (up to 8191 tokens) | 1536 |
| Azure Computer Vision | image or text | 1024 |

Vector embeddings are commonly used for similarity search, fraud detection, recommendation systems, and RAG (Retrieval-Augmented Generation).
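
As a rough illustration of similarity search, the sketch below ranks a few vectors by cosine similarity to a query. The three-dimensional `toy_vectors` are made up for this example and are not output from any of the models above; the helper names `cosine_similarity` and `most_similar` are likewise hypothetical and not taken from the notebooks.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors, invented for illustration (assumption:
# real embeddings have hundreds or thousands of dimensions).
toy_vectors = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}


def most_similar(query, vectors):
    """Rank every other entry by cosine similarity to the query, highest first."""
    q = vectors[query]
    scores = [
        (name, cosine_similarity(q, v))
        for name, v in vectors.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


ranking = most_similar("dog", toy_vectors)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

With these toy values, "puppy" ranks above "car" for the query "dog", because its vector points in nearly the same direction.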

This repository contains a visual exploration of vectors, using several embedding models.

Go through the notebooks in this order:

  1. Prepare text vectors: OpenAI ada-002, Word2Vec Google News
  2. Vector models
  3. Vector distance metrics
  4. Multi-word vectors
  5. Vector quantization
  6. Prepare multimodal vectors
  7. Explore multimodal vectors
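
To give a feel for what vector quantization involves before opening that notebook, here is a minimal sketch of one common variant, scalar quantization to 8-bit integers. It is an assumption that this resembles what the notebook covers; the function names `quantize_int8` and `dequantize` are hypothetical, and the input is only the visible prefix of the "dog" vector shown earlier, not a full embedding.

```python
def quantize_int8(vector):
    """Map floats to integers in [-127, 127] using one scale per vector."""
    max_abs = max(abs(x) for x in vector)
    # Guard against an all-zero vector, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(x / scale) for x in vector]
    return codes, scale


def dequantize(codes, scale):
    """Approximate the original floats from the integer codes."""
    return [c * scale for c in codes]


# First six values of the "dog" example above (the rest is elided there).
vector = [0.017198, -0.007493, -0.057982, 0.054051, -0.028336, 0.019245]

codes, scale = quantize_int8(vector)
restored = dequantize(codes, scale)

# Rounding moves each value by at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(vector, restored))

print("codes:", codes)
print("max reconstruction error:", max_error)
```

Each value now fits in one byte instead of a four- or eight-byte float, at the cost of a small reconstruction error bounded by half the step size `scale`.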