Efficiently read embeddings in streaming fashion from any filesystem
MIT License
A Simple Bulk Labelling Tool
Quickly find the closest words using efficient kNN and word embeddings
TensorFlow implementation of contextualized word representations from bi-directional language models
Generative Models by Stability AI
Fast dataset format and loader
Easily turn large sets of audio URLs into an audio dataset.
Easily convert Common Crawl to a dataset of captions and documents. Image/text Audio/text Video/tex...
Fast, DB-backed pretrained word embeddings for natural language processing.
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
An example of how to use spaCy for extremely large files without running into memory issues
Turn any collection of files into a dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M u...
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Home of StarCoder: fine-tuning & inference!
Explore large language models in 512MB of RAM