CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
MIT License
Code for the paper "PixelCNN++: A PixelCNN Implementation with Discretized Logistic Mixture Likel...
Code for the paper "DeepType: Multilingual Entity Linking by Neural Type System Evolution"
Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"
Code for reproducing key results in the paper "Improving Variational Inference with Inverse Autor...
Release for Improved Denoising Diffusion Probabilistic Models
Repository for the paper "Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them...
A self-hosted web UI for 30+ generative AI
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
Code for Implicit Generation and Generalization with Energy Based Models
Official repo for consistency models.
Code for the paper "Understanding RL Vision"
Faster Whisper transcription with CTranslate2