Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"
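For flavor, a toy numpy sketch of the strided attention pattern described in the paper (a dense boolean mask; the repo itself implements these patterns with fused GPU kernels):

```python
import numpy as np

def strided_mask(n, stride):
    # Position i attends to the previous `stride` positions (local)
    # and to every stride-th position before that (summary),
    # causally masked, as in the Sparse Transformers paper.
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (((i - j) < stride) | ((i - j) % stride == 0))

print(strided_mask(16, 4).astype(int))
```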
Code and model for the paper "Improving Language Understanding by Generative Pre-Training"
Efficient GPU kernels for block-sparse matrix multiplication and convolution
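As a point of reference, a plain numpy version of what a block-sparse matmul computes; block size, layout, and names here are arbitrary, and the repo's value is precisely that it fuses this loop into efficient GPU kernels:

```python
import numpy as np

def block_sparse_matmul(x, blocks, layout, bs):
    # W is stored as a list of dense (bs x bs) blocks plus a binary
    # layout mask saying which blocks of the full matrix are nonzero.
    y = np.zeros((x.shape[0], layout.shape[1] * bs))
    k = 0
    for bi in range(layout.shape[0]):
        for bj in range(layout.shape[1]):
            if layout[bi, bj]:
                y[:, bj*bs:(bj+1)*bs] += x[:, bi*bs:(bi+1)*bs] @ blocks[k]
                k += 1
    return y

rng = np.random.default_rng(0)
layout = rng.integers(0, 2, size=(4, 4))
blocks = [rng.normal(size=(32, 32)) for _ in range(int(layout.sum()))]
x = rng.normal(size=(8, 4 * 32))
print(block_sparse_matmul(x, blocks, layout, 32).shape)  # (8, 128)
```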
Code for reproducing key results in the paper "Improving Variational Inference with Inverse Autoregressive Flow"
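The core IAF transform is small enough to sketch; `ar_net` below is a hypothetical stand-in for a MADE-style network whose i-th output depends only on z[:i]:

```python
import numpy as np

def iaf_step(z, ar_net):
    # One inverse autoregressive flow step: z' = sigma * z + mu, with
    # (mu, sigma) autoregressive in z, so the Jacobian is triangular
    # and log|det| reduces to sum(log(sigma)).
    mu, sigma = ar_net(z)                  # sigma > 0, e.g. via softplus
    return sigma * z + mu, np.sum(np.log(sigma), axis=-1)

def dummy_ar_net(z):
    # Trivial stand-in (constants depend on nothing, so the
    # autoregressive property holds vacuously).
    return np.zeros_like(z), np.full_like(z, 1.5)

z, log_det = iaf_step(np.linspace(-1, 1, 4), dummy_ar_net)
print(z, log_det)
```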
Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"
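The title operation is compact enough to sketch in numpy: a learned invertible channel-mixing matrix W applied at every spatial position, contributing h*w*log|det W| to the flow objective (toy shapes; not the repo's TensorFlow code):

```python
import numpy as np

def invertible_1x1_conv(x, W):
    # x: (batch, h, w, c); W: (c, c). The same channel mix is applied
    # at every pixel; the objective gets h*w*log|det W| per sample.
    _, h, w, _ = x.shape
    return x @ W, h * w * np.log(abs(np.linalg.det(W)))

rng = np.random.default_rng(0)
W = np.linalg.qr(rng.normal(size=(8, 8)))[0]     # init as a rotation
x = rng.normal(size=(2, 4, 4, 8))
z, log_det = invertible_1x1_conv(x, W)
print(np.allclose(z @ np.linalg.inv(W), x))      # invertible: True
```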
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
Submissions for AI and Efficiency SOTAs
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
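The published Distil-Whisper checkpoints load through the Hugging Face transformers pipeline; a minimal sketch, where the model ID and audio path are examples:

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition",
               model="distil-whisper/distil-large-v2")  # one published checkpoint
print(asr("audio.mp3")["text"])  # example local audio file
```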
Code for the Neural GPU model originally described in "Neural GPUs Learn Algorithms"
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
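A minimal usage sketch, assuming the FlaxWhisperPipline entry point the repo documents (check the README for current options such as dtype and batch size; checkpoint and file names here are examples):

```python
import jax.numpy as jnp
from whisper_jax import FlaxWhisperPipline

# Checkpoint and half precision are illustrative choices.
pipeline = FlaxWhisperPipline("openai/whisper-large-v2", dtype=jnp.bfloat16)
outputs = pipeline("audio.mp3")  # example local audio file
print(outputs)
```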
Faster Whisper transcription with CTranslate2
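Typical usage of the faster-whisper package, per its README (model size, device, and file name are illustrative):

```python
from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)
for seg in segments:
    print("[%.2fs -> %.2fs] %s" % (seg.start, seg.end, seg.text))
```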
Code for the paper "Distribution Augmentation for Generative Modeling" (ICML 2020).
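The idea in one sketch: transform each training example and condition the generative model on which transform was applied, so augmentation enlarges the training signal without shifting the modeled distribution (the transform set and helper below are illustrative, not the repo's code):

```python
import numpy as np

TRANSFORMS = [
    lambda x: x,                       # identity
    lambda x: np.rot90(x, 1, (0, 1)),  # 90-degree rotation
    lambda x: x[::-1],                 # vertical flip
    lambda x: x.transpose(1, 0, 2),    # transposition
]

def dist_augment(batch, rng):
    # Return transformed images plus the transform index, which the
    # generative model receives as a conditioning token.
    ts = rng.integers(0, len(TRANSFORMS), size=len(batch))
    xs = np.stack([TRANSFORMS[t](x) for t, x in zip(ts, batch)])
    return xs, ts

rng = np.random.default_rng(0)
images = rng.normal(size=(8, 32, 32, 3))   # toy square images
xs, ts = dist_augment(images, rng)
print(xs.shape, ts)
```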