Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients"
MIT License
Implementation of MeshGPT, SOTA mesh generation using attention, in PyTorch
Usable implementation of "Bootstrap Your Own Latent" self-supervised learning, from DeepMind, in ...
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architectu...
Implementation and replication of ProGen, Language Modeling for Protein Generation, in Jax
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed conf...
An implementation of Phasic Policy Gradient, a proposed improvement of Proximal Policy Optimization,...
🦁 Lion, a new optimizer discovered by Google Brain using genetic algorithms that is purportedly bet...
Unofficial implementation of iTransformer - SOTA Time Series Forecasting using Attention networks...
Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch
TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate ...
Implementation of Q-Transformer, Scalable Offline Reinforcement Learning via Autoregressive Q-Fun...
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Continual Hyperparameter Selection Framework. Compares 11 state-of-the-art Lifelong Learning meth...
Implementation of GigaGAN, new SOTA GAN out of Adobe. Culmination of nearly a decade of research ...
GLM (General Language Model)