Explorations into "Grokfast, Accelerated Grokking by Amplifying Slow Gradients", out of Seoul National University in Korea. In particular, will compare it with NAdam on modular addition as well as a few other tasks, since I am curious why those experiments are left out of the paper. If it holds up, will polish it up into a nice package for quick use.
The official repository can be found here.
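For orientation, the core mechanism proposed in the paper is to low-pass filter each parameter's gradient with an exponential moving average (EMA) and add an amplified copy of that slow component back onto the gradient before the optimizer update. Below is a minimal sketch of that filter as I read the paper; gradfilter_ema, alpha, and lam are illustrative names and defaults, not this package's internals.

import torch
from torch import nn
from typing import Optional

def gradfilter_ema(model: nn.Module, ema_grads: Optional[dict] = None, alpha: float = 0.98, lam: float = 2.0) -> dict:
    # ema_grads carries the running EMA (the "slow" component) of each parameter's gradient
    if ema_grads is None:
        ema_grads = {}
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        if name not in ema_grads:
            ema_grads[name] = param.grad.detach().clone()
        else:
            ema_grads[name].mul_(alpha).add_(param.grad, alpha = 1 - alpha)
        # add an amplified copy of the slow component back onto the gradient
        param.grad.add_(ema_grads[name], alpha = lam)
    return ema_grads

A filter like this would be called between loss.backward() and the optimizer step, carrying ema_grads across iterations; the GrokFastAdamW optimizer shown below presumably folds equivalent filtering into its own step so no separate call is needed.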
$ pip install grokfast-pytorch
import torch
from torch import nn
# toy model
model = nn.Linear(10, 1)
# import GrokFastAdamW and instantiate with parameters
from grokfast_pytorch import GrokFastAdamW
opt = GrokFastAdamW(
    model.parameters(),
    lr = 1e-4,
    weight_decay = 1e-2
)
# forward and backwards
loss = model(torch.randn(10))
loss.backward()
# optimizer step
opt.step()
opt.zero_grad()
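In an actual run those same calls sit inside a training loop. A toy continuation of the snippet above, where the random data and MSE loss are placeholders rather than anything from the paper's experiments:

data = torch.randn(64, 10)
targets = torch.randn(64, 1)
loss_fn = nn.MSELoss()

for _ in range(100):
    loss = loss_fn(model(data), targets)
    loss.backward()
    opt.step()
    opt.zero_grad()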
Open question: exp_avg, the first-moment EMA that Adam-style optimizers already maintain, could perhaps be repurposed for amplifying the slow gradients; a sketch of that reading follows.
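Purely as an illustration of that thought, not anything implemented in this repository: an Adam-style step already maintains exp_avg as an EMA of past gradients, so the slow component could in principle be read off that buffer instead of a separate filter. The function name amplified_grad and the lam factor below are hypothetical.

import torch

def amplified_grad(grad: torch.Tensor, exp_avg: torch.Tensor, beta1: float = 0.9, lam: float = 2.0) -> torch.Tensor:
    # standard Adam first-moment update: exp_avg becomes an EMA of the gradients
    exp_avg.mul_(beta1).add_(grad, alpha = 1 - beta1)
    # treat that EMA as the slow component and add an amplified copy back, Grokfast-style
    return grad + lam * exp_avg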
@inproceedings{Lee2024GrokfastAG,
title = {Grokfast: Accelerated Grokking by Amplifying Slow Gradients},
author = {Jaerin Lee and Bong Gyun Kang and Kihoon Kim and Kyoung Mu Lee},
year = {2024},
url = {https://api.semanticscholar.org/CorpusID:270123846}
}
@misc{kumar2024maintaining,
title={Maintaining Plasticity in Continual Learning via Regenerative Regularization},
author={Saurabh Kumar and Henrik Marklund and Benjamin Van Roy},
year={2024},
url={https://openreview.net/forum?id=lyoOWX0e0O}
}