Open Source Ecosystems

Adam-atan2 - Pytorch

Implementation of the proposed Adam-atan2 optimizer in Pytorch

A multi-million dollar paper out of google deepmind proposes a small change to Adam update rule (using atan2) to remove the epsilon altogether for numerical stability and scale invariance

It also contains some features for improving plasticity (continual learning field)

Install

$ pip install adam-atan2-pytorch

Usage

import torch
from torch import nn

# toy model

model = nn.Linear(10, 1)

# import AdamAtan2 and instantiate with parameters

from adam_atan2_pytorch import AdamAtan2

opt = AdamAtan2(model.parameters(), lr = 1e-4)

# forward and backwards

for _ in range(100):
  loss = model(torch.randn(10))
  loss.backward()

  # optimizer step

  opt.step()
  opt.zero_grad()

Citations

@inproceedings{Everett2024ScalingEA,
    title   = {Scaling Exponents Across Parameterizations and Optimizers},
    author  = {Katie Everett and Lechao Xiao and Mitchell Wortsman and Alex Alemi and Roman Novak and Peter J. Liu and Izzeddin Gur and Jascha Narain Sohl-Dickstein and Leslie Pack Kaelbling and Jaehoon Lee and Jeffrey Pennington},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:271051056}
}

@inproceedings{Kumar2023MaintainingPI,
    title   = {Maintaining Plasticity in Continual Learning via Regenerative Regularization},
    author  = {Saurabh Kumar and Henrik Marklund and Benjamin Van Roy},
    year    = {2023},
    url     = {https://api.semanticscholar.org/CorpusID:261076021}
}

@article{Lewandowski2024LearningCB,
    title   = {Learning Continually by Spectral Regularization},
    author  = {Alex Lewandowski and Saurabh Kumar and Dale Schuurmans and Andr'as Gyorgy and Marlos C. Machado},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2406.06811},
    url     = {https://api.semanticscholar.org/CorpusID:270380086}
}

Related Projects

grokfast-pytorch

Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow ...

15 Jun 2024 83

iTransformer

Unofficial implementation of iTransformer - SOTA Time Series Forecasting using Attention networks...

11 Oct 2023 429

Adan-pytorch

Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch

25 Aug 2022 247

SAC-pytorch

Implementation of Soft Actor Critic and some of its improvements in Pytorch

12 Feb 2024 34

phasic-policy-gradient

An implementation of Phasic Policy Gradient, a proposed improvement of Proximal Policy Gradients,...

27 Sep 2020 42

PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architectu...

09 Dec 2022 7,595

lion-pytorch

🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly bet...

15 Feb 2023 2,018

PaLM-jax

Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling wit...

08 Apr 2022 184

adamw_bfloat16

AdamW optimizer for bfloat16 models in pytorch 🔥.

14 Dec 2021 19

taylor-series-linear-attention

Explorations into the recently proposed Taylor Series Linear Attention

23 Dec 2023 88

ali-pytorch

Adversarially Learned Inference in Pytorch

24 Mar 2017 30

MEGABYTE-pytorch

Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Py...

15 May 2023 620

PaLM-pytorch

Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling wit...

04 Apr 2022 822

dalle-mini

DALL·E Mini - Generate images from a text prompt

03 Jul 2021 14,752

adanet

Fast and flexible AutoML with learning guarantees.

28 Jun 2018 3,468