## Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch
Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch.
- Explanation from Davis Blalock
- Official Adan code
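
For orientation, here is a sketch of the update rule, reconstructed from the paper's Algorithm 1 ($g_k$ is the stochastic gradient at step $k$, $\eta$ the learning rate, $\lambda$ the decoupled weight decay; the three betas in the usage example below weight the three moment buffers $m_k$, $v_k$, $n_k$). Defer to the paper for the authoritative form and initialization details:

$$
\begin{aligned}
m_k &= (1 - \beta_1)\, m_{k-1} + \beta_1\, g_k \\
v_k &= (1 - \beta_2)\, v_{k-1} + \beta_2\, (g_k - g_{k-1}) \\
n_k &= (1 - \beta_3)\, n_{k-1} + \beta_3 \left( g_k + (1 - \beta_2)(g_k - g_{k-1}) \right)^2 \\
\theta_{k+1} &= (1 + \lambda \eta)^{-1} \left( \theta_k - \frac{\eta}{\sqrt{n_k} + \epsilon} \left( m_k + (1 - \beta_2)\, v_k \right) \right)
\end{aligned}
$$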
## Install

```bash
$ pip install adan-pytorch
```
## Usage

```python
import torch
from torch import nn

from adan_pytorch import Adan

# mock model

model = nn.Sequential(
    nn.Linear(16, 16),
    nn.GELU()
)

# instantiate Adan with model parameters

optim = Adan(
    model.parameters(),
    lr = 1e-3,                   # learning rate (can be much higher than Adam, up to 5-10x)
    betas = (0.02, 0.08, 0.01),  # beta 1-2-3 as described in paper - author says most sensitive to beta3 tuning
    weight_decay = 0.02          # weight decay 0.02 is optimal per author
)

# train

for _ in range(10):
    loss = model(torch.randn(16)).sum()
    loss.backward()
    optim.step()
    optim.zero_grad()
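```

Since the learning rate can run much higher than Adam's, pairing Adan with a learning-rate schedule is a common choice. Below is a minimal sketch using `torch.optim.lr_scheduler.CosineAnnealingLR`, assuming `Adan` follows the standard `torch.optim.Optimizer` interface (an assumption about this package, not stated above); the scheduler choice and step counts are illustrative only:

```python
import torch
from torch import nn

from adan_pytorch import Adan

model = nn.Sequential(nn.Linear(16, 16), nn.GELU())

optim = Adan(model.parameters(), lr = 1e-3, betas = (0.02, 0.08, 0.01), weight_decay = 0.02)

# standard torch scheduler; works provided Adan exposes the usual
# Optimizer interface (param_groups with an 'lr' key)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, T_max = 100)

for _ in range(100):
    loss = model(torch.randn(16)).sum()
    loss.backward()
    optim.step()
    optim.zero_grad()
    scheduler.step()  # decay the learning rate once per optimizer step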
```

## Citations

```bibtex
@article{Xie2022AdanAN,
    title   = {Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models},
    author  = {Xingyu Xie and Pan Zhou and Huan Li and Zhouchen Lin and Shuicheng Yan},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2208.06677}
}
```