lion-tf

A TensorFlow implementation of the Lion optimizer

A TensorFlow implementation of the Lion optimizer from the paper "Symbolic Discovery of Optimization Algorithms". Partially copied from lucidrains' PyTorch implementation.
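For reference, the Lion update from the paper can be written as follows. Here g_t is the gradient, m is the momentum, η_t is the learning rate, λ is the decoupled weight decay, and β1, β2 are interpolation factors (the paper's defaults are β1 = 0.9 and β2 = 0.99). This summarizes the paper's algorithm. It is not a line-by-line transcription of this repository's code.

```latex
\begin{aligned}
c_t      &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
\theta_t &= \theta_{t-1} - \eta_t \left( \operatorname{sign}(c_t) + \lambda\, \theta_{t-1} \right) \\
m_t      &= \beta_2 m_{t-1} + (1 - \beta_2)\, g_t
\end{aligned}
```

Because of the sign, every parameter moves by exactly η_t (plus weight decay) unless c_t is zero. The stored momentum uses a different factor (β2) from the one used to form the update (β1).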

The maths seem right and it successfully trained a couple of models for me, but that doesn't mean I haven't forgotten something stupid, or that there isn't room for optimization!

In general, the code trusts in 🙏XLA🙏 to efficiently reuse buffers and save memory rather than manually doing all the ops in-place like the PyTorch version does. Note that the optimizer will be compiled with XLA even if you don't use jit_compile for the rest of your model!
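To illustrate the functional, non-in-place style described above, here is a minimal sketch of one Lion step that returns new arrays instead of mutating its inputs. It is written in NumPy only to keep it self-contained. NumPy gets none of XLA's buffer reuse, so this shows the maths, not this repository's TensorFlow code. The name `lion_step` and its keyword arguments are hypothetical, and the defaults for `beta_1`/`beta_2` follow the paper.

```python
import numpy as np


def lion_step(param, grad, momentum, lr=1e-4, beta_1=0.9, beta_2=0.99, weight_decay=0.0):
    """One Lion update (hypothetical helper, not the lion_tf API).

    Returns (new_param, new_momentum) and leaves the input arrays untouched,
    mirroring a functional style where the compiler decides buffer reuse.
    """
    # Interpolate the old momentum with the current gradient, then keep only the sign.
    update = np.sign(beta_1 * momentum + (1.0 - beta_1) * grad)
    # Decoupled weight decay uses the parameter value from before this step.
    new_param = param - lr * (update + weight_decay * param)
    # The stored momentum uses a separate interpolation factor, beta_2.
    new_momentum = beta_2 * momentum + (1.0 - beta_2) * grad
    return new_param, new_momentum
```

For example, with zero initial momentum, each parameter with a non-zero gradient moves by exactly `lr` against the gradient's sign, and parameters with a zero gradient stay put (when `weight_decay` is 0). The PyTorch version performs the equivalent steps with in-place tensor operations instead.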

Installation

pip install git+https://github.com/Rocketknight1/lion-tf.git

Usage

from lion_tf import Lion

model.compile(Lion(1e-5))  # model: any built tf.keras.Model; 1e-5 is the learning rate

Tips

Lion likes much lower learning rates than Adam: I'd suggest starting about 10 times lower. When fine-tuning pre-trained models, learning rates are already quite low, which means the optimal LR for Lion can be very low. I found 1e-5 or less worked well for fine-tuning BERT!
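The rule of thumb above is simple arithmetic. The helper below is a hypothetical sketch, not part of lion_tf. It turns an Adam learning rate into a Lion starting point and adds a few lower values to try, since the best Lion LR for fine-tuning can be lower still. The factor of 10 comes from the tip. The extra candidates (each 3 times lower than the last) are an arbitrary illustrative choice.

```python
def lion_lr_candidates(adam_lr, factor=10.0, num=3, step=3.0):
    """Hypothetical helper: Lion learning rates to try, given a known-good Adam LR.

    The first candidate is adam_lr / factor (the "10x lower than Adam" rule of thumb);
    each further candidate is `step` times lower than the previous one.
    """
    if adam_lr <= 0 or factor <= 0 or step <= 1:
        raise ValueError("adam_lr and factor must be positive, and step must exceed 1")
    start = adam_lr / factor
    return [start / step ** i for i in range(num)]
```

For an Adam learning rate of 1e-4, the first candidate is approximately 1e-5, which matches the value suggested above for fine-tuning BERT.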
