adamw_bfloat16

AdamW optimizer for bfloat16 models in pytorch 🔥.

MIT License

adamw_bfloat16 - v0.2.0: add implementation based on torch.compile

Published by arogozhnikov 11 months ago

  • new implementation is faster, but not cudagraph-compatible
  • old implementation is moved to cudagraph.py
  • requires torch >= 2.0
adamw_bfloat16 - v0.1.0: basic implementation of AdamW

Published by arogozhnikov almost 3 years ago

Initial implementation of AdamW for pytorch. It supports CUDA graphs
and has a built-in mechanism for controlling the learning rate, because external LR schedulers are unlikely to play well with CUDA graphs.
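The update rule behind the optimizer can be sketched as a single scalar AdamW step in plain Python. This is an illustrative sketch of the standard AdamW algorithm, not the package's API: the library applies this element-wise to bfloat16 tensors, and all names and default hyperparameters below are assumptions.

```python
import math

def adamw_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """Return updated (param, m, v) after one AdamW step at timestep t (1-based).

    Hypothetical scalar sketch; the real library operates on bfloat16 tensors.
    """
    m = b1 * m + (1 - b1) * g       # first-moment (mean) EMA of the gradient
    v = b2 * v + (1 - b2) * g * g   # second-moment EMA of the squared gradient
    m_hat = m / (1 - b1 ** t)       # bias correction for the EMAs
    v_hat = v / (1 - b2 ** t)
    # Decoupled weight decay: applied to p directly, not folded into the gradient.
    p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p)
    return p, m, v

p, m, v = adamw_step(1.0, 1.0, 0.0, 0.0, t=1)
```

After one step with gradient 1.0, bias correction makes the normalized update exactly the learning rate (plus the decay term), which is why the parameter moves by roughly `lr` on the first step.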