AdamW optimizer for bfloat16 models in PyTorch 🔥.
MIT License
Published by arogozhnikov 11 months ago
cudagraph.py
Published by arogozhnikov almost 3 years ago
Initial implementation of AdamW for PyTorch. It supports CUDA graphs
and has a built-in mechanism for controlling the learning rate, because external schedulers are unlikely to get along with CUDA graphs.
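
To illustrate why the learning-rate schedule is built into the optimizer, here is a minimal sketch. It is not this package's actual code. A captured CUDA graph replays the same kernels with the same arguments, so a Python float learning rate set by an external scheduler would be frozen at capture time. Keeping the step counter in a tensor and deriving the learning rate with tensor operations avoids that. The class name `TensorLRAdamW`, the `warmup_steps` parameter, and the linear warmup schedule are all assumptions made for this example; it runs on CPU and does not perform graph capture or any bfloat16-specific handling.

```python
import torch


class TensorLRAdamW:
    """Hypothetical sketch: AdamW whose learning rate is computed from tensors only."""

    def __init__(self, params, lr=1e-3, warmup_steps=10,
                 betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
        self.params = list(params)
        self.base_lr = lr
        self.warmup_steps = warmup_steps  # assumed schedule: linear warmup, then constant
        self.beta1, self.beta2 = betas
        self.eps = eps
        self.weight_decay = weight_decay
        device = self.params[0].device
        # The step counter lives in a tensor, so in-place updates stay visible
        # to anything that reads it later (for example, replayed kernels).
        self.step_count = torch.zeros((), device=device)
        self.exp_avg = [torch.zeros_like(p) for p in self.params]
        self.exp_avg_sq = [torch.zeros_like(p) for p in self.params]

    @torch.no_grad()
    def step(self):
        self.step_count += 1
        # Learning rate and bias corrections are 0-dim tensors, not Python floats.
        lr = self.base_lr * torch.clamp(self.step_count / self.warmup_steps, max=1.0)
        bias1 = 1 - self.beta1 ** self.step_count
        bias2 = 1 - self.beta2 ** self.step_count
        for p, m, v in zip(self.params, self.exp_avg, self.exp_avg_sq):
            if p.grad is None:
                continue
            g = p.grad
            m.mul_(self.beta1).add_(g, alpha=1 - self.beta1)
            v.mul_(self.beta2).addcmul_(g, g, value=1 - self.beta2)
            update = (m / bias1) / ((v / bias2).sqrt() + self.eps)
            p.mul_(1 - lr * self.weight_decay)  # decoupled weight decay
            p.sub_(lr * update)


# Usage: one step on a toy parameter with a hand-set gradient.
w = torch.nn.Parameter(torch.ones(3))
w.grad = torch.ones(3)
opt = TensorLRAdamW([w], lr=0.1, warmup_steps=10)
opt.step()
```

With this layout the schedule advances as a side effect of `step()` itself, so nothing outside the optimizer has to mutate a Python-side learning rate between iterations.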