Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
MIT License
Published by lucidrains over 1 year ago
Full Changelog: https://github.com/lucidrains/Mega-pytorch/compare/0.0.6...0.0.7