Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
MIT License
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago
Published by lucidrains over 1 year ago