Implementation of a hierarchical memory module using coordinate descent routing
Implementation of an Attention layer where each head can attend to more than just one token, usin...
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
Usable implementation of the Emergent Symbol Binding Network (ESBN), in Pytorch
A simple cross attention that updates both the source and target in one step
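A minimal sketch of that one-step update, assuming single-head attention: one shared similarity matrix is softmax-normalized along each axis, so both sequences attend to each other at once. Names like `BidirectionalCrossAttention` are illustrative, not necessarily the repo's API.

```python
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_qk_src = nn.Linear(dim, dim, bias=False)
        self.to_qk_tgt = nn.Linear(dim, dim, bias=False)
        self.to_v_src = nn.Linear(dim, dim, bias=False)
        self.to_v_tgt = nn.Linear(dim, dim, bias=False)

    def forward(self, src, tgt):
        # one shared similarity matrix, normalized along each axis
        sim = torch.einsum('b i d, b j d -> b i j',
                           self.to_qk_src(src), self.to_qk_tgt(tgt)) * self.scale
        attn_src = sim.softmax(dim=-1)                     # src attends over tgt
        attn_tgt = sim.transpose(-1, -2).softmax(dim=-1)   # tgt attends over src
        src_out = src + attn_src @ self.to_v_tgt(tgt)      # both updated in one step
        tgt_out = tgt + attn_tgt @ self.to_v_src(src)
        return src_out, tgt_out

src, tgt = torch.randn(1, 16, 64), torch.randn(1, 32, 64)
src_out, tgt_out = BidirectionalCrossAttention(64)(src, tgt)
```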
Implementation of a memory-efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
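The paper's core trick is to process keys and values in chunks while carrying a running log-sum-exp, so the full n×n score matrix never materializes. A minimal single-head sketch of that idea (not the repo's implementation; the function name is illustrative):

```python
import torch

def memory_efficient_attention(q, k, v, chunk_size=128):
    scale = q.shape[-1] ** -0.5
    out = torch.zeros_like(q)
    row_sum = torch.zeros(*q.shape[:-1], 1)
    row_max = torch.full((*q.shape[:-1], 1), float('-inf'))
    # iterate over key/value chunks: only O(n * chunk_size) scores live at once
    for kc, vc in zip(k.split(chunk_size, dim=-2), v.split(chunk_size, dim=-2)):
        scores = q @ kc.transpose(-1, -2) * scale
        new_max = torch.maximum(row_max, scores.amax(dim=-1, keepdim=True))
        exp_scores = (scores - new_max).exp()
        correction = (row_max - new_max).exp()   # rescale earlier accumulators
        out = out * correction + exp_scores @ vc
        row_sum = row_sum * correction + exp_scores.sum(dim=-1, keepdim=True)
        row_max = new_max
    return out / row_sum

q = k = v = torch.randn(1, 1024, 64)
out = memory_efficient_attention(q, k, v)   # matches softmax attention, O(n) memory
```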
Implementation of DeepMind's RoboCat, Self-Improving Foundation Agent for Robotic Manipulation, in Pytorch
Implementation of VideoGigaGAN, SOTA video upsampling out of Adobe AI labs, in Pytorch
Implementation of Hierarchical Transformer Memory (HTM) for Pytorch
Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences"
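In that paper, keys and values are shortened with a strided convolution before ordinary attention, cutting the attention cost by the compression factor. A minimal single-head sketch under that assumption; the class name and the factor of 4 are illustrative choices, not the repo's API:

```python
import torch
import torch.nn as nn

class MemoryCompressedAttention(nn.Module):
    def __init__(self, dim, compress_factor=4):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_kv = nn.Linear(dim, dim * 2, bias=False)
        # strided conv shortens the key/value sequence by compress_factor
        self.compress = nn.Conv1d(dim * 2, dim * 2,
                                  compress_factor, stride=compress_factor)

    def forward(self, x):
        q = self.to_q(x)
        kv = self.to_kv(x).transpose(1, 2)                # (b, 2*dim, n) for conv
        k, v = self.compress(kv).transpose(1, 2).chunk(2, dim=-1)
        attn = (q @ k.transpose(-1, -2) * self.scale).softmax(dim=-1)
        return attn @ v

x = torch.randn(1, 256, 64)
out = MemoryCompressedAttention(64)(x)   # attends over 256/4 = 64 compressed slots
```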
A variant of Transformer-XL where the memory is updated not with a queue, but with attention
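The description only states that the queue is replaced by an attention-based update, so the following is just a guess at the shape of the idea: a fixed set of memory slots cross-attends to each new segment's hidden states and is refreshed residually, keeping the memory a constant size. All names here are hypothetical.

```python
import torch
import torch.nn as nn

class AttentionMemoryUpdate(nn.Module):
    def __init__(self, dim, num_mem_slots=64):
        super().__init__()
        self.scale = dim ** -0.5
        self.init_mem = nn.Parameter(torch.randn(num_mem_slots, dim))
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_kv = nn.Linear(dim, dim * 2, bias=False)

    def forward(self, mem, hiddens):
        # memory slots query the new segment's hidden states ...
        q = self.to_q(mem)
        k, v = self.to_kv(hiddens).chunk(2, dim=-1)
        attn = (q @ k.transpose(-1, -2) * self.scale).softmax(dim=-1)
        # ... and absorb them residually, instead of enqueueing raw states
        return mem + attn @ v

batch, dim = 2, 64
updater = AttentionMemoryUpdate(dim)
mem = updater.init_mem.expand(batch, -1, -1)
for segment in torch.randn(4, batch, 128, dim):   # a stream of segments
    mem = updater(mem, segment)                   # memory size stays fixed
```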
Implementation of DecompOpt - Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization
Implementation of the Llama architecture with RLHF + Q-learning
Implementations and explorations into the ReSTᴱᴹ algorithm in the new DeepMind paper "Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models"
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
Implementation of JEPA, Yann LeCun's vision of how AGI would be built, in Pytorch