Yet another random morning idea, quickly tried, with the architecture shared if it works: allowing the transformer to pause for any amount of time on any token
Implementation of GateLoop Transformer in Pytorch and Jax
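At its core, GateLoop replaces softmax attention with a data-controlled linear recurrence over an outer-product state. Below is a minimal sketch of that recurrence as a naive sequential loop, with real-valued sigmoid gates standing in for the paper's complex-valued ones; the function and argument names are illustrative, and the repo itself uses a chunked / parallel-scan formulation rather than this loop.

```python
# Minimal sketch of GateLoop's data-controlled linear recurrence (naive sequential
# reference, not the chunked parallel form used in the repo). All names are illustrative.
import torch

def gateloop_recurrence(q, k, v, a):
    """
    q, k: (batch, seq, d_k)   queries / keys
    v:    (batch, seq, d_v)   values
    a:    (batch, seq, d_k)   data-dependent gates in (0, 1), one per key dimension
    returns: (batch, seq, d_v)
    """
    b, n, d_k = k.shape
    d_v = v.shape[-1]
    state = torch.zeros(b, d_k, d_v, device=q.device, dtype=q.dtype)
    outputs = []
    for t in range(n):
        # gate the previous state, then add the new key/value outer product
        state = a[:, t, :, None] * state + k[:, t, :, None] * v[:, t, None, :]
        # read out with the query
        outputs.append(torch.einsum('bd,bde->be', q[:, t], state))
    return torch.stack(outputs, dim=1)

if __name__ == '__main__':
    q = torch.randn(2, 16, 32)
    k = torch.randn(2, 16, 32)
    v = torch.randn(2, 16, 64)
    a = torch.sigmoid(torch.randn(2, 16, 32))  # stand-in for the learned gates
    print(gateloop_recurrence(q, k, v, a).shape)  # torch.Size([2, 16, 64])
```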
Another attempt at a long-context / efficient transformer by me
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
An implementation of local windowed attention for language modeling
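For reference, a minimal sketch of causal local windowed attention, where each token attends only to the previous `window_size` positions. It is written naively with a full attention matrix plus a mask; the repo buckets keys and values per window for efficiency, and the names below are illustrative.

```python
# Minimal sketch of causal local windowed attention: each position attends only to
# the previous `window_size` positions (itself included). Purely illustrative.
import torch

def local_causal_attention(q, k, v, window_size):
    """
    q, k, v: (batch, heads, seq, dim)
    """
    n, d = q.shape[-2], q.shape[-1]
    scores = torch.einsum('bhid,bhjd->bhij', q, k) / d ** 0.5
    i = torch.arange(n, device=q.device)
    j = torch.arange(n, device=q.device)
    # mask out future positions and anything further back than the window
    causal = j[None, :] <= i[:, None]
    in_window = (i[:, None] - j[None, :]) < window_size
    mask = causal & in_window
    scores = scores.masked_fill(~mask, float('-inf'))
    attn = scores.softmax(dim=-1)
    return torch.einsum('bhij,bhjd->bhid', attn, v)

if __name__ == '__main__':
    q = k = v = torch.randn(1, 8, 128, 64)
    out = local_causal_attention(q, k, v, window_size=32)
    print(out.shape)  # torch.Size([1, 8, 128, 64])
```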
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture
Explorations into the recently proposed Taylor Series Linear Attention
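The idea there is to replace exp(q·k) in attention with its second-order Taylor expansion, 1 + q·k + (q·k)²/2, which factorizes into a feature map and therefore admits linear-time attention. A minimal non-causal, single-head sketch under those assumptions (all names illustrative):

```python
# Minimal sketch of 2nd-order Taylor linear attention: exp(q·k) is approximated by
# 1 + q·k + (q·k)^2 / 2, which factorizes into a feature map so attention can be
# computed in time linear in sequence length. Non-causal, single head, illustrative only.
import torch

def taylor_feature_map(x):
    # x: (..., d) -> (..., 1 + d + d*d), so that phi(q)·phi(k) = 1 + q·k + (q·k)^2 / 2
    ones = torch.ones(*x.shape[:-1], 1, device=x.device, dtype=x.dtype)
    second_order = torch.einsum('...i,...j->...ij', x, x).flatten(-2) / 2 ** 0.5
    return torch.cat((ones, x, second_order), dim=-1)

def taylor_linear_attention(q, k, v):
    # q, k, v: (batch, seq, dim); scale before expanding, as in standard attention
    q, k = map(lambda t: t / t.shape[-1] ** 0.25, (q, k))
    q, k = taylor_feature_map(q), taylor_feature_map(k)
    kv = torch.einsum('bnd,bne->bde', k, v)          # sum over sequence once
    z = torch.einsum('bnd,bd->bn', q, k.sum(dim=1))  # normalizer; 1 + x + x^2/2 > 0, so z > 0
    out = torch.einsum('bnd,bde->bne', q, kv)
    return out / z.unsqueeze(-1)

if __name__ == '__main__':
    q, k, v = (torch.randn(2, 256, 16) for _ in range(3))
    print(taylor_linear_attention(q, k, v).shape)  # torch.Size([2, 256, 16])
```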
Implementation of the transformer proposed in "Building Blocks for a Complex-Valued Transformer Architecture"
Unofficial implementation of iTransformer - SOTA Time Series Forecasting using Attention networks...
Exploring an idea where one forgets about efficiency and carries out attention across each edge o...
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
An attempt to merge ESBN with Transformers, to endow Transformers with the ability to emergently ...
Implementation of Parti, Google's pure attention-based text-to-image neural network, in Pytorch
Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition"
Implementation of Q-Transformer, Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper
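In the PonderNet scheme, a shared block is applied a variable number of times, each step emits a conditional halting probability λ_n, and the per-step predictions are weighted by the probability p_n = λ_n Π_{m<n}(1 − λ_m) of halting at step n. Below is a minimal sketch of that halting bookkeeping, with a small MLP standing in for a transformer layer and omitting the KL regularizer to a geometric prior; the class and argument names are illustrative.

```python
# Minimal sketch of PonderNet-style halting: apply a shared block repeatedly, emit a
# halting probability each step, and weight step-wise outputs by the halting distribution.
import torch
from torch import nn

class PonderingBlock(nn.Module):
    def __init__(self, dim, max_steps=8):
        super().__init__()
        self.max_steps = max_steps
        self.step_fn = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.to_halt = nn.Linear(dim, 1)

    def forward(self, x):
        # x: (batch, dim). Returns per-step outputs and the halting distribution p_n.
        outputs, halt_probs = [], []
        not_halted = torch.ones(x.shape[0], device=x.device)  # prob of reaching this step
        for n in range(self.max_steps):
            x = x + self.step_fn(x)                      # one pondering step
            lam = self.to_halt(x).squeeze(-1).sigmoid()  # conditional halt prob λ_n
            if n == self.max_steps - 1:
                lam = torch.ones_like(lam)               # force halting on the last step
            halt_probs.append(not_halted * lam)          # p_n = λ_n Π_{m<n} (1 - λ_m)
            not_halted = not_halted * (1 - lam)
            outputs.append(x)
        return torch.stack(outputs, dim=1), torch.stack(halt_probs, dim=1)

if __name__ == '__main__':
    block = PonderingBlock(dim=64)
    out, p = block(torch.randn(4, 64))
    print(out.shape, p.shape, p.sum(dim=-1))  # p sums to 1 per sample
```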