Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto
MIT License
Yet another random morning idea to be quickly tried, with the architecture shared if it works; to allow...
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the se...
Implementation of Hourglass Transformer, in Pytorch, from Google and OpenAI
Implementation of an Attention layer where each head can attend to more than just one token, usin...
Implementation of the GBST block from the Charformer paper, in Pytorch
Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composi...
Implementation of MagViT2 Tokenizer in Pytorch
Implementation of Infini-Transformer in Pytorch
A library for advanced large language model reasoning
Implementation of Fast Transformer in Pytorch
Implementation of Feedback Transformer in Pytorch
Implementation of Block Recurrent Transformer - Pytorch
Some personal experiments around routing tokens to different autoregressive attention, akin to mi...
Implementation of Agent Attention in Pytorch
An implementation of local windowed attention for language modeling
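The Token Shift GPT entry above describes a model that mixes tokens by shifting along the sequence dimension rather than with attention. A minimal, generic sketch of that token-shift idea (not the repo's actual code) in PyTorch:

```python
import torch
import torch.nn.functional as F

def token_shift(x):
    # x: (batch, seq_len, dim)
    # Split the feature dimension in half; shift one half forward by one
    # position, so each token mixes in features from its predecessor.
    # The shift is causal: position t only sees positions <= t.
    x_keep, x_shift = x.chunk(2, dim=-1)
    # pad one zero row at the front of the sequence, trim the last row
    x_shift = F.pad(x_shift, (0, 0, 1, -1))
    return torch.cat((x_keep, x_shift), dim=-1)
```

Stacking such shifts (with feed-forward layers in between) lets information propagate along the sequence without any attention mechanism.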
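The last entry mentions local windowed attention, where each query attends only to a fixed window of preceding tokens instead of the full sequence. A naive masked sketch of that idea (assumed generic illustration, not the repo's optimized implementation, which would avoid materializing the full score matrix):

```python
import torch

def local_causal_attention(q, k, v, window):
    # q, k, v: (..., seq_len, dim)
    # Each query position i attends only to key positions j with
    # i - window < j <= i (causal, windowed).
    n, d = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    i = torch.arange(n)
    mask = (i[:, None] >= i[None, :]) & (i[:, None] - i[None, :] < window)
    scores = scores.masked_fill(~mask, float('-inf'))
    return scores.softmax(dim=-1) @ v
```

This costs O(n^2) as written; the point of a real local-attention implementation is to bucket the sequence so cost scales with n * window instead.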