Pause Transformer (wip)

Yet another random morning idea, to be quickly tried and the architecture shared if it works: allow the transformer to pause for any amount of time on any token.

Again, the idea relies on axial attention; one axis attends along the sequence length, as in the usual transformer, while the other attends along a thinking or pause dimension.
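Below is a minimal sketch of how such axial attention could look in PyTorch, assuming a hidden state of shape (batch, sequence, pause, dim). The module name AxialPauseAttention and all dimension choices here are hypothetical illustrations, not the package's actual API.

import torch
from torch import nn

class AxialPauseAttention(nn.Module):
    def __init__(self, dim, heads = 8):
        super().__init__()
        # one attention module per axis; both are standard multihead attention
        self.pause_attn = nn.MultiheadAttention(dim, heads, batch_first = True)
        self.seq_attn = nn.MultiheadAttention(dim, heads, batch_first = True)

    def forward(self, x):
        b, n, p, d = x.shape  # batch, sequence length, pause steps, dim

        # axis 1: attend along the pause ("thinking") dimension, one token at a time
        t = x.reshape(b * n, p, d)
        t, _ = self.pause_attn(t, t, t, need_weights = False)
        x = x + t.reshape(b, n, p, d)

        # axis 2: causal attention along the sequence, one pause step at a time
        s = x.transpose(1, 2).reshape(b * p, n, d)
        causal_mask = torch.full((n, n), float('-inf')).triu(1)
        s, _ = self.seq_attn(s, s, s, attn_mask = causal_mask, need_weights = False)
        x = x + s.reshape(b, p, n, d).transpose(1, 2)
        return x

A tensor of shape (2, 16, 4, 64), i.e. four pause steps per token, passes through with its shape unchanged, so the block stacks like any other transformer layer.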

Todo

  • allow for custom pause distributions across tokens

  • see if one can do a two-pass approach, using the logit entropy to decide how to shape the pause mask (a sketch follows this list)

  • run experiments on enwik8; if nothing shows up, move on to something harder, say arithmetic
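
Here is a hypothetical sketch of that two-pass idea: run the model once, measure the per-token entropy of the output logits, and grant more pause steps to higher-entropy tokens. The function name, the max_pauses parameter, and the bucketing scheme are all assumptions, not the repository's implementation.

import torch

def entropy_pause_mask(logits, max_pauses = 4):
    # logits: (batch, seq, vocab) from a first forward pass
    probs = logits.softmax(dim = -1)
    entropy = -(probs * probs.clamp(min = 1e-9).log()).sum(dim = -1)  # (batch, seq)

    # normalize entropy to [0, 1] per sequence, then bucket into pause counts
    lo = entropy.amin(dim = -1, keepdim = True)
    hi = entropy.amax(dim = -1, keepdim = True)
    num_pauses = ((entropy - lo) / (hi - lo + 1e-9) * max_pauses).round().long()

    # boolean mask over the pause axis: True where a pause slot is active
    slots = torch.arange(max_pauses).view(1, 1, -1)
    return slots < num_pauses.unsqueeze(-1)  # (batch, seq, max_pauses)

The resulting boolean mask could then gate the pause axis on the second pass, so uncertain tokens receive more thinking steps than confident ones.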

Citations

@inproceedings{Goyal2023ThinkBY,
    title   = {Think before you speak: Training Language Models With Pause Tokens},
    author  = {Sachin Goyal and Ziwei Ji and Ankit Singh Rawat and Aditya Krishna Menon and Sanjiv Kumar and Vaishnavh Nagarajan},
    year    = {2023},
    url     = {https://api.semanticscholar.org/CorpusID:263608983}
}