GPT, but made only out of MLPs
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Implementation of Perceiver AR, Deepmind's new long-context attention network based on Perceiver architecture, in Pytorch
Official repository of the xLSTM.
Implementation of Hourglass Transformer, in Pytorch, from Google and OpenAI
Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models.
GLM (General Language Model)
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing (sketched in code after this list)
(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
A GPT, made only of MLPs, in Jax
An attempt at the implementation of Glom, Geoffrey Hinton's new idea that integrates concepts from neural fields, predictive coding, top-down-bottom-up processing, and attention (consensus between columns)
Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch (see the spatial gating sketch after this list)
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
Some preliminary explorations of Mamba's context scaling.
An implementation of masked language modeling for Pytorch, made as concise and simple as possible
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
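
The token-shift trick from Token Shift GPT above is simple enough to sketch. The snippet below is not the repository's code, just a minimal illustration of the idea under one assumption: channels are split into equal groups and the i-th group is shifted i steps back along the sequence, so each position mixes in features of earlier tokens without any attention.

import torch
import torch.nn.functional as F

def token_shift(x, segments=4):
    # x: (batch, seq_len, dim); dim must be divisible by `segments`.
    # Group i is shifted i steps toward the past (zero-padded at the front),
    # so each position sees features of up to `segments - 1` earlier tokens.
    chunks = x.chunk(segments, dim=-1)
    shifted = [chunks[0]] + [
        F.pad(c, (0, 0, i, 0))[:, :-i]  # causal shift by i steps
        for i, c in enumerate(chunks[1:], start=1)
    ]
    return torch.cat(shifted, dim=-1)

x = torch.randn(2, 16, 64)
print(token_shift(x).shape)  # torch.Size([2, 16, 64])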
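
Similarly, the all-MLP models listed above (gMLP and the two MLP-only GPTs) replace attention with a learned mix across the token dimension. The spatial gating unit below is a minimal sketch of that mechanism as described in the gMLP paper, not lucidrains' implementation; the class name and the fixed seq_len are assumptions for illustration.

import torch
from torch import nn

class SpatialGatingUnit(nn.Module):
    # Splits channels in half and gates one half with a learned linear
    # mix of the other half across the sequence (token) dimension.
    # Output has half the input channels; in gMLP the surrounding
    # channel projections expand before and contract after this unit.
    def __init__(self, dim, seq_len):
        super().__init__()
        self.norm = nn.LayerNorm(dim // 2)
        self.proj = nn.Linear(seq_len, seq_len)

    def forward(self, x):
        u, v = x.chunk(2, dim=-1)
        v = self.norm(v)
        v = self.proj(v.transpose(1, 2)).transpose(1, 2)  # mix across tokens
        return u * v

sgu = SpatialGatingUnit(dim=128, seq_len=16)
print(sgu(torch.randn(2, 16, 128)).shape)  # torch.Size([2, 16, 64])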