LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence
Applying "Load What You Need: Smaller Versions of Multilingual BERT" to LaBSE
Model summary in PyTorch, similar to `model.summary()` in Keras
[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Useful additional layers for PyTorch.
Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners
Unofficial PyTorch Implementation of EvoNorm
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
This library provides packages on DoubleML / Causal Machine Learning and Neural Networks in Pytho...
Visual attention-based OCR
RWKV-v2-RNN trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.
Temporarily remove unused tokens during training to save RAM and speed up training.
Here we will test various linear attention designs.
TensorFlow implementation of contextualized word representations from bi-directional language models
PyTorch and JAX Implementation of Scalable Diffusion Models with Transformers | Diffusion Transfo...
An implementation of masked language modeling for PyTorch, made as concise and simple as possible