macaron-net

This repo contains the code and pretrained models for our paper:

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
Yiping Lu*, Zhuohan Li*, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Liwei Wang, Tie-Yan Liu

The two sub-directories include reproducible code, pretrained models, and instructions for the machine translation and unsupervised pretraining (BERT) tasks. See the README in each sub-directory for detailed reproduction instructions.

Both implementations are based on the open-source fairseq (v0.6.0). The code for the unsupervised pretraining tasks is based on StackingBERT. Note that the code in the bert sub-directory currently cannot be used to train translation models. We are working on merging the two code bases and plan to release a unified version in the near future.

Citation

@article{lu2019understanding,
  title={Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View},
  author={Lu, Yiping and Li, Zhuohan and He, Di and Sun, Zhiqing and Dong, Bin and Qin, Tao and Wang, Liwei and Liu, Tie-Yan},
  journal={arXiv preprint arXiv:1906.02762},
  year={2019}
}
