A non-exhaustive collection of vision transformer models implemented in TensorFlow.
License: Apache-2.0
This repository contains a non-exhaustive collection of vision transformer models that I implemented in TensorFlow. The term is used broadly here and is not limited to the architecture from the original Vision Transformer paper [1]: the architectures implemented here are generally referred to as vision transformers because they use Transformers in one way or another for the vision modality.
[1] Dosovitskiy, Alexey, et al. "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale." arXiv:2010.11929, 3 June 2021. https://doi.org/10.48550/arXiv.2010.11929