titok-pytorch

Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation"

MIT License

TiTok - Pytorch (wip)

Install

$ pip install titok-pytorch

Usage

import torch
from titok_pytorch import TiTokTokenizer

images = torch.randn(2, 3, 256, 256)

titok = TiTokTokenizer(
    dim = 1024,
    patch_size = 32,
    num_latent_tokens = 32,   # the paper claims 32 latent tokens suffice
    codebook_size = 4096      # number of entries in the quantizer codebook
)

loss = titok(images)
loss.backward()

# after much training
# extract codes for gpt, maskgit, whatever

codes = titok.tokenize(images) # (2, 32)

# reconstructing images from codes

recon_images = titok.codebook_ids_to_images(codes)

assert recon_images.shape == images.shape
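
To make the configuration above concrete, here is a back-of-the-envelope sketch of how many patches the tokenizer sees and how compact the resulting code sequence is. It uses only the values from the usage example (256x256 RGB input, `patch_size = 32`, `num_latent_tokens = 32`, `codebook_size = 4096`). The helper `token_budget` is hypothetical and not part of `titok_pytorch`, and the raw-image size assumes 8 bits per colour channel, which the README does not state.

```python
import math

def token_budget(image_size, patch_size, num_latent_tokens, codebook_size, channels = 3):
    # hypothetical helper for illustration, not part of titok_pytorch
    assert image_size % patch_size == 0, 'image size must be divisible by patch size'

    # number of non-overlapping patches fed to the encoder
    num_patches = (image_size // patch_size) ** 2

    # each code indexes one of `codebook_size` entries
    bits_per_token = math.log2(codebook_size)
    bits_per_image = num_latent_tokens * bits_per_token

    # assumption: raw image stored at 8 bits per channel
    raw_bits = image_size * image_size * channels * 8

    return dict(
        num_patches = num_patches,
        bits_per_token = bits_per_token,
        bits_per_image = bits_per_image,
        compression_ratio = raw_bits / bits_per_image
    )

budget = token_budget(
    image_size = 256,
    patch_size = 32,
    num_latent_tokens = 32,
    codebook_size = 4096
)

print(budget['num_patches'])        # 64
print(budget['bits_per_image'])     # 384.0
print(budget['compression_ratio'])  # 4096.0
```

With these settings the 64 input patches are summarised by 32 codes of 12 bits each, so one image takes 384 bits (48 bytes) as codes. This counts only the code indices and says nothing about reconstruction quality.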

Todo

add multi-resolution patches

Citations

@article{yu2024an,
  author    = {Qihang Yu and Mark Weber and Xueqing Deng and Xiaohui Shen and Daniel Cremers and Liang-Chieh Chen},
  title     = {An Image is Worth 32 Tokens for Reconstruction and Generation},
  journal   = {arxiv: 2406.07550},
  year      = {2024}
}
