Iterative Token Evaluation and Refinement for Real-World Super-Resolution

Chaofeng Chen¹, Shangchen Zhou¹, Liang Liao¹, Haoning Wu¹, Wenxiu Sun², Qiong Yan², Weisi Lin¹

¹S-Lab, Nanyang Technological University; ²SenseTime Research

Pipeline of ITER. The input $I_l$ first passes through a distortion removal network $E_l$, which produces the initially restored tokens $S_l$. These tokens are indexes of quantized features in the codebook of a VQGAN. A reverse discrete diffusion process conditioned on $S_l$ then generates textures, starting from the completely masked tokens $S_T$. The refinement network (also called the de-masking network) $\phi_r$ produces the refined outputs $S_{T-1}$, using $S_l$ as a condition. The evaluation network $\phi_e$ then scores $S_{T-1}$ to obtain the evaluation mask $m_{T-1}$. Through a masked sampling process, this mask selects which tokens to keep and which to refine for step $T-1$. The process is repeated $T$ times to obtain the de-masked outputs $S_0$, from which the VQGAN decoder $D_H$ reconstructs the restored image $I_{sr}$. We found that $T\leq8$ steps are enough for ITER to produce good results. This makes it much more efficient than other diffusion-based approaches, which usually require many more sampling steps.
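The loop described above can be illustrated with a toy, pure-Python sketch. It makes several assumptions that are not taken from the ITER code. Tokens are plain integers, and a hypothetical sentinel `MASK = -1` marks masked positions. `refine_fn` and `evaluate_fn` are stand-ins for $\phi_r$ and $\phi_e$. Tokens are kept by MaskGIT-style top-k selection on the evaluation scores, following an assumed linear schedule. The real masked sampling rule in ITER may differ. A small nearest-neighbour `quantize` helper is also included to show what "indexes of quantized features in the codebook" means.

```python
import random
from typing import Callable, List, Optional, Sequence

MASK = -1  # hypothetical sentinel index for a masked token position


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Replace each feature vector by the index of its nearest codebook entry (squared L2)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
            for f in features]


def iterative_refine(
    cond: List[int],
    refine_fn: Callable[[List[int], List[int]], List[int]],
    evaluate_fn: Callable[[List[int], List[int]], List[float]],
    steps: int,
) -> List[int]:
    """Toy reverse discrete diffusion: start fully masked (S_T) and de-mask over `steps` steps."""
    n = len(cond)
    tokens = [MASK] * n  # S_T: every position is masked
    for t in range(steps, 0, -1):
        # phi_r stand-in: propose a token for every position, conditioned on cond (S_l).
        proposal = refine_fn(tokens, cond)
        # Positions kept in earlier steps are not overwritten.
        proposal = [old if old != MASK else new for old, new in zip(tokens, proposal)]
        # phi_e stand-in: one quality score per token; previously kept tokens always rank first.
        scores = evaluate_fn(proposal, cond)
        scores = [float("inf") if old != MASK else s for old, s in zip(tokens, scores)]
        # Assumed linear schedule: keep round(n * (steps - t + 1) / steps) tokens, all of them at t == 1.
        n_keep = round(n * (steps - t + 1) / steps)
        keep = set(sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_keep])
        # Evaluation mask m_{t-1}: kept positions take the proposal, the rest are re-masked.
        tokens = [proposal[i] if i in keep else MASK for i in range(n)]
    return tokens  # S_0: no masked positions remain


if __name__ == "__main__":
    random.seed(0)
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    cond = quantize([[0.1, -0.2], [4.8, 5.1], [0.9, 1.2], [0.2, 0.1]], codebook)
    # Toy networks: phi_r simply copies the condition, phi_e returns random scores.
    out = iterative_refine(cond, lambda toks, c: list(c),
                           lambda toks, c: [random.random() for _ in toks], steps=4)
    print(cond, out)
```

In this sketch, each step calls the refinement and evaluation stand-ins once. $T$ steps therefore cost $2T$ network evaluations, so $T\leq8$ means at most 16 forward passes.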

🔧 Dependencies and Installation

# git clone this repository
git clone https://github.com/chaofengc/ITER.git
cd ITER 

# create a new conda environment
conda create -n iter python=3.8
conda activate iter

# install python dependencies
pip3 install -r requirements.txt
python setup.py develop

⚡ Quick Inference

python inference_iter.py -s 2 -i ./testset/lrx4/frog.jpg
python inference_iter.py -s 4 -i ./testset/lrx4/frog.jpg
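To process a whole folder, one option is to call the script once per image using only the `-s` and `-i` flags shown above. The sketch below builds those commands. The helper names and the set of accepted image suffixes are hypothetical, and it does not assume anything about where the script writes its outputs. It rejects scales other than 2 and 4 because those are the only factors that appear in this README.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of input formats


def build_inference_commands(input_dir: str, scale: int) -> List[List[str]]:
    """Build one inference_iter.py call per image, using only the documented -s and -i flags."""
    if scale not in (2, 4):
        raise ValueError("only scale factors 2 and 4 are shown in the README")
    images = sorted(p for p in Path(input_dir).iterdir()
                    if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    # sys.executable runs the script with the interpreter of the active (iter) environment.
    return [[sys.executable, "inference_iter.py", "-s", str(scale), "-i", str(p)]
            for p in images]


def run_all(commands: List[List[str]]) -> None:
    """Run the commands in order; check=True stops at the first failing image."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(build_inference_commands("./testset/lrx4", 4))
```

Building the command lists separately from running them lets you inspect or filter the batch before any GPU work starts.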

Example results

Left: real images | Right: super-resolved images with scale factor 4

👨‍💻 Train the model

⏬ Download Datasets

The training datasets can be downloaded from 🤗 Hugging Face. You may also refer to FeMaSR to prepare your own training data.

🔁 Training

Below are brief examples for training the model. Please modify the corresponding configuration files to suit your needs. Note that the code has been rewritten and the models retrained from scratch, so the results may differ slightly from those reported in the paper.

Stage I: Train the Swin-VQGAN

accelerate launch --multi_gpu --num_processes=8 --mixed_precision=bf16 basicsr/train.py -opt options/train_ITER_HQ_stage.yml

Stage II & III: Train the LQ encoder and the refinement network

accelerate launch --main_process_port=29600 --multi_gpu --num_processes=8 --mixed_precision=bf16 basicsr/train.py -opt options/train_ITER_LQ_stage_X2.yml

accelerate launch --main_process_port=29600 --multi_gpu --num_processes=8 --mixed_precision=bf16 basicsr/train.py -opt options/train_ITER_LQ_stage_X4.yml
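The commands above assume 8 GPUs. The following sketch rebuilds the same `accelerate launch` invocation for a different GPU count, using only the flags that already appear in this README. It omits `--multi_gpu` when only one process is requested, which is an assumption about single-GPU use. The helper name is hypothetical.

```python
import shlex
from typing import List, Optional


def accelerate_train_cmd(opt: str, num_gpus: int, port: Optional[int] = None,
                         precision: str = "bf16") -> List[str]:
    """Rebuild the README's `accelerate launch` training command for a given GPU count."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    if precision not in ("no", "fp16", "bf16"):
        raise ValueError(f"unsupported mixed precision mode: {precision}")
    cmd = ["accelerate", "launch"]
    if port is not None:
        # Stage II/III uses a non-default port, presumably so it can share a machine with another job.
        cmd.append(f"--main_process_port={port}")
    if num_gpus > 1:
        cmd.append("--multi_gpu")
    cmd += [f"--num_processes={num_gpus}", f"--mixed_precision={precision}",
            "basicsr/train.py", "-opt", opt]
    return cmd


if __name__ == "__main__":
    # Stage II/III (x4) on a hypothetical 2-GPU machine; paste the printed line into a shell.
    print(shlex.join(accelerate_train_cmd("options/train_ITER_LQ_stage_X4.yml", 2, port=29600)))
```

With `num_gpus=8` the helper reproduces the commands above exactly. If the YAML sets a per-GPU batch size, using fewer GPUs also shrinks the effective batch size, so the learning rate or iteration count in the config may need adjusting. `bf16` also requires GPU support (for example, NVIDIA Ampere or newer); on older cards `fp16` is the usual fallback.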

📝 Citation

If you find this code useful for your research, please cite our paper:

@inproceedings{chen2024iter,
  title={Iterative Token Evaluation and Refinement for Real-World Super-Resolution},
  author={Chaofeng Chen and Shangchen Zhou and Liang Liao and Haoning Wu and Wenxiu Sun and Qiong Yan and Weisi Lin},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2024},
}

⚖️ License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License and NTU S-Lab License 1.0.

❤️ Acknowledgement

This project is based on BasicSR.
