ml-autofocusformer

This is an official implementation of "AutoFocusFormer: Image Segmentation off the Grid".


AutoFocusFormer

[Figure: AFF-Base]

This software project accompanies the research paper, AutoFocusFormer: Image Segmentation off the Grid (CVPR 2023).

Chen Ziwen, Kaushik Patnaik, Shuangfei Zhai, Alvin Wan, Zhile Ren, Alex Schwing, Alex Colburn, Li Fuxin

arXiv | video narration | AFF-Classification (this repo) | AFF-Segmentation

Introduction

AutoFocusFormer (AFF) is the first adaptive-downsampling network capable of performing dense prediction tasks such as semantic/instance segmentation.

AFF abandons the traditional grid structure of image feature maps, and automatically learns to retain the most important pixels with respect to the task goal.

AFF consists of a local-attention transformer backbone and a task-specific head. The backbone consists of four stages, each stage containing three modules: balanced clustering, local-attention transformer blocks, and adaptive downsampling.
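To make the effect of per-stage downsampling concrete, here is a back-of-the-envelope token-budget sketch. The stride-4 patch embedding and the fixed 1/5 keep-rate are illustrative assumptions for this sketch (the actual values live in the configs in configs/), and AFF's adaptive downsampling chooses *which* tokens to keep rather than subsampling a grid:

```python
# Hypothetical token budget for a 4-stage adaptive-downsampling backbone.
# The stride-4 patch embedding and the 1/5 keep-rate are illustrative
# assumptions, not values read from this repo's configs.
def token_counts(image_size=224, patch_stride=4, keep_denom=5, num_stages=4):
    tokens = (image_size // patch_stride) ** 2  # tokens entering stage 1
    counts = [tokens]
    # each downsampling step between stages keeps ~1/keep_denom of the tokens
    for _ in range(num_stages - 1):
        tokens //= keep_denom
        counts.append(tokens)
    return counts

print(token_counts())  # [3136, 627, 125, 25]
```

Because later stages operate on far fewer tokens than a fixed-grid backbone would retain, the attention cost in those stages shrinks accordingly.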

AFF demonstrates significant FLOPs savings (see our models with 1/5 downsampling rate) and significant improvements in the recognition of small objects.

Notably, AFF-Small achieves 44.0 instance segmentation AP and 66.9 panoptic segmentation PQ on Cityscapes val with a backbone of only 42.6M parameters, on par with Swin-Large, whose backbone has 197M parameters (a 78% saving!).
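The parameter-saving figure above is simple arithmetic over the two backbone sizes quoted in the text:

```python
# Backbone parameter counts quoted above: AFF-Small vs. Swin-Large.
aff_small, swin_large = 42.6e6, 197e6
saving = 1 - aff_small / swin_large
print(f"{saving:.0%}")  # ~78%
```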

Main Results on ImageNet with Pretrained Models

| name | pretrain | resolution | acc@1 | acc@5 | #params | FLOPs | FPS | 1K model |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| AFF-Mini | ImageNet-1K | 224x224 | 78.2 | 93.6 | 6.75M | 1.08G | 1337 | Apple ML |
| AFF-Mini-1/5 | ImageNet-1K | 224x224 | 77.5 | 93.3 | 6.75M | 0.72G | 1678 | Apple ML |
| AFF-Tiny | ImageNet-1K | 224x224 | 83.0 | 96.3 | 27M | 4G | 528 | Apple ML |
| AFF-Tiny-1/5 | ImageNet-1K | 224x224 | 82.4 | 95.9 | 27M | 2.74G | 682 | Apple ML |
| AFF-Small | ImageNet-1K | 224x224 | 83.5 | 96.6 | 42.6M | 8.16G | 321 | Apple ML |
| AFF-Small-1/5 | ImageNet-1K | 224x224 | 83.4 | 96.5 | 42.6M | 5.69G | 424 | Apple ML |

FPS is measured on a single V100 GPU.

We train with a total batch size of 4096.

| name | pretrain | resolution | acc@1 | acc@5 | #params | FLOPs | 22K model | 1K model |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| AFF-Base | ImageNet-22K | 384x384 | 86.2 | 98.0 | 75.34M | 42.54G | Apple ML | Apple ML |

Getting Started

Clone this repo

git clone git@github.com:apple/ml-autofocusformer.git
cd ml-autofocusformer

One can download pre-trained checkpoints through the links in the tables above.

Create environment and install requirements

sh create_env.sh

See further documentation inside the script file.

Our experiments were run with CUDA==11.6 and pytorch==1.12.

Prepare data

We use the standard ImageNet dataset, which can be downloaded from http://image-net.org/.

For a standard folder dataset, move the validation images into labeled sub-folders. The file structure should look like:

$ tree imagenet
imagenet/
├── training
│   ├── class1
│   │   ├── img1.jpeg
│   │   ├── img2.jpeg
│   │   └── ...
│   ├── class2
│   │   ├── img3.jpeg
│   │   └── ...
│   └── ...
└── validation
    ├── class1
    │   ├── img4.jpeg
    │   ├── img5.jpeg
    │   └── ...
    ├── class2
    │   ├── img6.jpeg
    │   └── ...
    └── ...
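A quick stdlib-only sketch (not part of this repo) for sanity-checking that a dataset root matches the layout above; it also maps class-folder names to integer labels the way torchvision's ImageFolder does (sorted folder names → 0, 1, 2, ...). The function names are our own, purely illustrative:

```python
# Hypothetical helper: validate the training/validation folder layout
# shown above. Stdlib only; function names are illustrative, not repo API.
import os

def find_classes(split_dir):
    """Sorted class-folder names and a name -> label-index mapping."""
    classes = sorted(e.name for e in os.scandir(split_dir) if e.is_dir())
    return classes, {name: idx for idx, name in enumerate(classes)}

def check_imagenet_layout(root):
    """Require both splits to exist and to contain identical class folders."""
    per_split = {}
    for split in ("training", "validation"):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        per_split[split], _ = find_classes(split_dir)
    if per_split["training"] != per_split["validation"]:
        raise ValueError("training and validation class folders differ")
    return per_split["training"]
```

Running `check_imagenet_layout("imagenet")` on a correctly arranged root returns the sorted list of class folders; a missing split or a class present in only one split raises an error before training starts.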

Train and evaluate

Modify the arguments in the script run_aff.sh (e.g., the path to the dataset) and run

sh run_aff.sh

for training or evaluation.

Run python main.py -h to see the full documentation of the arguments.

One can also directly modify the config files in configs/.

Citing AutoFocusFormer

@inproceedings{autofocusformer,
    title = {AutoFocusFormer: Image Segmentation off the Grid},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    author = {Ziwen, Chen and Patnaik, Kaushik and Zhai, Shuangfei and Wan, Alvin and Ren, Zhile and Schwing, Alex and Colburn, Alex and Fuxin, Li},
    year = {2023},
}