This repository is the official implementation of StoP.
Given a partial image of a dog, can you precisely determine the location of its tail? Existing Masked Image Modeling (MIM) models like MAE and I-JEPA predict tokens deterministically and do not model location uncertainties (a). We propose to predict the target (masked tokens) in stochastic positions (StoP), which prevents overfitting to location features. StoP leads to improved MIM performance on downstream tasks, including linear probing on ImageNet (b).
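As a rough illustration of what "stochastic positions" could mean, the toy sketch below perturbs the positional embeddings of masked tokens with random noise while leaving the context tokens untouched. It is not the code in this repository: the function name `stochastic_positions`, the isotropic Gaussian noise, and the value of `sigma` are all assumptions chosen for the example.

```python
import random


def stochastic_positions(pos_embed, masked_idx, sigma, rng):
    """Return a copy of pos_embed with Gaussian noise added at masked positions.

    Hypothetical helper for illustration only; the noise model (isotropic
    Gaussian with standard deviation sigma) is an assumption.
    """
    noisy = [list(row) for row in pos_embed]  # copy so the input stays unchanged
    for i in masked_idx:
        noisy[i] = [p + rng.gauss(0.0, sigma) for p in pos_embed[i]]
    return noisy


rng = random.Random(0)  # fixed seed so the example is repeatable
# Toy positional embeddings: 4 tokens, embedding dimension 2.
pos_embed = [[float(i), 0.5 * float(i)] for i in range(4)]
# Tokens 1 and 3 are treated as masked targets; tokens 0 and 2 are context.
noisy = stochastic_positions(pos_embed, masked_idx=[1, 3], sigma=0.25, rng=rng)
```

In this sketch the predictor would receive `noisy` instead of `pos_embed` for the masked tokens, so it cannot rely on an exact target location.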
Please follow the installation instructions from the I-JEPA repo.
The commands below pretrain I-JEPA + StoP. In the original setting, all models were trained using 4 V100 GPU nodes. ViT-B/L were trained in float32, while ViT-H was trained in half precision (float16).
Training command:
torchrun --nnodes=4 --nproc-per-node=8 --node_rank=<node_rank 0-3> --master_addr=<master_addr> --master_port=<master_port> --backend=nccl main.py --fname configs/pretrain/vit-b16.yaml
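Because the same command has to be launched once per node with a different `--node_rank`, it can help to generate the four command lines programmatically. The sketch below only assembles the strings from the command shown above; the helper name `build_pretrain_command`, the address `10.0.0.1`, and the port `29500` are placeholders for this example, not values from the repository.

```python
import shlex


def build_pretrain_command(node_rank, master_addr, master_port,
                           config="configs/pretrain/vit-b16.yaml",
                           nnodes=4, nproc_per_node=8):
    """Assemble the torchrun command line for a single node (hypothetical helper)."""
    args = [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc-per-node={nproc_per_node}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        "--backend=nccl",
        "main.py",
        "--fname", config,
    ]
    # shlex.join quotes arguments where needed so the string is shell-safe.
    return shlex.join(args)


# Print one command per node; the address and port are placeholder values.
for rank in range(4):
    print(build_pretrain_command(rank, "10.0.0.1", 29500))
```

Each printed line is meant to be run on the node whose rank it names; only the `--node_rank` value differs between them.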
Evaluation command (logistic regression on the ImageNet subset given by --subset-path):
python logistic_eval.py \
--subset-path imagenet_subsets1/1percent.txt \
--root-path /path/to/datasets --image-folder imagenet_folder/ \
--device cuda:0 \
--pretrained /path/to/checkpoint/folder \
--fname checkpoint_name.pth.tar \
--model-name deit_base \
--patch-size 16 \
--penalty l2 \
--lambd 0.0025
To perform linear probing on ImageNet, you can follow the instructions from VISSL. Alternatively, we provide a bash script that converts checkpoints to VISSL format and launches experiments on 8 V100 machines, each with 8 GPUs, on SLURM:
bash bash/in1k_eval_vissl.sh <output_dir> <checkpoint_path> <dataset_root> <arch>
Arch. | Dataset | Epochs | Checkpoint |
---|---|---|---|
ViT-B/16 | ImageNet | 600 | link |
ViT-L/16 | ImageNet | 600 | link |
ViT-H/16 | ImageNet | 300 | link |
The codebase relies on the implementation of I-JEPA.
If you found this code helpful, feel free to cite our work:
@inproceedings{barstochastic,
title={Stochastic positional embeddings improve masked image modeling},
author={Bar, Amir and Bordes, Florian and Shocher, Assaf and Assran, Mido and Vincent, Pascal and Ballas, Nicolas and Darrell, Trevor and Globerson, Amir and LeCun, Yann},
booktitle={Forty-first International Conference on Machine Learning}
}