# awesome-self-supervised-learning

A curated list of papers on self-supervised representation learning.

Check out LightlySSL, a computer vision framework for self-supervised learning by the team at lightly.ai.
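
For a feel of how the methods listed below are trained in practice, here is a minimal SimCLR-style pretraining sketch using LightlySSL's building blocks (`NTXentLoss`, `SimCLRProjectionHead`, `SimCLRTransform`, `LightlyDataset`). It is an illustration, not an official example: it assumes `pip install lightly` plus torch/torchvision, the dataset path is a placeholder, and exact import paths may vary across lightly releases.

```python
# A minimal SimCLR-style pretraining sketch built on LightlySSL, shown for
# illustration only. Module paths follow the lightly docs and may differ
# between versions; "path/to/images" is a placeholder.
import torch
import torchvision
from torch import nn

from lightly.data import LightlyDataset
from lightly.loss import NTXentLoss
from lightly.models.modules import SimCLRProjectionHead
from lightly.transforms import SimCLRTransform


class SimCLR(nn.Module):
    """A backbone plus projection head, the standard SimCLR layout."""

    def __init__(self, backbone: nn.Module, feature_dim: int = 512):
        super().__init__()
        self.backbone = backbone
        self.projection_head = SimCLRProjectionHead(feature_dim, 512, 128)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = self.backbone(x).flatten(start_dim=1)
        return self.projection_head(features)


# ResNet-18 trunk with its classification head removed.
resnet = torchvision.models.resnet18()
model = SimCLR(nn.Sequential(*list(resnet.children())[:-1]))

# SimCLRTransform yields two augmented views of every image.
transform = SimCLRTransform(input_size=224)
dataset = LightlyDataset("path/to/images", transform=transform)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=256, shuffle=True)

criterion = NTXentLoss(temperature=0.5)  # the InfoNCE-style contrastive loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.06)

for (view0, view1), _, _ in dataloader:
    z0, z1 = model(view0), model(view1)
    loss = criterion(z0, z1)  # pull views of the same image together
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```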

## 2024

| Title | Relevant Links |
| --- | --- |
| Scalable Pre-training of Large Autoregressive Image Models | arXiv, Open In Colab |
| SAM 2: Segment Anything in Images and Videos | arXiv, Google Drive |
| Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach | arXiv |
| GLID: Pre-training a Generalist Encoder-Decoder Vision Model | arXiv, Google Drive |
| Rethinking Patch Dependence for Masked Autoencoders | arXiv, Google Drive, GitHub |
| You Don't Need Data-Augmentation in Self-Supervised Learning | arXiv |
| Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations? | arXiv |
| Asymmetric Masked Distillation for Pre-Training Small Foundation Models | CVPR, GitHub |
| Revisiting Feature Prediction for Learning Visual Representations from Video | arXiv, GitHub |
| ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning | arXiv |

## 2023

| Title | Relevant Links |
| --- | --- |
| A Cookbook of Self-Supervised Learning | arXiv |
| Masked Autoencoders Enable Efficient Knowledge Distillers | arXiv, Google Drive |
| Understanding and Generalizing Contrastive Learning from the Inverse Optimal Transport Perspective | CVPR, Google Drive |
| CycleCL: Self-supervised Learning for Periodic Videos | arXiv, Google Drive |
| Temperature Schedules for Self-Supervised Contrastive Methods on Long-Tail Data | arXiv, Google Drive |
| Reverse Engineering Self-Supervised Learning | arXiv, Google Drive |
| Improved baselines for vision-language pre-training | arXiv, Google Drive, GitHub |
| DINOv2: Learning Robust Visual Features without Supervision | arXiv, Google Drive |
| Segment Anything | arXiv, Google Drive |
| Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture | arXiv, Google Drive |
| Self-supervised Object-Centric Learning for Videos | NeurIPS |
| Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution | NeurIPS |
| An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization | NeurIPS |
| The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning | arXiv, GitHub |
| Fast Segment Anything | arXiv, GitHub |
| Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | arXiv, GitHub |
| What Do Self-Supervised Vision Transformers Learn? | arXiv, GitHub |
| Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need | arXiv |
| EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything | arXiv, GitHub |
| DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions | arXiv, GitHub |
| VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | CVPR |
| MGMAE: Motion Guided Masking for Video Masked Autoencoding | CVPR, GitHub |

## 2022

| Title | Relevant Links |
| --- | --- |
| Masked Siamese Networks for Label-Efficient Learning | arXiv, Google Drive, Open In Colab |
| The Hidden Uniform Cluster Prior in Self-Supervised Learning | arXiv, Open In Colab |
| Unsupervised Visual Representation Learning by Synchronous Momentum Grouping | arXiv, Open In Colab |
| TiCo: Transformation Invariance and Covariance Contrast for Self-Supervised Visual Representation Learning | arXiv, Open In Colab |
| VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning | arXiv, Open In Colab |
| VICRegL: Self-Supervised Learning of Local Visual Features | arXiv, Open In Colab |
| VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | arXiv, Google Drive |
| Improving Visual Representation Learning through Perceptual Understanding | arXiv, Google Drive |
| RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank | arXiv, Google Drive |
| A Closer Look at Self-Supervised Lightweight Vision Transformers | arXiv, GitHub |
| Beyond neural scaling laws: beating power law scaling via data pruning | arXiv, GitHub |
| A simple, efficient and scalable contrastive masked autoencoder for learning visual representations | arXiv |
| Masked Autoencoders are Robust Data Augmentors | arXiv |
| Is Self-Supervised Learning More Robust Than Supervised Learning? | arXiv |
| Can CNNs Be More Robust Than Transformers? | arXiv, GitHub |
| Patch-level Representation Learning for Self-supervised Vision Transformers | arXiv, GitHub |

## 2021

| Title | Relevant Links |
| --- | --- |
| Barlow Twins: Self-Supervised Learning via Redundancy Reduction | arXiv, Open In Colab |
| Decoupled Contrastive Learning | arXiv, Open In Colab |
| Dense Contrastive Learning for Self-Supervised Visual Pre-Training | arXiv, Open In Colab |
| Emerging Properties in Self-Supervised Vision Transformers | arXiv, Open In Colab |
| Masked Autoencoders Are Scalable Vision Learners | arXiv, Open In Colab |
| With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations | arXiv, Open In Colab |
| SimMIM: A Simple Framework for Masked Image Modeling | arXiv, Open In Colab |
| Exploring Simple Siamese Representation Learning | arXiv, Open In Colab |
| When Does Contrastive Visual Representation Learning Work? | arXiv |
| Efficient Visual Pretraining with Contrastive Detection | arXiv |

## 2020

| Title | Relevant Links |
| --- | --- |
| Bootstrap your own latent: A new approach to self-supervised Learning | arXiv, Open In Colab |
| A Simple Framework for Contrastive Learning of Visual Representations | arXiv, Open In Colab |
| Unsupervised Learning of Visual Features by Contrasting Cluster Assignments | arXiv, Open In Colab |

## 2019

| Title | Relevant Links |
| --- | --- |
| Momentum Contrast for Unsupervised Visual Representation Learning | arXiv, Open In Colab |

## 2018

| Title | Relevant Links |
| --- | --- |
| Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination | arXiv |

## 2016

| Title | Relevant Links |
| --- | --- |
| Context Encoders: Feature Learning by Inpainting | arXiv |