MIT License
(Minimally) implements SimCLR (A Simple Framework for Contrastive Learning of Visual Representations by Chen et al.) in TensorFlow 2. Uses many delicious pieces of tf.keras
and TensorFlow's core APIs. A report is available here.
I did not code everything from scratch. This research paper was a joy to read and felt natural to understand, which is why I wanted to try it out myself and come up with a minimal implementation. I reused the works of the following for different purposes -
Besides the paper itself, I studied the following articles to understand SimCLR:
Thanks a ton to the ML-GDE program for providing the GCP credits, with which I could run the experiments and store the intermediate results on GCS buckets as necessary. That said, all the notebooks can be run on Colab.
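At the heart of SimCLR is the NT-Xent (normalized temperature-scaled cross entropy) loss applied to pairs of augmented views. As a reference for what the implementation computes, here is a minimal sketch of that loss in plain NumPy (the repo itself uses TensorFlow ops; this standalone version is only illustrative):

```python
import numpy as np

def nt_xent_loss(z_i, z_j, temperature=0.1):
    """NT-Xent loss over a batch of N projection pairs, as in the SimCLR paper.

    z_i, z_j: (N, d) projections of two augmented views of the same images.
    """
    z = np.concatenate([z_i, z_j], axis=0)            # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize rows
    sim = z @ z.T / temperature                       # scaled cosine similarities
    n = z_i.shape[0]
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    # the positive for index k is k + n (and vice versa)
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos_idx].mean()
```

Intuitively, each view must identify its paired view among all 2N - 2 other examples in the batch, which is why SimCLR benefits from large batch sizes.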
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
resnet50 (Model) (None, 7, 7, 2048) 23587712
_________________________________________________________________
global_average_pooling2d (Gl (None, 2048) 0
_________________________________________________________________
dense (Dense) (None, 256) 524544
_________________________________________________________________
activation (Activation) (None, 256) 0
_________________________________________________________________
dense_1 (Dense) (None, 128) 32896
_________________________________________________________________
activation_1 (Activation) (None, 128) 0
_________________________________________________________________
dense_2 (Dense) (None, 50) 6450
=================================================================
Total params: 24,151,602
Trainable params: 24,098,482
Non-trainable params: 53,120
loss: 1.1009 - accuracy: 0.5840 - val_loss: 1.1486 - val_accuracy: 0.5280
These results were obtained when I took only the base encoder network, i.e., without any non-linear projections. I present results with different projection heads as well (available here), but this configuration turned out to be the best.
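The model summarized above can be reproduced with a minimal tf.keras sketch along these lines (the layer stack and parameter counts match the summary; `weights=None` is used here only to keep the sketch self-contained, whereas the experiments start from pre-trained weights):

```python
import tensorflow as tf

def get_model(num_classes=50, hidden_dims=(256, 128)):
    # ResNet50 base encoder without its classification top
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=None, input_shape=(224, 224, 3))
    inputs = tf.keras.Input((224, 224, 3))
    x = base(inputs)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    # dense + activation blocks, then the final classification layer
    for dim in hidden_dims:
        x = tf.keras.layers.Dense(dim)(x)
        x = tf.keras.layers.Activation("relu")(x)
    outputs = tf.keras.layers.Dense(num_classes)(x)
    return tf.keras.Model(inputs, outputs)
```

The second architecture below follows the same pattern with a single 256-unit hidden block and a 5-way output layer.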
Here's the architecture that was used:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
resnet50 (Model) (None, 7, 7, 2048) 23587712
_________________________________________________________________
global_average_pooling2d_1 ( (None, 2048) 0
_________________________________________________________________
dense_1 (Dense) (None, 256) 524544
_________________________________________________________________
activation (Activation) (None, 256) 0
_________________________________________________________________
dense_2 (Dense) (None, 5) 1285
=================================================================
Total params: 24,113,541
Trainable params: 24,060,421
Non-trainable params: 53,120
loss: 0.6623 - accuracy: 0.7528 - val_loss: 1.0171 - val_accuracy: 0.6440
We see a roughly 12% absolute increase in validation accuracy here. The accuracy with the SimCLR framework could be increased further with better pre-training, in particular:
SimCLR benefits from larger amounts of data. Ting Chen (the first author of the paper) suggested choosing an augmentation policy (when using custom datasets) that is neither too easy nor too hard for the contrastive task, i.e., the contrastive accuracy should be high (e.g. > 80%).
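For reference, a stochastic crop-flip-and-color-distortion policy roughly in the spirit of the paper's defaults can be sketched as follows; the `color_jitter_strength` knob is one natural place to tune how hard the contrastive task is (the exact magnitudes here are illustrative, not the repo's settings):

```python
import tensorflow as tf

def simclr_augment(image, crop_size=224, color_jitter_strength=0.5):
    """Randomly crop, flip, and color-distort a float image in [0, 1]."""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_crop(image, (crop_size, crop_size, 3))
    s = color_jitter_strength
    # color distortion: brightness, contrast, saturation, hue
    image = tf.image.random_brightness(image, max_delta=0.8 * s)
    image = tf.image.random_contrast(image, 1 - 0.8 * s, 1 + 0.8 * s)
    image = tf.image.random_saturation(image, 1 - 0.8 * s, 1 + 0.8 * s)
    image = tf.image.random_hue(image, 0.2 * s)
    return tf.clip_by_value(image, 0.0, 1.0)
```

Applying this function independently twice to each image yields the two correlated views that form a positive pair.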
Available here - Pretrained_Weights