CSE5519 Advances in Computer Vision (Topic F: 2021 and before: Representation Learning)

A Simple Framework for Contrastive Learning of Visual Representations

link to the paper 

Laughing my ass off when I see 75% accuracy on ImageNet. I can't help wondering what the authors think a few years on, now that Deep Learning has become the dominant paradigm in Computer Vision.

In this work, we introduce a simple framework for contrastive learning of visual representations, which we call SimCLR.

Wait, that IS a NEURAL NETWORK?

General Framework

A stochastic data augmentation module

A neural network base encoder f(\cdot)

A small neural network projection head g(\cdot)

A contrastive loss function
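
Below is a minimal PyTorch-style sketch (my own, not the authors' code) of how these four components fit together; the augmentation pipeline, the layer sizes, and the nt_xent helper are illustrative guesses based on the paper's description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as T
from torchvision.models import resnet50

# 1) Stochastic data augmentation: each image is transformed twice,
#    giving a positive pair of correlated views.
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.8, 0.8, 0.8, 0.2),
    T.RandomGrayscale(p=0.2),
    T.ToTensor(),
])

# 2) Base encoder f(.): a ResNet-50 with the classifier removed,
#    producing 2048-d representations h.
f = resnet50()
f.fc = nn.Identity()

# 3) Projection head g(.): a small MLP mapping h to the space where
#    the contrastive loss is applied.
g = nn.Sequential(nn.Linear(2048, 2048), nn.ReLU(), nn.Linear(2048, 128))

# 4) Contrastive loss (NT-Xent): with N images per batch there are 2N views;
#    each view's positive is the other view of the same image, and the
#    remaining 2(N - 1) views in the batch act as negatives.
def nt_xent(z1, z2, temperature=0.5):
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = z @ z.t() / temperature                         # scaled cosine similarities
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device),
                     float('-inf'))                       # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n),
                         torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# One training step would look like: loss = nt_xent(g(f(x1)), g(f(x2)))
# where x1, x2 are the two augmented views of the same mini-batch.
```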

Novelty in SimCLR

Self-supervised contrastive learning driven by strong, composed data augmentations.

Tip

In the section “Training with Large Batch Size”, the authors mentioned that:

To keep it simple, we do not train the model with a memory bank (Wu et al., 2018; He et al., 2019). Instead, we vary the training batch size N from 256 to 8192. A batch size of 8192 gives us 16382 negative examples per positive pair from both augmentation views.

They use the LARS optimizer to stabilize training at these large batch sizes.
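
Quick sanity check on that count: a batch of N images yields 2N augmented views, and for each positive pair every other view in the batch serves as a negative, so 2(N - 1) = 2 \times (8192 - 1) = 16382.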

What does "memory bank" mean here? And what is the LARS optimizer, and how does it benefit training?
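
My current understanding: a memory bank (Wu et al., 2018) is a running store of feature vectors from earlier batches that gets sampled for extra negatives; SimCLR skips it and instead draws all negatives from the current (very large) batch. LARS (You et al., 2017) rescales each layer's update by a "trust ratio" between the weight norm and the gradient norm, which keeps large-batch training from diverging. A rough, simplified sketch of that update (no momentum; the names and default values are my assumptions, not the paper's code):

```python
import torch

def lars_update(param, grad, lr, trust_coef=1e-3, weight_decay=1e-6, eps=1e-9):
    """Simplified single-layer LARS step (no momentum), following the
    layer-wise trust-ratio idea from You et al., 2017."""
    w_norm = param.data.norm()
    g_norm = grad.norm()
    # Trust ratio: shrinks the step for layers whose gradients are large
    # relative to their weights, so no single layer blows up at batch 8192.
    local_lr = trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)
    step = lr * float(local_lr)
    param.data.add_(grad + weight_decay * param.data, alpha=-step)
```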
