CSE5519 Advances in Computer Vision (Topic A: 2023 - 2024: Semantic Segmentation)

Segment Anything

Novelty in Segment Anything

Brute-force approach: large-scale training data, with 400x more masks than any existing segmentation dataset

Dataset construction

Model-assisted manual annotation
Semi-automatic annotation
Fully automatic annotation (predict masks for each point in a 32x32 grid of point prompts)
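
In the fully automatic stage, the paper describes prompting the model with a regular 32x32 grid of points and predicting masks for each point, then filtering and de-duplicating the results. A minimal sketch of generating such a grid is below. The function name `point_grid`, the cell-center placement, and the 1024x1024 image size are assumptions for illustration, not the authors' implementation.

```python
def point_grid(n_per_side, width, height):
    """Return n_per_side**2 (x, y) pixel coordinates on a regular grid.

    Assumption: points sit at the centers of equally sized cells,
    so no prompt lands exactly on the image border.
    """
    points = []
    for row in range(n_per_side):
        for col in range(n_per_side):
            x = (col + 0.5) * width / n_per_side
            y = (row + 0.5) * height / n_per_side
            points.append((x, y))
    return points


# Hypothetical 1024x1024 input image
grid = point_grid(32, width=1024, height=1024)
print(len(grid))  # 1024 point prompts
print(grid[0])    # (16.0, 16.0)
```

Each of the 1024 points would then be passed to the model as a single foreground point prompt; the mask prediction and filtering steps are omitted here.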

Tip

This paper marks a notable advance in image segmentation, achieved largely by brute force: training on a very large dataset. The model uses a Vision Transformer image encoder, together with a prompt encoder and a lightweight mask decoder, to produce the final segmentation masks.

I’m really interested in the scalability of the model. Is there a way to reduce the training data size or the model size while keeping comparable performance, for example via distillation or other techniques?

Last updated on March 9, 2026
