CSE559A Lecture 14
Object Detection
AP (Average Precision)
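As a minimal sketch of how AP can be computed for a single class: assume each detection has already been marked as a true or false positive (for example by an overlap threshold against ground truth). The function sorts detections by confidence, builds the precision-recall curve, and sums the area under its non-increasing envelope, which is the all-points interpolation used by later PASCAL VOC evaluations. The function name and input format are illustrative assumptions, not from the lecture.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class."""
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    recalls, precisions = [], []
    tp = fp = 0
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Make precision non-increasing in recall (take the max to the right).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Sum rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Mean AP (mAP) is then the average of this value over all classes.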
Benchmarks
PASCAL VOC Challenge
20 object classes.
CNN-based detectors substantially increased detection accuracy on this benchmark.
COCO dataset
Common objects in context.
Semantic segmentation: every pixel is assigned a class label.
Instance segmentation: every pixel is assigned a class label and grouped into individual object instances.
Object detection: outline
Proposal generation
Object recognition
R-CNN
Proposal generation
Use a CNN to extract features from each proposal, then SVMs to classify the proposals.
Use selective search to generate the proposals.
Use AlexNet fine-tuned on PASCAL VOC to extract the features.
Pros:
- Much more accurate than previous approaches
- Any deep architecture can immediately be “plugged in”
Cons:
- Not a single end-to-end trainable system; its stages are trained separately:
  - Fine-tune network with softmax classifier (log loss)
  - Train post-hoc linear SVMs (hinge loss)
  - Train post-hoc bounding box regressors (least squares)
- Training is slow: about 2000 CNN passes for each image (one per proposal)
- Inference (detection) is slow
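To make the three separately trained objectives concrete, here is a minimal sketch of each loss for a single example. The function names and input conventions are illustrative assumptions; the actual R-CNN training code is not shown in these notes.

```python
import math


def log_loss(logits, label):
    """Softmax cross-entropy: negative log-probability of the true class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def hinge_loss(score, label):
    """Linear-SVM hinge loss for one binary decision; label is +1 or -1."""
    return max(0.0, 1.0 - label * score)


def squared_loss(pred, target):
    """Least-squares loss, as used for the bounding box regressors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))
```

Because each stage minimizes its own loss on features produced by the previous stage, an error in one stage cannot be corrected by gradients from another.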
Fast R-CNN
Proposal generation
Run the CNN once on the whole image, then extract features for each proposal from the shared feature map.
ROI pooling and ROI alignment
ROI pooling:
- Pooling is applied to the feature map, not to the image.
- Pooling is applied within each proposal's region, producing a fixed-size feature for every proposal.
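A minimal sketch of ROI max pooling on a single-channel feature map, assuming the proposal has already been projected onto the feature map and rounded to integer cell coordinates. The function name, the inclusive `(x0, y0, x1, y1)` convention, and the floor/ceil bin boundaries are assumptions made for illustration.

```python
import math


def roi_max_pool(feature_map, roi, output_size=2):
    """Max-pool one ROI into an output_size x output_size grid.

    feature_map: list of rows (H x W), one channel for simplicity.
    roi: (x0, y0, x1, y1) in feature-map cells, inclusive.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0 + 1, x1 - x0 + 1
    pooled = []
    for i in range(output_size):
        # Bin rows: floor for the start, ceil for the end, so no bin is empty.
        ys = y0 + (i * roi_h) // output_size
        ye = y0 + math.ceil((i + 1) * roi_h / output_size)
        row = []
        for j in range(output_size):
            xs = x0 + (j * roi_w) // output_size
            xe = x0 + math.ceil((j + 1) * roi_w / output_size)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled
```

Whatever the size of the proposal, the output has the same shape, which is what lets the later fully connected layers accept every proposal.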
ROI alignment:
- Align the proposal to the feature map without rounding its coordinates to the grid.
- Feature values at the resulting fractional positions are obtained by bilinear interpolation.
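A minimal sketch of ROI alignment on a single-channel feature map, assuming the ROI is given in continuous feature-map coordinates and taking one bilinear sample at the center of each bin. Implementations commonly average several samples per bin; the single sample and the function names here are simplifying assumptions.

```python
import math


def bilinear(feature_map, y, x):
    """Bilinearly interpolate a 2D feature map at a fractional (y, x)."""
    h, w = len(feature_map), len(feature_map[0])
    # Clamp the sample point to the valid range.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature_map[y0][x0] + dx * feature_map[y0][x1]
    bottom = (1 - dx) * feature_map[y1][x0] + dx * feature_map[y1][x1]
    return (1 - dy) * top + dy * bottom


def roi_align(feature_map, roi, output_size=2):
    """roi = (x0, y0, x1, y1) in continuous coordinates, never rounded."""
    x0, y0, x1, y1 = roi
    bin_h = (y1 - y0) / output_size
    bin_w = (x1 - x0) / output_size
    return [[bilinear(feature_map,
                      y0 + (i + 0.5) * bin_h,
                      x0 + (j + 0.5) * bin_w)
             for j in range(output_size)]
            for i in range(output_size)]
```

Because no coordinate is snapped to the grid, a small shift of the proposal produces a correspondingly small change in the pooled features.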
Use bounding box regression to refine the proposal.
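A minimal sketch of the box parameterization commonly used for this regression, assuming boxes are given as `(center_x, center_y, width, height)`. The network predicts offsets relative to the proposal: center shifts scaled by the proposal size, and log ratios for width and height. The names `encode` and `decode` are illustrative.

```python
import math


def encode(proposal, ground_truth):
    """Regression targets that map a proposal onto its ground-truth box."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return ((gx - px) / pw,
            (gy - py) / ph,
            math.log(gw / pw),
            math.log(gh / ph))


def decode(proposal, deltas):
    """Apply predicted offsets to a proposal to get the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw,
            py + ty * ph,
            pw * math.exp(tw),
            ph * math.exp(th))
```

At training time `encode` produces the targets; at test time `decode` turns the predicted offsets into the refined box.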