CSE559A Lecture 14

Object Detection

AP (Average Precision)
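
As a rough illustration of how AP can be computed, the sketch below ranks detections by confidence and takes the area under the interpolated precision-recall curve. It assumes each detection has already been marked correct or incorrect against the ground truth; the function name and its arguments are illustrative, and benchmarks differ in their exact interpolation rules.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class (a sketch).

    scores: confidence of each detection.
    is_true_positive: assumed precomputed flag, True if the detection
        matches a ground-truth object.
    num_ground_truth: number of ground-truth objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at each rank becomes the best precision
    # achieved at that rank or any later (higher-recall) rank.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the rectangles under the interpolated curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

For four ranked detections that are correct, wrong, correct, wrong, with two ground-truth objects, this gives 0.5 + 1/3, or about 0.83.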

PASCAL VOC: 20 challenge classes.

CNN increases the accuracy of object detection.

COCO: Common Objects in Context.

Semantic segmentation: every pixel is assigned a class label.

Instance segmentation: every pixel is classified and grouped into object instances.

Proposal generation

Object recognition

Proposal generation

Use a CNN to extract features from each proposal, with an SVM to classify the proposals.

Use selective search to generate proposals.

Use AlexNet fine-tuned on PASCAL VOC to extract features.

Pros:

Cons:

- Not a single end-to-end trainable system:
  - Fine-tune network with softmax classifier (log loss)
  - Train post-hoc linear SVMs (hinge loss)
  - Train post-hoc bounding box regressors (least squares)
- Training is slow: about 2000 CNN passes per image, one for each proposal
- Inference (detection) is slow
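
To make the first point concrete, the three separately trained stages each minimize a different objective. The sketch below writes out the three losses named above in plain Python; the function names are illustrative, and regularization and batching are left out.

```python
import math

def softmax_log_loss(logits, label):
    """Log loss of a softmax classifier for one example.

    logits: list of raw class scores; label: index of the true class.
    """
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def hinge_loss(score, target):
    """Hinge loss of a linear SVM; target is +1 or -1."""
    return max(0.0, 1.0 - target * score)

def least_squares_loss(predicted, target):
    """Sum of squared errors, as used for box regression targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))
```

Because each loss is optimized in its own stage, an error signal from the SVMs or the box regressors never reaches the CNN weights, which is what "not end-to-end" means here.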

Proposal generation

Use a CNN to extract features from proposals.

ROI pooling:
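
A minimal sketch of the idea, assuming the usual formulation: the region of interest is divided into a fixed grid of bins on the feature map, and each bin is max-pooled, so every proposal yields an output of the same size. The function name is illustrative, and the ROI is assumed to be given already in integer feature-map cell coordinates (a single-channel map is used for brevity).

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)

def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one ROI of a 2D feature map into an out_h x out_w grid.

    feature_map: list of rows of numbers.
    roi: (x0, y0, x1, y1), inclusive integer cell coordinates.
    """
    x0, y0, x1, y1 = roi
    roi_h = y1 - y0 + 1
    roi_w = x1 - x0 + 1
    pooled = []
    for i in range(out_h):
        # Bin boundaries are snapped to whole cells (quantization).
        r_start = y0 + (i * roi_h) // out_h
        r_end = y0 + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            c_start = x0 + (j * roi_w) // out_w
            c_end = x0 + ceil_div((j + 1) * roi_w, out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r_start, r_end)
                           for c in range(c_start, c_end)))
        pooled.append(row)
    return pooled
```

When the ROI size is not a multiple of the output size, the snapped bins overlap or have unequal sizes; this rounding is the misalignment that ROI alignment is meant to avoid.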

ROI alignment:
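
A minimal sketch, assuming the usual formulation: the ROI keeps its fractional coordinates, and each output bin is filled by bilinear interpolation of the feature map instead of snapping to whole cells. For brevity this version takes a single sample at each bin center, which is a simplification (several samples per bin can be averaged instead); the function names are illustrative.

```python
import math

def bilinear_sample(feature_map, x, y):
    """Bilinearly interpolate a 2D feature map at continuous (x, y).

    feature_map[r][c] is treated as the value at point (x=c, y=r).
    """
    h, w = len(feature_map), len(feature_map[0])
    # Clamp the sample point to the valid range.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x_lo, y_lo = int(math.floor(x)), int(math.floor(y))
    x_hi, y_hi = min(x_lo + 1, w - 1), min(y_lo + 1, h - 1)
    ax, ay = x - x_lo, y - y_lo
    top = (1 - ax) * feature_map[y_lo][x_lo] + ax * feature_map[y_lo][x_hi]
    bottom = (1 - ax) * feature_map[y_hi][x_lo] + ax * feature_map[y_hi][x_hi]
    return (1 - ay) * top + ay * bottom

def roi_align(feature_map, roi, out_h, out_w):
    """Sample an ROI with float coordinates (x0, y0, x1, y1) into a grid."""
    x0, y0, x1, y1 = roi
    bin_h = (y1 - y0) / out_h
    bin_w = (x1 - x0) / out_w
    return [[bilinear_sample(feature_map,
                             x0 + (j + 0.5) * bin_w,
                             y0 + (i + 0.5) * bin_h)
             for j in range(out_w)]
            for i in range(out_h)]
```

Since no coordinate is rounded, a small shift of the proposal produces a small change in the pooled features, which matters most for pixel-accurate outputs such as masks.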

Use bounding box regression to refine the proposal.
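
A sketch of how predicted offsets can refine a proposal, assuming the common parameterization in which the regressor outputs a center shift relative to the box size and a log-scale change of width and height. The function name and the (x0, y0, x1, y1) box layout are illustrative choices.

```python
import math

def apply_box_deltas(proposal, deltas):
    """Refine a proposal box with regression outputs (dx, dy, dw, dh).

    proposal: (x0, y0, x1, y1) corner coordinates.
    dx, dy shift the center in units of the box width and height;
    dw, dh rescale the width and height on a log scale.
    """
    x0, y0, x1, y1 = proposal
    dx, dy, dw, dh = deltas
    w, h = x1 - x0, y1 - y0
    cx, cy = x0 + 0.5 * w, y0 + 0.5 * h
    new_cx = cx + dx * w
    new_cy = cy + dy * h
    new_w = w * math.exp(dw)
    new_h = h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)
```

With all-zero deltas the proposal is returned unchanged, so the regressor only has to learn a correction to the proposal, not absolute coordinates.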