
CSE559A Lecture 20

Local feature descriptors

Detection: Identify the interest points

Description: Extract a feature descriptor vector from the region surrounding each interest point.

Matching: Determine correspondence between descriptors in two views

Image representation

Histogram of oriented gradients (HOG)

  • Quantization
    • Grids: fast but applicable only with few dimensions
    • Clustering: slower but can quantize data in higher dimensions
  • Matching
    • Histogram intersection or Euclidean distance may be faster
    • Chi-squared often works better
    • Earth mover’s distance is a good choice when nearby bins represent similar values (a comparison sketch follows this list)
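
As a rough sketch of these trade-offs, the three cheap measures above can be written in a few lines of NumPy (function names are mine; earth mover’s distance is omitted since it needs an optimal-transport solver, e.g. scipy.stats.wasserstein_distance for 1D histograms):

```python
import numpy as np

def hist_intersection(h1, h2):
    # Similarity score: sum of bin-wise minima (higher = more similar).
    return np.minimum(h1, h2).sum()

def euclidean(h1, h2):
    # Plain L2 distance; fast, treats every bin independently.
    return np.linalg.norm(h1 - h2)

def chi_squared(h1, h2, eps=1e-10):
    # Chi-squared distance; down-weights differences in well-populated bins.
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```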

SIFT vector formation

Computed on rotated and scaled version of window according to computed orientation & scale

  • resample the window

Based on gradients weighted by a Gaussian with variance equal to half the window width (for smooth falloff)

4x4 array of gradient orientation histograms, weighted by gradient magnitude

8 orientations x 4x4 array = 128 dimensions

Motivation: some sensitivity to spatial layout, but not too much.
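
A minimal sketch of that layout, assuming a 16x16 patch that has already been rotated and rescaled to the keypoint’s orientation and scale (real SIFT adds trilinear interpolation between bins and a normalize-clip-renormalize step, both omitted here):

```python
import numpy as np

def sift_like_descriptor(patch):
    # patch: 16x16 float array, pre-rotated/scaled to the keypoint.
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)

    # Gaussian weighting over the window for smooth falloff.
    ys, xs = np.mgrid[0:16, 0:16] - 7.5
    w = np.exp(-(xs**2 + ys**2) / (2 * 8.0**2))     # width: half the window
    mag = mag * w

    bins = (ang / (2 * np.pi) * 8).astype(int) % 8  # 8 orientation bins
    desc = np.zeros((4, 4, 8))
    for i in range(4):
        for j in range(4):
            cell = np.s_[4*i:4*i + 4, 4*j:4*j + 4]
            desc[i, j] = np.bincount(bins[cell].ravel(),
                                     weights=mag[cell].ravel(), minlength=8)
    desc = desc.ravel()                             # 4x4 cells x 8 bins = 128-D
    return desc / (np.linalg.norm(desc) + 1e-12)    # unit-normalize
```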

For matching:

  • Extraordinarily robust detection and description technique
  • Can handle changes in viewpoint
    • Up to about 60 degrees of out-of-plane rotation
  • Can handle significant changes in illumination
    • Sometimes even day vs. night
  • Fast and efficient—can run in real time
  • Lots of code available

SURF

  • Fast approximation of SIFT idea
  • Efficient computation by 2D box filters & integral images
    • 6 times faster than SIFT
  • Equivalent quality for object identification
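
That speedup comes from the integral-image trick, which is easy to sketch: one cumulative-sum pass, after which any axis-aligned box sum costs four lookups regardless of filter size (helper names here are mine):

```python
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of img[:y, :x]; the zero-padded row/column means
    # boxes touching the top/left border need no special casing.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    # Sum of img[y0:y1, x0:x1] in O(1), independent of box size.
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```

SURF assembles its Hessian-based detector and Haar-wavelet descriptor from differences of such box sums, which is why scaling the filters up costs nothing extra.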

Shape context descriptor

Self-similarity descriptor

Local feature matching

Matching

Simplest approach: pick the nearest neighbor and threshold on absolute distance

Problem: lots of self-similarity in many photos

Solution: nearest neighbor with Lowe’s ratio test: accept a match only if the distance to the nearest neighbor is much smaller than the distance to the second-nearest
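
A minimal sketch of that rule, using brute-force Euclidean distances (real systems use a k-d tree or approximate nearest-neighbor search; the 0.8 threshold is the value suggested in Lowe’s SIFT paper):

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    # Pairwise distances between every descriptor in the two images.
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    order = d.argsort(axis=1)
    matches = []
    for i in range(len(desc1)):
        best, second = d[i, order[i, 0]], d[i, order[i, 1]]
        if best < ratio * second:   # keep only unambiguous matches
            matches.append((i, order[i, 0]))
    return matches
```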

Comparison of keypoint detectors

Deep learning for correspondence estimation

Optical Flow

Motion field vs. optical flow

Motion field: the projection of the 3D scene motion into the image. The magnitude of the vectors is determined by the metric motion, and it is caused only by motion.

Optical flow: the apparent motion of brightness patterns in the image. The magnitude of the vectors is measured in pixels, and it can be caused by lighting changes as well as motion.

Brightness constancy constraint, aperture problem

Machine Learning Approach

  • Collect examples of inputs and outputs
  • Design a prediction model suitable for the task
    • Invariances, Equivariances; Complexity; Input and Output shapes and semantics
  • Specify loss functions and train model
  • Limitations: Requires training the model; Requires a sufficiently complete training dataset; Must re-learn known facts; Higher computational complexity

Optimization Approach

  • Define properties we expect to hold for a correct solution
  • Translate properties into a cost function
  • Derive an algorithm to solve for the cost function
  • Limitations: Often requires making overly simple assumptions on properties; Some tasks can’t be easily defined

Given frames at times $t-1$ and $t$, estimate the apparent motion field $u(x,y)$ and $v(x,y)$ between them.

Brightness constancy constraint: the projection of the same point looks the same in every frame:

$$I(x,y,t-1) = I(x+u(x,y),\, y+v(x,y),\, t)$$

Additional assumptions:

  • Small motion: points do not move very far
  • Spatial coherence: points move like their neighbors

Trick for solving:

Brightness constancy constraint:

$$I(x,y,t-1) = I(x+u(x,y),\, y+v(x,y),\, t)$$

Linearize the right-hand side using Taylor expansion:

$$I(x,y,t-1) \approx I(x,y,t) + I_x\, u(x,y) + I_y\, v(x,y)$$

$$I_x\, u(x,y) + I_y\, v(x,y) + I(x,y,t) - I(x,y,t-1) = 0$$

Hence,

$$I_x\, u(x,y) + I_y\, v(x,y) + I_t = 0$$

where $I_t = I(x,y,t) - I(x,y,t-1)$ is the temporal derivative.
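
This is one equation in two unknowns per pixel (the aperture problem again), so the spatial-coherence assumption is used to stack the constraint over a window and solve by least squares: the Lucas-Kanade method. A minimal single-window sketch, assuming two grayscale float frames and small motion:

```python
import numpy as np

def lucas_kanade_window(prev, curr, y, x, half=7):
    # Spatial derivatives of the current frame, temporal difference.
    Iy, Ix = np.gradient(curr)
    It = curr - prev                    # I(x,y,t) - I(x,y,t-1)

    # Stack I_x u + I_y v = -I_t over a (2*half+1)^2 window.
    win = np.s_[y - half:y + half + 1, x - half:x + half + 1]
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
    b = -It[win].ravel()

    # Least squares; A^T A is the 2x2 structure tensor, so the
    # estimate is well-conditioned exactly where corners are.
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v                         # flow in pixels
```

In practice the estimate is computed densely or at detected corners, and an image pyramid handles motions larger than the linearization allows.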