
CSE559A Lecture 20

Local feature descriptors

Detection: Identify the interest points

Description: Extract a feature descriptor vector from the region surrounding each interest point.

Matching: Determine correspondence between descriptors in two views

Image representation

Histogram of oriented gradients (HOG)

  • Quantization
    • Grids: fast but applicable only with few dimensions
    • Clustering: slower but can quantize data in higher dimensions
  • Matching
    • Histogram intersection or Euclidean distance may be faster
    • Chi-squared often works better
    • Earth mover’s distance is a good choice when nearby bins represent similar values (a comparison sketch follows this list)
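
As a rough sketch of these trade-offs, the three cheap measures above can be written in a few lines of NumPy (function names are mine; earth mover’s distance is omitted since it needs an optimal-transport solver, e.g. scipy.stats.wasserstein_distance for 1D histograms):

```python
import numpy as np

def hist_intersection(h1, h2):
    # Similarity score: sum of bin-wise minima (higher = more similar).
    return np.minimum(h1, h2).sum()

def euclidean(h1, h2):
    # Plain L2 distance; fast, treats every bin independently.
    return np.linalg.norm(h1 - h2)

def chi_squared(h1, h2, eps=1e-10):
    # Chi-squared distance; down-weights differences in well-populated bins.
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```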

SIFT vector formation

Computed on rotated and scaled version of window according to computed orientation & scale

  • resample the window

Based on gradients weighted by a Gaussian with variance equal to half the window width (for smooth falloff)

4x4 array of gradient orientation histograms, weighted by gradient magnitude

8 orientations x 4x4 array = 128 dimensions

Motivation: some sensitivity to spatial layout, but not too much.
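
A minimal sketch of that layout, assuming a 16x16 patch that has already been rotated and rescaled to the keypoint’s orientation and scale (real SIFT adds trilinear interpolation between bins and a normalize-clip-renormalize step, both omitted here):

```python
import numpy as np

def sift_like_descriptor(patch):
    # patch: 16x16 float array, pre-rotated/scaled to the keypoint.
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)

    # Gaussian weighting over the window for smooth falloff.
    ys, xs = np.mgrid[0:16, 0:16] - 7.5
    w = np.exp(-(xs**2 + ys**2) / (2 * 8.0**2))     # width: half the window
    mag = mag * w

    bins = (ang / (2 * np.pi) * 8).astype(int) % 8  # 8 orientation bins
    desc = np.zeros((4, 4, 8))
    for i in range(4):
        for j in range(4):
            cell = np.s_[4*i:4*i + 4, 4*j:4*j + 4]
            desc[i, j] = np.bincount(bins[cell].ravel(),
                                     weights=mag[cell].ravel(), minlength=8)
    desc = desc.ravel()                             # 4x4 cells x 8 bins = 128-D
    return desc / (np.linalg.norm(desc) + 1e-12)    # unit-normalize
```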

For matching:

  • Extraordinarily robust detection and description technique
  • Can handle changes in viewpoint
    • Up to about 60 degrees of out-of-plane rotation
  • Can handle significant changes in illumination
    • Sometimes even day vs. night
  • Fast and efficient—can run in real time
  • Lots of code available

SURF

  • Fast approximation of SIFT idea
  • Efficient computation by 2D box filters & integral images
    • 6 times faster than SIFT
  • Equivalent quality for object identification
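
That speedup comes from the integral-image trick, which is easy to sketch: one cumulative-sum pass, after which any axis-aligned box sum costs four lookups regardless of filter size (helper names here are mine):

```python
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of img[:y, :x]; the zero-padded row/column means
    # boxes touching the top/left border need no special casing.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    # Sum of img[y0:y1, x0:x1] in O(1), independent of box size.
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```

SURF assembles its Hessian-based detector and Haar-wavelet descriptor from differences of such box sums, which is why scaling the filters up costs nothing extra.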

Shape context descriptor

Self-similarity descriptor

Local feature matching

Matching

Simplest approach: pick the nearest neighbor and threshold on absolute distance

Problem: lots of self-similarity in many photos

Solution: nearest neighbor with Lowe’s ratio test: accept a match only if the distance to the nearest neighbor is much smaller than the distance to the second-nearest
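
A minimal sketch of that rule, using brute-force Euclidean distances (real systems use a k-d tree or approximate nearest-neighbor search; the 0.8 threshold is the value suggested in Lowe’s SIFT paper):

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    # Pairwise distances between every descriptor in the two images.
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    order = d.argsort(axis=1)
    matches = []
    for i in range(len(desc1)):
        best, second = d[i, order[i, 0]], d[i, order[i, 1]]
        if best < ratio * second:   # keep only unambiguous matches
            matches.append((i, order[i, 0]))
    return matches
```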

Comparison of keypoint detectors

Deep learning for correspondence estimation

Optical Flow

Motion field vs. optical flow

Motion field: the projection of the 3D scene motion into the image. The magnitude of the vectors is determined by the metric motion, and it is caused only by motion.

Optical flow: the apparent motion of brightness patterns in the image. The magnitude of the vectors is measured in pixels, and it can be caused by lighting changes as well as motion.

Brightness constancy constraint, aperture problem

Machine Learning Approach

  • Collect examples of inputs and outputs
  • Design a prediction model suitable for the task
    • Invariances, Equivariances; Complexity; Input and Output shapes and semantics
  • Specify loss functions and train model
  • Limitations: Requires training the model; Requires a sufficiently complete training dataset; Must re-learn known facts; Higher computational complexity

Optimization Approach

  • Define properties we expect to hold for a correct solution
  • Translate properties into a cost function
  • Derive an algorithm to solve for the cost function
  • Limitations: Often requires making overly simple assumptions on properties; Some tasks can’t be easily defined

Given frames at times $t-1$ and $t$, estimate the apparent motion field $u(x,y)$ and $v(x,y)$ between them.

Brightness constancy constraint: the projection of the same point looks the same in every frame:

$$I(x,y,t-1) = I(x+u(x,y),\, y+v(x,y),\, t)$$

Additional assumptions:

  • Small motion: points do not move very far
  • Spatial coherence: points move like their neighbors

Trick for solving:

Brightness constancy constraint:

$$I(x,y,t-1) = I(x+u(x,y),\, y+v(x,y),\, t)$$

Linearize the right-hand side using Taylor expansion:

$$I(x,y,t-1) \approx I(x,y,t) + I_x\, u(x,y) + I_y\, v(x,y)$$

$$I_x\, u(x,y) + I_y\, v(x,y) + I(x,y,t) - I(x,y,t-1) = 0$$

Hence,

$$I_x\, u(x,y) + I_y\, v(x,y) + I_t = 0$$

where $I_t = I(x,y,t) - I(x,y,t-1)$ is the temporal derivative.
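
This is one equation in two unknowns per pixel (the aperture problem again), so the spatial-coherence assumption is used to stack the constraint over a window and solve by least squares: the Lucas-Kanade method. A minimal single-window sketch, assuming two grayscale float frames and small motion:

```python
import numpy as np

def lucas_kanade_window(prev, curr, y, x, half=7):
    # Spatial derivatives of the current frame, temporal difference.
    Iy, Ix = np.gradient(curr)
    It = curr - prev                    # I(x,y,t) - I(x,y,t-1)

    # Stack I_x u + I_y v = -I_t over a (2*half+1)^2 window.
    win = np.s_[y - half:y + half + 1, x - half:x + half + 1]
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
    b = -It[win].ravel()

    # Least squares; A^T A is the 2x2 structure tensor, so the
    # estimate is well-conditioned exactly where corners are.
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v                         # flow in pixels
```

In practice the estimate is computed densely or at detected corners, and an image pyramid handles motions larger than the linearization allows.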