CSE5519 Advances in Computer Vision (Lecture 3)

Reminders

First Example notebook due Sep 18

Project proposal due Sep 23

Continued: A brief history (time) of computer vision

Theme changes

1980s

  • “Definitive” detectors
    • Edges: Canny (1986); corners: Harris & Stephens (1988)
  • Multiscale image representations
    • Witkin (1983), Burt & Adelson (1984), Koenderink (1984, 1987), etc.
  • Markov Random Field models: Geman & Geman (1984)
  • Segmentation by energy minimization
    • Kass, Witkin & Terzopoulos (1987), Mumford & Shah (1989)

Conferences, journals, books

  • Conferences: ICPR (1973), CVPR (1983), ICCV (1987), ECCV (1990)
  • Journals: TPAMI (1979), IJCV (1987)
  • Books: Duda & Hart (1972), Marr (1982), Ballard & Brown (1982), Horn (1986)

1980s: The dead ends

  • Alignment-based recognition
    • Faugeras & Hebert (1983), Grimson & Lozano-Perez (1984), Lowe (1985), Huttenlocher & Ullman (1987), etc.
  • Aspect graphs
    • Koenderink & Van Doorn (1979), Plantinga & Dyer (1986), Hebert & Kanade (1985), Ikeuchi & Kanade (1988), Gigus & Malik (1990)
  • Invariants: Mundy & Zisserman (1992)

1980s: Meanwhile…

  • Neocognitron: Fukushima (1980)
  • Back-propagation: Rumelhart, Hinton & Williams (1986)
    • Origins in control theory and optimization: Kelley (1960), Dreyfus (1962), Bryson & Ho (1969), Linnainmaa (1970)
    • Application to neural networks: Werbos (1974)
    • Interesting blog post: “Backpropagating through time. Or, how come BP hasn’t been invented earlier?”
  • Parallel Distributed Processing: Rumelhart et al. (1987)
  • Neural networks for digit recognition: LeCun et al. (1989)

1990s

Multi-view geometry, statistical and appearance-based models for recognition, first approaches for (class-specific) object detection

Geometry (mostly) solved

  • Fundamental matrix: Faugeras (1992)
  • Normalized 8-point algorithm: Hartley (1997)
  • RANSAC for robust fundamental matrix estimation: Torr & Murray (1997)
  • Bundle adjustment: Triggs et al. (1999)
  • Hartley & Zisserman book (2000)
  • Projective structure from motion: Faugeras & Luong (2001)
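
As a rough illustration of how these classical results are packaged today, here is a minimal sketch using OpenCV's findFundamentalMat, which implements both the normalized 8-point algorithm (Hartley, 1997) and RANSAC-based robust estimation (Torr & Murray, 1997). The synthetic two-view setup (intrinsics, rotation, baseline) is a placeholder, not data from the lecture.

```python
import numpy as np
import cv2

# Synthesize a toy two-view setup (placeholder data): random 3D points
# seen by two cameras related by a small rotation and a translation.
rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(100, 3))          # 3D points
K = np.array([[500, 0, 320], [0, 500, 240], [0, 0, 1]], float)   # intrinsics
R, _ = cv2.Rodrigues(np.array([[0.0], [0.1], [0.0]]))            # small rotation
t = np.array([[0.5], [0.0], [0.0]])                              # baseline

def project(P, X):
    x = (P @ np.c_[X, np.ones(len(X))].T).T
    return (x[:, :2] / x[:, 2:]).astype(np.float32)

pts1 = project(K @ np.c_[np.eye(3), np.zeros((3, 1))], X)
pts2 = project(K @ np.c_[R, t], X)
pts2[:10] += rng.uniform(20, 40, size=(10, 2)).astype(np.float32)  # inject outliers

# RANSAC repeatedly fits F to minimal samples and keeps the model with the
# most inliers; the returned mask marks which correspondences survived.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
print("rank(F) =", np.linalg.matrix_rank(F), " inliers:", int(mask.sum()))
```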

Data enters the scene

  • Appearance-based models: Turk & Pentland (1991), Murase & Nayar (1995)

Examples: PCA (“eigenfaces”) for face recognition: Turk & Pentland (1991); image manifolds for appearance-based recognition: Murase & Nayar (1995)
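
A minimal sketch of the eigenface idea from Turk & Pentland (1991): flatten the face images, subtract the mean, keep the top principal components, and recognize by nearest neighbour in the low-dimensional coefficient space. The random “faces” below are placeholders for real training images.

```python
import numpy as np

# Minimal eigenfaces sketch. Each row of `faces` would be a flattened face
# image in practice; here they are random placeholders.
rng = np.random.default_rng(0)
n_train, h, w, k = 40, 32, 32, 8
faces = rng.random((n_train, h * w))            # training images, one per row
labels = np.arange(n_train)                     # identity of each training face

mean_face = faces.mean(axis=0)
A = faces - mean_face                           # center the data

# Principal components = top right singular vectors of the centered data.
_, _, Vt = np.linalg.svd(A, full_matrices=False)
eigenfaces = Vt[:k]                             # k "eigenfaces", each h*w long

train_coeffs = A @ eigenfaces.T                 # project the training set

def recognize(probe):
    """Nearest neighbour in eigenface-coefficient space."""
    c = (probe - mean_face) @ eigenfaces.T
    return labels[np.argmin(np.linalg.norm(train_coeffs - c, axis=1))]

print(recognize(faces[7]))                      # should print 7
```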

Keypoint-based image indexing

  • Schmid & Mohr (1996), Lowe (1999)

Constellation models for object categories

  • Burl, Weber & Perona (1998), Weber, Welling & Perona (2000)

First sustained use of classifiers and negative data

  • Face detectors: Rowley, Baluja & Kanade (1996), Osuna, Freund & Girosi (1997), Schneiderman & Kanade (1998), Viola & Jones (2001)
  • Convolutional nets: LeCun et al. (1998)
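
For illustration, a minimal sketch of the Viola & Jones (2001) boosted cascade of Haar features, using the pretrained frontal-face model that ships with OpenCV. The file name photo.jpg is a placeholder.

```python
import cv2

# The Viola-Jones detector as bundled with OpenCV (pretrained cascade).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("photo.jpg")                   # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Slide a window over the image at multiple scales; each window passes through
# a cascade of boosted Haar-feature classifiers, and most are rejected early.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.jpg", img)
```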

Graph cut image inference

  • Boykov, Veksler & Zabih (1998)

Segmentation

  • Normalized cuts: Shi & Malik (2000)
  • Berkeley segmentation dataset: Martin et al. (2001)

Video processing

  • Layered motion models: Wang & Adelson (1993)
  • Robust optical flow: Black & Anandan (1993)
  • Probabilistic curve tracking: Isard & Blake (1998)

2000s: Keypoints and reconstruction

Keypoints craze

  • Kadir & Brady (2001), Mikolajczyk & Schmid (2002), Matas et al. (2004), Lowe (2004), Bay et al. (2006), etc.
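
As an illustration, a minimal sketch of detecting and matching scale-invariant keypoints in the spirit of Lowe (2004), using OpenCV's SIFT implementation and the standard ratio test. The image file names are placeholders.

```python
import cv2

# Detect SIFT keypoints and 128-D descriptors in two (placeholder) images.
img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Match descriptors and apply Lowe's ratio test to discard ambiguous matches.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(len(kp1), len(kp2), "keypoints;", len(good), "good matches")
```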

3D reconstruction “in the wild”

  • SFM in the wild
  • Multi-view stereo, stereo on GPUs

Generic object recognition

  • Constellation models
  • Bags of features
  • Datasets: Caltech-101 -> ImageNet

Generic object detection

  • PASCAL dataset
  • HOG, Deformable part models
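
A minimal sketch of the HOG + linear SVM pedestrian detector (Dalal & Triggs, 2005), using the pretrained model bundled with OpenCV; deformable part models add a latent part structure on top of HOG features and are not covered here. The file name street.jpg is a placeholder.

```python
import cv2

# HOG descriptor with OpenCV's default 64x128 window and pretrained
# linear-SVM pedestrian model.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("street.jpg")                  # placeholder input image

# Gradient-orientation histograms are computed over a dense grid of cells,
# block-normalized, and scored by the linear SVM in a sliding window.
boxes, scores = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
cv2.imwrite("detections.jpg", img)
```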

Action and activity recognition

  • “Misc. early efforts”

1990s-2000s: Dead ends (?)

Probabilistic graphical models

Perceptual organization

2010s: Deep learning, big data

Compared with the hand-engineered pipelines that preceded them, deep learning methods have clear advantages:

  • They can be more accurate (often much more accurate).
  • They are faster (often much faster).
  • They are adaptable to new problems.

Deep Convolutional Neural Networks

  • Many layers, some of which are convolutional (usually near the input)
  • Early layers “extract features”
  • Trained using stochastic gradient descent on very large datasets
  • Many possible loss functions (depending on task)
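
A minimal PyTorch sketch of this recipe, assuming a 10-way image classification task: a few convolutional layers near the input, ReLU nonlinearities, dropout, and stochastic gradient descent on a cross-entropy loss. The random tensors stand in for a real dataset.

```python
import torch
import torch.nn as nn

# Convolutional layers near the input "extract features"; a linear layer on
# top produces class scores. Shapes assume 32x32 RGB inputs (placeholder).
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(0.5),                              # regularization
    nn.Linear(32 * 8 * 8, 10),                    # 10-way classifier
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()                   # task-specific loss

for step in range(100):
    images = torch.randn(64, 3, 32, 32)           # placeholder mini-batch
    targets = torch.randint(0, 10, (64,))         # placeholder labels
    optimizer.zero_grad()
    loss = loss_fn(model(images), targets)
    loss.backward()                               # back-propagation
    optimizer.step()                              # stochastic gradient step
```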

Additional benefits:

  • High-quality software frameworks
  • “New” network layers
    • Dropout (effectively trains many models simultaneously)
    • ReLU activations (enable faster training because gradients do not saturate to zero)
  • Bigger datasets
    • reduce overfitting
    • improve robustness
    • enable larger, deeper networks
  • Deeper networks eliminate the need for hand-engineered features

Where did we go wrong?

In retrospect, computer vision has had several periods of “spinning its wheels”

  • We’ve always prioritized methods that could already do interesting things over potentially more promising methods that could not yet deliver
  • We’ve undervalued simple methods, data, and learning
  • When nothing worked, we distracted ourselves with fancy math
  • On a few occasions, we unaccountably ignored methods that later proved to be “game changers” (RANSAC, SIFT)
  • We’ve had some problems with bandwagon jumping and intellectual snobbery

But it’s not clear whether any of it mattered in the end.
