CSE5519 Advances in Computer Vision (Topic G: 2022 and before: Correspondence Estimation and Structure from Motion)

Pixel-Perfect Structure-from-Motion with Featuremetric Refinement

link to the paper 

The method leverages dense local information to refine sparse observations. It is inherently amenable to SfM because it can optimize all keypoint locations in a track, across multiple views, simultaneously.
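To make the idea of refining a sparse keypoint with dense features concrete, here is a minimal sketch in Python. It is not the paper's implementation. The feature maps are synthetic smooth functions standing in for CNN features, the helper names (`make_feature_map`, `bilinear`, `refine_keypoint`) are hypothetical, and only one keypoint is refined against a fixed reference feature, whereas the method described above optimizes all locations in a track jointly.

```python
import numpy as np

def make_feature_map(h, w, shift=(0.0, 0.0)):
    """Synthetic smooth 5-channel map standing in for dense CNN features (assumption)."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    xs, ys = xs - shift[0], ys - shift[1]
    return np.stack([np.sin(0.15 * xs), np.cos(0.15 * xs),
                     np.sin(0.11 * ys), np.cos(0.11 * ys),
                     np.sin(0.07 * (xs + ys))], axis=-1)

def bilinear(fmap, p):
    """Sample an (H, W, C) feature map at the subpixel location p = (x, y)."""
    x, y = p
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    ax, ay = x - x0, y - y0
    return ((1 - ax) * (1 - ay) * fmap[y0, x0]
            + ax * (1 - ay) * fmap[y0, x0 + 1]
            + (1 - ax) * ay * fmap[y0 + 1, x0]
            + ax * ay * fmap[y0 + 1, x0 + 1])

def refine_keypoint(fmap, p_init, f_ref, iters=20, eps=0.5, damping=1e-6):
    """Damped Gauss-Newton on the featuremetric residual fmap[p] - f_ref."""
    h, w = fmap.shape[:2]
    p = np.asarray(p_init, dtype=float).copy()
    for _ in range(iters):
        r = bilinear(fmap, p) - f_ref  # residual in feature space, shape (C,)
        # Central finite-difference Jacobian of the sampled feature w.r.t. (x, y).
        J = np.stack([
            (bilinear(fmap, p + [eps, 0]) - bilinear(fmap, p - [eps, 0])) / (2 * eps),
            (bilinear(fmap, p + [0, eps]) - bilinear(fmap, p - [0, eps])) / (2 * eps),
        ], axis=1)  # shape (C, 2)
        step = np.linalg.solve(J.T @ J + damping * np.eye(2), -J.T @ r)
        # Keep the keypoint (and its finite-difference probes) inside the map.
        p = np.clip(p + step, [1.0, 1.0], [w - 2.0, h - 2.0])
    return p

# Two "views" of the same signal, offset by a subpixel shift.
p_a = np.array([30.0, 25.0])            # keypoint in view A (kept fixed)
shift = np.array([0.3, -0.4])           # true offset of view B
feat_a = make_feature_map(64, 64)
feat_b = make_feature_map(64, 64, shift)
f_ref = bilinear(feat_a, p_a)           # reference feature for the track
p_b_noisy = p_a + shift + np.array([0.8, -0.6])  # detection that is 1 px off
p_b = refine_keypoint(feat_b, p_b_noisy, f_ref)
print("error before:", np.linalg.norm(p_b_noisy - (p_a + shift)))
print("error after: ", np.linalg.norm(p_b - (p_a + shift)))
```

The design choice worth noting is that the residual lives in feature space rather than pixel space, so the update is driven by the local appearance around the keypoint rather than by the detector's original location.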

Existing bundle and keypoint adjustments are both based on geometric observations, namely keypoint locations and flow, but do not account for their respective uncertainties.
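The contrast between a geometric cost and a featuremetric one can be written down roughly as follows. The notation is my own assumption and may differ from the paper's exact formulation: $\mathcal{T}(j)$ is the set of observations (image $i$, keypoint $u$) of 3D point $\mathbf{P}_j$, $\Pi$ is the projection with intrinsics $\mathbf{C}_i$, $\mathbf{F}_i$ is the dense feature map of image $i$, $\mathbf{f}^{j}$ is a reference feature for the track, and $\|\cdot\|_{\gamma}$ is a robust norm.

```latex
% Geometric bundle adjustment: residuals are measured in pixel space.
E_{\text{geo}} = \sum_{j} \sum_{(i,u) \in \mathcal{T}(j)}
  \left\| \Pi(\mathbf{R}_i \mathbf{P}_j + \mathbf{t}_i, \mathbf{C}_i) - \mathbf{p}_u \right\|_{\gamma}

% Featuremetric variant: residuals are measured in feature space.
E_{\text{feat}} = \sum_{j} \sum_{(i,u) \in \mathcal{T}(j)}
  \left\| \mathbf{F}_i\!\left[ \Pi(\mathbf{R}_i \mathbf{P}_j + \mathbf{t}_i, \mathbf{C}_i) \right] - \mathbf{f}^{j} \right\|_{\gamma}
```

In the first cost the detected location $\mathbf{p}_u$ is treated as ground truth. In the second, the detection only initializes the problem, and the dense features decide where the projection should land.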

Learned representation: SfM can handle image collections with unconstrained viewing conditions exhibiting large changes in terms of illumination, resolution, or camera models. The image representation used should be robust to such changes and ensure an accurate refinement in any condition. We thus turn to features computed by deep CNNs, which can exhibit high invariance by capturing a large context, yet retain fine local details.

Tip

This paper is a good example of how to use deep CNN features in SfM: keypoint adjustment and bundle adjustment are both performed over the predicted features, which yields better results.

These seem to be the techniques behind the scenes of the first topic that interested me when I joined the computer vision class. The collection of cameras and the predicted point clouds really impressed me.

With RANSAC and subpixel estimation, we get fairly decent 3D reconstructions, and the pipeline stays scalable while detecting noisy matches.

I’m a bit curious about how the model performs in more complicated scenes, such as the structure of a tree or other natural scenes. How does the model deal with high-frequency details if we fit the “smooth surface” too much?
