CSE5519 Advances in Computer Vision (Topic C: 2021 and before: Neural Rendering)
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
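We represent a static scene as a continuous 5D function:
$$F_\Theta : (\mathbf{x}, \mathbf{d}) \to (\mathbf{c}, \sigma)$$
where $\mathbf{x} = (x, y, z)$ denotes a 3D position in space, $\mathbf{d} = (\theta, \phi)$ specifies a viewing direction (expressed in practice as a 3D unit vector), $\sigma$ is the volume density at point $\mathbf{x}$ (which acts as a differential opacity controlling how much radiance is accumulated along a ray), and $\mathbf{c} = (r, g, b)$ is the emitted RGB radiance in direction $\mathbf{d}$ at that point.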
Our method learns this function by optimizing a deep, fully-connected neural network (a multilayer perceptron, or MLP) that maps each 5D input coordinate $(\mathbf{x}, \mathbf{d})$ to a corresponding volume density $\sigma$ and view-dependent color $\mathbf{c}$.
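The expected color of a camera ray $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$, where $\mathbf{o}$ is the camera position and $\mathbf{d}$ is the camera direction, with near and far bounds $t_n$ and $t_f$, is:
$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt$$
where $T(t)$ is the accumulated transmittance along the ray from $t_n$ to $t$:
$$T(t) = \exp\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)$$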
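In practice the ray integral is estimated numerically. The NeRF paper uses the quadrature $\hat{C}(\mathbf{r}) = \sum_i T_i\,(1 - \exp(-\sigma_i \delta_i))\,\mathbf{c}_i$ with $T_i = \exp(-\sum_{j<i} \sigma_j \delta_j)$ and $\delta_i = t_{i+1} - t_i$. Below is a minimal pure-Python sketch of that sum. The function name `render_ray`, its arguments, and the use of a `far` bound to close the last interval are assumptions made for illustration, not the authors' code.

```python
import math


def render_ray(sigmas, colors, t_vals, far):
    """Estimate the ray color with the discrete volume rendering sum.

    sigmas: density at each sample (hypothetical network outputs)
    colors: (r, g, b) tuple at each sample
    t_vals: sorted sample depths along the ray
    far:    far bound, assumed here to close the last interval
    """
    n = len(sigmas)
    optical_depth = 0.0  # running sum of sigma_j * delta_j for j < i
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for i in range(n):
        t_next = t_vals[i + 1] if i + 1 < n else far
        delta = t_next - t_vals[i]
        transmittance = math.exp(-optical_depth)       # T_i
        alpha = 1.0 - math.exp(-sigmas[i] * delta)     # opacity of segment i
        w = transmittance * alpha                      # contribution weight w_i
        weights.append(w)
        for k in range(3):
            rgb[k] += w * colors[i][k]
        optical_depth += sigmas[i] * delta
    return rgb, weights


# Usage: a nearly opaque red sample in front of a green one.
rgb, weights = render_ray(
    sigmas=[1e3, 1.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    t_vals=[0.0, 1.0],
    far=2.0,
)
```

The returned `weights` are the per-sample terms $w_i = T_i(1 - \exp(-\sigma_i \delta_i))$. They sum to at most 1, and the same quantities are what the coarse network supplies to guide hierarchical sampling.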
Novelty in NeRF
Positional encoding
Prior work cited by the paper (Rahaman et al.) shows that deep networks are biased towards learning lower-frequency functions.
The same work shows that mapping the inputs to a higher-dimensional space using high-frequency functions before passing them to the network enables better fitting of data that contains high-frequency variation.
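Let $\gamma(p)$ be the positional encoding of a scalar $p$ that maps $\mathbb{R}$ to $\mathbb{R}^{2L}$, where $L$ is the number of frequencies:
$$\gamma(p) = \left(\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p)\right)$$
The encoding is applied separately to each coordinate of the position $\mathbf{x}$ and of the viewing direction $\mathbf{d}$.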
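To make the positional encoding concrete, here is a small pure-Python sketch that maps a scalar to $2L$ sine and cosine features at frequencies $2^0\pi, \ldots, 2^{L-1}\pi$ and applies it coordinate by coordinate. The function names `positional_encoding` and `encode_vector` are hypothetical, chosen for this example.

```python
import math


def positional_encoding(p, num_freqs):
    """Map a scalar p to 2 * num_freqs features:
    (sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^(L-1) pi p), cos(2^(L-1) pi p)).
    """
    features = []
    for k in range(num_freqs):
        angle = (2.0 ** k) * math.pi * p
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


def encode_vector(v, num_freqs):
    """Encode each coordinate of v separately and concatenate the results."""
    features = []
    for p in v:
        features.extend(positional_encoding(p, num_freqs))
    return features


# Usage: a 3D position encoded with L = 10 frequencies gives 3 * 2 * 10 features.
encoded = encode_vector((0.1, -0.4, 0.7), num_freqs=10)
print(len(encoded))
```

The paper reports using $L = 10$ for the position and $L = 4$ for the viewing direction, so the network sees 60 and 24 encoded features respectively in place of the raw coordinates.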
Hierarchical volume sampling
Optimize the coarse and fine networks simultaneously.
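Let $\hat{C}_c(\mathbf{r})$ be the coarse prediction of the camera ray color, written as a weighted sum of the $N_c$ coarse sample colors:
$$\hat{C}_c(\mathbf{r}) = \sum_{i=1}^{N_c} w_i\,\mathbf{c}_i, \qquad w_i = T_i\left(1 - \exp(-\sigma_i \delta_i)\right)$$
Normalizing the weights as $\hat{w}_i = w_i / \sum_{j=1}^{N_c} w_j$ produces a piecewise-constant probability density along the ray.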
We sample a second set of $N_f$ locations from this distribution using inverse transform sampling, evaluate our “fine” network at the union of the first and second set of samples, and compute the final rendered color of the ray $\hat{C}_f(\mathbf{r})$ using all $N_c + N_f$ samples (coarse plus fine).
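The inverse transform sampling step can be sketched as follows: normalize the coarse weights into a piecewise-constant PDF, build its CDF, draw uniform numbers, and invert the CDF by linear interpolation within the selected bin. This is a minimal pure-Python illustration. The function name `sample_fine`, the `bin_edges` argument (one more edge than there are weights), and the uniform fallback for all-zero weights are assumptions of this sketch rather than details taken from the paper.

```python
import bisect
import random


def sample_fine(bin_edges, weights, num_fine, rng=random):
    """Draw num_fine depths from the piecewise-constant PDF given by weights.

    bin_edges: sorted depths, len(bin_edges) == len(weights) + 1 (assumed layout)
    weights:   non-negative coarse weights, one per bin
    rng:       any object with a random() method returning a float in [0, 1)
    """
    total = sum(weights)
    if total > 0.0:
        pdf = [w / total for w in weights]
    else:
        # Assumption: fall back to uniform sampling if the ray hit nothing.
        pdf = [1.0 / len(weights)] * len(weights)

    cdf = [0.0]
    for p in pdf:
        cdf.append(cdf[-1] + p)
    cdf[-1] = 1.0  # guard against floating-point drift in the running sum

    samples = []
    for _ in range(num_fine):
        u = rng.random()
        # Locate the bin i with cdf[i] <= u < cdf[i + 1].
        i = bisect.bisect_right(cdf, u) - 1
        span = cdf[i + 1] - cdf[i]
        frac = (u - cdf[i]) / span
        samples.append(bin_edges[i] + frac * (bin_edges[i + 1] - bin_edges[i]))
    return sorted(samples)


# Usage: all coarse weight sits in the middle bin, so fine samples land in [1, 2].
fine = sample_fine([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0], 50, rng=random.Random(0))
```

Bins with larger weight occupy a larger share of the CDF, so more of the uniform draws fall into them. That is how the fine samples concentrate where the coarse pass found visible content.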
- This paper reminds me of Gaussian Splatting. In this paper's setting, the scene is treated as a function of 5D coordinates (all the cameras are focused on the world origin). However, in general settings, we have 6D coordinates (3D position and 3D direction). Is there any way to use Gaussian Splatting to reconstruct the scene?
- In the positional encoding, the function reminds me of the Fourier transform. Is there any connection between the two?
Volume Rendering
Output of color and density.