1 Stanford University
2 Simon Fraser University
3 Google
(Left) Illustration of per-point provenance. We model the origin, or provenance, of each point, i.e., "from where it was seen." ProvNeRF takes as input the sparse training cameras (yellow) and outputs the provenances for each 3D point, modeled as a stochastic process. For two 3D points (orange triangle and red circle), the corresponding output provenances are illustrated by the orange and red locations, which depict from where these points were observed. (Right) Multiple downstream applications enabled by ProvNeRF, namely uncertainty estimation, criteria-based viewpoint optimization, and sparse-view novel view synthesis.
Abstract
Neural radiance fields (NeRFs) have gained popularity across various applications. However, they face challenges in the sparse-view setting, where volume rendering provides insufficient constraints. Reconstructing and understanding a 3D scene from sparse and unconstrained cameras is a long-standing problem in classical computer vision with diverse applications. While recent works have explored NeRFs in sparse, unconstrained view scenarios, their focus has been primarily on enhancing reconstruction and novel view synthesis. Our approach takes a broader perspective by posing the question: "from where has each point been seen?", which gates how well we can understand and reconstruct it. In other words, we aim to determine the origin, or provenance, of each 3D point and its associated information under sparse, unconstrained views. We introduce ProvNeRF, a model that enriches a traditional NeRF representation by incorporating per-point provenance, modeling likely source locations for each point. We achieve this by extending implicit maximum likelihood estimation (IMLE) to stochastic processes. Notably, our method is compatible with any pre-trained NeRF model and its associated training camera poses. We demonstrate that modeling per-point provenance offers several advantages over state-of-the-art methods, including uncertainty estimation, criteria-based view selection, and improved novel view synthesis.
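To give intuition for the IMLE-based training mentioned above, here is a minimal PyTorch-style sketch of fitting a provenance field with an IMLE objective: for each observed provenance, several candidates are generated from random latents, and only the nearest candidate is pulled toward the observation. The network architecture, the choice of a 3D location as the provenance representation, the way observed provenances are obtained from the training cameras, and all names (`ProvenanceField`, `imle_step`, `m`, `latent_dim`) are illustrative assumptions, not the released ProvNeRF implementation.

```python
import torch

class ProvenanceField(torch.nn.Module):
    """Hypothetical provenance field H(z, x): maps a latent code z and a 3D
    point x to a candidate provenance (here, a 3D location from which x was
    seen, as a simplifying assumption)."""
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(latent_dim + 3, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 3),
        )

    def forward(self, z, x):
        return self.net(torch.cat([z, x], dim=-1))

def imle_step(H, optimizer, x, prov_obs, m=10, latent_dim=64):
    """One IMLE-style update for a batch of 3D points.

    x:        (B, 3) query points
    prov_obs: (B, 3) observed provenances (e.g., derived from training cameras)
    """
    B = x.shape[0]
    z = torch.randn(B, m, latent_dim)
    cand = H(z, x.unsqueeze(1).expand(B, m, 3))            # (B, m, 3) candidate provenances
    dist = (cand - prov_obs.unsqueeze(1)).pow(2).sum(-1)   # (B, m) squared distances
    loss = dist.min(dim=1).values.mean()                   # nearest-sample objective (IMLE)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice one would construct `H = ProvenanceField()` and `optimizer = torch.optim.Adam(H.parameters())`, then call `imle_step` on batches of points sampled along the training rays of a pre-trained NeRF.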
Application 1: Uncertainty Estimation
The uncertainty and depth error maps are shown with the corresponding color bars. Uncertainty values and depth errors are normalized per test image so that the results are comparable.
Application 2: Criteria-based Viewpoint Optimization
Close-up View Optimization. We compare the view optimized by our provenance-aided viewpoint selection against the baselines under the close-up objective. Notice that our method (in red) both maximizes the plush toy's projected area and obtains a high-quality reconstruction, whereas the retrieval and optimization baselines fail to balance the two.
Close-up View Optimization. Graph showing PSNR and projected area size under the objective of maximizing the projected area of the target. Notice that our provenance-aided optimization obtains the best balance between area size and PSNR.
Normal Vector Alignment View Optimization. We compare the view optimized by our provenance-aided viewpoint selection against the baselines under the normal vector alignment objective. Notice that our method obtains a bird's-eye view of the book while keeping the book within its field of view, whereas the retrieved view fails to align with the book's normal and the optimized view does not see the book at all.
Normal Vector Alignment View Optimization. Graph showing PSNR and the dot product with the target's normal vector under the normal vector alignment objective. Notice that our provenance-aided optimization obtains the best balance between dot product and PSNR.
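The two objectives above (projected area and normal alignment) suggest how a provenance-aided viewpoint criterion might be scored. The sketch below is illustrative only: the provenance term, the weight `lam`, and the reuse of the `ProvenanceField` from the earlier sketch are our assumptions, not the paper's actual objective.

```python
import torch

def normal_alignment_score(cam_pos, target_pos, target_normal):
    # Dot product between the direction from the target to the camera and the
    # target's surface normal; larger values mean a more head-on view.
    to_cam = torch.nn.functional.normalize(cam_pos - target_pos, dim=-1)
    return (to_cam * target_normal).sum(-1)

def provenance_score(H, cam_pos, target_pos, m=16, latent_dim=64):
    # How close the candidate camera is to locations from which the target
    # point was plausibly observed, using samples from the provenance field H.
    z = torch.randn(m, latent_dim)
    prov = H(z, target_pos.expand(m, 3))          # (m, 3) sampled provenances
    return -(prov - cam_pos).norm(dim=-1).min()   # negative distance to nearest sample

def viewpoint_objective(H, cam_pos, target_pos, target_normal, lam=0.1):
    # Combined criterion: task objective (normal alignment) plus a provenance
    # term that keeps the camera near well-observed regions of the scene.
    return normal_alignment_score(cam_pos, target_pos, target_normal) \
        + lam * provenance_score(H, cam_pos, target_pos)
```

Under these assumptions, `cam_pos` could be a tensor with `requires_grad=True` and optimized by gradient ascent on `viewpoint_objective`, or the objective could be used to rank a set of candidate viewpoints.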
Application 3: Novel View Synthesis
Qualitative Results on ScanNet. Notice that we are able to remove cloudy artifacts and small floaters present in the original SCADE model.
Qualitative Results on Tanks and Temples
Citation
@article{nakayama2023provnerf,
  title={ProvNeRF: Modeling per Point Provenance in NeRFs as a Stochastic Process},
  author={Kiyohiro Nakayama and Mikaela Angelina Uy and Yang You and Ke Li and Leonidas Guibas},
  journal={arXiv:2401.08140},
  year={2023}
}
Acknowledgements
This work is supported by ARL grant W911NF-21-2-0104, a Vannevar Bush Faculty Fellowship, an Apple Scholars in AI/ML PhD Fellowship, a Snap Research Fellowship, the Outstanding Doctoral Graduates Development Scholarship of Shanghai Jiao Tong University, and the Natural Sciences and Engineering Research Council of Canada.