IntelliCap: Intelligent Guidance for Consistent View Sampling

IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2025 (To Appear)

Xplore (TBA) · arXiv · Data (TBA) · Video (TBA)

¹Keio University, ²University of Stuttgart

Teaser figure (yasunaga_ismar25_teaser.jpg):
Intelligent guidance for consistent view sampling using a combination of AR and AI tools. (a) Our system visualizes areas that lack the view samples for spatial and angular coverage needed to synthesize view-dependent effects (the pink-white stripes and the sphere, respectively). (b) The user is encouraged to erase the pink-white stripes by pointing the smartphone camera at the visualizations. (c) Meanwhile, our system identifies objects that potentially need denser sampling, as evaluated by an LLM, and generates spherical proxies that encourage the operator to take more images around them. (d) Our approach avoids exhaustive exploration while still ensuring the final rendering quality.
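To make the spherical-proxy idea concrete, the sketch below distributes candidate viewpoints on a sphere around an object that was ranked as needing denser sampling. This is an illustrative assumption rather than the paper's implementation; the helper names (fibonacci_sphere, look_at, proxy_viewpoints), the Fibonacci-lattice placement, and all parameter values are placeholders for this example.

# Illustrative sketch (not the authors' implementation): candidate viewpoints
# on a spherical proxy around an object of interest.
import numpy as np

def fibonacci_sphere(n: int) -> np.ndarray:
    """Return n roughly uniform unit directions via a Fibonacci lattice."""
    i = np.arange(n)
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))
    z = 1.0 - 2.0 * (i + 0.5) / n            # evenly spaced in z
    r = np.sqrt(1.0 - z * z)
    theta = golden_angle * i
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)

def look_at(eye: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Rotation matrix whose forward axis points from eye toward target."""
    forward = target - eye
    forward = forward / np.linalg.norm(forward)
    up = np.array([0.0, 1.0, 0.0])
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    up = np.cross(right, forward)
    return np.stack([right, up, -forward], axis=1)  # columns: x, y, z (OpenGL-style)

def proxy_viewpoints(center: np.ndarray, radius: float, n_views: int):
    """Candidate camera poses (position, rotation) on a sphere around `center`."""
    poses = []
    for d in fibonacci_sphere(n_views):
        eye = center + radius * d
        poses.append((eye, look_at(eye, center)))
    return poses

# Example: 32 candidate views around a hypothetical object 1.5 m in front of the origin.
views = proxy_viewpoints(np.array([0.0, 0.0, -1.5]), radius=0.6, n_views=32)
print(len(views), views[0][0])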

Abstract Novel view synthesis from images, for example, with 3D Gaussian splatting, has made great progress. Rendering fidelity and speed are now sufficient even for demanding virtual reality applications. However, the problem of assisting humans in collecting the input images for these rendering algorithms has received much less attention. High-quality view synthesis requires uniform and dense view sampling. Unfortunately, these requirements are not easily met by human camera operators, who may be in a hurry, impatient, or unaware of the scene structure and the photographic process. Existing approaches for guiding humans during image acquisition concentrate on single objects or neglect view-dependent material characteristics. We propose a novel situated visualization technique for scanning at multiple scales. During the scanning of a scene, our method identifies important objects that need extended image coverage to properly represent view-dependent appearance. To this end, we leverage semantic segmentation and category identification, ranked by a vision-language model. Spherical proxies are generated around highly ranked objects to guide the user during scanning. Our results show superior performance in real scenes compared to conventional view sampling strategies.
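As a rough illustration of the ranking step mentioned in the abstract, the sketch below scores segmented object crops with an off-the-shelf CLIP model from Hugging Face transformers, contrasting a "view-dependent" prompt against a "diffuse" prompt. The checkpoint, the prompts, and the scoring rule are assumptions made for this example and are not taken from the paper.

# Illustrative sketch only, assuming a CLIP-style vision-language model from
# Hugging Face `transformers`; prompts and scoring are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-base-patch32"   # assumed checkpoint
model = CLIPModel.from_pretrained(MODEL_ID)
processor = CLIPProcessor.from_pretrained(MODEL_ID)

# Text prompts contrasting view-dependent vs. diffuse appearance (assumption).
PROMPTS = [
    "a shiny, reflective, or transparent object with view-dependent appearance",
    "a matte, diffuse object whose appearance barely changes with viewpoint",
]

def view_dependence_score(crop: Image.Image) -> float:
    """Probability that an object crop (e.g., cut out by a segmentation mask)
    matches the view-dependent prompt rather than the diffuse one."""
    inputs = processor(text=PROMPTS, images=crop, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    return probs[0, 0].item()

# Rank segmented objects; high-scoring crops would receive spherical proxies.
# crops = [Image.open(p) for p in ["mug.png", "vase.png"]]   # hypothetical files
# ranked = sorted(crops, key=view_dependence_score, reverse=True)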

BibTeX
@inproceedings{yasunaga_ismar25,
    author={Yasunaga, Ayaka and Saito, Hideo and Schmalstieg, Dieter and Mori, Shohei},
    booktitle={IEEE International Symposium on Mixed and Augmented Reality (ISMAR)}, 
    title={IntelliCap: Intelligent Guidance for Consistent View Sampling}, 
    year={2025}
}

Acknowledgement This work was supported by the Alexander von Humboldt Foundation, funded by the German Federal Ministry of Education and Research, by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC 2120/1 – 390831618, and partly by a grant from JST Support for Pioneering Research Initiated by the Next Generation (#JPMJSP2123).