Single-image indoor localization using cross-domain learning from BIM models

Language: English
Page range: 50–58
Submitted on: Nov 27, 2025
Accepted on: Apr 4, 2026
Published on: May 6, 2026
Published by: Warsaw University of Technology
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year
© 2026 Piotr Ryszko, Dorota Włodarczyk, Małgorzata Jarząbek-Rychard, published by Warsaw University of Technology
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.