P. Vyas, C. Saxena, A. Badapanda, and A. Goswami, “Outdoor monocular depth estimation: A research review,” arXiv preprint arXiv:2205.01399, May 2022. https://doi.org/10.48550/arXiv.2205.01399
Q. Li et al., “Deep learning based monocular depth prediction: Datasets, methods and applications,” arXiv preprint arXiv:2011.04123, Nov. 2020. https://doi.org/10.48550/arXiv.2011.04123
A. Masoumian, H. A. Rashwan, J. Cristiano, M. S. Asif, and D. Puig, “Monocular depth estimation using deep learning: A review,” Sensors, vol. 22, no. 14, Art. no. 5353, Jul. 2022. https://doi.org/10.3390/s22145353
Y. Ming, X. Meng, C. Fan, and H. Yu, “Deep learning for monocular depth estimation: A review,” Neurocomputing, vol. 438, pp. 14–33, May 2021. https://doi.org/10.1016/j.neucom.2020.12.089
Y. Li and J. Ibanez-Guzman, “Lidar for autonomous driving: The principles, challenges, and trends for automotive lidar and perception systems,” IEEE Signal Processing Magazine, vol. 37, no. 4, pp. 50–61, Jul. 2020. https://doi.org/10.1109/MSP.2020.2973615
J. Hasch, “Driving towards 2020: Automotive radar technology trends,” in 2015 IEEE MTT-S International Conference on Microwaves for Intelligent Mobility (ICMIM), Heidelberg, Germany, Apr. 2015, pp. 1–4. https://doi.org/10.1109/ICMIM.2015.7117956
C. Zhao, Q. Sun, C. Zhang, Y. Tang, and F. Qian, “Monocular depth estimation based on deep learning: An overview,” Science China Technological Sciences, vol. 63, no. 9, pp. 1612–1627, Jun. 2020. https://doi.org/10.1007/s11431-020-1582-8
A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, Jun. 2012, pp. 3354–3361. https://doi.org/10.1109/CVPR.2012.6248074
G. Yang, X. Song, C. Huang, Z. Deng, J. Shi, and B. Zhou, “DrivingStereo: A large-scale dataset for stereo matching in autonomous driving scenarios,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, Jun. 2019, pp. 899–908. https://doi.org/10.1109/CVPR.2019.00099
D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a single image using a multi-scale deep network,” in Advances in Neural Information Processing Systems, vol. 27, Dec. 2014. https://doi.org/10.48550/arXiv.1406.2283
I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab, “Deeper depth prediction with fully convolutional residual networks,” in 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, Oct. 2016, pp. 239–248. https://doi.org/10.1109/3DV.2016.32
B. Li, C. Shen, Y. Dai, A. van den Hengel, and M. He, “Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, Jun. 2015, pp. 1119–1127. https://doi.org/10.1109/CVPR.2015.7298715
I. Alhashim and P. Wonka, “High quality monocular depth estimation via transfer learning,” arXiv preprint arXiv:1812.11941, Dec. 2018. https://doi.org/10.48550/arXiv.1812.11941
G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, Jul. 2017, pp. 4700–4708. https://doi.org/10.1109/CVPR.2017.243
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, Jun. 2016, pp. 770–778. https://doi.org/10.1109/CVPR.2016.90
C.-H. Yeh, Y.-P. Huang, C.-Y. Lin, and C.-Y. Chang, “Transfer2Depth: Dual attention network with transfer learning for monocular depth estimation,” IEEE Access, vol. 8, pp. 86081–86090, May 2020. https://doi.org/10.1109/ACCESS.2020.2992815
C. Godard, O. Mac Aodha, and G. J. Brostow, “Unsupervised monocular depth estimation with left-right consistency,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, Jul. 2017, pp. 270–279. https://doi.org/10.1109/CVPR.2017.699
R. Garg, V. Kumar B.G., G. Carneiro, and I. Reid, “Unsupervised CNN for single view depth estimation: Geometry to the rescue,” in Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, Oct. 2016, Proceedings, Part VIII, pp. 740–756. https://doi.org/10.1007/978-3-319-46484-8_45
M. Poggi, F. Aleotti, F. Tosi, and S. Mattoccia, “Towards real-time unsupervised monocular depth estimation on CPU,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, Oct. 2018, pp. 5848–5854. https://doi.org/10.1109/IROS.2018.8593814
J. Liu, Q. Li, R. Cao, W. Tang, and G. Qiu, “MiniNet: An extremely lightweight convolutional neural network for real-time unsupervised monocular depth estimation,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 166, pp. 255–267, Aug. 2020. https://doi.org/10.1016/j.isprsjprs.2020.06.004
C. Godard, O. Mac Aodha, M. Firman, and G. J. Brostow, “Digging into self-supervised monocular depth estimation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), Oct. 2019, pp. 3828–3838. https://doi.org/10.1109/ICCV.2019.00393
J. Jin, B. Tao, X. Qian, J. Hu, and G. Li, “Lightweight monocular absolute depth estimation based on attention mechanism,” Journal of Electronic Imaging, vol. 33, no. 2, Mar. 2024, Art. no. 023010. https://doi.org/10.1117/1.JEI.33.2.023010
N. Zhang, F. Nex, G. Vosselman, and N. Kerle, “Lite-Mono: A lightweight CNN and transformer architecture for self-supervised monocular depth estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, Jun. 2023, pp. 18537–18546. https://doi.org/10.1109/CVPR52729.2023.01778
J. Wang et al., “WeatherDepth: Curriculum contrastive learning for self-supervised depth estimation under adverse weather conditions,” arXiv preprint arXiv:2310.05556, Oct. 2023. https://doi.org/10.48550/arXiv.2310.05556
C. Zhao et al., “MonoViT: Self-supervised monocular depth estimation with a vision transformer,” in 2022 International Conference on 3D Vision (3DV), Prague, Czech Republic, Sep. 2022, pp. 668–678. https://doi.org/10.1109/3DV57658.2022.00077
M. A. Rahman and S. A. Fattah, “DwinFormer: Dual window transformers for end-to-end monocular depth estimation,” IEEE Sensors Journal, vol. 23, no. 18, Aug. 2023. https://doi.org/10.1109/JSEN.2023.3299782
G. Manimaran and J. Swaminathan, “Focal-WNet: An architecture unifying convolution and attention for depth estimation,” in 2022 IEEE 7th International Conference for Convergence in Technology (I2CT), Mumbai, India, Apr. 2022, pp. 1–7. https://doi.org/10.1109/I2CT54291.2022.9824488
Z. Li, Z. Chen, X. Liu, and J. Jiang, “DepthFormer: Exploiting long-range correlation and local information for accurate monocular depth estimation,” Machine Intelligence Research, vol. 20, no. 6, pp. 837–854, Dec. 2023. https://doi.org/10.1007/s11633-023-1458-0
A. Dosovitskiy et al., “An image is worth 16×16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, Oct. 2020. https://doi.org/10.48550/arXiv.2010.11929
D. Shim and H. J. Kim, “SwinDepth: Unsupervised depth estimation using monocular sequences via Swin transformer and densely cascaded network,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), London, United Kingdom, May 2023, pp. 4983–4990. https://doi.org/10.1109/ICRA48891.2023.10160657
C. Ning and H. Gan, “Trap attention: Monocular depth estimation with manual traps,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, Jun. 2023, pp. 5033–5043. https://doi.org/10.1109/CVPR52729.2023.00487
A. Astudillo, A. Barrera, C. Guindel, A. Al-Kaff, and F. García, “DAttNet: Monocular depth estimation network based on attention mechanisms,” Neural Computing and Applications, vol. 36, no. 7, pp. 3347–3356, Dec. 2023. https://doi.org/10.1007/s00521-023-09210-8
A. Agarwal and C. Arora, “Attention attention everywhere: Monocular depth prediction with skip attention,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, Jan. 2023, pp. 5861–5870. https://doi.org/10.1109/WACV56688.2023.00581
W. Zhao, Y. Song, and T. Wang, “SAU-Net: Monocular depth estimation combining multi-scale features and attention mechanisms,” IEEE Access, vol. 11, pp. 137734–137746, Dec. 2023. https://doi.org/10.1109/ACCESS.2023.3339152
Z. Liu et al., “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, Oct. 2021, pp. 10012–10022. https://doi.org/10.1109/ICCV48922.2021.00986
O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, Oct. 2015, Proceedings, Part III, pp. 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
D. Xing, J. Shen, C. Ho, and A. Tzes, “ROIFormer: Semantic-aware region of interest transformer for efficient self-supervised monocular depth estimation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 3, 2023, pp. 2983–2991. https://doi.org/10.1609/aaai.v37i3.25401
L. Yan, F. Yu, and C. Dong, “EMTNet: Efficient mobile transformer network for real-time monocular depth estimation,” Pattern Analysis and Applications, vol. 26, no. 4, pp. 1833–1846, Oct. 2023. https://doi.org/10.1007/s10044-023-01205-4
K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu, and C. Xu, “GhostNet: More features from cheap operations,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, Jun. 2020, pp. 1580–1589. https://doi.org/10.1109/CVPR42600.2020.00165
L. Song et al., “Spatial-aware dynamic lightweight self-supervised monocular depth estimation,” IEEE Robotics and Automation Letters, vol. 9, no. 1, pp. 883–890, Nov. 2023. https://doi.org/10.1109/LRA.2023.3337991
L. Papa, P. Russo, and I. Amerini, “METER: A mobile vision transformer architecture for monocular depth estimation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 10, pp. 5882–5893, Mar. 2023. https://doi.org/10.1109/TCSVT.2023.3260310
Q. Liu and S. Zhou, “LightDepthNet: Lightweight CNN architecture for monocular depth estimation on edge devices,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 71, no. 4, pp. 2389–2393, Nov. 2023. https://doi.org/10.1109/TCSII.2023.3337369
M. Tang, Z. Zhao, and J. Qiu, “A foggy weather simulation algorithm for traffic image synthesis based on monocular depth estimation,” Sensors, vol. 24, no. 6, Mar. 2024, Art. no. 1966. https://doi.org/10.3390/s24061966
K. Saunders, G. Vogiatzis, and L. J. Manso, “Self-supervised monocular depth estimation: Let’s talk about the weather,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, Oct. 2023, pp. 8907–8917. https://doi.org/10.1109/ICCV51070.2023.00818
M. Tremblay, S. S. Halder, R. de Charette, and J.-F. Lalonde, “Rain rendering for evaluating and improving robustness to bad weather,” International Journal of Computer Vision, vol. 129, no. 2, pp. 341–360, Feb. 2021. https://doi.org/10.1007/s11263-020-01366-3
F. Pizzati and R. de Charette, “CoMoGAN: Continuous model-guided image-to-image translation,” [Online]. Available: https://github.com/cvrits/CoMoGAN. Accessed: Jul. 4, 2024.
X. Huang, P. Wang, X. Cheng, D. Zhou, Q. Geng, and R. Yang, “The ApolloScape open dataset for autonomous driving and its application,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 10, pp. 2702–2719, Oct. 2020. https://doi.org/10.1109/TPAMI.2019.2926463
N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from RGBD images,” in Computer Vision – ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, Oct. 2012, Proceedings, Part V, pp. 746–760. https://doi.org/10.1007/978-3-642-33715-4_54
M. Cordts et al., “The Cityscapes dataset for semantic urban scene understanding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, Jun. 2016, pp. 3213–3223. https://doi.org/10.1109/CVPR.2016.350
A. Saxena, S. H. Chung, and A. Y. Ng, “Learning depth from single monocular images,” in Advances in Neural Information Processing Systems (NIPS), vol. 18, pp. 1–8, Dec. 2005.